# NIDS based on Regular Expressions

This second part of the workshop covers NIDS that are based on regular expressions (regex). A regex is used to determine whether a certain pattern is present in a string. We will rely on the Python module **re** for regex matching.

## Getting started

Let's try to approach this with a little example. The input string for the example will be: **the quick brown fox jumps over the lazy dog**. The pattern that we will look for is written as a regular expression.

The message that we are using is a *pangram*, a sentence that contains every letter of the English alphabet. The example below checks whether the pattern "a" matches the message.

**Try to determine the outcome of the (possible) match and then execute the code to see if you are right.**

In [2]:
import re

message = "the quick brown fox jumps over the lazy dog"

result = re.match("a", message)

if result:
    print("This pattern matches")
else:
    print("This pattern does not match")

This pattern does not match


The result of the example might surprise you, as there definitely is an *a* in the message. Nonetheless, the pattern doesn't match. The reason is that **re.match** only looks for the pattern at the very beginning of the string, so this regex only matches messages that start with "a".
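
To make the behaviour of **re.match** concrete, here is a minimal sketch that compares two inputs. The string "a lazy dog" and the variable names are made up for illustration and are not part of the workshop material.

```python
import re

message = "the quick brown fox jumps over the lazy dog"

# re.match only tries to match at the start (position 0) of the string.
starts_with_a = re.match("a", "a lazy dog")  # match object: the string starts with "a"
no_leading_a = re.match("a", message)        # None: the message starts with "t"

print(starts_with_a is not None)  # True
print(no_leading_a is not None)   # False
```

A match object is truthy and None is falsy, which is why the result can be used directly in an if statement.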

If we would like to know whether the message contains an "a", we should take into account that other letters might come before or after the "a". This can be done with a *wildcard*. The character . (a dot) matches any single character (letter, digit, space, ...) except a newline. The next thing that we need to overcome is the position of "a" in the message: one dot is needed for every character that precedes it.

In [15]:
message = "the quick brown fox jumps over the lazy dog"
regex   = "....................................a"

if re.match(regex, message):
    print("This pattern matches")
else:
    print("This pattern does not match")

This pattern matches
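
The number of dots in the pattern above is not arbitrary: it has to equal the number of characters in front of the "a". As a small sketch (the variable name position is introduced here only for illustration), the same regex can be built from the position of the "a" in the message:

```python
import re

message = "the quick brown fox jumps over the lazy dog"

# 0-based position of the first "a" in the message.
position = message.index("a")

# One dot per character that precedes the "a", followed by the "a" itself.
regex = "." * position + "a"

print(position)                         # 36
print(bool(re.match(regex, message)))   # True
```

This also shows why the approach does not scale: the pattern only works for messages in which the "a" sits at exactly that position.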


Luckily there is also a shorthand way of writing this. The character * (an asterisk) stands for **zero or more occurrences of the preceding element**.

In [14]:
# If we have run the code above, the variable "message" is already set.
# There is no need to repeat that.
regex = ".*a"

if re.match(regex, message):
    print("This pattern matches")
else:
    print("This pattern does not match")

This pattern matches
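
To illustrate what "zero or more" means in practice, the sketch below applies the same regex to three short strings. The test strings are made up for illustration only.

```python
import re

# ".*a": any run of characters (possibly empty), followed by an "a".
regex = ".*a"

zero_before = re.match(regex, "a")              # ".*" matches zero characters
many_before = re.match(regex, "bananas")        # ".*" matches as much as possible
no_a = re.match(regex, "the quick fox")         # there is no "a" to end on

print(zero_before.group())   # a
print(many_before.group())   # banana
print(no_a)                  # None
```

The second case shows that the asterisk is greedy: the match runs up to the last "a" in the string, not the first one.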


Before you start exploring regexes, a few more notations are given below. Note that these can be combined!

<center>
<img src="images/10_regextable.png"/>
</center>

Now ... let's practice by completing the regexes below:

In [18]:
# must match if the message contains the number "7"
regex = "."

if re.match(regex, message):
    print("This pattern matches")
else:
    print("This pattern does not match")

This pattern matches


In [5]:
# must match if the message contains an "ox"
regex = "."

if re.match(regex, message):
    print("This pattern matches")
else:
    print("This pattern does not match")

This pattern matches


In [7]:
# must match if the message contains a space " "
regex = "."

if re.match(regex, message):
    print("This pattern matches")
else:
    print("This pattern does not match")

This pattern matches


In [22]:
# must match if the animal is either a "fox" or a "cat"
message1 = "the quick brown fox jumps over the lazy dog"
message2 = "the quick brown owl jumps over the lazy dog"
message3 = "the quick brown cat jumps over the lazy dog"
regex = "."

if (re.match(regex, message1)) and not(re.match(regex, message2)) and (re.match(regex, message3)) :
    print("This pattern matches")
else:
    print("This pattern does not match")
    if re.match(regex, message1):
        print("  match with message 1")
    else:
        print("  NO match with message 1")
    if re.match(regex, message2):
        print("  match with message 2 (but it should not)")
    else:
        print("  NO match with message 2")
    if re.match(regex, message3):
        print("  match with message 3")
    else:
        print("  NO match with message 3")

This pattern does not match
  match with message 1
  match with message 2 (but it should not)
  match with message 3


In [26]:
# must match if the message ENDS with "dog"
message1 = "the quick brown fox jumps over the lazy dog"
message2 = "the quick brown dog jumps over the lazy fox"
regex = "."

if (re.match(regex, message1)) and not(re.match(regex, message2)):
    print("This pattern matches")
else:
    print("This pattern does not match")
    if re.match(regex, message1):
        print("  match with message 1")
    else:
        print("  NO match with message 1")
    if re.match(regex, message2):
        print("  match with message 2 (but it should not)")
    else:
        print("  NO match with message 2")

This pattern does not match
  match with message 1
  match with message 2 (but it should not)


<hr/>
<center>
Continue with the <a href="11_regexes.ipynb">next notebook</a> in a new browser tab.<br/><br/>
<img src="images/footer.png"/>
</center>