# Snort

Regular expressions are popular because they are extremely powerful. The industry standard for software NIDS (and Network Intrusion Prevention Systems (NIPS)) is [Snort](https://www.snort.org/). Snort uses roughly 3500 **rules** to classify and/or filter malicious traffic. These rules are written in a regex-like way.

# Let's try to mimic
One possible example of a Snort rule is:

> ```alert tcp 192.168.x.x any -> 172.16.x.x 49535 (msg:"We won't allow this socket"; sid:1000002; rev:1;)```

In more human language, this could be loosely translated to

> all **TCP** traffic that comes from an IP address with **192** and **168** as first bytes, and that comes from **any given port**; AND that goes **to** an IP address that starts with **172.16** to **port 49535** should be alerted to the security engineer
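
To make the structure of the rule more tangible, here is a minimal sketch that splits the example rule into its header fields. The function name `parse_rule_header` and the dictionary keys are illustrative assumptions; this is not how Snort parses its rules internally.

```python
def parse_rule_header(rule: str) -> dict:
    """Split a Snort-style rule into header fields and the raw option string.

    Hypothetical helper for illustration only, not part of Snort.
    """
    # Everything before the first "(" is the header, the rest are options.
    header, _, options = rule.partition("(")
    action, protocol, src_ip, src_port, direction, dst_ip, dst_port = header.split()
    return {
        "action": action,
        "protocol": protocol,
        "src_ip": src_ip,
        "src_port": src_port,
        "direction": direction,
        "dst_ip": dst_ip,
        "dst_port": dst_port,
        "options": options.rstrip(")").strip(),
    }


rule = ('alert tcp 192.168.x.x any -> 172.16.x.x 49535 '
        '(msg:"We won\'t allow this socket"; sid:1000002; rev:1;)')
fields = parse_rule_header(rule)

for name, value in fields.items():
    print("%-10s %s" % (name, value))
```

Each header field maps onto one part of the human-language translation above: the action (`alert`), the protocol (`tcp`), the source address and port, and the destination address and port.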

The Snort software builds a set of regexes to check all its rules, like the one above. Incoming traffic is then matched against all the regexes of the rule set.
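
As a rough illustration of the idea, the sketch below turns an address pattern such as `192.168.x.x` into a regular expression and matches dotted-decimal addresses against it. The helper `pattern_to_regex` is a hypothetical name, and the `x` placeholder follows the simplified notation of the example rule; the sketch makes no claim about Snort's actual implementation.

```python
import re


def pattern_to_regex(pattern: str):
    """Compile an address pattern like '192.168.x.x' into a regex.

    Hypothetical helper: every 'x' stands for one arbitrary octet.
    """
    parts = [r"\d{1,3}" if part == "x" else re.escape(part)
             for part in pattern.split(".")]
    return re.compile(r"\.".join(parts))


source_regex = pattern_to_regex("192.168.x.x")

print(bool(source_regex.fullmatch("192.168.1.10")))   # True
print(bool(source_regex.fullmatch("192.169.1.10")))   # False
```

Note that `\d{1,3}` also accepts values above 255, so this is a loose match that is only good enough for addresses that are already well-formed.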

It is important to understand that **tcp** (in the example above) already implies a number of things. Because TCP is used at the transport layer, this workshop takes it to imply that the network layer is IPv4. *For the sake of completeness*, note that TCP can also be used in combination with other network layer protocols (e.g. IPv6), but in this workshop we assume typical Internet traffic.

One could be tempted to check byte number 24 (or 23 when starting at index 0) in a frame. This byte defines which transport layer protocol is chosen. Although that is a valid deduction, it is not complete. Byte number 24 only defines the transport layer protocol **IF** IPv4 is used as the network layer protocol **AND** if Ethernet is used as the link layer protocol.

If other protocols are used, then:

* byte number 24 might represent something different;
* the protocol field might still exist in another location than byte 24.
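
The dependency between the two checks can be sketched on raw bytes. The example below assumes an untagged Ethernet II frame given as a `bytes` object (the workshop dataset delivers frames as words, so the indexing there will differ); `make_frame` and `transport_protocol` are hypothetical helpers, and the frames are synthetic.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def make_frame(ethertype: int, protocol: int) -> bytes:
    """Build a synthetic frame: 14-byte Ethernet header + 20-byte payload."""
    ethernet = bytes(12) + struct.pack("!H", ethertype)  # MACs zeroed
    payload = bytearray(20)
    payload[0] = 0x45        # IPv4, header length of 5 x 4 bytes
    payload[9] = protocol    # IPv4 Protocol field (frame index 23)
    return ethernet + bytes(payload)


def transport_protocol(frame: bytes):
    """Return the transport protocol name, or None if the frame is not IPv4."""
    if len(frame) < 24:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        # Byte 24 belongs to a different header here, so it is ignored.
        return None
    protocol = frame[23]
    return PROTOCOL_NAMES.get(protocol, "other (%d)" % protocol)


print(transport_protocol(make_frame(ETHERTYPE_IPV4, 6)))   # TCP
print(transport_protocol(make_frame(ETHERTYPE_IPV4, 17)))  # UDP
print(transport_protocol(make_frame(ETHERTYPE_ARP, 6)))    # None
```

The last call shows why the Ethertype has to be checked first: the byte at index 23 holds the value 6 there as well, but the frame is not an IPv4 frame.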

The subsequent checks that have to be done are shown in the flow chart below.

<center>
<img src="images/11_flow.png"/>
</center>

Now let's try to achieve this in Python with the available dataset. First we start by loading all required variables. **Don't forget to run this block prior to the rest.**

In [5]:
from lib.dataset import NIDSDataset

data_file = 'data/dataset_packets_v1.npy'
labels_file = 'data/dataset_labels_v1.npy'

dataset = NIDSDataset(data_file, labels_file)

The next step is to distinguish which frames use the TCP protocol. Running the code below will tell you how many frames are analysed and how many of them are IPv4 frames and/or TCP frames.

In the exercise below you should parse every incoming word and increment the variables **number_of_ipv4_frames** and **number_of_tcp_frames** accordingly. Refer back to the [section](01_readingframes.ipynb#Parsing-the-dataset) where the parsing is introduced. And finally, don't use regexes in this exercise.

In [6]:
framecounter = 0
number_of_ipv4_frames = 0
number_of_tcp_frames = 0

# loop over all frames in the dataset
for d in dataset:

    wordcounter = 0

    # loop over all words
    for word in d:
        # examine Ethertype - in link layer header
        # if the Ethertype field is not 0x0800, the frame is allowed
                
        # examine Protocol - in network layer header
        # if the Protocol field is not 0x6, the frame is allowed

        # examine Source Address - in network layer header
        # if the first two source address fields are not 192 and 168,
        #   the frame is allowed

        # examine Destination Address - in network layer header
        # if the first two destination address fields are not 172 and
        #   16, the frame is allowed

        # examine Destination port - in transport layer header
        # if the destination port is not 49535, the
        #   frame is allowed

        # print(word, end='')
        wordcounter += 1
    
    # end of iteration over words
    framecounter += 1
# end of iteration over frames

# print summary
print("\nWe've received %d frames" % framecounter)
print("\tIPv4: %d frames" % number_of_ipv4_frames)
print("\t\tTCP: %d frames" % number_of_tcp_frames)



We've received 1314 frames
	IPv4: 0 frames
		TCP: 0 frames


<center><div style="background-color: #10FF107f;">The code above should report that there are <b>1314 frames</b> in the dataset, of which <b>1214</b> are IPv4 (<b>904</b>x TCP, 296x UDP, and 14x ICMP). The remaining 100 frames are ARP.</div></center>

With the acquired parsing, it should be possible to filter the dataset with the example rule from the beginning of this notebook:

> all TCP traffic that comes from an IP address with 192 and 168 as first bytes, and that comes from any given port; AND that goes to an IP address that starts with 172.16 to port 49535 should be alerted to the security engineer

Parse the incoming network stream again and also determine how many packets can be allowed through and for how many packets an *alert* has to be raised.

In [7]:
framecounter = 0
number_of_ipv4_frames = 0
number_of_tcp_frames = 0

decision_pass = 0
decision_alert = 0

# loop over all frames in the dataset
for d in dataset:

    wordcounter = 0

    # loop over all words
    for word in d:
        # examine Ethertype - in link layer header
        # if the Ethertype field is not 0x0800, the frame is allowed
                
        # examine Protocol - in network layer header
        # if the Protocol field is not 0x6, the frame is allowed

        # examine Source Address - in network layer header
        # if the first two source address fields are not 192 and 168,
        #   the frame is allowed

        # examine Destination Address - in network layer header
        # if the first two destination address fields are not 172 and
        #   16, the frame is allowed

        # examine Destination port - in transport layer header
        # if the destination port is not 49535, the
        #   frame is allowed

        # print(word, end='')
        wordcounter += 1
    
    # end of iteration over words
    framecounter += 1
# end of iteration over frames

# print summary
print("\nWe've received %d frames" % framecounter)
print("\tIPv4: %d frames" % number_of_ipv4_frames)
print("\t\tTCP: %d frames" % number_of_tcp_frames)
print("We've decided")
print("\tok frames: %d frames" % decision_pass)
print("\talert frames: %d frames" % decision_alert)


We've received 1314 frames
	IPv4: 0 frames
		TCP: 0 frames
We've decided
	ok frames: 0 frames
	alert frames: 0 frames


<center><div style="background-color: #10FF107f;">The code above should report that <b>1308</b> frames are ok and for <b>6</b> frames an alert should be raised.</div></center>
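
For comparison, the complete chain of checks can be sketched on raw bytes, independent of the workshop dataset. The example assumes untagged Ethernet II frames given as `bytes` objects; `build_frame` and `should_alert` are hypothetical helpers and the four frames are synthetic, so the counts printed here say nothing about the dataset itself.

```python
import struct


def build_frame(src_ip, dst_ip, dst_port, protocol=6, ethertype=0x0800):
    """Build a synthetic Ethernet + IPv4 + transport header (illustration only)."""
    ethernet = bytes(12) + struct.pack("!H", ethertype)
    ip = bytearray(20)
    ip[0] = 0x45                 # version 4, header length 5 x 4 bytes
    ip[9] = protocol             # Protocol field
    ip[12:16] = bytes(src_ip)    # Source Address
    ip[16:20] = bytes(dst_ip)    # Destination Address
    transport = struct.pack("!HH", 40000, dst_port) + bytes(16)
    return ethernet + bytes(ip) + transport


def should_alert(frame: bytes) -> bool:
    """Apply the example rule; every failed check lets the frame pass."""
    if len(frame) < 34:
        return False
    if frame[12:14] != b"\x08\x00":          # Ethertype is not IPv4
        return False
    if frame[23] != 6:                       # Protocol is not TCP
        return False
    if frame[26:28] != bytes([192, 168]):    # source is not 192.168.x.x
        return False
    if frame[30:32] != bytes([172, 16]):     # destination is not 172.16.x.x
        return False
    # The TCP header starts after the IPv4 header, whose length is
    # given in 4-byte units by the low nibble of its first byte.
    tcp_start = 14 + 4 * (frame[14] & 0x0F)
    if len(frame) < tcp_start + 4:
        return False
    (dst_port,) = struct.unpack("!H", frame[tcp_start + 2:tcp_start + 4])
    return dst_port == 49535


frames = [
    build_frame((192, 168, 1, 10), (172, 16, 0, 5), 49535),               # alert
    build_frame((192, 168, 1, 10), (172, 16, 0, 5), 80),                  # other port
    build_frame((10, 0, 0, 1), (172, 16, 0, 5), 49535),                   # other source
    build_frame((192, 168, 1, 10), (172, 16, 0, 5), 49535, protocol=17),  # UDP
]

decision_alert = sum(1 for frame in frames if should_alert(frame))
decision_pass = len(frames) - decision_alert
print("ok frames: %d, alert frames: %d" % (decision_pass, decision_alert))
```

The checks are ordered from the outermost header inwards, so a field is only interpreted once the headers in front of it have been confirmed.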

<hr/>
<center>
Continue with the <a href="12_regexes.ipynb">next notebook</a> in a new browser tab.<br/><br/>
<img src="images/footer.png"/>
</center>