
# *__Decoding the IP Layer__*

In its current form, our sniffer receives all of the __IP headers__, along with any higher protocols such as __TCP__, __UDP__, or __ICMP__. The information is packed into binary form and, as shown previously, is quite difficult to understand. Let's work on decoding the IP portion of a packet so that we can pull useful information from it, such as the protocol type (__TCP__, __UDP__, or __ICMP__) and the source and destination IP addresses. This will serve as a foundation for further protocol parsing later on.

If we examine what an actual packet looks like on the network, we can see how to decode the incoming packets. Refer to __Figure 3-1__ for the layout of an __IP header__.

<div align="center" width="100%">
<img src="https://github.com/damianiRiccardo90/BHP/blob/master/C2-Writing_A_Sniffer/IPv4_Header.png?raw=true" alt="IPv4 header structure" width="50%">
<p style="text-align:center"><em><strong>Figure 3-1:</strong> Typical IPv4 header structure</em></p>
</div>

We will decode the entire IP Header (except the Options field) and extract the protocol type, source, and destination IP address. This means we'll be working directly with the binary, and we'll have to come up with a strategy for separating each part of the IP header using Python.

In Python, there are a couple of ways to get external binary data into a data structure. You can use either the __ctypes__ module or the __struct__ module to define the data structure. The __ctypes__ module is a foreign function library for Python. It provides a bridge to C-based languages, enabling you to use C-compatible data types and call functions in shared libraries. The __struct__ module, on the other hand, converts between Python values and C structs represented as Python __bytes__ objects. In other words, the __ctypes__ module handles binary data types in addition to providing a lot of other functionality, while the __struct__ module primarily handles binary data.

You will see both methods used when you explore tool repositories on the web. This section shows you how to use each one to read an __IPv4 header__ off the network. It's up to you to decide which method you prefer; either will work fine.

# *__The ctypes Module__*

The following code snippet defines a new class, __IP__, that can read a packet and parse the header into its separate fields:

In [None]:
from ctypes import *
import socket
import struct

class IP(Structure):
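    _fields_ = [
        # On a little-endian host, ctypes fills bit fields starting from the
        # least significant bits, so ihl (the low nybble) is declared first.
        ("ihl",          c_ubyte,  4),  # 4 bit unsigned char
        ("version",      c_ubyte,  4),  # 4 bit unsigned char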
        ("tos",          c_ubyte,  8),  # 1 byte char
        ("len",          c_ushort, 16), # 2 byte unsigned short
        ("id",           c_ushort, 16), # 2 byte unsigned short
        ("offset",       c_ushort, 16), # 2 byte unsigned short
        ("ttl",          c_ubyte,  8),  # 1 byte char
        ("protocol_num", c_ubyte,  8),  # 1 byte char
        ("sum",          c_ushort, 16), # 2 byte unsigned short
        ("src",          c_uint32, 32), # 4 byte unsigned int
        ("dst",          c_uint32, 32), # 4 byte unsigned int
    ]

    def __new__(cls, socket_buffer=None):
        return cls.from_buffer_copy(socket_buffer)

    def __init__(self, socket_buffer=None):
        # Human readable IP addresses
        self.src_address = socket.inet_ntoa(struct.pack("<" + "L", self.src))
        self.dst_address = socket.inet_ntoa(struct.pack("<" + "L", self.dst))

This class creates a __\_fields\___ structure to define each part of the IP header. The structure uses C types that are defined in the __ctypes__ module. For example, the __c_ubyte__ type is an unsigned char, the __c_ushort__ type is an unsigned short, and so on. You can see that each field matches the IP header diagram in __Figure 3-1__. Each field description takes three arguments: The name of the field (such as __ihl__ or __offset__), the type of value it takes (such as __c_ubyte__ or __c_ushort__), and the width in bits for that field (such as 4 for __ihl__ and __version__). Being able to specify the bit width is handy because it provides the freedom to specify any length we need, not only at the byte level (specification at the byte level would force our defined fields to always be a multiple of 8 bits).
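
To make the bit-width argument concrete, here is a minimal sketch (not part of the original listing) that splits a single byte into two 4-bit fields. The class name `FirstByte` and its field names are hypothetical. It assumes a little-endian host, where __ctypes__ allocates bit fields starting from the least significant bits, so the field declared first receives the low-order nybble.

```python
import sys
from ctypes import Structure, c_ubyte, sizeof

class FirstByte(Structure):
    # Hypothetical helper: two 4-bit fields that share one byte.
    _fields_ = [
        ("low_nybble",  c_ubyte, 4),
        ("high_nybble", c_ubyte, 4),
    ]

# 0x45 is a common first byte of an IPv4 header.
first = FirstByte.from_buffer_copy(b"\x45")

print(sizeof(FirstByte))                    # both fields fit in one byte
print(sys.byteorder)                        # the layout depends on this
print(first.low_nybble, first.high_nybble)  # 5 4 on a little-endian host
```

On a big-endian host the two values would likely come out swapped, which is why the declaration order of bit fields deserves a quick check on your own machine.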

The IP class inherits from the __ctypes__ module's __Structure__ class, which requires a defined __\_fields\___ structure before any object is created. To fill the __\_fields\___ structure, our class uses the __\_\_new\_\___ method, which takes the class reference as its first argument. It creates and returns an object of the class, which is then passed to the __\_\_init\_\___ method. When we create our IP object, we do so as we ordinarily would, but underneath, Python invokes __\_\_new\_\___, which fills out the __\_fields\___ data structure before __\_\_init\_\___ runs. As long as you've defined the structure beforehand, you can pass __\_\_new\_\___ the external network packet data, and the fields appear as attributes of your object.
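
The order of these two calls can be observed with a small sketch. The `Pair` class and the `calls` list below are hypothetical and exist only for this demonstration; the technique itself (returning `from_buffer_copy` from `__new__`) is the one used by the IP class.

```python
from ctypes import Structure, c_ubyte

calls = []  # records the order in which the two hooks run

class Pair(Structure):
    # Hypothetical two-byte structure used only for this demonstration.
    _fields_ = [("first", c_ubyte), ("second", c_ubyte)]

    def __new__(cls, raw=None):
        calls.append("__new__")
        # Copy the raw bytes into a new instance; the fields are filled here.
        return cls.from_buffer_copy(raw)

    def __init__(self, raw=None):
        calls.append("__init__")
        # The fields are already populated by the time __init__ runs.
        self.total = self.first + self.second

pair = Pair(b"\x02\x03")
print(calls)       # ['__new__', '__init__']
print(pair.total)  # 5
```

Because `__init__` can already read the fields, it is a convenient place to derive extra attributes, such as human-readable addresses.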

You now have an idea of how to map the C data types to the IP header values. Using C code as a reference when translating to Python objects can be useful, because the conversion to pure Python is seamless. See the __ctypes__ documentation for full details about working with this module.

# *__The struct Module__*

The struct module provides format characters that you can use to specify the structure of the binary data. In the following example, we'll once again define an IP class to hold the header information. This time, though, we'll use format characters to represent the parts of the header:

In [None]:
import ipaddress
import struct

class IP:
    def __init__(self, buff=None):
        header = struct.unpack("<" + "BBHHHBBH4s4s", buff)
        self.ver = header[0] >> 4 #[1]
        self.ihl = header[0] & 0xF #[2]

        self.tos = header[1]
        self.len = header[2]
        self.id = header[3]
        self.offset = header[4]
        self.ttl = header[5]
        self.protocol_num = header[6]
        self.sum = header[7]
        self.src = header[8]
        self.dst = header[9]

        # Human readable IP addresses
        self.src_address = ipaddress.ip_address(self.src)
        self.dst_address = ipaddress.ip_address(self.dst)

        # Map protocol constants to their names
        self.protocol_map = {1: "ICMP", 6: "TCP", 17: "UDP"}

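The first format character (in our case, __<__) specifies the endianness of the data, or the order of bytes within a binary number. If you omit it, C types are represented in the machine's native format and byte order. In this case, we're on Kali (x64), which is little-endian. In a little-endian machine, the least significant byte is stored at the lowest address, and the most significant byte at the highest address. Note that IP header fields travel in network byte order, which is big-endian: the single-byte fields and the 4-byte addresses read the same either way, but the 16-bit fields (__len__, __id__, __offset__, and __sum__) come out byte-swapped under __<__, so use __!__ in place of __<__ if you need their values.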
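
As a quick illustration of what the byte-order prefix changes, the following sketch unpacks the same two bytes three ways. The byte values are made up for the example.

```python
import struct

raw = b"\x00\x54"  # two made-up bytes, in the order they were received

little = struct.unpack("<H", raw)[0]   # first byte is least significant
big = struct.unpack(">H", raw)[0]      # first byte is most significant
network = struct.unpack("!H", raw)[0]  # network order, same as big-endian

print(little, big, network)  # 21504 84 84
```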

The next format characters represent the individual parts of the header. The __struct__ module provides several format characters. For the IP header, we need only the format characters __B__ (1-byte unsigned char), __H__ (2-byte unsigned short), and __s__ (a byte array that requires a byte-width specification: __4s__ means a 4-byte string). Note how our format string matches the structure of the IP header diagram in __Figure 3-1__.
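
One way to check that a format string matches the header layout is to ask __struct__ for its size. The sketch below does that and then unpacks a made-up 20-byte header; the hex string is illustrative data, not a captured packet.

```python
import struct

# Size in bytes of each format character used for the IP header.
for code in ("B", "H", "4s"):
    print(code, struct.calcsize("<" + code))

fmt = "<BBHHHBBH4s4s"
print(struct.calcsize(fmt))  # 20, the length of an IPv4 header without options

# Made-up header bytes: 10.0.0.1 -> 10.0.0.2, TTL 64, protocol 1.
raw = bytes.fromhex("4500005400010000400100000a0000010a000002")
fields = struct.unpack(fmt, raw)

print(len(fields))                      # 10 values, one per format item
print(fields[0], fields[5], fields[6])  # 69 64 1
print(fields[8], fields[9])             # the two addresses as 4-byte strings
```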

Remember that with __ctypes__, we could specify the bit-width of the individual header parts. With __struct__, there's no format character for a __nybble__ (a 4-bit unit of data, also known as a __nibble__), so we have to do some manipulation to get the __ver__ and __ihl__ variables from the first byte of the header.

Of the first byte of header data we receive, we want to assign the __ver__ variable only the __high-order__ nybble (the first nybble in the byte). The typical way you get the high-order nybble of a byte is to __right-shift__ the byte by four places, which is the equivalent of prepending four 0s to the front of the byte, causing the last 4 bits to fall off __[1]__. This leaves us with only the first nybble of the original byte. The Python code essentially does the following:
```
0   1   0   1   0   1   1   0   >> 4
-----------------------------
0   0   0   0   0   1   0   1
```
We want to assign the __ihl__ variable the __low-order__ nybble, or the last 4 bits of the byte. The typical way to get the second nybble of a byte is to use the Boolean __AND__ operator with __0xF__ (00001111) __[2]__. This applies the Boolean operation such that __0 AND 1__ produce 0 (since 0 is equivalent to __False__, and 1 is equivalent to __True__). For the expression to be true, both the first part and the last part must be true. Therefore, this operation deletes the first 4 bits, as anything ANDed with 0 will be 0. It leaves the last 4 bits unaltered, as anything ANDed with 1 will return the original value. Essentially, the Python code manipulates the byte as follows:
```
      0   1   0   1   0   1   1   0
AND   0   0   0   0   1   1   1   1
-----------------------------------
      0   0   0   0   0   1   1   0
```
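The two operations can be tried directly in Python. This sketch uses the same example byte as the diagrams above, then applies the operations to 0x45, a typical first byte of an IPv4 header. Multiplying the header length by 4 relies on the fact that the IHL field counts 32-bit words.

```python
first_byte = 0b01010110  # the example byte from the diagrams (0x56)

high_nybble = first_byte >> 4  # shift the top four bits down
low_nybble = first_byte & 0xF  # keep only the bottom four bits

print(f"{high_nybble:04b} {low_nybble:04b}")  # 0101 0110

# Applied to a typical first byte of an IPv4 header:
ver, ihl = 0x45 >> 4, 0x45 & 0xF
print(ver, ihl, ihl * 4)  # 4 5 20
```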
You don't have to know very much about binary manipulation to decode an IP header, but you'll see certain patterns, like using shifts and AND over and over as you explore other hackers' code, so it's worth understanding those techniques.

In cases like this that require some bit-shifting, decoding binary data takes some effort. But for many cases (such as reading __ICMP__ messages), it's very simple to set up: Each portion of the __ICMP__ message is a multiple of 8 bits, and the format characters provided by the __struct__ module are multiples of 8 bits, so there's no need to split a byte into separate nybbles. In the __Echo Reply ICMP message__ shown in __Figure 3-2__, you can see that each parameter of the ICMP header can be defined in a struct with one of the existing format letters (BBHHH).

<div align="center" width="100%">
<img src="https://github.com/damianiRiccardo90/BHP/blob/master/C2-Writing_A_Sniffer/ICMP_Echo_Reply_Message.png?raw=true" alt="Echo Reply ICMP message" width="50%">
<p style="text-align:center"><em><strong>Figure 3-2:</strong> Sample Echo Reply ICMP message</em></p>
</div>

A quick way to parse this message would be to simply assign 1 byte to the first two attributes and 2 bytes to the next three attributes:

In [None]:
class ICMP:
    def __init__(self, buff):
        header = struct.unpack("<" + "BBHHH", buff)
        self.type = header[0]
        self.code = header[1]
        self.sum = header[2]
        self.id = header[3]
        self.seq = header[4]
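
To try this class without a live capture, the following sketch repeats it so the snippet runs on its own, then feeds it a made-up Echo Reply header (type 0, code 0). The identifier and sequence values are arbitrary illustration data.

```python
import struct

class ICMP:
    # Same parsing as the class above, repeated so this snippet is standalone.
    def __init__(self, buff):
        header = struct.unpack("<BBHHH", buff)
        self.type = header[0]
        self.code = header[1]
        self.sum = header[2]
        self.id = header[3]
        self.seq = header[4]

print(struct.calcsize("<BBHHH"))  # 8 bytes: 1 + 1 + 2 + 2 + 2

# Made-up Echo Reply: type 0, code 0, zero checksum, id 1, sequence 7.
sample = struct.pack("<BBHHH", 0, 0, 0, 1, 7)

reply = ICMP(sample)
print(reply.type, reply.code, reply.id, reply.seq)  # 0 0 1 7
```

Since `struct.unpack` requires a buffer of exactly the size the format describes, a real packet would need to be sliced down to these 8 bytes before being passed in.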

Read the __struct__ [documentation](https://docs.python.org/3/library/struct.html) for full details about using this module. You can use either the __ctypes__ module or the __struct__ module to read and parse binary data. No matter which approach you take, you'll instantiate the class like this:

In [None]:
mypacket = IP(buff)
print(f'{mypacket.src_address} -> {mypacket.dst_address}')

In this example, you instantiate the IP class with your packet data in the variable __buff__.
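
If you want to experiment before wiring the class to a socket, a sketch along these lines may help. It uses a trimmed-down copy of the __struct__-based class and a made-up packet; the name `raw_buffer` and the hex bytes are assumptions for illustration only.

```python
import ipaddress
import struct

class IP:
    # Trimmed-down copy of the struct-based class, kept short for the example.
    def __init__(self, buff):
        header = struct.unpack("<BBHHHBBH4s4s", buff)
        self.ver = header[0] >> 4
        self.ihl = header[0] & 0xF
        self.protocol_num = header[6]
        self.src_address = ipaddress.ip_address(header[8])
        self.dst_address = ipaddress.ip_address(header[9])

# Made-up packet: a 20-byte header followed by 8 bytes of payload.
raw_buffer = bytes.fromhex("4500001c00010000400100000a0000010a000002") + bytes(8)

# struct.unpack needs exactly 20 bytes, so slice off the header first.
buff = raw_buffer[:20]
mypacket = IP(buff)
print(f'{mypacket.src_address} -> {mypacket.dst_address}')  # 10.0.0.1 -> 10.0.0.2
```

Passing the whole `raw_buffer` instead of the first 20 bytes would raise `struct.error`, because the buffer size would no longer match the format string.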

# *__Writing the IP Decoder__*

Let's put the IP decoding routine we just created into a file called __sniffer_ip_header_decode.py__, as shown here:

In [None]:
import ipaddress
import os
import socket
import struct
import sys

class IP:
    def __init__(self, buff=None):
        header = struct.unpack("<" + "BBHHHBBH4s4s", buff)
        self.ver = header[0] >> 4
        self.ihl = header[0] & 0xF

        self.tos = header[1]
        self.len = header[2]
        self.id = header[3]
        self.offset = header[4]
        self.ttl = header[5]
        self.protocol_num = header[6]
        self.sum = header[7]
        self.src = header[8]
        self.dst = header[9]

        # Human readable IP addresses
        self.src_address = ipaddress.ip_address(self.src)
        self.dst_address = ipaddress.ip_address(self.dst)

        # Map protocol constants to their names
        self.protocol_map = {1: "ICMP", 6: "TCP", 17: "UDP"}
        try:
            self.protocol = self.protocol_map[self.protocol_num]
        except Exception as e:
            print("%s No protocol for %s" % (e, self.protocol_num))
            self.protocol = str(self.protocol_num)

    def sniff(host):
        # Should look familiar from previous example
        if os.name == "nt":
            socket_protocol = socket.IPPROTO_IP