MarilenaKokkini/IoT-Packets-Classification-Thesis-Aueb

IoT-packets-classification

Classify packets into MQTT/CoAP/AMQP protocols.

This parser was part of my senior thesis project. It reads any pcap file and classifies its packets into three categories, creating a corresponding CSV file for each. Each file records significant characteristics for its category, such as the size of the payload, the payload ratio, the time difference between incoming packets and other important attributes.

Programming language: Python

The code is divided into multiple methods, each with a specific function. The main method is called “pcap_pkt_reader”, and it calls most of the other methods in the file. It checks whether the given pcap file exists and, if it does, lists the packets in it. Once the packets are listed, I determine whether each packet uses TCP or UDP; in both cases I follow the same process.

First, I check whether the packet runs over Ethernet, Linux cooked capture or another link-layer protocol, and I get the MAC addresses of the packet using my method “find_first_layer_protocol”. After that I detect the IP addresses and the corresponding ports using the methods “get_ip_addresses” and “get_ports”. Then I capture the TCP/UDP payload in order to look for IoT protocols. The higher the layer of a protocol, the deeper its header sits inside the packet. I convert the payload to a string because the original value is a Scapy type that was awkward to work with. I make sure that the payload is not empty, otherwise there is no point in examining the packet: an empty payload means there is no header for a higher layer.

Two characteristics that I added to the code later are the date and the time at which the source sent each packet, obtained with the “get_date_and_time” method. These characteristics are vital for any further processing of the packets. Both date and time had to be reformatted into human-readable form.
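The date and time reformatting described above could look roughly like the sketch below. It assumes the capture timestamp is available as seconds since the Unix epoch and that UTC is the desired time zone; the function name `format_date_and_time` and the output formats are hypothetical and are not taken from the thesis code.

```python
from datetime import datetime, timezone
from typing import Tuple


def format_date_and_time(epoch_seconds: float) -> Tuple[str, str]:
    """Split an epoch timestamp into human-readable (date, time) strings in UTC.

    Hypothetical helper: the real "get_date_and_time" method may use a
    different time zone or output format.
    """
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d"), moment.strftime("%H:%M:%S.%f")


# Example: a packet captured half a second after 1970-01-02 00:00:00 UTC.
date_text, time_text = format_date_and_time(86400.5)
print(date_text, time_text)
```

Keeping date and time as two separate strings matches the description of two separate characteristics per packet.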

I first check for the AMQP protocol. This is done by my method “is_amqp” [20], which takes the packet instance, its payload and its transport protocol as parameters. To construct this method, I first studied the structure of an AMQP packet. I found that the first packet, which establishes the connection, contains the word “AMQP” inside the payload. In addition, each AMQP packet has a fixed header (8 bytes) and always ends in the same way: the frame ends with the byte 0xCE. According to IANA, AMQP uses port 5672, or 5671 when it runs over TLS. These were my criteria for classifying a packet as AMQP or not.
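A minimal sketch of these AMQP criteria, working on raw payload bytes rather than on a Scapy packet. The name `looks_like_amqp`, its parameters, and the way the three signals are combined (any one of them is enough) are assumptions made for illustration; the actual “is_amqp” method may weigh them differently.

```python
AMQP_PORTS = {5671, 5672}   # IANA-registered ports: amqps (TLS) and amqp
AMQP_FRAME_END = 0xCE       # frame-end byte mentioned in the text
AMQP_MIN_FRAME = 8          # assumption: at least the 8 bytes named in the text


def looks_like_amqp(payload: bytes, src_port: int, dst_port: int) -> bool:
    """Heuristic AMQP check (hypothetical helper, not the thesis code)."""
    if not payload:
        return False
    # 1. The connection-opening packet carries the literal protocol name.
    if b"AMQP" in payload:
        return True
    # 2. Frames end with the fixed frame-end byte.
    if len(payload) >= AMQP_MIN_FRAME and payload[-1] == AMQP_FRAME_END:
        return True
    # 3. Fall back to the registered ports.
    return src_port in AMQP_PORTS or dst_port in AMQP_PORTS


# Protocol header sent when a client opens an AMQP 0-9-1 connection.
print(looks_like_amqp(b"AMQP\x00\x00\x09\x01", 40000, 5672))
```

The frame-end test on its own is a weak signal, since any payload can happen to end in 0xCE, so a stricter variant could require it together with a port match.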

If the packet does not use AMQP, I check whether it is an MQTT packet. Again I studied the structure of the packet and created an algorithm based on it. As before, I examine whether the word “MQTT” is in the payload. After that I check whether the first byte of the packet is one of the possible MQTT header combinations. I have listed all the possible combinations in another method, called “create_mqtt_first_byte” [21]. This method relies on the fact that each MQTT packet type has a fixed type code and a fixed combination of flags. So I created a list with all possible combinations of types and flags, and I use it to match the first byte of the given packet against these codes. If the first byte matches one of them, I mark the packet as MQTT. Otherwise I examine the ports: IANA registers ports 1883 and 8883 for MQTT, so a packet using either of them is classified as MQTT.
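The first-byte table can be sketched as follows. This version assumes the MQTT 3.1.1 fixed header layout (packet type in the high nibble, flags in the low nibble); the names `build_mqtt_first_bytes` and `looks_like_mqtt` are hypothetical, and the list built by “create_mqtt_first_byte” in the thesis may differ in detail.

```python
MQTT_PORTS = {1883, 8883}  # IANA-registered ports: plain and TLS


def build_mqtt_first_bytes() -> set:
    """Return every valid first byte of an MQTT 3.1.1 fixed header."""
    codes = set()
    for packet_type in range(1, 15):          # types 0 and 15 are reserved
        if packet_type == 3:                  # PUBLISH: DUP, QoS, RETAIN vary
            for dup in (0, 1):
                for qos in (0, 1, 2):         # QoS 3 is not allowed
                    for retain in (0, 1):
                        flags = (dup << 3) | (qos << 1) | retain
                        codes.add((packet_type << 4) | flags)
        elif packet_type in (6, 8, 10):       # PUBREL, SUBSCRIBE, UNSUBSCRIBE
            codes.add((packet_type << 4) | 0b0010)
        else:                                 # all other types: flags are 0000
            codes.add(packet_type << 4)
    return codes


MQTT_FIRST_BYTES = build_mqtt_first_bytes()


def looks_like_mqtt(payload: bytes, src_port: int, dst_port: int) -> bool:
    """Heuristic MQTT check in the order described above (illustrative only)."""
    if not payload:
        return False
    if b"MQTT" in payload or payload[0] in MQTT_FIRST_BYTES:
        return True
    return src_port in MQTT_PORTS or dst_port in MQTT_PORTS


print(len(MQTT_FIRST_BYTES))  # 13 fixed-flag types + 12 PUBLISH variants
```

Building the set once and reusing it keeps the per-packet check to a single membership test.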

The last check is for the CoAP protocol. CoAP always has a fixed 4-byte header. The first two bits are the version, which must always be 01. The next two bits encode the message type, of which there are four. Then comes the token length, a number between 0 and 8 that takes 4 bits. I combine those cases and check whether the first byte of the packet matches one of the resulting combinations. If there is a match, I mark the packet as CoAP; otherwise I check the ports. IANA registers port 5683 for CoAP.
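The CoAP first-byte combinations follow directly from the bit layout above: 2 version bits, 2 type bits and 4 token-length bits. The sketch below enumerates them; the names `build_coap_first_bytes` and `looks_like_coap` are hypothetical, and requiring at least 4 payload bytes is an assumption derived from the fixed header size.

```python
COAP_PORT = 5683       # IANA-registered CoAP port
COAP_VERSION = 0b01    # the version field must always be 01
COAP_HEADER_SIZE = 4   # fixed header length in bytes


def build_coap_first_bytes() -> set:
    """Return every first byte with version 01, any type, token length 0-8."""
    return {
        (COAP_VERSION << 6) | (message_type << 4) | token_length
        for message_type in range(4)      # four message types, 2 bits
        for token_length in range(9)      # token lengths 0..8, 4 bits
    }


COAP_FIRST_BYTES = build_coap_first_bytes()


def looks_like_coap(payload: bytes, src_port: int, dst_port: int) -> bool:
    """Heuristic CoAP check: first byte, then port (illustrative only)."""
    if len(payload) >= COAP_HEADER_SIZE and payload[0] in COAP_FIRST_BYTES:
        return True
    return COAP_PORT in (src_port, dst_port)


print(len(COAP_FIRST_BYTES))  # 4 types * 9 token lengths
```

Because the version bits are fixed, every accepted first byte lies between 0x40 and 0x78, which makes false positives on arbitrary UDP payloads fairly likely; the port fallback and the earlier AMQP and MQTT checks help narrow this down.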

For each case I store the vital information in a corresponding list, so that I can write it to a CSV file. These characteristics are extracted with the help of the “extract_characteristics_from_packet” method. This method returns the size of the given packet, whether the packet is encrypted (determined by checking for SSL/TLS), the size of the payload, the payload ratio, the ratio of the previous packet, the size of the previous packet and the payload in hexadecimal format.
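A few of these characteristics can be computed from raw bytes alone, as in the sketch below. It assumes that the payload ratio means payload bytes divided by total packet bytes, which the text does not define explicitly; the function name `summarize_packet` and the dictionary keys are hypothetical, and the SSL/TLS check and the previous-packet ratio are left out.

```python
from typing import Optional


def summarize_packet(packet: bytes, payload: bytes,
                     previous_packet: Optional[bytes] = None) -> dict:
    """Compute size-based characteristics for one packet (illustrative only)."""
    packet_size = len(packet)
    payload_size = len(payload)
    return {
        "packet_size": packet_size,
        "payload_size": payload_size,
        # Assumption: ratio = payload bytes / total packet bytes.
        "payload_ratio": payload_size / packet_size if packet_size else 0.0,
        # 0 when there is no earlier packet to compare with.
        "previous_packet_size": len(previous_packet) if previous_packet is not None else 0,
        "payload_hex": payload.hex(),
    }


features = summarize_packet(bytes(10), b"\x10\xff")
print(features["payload_ratio"], features["payload_hex"])
```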

Lastly, I produce three different CSV files. I create a CoAP file using “write_coap_file”, an MQTT file using “write_mqtt_file” and an AMQP file using “write_amqp_file”. These methods all use the same algorithm. I first construct the header and then zip the lists holding the information for each protocol, so that each line is a tuple of the characteristics mentioned above. Then I write these lines to the file.
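The header-then-zip approach can be sketched with the standard `csv` module. The function name `write_protocol_csv`, the column names and the sample values are hypothetical; the example writes to an in-memory stream, while a real run would pass a file opened with `open(path, "w", newline="")`.

```python
import csv
import io
from typing import Iterable, Sequence, TextIO


def write_protocol_csv(stream: TextIO, header: Sequence[str],
                       *columns: Iterable) -> None:
    """Write a header row, then one row per packet built by zipping the lists."""
    writer = csv.writer(stream)
    writer.writerow(header)
    # zip(*columns) turns per-characteristic lists into per-packet tuples.
    writer.writerows(zip(*columns))


# Hypothetical per-characteristic lists for two packets.
packet_sizes = [60, 74]
payload_sizes = [4, 18]

buffer = io.StringIO()
write_protocol_csv(buffer, ["packet_size", "payload_size"],
                   packet_sizes, payload_sizes)
print(buffer.getvalue())
```

Since the three writer methods share one algorithm, a single function parameterized by header and lists is enough to cover all three protocols. Note that `zip` stops at the shortest list, so all lists should have one entry per packet.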
