TrafficClassification

Adam edited this page Mar 6, 2013 · 2 revisions

NetFPGA-based Traffic Classifier

This project provides a network traffic classifier built on the NetFPGA using NetThreads. The classifier uses a set of common terms to identify the application protocol of network flows. Traffic classification is a long-standing problem in network architecture and routing. Many applications send traffic over today's networks, new applications appear daily, and they consume a large share of the bandwidth for their own purposes. Internet Service Providers therefore want to know which applications are sending packets so that they can take full advantage of their infrastructure.

Several classes of algorithms distinguish between flows. They classify network flows using the information in the packet headers (e.g. port-based methods), the content of the packets (e.g. Deep Packet Inspection methods), or the features of the flow (e.g. machine learning techniques). Using the content of the flow yields higher accuracy but is very resource intensive, because a set of signatures must be kept for each protocol and every flow must be tested against all of these signatures. In this approach, a flow that matches a signature is labeled with the corresponding protocol.

An alternative to signature matching in deep packet inspection is to use a limited set of weighted keywords for each protocol. The keywords are searched for in the input flow and, using a weight function, the flow is labeled with the protocol that has the highest total weight. This method performs better than signature-based methods because string matching algorithms such as Aho-Corasick can match a whole set of keywords against a string in time linear in the length of the string plus the number of matches. To improve performance further without sacrificing accuracy, only the very first bytes of a flow need to be examined, since the most important keywords of a protocol mostly appear at the beginning of the request or response. This simple alternative to deep packet inspection can be easily implemented on a NetFPGA to achieve an efficient, accurate traffic classification method. Note that since the training phase (e.g. keyword extraction) is not real time, keywords can be extracted offline and uploaded to the NetFPGA to update the classification method.
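
The weighted-keyword idea can be sketched in a few lines of Python. This is a minimal illustration, not the code that runs on the NetFPGA: the term table, the protocol names, the `PREFIX_LEN` cutoff, and the function name `classify` are all hypothetical, and a plain substring search stands in for Aho-Corasick to keep the example short.

```python
# Hypothetical term table: protocol -> {keyword: weight}.
# Protocol names, keywords, and weights are illustrative assumptions.
TERMS = {
    "HTTP": {b"HTTP": 1, b"GET": 1},
    "FTP": {b"USER": 2, b"PASS": 1},
}

# Assumed cutoff: only the first bytes of a flow are inspected.
PREFIX_LEN = 64


def classify(payload, terms=TERMS, prefix_len=PREFIX_LEN):
    """Return the protocol with the highest total keyword weight, or None."""
    prefix = payload[:prefix_len]
    scores = {}
    for protocol, keywords in terms.items():
        # Sum the weights of every keyword that occurs in the prefix.
        # A real implementation would match all keywords in one pass
        # (e.g. with Aho-Corasick) instead of one search per keyword.
        scores[protocol] = sum(w for kw, w in keywords.items() if kw in prefix)
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # A flow that matches no keyword stays unlabeled.
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(classify(b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"))
    print(classify(b"USER anonymous\r\nPASS guest\r\n"))
    print(classify(b"\x00\x01\x02\x03"))
```

Restricting the search to a short prefix keeps the per-flow cost bounded regardless of flow length, which is the property that makes the approach attractive on constrained hardware.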

Project Summary

Status :
Beta

Version :
0.5
Authors :
Soheil Hassas Yeganeh, Milad Eftekhar, Mohammad Jalali, Yashar Ganjali
Base source :
NetThreads 1.0

Download

Get Dependencies

  1. Download the NetFPGA Base Package
  2. Download NetThreads

Download Classifier

Download the classifier tarball, classifier_0.5.tar.bz2, which contains the layer 7 network traffic classifier.

Installation

Preparation

  • Please ensure that NetThreads is correctly installed. It is strongly recommended to run all the regression tests and examples of NetThreads to verify the installation.

Project Contents

Extract classifier_0.5.tar.bz2 into the same directory where you extracted the NetThreads package. This file contains all the bit files and dependencies required to run the classifier.

 tar xjvf classifier_0.5.tar.bz2

The directory structure should look like this:

`-- classifier
    |-- bin
    |-- bit
    |-- compiler
    |-- doc
    |-- loader
    `-- src
        `-- bench
            |-- common
            |-- hello
            |-- packetclassification <-- The main directory containing the source code.
            |   |-- regress
            |   `-- sw
            |-- ping
            |-- pipe
            `-- template

The classifier has two components:

  1. The NetThreads program
    • The C program that runs on the NetThreads platform
    • Located in the classifier/src/bench/packetclassification directory
  2. Monitoring software
    • A userspace Python program that shows the state of the classification
    • Located in the classifier/src/bench/packetclassification/sw directory

The first component is required to run the application; the second is optional.

Normal Installation

Build Process

The only step needed for building the application is to run make embed in the classifier/src/bench/packetclassification directory.

cd classifier/src/bench/packetclassification && make embed

Installation Process

To load the NetThreads application onto the NetFPGA, load the instruction file and then the data file.

cd classifier/src/bench/packetclassification
../../../loader/loader -i packetclassification.instr.mif
../../../loader/loader -d packetclassification.data.mif -nodebug

Easy Installation

Since this application works only with a specific version of the NetThreads bit files, we have included all the required files in the project package and provided a simple script that builds and loads the software.

cd classifier/src/bench/packetclassification && ./packetclassification.setup.sh

Description

As shown in Figure 1, our system conceptually consists of three main components. The Classifier is the main component and runs on the NetFPGA. It can use all four network interfaces of the NetFPGA, and the same or different interfaces can be used for reading packets from and forwarding packets to the network. The classifier module is implemented using NetThreads [1]. The Classifier Console visualizes the flow statistics that the classifier sends periodically. The Term Extraction Engine is the offline component of the system. It periodically captures packets and classifies them in order to extract important terms. Once the term set for a protocol changes, the complete set of extracted terms is sent to the classifier to update its term set.

Figure 1. Conceptual Design

Regression Test

To test the installation, inspect the packets that the software sends through the interfaces. The only packets the software sends regularly are the statistics packets sent by the monitor thread. The regression test is described in the following steps.

Note that the ports used to read packets and to send statistics are defined in packetclassification.c in the classifier/src/bench/packetclassification directory.

...
#define RECV_PORT 0
#define MONITOR_PORT 0
...

Step 1

In the default settings, packets are read from and statistics are sent through the first NetFPGA interface. If the software is correctly installed, statistical information is sent through the MONITOR_PORT at least once a second. The first step in testing the installation is to capture those packets on one of the host interfaces.

Wire eth1 of the host to nf2c0 and then run tcpdump to check that the statistics packets are generated. Before any packet is injected into the NetFPGA, the output should show an array of zeros.

tcpdump -n -i eth1 -XX -vvv -s0


22:06:34.920155 00:00:ff:ff:ff:ff > 00:00:00:00:00:00 Null Information, send seq 0, rcv seq 0, Flags [Command], length 50
	0x0000:  0000 0000 0000 0000 ffff ffff 0000 0000  ................
	0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
22:06:35.720155 00:00:ff:ff:ff:ff > 00:00:00:00:00:00 Null Information, send seq 0, rcv seq 0, Flags [Command], length 50
	0x0000:  0000 0000 0000 0000 ffff ffff 0000 0000  ................
	0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................

This is a raw packet containing an array of integers, one 32-bit counter per protocol. The end of the array is marked by 0xFFFFFFFF.
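
A statistics packet laid out this way can be decoded with a short Python helper. This is a sketch under stated assumptions, not the project's monitoring software: the function name `decode_counters` is hypothetical, the counters are assumed to start at byte 0 of the raw frame, and they are assumed to be big-endian. Which index corresponds to which protocol is not specified here, so the helper returns the counters by position only.

```python
import struct

# End-of-array marker described in the text.
END_MARKER = 0xFFFFFFFF


def decode_counters(frame):
    """Return the 32-bit counters that precede the end marker."""
    counters = []
    # Walk the frame one 32-bit word at a time.
    for offset in range(0, len(frame) - 3, 4):
        # ">I" = big-endian unsigned 32-bit integer (assumed byte order).
        (value,) = struct.unpack_from(">I", frame, offset)
        if value == END_MARKER:
            return counters
        counters.append(value)
    raise ValueError("end marker 0xFFFFFFFF not found")


if __name__ == "__main__":
    # First 16 bytes of the statistics packet shown in the tcpdump output.
    frame = bytes.fromhex("0000000000000000ffffffff00000000")
    print(decode_counters(frame))
```

Because the packet is raw, tcpdump interprets its first 12 bytes as Ethernet addresses, which is why the counters also show up in the address fields of the summary line.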

Step 2

To test the classifier, a set of flows must be written to the eth1 interface of the host. A pcap file containing 3 HTTP flows is available at classifier/src/bench/packetclassification/regress/data.pcap. After the pcap file is replayed, the first 4 bytes of the packets generated by the NetFPGA must show the number of HTTP flows injected.

tcpreplay -i eth1 data.pcap

tcpdump -n -i eth1 -XX -vvv -s0

22:12:17.320881 00:00:ff:ff:ff:ff > 00:00:00:03:00:00 Null Information, send seq 0, rcv seq 0, Flags [Command], length 50
	0x0000:  0000 0003 0000 0000 ffff ffff 0000 0000  ................
	0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
22:12:18.120881 00:00:ff:ff:ff:ff > 00:00:00:03:00:00 Null Information, send seq 0, rcv seq 0, Flags [Command], length 50
	0x0000:  0000 0003 0000 0000 ffff ffff 0000 0000  ................
	0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................

To change the terms used for classification, inject a packet containing a set of terms into the interface. The term set should follow the format of INITIAL_TERMS in the packetclassification.c file.

#define INITIAL_TERMS "$HTTP:1,GET:1$USER:2,PASS:1$$"
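
The following Python sketch reads and writes strings of this shape. The grammar is inferred from the single INITIAL_TERMS example and is an assumption: a leading `$`, groups separated by `$`, each group a comma-separated list of `term:weight` entries, and a closing `$$`. The function names `parse_terms` and `format_terms` are hypothetical. How the classifier maps each group to a protocol, and how the string must be framed inside the injected packet, are not described here and are not covered.

```python
def parse_terms(term_set):
    """Split a term-set string into one {term: weight} dict per group."""
    groups = []
    for group in term_set.split("$"):
        if not group:
            continue  # skip the leading "$" and the closing "$$"
        terms = {}
        for entry in group.split(","):
            # Split on the last ":" so the weight is always the tail.
            term, _, weight = entry.rpartition(":")
            terms[term] = int(weight)
        groups.append(terms)
    return groups


def format_terms(groups):
    """Inverse of parse_terms: rebuild the "$...$...$$" string."""
    body = "$".join(
        ",".join(f"{term}:{weight}" for term, weight in terms.items())
        for terms in groups
    )
    return f"${body}$$"


if __name__ == "__main__":
    initial = "$HTTP:1,GET:1$USER:2,PASS:1$$"
    groups = parse_terms(initial)
    print(groups)
    print(format_terms(groups) == initial)
```

A round trip through both functions is a cheap way to check a new term set for formatting mistakes before sending it to the board.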

References

  1. Martin Labrecque, J. Gregory Steffan, Geoffrey Salmon, Monia Ghobadi, Yashar Ganjali. “NetThreads: Programming NetFPGA with Threaded Software”. NetFPGA Developers Workshop, Palo Alto, CA, August, 2009.