BJP

This design is an implementation of Bounded Jitter Policy (BJP) algorithm [1] for NetFPGA platform. This implementation has been used in a paper titled "Preserving Pacing in Real Networks - An Experimental Study Using NetFPGA" [2].

Multiple stages of buffering, multiplexing and de-multiplexing of flows through the network can affect packet timings. Even a paced flow at the source of the network can become bursty at the destination. BJP aims to conserve the spacing between packets while they are traversing through a network.

Table of Contents Project summary Download Description What is BJP How does this design works Limitations in this version Some Details on Hardware Design Usage Tests Debugging Functional Related Works References

Project summary

Status :

beta version

Version :

0.9

Authors :

Hamed Tabatabaei (hamedty@cs.toronto.edu) Yashar Ganjali

Download

download source codes here.

Description

What is BJP

BJP functionality: a- two packets arrived at the same time b- normal routers insert different delay for each one c- BJP routers label packet which sent out sooner to have fixed over all delay."

When a paced flow passes through a network of routers and switches, the inter-arrival times of packets can change significantly. Therefore, even if packets of a given flow are paced initially, they can become arbitrarily bursty as they go through the network, which means the benefits of pacing will diminish.

In order to solve this problem and preserve pacing of packets throughout the network, Beheshti et al. [1] introduced BJP algorithm, an extremely simple traffic shaping scheme which aims at retaining the smoothness of traffic no matter how many routers it passes. The basic idea behind BJP is to try to add a fixed delay to each packet at each router. In other words, whenever a packet "P" arrives to a given router at time "t", BJP will try to send the packet out at time "t+delta", for a predefined constant "delta". Clearly, this is not always feasible due to contention: different packets arriving at different ingress ports at a given time, "t", and destined to the same egress port need to depart at the same time "t+delta" which is not possible.

To solve this problem, BJP slightly reduces the delay for one of the incoming packets so that there is no contention: the first packet departs at time "t+delta-1" and the second one departs at "t+delta" (here we are assuming that time is slotted, and it takes one unit of time to transfer each packet). At the same time, BJP marks the first packet so that the reduction in delay is compensated later on one of the subsequent routers (if one exists). The goal is to keep the expected delay of a packet which has gone through "N" routers equal to "N*delta" despite of minor setbacks resulting from contention. If a packet cannot be scheduled within the time interval [t+delta-alpha,] (where "alpha" is another predefined parameter) BJP simply drops the packet.

The above figure illustrates an example of a simple network with ordinary routers and BJP routers. (a) Two packets arrive simultaneously at two input ports. (b) Regular routers send packets out back-to-back immediately.(c) BJP routers try to delay both packets for delta units of time. Since there is congestion, packets are sent out back-to-back, but one of them is marked so that another router on the path delays it with delta-1 rather than delta units of delay.

How does this design works

The major differences between this design and the standard pipeline of NetFPGA are (i) in our design every packet is stored in the SRAM at the arrival time, whereas in the standard design packets are passed through pipeline completely. And all of the stuffs like routing and queuing in switch are performed using packet pointers and some information from their header. But in NetFPGA reference design, packets themselves were passed through packets pipeline. This feature enabled us to do many computation and processes by operating on links rate. (ii) Also in our design every switch activity is logged and log entries are sent-out in a form of packets to PCI slot.

As clear in the image, in our BJP design the first module is input arbiter. The input arbiter continuously fetches ingress ports and when a packet arrives, the exact arrival time will be recorded and the payload of the packet will be copied to SRAM. The input arbiter can serve all of the ingress ports simultaneously; it means when a packet arrives, it should not wait for packets from other ports to be served. The input arbiter also routes the packet based on some bits from packet header using as hop-count. The input arbiter creates an entry for the packet which includes the packet pointer, and passes it to the scheduler of selected port.

For each egress port there is a scheduler which performs BJP procedure. Each scheduler has a table for incoming entries from the input arbiter. When the entry of received packet arrives at a scheduler, the scheduler runs BJP algorithm to find a place for this entry in its scheduling table and if it cannot (based on BJP procedure), the entry (the packet) will be dropped.

Well before sending time of each packet, the schedulers send the entry of the packet to their senders. The senders receive packet pointers from schedulers, read packets from SRAM and exactly at the scheduled time send the packets out.

Also there is a timer module in our design. The timer slots the time domain and also it is connected to all of the other modules to aware them of current time and time-slot number.

When a packet arrives or departs the NetFPGA, there will be a log entry generated either by the input arbiter or the senders. The exact arrival time or departing time of packets will be written in log entries. These log entries will be collected in the logger module. The logger sends out several of log entries in format of a packet to CPU ports of NetFPGA. We used these logs as our measurement tool.

Since there are four senders in this design, we need a SRAM read arbiter, then all of the senders can read from SRAM simultaneously (In the current version we use NetFPGA default SRAM arbiter, then only two egress ports are functional and SRAM reading via "regread" command is disabled).

In the current BJP implementation we use hop counting to choose egress port of each packet. In the header of each packet we specify a field to keep the number of hops each packet should pass and after that the packet would be dropped on the port MAC1 (else would be forwarded on the MAC0) . Also BJP needs a field in the packet header to place the current lag of the packet from desired time. We use first 64 bits of each packet for this purpose.

Limitations in this version

This version uses NetFPGA SRAM-arbiter. Therefore there is only two functional egress port (instead of four) and sram reading via regread command is disabled.

Some Details on Hardware Design

(*many of below parameters can easily be changed by editing the parameter section in the "user_data_path.v")

As it is mentioned above, the BJP slots the time domain. This procedure performed in the timer module. By default each time slot is 1524 clock cycles equal to 12192 ns. These parameters are accessible under the name of SLOT_SIZE.
By default normal delay for each packet in this BJP procedure is 10 time slots.
This design uses first 64 bits of packet headers as below (which are basically using for the destination and source MAC addresses):

BITS: 63-60 59-55 54-12 11-0

FIELD: Hop Counting BJP Lag Field UNUSED Packet ID

PARAMETER: ROUTE_LENGTH LAG_LENGTH PACKET_ID_LENGTH

Hop Counting: When a packet arrives at the router if this field is larger than zero, the packet would be routed to the port MAC0. And if it is 0, the packet would be routed to the MAC1. At the leaving time of packet the hop counting would be decreased by 1. (note that you can easily change this algorithm by editing the verilog source codes. We set as above for simplicity in our tests)

BJP Lag Field: During BJP procedure if any packet can not be sent out on the desired time of the BJP procedure, the lag from desired time would be written in this field to be used and compensated by next BJP router.

Packet ID: This field used only for logging proposes. The ID of the arrived or departed packet is recorded in all of log entries.

The above fields are used as below in normal Ethernet packets:

BITS: 63-16 15-0

FIELD: Destination MAC Address Source MAC Address

The input arbiter module and all of the senders generate log entries for each packet which departs or arrives, as below:

NUMBER OF BITS:	12	19	19	4	5	2
FIELD:	Packet ID	Arrival (Departure)Time	Address(Pointer) of the Packet in SRAM	Hop Count	BJP Lag	Arrived (Departed)Port Number
PARAMETER:	PACKET_ID_LENGTH	MAIN_TIME_BITS	SRAM_ADDR_WIDTH	ROUTE_LENGTH	LAG_LENGTH	PORT_BITS

The width of these fields can be changed by changing listed parameters. The log entries are collected at the logger module and several of them (by default 10; set by LOG_PACKET_SIZE) are sent out on the CPU0 port, which can be collected by tcpdump command on the nf2c0 port.

The modules within the design (the input arbiter, schedulers and senders) are communicating using the below structure:

NUMBER OF BITS: 19 5 1

FIELD: Address(Pointer) of the Packet in SRAM BJP Lag The LSB of the Number of Slot Time Which Packet Arrived in

PARAMETER: SRAM_ADDR_WIDTH LAG_LENGTH

The last field is to recognize if the packet arrived at the current time slot or previous one (to be used in schedulers for more accurate timings)

Usage

Download the bit-file on the board: nf2_download nf2_top_par.bit

Turn the nf2c0 port up: ifconfig nf2c0 up

Make your desired network by connecting ports of the NetFPGA board to your network, or building your own topology with your NetFPGA routers.

Tcpdump the nf2c0 port to dump the log packets: tcpdump -i nf2c0 -w test.pcap

Use the log reader software to read the log files. The "read.py" is located in {project_directory}/sw/python, and written in python to read the pcap files. The pcap library for python should be installed. python read.py test

This command reads test.pcap file and translate it to test.txt; test.txt would be like: Packet_ID Time Pointer_Addr Hop_Count BJP_Lag Port_Number 1 51963 1 14 0 0 packet with ID# 1 and SRAM pointer address 1 arrived @ time 51963 on port 0 (eth0) with initial lag of 0 and 14 remaining hop to pass 2 54959 190 14 0 0 3 57955 379 14 0 0 4 60951 568 14 0 0 5 63947 757 14 0 0 6 66943 946 14 0 0 1 68129 1 14 0 0 packet with ID# 1 and SRAM pointer address 1 leaved the board @ time 68129 on port 0 [because hop count is larger than 0] with lag of 0 and 13 remaining hop to pass 7 69939 1135 14 0 0 2 71177 190 14 0 0

To get real time, multiply the time column by 8 ns. For example, the very first entry means packet with ID 1 arrived at time 415704 ns and left at time 545032 ns. Therefore the packet spent 129,328 ns in the router (+some fixed delays), which is equal to 10.6 time slots. The length of each time slot is 12,192 ns and the normal delay (delta) was set to 10 time slots. The 0.6 extra delay is due to the fact that BJP routers work with slotted time. Hence, the arrival of a packet at BJP procedure is always the start of the next time slot (this concept was completely explained in [2]).

Tests

Debugging

In order to debug our design in ModelSim, we include our verification test in {project_directory}/verif directory. This test can be used to debug custom changes in the design by tracking the incoming and outgoing packets.

Functional

In this section we explain the usage of BJP in a simple network using three NetFPGA cards and run a step-by-step test. We capture the log files, analyze them, and demonstrate the functionality of BJP by plotting relevant graphs using MATLAB.

1- Connect three NetFPGA cards as shown in the figure:

2- Download the packet generator bit-file on the first card and BJP bit-file on second an third cards:

(1st machine): nf2_download packet_generator.bit (2nd & 3rd machines): nf2_download bjp.bit

3- Bring the nf2c0 ports of BJP switches up, and run tcpdump command to catch the log packets:

(BJP #1): ifconfig nf2c0 up; tcpdump -i nf2c0 -w bjp1.pcap (BJP #2): ifconfig nf2c0 up; tcpdump -i nf2c0 -w bjp2.pcap

4- There are two pcap files in {project_directory}/regress/pcap directory named "hop_14.pcap" and "hop_1.pcap". These two files contain 400 packets which hop_counts on them are set to 14 and 1. And the packet ID fields in these packets are set from 1 to 400 in an increasing order (by running tcpdump command with read option one can take a look to these pcap files). Therefor, if we inject these two files from Traffic Generator on the ports 2 and 3, 800 packets (if there is no drop) will be sent out to BJP switch #1 on port 0 (because hop counts are larger than 0). On the BJP switch #2, 400 packets will be sent out on the port 0 with hop count of 12 and 400 packets will be sent out on the port 1 with the hop count of 15 (-1 on 4 bits). ./packet_generator.pl -q2 hop_1.pcap -q3 hop_14.pcap

And we expect to receive 400 packets on port 0 and 400 on port 1 as well (if there was no drop in BJP procedure)[in]: Transmit statistics: '''<tt>============</tt>''' MAC Queue 2: Packets: 400 Completed iterations: 1 MAC Queue 3: Packets: 400 Completed iterations: 1 Receive statistics: '''<tt>===========</tt>''' sh: bc: command not found sh: bc: command not found Argument "" isn't numeric in division (/) at ./packet_generator.pl line 1137. Argument "" isn't numeric in addition (+) at ./packet_generator.pl line 1137. MAC Queue 0: Packets: 399 Bytes: 600096 Time: 0.000000000 s Rate: 0 Kbps (packet data only) Rate: 0 Kbps (including preamble/inter-packet gap) sh: bc: command not found sh: bc: command not found Argument "" isn't numeric in division (/) at ./packet_generator.pl line 1137. Argument "" isn't numeric in addition (+) at ./packet_generator.pl line 1137. MAC Queue 1: Packets: 400 Bytes: 601600 Time: 0.000000000 s Rate: 0 Kbps (packet data only) Rate: 0 Kbps (including preamble/inter-packet gap) sh: bc: command not found sh: bc: command not found Argument "" isn't numeric in division (/) at ./packet_generator.pl line 1137. Argument "" isn't numeric in addition (+) at ./packet_generator.pl line 1137. MAC Queue 2: Packets: 0 sh: bc: command not found sh: bc: command not found Argument "" isn't numeric in division (/) at ./packet_generator.pl line 1137. Argument "" isn't numeric in addition (+) at ./packet_generator.pl line 1137. MAC Queue 3: Packets: 0

5- Now interrupt tcpdump commands and they should collect 160 packets ([800] / 10 entry per log packet) or less if there were drops in the previous stages: tcpdump: WARNING: nf2c0: no IPv4 address assigned tcpdump: listening on nf2c0, link-type EN10MB (Ethernet), capture size 96 bytes ^C160 packets captured 160 packets received by filter 0 packets dropped by kernel

6- Use the python read file to convert pcap files to readable log files. From the directory {project_directory}/sw/python, run:

python read.py bjp1 python read.py bjp2

to convert "bjp1.pcap" and "bjp2.pcap" to "bjp1.txt" and "bjp2.txt". These two TXT files should look like:

bjp1.txt:

Packet_ID Time Pointer_Addr Hop_Count BJP_Lag Port_Number 1 485177 393217 14 0 3 1 485177 262145 1 0 2 2 488799 393406 14 0 3 2 489673 262334 1 0 2 3 491669 262523 1 0 2 3 492420 393595 14 0 3 4 494290 262712 1 0 2 4 496041 393784 14 0 3 5 499037 262901 1 0 2 5 499662 393973 14 0 3 1 499753 262145 1 1 0 this packet sent out with 1 lag (sooner), because two packets arrived at the same time. 1 501277 393217 14 0 0 6 501908 263090 1 0 2 6 503282 394162 14 0 3 2 504325 393406 14 0 0

bjp2.txt:

Packet_ID Time Pointer_Addr Hop_Count BJP_Lag Port_Number 1 381681 262145 0 1 2 1 383205 262334 13 0 2 2 386254 262523 13 0 2 2 387778 262712 0 0 2 3 389302 262901 0 0 2 3 390826 263090 13 0 2 4 392350 263279 0 0 2 4 393874 263468 13 0 2 5 395398 263657 13 1 2 5 396922 263846 0 0 2 1 398637 262334 13 0 0 1 398637 262145 0 0 1 6 399969 264035 0 0 2 6 401493 264224 13 0 2 2 401685 262523 13 0 0

If there are two entry with the same packet ID and address, the first one is showing the packet arrival and the second one is showing the packet departure.

7- Beyond this point we use MATLAB m-files in {project_directory}/sw/matlab to analyze the results.

In {project_directory}/sw/matlab directory, there are several scripts to plot the results. The main file is "log_proc.m" which reads bjp*.txt in MATLAB and plots some useful graphs based on them (note that when these files brought in MATLAB, the log_proc.m changes the time unit from clock-cycle to nano seconds.)

Two of these graphs are shown below. First the CDF of packet inter-arrival times in BJP#1 router ingress ports.

And the second one is the overall delay of packets from source to destination of the flow hop_14.

Related Works

Based on this design, a study performed in University of Toronto. The results of that study is presented at [2].

References

[1] N. Beheshti, Y. Ganjali, A. Goel, and N. McKeown, "Obtaining High Throughput in Networks with Tiny Buffers," Proceedings of 16th International Workshop on Quality of Service (IWQoS), Enschede, Netherlands: 2008.

[2] S. H. Tabatabaei, Y. Ganjali, "Preserving Pacing in Real Networks - An Experimental Study Using NetFPGA," in NetFPGA Workshop 2010.

-- Main.HamedTabatabaei - 29 May 2010

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BJP

Table of Contents