

Adam edited this page Feb 27, 2013 · 3 revisions


NetThreads allows you to run threaded software on the NetFPGA with very little effort. Researchers interested in prototyping ideas and testing new theories, algorithms, or applications will find it much easier to write a few lines of C code than to write, synthesize, and debug Verilog.

NetThreads is part of a larger project studying soft processor architectures, with the goal of allowing software programmers to easily take advantage of the FPGA fabric. The released bitfile corresponds to the two-processor system described in the paper, in which each processor is 4-way multithreaded and has a five-stage pipeline. This version of the processor uses the default round-robin thread interleaving, which should give better performance than a comparable single-threaded two-processor system when all the threads compute in parallel. This design may not be ideal when threads spend a significant amount of time waiting for synchronization (as explained in the paper); in that case, other flavors of NetThreads may be more suitable, in particular NetThreads-RE or NetTM.

Project summary

Status: Initial release (beta version)
Version:
Authors: Martin Labrecque, J. Gregory Steffan, Geoffrey Salmon, Monia Ghobadi and Yashar Ganjali
NetFPGA base source:

Installation steps

First, make sure that the NetFPGA base package is installed properly (i.e., the NetFPGA is listed as a network device by ifconfig), that you can load a bitfile onto the FPGA, and that packets flow through the design as expected.

To run a program on the NetFPGA, you need to obtain the compiler package, install it, and run the sample applications. Once you have established that those applications work, you can move on to editing them or creating your own.

Obtain Project Tarball

Download from NetThreads

Decompress the tarball:

 tar xzvf netthreads_1.0.tar.gz

You should obtain the following result:


Load the bitfile onto the FPGA: --all
 nf2_download netthreads/bit/netthreads.bit

If you get a message similar to this:

 Error Registers: 0
 Good, after resetting programming interface the FIFO is empty
 Download completed - 2377668 bytes. (expected 2377668).
 DONE went high - chip has been successfully programmed.
 CPCI Information
 Version: 4 (rev 1)

 Device (Virtex) Information
 Device info not found

 WARNING: NetFPGA device info not found. Cannot verify that the CPCI version matches.

Do not be alarmed: all is well. The host software includes newer checks that verify whether the design was loaded properly, which I did not take into account when I built the bitfile. The FPGA was nonetheless programmed successfully, and NetThreads should be operational.

Regression Test 1

This test consists of executing a simple application in which packets are echoed from one port of the NetFPGA to another.

Setting up

Wire your computers:

  Wire eth1 to nf2c0 and eth2 to nf2c1

Obtain a program such as tcpreplay to send packets directly through a network interface. Also obtain a packet trace to replay (you can easily record one with tcpdump).

Compile the loader, i.e., the program used to upload programs to the NetFPGA. Note that NF2_ROOT (the root folder of the NetFPGA distribution) must be defined in the environment; it should already be set up if you followed the NetFPGA installation instructions.

 cd netthreads/loader
 make
 cd ../../

Get ready to monitor the two wires that you connected above. In two separate terminals, run:

 tcpdump -i eth1 -n -vv -XX -s 0
 tcpdump -i eth2 -n -vv -XX -s 0

Compile the pipe application

This program takes each packet received on one port and sends it out on the port whose number is the input port number XOR 2. For example, packets received on MAC port 0 are sent on MAC port 1, because the DMA and MAC ports are interleaved in the port numbering and destination ports are one-hot encoded (see the NetFPGA wiki).

 cd netthreads/src/bench/pipe
 make embed

Your program is now compiled and ready to run.

Load your program on the NetFPGA

The program is loaded in two stages: the instructions first, then the data sections. Once loaded, it will start working almost instantly.

 ../../../loader/loader -i pipe.instr.mif 
 ../../../loader/loader -d -nodebug

Note that you will have to issue these commands as the root user or by preceding them with 'sudo'.

Send packets through eth1; they will be received on nf2c0 and sent out on nf2c1 (and therefore seen on eth2):

 tcpreplay -i eth1 some_packet_trace.trace

In your two terminals, you will see packets being copied from one port to the other: keep in mind that packet ordering is not enforced in this application.

Regression Test 2

In this example, we will create an application from the ground up.

 cd netthreads/src/bench
 mkdir hello
 cd hello

Create a Makefile with the following contents:

 TARGETS = hello
 include ../
 hello: hello.o pktbuff.o memcpy.o

Create a hello.c file with the following contents:

 #include "support.h"
 #include "pktbuff.h"

 const char* mystr = "Hello world";

 #define STR_SIZE 64
 struct netfpga_to_driver {
   char str[STR_SIZE];
 };

 #define PKT_SIZE (sizeof(struct ioq_header) + sizeof(struct netfpga_to_driver))

 int main()
 {
   // only run this program on thread 0
   if (nf_tid() != 0) {
     while (1) {}
   }
   // initialize
   nf_pktout_init();   // set up the output memory (once, from a single thread)

   // allocate an output buffer
   t_addr *pkt = nf_pktout_alloc(PKT_SIZE);

   // setup the ioq_header
   fill_ioq((struct ioq_header*)pkt, 2, sizeof(struct netfpga_to_driver));

   // get a pointer to the payload struct
   struct netfpga_to_driver* n2d =
     (struct netfpga_to_driver*) (pkt + sizeof(struct ioq_header));

   // initialize the message
   memset(n2d->str, 0, STR_SIZE);
   memcpy(n2d->str, mystr, strlen(mystr));
   n2d->str[strlen(mystr)+1] = nf_tid() + 0x30; // thread id in ascii

   // send it
   nf_pktout_send(pkt, pkt + PKT_SIZE);
   return 0;
 }

Compile the program

 make embed

Load it:

 sudo ../../../loader/loader -i hello.instr.mif ;  
 sudo ../../../loader/loader -d -nodebug

On the host interface connected to nf2c1 (eth2), you should see your hello message with tcpdump.

NOTE 1: We have observed that it is necessary to wait a few seconds after nf2_download before loading the program; otherwise, the outgoing packets are simply ignored. If you want to script both steps at once, insert a "sleep 3" after nf2_download.

NOTE 2: Refer to the documentation in the NetThreads package and below for the meaning of the functions used. In particular, nf_pktout_alloc() allocates a slot in the output memory for a single use. The slot is freed when the packet is sent.

To run this program on all threads (i.e., to have every thread print a hello world message), we only need to ensure that the initialization is performed by thread 0. For this purpose, make the following code substitution:

Lines to change:

 if (nf_tid() != 0) {
    while (1) {}
 }
 // initialize

New code:

 int mytid = nf_tid();
 if(mytid != 0)


Creating your own programs

The pipe application above is very simple. You can now try the ping application, which answers ARP and ICMP ping requests, or run any code you like, following the model in netthreads/compiler/src/bench/template/.


  • You might want to add the loader program to your PATH.
  • The compiler provided contains the full compiler tool-chain. If you want to dig deeper into which instructions are actually executed, you might want to take a look at the disassembler.

Debugging programs

When compiling with

 make CONTEXT=sw

an executable for the machine you are using will be produced. It is single-threaded and can read packets either from a packet trace or from the network (by default using tap devices; see the sw_* files in the netthreads/compiler/src/bench/common/ folder). Using this mechanism, you can run the exact same code on the host machine (no changes necessary) to verify that its functionality is correct.

There is also a cycle-accurate simulator/debugger for the processor that models the parallelism, but it has not been released yet.

Getting more performance

Other versions of the processor exist (for example, single-threaded, multithreaded with thread scheduling, and others with new and more exciting features). There is also an interface to trace the processors as they run in hardware. If you are seriously interested in using these other processors or the additional packages, please contact the authors; the source code can be made available.

  • After discussions at the NetFPGA'09 workshop, we realized that the processor system could be improved if your main goal is to perform packet forwarding. For this reason, we developed NetThreads-RE.
  • NetTM makes synchronization across threads faster and easier. Note that NetTM doesn't have the merged packet buffer that NetThreads-RE has.
  • Please let us know what features you are interested in and we will include them, if possible, in the next release.

Contributing Applications

We are always on the lookout for interesting applications to run on the NetFPGA: if you think that your program could be a good soft processor benchmark, please contact the authors.

A word of caution

For all NetFPGA-related problems, consult the NetFPGA forum.

Only a few hard-working researchers are involved in this project. Unless you have a specific bug report about a specific piece of code that doesn't run as expected, packaged in a way that we can easily reproduce, we will likely not be able to assist you.

More Detailed Information on NetThreads

NetThreads is a platform for packet processing on the NetFPGA. It lets you write C programs that run on the NetFPGA. The programs are cross-compiled and executed by soft processors instantiated in the FPGA. By using NetThreads, you can create applications for the NetFPGA without writing a single line of Verilog.


NetThreads has two CPUs, each of which has 4 threads. There is only a small software library and no operating system. Unlike threads and processes on a conventional CPU, there is no context-switch overhead between the threads: they execute in round-robin order, with each thread executing one instruction every 4 clock cycles. The CPUs use network byte-order (big-endian) regardless of the endianness of the host CPU.

The CPUs in NetThreads fit into the same hardware pipeline used by other NetFPGA applications, such as the NIC and the router, so the description of the reference pipeline also applies to NetThreads. In the diagram of the pipeline, the NetThreads CPUs replace the Output Port Lookup module and sit between the Input Arbiter and the Output Queues. In the reference pipeline, there are 4 input queues (labelled "CPU RxQ" in the diagram) for packets copied over the PCI bus from the host computer's CPU. However, in NetThreads, only one of these queues is connected to the input arbiter; the remaining 3 are not connected, and packets sent to them will never be read by NetThreads. All 4 MAC RxQs are connected to the input arbiter.

Source and Tools

Developing NetThreads programs requires a cross-compiler and software library. The tools and compiler are available here.


Here is an overview of the important parts of the source code:

  src/ the root of the source repository
       bench/ Contains software that runs in NetFPGA
            common/ library used by all NetThreads applications
            pipe/ simple application that resends all received packets
            template/ example skeleton of a typical application

NetThreads applications have multiple threads with a single entry point. Threads are not dynamically created or destroyed. Instead, they all begin executing the same function at the same time. To change the behaviour of a specific thread, call nf_tid() to get the current thread's unique thread id and branch based on the result.

 #include "support.h"

 int main (void) {
   if ( nf_tid() == 2 ) {
     log (" Hello world. I am the wonderful thread 2\n");
   } else {
     log (" Hello world. I am thread %d\n" , nf_tid() );
   }
   return 0;
 }

Build System

Behind the scenes, the build system for NetThreads applications is a bit complicated. Fortunately, the complexities are mostly hidden and the individual Makefiles in the application directories are relatively simple. Here is pipe's Makefile:

 TARGETS = pipe
 include ../
 pipe : process.o pktbuff.o memcpy.o

The TARGETS variable should contain the names of one or more resulting binaries, and it must be set before the include line. For each target in TARGETS, the Makefile must explicitly list the object files the application depends on. These object files, plus one or two default ones, are linked together to build the binary. Note that although pipe depends on process.o, pktbuff.o, and memcpy.o, only process.c is in the pipe directory. The other source files, pktbuff.c and memcpy.c, are in bench/common. Any object file built from a source file in the common directory can be used in this way. Also, the headers in the common directory can be included in source files as if they were local; it's not necessary to add "../common/" to the paths.

In this release, NetThreads applications can be compiled for two different contexts:

nf: Builds the app to run on the NetFPGA. This is the default context. All calls to the log function are ignored.
sw: Builds the app to run as software on the host computer. Instead of a cross-compiler, this context uses the native gcc toolchain. Many of the usual NetThreads functions, such as sending or receiving packets, can be no-ops in this context, can use packet traces, or can use live network devices on the host (see the ../common/sw_* files). This is useful when porting existing applications to NetThreads to verify their functionality: the same code that runs on the NetFPGA should run on the host (the difference is that the supplied C function library is not as extensive as the one on your host computer, but you can always supply your own implementation).

Select the context by passing the name of the context to make, e.g.:

 make CONTEXT=sw

The default context is nf. When switching between contexts, run make clean to ensure all object files are recompiled.

NetThreads API

The bench/common directory contains functions and structs useful for writing NetThreads applications. This document describes the most important and commonly used functions, but the best and most up-to-date source of information is still the code itself.

Misc Functions

uint nf_time() Returns the current NetFPGA time. It increments once every clock cycle at a frequency of 125 MHz. The time is returned as a 32-bit number, so it wraps around to zero roughly every 34 seconds (2^32 cycles at 125 MHz is approximately 34.36 s).
int nf_tid() Returns the current thread's unique id. The id is a number in the range [0,].
void log(char *frmt, ...) Prints a string. The arguments of log are the same as printf's. This function only has an effect in the sim and sw contexts. In the nf context, the function is defined away.

Receiving Packets

NetThreads places arriving packets into the input memory.

void nf_pktin_init() Initializes NetThreads for receiving packets by dividing the input memory into ten slots of 1600 bytes each. Must be called at most once, from a single thread.
t_addr* nf_pktin_pop() Checks whether a packet has been received. Arriving packets are returned by nf_pktin_pop() only once, in the order in which they arrived. If a packet is waiting, this function returns a pointer to the IOQ header at the start of the packet. Use nf_pktin_is_valid() to determine whether a packet was actually returned.
int nf_pktin_is_valid(t_addr* addr) Determines whether a pointer returned by nf_pktin_pop() is actually a packet. Returns true if the pointer is a valid packet, false otherwise.
void nf_pktin_free(t_addr* val) Tells NetThreads that the application has finished reading a packet returned by nf_pktin_pop(). After calling this function, an application should not read the packet contents again. It is important to call this function as soon as possible: if packets in the input memory are not freed, arriving packets cannot be stored, which quickly leads to packet drops.

Sending Packets

To send packets an application must fill in the packet headers and payload in the NetThreads output memory. The hardware can only send packets that are stored in the output memory. To send a received packet, the application must copy the packet from the input memory to the output memory.

void nf_pktout_init() Initializes NetThreads for sending packets. Must be called only once from a single thread.
t_addr* nf_pktout_alloc(uint size) Allocates space in the output memory for a packet. Currently the size argument is ignored and all spaces returned are 1600 bytes long. The returned address is either a pointer to the newly allocated space or 0 if no space is available.
void nf_pktout_send(char* start_addr, char* end_addr) Sends a packet. The argument start_addr points to the start of the packet and end_addr points to the byte after the last byte of the packet.

IOQ Headers

All packets received and sent by NetThreads start with an 8-byte header that is added and removed by the NetFPGA itself and does not exist on the wire. This header is called the IOQ Modules Header (or just IOQ Header) and is described here.

The IOQ header specifies the length of a packet, the NetFPGA port it was received on, and the port it will be sent to. The file bench/common/pktbuff.h contains the following for working with IOQ headers:

struct ioq_header The actual in-memory layout of the header.
void fill_ioq(struct ioq_header *ioq, unsigned short port, unsigned short bytes) Fills the IOQ Header of a packet that will be sent. It does not set the source port of the packet because the hardware for sending packets ignores the source port field.


NetThreads offers 16 mutexes for synchronizing between threads. Each mutex, or lock, is identified by an integer between 0 and 15 (higher numbers simply wrap around and identify the same 16 locks).

void nf_lock(int id) Acquires a lock.
void nf_unlock(int id) Releases a lock.

Think of the locks as 16 booleans. If a lock is false, nf_lock() sets it true and returns immediately. If the lock is already true, nf_lock() blocks the calling thread until the lock is false again, and then acquires it. Calling nf_unlock() sets a lock false and never blocks.


After compiling an application in the nf context, run

 make mif

to create *.instr.mif and *.data.mif, which contain the application's instruction and data segments, respectively. These files can then be loaded onto NetThreads. The loader program in netthreads/loader takes these files, downloads the application to the NetFPGA, and starts the application running.

Steps to run a NetThreads application:

  1. Reprogram the CPCI. This needs to be done once every time the computer is booted (doing it more often doesn't hurt, though). The NetThreads system was built against the Verilog files from an older version of the NetFPGA distribution (1.2.5), and it requires a matching CPCI bitfile.
  2. Download the NetThreads bitfile to the NetFPGA.
  3. Download the application's instructions, replacing "app" with the correct name:
         netthreads/loader/loader -i app.instr.mif
  4. Download the application's data. As soon as the data is loaded, the application will begin running:
         netthreads/loader/loader -d -nodebug