Understanding the hardware modules is essential to making the most of the available designs. The projects distributed in the NetFPGA package all follow the same modular structure, differing only in the output port lookup module. Here we take the Reference NIC project as an example and walk through its pipeline, in which each stage is a separate module. The diagram of the pipeline is shown below.
Packets first enter the device through the nf10_10g_interface module, an IP core that combines the Xilinx XAUI and Xilinx 10G MAC IP cores with an AXI4-Stream adapter. There are four instances of this module in the design, one per port. Packets arriving from the external 10G PHY over the XAUI interface are first converted into XGMII signals by the Xilinx XAUI core, then read in by the Xilinx 10G MAC, and finally converted into AXI4-Stream. The TX side follows the same path in the opposite direction.
The RX queues of the nf10_10g_interface modules connect next to the input arbiter module. The input arbiter has five input interfaces: four from the nf10_10g_interface modules and one from a DMA module (described later on). Each input to the arbiter connects to an input queue, which is a small fall-through FIFO. The arbiter rotates between the input queues in a round-robin manner, each time selecting a non-empty queue and writing one full packet from it to the next stage in the data path, which is the output port lookup module.
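The round-robin selection described above can be illustrated with a small behavioral model. This is only a sketch in Python, not the Verilog implementation: the class name `InputArbiter`, its methods, and the helper `drain` are hypothetical, and packets are represented as opaque Python objects rather than AXI4-Stream words.

```python
from collections import deque


class InputArbiter:
    """Behavioral sketch of a round-robin arbiter over several input FIFOs.

    All names here are illustrative; they do not come from the NetFPGA sources.
    """

    def __init__(self, num_inputs=5):
        # One FIFO per input: four 10G interfaces plus one DMA input.
        self.queues = [deque() for _ in range(num_inputs)]
        # Index of the queue that is examined first on the next selection.
        self.next_index = 0

    def push(self, port, packet):
        """Append a complete packet to the FIFO of the given input."""
        self.queues[port].append(packet)

    def select(self):
        """Return (input_index, packet) from the next non-empty FIFO, or None."""
        n = len(self.queues)
        for offset in range(n):
            idx = (self.next_index + offset) % n
            if self.queues[idx]:
                packet = self.queues[idx].popleft()
                # Resume the rotation after the queue that was just served.
                self.next_index = (idx + 1) % n
                return idx, packet
        return None


def drain(arbiter):
    """Call select() until every FIFO is empty and collect the results."""
    served = []
    while True:
        result = arbiter.select()
        if result is None:
            return served
        served.append(result)
```

In this model, an input holding several packets is served once per rotation, so a busy port cannot starve the others; empty queues are simply skipped.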
The output port lookup module is responsible for deciding which port a packet goes out of. Once that decision is made, the packet is handed to the output queues module. The lookup module implements a very basic lookup scheme: based on the source port indicated in the packet's header, it sends all packets from the 10G ports to the CPU and vice versa. Notice that although there is only one physical DMA module in Verilog, there are four virtual DMA ports. The virtual DMA ports are distinguished by the SRC_PORT/DST_PORT fields.
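As a rough illustration of this lookup scheme, the sketch below maps a source port to a destination port. It rests on an assumption that is not stated in the text: that SRC_PORT and DST_PORT are 8-bit one-hot values in which even bit positions denote the physical 10G ports and odd bit positions denote the virtual DMA ports, so that each 10G port is paired with the DMA port one bit above it. The function name and mask constants are hypothetical.

```python
# Assumed encoding (not taken from the text): 8-bit one-hot port fields.
MAC_PORT_MASK = 0x55  # even bit positions: physical 10G ports
DMA_PORT_MASK = 0xAA  # odd bit positions: virtual DMA ports


def lookup_dst_port(src_port):
    """Return the one-hot DST_PORT for a one-hot SRC_PORT.

    Packets from a 10G port go to the paired virtual DMA port,
    and packets from a virtual DMA port go to the paired 10G port.
    """
    is_one_hot = src_port > 0 and (src_port & (src_port - 1)) == 0
    if not is_one_hot or src_port > 0xFF:
        raise ValueError("src_port must be an 8-bit one-hot value")
    if src_port & MAC_PORT_MASK:
        return src_port << 1  # 10G port -> neighboring DMA port
    return src_port >> 1      # DMA port -> neighboring 10G port
```

Under the assumed encoding, `lookup_dst_port(0x01)` yields `0x02` and `lookup_dst_port(0x02)` yields `0x01`, which mirrors the "10G to CPU and vice versa" behavior described above.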
Once a packet arrives at the nf10_bram_output_queues module, it already has a marked destination (provided on a side channel). According to this destination, it is placed in a dedicated output queue. There are five such output queues: one for each 10G port and one for the DMA block. Note that a packet may be dropped if its output queue is full or almost full. When a packet reaches the head of its output queue, it is sent to the corresponding output port, which is either an nf10_10g_interface module or the DMA module. The output queues are arranged in an interleaved order: one physical Ethernet port, one DMA port, and so on. Even queues are therefore assigned to physical Ethernet ports, and odd queues are assigned to the virtual DMA ports.
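The queueing and drop behavior can be sketched with a simple software model. This is an illustration only: the class `OutputQueues`, its methods, and the per-queue `capacity` counted in packets are assumptions made for the example (the hardware queues are BRAM-backed and their exact full/almost-full thresholds are not given here).

```python
from collections import deque


class OutputQueues:
    """Behavioral sketch of per-destination output queues with tail drop.

    Names and the packet-count capacity are illustrative assumptions.
    """

    def __init__(self, num_queues=5, capacity=4):
        self.queues = [deque() for _ in range(num_queues)]
        self.capacity = capacity
        # Per-queue count of packets dropped because the queue was full.
        self.dropped = [0] * num_queues

    def enqueue(self, dst_queue, packet):
        """Store the packet in its destination queue; drop it if the queue is full."""
        queue = self.queues[dst_queue]
        if len(queue) >= self.capacity:
            self.dropped[dst_queue] += 1
            return False
        queue.append(packet)
        return True

    def dequeue(self, dst_queue):
        """Remove and return the packet at the head of a queue, or None if empty."""
        queue = self.queues[dst_queue]
        return queue.popleft() if queue else None
```

Because each destination has its own queue, a congested output only drops its own packets; traffic destined for the other outputs is unaffected in this model.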
The DMA module serves as the DMA engine for the reference NIC design. It includes Xilinx's PCIe core and an AXI4-LITE master module. To the other NetFPGA modules it exposes AXIS (master and slave) interfaces for sending and receiving packets, as well as an AXI4-LITE master interface through which all AXI registers can be accessed from the host (over PCIe). To this end, it connects to the axi_interconnect module.
The reference NIC design includes a Xilinx Microblaze subsystem, together with a BRAM memory block and its controller. For more information, please refer to the Microblaze reference links provided above.
In addition to the PCIe interface (in the DMA block), two more communication interfaces are implemented in the design: a UART interface, for debug purposes, and the MDIO block. The MDIO block, which is a slim version of Xilinx's MDIO core, is mostly used to access and configure the 10G PHY devices (AEL2005) used on the board. Finally, a watchdog timer module is implemented as well.