OPED – _O_penCPI _P_CIe _E_ndpoint with _D_MA
OPED is a modular component of the NetFPGA-10G project. It provides communication between PCIe and both the AXI4-Lite and AXI4-Stream protocols. Memory-mapped slave devices on the FPGA-internal AXI4-Lite bus are accessed through OPED. Additionally, OPED provides DMA capability to concurrently stream external data to and from the AXI4-Stream Master and Slave ports, respectively.
OpenCPI (Open Component Portability Infrastructure)
OPED differs from other component IPs used in NetFPGA-10G in that it is essentially a component of another open-source community, opencpi.org, which has been specialized and reused. This reuse includes source code and documentation. Many NetFPGA-10G applications will not require deep visibility within OPED, and for those users, the documentation in NetFPGA-10G may be adequate. For users wishing to dig deeper into OPED internals, more extensive documentation exists at opencpi.org.
BSV (Bluespec SystemVerilog)
OPED is written in BSV. The BSV source code is provided in the NetFPGA-10G distribution. The BSV compiler, bsc, compiles BSV source code to IEEE Verilog, and the Verilog output by bsc is also provided in the distribution. The bsc compiler itself, a commercial product of Bluespec Inc., is NOT provided.
OPED provides a Gen1 x8 PCIe endpoint facing upstream. OPED provides the following interfaces facing the FPGA user application: AXI4-Lite Master (32b), AXI4-Stream Master (32b), and AXI4-Stream Slave. The component top-level Verilog source file OPED.v unambiguously defines all the RTL signals to the OPED core. Because OPED uses standard, well-defined interfaces (PCIe, AXI), it can be replaced by any other implementation that provides the same interface signature.
The figure below shows an internal block diagram of the OPED component.
The PCIe endpoint moves PCIe transaction-level (TL) packets to and from the NoC. External access to the AXI4-Lite Master is achieved by the control-plane and W0, a WCI-to-AXI4-Lite bridge. Messages moving downstream from the host to Ethernet advance from W13 (dp0), through W2, and exit on the AXI4-Stream Master on W3. Messages moving upstream from Ethernet to the host enter at the AXI4-Stream Slave on W3, pass through W4, and arrive at W14 (dp1).
The PCIe endpoint in OPED generates a 125 MHz clock and reset to which almost all internal logic is synchronous. The OPED module outputs this clock and reset, and all three AXI interfaces to OPED must be synchronized to them.
The NoC is 16B wide and, at 125 MHz, is capable of advancing 2 GB/s upstream and downstream simultaneously. The data-plane engines (dp0, dp1) are capable of at least half the 2 GB/s figure. However, in the current implementation, a reduction to a 4B data-plane at W2, W3, and W4 limits throughput to a maximum of 500 MB/s each upstream and downstream, concurrently.
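The bandwidth figures above follow directly from datapath width times clock rate. The short Python sketch below reproduces that arithmetic; the function and variable names are illustrative only and do not correspond to anything in the OPED sources.

```python
# Back-of-envelope check of the peak bandwidth figures quoted above.
# Names here are illustrative, not taken from the OPED sources.
CLOCK_HZ = 125_000_000  # 125 MHz clock generated by the PCIe endpoint


def peak_bytes_per_second(width_bytes: int, clock_hz: int = CLOCK_HZ) -> int:
    """Peak rate of a datapath that moves width_bytes on every clock cycle."""
    return width_bytes * clock_hz


noc = peak_bytes_per_second(16)        # 16B-wide NoC
data_plane = peak_bytes_per_second(4)  # 4B data-plane at W2, W3, W4

print(f"NoC:           {noc / 1e9:.1f} GB/s per direction")
print(f"4B data-plane: {data_plane / 1e6:.0f} MB/s per direction")
```

These are peak rates per direction; sustained throughput also depends on PCIe overhead and buffering, which this calculation ignores.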
The data-plane engines (dp0, dp1) employ message-based WMI interfaces internally. This necessitates store-and-forward (as opposed to cut-through) operation. As such, message packets may incur a worst-case latency composed of the PCIe latency plus the maximum local buffering latency. It is a known limitation of the initial implementation that the multi-buffering data-planes enhance throughput at the cost of latency.
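To get a rough feel for the buffering component of that latency, the sketch below estimates how long a store-and-forward stage needs just to accumulate a whole message before it can forward the first word. It is an idealized model: it assumes one 4B word per 125 MHz cycle with no stalls, it ignores PCIe latency entirely, and the message sizes are hypothetical examples rather than OPED parameters.

```python
import math

CLOCK_HZ = 125_000_000  # 125 MHz OPED clock
WIDTH_BYTES = 4         # 4B data-plane width at W2, W3, W4


def store_and_forward_cycles(message_bytes: int, width_bytes: int = WIDTH_BYTES) -> int:
    """Cycles needed to buffer a complete message, one word per cycle."""
    return math.ceil(message_bytes / width_bytes)


def cycles_to_microseconds(cycles: int, clock_hz: int = CLOCK_HZ) -> float:
    """Convert a cycle count at clock_hz into microseconds."""
    return cycles / clock_hz * 1e6


for size in (64, 1024, 4096):  # hypothetical message sizes in bytes
    cycles = store_and_forward_cycles(size)
    delay = cycles_to_microseconds(cycles)
    print(f"{size:5d} B message: {cycles:5d} cycles, {delay:.3f} us buffering delay")
```

Under this model the buffering delay grows linearly with message size, which is why smaller messages reduce store-and-forward latency.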
Some throughput and latency benchmarking has been done on OpenCPI IP in general, and on OPED in particular. Please share your benchmark data and test cases so that OPED’s strengths and weaknesses can be better understood.
The two data planes, DP0 (W13) and DP1 (W14), are the active actors in DMA message data movement. The behavior and configuration parameters that control those modules are fully described in the opencpi.org documentation. This section highlights the key behaviors of the data planes as they pertain to OPED and netfpga.org.
The DMA is “message based”, where a message is composed of message data and metadata. The metadata describes the message by explicitly indicating its length, opcode, and other information.
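As a conceptual illustration of a message that carries its own metadata, the sketch below pairs a payload with a small descriptor. The class names, field names, and types are hypothetical; the actual metadata layout and any additional fields are defined in the opencpi.org documentation, not here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    """Hypothetical descriptor; the real layout is defined by OpenCPI."""
    length: int  # message length in bytes
    opcode: int  # application-defined message type


@dataclass(frozen=True)
class Message:
    metadata: Metadata
    data: bytes


def make_message(opcode: int, data: bytes) -> Message:
    """Build a message whose metadata length always matches its payload."""
    return Message(Metadata(length=len(data), opcode=opcode), data)


msg = make_message(opcode=3, data=b"\x01\x02\x03\x04")
print(msg.metadata.length, msg.metadata.opcode)
```

Because the length travels with the message rather than being implied by the transport, a receiver can handle variable-size messages without a separate framing step.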
The number of message buffers may differ on the two sides of the DMA transfer. For example, the host may have a large number of buffers while the OPED/FPGA has a smaller number. Generally, two or more buffers are required on each side to maximize throughput and hide the latency of communication.
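The benefit of having at least two buffers can be seen in a toy timing model. The sketch below is not a model of the OPED data planes; it assumes a simple producer that fills buffers and a consumer that drains them, each taking a fixed (hypothetical) time per message, and computes when the last message finishes for a given buffer count.

```python
def total_time(num_messages: int, num_buffers: int,
               fill_time: int, drain_time: int) -> int:
    """Completion time of the last message in a two-stage fill/drain pipeline.

    A buffer can be refilled only after its previous message has been drained.
    """
    fill_done = []
    drain_done = []
    for i in range(num_messages):
        prev_fill = fill_done[i - 1] if i > 0 else 0
        # The buffer used by message i was last used by message i - num_buffers.
        buffer_free = drain_done[i - num_buffers] if i >= num_buffers else 0
        fill_done.append(max(prev_fill, buffer_free) + fill_time)
        prev_drain = drain_done[i - 1] if i > 0 else 0
        drain_done.append(max(fill_done[i], prev_drain) + drain_time)
    return drain_done[-1] if drain_done else 0


for buffers in (1, 2, 4):
    print(buffers, "buffer(s):", total_time(8, buffers, fill_time=1, drain_time=1))
```

With a single buffer the fill and drain phases serialize, while with two or more they overlap, so in this model the total time approaches that of the slower stage alone.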
There is no scatter/gather capability; thus the message buffers must exist in contiguous, flat, pinned memory.
6/14/2011 8:30:00 PM