# Introduction

This notebook provides an example demonstrating how to work with the Xilinx 100 GbE subsystem. The 100 GbE subsystem consists of a hardened IP block, also known as the CMAC, which is available in Xilinx UltraScale+ devices. See the [Xilinx documentation](https://www.xilinx.com/products/intellectual-property/cmac_usplus.html) for more details on the IP core.

# Overlay Block Design

![alt text](images/intro_overlay.png "Overlay Block Design")

# Hardware Setup

The ZCU111 comes with four SFP28 ports in a horizontal cage. Each port is connected to one GTY transceiver and is capable of 25 GbE, so the four together achieve 100 GbE. This design also demonstrates how to route the CMAC through the onboard FMC+ connector instead, via an adapter board, to one of two QSFP connectors. A QSFP connector likewise carries four GTY transceiver lanes running at 25 GbE each, but all four lanes share a single connector, so QSFP provides 100 GbE in a smaller form factor.
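The four-lane arithmetic can be checked with a short calculation. This sketch assumes the usual 25 GbE lane signalling of 25.78125 Gb/s with 64b/66b encoding, which the text above does not state; it only illustrates how four lanes add up to 100 Gb/s of data, and the names are hypothetical.

```python
from fractions import Fraction

# Assumption: each lane signals at 25.78125 Gb/s and uses 64b/66b encoding
LANE_LINE_RATE_GBPS = Fraction("25.78125")
ENCODING_EFFICIENCY = Fraction(64, 66)
NUM_LANES = 4  # four GTY transceivers: one per SFP28 port, or four lanes in one QSFP


def data_rate_gbps(lanes: int = NUM_LANES) -> Fraction:
    """Aggregate data rate in Gb/s after removing the 64b/66b encoding overhead."""
    return lanes * LANE_LINE_RATE_GBPS * ENCODING_EFFICIENCY


print(f"per lane: {float(data_rate_gbps(1))} Gb/s, total: {float(data_rate_gbps())} Gb/s")
```

Using `Fraction` keeps the result exact, so the per-lane rate comes out as exactly 25 Gb/s rather than a rounded float.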

This design uses the [Aldec FMC to QSFP daughter card](https://www.aldec.com/en/products/emulation/daughter_cards/fmc_daughter/fmc_qsfp) plugged into the FMC+ connector on the [ZCU111](https://www.xilinx.com/products/boards-and-kits/zcu111.html).

![alt text](images/CMAC_Aldec_Setup_1.png "Aldec Daughter Card Plugged into ZCU111 FMC+ Connector")

The setup also uses four FS "generic" 25GbE SFP28 passive loopback modules (Product: 109377) and two Amphenol 100GbE QSFP passive loopback modules (Product: SF-100GLB0W00-0DB).

![alt text](images/CMAC_Aldec_Setup_2.png "Aldec Daughter Card Plugged into ZCU111 FMC+ Connector")

# Imports

In [1]:
import pynq
import numpy as np
from cmac import CMAC

# Download Overlay, Program FPGA, Initialize IP Cores

In [2]:
# Download Overlay
ol = pynq.Overlay('sfp28.bit')

In [3]:
# Bind Drivers
dma = ol.axi_dma_0
cmac = ol.cmac_usplus_0

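If you don't have loopback modules, you can still experiment with the CMAC by setting `cmac.internal_loopback = 1` in the cell below. This enables Near-End PMA Loopback, which routes the transmit path back to the receive path inside the transceiver instead of sending the data off-chip. The cell below leaves internal loopback disabled (`0`) because this setup uses external loopback modules.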

In [4]:
# Internal (Near-End PMA) loopback: 0 = disabled (use external loopback modules), 1 = enabled
cmac.internal_loopback = 0

In [5]:
# Bring up the CMAC Core
cmac.start()

# Test Functionality

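In this overlay, an AXI DMA IP transfers data to and from the processing system. The DMA is configured in packet mode, so it supplies the `tlast` signal the CMAC core needs to mark the end of each packet. The CMAC core supports packet sizes from 64 to 9,000 bytes. To illustrate a data transfer, we encode a sample ASCII-art string as a list of bytes and store the result in a numpy array. We then allocate input and output DMA buffers with the same number of elements as the array (472). Because `pynq.allocate` uses `np.uint32` by default, each character occupies a 4-byte word and each buffer is 1,888 bytes long. This is consistent with the 1,892 bytes reported by the CMAC statistics at the end of this notebook if the 4-byte frame check sequence (FCS) is included in that count.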
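Since each DMA transfer becomes one packet (the DMA asserts `tlast` at the end of the buffer), a payload larger than the supported range has to be sent as several transfers. The sketch below shows one hypothetical way to do that; `split_payload` and `fits_cmac` are illustrative helpers, not part of the `cmac` driver, and they assume the 64 to 9,000 byte limits apply to the bytes handed to the DMA.

```python
# Packet size range quoted in the text above (bytes)
MIN_PACKET_BYTES = 64
MAX_PACKET_BYTES = 9000


def fits_cmac(num_bytes: int) -> bool:
    """True if a packet of num_bytes lies inside the supported size range."""
    return MIN_PACKET_BYTES <= num_bytes <= MAX_PACKET_BYTES


def split_payload(payload: bytes, max_len: int = MAX_PACKET_BYTES) -> list:
    """Hypothetical helper: cut a payload into chunks of at most max_len bytes.

    The last chunk is zero-padded up to MIN_PACKET_BYTES so that every chunk
    falls inside the supported range.
    """
    chunks = [payload[i:i + max_len] for i in range(0, len(payload), max_len)]
    if chunks and len(chunks[-1]) < MIN_PACKET_BYTES:
        chunks[-1] = chunks[-1].ljust(MIN_PACKET_BYTES, b"\x00")
    return chunks


sizes = [len(chunk) for chunk in split_payload(bytes(20000))]
print(sizes)  # [9000, 9000, 2000]
```

Each chunk would then be copied into its own buffer and sent with one `sendchannel.transfer` call. Both the 472-element and the 1,888-byte buffer sizes used in this notebook already fall inside the range.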

In [6]:
# Random Data
raw_str = ' /$$$$$$$  /$$     /$$ /$$   /$$  /$$$$$$ \n| $$__  $$|  $$   /$$/| $$$ | $$ /$$__  $$\n| $$  \\ $$ \\  $$ /$$/ | $$$$| $$| $$  \\ $$\n| $$$$$$$/  \\  $$$$/  | $$ $$ $$| $$  | $$\n| $$____/    \\  $$/   | $$  $$$$| $$  | $$\n| $$          | $$    | $$\\  $$$| $$/$$ $$\n| $$          | $$    | $$ \\  $$|  $$$$$$/\n|__/          |__/    |__/  \\__/ \\____ $$$\n                                      \\__/\n\n                                          \n                                         '

In [7]:
# Encode the string as UTF-8 and store its 472 byte values in a numpy array
packets_in = np.array(list(raw_str.encode()))
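The effect of `pynq.allocate`'s default `np.uint32` dtype can be reproduced without a board. This sketch uses `np.zeros(..., dtype=np.uint32)` as a stand-in for the DMA buffer and a short hypothetical string in place of `raw_str`; it shows that `bytes()` on a `uint32` array yields three zero bytes per character, while casting to `np.uint8` first recovers the original text.

```python
import numpy as np

# Short stand-in for raw_str; any ASCII text behaves the same way
text = "PYNQ\n"
values = np.array(list(text.encode()))  # one integer per UTF-8 byte

# Stand-in for pynq.allocate(values.shape[0]), whose default dtype is uint32
buf = np.zeros(values.shape[0], dtype=np.uint32)
buf[:] = values

padded = bytes(buf)                      # 4 bytes per element, 3 of them zero
recovered = bytes(buf.astype(np.uint8))  # 1 byte per element

print(buf.nbytes, len(padded), len(recovered))  # 20 20 5
print(recovered.decode())
```

For the 472-character string in this notebook the same arithmetic gives 472 × 4 = 1,888 bytes per buffer. Passing `dtype=np.uint8` to `pynq.allocate` would avoid the padding, at the cost of changing the packet size on the wire.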

In [8]:
# Allocate input and output buffers for transfer
dma_buf_in = pynq.allocate(packets_in.shape[0])
dma_buf_out = pynq.allocate(packets_in.shape[0])

In [9]:
# Load data into input buffer
dma_buf_in[:] = packets_in

### DMA Transfer

In [10]:
dma.sendchannel.transfer(dma_buf_in)

In [11]:
dma.recvchannel.transfer(dma_buf_out)

In [12]:
dma.sendchannel.wait()

In [13]:
dma.recvchannel.wait()
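The four cells above can be wrapped in one helper. In PYNQ's AXI DMA driver, `transfer()` starts a channel and returns immediately, while `wait()` blocks until that channel finishes. This sketch arms the receive (S2MM) channel before starting the send as a defensive ordering, so the receiver is ready when looped-back data arrives; the notebook's own order (send first) worked in the run shown. `loopback_transfer` and the fake classes are hypothetical and exist only so the sketch runs off-board.

```python
def loopback_transfer(dma, buf_in, buf_out):
    """Hypothetical helper: send buf_in and receive the looped-back data into buf_out."""
    dma.recvchannel.transfer(buf_out)  # arm the receive side first
    dma.sendchannel.transfer(buf_in)   # then start the send
    dma.sendchannel.wait()             # block until the send completes
    dma.recvchannel.wait()             # block until the receive completes


class _FakeChannel:
    """Minimal stand-in for a pynq DMA channel that records the calls made on it."""

    def __init__(self, name, log):
        self.name, self.log = name, log

    def transfer(self, buf):
        self.log.append((self.name, "transfer"))

    def wait(self):
        self.log.append((self.name, "wait"))


class _FakeDMA:
    def __init__(self):
        self.log = []
        self.sendchannel = _FakeChannel("send", self.log)
        self.recvchannel = _FakeChannel("recv", self.log)


fake = _FakeDMA()
loopback_transfer(fake, None, None)
print(fake.log)
```

On hardware the call would be `loopback_transfer(dma, dma_buf_in, dma_buf_out)` with the objects created earlier in this notebook.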

### Verify

In [14]:
# Decode output data. pynq.allocate defaults to np.uint32, so each character
# sits in its own 4-byte word; cast to uint8 first to drop the zero padding
packets_out = bytes(dma_buf_out.astype(np.uint8))

In [15]:
print(packets_out.decode())

 /$$$$$$$  /$$     /$$ /$$   /$$  /$$$$$$ 
| $$__  $$|  $$   /$$/| $$$ | $$ /$$__  $$
| $$  \ $$ \  $$ /$$/ | $$$$| $$| $$  \ $$
| $$$$$$$/  \  $$$$/  | $$ $$ $$| $$  | $$
| $$____/    \  $$/   | $$  $$$$| $$  | $$
| $$          | $$    | $$\  $$$| $$/$$ $$
| $$          | $$    | $$ \  $$|  $$$$$$/
|__/          |__/    |__/  \__/ \____ $$$
                                      \__/

In [16]:
# Another way to verify core functionality
np.array_equal(dma_buf_in, dma_buf_out)

True
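When `np.array_equal` returns `False`, it helps to know where the buffers diverge, for example to tell a truncated packet from corrupted bytes. The helper below is a hypothetical sketch (not part of PYNQ) that returns the index of the first differing element, or `None` when the buffers match.

```python
import numpy as np


def first_mismatch(sent, received):
    """Return the index of the first differing element, or None if both match."""
    sent = np.asarray(sent).ravel()
    received = np.asarray(received).ravel()
    n = min(sent.size, received.size)
    diff = np.flatnonzero(sent[:n] != received[:n])
    if diff.size:
        return int(diff[0])
    if sent.size != received.size:
        return n  # one buffer is a prefix of the other, e.g. a truncated packet
    return None


print(first_mismatch([1, 2, 3], [1, 2, 3]))  # None
print(first_mismatch([1, 2, 3], [1, 9, 3]))  # 1
```

In this notebook it would be called as `first_mismatch(dma_buf_in, dma_buf_out)`.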

Lastly, the CMAC has internal registers that report basic statistics. Optionally, rerun the four DMA transfer cells several times to collect statistics over multiple packets. Note that the registers self-clear after reporting; run `cmac.getStats(False)` instead to prevent them from clearing.

In [17]:
cmac.getStats()

{'cycle_count': 2816748107,
 'rx': {'bad_fcs': 0,
  'bytes': 1892,
  'good_bytes': 1892,
  'good_packets': 1,
  'packets': 1,
  'packets_bad_fcs': 0,
  'packets_fragmented': 0,
  'packets_jabber': 0,
  'packets_large': 0,
  'packets_oversize': 0,
  'packets_small': 0,
  'packets_toolong': 0,
  'packets_undersize': 0,
  'pause': 0,
  'stomped_fcs': 0,
  'user_pause': 0},
 'tx': {'bad_fcs': 0,
  'bytes': 1892,
  'good_bytes': 1892,
  'good_packets': 1,
  'packets': 1,
  'packets_large': 0,
  'packets_small': 0,
  'pause': 0,
  'user_pause': 0}}
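The dictionary returned by `cmac.getStats()` can also be checked programmatically. This sketch compares transmitted and received good-packet and good-byte counts and requires a chosen set of receive error counters to be zero. Which counters count as errors is an assumption made here, and `loopback_ok` and `bytes_per_packet` are hypothetical helpers. The sample dictionary is a trimmed copy of the output above.

```python
# Receive counters treated as errors in this sketch (an assumption; adjust as needed)
RX_ERROR_COUNTERS = (
    "bad_fcs", "packets_bad_fcs", "packets_fragmented", "packets_jabber",
    "packets_oversize", "packets_toolong", "packets_undersize", "stomped_fcs",
)


def loopback_ok(stats: dict) -> bool:
    """True if TX and RX good counts agree and no selected RX error counter is set."""
    tx, rx = stats["tx"], stats["rx"]
    counts_match = (
        tx["good_packets"] == rx["good_packets"]
        and tx["good_bytes"] == rx["good_bytes"]
    )
    no_errors = all(rx.get(name, 0) == 0 for name in RX_ERROR_COUNTERS)
    return counts_match and no_errors


def bytes_per_packet(stats: dict) -> float:
    """Average received bytes per good packet (0.0 if nothing was received)."""
    rx = stats["rx"]
    return rx["good_bytes"] / rx["good_packets"] if rx["good_packets"] else 0.0


# Trimmed copy of the getStats() output shown above
sample = {
    "rx": {"bad_fcs": 0, "good_bytes": 1892, "good_packets": 1, "packets_bad_fcs": 0,
           "packets_fragmented": 0, "packets_jabber": 0, "packets_oversize": 0,
           "packets_toolong": 0, "packets_undersize": 0, "stomped_fcs": 0},
    "tx": {"bad_fcs": 0, "good_bytes": 1892, "good_packets": 1},
}

print(loopback_ok(sample), bytes_per_packet(sample))  # True 1892.0
```

On hardware, store the result of a single `cmac.getStats()` call and pass it to both helpers, because the registers self-clear after each read.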

----

Copyright (C) 2021 Xilinx, Inc

SPDX-License-Identifier: BSD-3-Clause