# Using FABRIC Xilinx FPGA U280 with P4

Your compute nodes can include FPGAs. These devices are made available as FABRIC components and can be added to your nodes like any other component. Your project must have the Component.FPGA permission tag to provision them.

This example notebook demonstrates how to reserve and use a single Xilinx FPGA device on FABRIC in conjunction with ConnectX-6 cards. It creates a slice with two nodes, one with an FPGA and another with a ConnectX-6 card, connected via L2Bridge L2 services. A sample P4 application is loaded using the ESnet workflow, and traffic is exchanged between the FPGA and the ConnectX-6 card.


## Setup the Experiment

In [None]:
from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()
                     
fablib.show_config();

## Select a site with E2SAR-assigned FPGA

The cells below help you create a slice that contains a single node with an attached FPGA. 

In [None]:
# FPGA site should only be one of these as these are assigned to the project
# 'WASH', 'KANS', or 'LOSA'
site='LOSA'

FPGA_CHOICE='FPGA_Xilinx_U280'

# name the slice and the node 
slice_name=f'E2SAR U280 LB Slice on {site}'

fpga_node_name='LB-node'
image = 'docker_ubuntu_20'
net_name = '_'.join(['fabnetv4ext', site])

print(f'Will create slice "{slice_name}" with node "{fpga_node_name}"')

## Create a slice with a node with FPGA at desired site

This slice has two VMs, one with the FPGA and the other with a ConnectX-6 card; we will want to pass traffic between them.

In [None]:
# Create the slice. Note that by default the submit() call polls every 10-20 seconds for up to 360 seconds,
# waiting for the slice to come up. The normal expected time is around 2 minutes.
slice = fablib.new_slice(name=slice_name)

# Add a node with a 100G drive and 8 CPU cores using the Ubuntu 20 image
node1 = slice.add_node(name=fpga_node_name, site=site, cores=8, ram=8, disk=100, image=image)
# postboot configuration is under the 'post-boot' directory
node1.add_post_boot_upload_directory('post-boot','.')
node1.add_post_boot_execute('chmod +x post-boot/lb-node.sh && ./post-boot/lb-node.sh')
# FABNetv4 on shared NIC (to talk to storage)
node1.add_fabnet()

fpga_comp = node1.add_component(model=FPGA_CHOICE, name='fpga1')
fpga_p1 = fpga_comp.get_interfaces()[0]
fpga_p2 = fpga_comp.get_interfaces()[1]

# use FABNetv4Ext to connect port 1
net = slice.add_l3network(name=net_name, interfaces=[fpga_p1], type='IPv4Ext')

# Submit Slice Request
slice.submit();

## Setup IOMMU and Hugepages

For DPDK to function properly, we need to set up hugepages and IOMMU on the VM.

In [None]:
slice = fablib.get_slice(name=slice_name)
node = slice.get_node(name=fpga_node_name)

commands = list()
#commands.append("sudo sed -i 's/GRUB_CMDLINE_LINUX=\"\\(.*\\)\"/GRUB_CMDLINE_LINUX=\"\\1 amd_iommu=on iommu=pt default_hugepagesz=1G hugepagesz=1G hugepages=8\"/' /etc/default/grub")
commands.append("sudo sed -i 's/GRUB_CMDLINE_LINUX=\"\"/GRUB_CMDLINE_LINUX=\"amd_iommu=on iommu=pt default_hugepagesz=1G hugepagesz=1G hugepages=8\"/' /etc/default/grub")
commands.append("sudo grub-mkconfig -o /boot/grub/grub.cfg")
commands.append("sudo update-grub")

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node.execute(command)
    
print('Done')
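
The active `sed` command above only matches when `GRUB_CMDLINE_LINUX` is empty. As a minimal sketch of a more tolerant edit, the helper below appends any missing parameters to whatever value is already present. The name `add_kernel_params` is hypothetical, and the function only transforms the text of `/etc/default/grub`; reading the file from the node and writing it back are left out.

```python
import re

# Kernel parameters taken from the sed command in the cell above
KERNEL_PARAMS = ["amd_iommu=on", "iommu=pt", "default_hugepagesz=1G",
                 "hugepagesz=1G", "hugepages=8"]

def add_kernel_params(grub_text, params=KERNEL_PARAMS):
    """Return grub_text with params appended to GRUB_CMDLINE_LINUX (idempotent)."""
    # Matches GRUB_CMDLINE_LINUX="..." but not GRUB_CMDLINE_LINUX_DEFAULT="..."
    pattern = re.compile(r'^GRUB_CMDLINE_LINUX="(.*)"$', re.MULTILINE)

    def repl(match):
        current = match.group(1).split()
        missing = [p for p in params if p not in current]
        return 'GRUB_CMDLINE_LINUX="' + " ".join(current + missing) + '"'

    new_text, count = pattern.subn(repl, grub_text)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX line not found")
    return new_text

# Illustrative input, not captured from a real node
sample = ('GRUB_DEFAULT=0\n'
          'GRUB_CMDLINE_LINUX_DEFAULT="quiet"\n'
          'GRUB_CMDLINE_LINUX="console=ttyS0"\n')
print(add_kernel_params(sample))
```

Running the helper twice leaves the text unchanged, which makes it safe to re-execute the cell.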

Reboot the node. The reboot command sometimes raises an EOFError exception when the SSH connection drops; ignore it and continue.

In [None]:
reboot = 'sudo reboot'

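print(reboot)
try:
    node.execute(reboot)
except EOFError:
    # the SSH session can drop while the node goes down; this is expected
    print('SSH connection closed during reboot (expected)')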

slice.wait_ssh(timeout=360,interval=10,progress=True)

print("Now testing SSH abilities to reconnect...",end="")
slice.update()
slice.test_ssh()
print("Reconnected!")

Check that IOMMU was enabled

In [None]:
command = 'dmesg | grep -i IOMMU'

print('Observe that the modifications to boot configuration took place and IOMMU is detected')
stdout, stderr = node.execute(command)

node.config()
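
The hugepage reservation can be checked after the reboot in a similar way. The sketch below parses the text of `/proc/meminfo` (for example the `stdout` of `node.execute('cat /proc/meminfo')`) and returns the hugepage count and page size. The helper name `parse_hugepages` is hypothetical, and the sample text is illustrative rather than captured from a node.

```python
def parse_hugepages(meminfo_text):
    """Extract the hugepage count and page size (in kB) from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    total = int(fields["HugePages_Total"])
    # The value looks like "1048576 kB"; keep only the number
    size_kb = int(fields["Hugepagesize"].split()[0])
    return total, size_kb

# Illustrative excerpt of /proc/meminfo
sample = """HugePages_Total:       8
HugePages_Free:        8
Hugepagesize:    1048576 kB
"""
total, size_kb = parse_hugepages(sample)
print(f"{total} hugepages of {size_kb // 1024} MiB each")
```

With the boot parameters used above, the expectation is 8 pages of 1 GiB, provided the VM has enough memory to reserve them.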

Disable IOMMU support in VFIO by enabling its unsafe no-IOMMU mode (IOMMU passthrough doesn't actually work here)

In [None]:
# Enable unsafe_noiommu_mode for the vfio module
command = "echo 1 | sudo tee /sys/module/vfio/parameters/enable_unsafe_noiommu_mode"

stdout, stderr = node.execute(command)

## Program FPGA and run applications on it

First we:
- download pre-built dpdk and xilinx-labtools containers and install their images
- download a previously built P4 artifact
- check out the `esnet-smartnic-fw` code, add the artifact, and build a configuration of containers we can then execute.

In [None]:
#
# the following step ONLY works if you attached storage with docker containers and artifacts to the fpga-node;
# if you did not, you need to download/build those by other means (e.g. scp from elsewhere or
# install from docker repositories)
# NOTE: `use_storage` and `mount_point` are not set anywhere in this notebook; define them before running this cell
#

if use_storage:
    #
    # install existing dpdk and xilinx-labtools containers (pre-built) from
    # https://github.com/esnet/smartnic-dpdk-docker and https://github.com/esnet/xilinx-labtools-docker
    #
    dpdk_docker = 'smartnic-dpdk-docker.tar.gz'
    xilinx_labtools_docker = 'xilinx-labtools-docker-2023.1_0507_1903.tar.gz'
    artifact = '/mnt/xilinx-tools/artifacts/msada/v0/artifacts.au280.p4_only.0.zip'
    
    commands = [
        f"docker load < /mnt/{mount_point}/esnet-dockers/{dpdk_docker}",
        f"docker load < /mnt/{mount_point}/esnet-dockers/{xilinx_labtools_docker}",
        f"docker image ls",
        f"cp {artifact} ~/"
    ]
    for command in commands:
        print(f'Executing {command}')
        stdout, stderr = node1.execute(command)
else:
    print('Please build dpdk, xilinx-labtools dockers and install them on fpga-node manually, also place your artifact file in ~/')

Next, clone the repo and, using the externally provided P4 artifact, build a container and a compose structure.

In [None]:
# clone the esnet-smartnic-fw repo according to instructions https://github.com/esnet/esnet-smartnic-fw/tree/main (as of 09/2023)
# create a configuration environment file and build a container

# if the artifact file is called artifacts.au280.p4_only.0.zip then it translates into
# the following environment parameters

# update the env_file values to match the name of the artifact file
env_file = """
SN_HW_VER=0
SN_HW_BOARD=au280
SN_HW_APP_NAME=p4_only
"""

# update the artifact name as needed
artifact = '~/artifacts.au280.p4_only.0.zip'

commands = [
    "git clone https://github.com/esnet/esnet-smartnic-fw.git",
    "cd ~/esnet-smartnic-fw; git submodule init; git submodule update",
    f"cp {artifact} ~/esnet-smartnic-fw/sn-hw/",
    f"echo '{env_file}' | sudo tee ~/esnet-smartnic-fw/.env",
]

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node1.execute(command)    
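
The mapping from artifact file name to environment parameters described in the comments above can be sketched as follows. This assumes the naming pattern `artifacts.<board>.<app_name>.<version>.zip`, which is inferred from the single example in this notebook and may not hold for every artifact; the helper name `env_from_artifact` is hypothetical.

```python
import os

def env_from_artifact(path):
    """Derive the SN_HW_* settings from an artifact file name.

    Assumes the pattern artifacts.<board>.<app_name>.<version>.zip.
    """
    name = os.path.basename(path)
    parts = name.split(".")
    if len(parts) != 5 or parts[0] != "artifacts" or parts[-1] != "zip":
        raise ValueError(f"unexpected artifact name: {name}")
    _, board, app_name, version, _ = parts
    return {"SN_HW_VER": version, "SN_HW_BOARD": board, "SN_HW_APP_NAME": app_name}

settings = env_from_artifact("~/artifacts.au280.p4_only.0.zip")
# Render the settings in the KEY=VALUE form used by the .env file
env_text = "\n".join(f"{key}={value}" for key, value in settings.items())
print(env_text)
```

An application name that itself contains a dot would not fit this pattern, so the helper raises an error instead of guessing.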

Now build the artifacts.

In [None]:
# finally build, logging to a file
node_thread = node1.execute_thread("cd ~/esnet-smartnic-fw/; ./build.sh", output_file='esnet-smartnic-fw-docker.log')
stdout, stderr = node_thread.result()

You can verify that the build step succeeded by checking for three docker images (esnet-smartnic-fw, dpdk and xilinx-labtools).

In [None]:
command = "docker image ls"
stdout, stderr = node1.execute(command)

### Test FPGA setup by accessing sn-cli under `smartnic-mgr-vfio-unlock` profile

We use the ESnet workflow to flash the FPGA and access the `sn-cli` application to test whether the CMACs are up.

In [None]:
# set the FPGA device and the profile we want to execute
env_file = """
FPGA_PCIE_DEV=0000:00:1f
COMPOSE_PROFILES=smartnic-mgr-vfio-unlock
"""

# set execution profile to smartnic-mgr-vfio-unlock and run the stack
# notice we append to the pre-generated .env (it was generated as part of previous build)
commands = [
    f"echo '{env_file}' | tee -a ~/esnet-smartnic-fw/sn-stack/.env",
    "cd ~/esnet-smartnic-fw/sn-stack; docker compose up -d"
]

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node1.execute(command)  
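
The `FPGA_PCIE_DEV` value above is a PCI address without the function suffix (domain:bus:device). If the address on your node differs, one way to find it is to list devices with the Xilinx PCI vendor ID `10ee` using `lspci -D -d 10ee:` and strip the function number. The sketch below does only the parsing step; the helper name `fpga_pcie_devs` is hypothetical and the sample output is illustrative, not captured from a FABRIC node.

```python
import re

def fpga_pcie_devs(lspci_text):
    """Return the sorted unique 'domain:bus:device' prefixes in `lspci -D` output."""
    addr = re.compile(
        r"^([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2})\.[0-7]\b",
        re.MULTILINE,
    )
    return sorted(set(addr.findall(lspci_text)))

# Illustrative output of `lspci -D -d 10ee:` with two PCI functions
sample = """0000:00:1f.0 Memory controller: Xilinx Corporation Device
0000:00:1f.1 Memory controller: Xilinx Corporation Device
"""
print(fpga_pcie_devs(sample))
```

A single entry in the result is what the `.env` file expects; more than one entry would mean several Xilinx devices are visible and the right one has to be picked by hand.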

Run some health checks (you should not see errors)

In [None]:
stdout, stderr = node1.execute("cd esnet-smartnic-fw/sn-stack/; docker container logs sn-stack-ubuntu-smartnic-devbind-1")

### Test sn-cli, configure CMACs

You should see normal-looking output. If everything is 0x0000 or 0xffff, the binding to the FPGA from VFIO did not work.

In [None]:
command = "cd esnet-smartnic-fw/sn-stack/; docker compose exec smartnic-fw sn-cli dev version"

stdout, stderr = node1.execute(command)
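
The "everything is 0x0000 or 0xffff" symptom can also be checked programmatically. The sketch below scans a block of text for hexadecimal values and reports whether all of them consist only of zeros or only of `f` digits. It makes no assumption about the exact layout of the `sn-cli dev version` output beyond it containing `0x`-prefixed values; the helper name `looks_unbound` and the sample strings are hypothetical.

```python
import re

def looks_unbound(output):
    """Return True if every 0x-prefixed value in output is all zeros or all f's."""
    values = [v.lower() for v in re.findall(r"0x([0-9a-fA-F]+)", output)]
    if not values:
        return False  # nothing to judge
    return all(set(v) <= {"0"} or set(v) <= {"f"} for v in values)

# Hypothetical output fragments, for illustration only
print(looks_unbound("field_a: 0xffff\nfield_b: 0x0000"))
print(looks_unbound("field_a: 0x1234\nfield_b: 0x0000"))
```

Passing the `stdout` string returned by `node1.execute(command)` to the helper gives a quick signal of whether the VFIO binding needs another look.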

Let's configure CMACs so we can test pktgen. It is important that at the end you see `MAC ENABLED/PHY UP -> UP` for both CMACs Rx and Tx. If not, it is possible FEC is not turned off in the dataplane switch.

In [None]:
# upload sn-cli config script
sn_cli_script = 'sn-cli-setup'

result = node1.upload_file(sn_cli_script, sn_cli_script)

commands = [
    f"chmod a+x {sn_cli_script}",
    f"mv {sn_cli_script} ~/esnet-smartnic-fw/sn-stack/scratch",
    f"cd ~/esnet-smartnic-fw/sn-stack/; docker compose exec smartnic-fw scratch/{sn_cli_script}"
]

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node1.execute(command)  

### Try pktgen

First we shut down the docker stack, change the profile to `smartnic-mgr-dpdk-manual`, and restart the stack. Then we access the pktgen application and configure it to send some packets. Finally, we use the second host with CX-6 cards to snoop (tcpdump) and receive those packets.

In [None]:
# bring down the stack

command = "cd esnet-smartnic-fw/sn-stack/; docker compose down"

stdout, stderr = node1.execute(command)

Modify the profile in the configuration .env file

In [None]:
# modify the profile to be `smartnic-mgr-dpdk-manual`
commands = [
    "sed -i 's/COMPOSE_PROFILES=smartnic-mgr-vfio-unlock/COMPOSE_PROFILES=smartnic-mgr-dpdk-manual/' ~/esnet-smartnic-fw/sn-stack/.env",
    "tail ~/esnet-smartnic-fw/sn-stack/.env"
]

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node1.execute(command)  

Restart the stack

In [None]:
command = "cd ~/esnet-smartnic-fw/sn-stack; docker compose up -d"
stdout, stderr = node1.execute(command)


The next step should be executed from the console. 
1. SSH into fpga-node
2. `cd ~/esnet-smartnic-fw/sn-stack`
3. `docker compose exec smartnic-dpdk bash`
4. `pktgen -a $SN_PCIE_DEV.0 -a $SN_PCIE_DEV.1 -l 3-7 -n 3 -d librte_net_qdma.so --file-prefix $SN_PCIE_DEV- -- -v -m [4:5].0 -m [6:7].1`

(pktgen should initialize properly. Then, at the pktgen prompt, issue the commands in step 5, which set unicast MAC addresses, set the frame size to 128, set the rate to 1% of line rate, i.e. 1 Gbps, and start sending packets out of both ports.)

5. Execute the following inside pktgen to start unicast packet flows on both CMACs:
```
set 0-1 dst mac 04:16:17:18:19:1a
set 0-1 src mac 14:16:17:18:19:10
set 0-1 size 128
set 0-1 rate 1
start 0
start 1
```
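
As a rough check of what to expect from these settings, the sketch below works out the packet rate. It assumes 100 Gbps ports (implied by "1%, i.e. 1 Gbps") and that the rate is measured on the wire, where each frame carries an extra 20 bytes of preamble and inter-frame gap; both are assumptions rather than facts stated in this notebook, and the helper name `packets_per_second` is hypothetical.

```python
def packets_per_second(link_gbps, rate_percent, frame_bytes, overhead_bytes=20):
    """Estimate the packet rate for a given share of line rate.

    overhead_bytes covers preamble (8) and inter-frame gap (12) per frame,
    which is an assumption about how the rate is accounted.
    """
    bits_per_frame = (frame_bytes + overhead_bytes) * 8
    target_bps = link_gbps * 1e9 * rate_percent / 100
    return target_bps / bits_per_frame

pps = packets_per_second(link_gbps=100, rate_percent=1, frame_bytes=128)
print(f"about {pps:,.0f} packets per second per port")
```

At this rate, a capture limited to a handful of packets on the receiving side should complete almost immediately once traffic is flowing.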

Next, on cx-6-node, enable the dataplane interfaces.

In [None]:
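# cx6_node_name must hold the name of the ConnectX-6 node; it is not defined earlier in this notebook
node2 = slice.get_node(name=cx6_node_name)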
cx_6_port0 = "ens7"
cx_6_port1 = "ens8"

commands = [
    f"sudo ip link set up {cx_6_port0}",
    f"sudo ip link set up {cx_6_port1}"
]

for command in commands:
    print(f'Executing {command}')
    stdout, stderr = node2.execute(command)  

Try tcpdump on the appropriate CX-6 interfaces.

In [None]:
pktcount = 10
print(f"LISTENING ON {cx_6_port0}")
command = f"sudo tcpdump -nlvvxx -i {cx_6_port0} -c {pktcount} tcp"

stdout, stderr = node2.execute(command)
print(f"LISTENING ON {cx_6_port1}")
command = f"sudo tcpdump -nlvvxx -i {cx_6_port1} -c {pktcount} tcp"

stdout, stderr = node2.execute(command)

Bring the stack down

In [None]:
command = "cd ~/esnet-smartnic-fw/sn-stack; docker compose down"

stdout, stderr = node1.execute(command)

## Extend the slice (as needed)

If you need to extend the slice, you can just execute the following two cells. They display the slice expiration date and optionally extend it by 2 weeks.

In [None]:
slice = fablib.get_slice(name=slice_name)
a = slice.show()
nets = slice.list_networks()
nodes = slice.list_nodes()

Renew the slice

In [None]:
from datetime import datetime
from datetime import timezone
from datetime import timedelta

# Set end date to now plus 14 days
end_date = (datetime.now(timezone.utc) + timedelta(days=14)).strftime("%Y-%m-%d %H:%M:%S %z")

try:
    slice = fablib.get_slice(name=slice_name)

    slice.renew(end_date)
except Exception as e:
    print(f"Exception: {e}")

## Delete the Slice (as needed)

Please delete your slice when you are done with your experiment.


In [None]:
slice = fablib.get_slice(name=slice_name)
slice.delete()