# EJFAT Performance Tester

This notebook stands up a slice of two nodes, `sender` and `recver`, and deploys E2SAR code on both for testing. The slice uses dedicated Mellanox ConnectX-6 NICs and is created within a single FABRIC site for simplicity. A single L2 bridge connects one port on each NIC, with RFC1918 IPv4 addressing, so the nodes can talk to each other. The notebook pins the vCPUs and RAM of each VM to the NUMA domain of its NIC to provide a reproducible environment for experiments.

Slice example:

<div>
    <img src="figs/UDP LB Control Plane Testing slice.png" width=500>
</div>

## Preparation and overview

- Be sure to [generate a keypair for Jupyter Hub](GitHubSSH.ipynb) and register it with GitHub - the keys will be used to check out the code from private repositories, like [UDPLBd](https://github.com/esnet/udplbd) and [E2SAR](https://github.com/JeffersonLab/E2SAR).
- Note that for E2SAR development and testing, the sender and receiver compile/build environments are set up via post-boot scripts ([sender](post-boot/sender.sh) and [receiver](post-boot/recver.sh)), and gRPC++/Boost are installed as a Debian package from [GitHub releases](https://github.com/JeffersonLab/E2SAR/releases) with static and dynamic libraries compiled for Ubuntu 22.
- This does not set up the control plane node for anything but testing a specific version: you can set which branch of UDPLBd to check out, and a containerized version is built and stood up.

## Preamble

This cell must be executed whether you are creating a new slice or continuing work on the old one. If you are continuing work, you then skip the slice create section and proceed to wherever you left off.

In [None]:
#
# EDIT THIS
#

# GitHub SSH key file (private) registered using the GitHubSSH.ipynb notebook referenced above
github_key = '/home/fabric/work/fabric_config/github_ecdsa'

# branches for E2SAR that we want checked out on the VMs
e2sar_branch = 'v0.2.0'

# base distro 'ubuntu' or 'rocky'
distro_name = 'ubuntu'

#base distro version, currently only for ubuntu 20,22,24. E2SAR dependencies will be 
#downloaded for the appropriate versions.
distro_version = '22'

# note that the below is distribution specific ('ubuntu' for ubuntu and so on)
home_location = {
    'ubuntu': '/home/ubuntu',
    'rocky' : '/home/rocky'
}[distro_name]

vm_key_location = f'{home_location}/.ssh/github_ecdsa'

# which test suites in E2SAR to run (leave empty to run all)
# you can set 'unit' or 'live' to run unit or live tests only
e2sar_test_suite = ''

# name of the network connecting the nodes
net_name = 'site_bridge_net'

# url of e2sar deps. Find the appropriate version for the OS at https://github.com/JeffersonLab/E2SAR/releases
static_release_url = 'https://github.com/JeffersonLab/E2SAR/releases/download/' # don't need to change this
e2sar_dep_artifact = 'e2sar-deps_0.1.5_amd64.deb'
e2sar_release_ver = 'E2SAR-main-0.1.5'
e2sar_dep_url = static_release_url + e2sar_release_ver + "-" + distro_name + "-" + distro_version + ".04/" + e2sar_dep_artifact

#
# SHOULDN'T NEED TO EDIT BELOW
#
# Preamble
from datetime import datetime
from datetime import timezone
from datetime import timedelta

from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

from ipaddress import ip_address, IPv4Address, IPv6Address, IPv4Network, IPv6Network
import ipaddress

import json

fablib = fablib_manager()             
fablib.show_config();

# name of the OS image used for the VMs (e.g. 'default_ubuntu_22')
distro_image = 'default_' + distro_name + '_' + distro_version

# variable settings
slice_name = f'E2SAR Performance Testing with e2sar[{e2sar_branch}] on {distro_name}'

# for each node specify IP address (assuming /24), OS image
# note that most of the keys in these dictionaries map directly
# onto parameters to add_node()

ram = 8
cores = 4
disk = 100

node_config = {
    'sender': {
        'ip':'192.168.0.1', 
        'image': distro_image,
        'cores': cores,
        'ram': ram,
        'disk': disk },
    'recver': {
        'ip':'192.168.0.2', 
        'image':distro_image,
        'cores':cores,
        'ram': ram,
        'disk': disk },
}
# skip these keys as they are not part of add_node params
skip_keys = ['ip']
# this is the NIC to use
nic_model = 'NIC_ConnectX_6'
# the subnet should match IPs
subnet_str = "192.168.0.0/24" 
subnet = IPv4Network(subnet_str)

def execute_single_node(node, commands):
    for command in commands:
        print(f'\tExecuting "{command}" on node {node.get_name()}')
        #stdout, stderr = node.execute(command, quiet=True, output_file=node.get_name() + '_install.log')
        stdout, stderr = node.execute(command)
        # report stderr per command (note: some tools write progress to stderr even on success)
        if stderr:
            print(f'Error encountered with "{command}": {stderr}')
        
def execute_commands(node, commands):
    if isinstance(node, list):
        for n in node:
            execute_single_node(n, commands)
    else:
        execute_single_node(node, commands)

from typing import Optional

# until fablib fixes this
def get_management_os_interface(node) -> Optional[str]:
        """
        Gets the name of the management interface used by the node's
        operating system. 

        :return: interface name
        :rtype: String
        """
        stdout, stderr = node.execute("sudo ip -j route list", quiet=True)
        stdout_json = json.loads(stdout)

        for i in stdout_json:
            if i["dst"] == "default":
                return i["dev"]

        stdout, stderr = node.execute("sudo ip -6 -j route list", quiet=True)
        stdout_json = json.loads(stdout)

        for i in stdout_json:
            if i["dst"] == "default":
                return i["dev"]

        return None
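
The default-route lookup in `get_management_os_interface` can be tried without a live node. The sketch below is a minimal, standalone version of the same parsing step; the helper name `default_route_dev` and the sample JSON are illustrative assumptions, not part of fablib, and real `ip -j route list` output will carry different interface names and addresses.

```python
import json
from typing import Optional

def default_route_dev(routes_json: str) -> Optional[str]:
    """Return the device of the first default route in `ip -j route list` output."""
    for route in json.loads(routes_json):
        if route.get("dst") == "default":
            return route.get("dev")
    return None

# Hypothetical sample output; real interface names and addresses will differ.
sample = json.dumps([
    {"dst": "default", "gateway": "10.20.4.1", "dev": "enp3s0", "protocol": "dhcp"},
    {"dst": "10.20.4.0/23", "dev": "enp3s0", "protocol": "kernel", "scope": "link"},
])
print(default_route_dev(sample))
```

An empty route list yields `None`, which is why the notebook function falls back to the IPv6 routing table before giving up.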


## Create the slice

In [None]:
# list all slices I have running
output_dataframe = fablib.list_slices(output='pandas')
if output_dataframe is not None:
    print(output_dataframe)
else:
    print('No active slices under this project')

If your slice is already active you can skip to the 'Get Slice Details' section.

In [None]:
# check which sites have hosts with available ConnectX-6 NICs (and also list cores, memory and disk on those hosts)
# Overall list of sites that are usable with the ESnet workflow
sites_to_check = ['STAR', 'TACC', 'MICH', 'UTAH', 'NCSA', 'WASH', 'DALL', 'SALT', 'UCSD', 'CLEM', 'LOSA', 'KANS', 'PRIN', 'SRI']

# worker name is <site in lower case>-w[0-9]+.fabric.net
hosts = fablib.list_hosts(fields=['name','cores_available','ram_available','disk_available','nic_connectx_6_available'], 
                          filter_function=lambda s: (s['name'].split('-')[0].upper() in sites_to_check) and 
                          s['nic_connectx_6_available']==2)

In [None]:
# recommend a site with availability picking one with the highest RAM

# hosts is a Styler over a DataFrame, we want to get the underlying numpy array
recommended_sites = []
for host in hosts.data.to_numpy():
    if host[1] > 2*cores and host[2] > 2*ram and host[3] > 2*disk and host[4] == 2:
        # each entry is [SITE, cores, ram, disk]
        recommended_sites.append([host[0].split('-')[0].upper(), host[1], host[2], host[3]])

# sort recommended sites by the amount of RAM and get the highest RAM
if len(recommended_sites) > 0:
    selected_site = sorted(recommended_sites, key=lambda entry: entry[2])[-1][0]
    print(f'Recommended site is {selected_site}')
else:
    # stop here: selected_site would be undefined and the cells below depend on it
    raise RuntimeError(f'Unable to find a usable site among {sites_to_check}')

# write selected site into node attributes
for n in node_config:
    node_config[n]['site'] = selected_site
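
The selection rule used above can be exercised offline. The following is a minimal sketch that applies the same thresholds (more than twice the per-VM cores, RAM and disk, and two free ConnectX-6 NICs) to a plain list of rows. The function name `pick_site` and the availability numbers are made up for illustration and are not real FABRIC data.

```python
from typing import List, Optional, Sequence

def pick_site(hosts: Sequence[Sequence], cores: int, ram: int, disk: int) -> Optional[str]:
    """Pick the site whose qualifying host has the most available RAM.

    Each row is [host_name, cores_available, ram_available, disk_available, nics_available].
    """
    candidates: List[list] = []
    for name, c, r, d, nics in hosts:
        # same thresholds as the notebook cell
        if c > 2 * cores and r > 2 * ram and d > 2 * disk and nics == 2:
            candidates.append([name.split('-')[0].upper(), c, r, d])
    if not candidates:
        return None
    # sort by available RAM and take the last (largest) entry
    return sorted(candidates, key=lambda entry: entry[2])[-1][0]

# Hypothetical availability rows, not real FABRIC data.
sample_hosts = [
    ["star-w1.fabric.net", 40, 300, 2000, 2],
    ["tacc-w2.fabric.net", 60, 450, 3000, 2],
    ["utah-w1.fabric.net", 6, 500, 3000, 2],   # excluded: 6 cores is not more than 2*4
]
print(pick_site(sample_hosts, cores=4, ram=8, disk=100))
```

Returning `None` for an empty candidate list makes the "no usable site" case explicit, so the caller can stop before building a slice.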

In [None]:
# build a slice
slice = fablib.new_slice(name=slice_name)

# create a network
net1 = slice.add_l2network(name=net_name, subnet=subnet)

nodes = dict()
# create nodes for sending and receiving with a selected network card
# use subnet address assignment
for node_name, node_attribs in node_config.items():
    print(f"{node_name=} {node_attribs['ip']}")
    nodes[node_name] = slice.add_node(name=node_name, **{x: node_attribs[x] for x in node_attribs if x not in skip_keys})
    nic_interface = nodes[node_name].add_component(model=nic_model, name='_'.join([node_name, nic_model, 'nic'])).get_interfaces()[0]
    net1.add_interface(nic_interface)
    nic_interface.set_mode('config')
    nic_interface.set_ip_addr(node_attribs['ip'])
    # postboot configuration is under 'post-boot' directory
    nodes[node_name].add_post_boot_upload_directory('post-boot','.')
    nodes[node_name].add_post_boot_execute(f'chmod +x post-boot/{node_name}.sh && ./post-boot/{node_name}.sh')

print(f'Creating a {distro_name} based slice named "{slice_name}" with nodes in {selected_site}')

# Submit the slice
slice.submit();

## Get Slice Details

If not creating a new slice, and just continuing work on an existing one, execute this cell (in addition to the preamble) and then any of the cells below will work.

In [None]:
# get slice details (if not creating new)
slice = fablib.get_slice(name=slice_name)
a = slice.show()
nets = slice.list_networks()
nodes = slice.list_nodes()

sender = slice.get_node(name="sender")
recver = slice.get_node(name="recver")

# get node dataplane addresses
sender_addr = sender.get_interface(network_name=net_name).get_ip_addr()
recver_addr = recver.get_interface(network_name=net_name).get_ip_addr()

sender_iface = sender.get_interface(network_name=net_name)
recver_iface = recver.get_interface(network_name=net_name)

## Performance Tuning

Here we make sure to
1. Pin the vCPUs to physical CPUs that are in the NUMA node of the NIC
2. Pin the RAM the VM is using to the same NUMA node
3. Set up socket buffers, MTU and firewall to maximize the performance

In [None]:
slice = fablib.get_slice(slice_name)

for node in slice.get_nodes():
    nic_name = '_'.join([node.get_name(), nic_model, 'nic'])
    
    # Pin all vCPUs for VM to same NUMA node as the component
    node.pin_cpu(component_name=nic_name)
    
    # User can also pass in the range of the vCPUs to be pinned 
    #node.pin_cpu(component_name=nic_name, cpu_range_to_pin="0-3")
    
    # Pin memory for VM to same NUMA node as the component
    node.numa_tune()
    
    # Reboot the VM
    node.os_reboot()

# wait for reboot to complete
slice.wait_ssh()

In [None]:
# recover network configuration after reboot

for n in slice.get_nodes():  
    n.config()

In [None]:
# set system-wide send and receive socket buffer limits to 512MB. e2sar_perf then will set SO_RCVBUF and SO_SNDBUF options on sending and receiving sockets
# this is system specific, so we don't do it through a file, but on command line. Normally this goes into /etc/sysctl.conf or /etc/sysctl.d/90-local.conf 
# or similar
commands = [
    f"sudo sysctl net.core.rmem_max=536870912",
    f"sudo sysctl net.core.wmem_max=536870912",
    f"sysctl net.core.wmem_max net.core.rmem_max"
]
execute_commands([sender, recver], commands)
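
The value 536870912 used above is simply 512 MiB expressed in bytes. As a rough sketch, the commands can be generated from a size in MiB and the output of the verification command parsed back into numbers. Both helper names below are hypothetical, and the sample stdout is an assumption about what `sysctl key1 key2` prints (`key = value` lines).

```python
def sysctl_buffer_commands(limit_mib: int) -> list:
    """Build sysctl commands that raise the socket buffer ceilings to limit_mib MiB."""
    limit_bytes = limit_mib * 1024 * 1024
    return [
        f"sudo sysctl net.core.rmem_max={limit_bytes}",
        f"sudo sysctl net.core.wmem_max={limit_bytes}",
    ]

def parse_sysctl(output: str) -> dict:
    """Parse 'key = value' lines printed by `sysctl key1 key2` into a dict of ints."""
    values = {}
    for line in output.splitlines():
        if '=' in line:
            key, _, value = line.partition('=')
            values[key.strip()] = int(value.strip())
    return values

print(sysctl_buffer_commands(512))
# Hypothetical stdout of the verification command
print(parse_sysctl("net.core.wmem_max = 536870912\nnet.core.rmem_max = 536870912\n"))
```

Comparing the parsed values against the requested size is a cheap way to confirm that the limits took effect on both nodes before starting a test.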

In [None]:
# note that in this slice we are guaranteed to have path MTU to be at least 9k, because FABRIC
# switches are configured for jumbo frames. In real life you need to consult your network administrator
# as simply setting MTU on sender and receiver may be insufficient.
mtu = '9000'
sender.execute(f"sudo ip link set dev {sender_iface.get_os_interface()} mtu {mtu}")
recver.execute(f"sudo ip link set dev {recver_iface.get_os_interface()} mtu {mtu}")

# test with don't-fragment (DF=1) ping packets that the path indeed supports an MTU of 9000
# (ping payload of 8972 bytes = 9000 - 20 byte IPv4 header - 8 byte ICMP header)
# send 10 packets and expect all of them to make it
stdout, stderr = sender.execute(f"sudo ping -f -s 8972 -c 10 -M do {recver_addr}")
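
The payload size of 8972 follows from the MTU minus the IP and ICMP headers. A small sketch of that arithmetic, assuming an IPv4 header without options (20 bytes) or the fixed IPv6 header (40 bytes), and an 8-byte ICMP echo header; the function name is illustrative:

```python
IPV4_HEADER = 20   # bytes, without IP options
IPV6_HEADER = 40   # bytes, fixed header
ICMP_HEADER = 8    # bytes, echo request header

def max_ping_payload(mtu: int, ipv6: bool = False) -> int:
    """Largest ping payload that fits into one unfragmented packet for a given MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - ICMP_HEADER

print(max_ping_payload(9000))   # 8972
print(max_ping_payload(1500))   # 1472
```

If you change `mtu` in the cell above, the `-s` argument to `ping` needs to change accordingly, otherwise the test either fails or does not exercise the full MTU.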

In [None]:
# We need to set up the firewall to allow traffic to pass to the receiver

mgmt_iface_name = get_management_os_interface(recver)
data_iface = recver.get_interface(network_name=net_name)
data_iface_name = data_iface.get_os_interface()

print(f'Adding {mgmt_iface_name} and lo and data interface to trusted zone')
commands = [
    f'sudo firewall-cmd --permanent --zone=trusted --add-interface={data_iface_name}',
    f'sudo firewall-cmd --permanent --zone=trusted --add-interface=lo',
    f'sudo firewall-cmd --permanent --zone=trusted --add-interface={mgmt_iface_name}',
    f'for i in $(sudo firewall-cmd --zone=public --list-services); do sudo firewall-cmd --zone=public --permanent --remove-service=$i; done',
]
commands.append(f'sudo firewall-cmd --reload')
commands.append(f'sudo firewall-cmd --list-all --zone=public')

execute_commands([recver], commands)

## Clone E2SAR code into sender and receiver

In [None]:
# install github ssh key and set up build environment variables for interactive logins
commands = [
    f"chmod go-rwx {vm_key_location}",
    # Meson won't detect boost by merely setting cmake_prefix_path, instead set BOOST_ROOT env variable 
    # for gRPC it is enough to set -Dpkg_config_path option to meson
    f"echo 'export BOOST_ROOT=/usr/local/ LD_LIBRARY_PATH=/usr/local/lib' >> ~/.profile",
    f"echo 'export BOOST_ROOT=/usr/local/ LD_LIBRARY_PATH=/usr/local/lib' >> ~/.bashrc",
]

for node in [sender, recver]:    
    # upload the GitHub SSH key onto the VM
    result = node.upload_file(github_key, vm_key_location)
    execute_commands(node, commands)

In [None]:
#download boost and grpc dependencies from releases
commands = [
    f"wget -q -O boost_grpc.deb {e2sar_dep_url}",
    f"sudo apt -yq install ./boost_grpc.deb",
]
 
execute_commands([sender, recver], commands)

In [None]:
# check out E2SAR (including the right branch) using that key; gRPC and Boost were installed from the release package in the previous cell
commands = [
    f"GIT_SSH_COMMAND='ssh -i {vm_key_location} -o IdentitiesOnly=yes -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no' git clone --recurse-submodules --depth 1 -b {e2sar_branch} git@github.com:JeffersonLab/E2SAR.git",
    #f"cd E2SAR; GIT_SSH_COMMAND='ssh -i {vm_key_location} -o IdentitiesOnly=yes -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no' git submodule init",
]
 
execute_commands([sender, recver], commands)

## Building, Running Unit and Live Tests

On RedHat/Rocky derivatives the build fails, likely because of the outdated gcc/g++ and stdlib. To get past this, all mentions of `-std=c++11` must be removed from `E2SAR/build/build.ninja`.

To do that, the `meson setup` command in the build cell below (and in the rebuild cell after it) is followed by a `sed` invocation that strips `-std=c++11` from the `build.ninja` file. This is a hack, don't just walk away, **RUN**!

Also note that the install directory is set to `$HOME/e2sar-install`. So if you run `meson install -C build` this is where the code will end up.
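
To see what the `sed` step does, here is a rough Python equivalent of `sed 's/-std=c++11//g'` applied to a single line. The sample line is a made-up `build.ninja` fragment, not copied from a real E2SAR build, and `strip_cxx_std` is a hypothetical helper name.

```python
def strip_cxx_std(ninja_text: str, flag: str = "-std=c++11") -> str:
    """Remove every literal occurrence of the flag, like sed 's/-std=c++11//g'."""
    return ninja_text.replace(flag, "")

# Hypothetical build.ninja fragment, for illustration only
sample = " ARGS = -Isrc -fdiagnostics-color=always -Wall -std=c++11 -O2 -g\n"
print(strip_cxx_std(sample))
```

Because the edit is applied to a generated file, it is lost whenever `meson setup` regenerates `build.ninja`, which is why the rebuild cell repeats it.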

In [None]:
# if you want to test liburing and/or libnuma, install it here 
# (we are deliberately not adding this to the postboot script)
commands = [
    "sudo apt-get install -y liburing-dev libnuma-dev"
]
execute_commands([sender, recver], commands)

In [None]:
# you can also install and enable perf tools if needed. Last command does a simple test.
commands = [
    "sudo apt-get install -y linux-tools-common linux-tools-generic",
    "echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid",
    "echo 0 | sudo tee /proc/sys/kernel/kptr_restrict",
    "perf stat ls"
]
execute_commands([sender, recver], commands)

In [None]:
# configure the build and compile
# note that most live tests only need the simplest URI - ejfats://token@ip:port/
# however the e2sar_reas_live_test requires data and sync addresses, and data address must
# be real (so we use loopback). Hence the long form of the URI for live tests 
# (other tests simply ignore the parts of the URI they don't need.)

commands = [
    f"cd E2SAR; PATH=$HOME/.local/bin:/usr/local/bin:$PATH BOOST_ROOT=/usr/local/ LD_LIBRARY_PATH=/usr/local/lib/ meson setup -Dpkg_config_path=/usr/local/lib/pkgconfig/:/usr/lib/lib64/pkgconfig/ --prefix {home_location}/e2sar-install build && sed -i 's/-std=c++11//g' build/build.ninja",
    f"cd E2SAR/build; PATH=$HOME/.local/bin:/usr/local/bin:$PATH LD_LIBRARY_PATH=/usr/local/lib/  meson compile -j 2",
]
 
execute_commands([sender, recver], commands)

If you want to update the code and rebuild, run the cell below. Again, making sure we update `build.ninja`... Look away...

In [None]:
# update the code, reconfigure and compile
# note that most live tests only need the simplest URI - ejfats://token@ip:port/
# however the e2sar_reas_live_test requires data and sync addresses, and data address must
# be real (so we use loopback). Hence the long form of the URI for live tests 
# (other tests simply ignore the parts of the URI they don't need.)

commands = [
    f"cd E2SAR; GIT_SSH_COMMAND='ssh -i {vm_key_location} -o IdentitiesOnly=yes -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no' git pull origin {e2sar_branch}",
    f"cd E2SAR; BOOST_ROOT=/usr/local/ PATH=$HOME/.local/bin:/usr/local/bin:$PATH LD_LIBRARY_PATH=/usr/local/lib/ meson setup -Dpkg_config_path=/usr/local/lib/pkgconfig/:/usr/lib/lib64/pkgconfig/ --prefix {home_location}/e2sar-install build --wipe && sed -i 's/-std=c++11//g' build/build.ninja",
    f"cd E2SAR/build; PATH=$HOME/.local/bin:/usr/local/bin:$PATH LD_LIBRARY_PATH=/usr/local/lib/  meson compile -j 2",
]
  
execute_commands([sender, recver], commands)

## Running performance tests

We use the `e2sar_perf` program located under `build/bin/` to test the performance of the segmentation and reassembly code.
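
The EJFAT URI passed to `e2sar_perf` in the cell below packs several fields into one string. As an illustration only, it can be taken apart with Python's standard `urllib.parse`. This is not E2SAR's own parser, and the dictionary keys are names chosen here for readability.

```python
from urllib.parse import urlsplit, parse_qs

def split_ejfat_uri(uri: str) -> dict:
    """Split an ejfat:// URI into its parts using the generic URL parser."""
    parts = urlsplit(uri)
    query = parse_qs(parts.query)
    return {
        "scheme": parts.scheme,
        "token": parts.username,       # gRPC token (ignored by e2sar_perf)
        "cp_host": parts.hostname,     # control plane address (ignored by e2sar_perf)
        "cp_port": parts.port,
        "lb_id": parts.path.rsplit("/", 1)[-1],
        "data": query.get("data", [None])[0],   # the only part e2sar_perf uses
        "sync": query.get("sync", [None])[0],   # must be present but is ignored
    }

print(split_ejfat_uri("ejfat://useless@10.10.10.10:1234/lb/1?data=192.168.0.2&sync=192.168.77.7:1234"))
```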

In [None]:
import time

# for e2sar_perf only the data= part of the query is meaningful. sync= must be present but is ignored
# same for gRPC token, address and port (and lb id)
e2sarPerfURI = f"ejfat://useless@10.10.10.10:1234/lb/1?data={recver_addr}&sync=192.168.77.7:1234"
recverDuration = 20
mtu = 0 # auto-detect
rate = 25 # Gbps
length = 1000000 # event length in bytes
numEvents = 10000 # number of events to send
bufSize = 300 * 1024 * 1024 # 300 MiB send and receive buffers

# note that in back-to-back scenario you cannot have more than one receive thread as only one port can be received on
recv_command = f"cd E2SAR; PATH=$HOME/.local/bin:/usr/local/bin:$PATH LD_LIBRARY_PATH=/usr/local/lib/ ./build/bin/e2sar_perf -r -u '{e2sarPerfURI}' -d {recverDuration} -b {bufSize} --ip {recver_addr} --port 19522"
send_command = f"cd E2SAR; PATH=$HOME/.local/bin:/usr/local/bin:$PATH LD_LIBRARY_PATH=/usr/local/lib/ ./build/bin/e2sar_perf -s -u '{e2sarPerfURI}' --mtu {mtu} --rate {rate} --length {length} -n {numEvents} -b {bufSize} --ip {sender_addr} -o liburing_send"

# start the receiver in the background for recverDuration seconds and log its output
print(f'Executing command {recv_command} on receiver')
recver.execute_thread(recv_command, output_file=f"{recver.get_name()}.perf.log")

# sleep 2 seconds to let receiver get going
time.sleep(2)

# start the sender in the foreground
print(f'Executing command {send_command} on sender')
stdout_send, stderr_send = sender.execute(send_command, output_file=f"{sender.get_name()}.perf.log")

print(f"Inspect {recver.get_name()}.perf.log file in your Jupyter container to see the results")
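
When changing `rate`, `length` or `numEvents`, it helps to check that the receiver stays up long enough. The sketch below estimates the sending time from the parameters above, assuming `rate` means payload bits per second in units of 10^9 and ignoring header overhead; whether `e2sar_perf` interprets the rate exactly this way is an assumption.

```python
def expected_send_seconds(num_events: int, event_bytes: int, rate_gbps: float) -> float:
    """Approximate time to send num_events events of event_bytes each at rate_gbps."""
    total_bits = num_events * event_bytes * 8
    return total_bits / (rate_gbps * 1e9)

secs = expected_send_seconds(num_events=10000, event_bytes=1000000, rate_gbps=25)
print(f"{secs:.1f} s")   # 3.2 s
```

With the defaults this is about 3.2 seconds, which together with the 2 second startup delay fits comfortably inside the 20 second receiver window.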

## Manage the slice

### Extend

In [None]:
# Set end to now plus 14 days
end_date = (datetime.now(timezone.utc) + timedelta(days=14)).strftime("%Y-%m-%d %H:%M:%S %z")

try:
    slice = fablib.get_slice(name=slice_name)

    slice.renew(end_date)
except Exception as e:
    print(f"Exception: {e}")

### Delete

In [None]:
slice = fablib.get_slice(slice_name)
slice.delete()