# Benchmark Throughput experiment (Switch)
This notebook shows how to measure the throughput between two Alveo nodes using the benchmark application with UDP as the transport protocol.
We are going to rely on a Dask cluster to configure the local and remote Alveo cards.

This notebook assumes:
* The Alveo cards are connected to a switch
* A Dask cluster is already created and running. For more information about setting up a Dask cluster, visit the [Dask documentation](https://docs.dask.org/en/latest/setup.html)

In [1]:
from vnx_utils import *
import pynq

In [2]:
import sys
import os

In [3]:
workers = pynq.Device.devices

## Download xclbin to Alveo cards
1. Create Dask device for each worker
2. Create an overlay object for each worker; this step downloads the `xclbin` file to the Alveo card

In [4]:
xclbin = '/home/ubuntu/Projects/StaRR-NIC/xup_vitis_network_example/benchmark.intf3.xilinx_u280_xdma_201920_3/vnx_benchmark_if3.xclbin'
ol_w0 = pynq.Overlay(xclbin, device=workers[0])

## Check Link 

We use the `linkStatus` function, which reports whether the CMAC detects a link, that is, whether the physical connection
between each Alveo card and the switch is established.

In [5]:
print("Link worker 0_0 {}, worker 0_1 {}".format(ol_w0.cmac_0.linkStatus(), ol_w0.cmac_1.linkStatus()))

Link worker 0_0 {'cmac_link': True}, worker 0_1 {'cmac_link': True}
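
With more interfaces, reading the printed dictionaries by eye gets tedious. The sketch below summarizes status dictionaries in the format shown in the output above; `links_down` is a hypothetical helper, not part of `vnx_utils`, and the `statuses` dictionary is built by hand here rather than read from the card.

```python
def links_down(statuses):
    """Return the names of interfaces whose status dict reports no CMAC link."""
    return [name for name, status in statuses.items()
            if not status.get('cmac_link', False)]

# Status dictionaries in the format printed by the cell above
statuses = {
    'worker 0_0': {'cmac_link': True},
    'worker 0_1': {'cmac_link': True},
}

down = links_down(statuses)
if down:
    print("No link on: {}".format(", ".join(down)))
else:
    print("All links up")
```

On real hardware the dictionary values would come from `ol_w0.cmac_0.linkStatus()` and `ol_w0.cmac_1.linkStatus()`.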


## Configure IP address of the Alveo cards
The next cell configures the IP address of each of the two network interfaces (`networklayer_0` and `networklayer_1`) of the Alveo card.

In [30]:
ip_w0_0, ip_w0_1 = '10.0.0.47', '10.0.0.45'
if_status_w0_0 = ol_w0.networklayer_0.updateIPAddress(ip_w0_0, debug=True)
if_status_w0_1 = ol_w0.networklayer_1.updateIPAddress(ip_w0_1, debug=True)
print("Worker 0_0: {}\nWorker 0_1: {}".format(if_status_w0_0, if_status_w0_1))

Worker 0_0: {'HWaddr': '00:0a:35:02:9d:2f', 'inet addr': '10.0.0.47', 'gateway addr': '10.0.0.1', 'Mask': '255.255.255.0'}
Worker 0_1: {'HWaddr': '00:0a:35:02:9d:2d', 'inet addr': '10.0.0.45', 'gateway addr': '10.0.0.1', 'Mask': '255.255.255.0'}


## One way experiment with XL710

In [42]:
src_port, dst_port = 60512, 62180
dst_ip = '10.0.0.53'
dst_ip2 = '10.0.0.55'
dst_mac = '00:0a:35:a8:d6:ac'
dst_mac2 = '00:0a:35:36:de:0f'

### Configure port 0

1. Set up connection table
2. Launch ARP discovery
3. Print out ARP Table 
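
Before writing entries into the connection table in step 1, it can help to validate them on the host. The sketch below assumes each entry is a tuple of (remote IP, remote port, local port, valid flag), which is inferred from the variable names used in this notebook (`dst_ip`, `dst_port`, `src_port`, `True`). `make_socket_entry` is a hypothetical helper, not part of `vnx_utils`.

```python
import ipaddress

def make_socket_entry(remote_ip, remote_port, local_port, valid=True):
    """Validate the fields and return a tuple in the assumed socket-entry order."""
    # Raises ValueError (AddressValueError) if the string is not a valid IPv4 address
    ipaddress.IPv4Address(remote_ip)
    for port in (remote_port, local_port):
        if not 0 < port < 65536:
            raise ValueError("port out of range: {}".format(port))
    return (remote_ip, remote_port, local_port, valid)

# Values taken from this notebook
entry = make_socket_entry('10.0.0.53', 62180, 60512)
print(entry)
```

The returned tuple has the same shape as the values assigned to `ol_w0.networklayer_0.sockets[12]` in this notebook.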

In [43]:
ol_w0.networklayer_0.sockets[12] = (dst_ip, dst_port, src_port, True)
ol_w0.networklayer_0.sockets[13] = (dst_ip2, dst_port, 60513, True)
ol_w0.networklayer_0.populateSocketTable()

ol_w0.networklayer_0.arpDiscovery()

ol_w0.networklayer_0.readARPTable()

{45: {'MAC address': '00:0a:35:02:9d:2d', 'IP address': '10.0.0.45'},
 53: {'MAC address': '00:0a:35:a8:d6:ac', 'IP address': '10.0.0.53'},
 55: {'MAC address': '00:0a:35:37:ed:e2', 'IP address': '10.0.0.55'},
 61: {'MAC address': '40:a6:b7:22:ab:89', 'IP address': '10.0.0.61'},
 63: {'MAC address': '40:a6:b7:22:ab:89', 'IP address': '10.0.0.63'}}

In [44]:
ol_w0.networklayer_1.arpDiscovery()

ol_w0.networklayer_1.readARPTable()

{47: {'MAC address': '00:0a:35:02:9d:2f', 'IP address': '10.0.0.47'},
 55: {'MAC address': '00:0a:35:37:ed:e2', 'IP address': '10.0.0.55'},
 61: {'MAC address': '40:a6:b7:22:ab:89', 'IP address': '10.0.0.61'},
 63: {'MAC address': '40:a6:b7:22:ab:89', 'IP address': '10.0.0.63'}}

In [45]:
ol_w0.networklayer_0.write_arp_entry(dst_mac, dst_ip)
ol_w0.networklayer_0.readARPTable()

{45: {'MAC address': '00:0a:35:02:9d:2d', 'IP address': '10.0.0.45'},
 53: {'MAC address': '00:0a:35:a8:d6:ac', 'IP address': '10.0.0.53'},
 55: {'MAC address': '00:0a:35:37:ed:e2', 'IP address': '10.0.0.55'},
 61: {'MAC address': '40:a6:b7:22:ab:89', 'IP address': '10.0.0.61'},
 63: {'MAC address': '40:a6:b7:22:ab:89', 'IP address': '10.0.0.63'}}
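
The ARP table can also be checked programmatically against the destinations defined for this experiment. The sketch below works on a dictionary in the format printed above; `check_arp` is a hypothetical helper, and the table and expected MAC addresses are copied by hand from this notebook.

```python
def check_arp(arp_table, expected):
    """Compare expected {ip: mac} pairs against an ARP table dictionary."""
    by_ip = {e['IP address']: e['MAC address'].lower() for e in arp_table.values()}
    report = {}
    for ip, mac in expected.items():
        found = by_ip.get(ip)
        if found is None:
            report[ip] = 'missing'
        elif found != mac.lower():
            report[ip] = 'mismatch (table has {})'.format(found)
        else:
            report[ip] = 'ok'
    return report

# Subset of the ARP table printed above
arp_table = {
    53: {'MAC address': '00:0a:35:a8:d6:ac', 'IP address': '10.0.0.53'},
    55: {'MAC address': '00:0a:35:37:ed:e2', 'IP address': '10.0.0.55'},
}
# dst_ip/dst_mac and dst_ip2/dst_mac2 as defined earlier in this notebook
expected = {'10.0.0.53': '00:0a:35:a8:d6:ac', '10.0.0.55': '00:0a:35:36:de:0f'}

for ip, status in check_arp(arp_table, expected).items():
    print(ip, status)
```

With these values the entry for `10.0.0.55` differs from `dst_mac2`; the notebook does not say whether that difference is intended.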

### Configure port 1

1. Set up connection table
2. Launch ARP discovery
3. Print out ARP Table 

## Configure application
* Configure port 1 traffic generator 1 in `CONSUMER` mode

In [51]:
ol_w0_1_tg = ol_w0.traffic_generator_02_3
ol_w0_1_tg.register_map.debug_reset = 1
ol_w0_1_tg.register_map.mode = benchmark_mode.index('CONSUMER')
ol_w0_1_tg.register_map.CTRL.AP_START = 1

* Configure port 0 traffic generator 0
* Run the application for different packet sizes
* Compute and store results for both local (Tx) and remote (Rx)
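
The measured throughput is compared against a theoretical maximum at application level. The sketch below reproduces the formula used in this notebook: a 100 Gbit/s line rate, 64 payload bytes per beat, and per-packet overhead of UDP (8), IP (20), Ethernet (14) and FCS (4) bytes. The function name `theoretical_gbps` is introduced here for illustration only.

```python
LINE_RATE_GBPS = 100              # line rate used in this notebook's formula
OVERHEAD_BYTES = 8 + 20 + 14 + 4  # UDP + IP + Ethernet header + FCS

def theoretical_gbps(beats, bytes_per_beat=64):
    """Application-level maximum throughput for a payload of `beats` beats."""
    payload = beats * bytes_per_beat
    return payload * LINE_RATE_GBPS / (payload + OVERHEAD_BYTES)

for beats in (1, 8, 23):
    print("{:4}-Byte payload: {:.3f} Gbps".format(beats * 64, theoretical_gbps(beats)))
```

Small payloads are dominated by the fixed 46-byte overhead, which is why the achievable throughput rises with the packet size.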

In [52]:
import time
freq = int(ol_w0.clock_dict['clock0']['frequency'])
freq

294

In [53]:
ol_w0_0_tg = ol_w0.traffic_generator_0_3
experiment_dict = {}
local_dict = {}
ol_w0_0_tg.register_map.mode = benchmark_mode.index('PRODUCER')
ol_w0_0_tg.register_map.dest_id = 12
ol_w0_1_tg.freq = freq
ol_w0_0_tg.freq = freq

In [54]:
pkt = 5_000_000_000
ol_w0_0_tg.register_map.debug_reset = 1
ol_w0_1_tg.register_map.debug_reset = 1
ol_w0_0_tg.register_map.time_between_packets = 0
ol_w0_0_tg.register_map.number_packets = pkt
local_dict = {}
beats = 1
ol_w0_0_tg.register_map.number_beats = beats
ol_w0_0_tg.register_map.CTRL.AP_START = 1
while int(ol_w0_0_tg.register_map.out_traffic_packets) != pkt:
    time.sleep(0.8)
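
The polling loop above never returns if the packet counter does not reach `pkt`. A minimal sketch of the same wait with a timeout follows; `wait_for_count` is a hypothetical helper that takes any zero-argument callable, so it is demonstrated here with a fake counter instead of the hardware register.

```python
import time

def wait_for_count(read_count, target, timeout_s=120.0, poll_s=0.8):
    """Poll read_count() until it equals target; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if read_count() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)

# Fake counter standing in for the hardware register
fake_counter = iter([0, 10, 20])
print(wait_for_count(lambda: next(fake_counter), 20, timeout_s=5, poll_s=0.01))
```

In this notebook the callable would be `lambda: int(ol_w0_0_tg.register_map.out_traffic_packets)` and the target `pkt`; a suitable timeout depends on the number of packets sent.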

In [55]:
# Get results from local and remote worker
tx_tot_pkt, tx_thr, tx_time = ol_w0_0_tg.computeThroughputApp('tx')
rx_tot_pkt, rx_thr, rx_time = ol_w0_1_tg.computeThroughputApp('rx')
#Create dict entry for this particular experiment
entry_dict = {'size': (beats * 64), 'rx_pkts' : rx_tot_pkt, 'tx_thr': tx_thr, 'rx_thr': rx_thr}
local_dict[beats] = entry_dict
# Reset probes to prepare for next computation
ol_w0_0_tg.resetProbes()
ol_w0_1_tg.resetProbes()
#Compute theoretical maximum at application level, overhead is UDP (8), IP (20), Ethernet(14) and FCS (4)
theoretical = (beats * 64 * 100)/((beats*64)+8+20+14+4) 
print("Sent {:14,} size: {:4}-Byte done!	Got {:14,} took {:8.4f} sec, thr: {:.3f} Gbps, theoretical: {:.3f} Gbps, difference: {:6.3f} Gbps"\
      .format(pkt,beats*64, rx_tot_pkt, rx_time, rx_thr, theoretical, theoretical-rx_thr))
time.sleep(0.5)
experiment_dict[pkt] = local_dict

ZeroDivisionError: float division by zero

In [41]:
tx_tot_pkt, tx_thr, tx_time

(5000000000, 49.907866821343575, 51.29451854081633)
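
As a back-of-envelope check, the tuple above is consistent with payload throughput computed as packets × payload bytes × 8 / time, assuming 64 payload bytes per packet (one beat). This is only a sanity check on the printed numbers, not a description of how `computeThroughputApp` is implemented.

```python
PAYLOAD_BYTES = 64  # one beat, as configured with number_beats = 1

# Values printed by the cell above
tx_tot_pkt = 5_000_000_000
tx_thr_reported = 49.907866821343575  # Gbps
tx_time = 51.29451854081633           # seconds

thr_gbps = tx_tot_pkt * PAYLOAD_BYTES * 8 / tx_time / 1e9
pkt_rate_mpps = tx_tot_pkt / tx_time / 1e6

print("recomputed: {:.3f} Gbps, reported: {:.3f} Gbps, packet rate: {:.2f} Mpps"
      .format(thr_gbps, tx_thr_reported, pkt_rate_mpps))
```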

In [None]:
for pkt in [1_000_000, 1_000_000_000]:
    # ol_w0_0_tg is the producer (port 0), ol_w0_1_tg is the consumer (port 1)
    ol_w0_0_tg.register_map.debug_reset = 1
    ol_w0_1_tg.register_map.debug_reset = 1
    ol_w0_0_tg.register_map.time_between_packets = 0
    ol_w0_0_tg.register_map.number_packets = pkt
    local_dict = {}
    for i in range(23):
        beats = i + 1
        ol_w0_0_tg.register_map.number_beats = beats
        ol_w0_0_tg.register_map.CTRL.AP_START = 1
        # Wait until the producer has sent all packets
        while int(ol_w0_0_tg.register_map.out_traffic_packets) != pkt:
            time.sleep(0.8)
        # Get results from local and remote worker
        rx_tot_pkt, rx_thr, rx_time = ol_w0_1_tg.computeThroughputApp('rx')
        tx_tot_pkt, tx_thr, tx_time = ol_w0_0_tg.computeThroughputApp('tx')
        # Create dict entry for this particular experiment
        entry_dict = {'size': (beats * 64), 'rx_pkts' : rx_tot_pkt, 'tx_thr': tx_thr, 'rx_thr': rx_thr}
        local_dict[beats] = entry_dict
        # Reset probes to prepare for next computation
        ol_w0_0_tg.resetProbes()
        ol_w0_1_tg.resetProbes()
        # Compute theoretical maximum at application level, overhead is UDP (8), IP (20), Ethernet (14) and FCS (4)
        theoretical = (beats * 64 * 100)/((beats*64)+8+20+14+4)
        print("Sent {:14,} size: {:4}-Byte done!\tGot {:14,} took {:8.4f} sec, thr: {:.3f} Gbps, theoretical: {:.3f} Gbps, difference: {:6.3f} Gbps"
              .format(pkt, beats*64, rx_tot_pkt, rx_time, rx_thr, theoretical, theoretical-rx_thr))
        time.sleep(0.5)
    experiment_dict[pkt] = local_dict

## Plot the results
Finally, we can plot the results using matplotlib.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np  # explicit import; np.arange is used below

dict_oneM = experiment_dict[1_000_000]
dict_oneB = experiment_dict[1_000_000_000]
labels = []
oneM_thr = []
oneB_thr = []

for b in dict_oneM:
    labels.append(dict_oneM[b]['size'])
    oneM_thr.append(dict_oneM[b]['rx_thr'])

for b in dict_oneB:
    oneB_thr.append(dict_oneB[b]['rx_thr'])

x = np.arange(len(labels))  # the label locations
width = 0.35  # the width of the bars

fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, oneM_thr, width, label='A Million Packets')
rects2 = ax.bar(x + width/2, oneB_thr, width, label='A Billion Packets')

# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Throughput (Gbit/s)')
ax.set_xlabel('Payload Size (Byte)')
ax.set_title('Throughput for different packet size at application level')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
ax.set_ylim(40,100)
fig.set_size_inches(18.5, 7)
plt.show()

## Release Alveo cards
To release the Alveo card, free the pynq overlay.

In [None]:
pynq.Overlay.free(ol_w0)

------------------------------------------
Copyright (c) 2020-2021, Xilinx, Inc.