# Optimizations for High-bandwidth Wide-area Networking Experiments

This example shows how to deploy a FABRIC slice across a wide-area connection and measure the bandwidth using iPerf3.

### Background: iPerf3

[iPerf](https://github.com/esnet/iperf) is a tool for measuring the maximum bandwidth achievable across IP networks. iPerf can be complicated to master and this notebook provides several tips and tricks for using iPerf on the high-bandwidth, high-latency dedicated links across the FABRIC testbed. The primary tips for using iPerf on FABRIC are to use the multithreaded version of iPerf3 and tune the host configuration.  

The version of iPerf3 available from most Linux package managers (apt-get, yum, dnf, etc.) is single-threaded. This can be confusing because all iPerf3 versions accept the `-P` option to set the number of parallel streams. Although this option does change the number of streams, these versions typically use a [single thread for all streams](https://fasterdata.es.net/performance-testing/network-troubleshooting-tools/iperf/multi-stream-iperf3/). Single-threaded iPerf3 is typically limited to roughly 20-30 Gbps. An updated multi-threaded version of [iPerf](https://github.com/esnet/iperf) is available from ESnet and can be used to test higher bandwidths. To simplify the use of multi-threaded iPerf3, FABRIC includes it in a Docker image that can be easily deployed in your experiment. This notebook shows how to deploy and use this Docker container.

Regardless of which version of iPerf3 you use, you will need to tune your host TCP/IP parameters in order to achieve higher bandwidths. ESnet's website contains a [discussion of host tuning](https://fasterdata.es.net/host-tuning/linux/) for network performance. This example provides a script with some initial tuning parameters. These tuning parameters are only suggestions and will not be optimal for all FABRIC networks. For maximum performance, you will need to optimize your configuration for your experiment.
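
A useful starting point when sizing TCP buffers is the bandwidth-delay product (BDP): the number of bytes that must be in flight to keep a path full. The sketch below computes it. The 100 Gbps rate and 50 ms round-trip time are illustrative assumptions, not measured FABRIC values, and `bdp_bytes` is a hypothetical helper that is not part of FABlib or the tuning script.

```python
def bdp_bytes(bandwidth_gbps: float, rtt_ms: float) -> int:
    """Return the bandwidth-delay product in bytes."""
    # bits in flight = rate (bits/s) * round-trip time (s)
    bits_in_flight = bandwidth_gbps * 1e9 * rtt_ms / 1e3
    return round(bits_in_flight / 8)


# Assumed example path: 100 Gbps with a 50 ms round-trip time
print(bdp_bytes(100, 50))  # 625000000
```

If the TCP buffers are smaller than the BDP, a single stream cannot fill the path, which is one reason the default host settings tend to underperform on long, fast links.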

### Additional Features

In addition to the new iPerf3 and tuning features, this notebook uses a few other advanced features. Each is introduced here, and the linked examples provide more detail.

- Automatic FABnet ([example](../../../fabric_examples/fablib_api/create_l3network_fabnet_ipv4_full_auto/create_l3network_fabnet_ipv4_full_auto.ipynb))
- Templated post boot tasks ([example](../../../fabric_examples/fablib_api/post_boot_task_templates/post_boot_task_templates.ipynb))
- Docker containers ([example](../../../fabric_examples/fablib_api/docker_containers/docker_containers.ipynb))


### FABlib API References

- [fablib.get_random_sites](https://fabric-fablib.readthedocs.io/en/latest/fablib.html#fabrictestbed_extensions.fablib.fablib.FablibManager.get_random_sites)
- [node.add_fabnet](https://fabric-fablib.readthedocs.io/en/latest/node.html#fabrictestbed_extensions.fablib.node.Node.add_fabnet)
- [node.add_post_boot_upload_directory](https://fabric-fablib.readthedocs.io/en/latest/node.html#fabrictestbed_extensions.fablib.node.Node.add_post_boot_upload_directory)
- [node.add_post_boot_execute](https://fabric-fablib.readthedocs.io/en/latest/node.html#fabrictestbed_extensions.fablib.node.Node.add_post_boot_execute)
- [node.numa_tune](https://fabric-fablib.readthedocs.io/en/latest/node.html#fabrictestbed_extensions.fablib.node.Node.numa_tune)
- [node.pin_cpu](https://fabric-fablib.readthedocs.io/en/latest/node.html#fabrictestbed_extensions.fablib.node.Node.pin_cpu)
- [node.os_reboot](https://fabric-fablib.readthedocs.io/en/latest/node.html#fabrictestbed_extensions.fablib.node.Node.os_reboot)




## Import the FABlib Library


In [None]:
from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()

fablib.show_config();

## Create the Experiment Slice

The simplest slice that can be used for wide-area iPerf3 performance tests is a pair of nodes on two different FABRIC sites connected to the FABnet network.  The following cells build an appropriate slice including post boot configuration of Docker and host network tuning. 

The slice deployed here uses basic NICs and VMs with modest amounts of compute cores and memory. Achieving the highest bandwidths possible will require additional infrastructure and tuning including VMs with larger capacities, dedicated network cards, and pinning cores/memory to NUMA domains. 

<img src="./figs/iperf3_dumbell.png" width="70%"><br>





### Get Two Random Sites

In [None]:
slice_name = 'iPerf3'
[site1, site2] = fablib.get_random_sites(count=2)

print(f"Sites: {site1}, {site2}")

### Create the Slice

In [None]:
#Create Slice
slice = fablib.new_slice(name=slice_name)

### Add Nodes and a Network

Add two nodes to the slice. Place one node on each of the sites.  

Add a fully automatic FABnetv4 network to each node using the `add_fabnet` method.  This method will create a FABnet network on each site, add a basic NIC to each node, and automatically configure IP addresses and routes.

Note the use of the `docker_rocky_8` image. This image comes with Docker pre-installed enabling fast/simple deployment of Docker containers. 

The nodes are configured with only 8 cores and 16 GB of RAM. Higher bandwidth will require larger VMs.




In [None]:
node1 = slice.add_node(name='Node1', cores=8, ram=16, disk=100, site=site1, image='docker_rocky_8')
node1.add_fabnet()

node2 = slice.add_node(name='Node2', cores=8, ram=16, disk=100, site=site2, image='docker_rocky_8')
node2.add_fabnet()

### Add Post Boot Configuration Tasks

This example includes a directory containing a couple of scripts that configure the nodes. The following post boot tasks will execute after the nodes are booted.

The tasks include:

- Upload node tools: Copy the `node_tools` directory to each node. This directory contains the custom configuration scripts. 
- Execute `host_tune.sh`: Execute the script that tunes the host for high-bandwidth, high-latency data transfers. Feel free to customize this script for your specific experiment.
- Execute `enable_docker.sh`: This script enables the pre-installed Docker services. The image argument is an example of using templated post boot tasks. 
- Execute `docker pull`: Download the Docker image that is used to run iPerf3.


In [None]:
node1.add_post_boot_upload_directory('node_tools','.')
node1.add_post_boot_execute('sudo node_tools/host_tune.sh')
node1.add_post_boot_execute('node_tools/enable_docker.sh {{ _self_.image }} ')
node1.add_post_boot_execute('docker pull fabrictestbed/slice-vm-rocky8-multitool:0.0.2 ')

node2.add_post_boot_upload_directory('node_tools','.')
node2.add_post_boot_execute('sudo node_tools/host_tune.sh')
node2.add_post_boot_execute('node_tools/enable_docker.sh {{ _self_.image }} ')
node2.add_post_boot_execute('docker pull fabrictestbed/slice-vm-rocky8-multitool:0.0.2 ')
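
To make the templated argument concrete, the sketch below shows what the `{{ _self_.image }}` placeholder is expected to expand to. It is a stand-in written with the standard library, not FABlib's actual template renderer, and the `render` helper and the `context` dictionary are hypothetical. It assumes that `_self_` exposes the node's own attributes, including the image name.

```python
import re


def render(template: str, context: dict) -> str:
    """Replace each '{{ dotted.name }}' placeholder with its value from context."""
    def lookup(match: re.Match) -> str:
        value = context
        # Walk the dotted path one key at a time, e.g. _self_ -> image
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)


# Hypothetical stand-in for the attributes a node exposes as `_self_`
context = {"_self_": {"image": "docker_rocky_8"}}
print(render("node_tools/enable_docker.sh {{ _self_.image }} ", context))
```

Because the placeholder is resolved per node, the same command string can be reused on nodes that run different images.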

### Submit the Slice

The slice request is complete and you can now submit the request. Notice the `post_boot_config` step. During this step, the FABnet network and post boot tasks that were added in the previous cells are executed.

In [None]:
#Submit Slice Request
slice.submit();

## Run iPerf3

Running iPerf3 is simple using the supplied Docker image.  

This cell retrieves `node1` and `node2`, then looks up the IP address that `node1` uses on the FABnet data plane. It starts an iPerf3 server in a Docker container on `node1` and an iPerf3 client in a Docker container on `node2`. The `-1` option tells the server to exit after one iPerf3 session. The client connects to the server's data plane IP address.

This cell can be re-run many times.  You may wish to modify the iPerf3 parameters that are passed to the client Docker container to see how they affect the performance.

In [None]:
slice = fablib.get_slice(slice_name)

node1 = slice.get_node(name='Node1')        
node2 = slice.get_node(name='Node2')           

node1_addr = node1.get_interface(network_name=f'FABNET_IPv4_{node1.get_site()}').get_ip_addr()

stdout1, stderr1 = node1.execute("docker run -d --rm "
                                "--network host "
                                "fabrictestbed/slice-vm-rocky8-multitool:0.0.2 "
                                "iperf3 -s -1"
                                , quiet=True, output_file=f"{node1.get_name()}.log");

stdout2, stderr2 = node2.execute("docker run --rm "
                                "--network host "
                                "fabrictestbed/slice-vm-rocky8-multitool:0.0.2 "
                                f"iperf3 -c {node1_addr} -P 4 -t 30 -i 10 -O 10"
                                , quiet=False, output_file=f"{node2.get_name()}.log");
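
When experimenting with different iPerf3 parameters, it can help to build the client command from named arguments instead of editing the string by hand. The helper below is a minimal sketch: `iperf3_client_cmd` is a hypothetical function, its defaults mirror the flags used in the cell above, and `192.0.2.10` is a documentation placeholder address rather than a real FABnet address.

```python
def iperf3_client_cmd(server_ip, streams=4, duration=30, interval=10, omit=10,
                      image="fabrictestbed/slice-vm-rocky8-multitool:0.0.2"):
    """Build the docker command line for one iPerf3 client run."""
    return (f"docker run --rm --network host {image} "
            f"iperf3 -c {server_ip} -P {streams} -t {duration} "
            f"-i {interval} -O {omit}")


# Print the command for several parallel-stream counts
for streams in (1, 2, 4, 8):
    print(iperf3_client_cmd("192.0.2.10", streams=streams))
```

Each resulting string can be passed to `node2.execute(...)` in place of the literal command. Because the server is started with `-1`, it must be restarted on `node1` before every client run.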

## (Optional) NUMA Tuning

Higher bandwidths can only be achieved with larger VMs that have their CPU cores pinned to the same NUMA domain as the PCI bus that handles their NIC.  The following cell will attempt to pin all of the CPU cores and memory used by your VMs to the same NUMA domain as the NIC that they are using.  

It is important to note that core and memory pinning grants a VM exclusive access to specific resources in the host machines.  NUMA tuning may not be possible because cores and memory cannot be pinned to resources that are already allocated to other VMs owned by other users. If NUMA pinning is not possible, you will see an appropriate error stating that the resources are not currently available.

Note that if you are currently attending a tutorial, there is an increased likelihood that the resources you are requesting have already been allocated to one of your fellow attendees.

### Pin CPUs and Memory

The `pin_cpu` method attempts to pin all of your CPU cores to the same NUMA domain as the PCI component that is your NIC. The `numa_tune` method attempts to pin all of your VM's memory to the same NUMA domain as the PCI component that is your NIC.  After pinning, you must reboot each node.




In [None]:
slice = fablib.get_slice(slice_name)

for node in slice.get_nodes():

    print(f'----- Pinning vCPUs for node {node.get_name()} ------')

    try:
        # Pin all vCPUs for the VM to the same NUMA node as the NIC component
        node.pin_cpu(component_name=f'FABNET_IPv4_{node.get_site()}_nic')
        
        # Optionally pass the range of vCPUs to pin
        #node.pin_cpu(component_name=f'FABNET_IPv4_{node.get_site()}_nic', cpu_range_to_pin="0-3")
        
        # Pin memory for the VM to the same NUMA node as the component
        node.numa_tune()
        
        # Reboot the VM
        node.os_reboot()
    except Exception as e:
        print(f'Pinning failed for node {node.get_name()}: {e}')

### Wait and Reconfigure

Use the `wait_ssh` method to block until your nodes have rebooted and are accessible over SSH.

After rebooting, you will need to reconfigure the IP addresses of the interfaces, restart the Docker services, and re-run the host tuning script.


In [None]:
slice = fablib.get_slice(slice_name)

# Wait for the SSH Connectivity to be back
slice.wait_ssh()

print("All nodes are back up!")

# Reconfiguring the Network
for node in slice.get_nodes():
    print(f'Reconfiguring node {node.get_name()}')
    node.config()

    # Restart Docker and re-apply the host tuning script
    stdout1, stderr1 = node.execute("sudo systemctl start docker",
                                    output_file=f'{node.get_name()}.log', quiet=True)
    stdout1, stderr1 = node.execute("sudo ./node_tools/host_tune.sh",
                                    output_file=f'{node.get_name()}.log', quiet=True)

## Re-Run iPerf3

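Re-run iPerf3 as before. Note that this cell increases the number of parallel streams from 4 to 8 (`-P 8`), so keep that change in mind when comparing results.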

Did you achieve higher performance? Why or why not?

Can you configure your slice to achieve performance approaching the 100 Gbps limit imposed by these FABRIC NICs?




In [None]:
slice = fablib.get_slice(slice_name)

node1 = slice.get_node(name='Node1')        
node2 = slice.get_node(name='Node2')           

node1_addr = node1.get_interface(network_name=f'FABNET_IPv4_{node1.get_site()}').get_ip_addr()

stdout1, stderr1 = node1.execute("docker run -d --rm "
                                "--network host "
                                "fabrictestbed/slice-vm-rocky8-multitool:0.0.2 "
                                "iperf3 -s -1"
                                , quiet=True, output_file=f"{node1.get_name()}.log");

stdout2, stderr2 = node2.execute("docker run --rm "
                                "--network host "
                                "fabrictestbed/slice-vm-rocky8-multitool:0.0.2 "
                                f"iperf3 -c {node1_addr} -P 8 -t 30 -i 10 -O 10"
                                , quiet=False, output_file=f"{node2.get_name()}.log");
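
To compare runs numerically rather than by eye, one option is to append iPerf3's `-J` (JSON output) flag to the client command and parse the result. The sketch below assumes the command's output contains only the JSON report and reads the receiver-side total from `end.sum_received.bits_per_second`. The helper names are hypothetical, and the two sample reports use made-up placeholder numbers, not measurements from FABRIC.

```python
import json


def received_gbps(iperf3_json: str) -> float:
    """Return receiver-side aggregate throughput in Gbps from `iperf3 -J` output."""
    report = json.loads(iperf3_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


def percent_change(before_gbps: float, after_gbps: float) -> float:
    """Return the relative change between two throughput readings, in percent."""
    return (after_gbps - before_gbps) / before_gbps * 100


# Placeholder reports with made-up numbers (NOT measurements), trimmed to the
# only fields this sketch reads.
before = json.dumps({"end": {"sum_received": {"bits_per_second": 8e9}}})
after = json.dumps({"end": {"sum_received": {"bits_per_second": 12e9}}})

change = percent_change(received_gbps(before), received_gbps(after))
print(f"Throughput change: {change:+.1f}%")
```

In practice, the JSON strings would come from the client's standard output before and after NUMA tuning, ideally with the same `-P` value in both runs so that only one variable changes.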




## Delete the Slice

Please delete your slice when you are done with your experiment.

In [None]:
slice = fablib.get_slice(slice_name)
slice.delete()