# Create Poseidon Experiment Setup

This notebook shows how to create an isolated Ethernet network and connect compute nodes to it.

## Step 1:  Configure the Environment


In [8]:
import os

# If you are using the FABRIC JupyterHub, the following three environment variables
# were automatically provided when you logged in.
#os.environ['FABRIC_CREDMGR_HOST']='cm.fabric-testbed.net'
#os.environ['FABRIC_ORCHESTRATOR_HOST']='orchestrator.fabric-testbed.net'
#os.environ['FABRIC_TOKEN_LOCATION']=os.environ['HOME']+'/work/fabric_token.json'

# Bastion host
os.environ['FABRIC_BASTION_HOST'] = 'bastion-1.fabric-testbed.net'

# Set your Bastion username and private key
os.environ['FABRIC_BASTION_USERNAME']="kthare10_0011904101"
os.environ['FABRIC_BASTION_KEY_LOCATION']=os.environ['HOME']+'/work/.ssh/fabric-bastion'

# Set the keypair FABRIC will install in your slice. 
os.environ['FABRIC_SLICE_PRIVATE_KEY_FILE']=os.environ['HOME']+'/work/.ssh/id_rsa'
os.environ['FABRIC_SLICE_PUBLIC_KEY_FILE']=os.environ['HOME']+'/work/.ssh/id_rsa.pub'

# If your slice private key uses a passphrase, set the passphrase
#from getpass import getpass
#print('Please input private key passphrase. Press enter for no passphrase.')
#os.environ['FABRIC_SLICE_PRIVATE_KEY_PASSPHRASE']=getpass()
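
Checking that the key files configured above actually exist can save a failed slice submission later. The following is a minimal sketch and not part of FABlib: the helper name `missing_key_files` is hypothetical, and it only inspects the three path variables set in this cell.

```python
import os

# Environment variables from the cell above that should point at existing files.
KEY_FILE_VARS = (
    "FABRIC_BASTION_KEY_LOCATION",
    "FABRIC_SLICE_PRIVATE_KEY_FILE",
    "FABRIC_SLICE_PUBLIC_KEY_FILE",
)

def missing_key_files(env=None):
    """Return the names of key-file variables that are unset or point at a missing file."""
    env = os.environ if env is None else env
    missing = []
    for var in KEY_FILE_VARS:
        path = env.get(var)
        if not path or not os.path.isfile(path):
            missing.append(var)
    return missing

if __name__ == "__main__":
    for var in missing_key_files():
        print(f"Warning: {var} is unset or does not point to a file")
```

An empty result means all three paths resolve to files; it says nothing about whether the keys are valid or registered with FABRIC.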

## Step 2: Import the FABlib Library


In [9]:
import json
import traceback

from fabrictestbed_extensions.fablib.fablib import fablib

## Step 3 (Optional): Query for Available Testbed Resources and Settings

This optional command queries the FABRIC services to find the available resources. It may be useful for finding a site with available capacity.

In [10]:
try:
    print(f"{fablib.list_sites()}")
except Exception as e:
    print(f"Exception: {e}")

Name      CPUs  Cores    RAM (G)    Disk (G)       Basic (100 Gbps NIC)    ConnectX-6 (100 Gbps x2 NIC)    ConnectX-5 (25 Gbps x2 NIC)    P4510 (NVMe 1TB)    Tesla T4 (GPU)    RTX6000 (GPU)
------  ------  -------  ---------  -------------  ----------------------  ------------------------------  -----------------------------  ------------------  ----------------  ---------------
TACC        10  318/320  2552/2560  116390/116400  634/635                 2/2                             4/4                            16/16               4/4               6/6
UTAH        10  312/320  2528/2560  116330/116400  622/635                 2/2                             4/4                            16/16               4/4               5/5
WASH         6  192/192  1536/1536  60600/60600    381/381                 2/2                             2/2                            10/10               2/2               3/3
MICH         6  180/192  1488/1536  60450/60600    375/381                 1/2
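
The resource columns above appear to use an `available/total` format. Assuming that reading is correct, a small helper can rank sites by free cores. This is only a sketch: `parse_capacity` and `sites_with_free_cores` are hypothetical helpers, and the sample values are copied by hand from the output above rather than obtained from a FABlib call.

```python
def parse_capacity(cell):
    """Split an 'available/total' cell such as '318/320' into two integers."""
    available, total = cell.split("/")
    return int(available), int(total)

def sites_with_free_cores(core_cells, minimum):
    """Return site names with at least `minimum` free cores, most free first."""
    free = {site: parse_capacity(cell)[0] for site, cell in core_cells.items()}
    candidates = [site for site, cores in free.items() if cores >= minimum]
    return sorted(candidates, key=lambda site: free[site], reverse=True)

# Core counts copied by hand from the table above (assumed to be available/total).
cores = {"TACC": "318/320", "UTAH": "312/320", "WASH": "192/192", "MICH": "180/192"}
print(sites_with_free_cores(cores, minimum=190))  # ['TACC', 'UTAH', 'WASH']
```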

## Step 4: Create the Experiment Slice

The following creates two nodes with basic NICs connected to an isolated WAN Ethernet.  

Two nodes are created and one NIC component is added to each node. This example uses components of model `NIC_Basic`, which are SR-IOV virtual functions (VFs) on a 100 Gbps Mellanox ConnectX-6 PCI device. The node accesses the VF via PCI passthrough. Other NIC models are listed below. With a dedicated PCI device, the whole physical device is allocated to one node, which also accesses it via PCI passthrough. Calling the `get_interfaces()` method on a component returns a list of its interfaces. Dedicated NIC components may have more than one port, and either port can be connected to the network.

Next, add an `l2network` to the slice and pass the list of interfaces you want connected to this Ethernet. If the interfaces in the list are located on two sites, the network will automatically create a wide-area layer 2 circuit. By default, a node is placed on a random site. To ensure that your nodes are on different sites, specify the site name in each `add_node` call. The `fablib.get_random_sites()` method (shown commented out in the cell below) returns a list of random site names that are guaranteed to differ.

NIC component model options:
- NIC_Basic: 100 Gbps Mellanox ConnectX-6 SR-IOV VF (1 Port)
- NIC_ConnectX_5: 25 Gbps Dedicated Mellanox ConnectX-5 PCI Device (2 Ports) 
- NIC_ConnectX_6: 100 Gbps Dedicated Mellanox ConnectX-6 PCI Device (2 Ports) 
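
The model list above can be captured as a lookup table, for example to check how many ports a model offers before indexing into `get_interfaces()`. This is a minimal sketch: the dictionary and the `port_count` helper are illustrative and hold only the values listed above.

```python
# Port counts and line rates as listed above; not queried from FABRIC.
NIC_MODELS = {
    "NIC_Basic": {"gbps": 100, "ports": 1, "dedicated": False},
    "NIC_ConnectX_5": {"gbps": 25, "ports": 2, "dedicated": True},
    "NIC_ConnectX_6": {"gbps": 100, "ports": 2, "dedicated": True},
}

def port_count(model):
    """Return the number of ports for a known NIC model name."""
    try:
        return NIC_MODELS[model]["ports"]
    except KeyError:
        raise ValueError(f"Unknown NIC model: {model}") from None

print(port_count("NIC_Basic"))       # 1
print(port_count("NIC_ConnectX_6"))  # 2
```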

In [11]:
slice_name = 'Posiedon-Slice'
#[site1,site2]  = fablib.get_random_sites(count=2)
[site1,site2]  = ["MAX", "TACC"]
print(f"Sites: {site1}, {site2}")

submit_node_name = 'submit'
worker_name_prefix = 'worker'
worker_count = 1
network_name='net1'
submit_node_nic_name = 'nic1'
worker_nic_name_prefix = 'nic'

Sites: MAX, TACC


In [None]:
try:
    if_list = []
    #Create Slice
    slice = fablib.new_slice(name=slice_name)

    # Node1
    submit_node = slice.add_node(name=submit_node_name, site=site1)
    if1 = submit_node.add_component(model='NIC_Basic', name=submit_node_nic_name).get_interfaces()[0]
    if_list.append(if1)
    
    # Node2
    for i in range(worker_count):
        w = slice.add_node(name=f"{worker_name_prefix}-{i}", site=site2)
        ifw = w.add_component(model='NIC_Basic', name=f"{worker_nic_name_prefix}-{i}").get_interfaces()[0]
        if_list.append(ifw)
    
    # Network
    net1 = slice.add_l2network(name=network_name, interfaces=if_list)

    #Submit Slice Request
    slice.submit()
except Exception as e:
    print(f"Exception: {e}")


-----------  ------------------------------------
Slice Name   Posiedon-Slice
Slice ID     3b12372a-743a-418c-bbdf-0ea7bac33270
Slice State  Configuring
Lease End    2022-04-14 00:55:03
-----------  ------------------------------------

Retry: 10, Time: 113 sec

ID                                    Name      Site    Host                          Cores    RAM    Disk  Image            Management IP    State    Error
------------------------------------  --------  ------  --------------------------  -------  -----  ------  ---------------  ---------------  -------  -------
18523070-82a0-4ddd-8a07-36261a61ac10  submit    MAX     max-w4.fabric-testbed.net         2      8      10  default_rocky_8  63.239.135.102   Active
2be11322-a697-452f-b5bb-6157a35a9a4d  worker-0  TACC    tacc-w1.fabric-testbed.net        2      8      10  default_rocky_8  129.114.110.68   Active
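
With `worker_count` greater than 1, the loop above creates nodes and NICs named from the two prefixes. The sketch below previews those names without contacting FABRIC; `planned_names` is a hypothetical helper that mirrors the f-strings in the cell.

```python
def planned_names(worker_name_prefix, worker_nic_name_prefix, worker_count):
    """Return (node_name, nic_name) pairs in the order the loop would create them."""
    return [
        (f"{worker_name_prefix}-{i}", f"{worker_nic_name_prefix}-{i}")
        for i in range(worker_count)
    ]

for node_name, nic_name in planned_names("worker", "nic", 3):
    print(node_name, nic_name)
```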


## Step 5: Observe the Slice's Attributes

### Print the Slice

In [14]:
try:
    slice = fablib.get_slice(name=slice_name)
    print(f"{slice}")
except Exception as e:
    print(f"Exception: {e}")

-----------  ------------------------------------
Slice Name   Posiedon-Slice
Slice ID     3b12372a-743a-418c-bbdf-0ea7bac33270
Slice State  StableOK
Lease End    2022-04-14 00:55:03
-----------  ------------------------------------


### Print the Node List

In [15]:
try:
    slice = fablib.get_slice(name=slice_name)

    print(f"{slice.list_nodes()}")
except Exception as e:
    print(f"Exception: {e}")

ID                                    Name      Site    Host                          Cores    RAM    Disk  Image            Management IP    State    Error
------------------------------------  --------  ------  --------------------------  -------  -----  ------  ---------------  ---------------  -------  -------
18523070-82a0-4ddd-8a07-36261a61ac10  submit    MAX     max-w4.fabric-testbed.net         2      8      10  default_rocky_8  63.239.135.102   Active
2be11322-a697-452f-b5bb-6157a35a9a4d  worker-0  TACC    tacc-w1.fabric-testbed.net        2      8      10  default_rocky_8  129.114.110.68   Active


### Print the Node Details

In [16]:
try:
    slice = fablib.get_slice(name=slice_name)
    for node in slice.get_nodes():
        print(f"{node}")
except Exception as e:
    print(f"Exception: {e}")

-----------------  -------------------------------------------------------------------------------------------------------------
ID                 18523070-82a0-4ddd-8a07-36261a61ac10
Name               submit
Cores              2
RAM                8
Disk               10
Image              default_rocky_8
Image Type         qcow2
Host               max-w4.fabric-testbed.net
Site               MAX
Management IP      63.239.135.102
Reservation State  Active
Error Message
SSH Command        ssh -i /home/fabric/work/.ssh/id_rsa -J kthare10_0011904101@bastion-1.fabric-testbed.net rocky@63.239.135.102
-----------------  -------------------------------------------------------------------------------------------------------------
-----------------  -------------------------------------------------------------------------------------------------------------
ID                 2be11322-a697-452f-b5bb-6157a35a9a4d
Name               worker-0
Cores              2
RAM                8
Disk      

### Print the Interfaces

In [17]:
try:
    slice = fablib.get_slice(name=slice_name)
    
    print(f"{slice.list_interfaces()}")
except Exception as e:
    print(f"Exception: {e}")

Name               Node      Network      Bandwidth  VLAN    MAC                Physical OS Interface    OS Interface
-----------------  --------  ---------  -----------  ------  -----------------  -----------------------  --------------
submit-nic1-p1     submit    net1                 0          16:55:bb:75:01:1a  eth1                     eth1
worker-0-nic-0-p1  worker-0  net1                 0          02:26:69:8a:cd:1f  eth1                     eth1


## Step 6 (Optional): Configure IP Addresses

Some experiments use FABRIC layer 2 networks to deploy non-IP layer 3 networks. If this describes your experiment, your nodes and network are ready. You can now log in to the nodes and deploy your experiment.

Most users will want to configure IP addresses on their new nodes. FABlib provides some useful methods to help you configure basic IP addresses.

### Pick a Subnet

Create a subnet and a list of available IP addresses. All objects are instances of Python's `ipaddress` classes. You can use either IPv4 or IPv6 subnets and addresses.

In [18]:
from ipaddress import ip_address, IPv4Address, IPv6Address, IPv4Network, IPv6Network

try:
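    subnet = IPv4Network("192.168.1.0/24")
    # Skip the network address (192.168.1.0).
    available_ips = list(subnet)[1:]
    # Discard 192.168.1.1 so that the first assigned address is 192.168.1.2.
    available_ips.pop(0)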
except Exception as e:
    print(f"Exception: {e}")
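
As an alternative to slicing the full address list, the standard library's `hosts()` iterator skips the network and broadcast addresses of an IPv4 subnet. The sketch below is not what the cell above does (it would start at 192.168.1.1 rather than 192.168.1.2), and it uses its own variable names so it does not overwrite `subnet` or `available_ips`. The IPv6 prefix `fd00::/64` is an arbitrary example.

```python
from ipaddress import IPv4Network, IPv6Network
from itertools import islice

example_subnet = IPv4Network("192.168.1.0/24")
# hosts() yields 192.168.1.1 through 192.168.1.254.
usable_ips = list(example_subnet.hosts())
print(usable_ips[0], usable_ips[-1], len(usable_ips))

# An IPv6 /64 is far too large to turn into a list, so take only what is needed.
example_subnet6 = IPv6Network("fd00::/64")
first_two = list(islice(example_subnet6.hosts(), 2))
print(first_two)
```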

### Configure Submit

Get the node and the interface you wish to configure.  You can use `node.get_interface` to get the interface that is connected to the specified network.  Then `pop` an IP address from the list of available IPs and call `iface.ip_addr_add` to set the IP and subnet.  

Optionally, use the `node.execute()` method to show the results of adding the IP address.

In [19]:
try:
    submit_node = slice.get_node(name=submit_node_name)        
    submit_node_iface = submit_node.get_interface(network_name=network_name) 
    submit_node_addr = available_ips.pop(0)
    submit_node_iface.ip_addr_add(addr=submit_node_addr, subnet=subnet)
    
    stdout, stderr = submit_node.execute(f'ip addr show {submit_node_iface.get_os_interface()}')
    print (stdout)
    
except Exception as e:
    print(f"Exception: {e}")

3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 16:55:bb:75:01:1a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.2/24 scope global eth1
       valid_lft forever preferred_lft forever
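
To confirm the assignment programmatically rather than by eye, the `ip addr show` output can be searched for its `inet` lines. This is a minimal sketch using only the standard library; `assigned_ipv4` is a hypothetical helper, and the sample text is the output shown above.

```python
import re
from ipaddress import ip_interface

def assigned_ipv4(ip_addr_output):
    """Return the IPv4 interfaces (address/prefix) found on 'inet' lines."""
    matches = re.findall(r"^\s*inet\s+(\S+)", ip_addr_output, flags=re.MULTILINE)
    return [ip_interface(m) for m in matches]

sample = """3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 16:55:bb:75:01:1a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.2/24 scope global eth1
       valid_lft forever preferred_lft forever
"""
print(assigned_ipv4(sample))
```

In the notebook, the `stdout` string returned by `node.execute()` could be passed in place of `sample`.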



### Configure Workers

Repeat the steps for each worker node, assigning it the next available IP address.

In [20]:
# workers
for i in range(worker_count):
    try:
        worker = slice.get_node(name=f"{worker_name_prefix}-{i}")        
        worker_iface = worker.get_interface(network_name=network_name)  
        worker_addr = available_ips.pop(0)
        worker_iface.ip_addr_add(addr=worker_addr, subnet=subnet)

        stdout, stderr = worker.execute(f'ip addr show {worker_iface.get_os_interface()}')
        print (stdout)

    except Exception as e:
        print(f"Exception: {e}")

3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 02:26:69:8a:cd:1f brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.3/24 scope global eth1
       valid_lft forever preferred_lft forever



## Step 7: Check Connectivity Between Submit and Workers

In [21]:
try:
    submit_node = slice.get_node(name=submit_node_name)        

    stdout, stderr = submit_node.execute(f'ping -c 5 {worker_addr}')
    print (stdout)
    print (stderr)

except Exception as e:
    print(f"Exception: {e}")

PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
64 bytes from 192.168.1.3: icmp_seq=1 ttl=64 time=238 ms
64 bytes from 192.168.1.3: icmp_seq=2 ttl=64 time=119 ms
64 bytes from 192.168.1.3: icmp_seq=3 ttl=64 time=119 ms
64 bytes from 192.168.1.3: icmp_seq=4 ttl=64 time=119 ms
64 bytes from 192.168.1.3: icmp_seq=5 ttl=64 time=119 ms

--- 192.168.1.3 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 118.744/142.576/237.737/47.581 ms
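
When the connectivity check is scripted, the ping summary can be parsed instead of read. The sketch below assumes the Linux `ping` summary format shown above; `ping_summary` is a hypothetical helper, and the sample text is copied from that output.

```python
import re

def ping_summary(ping_output):
    """Return (packet_loss_percent, avg_rtt_ms); avg_rtt_ms is None if no rtt line exists."""
    loss = re.search(r"([\d.]+)% packet loss", ping_output)
    if loss is None:
        raise ValueError("No packet loss line found in ping output")
    rtt = re.search(r"rtt min/avg/max/mdev = [\d.]+/([\d.]+)/[\d.]+/[\d.]+ ms", ping_output)
    avg_ms = float(rtt.group(1)) if rtt else None
    return float(loss.group(1)), avg_ms

sample = """--- 192.168.1.3 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 118.744/142.576/237.737/47.581 ms
"""
loss_pct, avg_ms = ping_summary(sample)
print(loss_pct, avg_ms)  # 0.0 142.576
```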




## Step 8: Set up Condor Submit Node

In [22]:
try:
    file_attributes = submit_node.upload_file(local_file_path="submit.sh", remote_file_path="submit.sh")
    
    stdout, stderr = submit_node.execute("chmod +x submit.sh && sudo ./submit.sh")
    print(stdout)

except Exception as e:
    print(f"Exception: {e}")

Rocky Linux 8 - AppStream                       5.0 MB/s | 9.7 MB     00:01    
Rocky Linux 8 - BaseOS                           19 MB/s | 6.8 MB     00:00    
Rocky Linux 8 - Extras                           20 kB/s |  12 kB     00:00    
Dependencies resolved.
 Package                      Arch    Version                                   Repo        Size
Installing:
 kernel                       x86_64  4.18.0-348.20.1.el8_5                     baseos     7.0 M
Upgrading:
 cloud-init                   noarch  21.1-7.el8_5.4                            appstream  1.1 M
 cockpit-bridge               x86_64  251.3-1.el8_5                             baseos     538 k
 cockpit-system               noarch  251.3-1.el8_5                             baseos     3.2 M
 cockpit-ws                   x86_64  251.3-1.el8_5                             baseos     1.3 M
 cryptsetup-libs              x86_64  2.3.3-4.el8_5.1                           baseos     473 k
 cyrus-sasl-lib               x86_6

## Step 9: Set up Worker Nodes

In [23]:
# workers
for i in range(worker_count):
    try:
        worker = slice.get_node(name=f"{worker_name_prefix}-{i}")        
        file_attributes = worker.upload_file(local_file_path="worker.sh", remote_file_path="worker.sh")
    
        stdout, stderr = worker.execute("chmod +x worker.sh && sudo ./worker.sh")
        print(stdout)
        print(stderr)

    except Exception as e:
        print(f"Exception: {e}")

Rocky Linux 8 - AppStream                        11 MB/s | 9.7 MB     00:00    
Rocky Linux 8 - BaseOS                          6.3 MB/s | 6.8 MB     00:01    
Rocky Linux 8 - Extras                           20 kB/s |  12 kB     00:00    
Dependencies resolved.
 Package                      Arch    Version                                   Repo        Size
Installing:
 kernel                       x86_64  4.18.0-348.20.1.el8_5                     baseos     7.0 M
Upgrading:
 cloud-init                   noarch  21.1-7.el8_5.4                            appstream  1.1 M
 cockpit-bridge               x86_64  251.3-1.el8_5                             baseos     538 k
 cockpit-system               noarch  251.3-1.el8_5                             baseos     3.2 M
 cockpit-ws                   x86_64  251.3-1.el8_5                             baseos     1.3 M
 cryptsetup-libs              x86_64  2.3.3-4.el8_5.1                           baseos     473 k
 cyrus-sasl-lib               x86_6

## Step 10: Delete the Slice

Please delete your slice when you are done with your experiment.

In [7]:
try:
    slice = fablib.get_slice(name=slice_name)
    slice.delete()
except Exception as e:
    print(f"Exception: {e}")
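
One way to make sure the deletion always happens, even if an experiment step raises, is to wrap the work in a context manager. This is a sketch under stated assumptions, not a FABlib feature: `delete_when_done` is a hypothetical helper that only relies on the object having a `delete()` method, and `FakeSlice` is a stand-in so the example runs without FABRIC access.

```python
from contextlib import contextmanager

@contextmanager
def delete_when_done(slice_obj):
    """Yield the slice and call its delete() method on exit, even after an error."""
    try:
        yield slice_obj
    finally:
        slice_obj.delete()

class FakeSlice:
    """Stand-in for a FABlib slice so the sketch runs without FABRIC access."""
    def __init__(self):
        self.deleted = False

    def delete(self):
        self.deleted = True

fake = FakeSlice()
with delete_when_done(fake) as s:
    pass  # run the experiment here
print(fake.deleted)  # True
```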