# Set Up a K8s Cluster with Calico

The objective of this notebook is to set up a K8s cluster with the Calico CNI (Container Network Interface) plugin on the [Fabric Testbed](https://portal.fabric-testbed.net/), using Ubuntu 22.04 as the base OS.

It draws on the following tutorials:
- [ChatGPT guide](https://docs.google.com/document/d/14d6HMI5jW8NLFe0K4Yx1_bO0DV44ynKMJYQe71ikX7c)
- [Calico quickstart](https://docs.tigera.io/calico/latest/getting-started/kubernetes/quickstart)
- [A video tutorial](https://youtu.be/k3iexxiYPI8)

## Preamble: get a Fabric slice with desired configuration

Our slice contains 3 nodes:
1. The `cpnode` for the [K8s Control Plane](https://kubernetes.io/docs/concepts/overview/components/#control-plane-components).
2. Two worker nodes for the [K8s Node](https://kubernetes.io/docs/concepts/overview/components/#node-components) components: `wknode1` and `wknode2`.

### Define the node properties

We configure an L2 network on Fabric so that we can assign the IPv4 addresses manually.

In [1]:
# Define the network of the slice
FABRIC_NIC_STR = 'NIC_Basic'  # do not update
FABRIC_SUBNET_STR = "192.168.0.0/24"  # so the node IPs fall in 192.168.0.1-254
FABRIC_L2NET_STR = 'site_bridge_net'  # do not update
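
As a quick check on the address range, the standard-library `ipaddress` module can enumerate the usable hosts of the subnet. This is a minimal sketch; `subnet` and `hosts` are illustrative names, not variables used elsewhere in the notebook.

```python
from ipaddress import IPv4Network

subnet = IPv4Network("192.168.0.0/24")

# hosts() skips the network (.0) and broadcast (.255) addresses
hosts = list(subnet.hosts())

print(hosts[0], hosts[-1], len(hosts))
```

The first and last usable addresses are `192.168.0.1` and `192.168.0.254`, which gives 254 hosts in total.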

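We need extra storage for the `cpnode`, so it requests 100 GB of disk along with 8 cores and 24 GB of RAM. The worker nodes specify only an IP address and otherwise use the default sizes.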

In [2]:
# Define the nodes of the slice
node_config = {
    'cpnode': {
        'ip':'192.168.0.1',
        'cores': 8,
        'ram': 24,
        'disk': 100 },
    'wknode1': {
        'ip':'192.168.0.2',
    },
    'wknode2': {
        'ip':'192.168.0.3',
    },
}
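
Since the addresses are assigned by hand, it may help to confirm that each one is a usable, unique host in the subnet before building the slice. A minimal sketch under that assumption; `check_node_ips` and `example_config` are hypothetical names introduced for illustration.

```python
from ipaddress import IPv4Address, IPv4Network

def check_node_ips(config, subnet_str):
    """Return a list of problems found in the per-node 'ip' entries."""
    subnet = IPv4Network(subnet_str)
    reserved = (subnet.network_address, subnet.broadcast_address)
    problems, seen = [], {}
    for name, attrs in config.items():
        ip = IPv4Address(attrs["ip"])
        if ip not in subnet or ip in reserved:
            problems.append(f"{name}: {ip} is not a usable host in {subnet}")
        if ip in seen:
            problems.append(f"{name}: {ip} is already used by {seen[ip]}")
        seen.setdefault(ip, name)
    return problems

example_config = {
    "cpnode": {"ip": "192.168.0.1"},
    "wknode1": {"ip": "192.168.0.2"},
    "wknode2": {"ip": "192.168.0.3"},
}
print(check_node_ips(example_config, "192.168.0.0/24"))
```

An empty list means that no conflicts were found.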

### Fabric headers and helper functions

In [3]:
from datetime import datetime
from datetime import timezone
from datetime import timedelta

from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

from ipaddress import ip_address, IPv4Address, IPv6Address, IPv4Network, IPv6Network
import ipaddress

import json

# fablib = fablib_manager(fabric_rc="/path/to/fabric_rc")
# On the Fabric Testbed JupyterHub, use the line above instead of the one below
fablib = fablib_manager(fabric_rc="/Users/xinxinmei/.ssh/fabric_rc")     # path to the local "fabric_rc" file
fablib.show_config()

0,1
Credential Manager,cm.fabric-testbed.net
Orchestrator,orchestrator.fabric-testbed.net
Token File,/Users/xinxinmei/.ssh/fabric_token.json
Project ID,bbe0d94c-736b-477a-a2e6-fef9fe7ac9ca
Bastion Host,bastion.fabric-testbed.net
Bastion Username,xmei_0000124604
Bastion Private Key File,/Users/xinxinmei/.ssh/fabric-bastion-key
Slice Public Key File,/Users/xinxinmei/.ssh/slice_key.pub
Slice Private Key File,/Users/xinxinmei/.ssh/slice_key
Sites to avoid,


In [4]:
FABRIC_SITE_OVERRIDE = "UCSD"
FABRIC_SLICENAME_PREFIX = 'k8s_calico_'
FABRIC_OS_STR = 'default_ubuntu_22'

# Write the selected site and OS image into each node's attributes.
for n in node_config:
    node_config[n]['site'] = FABRIC_SITE_OVERRIDE
    node_config[n]['image'] = FABRIC_OS_STR

Build the Fabric slice.

In [5]:
# Define the Fabric slice name with the user's bastion login as the suffix
user_info = fablib.get_user_info()
slice_name = FABRIC_SLICENAME_PREFIX + FABRIC_SITE_OVERRIDE + "_" + user_info['bastion_login']

slice = fablib.new_slice(name=slice_name)

# Create the network
net1 = slice.add_l2network(name=FABRIC_L2NET_STR, subnet=IPv4Network(FABRIC_SUBNET_STR))

In [6]:
# Create nodes using subnet address assignment
skip_keys = ['ip']

nodes = dict()
for node_name, node_attr in node_config.items():
    print(f"{node_name=}, {node_attr['ip']}")
    nodes[node_name] = slice.add_node(
        name=node_name,
        **{x: node_attr[x] for x in node_attr if x not in skip_keys}
    )
    nic_interface = nodes[node_name].add_component(
        model=FABRIC_NIC_STR,
        name='_'.join([node_name, FABRIC_NIC_STR, 'nic'])
    ).get_interfaces()[0]
    net1.add_interface(nic_interface)
    nic_interface.set_mode('config')
    nic_interface.set_ip_addr(node_attr['ip'])

print(f'\nCreating a slice named "{slice_name}" with nodes in {FABRIC_SITE_OVERRIDE}')

node_name='cpnode', 192.168.0.1
node_name='wknode1', 192.168.0.2
node_name='wknode2', 192.168.0.3

Creating a slice named "k8s_calico_UCSD_xmei_0000124604" with nodes in UCSD
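
The dictionary comprehension in the cell above strips the `ip` key before the remaining attributes are passed to `slice.add_node` as keyword arguments. A minimal sketch of that filtering step, with `node_kwargs` as a hypothetical helper name and the `site` and `image` values taken from the earlier cells:

```python
def node_kwargs(node_attr, skip_keys=("ip",)):
    """Keep only the attributes that should be forwarded as keyword arguments."""
    return {k: v for k, v in node_attr.items() if k not in skip_keys}

cpnode_attr = {"ip": "192.168.0.1", "cores": 8, "ram": 24, "disk": 100,
               "site": "UCSD", "image": "default_ubuntu_22"}
wknode_attr = {"ip": "192.168.0.2", "site": "UCSD", "image": "default_ubuntu_22"}

print(node_kwargs(cpnode_attr))
print(node_kwargs(wknode_attr))
```

The worker nodes end up with only `site` and `image`, so their cores, RAM, and disk are left to the library defaults.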


In [7]:
slice.submit()


Retry: 30, Time: 651 sec


0,1
ID,808f3f8b-eaf3-426d-9cad-35681efd2a7a
Name,k8s_calico_UCSD_xmei_0000124604
Lease Expiration (UTC),2025-02-07 05:51:43 +0000
Lease Start (UTC),2025-02-06 05:51:43 +0000
Project ID,bbe0d94c-736b-477a-a2e6-fef9fe7ac9ca
State,StableOK


ID,Name,Cores,RAM,Disk,Image,Image Type,Host,Site,Username,Management IP,State,Error,SSH Command,Public SSH Key File,Private SSH Key File
c2d65eaf-7c75-4734-b30d-1c8748a92620,cpnode,8,32,100,default_ubuntu_22,qcow2,ucsd-w1.fabric-testbed.net,UCSD,ubuntu,132.249.252.188,Active,,ssh -i /Users/xinxinmei/.ssh/slice_key -F /Users/xinxinmei/.ssh/fabric_config ubuntu@132.249.252.188,/Users/xinxinmei/.ssh/slice_key.pub,/Users/xinxinmei/.ssh/slice_key
c4ed8cbe-a0a2-4bf0-870e-20b6bf65b69a,wknode1,2,8,10,default_ubuntu_22,qcow2,ucsd-w1.fabric-testbed.net,UCSD,ubuntu,132.249.252.184,Active,,ssh -i /Users/xinxinmei/.ssh/slice_key -F /Users/xinxinmei/.ssh/fabric_config ubuntu@132.249.252.184,/Users/xinxinmei/.ssh/slice_key.pub,/Users/xinxinmei/.ssh/slice_key
b3885992-edb3-4ba8-b0e7-960dc69a2069,wknode2,2,8,10,default_ubuntu_22,qcow2,ucsd-w1.fabric-testbed.net,UCSD,ubuntu,132.249.252.153,Active,,ssh -i /Users/xinxinmei/.ssh/slice_key -F /Users/xinxinmei/.ssh/fabric_config ubuntu@132.249.252.153,/Users/xinxinmei/.ssh/slice_key.pub,/Users/xinxinmei/.ssh/slice_key


ID,Name,Layer,Type,Site,Subnet,Gateway,State,Error
38e297ae-d0da-4531-81dd-ed9879247a9d,site_bridge_net,L2,L2Bridge,UCSD,192.168.0.0/24,,Active,


Name,Short Name,Node,Network,Bandwidth,Mode,VLAN,MAC,Physical Device,Device,IP Address,Numa Node,Switch Port
cpnode-cpnode_NIC_Basic_nic-p1,p1,cpnode,site_bridge_net,100,config,,02:DC:BE:A5:82:1D,enp7s0,enp7s0,192.168.0.1,6,HundredGigE0/0/0/5
wknode1-wknode1_NIC_Basic_nic-p1,p1,wknode1,site_bridge_net,100,config,,06:13:70:7C:6E:FE,enp7s0,enp7s0,192.168.0.2,6,HundredGigE0/0/0/5
wknode2-wknode2_NIC_Basic_nic-p1,p1,wknode2,site_bridge_net,100,config,,0A:19:42:43:96:03,enp7s0,enp7s0,192.168.0.3,6,HundredGigE0/0/0/5



Time to print interfaces 651 seconds


'808f3f8b-eaf3-426d-9cad-35681efd2a7a'
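
The slice summary reports the lease window as UTC timestamps. A small sketch, assuming the timestamp layout printed above (`YYYY-MM-DD HH:MM:SS +0000`), that parses those strings to see how long the lease lasts; `parse_lease` is a hypothetical helper.

```python
from datetime import datetime

LEASE_FORMAT = "%Y-%m-%d %H:%M:%S %z"

def parse_lease(timestamp):
    """Parse a lease timestamp such as '2025-02-07 05:51:43 +0000'."""
    return datetime.strptime(timestamp, LEASE_FORMAT)

lease_start = parse_lease("2025-02-06 05:51:43 +0000")
lease_end = parse_lease("2025-02-07 05:51:43 +0000")

print(lease_end - lease_start)
```

The lease here lasts one day, so a slice that should outlive it needs to be renewed before the expiration time.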

Get the slice details for the existing slice.

In [8]:
slice = fablib.get_slice(name=slice_name)
slice.show()

nets = slice.list_networks()
nodes = slice.list_nodes()

cpnode = slice.get_node(name="cpnode")    
wknode1 = slice.get_node(name="wknode1")
wknode2 = slice.get_node(name="wknode2")

# Get node IP addresses
cpnode_addr = cpnode.get_interface(network_name=FABRIC_L2NET_STR).get_ip_addr()
wknode1_addr = wknode1.get_interface(network_name=FABRIC_L2NET_STR).get_ip_addr()
wknode2_addr = wknode2.get_interface(network_name=FABRIC_L2NET_STR).get_ip_addr()

wknode1_iface = wknode1.get_interface(network_name=FABRIC_L2NET_STR)
wknode2_iface = wknode2.get_interface(network_name=FABRIC_L2NET_STR)

print(f"{cpnode_addr = } \n{wknode1_addr = } \n{wknode2_addr = }")

0,1
ID,808f3f8b-eaf3-426d-9cad-35681efd2a7a
Name,k8s_calico_UCSD_xmei_0000124604
Lease Expiration (UTC),2025-02-07 05:51:43 +0000
Lease Start (UTC),2025-02-06 05:51:43 +0000
Project ID,bbe0d94c-736b-477a-a2e6-fef9fe7ac9ca
State,StableOK


ID,Name,Layer,Type,Site,Subnet,Gateway,State,Error
38e297ae-d0da-4531-81dd-ed9879247a9d,site_bridge_net,L2,L2Bridge,UCSD,192.168.0.0/24,,Active,


ID,Name,Cores,RAM,Disk,Image,Image Type,Host,Site,Username,Management IP,State,Error,SSH Command,Public SSH Key File,Private SSH Key File
c2d65eaf-7c75-4734-b30d-1c8748a92620,cpnode,8,32,100,default_ubuntu_22,qcow2,ucsd-w1.fabric-testbed.net,UCSD,ubuntu,132.249.252.188,Active,,ssh -i /Users/xinxinmei/.ssh/slice_key -F /Users/xinxinmei/.ssh/fabric_config ubuntu@132.249.252.188,/Users/xinxinmei/.ssh/slice_key.pub,/Users/xinxinmei/.ssh/slice_key
c4ed8cbe-a0a2-4bf0-870e-20b6bf65b69a,wknode1,2,8,10,default_ubuntu_22,qcow2,ucsd-w1.fabric-testbed.net,UCSD,ubuntu,132.249.252.184,Active,,ssh -i /Users/xinxinmei/.ssh/slice_key -F /Users/xinxinmei/.ssh/fabric_config ubuntu@132.249.252.184,/Users/xinxinmei/.ssh/slice_key.pub,/Users/xinxinmei/.ssh/slice_key
b3885992-edb3-4ba8-b0e7-960dc69a2069,wknode2,2,8,10,default_ubuntu_22,qcow2,ucsd-w1.fabric-testbed.net,UCSD,ubuntu,132.249.252.153,Active,,ssh -i /Users/xinxinmei/.ssh/slice_key -F /Users/xinxinmei/.ssh/fabric_config ubuntu@132.249.252.153,/Users/xinxinmei/.ssh/slice_key.pub,/Users/xinxinmei/.ssh/slice_key


cpnode_addr = IPv4Address('192.168.0.1') 
wknode1_addr = IPv4Address('192.168.0.2') 
wknode2_addr = IPv4Address('192.168.0.3')


In [9]:
# Some helper functions
def execute_single_node(node, commands):
    """Run each command on the node and report any stderr output."""
    for command in commands:
        stdout, stderr = node.execute(command)
        print(f'Executed "{command}" on node {node.get_name()}')
        # Check stderr for every command, not only the last one
        if stderr:
            print(f'Error encountered with "{command}": {stderr}')

def execute_commands(node, commands):
    if isinstance(node, list):
        for n in node:
            execute_single_node(n, commands)
    else:
        execute_single_node(node, commands)

def upload_file_single(node, local_path, remote_path_prefix):
    import os
    try:
        if not os.path.exists(local_path):
            print(f'Local file [{local_path}] does not exist.')
            return  # skip this upload instead of exiting the notebook kernel

        node.execute(f'mkdir -p {remote_path_prefix}')
        remote_full_path = f'{remote_path_prefix}/{os.path.basename(local_path)}'

        # Upload into the remote directory, keeping the local file name
        node.upload_file(local_file_path=local_path, remote_file_path=remote_full_path)
        print(f'Uploaded [{local_path}] to node {node.get_name()} at [{remote_full_path}]')
    except Exception as e:
        print(f'Error uploading "{local_path}" to node {node.get_name()}: {e}')

def upload_file(nodes, local_path, remote_path_prefix='/home/ubuntu//workdir'):
    """
    Upload a file to one or more nodes. The uploaded file is placed under
    remote_path_prefix with the same name as the local file.

    :param nodes: A list of nodes or a single node object
    :param local_path: The local file path to upload. '~' and '$HOME' are not expanded;
                       relative paths are resolved against the current working directory.
    :param remote_path_prefix: Absolute remote directory. No '~' or '$HOME' allowed.
                               The default value is for Ubuntu systems.
    """
    if isinstance(nodes, list):
        for node in nodes:
            upload_file_single(node, local_path, remote_path_prefix)
    else:
        upload_file_single(nodes, local_path, remote_path_prefix)
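
The command helpers can be exercised without a live slice. The following is a minimal sketch, assuming only that a node exposes `execute(command)` returning a `(stdout, stderr)` pair and `get_name()`, as used above. `FakeNode` and `run_all` are hypothetical stand-ins, not fablib classes or functions.

```python
class FakeNode:
    """Hypothetical stand-in for a fablib node; runs nothing remotely."""
    def __init__(self, name, failing=()):
        self.name = name
        self.failing = set(failing)   # commands that should report an error

    def get_name(self):
        return self.name

    def execute(self, command):
        if command in self.failing:
            return "", f"{command}: simulated failure"
        return "ok", ""

def run_all(nodes, commands):
    """Run every command on every node; collect (node, command, stderr) failures."""
    if not isinstance(nodes, list):
        nodes = [nodes]
    failures = []
    for node in nodes:
        for command in commands:
            _stdout, stderr = node.execute(command)
            if stderr:
                failures.append((node.get_name(), command, stderr))
    return failures

nodes = [FakeNode("cpnode"), FakeNode("wknode1", failing=["false"])]
print(run_all(nodes, ["true", "false"]))
```

Collecting failures into a list, rather than only printing them, makes it possible to stop the notebook early when a required step did not succeed.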


## Install and set up Kubernetes
This includes three steps:
1. Install K8s and its dependency `docker` on all the nodes.
2. `cpnode`: start the K8s control plane via `kubeadm init`.
3. `wknode[1-2]`: join the control plane with the tokens given by the above step.

In this example, we use `${HOME}/workdir` as the remote working directory. Shell commands run through `node.execute` may use `~`, but when calling the fablib file-transfer functions, DO NOT use remote paths containing `~`, `$HOME`, or `${HOME}`, as they will not be expanded correctly. To manipulate remote files, always use absolute paths beginning with `/home/ubuntu/workdir` (this holds only for the default Ubuntu hosts on the Fabric Testbed).
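
One way to keep `~` out of remote paths is to build them from a fixed home directory. A minimal sketch, assuming the remote user is `ubuntu` as on the hosts in this slice; `REMOTE_HOME` and `remote_path` are hypothetical names. `posixpath` is used so that the result has forward slashes regardless of the local operating system.

```python
import posixpath

REMOTE_HOME = "/home/ubuntu"   # assumption: login user of the default Ubuntu image

def remote_path(*parts):
    """Join path components under the remote home and normalize the result."""
    path = posixpath.normpath(posixpath.join(REMOTE_HOME, *parts))
    if "~" in path or "$" in path:
        raise ValueError(f"unexpanded shell syntax in remote path: {path}")
    return path

print(remote_path("workdir", "install_k8s.sh"))
```

The helper rejects any path that still contains `~` or `$`, since nothing on the remote side would expand them during a file transfer.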

In [10]:
# Create the working dir on the nodes.
execute_commands([cpnode, wknode1, wknode2], ['mkdir -p ~/workdir'])

Executed "mkdir -p ~/workdir" on node cpnode
Executed "mkdir -p ~/workdir" on node wknode1
Executed "mkdir -p ~/workdir" on node wknode2



### Step 1: Install K8s on all the nodes

Installing K8s is identical on all the nodes. `docker.io` is installed along with `kubelet`, `kubeadm`, and `kubectl`. Remember to turn off swap with `sudo swapoff -a`, as K8s requires it.
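
Whether swap is really off can be confirmed from `/proc/swaps`, which lists one line per active swap area below a header line. A minimal sketch that inspects that file's text; `swap_is_off` is a hypothetical helper, and the sample strings are illustrative rather than captured from the slice.

```python
def swap_is_off(proc_swaps_text):
    """Return True if the /proc/swaps text lists no active swap areas."""
    lines = [line for line in proc_swaps_text.splitlines() if line.strip()]
    # The first non-empty line is the column header; anything after it is a swap area
    return len(lines) <= 1

header = "Filename\t\t\t\tType\t\tSize\t\tUsed\t\tPriority\n"
with_swap = header + "/swapfile                               file\t\t2097148\t\t0\t\t-2\n"

print(swap_is_off(header), swap_is_off(with_swap))
```

On a node, the text could come from `stdout, stderr = node.execute('cat /proc/swaps')`.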

In [11]:
upload_file([cpnode, wknode1, wknode2], 'install_k8s.sh')

Uploaded [install_k8s.sh] to node cpnode at [/home/ubuntu//workdir/install_k8s.sh]
Uploaded [install_k8s.sh] to node wknode1 at [/home/ubuntu//workdir/install_k8s.sh]
Uploaded [install_k8s.sh] to node wknode2 at [/home/ubuntu//workdir/install_k8s.sh]


In [12]:
# Install K8s on every node.
# The success sign is the following three lines in the output:
#     kubelet set on hold.
#     kubeadm set on hold.
#     kubectl set on hold.
abs_remote_path = '/home/ubuntu/workdir/install_k8s.sh'
execute_commands([cpnode, wknode1, wknode2], [f'sudo bash {abs_remote_path}'])

Get:1 http://nova.clouds.archive.ubuntu.com/ubuntu jammy InRelease [270 kB]
Get:2 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:3 http://nova.clouds.archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:4 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [2077 kB]
Get:5 http://nova.clouds.archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Get:6 http://nova.clouds.archive.ubuntu.com/ubuntu jammy/universe amd64 Packages [14.1 MB]
Get:7 http://security.ubuntu.com/ubuntu jammy-security/main Translation-en [325 kB]
Get:8 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages [2836 kB]
Get:9 http://security.ubuntu.com/ubuntu jammy-security/restricted Translation-en [498 kB]
Get:10 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [961 kB]
Get:11 http://security.ubuntu.com/ubuntu jammy-security/universe Translation-en [205 kB]
Get:12 http://security.ubuntu.com/ubuntu jammy-sec
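
The success sign named in the cell above can also be checked programmatically rather than by scrolling through the log. A minimal sketch, assuming the install output is available as a string (for instance the `stdout` returned by `node.execute`); `HOLD_MARKERS` and `missing_hold_markers` are hypothetical names, and the sample strings are illustrative.

```python
HOLD_MARKERS = ("kubelet set on hold.", "kubeadm set on hold.", "kubectl set on hold.")

def missing_hold_markers(stdout):
    """Return the success markers that do not appear in the install output."""
    return [marker for marker in HOLD_MARKERS if marker not in stdout]

sample_ok = "kubelet set on hold.\nkubeadm set on hold.\nkubectl set on hold.\n"
sample_partial = "kubelet set on hold.\n"

print(missing_hold_markers(sample_ok))
print(missing_hold_markers(sample_partial))
```

An empty list indicates that all three packages were put on hold, so the script ran to the end on that node.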

### Step 2: Start the control plane
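
This step is not filled in here. As a rough sketch only, a control plane is typically started with `kubeadm init`, passing the address to advertise and a pod network CIDR. The CIDR below (`10.244.0.0/16`) is an assumption, chosen so that it does not overlap the node subnet `192.168.0.0/24`; the pod CIDR used in the Calico quickstart, `192.168.0.0/16`, would overlap it. `build_init_cmd` is a hypothetical helper, and the command it prints has not been run on this slice.

```python
import shlex
from ipaddress import IPv4Network

def build_init_cmd(advertise_addr, pod_cidr, node_subnet):
    """Build a `kubeadm init` command, refusing a pod CIDR that overlaps the node subnet."""
    if IPv4Network(pod_cidr).overlaps(IPv4Network(node_subnet)):
        raise ValueError(f"pod CIDR {pod_cidr} overlaps node subnet {node_subnet}")
    return shlex.join([
        "sudo", "kubeadm", "init",
        f"--apiserver-advertise-address={advertise_addr}",
        f"--pod-network-cidr={pod_cidr}",
    ])

# Assumed values: cpnode's address on the L2 network and a non-overlapping pod CIDR
print(build_init_cmd("192.168.0.1", "10.244.0.0/16", "192.168.0.0/24"))
```

If a pod CIDR other than the quickstart's is chosen, the Calico installation would presumably need its IP pool set to the same range.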



### Step 3: Join the worker nodes

In [None]:
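# Replace this with the join command printed by `kubeadm init` on cpnode;
# the endpoint, token, and CA cert hash are specific to each cluster.
join_cmd = "sudo kubeadm join 10.146.4.2:6443 --token l255bp.wqi5i6br0jg7f4z2 --discovery-token-ca-cert-hash sha256:069646097e377bcd9e6d66ee7779a0d3e0930d95d1e6a79f73f22f70b1098b5e"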

try:
    # Configure worker nodes
    worker_nodes = [wknode1, wknode2]
    for node in worker_nodes:
        node_name = node.get_name()
        print(f"\nConfiguring {node_name}...")
        
        # Upload and execute config script
        try:
            print(f"Uploading and executing config_worker_node.sh on {node_name}...")
            file_attributes = node.upload_file(
                local_file_path="config_worker_node.sh", 
                remote_file_path="config_worker_node.sh"
            )
            exec_cmd = f"chmod +x config_worker_node.sh && ./config_worker_node.sh {join_cmd}"
            stdout, stderr = node.execute(exec_cmd)
            print(f"Config output for {node_name}:", stdout)
            if stderr:
                print(f"Stderr for {node_name}:", stderr)
        except Exception as e:
            print(f"Failed to configure {node_name}: {e}")
            continue  # Skip to next node if configuration fails

except Exception as e:
    print(f"Main exception: {e}")
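
Because the join command is pasted by hand, it is easy to truncate. A minimal sketch that splits the command and checks the shape of its pieces before it is sent to the workers; `parse_join_cmd` is a hypothetical helper, and the patterns assume kubeadm's usual bootstrap token layout (six characters, a dot, sixteen characters) and a SHA-256 CA cert hash.

```python
import re
import shlex

def parse_join_cmd(join_cmd):
    """Extract the endpoint, token, and CA cert hash from a `kubeadm join` command."""
    args = shlex.split(join_cmd)
    endpoint = args[args.index("join") + 1]
    token = args[args.index("--token") + 1]
    ca_hash = args[args.index("--discovery-token-ca-cert-hash") + 1]
    if not re.fullmatch(r"[a-z0-9]{6}\.[a-z0-9]{16}", token):
        raise ValueError(f"unexpected token format: {token}")
    if not re.fullmatch(r"sha256:[0-9a-f]{64}", ca_hash):
        raise ValueError("unexpected CA cert hash format")
    return {"endpoint": endpoint, "token": token, "ca_hash": ca_hash}

join_cmd = ("sudo kubeadm join 10.146.4.2:6443 --token l255bp.wqi5i6br0jg7f4z2 "
            "--discovery-token-ca-cert-hash "
            "sha256:069646097e377bcd9e6d66ee7779a0d3e0930d95d1e6a79f73f22f70b1098b5e")
print(parse_join_cmd(join_cmd)["endpoint"])
```

A `ValueError` here signals that the pasted command is incomplete or malformed and should be copied again from the control plane node.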

## Monitor the Kubernetes cluster

In [None]:
try:
    # Get all resources across namespaces
    print("Getting all kubernetes resources...")
    stdout, stderr = cpnode.execute("kubectl get all --all-namespaces")
    print(f"All resources:\n{stdout}")
    if stderr:
        print(f"Stderr: {stderr}")
    
    # Get all pods with more details
    print("\nGetting detailed pods status across all namespaces...")
    stdout, stderr = cpnode.execute("kubectl get pods -A -o wide")
    print(f"Detailed pods status:\n{stdout}")
    if stderr:
        print(f"Stderr: {stderr}")
        
    # Describe kube-system namespace pods
    print("\nDescribing kube-system pods...")
    stdout, stderr = cpnode.execute("kubectl describe pods -n kube-system")
    print(f"Kube-system pods details:\n{stdout}")
    if stderr:
        print(f"Stderr: {stderr}")
    
    # Get logs from specific system pods
    system_components = ['kube-apiserver', 'kube-controller-manager', 'kube-scheduler', 'etcd']
    
    for component in system_components:
        print(f"\nGetting logs for {component}...")
        try:
            # First get the pod name
            cmd = f"kubectl get pods -n kube-system -l component={component} -o jsonpath='{{.items[0].metadata.name}}'"
            stdout, stderr = cpnode.execute(cmd)
            if stdout:
                pod_name = stdout.strip()
                # Then get the logs
                stdout, stderr = cpnode.execute(f"kubectl logs -n kube-system {pod_name} --tail=50")
                print(f"Logs from {component} ({pod_name}):\n{stdout}")
            else:
                print(f"No pod found for component {component}")
        except Exception as e:
            print(f"Error getting logs for {component}: {e}")

except Exception as e:
    print(f"Exception while monitoring kubernetes cluster: {e}")
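
For a quicker health check than reading the full dumps, the node list can be reduced to a ready or not-ready summary. A minimal sketch that parses the text of `kubectl get nodes --no-headers`, whose first two columns are the node name and status; `not_ready_nodes` is a hypothetical helper, and the sample text is illustrative, not output from this cluster.

```python
def not_ready_nodes(get_nodes_text):
    """Return names of nodes whose STATUS column is not exactly 'Ready'."""
    pending = []
    for line in get_nodes_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue   # skip blank or malformed lines
        name, status = fields[0], fields[1]
        # Compound states such as 'Ready,SchedulingDisabled' also count as not ready
        if status != "Ready":
            pending.append(name)
    return pending

sample = """\
cpnode    Ready      control-plane   10m   v1.30.0
wknode1   Ready      <none>          5m    v1.30.0
wknode2   NotReady   <none>          1m    v1.30.0
"""
print(not_ready_nodes(sample))
```

On the cluster, the text would come from `cpnode.execute("kubectl get nodes --no-headers")`.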



In [None]:
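# Renew the slice for 14 days; renew() takes an end date string rather than a number of seconds
end_date = (datetime.now(timezone.utc) + timedelta(days=14)).strftime("%Y-%m-%d %H:%M:%S %z")
slice.renew(end_date)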

In [94]:
slice.delete()  # Delete the slice