## Reserve and configure resources on KVM

Before you run this experiment, you will:

-   define the specific configuration of resources you need.
-   “instantiate” an experiment with your reserved resources.
-   wait for your resources to be configured.
-   log in to resources to carry out the experiment.

This exercise will guide you through those steps.

### Configure environment

In [None]:
import openstack, chi, chi.ssh, chi.network, chi.server, os

In this section, we configure the Chameleon Python client.

For this experiment, we’re going to use the KVM@TACC site, which we indicate below.

We also need to specify the name of the Chameleon “project” that this experiment is part of. The project name will have the format “CHI-XXXXXX”, where the last part is a 6-digit number, and you can find it on your [user dashboard](https://chameleoncloud.org/user/dashboard/).

In the cell below, replace the placeholder project name with your own project name, then run the cell.

In [None]:
chi.use_site("KVM@TACC")
PROJECT_NAME = "CHI-XXXXXX"
chi.set("project_name", PROJECT_NAME)

# configure openstacksdk for actions unsupported by python-chi
os_conn = chi.clients.connection()


### Define configuration for this experiment (1 VM)

For this specific experiment, we will need 1 virtual machine connected to an experiment network. The virtual machine uses the `m1.medium` flavor and the `CC-Ubuntu22.04` image, as specified in `node_conf` below.

In [None]:
username = os.getenv('USER')

node_conf = [
 {'name': "node-0",  'flavor': 'm1.medium', 'image': 'CC-Ubuntu22.04', 'packages': ["virtualenv"]}, 
]
net_conf = [
 {"name": "net0", "subnet": "192.168.1.0/24", "nodes": [{"name": "node-0",   "addr": "192.168.1.10"}]},
]
route_conf = []
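The server, NIC, and hosts-file cells later in this notebook all derive their settings from `node_conf` and `net_conf`, so a typo here only surfaces later as a confusing OpenStack error. A minimal sketch of a pre-flight check, using only the standard `ipaddress` module, could look like this. `validate_config` and the `example_*` names are hypothetical helpers, not part of python-chi; in the notebook you would call `validate_config(node_conf, net_conf)`.

```python
import ipaddress

def validate_config(node_conf, net_conf):
    """Return a list of problems in the node/network configuration (empty if none)."""
    problems = []
    node_names = {n["name"] for n in node_conf}
    for net in net_conf:
        subnet = ipaddress.ip_network(net["subnet"])
        for node in net["nodes"]:
            if node["name"] not in node_names:
                problems.append(f"{net['name']}: unknown node {node['name']}")
            if not node.get("addr"):
                continue  # nodes without a fixed address are allowed
            addr = ipaddress.ip_address(node["addr"])
            if addr not in subnet:
                problems.append(f"{net['name']}: {addr} is outside {subnet}")
    return problems

# Hypothetical copies of the configuration above, so the check can run on its own.
example_nodes = [{"name": "node-0", "flavor": "m1.medium", "image": "CC-Ubuntu22.04", "packages": ["virtualenv"]}]
example_nets = [{"name": "net0", "subnet": "192.168.1.0/24",
                 "nodes": [{"name": "node-0", "addr": "192.168.1.10"}]}]
print(validate_config(example_nodes, example_nets))  # → []
```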

### Configure resources

Now, we will prepare the VMs and network links that our experiment requires.

First, we will prepare a “public” network that we will use for SSH access to our VMs -

In [None]:
public_net = os_conn.network.create_network(name="public_net_" + username)
public_net_id = public_net.get("id")
public_subnet = os_conn.network.create_subnet(
    name="public_subnet_" + username,
    network_id=public_net.get("id"),
    ip_version='4',
    cidr="192.168.10.0/24",
    gateway_ip="192.168.10.1",
    is_dhcp_enabled = True
)

Next, we will prepare the “experiment” networks -

In [None]:
nets = []
net_ids = []
subnets = []
for n in net_conf:
    exp_net = os_conn.network.create_network(name="exp_" + n['name']  + '_' + username)
    exp_net_id = exp_net.get("id")
    os_conn.network.update_network(exp_net, is_port_security_enabled=False)
    exp_subnet = os_conn.network.create_subnet(
        name="exp_subnet_" + n['name']  + '_' + username,
        network_id=exp_net.get("id"),
        ip_version='4',
        cidr=n['subnet'],
        gateway_ip=None,
        is_dhcp_enabled = True
    )
    nets.append(exp_net)
    net_ids.append(exp_net_id)
    subnets.append(exp_subnet)

Now we create the VMs -

In [None]:
servers = []
server_ids = []
for i, n in enumerate(node_conf, start=10):
    image_uuid = os_conn.image.find_image(n['image']).id
    flavor_uuid = os_conn.compute.find_flavor(n['flavor']).id
    # find out details of exp interface(s)
    nics = [{'net-id': chi.network.get_network_id( "exp_" + net['name']  + '_' + username ), 'v4-fixed-ip': node['addr']} for net in net_conf for node in net['nodes'] if node['name']==n['name']]
    # also include a public network interface
    nics.insert(0, {"net-id": public_net_id, "v4-fixed-ip":"192.168.10." + str(i)})
    server = chi.server.create_server(
        server_name=n['name'] + "_" + username,
        image_id=image_uuid,
        flavor_id=flavor_uuid,
        nics=nics
    )
    servers.append(server)
    server_ids.append(chi.server.get_server(n['name'] + "_" + username).id)
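The list comprehension that builds `nics` is dense. The same logic written as a small pure function, sketched below, makes the ordering explicit: the public interface goes first, which matters because the floating-IP step later relies on the IP being associated with the server's first port. `build_nics` and its `net_id_for` callback are hypothetical; in the notebook, `net_id_for` would wrap `chi.network.get_network_id` with the `"exp_" + name + "_" + username` naming used above.

```python
def build_nics(node_name, net_conf, index, public_net_id, net_id_for,
               public_prefix="192.168.10."):
    """Return the NIC list for one node: public interface first, then experiment interfaces."""
    # Public interface with a fixed address derived from the node's index (10, 11, ...).
    nics = [{"net-id": public_net_id, "v4-fixed-ip": public_prefix + str(index)}]
    # One experiment interface for every network that lists this node.
    for net in net_conf:
        for node in net["nodes"]:
            if node["name"] == node_name:
                nics.append({"net-id": net_id_for(net["name"]), "v4-fixed-ip": node["addr"]})
    return nics

example_nets = [{"name": "net0", "subnet": "192.168.1.0/24",
                 "nodes": [{"name": "node-0", "addr": "192.168.1.10"}]}]
nics = build_nics("node-0", example_nets, 10, "public-id", lambda name: "id-" + name)
print(nics)
# → [{'net-id': 'public-id', 'v4-fixed-ip': '192.168.10.10'}, {'net-id': 'id-net0', 'v4-fixed-ip': '192.168.1.10'}]
```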

We wait for all servers to come up before we proceed -

In [None]:
for server_id in server_ids:
    chi.server.wait_for_active(server_id)

Next, we will set up SSH access to the VMs.

First, we will make sure the “public” network is connected to the Internet. Then, we will configure it to permit SSH access on port 22 for each port connected to this network.

In [None]:
# connect them to the Internet on the "public" network (e.g. for software installation)
router = chi.network.create_router('inet_router_' + username, gw_network_name='public')
chi.network.add_subnet_to_router(router.get("id"), public_subnet.get("id"))

In [None]:
# prepare SSH access on the server
# WARNING: this relies on undocumented behavior of associate_floating_ip 
# that it associates the IP with the first port on the server
server_ips = []
for i, n in enumerate(node_conf):
    ip = chi.server.associate_floating_ip(server_ids[i])
    server_ips.append(ip)

In [None]:
if not os_conn.get_security_group("Allow SSH"):
    os_conn.create_security_group("Allow SSH", "Enable SSH traffic on TCP port 22")
    os_conn.create_security_group_rule("Allow SSH", port_range_min=22, port_range_max=22, protocol='tcp', remote_ip_prefix='0.0.0.0/0')

security_group_id = os_conn.get_security_group("Allow SSH").id
for port in chi.network.list_ports(): 
    if port['port_security_enabled'] and port['network_id']==public_net.get("id"):
        os_conn.network.update_port(port['id'], security_groups=[security_group_id])

In [None]:
for ip in server_ips:
    chi.server.wait_for_tcp(ip, port=22)

The following cell may raise an error if some of your nodes are still getting set up! If that happens, wait a few minutes and try again. (And then a few minutes more, and try again, if it still raises an error.)

In [None]:
primary_remote = chi.ssh.Remote(server_ips[0])
physical_ips = [n['addr'] for n in net_conf[0]['nodes']]
server_remotes = [chi.ssh.Remote(physical_ip, gateway=primary_remote) for physical_ip in physical_ips]
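Instead of re-running the cell above by hand while nodes are still being set up, the connection step can be wrapped in a small retry helper. This is a generic sketch, not a python-chi feature: `retry` and `connect_all` are hypothetical names, the attempt count and delay are arbitrary defaults, and catching every exception is deliberately broad because it is not documented which error an unready node raises.

```python
import time

def retry(fn, attempts=5, delay=60.0, exceptions=(Exception,)):
    """Call fn() until it succeeds, sleeping `delay` seconds between failures.

    Re-raises the last exception if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)

# Assumed usage in the notebook:
# def connect_all():
#     primary = chi.ssh.Remote(server_ips[0])
#     remotes = [chi.ssh.Remote(ip, gateway=primary) for ip in physical_ips]
#     for r in remotes:
#         r.run("true")  # force a real SSH connection now rather than later
#     return primary, remotes
# primary_remote, server_remotes = retry(connect_all)
```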

Finally, we need to configure our resources, including software package installation and network configuration.

In [None]:
import time
for i, n in enumerate(node_conf):
    remote = server_remotes[i]
    # enable IP forwarding
    remote.run("sudo sysctl -w net.ipv4.ip_forward=1")
    # trust traffic from private and loopback address ranges
    for cidr in ["192.168.0.0/16", "172.16.0.0/12", "10.0.0.0/8", "127.0.0.0/8"]:
        remote.run(f"sudo firewall-cmd --zone=trusted --add-source={cidr} --permanent")
    # --permanent only updates the saved configuration; reload to apply it now
    remote.run("sudo firewall-cmd --reload")
    time.sleep(3)

In [None]:
for i, n in enumerate(node_conf):
    # install the packages listed for this node, if any
    if n['packages']:
        remote = server_remotes[i]
        remote.run("sudo apt-get update && sudo apt-get -y install " + " ".join(n['packages']))

In [None]:
# prepare a "hosts" file that has names and addresses of every node
hosts_txt = [ "%s\t%s" % ( n['addr'], n['name'] ) for net in net_conf  for n in net['nodes'] if type(n) is dict and n['addr']]
for remote in server_remotes:
    for h in hosts_txt:
        remote.run("echo %s | sudo tee -a /etc/hosts > /dev/null" % h)
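Re-running the cell above appends the same entries to `/etc/hosts` again. A sketch of an idempotent variant first reads the file and only appends mappings that are missing. `missing_hosts_lines` is a hypothetical helper, and the usage comment assumes `remote.run` returns a Fabric-style result with a `.stdout` attribute and accepts `hide=True` (python-chi's `Remote` builds on Fabric).

```python
def missing_hosts_lines(existing_hosts, entries):
    """Return "addr<TAB>name" lines for (addr, name) pairs not already in a hosts file."""
    present = set()
    for line in existing_hosts.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2:
            present.update((fields[0], name) for name in fields[1:])
    return [f"{addr}\t{name}" for addr, name in entries if (addr, name) not in present]

# Assumed usage in the notebook:
# entries = [(n["addr"], n["name"]) for net in net_conf for n in net["nodes"] if n.get("addr")]
# for remote in server_remotes:
#     current = remote.run("cat /etc/hosts", hide=True).stdout
#     for line in missing_hosts_lines(current, entries):
#         remote.run(f"echo '{line}' | sudo tee -a /etc/hosts > /dev/null")
```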

In [None]:
# we also need to enable incoming traffic on TCP port 32000
if not os_conn.get_security_group("Allow HTTP 32000"):
    os_conn.create_security_group("Allow HTTP 32000", "Enable HTTP traffic on TCP port 32000")
    os_conn.create_security_group_rule("Allow HTTP 32000", port_range_min=32000, port_range_max=32000, protocol='tcp', remote_ip_prefix='0.0.0.0/0')

# attach the security group to every port on the public network, keeping existing groups
security_group_id = os_conn.get_security_group("Allow HTTP 32000").id
for port in chi.network.list_ports(): 
    if port['port_security_enabled'] and port['network_id']==public_net.get("id"):
        pri_security_groups = port['security_groups']
        pri_security_groups.append(security_group_id)
        os_conn.network.update_port(port['id'], security_groups=pri_security_groups)

In [None]:
# we also need to enable incoming traffic on TCP port 8088
if not os_conn.get_security_group("Allow HTTP 8088"):
    os_conn.create_security_group("Allow HTTP 8088", "Enable HTTP traffic on TCP port 8088")
    os_conn.create_security_group_rule("Allow HTTP 8088", port_range_min=8088, port_range_max=8088, protocol='tcp', remote_ip_prefix='0.0.0.0/0')

# attach the security group to every port on the public network, keeping existing groups
security_group_id = os_conn.get_security_group("Allow HTTP 8088").id
for port in chi.network.list_ports(): 
    if port['port_security_enabled'] and port['network_id']==public_net.get("id"):
        pri_security_groups = port['security_groups']
        pri_security_groups.append(security_group_id)
        os_conn.network.update_port(port['id'], security_groups=pri_security_groups)
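The two cells above repeat the same steps with only the port number changed, and appending to `port['security_groups']` adds a duplicate ID if a cell is run twice. A sketch that factors the steps into one function, using the same openstacksdk calls as the notebook, is below; `with_security_group` and `open_tcp_port` are hypothetical names. The caller passes the ports in, so they should be listed fresh for each call: otherwise the second call would start from stale `security_groups` and drop the group added by the first.

```python
def with_security_group(current_groups, group_id):
    """Return current_groups plus group_id, preserving order and skipping duplicates."""
    return list(dict.fromkeys([*current_groups, group_id]))

def open_tcp_port(conn, ports, network_id, tcp_port):
    """Ensure an 'Allow HTTP <port>' group exists and attach it to ports on network_id.

    conn is an openstacksdk connection (os_conn above); ports is a fresh list of
    port dicts, e.g. from chi.network.list_ports().
    """
    name = f"Allow HTTP {tcp_port}"
    if not conn.get_security_group(name):
        conn.create_security_group(name, f"Enable HTTP traffic on TCP port {tcp_port}")
        conn.create_security_group_rule(name, port_range_min=tcp_port, port_range_max=tcp_port,
                                        protocol="tcp", remote_ip_prefix="0.0.0.0/0")
    group_id = conn.get_security_group(name).id
    for port in ports:
        if port["port_security_enabled"] and port["network_id"] == network_id:
            groups = with_security_group(port["security_groups"], group_id)
            conn.network.update_port(port["id"], security_groups=groups)

# Assumed usage in the notebook (list the ports again for each call):
# for tcp_port in (32000, 8088):
#     open_tcp_port(os_conn, chi.network.list_ports(), public_net.get("id"), tcp_port)
```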

### Draw the network topology

The following cells will draw the network topology, for your reference.

In [None]:
!pip install networkx

In [None]:
nodes = [ (n['name'], {'color': 'pink'}) for n in net_conf ] + [(n['name'], {'color': 'lightblue'}) for n in node_conf ]
edges = [(net['name'], node['name'], 
          {'label': node['addr'] + '/' + net['subnet'].split("/")[1] }) if node['addr'] else (net['name'], node['name']) for net in net_conf for node in net['nodes'] ]

In [None]:
import networkx as nx
import matplotlib.pyplot as plt
plt.figure(figsize=(len(nodes),len(nodes)))
G = nx.Graph()
G.add_nodes_from(nodes)
G.add_edges_from(edges)
pos = nx.spring_layout(G)
nx.draw(G, pos, node_shape='s',  
        node_color=[n[1]['color'] for n in nodes], 
        node_size=[len(n[0])*400 for n in nodes],  
        with_labels=True);
nx.draw_networkx_edge_labels(G,pos,
                             edge_labels=nx.get_edge_attributes(G,'label'),
                             font_color='gray',  font_size=8, rotate=False);

## Set up Docker

In [None]:
remote = chi.ssh.Remote(server_ips[0])

In [None]:
remote.run("sudo apt-get update")
remote.run("sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common")

In [None]:
remote.run("curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg")
remote.run('echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null')

In [None]:
remote.run("sudo apt-get update")
remote.run("sudo apt-get install -y docker-ce docker-ce-cli containerd.io")

In [None]:
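# add the current user to the docker group (takes effect at the next login)
remote.run('sudo groupadd -f docker; sudo usermod -aG docker $USER')
# let the current session use Docker right away (convenient, but grants every local user access)
remote.run("sudo chmod 666 /var/run/docker.sock")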

In [None]:
# check configuration
remote.run("docker run hello-world")

In [None]:
remote.run("sudo apt-get install -y python3 python3-pip")
remote.run("python3 -m pip config set global.break-system-packages true")

In [None]:
# install Docker Compose: download the standalone v2.24.5 binary (x86_64 build)
remote.run("sudo curl -L https://github.com/docker/compose/releases/download/v2.24.5/docker-compose-linux-x86_64 -o /usr/local/bin/docker-compose")

# Make it executable
remote.run("sudo chmod +x /usr/local/bin/docker-compose")

# Verify the installation
remote.run("docker-compose --version")

## User Feedback Loop

We have talked about feedback loops. Let’s start by exploring a basic feedback mechanism where users directly provide feedback on food classifications.

In [None]:
# Clone the gourmetgram repository with user_corrected branch 
remote.run("git clone -b user_corrected https://github.com/ShaktidharK1997/gourmetgram.git")


Note: Once Docker Compose is set up, take the env file that has been shared with you and copy it into the `gourmetgram` repository. The application needs this file to work correctly.

In [None]:
# Use Docker Compose to build and start the Flask application
remote.run("cd gourmetgram; docker-compose up -d --build")
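The steps below use `localhost` URLs, so they assume the VM's service ports are reachable from your browser. One way to arrange that from your own machine is SSH local port forwarding. The sketch below only builds the command; it assumes you log in as `cc` (the default user on Chameleon's CC images) and that the Flask app, Label Studio, and the MinIO console listen on ports 8000, 8080, and 9001 on the VM, as the URLs in this notebook suggest. `ssh_forward_command` is a hypothetical helper.

```python
import shlex

def ssh_forward_command(floating_ip, ports, user="cc"):
    """Build an ssh command that forwards each local port to the same port on the VM."""
    args = ["ssh", "-N"]  # -N: only forward ports, do not run a remote command
    for port in ports:
        args += ["-L", f"{port}:localhost:{port}"]
    args.append(f"{user}@{floating_ip}")
    return args

# 203.0.113.5 is a documentation address; in the notebook use server_ips[0].
print(shlex.join(ssh_forward_command("203.0.113.5", [8000, 8080, 9001])))
# → ssh -N -L 8000:localhost:8000 -L 8080:localhost:8080 -L 9001:localhost:9001 cc@203.0.113.5
```

Run the printed command in a terminal on your own machine and keep it open while you use the web interfaces.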


### Explore User Feedback System

1.  Access the Web Interface:
    -   Open your browser and navigate to: http://localhost:8000
    -   You’ll see the Gourmet Gram interface for uploading and classifying food images
2.  Test the Classification System:
    -   Upload a food image using the interface
    -   The system will display its prediction
    -   You can either confirm the prediction is correct or select the correct class if it’s wrong
3.  Explore the MinIO Storage:
    -   Navigate to http://localhost:9001
    -   Log in with the credentials username: `minioadmin`, password: `minioadmin`
    -   Explore the `production-images` bucket to see how images are organized by class
    -   Check the `tracking` bucket to view the JSON files that track user corrections
4.  Generate a Test Suite:
    -   You can generate a test suite based on user corrections by running:

    ``` bash
    curl -X GET "http://localhost:8000/generate_test_suite"
    ```

    -   This will create a timestamped directory in the `test-suites` bucket containing the corrected images

In [None]:
# Command to shut down the application
remote.run("cd gourmetgram; docker-compose down -v")

### Disadvantages of User Feedback

However, using the user’s feedback directly as training labels is problematic for several reasons:

-   Users come to Gourmet Gram to get their food classified, not to provide expert labels. Many of them are likely to mislabel images, and retraining on this feedback could significantly degrade model performance.

-   User feedback in production ML systems introduces subtle biases. For example, the Gourmet Gram feedback UI lists food classes in a fixed order. This creates an implicit bias: options at the top of the menu catch users’ attention first and are selected more often, regardless of correctness.

-   This phenomenon, known as a “degenerate feedback loop,” is widespread in recommendation systems. On Netflix, for example, already-popular shows receive more prominent placement, which leads to more views, which in turn reinforces their “popularity” in the algorithm. Such loops do not necessarily reflect quality or relevance to individual users; they amplify existing popularity patterns.
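The position-bias argument can be made concrete with a toy model. Suppose a user who knows the dish picks the right class, while an unsure user picks the first class in the menu with probability `p_top` and otherwise picks a class uniformly at random. The sketch below computes the expected share of labels per class; the model and all probabilities are illustrative assumptions, not measurements from Gourmet Gram.

```python
def expected_label_shares(num_classes, true_class, p_know, p_top):
    """Expected fraction of feedback labels per class under a toy position-bias model.

    With probability p_know the user picks true_class. Otherwise they pick
    class 0 (first in the menu) with probability p_top, else a uniformly random class.
    """
    shares = [0.0] * num_classes
    shares[true_class] += p_know
    unsure = 1.0 - p_know
    shares[0] += unsure * p_top
    for c in range(num_classes):
        shares[c] += unsure * (1.0 - p_top) / num_classes
    return shares

shares = expected_label_shares(num_classes=4, true_class=2, p_know=0.6, p_top=0.5)
print([round(s, 2) for s in shares])  # → [0.25, 0.05, 0.65, 0.05]
```

In this example, class 0 collects 25% of the labels for an image whose true class is 2, five times as many as the other wrong classes, purely because of its position in the menu. Retraining on such labels would push the model toward class 0, which in turn shapes future feedback.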

## Human in the Loop Approach

For this reason, companies often use a Human in the Loop approach to improve model performance. In this approach, predictions that have low confidence, or that the user disagrees with, are sent to data annotators. These annotators have significant experience in the model’s domain and can therefore provide reliable labels for the task at hand. These high-quality labels are then used to retrain the model, which should lead to better performance.
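The routing rule described above can be sketched as a small function. This is a hypothetical illustration, not Gourmet Gram’s actual code: the confidence threshold of 0.7 is an arbitrary example value, and the returned labels borrow the `task_type` names used later in this notebook.

```python
def review_reason(confidence, thumbs_up=None, threshold=0.7):
    """Return why a prediction should go to annotators, or None if it need not.

    confidence: the model's probability for its predicted class.
    thumbs_up: True/False if the user gave feedback, None if they did not.
    """
    if thumbs_up is False:
        return "user_feedback"      # the user disagreed with the prediction
    if confidence < threshold:
        return "low_confidence"     # the model itself was unsure
    return None                     # confident prediction, no complaint

for conf, feedback in [(0.95, None), (0.40, None), (0.95, False), (0.40, True)]:
    print(conf, feedback, review_reason(conf, feedback))
```

Checking negative feedback first means a confident but wrong prediction is still routed to annotators, which is exactly the case user feedback is meant to catch.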

### Setting up Human in the Loop System

Let’s try setting up a Human in the Loop approach for Gourmetgram! For this, we use `Label Studio`, an open-source data labeling platform:

https://labelstud.io/

In [None]:
# Fetch and check out the feedback_loop_integration branch
remote.run("cd gourmetgram; git fetch origin feedback_loop_integration; git checkout feedback_loop_integration")

In [None]:
# Start the MinIO and Label Studio containers first
remote.run("cd gourmetgram; docker-compose up -d minio label-studio")

# Wait 30 seconds for Label Studio to start
remote.run("sleep 30")
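A fixed 30-second wait may be too short on a slow VM and wasteful on a fast one. A common alternative is to poll until the service answers. The helper below is a generic sketch; `wait_until` is a hypothetical name, and the probe in the usage comment assumes Label Studio is published on port 8080 of the VM (as the URL later in this notebook suggests) and that `remote.run` accepts Fabric’s `warn=True` and `hide=True` and returns a result with `.ok`.

```python
import time

def wait_until(probe, timeout=120.0, interval=5.0):
    """Call probe() every `interval` seconds until it returns True or `timeout` elapses.

    Returns True if the probe succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Assumed usage in the notebook: curl -sf exits non-zero until Label Studio responds.
# ready = wait_until(lambda: remote.run("curl -sf http://localhost:8080 > /dev/null",
#                                       warn=True, hide=True).ok)
```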

In [None]:
# Now build and start the Flask application (-d runs it in the background so the cell returns)
remote.run("cd gourmetgram; docker-compose up -d --build flask-app")

### Explore Human in the Loop System

1.  Access the Web Interface:
    -   Open your browser and navigate to: http://localhost:8000
    -   Upload food images for classification
2.  Provide Feedback:
    -   After seeing a prediction, you can give thumbs up (correct) or thumbs down (incorrect)
    -   Notice that unlike the first system, you don’t directly provide the correct class
    -   Images with low confidence or negative feedback are sent to Label Studio for expert review
3.  Explore Label Studio (Annotation Interface):
    -   Navigate to http://localhost:8080
    -   Log in with username: `gourmetgramuser@gmail.com`, password: `gourmetgrampassword`
    -   Select the “Food Classification Review” project
    -   Click on “Tasks” to see images waiting for expert review
    -   For each task, select the correct food category and submit your annotation
4.  Create random sampling tasks:
    -   Request a random sample of images (here, 5) for expert review:

    ``` bash
    curl -X POST "http://localhost:8000/sample_random_images" -H "Content-Type: application/json" -d '{"sample_count": 5}'
    ```
5.  Process Expert Annotations:
    -   After annotating some images in Label Studio, process these annotations:

    ``` bash
    curl -X POST "http://localhost:8000/process_labels"
    ```

    -   This will move the images to their correct class directories
6.  Create test suites:
    -   After processing the labels, create test suites by task type (user feedback, low confidence, or random sampling), or for all task types at once:

    ``` bash
    curl -X GET "http://localhost:8000/generate_test_suite?task_type=user_feedback"
    curl -X GET "http://localhost:8000/generate_test_suite?task_type=low_confidence"
    curl -X GET "http://localhost:8000/generate_test_suite?task_type=random_sampling"
    curl -X GET "http://localhost:8000/generate_test_suite?task_type=all"
    ```
7.  Explore MinIO contents:
    -   Navigate to http://localhost:9001
    -   Examine the different buckets to see how data is organized:
        -   `production-images`: Contains all classified images
        -   `tracking`: Contains tracking JSONs for different feedback sources
        -   `target-bucket`: Where Label Studio exports annotations
        -   `test-suites`: Contains organized test suites for evaluation
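The `curl` calls in the steps above can also be issued from Python. The sketch below only builds `urllib.request.Request` objects for the three endpoints, mirroring the methods, paths, and JSON body shown in the `curl` commands; sending them assumes the Flask app is reachable at `http://localhost:8000`. The function names and the `TASK_TYPES` check are hypothetical additions, not part of the Gourmet Gram API.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed: where the Flask app is reachable
TASK_TYPES = {"user_feedback", "low_confidence", "random_sampling", "all"}  # values used above

def sample_random_images_request(sample_count, base_url=BASE_URL):
    """POST /sample_random_images with a JSON body, like the curl command above."""
    body = json.dumps({"sample_count": sample_count}).encode("utf-8")
    return urllib.request.Request(f"{base_url}/sample_random_images", data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")

def process_labels_request(base_url=BASE_URL):
    """POST /process_labels with an empty body."""
    return urllib.request.Request(f"{base_url}/process_labels", method="POST")

def generate_test_suite_request(task_type="all", base_url=BASE_URL):
    """GET /generate_test_suite for one task type; rejects values not listed above."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type!r}")
    query = urllib.parse.urlencode({"task_type": task_type})
    return urllib.request.Request(f"{base_url}/generate_test_suite?{query}", method="GET")

# Assumed usage, once the app is reachable:
# with urllib.request.urlopen(generate_test_suite_request("low_confidence")) as resp:
#     print(resp.status, resp.read().decode())
```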

In [None]:
# Command to shut down the application
remote.run("cd gourmetgram; docker-compose down -v")