## Launch and set up a bare metal server - with python-chi

At the beginning of the lease time, we will bring up our server. We will use the `python-chi` Python API to Chameleon to provision our server.


Run the following cells, and make sure the correct site and project are selected:

### Prerequisites
- You are logged into the Chameleon JupyterHub and are running the following cells there.
- You have already reserved a lease on Chameleon Cloud.

In [None]:
from chi import server, context, lease
import os

context.version = "1.0" 
context.choose_project()
context.choose_site(default="CHI@TACC")

## Select Active Lease


In the previous steps (when setting up your project template),
we highlighted that you should name your lease starting with your project name (e.g., fancyproject-gpu-test).
This naming matters because the code below searches for your active lease by:

- listing all leases whose name starts with `fancyproject` and whose status is "ACTIVE"

If more than one lease matches, the code lists them and stops. In that case, review the listed leases and set the `lease_name` parameter to the exact name of the lease you want to select.

In [None]:
project_name = "fancyproject"
lease_name = None # replace with the lease name you want to select
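def get_active_lease(lease_name=None, project_name=None):
    """Return the lease named `lease_name`, or the single ACTIVE lease whose name starts with `project_name`."""
    if not lease_name and not project_name:
        raise ValueError("provide either 'lease_name' or 'project_name'")

    leases = lease.list_leases()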

    # retrieve exact lease name provided
    if lease_name:
        match = next((l for l in leases if l.name == lease_name), None)
        if not match:
            raise ValueError(f"no lease found with name '{lease_name}'")
        return match

    matching = [
        l for l in leases
        if l.name.startswith(project_name) and l.status.upper() == "ACTIVE"
    ]

    if not matching:
        raise ValueError(f"no active lease found starting with '{project_name}'")

    if len(matching) > 1:
        print("you have multiple active leases:")
        for l in matching:
            print(f" - {l.name} (status: {l.status})")
        raise ValueError("set 'lease_name' to pick the correct lease.")

    return matching[0]
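
To see the selection rule in isolation, here is a minimal sketch that applies the same prefix-and-status filter to stand-in objects. The helper `filter_active` and the `SimpleNamespace` leases (including their names and statuses) are made up for illustration; the only assumption about real `python-chi` lease objects is that they expose `name` and `status`, as the function above already relies on.

```python
from types import SimpleNamespace

def filter_active(leases, project_name):
    """Return leases whose name starts with project_name and whose status is ACTIVE."""
    return [
        l for l in leases
        if l.name.startswith(project_name) and l.status.upper() == "ACTIVE"
    ]

# hypothetical leases, for illustration only
sample_leases = [
    SimpleNamespace(name="fancyproject-gpu-test", status="ACTIVE"),
    SimpleNamespace(name="fancyproject-old", status="TERMINATED"),
    SimpleNamespace(name="otherproject-gpu", status="ACTIVE"),
]

active = filter_active(sample_leases, "fancyproject")
print([l.name for l in active])
```

Only the lease that matches both conditions survives the filter, which is why a consistent naming prefix makes the lookup unambiguous.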

In [None]:
l = get_active_lease(lease_name, project_name)
print(f"using lease: {l.name}")
l.show()

The status should show as “ACTIVE” now that we are past the lease start time.

The rest of this notebook can be executed without any interactions from you, so at this point, you can save time by clicking on this cell, then selecting “Run” \> “Run Selected Cell and All Below” from the Jupyter menu.

As the notebook executes, monitor its progress to make sure it does not get stuck on any execution error, and also to see what it is doing!

We will use the lease to bring up a server with the `CC-Ubuntu24.04` disk image.

> **Note**: the following cell brings up a server only if you don’t already have one with the same name, regardless of that server’s state. If you already have a server in ERROR state, delete it first in the Horizon GUI before you run this cell.

In [None]:
username = os.getenv('USER') # your username is included in the server name
s = server.Server(
    f"node-fancyproject-{username}", 
    reservation_id=l.node_reservations[0]["id"],
    image_name="CC-Ubuntu24.04"
)
s.submit(idempotent=True)
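
Note that `os.getenv('USER')` returns `None` when the variable is unset, which would produce a server named `node-fancyproject-None`. As a hedged sketch, the hypothetical helper `build_server_name` below (not part of `python-chi`) falls back to `getpass.getuser()` and refuses to build a name without a username. The username `alice` is made up.

```python
import getpass
import os

def build_server_name(project, username=None):
    """Build a server name of the form node-<project>-<username>."""
    # fall back to $USER, then to the login name, if no username is given
    username = username or os.getenv("USER") or getpass.getuser()
    if not username:
        raise ValueError("could not determine a username for the server name")
    return f"node-{project}-{username}"

print(build_server_name("fancyproject", "alice"))  # alice is a made-up username
```

Because the cell above only creates a server when none with the same name exists, a stable and correct name is what makes re-running it safe.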

Note: security groups are not used at Chameleon bare metal sites, so we do not have to configure any security groups on this instance.

Then, we’ll associate a floating IP with the instance, so that we can access it over SSH.

In [None]:
s.associate_floating_ip()

In [None]:
s.refresh()
s.check_connectivity()

In the output below, make a note of the floating IP that has been assigned to your instance (in the “Addresses” row).

In [None]:
s.refresh()
s.show(type="widget")

## Retrieve code and notebooks on the instance

Now, we can use `python-chi` to execute commands on the instance, to set it up. We’ll start by retrieving the code and other materials on the instance.

In [None]:
repo = "https://github.com/Pantherxe/MLflow_amd.git"
s.execute(f"git clone {repo} fancyproject")

## Set up Docker

To use common deep learning frameworks like TensorFlow or PyTorch, and ML training platforms like MLflow and Ray, we can run containers that have all the prerequisite libraries necessary for these frameworks. Here, we will set up Docker as the container framework.

In [None]:
s.execute("curl -sSL https://get.docker.com/ | sudo sh")
s.execute("sudo groupadd -f docker; sudo usermod -aG docker $USER")

## Mount S3 buckets to the filesystem

We also need to modify the configuration file for FUSE (Filesystem in USErspace), which is the Linux interface that allows user‑space applications (instead of the kernel) to mount and manage virtual filesystems. 

In [None]:
# this line makes sure user_allow_other is un-commented in /etc/fuse.conf
s.execute("sudo sed -i '/^#user_allow_other/s/^#//' /etc/fuse.conf") 

Enabling the `user_allow_other` option ensures that filesystems mounted by our user (such as an object store mounted with rclone) are accessible to other users and processes, including Docker containers running Jupyter notebooks.
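
To make the effect of that `sed` expression concrete, here is a small sketch that applies the same substitution in Python to a made-up sample of the file. The sample text and the helper name `uncomment_user_allow_other` are assumptions for illustration; the real `/etc/fuse.conf` on your instance may contain additional lines.

```python
import re

# made-up sample resembling a default /etc/fuse.conf
sample = "#mount_max = 1000\n#user_allow_other\n"

def uncomment_user_allow_other(text):
    """Strip the leading '#' only on lines that start with '#user_allow_other'."""
    return re.sub(r"^#(user_allow_other)", r"\1", text, flags=re.MULTILINE)

result = uncomment_user_allow_other(sample)
print(result)
```

Other commented lines are left alone, and running the substitution a second time changes nothing, so the cell is safe to re-run.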

In [None]:
# Mounting the buckets using rclone 

project_name= 'fancyproject'
buckets = {
    f'{project_name}-data': 'data',
    f'{project_name}-mlflow-metrics': 'metrics'
}

for bucket_name, mount_dir in buckets.items():
    
    s.execute(f"sudo mkdir -p /mnt/{mount_dir}")
    s.execute(f"sudo chown -R cc /mnt/{mount_dir}")
    s.execute(f"sudo chgrp -R cc /mnt/{mount_dir}")
    s.execute(f"rclone mount rclone_s3:{bucket_name} /mnt/{mount_dir} --allow-other --daemon")
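
If you would like to preview the commands before they run on the server, the following dry-run sketch builds the same command strings without executing anything. The helper `mount_commands` is hypothetical; the remote name `rclone_s3`, the `cc` user, and the bucket names are taken from the cell above, and the remote is assumed to be already configured for rclone on the instance.

```python
def mount_commands(bucket_name, mount_dir, remote="rclone_s3", user="cc"):
    """Return the shell commands used to mount one bucket, in order."""
    path = f"/mnt/{mount_dir}"
    return [
        f"sudo mkdir -p {path}",
        f"sudo chown -R {user} {path}",
        f"sudo chgrp -R {user} {path}",
        f"rclone mount {remote}:{bucket_name} {path} --allow-other --daemon",
    ]

project_name = "fancyproject"
buckets = {
    f"{project_name}-data": "data",
    f"{project_name}-mlflow-metrics": "metrics",
}

# print the commands instead of running them
for bucket_name, mount_dir in buckets.items():
    for cmd in mount_commands(bucket_name, mount_dir):
        print(cmd)
```
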

In [None]:
s.execute("ls -l /mnt/") # we should be able to see the mounted buckets

Leave that cell running, and in the meantime, open an SSH session on your server. From your local terminal, run

    ssh -i ~/.ssh/id_rsa_chameleon cc@A.B.C.D

where

-   in place of `~/.ssh/id_rsa_chameleon`, substitute the path to your own key that you uploaded to CHI@TACC
-   in place of `A.B.C.D`, use the floating IP address you just associated with your instance.
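
As an optional convenience, the sketch below assembles that SSH command from a key path and an IP address, and rejects anything that is not a valid IP (such as the `A.B.C.D` placeholder left unreplaced). The helper `ssh_command` is hypothetical, and `192.0.2.10` is a documentation-only example address, not your floating IP.

```python
import ipaddress

def ssh_command(floating_ip, key_path="~/.ssh/id_rsa_chameleon", user="cc"):
    """Return the ssh command line for the given floating IP."""
    # raises ValueError if floating_ip is not a valid IPv4/IPv6 address
    ipaddress.ip_address(floating_ip)
    return f"ssh -i {key_path} {user}@{floating_ip}"

print(ssh_command("192.0.2.10"))  # documentation-only example address
```
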