In [None]:
from chi import server, context
import chi, os, time, datetime

context.version = "1.0" 
context.choose_project()
context.choose_site(default="KVM@TACC")

In [None]:
s = server.Server(
    "node-persist-project-5",
    image_name="CC-Ubuntu24.04",
    flavor_name="m1.large"
)
s.submit(idempotent=True)

Then, we’ll associate a floating IP with the instance:

In [None]:
s.associate_floating_ip()

In the output below, make a note of the floating IP that has been assigned to your instance (in the “Addresses” row).

In [None]:
s.refresh()
s.show(type="widget")

By default, all connections to VM resources are blocked, as a security measure. We need to attach one or more “security groups” to our VM resource, to permit access over the Internet to specified ports.

The following security groups will be created (if they do not already exist in our project) and then added to our server:

In [None]:
security_groups = [
  {'name': "allow-ssh", 'port': 22, 'description': "Enable SSH traffic on TCP port 22"},
  {'name': "allow-8888", 'port': 8888, 'description': "Enable TCP port 8888 (used by Jupyter)"},
  {'name': "allow-8000", 'port': 8000, 'description': "Enable TCP port 8000 (used by MLflow)"},
  {'name': "allow-9000", 'port': 9000, 'description': "Enable TCP port 9000 (used by MinIO API)"},
  {'name': "allow-9001", 'port': 9001, 'description': "Enable TCP port 9001 (used by MinIO Web UI)"}
]

In [None]:
# configure openstacksdk for actions unsupported by python-chi
os_conn = chi.clients.connection()
nova_server = chi.nova().servers.get(s.id)

for sg in security_groups:

    # create the group and its ingress rule only if the group does not exist yet
    if not os_conn.get_security_group(sg['name']):
        os_conn.create_security_group(sg['name'], sg['description'])
        os_conn.create_security_group_rule(sg['name'], port_range_min=sg['port'], port_range_max=sg['port'], protocol='tcp', remote_ip_prefix='0.0.0.0/0')

    # attach the group to our server
    nova_server.add_security_group(sg['name'])

print(f"updated security groups: {[group.name for group in nova_server.list_security_group()]}")
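
The loop above follows a "create if missing" pattern, so it is safe to re-run. Here is a minimal sketch of that pattern as a standalone function, assuming `conn` exposes the same three methods the cell above calls. `ensure_security_group` and `FakeConn` are hypothetical names introduced only for illustration; `FakeConn` is an in-memory stand-in so the sketch can run without a cloud connection.

```python
def ensure_security_group(conn, name, port, description):
    """Create a TCP security group for `port` if it is missing.

    Returns True if the group was created, False if it already existed.
    """
    if conn.get_security_group(name):
        return False
    conn.create_security_group(name, description)
    conn.create_security_group_rule(
        name,
        port_range_min=port,
        port_range_max=port,
        protocol="tcp",
        remote_ip_prefix="0.0.0.0/0",
    )
    return True


class FakeConn:
    """Hypothetical in-memory stand-in for the connection (illustration only)."""

    def __init__(self):
        self.groups = {}
        self.rules = []

    def get_security_group(self, name):
        # returns None when the group does not exist
        return self.groups.get(name)

    def create_security_group(self, name, description):
        self.groups[name] = {"name": name, "description": description}
        return self.groups[name]

    def create_security_group_rule(self, name, **rule):
        self.rules.append((name, rule))
```

Calling `ensure_security_group` twice with the same name creates the group and its rule only once, which is the behavior the loop relies on.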

In [None]:
s.refresh()
s.check_connectivity()

### Retrieve code and notebooks on the instance

Now we can use `python-chi` to execute commands on the instance and set it up. We’ll start by cloning the code and other materials onto the instance.

In [None]:
s.execute("git clone https://github.com/teaching-on-testbeds/data-persist-chi")

### Set up Docker

Next, we install Docker and add our user to the `docker` group, so that we can run `docker` commands without `sudo`.

In [None]:
s.execute("curl -sSL https://get.docker.com/ | sudo sh")
s.execute("sudo groupadd -f docker; sudo usermod -aG docker $USER")

## Open an SSH session

Finally, open an SSH session on your server. From your local terminal, run

    ssh -i ~/.ssh/id_rsa_chameleon cc@A.B.C.D

where

-   in place of `~/.ssh/id_rsa_chameleon`, substitute the path to your own key, the one you uploaded to KVM@TACC
-   in place of `A.B.C.D`, use the floating IP address you just associated with your instance.
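
If the SSH command hangs, it can help to check from your local machine whether TCP port 22 on the instance is reachable at all, since that depends on the `allow-ssh` security group being attached. The following is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper name, not part of `python-chi`.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and unreachable hosts
        return False
```

For example, `port_is_open("A.B.C.D", 22)`, with your floating IP in place of `A.B.C.D`, should return `True` once the security group is in effect.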

### Use `rclone` and authenticate to object store from a compute instance

On the compute instance, install `rclone`:

``` bash
# run on node-persist
curl https://rclone.org/install.sh | sudo bash
```

``` bash
# run on node-persist
# this line makes sure user_allow_other is un-commented in /etc/fuse.conf
sudo sed -i '/^#user_allow_other/s/^#//' /etc/fuse.conf
```
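
The `sed` command above removes the leading `#` from any line that begins with `#user_allow_other`, which is the setting FUSE requires before a non-root user can mount with the `allow_other` option. To make the transformation concrete, here is a small sketch of the same substitution in Python. It operates on a string rather than on `/etc/fuse.conf` itself, and `uncomment_user_allow_other` is a hypothetical name used only for illustration.

```python
import re


def uncomment_user_allow_other(conf_text):
    """Strip the leading '#' from lines that start with '#user_allow_other'."""
    # re.MULTILINE makes '^' match at the start of every line, like sed's per-line '^'
    return re.sub(r"^#(user_allow_other)", r"\1", conf_text, flags=re.MULTILINE)
```

Other comment lines, and a line that is already un-commented, are left unchanged.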


``` bash
# run on node-persist
mkdir -p ~/.config/rclone
nano ~/.config/rclone/rclone.conf
```

Paste the following into the config file, but substitute your own application credential ID and secret.

You will also need to substitute your own user ID. You can find it using “Identity” \> “Users” in the Horizon GUI; it is an alphanumeric string (*not* the human-readable user name).

    [chi_tacc]
    type = swift
    user_id = 
    application_credential_id = 
    application_credential_secret = 
    auth = https://chi.tacc.chameleoncloud.org:5000/v3
    region = CHI@TACC

Use Ctrl+O and Enter to save the file, and Ctrl+X to exit `nano`.
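
If you would rather generate the file than edit it by hand, the same configuration can be written from Python. This is a sketch under stated assumptions: `write_rclone_config` is a hypothetical helper, the three credential arguments are placeholders that you must supply yourself, and the remaining keys are copied from the config shown above.

```python
import configparser
from pathlib import Path


def write_rclone_config(path, user_id, app_cred_id, app_cred_secret):
    """Write a [chi_tacc] swift remote to `path` and return the resolved path."""
    # interpolation=None so a '%' inside a secret is stored literally
    config = configparser.ConfigParser(interpolation=None)
    config["chi_tacc"] = {
        "type": "swift",
        "user_id": user_id,
        "application_credential_id": app_cred_id,
        "application_credential_secret": app_cred_secret,
        "auth": "https://chi.tacc.chameleoncloud.org:5000/v3",
        "region": "CHI@TACC",
    }
    path = Path(path).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w") as f:
        config.write(f)
    # the file holds a secret, so restrict it to the owner
    path.chmod(0o600)
    return path
```

On the compute instance this would be called as `write_rclone_config("~/.config/rclone/rclone.conf", ...)` with your own user ID, application credential ID, and secret as the remaining arguments.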

To test it, run

``` bash
# run on node-persist
rclone lsd chi_tacc:
```

and verify that you see your container listed. This confirms that `rclone` can authenticate to the object store.
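
The same check can be scripted. `rclone lsjson` prints its listing as a JSON array, which is easier to parse than the `lsd` text output. The sketch below assumes the `chi_tacc` remote configured above; `dir_names_from_lsjson` and `list_containers` are hypothetical helper names.

```python
import json
import subprocess


def dir_names_from_lsjson(lsjson_text):
    """Return the names of directory entries in `rclone lsjson` output."""
    entries = json.loads(lsjson_text)
    return [entry["Name"] for entry in entries if entry.get("IsDir")]


def list_containers(remote="chi_tacc"):
    """Run `rclone lsjson <remote>:` and return the top-level container names."""
    result = subprocess.run(
        ["rclone", "lsjson", f"{remote}:"],
        capture_output=True,
        text=True,
        check=True,  # raise if rclone fails, e.g. on bad credentials
    )
    return dir_names_from_lsjson(result.stdout)
```

Calling `list_containers()` on the compute instance should return a list that includes your container name; a `CalledProcessError` instead points to an authentication or configuration problem.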

### Create a pipeline to load training data into the object store

``` bash
# run on node-persist
mkdir -p ~/data-persist-chi/ammeba
cd ~/data-persist-chi/ammeba
curl -L https://raw.githubusercontent.com/Mypainismorethanyours/A-Machine-Learning-System-for-Detecting-Misinformation-in-Media-Images/main/ammeba-etl-full.yaml -o ammeba-etl-full.yaml
```

``` bash
# run on node-persist
docker compose -f ammeba-etl-full.yaml run extract-data
```

``` bash
# run on node-persist
docker compose -f ammeba-etl-full.yaml run transform-data
```

``` bash
# run on node-persist
docker compose -f ammeba-etl-full.yaml run load-data
```

``` bash
# run on node-persist
rclone tree chi_tacc:object-persist-project-5 --max-depth 2
```

This prints the directory tree of the dataset, down to two levels, as it is stored in the `object-persist-project-5` container in the object store.
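
The three `docker compose run` commands above must run in order, and a later stage should not start if an earlier one fails. Here is a minimal sketch of that sequencing in Python, assuming the service names used above; `stage_command` and `run_pipeline` are hypothetical helper names, and the `runner` parameter exists only so the function can be exercised without Docker.

```python
import subprocess

# service names as used in the docker compose commands above
STAGES = ("extract-data", "transform-data", "load-data")


def stage_command(compose_file, stage):
    """Build the `docker compose ... run <stage>` command as an argument list."""
    return ["docker", "compose", "-f", compose_file, "run", stage]


def run_pipeline(compose_file, stages=STAGES, runner=subprocess.run):
    """Run each stage in order; check=True stops at the first failing stage."""
    for stage in stages:
        runner(stage_command(compose_file, stage), check=True)
```

Run from `~/data-persist-chi/ammeba`, `run_pipeline("ammeba-etl-full.yaml")` executes the extract, transform, and load stages back to back and raises `subprocess.CalledProcessError` if any of them exits with a non-zero status.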