### Block storage using the Horizon GUI

First, let’s try creating a block storage volume from the OpenStack Horizon GUI. Open the GUI for KVM@TACC:

-   from the [Chameleon website](https://chameleoncloud.org/hardware/)
-   click “Experiment” \> “KVM@TACC”
-   log in if prompted to do so
-   check the project drop-down menu near the top left (which shows e.g. “CHI-XXXXXX”), and make sure the correct project is selected.

In the menu sidebar on the left side, click on “Volumes” \> “Volumes” and then, “Create Volume”. You will be prompted to set up your volume step by step using a graphical “wizard”.

-   Specify the name as <code>block-persist-<b>project15</b></code> 
-   Specify a size greater than 200 GiB, scaled to the number of workers (the example below uses 250 GiB).
-   Leave other settings at their defaults, and click “Create Volume”.

Next, it’s time to attach the block storage volume to the compute instance we created earlier. From “Volumes” \> “Volumes”, next to *your* volume, click the ▼ in the menu on the right and choose “Manage Attachments”. In the “Attach to Instance” menu, choose your compute instance. Then, click “Attach Volume”.

Now, the “Volumes” overview page in the Horizon GUI should show something like the following for your volume:

    | Name                    | Description | Size   | Status | Group | Type     | Attached To                     | Availability Zone | Bootable | Encrypted |
    |-------------------------|-------------|--------|--------|-------|----------|---------------------------------|-------------------|----------|-----------|
    | block-persist-project15 | -           | 250GiB | In-use | -     | ceph-ssd | /dev/vdb on node-project15-data | nova              | No       | No        |

On the instance, let’s confirm that we can see the block storage volume. Run

``` bash
# run on node-project15-data
lsblk
```

and verify that `vdb` appears in the output.
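If you would rather check for the device from a script than by eye, one option is to parse the JSON that `lsblk --json` prints. The sketch below is only an illustration: the helper name `device_names` and the sample text are assumptions made for this example, not output captured from a real instance.

```python
import json


def device_names(lsblk_json):
    """Return the names of all devices and partitions in `lsblk --json` output."""
    names = set()
    stack = list(json.loads(lsblk_json).get("blockdevices", []))
    while stack:
        device = stack.pop()
        names.add(device["name"])
        # partitions are nested under their parent disk as "children"
        stack.extend(device.get("children", []))
    return names


# Illustrative sample only; real output has more fields per device.
SAMPLE_LSBLK = """
{"blockdevices": [
  {"name": "vda", "type": "disk", "children": [{"name": "vda1", "type": "part"}]},
  {"name": "vdb", "type": "disk"}
]}
"""

print("vdb" in device_names(SAMPLE_LSBLK))
```

On the instance, the input could come from `subprocess.run(["lsblk", "--json"], capture_output=True, text=True, check=True).stdout`.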

The volume is essentially a raw disk. Before we can use it **for the first time** after creating it, we need to partition the disk, create a filesystem on the partition, and mount it. In subsequent uses, we will only need to mount it.
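Because the first-time setup is destructive, it may be worth confirming that the volume really is empty before going further. A minimal sketch, assuming the input is the output of `lsblk --json -o NAME,FSTYPE`; the helper name `existing_filesystems` and both samples are hypothetical.

```python
import json


def existing_filesystems(lsblk_json, disk):
    """Map device name to filesystem type for `disk` and its partitions.

    Devices without a filesystem (fstype is null) are left out, so an
    empty result suggests the disk has not been formatted yet.
    """
    found = {}
    for device in json.loads(lsblk_json).get("blockdevices", []):
        if device["name"] != disk:
            continue
        for node in [device] + device.get("children", []):
            if node.get("fstype"):
                found[node["name"]] = node["fstype"]
    return found


# Illustrative samples only.
FRESH_VOLUME = '{"blockdevices": [{"name": "vdb", "fstype": null}]}'
USED_VOLUME = """
{"blockdevices": [
  {"name": "vdb", "fstype": null,
   "children": [{"name": "vdb1", "fstype": "ext4"}]}
]}
"""

print(existing_filesystems(FRESH_VOLUME, "vdb"))
print(existing_filesystems(USED_VOLUME, "vdb"))
```

If the second case applies to your volume, skip the partitioning and formatting steps and only mount it.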

> **Note**: if the volume already had data on it, creating a filesystem on it would erase all its data! This procedure is *only* for the initial setup of a volume, before it has any data on it.

First, we create a partition with an `ext4` filesystem, occupying the entire volume:

``` bash
# run on node-project15-data
sudo parted -s /dev/vdb mklabel gpt
sudo parted -s /dev/vdb mkpart primary ext4 0% 100%
```

Verify that we now have the partition `vdb1` in the output of

``` bash
# run on node-project15-data
lsblk
```

Next, we format the partition:

``` bash
# run on node-project15-data
sudo mkfs.ext4 /dev/vdb1
```

Finally, we create a directory in the local filesystem and mount the partition to that directory:

``` bash
# run on node-project15-data
sudo mkdir -p /mnt/block
sudo mount /dev/vdb1 /mnt/block
```

and change the owner of that directory to the `cc` user:

``` bash
# run on node-project15-data
sudo chown -R cc /mnt/block
sudo chgrp -R cc /mnt/block
```

Run

``` bash
# run on node-project15-data
df -h
```

and verify that the output includes a line with `/dev/vdb1` mounted on `/mnt/block`.
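The same check can be scripted by reading the kernel's mount table, which on Linux is exposed as `/proc/mounts`. This is a sketch under that assumption; `mount_point_of` is a hypothetical helper and the sample text is illustrative.

```python
def mount_point_of(device, mounts_text):
    """Return where `device` is mounted according to /proc/mounts-style text, or None."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # each line is: device, mount point, filesystem type, options, ...
        if len(fields) >= 2 and fields[0] == device:
            return fields[1]
    return None


# Illustrative sample only.
SAMPLE_MOUNTS = """\
/dev/vda1 / ext4 rw,relatime 0 0
/dev/vdb1 /mnt/block ext4 rw,relatime 0 0
"""

print(mount_point_of("/dev/vdb1", SAMPLE_MOUNTS))
```

On the instance, pass the real table with `mount_point_of("/dev/vdb1", open("/proc/mounts").read())`.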



### Object storage using the Horizon GUI

Open the GUI for CHI@TACC:

-   from the [Chameleon website](https://chameleoncloud.org/hardware/)
-   click “Experiment” \> “CHI@TACC”
-   log in if prompted to do so
-   check the project drop-down menu near the top left (which shows e.g. “CHI-XXXXXX”), and make sure the correct project is selected.

In the menu sidebar on the left side, click on “Object Store” \> “Containers” and then, “Create Container”. You will be prompted to set up your container step by step using a graphical “wizard”.

-   Specify the name. The later steps in this guide use `AdFame-project-group15` as the container name.
-   Leave other settings at their defaults, and click “Submit”.

### Use `rclone` and authenticate to object store from a compute instance

We will want to connect to this object store from the compute instance we configured earlier, and copy some data to it!

For *write* access to the object store from the compute instance, we will need to authenticate with valid OpenStack credentials. To support this, we will create an *application credential*, which consists of an ID and a secret that allows a script or application to authenticate to the service.

An application credential is a good way for something like a data pipeline to authenticate, since it can be used non-interactively, and can be revoked easily in case it is compromised without affecting the entire user account.

In the menu sidebar on the left side of the Horizon GUI, click “Identity” \> “Application Credentials”. Then, click “Create Application Credential”.

-   In the “Name” field, use “AdFame-project-group15”.
-   Set the “Expiration” date to the end date of the current semester. (Note that this is in UTC, not your local time zone.) This limits the damage if your credential is leaked (e.g. you accidentally push it to a public GitHub repository).
-   Click “Create Application Credential”.
-   Copy the “ID” and “Secret” displayed in the dialog, and save them in a safe place. You will not be able to view the secret again from the Horizon GUI. Then, click “Download openrc file” to have another copy of the secret.
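If you downloaded the openrc file, the ID and secret can be read back out of it rather than copied by hand. The sketch below assumes the file consists of `export NAME=value` lines and uses the standard `OS_APPLICATION_CREDENTIAL_ID` and `OS_APPLICATION_CREDENTIAL_SECRET` variable names; `parse_openrc` is a hypothetical helper and the sample values are placeholders.

```python
def parse_openrc(text):
    """Collect NAME -> value pairs from the `export NAME=value` lines of an openrc file."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("export ") or "=" not in line:
            continue  # skip the shebang, comments, and blank lines
        name, _, value = line[len("export "):].partition("=")
        values[name.strip()] = value.strip().strip("\"'")
    return values


# Placeholder content; a real file holds your own ID and secret.
SAMPLE_OPENRC = """\
#!/usr/bin/env bash
export OS_AUTH_TYPE=v3applicationcredential
export OS_REGION_NAME="CHI@TACC"
export OS_APPLICATION_CREDENTIAL_ID=APP_CRED_ID
export OS_APPLICATION_CREDENTIAL_SECRET=APP_CRED_SECRET
"""

credentials = parse_openrc(SAMPLE_OPENRC)
print(credentials["OS_APPLICATION_CREDENTIAL_ID"])
```

Treat the openrc file like a password: keep it out of version control.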

Now that we have an application credential, we can use it to allow an application to authenticate to the Chameleon object store service. There are several applications and utilities for working with OpenStack’s Swift object store service; we will use one called [`rclone`](https://github.com/rclone/rclone).

On the compute instance, install `rclone`:
  
``` bash
# run on node-project15-data
curl https://rclone.org/install.sh | sudo bash
```

We also need to modify the configuration file for FUSE (**F**ilesystem in **USE**rspace: the interface that allows user space applications to mount virtual filesystems), so that object store containers mounted by our user will be available to others, including Docker containers:

``` bash
# run on node-project15-data
# this line makes sure user_allow_other is un-commented in /etc/fuse.conf
sudo sed -i '/^#user_allow_other/s/^#//' /etc/fuse.conf
```
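To see what that `sed` expression does, here is the same substitution written as a Python regular expression and applied to a string. It is only an illustration of the pattern; `uncomment_user_allow_other` is a hypothetical name and the sample text stands in for `/etc/fuse.conf`.

```python
import re


def uncomment_user_allow_other(conf_text):
    """Remove the leading '#' from any line that starts with '#user_allow_other'."""
    return re.sub(r"^#(user_allow_other)", r"\1", conf_text, flags=re.MULTILINE)


# Illustrative sample, standing in for the real file.
SAMPLE_FUSE_CONF = """\
# Allow non-root users to specify the allow_other or allow_root mount options.
#user_allow_other
"""

print(uncomment_user_allow_other(SAMPLE_FUSE_CONF))
```

Ordinary comment lines are left alone, and running the substitution a second time changes nothing, so the command is safe to repeat.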

Next, create a configuration file for `rclone` with the ID and secret from the application credential you just generated:

``` bash
# run on node-project15-data
mkdir -p ~/.config/rclone
nano  ~/.config/rclone/rclone.conf
```

Paste the following into the config file, but substitute your own application credential ID and secret.

You will also need to substitute your own user ID. You can find it using “Identity” \> “Users” in the Horizon GUI; it is an alphanumeric string (*not* the human-readable user name).

    [chi_tacc]
    type = swift
    user_id = YOUR_USER_ID
    application_credential_id = APP_CRED_ID
    application_credential_secret = APP_CRED_SECRET
    auth = https://chi.tacc.chameleoncloud.org:5000/v3
    region = CHI@TACC

Use Ctrl+O and Enter to save the file, and Ctrl+X to exit `nano`.
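As an alternative to editing the file by hand, the same section could be generated with Python's `configparser`, which writes the `key = value` layout shown above. This is a sketch, not part of the original workflow; `render_rclone_conf` is a hypothetical helper, and the arguments are the same three values you would otherwise paste in.

```python
import configparser
import io


def render_rclone_conf(user_id, credential_id, credential_secret):
    """Return the text of an rclone config with a single [chi_tacc] Swift remote."""
    # interpolation=None so that a '%' in the secret is written literally
    config = configparser.ConfigParser(interpolation=None)
    config["chi_tacc"] = {
        "type": "swift",
        "user_id": user_id,
        "application_credential_id": credential_id,
        "application_credential_secret": credential_secret,
        "auth": "https://chi.tacc.chameleoncloud.org:5000/v3",
        "region": "CHI@TACC",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(render_rclone_conf("YOUR_USER_ID", "APP_CRED_ID", "APP_CRED_SECRET"))
```

The returned text can then be written to `~/.config/rclone/rclone.conf`.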

To test it, run

``` bash
# run on node-project15-data
rclone lsd chi_tacc:
```

and verify that you see your container listed. This confirms that `rclone` can authenticate to the object store.
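For a scripted version of this check, `rclone lsjson chi_tacc:` prints the same listing as a JSON array, which is easier to parse than the `lsd` text. A minimal sketch, assuming containers appear as entries with `IsDir` set to true; `container_names` is a hypothetical helper and the sample entry is illustrative.

```python
import json


def container_names(lsjson_output):
    """Return the names of directory entries (containers) in `rclone lsjson` output."""
    return [entry["Name"] for entry in json.loads(lsjson_output) if entry.get("IsDir")]


# Illustrative sample only; the timestamp is a placeholder.
SAMPLE_LSJSON = """
[{"Path": "AdFame-project-group15", "Name": "AdFame-project-group15",
  "Size": -1, "MimeType": "inode/directory",
  "ModTime": "2000-01-01T00:00:00Z", "IsDir": true}]
"""

print("AdFame-project-group15" in container_names(SAMPLE_LSJSON))
```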

### Create a pipeline to load training data into the object store

Next, we will prepare a simple ETL pipeline to get the videos and prompts dataset into the object store. It will:

-   extract the data into a staging area (the block storage volume we mounted on the instance earlier)
-   transform the data
-   load the data into the object store

``` bash
# run on node-project15-data
docker compose -f ~/AdFame/docker/docker-compose-training-data.yaml run extract-fashion-videos
docker compose -f ~/AdFame/docker/docker-compose-training-data.yaml run split-fashion-data

export RCLONE_CONTAINER=AdFame-project-group15
docker compose -f ~/AdFame/docker/docker-compose-training-data.yaml run load-data
```
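The contents of the Compose file are not shown here, so the following is only a guess at the shape of the load step: an `rclone copy` from the staging area into the container named by `RCLONE_CONTAINER`. The helper `rclone_copy_command` and the staging path `/mnt/block/staging` are hypothetical.

```python
def rclone_copy_command(source_dir, container, remote="chi_tacc"):
    """Build the argument list for copying `source_dir` into an object store container."""
    if not container:
        raise ValueError("no container name given; is RCLONE_CONTAINER set?")
    # `rclone copy` only transfers files that are missing or changed at the destination
    return ["rclone", "copy", source_dir, f"{remote}:{container}", "--progress"]


command = rclone_copy_command("/mnt/block/staging", "AdFame-project-group15")
print(" ".join(command))
```

Building the command as a list makes it straightforward to pass to `subprocess.run` without shell quoting issues.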

Now our training data is loaded into the object store and ready to use for training!

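Let's make the container publicly readable but not writable through its access control list (ACL), to reduce the risk that data is modified or deleted. Note that container ACLs only control what other users may do; project members who authenticate with their own credentials keep full access.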

First, select the project and site:

    # run in Chameleon Jupyter environment
    from chi import server, context
    import chi, os, time, datetime
    context.choose_project()
    context.choose_site(default="CHI@TACC")

Then, open a connection to the object store:

    # run in Chameleon Jupyter environment
    import swiftclient

    os_conn = chi.clients.connection()
    token = os_conn.authorize()
    storage_url = os_conn.object_store.get_endpoint()

    swift_conn = swiftclient.Connection(preauthurl=storage_url,
                                        preauthtoken=token,
                                        retries=5)

Set the read and write ACLs on the container:

    # run in Chameleon Jupyter environment
    container_name = "AdFame-project-group15"
    headers = {
        # '.r:*' lets anyone read objects; '.rlistings' also allows listing the container
        'X-Container-Read': '.r:*,.rlistings',
        # an empty value clears any existing write ACL
        'X-Container-Write': ''
    }
    swift_conn.post_container(container_name, headers=headers)

Finally, read the ACLs back to confirm the change:

    # run in Chameleon Jupyter environment
    headers = swift_conn.head_container(container_name)
    print("X-Container-Read:", headers.get('x-container-read'))
    print("X-Container-Write:", headers.get('x-container-write'))

The output should be:

    X-Container-Read: .r:*,.rlistings
    X-Container-Write: None
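To make the meaning of the read ACL concrete, the sketch below splits an ACL string into its elements and checks for the two used here. In Swift, `.r:*` allows anyone to read objects without a token, and `.rlistings` additionally allows listing the container for clients that already have that read access. The helper names are hypothetical.

```python
def parse_acl(acl):
    """Split a Swift ACL header value into its comma-separated elements."""
    return [item.strip() for item in (acl or "").split(",") if item.strip()]


def is_public_read(acl):
    """True if anyone may read objects in the container."""
    return ".r:*" in parse_acl(acl)


def is_public_listing(acl):
    """True if anyone may also list the container's contents."""
    items = parse_acl(acl)
    return ".r:*" in items and ".rlistings" in items


read_acl = ".r:*,.rlistings"
print(is_public_read(read_acl), is_public_listing(read_acl))
print(parse_acl(None))  # a missing header, as with X-Container-Write here, has no elements
```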


### Delete block volume used as staging area


First, select the project and the KVM@TACC site:

    # run in Chameleon Jupyter environment
    from chi import server, context
    import chi, os, time, datetime

    context.version = "1.0"
    context.choose_project()
    context.choose_site(default="KVM@TACC")

Delete the compute instance, which also detaches the volume:

    # run in Chameleon Jupyter environment
    # look up the instance by its own name, not the volume's name
    s = server.get_server("node-project15-data")
    s.delete()

Then, delete the volume itself:

    # run in Chameleon Jupyter environment
    cinder_client = chi.clients.cinder()
    volume = [v for v in cinder_client.volumes.list() if v.name == 'block-persist-project15'][0]
    cinder_client.volumes.delete(volume=volume)
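A volume that is still attached cannot be deleted, and detaching may take a little while after the instance is removed. One way to handle this is to poll the volume's status until it reads `available` before calling `delete`. The sketch below assumes a client like the `cinder_client` above, whose `volumes.get(id)` returns an object with a `status` attribute; the function name and its timeout defaults are hypothetical.

```python
import time


def wait_for_volume_status(cinder_client, volume_id, wanted="available",
                           timeout=300, interval=5, sleep=time.sleep):
    """Poll a volume until its status equals `wanted`, or raise TimeoutError."""
    waited = 0
    while True:
        status = cinder_client.volumes.get(volume_id).status
        if status == wanted:
            return status
        if waited >= timeout:
            raise TimeoutError(
                f"volume {volume_id} is still '{status}' after {waited} seconds"
            )
        sleep(interval)
        waited += interval


# Hypothetical usage in the Chameleon Jupyter environment:
#   wait_for_volume_status(cinder_client, volume.id)
#   cinder_client.volumes.delete(volume=volume)
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default is fine.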