# Joshua Ludolf, Yesmin Hernandez-Reyna, Matthew Trevino 
## CSCI 4406 - Computer Networks 
### MileStone 1
## **This project aims to explore and benchmark various machine learning models to detect disks at high risk of experiencing fail-slow anomalies.**

---

### First Part: Preparing Our Chameleon Server

This step includes:

1. **Create Lease**  
   Reserve resources on the Chameleon cloud.

2. **Launch the Server**  
   Start the server instance using the reserved resources.

3. **Associate Floating IP**  
   Assign a public IP address to the server to enable external access.

4. **Connect to the Instance**  
   Use SSH to access the server.

---

## Configuration

In [6]:
%pip install python-chi
%pip install tensorflow

import chi

chi.use_site("CHI@UC")

chi.set("project_name", "CHI-210889")

print(f'Using Project {chi.get("project_name")}')

Note: you may need to restart the kernel to use updated packages.Defaulting to user installation because normal site-packages is not writeable
Collecting python-chi
  Using cached python_chi-1.0.1-py3-none-any.whl.metadata (1.6 kB)
Collecting fabric (from python-chi)
  Using cached fabric-3.2.2-py3-none-any.whl.metadata (3.5 kB)
Collecting keystoneauth1 (from python-chi)
  Using cached keystoneauth1-5.8.0-py3-none-any.whl.metadata (4.1 kB)
Collecting openstacksdk (from python-chi)
  Using cached openstacksdk-4.1.0-py3-none-any.whl.metadata (12 kB)
Collecting paramiko (from python-chi)
  Using cached paramiko-3.5.0-py3-none-any.whl.metadata (4.4 kB)
Collecting python-cinderclient (from python-chi)
  Using cached python_cinderclient-9.6.0-py3-none-any.whl.metadata (19 kB)
Collecting python-glanceclient (from python-chi)
  Using cached python_glanceclient-4.7.0-py3-none-any.whl.metadata (3.9 kB)
Collecting python-ironicclient (from python-chi)
  Using cached python_ironicclient-5.8.0-py3-

  error: subprocess-exited-with-error
  
  × Building wheel for netifaces (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [5 lines of output]
      running bdist_wheel
      running build
      running build_ext
      building 'netifaces' extension
      error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for netifaces
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (netifaces)


Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


ERROR: Could not find a version that satisfies the requirement tensorflow (from versions: none)
ERROR: No matching distribution found for tensorflow


ModuleNotFoundError: No module named 'chi'

## Create Lease

In [2]:
import os
import keystoneauth1, blazarclient
from chi import lease

reservations = []
lease_node_type = "compute_cascadelake_r"

try:
    print("Creating lease...")
    lease.add_fip_reservation(reservations, count=1)
    lease.add_node_reservation(reservations, node_type=lease_node_type, count=1)

    start_date, end_date = lease.lease_duration(hours=3)

    l = lease.create_lease(
        f"{os.getenv('USER')}-benchmark", 
        reservations, 
        start_date=start_date, 
        end_date=end_date
    )
    lease_id = l["id"]

    print("Waiting for lease to start ...")
    lease.wait_for_active(lease_id)
    print("Lease is now active!")
except keystoneauth1.exceptions.http.Unauthorized as e:
    print("Unauthorized.\nDid set your project name and and run the code in the first cell?")
except blazarclient.exception.BlazarClientException as e:
    print(f"There is an issue making the reservation. Check the calendar to make sure a {lease_node_type} node is available.")
    print("https://chi.uc.chameleoncloud.org/project/leases/calendar/host/")
    print(e)
except Exception as e:
    print("An unexpected error happened.")
    print(e)

Creating lease...


error: not enough resources available with query {'resource_type': 'physical:host', 'resource_properties': '["==", "$node_type", "compute_cascadelake_r"]', 'hypervisor_properties': '', 'min': 1, 'max': 1, 'start_date': datetime.datetime(2024, 10, 23, 17, 47), 'end_date': datetime.datetime(2024, 10, 24, 20, 46), 'project_id': 'd14a518d45cf494eb37b2de09a791a23', 'count_range': '1-1', 'before_end': 'default', 'on_start': 'default'}


An unexpected error happened.
'NoneType' object is not subscriptable


## Provision Node

In [3]:
from chi import server
import os
import threading

image = "CC-Ubuntu20.04"

# Function to wait for server to be active
def wait_for_server_active(server_id):
    print("Waiting for server to start ...")
    server.wait_for_active(server_id)
    print("Done")

# Create server
s = server.create_server(
    f"{os.getenv('USER')}-benchmark",
    image_name=image,
    reservation_id=lease.get_node_reservation(lease_id)
)

# Start a thread to wait for the server to be active
wait_thread = threading.Thread(target=wait_for_server_active, args=(s.id,))
wait_thread.start()

# Continue with other initializations if needed

# Wait for the server to be active before proceeding
wait_thread.join()


NameError: name 'lease_id' is not defined

## Associate Floating-IP

In [None]:
import time
import socket

floating_ip = lease.get_reserved_floating_ips(lease_id)[0]
server.associate_floating_ip(s.id, floating_ip_address=floating_ip)
print(f"Waiting for SSH connectivity on {floating_ip} ...")

# Timeout in seconds
timeout = 60 * 2
interval = 10  # Interval to wait between retries
start_time = time.perf_counter()

while True:
    try:
        with socket.create_connection((floating_ip, 22), timeout=10):  # Shorter connection timeout for each try
            print("Connection successful")
            break
    except (OSError, socket.timeout) as ex:
        if time.perf_counter() - start_time >= timeout:
            print(f"After {timeout} seconds, could not connect via SSH. Please try again.")
            break
        print("Retrying connection...")
        time.sleep(interval)



## Configure Instance

In [None]:
from chi import ssh

with ssh.Remote(floating_ip) as conn:
    # test
    conn.run("ls")

---

### Second Part: Preparing the Environment and Data for the Experiments

This step includes:

1. **Download Data from the Repository**  
   Retrieve the data from my repository, noting there are two clusters from 25-cluster Perseus.

2. **Upload All Necessary Files to the Server**  
   Transfer the experiment scripts, datasets, and any other required files to the server.

3. **Uncompress the Necessary Datasets**  
   Extract the datasets to the appropriate directories for use in the experiments.

4. **Install the Dependencies**  
   Install all required libraries and tools, typically using package managers like `pip` for Python libraries.

---
### Note

Due to the memory limitations of Trovi, they had only provided data for two clusters from the Perseus dataset. If you are interested in the performance of all clusters, please refer to the provided repository. Their repository includes comprehensive test results and heatmaps.

- The **scripts** directory contains all the source code for the algorithms.
- The **index** directory contains index files that map each script to its respective cluster data.
- The **requirements.txt** file lists all the dependencies needed for the project.

---

## Preparing the Experiment

In [None]:
!git clone https://github.com/songxikang/data.git

In [None]:
with ssh.Remote(floating_ip) as conn:
    # Create data, index, and scripts directories
    conn.run("mkdir -p data")
    conn.run("mkdir -p index")
    conn.run("mkdir -p scripts")
    
    # Upload Perseus to the data directory
    conn.put("data/cluster_A.tar.gz", "data/cluster_A.tar.gz")
    conn.put("data/cluster_B.tar.gz", "data/cluster_B.tar.gz")
    conn.put("index/slow_drive_info.csv", "data/slow_drive_info.csv")
    
    # Suppress the output of the following commands
    conn.run("tar -xvzf data/cluster_A.tar.gz -C data > /dev/null 2>&1 && rm data/cluster_A.tar.gz > /dev/null 2>&1")
    conn.run("tar -xvzf data/cluster_B.tar.gz -C data > /dev/null 2>&1 && rm data/cluster_B.tar.gz > /dev/null 2>&1")
    
    # Upload our FSA
    conn.put("scripts/csr.py", "scripts/csr.py")
    conn.put("scripts/multi_pred.py", "scripts/multi_pred.py")
    conn.put("scripts/lstm.py", "scripts/lstm.py")
    conn.put("scripts/patchTST.py", "scripts/patchTST.py")
    conn.put("scripts/xgboost.py", "scripts/xgboost.py")
    
    # Upload index files
    conn.put("index/A_index.csv", "index/A_index.csv")
    conn.put("index/B_index.csv", "index/B_index.csv")
    conn.put("index/all_drive_info.csv", "index/all_drive_info.csv")
    
    # Install dependancies
    conn.put("requirements.txt")
    conn.sudo("apt-get install -y python3-pip")
    conn.run("pip install -r requirements.txt")


---

### Third Part: Running the Experiments

This step includes:

1. **Upload the `run_experiments.sh` Script**
   - Transfer the `run_experiments.sh` script to the server.

2. **Run the `run_experiments.sh` Script**
   - Execute the script to run all the FSA (Fail-Slow Anomaly Detection) algorithms.
   - The script will generate the prediction results.

3. **Compress the Output**
   - The script will compress the output directory into `output.tar.gz`.

4. **Download the Results**
   - Download the `output.tar.gz` file to obtain the prediction results to our local directory.

---

In [None]:
with ssh.Remote(floating_ip) as conn:
    # Upload the script
    conn.put("run_experiments.sh")
    # Run the script 
    conn.run("bash run_experiments.sh")

In [None]:
import tarfile

with ssh.Remote(floating_ip) as conn:
    # Download the output
    conn.get("output.tar.gz")
with tarfile.open("output.tar.gz") as tar:
    # Extract the results to our notebook
    tar.extractall()
print("done")

#### Fourth Part: Parsing the Results

This step includes:

1. **Parse the Results in the Output Directory**
   - Uncompress the `output.tar.gz` file to access the results. ( done in previous step)

2. **Analyze the Results**
   - Open the `result_parser.ipynb` notebook to see all the analysis.
   - The notebook contains detailed analysis and visualizations of the prediction results.
