# Part of experiment for BrewER: Entity Resolution On-Demand

# 1. Environment Configuration

To reproduce the **BrewER** experiment, we first need to reserve a dedicated physical node on the Chameleon testbed.

We will use the `compute_cascadelake_r` node type (Intel Xeon Gold 6240R), which provides sufficient compute power for the entity resolution tasks described in the paper.

**Note:** Ensure you have uploaded your keypair file (e.g., `my_key.pem`) to the Jupyter environment before running this cell.

In [1]:
import os
import chi

# === CONFIGURATION ===
# Hardware: Intel Xeon Gold 6240R (Cascade Lake Refresh)
NODE_TYPE = "compute_cascadelake_r" 
LEASE_NAME = "brewer_reproduction_lease"
SERVER_NAME = "brewer_experiment_node"

# Update these to match your specific Chameleon keypair
KEY_NAME = "my_key"
KEY_FILE = "my_key.pem"

# Network config (sharednet1 provides internet access)
NETWORK_NAME = "sharednet1"
IMAGE_NAME = "CC-Ubuntu20.04"

# Set the Chameleon site (CHI@UC is standard for this hardware)
chi.use_site("CHI@UC")

print("=== Configuration Check ===")

Now using CHI@UC:
URL: https://chi.uc.chameleoncloud.org
Location: Argonne National Laboratory, Lemont, Illinois, USA
Support contact: help@chameleoncloud.org
=== Configuration Check ===


# 2. Create Lease (Resource Reservation)

We will now request a **Lease** from Blazar. This reserves:
1.  **1 Physical Host:** Dedicated bare metal hardware.
2.  **1 Floating IP:** To allow public SSH access to the node.

The lease is set for 1 day to ensure sufficient time to run the `survey_sample` and `sqlite` experiments.

In [2]:
import time
from datetime import datetime, timedelta
from dateutil import tz
import chi.lease

print("=== Initiating Lease Request ===")

# 1. Define Lease Duration (1 Day)
BLAZAR_TIME_FORMAT = "%Y-%m-%d %H:%M"
# Start 1 minute in the future to allow for processing time
start_date = (datetime.now(tz=tz.tzutc()) + timedelta(minutes=1)).strftime(BLAZAR_TIME_FORMAT)
end_date = (datetime.now(tz=tz.tzutc()) + timedelta(days=1)).strftime(BLAZAR_TIME_FORMAT)

print(f"Start Time: {start_date} (UTC)")
print(f"End Time:   {end_date} (UTC)")

# 2. Build Reservation List
reservation_list = []
chi.lease.add_node_reservation(reservation_list, count=1, node_type=NODE_TYPE)
chi.lease.add_fip_reservation(reservation_list, count=1)

# 3. Submit Lease Request
try:
    lease = chi.lease.create_lease(
        LEASE_NAME,
        start_date=start_date,
        end_date=end_date,
        reservations=reservation_list
    )
    lease_id = lease['id']
    print(f"Lease submitted successfully. ID: {lease_id}")
except Exception as e:
    print(f"Failed to submit lease: {e}")
    raise

# 4. Wait for Lease to become ACTIVE
def wait_for_lease_active(lease_id, timeout=600):
    """
    Polls the lease status until it is ACTIVE or times out.
    """
    print(f"\nWaiting for Lease {lease_id} to start...")
    start_time = time.time()
    
    while time.time() - start_time < timeout:
        try:
            # Get current lease details
            current_lease = chi.lease.get_lease(lease_id)
            
            # Handle response type (Dict vs Object)
            if isinstance(current_lease, dict):
                status = current_lease.get('status')
            else:
                status = getattr(current_lease, 'status', None)
            
            if status == 'ACTIVE':
                print(f"Lease is now ACTIVE! (Time elapsed: {int(time.time() - start_time)}s)")
                return True
            elif status == 'ERROR':
                raise Exception("Lease failed to create (Status: ERROR)")
            else:
                pass 
                
        except Exception as e:
            print(f"Warning during polling: {e}")
        
        time.sleep(10)
        
    raise TimeoutError(f"Timed out waiting for Lease {lease_id}")

# Run the wait function
wait_for_lease_active(lease_id)

The python binding code in neutronclient is deprecated in favor of OpenstackSDK, please use that as this will be removed in a future release.


=== Initiating Lease Request ===
Start Time: 2026-02-08 03:41 (UTC)
End Time:   2026-02-09 03:40 (UTC)
Lease submitted successfully. ID: 3dd65098-78b1-49d7-993f-9bffd5096339

Waiting for Lease 3dd65098-78b1-49d7-993f-9bffd5096339 to start...
Lease is now ACTIVE! (Time elapsed: 61s)


True

# 3. Retrieve Reservation IDs

A lease consists of multiple reservations (one for the compute node, one for the IP). We need to extract the specific **Reservation IDs** to tell Chameleon exactly where to launch our instance in the next step.

In [3]:
# Extract the specific reservation IDs from the lease object
compute_reservation_id = [
    r['id'] for r in lease['reservations'] 
    if r['resource_type'] == 'physical:host'
][0]

floatingip_reservation_id = [
    r['id'] for r in lease['reservations'] 
    if r['resource_type'] == 'virtual:floatingip'
][0]

print("=== Reservation Details ===")
print(f"Compute Reservation ID:     {compute_reservation_id}")
print(f"Floating IP Reservation ID: {floatingip_reservation_id}")
print("\nReady to launch server.")

=== Reservation Details ===
Compute Reservation ID:     b9c39b55-a497-43e2-9d71-bcdddaaab2d6
Floating IP Reservation ID: a580a52c-b339-40bf-9d4c-066ebb564eb1

Ready to launch server.


# 4. Launch Bare Metal Server

Now we provision the physical server using the `compute_reservation_id` we secured earlier. 

We are using the `CC-Ubuntu20.04` image, which provides a stable environment for the Python and database dependencies required by BrewER.

In [4]:
import chi.server

print("=== Launching Bare Metal Server ===")
print(f"Target Reservation: {compute_reservation_id}")

# Create the server bound to our specific reservation
server = chi.server.create_server(
    SERVER_NAME,
    reservation_id=compute_reservation_id,
    network_name=NETWORK_NAME,
    image_name=IMAGE_NAME,
    key_name=KEY_NAME,
    count=1
)

server_id = server.id
print(f"Server provisioning request sent.")
print(f"Server ID: {server_id}")

The python binding code in neutronclient is deprecated in favor of OpenstackSDK, please use that as this will be removed in a future release.


=== Launching Bare Metal Server ===
Target Reservation: b9c39b55-a497-43e2-9d71-bcdddaaab2d6
Server provisioning request sent.
Server ID: 8a131159-ffa1-44f5-8d66-6a846513116b


# 5. Wait for Active Status & Assign IP

We must wait for the physical hardware to finish provisioning. Once the server status is `ACTIVE`, we associate the **Floating IP**. 

This IP address is crucial as it acts as our bridge between this Jupyter notebook and the isolated bare metal node.

In [5]:
# 1. Wait for the server to provision
print(f"Waiting for Server {server_id} to become ACTIVE...")
chi.server.wait_for_active(server_id)
print("Server hardware is active.")

# 2. Attach Floating IP
print(f"Associating Floating IP...")

# WORKAROUND: We call associate_floating_ip with ONLY the server_id.
# This automatically finds the free IP reserved by your lease and attaches it.
# We do not pass the reservation ID to avoid the library bug.
floating_ip = chi.server.associate_floating_ip(server_id)

print(f"Instance is reachable at: {floating_ip}")

Waiting for Server 8a131159-ffa1-44f5-8d66-6a846513116b to become ACTIVE...


The python binding code in neutronclient is deprecated in favor of OpenstackSDK, please use that as this will be removed in a future release.


Server hardware is active.
Associating Floating IP...
Instance is reachable at: 192.5.86.206


# 6. Secure Private Key

SSH clients require private keys to have strict file permissions. We set the key file to `600` (read/write only by owner) to prevent connection errors.

In [6]:
# Secure the key file permissions
!chmod 600 {KEY_FILE}

print(f"Permissions secured for {KEY_FILE}")

Permissions secured for my_key.pem


# 7. Define Remote Execution Helper

To reproduce the BrewER experiments, we need to run commands on the remote server but view the output here in Jupyter.

We define `run_remote`, a helper function that streams standard output (stdout) in real-time. This allows us to monitor long-running processes (like the `sqlite` database build) without staring at a frozen screen.

In [7]:
import subprocess

def run_remote(command, description=None):
    """
    Runs a shell command on the remote instance via SSH.
    Streams output in real-time so we can watch experiment progress.
    """
    if description:
        print(f"\nExample Step: {description}...")
        print("-" * 40)
    
    ssh_cmd = [
        "ssh", "-i", KEY_FILE,
        "-o", "StrictHostKeyChecking=no",
        "-o", "UserKnownHostsFile=/dev/null",
        f"cc@{floating_ip}",
        command
    ]
    
    # Use Popen to stream output line-by-line
    process = subprocess.Popen(
        ssh_cmd, 
        stdout=subprocess.PIPE, 
        stderr=subprocess.STDOUT, 
        text=True, 
        bufsize=1
    )
    
    # Print output as it arrives
    for line in iter(process.stdout.readline, ''):
        print(line, end='') 
    
    process.stdout.close()
    return_code = process.wait()
    
    if return_code != 0:
        print(f"\nError in step: {description}")
    else:
        print(f"\nSuccess: {description}")

print("SSH remote execution function defined.")

SSH remote execution function defined.


# 8. Wait for SSH Readiness

Although the server is "Active" (hardware on), the Operating System (Ubuntu) takes time to boot and start the SSH daemon. 

We loop and retry the connection until the server is ready to accept commands.

In [8]:
import time 

# === Clear Old Host Keys ===
# This prevents the "Remote Host Identification Has Changed" error
# by removing the IP from your known_hosts file before connecting.
print(f"Clearing any stale keys for {floating_ip}...")
subprocess.run(
    ["ssh-keygen", "-R", floating_ip], 
    stdout=subprocess.DEVNULL, 
    stderr=subprocess.DEVNULL
)

def wait_for_ssh(ip, key_file, timeout=600):
    """
    Loops and retries SSH connection until the server is genuinely ready.
    """
    print(f"Attempting SSH connection to {ip}...")
    start_time = time.time()
    
    while time.time() - start_time < timeout:
        try:
            # Attempt a simple echo command with a short timeout
            result = subprocess.run(
                [
                    "ssh", "-i", key_file,
                    "-o", "StrictHostKeyChecking=no",
                    "-o", "UserKnownHostsFile=/dev/null",
                    "-o", "ConnectTimeout=5", 
                    f"cc@{ip}",
                    "echo 'ready'"
                ],
                capture_output=True,
                text=True
            )
            
            # Check for success
            if result.returncode == 0 and "ready" in result.stdout:
                print(f"\n SSH is UP! Connected to {ip}")
                return True
            
        except Exception:
            pass # Ignore transient errors during boot
        
        # Wait before retrying
        time.sleep(10)
        print(".", end="", flush=True)

    raise TimeoutError(f"Timed out. Server {ip} is not reachable via SSH.")

# === EXECUTE ===
wait_for_ssh(floating_ip, KEY_FILE)

Clearing any stale keys for 192.5.86.206...
Attempting SSH connection to 192.5.86.206...
................
 SSH is UP! Connected to 192.5.86.206


True

# 9. Install Dependencies & Clone Repository

We need to prepare the Ubuntu environment. This involves:
1.  **System Tools:** Installing `git-lfs` (needed for the large datasets) and `pip`.
2.  **Cloning Pollock:** We pull the specific `survey_sample` dataset using LFS to save bandwidth.
3.  **Python Libraries:** Installing `requirements.txt` and applying a specific fix for the `regex` library to resolve version conflicts with `dateparser`.

In [9]:
print("=== Setting Up Environment ===")

# 1. System Dependencies
run_remote(
    "sudo apt-get update && sudo apt-get install -y git git-lfs python3-pip",
    description="Installing system tools (git-lfs, pip)"
)

# 2. Cleanup & Clone
run_remote("rm -rf Pollock", description="Removing old repository versions")
run_remote(
    "git clone https://github.com/HPI-Information-Systems/Pollock.git",
    description="Cloning Pollock Repository"
)

# 3. Pull Data (LFS)
run_remote("cd Pollock && git lfs install")
run_remote(
    "cd Pollock && git lfs pull -I 'survey_sample/'",
    description="Downloading 'survey_sample' dataset via LFS"
)

# 4. Python Dependencies
# Fix for regex version conflict (required for dateparser < 2022.3.15)
run_remote(
    "cd Pollock && pip3 install -r requirements.txt && python3 -m pip install 'regex==2022.1.18'",
    description="Installing Python requirements"
)

=== Setting Up Environment ===

Example Step: Installing system tools (git-lfs, pip)...
----------------------------------------
Hit:1 http://nova.clouds.archive.ubuntu.com/ubuntu focal InRelease
Get:2 http://nova.clouds.archive.ubuntu.com/ubuntu focal-updates InRelease [128 kB]
Get:3 http://security.ubuntu.com/ubuntu focal-security InRelease [128 kB]
Get:4 http://nova.clouds.archive.ubuntu.com/ubuntu focal-backports InRelease [128 kB]
Get:5 http://nova.clouds.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [3957 kB]
Get:6 http://nova.clouds.archive.ubuntu.com/ubuntu focal-updates/main amd64 c-n-f Metadata [18.0 kB]
Get:7 http://nova.clouds.archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [3922 kB]
Get:8 http://nova.clouds.archive.ubuntu.com/ubuntu focal-updates/restricted amd64 c-n-f Metadata [604 B]
Get:9 http://nova.clouds.archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1262 kB]
Get:10 http://nova.clouds.archive.ubuntu.com/ubuntu focal-u

# 10. Upgrade SQLite (Critical)

The default SQLite version on Ubuntu 20.04 is too old to support some of the `.import --skip` flags used by the reproduction scripts.

We download, compile, and install **SQLite 3.45** from source to ensure the experiment runs without syntax errors.

In [10]:
print("=== Upgrading SQLite ===")

# Commands to compile SQLite 3.45 from source
upgrade_cmds = [
    "wget -q https://www.sqlite.org/2024/sqlite-autoconf-3450100.tar.gz",
    "tar -xzf sqlite-autoconf-3450100.tar.gz",
    "cd sqlite-autoconf-3450100 && ./configure --quiet && make -j$(nproc) && sudo make install",
    "sudo ldconfig",            # Update shared library cache
    "sqlite3 --version"         # Verify version
]

run_remote(
    " && ".join(upgrade_cmds), 
    description="Compiling and Installing SQLite 3.45"
)

=== Upgrading SQLite ===

Example Step: Compiling and Installing SQLite 3.45...
----------------------------------------
gcc -DPACKAGE_NAME=\"sqlite\" -DPACKAGE_TARNAME=\"sqlite\" -DPACKAGE_VERSION=\"3.45.1\" -DPACKAGE_STRING=\"sqlite\ 3.45.1\" -DPACKAGE_BUGREPORT=\"http://www.sqlite.org\" -DPACKAGE_URL=\"\" -DPACKAGE=\"sqlite\" -DVERSION=\"3.45.1\" -DHAVE_STDIO_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_STRINGS_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_UNISTD_H=1 -DSTDC_HEADERS=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_FDATASYNC=1 -DHAVE_USLEEP=1 -DHAVE_LOCALTIME_R=1 -DHAVE_GMTIME_R=1 -DHAVE_DECL_STRERROR_R=1 -DHAVE_STRERROR_R=1 -DHAVE_POSIX_FALLOCATE=1 -DHAVE_ZLIB_H=1 -I.    -D_REENTRANT=1 -DSQLITE_THREADSAFE=1 -DSQLITE_ENABLE_MATH_FUNCTIONS -DSQLITE_ENABLE_FTS4 -DSQLITE_ENABLE_FTS5 -DSQLITE_ENABLE_RTREE -DSQLITE_ENABLE_GEOPOLY -DSQLITE_HAVE_ZLIB  -DSQLITE_ENABLE_EXPLAIN_COMMENTS -DSQLITE_DQS=0 -DSQLITE_ENABLE_DBPAGE_VTAB -DS

# 11. Clean & Prepare Workspace

To ensure a fair reproduction, we delete any pre-calculated results that came with the repository and recreate the output directory structure.

In [11]:
print("=== Preparing Workspace ===")

# 1. Remove existing results to prove we are running it fresh
run_remote("cd Pollock && rm -rf results/*", description="Cleaning old results")

# 2. Create the specific output directory for SQLite results
run_remote(
    "mkdir -p Pollock/results/sqlite/survey_sample/loading", 
    description="Creating output directory structure"
)

=== Preparing Workspace ===

Example Step: Cleaning old results...
----------------------------------------

Success: Cleaning old results

Example Step: Creating output directory structure...
----------------------------------------

Success: Creating output directory structure


# 12. Patch & Run BrewER (SQLite)

**Critical Fix:** The original code contains hardcoded absolute paths (e.g., `/results`) designed for a specific Docker container structure. Since we are running on a standard Ubuntu environment, these paths would fail. 

We use `sed` to patch the Python files in place, converting absolute paths to relative paths.

Once patched, we execute the **SQLite** method using the specific `sut` script, ensuring the `PYTHONPATH` includes the current directory so Python can find the local modules.

In [12]:
print("=== Patching & Running Experiment ===")

# 1. Patch Hardcoded Paths (Docker Fix)
print("Applying patches to fix Docker-specific paths...")
# Remove leading slashes to make paths relative
run_remote("sed -i \"s|f'/{dataset}|f'{dataset}|g\" Pollock/sut/sqlite/sqlite.py")
run_remote("sed -i \"s|f'/results|f'results|g\" Pollock/sut/sqlite/sqlite.py")
print("Patches applied.")

# 2. Run Experiment
print("\nExecuting SQLite experiment on 'survey_sample'...")

# We export PYTHONPATH so the script can find 'pollock' modules in the current dir
experiment_cmd = (
    "cd Pollock && "
    "export DATASET=survey_sample && "
    "export PYTHONPATH=$PYTHONPATH:.:$(pwd)/sut && "
    "python3 sut/sqlite/sqlite.py "
    "--dataset survey_sample "
    "--db_type sqlite " 
    "--run_type experiment"
)

run_remote(
    experiment_cmd, 
    description="Running SQLite Experiment"
)

=== Patching & Running Experiment ===
Applying patches to fix Docker-specific paths...

Success: None

Success: None
Patches applied.

Executing SQLite experiment on 'survey_sample'...

Example Step: Running SQLite Experiment...
----------------------------------------
Columns renamed during .import /home/cc/Pollock/survey_sample/csv/Takakai2008-ch4.csv due to duplicates:
"?" to "?_2",
"?" to "?_4"
Columns renamed during .import /home/cc/Pollock/survey_sample/csv/Takakai2008-ch4.csv due to duplicates:
"?" to "?_2",
"?" to "?_4"
Columns renamed during .import /home/cc/Pollock/survey_sample/csv/Takakai2008-ch4.csv due to duplicates:
"?" to "?_2",
"?" to "?_4"
Columns renamed during .import /home/cc/Pollock/survey_sample/csv/300mAThree.CSV due to duplicates:
"0" to "0_1",
"0" to "0_2"
Columns renamed during .import /home/cc/Pollock/survey_sample/csv/300mAThree.CSV due to duplicates:
"0" to "0_1",
"0" to "0_2"
Columns renamed during .import /home/cc/Pollock/survey_sample/csv/300mAThree.CSV

# 13. Verify Outputs

Before running the full evaluation metrics, we perform a quick sanity check to list the generated output files. We expect to see `_converted.csv` files in the results folder.

In [13]:
print("=== Verifying Output Generation ===")

run_remote(
    "ls -lh Pollock/results/sqlite/survey_sample/loading/ | head -n 5", 
    description="Listing top 5 generated result files"
)

=== Verifying Output Generation ===

Example Step: Listing top 5 generated result files...
----------------------------------------
total 105M
-rw-rw-r-- 1 cc cc   33K Feb  8 03:56 0Al-Sn.CSV_converted.csv
-rw-rw-r-- 1 cc cc   47K Feb  8 03:56 1% SiO2_003.csv_converted.csv
-rw-rw-r-- 1 cc cc   50K Feb  8 03:56 1% nano+20% micro.csv_converted.csv
-rw-rw-r-- 1 cc cc  8.3K Feb  8 03:56 10.January_2019.csv_converted.csv

Success: Listing top 5 generated result files


# 14. Parallel Evaluation

**Note on Evaluation Strategy:** The original repository includes a complex evaluation harness designed to benchmark 17 different systems sequentially. For this reproduction, we focus solely on **SQLite**.

To ensure **robustness** and avoid the overhead of the full harness, we use a custom evaluation script below. 
* It calculates **Precision, Recall, and F1** for Headers, Records, and Cells.
* **Math Validity:** It uses the *exact same* library functions (`pollock.metrics`) as the original paper, ensuring the math is identical.
* **Efficiency:** It runs in parallel to process the `survey_sample` files quickly.

In [14]:
# === SILENT PARALLEL EVALUATION ===
print("=== Running Evaluation Metrics ===")

# This script redirects worker output to /dev/null to prevent SSH deadlocks
silent_eval_script = r'''
import os
import sys
import pandas as pd
import pollock.metrics as metrics
from concurrent.futures import ProcessPoolExecutor, as_completed, TimeoutError
import contextlib

# Configuration
DATASET = "survey_sample"
SUT = "sqlite"
CLEAN_DIR = f"{DATASET}/clean"
LOADED_DIR = f"results/{SUT}/{DATASET}/loading"

def evaluate_single_file_silent(filename):
    """Calculates F1 scores for a single file, silencing stdout/stderr."""
    with open(os.devnull, "w") as devnull:
        with contextlib.redirect_stdout(devnull), contextlib.redirect_stderr(devnull):
            clean_path = os.path.join(CLEAN_DIR, filename)
            loaded_path = os.path.join(LOADED_DIR, filename + "_converted.csv")
            
            row = {"file": filename, "success": 0, "header_f1": 0, "record_f1": 0, "cell_f1": 0}
            try:
                if os.path.exists(loaded_path) and os.path.getsize(loaded_path) > 0:
                    if metrics.successful_csv(loaded_path):
                        row["success"] = 1
                        # Exact same math as the original paper:
                        res = metrics.header_record_cell_measures_csv(clean_path, loaded_path, 1)
                        row["header_f1"] = res[2]
                        row["record_f1"] = res[5]
                        row["cell_f1"] = res[8]
            except:
                pass
            return row

if __name__ == "__main__":
    print(f"--- EVALUATING {SUT.upper()} ---", flush=True)

    if not os.path.exists(LOADED_DIR):
        print("Error: No results directory found!")
        exit(1)

    files = [f for f in os.listdir(CLEAN_DIR) if f.endswith(".csv")]
    
    # Use CPU count - 2 to prevent freezing the server
    safe_cores = max(1, (os.cpu_count() or 1) - 2)
    print(f"Processing {len(files)} files using {safe_cores} workers...", flush=True)
    
    results = []
    with ProcessPoolExecutor(max_workers=safe_cores) as executor:
        future_to_file = {executor.submit(evaluate_single_file_silent, f): f for f in files}
        
        try:
            # 60s timeout per file batch
            for i, future in enumerate(as_completed(future_to_file, timeout=60)):
                res = future.result()
                results.append(res)
                if i % 1 == 0: 
                    print(f"[{i+1}/{len(files)}] Processed...", flush=True)
                    
        except TimeoutError:
            print("\nTIMEOUT: Some workers took too long. Calculating partial results.")
            for fut in future_to_file: fut.cancel()

    if not results:
        print("Fatal Error: No files finished successfully.")
        exit(1)

    df = pd.DataFrame(results)

    # VIDEO-FRIENDLY OUTPUT FORMAT
    print("\n" + "="*40)
    print(f" FINAL SCORES: {SUT} on {DATASET}")
    print("="*40)
    print(f"Success Rate:  {df['success'].mean():.2%}")
    print(f"Header F1:     {df['header_f1'].mean():.4f}")
    print(f"Record F1:     {df['record_f1'].mean():.4f}")
    print(f"Cell F1:       {df['cell_f1'].mean():.4f}")
    print("="*40 + "\n")
'''

# 1. Write the script to a local file
with open("silent_eval.py", "w") as f:
    f.write(silent_eval_script)

# 2. Pipe the script to the remote server and execute it
# We use 'cat' to pipe the file content into python3 on the remote end
cmd = f"cat silent_eval.py | ssh -i {KEY_FILE} -o StrictHostKeyChecking=no cc@{floating_ip} 'cd Pollock && python3 -'"

# 3. Run using subprocess
print("Sending evaluation script to remote server...")
subprocess.run(cmd, shell=True)

=== Running Evaluation Metrics ===
Sending evaluation script to remote server...




--- EVALUATING SQLITE ---
Processing 97 files using 94 workers...
[1/97] Processed...
[2/97] Processed...
[3/97] Processed...
[4/97] Processed...
[5/97] Processed...
[6/97] Processed...
[7/97] Processed...
[8/97] Processed...
[9/97] Processed...
[10/97] Processed...
[11/97] Processed...
[12/97] Processed...
[13/97] Processed...
[14/97] Processed...
[15/97] Processed...
[16/97] Processed...
[17/97] Processed...
[18/97] Processed...
[19/97] Processed...
[20/97] Processed...
[21/97] Processed...
[22/97] Processed...
[23/97] Processed...
[24/97] Processed...
[25/97] Processed...
[26/97] Processed...
[27/97] Processed...
[28/97] Processed...
[29/97] Processed...
[30/97] Processed...
[31/97] Processed...
[32/97] Processed...
[33/97] Processed...
[34/97] Processed...
[35/97] Processed...
[36/97] Processed...
[37/97] Processed...
[38/97] Processed...
[39/97] Processed...
[40/97] Processed...
[41/97] Processed...
[42/97] Processed...
[43/97] Processed...
[44/97] Processed...
[45/97] Processed..

CompletedProcess(args="cat silent_eval.py | ssh -i my_key.pem -o StrictHostKeyChecking=no cc@192.5.86.206 'cd Pollock && python3 -'", returncode=0)

# 15. Clean Up Chameleon Resources
After completing the experiment, all Chameleon Cloud resources must be explicitly released to stop consuming allocation hours and free hardware.

This cleanup process properly terminates the experiment infrastructure:
- Deletes the server instance: removes the running compute node and releases the GPU
- Waits for deletion confirmation: polls server status to ensure complete removal before proceeding
- Deletes the lease reservation: frees the node reservation and automatically releases the floating IP
- Stops allocation consumption: immediately stops consuming Service Units (SUs) from your project quota


In [15]:
# Step 1: Delete the Server Instance
print("\n[1/2] Deleting Server Instance...")
chi.server.delete_server(server_id)
print(f"Server deletion requested for ID: {server_id}")

# Wait for server to be fully deleted
timeout = 300
start_time = time.time()
while time.time() - start_time < timeout:
    try:
        chi.server.get_server(server_id)
        time.sleep(5)
    except Exception:
        print("\nServer successfully deleted")
        break

# Step 2: Delete the Lease
print("\n[2/2] Deleting Lease and Freeing Resources...")
chi.lease.delete_lease(lease_id)
print(f"Lease deleted: {LEASE_NAME}")
print("All resources released")


[1/2] Deleting Server Instance...
Server deletion requested for ID: 8a131159-ffa1-44f5-8d66-6a846513116b

Server successfully deleted

[2/2] Deleting Lease and Freeing Resources...
Deleted lease 3dd65098-78b1-49d7-993f-9bffd5096339
Lease deleted: brewer_reproduction_lease
All resources released
