# Cloud Computing Lab: Practical Exercises

* * * 

<div class="alert alert-success">  
    
### Learning Objectives 
    
* Simulate cloud computing concepts locally
* Practice with cloud APIs and tools
* Run language models efficiently
* Estimate and optimize costs
* Work with remote computing patterns

</div>

### Lab Sections
1. [Environment Setup](#setup)
2. [Simulating Cloud Concepts](#simulate)
3. [Cost Estimation](#costs)
4. [Remote Computing Patterns](#remote)
5. [Running Models Efficiently](#models)
6. [Cloud Storage Simulation](#storage)
7. [Performance Benchmarking](#benchmark)

<a id='setup'></a>

# 1. Environment Setup

Let's set up our environment to simulate cloud computing concepts locally.

In [None]:
# Install required packages
!pip install -q psutil pandas matplotlib requests transformers torch

import os
import sys
import time
import json
import psutil
import platform
import subprocess
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
import numpy as np

print("Environment ready!")
print(f"Python: {sys.version}")
print(f"Platform: {platform.platform()}")

## System Information

Understanding your system helps you choose the right cloud instance.
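
One way to turn these numbers into a choice is to pick the cheapest machine type that meets your minimum vCPU and memory needs. The sketch below assumes a small, illustrative catalogue using the same simplified figures as the VM simulator in this lab. `pick_machine_type` and this `MACHINE_TYPES` subset are hypothetical helpers for illustration, not a GCP API.

```python
# Illustrative catalogue (assumption): simplified specs and prices,
# which may not match current GCP rates.
MACHINE_TYPES = {
    "e2-medium": {"vcpus": 1, "memory_gb": 4, "price_hour": 0.027},
    "e2-standard-2": {"vcpus": 2, "memory_gb": 8, "price_hour": 0.067},
    "e2-standard-4": {"vcpus": 4, "memory_gb": 16, "price_hour": 0.134},
    "n2-standard-8": {"vcpus": 8, "memory_gb": 32, "price_hour": 0.388},
}

def pick_machine_type(min_vcpus, min_memory_gb, catalogue=MACHINE_TYPES):
    """Return the cheapest machine type that meets both requirements, or None."""
    candidates = [
        (spec["price_hour"], name)
        for name, spec in catalogue.items()
        if spec["vcpus"] >= min_vcpus and spec["memory_gb"] >= min_memory_gb
    ]
    # min() compares the price first, so the cheapest qualifying type wins
    return min(candidates)[1] if candidates else None

print(pick_machine_type(2, 6))    # e2-standard-2
print(pick_machine_type(4, 20))   # n2-standard-8
print(pick_machine_type(16, 64))  # None: nothing in this catalogue is large enough
```

In practice you would feed in the core count and memory reported by `get_system_info()` for a workload that already runs comfortably on your machine.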

In [None]:
def get_system_info():
    """Get current system specifications"""
    info = {
        "CPU": {
            "cores_physical": psutil.cpu_count(logical=False),
            "cores_logical": psutil.cpu_count(logical=True),
            "frequency_mhz": psutil.cpu_freq().current if psutil.cpu_freq() else "N/A",
            "usage_percent": psutil.cpu_percent(interval=1)
        },
        "Memory": {
            "total_gb": round(psutil.virtual_memory().total / (1024**3), 2),
            "available_gb": round(psutil.virtual_memory().available / (1024**3), 2),
            "used_percent": psutil.virtual_memory().percent
        },
        "Disk": {
            "total_gb": round(psutil.disk_usage('/').total / (1024**3), 2),
            "free_gb": round(psutil.disk_usage('/').free / (1024**3), 2),
            "used_percent": psutil.disk_usage('/').percent
        }
    }
    return info

# Display system info
system_info = get_system_info()
for category, details in system_info.items():
    print(f"\n{category}:")
    for key, value in details.items():
        print(f"  {key}: {value}")

<a id='simulate'></a>

# 2. Simulating Cloud Concepts

Let's simulate key cloud computing concepts locally.

## Virtual Machine Simulation

We'll create a simple VM simulator to understand resource allocation.

In [None]:
class VirtualMachine:
    """Simulate a cloud virtual machine"""
    
    # GCP machine types (simplified; prices are illustrative and may not match current GCP rates)
    MACHINE_TYPES = {
        "e2-micro": {"vcpus": 0.25, "memory_gb": 1, "price_hour": 0.006},
        "e2-small": {"vcpus": 0.5, "memory_gb": 2, "price_hour": 0.012},
        "e2-medium": {"vcpus": 1, "memory_gb": 4, "price_hour": 0.027},
        "e2-standard-2": {"vcpus": 2, "memory_gb": 8, "price_hour": 0.067},
        "e2-standard-4": {"vcpus": 4, "memory_gb": 16, "price_hour": 0.134},
        "n2-standard-8": {"vcpus": 8, "memory_gb": 32, "price_hour": 0.388},
        "a2-highgpu-1g": {"vcpus": 12, "memory_gb": 85, "price_hour": 2.95, "gpu": "A100"}
    }
    
    def __init__(self, name, machine_type, region="us-central1"):
        self.name = name
        self.machine_type = machine_type
        self.region = region
        self.state = "STOPPED"
        self.start_time = None
        self.total_runtime = 0
        
        if machine_type not in self.MACHINE_TYPES:
            raise ValueError(f"Unknown machine type: {machine_type}")
        
        self.specs = self.MACHINE_TYPES[machine_type]
    
    def start(self):
        """Start the VM"""
        if self.state == "RUNNING":
            print(f"VM {self.name} is already running")
            return
        
        self.state = "RUNNING"
        self.start_time = datetime.now()
        print(f"✅ VM {self.name} started")
        print(f"   Type: {self.machine_type}")
        print(f"   vCPUs: {self.specs['vcpus']}, RAM: {self.specs['memory_gb']}GB")
        print(f"   Cost: ${self.specs['price_hour']}/hour")
    
    def stop(self):
        """Stop the VM"""
        if self.state == "STOPPED":
            print(f"VM {self.name} is already stopped")
            return
        
        runtime = (datetime.now() - self.start_time).total_seconds()
        self.total_runtime += runtime
        self.state = "STOPPED"
        
        cost = (runtime / 3600) * self.specs['price_hour']
        print(f"🛑 VM {self.name} stopped")
        print(f"   Runtime: {runtime:.0f} seconds")
        print(f"   Session cost: ${cost:.4f}")
    
    def estimate_cost(self, hours):
        """Estimate cost for running X hours"""
        return hours * self.specs['price_hour']
    
    def __str__(self):
        return f"VM({self.name}, {self.machine_type}, {self.state})"

# Create and test VMs
vm1 = VirtualMachine("test-instance", "e2-micro")
vm2 = VirtualMachine("ml-instance", "e2-standard-4")

print("Created VMs:")
print(f"  {vm1}")
print(f"  {vm2}")

In [None]:
# Simulate VM lifecycle
print("Simulating VM operations...\n")

# Start VM
vm1.start()
print()

# Simulate some work
print("Doing work for 3 seconds...")
time.sleep(3)

# Stop VM
vm1.stop()
print()

# Estimate costs
print("Cost estimates for vm1 (e2-micro):")
for hours in [1, 24, 24*7, 24*30]:
    cost = vm1.estimate_cost(hours)
    period = {1: "1 hour", 24: "1 day", 168: "1 week", 720: "1 month"}[hours]
    print(f"  {period}: ${cost:.2f}")

## Serverless Function Simulation

In [None]:
class CloudFunction:
    """Simulate a serverless cloud function"""
    
    # Pricing: $0.0000004 per invocation + $0.0000025 per GB-second
    PRICE_PER_INVOCATION = 0.0000004
    PRICE_PER_GB_SECOND = 0.0000025
    
    def __init__(self, name, memory_mb=256):
        self.name = name
        self.memory_gb = memory_mb / 1024
        self.invocations = 0
        self.total_runtime = 0
        self.total_cost = 0
    
    def invoke(self, func, *args, **kwargs):
        """Run the function and track costs"""
        start = time.time()
        
        # Run the actual function
        result = func(*args, **kwargs)
        
        # Calculate costs
        runtime = time.time() - start
        invocation_cost = self.PRICE_PER_INVOCATION
        compute_cost = runtime * self.memory_gb * self.PRICE_PER_GB_SECOND
        total_cost = invocation_cost + compute_cost
        
        # Update stats
        self.invocations += 1
        self.total_runtime += runtime
        self.total_cost += total_cost
        
        return result
    
    def get_stats(self):
        """Get usage statistics"""
        return {
            "invocations": self.invocations,
            "total_runtime_s": round(self.total_runtime, 3),
            "avg_runtime_ms": round((self.total_runtime / max(self.invocations, 1)) * 1000, 2),
            "total_cost": f"${self.total_cost:.8f}",
            "cost_per_invocation": f"${self.total_cost / max(self.invocations, 1):.8f}"
        }

# Example serverless function
def process_text(text):
    """Simulate text processing"""
    time.sleep(0.1)  # Simulate processing
    return len(text.split())

# Create and test cloud function
cf = CloudFunction("word-counter", memory_mb=256)

# Simulate multiple invocations
texts = [
    "Cloud computing is transforming research.",
    "Serverless functions scale automatically.",
    "Pay only for what you use with cloud services."
]

print("Running serverless functions...\n")
for text in texts:
    word_count = cf.invoke(process_text, text)
    print(f"Processed: '{text[:30]}...' → {word_count} words")

print("\nFunction Statistics:")
for key, value in cf.get_stats().items():
    print(f"  {key}: {value}")
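
A natural follow-up question is when serverless stops being the cheaper option. The sketch below reuses the illustrative per-invocation and per-GB-second rates from the `CloudFunction` class and compares them with an always-on VM. The function names are hypothetical, and the rates are the lab's simplified figures rather than a current price list.

```python
# Same illustrative rates as the CloudFunction simulator (assumption, not a live price list)
PRICE_PER_INVOCATION = 0.0000004
PRICE_PER_GB_SECOND = 0.0000025

def serverless_monthly_cost(invocations, runtime_s, memory_mb):
    """Cost of a month of invocations with a fixed runtime and memory size."""
    per_call = PRICE_PER_INVOCATION + runtime_s * (memory_mb / 1024) * PRICE_PER_GB_SECOND
    return invocations * per_call

def break_even_invocations(vm_price_hour, runtime_s, memory_mb, hours=720):
    """Monthly invocation count at which serverless costs as much as an always-on VM."""
    vm_cost = vm_price_hour * hours
    per_call = serverless_monthly_cost(1, runtime_s, memory_mb)
    return vm_cost / per_call

million_calls = serverless_monthly_cost(1_000_000, runtime_s=0.1, memory_mb=256)
threshold = break_even_invocations(0.006, runtime_s=0.1, memory_mb=256)  # e2-micro rate

print(f"1M invocations/month: ${million_calls:.4f}")
print(f"Break-even vs always-on e2-micro: {threshold:,.0f} invocations/month")
```

With these assumed rates, a short 256 MB function has to be called millions of times a month before a small always-on VM becomes cheaper.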

<a id='costs'></a>

# 3. Cost Estimation

Understanding cloud costs is crucial for research budgets.

In [None]:
class CloudCostCalculator:
    """Calculate and visualize cloud costs"""
    
    def __init__(self):
        self.costs = []
    
    def add_resource(self, name, resource_type, specs, hours):
        """Add a resource to cost calculation"""
        if resource_type == "compute":
            cost = specs['price_hour'] * hours
        elif resource_type == "storage":
            # $0.02 per GB per month
            cost = specs['size_gb'] * 0.02 * (hours / 720)  # Convert to monthly
        elif resource_type == "network":
            # $0.12 per GB egress
            cost = specs['transfer_gb'] * 0.12
        else:
            cost = 0
        
        self.costs.append({
            'name': name,
            'type': resource_type,
            'hours': hours,
            'cost': cost,
            'specs': specs
        })
    
    def get_total_cost(self):
        """Calculate total cost"""
        return sum(item['cost'] for item in self.costs)
    
    def get_breakdown(self):
        """Get cost breakdown by type"""
        breakdown = {}
        for item in self.costs:
            if item['type'] not in breakdown:
                breakdown[item['type']] = 0
            breakdown[item['type']] += item['cost']
        return breakdown
    
    def visualize(self):
        """Create cost visualization"""
        if not self.costs:
            print("No costs to visualize")
            return
        
        # Prepare data
        df = pd.DataFrame(self.costs)
        
        # Create subplots
        fig, axes = plt.subplots(1, 2, figsize=(12, 5))
        
        # Pie chart by type
        breakdown = self.get_breakdown()
        axes[0].pie(breakdown.values(), labels=breakdown.keys(), autopct='%1.1f%%')
        axes[0].set_title('Cost Breakdown by Type')
        
        # Bar chart by resource
        df_sorted = df.sort_values('cost', ascending=False).head(10)
        axes[1].barh(df_sorted['name'], df_sorted['cost'])
        axes[1].set_xlabel('Cost ($)')
        axes[1].set_title('Top 10 Resources by Cost')
        
        plt.tight_layout()
        plt.show()

# Example: Calculate costs for a research project
calc = CloudCostCalculator()

# Add compute resources
calc.add_resource(
    "development-vm",
    "compute",
    {"price_hour": 0.027},  # e2-medium
    hours=8*5*4  # 8 hours/day, 5 days/week, 4 weeks
)

calc.add_resource(
    "training-vm",
    "compute",
    {"price_hour": 2.95},  # a2-highgpu-1g
    hours=4*3  # 4 hours, 3 times
)

# Add storage
calc.add_resource(
    "dataset-storage",
    "storage",
    {"size_gb": 100},
    hours=720  # 1 month
)

# Add network transfer
calc.add_resource(
    "data-download",
    "network",
    {"transfer_gb": 50},
    hours=1
)

# Display results
print("Project Cost Estimation\n" + "="*40)
for item in calc.costs:
    print(f"{item['name']:<20} ${item['cost']:.2f}")
print("="*40)
print(f"{'TOTAL':<20} ${calc.get_total_cost():.2f}")

# Visualize
calc.visualize()
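
The same arithmetic can be run in reverse: given a budget, how many compute hours does it cover once storage and egress are paid for? This is a minimal sketch using the lab's illustrative prices. `affordable_hours` is a hypothetical helper and is not part of `CloudCostCalculator`.

```python
def affordable_hours(budget_usd, price_hour, fixed_costs_usd=0.0):
    """Hours of compute a budget covers after fixed costs (storage, egress)."""
    remaining = budget_usd - fixed_costs_usd
    if remaining <= 0:
        return 0.0
    return remaining / price_hour

# Fixed costs from the example project: 100 GB stored for one month plus 50 GB of egress
fixed = 100 * 0.02 + 50 * 0.12

gpu_hours = affordable_hours(100, 2.95, fixed)   # a2-highgpu-1g rate used in this lab
dev_hours = affordable_hours(100, 0.027, fixed)  # e2-medium rate used in this lab

print(f"Fixed costs: ${fixed:.2f}")
print(f"GPU hours within a $100 budget: {gpu_hours:.1f}")
print(f"Development VM hours within a $100 budget: {dev_hours:.0f}")
```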

## Cost Optimization Strategies

In [None]:
def compare_instance_strategies():
    """Compare different instance usage strategies"""
    
    strategies = {
        "Always On": {
            "description": "Keep instance running 24/7",
            "hours": 24 * 30,  # Full month
            "instance": "e2-standard-4",
            "price_hour": 0.134
        },
        "Business Hours": {
            "description": "Run only during work hours (9-5, M-F)",
            "hours": 8 * 22,  # 8 hours * 22 work days
            "instance": "e2-standard-4",
            "price_hour": 0.134
        },
        "On Demand": {
            "description": "Start/stop as needed (~2 hrs/day)",
            "hours": 2 * 30,  # 2 hours * 30 days
            "instance": "e2-standard-4",
            "price_hour": 0.134
        },
        "Preemptible": {
            "description": "Use cheaper preemptible instances",
            "hours": 8 * 22,
            "instance": "e2-standard-4 (preemptible)",
            "price_hour": 0.040  # ~70% cheaper
        },
        "Right-sized": {
            "description": "Use smaller instance that fits needs",
            "hours": 24 * 30,
            "instance": "e2-medium",
            "price_hour": 0.027
        }
    }
    
    # Use the "Always On" strategy as the baseline instead of a hard-coded figure
    baseline = strategies["Always On"]["hours"] * strategies["Always On"]["price_hour"]
    
    results = []
    for name, strategy in strategies.items():
        cost = strategy['hours'] * strategy['price_hour']
        results.append({
            'Strategy': name,
            'Description': strategy['description'],
            'Instance': strategy['instance'],
            'Hours': strategy['hours'],
            'Monthly Cost': f"${cost:.2f}",
            'Savings vs Always On': f"{(1 - cost/baseline)*100:.1f}%" if name != "Always On" else "--"
        })
    
    df = pd.DataFrame(results)
    return df

# Compare strategies
comparison = compare_instance_strategies()
print("Cost Optimization Strategies Comparison\n")
print(comparison.to_string(index=False))

# Visualize savings
costs = [float(r['Monthly Cost'].replace('$', '')) for r in comparison.to_dict('records')]
strategies = [r['Strategy'] for r in comparison.to_dict('records')]

plt.figure(figsize=(10, 6))
bars = plt.bar(strategies, costs, color=['red', 'orange', 'yellow', 'lightgreen', 'green'])
plt.ylabel('Monthly Cost ($)')
plt.title('Cloud Cost by Usage Strategy')
plt.xticks(rotation=45)

# Add value labels on bars
for bar, cost in zip(bars, costs):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 1,
             f'${cost:.0f}', ha='center', va='bottom')

plt.tight_layout()
plt.show()
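
Preemptible instances are cheaper because the provider can stop them at any time, so long-running work needs to save its progress. The sketch below shows one common pattern, checkpoint and resume, using a JSON file in a temporary directory. The function name, the checkpoint layout, and the `stop_after` argument (which simulates a preemption) are all assumptions made for illustration.

```python
import json
import os
import tempfile

def run_with_checkpoints(total_steps, checkpoint_path, stop_after=None):
    """Process steps, saving progress so an interrupted run can resume.

    stop_after simulates a preemption after that many steps in this call.
    Returns (state, finished).
    """
    # Resume from the last saved state if a checkpoint exists
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"next_step": 0, "running_sum": 0}

    done_this_call = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and done_this_call >= stop_after:
            return state, False  # "preempted" before finishing
        state["running_sum"] += state["next_step"]  # stand-in for real work
        state["next_step"] += 1
        done_this_call += 1
        # A real job would checkpoint less often and write the file atomically
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)
    return state, True

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.json")
    state, finished = run_with_checkpoints(10, path, stop_after=4)
    print(f"After interruption: next step {state['next_step']}, finished={finished}")
    state, finished = run_with_checkpoints(10, path)
    print(f"After resume: next step {state['next_step']}, sum={state['running_sum']}")
```

The second call picks up at step 4 rather than starting over, which is what keeps a preemption from wasting the hours already paid for.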

<a id='remote'></a>

# 4. Remote Computing Patterns

Practice patterns for working with remote systems.

## SSH Connection Simulator

In [None]:
class SSHSimulator:
    """Simulate SSH connections and commands"""
    
    def __init__(self, hostname, username="user"):
        self.hostname = hostname
        self.username = username
        self.connected = False
        self.command_history = []
    
    def connect(self):
        """Simulate SSH connection"""
        print(f"Connecting to {self.username}@{self.hostname}...")
        time.sleep(1)  # Simulate connection delay
        self.connected = True
        print(f"✅ Connected to {self.hostname}")
        print(f"Welcome to Ubuntu 22.04.3 LTS (GNU/Linux 5.15.0)\n")
        return True
    
    def run_command(self, command):
        """Simulate running a command on remote system"""
        if not self.connected:
            print("Error: Not connected. Run connect() first.")
            return None
        
        self.command_history.append(command)
        print(f"{self.username}@{self.hostname}:~$ {command}")
        
        # Simulate different commands
        responses = {
            "pwd": f"/home/{self.username}",
            "ls": "data/  models/  scripts/  results/",
            "whoami": self.username,
            "hostname": self.hostname,
            "free -h": """              total        used        free      shared  buff/cache   available
Mem:           15Gi       2.1Gi        10Gi       156Mi       3.2Gi        13Gi
Swap:            0B          0B          0B""",
            "df -h": """Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        30G  5.2G   24G  19% /
/dev/sda15      105M  5.2M  100M   5% /boot/efi""",
            "python3 --version": "Python 3.10.12",
            "nvidia-smi": "No NVIDIA GPU detected"
        }
        
        # Get response or simulate command execution
        if command in responses:
            output = responses[command]
        elif command.startswith("echo"):
            output = command[len("echo"):].strip()  # Drop only the leading "echo", not later occurrences
        elif command.startswith("cd"):
            output = ""  # cd has no output
        else:
            output = f"Command executed: {command}"
        
        if output:
            print(output)
        return output
    
    def disconnect(self):
        """Disconnect SSH session"""
        if self.connected:
            print(f"\nDisconnecting from {self.hostname}...")
            self.connected = False
            print("Connection closed.")
    
    def transfer_file(self, local_file, remote_path, direction="upload"):
        """Simulate file transfer"""
        if not self.connected:
            print("Error: Not connected")
            return False
        
        if direction == "upload":
            print(f"Uploading {local_file} to {self.hostname}:{remote_path}")
        else:
            print(f"Downloading {self.hostname}:{remote_path} to {local_file}")
        
        # Simulate transfer with progress
        for i in range(0, 101, 20):
            print(f"Progress: {i}%", end="\r")
            time.sleep(0.2)
        print("Transfer complete!    ")
        return True

# Test SSH simulator
ssh = SSHSimulator("vm-instance-1", "student")

# Connect and run commands
ssh.connect()
ssh.run_command("pwd")
ssh.run_command("ls")
ssh.run_command("free -h")
ssh.run_command("python3 --version")

# Transfer files
print("\n--- File Transfer ---")
ssh.transfer_file("model.py", "/home/student/models/", "upload")
ssh.transfer_file("/home/student/results/output.csv", "output.csv", "download")

# Disconnect
ssh.disconnect()
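
For comparison with the simulator, the sketch below builds the argument lists that the real `ssh` and `scp` command-line tools expect, without connecting to anything. The helper names are hypothetical, and the host and user are the placeholder values from the simulator. Actually running the commands requires a reachable host and configured credentials.

```python
import shlex

def build_ssh_command(username, hostname, remote_command):
    """Return the argv list for running one command over ssh."""
    return ["ssh", f"{username}@{hostname}", remote_command]

def build_scp_upload(username, hostname, local_file, remote_path):
    """Return the argv list for copying a local file to a remote path with scp."""
    return ["scp", local_file, f"{username}@{hostname}:{remote_path}"]

cmd = build_ssh_command("student", "vm-instance-1", "free -h")
upload = build_scp_upload("student", "vm-instance-1", "model.py", "/home/student/models/")

# shlex.join shows the commands as you would type them in a shell
print(shlex.join(cmd))
print(shlex.join(upload))

# To run one against a real host (not executed here):
# subprocess.run(cmd, capture_output=True, text=True, check=True)
```

Passing a list to `subprocess.run` rather than a single string avoids shell quoting problems when file names contain spaces.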

## Remote Job Management

In [None]:
class RemoteJob:
    """Simulate running jobs on remote systems"""
    
    def __init__(self, name, script, estimated_time_s=60):
        self.name = name
        self.script = script
        self.estimated_time = estimated_time_s
        self.status = "PENDING"
        self.start_time = None
        self.end_time = None
        self.output = []
    
    def start(self):
        """Start the job"""
        self.status = "RUNNING"
        self.start_time = datetime.now()
        print(f"🚀 Job '{self.name}' started")
        print(f"   Script: {self.script}")
        print(f"   Estimated time: {self.estimated_time}s")
    
    def check_status(self):
        """Check job status"""
        if self.status == "RUNNING":
            elapsed = (datetime.now() - self.start_time).total_seconds()
            progress = min(elapsed / self.estimated_time * 100, 100)
            print(f"Job '{self.name}': {self.status} ({progress:.0f}% complete)")
        else:
            print(f"Job '{self.name}': {self.status}")
        return self.status
    
    def simulate_completion(self):
        """Simulate job completion"""
        if self.status == "RUNNING":
            self.status = "COMPLETED"
            self.end_time = datetime.now()
            runtime = (self.end_time - self.start_time).total_seconds()
            
            # Generate fake output
            self.output = [
                f"Processing started at {self.start_time}",
                "Loading data...",
                "Running analysis...",
                "Results computed successfully",
                f"Total runtime: {runtime:.2f} seconds"
            ]
            
            print(f"✅ Job '{self.name}' completed in {runtime:.2f}s")
    
    def get_output(self):
        """Get job output"""
        if self.status == "COMPLETED":
            return "\n".join(self.output)
        else:
            return f"Job is {self.status}. No output available yet."

# Simulate batch job processing
jobs = [
    RemoteJob("data_preprocessing", "preprocess.py", 30),
    RemoteJob("model_training", "train_model.py", 120),
    RemoteJob("evaluation", "evaluate.py", 45)
]

print("Submitting batch jobs...\n")

# Start all jobs
for job in jobs:
    job.start()
    time.sleep(0.5)

print("\n--- Monitoring Jobs ---")
# Check status
for job in jobs:
    job.check_status()

print("\n--- Simulating Completion ---")
# Complete jobs
for job in jobs:
    job.simulate_completion()
    print(f"Output preview: {job.get_output().split(chr(10))[0]}...")
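
Real remote jobs finish on their own schedule, so a client usually polls for status instead of calling something like `simulate_completion()`. The sketch below polls with exponential backoff and a timeout. `wait_for_job` and the scripted status sequence are hypothetical stand-ins for a real scheduler or cloud API.

```python
import time

def wait_for_job(check_status, initial_delay=0.01, max_delay=0.1, timeout=2.0):
    """Poll check_status() until it returns "COMPLETED" or the timeout expires.

    Returns the number of polls made; raises TimeoutError on timeout.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    polls = 0
    while True:
        polls += 1
        if check_status() == "COMPLETED":
            return polls
        if time.monotonic() + delay > deadline:
            raise TimeoutError("job did not complete in time")
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # Back off, but cap the wait

# Hypothetical job that reports COMPLETED on the fourth status check
statuses = iter(["PENDING", "RUNNING", "RUNNING", "COMPLETED"])
polls = wait_for_job(lambda: next(statuses))
print(f"Job completed after {polls} polls")
```

The delays here are tiny so the demo finishes quickly. For real jobs, seconds to minutes are more typical.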

<a id='models'></a>

# 5. Running Models Efficiently

Optimize model inference for cloud environments.

In [None]:
# Note: This simulates model loading without actually loading large models
class CloudModelSimulator:
    """Simulate running models in cloud with resource tracking"""
    
    MODEL_CONFIGS = {
        "tiny": {"size_gb": 0.5, "load_time": 2, "tokens_per_sec": 50},
        "small": {"size_gb": 2, "load_time": 5, "tokens_per_sec": 30},
        "medium": {"size_gb": 7, "load_time": 15, "tokens_per_sec": 15},
        "large": {"size_gb": 13, "load_time": 30, "tokens_per_sec": 8},
        "huge": {"size_gb": 30, "load_time": 60, "tokens_per_sec": 3}
    }
    
    def __init__(self, model_size="small"):
        if model_size not in self.MODEL_CONFIGS:
            raise ValueError(f"Unknown model size: {model_size}")
        
        self.model_size = model_size
        self.config = self.MODEL_CONFIGS[model_size]
        self.loaded = False
        self.total_tokens = 0
        self.total_time = 0
    
    def load(self):
        """Simulate model loading"""
        print(f"Loading {self.model_size} model ({self.config['size_gb']}GB)...")
        
        # Simulate loading with progress bar
        load_steps = 10
        for i in range(load_steps + 1):
            progress = i / load_steps
            bar = "█" * int(progress * 30)
            print(f"\r[{bar:<30}] {progress*100:.0f}%", end="")
            time.sleep(self.config['load_time'] / load_steps / 10)  # Speed up for demo
        
        print(f"\n✅ Model loaded in {self.config['load_time']}s (simulated)")
        self.loaded = True
    
    def generate(self, prompt, max_tokens=100):
        """Simulate text generation"""
        if not self.loaded:
            print("Error: Model not loaded. Call load() first.")
            return None
        
        # Calculate generation time
        gen_time = max_tokens / self.config['tokens_per_sec']
        
        print(f"\nGenerating {max_tokens} tokens...")
        print(f"Speed: {self.config['tokens_per_sec']} tokens/sec")
        
        # Simulate generation with progress
        for i in range(0, max_tokens, 10):
            print(".", end="", flush=True)
            time.sleep(0.1)  # Simulate processing
        
        # Update stats
        self.total_tokens += max_tokens
        self.total_time += gen_time
        
        print(f"\n✅ Generated in {gen_time:.2f}s (simulated)")
        return f"[Generated text for prompt: '{prompt[:50]}...']"  # Placeholder
    
    def estimate_cost(self, instance_type="e2-standard-4"):
        """Estimate cost for running this model"""
        instance_prices = {
            "e2-medium": 0.027,
            "e2-standard-4": 0.134,
            "n2-standard-8": 0.388,
            "a2-highgpu-1g": 2.95
        }
        
        if instance_type not in instance_prices:
            return None
        
        # Estimate based on model size
        min_ram = self.config['size_gb'] * 2  # Rule of thumb: 2x model size
        hourly_rate = instance_prices[instance_type]
        
        return {
            "model": self.model_size,
            "min_ram_gb": min_ram,
            "instance": instance_type,
            "hourly_cost": hourly_rate,
            "cost_per_1k_tokens": (hourly_rate / 3600) * (1000 / self.config['tokens_per_sec'])
        }

# Test different model sizes
print("Model Size Comparison\n" + "="*50)

for size in ["tiny", "small", "medium"]:
    model = CloudModelSimulator(size)
    cost_info = model.estimate_cost("e2-standard-4")
    
    print(f"\n{size.upper()} Model:")
    print(f"  Size: {model.config['size_gb']}GB")
    print(f"  Speed: {model.config['tokens_per_sec']} tokens/sec")
    print(f"  Min RAM: {cost_info['min_ram_gb']}GB")
    print(f"  Cost per 1K tokens: ${cost_info['cost_per_1k_tokens']:.4f}")

In [None]:
# Demonstrate model usage
print("Running Model Demo\n" + "="*50)

# Load and use a model
model = CloudModelSimulator("small")
model.load()

# Generate text
prompts = [
    "Explain cloud computing",
    "What is machine learning?",
    "Benefits of remote work"
]

for prompt in prompts:
    result = model.generate(prompt, max_tokens=50)
    print(f"Result: {result}\n")
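
The `size_gb` values in the simulator are fixed, but for a real model you can approximate the weight size from the parameter count and the numeric precision. The sketch below applies that arithmetic and then the lab's 2x rule of thumb for RAM. The helper names are hypothetical, and the estimate ignores activations, caches, and framework overhead.

```python
# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

def model_size_gb(num_params, dtype="float16"):
    """Approximate weight size in GB (using 1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

def min_ram_gb(num_params, dtype="float16", overhead_factor=2.0):
    """Apply the lab's 2x rule of thumb on top of the raw weight size."""
    return model_size_gb(num_params, dtype) * overhead_factor

for dtype in ("float32", "float16", "int8"):
    size = model_size_gb(7e9, dtype)
    ram = min_ram_gb(7e9, dtype)
    print(f"7B parameters in {dtype}: {size:.0f} GB weights, ~{ram:.0f} GB RAM")
```

Halving the precision halves the memory requirement, which can move a model onto a smaller and cheaper instance.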

<a id='storage'></a>

# 6. Cloud Storage Simulation

Practice working with cloud storage patterns.

In [None]:
class CloudStorage:
    """Simulate cloud storage operations"""
    
    def __init__(self, bucket_name):
        self.bucket_name = bucket_name
        self.files = {}
        self.total_size = 0
        self.operations = []
    
    def upload(self, file_name, size_mb, file_type="data"):
        """Upload file to bucket"""
        print(f"Uploading {file_name} ({size_mb}MB) to gs://{self.bucket_name}/")
        
        # Simulate upload with progress
        upload_time = size_mb / 10  # 10MB/s upload speed
        print(f"Upload time: {upload_time:.1f}s")
        
        # Store file metadata
        self.files[file_name] = {
            "size_mb": size_mb,
            "type": file_type,
            "uploaded": datetime.now(),
            "url": f"gs://{self.bucket_name}/{file_name}"
        }
        
        self.total_size += size_mb
        self.operations.append(("upload", file_name, size_mb))
        
        print(f"✅ Upload complete: {self.files[file_name]['url']}")
        return True
    
    def download(self, file_name):
        """Download file from bucket"""
        if file_name not in self.files:
            print(f"Error: {file_name} not found in bucket")
            return False
        
        file_info = self.files[file_name]
        print(f"Downloading {file_name} ({file_info['size_mb']}MB)...")
        
        download_time = file_info['size_mb'] / 20  # 20MB/s download
        print(f"Download time: {download_time:.1f}s")
        
        self.operations.append(("download", file_name, file_info['size_mb']))
        print(f"✅ Downloaded to ./{file_name}")
        return True
    
    def list_files(self):
        """List all files in bucket"""
        print(f"\nFiles in gs://{self.bucket_name}/")
        print("-" * 60)
        
        if not self.files:
            print("(empty)")
            return
        
        for name, info in self.files.items():
            print(f"{name:<30} {info['size_mb']:>8}MB  {info['type']:<10}")
        
        print("-" * 60)
        print(f"Total: {len(self.files)} files, {self.total_size}MB")
    
    def estimate_monthly_cost(self):
        """Estimate storage costs"""
        storage_cost = self.total_size * 0.02 / 1000  # $0.02 per GB per month, with 1 GB = 1000 MB
        
        # Calculate operation costs
        operation_costs = {
            "upload": sum(op[2] for op in self.operations if op[0] == "upload") * 0.005 / 1000,
            "download": sum(op[2] for op in self.operations if op[0] == "download") * 0.12 / 1000
        }
        
        total = storage_cost + sum(operation_costs.values())
        
        return {
            "storage": storage_cost,
            "operations": operation_costs,
            "total": total
        }

# Create and use cloud storage
storage = CloudStorage("research-data-2024")

# Upload files
print("Cloud Storage Operations Demo\n" + "="*50 + "\n")

storage.upload("dataset.csv", 500, "data")
storage.upload("model_weights.pkl", 2000, "model")
storage.upload("results.json", 10, "output")

# List files
storage.list_files()

# Download a file
print("\n" + "="*50)
storage.download("model_weights.pkl")

# Calculate costs
print("\nMonthly Cost Estimate:")
costs = storage.estimate_monthly_cost()
print(f"  Storage: ${costs['storage']:.4f}")
print(f"  Upload: ${costs['operations']['upload']:.4f}")
print(f"  Download: ${costs['operations']['download']:.4f}")
print(f"  Total: ${costs['total']:.4f}")
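
A pattern worth practicing alongside upload and download is verifying that a transferred file arrived intact. The sketch below hashes files in chunks with SHA-256 and compares the digests. It is a general integrity check written for illustration, not a Cloud Storage API call, and the file names are placeholders.

```python
import hashlib
import os
import tempfile

def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

payload = b"id,value\n" + b"1,42\n" * 1000

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "dataset.csv")
    downloaded = os.path.join(tmp, "dataset_downloaded.csv")
    # Write the same bytes twice to stand in for an upload followed by a download
    for path in (original, downloaded):
        with open(path, "wb") as f:
            f.write(payload)
    original_digest = file_sha256(original)
    checksums_match = original_digest == file_sha256(downloaded)

print(f"Checksums match: {checksums_match}")
```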

<a id='benchmark'></a>

# 7. Performance Benchmarking

Benchmark CPU-, memory-, and I/O-bound tasks on your local machine. Running the same cell on a cloud VM gives you a direct local vs cloud comparison.

In [None]:
def benchmark_task(task_func, name, iterations=3):
    """Benchmark a computational task"""
    times = []
    
    print(f"Benchmarking: {name}")
    for i in range(iterations):
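        start = time.perf_counter()  # Monotonic, high-resolution clock suited to timing
        task_func()  # The result is discarded; only the elapsed time matters
        elapsed = time.perf_counter() - start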
        times.append(elapsed)
        print(f"  Run {i+1}: {elapsed:.3f}s")
    
    avg_time = np.mean(times)
    std_time = np.std(times)
    
    return {
        "name": name,
        "avg_time": avg_time,
        "std_time": std_time,
        "times": times
    }

# Define benchmark tasks
def cpu_intensive_task():
    """CPU-intensive computation"""
    # Matrix multiplication
    size = 1000
    A = np.random.rand(size, size)
    B = np.random.rand(size, size)
    C = np.dot(A, B)
    return C.sum()

def memory_intensive_task():
    """Memory-intensive task"""
    # Large array operations
    size = 10_000_000
    arr = np.random.rand(size)
    sorted_arr = np.sort(arr)
    return sorted_arr[0]

def io_intensive_task():
    """I/O-intensive task"""
    # File operations (simulated)
    data = "x" * 1_000_000  # 1MB of data
    temp_file = "temp_benchmark.txt"
    
    # Write
    with open(temp_file, 'w') as f:
        for _ in range(10):
            f.write(data)
    
    # Read
    with open(temp_file, 'r') as f:
        content = f.read()
    
    # Cleanup
    os.remove(temp_file)
    return len(content)

# Run benchmarks
print("Performance Benchmarks\n" + "="*50 + "\n")

benchmarks = [
    benchmark_task(cpu_intensive_task, "CPU Intensive (Matrix Mult)", 3),
    benchmark_task(memory_intensive_task, "Memory Intensive (Array Sort)", 3),
    benchmark_task(io_intensive_task, "I/O Intensive (File Ops)", 3)
]

# Display results
print("\n" + "="*50)
print("Benchmark Summary:\n")

results_df = pd.DataFrame(benchmarks)
results_df['avg_time'] = results_df['avg_time'].round(3)
results_df['std_time'] = results_df['std_time'].round(3)

print(results_df[['name', 'avg_time', 'std_time']].to_string(index=False))

# Visualize
plt.figure(figsize=(10, 6))
names = [b['name'].split('(')[0].strip() for b in benchmarks]
times = [b['avg_time'] for b in benchmarks]
errors = [b['std_time'] for b in benchmarks]

bars = plt.bar(names, times, yerr=errors, capsize=5)
plt.ylabel('Time (seconds)')
plt.title('Task Performance Benchmarks')
plt.xticks(rotation=45)

# Add value labels
# The loop variable is named `avg` so it does not shadow the `time` module
for bar, avg in zip(bars, times):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01,
             f'{avg:.3f}s', ha='center', va='bottom')

plt.tight_layout()
plt.show()

print("\n💡 Cloud vs Local Insights:")
print("- CPU tasks: Cloud can win when the work parallelizes across more cores")
print("- Memory tasks: Cloud can win when the data does not fit in local RAM")
print("- I/O tasks: Depends on storage type (SSD vs HDD)")
print("- Network tasks: Local wins (no latency)")
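
How much extra cores help depends on how much of the task can run in parallel. Amdahl's law gives an upper bound: with parallel fraction p and n cores, the speedup is at most 1 / ((1 - p) + p / n). The sketch below evaluates that formula. The 90% parallel fraction is an assumed example value, not a measurement from these benchmarks.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of a task runs in parallel."""
    serial_fraction = 1 - parallel_fraction
    return 1 / (serial_fraction + parallel_fraction / cores)

for cores in (2, 4, 8, 32):
    print(f"{cores:>2} cores, 90% parallel: {amdahl_speedup(0.9, cores):.2f}x")
```

Even with 32 cores, a task that is 90% parallel cannot run more than 10x faster, so paying for a much larger instance only helps when the workload scales.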

## Final Exercise: Cloud Decision Matrix

Help decide when to use cloud vs local computing.

In [None]:
def cloud_decision_matrix(project_specs):
    """Evaluate whether to use cloud or local computing"""
    
    scores = {"cloud": 0, "local": 0}
    recommendations = []
    
    # Evaluate each factor
    if project_specs["data_size_gb"] > 50:
        scores["cloud"] += 2
        recommendations.append("Large data favors cloud storage")
    else:
        scores["local"] += 1
        recommendations.append("Small data can be handled locally")
    
    if project_specs["needs_gpu"]:
        scores["cloud"] += 3
        recommendations.append("GPU requirement strongly favors cloud")
    
    if project_specs["team_size"] > 1:
        scores["cloud"] += 2
        recommendations.append("Team collaboration easier in cloud")
    else:
        scores["local"] += 1
        recommendations.append("Solo work can be done locally")
    
    if project_specs["duration_days"] < 7:
        scores["cloud"] += 1
        recommendations.append("Short projects benefit from cloud flexibility")
    else:
        scores["local"] += 1
        recommendations.append("Long projects might be cheaper locally")
    
    if project_specs["budget_usd"] < 50:
        scores["local"] += 2
        recommendations.append("Limited budget favors local computing")
    else:
        scores["cloud"] += 1
        recommendations.append("Adequate budget allows cloud usage")
    
    if project_specs["sensitive_data"]:
        scores["local"] += 3
        recommendations.append("Sensitive data may require local processing")
    
    # Determine winner
    if scores["cloud"] > scores["local"]:
        decision = "Use CLOUD Computing"
    elif scores["local"] > scores["cloud"]:
        decision = "Use LOCAL Computing"
    else:
        decision = "Either option works"
    
    return {
        "decision": decision,
        "scores": scores,
        "recommendations": recommendations
    }

# Test with different scenarios
scenarios = [
    {
        "name": "Deep Learning Research",
        "data_size_gb": 100,
        "needs_gpu": True,
        "team_size": 3,
        "duration_days": 30,
        "budget_usd": 500,
        "sensitive_data": False
    },
    {
        "name": "Personal Blog Analysis",
        "data_size_gb": 5,
        "needs_gpu": False,
        "team_size": 1,
        "duration_days": 60,
        "budget_usd": 20,
        "sensitive_data": False
    },
    {
        "name": "Healthcare Data Study",
        "data_size_gb": 30,
        "needs_gpu": False,
        "team_size": 2,
        "duration_days": 90,
        "budget_usd": 200,
        "sensitive_data": True
    }
]

print("Cloud vs Local Decision Matrix\n" + "="*50)

for scenario in scenarios:
    print(f"\n📊 Project: {scenario['name']}")
    print("-" * 40)
    
    # Display specs
    print("Specifications:")
    for key, value in scenario.items():
        if key != "name":
            print(f"  {key}: {value}")
    
    # Get decision
    result = cloud_decision_matrix(scenario)
    
    print(f"\n🎯 Decision: {result['decision']}")
    print(f"   Cloud Score: {result['scores']['cloud']}")
    print(f"   Local Score: {result['scores']['local']}")
    
    print("\nRecommendations:")
    for rec in result['recommendations']:
        print(f"  • {rec}")

print("\n" + "="*50)
print("Remember: These are guidelines. Consider your specific needs!")
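
The "long projects might be cheaper locally" rule can be made concrete with a break-even calculation: divide the hardware purchase price by what you save per hour by not renting. The sketch below is one rough way to do this. The $2,000 workstation is a hypothetical figure, the hourly rate is the lab's illustrative e2-standard-4 price, and the helper ignores depreciation and maintenance.

```python
def break_even_hours(hardware_cost_usd, cloud_price_hour, local_cost_hour=0.0):
    """Hours of use at which buying hardware costs the same as renting.

    local_cost_hour covers running costs such as electricity (assumed, default 0).
    """
    hourly_saving = cloud_price_hour - local_cost_hour
    if hourly_saving <= 0:
        return float("inf")  # Renting never costs more per hour, so buying never pays off
    return hardware_cost_usd / hourly_saving

# Hypothetical $2,000 workstation vs the lab's illustrative e2-standard-4 rate
hours = break_even_hours(2000, 0.134)
print(f"Break-even after {hours:,.0f} hours (~{hours / 24:.0f} days of continuous use)")
```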

## Lab Summary

### Key Takeaways

1. **Cloud Concepts**: Understood VMs, serverless, and storage patterns
2. **Cost Management**: Learned to estimate and optimize cloud costs
3. **Remote Patterns**: Practiced SSH, file transfer, and job management
4. **Performance**: Compared local vs cloud for different workloads
5. **Decision Making**: Developed framework for cloud adoption choices

### Next Steps

1. Create a GCP account and claim your $300 credit
2. Launch your first VM and run a Python script
3. Practice with the cost calculator before starting projects
4. Consider environmental impact in your cloud usage
5. Document your setup for reproducibility

### Resources for Practice

- [Google Cloud Skills Boost](https://www.cloudskillsboost.google/)
- [GCP Free Tier](https://cloud.google.com/free)
- [Cloud Carbon Footprint](https://www.cloudcarbonfootprint.org/)
- [GitHub Codespaces](https://github.com/features/codespaces)