# Module 1B: GPU Session Setup
## Run This on the AMD Cloud Droplet Before Modules 6–7

---

This is a **minimal** setup notebook for GPU training sessions. It installs only what training needs (PyTorch built for ROCm plus a few small dependencies) and verifies that the GPU is usable.

**Prerequisites:**
- You’ve already completed Modules 2–5 on your local Mac
- You’ve uploaded `dataset_1M.h5` to this droplet (see Step 0 below)

**What this installs:** PyTorch + ROCm (`torch`, `torchvision`), h5py, tqdm, pyyaml

**What this does NOT install:** HITRAN/HAPI, matplotlib, seaborn (not needed for training)

**Time required:** ~3–4 minutes (mostly downloading PyTorch + ROCm)

---

## Step 0: Verify Dataset Upload

Before running this notebook, make sure you’ve uploaded your dataset from your Mac:

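```bash
# From your Mac terminal:
# Create the target directory first; scp does not create missing parent directories.
ssh root@<DROPLET_IP> 'mkdir -p /root/methane-ml-course/data/datasets'
scp ~/methane-ml-course/data/datasets/dataset_1M.h5 \
    root@<DROPLET_IP>:/root/methane-ml-course/data/datasets/
```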

Run the cell below to check:

In [None]:
from pathlib import Path

# ── Project paths (GPU droplet) ─────────────────────────
PROJECT_DIR = Path.home() / 'methane-ml-course'
DATA_DIR    = PROJECT_DIR / 'data'
DATASET_DIR = DATA_DIR / 'datasets'
MODEL_DIR   = PROJECT_DIR / 'models'
OUTPUT_DIR  = PROJECT_DIR / 'outputs'

# ── PyTorch / ROCm ──────────────────────────────────
PYTORCH_INDEX_URL = 'https://download.pytorch.org/whl/rocm6.2'

# Create dirs
for d in [DATA_DIR, DATASET_DIR, MODEL_DIR, OUTPUT_DIR]:
    d.mkdir(parents=True, exist_ok=True)

# Check for dataset
dataset_path = DATASET_DIR / 'dataset_1M.h5'
if dataset_path.exists():
    size_gb = dataset_path.stat().st_size / 1e9
    print(f"✅ Dataset found: {dataset_path} ({size_gb:.2f} GB)")
else:
    print(f"⚠️  Dataset NOT found at: {dataset_path}")
    print(f"    Upload it from your Mac first (see instructions above).")
    print(f"    You can still proceed with setup, but training will need the file.")
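
The size printed above cannot distinguish a complete upload from a truncated one. A minimal sketch of a stronger check, assuming you compare the digest by eye against the output of `shasum -a 256 dataset_1M.h5` run on the Mac. The helper name `sha256_of` and the chunk size are illustrative choices, not part of the course code:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_bytes=8 * 1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps reading until read() returns b'' (end of file)
        for chunk in iter(lambda: f.read(chunk_bytes), b''):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage: same location as `dataset_path` in the cell above.
candidate = Path.home() / 'methane-ml-course' / 'data' / 'datasets' / 'dataset_1M.h5'
if candidate.exists():
    print(f"SHA-256: {sha256_of(candidate)}")
else:
    print(f"Nothing to hash yet: {candidate}")
```

If the two digests differ, the upload was interrupted or corrupted and the file should be copied again.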

## Step 1: Install PyTorch + ROCm

In [None]:
import subprocess, sys

print("="*60)
print("STEP 1: Installing PyTorch + ROCm")
print("="*60)
print("This takes ~2-3 minutes (downloading ~4 GB)...\n")

subprocess.check_call([
    sys.executable, '-m', 'pip', 'install',
    'torch', 'torchvision',
    '--index-url', PYTORCH_INDEX_URL,
    '--quiet'
])
print("✔ PyTorch + ROCm")

# Minimal additional deps for training
extra_pkgs = ['h5py', 'tqdm', 'pyyaml']
subprocess.check_call(
    [sys.executable, '-m', 'pip', 'install', *extra_pkgs, '--quiet']
)
print(f"✔ {', '.join(extra_pkgs)}")

print("\n✅ All packages installed!")
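
Because pip runs with `--quiet`, the cell above does not show which versions were installed. A minimal sketch that reads them back from package metadata, assuming the distribution names match the ones passed to pip. The helper name `installed_versions` is hypothetical:

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(
    ['torch', 'torchvision', 'h5py', 'tqdm', 'pyyaml']
).items():
    print(f"{name:<12} {version or 'NOT INSTALLED'}")
```

Wheels from the ROCm index typically carry a `+rocm` suffix in the `torch` version string, so a version without that suffix may indicate that a different build was picked up.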

## Step 2: Verify GPU Access

In [None]:
print("="*60)
print("STEP 2: Verifying GPU Access")
print("="*60)

import torch

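print(f"\nPyTorch version: {torch.__version__}")
# ROCm builds of PyTorch expose the GPU through the torch.cuda API;
# torch.version.hip is None on builds without ROCm support.
print(f"ROCm/HIP build : {torch.version.hip}")
print(f"GPU available  : {torch.cuda.is_available()}")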

if torch.cuda.is_available():
    print(f"GPU count      : {torch.cuda.device_count()}")
    print(f"GPU name       : {torch.cuda.get_device_name(0)}")
    
    total_mem = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"GPU memory     : {total_mem:.0f} GB")
    
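    # Compute tests: each result is checked, not just executed
    print("\nRunning GPU compute tests...")
    x = torch.randn(5000, 5000, device='cuda')
    y = torch.matmul(x, x)
    torch.cuda.synchronize()  # wait for the asynchronous kernel to finish
    assert torch.isfinite(y).all(), "matmul produced non-finite values"
    print("✔ Matrix multiplication: PASSED")

    x = torch.randn(100, requires_grad=True, device='cuda')
    y = (x ** 2).sum()
    y.backward()
    # d/dx sum(x^2) = 2x, so the gradient must match 2 * x
    assert torch.allclose(x.grad, 2 * x.detach()), "unexpected gradient"
    print("✔ Gradient computation: PASSED")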
    
    print("\n✅ GPU is working correctly!")
else:
    print("\n❌ GPU not detected! Check ROCm installation.")
    print("Try running from SSH: rocm-smi")
    print("Also check: docker exec -it rocm rocm-smi")
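
The failure branch above suggests running `rocm-smi` over SSH. A minimal sketch that attempts the same call from inside the notebook, assuming the tool is on `PATH` in the environment the kernel runs in (it may not be if the kernel runs inside a container). The helper name `rocm_smi_report` and the timeout are illustrative:

```python
import shutil
import subprocess


def rocm_smi_report(timeout_s=30):
    """Return rocm-smi's text output, or None if the tool is not on PATH."""
    exe = shutil.which('rocm-smi')
    if exe is None:
        return None
    result = subprocess.run(
        [exe], capture_output=True, text=True, timeout=timeout_s
    )
    # Include stderr as well, in case errors are written there.
    return result.stdout + result.stderr


report = rocm_smi_report()
print(report if report is not None else "rocm-smi not found on PATH in this environment.")
```

If the tool reports a GPU but `torch.cuda.is_available()` is still `False`, the installed PyTorch build is the more likely culprit than the driver.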

---
## ✅ GPU Session Setup Complete!

In [None]:
print("\n" + "="*60)
print("         GPU SESSION SETUP SUMMARY")
print("="*60)

import torch

gpu_ok = torch.cuda.is_available()
gpu_status = f"✅ {torch.cuda.get_device_name(0)}" if gpu_ok else "❌ Not detected"

dataset_ok = dataset_path.exists()
dataset_status = "✅ Ready" if dataset_ok else "❌ Not uploaded"

print(f"""
┌─────────────────────────────────────────────────────────┐
│  Component          │  Status                          │
├─────────────────────────────────────────────────────────┤
│  PyTorch + ROCm     │  {torch.__version__:<30} │
│  GPU                │  {gpu_status:<30} │
│  Dataset            │  {dataset_status:<30} │
└─────────────────────────────────────────────────────────┘
""")

if gpu_ok and dataset_ok:
    print("🎉 ALL SYSTEMS GO! Proceed to Module 6 (Build 1D-CNN).")
elif gpu_ok and not dataset_ok:
    print("⚠️  GPU ready, but dataset missing.")
    print("    Upload dataset_1M.h5 from your Mac, then re-run Step 0.")
else:
    print("⚠️  GPU issues detected. Check ROCm installation.")

---

## After Training: Download Your Model

Once Module 7 is complete and you have `best_model.pt`, download it to your Mac for local inference:

```bash
# From your Mac terminal:
scp root@<DROPLET_IP>:/root/methane-ml-course/models/best_model.pt \
    ~/methane-ml-course/models/
```

Then you can run Modules 8–9 locally on your Mac — no GPU needed!
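
A checkpoint written on the GPU droplet stores tensors tagged with their GPU device, so loading it on a Mac needs an explicit remap to the CPU. A minimal sketch, assuming Module 7 saved a `state_dict` with `torch.save` (the actual save format is not shown in this module). The helper name `load_checkpoint_on_cpu` is hypothetical:

```python
from pathlib import Path


def load_checkpoint_on_cpu(path):
    """Load a checkpoint saved on the GPU droplet onto the CPU."""
    import torch  # imported lazily so the helper can be defined without PyTorch

    # map_location='cpu' remaps tensors that were saved from GPU memory.
    return torch.load(path, map_location='cpu')


# Hypothetical usage on the Mac, assuming the file holds a state_dict:
checkpoint_path = Path.home() / 'methane-ml-course' / 'models' / 'best_model.pt'
if checkpoint_path.exists():
    state = load_checkpoint_on_cpu(checkpoint_path)
    print(f"Loaded {type(state).__name__} from {checkpoint_path}")
else:
    print(f"No checkpoint at {checkpoint_path} yet")
```

The loaded object would then be passed to `load_state_dict` on a model built with the same architecture as in Module 6.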

**Don’t forget** to snapshot and destroy the droplet when done to stop billing.

---

**Module 1B Complete!** Proceed to Module 6.