# Module 1A: Local Environment Setup
## Run This Once on Your Mac (Then Re-run Only If Needed)

---

This notebook sets up your **local MacBook Pro** for Modules 2–5, 8, and 9. These modules involve spectroscopic simulation, dataset generation, and inference — all CPU-bound work that doesn’t need a GPU.

**What this installs:**
- HAPI (HITRAN API) for spectroscopic data
- NumPy, Matplotlib, Seaborn for computation and plotting
- h5py for HDF5 dataset files
- PyTorch (CPU-only) for model architecture prototyping and local inference

**What this does NOT install:**
- ROCm or GPU drivers (that’s Module 1B, on the cloud droplet)
- Heavy vision packages (ultralytics, albumentations, opencv)

**Time required:** ~2–3 minutes

---

### Hybrid Workflow Overview

```
Your Mac (Modules 1A, 2–5, 8–9)       AMD GPU Droplet (Modules 1B, 6–7)
┌─────────────────────────────────┐   ┌────────────────────────────┐
│  Physics, Simulation, Data Gen  │   │  CNN Training (GPU)        │
│  Inference, Export              │   │  GPU Verification           │
└─────────────────────────────────┘   └────────────────────────────┘
        dataset_1M.h5  ── scp ─────→
        best_model.pt  ── scp ─────←
```

---

## Course Configuration

All course-wide parameters are defined here. Every subsequent cell and module references these variables.

In [None]:
from pathlib import Path
import platform

# ── Project paths (portable — works on Mac and Linux) ──────
PROJECT_DIR = Path.home() / 'methane-ml-course'
DATA_DIR    = PROJECT_DIR / 'data'
HITRAN_DIR  = DATA_DIR / 'hitran'
SPECTRA_DIR = DATA_DIR / 'spectra'
DATASET_DIR = DATA_DIR / 'datasets'
MODEL_DIR   = PROJECT_DIR / 'models'
OUTPUT_DIR  = PROJECT_DIR / 'outputs'

# ── HITRAN / spectroscopy parameters ───────────────────────
MOLECULE_ID  = 6          # CH4
ISOTOPE_ID   = 1          # Main isotopologue (12CH4)
NU_MIN       = 4383.0     # cm⁻¹  — start of wavenumber range
NU_MAX       = 4386.0     # cm⁻¹  — end of wavenumber range
TABLE_NAME   = 'CH4_4383_4386'

# ── Default simulation environment ────────────────────────
DEFAULT_TEMP     = 296.0  # K   (HITRAN reference temperature)
DEFAULT_PRESSURE = 1.0    # atm
MOLE_FRACTION    = 0.01   # 1% CH4 (for absorption coefficient calc)
WAVENUMBER_STEP  = 0.001  # cm⁻¹  — spectral resolution

# ── Compute device (CPU locally; GPU is set up in Module 1B) ──
DEVICE = 'cpu'  # Local modules (2–5, 8, 9) don't need a GPU

print(f"System          : {platform.system()} {platform.machine()}")
print(f"Project dir     : {PROJECT_DIR}")
print(f"HITRAN dir      : {HITRAN_DIR}")
print(f"Table name      : {TABLE_NAME}")
print(f"Wavenumber range: {NU_MIN} – {NU_MAX} cm⁻¹")
print(f"Compute device  : {DEVICE}")
print("\n✅ Configuration loaded.")
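
As a quick sanity check on the spectroscopy parameters, the sketch below works out how many grid points the configured range and step imply. It restates the three values from the configuration cell so it runs on its own. The names `n_intervals` and `n_points` are illustrative, and the inclusive-endpoint grid is an assumption: HAPI's own grid may differ by one point depending on how it treats the upper endpoint.

```python
# Values restated from the configuration cell so this runs standalone
NU_MIN = 4383.0           # cm^-1
NU_MAX = 4386.0           # cm^-1
WAVENUMBER_STEP = 0.001   # cm^-1

# Number of intervals across the range, rounded to absorb float error
n_intervals = round((NU_MAX - NU_MIN) / WAVENUMBER_STEP)

# Grid points including both endpoints (assumption: inclusive grid)
n_points = n_intervals + 1

print(f"{n_intervals} intervals, {n_points} points")
```

Knowing the approximate spectrum length up front is useful later, since it sets the input size of any model trained on these spectra.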

## Step 1: Install Course Packages

We install the CPU-only build of PyTorch, which is a much smaller download than the ROCm build, plus the science stack.

In [None]:
import subprocess, sys

print("="*60)
print("STEP 1: Installing Packages")
print("="*60)
print()

# Group 1: PyTorch CPU-only (~200 MB vs ~4 GB for ROCm)
print("Installing PyTorch (CPU-only)...")
subprocess.check_call([
    sys.executable, '-m', 'pip', 'install',
    'torch', 'torchvision',
    '--index-url', 'https://download.pytorch.org/whl/cpu',
    '--quiet'
])
print("✔ PyTorch (CPU)")

# Group 2: Science stack
for pkgs, label in [
    (['hitran-api'],                           'hitran-api'),
    (['numpy', 'matplotlib', 'seaborn'],       'numpy, matplotlib, seaborn'),
    (['h5py', 'tqdm', 'pyyaml'],               'h5py, tqdm, pyyaml'),
]:
    subprocess.check_call(
        [sys.executable, '-m', 'pip', 'install'] + pkgs + ['--quiet']
    )
    print(f"✔ {label}")

print("\n✅ All packages installed!")
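
Because this notebook is meant to be re-run only when needed, it can help to check what is already importable before calling pip. The following is a minimal sketch using only the standard library. `missing_modules` is a hypothetical helper, not part of the course code, and the module list mirrors the import names checked in Step 5.

```python
import importlib.util

def missing_modules(module_names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]

# Import names (not pip names): 'hapi' comes from hitran-api, 'yaml' from pyyaml
course_modules = ['torch', 'torchvision', 'numpy', 'matplotlib',
                  'seaborn', 'hapi', 'h5py', 'tqdm', 'yaml']

todo = missing_modules(course_modules)
print("Missing:", todo if todo else "nothing, Step 1 can be skipped")
```

`find_spec` only locates a module without importing it, so this check is fast even for large packages such as PyTorch.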

## Step 2: Set Up Project Directories

In [None]:
print("="*60)
print("STEP 2: Setting Up Project Directories")
print("="*60)

for d in [DATA_DIR, HITRAN_DIR, SPECTRA_DIR, DATASET_DIR, MODEL_DIR, OUTPUT_DIR]:
    d.mkdir(parents=True, exist_ok=True)
    print(f"✔ {d}")

print("\n✅ Directories created!")

## Step 3: Set Up HITRAN Database

Download CH₄ spectroscopic data for our wavenumber range (4383–4386 cm⁻¹).

This downloads from the HITRAN server on first run, then uses the cached local copy.

In [None]:
print("="*60)
print("STEP 3: Setting Up HITRAN Database")
print("="*60)

import hapi

hapi.db_begin(str(HITRAN_DIR))
print(f"HITRAN data directory: {HITRAN_DIR}")

print(f"\nFetching CH₄ data from HITRAN...")
print(f"  Molecule: CH₄ (ID={MOLECULE_ID})")
print(f"  Isotope: ¹²CH₄ (ID={ISOTOPE_ID})")
print(f"  Wavenumber range: {NU_MIN} – {NU_MAX} cm⁻¹")

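# Fetch only if no local copy exists; HAPI stores each table as <name>.header + <name>.data
table_header = HITRAN_DIR / f"{TABLE_NAME}.header"
if table_header.exists():
    print("\n✔ Using cached local copy")
else:
    try:
        hapi.fetch(
            TableName=TABLE_NAME,
            M=MOLECULE_ID,
            I=ISOTOPE_ID,
            numin=NU_MIN,
            numax=NU_MAX
        )
        print("\n✔ Data fetched from HITRAN server")
    except Exception as e:
        # Report the failure instead of claiming a cached copy exists
        print(f"\n✗ Fetch failed and no cached copy found: {e}")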

# Verify
try:
    nu_lines = hapi.getColumn(TABLE_NAME, 'nu')
    sw_lines = hapi.getColumn(TABLE_NAME, 'sw')
    print(f"\nSpectral lines found: {len(nu_lines)}")
    for i, (nu, sw) in enumerate(zip(nu_lines[:5], sw_lines[:5])):
        print(f"  Line {i+1}: ν = {nu:.4f} cm⁻¹, Intensity = {sw:.2e}")
    if len(nu_lines) > 5:
        print(f"  ... and {len(nu_lines)-5} more lines")
    print("\n✅ HITRAN database ready!")
except Exception as e:
    print(f"\n⚠️ Could not verify HITRAN data: {e}")
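
For orientation, the wavenumber window can be converted to wavelength with λ(µm) = 10⁴ / ν(cm⁻¹). The sketch below applies that standard conversion to the configured range. The function name is illustrative and not part of HAPI.

```python
def wavenumber_to_micron(nu_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in micrometres."""
    return 1.0e4 / nu_cm1

NU_MIN, NU_MAX = 4383.0, 4386.0   # cm^-1, from the configuration cell

# Higher wavenumber means shorter wavelength, so the bounds swap
lam_short = wavenumber_to_micron(NU_MAX)
lam_long = wavenumber_to_micron(NU_MIN)

print(f"Window: {lam_short:.5f} to {lam_long:.5f} µm")
```

This places the window near 2.28 µm, in the short-wave infrared.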

## Step 4: Verify PyTorch (CPU)

We confirm PyTorch works for local model prototyping and inference. GPU verification happens in Module 1B on the cloud droplet.

In [None]:
print("="*60)
print("STEP 4: Verifying PyTorch (CPU)")
print("="*60)

import torch

print(f"\nPyTorch version : {torch.__version__}")
print(f"GPU available   : {torch.cuda.is_available()} (expected: False on Mac)")
print(f"MPS available   : {torch.backends.mps.is_available() if hasattr(torch.backends, 'mps') else 'N/A'}")
print(f"Device          : {DEVICE}")

# Quick compute test on CPU
print("\nRunning CPU compute test...")
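x = torch.randn(2000, 2000)
y = torch.matmul(x, x)
# Check the result instead of assuming success
assert y.shape == (2000, 2000) and torch.isfinite(y).all()
print("✔ Matrix multiplication: PASSED")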

x = torch.randn(100, requires_grad=True)
y = (x ** 2).sum()
y.backward()
print("✔ Gradient computation: PASSED")

print("\n✅ PyTorch (CPU) is working correctly!")
print("\nNote: For GPU training, you'll use Module 1B on the AMD cloud droplet.")
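
`DEVICE` is hard-coded to `'cpu'` for the local modules. If a notebook needs to run unchanged on both the Mac and the droplet, one option is to pick the device at runtime. This is a sketch, not the course's method, and `pick_device` is a hypothetical helper. It relies on ROCm builds of PyTorch reporting the GPU through `torch.cuda`, and it falls back to `'cpu'` if PyTorch is not installed at all.

```python
def pick_device(prefer_gpu=True):
    """Return 'cuda', 'mps', or 'cpu' depending on what is available."""
    try:
        import torch
    except ImportError:
        return 'cpu'   # no PyTorch, nothing to select

    if prefer_gpu and torch.cuda.is_available():
        return 'cuda'  # covers NVIDIA CUDA and AMD ROCm builds

    # Older PyTorch versions have no torch.backends.mps attribute
    mps = getattr(torch.backends, 'mps', None)
    if prefer_gpu and mps is not None and mps.is_available():
        return 'mps'   # Apple Silicon GPU backend

    return 'cpu'

device = pick_device()
print(f"Selected device: {device}")
```

Passing `prefer_gpu=False` forces the CPU, which matches the default used throughout the local modules.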

## Step 5: Verify All Packages

In [None]:
print("="*60)
print("STEP 5: Verifying All Packages")
print("="*60)

packages = {
    'torch': 'torch',
    'torchvision': 'torchvision',
    'numpy': 'numpy',
    'matplotlib': 'matplotlib',
    'seaborn': 'seaborn',
    'hapi': 'hapi',
    'h5py': 'h5py',
    'tqdm': 'tqdm',
    'yaml': 'yaml',
}

all_ok = True
print("\nPackage Status:")
print("-" * 40)

for name, module in packages.items():
    try:
        mod = __import__(module)
        version = getattr(mod, '__version__', 'installed')
        print(f"✔ {name}: {version}")
    except ImportError:
        print(f"✗ {name}: NOT FOUND")
        all_ok = False

if all_ok:
    print("\n✅ All packages verified!")
else:
    print("\n⚠️ Some packages missing. Re-run Step 1.")
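
The check above looks modules up by import name, and not every module exposes `__version__`. A hedged alternative is to ask the installed distribution metadata, which is keyed by pip name instead. `installed_version` below is a hypothetical helper built on the standard-library `importlib.metadata`.

```python
from importlib import metadata

def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# pip distribution names, which differ from import names for some packages
for dist in ['torch', 'numpy', 'h5py', 'hitran-api', 'pyyaml']:
    print(f"{dist:<12} {installed_version(dist) or 'NOT FOUND'}")
```

Unlike the import-based check, this does not load the package, so it reports what pip installed and does not confirm that the import itself succeeds.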

---
## ✅ Local Setup Complete!

Run the cell below to confirm everything is ready.

In [None]:
print("\n" + "="*60)
print("         LOCAL SESSION SETUP SUMMARY")
print("="*60)

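# Check PyTorch (import may fail if Step 1 did not complete)
try:
    import torch
    torch_ok = True
    torch_status = f"✅ {torch.__version__} (CPU)"
except ImportError:
    torch_ok = False
    torch_status = "❌ Not found"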

# Check HITRAN
hitran_header = HITRAN_DIR / f"{TABLE_NAME}.header"
hitran_ok = hitran_header.exists()
hitran_status = "✅ Ready" if hitran_ok else "❌ Not found"

# Check directories
dirs_ok = all(d.exists() for d in [DATA_DIR, HITRAN_DIR, SPECTRA_DIR, DATASET_DIR, MODEL_DIR, OUTPUT_DIR])
dirs_status = "✅ Created" if dirs_ok else "❌ Missing"

print(f"""
┌─────────────────────────────────────────────────────────┐
│  Component          │  Status                          │
├─────────────────────────────────────────────────────────┤
│  PyTorch            │  {torch_status:<30} │
│  HITRAN Database    │  {hitran_status:<30} │
│  Directories        │  {dirs_status:<30} │
└─────────────────────────────────────────────────────────┘
""")

if torch_ok and hitran_ok and dirs_ok:
    print("🎉 LOCAL SETUP COMPLETE! You're ready for Modules 2–5.")
    print("\nNext steps:")
    print("  1. Open Module_02_Physics_Background.ipynb")
    print("  2. Or open Module_03_HITRAN_Simulation.ipynb")
    print("\n  When you're ready for GPU training (Modules 6–7):")
    print("  3. Upload dataset_1M.h5 to the AMD droplet")
    print("  4. Run Module_1B_GPU_Session_Setup.ipynb on the droplet")
else:
    print("⚠️  Some issues detected. Please review the steps above.")

---

## Quick Reference: HITRAN Setup for Other Notebooks

After running this setup, other local notebooks can use HITRAN like this:

```python
import hapi
from pathlib import Path

# Portable path — works on Mac and Linux
PROJECT_DIR = Path.home() / 'methane-ml-course'
HITRAN_DIR  = PROJECT_DIR / 'data' / 'hitran'
hapi.db_begin(str(HITRAN_DIR))

# Use the pre-fetched CH4 data
TABLE_NAME = 'CH4_4383_4386'

# Generate spectrum
nu, coef = hapi.absorptionCoefficient_Voigt(
    SourceTables=TABLE_NAME,
    Components=[(6, 1, 0.01)],  # (molecule_id, isotope_id, mole_fraction)
    Environment={'T': 296, 'p': 1.0},
    WavenumberRange=[4383, 4386],
    WavenumberStep=0.001
)
```
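
To turn the returned coefficients into something resembling a measured signal, a Beer-Lambert step is needed. The sketch below assumes `coef` is in HAPI's default HITRAN units (cm² per molecule) and uses the ideal gas law for number density. The array `coef_demo` is a placeholder standing in for real HAPI output, and the 100 cm path length is a hypothetical value. Treat it as an illustration of the arithmetic, not the course's simulation code.

```python
import numpy as np

K_B = 1.380649e-23      # Boltzmann constant, J/K
ATM_TO_PA = 101325.0    # pascals per atmosphere

def number_density_cm3(pressure_atm, temp_k):
    """Ideal-gas number density in molecules per cm^3."""
    per_m3 = pressure_atm * ATM_TO_PA / (K_B * temp_k)
    return per_m3 * 1.0e-6   # m^-3 -> cm^-3

def transmittance(cross_section_cm2, mole_fraction, pressure_atm, temp_k, path_cm):
    """Beer-Lambert transmittance for a uniform path."""
    n_absorber = mole_fraction * number_density_cm3(pressure_atm, temp_k)
    return np.exp(-cross_section_cm2 * n_absorber * path_cm)

# Placeholder cross sections standing in for `coef` (not real HITRAN values)
coef_demo = np.array([0.0, 1.0e-21, 5.0e-21])
trans = transmittance(coef_demo, mole_fraction=0.01,
                      pressure_atm=1.0, temp_k=296.0, path_cm=100.0)
print(trans)
```

A zero cross section gives a transmittance of exactly 1, and larger cross sections reduce it, which is a quick way to confirm the sign convention.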

---

## Transferring Data to/from the GPU Droplet

When you're ready for Modules 6–7, transfer your generated dataset:

```bash
# Upload dataset to GPU droplet
scp ~/methane-ml-course/data/datasets/dataset_1M.h5 \
    root@<DROPLET_IP>:/root/methane-ml-course/data/datasets/

# Download trained model back to Mac
scp root@<DROPLET_IP>:/root/methane-ml-course/models/best_model.pt \
    ~/methane-ml-course/models/
```

Or with Tailscale (if configured):
```bash
scp ~/methane-ml-course/data/datasets/dataset_1M.h5 \
    root@<TAILSCALE_HOSTNAME>:/root/methane-ml-course/data/datasets/
```
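
A large file such as `dataset_1M.h5` is worth verifying after the copy. One way, sketched here with the standard library, is to compute a SHA-256 digest on each machine and compare the two strings. `sha256_of` is a hypothetical helper, and the demo uses a small temporary file so it runs anywhere.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

# Demo on a throwaway file; in practice pass the dataset or model path
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / 'demo.bin'
    demo.write_bytes(b'methane')
    demo_digest = sha256_of(demo)

print(demo_digest)
```

Run the same function on the droplet after `scp` and check that both digests match before starting a long training run.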

---

**Module 1A Complete!** Proceed to Module 2 for physics background, or Module 3 to start simulating spectra.