# SolarMamba — Google Colab Setup

**Spatiotemporal Solar Irradiance Forecasting with MambaVision + PyramidTCN**

### Prerequisites — before running this notebook:
1. Set Colab runtime to **GPU** (`Runtime → Change runtime type → T4 / A100`)
2. Upload your dataset to **Google Drive** (Option A) or a **HuggingFace Dataset repo** (Option B)
3. Push your code to a **GitHub repository** (public or private)
4. Make sure `weights/mambavision_b_1k.pth` is accessible (Drive or Git LFS)

---
### Execution order:
Run the cells **top to bottom in order** — each section depends on the previous one.

## Step 1 — Verify GPU Runtime

In [None]:
import subprocess

try:
    # Query only the fields needed; indexing fixed lines of the default table is fragile.
    gpu_info = subprocess.run(
        ['nvidia-smi', '--query-gpu=name,driver_version,memory.total', '--format=csv,noheader'],
        capture_output=True, text=True)
    gpu_found = gpu_info.returncode == 0
except FileNotFoundError:
    gpu_found = False   # nvidia-smi itself is missing, as is typical on CPU-only runtimes

if not gpu_found:
    print("⚠️  NO GPU DETECTED.")
    print("    Go to Runtime → Change runtime type → Hardware accelerator → GPU (T4 or A100).")
    print("    Then re-run this notebook from the top.")
else:
    print(gpu_info.stdout.strip())   # GPU name, driver version, total memory
    print("\n✅ GPU is available. Proceeding with installation.")

## Step 2 — Install PyTorch (Must Match Colab's CUDA)

> **Why pin versions?**  `mamba-ssm` and `causal-conv1d` build CUDA extensions at install time.  
> They **must** be compiled against the same PyTorch + CUDA build that will be used at runtime.  
> This notebook assumes a Colab runtime compatible with CUDA 12.1 and therefore pins `torch==2.4.0+cu121`. Colab images change over time, so adjust the pins if your runtime differs.

In [None]:
# ⚡ Installs torch 2.4.0 built against CUDA 12.1 (assumed compatible with the current Colab GPU driver).
# If the Colab image has changed, pick a matching build from: https://pytorch.org/get-started/previous-versions/
!pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 \
    --index-url https://download.pytorch.org/whl/cu121 -q
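
Once the install finishes, it can be worth confirming that the imported wheel really carries the expected CUDA tag before compiling anything against it. The sketch below is one way to do that. `split_torch_version` and `matches_pin` are hypothetical helper names (not part of PyTorch), and the default values simply mirror the pin used above.

```python
def split_torch_version(version: str) -> tuple[str, str]:
    """Split a PyTorch version string such as '2.4.0+cu121' into (release, build_tag).

    Wheels from the PyTorch index append a local build tag after '+';
    a string without '+' yields an empty build tag.
    """
    release, _, build_tag = version.partition("+")
    return release, build_tag


def matches_pin(version: str, release: str = "2.4.0", build_tag: str = "cu121") -> bool:
    """Return True when the version string matches the pinned release and CUDA tag."""
    return split_torch_version(version) == (release, build_tag)


# In the notebook, pass the live value instead of a literal:
#     import torch
#     assert matches_pin(torch.__version__), torch.__version__
print(matches_pin("2.4.0+cu121"))
print(matches_pin("2.5.1+cu124"))
```

If the check fails after a Colab image update, restarting the runtime and re-running the install cell is usually the first thing to try.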

## Step 3 — Install Mamba SSM CUDA Extensions

> `causal-conv1d` **must be installed before** `mamba-ssm` because mamba-ssm depends on it at build time.  
> `--no-build-isolation` lets pip reuse the already-installed torch/CUDA headers instead of rebuilding in an isolated env.

In [None]:
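# Step 1 of 2: causal-conv1d (dependency of mamba-ssm)
# --no-build-isolation is passed here too so the extension builds against the torch installed in Step 2.
!pip install causal-conv1d==1.4.0 --no-build-isolation -q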

In [None]:
# Step 2 of 2: mamba-ssm (CUDA kernel — this takes ~2-4 mins to compile)
!pip install mamba-ssm==2.2.4 --no-build-isolation -q

## Step 4 — Install Remaining Python Dependencies

In [None]:
# Vision / transformer stack (same versions as requirements.txt)
!pip install \
    timm==1.0.15 \
    tensorboardX==2.6.2.2 \
    einops==0.8.1 \
    transformers==4.50.0 \
    Pillow==11.1.0 \
    requests==2.32.3 -q

In [None]:
# Solar irradiance / data science stack
!pip install \
    pvlib==0.13.1 \
    pandas==2.3.3 \
    scikit-learn==1.7.2 \
    scipy==1.15.3 \
    matplotlib==3.10.8 \
    h5py==3.15.1 \
    PyYAML \
    tqdm -q

In [None]:
# Sanity check: verify torch sees the GPU and mamba-ssm can be imported
import torch
print(f"PyTorch  : {torch.__version__}")
print(f"CUDA     : {torch.version.cuda}")
print(f"GPU      : {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'NOT AVAILABLE'}")

import mamba_ssm
print(f"mamba-ssm: {mamba_ssm.__version__}  ✅")
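
Beyond the import check, a quick way to catch a silently upgraded or downgraded dependency is to compare installed versions against the pins. The following is a minimal sketch using the standard library's `importlib.metadata`; `find_pin_mismatches` and the `PINS` dict are illustrative names, and the dict lists only a few of the pins from the cells above.

```python
from importlib import metadata


def find_pin_mismatches(pins, lookup=metadata.version):
    """Compare pinned versions against installed ones.

    pins   : dict mapping distribution name -> expected version string
    lookup : callable returning the installed version for a name
             (defaults to importlib.metadata.version)
    Returns a dict of name -> (expected, found); found is None if not installed.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


# A subset of the pins from the install cells above.
PINS = {"timm": "1.0.15", "einops": "0.8.1", "pvlib": "0.13.1", "mamba-ssm": "2.2.4"}

for name, (expected, found) in find_pin_mismatches(PINS).items():
    print(f"{name}: expected {expected}, found {found}")
```

An empty output means every listed package matches its pin. `torch` is left out of `PINS` on purpose, because its installed version string includes the `+cu121` build tag.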

## Step 5 — Clone Repository & Install Local Package

The `mambaVision` package is a **local, customized** version — it must be installed from your GitHub repo  
(not the generic PyPI `mambavision`) so that `from mambaVision.models.mamba_vision import mamba_vision_B` resolves correctly.

> Replace `YOUR_GITHUB_USERNAME/YOUR_REPO_NAME` with your actual GitHub repo URL.

In [None]:
import os

# ── CONFIGURATION ──────────────────────────────────────────────────────────────
GITHUB_REPO_URL = "https://github.com/YOUR_GITHUB_USERNAME/YOUR_REPO_NAME.git"
REPO_DIR        = "/content/SolarMamba_repo"   # Where the repo will be cloned
# ───────────────────────────────────────────────────────────────────────────────

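# Clone (or pull if already cloned; safe to re-run)
if not os.path.exists(REPO_DIR):
    !git clone {GITHUB_REPO_URL} {REPO_DIR}
else:
    !git -C {REPO_DIR} pull

# `!` commands do not raise on failure, so confirm the checkout actually exists.
assert os.path.isdir(os.path.join(REPO_DIR, ".git")), f"❌ Clone failed: {REPO_DIR} is not a git checkout"
print(f"✅ Repository ready at {REPO_DIR}")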
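
For a private repository, the plain HTTPS URL above will be rejected because Colab has no credentials. One common workaround is to embed a GitHub personal access token in the clone URL. The sketch below builds such a URL. `with_token` is a hypothetical helper, and the token value shown is a placeholder, not a real credential.

```python
from urllib.parse import urlsplit, urlunsplit


def with_token(repo_url: str, token: str) -> str:
    """Return an HTTPS clone URL with a personal access token embedded as the user part."""
    parts = urlsplit(repo_url)
    if parts.scheme != "https":
        raise ValueError("token authentication here assumes an https:// URL")
    # Drop any credentials already present, then prepend the token.
    host = parts.netloc.rsplit("@", 1)[-1]
    return urlunsplit((parts.scheme, f"{token}@{host}", parts.path, parts.query, parts.fragment))


# Placeholder values: substitute your own repo URL and a token read from a secret store.
url = with_token("https://github.com/YOUR_GITHUB_USERNAME/YOUR_REPO_NAME.git", "ghp_EXAMPLE")
print(url)
```

Assign the result to `GITHUB_REPO_URL` before the clone cell runs. Avoid printing the real URL or committing the notebook with the token in it; Colab's Secrets panel (read via `google.colab.userdata.get`) is one place to keep the token instead.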

In [None]:
# Install the local mambaVision package in editable mode from the repo root.
# This registers the `mambaVision` module so that
# `from mambaVision.models.mamba_vision import mamba_vision_B` works inside Colab.
%cd {REPO_DIR}
!pip install -e . -q
print("✅ Local mambaVision package installed.")

## Step 6 — Load Dataset

Choose **one** of the two options below. Run the cell that matches how your data is stored.

| Option | Best for | Notes |
|---|---|---|
| **A — Google Drive** | Full Folsom ASI image archive | Mount Drive, point paths to your Drive folder |
| **B — HuggingFace** | Versioned, shareable datasets | Requires HF account + uploaded dataset repo |

### Option A — Google Drive Mount

**Folder structure expected inside Drive:**
```
MyDrive/
└── SolarMamba_Data/
    ├── csv_files/
    │   └── Folsom_irradiance_weather.csv
    └── datasets/
        └── 1_Folsom/
            ├── 2014/
            ├── 2015/
            └── 2016/
```

In [None]:
# ── Option A: Google Drive ─────────────────────────────────────────────────────
# Run this cell if your dataset is stored in Google Drive.

from google.colab import drive
drive.mount('/content/drive')

# Verify the expected paths exist
import os
DRIVE_DATA_ROOT = "/content/drive/MyDrive/SolarMamba_Data"
CSV_PATH        = f"{DRIVE_DATA_ROOT}/csv_files/Folsom_irradiance_weather.csv"
IMAGE_ROOT      = f"{DRIVE_DATA_ROOT}/datasets/1_Folsom"

assert os.path.exists(CSV_PATH),   f"❌ CSV not found at: {CSV_PATH}"
assert os.path.isdir(IMAGE_ROOT),  f"❌ Image root not found at: {IMAGE_ROOT}"
print(f"✅ CSV found     : {CSV_PATH}")
print(f"✅ Image root    : {IMAGE_ROOT}")
print(f"   Years present : {sorted(os.listdir(IMAGE_ROOT))}")

DATA_SOURCE = "drive"   # tag recording the data source; the config itself is written in Step 8
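
The assertions above only confirm that the top-level paths exist. A slightly deeper check, sketched below, counts the files under each expected year folder so that an incomplete upload shows up early. `summarize_image_root` is a hypothetical helper, and the year names are taken from the folder structure shown above.

```python
from pathlib import Path


def summarize_image_root(image_root, expected_years=("2014", "2015", "2016")):
    """Count files under each expected year folder; a missing folder maps to None."""
    root = Path(image_root)
    summary = {}
    for year in expected_years:
        year_dir = root / year
        if year_dir.is_dir():
            # rglob walks nested subfolders, so any per-month or per-day layout is counted too.
            summary[year] = sum(1 for p in year_dir.rglob("*") if p.is_file())
        else:
            summary[year] = None
    return summary


# Example against the Option A path; adjust if your layout differs.
counts = summarize_image_root("/content/drive/MyDrive/SolarMamba_Data/datasets/1_Folsom")
for year, count in counts.items():
    print(year, "missing" if count is None else f"{count} files")
```

Walking a large image archive on a mounted Drive can be slow, so this is best run once rather than on every session.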

### Option B — HuggingFace Dataset Download

> Skip this cell if you chose Option A above.

In [None]:
# ── Option B: HuggingFace Dataset ─────────────────────────────────────────────
# Run this cell INSTEAD of Option A if your data is on HuggingFace.
# Requires huggingface_hub, which is installed as a dependency of transformers in Step 4.

# ── CONFIGURATION ──────────────────────────────────────────────────────────────
HF_REPO_ID      = "YOUR_HF_USERNAME/folsom-asi-dataset"   # <-- Change this
HF_REPO_TYPE    = "dataset"
HF_LOCAL_DIR    = "/content/data"
# ───────────────────────────────────────────────────────────────────────────────

from huggingface_hub import snapshot_download
import os

snapshot_download(
    repo_id=HF_REPO_ID,
    repo_type=HF_REPO_TYPE,
    local_dir=HF_LOCAL_DIR,
    ignore_patterns=["*.git*", "README*"]
)

DRIVE_DATA_ROOT = HF_LOCAL_DIR
CSV_PATH        = f"{HF_LOCAL_DIR}/csv_files/Folsom_irradiance_weather.csv"
IMAGE_ROOT      = f"{HF_LOCAL_DIR}/datasets/1_Folsom"

assert os.path.exists(CSV_PATH),  f"❌ CSV not found at: {CSV_PATH}"
assert os.path.isdir(IMAGE_ROOT), f"❌ Image root not found at: {IMAGE_ROOT}"
print(f"✅ HuggingFace download complete.")
print(f"   CSV: {CSV_PATH}")
print(f"   Images: {IMAGE_ROOT}")

DATA_SOURCE = "huggingface"  # tag recording the data source; the config itself is written in Step 8

## Step 7 — Copy Pretrained Weights

`mambavision_b_1k.pth` must be accessible at a known path.  
The cell below looks for it in the cloned repo first (e.g. stored via **Git LFS**)  
and falls back to copying it from Google Drive if it is not there. The fallback needs Drive mounted at `/content/drive`, which Option B does not do on its own.

In [None]:
import os, shutil

WEIGHT_FILENAME  = "mambavision_b_1k.pth"

# This is where train.py will look for the weights (matches colab_pretrained_weights in config).
# This cell ensures the file ends up here regardless of source.
FINAL_WEIGHT_PATH = f"{REPO_DIR}/weights/{WEIGHT_FILENAME}"
os.makedirs(os.path.dirname(FINAL_WEIGHT_PATH), exist_ok=True)

# ── Source priority ────────────────────────────────────────────────────────────
# 1. Git LFS: file is already in the cloned repo (best: no extra upload needed)
# 2. Drive:   upload mambavision_b_1k.pth to MyDrive/ECCV_Irradiance/weights/.
#             Drive must be mounted (Option A in Step 6 does this; with Option B,
#             run drive.mount('/content/drive') before this cell).
# ───────────────────────────────────────────────────────────────────────────────
REPO_WEIGHT_PATH  = FINAL_WEIGHT_PATH                                          # after git clone (LFS)
DRIVE_WEIGHT_PATH = "/content/drive/MyDrive/ECCV_Irradiance/weights/mambavision_b_1k.pth"

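if os.path.exists(REPO_WEIGHT_PATH) and os.path.getsize(REPO_WEIGHT_PATH) > 1_000_000:
    # The size check rejects Git LFS pointer stubs (tiny text files left when LFS content was not fetched).
    # A file over 1 MB is treated as the real checkpoint, so there is nothing to do.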
    WEIGHTS_PATH = REPO_WEIGHT_PATH
    print(f"✅ Weights already in repo (Git LFS): {WEIGHTS_PATH}")

elif os.path.exists(DRIVE_WEIGHT_PATH):
    # Copy from Drive → repo weights dir so the path matches config exactly
    shutil.copy2(DRIVE_WEIGHT_PATH, FINAL_WEIGHT_PATH)
    WEIGHTS_PATH = FINAL_WEIGHT_PATH
    print(f"✅ Weights copied from Drive to: {WEIGHTS_PATH}")

else:
    raise FileNotFoundError(
        f"Weights not found.\n"
        f"  Checked (Git LFS) : {REPO_WEIGHT_PATH}\n"
        f"  Checked (Drive)   : {DRIVE_WEIGHT_PATH}\n\n"
        f"Fix: Upload 'mambavision_b_1k.pth' to your Drive at:\n"
        f"  MyDrive/ECCV_Irradiance/weights/mambavision_b_1k.pth"
    )

print(f"   File size : {os.path.getsize(WEIGHTS_PATH) / 1e6:.1f} MB")
print(f"   Final path: {WEIGHTS_PATH}")
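
The size threshold in the cell above is a heuristic for spotting a Git LFS pointer stub. A more explicit test, sketched here, reads the first bytes of the file and compares them with the header line that LFS pointer files begin with. `is_lfs_pointer` is a hypothetical helper, not something the repo is known to provide.

```python
import os
import tempfile

# Git LFS pointer files are small text files whose first line is this version header.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path) -> bool:
    """Return True if the file looks like a Git LFS pointer rather than real content."""
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


# Small demonstration with a fake pointer written to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    stub = os.path.join(tmp, "mambavision_b_1k.pth")
    with open(stub, "wb") as fh:
        fh.write(LFS_POINTER_PREFIX + b"\noid sha256:0000\nsize 123\n")
    print(is_lfs_pointer(stub))
```

If the real weight file turns out to be a pointer, running `git lfs pull` inside the cloned repo is the usual way to fetch the actual content, assuming Git LFS is installed in the runtime.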

## Step 8 — Write `config_colab.yaml`

This generates a clean config file for the Colab run, pointing to the  
actual `/content/...` paths resolved in the cells above.  
`env: "colab"` tells `data_loader.py` and `train.py` to use the `colab_*` keys.

In [None]:
import yaml, os

# Paths resolved from the cells above
config = {
    "env": "colab",

    "data": {
        # --- kept for reference, not used in colab env ---
        "local_root":  "../mock_data_storage",
        "server_root": "/storage2/CV_Irradiance",
        "csv_path":    "/storage2/CV_Irradiance/datasets/1_Folsom/csv_files/Folsom_irradiance_weather.csv",
        "image_root":  "/storage2/CV_Irradiance/datasets/1_Folsom",

        # --- ACTIVE colab paths ---
        "colab_root":       DRIVE_DATA_ROOT,
        "colab_csv_path":   CSV_PATH,
        "colab_image_root": IMAGE_ROOT,

        "image_tolerance_sec": 120,
        "dataset_type":        "folsom_colab_run",
        "months":              [],
        "data": {
            "years": [2014, 2015, 2016]
        },

        # Sampling
        "sampling_rate_sec": 60,
        "sequence_length":   40,

        # Image
        "image_size": 512,

        # Loader — num_workers is auto-capped to 2 on Colab in data_loader.py
        "batch_size":   8,    # Reduce if you hit CUDA OOM on T4 (15 GB VRAM)
        "num_workers":  2,
    },

    "model": {
        "visual_backbone":         "mamba_vision_B",
        "temporal_channels":       7,
        "horizons":                [1, 5, 10, 15],
        "pretrained_weights":      WEIGHTS_PATH,
        "colab_pretrained_weights": WEIGHTS_PATH,
    },

    "training": {
        "epochs":        50,
        "learning_rate": 5.0e-5,
        "weight_decay":  0.1,
        "val_split":     0.1,
        "test_split":    0.1,
        "seed":          42,
    },
}

CONFIG_PATH = f"{REPO_DIR}/SolarMamba/config_colab.yaml"
with open(CONFIG_PATH, "w") as f:
    yaml.dump(config, f, default_flow_style=False, sort_keys=False)

print(f"✅ config_colab.yaml written to: {CONFIG_PATH}")
print(f"   colab_root      : {config['data']['colab_root']}")
print(f"   colab_csv_path  : {config['data']['colab_csv_path']}")
print(f"   colab_image_root: {config['data']['colab_image_root']}")
print(f"   weights         : {config['model']['pretrained_weights']}")
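
To make the role of `env` concrete, here is a hedged sketch of how environment-specific keys could be resolved. The actual logic in `data_loader.py` and `train.py` is not shown in this notebook, so `resolve_key` is purely illustrative and the real code may differ.

```python
def resolve_key(section: dict, key: str, env: str):
    """Prefer '<env>_<key>' when present, otherwise fall back to the plain key."""
    env_key = f"{env}_{key}"
    if env_key in section:
        return section[env_key]
    return section[key]


# A trimmed-down version of the "data" section written above.
data_cfg = {
    "csv_path": "/storage2/CV_Irradiance/datasets/1_Folsom/csv_files/Folsom_irradiance_weather.csv",
    "colab_csv_path": "/content/drive/MyDrive/SolarMamba_Data/csv_files/Folsom_irradiance_weather.csv",
    "batch_size": 8,
}

print(resolve_key(data_cfg, "csv_path", "colab"))    # picks the colab_* value
print(resolve_key(data_cfg, "batch_size", "colab"))  # no colab_ variant, so the plain key is used
```

Under this reading, the server paths kept "for reference" are simply ignored whenever a matching `colab_*` key exists.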

## Step 9 — Run Training

Checkpoints are saved to `SolarMamba/Results/checkpoints/folsom_colab_run/` inside the repo directory.  
To persist them across Colab sessions, copy the Results folder back to Drive after training (see the cell below).

In [None]:
import os

SOLAR_MAMBA_DIR = f"{REPO_DIR}/SolarMamba"
os.chdir(SOLAR_MAMBA_DIR)
print(f"Working directory: {os.getcwd()}")

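# `!python` starts a new process, so edits to this kernel's sys.path would not reach train.py.
# The editable install from Step 5 normally makes `mambaVision` importable; exporting
# PYTHONPATH is a fallback that the child process does inherit.
os.environ["PYTHONPATH"] = os.pathsep.join(
    p for p in (REPO_DIR, os.environ.get("PYTHONPATH", "")) if p)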

# Launch training — stdout/stderr are streamed directly into this cell's output
!python train.py --config config_colab.yaml

## Step 10 — (Optional) Back Up Checkpoints to Drive

Colab VMs are **ephemeral** — all `/content/` data is lost when the session ends.  
Run this cell periodically or after training to save checkpoints to Drive.

In [None]:
import shutil, os
from datetime import datetime

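SRC_RESULTS  = f"{REPO_DIR}/SolarMamba/Results"
DRIVE_ROOT   = "/content/drive/MyDrive"
BACKUP_DIR   = f"{DRIVE_ROOT}/SolarMamba_Checkpoints/{datetime.now().strftime('%Y%m%d_%H%M%S')}"

if not os.path.isdir(DRIVE_ROOT):
    # Without a mounted Drive, copytree would write to the ephemeral VM disk instead.
    print("⚠️  Drive is not mounted. Run drive.mount('/content/drive') first, then re-run this cell.")
elif os.path.exists(SRC_RESULTS):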
    shutil.copytree(SRC_RESULTS, BACKUP_DIR)
    print(f"✅ Checkpoints backed up to Drive: {BACKUP_DIR}")
    # List best models
    for root, dirs, files in os.walk(BACKUP_DIR):
        for f in files:
            if "best" in f:
                print(f"   Best model: {os.path.join(root, f)}")
else:
    print(f"⚠️  No Results folder found at {SRC_RESULTS} — training may not have started yet.")
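
Because the cell above makes a full timestamped copy on every run, repeated backups of large checkpoints can fill Drive quickly. An alternative for periodic use is an incremental sync into one fixed folder, sketched below. `sync_newer` is a hypothetical helper, and the `latest` folder name in the usage comment is an assumption.

```python
import shutil
from pathlib import Path


def sync_newer(src, dst):
    """Copy files from src to dst when missing in dst or newer in src; return copied paths."""
    src, dst = Path(src), Path(dst)
    copied = []
    for src_file in sorted(p for p in src.rglob("*") if p.is_file()):
        dst_file = dst / src_file.relative_to(src)
        if not dst_file.exists() or src_file.stat().st_mtime > dst_file.stat().st_mtime:
            dst_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, dst_file)   # copy2 preserves the modification time
            copied.append(dst_file)
    return copied


# Usage in this notebook (paths follow the cell above; "latest" is an assumed folder name):
#     sync_newer(f"{REPO_DIR}/SolarMamba/Results",
#                "/content/drive/MyDrive/SolarMamba_Checkpoints/latest")
```

The comparison relies on modification times, which a mounted Drive may round or rewrite, so treat a re-copied file as harmless rather than as an error.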