# Colab: Evaluate Checkpoints & Package Results

This notebook mounts Google Drive, copies checkpoints/results into the Colab runtime, installs dependencies, runs either a quick "Path A" analysis (process existing JSONs) or a full "Path B" evaluation (run `analysis/evaluate_checkpoints_per_class.py` on checkpoints), generates plots and per-class JSON outputs, and packages the small outputs back to Drive.

How to use:
1. Open this notebook in Colab: use File > Upload notebook, or go to `colab.research.google.com` and select this file from GitHub or Drive.
2. Run the cells sequentially. Use Path A if you already have `results/*.json` in Drive. Use Path B to evaluate checkpoints stored in Drive (requires a GPU runtime and access to the dataset files).

This notebook will write a zip file to `/content/drive/My Drive/GhanaSegNet_Results/results_per_class_summary_jsons.zip` containing the small JSON and PNG outputs.

## 1) Check for an existing Colab notebook

This section searches the repository for other notebooks. It can help avoid duplicates and detect if a Colab-ready notebook already exists.

Run the Python cell below to list any `.ipynb` files in the repo and preview the first few lines of any candidate notebook.

In [None]:
# Cell 2: Search for notebooks in repo
from pathlib import Path

# Search the runtime copy of the repo if it exists, otherwise the current directory
repo_root = Path('/content/GhanaSegNet') if Path('/content/GhanaSegNet').exists() else Path('.')
notebooks = sorted(str(p) for p in repo_root.rglob('*.ipynb'))
print(f'Found {len(notebooks)} notebook(s) under {repo_root}:')
for nb in notebooks:
    print('-', nb)

# Preview the first candidate (if any)
if notebooks:
    with open(notebooks[0], 'r', encoding='utf-8') as f:
        # zip() stops at end of file, so files shorter than 5 lines do not raise StopIteration
        first_lines = ''.join(line for _, line in zip(range(5), f))
    print('\nPreview of', notebooks[0], ':')
    print(first_lines)

## 2) Prepare Colab-compatible notebook metadata

If you'd like to programmatically patch notebook metadata to be Colab-friendly (kernelspec and runtime), run the helper below. It uses nbformat to open and rewrite metadata.

Note: This step is optional; Colab generally handles metadata itself when opening a notebook from Drive.

In [None]:
# Cell: helper to update notebook metadata to be Colab-friendly
from pathlib import Path  # used below to derive the notebook name

try:
    import nbformat
except ImportError:
    print('nbformat is not available; run `pip install nbformat` to use this cell')


def patch_notebook_metadata(nb_path: str):
    nb = nbformat.read(nb_path, as_version=4)
    nb.metadata.setdefault('kernelspec', {})
    nb.metadata['kernelspec'].update({
        'name': 'python3',
        'display_name': 'Python 3'
    })
    nb.metadata.setdefault('colab', {})
    nb.metadata['colab'].update({'name': Path(nb_path).name})
    nbformat.write(nb, nb_path)
    print('Patched metadata for', nb_path)

# Example usage (uncomment to run):
# patch_notebook_metadata('Colab_Evaluate_and_Package.ipynb')

## 3) Install and verify dependencies in Colab

This cell installs dependencies from `requirements.txt` if that file exists in the current working directory, so run it from the repo root. It then checks that PyTorch imports and reports whether a GPU is visible. If you have a custom environment or a private wheel, adapt the install commands.

Note: Installing all dependencies can take several minutes.

In [None]:
# Install dependencies (may require GPU-specific torch wheel)
import os

# If requirements.txt exists in repo root, install it
if os.path.exists('requirements.txt'):
    print('Installing from requirements.txt...')
    !pip install -r requirements.txt
else:
    print('No requirements.txt found. Please ensure dependencies are installed manually.')

# Quick checks for torch and cuda
try:
    import torch
    print('torch', torch.__version__)
    print('CUDA available:', torch.cuda.is_available())
    if torch.cuda.is_available():
        print('CUDA device count:', torch.cuda.device_count())
        print('CUDA device name:', torch.cuda.get_device_name(0))
except Exception as e:
    print('PyTorch not available:', e)

# Optionally install PlantUML dependencies if needed for diagram rendering
!pip install plantuml-markdown sarge >/dev/null 2>&1 || true
print('Dependency install step complete.')

## 4) Mount Google Drive and access data

Mount Drive and define `DRIVE_ROOT` where your checkpoints/results are stored. Adjust `DRIVE_RESULTS_DIR` and `DRIVE_CHECKPOINTS_DIR` to the locations you used in Drive (for example: `My Drive/GhanaSegNet/results` and `My Drive/GhanaSegNet/checkpoints`).

In [None]:
# Mount Drive
from google.colab import drive
import os

drive.mount('/content/drive')

# Default Drive paths (adjust these to your Drive layout)
DRIVE_ROOT = '/content/drive/My Drive'
DRIVE_PROJECT_DIR = os.path.join(DRIVE_ROOT, 'GhanaSegNet')
DRIVE_RESULTS_DIR = os.path.join(DRIVE_PROJECT_DIR, 'results')
DRIVE_CHECKPOINTS_DIR = os.path.join(DRIVE_PROJECT_DIR, 'checkpoints')

print('Drive project dir:', DRIVE_PROJECT_DIR)
print('Results dir:', DRIVE_RESULTS_DIR)
print('Checkpoints dir:', DRIVE_CHECKPOINTS_DIR)

# List contents for verification
for p in [DRIVE_PROJECT_DIR, DRIVE_RESULTS_DIR, DRIVE_CHECKPOINTS_DIR]:
    try:
        print('\nContents of', p)
        for entry in os.listdir(p):
            print(' -', entry)
    except Exception as e:
        print('Could not list', p, ':', e)

## 5) Download or stream datasets from remote sources (optional)

If your dataset isn't on Drive, use this section to download it. The example uses gdown for Google Drive shared links and wget for HTTP links. Verify a checksum after downloading whenever the dataset publisher provides one.
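
The checksum step mentioned above is not part of the download cell. A minimal sketch of what it could look like follows, using only the standard library; the helper names `sha256_of` and `verify_checksum` are illustrative and not part of the repo, and the expected digest has to come from wherever the dataset is published.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_sha256):
    """Raise ValueError if the file's digest differs from the expected one."""
    actual = sha256_of(path)
    if actual != expected_sha256.strip().lower():
        raise ValueError(f'Checksum mismatch for {path}: got {actual}')
    return True

# Example usage (hypothetical digest; uncomment and edit):
# verify_checksum('/content/dataset.zip', '<expected-sha256-hex>')
```

Reading in chunks keeps memory use flat even for multi-gigabyte archives.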

In [None]:
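# Example: download dataset or other artifacts
# Uncomment and edit the lines below for your dataset URLs
# Google Drive shared file (recent gdown releases take the file ID positionally;
# the older --id flag is deprecated):
# !gdown <file-id> -O /content/dataset.zip
# Plain HTTP(S) link:
# !wget -q <url> -O /content/dataset.zip
# !unzip -q /content/dataset.zip -d /content/dataset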

print('Dataset download section - no action performed by default')

## 6) Run core analysis / model evaluation

This section provides two paths:
- Path A (quick): copy Drive `results/*.json` into the runtime and run the analysis scripts (plots & stats) using existing JSONs.
- Path B (full): copy Drive `checkpoints/` into the runtime and run `analysis/evaluate_checkpoints_per_class.py` to produce per-class JSONs. This may require GPU and dataset access. Choose Path B only if you have checkpoints and the dataset available in the runtime.

Set `USE_PATH = 'A'` or `'B'` in the next cell and adjust paths if necessary.
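
Because Path B depends on checkpoints, the dataset, and possibly a GPU, it can help to check those prerequisites before starting a long evaluation. The sketch below is one way to do that. `preflight_path_b` is a hypothetical helper, not part of the repo, and the `.pth`/`.pt` checkpoint suffixes and the dataset location are assumptions to adjust.

```python
import importlib.util
from pathlib import Path


def preflight_path_b(checkpoints_dir, dataset_dir,
                     suffixes=('.pth', '.pt'), check_gpu=True):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    ckpt_dir = Path(checkpoints_dir)
    if not ckpt_dir.is_dir():
        problems.append(f'checkpoints dir missing: {ckpt_dir}')
    elif not any(p.suffix in suffixes for p in ckpt_dir.rglob('*')):
        problems.append(f'no checkpoint files {suffixes} under {ckpt_dir}')
    if not Path(dataset_dir).is_dir():
        problems.append(f'dataset dir missing: {dataset_dir}')
    if check_gpu:
        # Only import torch if it is installed, so the check itself never crashes
        if importlib.util.find_spec('torch') is None:
            problems.append('torch is not installed')
        else:
            import torch
            if not torch.cuda.is_available():
                problems.append('no CUDA device visible')
    return problems

# Example usage (paths are assumptions; adjust to your runtime layout):
# for problem in preflight_path_b('/content/GhanaSegNet/checkpoints', '/content/dataset'):
#     print('WARNING:', problem)
```

Returning a list rather than raising on the first issue lets you see every missing prerequisite in one pass.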

In [None]:
# Path selection
USE_PATH = 'A'  # 'A' = use existing results JSONs; 'B' = evaluate checkpoints

# Local workspace in Colab
import shutil
import os
from pathlib import Path

WORK_DIR = Path('/content/GhanaSegNet')
if WORK_DIR.exists():
    print('Workspace already at', WORK_DIR)
else:
    # Copy repo from DRIVE_PROJECT_DIR if present (otherwise `git clone` it into WORK_DIR manually)
    if Path(DRIVE_PROJECT_DIR).exists():
        print('Copying repo from Drive to', WORK_DIR)
        shutil.copytree(DRIVE_PROJECT_DIR, WORK_DIR)
    else:
        # Stop here: os.chdir below would fail on a missing workspace
        raise FileNotFoundError(
            'Drive project dir not found - upload the repo or set DRIVE_PROJECT_DIR correctly'
        )

os.chdir(WORK_DIR)
print('Current dir:', os.getcwd())

# Ensure results output folder exists
RESULTS_OUT = WORK_DIR / 'results'
RESULTS_OUT.mkdir(exist_ok=True)

# Path A: use existing JSONs from Drive
if USE_PATH == 'A':
    src = Path(DRIVE_RESULTS_DIR)
    if src.exists():
        print('Copying results from Drive results dir to runtime')
        for f in src.glob('*.json'):
            print(' - copying', f)
            shutil.copy(f, RESULTS_OUT / f.name)
    else:
        print('Drive results directory not found; adjust DRIVE_RESULTS_DIR')

# Path B: copy checkpoints and run evaluation
if USE_PATH == 'B':
    ckpt_src = Path(DRIVE_CHECKPOINTS_DIR)
    if ckpt_src.exists():
        ckpt_dest = WORK_DIR / 'checkpoints'
        print('Copying checkpoints to runtime:', ckpt_dest)
        if ckpt_dest.exists():
            print('checkpoints dir already exists at runtime')
        else:
            shutil.copytree(ckpt_src, ckpt_dest)
    else:
        print('Drive checkpoints directory not found; adjust DRIVE_CHECKPOINTS_DIR')

# Run scripts with subprocess so a failing script stops the cell instead of being ignored
import subprocess
import sys

# If Path B, run the evaluation wrapper
if USE_PATH == 'B':
    # Adjust the arguments to match what evaluate_checkpoints_per_class.py expects
    cmd = [sys.executable, 'analysis/evaluate_checkpoints_per_class.py',
           '--checkpoints', 'checkpoints', '--out', 'results/per_class_summary']
    print('Running evaluation wrapper:', ' '.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit

# After Path A, run analysis scripts that read results/*.json
if USE_PATH == 'A':
    print('Running analysis/model_comparison_analysis.py to generate comparison plots')
    subprocess.run([sys.executable, 'analysis/model_comparison_analysis.py'], check=True)
    print('Running analysis/compute_val_iou_stats.py to compute stats')
    subprocess.run([sys.executable, 'analysis/compute_val_iou_stats.py'], check=True)

print('Analysis step complete. Check results/ for outputs.')

## 7) Save outputs, checkpoints, and artifacts to Drive

This section packages small outputs (JSONs and plots) into a zip file and copies it back to Drive at the location specified below.
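
The packaging cell archives the whole `results/` directory. If that directory also holds large files, a narrower archive containing only JSON and PNG files could be built along these lines. This is a sketch using the standard `zipfile` module; `zip_small_outputs` is a hypothetical helper, not part of the repo.

```python
import zipfile
from pathlib import Path


def zip_small_outputs(results_dir, out_zip, patterns=('*.json', '*.png')):
    """Zip only files matching the patterns, with paths relative to results_dir."""
    results_dir = Path(results_dir)
    # A set removes duplicates in case two patterns match the same file
    files = sorted({p for pat in patterns
                    for p in results_dir.rglob(pat) if p.is_file()})
    with zipfile.ZipFile(out_zip, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            zf.write(p, arcname=p.relative_to(results_dir).as_posix())
    return [p.relative_to(results_dir).as_posix() for p in files]

# Example usage (uncomment to run from the repo root):
# zip_small_outputs('results', '/content/results_per_class_summary_jsons.zip')
```

The returned list of archived names makes it easy to confirm what ended up in the zip before copying it to Drive.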

In [None]:
# Package the results directory (JSONs and PNGs) into a zip and copy it to Drive
import os
import shutil
from pathlib import Path

OUT_ZIP = Path('/content/results_per_class_summary_jsons.zip')

# Create a zip of the results directory (relative to the current working directory)
print('Creating zip:', OUT_ZIP)
if OUT_ZIP.exists():
    OUT_ZIP.unlink()
# make_archive appends '.zip' itself, so pass the path without the suffix
shutil.make_archive(str(OUT_ZIP.with_suffix('')), 'zip', 'results')

# Copy to Drive destination
DRIVE_OUT = os.path.join(DRIVE_ROOT, 'GhanaSegNet_Results')
Path(DRIVE_OUT).mkdir(parents=True, exist_ok=True)
drive_zip = os.path.join(DRIVE_OUT, OUT_ZIP.name)
shutil.copy(str(OUT_ZIP), drive_zip)
print('Copied zip to', drive_zip)

# List created files
print('\nFiles in results/:')
for f in Path('results').glob('*'):
    print(' -', f)

## 8) Convert/publish notebook to Colab & quick link

To open this notebook in Colab from GitHub, push this repo to GitHub and use the URL:

https://colab.research.google.com/github/<your-user>/<your-repo>/blob/main/Colab_Evaluate_and_Package.ipynb

Or simply open the copy stored in Drive under `GhanaSegNet/Colab_Evaluate_and_Package.ipynb`.

## 9) Automated testing: run notebook headlessly (optional)

If you'd like to validate the notebook in CI, use Papermill or nbconvert. An example GitHub Actions workflow snippet is provided in the cell below (edit paths and timeouts as needed).
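
For a local dry run outside of CI, the Papermill route can be sketched in Python by shelling out to the `papermill` command-line tool. This assumes Papermill is installed (`pip install papermill`); the helper names and the output filename are illustrative, not part of the repo.

```python
import subprocess


def build_papermill_command(notebook_in, notebook_out, parameters=None):
    """Build the argument list for the `papermill` CLI."""
    cmd = ['papermill', str(notebook_in), str(notebook_out)]
    for name, value in (parameters or {}).items():
        cmd += ['-p', name, str(value)]  # -p NAME VALUE sets one parameter
    return cmd


def run_notebook_headless(notebook_in, notebook_out, parameters=None):
    """Execute the notebook; raises CalledProcessError if execution fails."""
    cmd = build_papermill_command(notebook_in, notebook_out, parameters)
    subprocess.run(cmd, check=True)

# Example usage (uncomment to run; output name is hypothetical):
# run_notebook_headless('Colab_Evaluate_and_Package.ipynb', 'executed.ipynb',
#                       {'USE_PATH': 'A'})
```

Two caveats apply. Papermill injects overrides right after a cell tagged `parameters`, so `USE_PATH` would need to be defined in such a cell for the override to take effect. The Drive mount cell also depends on `google.colab`, so a run outside Colab would need that cell skipped or adapted.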