# Google Colab Training Setup

This notebook sets up the environment in Google Colab for training medical image segmentation models.

**Steps:**
1. Check GPU availability
2. Clone GitHub repository
3. Install dependencies
4. Mount Google Drive
5. Extract processed data
6. Train models
7. Evaluate models
8. Save results to Google Drive
9. Create results summary

## 1. Check GPU

In [None]:
!nvidia-smi

In [None]:
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## 2. Clone Repository

In [None]:
# Clone your repository (replace YOUR_USERNAME with your GitHub username)
!git clone https://github.com/YOUR_USERNAME/Medical-Image-Segmentation.git
%cd Medical-Image-Segmentation

## 3. Install Dependencies

In [None]:
!pip install -q timm albumentations opencv-python-headless
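
Because `pip install -q` suppresses most output, a quick check can confirm that the packages are importable before training starts. The cell below is a minimal sketch, not part of the repository: the helper `find_missing` and the list `REQUIRED_MODULES` are hypothetical names, and the list assumes the training code imports `torch`, `timm`, `albumentations`, and `cv2` (the import name of `opencv-python-headless`).

```python
import importlib.util

# Assumed import names for the packages installed above.
# opencv-python-headless is imported as `cv2`.
REQUIRED_MODULES = ["torch", "timm", "albumentations", "cv2"]


def find_missing(modules):
    """Return the top-level module names that cannot be located in this runtime."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print(f"Missing modules: {', '.join(missing)}")
else:
    print("All required modules are importable.")
```

If anything is reported missing, rerun the install cell without `-q` to see the full pip output.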

## 4. Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## 5. Extract Processed Data

In [None]:
import zipfile
import os

# Path to your zip file in Google Drive
zip_path = "/content/drive/MyDrive/isic_processed_256.zip"
extract_path = "/content/Medical-Image-Segmentation/data/processed"

# Create directory
os.makedirs(extract_path, exist_ok=True)

# Extract
print("Extracting data...")
with zipfile.ZipFile(zip_path, 'r') as z:
    z.extractall(extract_path)

print("✓ Data extracted successfully!")

# Verify
!ls -lh data/processed/isic/
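
Extraction fails with a fairly opaque error if Drive is not mounted or the upload was truncated. As a sketch, the archive can be checked before extracting; `check_archive` is a hypothetical helper, and `testzip()` reads every entry, so it may take a while on a large archive.

```python
import os
import zipfile


def check_archive(zip_path):
    """Return the number of entries in the archive, raising if it is unusable."""
    if not os.path.isfile(zip_path):
        raise FileNotFoundError(f"Archive not found: {zip_path}")
    with zipfile.ZipFile(zip_path, "r") as z:
        bad_member = z.testzip()  # name of the first corrupt entry, or None
        if bad_member is not None:
            raise zipfile.BadZipFile(f"Corrupt entry in archive: {bad_member}")
        return len(z.namelist())


zip_path = "/content/drive/MyDrive/isic_processed_256.zip"  # same path as in the cell above
if os.path.isfile(zip_path):
    print(f"Archive OK: {check_archive(zip_path)} entries")
else:
    print(f"Archive not found: {zip_path}. Is Google Drive mounted?")
```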

## 6. Training

### 6.1 Train UNet with 10% Data

In [None]:
!python -m src.train \
    --model unet \
    --epochs 30 \
    --batch_size 8 \
    --data_fraction 0.1 \
    --lr 1e-4

### 6.2 Train UNet++ with 10% Data

In [None]:
!python -m src.train \
    --model unetpp \
    --epochs 30 \
    --batch_size 8 \
    --data_fraction 0.1 \
    --lr 1e-4

### 6.3 Train TransUNet with 10% Data

In [None]:
!python -m src.train \
    --model transunet \
    --epochs 30 \
    --batch_size 4 \
    --data_fraction 0.1 \
    --lr 1e-4

### 6.4 Train with Other Data Fractions

Repeat the runs above with `--data_fraction` set to 0.25, 0.5, and 1.0 (25%, 50%, and 100% of the data):

In [None]:
# Example: UNet with 25% data
!python -m src.train \
    --model unet \
    --epochs 30 \
    --batch_size 8 \
    --data_fraction 0.25 \
    --lr 1e-4
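
Rather than editing the cell by hand for each combination, the full sweep can be generated in a loop. This is a minimal sketch that reuses only the flags shown above; `build_command`, `build_sweep`, and `DRY_RUN` are hypothetical names, and it assumes the batch sizes used for the 10% runs (8 for `unet` and `unetpp`, 4 for `transunet`) carry over to the other fractions.

```python
import itertools
import subprocess
import sys

# model name -> batch size, taken from the 10% runs above (assumed to carry over)
MODELS = {"unet": 8, "unetpp": 8, "transunet": 4}
FRACTIONS = [0.1, 0.25, 0.5, 1.0]


def build_command(model, batch_size, fraction, epochs=30, lr="1e-4"):
    """Build the argument list for one `python -m src.train` run."""
    return [
        sys.executable, "-m", "src.train",
        "--model", model,
        "--epochs", str(epochs),
        "--batch_size", str(batch_size),
        "--data_fraction", str(fraction),
        "--lr", lr,
    ]


def build_sweep(models=MODELS, fractions=FRACTIONS):
    """Return one command per (model, data fraction) combination."""
    pairs = itertools.product(models.items(), fractions)
    return [build_command(model, bs, frac) for (model, bs), frac in pairs]


DRY_RUN = True  # set to False to actually launch the runs, one after another
for cmd in build_sweep():
    print(" ".join(cmd))
    if not DRY_RUN:
        subprocess.run(cmd, check=True)
```

With `DRY_RUN = True` the cell only prints the twelve commands, which makes it easy to confirm the sweep before committing GPU time. Run it from the repository root so that `src.train` resolves.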

## 7. Evaluation

### 7.1 Evaluate Best Model

In [None]:
# Find the experiment directory
!ls experiments/

In [None]:
# Evaluate UNet model (replace TIMESTAMP with the directory name listed by the previous cell)
!python -m src.eval \
    --model unet \
    --checkpoint experiments/unet_10pct_TIMESTAMP/best_model.pt \
    --num_vis 8
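
To avoid copying the timestamp by hand, the newest matching checkpoint can be located with a glob. This sketch assumes experiment directories are named `<model>_<pct>pct_<timestamp>`, which is inferred from the single example path above and may not match the repository's actual scheme; `latest_checkpoint` is a hypothetical helper.

```python
import glob
import os


def latest_checkpoint(model, pct, root="experiments"):
    """Return the most recently modified best_model.pt for a model/fraction, or None."""
    # Assumed layout: experiments/<model>_<pct>pct_<timestamp>/best_model.pt
    pattern = os.path.join(root, f"{model}_{pct}pct_*", "best_model.pt")
    candidates = glob.glob(pattern)
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


checkpoint = latest_checkpoint("unet", 10)
print(checkpoint or "No matching checkpoint found; check the output of `ls experiments/`.")
```

The printed path can then be passed to `--checkpoint` in the evaluation cell.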

### 7.2 View Results

In [None]:
import json
from IPython.display import Image, display

# Load results
with open('reports/figures/best_model/results.json', 'r') as f:
    results = json.load(f)

print("Evaluation Results:")
print(f"Dice: {results['metrics']['dice']['mean']:.4f} ± {results['metrics']['dice']['std']:.4f}")
print(f"IoU:  {results['metrics']['iou']['mean']:.4f} ± {results['metrics']['iou']['std']:.4f}")

# Display predictions
display(Image('reports/figures/best_model/predictions.png'))

## 8. Save Results to Drive

In [None]:
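# Create the target folders, then copy directory contents so that
# rerunning this cell does not nest a second folder inside the first
!mkdir -p /content/drive/MyDrive/medseg_experiments /content/drive/MyDrive/medseg_reports

# Copy experiments to Drive
!cp -r experiments/. /content/drive/MyDrive/medseg_experiments/

# Copy reports to Drive
!cp -r reports/. /content/drive/MyDrive/medseg_reports/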

print("✓ Results saved to Google Drive!")

## 9. Create Results Summary

In [None]:
import glob
import json

import pandas as pd

# Collect all results (sorted so the summary order is reproducible)
results_files = sorted(glob.glob('reports/figures/*/results.json'))

summary_data = []

for result_file in results_files:
    with open(result_file, 'r') as f:
        data = json.load(f)
    
    summary_data.append({
        'Model': data['model'],
        'Checkpoint': data['checkpoint'],
        'Dice': f"{data['metrics']['dice']['mean']:.4f}",
        'IoU': f"{data['metrics']['iou']['mean']:.4f}",
        'Accuracy': f"{data['metrics']['accuracy']['mean']:.4f}"
    })

summary_df = pd.DataFrame(summary_data)
print("\nResults Summary:")
print(summary_df.to_string(index=False))

# Save to CSV
summary_df.to_csv('results_summary.csv', index=False)
!cp results_summary.csv /content/drive/MyDrive/

## Next Steps

1. Download results from Google Drive
2. Complete the report with actual numbers
3. Create visualizations for presentation
4. Update README with findings