# 🏥 Medical Image Segmentation - Google Colab Training

**Project:** Comparing UNet, UNet++, and TransUNet on ISIC Dataset

**Author:** Prabhat

---

## 📋 Training Plan

We'll train 3 models × 4 data fractions = **12 experiments**

| Model | 10% | 25% | 50% | 100% |
|-------|-----|-----|-----|------|
| UNet | ✓ | ✓ | ✓ | ✓ |
| UNet++ | ✓ | ✓ | ✓ | ✓ |
| TransUNet | ✓ | ✓ | ✓ | ✓ |

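**Estimated Time:** roughly 15 hours or more in total, going by the per-run estimates in the training cells (about 4 hours for UNet and about 11 hours for TransUNet; the UNet++ runs are not timed)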
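
The grid above can also be enumerated programmatically rather than written out one cell per run. The sketch below only builds the command strings; it reuses the flags and values from the training cells in this notebook (30 epochs, learning rate 1e-4, batch size 4 for TransUNet and 8 otherwise). The helper name `build_commands` is hypothetical and not part of the repository.

```python
from itertools import product

MODELS = ["unet", "unetpp", "transunet"]
FRACTIONS = [0.1, 0.25, 0.5, 1.0]


def build_commands(epochs=30, lr="1e-4"):
    """Return one `python -m src.train ...` command string per (model, fraction) pair."""
    commands = []
    for model, fraction in product(MODELS, FRACTIONS):
        # TransUNet gets a smaller batch size in this notebook because it needs more memory.
        batch_size = 4 if model == "transunet" else 8
        commands.append(
            f"python -m src.train --model {model} --epochs {epochs} "
            f"--batch_size {batch_size} --data_fraction {fraction} --lr {lr}"
        )
    return commands


commands = build_commands()
print(len(commands))  # 3 models x 4 fractions
for command in commands:
    print(command)
```

Keeping one cell per run, as this notebook does, makes it easier to resume after a Colab disconnect; the loop form is mainly useful for checking that no combination was missed.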

## 1️⃣ Setup: Check GPU

In [None]:
# Check GPU availability
!nvidia-smi

In [None]:
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("⚠️ WARNING: No GPU detected! Training will be very slow.")
    print("Go to Runtime → Change runtime type → Select GPU")

## 2️⃣ Clone Repository

In [None]:
# Clone your GitHub repository
!git clone https://github.com/Prabhat9801/Medical-Image-Segmentation.git
%cd Medical-Image-Segmentation

# Verify
!ls -la
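
Re-running the clone cell after the repository already exists makes `git clone` fail, and the `%cd` then runs from inside the repository. One possible guard, sketched below, checks for an existing `.git` folder first. The function name `clone_command` is hypothetical; the target path `/content/Medical-Image-Segmentation` is the one used by the extraction cell later in this notebook.

```python
import os

REPO_URL = "https://github.com/Prabhat9801/Medical-Image-Segmentation.git"
REPO_DIR = "/content/Medical-Image-Segmentation"


def clone_command(repo_dir=REPO_DIR, repo_url=REPO_URL):
    """Return the git command to run, or None if the repository is already cloned."""
    if os.path.isdir(os.path.join(repo_dir, ".git")):
        return None
    return ["git", "clone", repo_url, repo_dir]


command = clone_command()
print(command if command else "Repository already present, skipping clone.")
```

The returned list can be passed to `subprocess.run(command, check=True)`, followed by `os.chdir(REPO_DIR)`.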

## 3️⃣ Install Dependencies

In [None]:
# Install required packages
!pip install -q timm albumentations opencv-python-headless tqdm

print("✅ Dependencies installed!")

## 4️⃣ Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Verify your zip file is there
!ls -lh /content/drive/MyDrive/*.zip

## 5️⃣ Extract Processed Data

In [None]:
import zipfile
import os

# Path to your zip file in Google Drive
zip_path = "/content/drive/MyDrive/isic_processed_256.zip"
extract_path = "/content/Medical-Image-Segmentation/data/processed"

# Create directory
os.makedirs(extract_path, exist_ok=True)

# Extract
print("📦 Extracting data... (this may take 2-3 minutes)")
with zipfile.ZipFile(zip_path, 'r') as z:
    z.extractall(extract_path)

print("✅ Extraction complete!")

# Verify
!ls -lh data/processed/isic/
!printf "\nChecking splits.csv:\n"
!head -5 data/processed/isic/splits.csv
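
The verification commands above expect the archive to unpack into an `isic/` folder. As a rough sketch, the archive layout can be inspected before extracting, so a mismatch shows up early. The helper name `top_level_entries` is hypothetical, and the presence of a top-level `isic/` folder is an assumption inferred from the `ls` and `head` paths in this cell.

```python
import os
import zipfile


def top_level_entries(zip_path):
    """Return the sorted top-level names (files or folders) inside a zip archive."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        names = archive.namelist()
    return sorted({name.split("/", 1)[0] for name in names if name})


zip_path = "/content/drive/MyDrive/isic_processed_256.zip"

if os.path.exists(zip_path):  # only true once Drive is mounted and the file is uploaded
    entries = top_level_entries(zip_path)
    print(entries)
    if "isic" not in entries:
        print("No top-level 'isic' folder in the archive; data/processed/isic/ will not exist.")
else:
    print(f"Archive not found: {zip_path}")
```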

## 6️⃣ Test Run (Quick Verification)

In [None]:
# Quick test with UNet, 10% data, 2 epochs
# This should take ~2-3 minutes

!python -m src.train \
    --model unet \
    --epochs 2 \
    --batch_size 8 \
    --data_fraction 0.1 \
    --lr 1e-4

print("\n✅ Test run complete! If this worked, proceed with full training.")
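
Note that a `!` command does not raise a Python exception when it fails, so the final `print` in the cell above runs even if training crashed. A minimal sketch of a stricter check, using the process exit code, is shown below. The helper name `run_step` is hypothetical, and the real training command appears only as a comment so the example runs anywhere.

```python
import subprocess
import sys


def run_step(args):
    """Run a command and return True only if it exited with status 0."""
    completed = subprocess.run(args)
    return completed.returncode == 0


# Hypothetical usage mirroring the test-run cell (run from the repository root):
# ok = run_step([sys.executable, "-m", "src.train", "--model", "unet", "--epochs", "2",
#                "--batch_size", "8", "--data_fraction", "0.1", "--lr", "1e-4"])
ok = run_step([sys.executable, "-c", "print('placeholder for the training command')"])
print("Test run complete." if ok else "Test run failed; fix the error before full training.")
```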

## 7️⃣ Full Training - UNet (All Data Fractions)

In [None]:
# UNet - 10% data (~15-20 minutes)
!python -m src.train \
    --model unet \
    --epochs 30 \
    --batch_size 8 \
    --data_fraction 0.1 \
    --lr 1e-4

In [None]:
# UNet - 25% data (~30-40 minutes)
!python -m src.train \
    --model unet \
    --epochs 30 \
    --batch_size 8 \
    --data_fraction 0.25 \
    --lr 1e-4

In [None]:
# UNet - 50% data (~1 hour)
!python -m src.train \
    --model unet \
    --epochs 30 \
    --batch_size 8 \
    --data_fraction 0.5 \
    --lr 1e-4

In [None]:
# UNet - 100% data (~2 hours)
!python -m src.train \
    --model unet \
    --epochs 30 \
    --batch_size 8 \
    --data_fraction 1.0 \
    --lr 1e-4

## 8️⃣ Full Training - UNet++ (All Data Fractions)

In [None]:
# UNet++ - 10% data
!python -m src.train \
    --model unetpp \
    --epochs 30 \
    --batch_size 8 \
    --data_fraction 0.1 \
    --lr 1e-4

In [None]:
# UNet++ - 25% data
!python -m src.train \
    --model unetpp \
    --epochs 30 \
    --batch_size 8 \
    --data_fraction 0.25 \
    --lr 1e-4

In [None]:
# UNet++ - 50% data
!python -m src.train \
    --model unetpp \
    --epochs 30 \
    --batch_size 8 \
    --data_fraction 0.5 \
    --lr 1e-4

In [None]:
# UNet++ - 100% data
!python -m src.train \
    --model unetpp \
    --epochs 30 \
    --batch_size 8 \
    --data_fraction 1.0 \
    --lr 1e-4

## 9️⃣ Full Training - TransUNet (All Data Fractions)

**Note:** TransUNet uses a smaller batch size (4) because of its higher memory requirements.

In [None]:
# TransUNet - 10% data (~45 min)
!python -m src.train \
    --model transunet \
    --epochs 30 \
    --batch_size 4 \
    --data_fraction 0.1 \
    --lr 1e-4

In [None]:
# TransUNet - 25% data (~1.5 hours)
!python -m src.train \
    --model transunet \
    --epochs 30 \
    --batch_size 4 \
    --data_fraction 0.25 \
    --lr 1e-4

In [None]:
# TransUNet - 50% data (~3 hours)
!python -m src.train \
    --model transunet \
    --epochs 30 \
    --batch_size 4 \
    --data_fraction 0.5 \
    --lr 1e-4

In [None]:
# TransUNet - 100% data (~6 hours)
!python -m src.train \
    --model transunet \
    --epochs 30 \
    --batch_size 4 \
    --data_fraction 1.0 \
    --lr 1e-4

## 🔟 View Training Results

In [None]:
# List all experiments
!ls -lh experiments/

In [None]:
# View training history for a specific experiment
import json
import matplotlib.pyplot as plt

# Replace with your actual experiment folder name
exp_folder = "experiments/unet_10pct_TIMESTAMP"  # UPDATE THIS

with open(f"{exp_folder}/history.json", 'r') as f:
    history = json.load(f)

# Plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

ax1.plot(history['train_loss'], label='Train')
ax1.plot(history['val_loss'], label='Val')
ax1.set_title('Loss')
ax1.legend()

ax2.plot(history['train_dice'], label='Train')
ax2.plot(history['val_dice'], label='Val')
ax2.set_title('Dice Score')
ax2.legend()

plt.show()
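
Instead of editing the `TIMESTAMP` placeholder by hand, the newest matching folder can be looked up with `glob`. The sketch below also reports the epoch with the highest validation Dice. The helper names `latest_experiment` and `best_epoch` are hypothetical, and the code assumes `history.json` stores one value per epoch under `val_dice`, as the plotting code above implies.

```python
import glob
import json
import os


def latest_experiment(pattern):
    """Return the most recently modified directory matching `pattern`, or None."""
    folders = [path for path in glob.glob(pattern) if os.path.isdir(path)]
    return max(folders, key=os.path.getmtime) if folders else None


def best_epoch(history, key="val_dice"):
    """Return (1-based epoch, value) for the highest value stored under `key`."""
    values = history[key]
    index = max(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


exp_folder = latest_experiment("experiments/unet_10pct_*")
if exp_folder is not None:
    with open(os.path.join(exp_folder, "history.json")) as f:
        history = json.load(f)
    print(exp_folder, best_epoch(history))
else:
    print("No matching experiment folder found.")
```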

## 1️⃣1️⃣ Evaluate Models

In [None]:
# Evaluate UNet 10% model
# Replace with your actual experiment folder
!python -m src.eval \
    --model unet \
    --checkpoint experiments/unet_10pct_TIMESTAMP/best_model.pt \
    --num_vis 8

In [None]:
# View evaluation results
from IPython.display import Image, display

# Display predictions
display(Image('reports/figures/best_model/predictions.png'))
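
For reference when reading the evaluation output, here is a small pure-Python sketch of the standard Dice and IoU definitions for binary masks. It is illustrative only: whether `src.eval` computes them exactly this way (thresholding, smoothing terms, per-image averaging) is not shown in this notebook, and the function name `dice_and_iou` is hypothetical.

```python
def dice_and_iou(pred, target):
    """Compute Dice and IoU for two equally sized flat sequences of 0/1 labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    pred_area = sum(pred)
    target_area = sum(target)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty; treating this as a perfect match is one common convention.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 0, 0])
print(f"Dice={dice:.4f}, IoU={iou:.4f}")  # Dice=0.6667, IoU=0.5000
```

For a single pair of masks the two scores are linked by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU.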

## 1️⃣2️⃣ Save Results to Google Drive

In [None]:
# Create backup folders in Drive
!mkdir -p /content/drive/MyDrive/medseg_experiments
!mkdir -p /content/drive/MyDrive/medseg_reports

# Copy all experiments
print("📦 Copying experiments to Google Drive...")
!cp -r experiments/* /content/drive/MyDrive/medseg_experiments/

# Copy reports
print("📦 Copying reports to Google Drive...")
!cp -r reports/* /content/drive/MyDrive/medseg_reports/

print("\n✅ All results saved to Google Drive!")
print("📂 Location: MyDrive/medseg_experiments/ and MyDrive/medseg_reports/")
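
The cell above prints its success message even if a `cp` command fails, for example when `experiments/` is empty. As an alternative sketch, the same backup can be done in Python with `shutil.copytree`, which raises on errors and, with `dirs_exist_ok=True` (Python 3.8+), can be re-run into the same destination. The helper name `backup_folder` is hypothetical.

```python
import os
import shutil


def backup_folder(src, dst):
    """Copy `src` into `dst`, merging with existing content. Return False if `src` is missing."""
    if not os.path.isdir(src):
        print(f"Skipping {src}: folder not found.")
        return False
    # dirs_exist_ok lets repeated backups write into an existing destination folder.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return True


drive_root = "/content/drive/MyDrive"
if os.path.isdir(drive_root):  # only true once Drive is mounted
    backup_folder("experiments", os.path.join(drive_root, "medseg_experiments"))
    backup_folder("reports", os.path.join(drive_root, "medseg_reports"))
else:
    print("Google Drive is not mounted; nothing copied.")
```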

## 1️⃣3️⃣ Create Results Summary

In [None]:
import pandas as pd
import glob
import json

# Collect all results
results_files = glob.glob('reports/figures/*/results.json')

summary_data = []

for result_file in results_files:
    with open(result_file, 'r') as f:
        data = json.load(f)
    
    summary_data.append({
        'Model': data['model'],
        'Checkpoint': data['checkpoint'].split('/')[-2],
        'Dice': f"{data['metrics']['dice']['mean']:.4f}",
        'IoU': f"{data['metrics']['iou']['mean']:.4f}",
        'Accuracy': f"{data['metrics']['accuracy']['mean']:.4f}"
    })

summary_df = pd.DataFrame(summary_data)
print("\n" + "="*60)
print("📊 RESULTS SUMMARY")
print("="*60)
print(summary_df.to_string(index=False))
print("="*60)

# Save to CSV
summary_df.to_csv('results_summary.csv', index=False)
!cp results_summary.csv /content/drive/MyDrive/

print("\n✅ Summary saved to Google Drive as results_summary.csv")
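
To compare models across data fractions, it can help to split the `Checkpoint` folder name into a model name and a data percentage. The sketch below assumes folder names follow the `unet_10pct_TIMESTAMP` pattern used as a placeholder earlier in this notebook; the actual naming produced by `src.train` may differ, and `parse_experiment_name` is a hypothetical helper.

```python
import re

# Assumed naming pattern, taken from the placeholder "unet_10pct_TIMESTAMP".
FOLDER_PATTERN = re.compile(r"^(?P<model>.+?)_(?P<pct>\d+)pct_(?P<stamp>.+)$")


def parse_experiment_name(name):
    """Split a folder name into (model, data percentage), or return None if it does not match."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        return None
    return match.group("model"), int(match.group("pct"))


print(parse_experiment_name("unet_10pct_TIMESTAMP"))  # ('unet', 10)
```

The parsed percentage could then be stored as an extra column in `summary_df` and used to sort or pivot the table by model and data fraction.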

## ✅ Training Complete!

### Next Steps:

1. **Download results** from Google Drive:
   - `medseg_experiments/` folder
   - `medseg_reports/` folder
   - `results_summary.csv`

2. **Complete the report** (`reports/report.md`):
   - Fill in actual performance numbers
   - Add visualizations
   - Write analysis

3. **Create presentation** using the figures

4. **Update README** with findings

---

**🎉 Congratulations on completing the training!**