# 🏥 YellowCert Medical Certificate Detection - Training on Colab

This notebook trains a YOLOv8 model for vaccination certificate detection using Google Colab's free GPU.

## 📋 Before you start:
1. **Enable GPU**: Runtime → Change runtime type → Hardware accelerator → **GPU (T4)**
2. **Prepare your dataset**: Zip your dataset folder (train/, valid/, test/, data.yaml)
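
A minimal sketch of step 2, assuming the dataset lives in a local folder with `train/`, `valid/`, `test/`, and `data.yaml` at its top level. The helper name `zip_dataset` and the default archive name are illustrative; this is not the `prepare_for_colab.sh` script mentioned in the upload cell, just one way to build an equivalent ZIP with the Python standard library.

```python
import shutil
from pathlib import Path

# Entries the notebook expects at the top level of the dataset folder
REQUIRED = ("train", "valid", "test", "data.yaml")


def zip_dataset(dataset_dir, archive_stem="yellowcert_dataset"):
    """Zip dataset_dir so its contents sit at the root of the archive.

    Returns the path of the created .zip file.
    """
    root = Path(dataset_dir)
    missing = [name for name in REQUIRED if not (root / name).exists()]
    if missing:
        raise FileNotFoundError(f"Missing from {root}: {', '.join(missing)}")
    # root_dir makes the folder's contents (not the folder itself) the top level
    return shutil.make_archive(str(archive_stem), "zip", root_dir=str(root))
```

Run it on your own machine, for example `zip_dataset("path/to/dataset")`, from a working directory outside the dataset folder so the archive does not include itself.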

## 🎯 Training Options (Optimized for T4 GPU):
- **Quick Test** (YOLOv8n, 10 epochs, ~15-20 min) - Testing only
- **Balanced** (YOLOv8s, 200 epochs, ~2-3 hours) - **Recommended for T4**
- **Advanced** (YOLOv8m, 150 epochs, ~3-4 hours) - Better accuracy
- **Maximum** (YOLOv8m, 200 epochs, ~4-5 hours) - Best for T4

---

## 1️⃣ Setup Environment

In [None]:
# Check GPU availability
!nvidia-smi

In [None]:
# Install required packages
!pip install -q ultralytics

import torch
import os
from google.colab import files
from google.colab import drive
import shutil
import gc

print(f"✓ PyTorch version: {torch.__version__}")
print(f"✓ CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"✓ GPU: {torch.cuda.get_device_name(0)}")
    total_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"✓ Total VRAM: {total_memory:.1f} GB")

    # The training presets below are sized for a T4-class GPU (about 15 GB VRAM)
    if total_memory < 16:
        print("\n⚠️  Less than 16 GB VRAM detected (T4-class GPU)")
        print("   The training presets below are sized to fit in this memory")
else:
    print("⚠️  No GPU detected. Enable it via Runtime → Change runtime type.")

## 2️⃣ Upload Dataset

**Choose ONE method:**

### Option A: Upload ZIP file directly

In [None]:
# Create a ZIP of your dataset first:
# cd /Users/arnon/Downloads/YellowCert
# ./prepare_for_colab.sh

print("Click 'Choose Files' and upload yellowcert_dataset.zip...")
uploaded = files.upload()

# Extract the dataset
import zipfile
for filename in uploaded.keys():
    print(f"Extracting {filename}...")
    with zipfile.ZipFile(filename, 'r') as zip_ref:
        zip_ref.extractall('/content/yellowcert')

print("\n✓ Dataset uploaded and extracted!")
!ls -la /content/yellowcert

### Option B: Use Google Drive

In [None]:
# Mount Google Drive
drive.mount('/content/drive')

# Update the path to your dataset ZIP in Google Drive:
DRIVE_DATASET_PATH = '/content/drive/MyDrive/yellowcert_dataset.zip'

# Extract dataset
import zipfile
print("Extracting dataset from Google Drive...")
with zipfile.ZipFile(DRIVE_DATASET_PATH, 'r') as zip_ref:
    zip_ref.extractall('/content/yellowcert')

print("\n✓ Dataset loaded from Google Drive!")
!ls -la /content/yellowcert
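
Both options assume that `data.yaml` ends up directly in `/content/yellowcert`. If the ZIP was created from the parent folder, everything lands one level deeper and the later cells will not find it. The sketch below (the name `find_dataset_root` is hypothetical) searches the extracted tree for `data.yaml` and returns the folder that holds it.

```python
from pathlib import Path


def find_dataset_root(extract_dir="/content/yellowcert"):
    """Return the directory containing data.yaml, looking in nested folders too."""
    # Prefer the shallowest match in case the archive holds several copies
    matches = sorted(Path(extract_dir).rglob("data.yaml"), key=lambda p: len(p.parts))
    if not matches:
        raise FileNotFoundError(f"No data.yaml found under {extract_dir}")
    return matches[0].parent
```

If the returned path is not `/content/yellowcert`, the hard-coded paths in the following cells need to point at that folder instead.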

## 3️⃣ Verify Dataset

In [None]:
# Check dataset structure
print("Dataset structure:")
!find /content/yellowcert -maxdepth 2 -type d

print("\ndata.yaml content:")
!cat /content/yellowcert/data.yaml

# Count images
import glob
train_imgs = len(glob.glob('/content/yellowcert/train/images/*.*'))
valid_imgs = len(glob.glob('/content/yellowcert/valid/images/*.*'))
print(f"\nTraining images: {train_imgs}")
print(f"Validation images: {valid_imgs}")
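
Counting images does not show whether each image has annotations. As a rough additional check, the sketch below lists images without a matching label file, assuming the usual YOLO layout where `train/images/foo.jpg` pairs with `train/labels/foo.txt`. The function name and the extension list are illustrative.

```python
from pathlib import Path

# Common image extensions; extend this set if your dataset uses others
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def unlabeled_images(split_dir):
    """Return sorted names of images in split_dir/images with no .txt in split_dir/labels."""
    images = Path(split_dir) / "images"
    labels = Path(split_dir) / "labels"
    return sorted(
        p.name
        for p in images.iterdir()
        if p.suffix.lower() in IMAGE_EXTS and not (labels / f"{p.stem}.txt").exists()
    )
```

For example, `unlabeled_images('/content/yellowcert/train')`. A non-empty result is not necessarily an error, since images without labels are treated as background examples, but a long list usually points to a packaging mistake.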

## 4️⃣ Configure Training

**Choose your training mode:**

✅ **Recommended for T4 GPU: `balanced` or `advanced`**

In [None]:
# ========== CONFIGURATION ==========
# Choose ONE training mode:

# MODE 1: Quick Test - For testing pipeline only
# TRAINING_MODE = 'quick'

# MODE 2: Balanced - RECOMMENDED for T4 GPU ✅
TRAINING_MODE = 'balanced'

# MODE 3: Advanced - Better accuracy, fits in T4
# TRAINING_MODE = 'advanced'

# MODE 4: Maximum - Best for T4 (longer training)
# TRAINING_MODE = 'maximum'

print(f"✓ Training mode: {TRAINING_MODE.upper()}")

## 5️⃣ Clear GPU Memory (Important!)

In [None]:
# Clear any existing GPU memory
import gc
import torch

gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
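    print("✓ GPU memory cleared")
    # mem_get_info() reports currently free memory, not just the card's total
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    print(f"Free VRAM: {free_bytes / 1024**3:.1f} GB of {total_bytes / 1024**3:.1f} GB")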

## 6️⃣ Train the Model 🚀

⚠️ **Important:** Training takes roughly 15 minutes to 5 hours depending on the mode (2-3 hours for `balanced`). Keep the browser tab open!

In [None]:
from ultralytics import YOLO
import os
import torch

# T4-OPTIMIZED configurations (15GB VRAM)
configs = {
    'quick': {
        'model': 'yolov8n.pt',
        'epochs': 10,
        'imgsz': 640,
        'batch': 16,
        'patience': 10,
        'name': 'yellowcert_quick'
    },
    'balanced': {
        'model': 'yolov8s.pt',      # Small model - good balance
        'epochs': 200,
        'imgsz': 640,               # Reduced from 1024
        'batch': 16,                # Safe for T4
        'patience': 50,
        'name': 'yellowcert_balanced'
    },
    'advanced': {
        'model': 'yolov8m.pt',      # Medium model
        'epochs': 150,
        'imgsz': 640,               # Safe size for T4
        'batch': 8,                 # Reduced batch for medium model
        'patience': 50,
        'name': 'yellowcert_advanced'
    },
    'maximum': {
        'model': 'yolov8m.pt',      # Medium model (not large)
        'epochs': 200,
        'imgsz': 768,               # Moderate size
        'batch': 6,                 # Smaller batch for safety
        'patience': 60,
        'name': 'yellowcert_max'
    }
}

config = configs[TRAINING_MODE]

print("="*80)
print(f"🏥 YellowCert Training - {TRAINING_MODE.upper()} MODE")
print("="*80)
print(f"Model: {config['model']}")
print(f"Epochs: {config['epochs']}")
print(f"Image size: {config['imgsz']}")
print(f"Batch size: {config['batch']}")
print("\n⏰ Estimated time: ", end="")
if TRAINING_MODE == 'quick':
    print("15-20 minutes")
elif TRAINING_MODE == 'balanced':
    print("2-3 hours")
elif TRAINING_MODE == 'advanced':
    print("3-4 hours")
else:
    print("4-5 hours")
print("="*80)

# Load model
model = YOLO(config['model'])

# Train with error handling
try:
    results = model.train(
        data='/content/yellowcert/data.yaml',
        epochs=config['epochs'],
        imgsz=config['imgsz'],
        batch=config['batch'],
        name=config['name'],
        patience=config['patience'],
        device=0,
        workers=2,                  # Reduced workers
        project='runs/detect',
        exist_ok=True,
        pretrained=True,
        verbose=True,
        plots=True,
        
        # Optimizer
        optimizer='AdamW' if TRAINING_MODE != 'quick' else 'auto',
        lr0=0.001,
        lrf=0.01,
        momentum=0.937,
        weight_decay=0.0005,
        
        # Data augmentation
        hsv_h=0.015,
        hsv_s=0.7,
        hsv_v=0.4,
        degrees=10.0,
        translate=0.1,
        scale=0.5,
        fliplr=0.5,
        mosaic=1.0,
        mixup=0.1 if TRAINING_MODE != 'quick' else 0,
        copy_paste=0.1 if TRAINING_MODE != 'quick' else 0,
        
        # Memory optimization
        close_mosaic=10,
        amp=True,                   # Mixed precision for memory efficiency
        cache=False,                # Disable cache to save memory
        label_smoothing=0.1 if TRAINING_MODE != 'quick' else 0,
        val=True,
        save_period=20,             # Save less frequently
    )
    
    print("\n" + "="*80)
    print("✓ TRAINING COMPLETED SUCCESSFULLY!")
    print("="*80)
    
except RuntimeError as e:
    if "out of memory" in str(e).lower():
        print("\n" + "="*80)
        print("❌ GPU OUT OF MEMORY ERROR")
        print("="*80)
        print("\n💡 Solutions:")
        print("\n1. Reduce batch size - Add this right after `config = configs[TRAINING_MODE]`:")
        print(f"   config['batch'] = {max(2, config['batch']//2)}")
        print("\n2. Use smaller image size - Add this:")
        print(f"   config['imgsz'] = {max(320, config['imgsz']//2)}")
        print("\n3. Use a smaller model - Change TRAINING_MODE:")
        if TRAINING_MODE == 'maximum':
            print("   TRAINING_MODE = 'advanced'")
        elif TRAINING_MODE == 'advanced':
            print("   TRAINING_MODE = 'balanced'")
        else:
            print("   TRAINING_MODE = 'quick'")
        print("\n4. Clear memory and restart runtime:")
        print("   Runtime → Restart runtime → Re-run cells")
        print("="*80)
    raise

# Clear memory after training
gc.collect()
torch.cuda.empty_cache()

## 7️⃣ Validate the Model

In [None]:
# Validate
print("\nValidating model...")
metrics = model.val()

print("\n" + "="*80)
print("📊 FINAL METRICS")
print("="*80)
if hasattr(metrics, 'box'):
    print(f"mAP50:     {metrics.box.map50:.4f}  (higher is better, max 1.0)")
    print(f"mAP50-95:  {metrics.box.map:.4f}  (higher is better, max 1.0)")
    print(f"Precision: {metrics.box.mp:.4f}  (accuracy of detections)")
    print(f"Recall:    {metrics.box.mr:.4f}  (% of objects found)")
    print("\n💡 Good results: mAP50 > 0.80, mAP50-95 > 0.60")
print("="*80)
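
The overall numbers can hide a single weak class. As a sketch, the helper below lists the classes whose score falls under a threshold, worst first. `classes_below` is a hypothetical name, and feeding it `model.names` and `metrics.box.maps` assumes the installed Ultralytics version exposes class names and per-class mAP50-95 under those attributes.

```python
def classes_below(names, per_class_map, threshold=0.60):
    """Return (class_name, score) pairs with score < threshold, lowest score first.

    names: mapping from class index to class name
    per_class_map: sequence of per-class scores, indexed by class id
    """
    weak = [
        (names[i], float(score))
        for i, score in enumerate(per_class_map)
        if score < threshold
    ]
    return sorted(weak, key=lambda item: item[1])
```

Under the assumption above, `classes_below(model.names, metrics.box.maps)` would show which certificate classes need more or better training examples.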

## 8️⃣ View Training Results

In [None]:
# Display training plots
from IPython.display import Image, display
import glob

result_dir = f"runs/detect/{config['name']}"

print("Training Results:\n")

# Results plot
if os.path.exists(f"{result_dir}/results.png"):
    print("📈 Training Metrics:")
    display(Image(filename=f"{result_dir}/results.png", width=800))

# Confusion matrix
if os.path.exists(f"{result_dir}/confusion_matrix.png"):
    print("\n🎯 Confusion Matrix:")
    display(Image(filename=f"{result_dir}/confusion_matrix.png", width=600))

# Sample predictions
val_images = glob.glob(f"{result_dir}/val_batch*_pred.jpg")
if val_images:
    print("\n🔍 Sample Predictions:")
    for img in val_images[:2]:
        display(Image(filename=img, width=800))

## 9️⃣ Download Trained Model

In [None]:
# Copy best model
best_model_path = f"runs/detect/{config['name']}/weights/best.pt"

if os.path.exists(best_model_path):
    shutil.copy(best_model_path, '/content/best.pt')
    
    print("✓ Best model ready for download!")
    print(f"Model size: {os.path.getsize('/content/best.pt') / 1024**2:.1f} MB")
    
    # Download the model
    print("\nDownloading best.pt...")
    files.download('/content/best.pt')
    
    print("\n" + "="*80)
    print("🎉 SUCCESS! Model downloaded!")
    print("="*80)
    print("\nNext steps on your Mac:")
    print("1. mv ~/Downloads/best.pt /Users/arnon/Downloads/YellowCert/models/")
    print("2. cd /Users/arnon/Downloads/YellowCert/backend")
    print("3. python main.py")
    print("4. Open http://localhost:3000 and test!")
    print("="*80)
else:
    print("❌ Best model not found!")

## 🔟 Save to Google Drive (Recommended)

In [None]:
# Mount drive if not already mounted
if not os.path.exists('/content/drive'):
    drive.mount('/content/drive')

# Save model and results
drive_model_path = '/content/drive/MyDrive/yellowcert_best.pt'
shutil.copy('/content/best.pt', drive_model_path)

# Also save training plots
if os.path.exists(f"runs/detect/{config['name']}/results.png"):
    shutil.copy(
        f"runs/detect/{config['name']}/results.png",
        '/content/drive/MyDrive/yellowcert_results.png'
    )

print(f"✓ Model saved to: {drive_model_path}")
print("✓ Results saved to Google Drive")
print("\nYou can now download from Google Drive anytime!")
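
Saving only at the end means a disconnect in the middle of a run loses everything on the Colab disk. The sketch below copies the checkpoints of a run folder to another location such as Drive. `backup_run` and the Drive folder in the usage note are hypothetical, and the code assumes the usual Ultralytics run layout (`weights/*.pt` plus `results.csv`).

```python
import shutil
from pathlib import Path


def backup_run(run_dir, backup_dir):
    """Copy weights/*.pt and results.csv from run_dir into backup_dir.

    Returns the sorted names of the files that were copied.
    """
    run_dir, backup_dir = Path(run_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    candidates = list((run_dir / "weights").glob("*.pt")) + [run_dir / "results.csv"]
    copied = []
    for src in candidates:
        if src.exists():  # skip anything the run has not produced yet
            shutil.copy2(src, backup_dir / src.name)
            copied.append(src.name)
    return sorted(copied)
```

For example, `backup_run(f"runs/detect/{config['name']}", "/content/drive/MyDrive/yellowcert_run")`. To back up during training rather than after it, the helper could be called from an Ultralytics callback (for instance one registered with `model.add_callback("on_fit_epoch_end", ...)`), and an interrupted run can usually be continued from a saved `last.pt` with `model.train(resume=True)`. Both depend on the installed Ultralytics version, so check its documentation first.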

---

## 📝 Training Mode Comparison

| Mode | Model | Est. time (T4) | Expected mAP50 | Memory use |
|------|-------|----------------|----------------|------------|
| Quick | YOLOv8n | 15-20 min | 0.70-0.80 | Low |
| **Balanced** ✅ | YOLOv8s | 2-3 hrs | **0.80-0.90** | Medium |
| Advanced | YOLOv8m | 3-4 hrs | 0.85-0.93 | High |
| Maximum | YOLOv8m | 4-5 hrs | 0.87-0.95 | High |

## ⚠️ If You Get Out of Memory Error:

1. **Restart runtime**: Runtime → Restart runtime, then re-run the cells
2. **Choose a smaller mode**: Step down one level (`maximum` → `advanced` → `balanced` → `quick`)
3. **Reduce batch size**: Add `config['batch'] = 4` right after `config = configs[TRAINING_MODE]` in the training cell
4. **Clear memory**: Run the "Clear GPU Memory" cell
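
The batch-size and image-size reductions can be sketched as a small helper that mirrors the suggestions printed by the training cell: halve the batch size (never below 2), and once the batch is already at 2, halve the image size (never below 320). The name `smaller_config` is hypothetical.

```python
def smaller_config(config):
    """Return a copy of config with a reduced GPU memory footprint."""
    reduced = dict(config)  # leave the original preset untouched
    if config["batch"] > 2:
        reduced["batch"] = max(2, config["batch"] // 2)
    else:
        # Batch is already minimal, so shrink the input resolution instead
        reduced["imgsz"] = max(320, config["imgsz"] // 2)
    return reduced
```

One way to use it is to write `config = smaller_config(configs[TRAINING_MODE])` in the training cell and re-run it after restarting the runtime.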

## 💡 Pro Tips:

- ✅ **Start with 'balanced'** - Best for T4 GPU
- ✅ **Save to Google Drive** - Avoid losing progress
- ✅ **Keep tab open** - Reduces the risk of a session timeout
- ✅ **Monitor progress** - Watch the training output

---

**Created for YellowCert Medical Certificate Detection** 🏥
