# 🏥 YellowCert Medical Certificate Detection - Training on Colab

This notebook trains a YOLOv8 model for vaccination certificate detection using Google Colab's free GPU.

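## 📋 Before you start:
1. **Enable GPU**: Runtime → Change runtime type → Hardware accelerator → **GPU (T4)**
2. **Prepare your dataset**: Zip the *contents* of your dataset folder (train/, valid/, test/, data.yaml) so that they sit at the top level of the archive. The cells below expect `/content/yellowcert/data.yaml` after extraction.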
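If you prefer not to use the terminal, the packaging step can also be done from Python. The sketch below assumes a local folder containing `train/`, `valid/`, `test/` and `data.yaml`; `make_dataset_zip` is a hypothetical helper, not part of any library. It stores each file with a path relative to the dataset folder, so the four entries land at the top level of the archive, which is the layout the extraction cells expect.

```python
import zipfile
from pathlib import Path

# Entries expected at the top level of the archive (from the list above).
EXPECTED = ("train", "valid", "test", "data.yaml")


def make_dataset_zip(dataset_dir, zip_path):
    """Zip train/, valid/, test/ and data.yaml with paths relative to dataset_dir.

    Hypothetical helper: missing entries raise FileNotFoundError so a
    half-prepared dataset is caught before uploading.
    """
    root = Path(dataset_dir)
    missing = [name for name in EXPECTED if not (root / name).exists()]
    if missing:
        raise FileNotFoundError(f"Missing from {root}: {missing}")

    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in EXPECTED:
            target = root / name
            # A single file (data.yaml) or every file under a split folder.
            paths = [target] if target.is_file() else sorted(target.rglob("*"))
            for path in paths:
                if path.is_file() and path.name != ".DS_Store":
                    # arcname relative to root keeps entries at the archive top level
                    zf.write(path, arcname=path.relative_to(root).as_posix())
    return zip_path
```

For example, `make_dataset_zip("/Users/arnon/Downloads/YellowCert", "yellowcert_dataset.zip")` produces an archive equivalent to the `zip -r` command shown in section 2.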

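## 🎯 Training Options:
- **Quick Test** (YOLOv8n, 10 epochs, ~15-20 min) - Testing only
- **Balanced** (YOLOv8m, 200 epochs, ~2-4 hours) - **Recommended**
- **Maximum** (YOLOv8l, 300 epochs, ~5-8 hours) - Best accuracy
- **Ultra** (YOLOv8x, 300 epochs, ~8-12 hours) - Requires high VRAM and may hit the free-tier session limit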

---

## 1️⃣ Setup Environment

In [None]:
# Check GPU availability
!nvidia-smi

In [None]:
# Install required packages
!pip install -q ultralytics

import torch
import os
from google.colab import files
from google.colab import drive
import shutil

print(f"✓ PyTorch version: {torch.__version__}")
print(f"✓ CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"✓ GPU: {torch.cuda.get_device_name(0)}")
    print(f"✓ VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

## 2️⃣ Upload Dataset

**Choose ONE method:**

### Option A: Upload ZIP file directly (Recommended for small datasets)

In [None]:
# Create a ZIP of your dataset first, from a terminal on your Mac.
# Run zip from inside the dataset folder so its contents sit at the archive root:
# cd /Users/arnon/Downloads/YellowCert
# zip -r yellowcert_dataset.zip train/ valid/ test/ data.yaml

print("Click 'Choose Files' and upload yellowcert_dataset.zip...")
uploaded = files.upload()

# Extract the dataset
import zipfile
for filename in uploaded.keys():
    print(f"Extracting {filename}...")
    with zipfile.ZipFile(filename, 'r') as zip_ref:
        zip_ref.extractall('/content/yellowcert')

print("\n✓ Dataset uploaded and extracted!")
!ls -la /content/yellowcert
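If the ZIP was created from the parent folder rather than from inside it, extraction produces an extra directory level (for example `/content/yellowcert/YellowCert/data.yaml`) and the later cells will not find `data.yaml`. A minimal sketch, assuming the layout described above: the hypothetical `find_dataset_root` helper searches for `data.yaml` and returns the folder that contains it, so you can move its contents up one level or point the later paths at it.

```python
from pathlib import Path


def find_dataset_root(extract_dir):
    """Return the shallowest directory under extract_dir that contains data.yaml.

    Hypothetical helper; raises FileNotFoundError if no data.yaml exists.
    """
    root = Path(extract_dir)
    candidates = [
        p for p in root.rglob("data.yaml")
        if "__MACOSX" not in p.parts  # skip macOS resource-fork copies
    ]
    if not candidates:
        raise FileNotFoundError(f"No data.yaml found under {root}")
    # Fewest path components = closest to the extraction folder.
    best = min(candidates, key=lambda p: len(p.parts))
    return best.parent
```

`find_dataset_root('/content/yellowcert')` returns `/content/yellowcert` itself when the archive was built correctly, and a nested path such as `/content/yellowcert/YellowCert` when it was not.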

### Option B: Use Google Drive (Better for large datasets)

In [None]:
# Mount Google Drive
drive.mount('/content/drive')

# IMPORTANT: Upload yellowcert_dataset.zip to your Google Drive first!
# Then update the path below:
DRIVE_DATASET_PATH = '/content/drive/MyDrive/yellowcert_dataset.zip'  # Update this path

# Extract dataset
import zipfile
print("Extracting dataset from Google Drive...")
with zipfile.ZipFile(DRIVE_DATASET_PATH, 'r') as zip_ref:
    zip_ref.extractall('/content/yellowcert')

print("\n✓ Dataset loaded from Google Drive!")
!ls -la /content/yellowcert
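Archives created with Finder's *Compress* command usually contain a `__MACOSX/` folder, and `zip -r` picks up any `.DS_Store` files; both are harmless but clutter the dataset folders. A sketch of an extraction step that skips them, using only the standard `zipfile` module; `extract_dataset` is a hypothetical name and the skip rules are an assumption you can extend.

```python
import zipfile
from pathlib import PurePosixPath


def _is_mac_metadata(member_name):
    """True for macOS metadata entries such as __MACOSX/... or .DS_Store."""
    parts = PurePosixPath(member_name).parts
    return "__MACOSX" in parts or parts[-1:] == (".DS_Store",)


def extract_dataset(zip_path, dest_dir):
    """Extract zip_path into dest_dir, skipping macOS metadata entries.

    Returns the names of the members that were extracted.
    """
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.infolist():
            if _is_mac_metadata(member.filename):
                continue
            zf.extract(member, dest_dir)  # zipfile strips absolute paths and '..'
            extracted.append(member.filename)
    return extracted
```

It can stand in for `extractall`, e.g. `extract_dataset(DRIVE_DATASET_PATH, '/content/yellowcert')`.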

## 3️⃣ Verify Dataset

In [None]:
# Check dataset structure
print("Dataset structure:")
!tree -L 2 /content/yellowcert || find /content/yellowcert -maxdepth 2 -type d

print("\ndata.yaml content:")
!cat /content/yellowcert/data.yaml

print("\nTraining images:")
!ls /content/yellowcert/train/images | head -10

print("\nValidation images:")
!ls /content/yellowcert/valid/images | head -10
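The listing above shows that files exist, but not whether every image has a usable label. YOLO-format datasets conventionally keep `images/` and `labels/` side by side in each split, with one `.txt` file per image sharing its file stem (Ultralytics locates labels by swapping `/images/` for `/labels/` in the path); this layout is assumed here. Ultralytics treats an image without a label file as a background image, which is sometimes intended but often signals a labeling gap. A sketch of two checks using only `pathlib`; `check_split` and `bad_label_lines` are hypothetical helpers.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def check_split(split_dir):
    """Count images/labels in split_dir and list images without a label file."""
    split = Path(split_dir)
    images = [p for p in (split / "images").glob("*") if p.suffix.lower() in IMAGE_SUFFIXES]
    label_stems = {p.stem for p in (split / "labels").glob("*.txt")}
    unlabeled = sorted(p.name for p in images if p.stem not in label_stems)
    return {"images": len(images), "labels": len(label_stems), "unlabeled": unlabeled}


def bad_label_lines(label_file, num_classes):
    """Return (line_number, text) for lines that are not valid YOLO detection labels.

    Expected format per line: class_id x_center y_center width height,
    with coordinates normalized to [0, 1].
    """
    bad = []
    for i, line in enumerate(Path(label_file).read_text().splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines are ignored
        try:
            cls = int(fields[0])
            coords = [float(v) for v in fields[1:]]
        except ValueError:
            bad.append((i, line))
            continue
        if len(coords) != 4 or not 0 <= cls < num_classes or not all(0.0 <= v <= 1.0 for v in coords):
            bad.append((i, line))
    return bad
```

For example, loop over `("train", "valid", "test")` and print `check_split(f"/content/yellowcert/{split}")`, then run `bad_label_lines` on each label file with the `nc` value from `data.yaml`.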

## 4️⃣ Configure Training

**Choose your training mode:**

In [None]:
# ========== CONFIGURATION ==========
# Choose ONE training mode (uncomment the one you want):

# MODE 1: Quick Test (10 epochs, ~15 minutes) - For testing only
# TRAINING_MODE = 'quick'

# MODE 2: Balanced (YOLOv8m, 200 epochs) - RECOMMENDED ✅
TRAINING_MODE = 'balanced'

# MODE 3: Maximum Accuracy (YOLOv8l, 300 epochs) - Best results
# TRAINING_MODE = 'maximum'

# MODE 4: Ultra Maximum (YOLOv8x, 300 epochs) - Requires high VRAM
# TRAINING_MODE = 'ultra'

print(f"✓ Training mode: {TRAINING_MODE.upper()}")

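## 5️⃣ Train the Model 🚀

This will take a while. Keep the browser tab open: Colab can disconnect idle sessions, and if the runtime is recycled, training stops and everything under `/content` (including `runs/detect`) is lost. Copying results to Google Drive protects them.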

In [None]:
from ultralytics import YOLO
import os

# Training configurations
configs = {
    'quick': {
        'model': 'yolov8n.pt',
        'epochs': 10,
        'imgsz': 640,
        'batch': 16,
        'patience': 10,
        'name': 'yellowcert_quick'
    },
    'balanced': {
        'model': 'yolov8m.pt',
        'epochs': 200,
        'imgsz': 1024,
        'batch': 16,
        'patience': 50,
        'name': 'yellowcert_balanced'
    },
    'maximum': {
        'model': 'yolov8l.pt',
        'epochs': 300,
        'imgsz': 1280,
        'batch': 12,
        'patience': 80,
        'name': 'yellowcert_max'
    },
    'ultra': {
        'model': 'yolov8x.pt',
        'epochs': 300,
        'imgsz': 1280,
        'batch': 8,
        'patience': 100,
        'name': 'yellowcert_ultra'
    }
}

config = configs[TRAINING_MODE]

print("="*80)
print(f"🏥 YellowCert Training - {TRAINING_MODE.upper()} MODE")
print("="*80)
print(f"Model: {config['model']}")
print(f"Epochs: {config['epochs']}")
print(f"Image size: {config['imgsz']}")
print(f"Batch size: {config['batch']}")
print("="*80)

# Load model
model = YOLO(config['model'])

# Train
try:
    results = model.train(
        data='/content/yellowcert/data.yaml',
        epochs=config['epochs'],
        imgsz=config['imgsz'],
        batch=config['batch'],
        name=config['name'],
        patience=config['patience'],
        device=0,  # Use GPU
        workers=2,
        project='runs/detect',
        exist_ok=True,
        pretrained=True,
        verbose=True,
        plots=True,
        
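        # Optimizer: AdamW for non-quick modes. With optimizer='auto' (quick mode),
        # Ultralytics chooses the optimizer itself and ignores lr0 and momentum below.
        optimizer='AdamW' if TRAINING_MODE != 'quick' else 'auto',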
        lr0=0.001,
        lrf=0.01,
        momentum=0.937,
        weight_decay=0.0005,
        
        # Data augmentation
        hsv_h=0.015,
        hsv_s=0.7,
        hsv_v=0.4,
        degrees=10.0,
        translate=0.1,
        scale=0.5,
        fliplr=0.5,
        mosaic=1.0,
        mixup=0.1 if TRAINING_MODE != 'quick' else 0,
        copy_paste=0.1 if TRAINING_MODE != 'quick' else 0,
        
        # Advanced
        close_mosaic=10,
        amp=True,
        cache=True,  # Caches images in RAM; use cache='disk' if Colab runs low on system memory
        label_smoothing=0.1 if TRAINING_MODE != 'quick' else 0,
        val=True,
        save_period=10,
    )
    
    print("\n" + "="*80)
    print("✓ TRAINING COMPLETED SUCCESSFULLY!")
    print("="*80)
    
except RuntimeError as e:
    if "out of memory" in str(e).lower():
        print("\n❌ GPU OUT OF MEMORY!")
        print("Try reducing batch size or using a smaller model.")
    raise
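The advice to reduce the batch size can be automated. A minimal sketch, assuming you wrap the training call in a function that takes the batch size; `train_with_backoff` and `train_fn` are hypothetical names, not Ultralytics APIs. It halves the batch after each out-of-memory `RuntimeError` until it reaches a floor, and re-raises any other error. Ultralytics also accepts `batch=-1` for automatic batch sizing, which is an alternative worth trying.

```python
def train_with_backoff(train_fn, batch, min_batch=2):
    """Call train_fn(batch), halving batch after each out-of-memory RuntimeError.

    Hypothetical helper. Returns (result, batch_used). Errors that are not
    out-of-memory, or an OOM at min_batch, are re-raised unchanged.
    """
    while True:
        try:
            return train_fn(batch), batch
        except RuntimeError as e:
            if "out of memory" not in str(e).lower() or batch <= min_batch:
                raise
            new_batch = max(min_batch, batch // 2)
            print(f"Out of GPU memory at batch={batch}; retrying with batch={new_batch}")
            batch = new_batch
```

For example: `train_with_backoff(lambda b: YOLO(config['model']).train(data='/content/yellowcert/data.yaml', epochs=config['epochs'], imgsz=config['imgsz'], batch=b), config['batch'])`. Creating a fresh `YOLO(...)` per attempt is an assumption meant to avoid reusing a half-initialized trainer; calling `torch.cuda.empty_cache()` between attempts may also help.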

## 6️⃣ Validate the Model

In [None]:
# Validate
print("\nValidating model...")
metrics = model.val()

print("\n" + "="*80)
print("📊 FINAL METRICS")
print("="*80)
if hasattr(metrics, 'box'):
    print(f"mAP50: {metrics.box.map50:.4f}")
    print(f"mAP50-95: {metrics.box.map:.4f}")
    print(f"Precision: {metrics.box.mp:.4f}")
    print(f"Recall: {metrics.box.mr:.4f}")
print("="*80)
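`mAP50` counts a predicted box as a true positive when its intersection-over-union (IoU) with a ground-truth box of the same class is at least 0.5; `mAP50-95` averages mAP over the ten IoU thresholds 0.50, 0.55, ..., 0.95. As a small illustration (not Ultralytics' implementation), IoU for boxes given as `(x1, y1, x2, y2)` corners is:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) with x2 > x1, y2 > y1."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7,
# so this prediction would not count as a match at the 0.5 threshold.
print(round(box_iou((0, 0, 2, 2), (1, 1, 3, 3)), 3))  # 0.143
```

Because the stricter thresholds demand tighter boxes, `mAP50-95` is always at most `mAP50` for the same model.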

## 7️⃣ View Training Results

In [None]:
# Display training plots
from IPython.display import Image, display
import glob

result_dir = f"runs/detect/{config['name']}"

print("Training Results:\n")

# Results plot
if os.path.exists(f"{result_dir}/results.png"):
    print("📈 Training Metrics:")
    display(Image(filename=f"{result_dir}/results.png", width=800))

# Confusion matrix
if os.path.exists(f"{result_dir}/confusion_matrix.png"):
    print("\n🎯 Confusion Matrix:")
    display(Image(filename=f"{result_dir}/confusion_matrix.png", width=600))

# Sample predictions
val_images = glob.glob(f"{result_dir}/val_batch*_pred.jpg")
if val_images:
    print("\n🔍 Sample Predictions:")
    for img in val_images[:2]:  # Show first 2
        display(Image(filename=img, width=800))
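Besides the plots, Ultralytics writes per-epoch metrics to `results.csv` in the same run folder. A sketch that reports the epoch with the highest mAP50-95, assuming the column names used by recent Ultralytics releases (`epoch`, `metrics/mAP50(B)`, `metrics/mAP50-95(B)`); check the header of your own file, since names and padding have changed between versions. `best_epoch` is a hypothetical helper.

```python
import csv


def best_epoch(csv_path, metric="metrics/mAP50-95(B)"):
    """Return (epoch, row) for the row with the highest value of `metric`.

    Header names and values are stripped because some Ultralytics versions
    pad them with spaces.
    """
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = [h.strip() for h in next(reader)]
        rows = [dict(zip(header, (v.strip() for v in r))) for r in reader if r]
    if metric not in header:
        raise KeyError(f"{metric!r} not in columns: {header}")
    best = max(rows, key=lambda row: float(row[metric]))
    return int(float(best["epoch"])), best
```

For example: `epoch, row = best_epoch(f"{result_dir}/results.csv")`, then print `row["metrics/mAP50(B)"]` and `row["metrics/mAP50-95(B)"]` for that epoch.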

## 8️⃣ Download Trained Model

In [None]:
# Copy best model
best_model_path = f"runs/detect/{config['name']}/weights/best.pt"

if os.path.exists(best_model_path):
    # Copy to easy location
    shutil.copy(best_model_path, '/content/best.pt')
    
    print("✓ Best model ready for download!")
    print(f"Model size: {os.path.getsize('/content/best.pt') / 1024**2:.1f} MB")
    
    # Download the model
    print("\nDownloading best.pt...")
    files.download('/content/best.pt')
    
    print("\n" + "="*80)
    print("🎉 SUCCESS! Model downloaded!")
    print("="*80)
    print("\nNext steps:")
    print("1. Move best.pt to your YellowCert/models/ folder")
    print("2. Restart your backend: cd backend && python main.py")
    print("3. Test in the web app!")
    print("="*80)
else:
    print("❌ Best model not found!")
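A browser download can end up incomplete if the connection drops. To confirm that the `best.pt` on your machine matches the one in Colab, you can compare SHA-256 checksums computed on both sides. A sketch using only `hashlib`; the `sha256sum` helper name is illustrative.

```python
import hashlib


def sha256sum(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Run `print(sha256sum('/content/best.pt'))` in Colab and the same function (or `shasum -a 256 best.pt` in a Mac terminal) on the downloaded copy; the two hex strings should be identical.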

## 9️⃣ Optional: Download All Results

In [None]:
# Zip all training results
import shutil

print("Creating results archive...")
shutil.make_archive(
    '/content/yellowcert_results',
    'zip',
    f"runs/detect/{config['name']}"
)

print("\nDownloading all results (plots, metrics, weights)...")
files.download('/content/yellowcert_results.zip')

print("✓ All results downloaded!")

## 🧪 Optional: Test on a Sample Image

In [None]:
# Upload a test image
print("Upload a vaccination certificate image to test:")
test_uploaded = files.upload()

# Run inference (imports repeated so this cell also works on its own)
import glob
import os
from IPython.display import Image, display
from ultralytics import YOLO

test_model = YOLO('/content/best.pt')

for filename in test_uploaded.keys():
    print(f"\nTesting on {filename}...")
    results = test_model.predict(
        source=filename,
        conf=0.25,
        save=True,
        project='test_predictions',
        name='predict',
        exist_ok=True,  # reuse test_predictions/predict instead of predict2, predict3, ...
    )
    
    # Show result
    print(f"\nDetected {len(results[0].boxes)} objects:")
    for box in results[0].boxes:
        cls = int(box.cls[0])
        conf = float(box.conf[0])
        name = results[0].names[cls]
        print(f"  - {name}: {conf:.2f}")
    
    # Display the most recently saved annotated image (the one for this upload)
    saved = glob.glob('test_predictions/predict/*')
    if saved:
        pred_img = max(saved, key=os.path.getmtime)
        display(Image(filename=pred_img, width=800))
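If your classes are certificate fields that should appear at most once per image (an assumption about your label set), the highest-confidence box per class is a convenient summary of the detections printed above. A sketch over plain `(class_name, confidence)` pairs; `best_per_class` is a hypothetical helper.

```python
def best_per_class(detections, min_conf=0.25):
    """Keep the highest-confidence detection for each class name.

    detections: iterable of (class_name, confidence) pairs.
    Returns {class_name: confidence} for detections at or above min_conf.
    """
    best = {}
    for name, conf in detections:
        if conf >= min_conf and conf > best.get(name, -1.0):
            best[name] = conf
    return best
```

The pairs can be built from a result with `[(results[0].names[int(b.cls[0])], float(b.conf[0])) for b in results[0].boxes]`.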

---

## 💾 Save to Google Drive (Optional)

To avoid losing your model if Colab disconnects:

In [None]:
# Mount drive if not already mounted
if not os.path.exists('/content/drive'):
    drive.mount('/content/drive')

# Copy model to Google Drive
drive_save_path = '/content/drive/MyDrive/yellowcert_model_best.pt'
shutil.copy('/content/best.pt', drive_save_path)

print(f"✓ Model saved to Google Drive: {drive_save_path}")
print("You can now download it anytime from your Google Drive!")
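Copying only `best.pt` still loses `last.pt`, the checkpoint needed to resume an interrupted run. A sketch that mirrors the whole run folder (weights, plots, `results.csv`) to Drive with `shutil.copytree`, assuming Python 3.8+ for `dirs_exist_ok`; `backup_run` is a hypothetical helper. Re-running it overwrites the previous copy with the latest files.

```python
import shutil
from pathlib import Path


def backup_run(run_dir, backup_root):
    """Copy run_dir into backup_root/<run name>, overwriting files from earlier backups."""
    run_dir = Path(run_dir)
    dest = Path(backup_root) / run_dir.name
    # dirs_exist_ok lets repeated backups update the same destination folder
    shutil.copytree(run_dir, dest, dirs_exist_ok=True)
    return dest
```

For example: `backup_run(f"runs/detect/{config['name']}", '/content/drive/MyDrive/yellowcert_runs')`.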

---

## 📝 Notes

### Colab Limitations:
- **Free tier**: ~12 hours max session, T4 GPU (16GB VRAM)
- **Colab Pro**: Longer sessions, better GPUs (A100)
- Sessions can disconnect - save to Google Drive!

### Training Time Estimates (T4 GPU):
- Quick (10 epochs): ~15-20 minutes
- Balanced (200 epochs): ~2-4 hours
- Maximum (300 epochs): ~5-8 hours
- Ultra (300 epochs, x model): ~8-12 hours
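These ranges depend on dataset size, so a more reliable check is to time the first few epochs and extrapolate: total time ≈ seconds per epoch × epochs, compared against the roughly 12-hour free-tier limit noted above. Early stopping (`patience`) can end a run sooner, so treat the result as an upper bound. `fits_in_session` is a hypothetical helper, and the 12-hour default comes from the note above, not from a guaranteed Colab quota.

```python
def fits_in_session(seconds_per_epoch, epochs, session_hours=12.0, safety=0.85):
    """Estimate total training hours and whether they fit in a Colab session.

    safety leaves headroom for setup, validation and disconnects.
    Returns (estimated_hours, fits).
    """
    hours = seconds_per_epoch * epochs / 3600
    return hours, hours <= session_hours * safety


# Hypothetical example: 90 s/epoch for the 300-epoch 'maximum' mode
hours, ok = fits_in_session(90, 300)
print(f"~{hours:.1f} h, fits: {ok}")  # ~7.5 h, fits: True
```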

### Tips:
1. Start with 'balanced' mode - best results for free tier
2. Save to Google Drive periodically
3. Monitor GPU usage with `!nvidia-smi`
4. If out of memory, reduce batch size

---

**Created for YellowCert Medical Certificate Detection** 🏥
