# 🚀 X-Lite Colab Setup Guide

**Google Colab Environment Setup for X-Lite Training**

This notebook helps you set up Google Colab for training the X-Lite models with free GPU access.

---

## 📋 What This Notebook Does

1. ✅ Check GPU availability
2. ✅ Mount Google Drive (for data & checkpoints)
3. ✅ Clone X-Lite repository from GitHub
4. ✅ Install dependencies
5. ✅ Set up data paths
6. ✅ Verify setup

---

## 🎯 Before You Start

### Prerequisites:
- [ ] GitHub account with X-Lite repository
- [ ] Google Drive account
- [ ] Dataset uploaded to Google Drive (optional - can upload later)

### Recommended Drive Structure:
```
Google Drive/
└── X-Lite/
    ├── data/
    │   ├── Data_Entry_2017.csv
    │   └── images/
    ├── checkpoints/
    └── results/
```
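
Before training, it can help to confirm that the Drive folder matches the layout above. The sketch below is one way to do that with `pathlib`. `missing_entries` and `EXPECTED` are hypothetical names introduced here for illustration, and the example path assumes Drive is mounted at `/content/drive/MyDrive`, as in Step 2.

```python
from pathlib import Path

# Entries from the recommended layout above; a trailing slash marks a directory.
EXPECTED = [
    "data/",
    "data/Data_Entry_2017.csv",
    "data/images/",
    "checkpoints/",
    "results/",
]

def missing_entries(root, expected=EXPECTED):
    """Return the expected entries that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    missing = []
    for entry in expected:
        path = root / entry
        # Directories and files are checked differently so a file named
        # "images" would not count as the images/ folder.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing
```

Calling `missing_entries('/content/drive/MyDrive/X-Lite')` after mounting Drive returns an empty list when everything is in place, and otherwise lists what still needs to be uploaded or created.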

---

**Let's begin! Run each cell sequentially.**

## Step 1: Check GPU Availability

Verify that you have GPU access (T4, P100, or V100).

In [None]:
import torch
import sys

print("=" * 60)
print("GPU CHECK")
print("=" * 60)

# Check CUDA availability
cuda_available = torch.cuda.is_available()
print(f"\n✓ CUDA Available: {cuda_available}")

if cuda_available:
    print(f"✓ GPU Device: {torch.cuda.get_device_name(0)}")
    print(f"✓ CUDA Version: {torch.version.cuda}")

    # Get GPU memory (decimal gigabytes)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"✓ GPU Memory: {gpu_memory:.2f} GB")

    # Test GPU: run a small matmul and wait for it to finish,
    # so that any CUDA error surfaces here rather than later
    x = torch.randn(1000, 1000).cuda()
    y = x @ x.T
    torch.cuda.synchronize()
    print("✓ GPU Test: Success!")

    print("\n" + "=" * 60)
    print("✅ GPU is ready for training!")
    print("=" * 60)
else:
    print("\n" + "=" * 60)
    print("⚠️  WARNING: GPU not available!")
    print("=" * 60)
    print("\n📝 To enable GPU:")
    print("   1. Click 'Runtime' in menu")
    print("   2. Select 'Change runtime type'")
    print("   3. Choose 'T4 GPU' as Hardware accelerator")
    print("   4. Click 'Save'")
    print("   5. Re-run this cell")
    print("=" * 60)

## Step 2: Mount Google Drive

Mount your Google Drive to access datasets and save checkpoints.

In [None]:
from google.colab import drive
import os

print("=" * 60)
print("MOUNTING GOOGLE DRIVE")
print("=" * 60)

# Mount Google Drive
drive.mount('/content/drive')

# Verify mount
drive_path = '/content/drive/MyDrive'
if os.path.exists(drive_path):
    print("\n✓ Google Drive mounted successfully!")
    print(f"✓ Drive path: {drive_path}")

    # Create X-Lite folder structure if it doesn't exist
    xlite_drive = os.path.join(drive_path, 'X-Lite')
    os.makedirs(os.path.join(xlite_drive, 'data'), exist_ok=True)
    os.makedirs(os.path.join(xlite_drive, 'checkpoints'), exist_ok=True)
    os.makedirs(os.path.join(xlite_drive, 'results'), exist_ok=True)

    print("\n✓ Created/verified X-Lite folders in Drive:")
    print(f"  • {xlite_drive}/data/")
    print(f"  • {xlite_drive}/checkpoints/")
    print(f"  • {xlite_drive}/results/")

    print("\n" + "=" * 60)
    print("✅ Google Drive is ready!")
    print("=" * 60)
else:
    print("\n⚠️  Drive mount failed!")
    print("Please authorize the mount and re-run this cell.")

## Step 3: Clone X-Lite Repository

Clone your X-Lite repository from GitHub.

In [None]:
import os
import shutil

print("=" * 60)
print("CLONING X-LITE REPOSITORY")
print("=" * 60)

# Repository URL (update with your GitHub username if needed)
REPO_URL = "https://github.com/dinethsadee01/X-Lite.git"
REPO_DIR = "/content/X-Lite"

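# Remove existing directory if present.
# Leave it first: on a re-run the working directory is REPO_DIR itself,
# and deleting the current working directory breaks the clone below.
if os.path.exists(REPO_DIR):
    print("\n⚠️  Existing repo found. Removing...")
    os.chdir("/content")
    shutil.rmtree(REPO_DIR)

# Clone repository
print(f"\n📥 Cloning from: {REPO_URL}")
!git clone {REPO_URL} {REPO_DIR}

# Stop here if the clone did not produce a Git working tree
if not os.path.isdir(os.path.join(REPO_DIR, ".git")):
    raise RuntimeError(f"Clone failed: {REPO_DIR} is not a Git repository")

# Change to repo directory
os.chdir(REPO_DIR)
print(f"\n✓ Changed directory to: {os.getcwd()}")

# Show repo structure
print("\n📁 Repository structure:")
!ls -la

print("\n" + "=" * 60)
print("✅ Repository cloned successfully!")
print("=" * 60)

print("\n💡 Tip: To pull latest changes later, run:")
print("   !git pull origin main")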

## Step 4: Install Dependencies

Install all required Python packages.

In [None]:
print("=" * 60)
print("INSTALLING DEPENDENCIES")
print("=" * 60)

# Install requirements
print("\n📦 Installing packages from requirements.txt...")
!pip install -q -r requirements.txt

# Install additional packages for Colab
print("\n📦 Installing Colab-specific packages...")
!pip install -q wandb  # For experiment tracking (optional)

print("\n✓ Verifying installations...")

# Verify key packages
import torch
import torchvision
import timm
import albumentations
import pandas as pd
import numpy as np

print("\n✅ Key Packages Verified:")
print(f"  • PyTorch: {torch.__version__}")
print(f"  • TorchVision: {torchvision.__version__}")
print(f"  • timm: {timm.__version__}")
print(f"  • albumentations: {albumentations.__version__}")

print("\n" + "=" * 60)
print("✅ All dependencies installed!")
print("=" * 60)

## Step 5: Configure Data Paths

Set up paths to link Google Drive data with the repository.

In [None]:
import os
import sys

print("=" * 60)
print("CONFIGURING DATA PATHS")
print("=" * 60)

# Add project to Python path
sys.path.insert(0, '/content/X-Lite')

# Import config
from config import Config

# Google Drive paths
DRIVE_XLITE = '/content/drive/MyDrive/X-Lite'
DRIVE_DATA = os.path.join(DRIVE_XLITE, 'data')
DRIVE_CHECKPOINTS = os.path.join(DRIVE_XLITE, 'checkpoints')
DRIVE_RESULTS = os.path.join(DRIVE_XLITE, 'results')

print("\n📂 Google Drive Paths:")
print(f"  • Data: {DRIVE_DATA}")
print(f"  • Checkpoints: {DRIVE_CHECKPOINTS}")
print(f"  • Results: {DRIVE_RESULTS}")

# Create symbolic links (if data exists in Drive).
# os.symlink avoids shell quoting problems, and lexists also detects
# a leftover link whose target is currently missing.
if os.path.exists(DRIVE_DATA):
    # Link data directory
    if not os.path.lexists('/content/X-Lite/data/raw'):
        os.makedirs('/content/X-Lite/data', exist_ok=True)
        os.symlink(DRIVE_DATA, '/content/X-Lite/data/raw')
        print("\n✓ Linked Drive data to repo")
else:
    print(f"\n⚠️  Data not found in Drive. Upload dataset to: {DRIVE_DATA}")
    print("   You can upload later and re-run this cell.")

# Link checkpoint directory
if not os.path.lexists('/content/X-Lite/ml/models/checkpoints'):
    os.makedirs('/content/X-Lite/ml/models', exist_ok=True)
    os.symlink(DRIVE_CHECKPOINTS, '/content/X-Lite/ml/models/checkpoints')
    print("✓ Linked Drive checkpoints to repo")

print("\n📊 Project Configuration:")
print(f"  • Dataset: {Config.DATASET_NAME}")
print(f"  • Classes: {Config.NUM_CLASSES}")
print(f"  • Image Size: {Config.IMAGE_SIZE}")
print(f"  • Batch Size (Teacher): {Config.TEACHER_BATCH_SIZE}")

print("\n" + "=" * 60)
print("✅ Data paths configured!")
print("=" * 60)

## Step 6: Verify Setup

Run final checks to ensure everything is ready for training.

In [None]:
# Re-import so this cell does not depend on imports made in earlier cells
import os
import torch

print("=" * 60)
print("FINAL VERIFICATION")
print("=" * 60)

checks = {
    "GPU Available": torch.cuda.is_available(),
    "Google Drive Mounted": os.path.exists('/content/drive/MyDrive'),
    "Repository Cloned": os.path.exists('/content/X-Lite'),
    "Config Loaded": 'Config' in globals(),
    "Data Directory": os.path.exists('/content/X-Lite/data'),
    "Checkpoint Directory": os.path.exists('/content/X-Lite/ml/models/checkpoints')
}

print("\n📋 Setup Checklist:\n")
all_passed = True
for check, status in checks.items():
    icon = "✅" if status else "❌"
    print(f"  {icon} {check}")
    if not status:
        all_passed = False

print("\n" + "=" * 60)

if all_passed:
    print("🎉 ALL CHECKS PASSED! Ready to train!")
    print("=" * 60)
    print("\n🚀 Next Steps:")
    print("  1. Upload dataset to Google Drive (if not done)")
    print("  2. Open 01_train_teacher.ipynb")
    print("  3. Start training the teacher model!")
    print("\n💡 Remember:")
    print("  • Save checkpoints regularly")
    print("  • Monitor training with TensorBoard/WandB")
    print("  • Session timeout: 12 hours (90 min idle)")
else:
    print("⚠️  SOME CHECKS FAILED!")
    print("=" * 60)
    print("\nPlease fix the issues above and re-run verification.")
    
print("\n" + "=" * 60)

# Show system info
print("\n📊 System Information:")
!nvidia-smi --query-gpu=gpu_name,memory.total,memory.free --format=csv,noheader

print("\n💾 Disk Space:")
!df -h /content

print("\n" + "=" * 60)

## 📝 Optional: Set Up Weights & Biases (Experiment Tracking)

Weights & Biases provides free experiment tracking and visualization.

In [None]:
# Optional: Set up Weights & Biases for experiment tracking
# Skip this cell if you don't want to use W&B

try:
    import wandb
    
    print("=" * 60)
    print("WEIGHTS & BIASES SETUP (Optional)")
    print("=" * 60)
    
    print("\n📊 Weights & Biases helps you:")
    print("  • Track experiments automatically")
    print("  • Visualize metrics in real-time")
    print("  • Compare different runs")
    print("  • Share results easily")

    print("\n🔑 To use W&B:")
    print("  1. Go to https://wandb.ai/authorize")
    print("  2. Copy your API key")
    print("  3. Run: wandb.login()")
    print("  4. Paste your API key when prompted")

    print("\n💡 Or skip this and use TensorBoard instead!")
    print("=" * 60)
    
except ImportError:
    print("W&B not installed. Run: !pip install wandb")

## 🎯 Summary & Next Steps

### ✅ What We've Set Up

1. **GPU Environment**: Verified GPU access (T4/P100/V100)
2. **Google Drive**: Mounted and configured folder structure
3. **Repository**: Cloned X-Lite from GitHub
4. **Dependencies**: Installed all required packages
5. **Data Paths**: Linked Drive data to repository
6. **Verification**: All systems ready!

---

### 📚 Next Notebooks (In Order)

1. **`01_train_teacher.ipynb`** - Train DenseNet121 teacher model
2. **`02_train_student.ipynb`** - Train lightweight student models
3. **`03_knowledge_distillation.ipynb`** - Apply knowledge distillation
4. **`04_gradcam_generation.ipynb`** - Generate Grad-CAM heatmaps

---

### 💡 Important Reminders

**Before Training:**
- ✅ Upload dataset to Google Drive: `/X-Lite/data/`
- ✅ Check GPU quota (limited daily usage)
- ✅ Set session timeout reminder (12 hours max)

**During Training:**
- 💾 Save checkpoints every epoch to Drive
- 📊 Monitor with TensorBoard or W&B
- ⚡ Use mixed precision for faster training
- 📝 Log hyperparameters and results
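
One lightweight way to log hyperparameters and results is a small JSON file written to Drive alongside the checkpoints. The sketch below is only an illustration: `save_run_record` and its field names are hypothetical, not part of X-Lite. In Colab, `results_dir` would typically point at the Drive `results/` folder so the record survives a disconnect.

```python
import json
import time
from pathlib import Path

def save_run_record(results_dir, run_name, hparams, metrics):
    """Write one run's hyperparameters and metrics to <results_dir>/<run_name>.json.

    Hypothetical helper; both dicts must contain JSON-serializable values.
    """
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "run_name": run_name,
        "saved_at": time.strftime("%Y-%m-%d %H:%M:%S"),
        "hparams": hparams,
        "metrics": metrics,
    }
    out_path = results_dir / f"{run_name}.json"
    out_path.write_text(json.dumps(record, indent=2))
    return out_path
```

Writing one file per run keeps records independent, so an interrupted session cannot corrupt earlier results.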

**After Training:**
- 📥 Download checkpoints from Drive
- 📊 Download training logs and metrics
- 🔄 Push any code changes to GitHub
- ✅ Update local config with best hyperparameters

---

### 🔄 Typical Workflow

```
VS Code (Local) → GitHub → Colab (Training) → Google Drive → VS Code (Local)
    ↓                                                              ↓
Write Code                                              Integrate Models
    ↓                                                              ↓
Commit & Push                                           Download Checkpoints
```

---

### 🆘 Troubleshooting

**Session Disconnects?**
- Checkpoints are saved in Drive - just resume!
- Re-run setup cells and continue from last epoch
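
Resuming usually starts with locating the most recent checkpoint in Drive. The sketch below assumes checkpoints are named `epoch_<N>.pt`, which is a hypothetical convention; the X-Lite training notebooks may name files differently. `latest_checkpoint` is likewise a made-up helper name. It returns the file with the highest epoch number.

```python
import re
from pathlib import Path

# Assumed naming scheme: epoch_<N>.pt, for example epoch_7.pt or epoch_007.pt.
CKPT_PATTERN = re.compile(r"^epoch_(\d+)\.pt$")

def latest_checkpoint(ckpt_dir):
    """Return (epoch, path) for the newest checkpoint, or None if there is none."""
    ckpt_dir = Path(ckpt_dir)
    if not ckpt_dir.is_dir():
        return None
    found = []
    for path in ckpt_dir.iterdir():
        match = CKPT_PATTERN.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    # Compare by epoch number, not by name, so epoch_10 ranks above epoch_9.
    return max(found, key=lambda item: item[0]) if found else None
```

The returned path can then be passed to `torch.load`, and training can continue from the epoch after the one returned.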

**Out of Memory?**
- Reduce batch size in config
- Use gradient accumulation
- Clear CUDA cache: `torch.cuda.empty_cache()`
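
Gradient accumulation keeps the effective batch size while lowering peak memory: gradients from several small micro-batches are combined before a single optimizer step. The plain-Python sketch below illustrates the arithmetic for a one-parameter model `y = w * x` with mean-squared-error loss; the function names and data are made up for illustration. In PyTorch the same idea is typically expressed by dividing each micro-batch loss by the number of accumulation steps, calling `backward()` on each, and calling `optimizer.step()` once per group.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, micro_batch_size):
    """Combine micro-batch gradients; assumes the batch splits evenly."""
    assert len(xs) % micro_batch_size == 0
    steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(steps):
        lo, hi = i * micro_batch_size, (i + 1) * micro_batch_size
        # Scale each micro-batch gradient by 1/steps, which mirrors
        # dividing the micro-batch loss by the number of steps.
        total += mse_grad(w, xs[lo:hi], ys[lo:hi]) / steps
    return total

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
full = mse_grad(0.5, xs, ys)                                # one batch of 8
accum = accumulated_grad(0.5, xs, ys, micro_batch_size=2)   # four batches of 2
print(abs(full - accum) < 1e-9)
```

The two gradients agree, so halving the batch size and doubling the accumulation steps leaves the update unchanged for this loss, at the cost of more forward and backward passes per step.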

**Slow Training?**
- Enable mixed precision (AMP)
- Reduce image size temporarily
- Use DataLoader with num_workers=2

---

### 📞 Resources

- **Colab Pro**: $10/month for better GPU access & longer sessions
- **Colab Documentation**: https://colab.research.google.com/
- **X-Lite GitHub**: https://github.com/dinethsadee01/X-Lite
- **Project Docs**: Check `docs/` folder in repo

---

**🎉 You're all set! Happy training! 🚀**