# 🚀 1-Hour ImageNet Validation Training on Google Colab

This notebook runs a complete ImageNet-1k validation training run, capped at 1 hour of wall-clock time, using:
- **HuggingFace datasets API** for streaming ImageNet-1k data
- **MosaicML Composer** for optimized training
- **Time-based termination** to ensure 1-hour runtime
- **T4 GPU optimization** for Google Colab

## 📋 What This Does
- ✅ Streams 25,000 ImageNet samples efficiently
- ✅ Trains ResNet50 with modern optimizations (MixUp, CutMix, EMA, etc.)
- ✅ Stops automatically once the 60-minute time budget is used up
- ✅ Monitors GPU utilization and training progress
- ✅ Saves checkpoints and metrics
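
Time-based termination can be sketched independently of any training library. The snippet below is a minimal illustration, not the mechanism this notebook necessarily uses with Composer: `train_with_time_budget`, `batches`, and `step_fn` are hypothetical names, and the loop only checks the clock between steps, so it can overrun the budget by up to one step.

```python
import time
from typing import Callable, Iterable


def train_with_time_budget(
    batches: Iterable, step_fn: Callable, budget_seconds: float
) -> int:
    """Call step_fn on each batch until the wall-clock budget is spent.

    Returns the number of completed steps.
    """
    # time.monotonic() is not affected by system clock adjustments.
    deadline = time.monotonic() + budget_seconds
    steps = 0
    for batch in batches:
        # Checked between steps only: a step that is already running
        # is allowed to finish, so a small overrun is possible.
        if time.monotonic() >= deadline:
            break
        step_fn(batch)
        steps += 1
    return steps
```

With a real data loader and training step (both assumed here), a call such as `train_with_time_budget(dataloader, training_step, 57 * 60)` would stop at the first batch boundary after 57 minutes.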

## ⏱️ Expected Timeline
- **Setup**: 2-3 minutes
- **Data loading**: 1-2 minutes
- **Training**: 57-58 minutes
- **Total**: ~60 minutes
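
The timeline is simple arithmetic: whatever setup and data loading consume comes out of the hour, and training gets the remainder. A small sketch, assuming the hour is counted from the moment the notebook starts (`remaining_budget` and `TOTAL_BUDGET_S` are illustrative names, not part of the notebook):

```python
TOTAL_BUDGET_S = 60 * 60  # one hour for the whole run, in seconds


def remaining_budget(start: float, now: float, total: float = TOTAL_BUDGET_S) -> float:
    """Seconds left for training after (now - start) seconds of overhead."""
    return max(0.0, total - (now - start))


# Example: 3 minutes of setup and data loading leave 57 minutes for training.
minutes_left = remaining_budget(start=0.0, now=3 * 60) / 60
print(minutes_left)  # 57.0
```

In practice `start` would be a timestamp such as `time.monotonic()` recorded in the first cell, and `now` would be read just before training begins.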


## 🔧 Setup and Installation

First, let's install all required dependencies and verify our environment.


In [None]:
# Install required packages - Updated for compatibility
print("📦 Installing packages (this may take 2-3 minutes)...")

# Upgrade pip first so every install below uses the latest resolver
!pip install -q --upgrade pip

# Install core ML packages
!pip install -q torch torchvision

# Install MosaicML Composer - use latest version to avoid metadata issues
print("🎵 Installing MosaicML Composer...")
!pip install -q "mosaicml>=0.20.0" --no-warn-conflicts

# Install HuggingFace and other utilities
!pip install -q datasets transformers huggingface_hub 
!pip install -q torchmetrics wandb
!pip install -q pillow numpy matplotlib tqdm

print("✅ Installation commands finished (scroll up to check for pip errors).")
print("💡 Note: Any warnings about mosaicml version metadata can be safely ignored.")
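
Because `pip install -q` hides most output, it can help to confirm what actually got installed before moving on. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is made up for this example, and the package list simply mirrors the install commands above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names as passed to pip (not necessarily the import names).
required = ["torch", "torchvision", "mosaicml", "datasets", "torchmetrics"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any line reporting `NOT INSTALLED` points to an install step that should be rerun without `-q` to see the full error.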


In [None]:
# Test MosaicML Composer installation
print("🔍 Testing MosaicML Composer installation...")

try:
    from composer import Trainer
    from composer.algorithms import MixUp
    print("✅ MosaicML Composer is working correctly!")
    
except ImportError as e:
    print(f"❌ Import failed: {e}")
    print("\n🔧 TROUBLESHOOTING:")
    print("If you see metadata warnings or import errors, try these solutions:")
    print("\n1️⃣ Run the fix script:")
    print("   !python fix_mosaicml_install.py")
    print("\n2️⃣ Or manually install a specific version:")
    print("   !pip install 'mosaicml==0.21.0' --force-reinstall")
    print("\n3️⃣ Or install from source:")
    print("   !pip install git+https://github.com/mosaicml/composer.git")
    
    # Stop execution if composer is not working, keeping the original error as the cause
    raise ImportError("MosaicML Composer is required for training. Please fix installation first.") from e


In [None]:
# Check environment and GPU availability
import torch
import sys
from datetime import datetime

print("🖥️  ENVIRONMENT CHECK")
print("=" * 40)
print(f"Python: {sys.version}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name()}")
    print(f"CUDA Version: {torch.version.cuda}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    gpu_ok = True
else:
    print("⚠️ No GPU detected - training will be very slow!")
    gpu_ok = False

print(f"Start time: {datetime.now().strftime('%H:%M:%S')}")
print("=" * 40)

if not gpu_ok:
    print("\n❌ Please enable GPU runtime in Colab:")
    print("   Runtime → Change runtime type → Hardware accelerator → T4 GPU")


## 🔐 HuggingFace Authentication (Required)

**⚠️ Important**: ImageNet-1k is a gated dataset requiring authentication.

### Steps to get access:
1. **Request access**: Visit https://huggingface.co/datasets/imagenet-1k and request access
2. **Wait for approval**: Usually takes 1-2 business days
3. **Get your token**: Visit https://huggingface.co/settings/tokens and create a token
4. **Set your token below** in the next cell
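
Hard-coding a token in a notebook makes it easy to leak when the notebook is shared. As an alternative sketch, the token can be read from an environment variable, with an optional prompt as a fallback. The helper `resolve_hf_token` and its parameters are hypothetical and not part of any Hugging Face library.

```python
import os


def resolve_hf_token(prompt=None, env_var="HF_TOKEN", placeholder="your_token_here"):
    """Return a usable token from the environment or a prompt, else None."""
    token = os.environ.get(env_var, "").strip()
    # Treat the unedited placeholder the same as a missing token.
    if token and token != placeholder:
        return token
    if prompt is not None:
        # prompt is any callable that takes a message and returns a string,
        # for example getpass.getpass, which hides the typed input.
        token = prompt("Hugging Face token: ").strip()
        return token or None
    return None
```

In a notebook, `from getpass import getpass` followed by `HF_TOKEN = resolve_hf_token(prompt=getpass)` would ask for the token only when the environment does not already provide one.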


In [None]:
# 🔐 Set your HuggingFace token here
import os
from datasets import load_dataset

# Method 1: Set your token directly (replace 'your_token_here' with your actual token)
HF_TOKEN = "your_token_here"  # Replace with your actual token
if HF_TOKEN == "your_token_here":
    # Placeholder left unchanged: do not export it, rely on a stored login (Method 2) instead
    HF_TOKEN = None
else:
    os.environ['HF_TOKEN'] = HF_TOKEN

# Method 2: Alternative - login interactively (uncomment if you prefer)
# from huggingface_hub import notebook_login
# notebook_login()

# Test ImageNet access
print("🔍 Testing ImageNet-1k access...")
try:
    # token=None lets datasets fall back to the locally stored login token, if any
    test_dataset = load_dataset("imagenet-1k", split="train", streaming=True, token=HF_TOKEN)
    sample = next(iter(test_dataset.take(1)))
    
    print("✅ ImageNet access confirmed!")
    print(f"   📷 Sample image size: {sample['image'].size}")
    print(f"   🏷️ Sample label: {sample['label']}")
    print("   🔐 Authentication: SUCCESS")
    
except Exception as e:
    print(f"❌ ImageNet access failed: {e}")
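    # Gated-dataset failures may also surface as 403 or mention "gated"
    if any(marker in str(e).lower() for marker in ("401", "403", "unauthorized", "gated")):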
        print("\n🔐 AUTHENTICATION ERROR!")
        print("Steps to fix:")
        print("1. Make sure you have requested access at: https://huggingface.co/datasets/imagenet-1k")
        print("2. Wait for approval (1-2 days)")
        print("3. Get your token: https://huggingface.co/settings/tokens")
        print("4. Replace 'your_token_here' above with your actual token")
    else:
        print("💡 Check your internet connection and try again")
