# 🍕 NutriLearn AI - Food Classification Model Training

Train a deep learning model for food recognition using the Food-101 dataset with transfer learning.

**What you'll learn:**
- Transfer learning with PyTorch (MobileNetV2/EfficientNet/ResNet50)
- Data augmentation for image classification
- MLflow experiment tracking
- Model evaluation with confusion matrix
- Model deployment preparation

**Steps:**
1. ⚡ Enable GPU (Runtime → Change runtime type → GPU → Save)
2. 📦 Run setup cells to install dependencies
3. 🎯 Train model (choose architecture)
4. 📊 View results and metrics
5. 💾 Download trained model

**Hardware Requirements:**
- GPU strongly recommended (T4 or better)
- Training time: ~1-2 hours on T4 GPU, ~15+ hours on CPU

**Dataset:** Food-101 (101 food categories, 101,000 images, ~5GB)

## 1. Setup Environment

In [None]:
# Check GPU availability and system info
import torch
import sys

print("=" * 60)
print("SYSTEM INFORMATION")
print("=" * 60)
print(f"Python version: {sys.version.split()[0]}")
print(f"PyTorch version: {torch.__version__}")
print(f"\nGPU Available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"✅ GPU Name: {torch.cuda.get_device_name(0)}")
    print(f"✅ GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    print(f"✅ CUDA Version: {torch.version.cuda}")
    device = torch.device('cuda')
    print("\n🚀 Training will be FAST on GPU!")
else:
    print("⚠️  No GPU detected. Training will be VERY slow!")
    print("   Enable GPU: Runtime → Change runtime type → GPU → Save")
    print("   Then: Runtime → Restart runtime")
    device = torch.device('cpu')
    print("\n🐌 Training will take 15+ hours on CPU")

print(f"\nUsing device: {device}")
print("=" * 60)

In [None]:
# Install required packages
print("Installing dependencies...")
!pip install -q mlflow scikit-learn matplotlib seaborn tqdm Pillow
print("✅ All packages installed successfully!")

# Import libraries
import os
import json
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image
from tqdm import tqdm

print("✅ Libraries imported successfully!")

### Mount Google Drive (Optional)

Mount Google Drive to save models persistently. Skip this if you'll download models directly.

In [None]:
# Mount Google Drive to save models
from google.colab import drive
drive.mount('/content/drive')

# Create directory for saving models
DRIVE_MODEL_DIR = '/content/drive/MyDrive/NutriLearn_Models'
os.makedirs(DRIVE_MODEL_DIR, exist_ok=True)

print("✅ Google Drive mounted")
print(f"✅ Models will be saved to: {DRIVE_MODEL_DIR}")

## 2. Clone Repository and Setup

In [None]:
# Get train_model.py into the working directory
# (replace "yourusername" with the actual repository owner)

# Option 1: Clone from GitHub
# !git clone https://github.com/yourusername/nutrilearn-ai.git
# %cd nutrilearn-ai/backend

# Option 2: Download the training script directly
!wget -q https://raw.githubusercontent.com/yourusername/nutrilearn-ai/main/backend/train_model.py

# Option 3: Upload train_model.py using the Files panel on the left

if os.path.exists('train_model.py'):
    print("✅ Setup complete: train_model.py found")
else:
    print("⚠️  train_model.py not found. Fix the URL above or upload the script via the Files panel.")

### Dataset Preview

Let's download the full Food-101 dataset and preview a few samples before training.

In [None]:
# Download and preview Food-101 dataset
import torchvision
from torchvision import transforms
import matplotlib.pyplot as plt
import random

print("Downloading Food-101 dataset (this may take 5-10 minutes)...")
print("Dataset size: ~5GB")

# Download dataset
dataset = torchvision.datasets.Food101(
    root='./data',
    split='train',
    download=True
)

print("\n✅ Dataset downloaded!")
print(f"Total training images: {len(dataset)}")
print(f"Number of classes: {len(dataset.classes)}")
print(f"\nFirst 10 food categories:")
for i, cls in enumerate(dataset.classes[:10]):
    print(f"  {i}: {cls}")

In [None]:
# Visualize sample images from different classes
fig, axes = plt.subplots(3, 4, figsize=(15, 12))
fig.suptitle('Sample Food Images from Food-101 Dataset', fontsize=16, fontweight='bold')

# Select random samples
random_indices = random.sample(range(len(dataset)), 12)

for idx, ax in enumerate(axes.flat):
    img, label = dataset[random_indices[idx]]
    ax.imshow(img)
    ax.set_title(f"{dataset.classes[label]}", fontsize=10)
    ax.axis('off')

plt.tight_layout()
plt.show()

print("\nüìä Dataset Statistics:")
print(f"  Images per class: ~1,000")
print(f"  Total classes: 101")
print(f"  Image format: RGB")
print(f"  Typical size: 512x512 pixels")

### Data Augmentation Preview

See how data augmentation transforms images during training.

In [None]:
# Show data augmentation effects
from torchvision import transforms

# Define augmentation pipeline
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    transforms.ToTensor(),
])

# Get a sample image
sample_img, sample_label = dataset[random.randint(0, len(dataset)-1)]
sample_class = dataset.classes[sample_label]

# Apply augmentation multiple times
fig, axes = plt.subplots(2, 4, figsize=(16, 8))
fig.suptitle(f'Data Augmentation Examples: {sample_class}', fontsize=14, fontweight='bold')

# Original image
axes[0, 0].imshow(sample_img)
axes[0, 0].set_title('Original', fontweight='bold')
axes[0, 0].axis('off')

# Augmented versions
for i in range(7):
    row = (i + 1) // 4
    col = (i + 1) % 4
    augmented = train_transform(sample_img)
    # Convert tensor back to image
    augmented_img = augmented.permute(1, 2, 0).numpy()
    axes[row, col].imshow(augmented_img)
    axes[row, col].set_title(f'Augmented {i+1}')
    axes[row, col].axis('off')

plt.tight_layout()
plt.show()

print("\nüìù Augmentation techniques applied:")
print("  ‚úì Random cropping and resizing")
print("  ‚úì Random horizontal flipping")
print("  ‚úì Color jittering (brightness, contrast, saturation)")
print("  ‚úì Normalization with ImageNet statistics")
print("\nThese augmentations help the model generalize better!")
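The preview pipeline above leaves out `transforms.Normalize` because a normalized tensor no longer lies in [0, 1] and `imshow` would clip it. If you want to preview exactly what a normalized model input looks like, one option is to undo the normalization before plotting. A minimal NumPy sketch, assuming the standard ImageNet mean/std (the same values the inference cell uses); `normalize_chw` and `denormalize_chw` are hypothetical helper names:

```python
import numpy as np

# Standard ImageNet channel statistics, shaped to broadcast over a CHW image
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
IMAGENET_STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)


def normalize_chw(img_chw):
    """Normalize a CHW float image in [0, 1] per channel: (x - mean) / std."""
    return (img_chw - IMAGENET_MEAN) / IMAGENET_STD


def denormalize_chw(tensor_chw):
    """Undo normalization and return an HWC array in [0, 1] ready for plt.imshow."""
    img = tensor_chw * IMAGENET_STD + IMAGENET_MEAN
    img = np.transpose(img, (1, 2, 0))  # CHW -> HWC, like tensor.permute(1, 2, 0)
    return np.clip(img, 0.0, 1.0)


# Example: a random 3x4x4 "image" survives the normalize/denormalize round trip
rng = np.random.default_rng(0)
original = rng.random((3, 4, 4))
normalized = normalize_chw(original)
restored = denormalize_chw(normalized)
print(normalized.min() < 0, restored.shape)
```

With a PyTorch tensor produced by a pipeline that includes `Normalize`, `denormalize_chw(tensor.numpy())` gives an array that `imshow` can display directly.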

## 3. Train Model

Now let's train the model! This will:
- Use transfer learning with pre-trained models
- Train for 20 epochs (adjustable)
- Track experiments with MLflow
- Save the best model automatically
- Generate evaluation metrics

**Choose your model architecture:**
- **MobileNetV2**: Fast, lightweight (recommended for quick training)
- **EfficientNet-B0**: Better accuracy, moderate speed
- **ResNet50**: Best accuracy, slower training

In [None]:
# Train with MobileNetV2 (fast, good accuracy)
!python train_model.py \
    --model mobilenet_v2 \
    --epochs 20 \
    --batch_size 64 \
    --lr 0.001

### Alternative: Train with EfficientNet (better accuracy, slower)

In [None]:
# Uncomment to train with EfficientNet-B0
# !python train_model.py \
#     --model efficientnet_b0 \
#     --epochs 25 \
#     --batch_size 48 \
#     --lr 0.001

## 4. View Training Results

Let's visualize the training progress and model performance.

In [None]:
# Plot training history
import json
import matplotlib.pyplot as plt

# Check if training metrics exist
if os.path.exists('ml-models/training_history.json'):
    with open('ml-models/training_history.json', 'r') as f:
        history = json.load(f)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
    
    # Plot loss
    epochs = range(1, len(history['train_loss']) + 1)
    ax1.plot(epochs, history['train_loss'], 'b-', label='Training Loss', linewidth=2)
    ax1.plot(epochs, history['val_loss'], 'r-', label='Validation Loss', linewidth=2)
    ax1.set_xlabel('Epoch', fontsize=12)
    ax1.set_ylabel('Loss', fontsize=12)
    ax1.set_title('Model Loss Over Time', fontsize=14, fontweight='bold')
    ax1.legend(fontsize=10)
    ax1.grid(True, alpha=0.3)
    
    # Plot accuracy
    ax2.plot(epochs, history['train_acc'], 'b-', label='Training Accuracy', linewidth=2)
    ax2.plot(epochs, history['val_acc'], 'r-', label='Validation Accuracy', linewidth=2)
    ax2.set_xlabel('Epoch', fontsize=12)
    ax2.set_ylabel('Accuracy (%)', fontsize=12)
    ax2.set_title('Model Accuracy Over Time', fontsize=14, fontweight='bold')
    ax2.legend(fontsize=10)
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Print final metrics
    print("\nüìä Final Training Metrics:")
    print(f"  Best Validation Accuracy: {max(history['val_acc']):.2f}%")
    print(f"  Final Training Loss: {history['train_loss'][-1]:.4f}")
    print(f"  Final Validation Loss: {history['val_loss'][-1]:.4f}")
else:
    print("⚠️  Training history not found. Train the model first!")
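The curves are easier to interpret alongside two numbers: the epoch with the highest validation accuracy, and the gap between training and validation accuracy at that epoch (a large, growing gap is a common sign of overfitting). A minimal sketch, assuming `history` has the `train_acc`/`val_acc` lists (in percent) that the plotting cell reads; `summarize_history` is a hypothetical helper and the toy numbers are illustrative, not real results:

```python
def summarize_history(history):
    """Return (best_epoch, best_val_acc, train_val_gap) from a training history dict.

    Epochs are 1-indexed to match the x-axis of the plots.
    """
    val_acc = history['val_acc']
    # Index of the highest validation accuracy (first one wins on ties)
    best_idx = max(range(len(val_acc)), key=val_acc.__getitem__)
    gap = history['train_acc'][best_idx] - val_acc[best_idx]
    return best_idx + 1, val_acc[best_idx], gap


# Toy history for illustration only (not real training results)
toy_history = {
    'train_acc': [50.0, 65.0, 75.0, 85.0],
    'val_acc': [48.0, 60.0, 66.0, 64.0],
}
best_epoch, best_val, gap = summarize_history(toy_history)
print(f"Best epoch: {best_epoch}, val acc: {best_val:.2f}%, train-val gap: {gap:.2f} pts")
```

After the plotting cell has loaded the JSON, `summarize_history(history)` applies the same summary to the real run.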

In [None]:
# Display confusion matrix
from IPython.display import Image as IPyImage, display  # alias avoids shadowing PIL.Image
import os

if os.path.exists('ml-models/confusion_matrix.png'):
    display(IPyImage('ml-models/confusion_matrix.png'))
else:
    print("Confusion matrix not found (training not run yet, or the script skipped it because 101 classes are too many to visualize)")
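The evaluation cell further down reports the best and worst classes by F1-score. If you only have a raw confusion matrix (rows = true class, columns = predicted class, the convention scikit-learn's `confusion_matrix` uses), per-class precision, recall, and F1 can be derived from it directly. A minimal NumPy sketch under that convention; `per_class_f1` is a hypothetical helper, not necessarily what `train_model.py` does, and the toy counts are illustrative:

```python
import numpy as np


def per_class_f1(cm):
    """Per-class F1 from a confusion matrix with rows = true, columns = predicted."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how often each class truly occurs
    # Guard against division by zero for classes never predicted or never present
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    return np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)


# Toy 3-class example (class names and counts are illustrative only)
toy_classes = ['pizza', 'sushi', 'ramen']
toy_cm = [[8, 1, 1],
          [2, 6, 2],
          [0, 0, 10]]
f1 = per_class_f1(toy_cm)
ranking = sorted(zip(toy_classes, f1), key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score * 100:.2f}% F1")
```

The first and last entries of `ranking` correspond to the "best" and "worst" performing classes.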

In [None]:
# Show model config
import json

with open('ml-models/model_config.json', 'r') as f:
    config = json.load(f)

print("Model Configuration:")
print(json.dumps(config, indent=2))

In [None]:
# Show class mappings (first 10)
with open('ml-models/class_to_idx.json', 'r') as f:
    class_to_idx = json.load(f)

print(f"Total classes: {len(class_to_idx)}")
print("\nFirst 10 classes:")
for name, idx in list(class_to_idx.items())[:10]:
    print(f"{idx}: {name}")
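The inference cell inverts this mapping with `{v: k for k, v in class_to_idx.items()}`. That inversion is only safe if the indices are unique and cover 0..N-1, the positions of the model's output logits. A minimal sanity check, assuming `class_to_idx` maps class names to integer indices as loaded above; `invert_class_mapping` is a hypothetical helper and the toy mapping is illustrative:

```python
def invert_class_mapping(class_to_idx):
    """Return idx -> class name, checking that indices are exactly 0..N-1."""
    indices = sorted(class_to_idx.values())
    if indices != list(range(len(class_to_idx))):
        raise ValueError("class indices must be unique and cover 0..N-1")
    return {idx: name for name, idx in class_to_idx.items()}


# Toy mapping for illustration; with the real file, pass the class_to_idx loaded above
toy_mapping = {'apple_pie': 0, 'baby_back_ribs': 1, 'baklava': 2}
toy_idx_to_class = invert_class_mapping(toy_mapping)
print(toy_idx_to_class[2])
```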

## 5. Model Evaluation and Testing

Let's evaluate the model's performance and test it on sample images.

In [None]:
# Load evaluation results
if os.path.exists('ml-models/evaluation_results.json'):
    with open('ml-models/evaluation_results.json', 'r') as f:
        eval_results = json.load(f)
    
    print("=" * 60)
    print("MODEL EVALUATION RESULTS")
    print("=" * 60)
    print(f"\nüìä Overall Performance:")
    print(f"  Top-1 Accuracy: {eval_results['top1_accuracy']:.2f}%")
    print(f"  Top-5 Accuracy: {eval_results['top5_accuracy']:.2f}%")
    print(f"  Total Test Samples: {eval_results['total_samples']:,}")
    
    print(f"\nüèÜ Best Performing Classes:")
    for i, (cls, score) in enumerate(eval_results['best_classes'][:5], 1):
        print(f"  {i}. {cls}: {score:.2f}% F1-score")
    
    print(f"\n‚ö†Ô∏è  Worst Performing Classes:")
    for i, (cls, score) in enumerate(eval_results['worst_classes'][:5], 1):
        print(f"  {i}. {cls}: {score:.2f}% F1-score")
    
    print("\n" + "=" * 60)
else:
    print("‚ö†Ô∏è  Evaluation results not found")
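Top-1 accuracy counts a prediction as correct only when the highest-scoring class is the true one; top-5 accuracy accepts the true class anywhere among the five highest scores. A minimal NumPy sketch of both, assuming `logits` is an (N, C) array of model outputs and `labels` holds the N true class indices; `topk_accuracy` is a hypothetical helper, and the evaluation script may compute these differently:

```python
import numpy as np


def topk_accuracy(logits, labels, k):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    logits = np.asarray(logits)
    labels = np.asarray(labels)
    # argsort is ascending, so the last k columns are the indices of the k largest scores
    topk = np.argsort(logits, axis=1)[:, -k:]
    hits = (topk == labels[:, None]).any(axis=1)
    return 100.0 * hits.mean()


# Toy scores for 4 samples over 3 classes (illustrative only)
toy_logits = [[2.0, 1.0, 0.1],
              [0.2, 0.3, 3.0],
              [1.5, 2.5, 0.5],
              [0.1, 0.2, 0.3]]
toy_labels = [0, 2, 0, 0]
print(topk_accuracy(toy_logits, toy_labels, k=1), topk_accuracy(toy_logits, toy_labels, k=2))
```

Because softmax is monotonic, ranking raw logits gives the same top-k as ranking softmax probabilities, so the softmax step can be skipped when only accuracy is needed.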

In [None]:
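# Test inference on random samples
# (uses `device` from the first setup cell)
import os
import json
import random

import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torchvision
from torchvision import models, transforms
from PIL import Image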

# Load model
if os.path.exists('ml-models/food_model_v1.pth') and os.path.exists('ml-models/model_config.json'):
    # Load config
    with open('ml-models/model_config.json', 'r') as f:
        config = json.load(f)
    
    with open('ml-models/class_to_idx.json', 'r') as f:
        class_to_idx = json.load(f)
    
    idx_to_class = {v: k for k, v in class_to_idx.items()}
    
    # Build model
    model_name = config['model_name']
    num_classes = config['num_classes']
    
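    if model_name == 'mobilenet_v2':
        # weights=None replaces the deprecated pretrained=False (torchvision >= 0.13)
        model = models.mobilenet_v2(weights=None)
        model.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(model.last_channel, num_classes)
        )
    else:
        # Only the MobileNetV2 head is rebuilt here; add a branch matching the
        # training script's head for other architectures so state-dict keys line up
        raise ValueError(f"Unsupported model_name for this cell: {model_name}")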
    
    # Load weights
    model.load_state_dict(torch.load('ml-models/food_model_v1.pth', map_location=device))
    model = model.to(device)
    model.eval()
    
    # Preprocessing
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    
    # Test on random samples
    test_dataset = torchvision.datasets.Food101(root='./data', split='test', download=False)
    
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    fig.suptitle('Model Predictions on Test Images', fontsize=16, fontweight='bold')
    
    for idx, ax in enumerate(axes.flat):
        # Get random test image
        rand_idx = random.randint(0, len(test_dataset)-1)
        img, true_label = test_dataset[rand_idx]
        true_class = test_dataset.classes[true_label]
        
        # Predict
        input_tensor = preprocess(img).unsqueeze(0).to(device)
        with torch.no_grad():
            outputs = model(input_tensor)
            probabilities = torch.nn.functional.softmax(outputs, dim=1)
            top3_prob, top3_idx = torch.topk(probabilities, 3)
        
        # Get predictions
        pred_class = idx_to_class[top3_idx[0][0].item()]
        pred_conf = top3_prob[0][0].item() * 100
        
        # Display
        ax.imshow(img)
        ax.axis('off')
        
        # Color code: green if correct, red if wrong
        color = 'green' if pred_class == true_class else 'red'
        title = f"True: {true_class}\nPred: {pred_class} ({pred_conf:.1f}%)"
        ax.set_title(title, fontsize=9, color=color, fontweight='bold')
    
    plt.tight_layout()
    plt.show()
    
    print("\n✅ Inference test complete!")
    print("Green = Correct prediction, Red = Incorrect prediction")
else:
    print("⚠️  Model files not found. Train the model first!")
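`Resize(256)` followed by `CenterCrop(224)` first scales the image so its shorter side is 256 pixels (keeping the aspect ratio) and then cuts out the central 224×224 square. The sketch below computes that geometry for a given image size, which can help when reproducing the same preprocessing in the backend without torchvision. It is a plain-Python approximation: torchvision's exact rounding of the crop offset may differ by a pixel in some cases, and `resize_then_center_crop_box` is a hypothetical helper:

```python
def resize_then_center_crop_box(width, height, resize=256, crop=224):
    """Return ((new_w, new_h), (left, top, right, bottom)) for Resize(resize) + CenterCrop(crop).

    The shorter side is scaled to `resize`; the longer side keeps the aspect ratio.
    The crop box is in the coordinates of the resized image.
    """
    if width <= height:
        new_w, new_h = resize, int(resize * height / width)
    else:
        new_w, new_h = int(resize * width / height), resize
    left = (new_w - crop) // 2
    top = (new_h - crop) // 2
    return (new_w, new_h), (left, top, left + crop, top + crop)


# A 512x384 landscape photo (Food-101 images have a longest side of at most 512 px)
print(resize_then_center_crop_box(512, 384))
```

For a 512×384 image this yields a 341×256 resized image and a crop box starting at (58, 16).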

## 6. Download Trained Model

Download all model files to use in your NutriLearn AI application.

In [None]:
# Create a zip file with all model artifacts
import zipfile
from datetime import datetime

# Create zip file
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
zip_filename = f'nutrilearn_model_{timestamp}.zip'

artifacts = ['food_model_v1.pth', 'class_to_idx.json', 'model_config.json',
             'evaluation_results.json', 'confusion_matrix.png']

with zipfile.ZipFile(zip_filename, 'w') as zipf:
    # Add each model artifact that exists, stored at the root of the archive
    for name in artifacts:
        src = os.path.join('ml-models', name)
        if os.path.exists(src):
            zipf.write(src, name)
            print(f"✓ Added: {name}")

print(f"\n✅ Created: {zip_filename}")

# Download the zip file
from google.colab import files
files.download(zip_filename)

print("\nüì¶ Download started!")
print("\n" + "=" * 60)
print("NEXT STEPS")
print("=" * 60)
print("\n1. Extract the downloaded zip file")
print("2. Copy files to your project's ml-models/ directory:")
print("   - food_model_v1.pth")
print("   - class_to_idx.json")
print("   - model_config.json")
print("\n3. The backend predictor will automatically load the model")
print("\n4. Test the API:")
print("   cd backend")
print("   python -m uvicorn app.main:app --reload")
print("\n5. Upload a food image and get predictions!")
print("\n" + "=" * 60)
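The zip cell silently skips any artifact that does not exist, so before copying the archive into the backend it can be worth confirming that the three files listed in the next steps actually made it in. A minimal standard-library sketch; `missing_artifacts` is a hypothetical helper, and the `REQUIRED` list is taken from the printed next steps:

```python
import os
import tempfile
import zipfile

# Files the backend needs, per the "NEXT STEPS" list printed by the cell above
REQUIRED = ['food_model_v1.pth', 'class_to_idx.json', 'model_config.json']


def missing_artifacts(zip_path, required=REQUIRED):
    """Return the required file names that are absent from the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        present = set(zf.namelist())
    return [name for name in required if name not in present]


# Demo on a throwaway archive that lacks model_config.json
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, 'demo.zip')
    with zipfile.ZipFile(demo_zip, 'w') as zf:
        zf.writestr('food_model_v1.pth', b'placeholder')
        zf.writestr('class_to_idx.json', '{}')
    demo_missing = missing_artifacts(demo_zip)
print(demo_missing)
```

On the real archive, `missing_artifacts(zip_filename)` should return an empty list.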

In [None]:
# Optional: Copy to Google Drive for backup
if os.path.exists('/content/drive/MyDrive/NutriLearn_Models'):
    import shutil
    
    drive_path = '/content/drive/MyDrive/NutriLearn_Models'
    
    # Copy all model files
    for filename in ['food_model_v1.pth', 'class_to_idx.json', 'model_config.json', 
                     'evaluation_results.json', 'confusion_matrix.png']:
        src = f'ml-models/{filename}'
        if os.path.exists(src):
            dst = f'{drive_path}/{filename}'
            shutil.copy(src, dst)
            print(f"✓ Copied {filename} to Google Drive")
    
    # Also copy the zip
    shutil.copy(zip_filename, f'{drive_path}/{zip_filename}')
    print("\n✅ All files backed up to Google Drive!")
    print(f"Location: {drive_path}")
else:
    print("Google Drive not mounted. Skipping backup.")

## 7. View MLflow Results (Optional)

In [None]:
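# Start the MLflow UI in the background
# Note: Colab does not expose local ports directly, so this cell tunnels
# the UI through ngrok, which requires a (free) ngrok account auth token

# Install pyngrok
!pip install -q pyngrok

import subprocess
import time
from pyngrok import ngrok

# ngrok.set_auth_token("YOUR_NGROK_AUTHTOKEN")  # paste your token from the ngrok dashboard

# Start the MLflow UI from the directory where training logged its runs
mlflow_process = subprocess.Popen(['mlflow', 'ui', '--port', '5000'])
time.sleep(5)  # give the server a few seconds to start

# Create an ngrok tunnel to port 5000; .public_url is the shareable address
tunnel = ngrok.connect(5000)
print(f"MLflow UI available at: {tunnel.public_url}")
print("\nClick the link above to view your experiments!")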

## 📝 Training Summary and Tips

### 🎯 Expected Performance

| Model | Approx. Top-1 Accuracy | Speed | Parameters (ImageNet version) | Best For |
|-------|------------------------|-------|-------------------------------|----------|
| MobileNetV2 | 75-80% | Fast | 3.5M | Production, Mobile |
| EfficientNet-B0 | 78-83% | Medium | 5.3M | Balanced |
| ResNet50 | 80-85% | Slow | 25.6M | Best Accuracy |
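The parameter counts in the table match the standard 1,000-class ImageNet versions of these models in torchvision. Transfer learning swaps the final linear layer for one sized for Food-101's 101 classes, and a linear layer has `in_features × out_features + out_features` parameters (weights plus biases). A quick sketch of that arithmetic, assuming the new head is a single `nn.Linear` as in the inference cell's MobileNetV2 classifier; 1280 and 2048 are the final feature widths of the torchvision MobileNetV2/EfficientNet-B0 and ResNet50 models:

```python
def linear_params(in_features, out_features):
    """Number of trainable parameters in nn.Linear(in_features, out_features) with bias."""
    return in_features * out_features + out_features


# Final-layer input widths for the torchvision versions of each backbone
feature_dims = {'mobilenet_v2': 1280, 'efficientnet_b0': 1280, 'resnet50': 2048}

for name, dim in feature_dims.items():
    imagenet_head = linear_params(dim, 1000)  # original 1,000-class ImageNet head
    food_head = linear_params(dim, 101)       # replacement head for Food-101
    print(f"{name}: ImageNet head {imagenet_head:,} -> Food-101 head {food_head:,} params")
```

For MobileNetV2 this means the 101-class model has about 1.15M fewer parameters than the 3.5M in the table, while the backbone (which dominates the count for ResNet50) is unchanged.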

### ⚡ Training Tips

**Hardware:**
- Always use GPU (Runtime → Change runtime type → GPU)
- T4 GPU: ~1-2 hours training time
- CPU: ~15+ hours (not recommended)

**Hyperparameters:**
- Increase `batch_size` if you have more GPU memory (32 ‚Üí 64 ‚Üí 128)
- Decrease `batch_size` if you get OOM errors
- Try different learning rates: 0.001 (default), 0.0001 (fine-tuning)
- More epochs can improve accuracy until validation accuracy plateaus; beyond that point, extra epochs mostly lead to overfitting
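Batch size also sets how many optimizer steps make up one epoch: `ceil(num_images / batch_size)`. A small sketch, assuming training iterates over the full 75,750-image Food-101 train split with the last partial batch kept (the real count depends on whether `train_model.py` holds out validation data from the train split and whether it drops the last batch):

```python
import math

FOOD101_TRAIN_IMAGES = 101 * 750  # 750 training images per class


def steps_per_epoch(num_images, batch_size, drop_last=False):
    """Optimizer steps per epoch, like len() of a DataLoader with or without drop_last."""
    if drop_last:
        return num_images // batch_size
    return math.ceil(num_images / batch_size)


for bs in (16, 32, 64, 128):
    print(f"batch_size={bs:>3}: {steps_per_epoch(FOOD101_TRAIN_IMAGES, bs)} steps/epoch")
```

Halving the batch size roughly doubles the number of steps per epoch, which is one reason smaller batches take longer per epoch on the same GPU.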

**Model Selection:**
- Start with MobileNetV2 for quick experiments
- Use EfficientNet-B0 for a good balance of accuracy and speed
- Use ResNet50 if accuracy is critical

### 🐛 Troubleshooting

**Out of Memory (OOM):**
```python
# Reduce batch size
!python train_model.py --batch_size 32  # or 16
```

**Slow Training:**
- Check GPU is enabled
- Match data-loader workers to the available CPU cores, e.g. `--num_workers 2` on a standard Colab runtime
- Use smaller model: MobileNetV2

**Low Accuracy:**
- Train for more epochs: `--epochs 30`
- Try different model: EfficientNet or ResNet
- Check data augmentation is working

### 🚀 Deployment Checklist

- [ ] Model trained and downloaded
- [ ] Files copied to `ml-models/` directory
- [ ] Backend predictor tested locally
- [ ] API endpoints working
- [ ] Frontend integrated
- [ ] Docker container built
- [ ] Deployed to production

### 📚 Resources

**Dataset:**
- [Food-101 Dataset](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/)
- [Food-101 Paper](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/static/bossard_eccv14_food-101.pdf)

**PyTorch:**
- [Transfer Learning Tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html)
- [PyTorch Documentation](https://pytorch.org/docs/stable/index.html)
- [torchvision Models](https://pytorch.org/vision/stable/models.html)

**MLOps:**
- [MLflow Documentation](https://mlflow.org/docs/latest/index.html)
- [MLflow Tracking](https://mlflow.org/docs/latest/tracking.html)

### 🎓 Learning More

**Improve Model Performance:**
1. Try ensemble methods (combine multiple models)
2. Use test-time augmentation
3. Fine-tune more layers
4. Collect more training data
5. Use advanced augmentation (CutMix, MixUp)
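MixUp (item 5) trains on convex combinations of pairs of images and their one-hot labels, with the mixing weight drawn from a Beta(α, α) distribution. A minimal NumPy sketch of the batch-level operation, assuming images arrive as an (N, C, H, W) array and labels as integer class indices; `mixup_batch` and the α = 0.2 default are illustrative choices, not settings from `train_model.py`:

```python
import numpy as np


def mixup_batch(images, labels, num_classes, alpha=0.2, rng=None):
    """Return (mixed_images, soft_labels, lam) for one MixUp step.

    Each sample i is mixed with sample perm[i] of the same batch:
        x_mix = lam * x_i + (1 - lam) * x_perm[i]
        y_mix = lam * y_i + (1 - lam) * y_perm[i]   (one-hot labels)
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(images))
    one_hot = np.eye(num_classes)[labels]
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    soft_labels = lam * one_hot + (1.0 - lam) * one_hot[perm]
    return mixed_images, soft_labels, lam


# Toy batch: 4 "images" of shape 3x8x8 with labels drawn from 101 classes
rng = np.random.default_rng(42)
toy_images = rng.random((4, 3, 8, 8))
toy_labels = np.array([0, 5, 17, 100])
mixed, soft, lam = mixup_batch(toy_images, toy_labels, num_classes=101, rng=rng)
print(mixed.shape, soft.shape, round(float(lam), 3))
```

During training the loss becomes cross-entropy against `soft_labels`, which is equivalent to `lam * CE(y_i) + (1 - lam) * CE(y_perm[i])`.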

**Production Optimization:**
1. Convert to ONNX for faster inference
2. Quantize model for mobile deployment
3. Use TorchScript for production
4. Implement model caching
5. Add A/B testing for model versions

---

**🎉 Congratulations!** You've successfully trained a food classification model!

**Questions or Issues?**
- Check the [GitHub repository](https://github.com/yourusername/nutrilearn-ai)
- Review `MODEL_TRAINING_GUIDE.md`
- Open an issue for bugs or questions

**Happy Training! 🚀**