# ResNet50 Training on ImageNet-1k

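This notebook trains ResNet50 on ImageNet-1k with various optimizations. It prints the environment details, runs a sanity check on limited data, starts full training, and then inspects the logged results.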

In [None]:
import pytorch_lightning as pl
import torch

# Print system info
print(f"PyTorch version: {torch.__version__}")
print(f"PyTorch Lightning version: {pl.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
    print(f"Number of GPUs: {torch.cuda.device_count()}")

## Configuration

In [None]:
from src.config import TrainingConfig

# Initialize config
config = TrainingConfig()

# Display current configuration
for key, value in vars(config).items():
    print(f"{key}: {value}")
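
The fields of `TrainingConfig` are not shown in this notebook. As a rough sketch of the pattern, assuming the config is a plain dataclass, the cell below defines a hypothetical `ExampleConfig` (its field names and default values are invented for illustration) and shows one way to override a few values for a short run and print them aligned.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ExampleConfig:
    """Hypothetical stand-in for TrainingConfig; every field here is assumed."""
    batch_size: int = 256
    learning_rate: float = 0.1
    max_epochs: int = 90


def format_config(cfg) -> str:
    """Return one aligned 'name : value' line per dataclass field."""
    names = [f.name for f in fields(cfg)]
    width = max(len(name) for name in names)
    return "\n".join(f"{name:<{width}} : {getattr(cfg, name)}" for name in names)


base = ExampleConfig()
# replace() returns a modified copy, so the base config stays untouched.
debug = replace(base, batch_size=32, max_epochs=1)
print(format_config(debug))
```

Keeping the config immutable and deriving variants with `replace` makes it harder to change a setting by accident between the sanity check and the full run.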

## Sanity Check

First, let's run a sanity check with limited data to ensure everything works.

In [None]:
import sanity_check

# Run sanity check
print("Starting sanity check...")
sanity_check.main()
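
What `sanity_check.main()` does internally is not shown here. One common way to get "limited data" is to pick a small, class-balanced set of sample indices; the sketch below illustrates that idea with a hypothetical helper, `limited_indices`, and should not be read as the project's actual implementation.

```python
import random
from collections import defaultdict


def limited_indices(labels, per_class, seed=0):
    """Pick up to `per_class` sample indices for each label, reproducibly."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # local RNG, so global random state is untouched
    chosen = []
    for label in sorted(by_label):
        pool = by_label[label]
        chosen.extend(rng.sample(pool, min(per_class, len(pool))))
    return sorted(chosen)


# Toy labels: three samples of class 0, two of class 1, one of class 2.
labels = [0, 0, 0, 1, 1, 2]
subset = limited_indices(labels, per_class=2)
print(subset)
```

The resulting index list could then be passed to `torch.utils.data.Subset` to wrap the full dataset for a quick run.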

## Full Training

If the sanity check passes, we can proceed with full training.

In [None]:
# Uncomment to run full training
# import train
# train.main()

## Monitor Training Progress

You can monitor the training progress using TensorBoard.

In [None]:
# Load TensorBoard extension
%load_ext tensorboard
%tensorboard --logdir lightning_logs/

## Check Latest Results

After training, you can check the latest results and saved checkpoints.

In [None]:
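import glob
import os

import torch

# Find the most recently modified checkpoint
checkpoints = glob.glob('lightning_logs/version_*/checkpoints/*.ckpt')
if checkpoints:
    latest_checkpoint = max(checkpoints, key=os.path.getmtime)
    print(f"Latest checkpoint: {latest_checkpoint}")

    # Load on CPU so this cell also works on a machine without a GPU.
    # weights_only=False because Lightning checkpoints hold more than tensors;
    # only load checkpoints you trust.
    checkpoint = torch.load(latest_checkpoint, map_location="cpu", weights_only=False)

    # The callback state key is not always exactly 'ModelCheckpoint' (it depends
    # on the Lightning version), so match on the name instead of indexing directly.
    for key, state in checkpoint.get('callbacks', {}).items():
        if 'ModelCheckpoint' in str(key) and state.get('best_model_score') is not None:
            # The scale (fraction or percent) depends on how the metric was logged.
            print(f"\nBest validation accuracy: {float(state['best_model_score']):.4f}")
    print(f"Epoch: {checkpoint['epoch']}")
else:
    print("No checkpoints found")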
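
File timestamps can be misleading if the log directory was copied or synced from another machine. As an alternative sketch, assuming runs live in `version_N` directories as the glob pattern above suggests, the hypothetical helper `latest_version_dir` picks the run with the highest version number instead.

```python
import re
from pathlib import Path


def latest_version_dir(log_root):
    """Return the version_N directory with the largest N, or None if there is none."""
    best, best_n = None, -1
    for path in Path(log_root).glob("version_*"):
        match = re.fullmatch(r"version_(\d+)", path.name)
        # Compare numerically so that version_10 ranks above version_2.
        if path.is_dir() and match and int(match.group(1)) > best_n:
            best, best_n = path, int(match.group(1))
    return best


run_dir = latest_version_dir("lightning_logs")
print(run_dir if run_dir is not None else "No runs found")
```

Checkpoints for that run can then be listed with `run_dir.glob("checkpoints/*.ckpt")`.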