# Neural Language Model - LSTM Training in VS Code

## Assignment 2: Language Modeling with Pride and Prejudice

---

### 📋 Overview
This notebook trains three LSTM language models:
1. **Underfit Model** - Small capacity, underfits the data
2. **Overfit Model** - Large capacity, overfits the data
3. **Best Fit Model** - Optimal capacity, generalizes well

### ⚡ GPU Support
- **With CUDA GPU**: ~15-20 minutes
- **With CPU**: ~60-90 minutes (still acceptable for local testing)

### 📁 Project Structure
Make sure you're running this from the `notebooks/` directory with the following structure:
```
Assignment2/
├── dataset/
│   └── Pride_and_Prejudice-Jane_Austen.txt
├── src/
│   ├── config.py
│   ├── dataset.py
│   ├── model.py
│   ├── train.py
│   ├── evaluate.py
│   ├── generate.py
│   └── utils.py
└── notebooks/
    └── training_notebook.ipynb  (this file)
```

---

## 🔧 Step 1: Verify System Setup

In [None]:
import torch
import sys
import os

print("="*70)
print("SYSTEM INFORMATION")
print("="*70)
print(f"Python Version: {sys.version.split()[0]}")
print(f"PyTorch Version: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"GPU Device: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
    print("\n✅ GPU is ready! Training will be FAST (~15-20 minutes)")
else:
    print("\n⚠️ No GPU detected - using CPU")
    print("Training will take ~60-90 minutes (acceptable for local development)")

print(f"\nCurrent Directory: {os.getcwd()}")
print("="*70)

## 📦 Step 2: Navigate to Project Root

This notebook should be in the `notebooks/` directory. We'll navigate to the project root.

In [None]:
import os
from pathlib import Path

# Get the project root directory (parent of notebooks/)
notebook_dir = Path.cwd()
if notebook_dir.name == 'notebooks':
    project_root = notebook_dir.parent
    os.chdir(project_root)
    print(f"✅ Changed to project root: {project_root}")
else:
    project_root = notebook_dir
    print(f"ℹ️ Already in project root: {project_root}")

# Verify project structure
print("\n📁 Project structure:")
required_dirs = ['src', 'dataset', 'models', 'results', 'vocab']
for dir_name in required_dirs:
    exists = os.path.exists(dir_name)
    status = "✅" if exists else "❌"
    print(f"  {status} {dir_name}/")
    if not exists and dir_name in ['models', 'results', 'vocab']:
        os.makedirs(dir_name, exist_ok=True)
        print(f"     → Created {dir_name}/ directory")

# Create subdirectories for results
os.makedirs('results/plots', exist_ok=True)
os.makedirs('results/metrics', exist_ok=True)
os.makedirs('results/logs', exist_ok=True)

print(f"\n✅ Setup complete! Working directory: {os.getcwd()}")
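
The check above assumes the notebook is launched either from `notebooks/` or from the project root. A slightly more defensive approach is to walk up the parent directories until one contains the expected folders. The sketch below shows that idea; the helper name `find_project_root` and the marker folders are illustrative assumptions, not part of the project code.

```python
from pathlib import Path

def find_project_root(start, markers=("src", "dataset")):
    """Return the nearest ancestor of `start` (inclusive) that contains all marker dirs.

    Hypothetical helper for illustration; returns None when nothing matches.
    """
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if all((candidate / name).is_dir() for name in markers):
            return candidate
    return None
```

Calling `find_project_root(Path.cwd())` and passing the result to `os.chdir` would then work from any subdirectory of the project.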

## 🔍 Step 3: Verify Dataset and Source Modules

In [None]:
# Check if dataset exists
dataset_path = 'dataset/Pride_and_Prejudice-Jane_Austen.txt'
if os.path.exists(dataset_path):
    file_size = os.path.getsize(dataset_path) / 1024  # KB
    print(f"✅ Dataset found: {dataset_path} ({file_size:.1f} KB)")
else:
    print(f"❌ Dataset not found: {dataset_path}")
    print("   Please ensure Pride_and_Prejudice-Jane_Austen.txt is in the dataset/ folder")

# Verify all source modules
print("\n📚 Source modules:")
required_modules = ['config.py', 'dataset.py', 'model.py', 'train.py', 
                   'evaluate.py', 'generate.py', 'utils.py']
for module in required_modules:
    module_path = f'src/{module}'
    exists = os.path.exists(module_path)
    status = "✅" if exists else "❌"
    print(f"  {status} src/{module}")

print("\n✅ All checks complete!")

## 📦 Step 4: Verify Required Dependencies

In [None]:
# Import required libraries
try:
    import torch
    import numpy as np
    import matplotlib.pyplot as plt
    from tqdm import tqdm
    import json
    print("✅ All required libraries are installed!")
    print(f"   - PyTorch: {torch.__version__}")
    print(f"   - NumPy: {np.__version__}")
except ImportError as e:
    print(f"❌ Missing library: {e}")
    print("\nPlease install dependencies:")
    # pillow provides the PIL module used in Step 13
    print("   pip install torch numpy matplotlib tqdm pillow")
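
The `try`/`except` check above stops at the first package that fails to import. As a rough alternative, `importlib.util.find_spec` can report every missing package in one pass without importing anything. The helper name `find_missing_modules` is an illustrative assumption.

```python
from importlib.util import find_spec

def find_missing_modules(module_names):
    """Return the top-level module names that the import system cannot locate."""
    return [name for name in module_names if find_spec(name) is None]

# PIL is the import name of the pillow package used for displaying plots
missing = find_missing_modules(["torch", "numpy", "matplotlib", "tqdm", "PIL"])
print("Missing:", missing if missing else "none")
```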

## 📊 Step 5: Load and Explore Dataset

In [None]:
# Add src to Python path
import sys
sys.path.insert(0, 'src')

from dataset import load_and_preprocess_data, build_vocab, create_dataloaders

# Load dataset
print("="*70)
print("LOADING DATASET")
print("="*70)

text = load_and_preprocess_data('dataset/Pride_and_Prejudice-Jane_Austen.txt')

print(f"\n📖 Text Statistics:")
print(f"   Total characters: {len(text):,}")
print(f"   Total words: {len(text.split()):,}")

# Show sample
print(f"\n📝 Sample text (first 200 chars):")
print("-" * 70)
print(text[:200])
print("-" * 70)

# Build vocabulary
print("\n🔤 Building vocabulary...")
vocab = build_vocab(text, min_freq=2)

print(f"\n📚 Vocabulary Statistics:")
print(f"   Vocabulary size: {len(vocab):,}")
# Dict keys follow insertion order, which is not necessarily frequency order
print(f"   First 20 vocabulary entries: {list(vocab.word_to_idx.keys())[:20]}")

# Save vocabulary
import pickle
os.makedirs('vocab', exist_ok=True)
with open('vocab/vocab.pkl', 'wb') as f:
    pickle.dump(vocab, f)
print(f"\n✅ Vocabulary saved to vocab/vocab.pkl")

print("="*70)
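
`build_vocab` lives in `src/dataset.py` and is not shown in this notebook. As a minimal sketch of what a `min_freq` cutoff usually does, the example below keeps only words seen at least `min_freq` times and maps everything else to an unknown token. The whitespace tokenization, the lowercasing, and the `<pad>`/`<unk>` special tokens are all assumptions made for illustration.

```python
from collections import Counter

def build_vocab_sketch(text, min_freq=2, specials=("<pad>", "<unk>")):
    """Map each word seen at least `min_freq` times to an integer index.

    Illustrative only: tokenization and special tokens are assumptions.
    """
    counts = Counter(text.lower().split())
    word_to_idx = {token: idx for idx, token in enumerate(specials)}
    # most_common() orders by descending frequency, so frequent words get low indices
    for word, freq in counts.most_common():
        if freq >= min_freq:
            word_to_idx[word] = len(word_to_idx)
    return word_to_idx

def encode(text, word_to_idx):
    """Convert text to indices, sending rare or unseen words to <unk>."""
    unk = word_to_idx["<unk>"]
    return [word_to_idx.get(word, unk) for word in text.lower().split()]
```

A higher `min_freq` shrinks the vocabulary (and the output layer) at the cost of mapping more words to `<unk>`.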

## 🔧 Step 6: Create Data Loaders

In [None]:
# Create dataloaders for training
train_loader, val_loader, test_loader = create_dataloaders(
    text=text,
    vocab=vocab,
    seq_length=35,
    batch_size=64,
    train_ratio=0.8,
    val_ratio=0.1
)

print("="*70)
print("DATA LOADERS CREATED")
print("="*70)
print(f"Training batches: {len(train_loader)}")
print(f"Validation batches: {len(val_loader)}")
print(f"Test batches: {len(test_loader)}")
print(f"Batch size: 64")
print(f"Sequence length: 35")
print("="*70)
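
The internals of `create_dataloaders` are not shown here. The sketch below illustrates one common way to turn a token stream into next-word training pairs with `seq_length` and then split them 80/10/10. The non-overlapping windows and the contiguous split are assumptions, and the function names are hypothetical.

```python
def make_sequences(token_ids, seq_length):
    """Split token ids into (input, target) pairs; the target is the input shifted by one."""
    pairs = []
    for start in range(0, len(token_ids) - seq_length, seq_length):
        inputs = token_ids[start:start + seq_length]
        targets = token_ids[start + 1:start + seq_length + 1]
        pairs.append((inputs, targets))
    return pairs

def split_sequences(pairs, train_ratio=0.8, val_ratio=0.1):
    """Contiguous train/val/test split; the test set takes whatever remains."""
    n_train = int(len(pairs) * train_ratio)
    n_val = int(len(pairs) * val_ratio)
    return pairs[:n_train], pairs[n_train:n_train + n_val], pairs[n_train + n_val:]
```

With `train_ratio=0.8` and `val_ratio=0.1`, the remaining 10% of sequences form the test set, which matches the three loaders returned above.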

## ⚙️ Step 7: Update Configuration for Three Models

We'll configure three models with different characteristics:
- **Underfit**: Small model, high dropout (0.5), few epochs
- **Overfit**: Large model, low dropout (0.1), many epochs
- **Best Fit**: Mid-sized model, balanced settings
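
To get a feel for how far apart these three capacities are, the parameter counts can be estimated from the layer sizes used in this step's configuration. The sketch assumes the model is an embedding layer, a standard unidirectional stacked LSTM, and a linear output layer, which this notebook does not confirm. The vocabulary size of 7,000 is a placeholder.

```python
def lstm_lm_param_count(vocab_size, embedding_dim, hidden_dim, num_layers):
    """Parameter count for embedding -> stacked LSTM -> linear output (assumed layout)."""
    embedding = vocab_size * embedding_dim
    lstm = 0
    for layer in range(num_layers):
        input_dim = embedding_dim if layer == 0 else hidden_dim
        # Four gates, each with input weights, recurrent weights, and two bias vectors
        lstm += 4 * hidden_dim * (input_dim + hidden_dim + 2)
    output = (hidden_dim + 1) * vocab_size  # weights plus bias
    return embedding + lstm + output

vocab_size = 7000  # placeholder; the real value comes from the built vocabulary
for name, (emb, hid, layers) in {
    "underfit": (64, 128, 1),
    "overfit": (512, 1024, 3),
    "bestfit": (256, 512, 2),
}.items():
    print(f"{name:<9} {lstm_lm_param_count(vocab_size, emb, hid, layers):>12,}")
```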

In [None]:
# Update config.py to include three model configurations
config_content = '''
"""
Configuration settings for LSTM Language Model training
Three models: Underfit, Overfit, and Best Fit
"""

def get_config(model_type='bestfit'):
    """
    Get model configuration based on model type
    
    Args:
        model_type: 'underfit', 'overfit', or 'bestfit'
    
    Returns:
        dict: Configuration parameters
    """
    
    # Base configuration
    base_config = {
        'data_path': 'dataset/Pride_and_Prejudice-Jane_Austen.txt',
        'vocab_path': 'vocab/vocab.pkl',
        'model_save_dir': 'models/',
        'results_dir': 'results/',
        
        # Data parameters
        'seq_length': 35,
        'min_freq': 2,
        'batch_size': 64,
        'train_ratio': 0.8,
        'val_ratio': 0.1,
        'num_workers': 0,  # Use 0 for Windows compatibility
        
        # Generation parameters
        'gen_length': 50,
        'temperature': 1.0,
    }
    
    # Model-specific configurations
    model_configs = {
        'underfit': {
            'embedding_dim': 64,
            'hidden_dim': 128,
            'num_layers': 1,
            'dropout': 0.5,
            'num_epochs': 10,
            'learning_rate': 0.01,  # High learning rate
            'grad_clip': 5.0,
            'patience': 3,
            'save_every': 5,
        },
        'overfit': {
            'embedding_dim': 512,
            'hidden_dim': 1024,
            'num_layers': 3,
            'dropout': 0.1,  # Very low dropout
            'num_epochs': 30,
            'learning_rate': 0.0005,  # Low learning rate
            'grad_clip': 5.0,
            'patience': 15,  # High patience
            'save_every': 5,
        },
        'bestfit': {
            'embedding_dim': 256,
            'hidden_dim': 512,
            'num_layers': 2,
            'dropout': 0.4,
            'num_epochs': 20,
            'learning_rate': 0.001,
            'grad_clip': 5.0,
            'patience': 5,
            'save_every': 5,
        }
    }
    
    if model_type not in model_configs:
        raise ValueError(f"Unknown model type: {model_type}. Choose from {list(model_configs.keys())}")
    
    # Merge base config with model-specific config
    config = {**base_config, **model_configs[model_type]}
    config['model_type'] = model_type
    
    return config
'''

# Write updated config
with open('src/config.py', 'w') as f:
    f.write(config_content)

print("✅ Configuration updated with three model types!")
print("\nModel configurations:")
print("  1. Underfit  - Small model, high dropout")
print("  2. Overfit   - Large model, low dropout")
print("  3. Best Fit  - Optimal balanced model")
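
Each configuration above carries a `patience` value. How `LanguageModelTrainer` uses it is not shown in this notebook, so the following is only a sketch of the usual early-stopping rule: stop once the validation loss has failed to improve for `patience` consecutive epochs. The function name is hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch), both 1-indexed, for per-epoch validation losses.

    Training stops once `patience` consecutive epochs fail to improve on the best loss.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)
```

Under this rule, the overfit configuration's high patience (15) lets training continue long after validation loss stops improving, which is what makes the overfitting visible in the curves.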

## 🚀 Step 8: Train All Three Models

This will train:
1. **Underfit Model** (~3-5 minutes on GPU)
2. **Overfit Model** (~8-12 minutes on GPU)
3. **Best Fit Model** (~5-8 minutes on GPU)

**Total time: ~15-25 minutes with GPU**

## ⚙️ Resume Training - Check Existing Models

This cell automatically detects which models are already trained and skips them.
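
The detection logic in this step tries more than one file name per model, because checkpoints may be saved under either the model type (`underfit`) or a size prefix (`small`). That lookup can be factored into two small helpers, sketched below. Both function names are hypothetical and not part of `src/`.

```python
from pathlib import Path

def checkpoint_candidates(model_type, file_prefix, model_dir="models"):
    """Candidate checkpoint names under both naming conventions used in this notebook."""
    return [
        f"{model_dir}/{file_prefix}_model_best.pt",
        f"{model_dir}/{model_type}_model_best.pt",
    ]

def first_existing_path(candidates):
    """Return the first candidate path that exists on disk, or None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return None
```

The same helpers could replace the repeated lookup loops in the evaluation and text-generation steps.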

In [None]:
import torch
import sys
from pathlib import Path
import os
import json
from datetime import datetime

# Add src to path
sys.path.insert(0, 'src')

from dataset import load_and_preprocess_data, create_dataloaders
from model import LSTMLanguageModel
from train import LanguageModelTrainer
from evaluate import ModelEvaluator, create_comparison_report
from utils import plot_training_curves, plot_model_comparison, plot_perplexity_comparison
from config import get_config

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"\n🔧 Using device: {device}")
if torch.cuda.is_available():
    print(f"⚡ GPU: {torch.cuda.get_device_name(0)}")

print("\n" + "="*70)
print("RESUME MODE - CHECKING EXISTING MODELS")
print("="*70)

# Map model types to actual file names
model_file_mapping = {
    'underfit': 'small',
    'overfit': 'medium',
    'bestfit': 'large'
}

# Check which models are already trained
existing_models = []
missing_models = []

for model_type, file_prefix in model_file_mapping.items():
    # Check for both naming conventions
    checkpoint_paths = [
        f'models/{file_prefix}_model_best.pt',
        f'models/{model_type}_model_best.pt'
    ]
    
    found = False
    for checkpoint_path in checkpoint_paths:
        if os.path.exists(checkpoint_path):
            try:
                checkpoint = torch.load(checkpoint_path, map_location='cpu')
                epoch = checkpoint.get('epoch', 'N/A')
                val_loss = checkpoint.get('val_loss', 0)
                print(f"✅ {model_type:12} - Found at {checkpoint_path}")
                print(f"   Epoch {epoch}, Val Loss: {val_loss:.4f}")
                existing_models.append((model_type, checkpoint_path))
                found = True
                break
            except Exception as e:
                # Any load failure means the checkpoint is unusable; report the reason
                print(f"⚠️  {model_type:12} - Could not load checkpoint at {checkpoint_path}: {e}")
    
    if not found:
        print(f"❌ {model_type:12} - Not found (will train)")
        missing_models.append(model_type)

print("="*70)
print(f"\n📊 Summary:")
print(f"   ✅ Already trained: {len(existing_models)} model(s)")
print(f"   ❌ Need to train: {len(missing_models)} model(s)")
print(f"\n🎯 Training Plan:")
print(f"   SKIP:  {[m[0] for m in existing_models]}")
print(f"   TRAIN: {missing_models}")
print("="*70)

# Load and preprocess data (once for all models)
print("\n" + "="*70)
print("DATA PREPARATION")
print("="*70)

config = get_config('bestfit')
dataset, vocab, vocab_size = load_and_preprocess_data(
    config['data_path'],
    config['vocab_path'],
    config['seq_length'],
    config['min_freq']
)

# Create dataloaders
train_loader, val_loader, test_loader = create_dataloaders(
    dataset,
    config['train_ratio'],
    config['val_ratio'],
    config['batch_size'],
    config['num_workers']
)

print(f"✅ Data loaded: {len(dataset)} sequences")
print(f"   Train batches: {len(train_loader)}")
print(f"   Val batches: {len(val_loader)}")
print(f"   Test batches: {len(test_loader)}")

# Load results from already-trained models
all_results = {}
all_metrics = {}

if existing_models:
    print("\n" + "="*70)
    print(f"⏭️  LOADING {len(existing_models)} EXISTING MODEL(S)")
    print("="*70)
    
    for model_type, checkpoint_path in existing_models:
        checkpoint = torch.load(checkpoint_path, map_location=device)
        
        # Store results for later use
        all_results[model_type] = {
            'train_losses': checkpoint.get('train_losses', []),
            'val_losses': checkpoint.get('val_losses', []),
            'val_perplexities': checkpoint.get('val_perplexities', []),
            'best_epoch': checkpoint.get('epoch', 0),
            'best_val_loss': checkpoint.get('val_loss', 0),
        }
        
        print(f"   ✅ {model_type}: Loaded from {checkpoint_path}")

# Train only missing models
if missing_models:
    print("\n" + "="*70)
    print(f"🚀 TRAINING {len(missing_models)} MODEL(S)")
    print("="*70)
    
    for idx, model_type in enumerate(missing_models, 1):
        print("\n" + "="*70)
        print(f"TRAINING {model_type.upper()} MODEL ({idx}/{len(missing_models)})")
        print("="*70)
        
        # Get model-specific config
        config = get_config(model_type)
        
        # Use correct file prefix for saving
        file_prefix = model_file_mapping[model_type]
        config['model_type'] = file_prefix
        
        # Create model
        model = LSTMLanguageModel(
            vocab_size=vocab_size,
            embedding_dim=config['embedding_dim'],
            hidden_dim=config['hidden_dim'],
            num_layers=config['num_layers'],
            dropout=config['dropout']
        ).to(device)
        
        print(f"\n📊 {model_type.upper()} Model Architecture:")
        print(f"  Embedding dim: {config['embedding_dim']}")
        print(f"  Hidden dim: {config['hidden_dim']}")
        print(f"  Num layers: {config['num_layers']}")
        print(f"  Dropout: {config['dropout']}")
        print(f"  Learning rate: {config['learning_rate']}")
        print(f"  Epochs: {config['num_epochs']}")
        print(f"  Total parameters: {model.count_parameters():,}")
        print(f"  Will save as: models/{file_prefix}_model_best.pt")
        
        # Train model
        start_time = datetime.now()
        trainer = LanguageModelTrainer(
            model, train_loader, val_loader, config, device
        )
        results = trainer.train()
        end_time = datetime.now()
        duration = (end_time - start_time).total_seconds() / 60
        
        all_results[model_type] = results
        
        # Plot training curves
        plot_training_curves(
            results['train_losses'],
            results['val_losses'],
            f"{model_type.capitalize()} Model",
            save_path=f"results/plots/{model_type}_training_curves.png"
        )
        
        print(f"\n✅ {model_type.upper()} model training complete!")
        print(f"   Best Val Loss: {results['best_val_loss']:.4f}")
        print(f"   Best Epoch: {results['best_epoch']}")
        print(f"   Training time: {duration:.1f} minutes")
        print(f"   Saved as: models/{file_prefix}_model_best.pt")
else:
    print("\n" + "="*70)
    print("🎉 ALL MODELS ALREADY TRAINED!")
    print("="*70)
    print("To retrain, delete the checkpoint files in models/ folder")

print("\n" + "="*70)
print("✅ TRAINING PHASE COMPLETE")
print("="*70)
print(f"   Ready: {len(all_results)} model(s)")
print("="*70)

# Store model types for next cells
model_types = ['underfit', 'overfit', 'bestfit']

## 📊 Step 9: Model Comparison and Visualization

In [None]:
print("\n" + "="*70)
print("CREATING MODEL COMPARISON PLOTS")
print("="*70)

# Plot validation loss comparison
plot_model_comparison(
    all_results,
    metric='val_losses',
    save_path='results/plots/model_comparison.png'
)

# Plot perplexity comparison
plot_perplexity_comparison(
    all_results,
    save_path='results/plots/perplexity_comparison.png'
)

print("\n✅ Comparison plots created!")

## 🎯 Step 10: Evaluate All Models on Test Set

In [None]:
print("\n" + "="*70)
print("EVALUATING ALL MODELS ON TEST SET")
print("="*70)

for model_type in model_types:
    print(f"\n{'='*70}")
    print(f"{model_type.upper()} MODEL EVALUATION")
    print(f"{'='*70}")
    
    config = get_config(model_type)
    
    # Find model checkpoint (try both naming conventions)
    file_prefix = model_file_mapping[model_type]
    checkpoint_paths = [
        f'models/{file_prefix}_model_best.pt',
        f'models/{model_type}_model_best.pt',
        f'models/{model_type}_best.pt'
    ]
    
    checkpoint_path = None
    for path in checkpoint_paths:
        if os.path.exists(path):
            checkpoint_path = path
            break
    
    if not checkpoint_path:
        print(f"⚠️  Model not found! Skipping evaluation.")
        continue
    
    print(f"📂 Loading from: {checkpoint_path}")
    checkpoint = torch.load(checkpoint_path, map_location=device)
    
    # Create model
    model = LSTMLanguageModel(
        vocab_size=vocab_size,
        embedding_dim=config['embedding_dim'],
        hidden_dim=config['hidden_dim'],
        num_layers=config['num_layers'],
        dropout=config['dropout']
    ).to(device)
    
    model.load_state_dict(checkpoint['model_state_dict'])
    
    # Evaluate
    evaluator = ModelEvaluator(model, device)
    test_metrics = evaluator.evaluate_on_dataset(test_loader, "Test")
    
    # Save metrics
    train_losses = all_results[model_type].get('train_losses', [])
    final_train_loss = train_losses[-1] if train_losses else checkpoint.get('val_loss', 0)
    
    all_metrics[model_type] = {
        'train': {
            'final_loss': final_train_loss
        },
        'val': {
            'loss': checkpoint['val_loss'],
            'perplexity': checkpoint['val_perplexity'],
        },
        'test': test_metrics,
        'best_epoch': all_results[model_type]['best_epoch'],
        'total_epochs': len(train_losses) if train_losses else checkpoint.get('epoch', 0),
        'config': {
            'embedding_dim': config['embedding_dim'],
            'hidden_dim': config['hidden_dim'],
            'num_layers': config['num_layers'],
            'dropout': config['dropout'],
            'learning_rate': config['learning_rate'],
        }
    }
    
    # Save individual metrics
    metrics_path = f"results/metrics/{model_type}_metrics.json"
    with open(metrics_path, 'w') as f:
        json.dump(all_metrics[model_type], f, indent=4)
    
    print(f"\n✅ Metrics saved to: {metrics_path}")

print("\n" + "="*70)
print("EVALUATION COMPLETE")
print("="*70)
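
The test metrics report both a loss and a perplexity. Assuming the loss is the mean per-token cross-entropy in nats (what PyTorch's `CrossEntropyLoss` returns by default), perplexity is simply its exponential. How `ModelEvaluator` averages over batches is not shown, so the token-weighted mean below is one reasonable choice rather than a description of its implementation.

```python
import math

def perplexity(mean_cross_entropy):
    """Perplexity from the mean per-token cross-entropy loss, measured in nats."""
    return math.exp(mean_cross_entropy)

def mean_loss(batch_losses, batch_token_counts):
    """Token-weighted mean so that a smaller final batch is not over-counted."""
    total_tokens = sum(batch_token_counts)
    weighted = sum(loss * n for loss, n in zip(batch_losses, batch_token_counts))
    return weighted / total_tokens
```

A perplexity of 100 can be read as the model being, on average, as uncertain as if it were choosing uniformly among 100 words at each step.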

## 📈 Step 11: Create Final Comparison Report

In [None]:
# Create comprehensive comparison report
create_comparison_report(
    all_metrics,
    'results/metrics/final_comparison.json'
)

# Display summary table
print("\n" + "="*70)
print("FINAL MODEL COMPARISON SUMMARY")
print("="*70)
print(f"\n{'Model':<12} {'Params':<12} {'Test Loss':<12} {'Test PPL':<12} {'Val Loss':<12}")
print("-"*70)

for model_type in model_types:
    if model_type not in all_metrics:
        continue  # model was skipped during evaluation
    metrics = all_metrics[model_type]
    
    # Estimate the parameter count from the config, assuming a standard
    # unidirectional nn.LSTM followed by a linear output layer
    cfg = metrics['config']
    emb, hid = cfg['embedding_dim'], cfg['hidden_dim']
    lstm_params = sum(
        4 * hid * ((emb if layer == 0 else hid) + hid + 2)  # weights + two bias vectors
        for layer in range(cfg['num_layers'])
    )
    approx_params = (
        vocab_size * emb +      # Embedding
        lstm_params +           # LSTM (layers after the first take hidden_dim inputs)
        (hid + 1) * vocab_size  # Output layer (weights + bias)
    )
    
    print(f"{model_type:<12} {approx_params:>10,}  "
          f"{metrics['test']['loss']:>10.4f}  "
          f"{metrics['test']['perplexity']:>10.2f}  "
          f"{metrics['val']['loss']:>10.4f}")

print("\n" + "="*70)

## 📝 Step 12: Generate Text Samples

Generate text using all three models for comparison

In [None]:
from generate import generate_text

print("\n" + "="*70)
print("TEXT GENERATION - ALL MODELS")
print("="*70)

# Prompts to test
start_texts = [
    "it is a truth",
    "elizabeth was",
    "mr darcy"
]

all_generations = {}

for model_type in model_types:
    print(f"\n{'='*70}")
    print(f"{model_type.upper()} MODEL GENERATION")
    print(f"{'='*70}")
    
    config = get_config(model_type)
    
    # Find model checkpoint (try both naming conventions)
    file_prefix = model_file_mapping[model_type]
    checkpoint_paths = [
        f'models/{file_prefix}_model_best.pt',
        f'models/{model_type}_model_best.pt',
        f'models/{model_type}_best.pt'
    ]
    
    checkpoint_path = None
    for path in checkpoint_paths:
        if os.path.exists(path):
            checkpoint_path = path
            break
    
    if not checkpoint_path:
        print(f"⚠️  Model not found! Skipping text generation.")
        continue
    
    print(f"📂 Loading from: {checkpoint_path}")
    checkpoint = torch.load(checkpoint_path, map_location=device)
    
    model = LSTMLanguageModel(
        vocab_size=vocab_size,
        embedding_dim=config['embedding_dim'],
        hidden_dim=config['hidden_dim'],
        num_layers=config['num_layers'],
        dropout=config['dropout']
    ).to(device)
    
    model.load_state_dict(checkpoint['model_state_dict'])
    
    all_generations[model_type] = {}
    
    # Generate for each prompt
    for start_text in start_texts:
        generated = generate_text(
            model, vocab, start_text,
            max_length=40,
            temperature=0.8,
            device=device
        )
        all_generations[model_type][start_text] = generated
        
        print(f"\n📝 '{start_text}':")
        print(f"   {generated}")

# Save generations
with open('results/generated_samples.json', 'w') as f:
    json.dump(all_generations, f, indent=4)

print("\n" + "="*70)
print("✅ Text generation complete!")
print("="*70)
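
The generation call above passes `temperature=0.8`. The implementation of `generate_text` is not shown, so the following is only a sketch of how temperature sampling typically works: the logits are divided by the temperature before the softmax, so values below 1 concentrate probability on the most likely words. Both function names are hypothetical.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperatures below 1 sharpen the distribution."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]

def sample_next_token(logits, temperature=1.0, rng=random):
    """Draw one token index according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Lower temperatures tend to produce safer, more repetitive text, while higher ones produce more varied but less coherent output.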

## 📊 Step 13: Display Results

View all generated plots and metrics

In [None]:
# Display all plots
import matplotlib.pyplot as plt
from PIL import Image

print("\n" + "="*70)
print("TRAINING RESULTS VISUALIZATION")
print("="*70)

plot_files = [
    ('results/plots/underfit_training_curves.png', 'Underfit Model Training Curves'),
    ('results/plots/overfit_training_curves.png', 'Overfit Model Training Curves'),
    ('results/plots/bestfit_training_curves.png', 'Best Fit Model Training Curves'),
    ('results/plots/model_comparison.png', 'Model Comparison - Validation Loss'),
    ('results/plots/perplexity_comparison.png', 'Model Comparison - Perplexity'),
]

for plot_file, title in plot_files:
    if os.path.exists(plot_file):
        print(f"\n{'='*70}")
        print(f"{title}")
        print(f"{'='*70}")
        img = Image.open(plot_file)
        plt.figure(figsize=(12, 6))
        plt.imshow(img)
        plt.axis('off')
        plt.title(title)
        plt.tight_layout()
        plt.show()
    else:
        print(f"\n⚠️  {plot_file} not found")

## 📁 Step 14: Verify All Output Files

Check that all expected files were created successfully

In [None]:
print("\n" + "="*70)
print("FILE VERIFICATION")
print("="*70)

# Check all expected files
expected_files = {
    'Models': [
        # Training saves checkpoints under the prefixes in model_file_mapping
        f'models/{prefix}_model_best.pt' for prefix in model_file_mapping.values()
    ],
    'Plots': [
        'results/plots/underfit_training_curves.png',
        'results/plots/overfit_training_curves.png',
        'results/plots/bestfit_training_curves.png',
        'results/plots/model_comparison.png',
        'results/plots/perplexity_comparison.png'
    ],
    'Metrics': [
        'results/metrics/underfit_metrics.json',
        'results/metrics/overfit_metrics.json',
        'results/metrics/bestfit_metrics.json',
        'results/metrics/final_comparison.json'
    ],
    'Data': [
        'vocab/vocab.pkl',
        'results/generated_samples.json'
    ]
}

all_good = True
for category, files in expected_files.items():
    print(f"\n📂 {category}:")
    for file_path in files:
        exists = os.path.exists(file_path)
        if exists:
            size = os.path.getsize(file_path) / 1024  # KB
            print(f"  ✅ {file_path} ({size:.1f} KB)")
        else:
            print(f"  ❌ {file_path} - NOT FOUND")
            all_good = False

print("\n" + "="*70)
if all_good:
    print("✅ All files created successfully!")
else:
    print("⚠️  Some files are missing. Check the training logs above.")
print("="*70)

## 📋 Step 15: Training Summary and Analysis

In [None]:
print("\n" + "="*70)
print("TRAINING SUMMARY - ASSIGNMENT 2")
print("="*70)

print("\n📊 Models Trained: 3")
print("  1. Underfit Model  - Intentionally limited capacity")
print("  2. Overfit Model   - Excessive capacity, prone to overfitting")
print("  3. Best Fit Model  - Optimal balance")

print("\n📈 Results Summary:")
print("-"*70)

for model_type in model_types:
    if model_type not in all_metrics:
        continue  # model was skipped during evaluation
    metrics = all_metrics[model_type]
    print(f"\n{model_type.upper()} Model:")
    print(f"  Train Loss: {metrics['train']['final_loss']:.4f}")
    print(f"  Val Loss:   {metrics['val']['loss']:.4f}")
    print(f"  Test Loss:  {metrics['test']['loss']:.4f}")
    print(f"  Test PPL:   {metrics['test']['perplexity']:.2f}")
    print(f"  Best Epoch: {metrics['best_epoch']}/{metrics['total_epochs']}")
    
    # Flag overfitting or underfitting with simple loss heuristics
    train_val_gap = metrics['val']['loss'] - metrics['train']['final_loss']
    if train_val_gap > 0.5:
        print(f"  ⚠️  Overfitting detected (Train-Val gap: {train_val_gap:.4f})")
    elif metrics['val']['loss'] > 5.0:
        print(f"  ⚠️  Underfitting detected (High validation loss)")
    else:
        print(f"  ✅ Good fit (Train-Val gap: {train_val_gap:.4f})")

print("\n" + "="*70)
print("📁 All files saved in project directory")
print("="*70)
print("\n✅ Training complete!")
print("✅ Results available in respective folders!")
print("✅ Ready for report generation!")

# Report which device was used
if torch.cuda.is_available():
    print("\n⚡ GPU training completed successfully!")
else:
    print("\n💻 CPU training completed successfully!")

print("\n📂 Output Locations:")
print(f"  - Models: {os.path.abspath('models/')}")
print(f"  - Plots: {os.path.abspath('results/plots/')}")
print(f"  - Metrics: {os.path.abspath('results/metrics/')}")

print("\n" + "="*70)

## 🎯 Next Steps for Your Assignment

### ✅ What You Now Have:

**Trained Models:**
- `models/small_model_best.pt` - Underfit model checkpoint
- `models/medium_model_best.pt` - Overfit model checkpoint
- `models/large_model_best.pt` - Best fit model checkpoint

(The file prefixes follow `model_file_mapping` from Step 8.)

**Visualizations:**
- `results/plots/underfit_training_curves.png` - Training/validation curves
- `results/plots/overfit_training_curves.png` - Training/validation curves
- `results/plots/bestfit_training_curves.png` - Training/validation curves
- `results/plots/model_comparison.png` - Side-by-side comparison
- `results/plots/perplexity_comparison.png` - Perplexity comparison

**Metrics:**
- `results/metrics/underfit_metrics.json` - Detailed metrics
- `results/metrics/overfit_metrics.json` - Detailed metrics
- `results/metrics/bestfit_metrics.json` - Detailed metrics
- `results/metrics/final_comparison.json` - Comparison summary

**Generated Text:**
- `results/generated_samples.json` - Text samples from all models

---

### For Your Report:

**1. Analyze Underfitting (Underfit Model):**
   - High training AND validation loss
   - Model too simple to capture patterns
   - Poor text generation quality

**2. Analyze Overfitting (Overfit Model):**
   - Low training loss, high validation loss
   - Large train-val gap (> 0.5)
   - Memorizes training data but poor generalization

**3. Analyze Good Fit (Best Fit Model):**
   - Balanced train/val loss
   - Small train-val gap
   - Best test performance
   - Good quality generated text

**4. Include in Report:**
   - All 5 plots from `results/plots/`
   - Metrics comparison from `results/metrics/final_comparison.json`
   - Generated text samples showing quality differences
   - Analysis of why each model behaves differently
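
The diagnostic rules used in Step 15 (a train-val gap above 0.5 signals overfitting, a validation loss above 5.0 signals underfitting) can be written as one small function for reuse in the report. The thresholds are the notebook's own heuristics and would likely need adjusting for other datasets. The function name is hypothetical.

```python
def diagnose_fit(train_loss, val_loss, gap_threshold=0.5, high_loss_threshold=5.0):
    """Label a model using the same rules as the Step 15 summary."""
    gap = val_loss - train_loss
    if gap > gap_threshold:
        return "overfit"
    if val_loss > high_loss_threshold:
        return "underfit"
    return "good fit"
```

The gap is checked first, so a model with both a large gap and a high validation loss is labeled as overfit, matching the order of the checks in Step 15.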

---

### 🚀 Optional: Further Experiments

Run additional cells below to experiment with text generation or analyze specific aspects of your models.

**All training is complete!** You can now write your report using the generated files.