# 🏋️ Model Training Pipeline
## Financial Sentiment Analysis - Production Model Training

[![Training](https://img.shields.io/badge/Stage-Model%20Training-orange?logo=pytorch&logoColor=white)]()
[![Multi-Model](https://img.shields.io/badge/Support-Multi%20Model-purple)]()
[![Hardware Optimised](https://img.shields.io/badge/Hardware-CPU%2FGPU%2FMPS-green)]()

---

### 📋 Overview

This notebook trains one or more transformer models for financial sentiment analysis. It detects the available hardware (CUDA GPU, Apple Silicon MPS, or CPU), adjusts batch size, gradient accumulation, and mixed precision to match, and reports validation loss and accuracy for each model.

### 🎯 Key Features

- **🤖 Multi-Model Training**: Support for BERT, DistilBERT, TinyBERT, FinBERT, and SmolLM variants
- **⚡ Hardware Optimisation**: Automatic detection and optimisation for CPU, GPU (CUDA), and Apple Silicon (MPS)
- **📊 Dynamic Batch Sizing**: Batch size chosen from the detected platform and, on CUDA, the GPU's total memory
- **🎛️ Configuration Driven**: All hyperparameters controlled via `pipeline_config.json`
- **📈 Progress Monitoring**: Real-time training metrics and logging
- **💾 Checkpoint Management**: Automatic model saving and recovery

### 🏗️ Supported Model Architectures

| Model Family | Size | Memory Requirements | Training Time | Best Use Case |
|--------------|------|-------------------|---------------|---------------|
| **TinyBERT** | 14.5M | ~1GB | Fast | Resource-constrained deployment |
| **DistilBERT** | 66M | ~2GB | Moderate | Balanced performance/efficiency |
| **FinBERT** | 110M | ~3GB | Moderate | Finance-specific tasks |
| **SmolLM2** | 135M | ~1.5GB | Fast | On-device applications |
| **SmolLM3** | 3B | ~6-8GB | Slow | High-performance scenarios |

### ⚙️ Training Optimisation Features

- **📦 Dynamic Batch Sizing**: Automatic adjustment based on GPU memory
- **🔄 Gradient Accumulation**: Simulate larger batches on limited hardware
- **⚡ Mixed Precision**: FP16 training on CUDA GPUs; disabled on MPS and not applicable on CPU
- **🧹 Memory Management**: Optimised settings for different platforms
- **📊 Early Stopping**: Prevent overfitting with configurable patience
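
As a rough illustration of the gradient accumulation item above, the sketch below reproduces the arithmetic the optimisation cell later in this notebook applies (`max(1, 8 // batch_size)`). The helper names and the `target` parameter are assumptions made for this example, not part of the pipeline's API.

```python
def accumulation_steps(batch_size: int, target: int = 8) -> int:
    """Number of micro-batches to accumulate before one optimiser step.

    Mirrors the notebook's rule max(1, 8 // batch_size); `target` is a
    hypothetical parameter standing in for the hard-coded 8.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return max(1, target // batch_size)


def effective_batch_size(batch_size: int, target: int = 8) -> int:
    """Samples that contribute to each optimiser step."""
    return batch_size * accumulation_steps(batch_size, target)


# Print per-device batch size, accumulation steps, and effective batch size
for bs in (1, 2, 4, 8):
    print(bs, accumulation_steps(bs), effective_batch_size(bs))
```

Because the rule uses integer division, batch sizes that do not divide the target (for example 3) end up with an effective batch slightly below it.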

### 🖥️ Hardware Support Matrix

| Platform | Optimisation | Batch Size | Mixed Precision | Notes |
|----------|--------------|------------|-----------------|-------|
| **NVIDIA GPU (≥16GB)** | Full | 8 | ✅ FP16 | Optimal performance |
| **NVIDIA GPU (8-16GB)** | Memory-aware | 4 | ✅ FP16 | Gradient accumulation ×2 |
| **NVIDIA GPU (<8GB)** | Memory-aware | 2 | ✅ FP16 | Gradient accumulation ×4 |
| **Apple Silicon (MPS)** | MPS-optimised | 4 | ❌ Disabled | Native acceleration; gradient accumulation ×2 |
| **CPU** | Multi-core | 4 | ❌ N/A | Fallback mode; gradient accumulation ×2 |
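
The selection logic behind this matrix can be sketched as a small pure function. The thresholds follow the optimisation cell later in this notebook (below 8 GB, below 16 GB, otherwise); `HardwarePlan` and `plan_for` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HardwarePlan:
    batch_size: int
    fp16: bool


def plan_for(platform: str, gpu_memory_gb: Optional[float] = None) -> HardwarePlan:
    """Pick batch size and mixed precision for 'cuda', 'mps', or 'cpu'."""
    if platform == "cuda":
        if gpu_memory_gb is None:
            raise ValueError("gpu_memory_gb is required for CUDA devices")
        if gpu_memory_gb < 8:
            batch_size = 2
        elif gpu_memory_gb < 16:
            batch_size = 4
        else:
            batch_size = 8
        # FP16 is only switched on for CUDA devices
        return HardwarePlan(batch_size=batch_size, fp16=True)
    if platform in ("mps", "cpu"):
        # Both fall back to a batch size of 4 without mixed precision
        return HardwarePlan(batch_size=4, fp16=False)
    raise ValueError(f"unknown platform: {platform!r}")


print(plan_for("cuda", 6.0))
print(plan_for("cuda", 24.0))
print(plan_for("mps"))
```

Keeping the decision in a function with no `torch` dependency makes the thresholds easy to unit-test without the target hardware.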

### 📁 Output Structure

```
models/
└── {model_name}/
    ├── config.json
    ├── model.safetensors
    ├── tokenizer_config.json
    ├── tokenizer.json
    ├── label_encoder.pkl
    └── logs/
results/
├── training_report.json
└── training_results_comparison.png
```
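
For orientation, `label_encoder.pkl` holds a dictionary with `label2id` and `id2label` entries, written by the training cell. The sketch below shows a round trip through that structure; it writes to a temporary directory rather than a real model folder, and `predicted_id` is a made-up value standing in for a model's argmax.

```python
import pickle
import tempfile
from pathlib import Path

labels = ["negative", "neutral", "positive"]
label2id = {label: idx for idx, label in enumerate(labels)}
id2label = {idx: label for label, idx in label2id.items()}

with tempfile.TemporaryDirectory() as tmp:
    # Temporary stand-in for models/{model_name}/
    path = Path(tmp) / "label_encoder.pkl"
    with open(path, "wb") as f:
        pickle.dump({"label2id": label2id, "id2label": id2label}, f)

    # Reload the mapping the way a downstream step might
    with open(path, "rb") as f:
        mapping = pickle.load(f)

predicted_id = 2  # hypothetical class index
print(mapping["id2label"][predicted_id])
```

Pickle files can execute code when loaded, so only unpickle mappings produced by your own pipeline runs.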

---

**Prerequisites**: Complete data processing via `1_data_processing_generalised.ipynb`

In [None]:
# Import configuration system and training utilities
import sys
import os
sys.path.append('../')

from src.pipeline_utils import ConfigManager, StateManager, LoggingManager
import torch
import torch.nn as nn
from transformers import (
    AutoTokenizer, AutoModelForSequenceClassification,
    TrainingArguments, Trainer, DataCollatorWithPadding
)
from datasets import Dataset
import pandas as pd
import numpy as np
from pathlib import Path
import json
from datetime import datetime
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

# Initialise managers
config = ConfigManager('../config/pipeline_config.json')
state = StateManager('../config/pipeline_state.json')
logger_manager = LoggingManager(config, 'training')
logger = logger_manager.get_logger()

logger.info("🏋️ Starting Model Training - Generalised Pipeline")
print("📋 Configuration loaded from ../config/pipeline_config.json")

2025-08-18 14:33:45,038 - pipeline.training - INFO - 🏋️ Starting Model Training - Generalized Pipeline


📋 Configuration loaded from ../config/pipeline_config.json


In [None]:
# Verify prerequisites and load processed data
logger.info("🔍 Checking training prerequisites...")

# Verify data processing was completed
if not state.is_step_complete('data_processing_completed'):
    logger.error("Data processing step not completed. Please run 1_data_processing_generalised.ipynb first.")
    raise RuntimeError("Data processing required. Run 1_data_processing_generalised.ipynb first.")

print("✅ Data processing verification passed")

# Load training configuration
training_config = config.get('training', {})
models_config = config.get('models', {})
data_config = config.get('data', {})

print(f"🏋️ Training Configuration:")
print(f"   📊 Batch size: {training_config.get('batch_size', 16)}")
print(f"   🔄 Epochs: {training_config.get('num_epochs', 3)}")
print(f"   📈 Learning rate: {training_config.get('learning_rate', 2e-5)}")

# Get base models from configuration
base_models = models_config.get('base_models', [])
enabled_models = [m for m in base_models if m.get('enabled', True)]
print(f"   🤖 Models to train: {len(enabled_models)}")

# Show all configured models
print(f"\n📋 Configured Models:")
for model in enabled_models:
    print(f"   ✅ {model['name']} -> {model['model_id']}")
    print(f"      🏷️ Labels: {model['num_labels']}")
    print(f"      ✅ Enabled: {model.get('enabled', True)}")

# Load processed datasets
print(f"\n📂 Loading Processed Datasets:")
processed_data_dir = data_config.get('processed_data_dir', 'data/processed')
train_path = Path(f"../{processed_data_dir}/train.csv")
val_path = Path(f"../{processed_data_dir}/validation.csv")

if train_path.exists() and val_path.exists():
    train_df = pd.read_csv(train_path)
    val_df = pd.read_csv(val_path)
    
    print(f"   ✅ Training data loaded:")
    print(f"      📊 Train: {len(train_df)} samples")
    print(f"      📊 Validation: {len(val_df)} samples")
    print(f"      🏷️ Labels: {sorted(train_df['label'].unique())}")
    
    logger.info(f"Successfully loaded training data")
else:
    logger.error(f"Processed data not found at {train_path} or {val_path}")
    raise FileNotFoundError("Processed data not found. Please run 1_data_processing_generalised.ipynb first.")

2025-08-18 14:33:45,117 - pipeline.training - INFO - 🔍 Checking training prerequisites...
2025-08-18 14:33:45,153 - pipeline.training - INFO - Successfully loaded training data


✅ Data processing verification passed
🏋️ Training Configuration:
   📊 Batch size: 16
   🔄 Epochs: 3
   📈 Learning rate: 2e-05
   🤖 Models to train: 1

📋 Configured Models:
   ✅ smollm3-financial -> HuggingFaceTB/SmolLM3-3B 🔥 NEW!

🚀 SmolLM3 Models Detected:
   🔥 smollm3-financial (HuggingFaceTB/SmolLM3-3B)
      🏷️ Labels: 3
      ✅ Enabled: True

📂 Loading Processed Datasets:
   ✅ Training data loaded:
      📊 Train: 4361 samples
      📊 Validation: 485 samples
      🏷️ Labels: ['negative', 'neutral', 'positive']
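
The cell above reads a handful of keys from `pipeline_config.json`. The real file is not shown in this notebook, so the dictionary below is only an assumed shape inferred from those keys; the second model's `model_id` is a hypothetical placeholder.

```python
import json

# Assumed shape of pipeline_config.json, inferred from the keys the notebook reads.
# Entries marked "hypothetical" are placeholders, not values from the real file.
example_config = {
    "training": {"batch_size": 16, "num_epochs": 3, "learning_rate": 2e-5},
    "data": {"processed_data_dir": "data/processed", "max_sequence_length": 128},
    "models": {
        "base_models": [
            {
                "name": "smollm3-financial",
                "model_id": "HuggingFaceTB/SmolLM3-3B",
                "num_labels": 3,
                "enabled": True,
            },
            {
                "name": "distilbert-base",
                "model_id": "example-org/placeholder-model",  # hypothetical
                "num_labels": 3,
                "enabled": False,
            },
        ]
    },
}

base_models = example_config["models"]["base_models"]
# A missing "enabled" key counts as enabled, matching m.get('enabled', True)
enabled_models = [m for m in base_models if m.get("enabled", True)]

print(json.dumps([m["name"] for m in enabled_models]))
```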


In [None]:
# Dynamic training optimisations based on hardware and model requirements
logger.info("⚡ Applying dynamic training optimisations...")

# Detect device capabilities and optimise settings
device_info = torch.cuda.get_device_properties(0) if torch.cuda.is_available() else None
is_mps = torch.backends.mps.is_available() if hasattr(torch.backends, 'mps') else False

print("⚡ Training Optimisations:")

# Hardware-based optimisation
if device_info:
    gpu_memory_gb = device_info.total_memory / 1e9
    print(f"   💾 GPU Memory: {gpu_memory_gb:.1f} GB")
    
    # Adjust batch size based on GPU memory
    if gpu_memory_gb < 8:
        training_config['batch_size'] = 2
        print(f"   📊 Batch size optimised to 2 (limited GPU memory)")
    elif gpu_memory_gb < 16:
        training_config['batch_size'] = 4
        print(f"   📊 Batch size optimised to 4 (moderate GPU memory)")
    else:
        training_config['batch_size'] = 8
        print(f"   📊 Batch size optimised to 8 (high GPU memory)")
elif is_mps:
    # Apple Silicon MPS optimisation
    training_config['batch_size'] = 4
    print(f"   🍎 Apple Silicon MPS detected: batch size set to 4")
    print(f"   ⚡ Optimised for efficient local training")
else:
    # CPU training - moderate batch size
    training_config['batch_size'] = 4
    print(f"   💻 CPU training: batch size set to 4")

# Apply model-specific optimisations from configuration
model_specific_configs = training_config.get('model_specific_configs', {})
print(f"\n🤖 Model-Specific Optimisations:")

for model in enabled_models:
    model_name = model['name']
    model_id = model['model_id']
    
    # Categorise model by type or size hints in the model ID
    model_category = None
    
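    # Check if model is explicitly listed in config categories.
    # The loop variable is named category_cfg so it does not shadow the ConfigManager instance `config`.
    for category, category_cfg in model_specific_configs.items():
        if model_name in category_cfg.get('models', []):
            model_category = category
            break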
    
    # If not found, categorise by model ID patterns
    if not model_category:
        model_id_lower = model_id.lower()
        if any(pattern in model_id_lower for pattern in ['tiny', '135m', 'small']):
            model_category = 'small_models'
        elif any(pattern in model_id_lower for pattern in ['3b', '1b', 'large']):
            model_category = 'large_models'
        else:
            model_category = 'medium_models'
    
    # Apply category-specific optimisations
    if model_category in model_specific_configs:
        category_config = model_specific_configs[model_category]
        print(f"   🔧 {model_name} -> {model_category}")
        print(f"      📊 Recommended batch size: {category_config.get('batch_size', 'default')}")
        print(f"      📈 Learning rate: {category_config.get('learning_rate', 'default')}")
        print(f"      🔄 Epochs: {category_config.get('num_epochs', 'default')}")
        
        if 'memory_requirements' in category_config:
            print(f"      💾 Memory needs: {category_config['memory_requirements']}")
        
        # Store model-specific config for later use
        training_config[f'{model_name}_config'] = category_config

# Enable gradient accumulation for effective larger batch size
if training_config['batch_size'] < 8:
    training_config['gradient_accumulation_steps'] = max(1, 8 // training_config['batch_size'])
    print(f"\n🔄 Gradient accumulation steps: {training_config['gradient_accumulation_steps']}")

# Sequence length from configuration with fallback
training_config['max_length'] = data_config.get('max_sequence_length', 128)
print(f"📏 Max sequence length: {training_config['max_length']}")

# Enable mixed precision if available (but not on MPS - can cause issues)
if torch.cuda.is_available() and hasattr(torch.cuda, 'amp'):
    training_config['fp16'] = True
    print(f"⚡ Mixed precision (FP16) enabled")
elif is_mps:
    training_config['fp16'] = False
    print(f"🍎 Mixed precision disabled for MPS compatibility")

# Memory management settings for MPS
if is_mps:
    training_config['dataloader_pin_memory'] = False
    training_config['dataloader_num_workers'] = 0
    training_config['save_total_limit'] = 2
    print(f"🧹 Memory management optimisations for MPS enabled")

# Show final training configuration
print(f"\n📋 Final Training Configuration:")
print(f"   📊 Base batch size: {training_config.get('batch_size', 4)}")
print(f"   🔄 Gradient accumulation: {training_config.get('gradient_accumulation_steps', 1)}")
print(f"   📏 Max sequence length: {training_config.get('max_length', 128)}")
print(f"   🔄 Epochs: {training_config.get('num_epochs', 3)}")
print(f"   📈 Learning rate: {training_config.get('learning_rate', 2e-5)}")
print(f"   ⚡ Mixed precision: {training_config.get('fp16', False)}")

print(f"   ✅ Dynamic optimisations applied")
logger.info("Training optimisations completed")

2025-08-18 14:33:45,177 - pipeline.training - INFO - ⚡ Applying training optimizations...
2025-08-18 14:33:45,253 - pipeline.training - INFO - Training optimizations completed


⚡ Training Optimizations:
   🍎 Apple Silicon MPS detected: batch size set to 1
   ⚠️ Using aggressive memory conservation for MPS
   🔄 Gradient accumulation steps: 8
   📏 Max sequence length: 64
   🍎 Mixed precision disabled for MPS compatibility

🔥 SmolLM3 Memory Optimizations:
   📦 SmolLM3 batch size: 1
   🔄 SmolLM3 gradient accumulation: 8
   📏 SmolLM3 max length: 64
   💾 SmolLM3 requires ~6-8GB RAM for training
   🔄 Reduced to 1 epoch for faster training (increase later if needed)
   🧹 Memory management optimizations for MPS enabled
   ✅ Optimizations applied
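
The fallback categorisation in the cell above relies on substring matching, which is order-sensitive: the "small" patterns are checked before the "large" ones. A standalone sketch of that rule makes the behaviour easy to check. `categorise_model` is a name introduced for this example, and only the first model ID below appears in this notebook; the others are illustrative strings.

```python
def categorise_model(model_id: str) -> str:
    """Categorise a model by size hints in its ID, as the fallback rule does."""
    model_id_lower = model_id.lower()
    if any(p in model_id_lower for p in ("tiny", "135m", "small")):
        return "small_models"
    if any(p in model_id_lower for p in ("3b", "1b", "large")):
        return "large_models"
    return "medium_models"


examples = (
    "HuggingFaceTB/SmolLM3-3B",
    "HuggingFaceTB/SmolLM2-135M",
    "ProsusAI/finbert",
    "example/tiny-llm-3b",  # hypothetical ID that matches both pattern groups
)
for model_id in examples:
    print(model_id, "->", categorise_model(model_id))
```

An ID that contains both a "small" and a "large" hint is classed as small, so listing such models explicitly under `model_specific_configs` is the safer route.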


In [None]:
# Main training loop for all configured models
logger.info("🚀 Starting model training loop...")

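# Report the device in the same priority order used for the optimisations: CUDA, then MPS, then CPU
if torch.cuda.is_available():
    device = torch.device('cuda')
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    device = torch.device('mps')
else:
    device = torch.device('cpu')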
print(f"🔧 Training device: {device}")

trained_models = {}
training_results = {}

print(f"\n🏋️ Training Models:")
print(f"{'='*60}")

# Get base models from configuration
base_models = models_config.get('base_models', [])

for model_config in base_models:
    if not model_config.get('enabled', True):
        print(f"⏭️ Skipping {model_config['name']} (disabled)")
        continue
        
    try:
        model_name = model_config['name']
        model_id = model_config['model_id']
        
        print(f"\n🤖 Training {model_name}:")
        logger.info(f"Starting training for {model_name}")
        
        print(f"   🔧 Base model: {model_id}")
        print(f"   📈 Train samples: {len(train_df)}")
        print(f"   📊 Validation samples: {len(val_df)}")
        print(f"   📊 Optimised batch size: {training_config.get('batch_size', 16)}")
        
        # Create label mapping
        unique_labels = sorted(train_df['label'].unique())
        label2id = {label: idx for idx, label in enumerate(unique_labels)}
        id2label = {idx: label for label, idx in label2id.items()}
        
        print(f"   🏷️ Labels: {unique_labels}")
        
        # Data validation
        print(f"   🔍 Data validation:")
        print(f"      📊 Train label distribution: {dict(train_df['label'].value_counts())}")
        print(f"      📊 Val label distribution: {dict(val_df['label'].value_counts())}")
        
        # Check for any problematic labels
        train_unknown = set(train_df['label'].unique()) - set(unique_labels)
        val_unknown = set(val_df['label'].unique()) - set(unique_labels)
        
        if train_unknown:
            logger.warning(f"Unknown labels in train set: {train_unknown}")
        if val_unknown:
            logger.warning(f"Unknown labels in validation set: {val_unknown}")
            print(f"      ⚠️ Unknown validation labels: {val_unknown}")
        
        # Check for null values
        train_nulls = train_df['label'].isnull().sum()
        val_nulls = val_df['label'].isnull().sum()
        
        if train_nulls > 0:
            print(f"      ❌ {train_nulls} null labels in train set")
        if val_nulls > 0:
            print(f"      ❌ {val_nulls} null labels in validation set")
        
        # Load tokeniser and model
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForSequenceClassification.from_pretrained(
            model_id,
            num_labels=len(unique_labels),
            label2id=label2id,
            id2label=id2label
        )
        
        # Prepare datasets with optimised tokenisation
        def tokenize_function(examples):
            return tokenizer(
                examples['text'], 
                truncation=True, 
                padding=False,  # Disable padding here - let DataCollator handle it
                max_length=training_config.get('max_length', 128),  # Use optimised max length
                return_tensors=None
            )
        
        def prepare_dataset(df):
            # Convert labels to ids
            df_processed = df.copy()
            
            # Check for missing labels before mapping
            missing_labels = set(df_processed['label'].unique()) - set(label2id.keys())
            if missing_labels:
                logger.warning(f"Found labels not in training set: {missing_labels}")
                print(f"   ⚠️ Warning: Unknown labels found: {missing_labels}")
                # Filter out rows with unknown labels
                df_processed = df_processed[df_processed['label'].isin(label2id.keys())]
                print(f"   📊 Dataset size after filtering: {len(df_processed)} samples")
            
            # Map labels to ids
            df_processed['labels'] = df_processed['label'].map(label2id)
            
            # Check for any remaining NaN values
            nan_count = df_processed['labels'].isna().sum()
            if nan_count > 0:
                logger.error(f"Found {nan_count} NaN values in labels after mapping")
                print(f"   ❌ Error: {nan_count} NaN values in labels")
                # Drop rows with NaN labels
                df_processed = df_processed.dropna(subset=['labels'])
                print(f"   📊 Dataset size after dropping NaN: {len(df_processed)} samples")
            
            # Ensure labels are integers and finite
            df_processed['labels'] = df_processed['labels'].astype(int)
            
            # Validate no infinite values
            if not np.isfinite(df_processed['labels']).all():
                logger.error("Found non-finite values in labels")
                raise ValueError("Labels contain non-finite values")
            
            # Create HuggingFace dataset
            dataset = Dataset.from_pandas(df_processed[['text', 'labels']])
            dataset = dataset.map(
                tokenize_function, 
                batched=True, 
                remove_columns=['text'],  # Remove original text column
                num_proc=1  # Single process to avoid multiprocessing overhead
            )
            
            return dataset
        
        train_dataset = prepare_dataset(train_df)
        val_dataset = prepare_dataset(val_df)
        
        print(f"   🔄 Tokenisation completed")
        
        # Apply model-specific training configuration
        model_specific_key = f'{model_name}_config'
        model_specific = training_config.get(model_specific_key, {})
        
        # Use model-specific settings if available, otherwise use defaults
        model_batch_size = model_specific.get('batch_size', training_config.get('batch_size', 4))
        model_learning_rate = model_specific.get('learning_rate', training_config.get('learning_rate', 2e-5))
        model_epochs = model_specific.get('num_epochs', training_config.get('num_epochs', 3))
        model_grad_accum = model_specific.get('gradient_accumulation_steps', training_config.get('gradient_accumulation_steps', 1))
        
        print(f"   🎯 Model-specific settings:")
        print(f"      📊 Batch size: {model_batch_size}")
        print(f"      📈 Learning rate: {model_learning_rate}")
        print(f"      🔄 Epochs: {model_epochs}")
        print(f"      🔄 Gradient accumulation: {model_grad_accum}")
        
        # Training arguments with model-specific optimisations
        output_dir = Path(f"../models/{model_name}")
        output_dir.mkdir(parents=True, exist_ok=True)
        
        training_args = TrainingArguments(
            output_dir=str(output_dir),
            num_train_epochs=model_epochs,
            per_device_train_batch_size=model_batch_size,
            per_device_eval_batch_size=model_batch_size,
            gradient_accumulation_steps=model_grad_accum,
            warmup_steps=training_config.get('warmup_steps', 50),
            weight_decay=training_config.get('weight_decay', 0.01),
            learning_rate=model_learning_rate,
            logging_dir=str(output_dir / 'logs'),
            logging_steps=training_config.get('logging_steps', 25),
            eval_strategy="epoch",
            save_strategy="epoch",
            load_best_model_at_end=True,
            metric_for_best_model="eval_loss",
            greater_is_better=False,
            report_to="none",  # Disable wandb/tensorboard (in transformers 4.x, None falls back to all installed integrations)
            dataloader_num_workers=0,  # Avoid multiprocessing issues
            save_total_limit=1,  # Save space
            fp16=training_config.get('fp16', False),
            dataloader_pin_memory=False,  # Reduce memory overhead
        )
        
        # Data collator - handles dynamic padding and tensor conversion
        data_collator = DataCollatorWithPadding(
            tokenizer=tokenizer,
            padding=True,
            return_tensors='pt'
        )
        
        # Metrics function
        def compute_metrics(eval_pred):
            predictions, labels = eval_pred
            predictions = np.argmax(predictions, axis=1)
            
            accuracy = accuracy_score(labels, predictions)
            
            return {
                'accuracy': accuracy,
            }
        
        # Create trainer
        trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=train_dataset,
            eval_dataset=val_dataset,
            tokenizer=tokenizer,
            data_collator=data_collator,
            compute_metrics=compute_metrics,
        )
        
        print(f"   🏋️ Starting training... (estimated time reduced with optimisations)")
        
        # Train model
        train_result = trainer.train()
        
        # Evaluate model
        eval_result = trainer.evaluate()
        
        print(f"   ✅ Training completed!")
        print(f"   📊 Final validation accuracy: {eval_result['eval_accuracy']:.4f}")
        print(f"   📈 Final validation loss: {eval_result['eval_loss']:.4f}")
        
        # Save model and tokeniser
        trainer.save_model()
        tokenizer.save_pretrained(output_dir)
        
        # Save label mapping
        import pickle
        with open(output_dir / 'label_encoder.pkl', 'wb') as f:
            pickle.dump({'label2id': label2id, 'id2label': id2label}, f)
        
        # Store results
        trained_models[model_name] = {
            'model_path': str(output_dir),
            'base_model': model_id,
            'labels': unique_labels,
            'label_mapping': {'label2id': label2id, 'id2label': id2label}
        }
        
        training_results[model_name] = {
            'train_loss': train_result.training_loss,
            'eval_loss': eval_result['eval_loss'],
            'eval_accuracy': eval_result['eval_accuracy'],
            'train_runtime': train_result.metrics.get('train_runtime', 0),
            'samples_per_second': train_result.metrics.get('train_samples_per_second', 0)
        }
        
        logger.info(f"Successfully trained {model_name}")
        
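    except Exception as e:
        # Read the name from model_config so the message is correct even if
        # the failure happened before model_name was assigned in this iteration
        failed_name = model_config.get('name', '<unnamed>')
        logger.error(f"Failed to train {failed_name}: {str(e)}")
        print(f"   ❌ Training failed: {str(e)}")
        continue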

print(f"\n{'='*60}")
print(f"🎉 Training completed! {len(trained_models)}/{len(enabled_models)} models trained successfully")

logger.info(f"Training completed: {len(trained_models)} models trained")

2025-08-18 14:33:45,291 - pipeline.training - INFO - 🚀 Starting model training loop...
2025-08-18 14:33:45,295 - pipeline.training - INFO - Starting training for smollm3-financial


🔧 Training device: cpu

🏋️ Training Models:
⏭️ Skipping tinybert-financial-classifier (disabled)
⏭️ Skipping finbert-tone (disabled)
⏭️ Skipping distilbert-base (disabled)

🤖 Training smollm3-financial:
   🔧 Base model: HuggingFaceTB/SmolLM3-3B
   📈 Train samples: 4361
   📊 Validation samples: 485
   📊 Optimized batch size: 1
   🏷️ Labels: ['negative', 'neutral', 'positive']
   🔍 Data validation:
      📊 Train label distribution: {'neutral': 2591, 'positive': 1227, 'negative': 543}
      📊 Val label distribution: {'neutral': 288, 'positive': 136, 'negative': 61}
   🔧 Applying SmolLM3-specific optimizations...
      📏 Using extended max_length: 64 for SmolLM3


Some weights of SmolLM3ForSequenceClassification were not initialized from the model checkpoint at HuggingFaceTB/SmolLM3-3B and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


   🔄 Tokenization completed
      🏋️ Applying SmolLM3 training optimizations...
      📊 Adjusted batch_size: 1, grad_accum: 16
      📈 Adjusted learning_rate: 1e-05


2025-08-18 14:42:10,220 - pipeline.training - ERROR - Failed to train smollm3-financial: MPS backend out of memory (MPS allocated: 8.98 GB, other allocations: 384.00 KB, max allowed: 9.07 GB). Tried to allocate 86.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
2025-08-18 14:42:10,322 - pipeline.training - INFO - Training completed: 0 models trained


   ❌ Training failed: MPS backend out of memory (MPS allocated: 8.98 GB, other allocations: 384.00 KB, max allowed: 9.07 GB). Tried to allocate 86.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

🎉 Training completed! 0/1 models trained successfully
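
The out-of-memory failure above is consistent with a back-of-the-envelope estimate. The sketch below assumes a nominal 3 billion parameters (taken from the "3B" in the model name), FP32 storage at 4 bytes per parameter, full fine-tuning, and an Adam-style optimiser that keeps two extra values per parameter. Activations and framework overhead are ignored, so real usage would be higher. The function and its parameters are illustrative, not part of the pipeline.

```python
def training_memory_gb(
    num_params: float,
    bytes_per_param: int = 4,
    optimizer_states_per_param: int = 2,
) -> dict:
    """Rough lower bound on training memory in GB (10^9 bytes)."""
    weights = num_params * bytes_per_param
    gradients = num_params * bytes_per_param
    optimizer = num_params * bytes_per_param * optimizer_states_per_param
    return {
        "weights_gb": weights / 1e9,
        "gradients_gb": gradients / 1e9,
        "optimizer_gb": optimizer / 1e9,
        "total_gb": (weights + gradients + optimizer) / 1e9,
    }


MPS_LIMIT_GB = 9.07  # "max allowed" value reported in the error message above

estimate = training_memory_gb(3e9)
for name, value in estimate.items():
    print(f"{name}: {value:.1f}")
print("weights alone exceed the MPS limit:", estimate["weights_gb"] > MPS_LIMIT_GB)
```

Under these assumptions the FP32 weights alone need more memory than the MPS allocator allowed, which would explain a failure before the first training step. Smaller models from the table, or a parameter-efficient method, would be the more realistic options on this machine.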


In [None]:
# Save training results and complete training step
logger.info("💾 Saving training results and updating pipeline state...")

# Create comprehensive training summary
training_summary = {
    'training_timestamp': datetime.now().isoformat(),
    'models_trained': list(trained_models.keys()),
    'training_config': training_config,
    'results': training_results,
    'model_details': trained_models
}

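# Update pipeline state only if at least one model trained, so a failed run does not unblock later steps
if trained_models:
    state.mark_step_complete('model_training_completed', **training_summary)
else:
    logger.warning("No models were trained; 'model_training_completed' was not marked complete")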

# Save training report
results_dir = Path("../results")
results_dir.mkdir(exist_ok=True)

with open(results_dir / 'training_report.json', 'w') as f:
    json.dump(training_summary, f, indent=2)

# Create training results visualisation
if len(training_results) > 0:
    plt.figure(figsize=(15, 5))
    
    # Subplot 1: Training Loss
    plt.subplot(1, 3, 1)
    model_names = list(training_results.keys())
    train_losses = [training_results[m]['train_loss'] for m in model_names]
    plt.bar(model_names, train_losses)
    plt.title('Training Loss by Model')
    plt.ylabel('Training Loss')
    plt.xticks(rotation=45)
    
    # Subplot 2: Validation Loss
    plt.subplot(1, 3, 2)
    eval_losses = [training_results[m]['eval_loss'] for m in model_names]
    plt.bar(model_names, eval_losses)
    plt.title('Validation Loss by Model')
    plt.ylabel('Validation Loss')
    plt.xticks(rotation=45)
    
    # Subplot 3: Validation Accuracy
    plt.subplot(1, 3, 3)
    eval_accuracies = [training_results[m]['eval_accuracy'] for m in model_names]
    plt.bar(model_names, eval_accuracies)
    plt.title('Validation Accuracy by Model')
    plt.ylabel('Accuracy')
    plt.xticks(rotation=45)
    plt.ylim(0, 1)
    
    plt.tight_layout()
    plt.savefig(results_dir / 'training_results_comparison.png', dpi=300, bbox_inches='tight')
    plt.show()

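print(f"\n{'='*60}")
# Only report success when at least one model actually trained
if trained_models:
    print("🎉 MODEL TRAINING COMPLETED SUCCESSFULLY!")
else:
    print("❌ MODEL TRAINING FINISHED WITH NO MODELS TRAINED - see the errors above")
print(f"{'='*60}")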
print("📝 Next Steps:")
print("1. Run 3_convert_to_onnx_generalised.ipynb to convert models to ONNX")
print("2. Run 4_benchmarks_generalised.ipynb to benchmark performance")
print("3. Continue with the sequential pipeline: 5 → 6")

print(f"\n🏋️ Training Summary:")
print(f"   🤖 Models trained: {len(trained_models)}")

for model_name, results in training_results.items():
    print(f"   📊 {model_name}:")
    print(f"      🎯 Validation accuracy: {results['eval_accuracy']:.4f}")
    print(f"      📉 Validation loss: {results['eval_loss']:.4f}")
    print(f"      📁 Model saved to: {trained_models[model_name]['model_path']}")

print(f"\n📄 Training report saved to: {results_dir / 'training_report.json'}")
print(f"📊 Results visualisation saved to: {results_dir / 'training_results_comparison.png'}")

if trained_models:
    logger.info("✅ Model training completed successfully")
else:
    logger.warning("Model training finished but no models were trained")

2025-08-18 14:42:11,958 - pipeline.training - INFO - 💾 Saving training results and updating pipeline state...
2025-08-18 14:42:11,976 - pipeline.training - INFO - ✅ Model training completed successfully



🎉 MODEL TRAINING COMPLETED SUCCESSFULLY!
📝 Next Steps:
1. Run 3_convert_to_onnx.ipynb to convert models to ONNX
2. Run 4_benchmarks.ipynb to benchmark performance
3. Continue with the sequential pipeline: 5 → 6

🏋️ Training Summary:
   🤖 Models trained: 0

📄 Training report saved to: ../results/training_report.json
📊 Results visualization saved to: ../results/training_results_comparison.png
