# End-to-End Intelligent Content Analysis Platform

**Project Title:** Advanced Multi-Modal AI System for Content Understanding  
**Framework:** PyTorch Deep Learning Mastery Hub  
**Authors:** PyTorch Mastery Hub Team  
**Date:** December 2024  
**Version:** 1.0.0

## Overview

This capstone project concludes the PyTorch Mastery Hub curriculum. It combines advanced neural architectures, multi-modal learning, production deployment, and MLOps practices into a single system for intelligent content analysis.

### 🎯 **Project Objectives**
1. **Advanced Neural Architectures**: Implement CNN and Transformer encoders with attention mechanisms
2. **Multi-Modal Fusion**: Combine vision and language understanding with cross-attention
3. **Production Deployment**: Create scalable APIs with real-time inference capabilities
4. **MLOps Integration**: Implement monitoring, versioning, and continuous deployment
5. **Research Standards**: Maintain reproducibility and ethical AI practices
6. **Industry Readiness**: Build enterprise-grade solutions with comprehensive testing

### 🏗️ **System Architecture**
- **Vision Encoder**: ResNet50 backbone with custom attention mechanisms
- **Text Encoder**: Transformer-based sequence processing with learned positional embeddings
- **Multi-Modal Fusion**: Cross-attention and feature combination networks
- **Serving Infrastructure**: FastAPI with asynchronous processing
- **Monitoring Pipeline**: Comprehensive MLOps with real-time metrics
- **Research Framework**: Full reproducibility and collaboration tools
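
The components above share a handful of dimensions, and a small config object is one way to keep them consistent. The sketch below is an assumption rather than part of the project code: the `ArchitectureConfig` name and its `validate` method are hypothetical, and the defaults are taken from the values used later in this notebook (512-dimensional encoders, 8 attention heads, a 256-dimensional fusion layer).

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ArchitectureConfig:
    """Hypothetical container for the dimensions shared across components."""
    vocab_size: int = 10000
    vision_dim: int = 512
    text_dim: int = 512
    fusion_dim: int = 256
    num_heads: int = 8
    num_content_classes: int = 3

    def validate(self) -> None:
        # Multi-head attention splits the embedding evenly across heads.
        for name in ("vision_dim", "text_dim"):
            dim = getattr(self, name)
            if dim % self.num_heads != 0:
                raise ValueError(
                    f"{name}={dim} is not divisible by num_heads={self.num_heads}"
                )
        # The prediction heads in this notebook use fusion_dim // 4 hidden units.
        if self.fusion_dim < 4:
            raise ValueError("fusion_dim must be at least 4")


config = ArchitectureConfig()
config.validate()
print(asdict(config))
```

Validating once at construction time tends to surface shape mismatches earlier than a failing forward pass would.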

### 📊 **Key Features**
- Multi-task learning (content scoring, sentiment analysis, topic classification)
- Automatic loss weighting for balanced multi-task optimization
- Mixed precision training for faster training and lower memory use
- Advanced data augmentation and regularization
- Real-time inference APIs with load balancing
- Comprehensive model monitoring and drift detection
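
Drift detection can take many forms, and this section does not say which one the platform uses. As one illustration, the sketch below computes the Population Stability Index (PSI) between a reference sample and a live sample of a scalar signal such as a confidence score. The function name, the bin count, and the synthetic data are assumptions made for the example.

```python
import math
import random


def population_stability_index(reference, current, num_bins=10, eps=1e-6):
    """PSI between two samples of a scalar feature (e.g. a model confidence score)."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / num_bins or 1.0

    def bin_fractions(values):
        counts = [0] * num_bins
        for v in values:
            # Clamp so values outside the reference range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), num_bins - 1)
            counts[idx] += 1
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


rng = random.Random(42)
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same_dist = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

psi_same = population_stability_index(reference, same_dist)
psi_shifted = population_stability_index(reference, shifted)
print(f"PSI (same distribution): {psi_same:.4f}")
print(f"PSI (mean shifted by 1): {psi_shifted:.4f}")
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, though suitable thresholds depend on the signal being monitored.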

---

## 1. Environment Setup and Configuration

### 1.1 Import Dependencies and Setup

```python
# 🎯 PyTorch Mastery Hub - Capstone Project (Part 1)
# End-to-End Multi-Modal AI System for Content Understanding

"""
CAPSTONE PROJECT: INTELLIGENT CONTENT ANALYSIS PLATFORM

This capstone project integrates EVERYTHING we've learned throughout the PyTorch Mastery Hub:
- Advanced neural architectures (CNNs, RNNs, Transformers)
- Multi-modal learning (vision + language)
- Production deployment and monitoring
- Research methodologies and ethics
- Industry collaboration frameworks

PROJECT OVERVIEW:
Build an end-to-end AI system that can:
1. Process images and extract visual features
2. Analyze text content for sentiment and topics
3. Combine multi-modal information for content scoring
4. Provide real-time inference APIs
5. Monitor model performance and drift
6. Scale horizontally with load balancing
7. Maintain research reproducibility and ethics compliance
"""

# Core PyTorch and ML libraries
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as transforms
from torchvision.models import resnet50

# Data processing and analysis
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image
import cv2

# Utilities and system libraries
import json
import os
import time
import pickle
import warnings
import hashlib
import yaml
import logging
import asyncio
from pathlib import Path
from typing import Dict, List, Tuple, Optional, Any, Union
from dataclasses import dataclass, field, asdict
from datetime import datetime, timedelta
from collections import defaultdict, Counter, OrderedDict
import itertools
import random
from tqdm import tqdm
import math
import requests
import base64
import io
import re
from concurrent.futures import ThreadPoolExecutor
import threading
import queue
import sqlite3

# Production serving and APIs
from fastapi import FastAPI, HTTPException, Depends, BackgroundTasks, UploadFile, File
from fastapi.middleware.cors import CORSMiddleware
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from pydantic import BaseModel, validator
import redis
import uvicorn

# Monitoring and MLOps
import psutil
# NOTE: this rebinds the name `Counter`, shadowing collections.Counter imported above
from prometheus_client import Counter, Histogram, Gauge, generate_latest
import mlflow
import wandb

# Research and evaluation
from scipy import stats
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Configure environment
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
warnings.filterwarnings('ignore')

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"🚀 Capstone Project initialized on device: {device}")

# Set random seeds for reproducibility
RANDOM_SEED = 42
torch.manual_seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)
random.seed(RANDOM_SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(RANDOM_SEED)  # seed every visible GPU, not just the current one
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

print(f"✅ Environment configured with seed: {RANDOM_SEED}")
```
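
Global seeds do not by themselves fix the shuffle order of a `DataLoader` or the random state inside worker processes. The sketch below follows the pattern described in PyTorch's reproducibility notes: a dedicated `torch.Generator` for shuffling plus a `worker_init_fn` that re-seeds NumPy and `random`. The `make_loader` helper and the toy dataset are assumptions for illustration.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id: int) -> None:
    """Re-seed NumPy and `random` inside each DataLoader worker process."""
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def make_loader(seed: int) -> DataLoader:
    """Hypothetical helper: build a shuffled loader with a fixed shuffle seed."""
    dataset = TensorDataset(torch.arange(16))
    generator = torch.Generator()
    generator.manual_seed(seed)  # controls the shuffle order
    return DataLoader(
        dataset,
        batch_size=4,
        shuffle=True,
        worker_init_fn=seed_worker,  # only invoked when num_workers > 0
        generator=generator,
    )


order_a = [batch[0].tolist() for batch in make_loader(42)]
order_b = [batch[0].tolist() for batch in make_loader(42)]
print(order_a == order_b)
```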

### 1.2 Project Directory Structure

```python
# Create comprehensive project directory structure
capstone_dir = Path("../../results/notebooks/capstone_project")
capstone_dir.mkdir(parents=True, exist_ok=True)

# Define project subdirectories
project_structure = {
    'models': 'Trained model checkpoints and architectures',
    'data': 'Dataset storage and processing',
    'experiments': 'Experiment tracking and results',
    'serving': 'Production API and serving components',
    'monitoring': 'MLOps monitoring and metrics',
    'research': 'Research artifacts and reproducibility',
    'deployment': 'Deployment configurations and scripts',
    'logs': 'Training and system logs',
    'artifacts': 'Generated artifacts and outputs',
    'ethics': 'Ethical AI documentation and assessments',
    'notebooks': 'Analysis and exploration notebooks',
    'tests': 'Unit and integration tests',
    'configs': 'Configuration files and hyperparameters'
}

# Create directories and document structure
for subdir, description in project_structure.items():
    dir_path = capstone_dir / subdir
    dir_path.mkdir(exist_ok=True)
    
    # Create README for each directory
    readme_path = dir_path / 'README.md'
    with open(readme_path, 'w') as f:
        f.write(f"# {subdir.title()}\n\n{description}\n\n")
        f.write(f"Created: {datetime.now().isoformat()}\n")

print(f"📁 Capstone project directory: {capstone_dir}")
print(f"🏗️ Created {len(project_structure)} subdirectories with documentation")

# Log project initialization
project_metadata = {
    'project_name': 'Multi-Modal Content Analysis Platform',
    'version': '1.0.0',
    'creation_time': datetime.now().isoformat(),
    'device': str(device),
    'random_seed': RANDOM_SEED,
    'directory_structure': project_structure,
    'pytorch_version': torch.__version__,
    'cuda_available': torch.cuda.is_available(),
    'cuda_version': torch.version.cuda if torch.cuda.is_available() else None
}

with open(capstone_dir / 'project_metadata.json', 'w') as f:
    json.dump(project_metadata, f, indent=2)

print(f"✅ Project metadata saved to: {capstone_dir / 'project_metadata.json'}")
```
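
Because `project_metadata` includes a creation timestamp, two otherwise identical runs produce different JSON files. If a stable identifier per configuration is useful for experiment tracking, one option is to hash only the fields that define the run. The `config_fingerprint` function and the example fields below are assumptions for illustration, not part of the project code.

```python
import hashlib
import json


def config_fingerprint(config: dict, length: int = 12) -> str:
    """Short, stable hash of a JSON-serializable config dict."""
    # sort_keys makes the serialization independent of key insertion order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


# Only fields that define the run; timestamps are left out on purpose,
# since they would change the hash on every execution.
run_config = {"version": "1.0.0", "random_seed": 42, "device": "cpu"}
reordered = {"device": "cpu", "random_seed": 42, "version": "1.0.0"}

print(config_fingerprint(run_config))
print(config_fingerprint(run_config) == config_fingerprint(reordered))
```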

---

## 2. Advanced Neural Architecture Components

### 2.1 Attention Mechanisms

```python
class AttentionModule(nn.Module):
    """
    Advanced self-attention mechanism for feature enhancement.
    
    This module implements scaled dot-product attention with residual connections
    and layer normalization for improved feature representation.
    
    Args:
        input_dim (int): Input feature dimension
        attention_dim (int): Attention computation dimension
    """
    
    def __init__(self, input_dim: int, attention_dim: int = 128):
        super().__init__()
        self.input_dim = input_dim
        self.attention_dim = attention_dim
        
        # Attention projection layers
        self.query = nn.Linear(input_dim, attention_dim)
        self.key = nn.Linear(input_dim, attention_dim)
        self.value = nn.Linear(input_dim, attention_dim)
        self.output_proj = nn.Linear(attention_dim, input_dim)
        
        # Attention scaling and regularization
        self.scale = attention_dim ** -0.5
        self.dropout = nn.Dropout(0.1)
        
        # Layer normalization for stability
        self.layer_norm = nn.LayerNorm(input_dim)
        
    def forward(self, x):
        """
        Forward pass through attention mechanism.
        
        Args:
            x: Input tensor of shape (batch_size, seq_len, input_dim) or (batch_size, input_dim)
            
        Returns:
            output: Attention-enhanced features
            attention_weights: Attention weight matrix for visualization
        """
        # Handle 2D input by adding a sequence dimension.
        # Note: with a single token the softmax below is taken over one score,
        # so the attention weight is always 1 before dropout.
        if x.dim() == 2:
            x = x.unsqueeze(1)  # (batch_size, 1, input_dim)
            squeeze_output = True
        else:
            squeeze_output = False
        
        # Compute query, key, value projections
        Q = self.query(x)  # (batch_size, seq_len, attention_dim)
        K = self.key(x)    # (batch_size, seq_len, attention_dim)
        V = self.value(x)  # (batch_size, seq_len, attention_dim)
        
        # Scaled dot-product attention
        attention_scores = torch.matmul(Q, K.transpose(-2, -1)) * self.scale
        attention_weights = F.softmax(attention_scores, dim=-1)
        attention_weights = self.dropout(attention_weights)
        
        # Apply attention to values
        attended_values = torch.matmul(attention_weights, V)
        
        # Output projection
        output = self.output_proj(attended_values)
        
        # Residual connection and layer normalization
        output = self.layer_norm(output + x)
        
        if squeeze_output:
            output = output.squeeze(1)
            attention_weights = attention_weights.squeeze(1)
            
        return output, attention_weights

# Test attention module
print("🧠 Testing Attention Module...")
attention_module = AttentionModule(input_dim=512, attention_dim=128)
test_input = torch.randn(4, 512)  # Batch of 4, feature dim 512
attended_output, attention_weights = attention_module(test_input)

print(f"  Input shape: {test_input.shape}")
print(f"  Output shape: {attended_output.shape}")
print(f"  Attention weights shape: {attention_weights.shape}")
print(f"  ✅ Attention module working correctly")
```
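
One property of the test above is worth making explicit: when the input is a single vector per sample, self-attention has only one key to attend to, so the softmax weight is exactly 1 and the attended value equals the value projection of the input. The standalone sketch below uses a plain scaled dot-product function (written for this example, without the learned projections) to contrast the single-token case with a short sequence.

```python
import torch
import torch.nn.functional as F


def scaled_dot_product_attention(q, k, v):
    """Plain scaled dot-product attention; returns (output, weights)."""
    scale = q.shape[-1] ** -0.5
    weights = F.softmax(torch.matmul(q, k.transpose(-2, -1)) * scale, dim=-1)
    return torch.matmul(weights, v), weights


torch.manual_seed(0)

# Case 1: one token per sample, as when a (batch, dim) tensor is unsqueezed.
single = torch.randn(4, 1, 128)
out_single, w_single = scaled_dot_product_attention(single, single, single)
print(w_single.shape, torch.allclose(w_single, torch.ones_like(w_single)))
print(torch.allclose(out_single, single))

# Case 2: a sequence of 7 tokens per sample gives a non-trivial weight matrix.
seq = torch.randn(4, 7, 128)
out_seq, w_seq = scaled_dot_product_attention(seq, seq, seq)
print(w_seq.shape, w_seq.sum(dim=-1).allclose(torch.ones(4, 7)))
```

For the module to reweight features in a meaningful way, it needs a sequence input, for example spatial tokens from a feature map or token-level text embeddings.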

### 2.2 Vision Encoder with Advanced Features

```python
class VisionEncoder(nn.Module):
    """
    Advanced CNN encoder with attention mechanisms and feature enhancement.
    
    This encoder uses a pretrained ResNet50 backbone with additional processing
    layers and attention mechanisms for improved visual feature extraction.
    
    Args:
        output_dim (int): Dimension of output feature vectors
        pretrained (bool): Whether to use pretrained ResNet weights
        freeze_backbone (bool): Whether to freeze backbone parameters
    """
    
    def __init__(self, output_dim: int = 512, pretrained: bool = True, freeze_backbone: bool = False):
        super().__init__()
        self.output_dim = output_dim
        self.pretrained = pretrained
        
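        # Pretrained ResNet50 backbone.
        # torchvision >= 0.13 replaced the deprecated `pretrained` flag with `weights`;
        # "IMAGENET1K_V1" is the checkpoint that `pretrained=True` used to load.
        self.backbone = resnet50(weights="IMAGENET1K_V1" if pretrained else None)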
        
        # Remove the final classification layer
        self.backbone.fc = nn.Identity()
        
        # Freeze backbone if requested
        if freeze_backbone:
            for param in self.backbone.parameters():
                param.requires_grad = False
            print("🔒 Vision backbone frozen for transfer learning")
        
        # Feature dimension from ResNet50
        backbone_dim = 2048
        
        # Advanced feature processing pipeline
        self.feature_processor = nn.Sequential(
            nn.Linear(backbone_dim, 1024),
            nn.BatchNorm1d(1024),
            nn.ReLU(inplace=True),
            nn.Dropout(0.2),
            nn.Linear(1024, output_dim),
            nn.BatchNorm1d(output_dim),
            nn.ReLU(inplace=True)
        )
        
        # Self-attention for feature enhancement
        self.attention = AttentionModule(output_dim)
        
        # Final output projection with layer normalization
        self.output_projection = nn.Sequential(
            nn.Linear(output_dim, output_dim),
            nn.LayerNorm(output_dim),
            nn.Dropout(0.1)
        )
        
        # Initialize custom layers
        self._initialize_weights()
        
    def _initialize_weights(self):
        """Initialize custom layer weights using Xavier initialization."""
        for module in [self.feature_processor, self.output_projection]:
            for layer in module:
                if isinstance(layer, nn.Linear):
                    nn.init.xavier_uniform_(layer.weight)
                    if layer.bias is not None:
                        nn.init.zeros_(layer.bias)
                elif isinstance(layer, nn.BatchNorm1d):
                    nn.init.ones_(layer.weight)
                    nn.init.zeros_(layer.bias)
        
    def forward(self, images):
        """
        Forward pass through vision encoder.
        
        Args:
            images: Input images tensor of shape (batch_size, 3, H, W)
            
        Returns:
            output_features: Enhanced visual features
            attention_weights: Attention visualization weights
        """
        batch_size = images.shape[0]
        
        # Extract features using ResNet backbone
        backbone_features = self.backbone(images)  # (batch_size, 2048)
        
        # Process features through custom layers
        processed_features = self.feature_processor(backbone_features)  # (batch_size, output_dim)
        
        # Apply self-attention for feature enhancement
        attended_features, attention_weights = self.attention(processed_features)
        
        # Final projection and normalization
        output_features = self.output_projection(attended_features)
        
        return output_features, attention_weights
    
    def get_feature_maps(self, images, layer_name='layer4'):
        """
        Extract intermediate feature maps for visualization.
        
        Args:
            images: Input images
            layer_name: Name of layer to extract features from
            
        Returns:
            feature_maps: Intermediate feature representations
        """
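        captured = {}

        def hook_fn(module, inputs, output):
            # Keep the result local instead of storing a tensor on the module
            captured['feature_maps'] = output.detach()

        # Register hook
        layer = getattr(self.backbone, layer_name)
        handle = layer.register_forward_hook(hook_fn)

        # Forward pass; the hook is removed even if the forward pass fails
        try:
            with torch.no_grad():
                _ = self.backbone(images)
        finally:
            handle.remove()

        return captured['feature_maps']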

# Test vision encoder
print("\n👁️ Testing Vision Encoder...")
vision_encoder = VisionEncoder(output_dim=512, pretrained=True)
vision_encoder.eval()  # use BatchNorm running statistics and disable dropout for the smoke test
test_images = torch.randn(2, 3, 224, 224)  # Batch of 2 RGB images

with torch.no_grad():
    vision_features, vision_attention = vision_encoder(test_images)

print(f"  Input images shape: {test_images.shape}")
print(f"  Output features shape: {vision_features.shape}")
print(f"  Vision attention shape: {vision_attention.shape}")

# Count parameters
total_params = sum(p.numel() for p in vision_encoder.parameters())
trainable_params = sum(p.numel() for p in vision_encoder.parameters() if p.requires_grad)
print(f"  Total parameters: {total_params:,}")
print(f"  Trainable parameters: {trainable_params:,}")
print(f"  ✅ Vision encoder working correctly")
```
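
The encoder above applies attention to one pooled vector per image. If spatial attention is wanted, one option is to treat the `layer4` feature map (the kind of tensor `get_feature_maps` returns) as a sequence of region tokens. The sketch below is an assumption about how that could look: it uses a random tensor in place of real backbone output, and the 256-dimensional projection is an arbitrary choice to keep the example light.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the output of ResNet50's `layer4` on 224x224 inputs: (batch, 2048, 7, 7).
feature_maps = torch.randn(2, 2048, 7, 7)

# Flatten the spatial grid into a sequence of 49 region tokens.
tokens = feature_maps.flatten(2).transpose(1, 2)  # (batch, 49, 2048)

# Project to a smaller width before attention.
project = nn.Linear(2048, 256)
attention = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
attention.eval()

with torch.no_grad():
    x = project(tokens)                     # (batch, 49, 256)
    attended, weights = attention(x, x, x)  # weights are averaged over heads

print(tokens.shape, attended.shape, weights.shape)
```

Each row of `weights` is a distribution over the 49 image regions, which can be reshaped to 7x7 for visualization.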

### 2.3 Text Encoder with Transformer Architecture

```python
class TextEncoder(nn.Module):
    """
    Advanced Transformer-based text encoder with positional encoding and attention.
    
    This encoder processes text sequences using multi-head self-attention and
    feed-forward networks. It is similar in structure to a BERT-style encoder,
    but smaller and initialized from scratch rather than pretrained.
    
    Args:
        vocab_size (int): Size of vocabulary
        embed_dim (int): Embedding dimension
        num_heads (int): Number of attention heads
        num_layers (int): Number of transformer layers
        max_seq_len (int): Maximum sequence length
    """
    
    def __init__(self, vocab_size: int, embed_dim: int = 512, num_heads: int = 8, 
                 num_layers: int = 6, max_seq_len: int = 512):
        super().__init__()
        self.embed_dim = embed_dim
        self.max_seq_len = max_seq_len
        self.vocab_size = vocab_size
        
        # Embedding layers with dropout
        self.token_embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.position_embedding = nn.Embedding(max_seq_len, embed_dim)
        self.embedding_dropout = nn.Dropout(0.1)
        
        # Transformer encoder stack
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim,
            nhead=num_heads,
            dim_feedforward=embed_dim * 4,
            dropout=0.1,
            activation='gelu',  # GELU activation for better performance
            batch_first=True,
            norm_first=True  # Pre-layer normalization
        )
        self.transformer = nn.TransformerEncoder(
            encoder_layer, 
            num_layers=num_layers,
            norm=nn.LayerNorm(embed_dim)
        )
        
        # Output processing
        # Note: self.pooling is not used in forward(), which applies masked mean pooling instead
        self.pooling = nn.AdaptiveAvgPool1d(1)
        self.output_projection = nn.Sequential(
            nn.Linear(embed_dim, embed_dim),
            nn.LayerNorm(embed_dim),
            nn.Dropout(0.1)
        )
        
        # Initialize embeddings
        self._initialize_weights()
        
    def _initialize_weights(self):
        """Initialize embedding weights using normal distribution."""
        nn.init.normal_(self.token_embedding.weight, mean=0, std=0.02)
        nn.init.normal_(self.position_embedding.weight, mean=0, std=0.02)
        
        # Initialize output projection
        for layer in self.output_projection:
            if isinstance(layer, nn.Linear):
                nn.init.xavier_uniform_(layer.weight)
                if layer.bias is not None:
                    nn.init.zeros_(layer.bias)
    
    def create_attention_mask(self, input_ids, attention_mask):
        """
        Create attention mask for transformer (inverted for PyTorch convention).
        
        Args:
            input_ids: Token IDs
            attention_mask: Attention mask (1 for valid tokens, 0 for padding)
            
        Returns:
            src_key_padding_mask: Mask for transformer (True for padding)
        """
        return ~attention_mask.bool()
    
    def forward(self, input_ids, attention_mask=None):
        """
        Forward pass through text encoder.
        
        Args:
            input_ids: Token IDs of shape (batch_size, seq_len)
            attention_mask: Attention mask of shape (batch_size, seq_len)
            
        Returns:
            output_features: Encoded text features
            transformer_output: Full sequence output for analysis
        """
        batch_size, seq_len = input_ids.shape
        
        # Validate sequence length
        if seq_len > self.max_seq_len:
            print(f"⚠️ Sequence length {seq_len} exceeds maximum {self.max_seq_len}")
            input_ids = input_ids[:, :self.max_seq_len]
            if attention_mask is not None:
                attention_mask = attention_mask[:, :self.max_seq_len]
            seq_len = self.max_seq_len
        
        # Create position indices
        position_ids = torch.arange(seq_len, device=input_ids.device).unsqueeze(0).expand(batch_size, -1)
        
        # Token and position embeddings
        token_embeddings = self.token_embedding(input_ids)
        position_embeddings = self.position_embedding(position_ids)
        
        # Combine embeddings with dropout
        embeddings = self.embedding_dropout(token_embeddings + position_embeddings)
        
        # Create attention mask for transformer
        src_key_padding_mask = None
        if attention_mask is not None:
            src_key_padding_mask = self.create_attention_mask(input_ids, attention_mask)
        
        # Apply transformer encoder
        transformer_output = self.transformer(
            embeddings, 
            src_key_padding_mask=src_key_padding_mask
        )
        
        # Global pooling considering attention mask
        if attention_mask is not None:
            # Mask out padded positions for pooling
            masked_output = transformer_output * attention_mask.unsqueeze(-1).float()
            # Calculate mean over valid tokens
            seq_lengths = attention_mask.sum(dim=1, keepdim=True).float()
            pooled_output = masked_output.sum(dim=1) / torch.clamp(seq_lengths, min=1.0)
        else:
            # Simple average pooling
            pooled_output = transformer_output.mean(dim=1)
        
        # Final projection
        output_features = self.output_projection(pooled_output)
        
        return output_features, transformer_output
    
    def get_attention_weights(self, input_ids, attention_mask=None, layer_idx=-1):
        """
        Extract attention weights from specified transformer layer.
        
        Args:
            input_ids: Input token IDs
            attention_mask: Attention mask
            layer_idx: Layer index to extract weights from (-1 for last layer)
            
        Returns:
            attention_weights: Attention weight matrices
        """
        # Placeholder implementation: nn.TransformerEncoder does not expose
        # per-layer attention weights, so capturing them would require hooks
        # or a custom encoder layer. Until then, return uniform attention
        # with the shape a real implementation would produce.
        num_heads = self.transformer.layers[0].self_attn.num_heads
        batch_size, seq_len = input_ids.shape
        seq_len = min(seq_len, self.max_seq_len)  # forward() truncates longer inputs
        return torch.full(
            (batch_size, num_heads, seq_len, seq_len),
            1.0 / seq_len,
            device=input_ids.device,
        )

# Test text encoder
print("\n📝 Testing Text Encoder...")

# Create a simple vocabulary and test text
vocab_size = 10000
text_encoder = TextEncoder(vocab_size=vocab_size, embed_dim=512, num_heads=8, num_layers=6)

# Create test input
batch_size, seq_len = 3, 128
test_input_ids = torch.randint(1, vocab_size, (batch_size, seq_len))
test_attention_mask = torch.ones(batch_size, seq_len)
# Simulate some padding
test_attention_mask[0, 100:] = 0  # First sequence has padding after position 100
test_attention_mask[1, 80:] = 0   # Second sequence has padding after position 80

with torch.no_grad():
    text_features, text_sequence = text_encoder(test_input_ids, test_attention_mask)

print(f"  Input IDs shape: {test_input_ids.shape}")
print(f"  Attention mask shape: {test_attention_mask.shape}")
print(f"  Output features shape: {text_features.shape}")
print(f"  Sequence output shape: {text_sequence.shape}")

# Count parameters
total_params = sum(p.numel() for p in text_encoder.parameters())
trainable_params = sum(p.numel() for p in text_encoder.parameters() if p.requires_grad)
print(f"  Total parameters: {total_params:,}")
print(f"  Trainable parameters: {trainable_params:,}")
print(f"  ✅ Text encoder working correctly")
```
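
The test above feeds random token IDs, so it does not show how text becomes `input_ids` and `attention_mask`. The sketch below is a deliberately simple whitespace tokenizer, written as an assumption for illustration; the function names, the `<pad>`/`<unk>` tokens, and the toy corpus are not from the project. ID 0 is reserved for padding to match `padding_idx=0` in the token embedding.

```python
from collections import Counter
from typing import Dict, List, Tuple

PAD_ID, UNK_ID = 0, 1  # 0 matches padding_idx=0 in the token embedding


def build_vocab(texts: List[str], max_size: int = 10000) -> Dict[str, int]:
    """Map the most frequent lowercase whitespace tokens to integer IDs."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for token, _ in counts.most_common(max_size - len(vocab)):
        vocab[token] = len(vocab)
    return vocab


def encode_batch(texts: List[str], vocab: Dict[str, int],
                 max_len: int = 8) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (input_ids, attention_mask), truncated or right-padded to max_len."""
    input_ids, attention_mask = [], []
    for text in texts:
        ids = [vocab.get(tok, UNK_ID) for tok in text.lower().split()][:max_len]
        pad = max_len - len(ids)
        input_ids.append(ids + [PAD_ID] * pad)
        attention_mask.append([1] * len(ids) + [0] * pad)
    return input_ids, attention_mask


corpus = ["a photo of a cat", "a dog runs in the park"]
vocab = build_vocab(corpus)
ids, mask = encode_batch(["a cat in the rain"], vocab, max_len=8)
print(ids)
print(mask)
```

Wrapping both lists in `torch.tensor(...)` yields inputs of the shape `TextEncoder.forward` expects. A production system would more likely use a subword tokenizer.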

### 2.4 Multi-Modal Fusion with Cross-Attention

```python
class MultiModalFusion(nn.Module):
    """
    Advanced multi-modal fusion module with cross-attention mechanisms.
    
    This module combines vision and text features using cross-attention,
    allowing each modality to attend to relevant parts of the other.
    
    Args:
        vision_dim (int): Dimension of vision features
        text_dim (int): Dimension of text features
        fusion_dim (int): Dimension of fused features
        num_classes (int): Number of output classes for main task
    """
    
    def __init__(self, vision_dim: int = 512, text_dim: int = 512, 
                 fusion_dim: int = 256, num_classes: int = 3):
        super().__init__()
        self.vision_dim = vision_dim
        self.text_dim = text_dim
        self.fusion_dim = fusion_dim
        self.num_classes = num_classes
        
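        # Cross-attention mechanisms.
        # embed_dim follows the query modality and kdim/vdim follow the key/value
        # modality, so vision_dim and text_dim are allowed to differ.
        self.vision_to_text_attention = nn.MultiheadAttention(
            embed_dim=vision_dim, 
            kdim=text_dim,
            vdim=text_dim,
            num_heads=8, 
            dropout=0.1, 
            batch_first=True
        )
        
        self.text_to_vision_attention = nn.MultiheadAttention(
            embed_dim=text_dim, 
            kdim=vision_dim,
            vdim=vision_dim,
            num_heads=8, 
            dropout=0.1, 
            batch_first=True
        )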
        
        # Feature projection layers with normalization
        self.vision_proj = nn.Sequential(
            nn.Linear(vision_dim, fusion_dim),
            nn.LayerNorm(fusion_dim),
            nn.ReLU(inplace=True),
            nn.Dropout(0.1)
        )
        
        self.text_proj = nn.Sequential(
            nn.Linear(text_dim, fusion_dim),
            nn.LayerNorm(fusion_dim),
            nn.ReLU(inplace=True),
            nn.Dropout(0.1)
        )
        
        # Fusion processing layers
        self.fusion_layers = nn.Sequential(
            nn.Linear(fusion_dim * 2, fusion_dim),
            nn.LayerNorm(fusion_dim),
            nn.ReLU(inplace=True),
            nn.Dropout(0.2),
            
            nn.Linear(fusion_dim, fusion_dim // 2),
            nn.LayerNorm(fusion_dim // 2),
            nn.ReLU(inplace=True),
            nn.Dropout(0.1)
        )
        
        # Multi-task prediction heads
        self.content_classifier = nn.Sequential(
            nn.Linear(fusion_dim // 2, fusion_dim // 4),
            nn.ReLU(inplace=True),
            nn.Dropout(0.1),
            nn.Linear(fusion_dim // 4, num_classes)
        )
        
        self.sentiment_head = nn.Sequential(
            nn.Linear(fusion_dim // 2, fusion_dim // 4),
            nn.ReLU(inplace=True),
            nn.Dropout(0.1),
            nn.Linear(fusion_dim // 4, 3)  # positive, negative, neutral
        )
        
        self.topic_head = nn.Sequential(
            nn.Linear(fusion_dim // 2, fusion_dim // 4),
            nn.ReLU(inplace=True),
            nn.Dropout(0.1),
            nn.Linear(fusion_dim // 4, 10)  # 10 topic categories
        )
        
        # Confidence estimation head
        self.confidence_head = nn.Sequential(
            nn.Linear(fusion_dim // 2, fusion_dim // 4),
            nn.ReLU(inplace=True),
            nn.Linear(fusion_dim // 4, 1),
            nn.Sigmoid()
        )
        
        # Initialize weights
        self._initialize_weights()
        
    def _initialize_weights(self):
        """Initialize all linear layers with Xavier initialization."""
        for module in [self.vision_proj, self.text_proj, self.fusion_layers,
                      self.content_classifier, self.sentiment_head, self.topic_head, self.confidence_head]:
            for layer in module:
                if isinstance(layer, nn.Linear):
                    nn.init.xavier_uniform_(layer.weight)
                    if layer.bias is not None:
                        nn.init.zeros_(layer.bias)
    
    def cross_attention_fusion(self, vision_features, text_features):
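        """
        Perform cross-attention between vision and text features.
        
        Note: both inputs are pooled vectors, so each attention call sees a
        single key/value token. The softmax weight is therefore always 1
        (before dropout), and each output is a learned projection of the
        other modality. Token-level sequences would make the weights informative.
        
        Args:
            vision_features: Vision feature vectors of shape (batch_size, vision_dim)
            text_features: Text feature vectors of shape (batch_size, text_dim)
            
        Returns:
            vision_attended: Output for the vision query, computed from text values
            text_attended: Output for the text query, computed from vision values
            attention_weights: Dict of attention weights for visualization
        """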
        batch_size = vision_features.shape[0]
        
        # Add sequence dimension for attention computation
        vision_seq = vision_features.unsqueeze(1)  # (batch_size, 1, vision_dim)
        text_seq = text_features.unsqueeze(1)      # (batch_size, 1, text_dim)
        
        # Cross-attention: Vision attending to text
        vision_attended, vision_to_text_weights = self.vision_to_text_attention(
            query=vision_seq, 
            key=text_seq, 
            value=text_seq
        )
        vision_attended = vision_attended.squeeze(1)  # Remove sequence dimension
        
        # Cross-attention: Text attending to vision
        text_attended, text_to_vision_weights = self.text_to_vision_attention(
            query=text_seq, 
            key=vision_seq, 
            value=vision_seq
        )
        text_attended = text_attended.squeeze(1)  # Remove sequence dimension
        
        attention_weights = {
            'vision_to_text': vision_to_text_weights.squeeze(1),
            'text_to_vision': text_to_vision_weights.squeeze(1)
        }
        
        return vision_attended, text_attended, attention_weights
    
    def forward(self, vision_features, text_features):
        """
        Forward pass through multi-modal fusion.
        
        Args:
            vision_features: Vision feature vectors
            text_features: Text feature vectors
            
        Returns:
            Dictionary containing all predictions and intermediate features
        """
        # Cross-attention fusion
        vision_attended, text_attended, attention_weights = self.cross_attention_fusion(
            vision_features, text_features
        )
        
        # Project to fusion dimension
        vision_proj = self.vision_proj(vision_attended)
        text_proj = self.text_proj(text_attended)
        
        # Concatenate and process through fusion layers
        fused_features = torch.cat([vision_proj, text_proj], dim=1)
        fusion_output = self.fusion_layers(fused_features)
        
        # Multi-task predictions
        content_logits = self.content_classifier(fusion_output)
        sentiment_logits = self.sentiment_head(fusion_output)
        topic_logits = self.topic_head(fusion_output)
        confidence_scores = self.confidence_head(fusion_output)
        
        # Apply appropriate activations
        content_probs = F.softmax(content_logits, dim=-1)
        sentiment_probs = F.softmax(sentiment_logits, dim=-1)
        topic_probs = F.softmax(topic_logits, dim=-1)
        
        return {
            'content_score': content_probs,
            'content_logits': content_logits,
            'sentiment': sentiment_probs,
            'sentiment_logits': sentiment_logits,
            'topic': topic_probs,
            'topic_logits': topic_logits,
            'confidence': confidence_scores,
            'fused_features': fusion_output,
            'attention_weights': attention_weights,
            'intermediate_features': {
                'vision_projected': vision_proj,
                'text_projected': text_proj,
                'vision_attended': vision_attended,
                'text_attended': text_attended
            }
        }

# Test multi-modal fusion
print("\n🔗 Testing Multi-Modal Fusion...")
fusion_module = MultiModalFusion(vision_dim=512, text_dim=512, fusion_dim=256, num_classes=3)
fusion_module.eval()  # disable dropout so the smoke test is deterministic

# Create test features
test_vision_features = torch.randn(4, 512)
test_text_features = torch.randn(4, 512)

with torch.no_grad():
    fusion_outputs = fusion_module(test_vision_features, test_text_features)

print(f"  Vision features shape: {test_vision_features.shape}")
print(f"  Text features shape: {test_text_features.shape}")
print(f"  Content predictions shape: {fusion_outputs['content_score'].shape}")
print(f"  Sentiment predictions shape: {fusion_outputs['sentiment'].shape}")
print(f"  Topic predictions shape: {fusion_outputs['topic'].shape}")
print(f"  Confidence scores shape: {fusion_outputs['confidence'].shape}")
print(f"  Fused features shape: {fusion_outputs['fused_features'].shape}")

# Verify probability distributions
print(f"  Content probs sum: {fusion_outputs['content_score'].sum(dim=1).mean():.4f} (should be ~1.0)")
print(f"  Sentiment probs sum: {fusion_outputs['sentiment'].sum(dim=1).mean():.4f} (should be ~1.0)")
print(f"  Topic probs sum: {fusion_outputs['topic'].sum(dim=1).mean():.4f} (should be ~1.0)")

# Count parameters
total_params = sum(p.numel() for p in fusion_module.parameters())
print(f"  Total parameters: {total_params:,}")
print(f"  ✅ Multi-modal fusion working correctly")
```
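
The fusion module returns both probabilities (`content_score`, `sentiment`, `topic`) and raw logits (the `*_logits` keys). For training, a loss such as `F.cross_entropy` expects the logits, because it applies log-softmax internally. The following is a minimal sketch of a fixed-weight multi-task loss under stated assumptions: the helper name `multitask_loss`, the equal task weights, the class counts, and the integer class-index targets are all illustrative and not taken from the project. It does not implement automatic loss weighting.

```python
import torch
import torch.nn.functional as F


def multitask_loss(outputs, targets, weights=(1.0, 1.0, 1.0)):
    """Hypothetical helper: fixed-weight sum of per-task cross-entropy losses."""
    # F.cross_entropy applies log-softmax itself, so it takes raw logits,
    # not the softmax probabilities stored under 'content_score' etc.
    content_loss = F.cross_entropy(outputs['content_logits'], targets['content'])
    sentiment_loss = F.cross_entropy(outputs['sentiment_logits'], targets['sentiment'])
    topic_loss = F.cross_entropy(outputs['topic_logits'], targets['topic'])
    w_content, w_sentiment, w_topic = weights
    return w_content * content_loss + w_sentiment * sentiment_loss + w_topic * topic_loss


torch.manual_seed(0)
batch_size, num_content, num_sentiment, num_topics = 4, 3, 3, 10  # assumed sizes

# Random stand-ins for the '*_logits' entries of the fusion output dictionary
outputs = {
    'content_logits': torch.randn(batch_size, num_content, requires_grad=True),
    'sentiment_logits': torch.randn(batch_size, num_sentiment, requires_grad=True),
    'topic_logits': torch.randn(batch_size, num_topics, requires_grad=True),
}
# Assumed target format: one integer class index per sample and task
targets = {
    'content': torch.randint(0, num_content, (batch_size,)),
    'sentiment': torch.randint(0, num_sentiment, (batch_size,)),
    'topic': torch.randint(0, num_topics, (batch_size,)),
}

loss = multitask_loss(outputs, targets)
loss.backward()  # gradients flow back to all three logit tensors
print(f"Combined loss: {loss.item():.4f}")
```

If targets are stored as one-hot float vectors instead, they can be converted with `argmax(dim=-1)` before being passed to a helper like this one.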

---

## 3. Complete Intelligent Content Analyzer

### 3.1 Integrated Multi-Modal System

```python
class IntelligentContentAnalyzer(nn.Module):
    """
    Complete multi-modal content analysis system integrating vision, text, and fusion components.
    
    This is the main model that combines all components for end-to-end content understanding.
    It processes images and text simultaneously to provide comprehensive content analysis.
    
    Args:
        vocab_size (int): Size of text vocabulary
        num_content_classes (int): Number of content classification classes
        vision_dim (int): Dimension of vision features
        text_dim (int): Dimension of text features
        fusion_dim (int): Dimension of fusion features
    """
    
    def __init__(self, vocab_size: int = 10000, num_content_classes: int = 3,
                 vision_dim: int = 512, text_dim: int = 512, fusion_dim: int = 256):
        super().__init__()
        
        # Store configuration
        self.config = {
            'vocab_size': vocab_size,
            'num_content_classes': num_content_classes,
            'vision_dim': vision_dim,
            'text_dim': text_dim,
            'fusion_dim': fusion_dim
        }
        
        # Component models
        self.vision_encoder = VisionEncoder(output_dim=vision_dim)
        self.text_encoder = TextEncoder(vocab_size=vocab_size, embed_dim=text_dim)
        self.multimodal_fusion = MultiModalFusion(
            vision_dim=vision_dim,
            text_dim=text_dim,
            fusion_dim=fusion_dim,
            num_classes=num_content_classes
        )
        
        # Model metadata
        self.model_version = "1.0.0"
        self.creation_time = datetime.now().isoformat()
        self.training_step = 0
        
        # Performance tracking
        self.inference_stats = {
            'total_inferences': 0,
            'avg_inference_time': 0.0,
            'last_inference_time': None
        }
        
    def forward(self, images, input_ids, attention_mask=None, return_attention=False):
        """
        Forward pass through the complete multi-modal system.
        
        Args:
            images: Input images tensor of shape (batch_size, 3, H, W)
            input_ids: Text token IDs of shape (batch_size, seq_len)
            attention_mask: Text attention mask of shape (batch_size, seq_len)
            return_attention: Whether to return attention weights for visualization
            
        Returns:
            outputs: Dictionary containing all predictions and features
        """
        start_time = time.time()
        
        # Encode vision features
        vision_features, vision_attention = self.vision_encoder(images)
        
        # Encode text features
        text_features, text_sequence = self.text_encoder(input_ids, attention_mask)
        
        # Multi-modal fusion
        fusion_outputs = self.multimodal_fusion(vision_features, text_features)
        
        # Combine all outputs
        outputs = {
            **fusion_outputs,
            'vision_features': vision_features,
            'text_features': text_features,
            'text_sequence': text_sequence
        }
        
        # Add attention weights if requested
        if return_attention:
            outputs['vision_attention'] = vision_attention
            outputs['fusion_attention'] = fusion_outputs['attention_weights']
        
        # Update inference statistics
        inference_time = time.time() - start_time
        self._update_inference_stats(inference_time)
        
        return outputs
    
    def _update_inference_stats(self, inference_time):
        """Update inference performance statistics."""
        self.inference_stats['total_inferences'] += 1
        
        # Update average inference time using exponential moving average
        alpha = 0.1  # Smoothing factor
        if self.inference_stats['total_inferences'] == 1:  # first call seeds the average
            self.inference_stats['avg_inference_time'] = inference_time
        else:
            self.inference_stats['avg_inference_time'] = (
                alpha * inference_time + 
                (1 - alpha) * self.inference_stats['avg_inference_time']
            )
        
        self.inference_stats['last_inference_time'] = inference_time
    
    def predict(self, images, input_ids, attention_mask=None, return_confidence=True):
        """
        High-level prediction method for inference.
        
        Args:
            images: Input images
            input_ids: Text token IDs
            attention_mask: Text attention mask
            return_confidence: Whether to return confidence scores
            
        Returns:
            predictions: Dictionary with predicted classes and scores
        """
        self.eval()
        
        with torch.no_grad():
            # Call the module itself (not .forward) so registered hooks still run
            outputs = self(images, input_ids, attention_mask)
            
            # Extract predictions
            content_pred = outputs['content_score'].argmax(dim=1)
            sentiment_pred = outputs['sentiment'].argmax(dim=1)
            topic_pred = outputs['topic'].argmax(dim=1)
            
            predictions = {
                'content_class': content_pred.cpu().numpy(),
                'sentiment_class': sentiment_pred.cpu().numpy(),
                'topic_class': topic_pred.cpu().numpy(),
                'content_probs': outputs['content_score'].cpu().numpy(),
                'sentiment_probs': outputs['sentiment'].cpu().numpy(),
                'topic_probs': outputs['topic'].cpu().numpy()
            }
            
            if return_confidence:
                predictions['confidence'] = outputs['confidence'].cpu().numpy()
            
        return predictions
    
    def get_model_info(self):
        """Get comprehensive model information and statistics."""
        total_params = sum(p.numel() for p in self.parameters())
        trainable_params = sum(p.numel() for p in self.parameters() if p.requires_grad)
        
        # Component parameter counts
        vision_params = sum(p.numel() for p in self.vision_encoder.parameters())
        text_params = sum(p.numel() for p in self.text_encoder.parameters())
        fusion_params = sum(p.numel() for p in self.multimodal_fusion.parameters())
        
        model_info = {
            'model_version': self.model_version,
            'creation_time': self.creation_time,
            'config': self.config,
            'parameters': {
                'total_parameters': total_params,
                'trainable_parameters': trainable_params,
                'vision_parameters': vision_params,
                'text_parameters': text_params,
                'fusion_parameters': fusion_params
            },
            'architecture': {
                'vision_encoder': 'ResNet50 + Attention',
                'text_encoder': 'Transformer Encoder (6 layers)',
                'fusion': 'Cross-Attention Multi-Modal Fusion',
                'tasks': ['content_classification', 'sentiment_analysis', 'topic_classification']
            },
            'performance': self.inference_stats,
            'training_step': self.training_step
        }
        
        return model_info
    
    def save_model(self, save_path, include_optimizer=False, optimizer_state=None):
        """
        Save model with comprehensive metadata.
        
        Args:
            save_path: Path to save the model
            include_optimizer: Whether to save optimizer state
            optimizer_state: Optimizer state dict if including
        """
        save_dict = {
            'model_state_dict': self.state_dict(),
            'model_info': self.get_model_info(),
            'config': self.config,
            'creation_time': self.creation_time,
            'save_time': datetime.now().isoformat()
        }
        
        if include_optimizer and optimizer_state:
            save_dict['optimizer_state_dict'] = optimizer_state
        
        torch.save(save_dict, save_path)
        print(f"💾 Model saved to: {save_path}")
    
    @classmethod
    def load_model(cls, load_path, device=None):
        """
        Load model from saved checkpoint.
        
        Args:
            load_path: Path to saved model
            device: Device to load model on
            
        Returns:
            model: Loaded model instance
            model_info: Model information
        """
        if device is None:
            device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        
        checkpoint = torch.load(load_path, map_location=device)
        
        # Extract configuration
        config = checkpoint.get('config', {})
        
        # Create model instance
        model = cls(**config)
        model.load_state_dict(checkpoint['model_state_dict'])
        model.to(device)
        
        # Restore metadata
        if 'model_info' in checkpoint:
            model.creation_time = checkpoint['model_info']['creation_time']
            model.training_step = checkpoint['model_info'].get('training_step', 0)
        
        print(f"📥 Model loaded from: {load_path}")
        return model, checkpoint.get('model_info', {})

# Initialize and test the complete system
print("\n🧠 Initializing Complete Intelligent Content Analyzer...")

# Create the complete model
vocab_size = 10000  # This would be determined from actual vocabulary
model = IntelligentContentAnalyzer(
    vocab_size=vocab_size,
    num_content_classes=3,
    vision_dim=512,
    text_dim=512,
    fusion_dim=256
)

# Get model information
model_info = model.get_model_info()

print(f"\n📊 Model Architecture Summary:")
print(f"  🏗️ Total parameters: {model_info['parameters']['total_parameters']:,}")
print(f"  🎯 Trainable parameters: {model_info['parameters']['trainable_parameters']:,}")
print(f"  👁️ Vision parameters: {model_info['parameters']['vision_parameters']:,}")
print(f"  📝 Text parameters: {model_info['parameters']['text_parameters']:,}")
print(f"  🔗 Fusion parameters: {model_info['parameters']['fusion_parameters']:,}")

print(f"\n🎯 Supported Tasks:")
for task in model_info['architecture']['tasks']:
    print(f"  • {task.replace('_', ' ').title()}")

# Test complete system with sample data
print(f"\n🧪 Testing Complete System...")

batch_size = 2
test_images = torch.randn(batch_size, 3, 224, 224)
test_input_ids = torch.randint(1, vocab_size, (batch_size, 128))
test_attention_mask = torch.ones(batch_size, 128)

# Test forward pass
model.eval()
with torch.no_grad():
    outputs = model(test_images, test_input_ids, test_attention_mask, return_attention=True)

print(f"  Input images shape: {test_images.shape}")
print(f"  Input text shape: {test_input_ids.shape}")
print(f"  Content predictions shape: {outputs['content_score'].shape}")
print(f"  Sentiment predictions shape: {outputs['sentiment'].shape}")
print(f"  Topic predictions shape: {outputs['topic'].shape}")
print(f"  Confidence scores shape: {outputs['confidence'].shape}")

# Test prediction method
predictions = model.predict(test_images, test_input_ids, test_attention_mask)
print(f"  Prediction method output keys: {list(predictions.keys())}")

print(f"  ✅ Complete system working correctly")
print(f"  ⚡ Average inference time: {model.inference_stats['avg_inference_time']:.4f}s")

# Save model for later use
model_save_path = capstone_dir / 'models' / 'intelligent_content_analyzer_v1.pth'
model_save_path.parent.mkdir(parents=True, exist_ok=True)  # make sure the target directory exists
model.save_model(model_save_path)
```
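
The `save_model` / `load_model` pair above stores the constructor arguments next to the weights, so a checkpoint can rebuild its own architecture through `cls(**config)`. The round trip can be sketched on a small stand-in model. `TinyModel`, `save_checkpoint`, and `load_checkpoint` are hypothetical names used only for this illustration, and the sketch keeps just the two checkpoint keys needed to show the pattern.

```python
import tempfile
from pathlib import Path

import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for IntelligentContentAnalyzer."""

    def __init__(self, in_dim: int = 8, num_classes: int = 3):
        super().__init__()
        # Keep the constructor arguments so the checkpoint can rebuild the model
        self.config = {'in_dim': in_dim, 'num_classes': num_classes}
        self.head = nn.Linear(in_dim, num_classes)

    def forward(self, x):
        return self.head(x)


def save_checkpoint(model, path):
    """Store the weights together with the configuration that produced them."""
    torch.save({'model_state_dict': model.state_dict(), 'config': model.config}, path)


def load_checkpoint(path, device='cpu'):
    """Rebuild the architecture from the saved config, then restore the weights."""
    checkpoint = torch.load(path, map_location=device)
    model = TinyModel(**checkpoint['config'])
    model.load_state_dict(checkpoint['model_state_dict'])
    return model.to(device).eval()


original = TinyModel(in_dim=8, num_classes=3).eval()
with tempfile.TemporaryDirectory() as tmp:
    ckpt_path = Path(tmp) / 'tiny.pth'
    save_checkpoint(original, ckpt_path)
    restored = load_checkpoint(ckpt_path)

x = torch.randn(2, 8)
with torch.no_grad():
    same_output = torch.allclose(original(x), restored(x))
print(f"Outputs match after reload: {same_output}")
```

Because the configuration travels with the weights, the loading code does not need to know the layer sizes in advance. The trade-off is that the config keys must stay in sync with the constructor signature.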

---

## 4. Dataset Creation and Processing

### 4.1 Multi-Modal Dataset Implementation
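
Before the full dataset class, it may help to see the encode, truncate, and pad step in isolation, since the attention mask depends on it. The sketch below mirrors the logic of `_tokenize_text` in the class that follows, but as a standalone function. The name `encode` and the six-entry `toy_vocab` are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

PAD, UNK, START, END = '<PAD>', '<UNK>', '<START>', '<END>'


def encode(words: List[str], vocab: Dict[str, int], max_length: int) -> Tuple[List[int], List[int]]:
    """Map words to IDs, wrap them in START/END, then truncate or pad to max_length."""
    ids = [vocab.get(word, vocab[UNK]) for word in words]  # unknown words map to <UNK>
    ids = ids[:max_length - 2]                             # leave room for START and END
    ids = [vocab[START]] + ids + [vocab[END]]
    mask = [1] * len(ids)                                  # 1 marks real tokens
    padding = max_length - len(ids)
    ids += [vocab[PAD]] * padding
    mask += [0] * padding                                  # 0 marks padding positions
    return ids, mask


toy_vocab = {PAD: 0, UNK: 1, START: 2, END: 3, 'great': 4, 'camera': 5}
ids, mask = encode(['great', 'camera', 'zoom'], toy_vocab, max_length=8)
print(ids)   # [2, 4, 5, 1, 3, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```

The word `zoom` is not in the toy vocabulary, so it becomes the `<UNK>` ID, and the three trailing zeros in the mask tell the text encoder which positions to ignore.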

```python
class ContentDataset(Dataset):
    """
    Comprehensive multi-modal dataset for content analysis training.
    
    This dataset handles image-text pairs with multiple annotation types
    including content scores, sentiment labels, and topic categories.
    
    Args:
        data_dir (Path): Directory containing dataset
        split (str): Dataset split ('train', 'val', 'test')
        transform: Image transformations
        max_text_length (int): Maximum text sequence length
        augment_text (bool): Whether to apply text augmentation
    """
    
    def __init__(self, data_dir: Path, split: str = "train", transform=None, 
                 max_text_length: int = 512, augment_text: bool = False):
        self.data_dir = data_dir
        self.split = split
        self.transform = transform
        self.max_text_length = max_text_length
        self.augment_text = augment_text
        
        # Create comprehensive synthetic dataset
        self.samples = self._create_comprehensive_dataset()
        
        # Build vocabulary from all text data
        self.vocab = self._build_robust_vocabulary()
        self.vocab_size = len(self.vocab)
        
        # Dataset statistics
        self.statistics = self._compute_dataset_statistics()
        
        print(f"📊 {split.title()} Dataset created:")
        print(f"  Samples: {len(self.samples)}")
        print(f"  Vocabulary size: {self.vocab_size}")
        print(f"  Average text length: {self.statistics['avg_text_length']:.1f}")
        
    def _create_comprehensive_dataset(self) -> List[Dict[str, Any]]:
        """Create a comprehensive synthetic dataset with realistic diversity."""
        
        # Expanded content templates for better diversity
        content_templates = {
            'positive': [
                "This {product} is absolutely amazing! The {feature} works perfectly and {benefit}.",
                "Incredible {product} with outstanding {feature}. Highly recommend for {use_case}!",
                "Love this {product}! The {feature} exceeded my expectations and {benefit}.",
                "Fantastic {product} that delivers on all promises. {feature} is revolutionary!",
                "Best {product} I've ever used. The {feature} is game-changing and {benefit}.",
                "Exceptional quality {product} with superior {feature}. Perfect for {use_case}!",
                "Outstanding {product} that combines {feature} with {benefit}. Five stars!",
                "This {product} is a masterpiece. The {feature} is innovative and {benefit}."
            ],
            'negative': [
                "Terrible {product} with poor {feature}. {complaint} and not worth the money.",
                "Disappointing {product} that fails to deliver. {feature} is broken and {complaint}.",
                "Worst {product} experience ever. {feature} doesn't work and {complaint}.",
                "Poor quality {product} with defective {feature}. {complaint} and frustrating.",
                "Overpriced {product} with inadequate {feature}. {complaint} and unreliable.",
                "Faulty {product} that breaks easily. {feature} is useless and {complaint}.",
                "Horrible {product} with terrible {feature}. {complaint} and poor service.",
                "Waste of money on this {product}. {feature} failed immediately and {complaint}."
            ],
            'neutral': [
                "This {product} has decent {feature} but could be improved. {observation}.",
                "Average {product} with standard {feature}. Works as expected for {use_case}.",
                "The {product} is okay. {feature} is functional but {observation}.",
                "Standard {product} with basic {feature}. {observation} and reasonably priced.",
                "This {product} meets basic requirements. {feature} is adequate for {use_case}.",
                "Regular {product} with typical {feature}. {observation} but nothing special.",
                "The {product} is fine for {use_case}. {feature} works but {observation}.",
                "Moderate quality {product} with standard {feature}. {observation}."
            ]
        }
        
        # Vocabulary for template filling
        products = ['smartphone', 'laptop', 'camera', 'headphones', 'tablet', 'watch', 
                   'speaker', 'keyboard', 'mouse', 'monitor', 'printer', 'router']
        
        features = ['battery life', 'display quality', 'sound quality', 'build quality',
                   'performance', 'design', 'connectivity', 'user interface', 'durability',
                   'functionality', 'compatibility', 'ease of use']
        
        benefits = ['saves time daily', 'improves productivity', 'enhances experience',
                   'provides great value', 'offers convenience', 'delivers reliability',
                   'ensures satisfaction', 'meets all needs', 'exceeds expectations']
        
        complaints = ['breaks easily', 'battery drains quickly', 'overheats frequently',
                     'has connectivity issues', 'lacks important features', 'poor customer support',
                     'delivery was delayed', 'instructions are unclear', 'hardware is defective']
        
        observations = ['nothing extraordinary', 'room for improvement', 'meets basic needs',
                       'standard for the price', 'could have more features', 'acceptable quality',
                       'depends on personal preference', 'adequate for most users']
        
        use_cases = ['professional work', 'daily use', 'creative projects', 'entertainment',
                    'business meetings', 'travel', 'home office', 'gaming', 'education']
        
        # Topic categories with descriptions
        topics = {
            'technology': 'Tech products and innovations',
            'fashion': 'Clothing and style items',
            'food': 'Restaurants and culinary experiences',
            'travel': 'Tourism and travel experiences',
            'sports': 'Athletic equipment and events',
            'education': 'Learning tools and courses',
            'entertainment': 'Movies, games, and shows',
            'health': 'Wellness and medical products',
            'business': 'Professional services and tools',
            'lifestyle': 'Home and personal items'
        }
        
        # Determine dataset size based on split
        base_size = 1000
        if self.split == 'train':
            dataset_size = base_size
        elif self.split == 'val':
            dataset_size = base_size // 5
        else:  # test
            dataset_size = base_size // 10
        
        samples = []
        
        for i in range(dataset_size):
            # Randomly select content type with slight imbalance
            content_weights = [0.4, 0.3, 0.3]  # positive, negative, neutral
            content_type = np.random.choice([0, 1, 2], p=content_weights)
            content_labels = ['positive', 'negative', 'neutral']
            content_label = content_labels[content_type]
            
            # Generate text using templates
            template = np.random.choice(content_templates[content_label])
            
            # Fill template with random choices
            text = template.format(
                product=np.random.choice(products),
                feature=np.random.choice(features),
                benefit=np.random.choice(benefits),
                complaint=np.random.choice(complaints),
                observation=np.random.choice(observations),
                use_case=np.random.choice(use_cases)
            )
            
            # Add sample-specific variation
            text += f" Sample {i} from {self.split} set."
            
            # Apply text augmentation if enabled (for training set)
            if self.augment_text and self.split == 'train' and np.random.random() < 0.3:
                text = self._augment_text(text)
            
            # Create sentiment vector (one-hot)
            sentiment = [0.0, 0.0, 0.0]
            sentiment[content_type] = 1.0
            
            # Random topic assignment with some correlation to content
            topic_names = list(topics.keys())
            if content_label == 'positive' and np.random.random() < 0.6:
                # Positive content is skewed toward technology, lifestyle, and entertainment
                topic_name = np.random.choice(['technology', 'lifestyle', 'entertainment'])
            else:
                # Negative and neutral content is spread uniformly across all topics
                topic_name = np.random.choice(topic_names)
            
            topic_idx = topic_names.index(topic_name)
            topic_vector = [0.0] * len(topic_names)
            topic_vector[topic_idx] = 1.0
            
            # Generate synthetic image (in practice, load real images)
            # Create slightly different patterns based on content type
            if content_type == 0:  # positive
                image_data = torch.randn(3, 224, 224) * 0.5 + 0.7  # Brighter images
            elif content_type == 1:  # negative
                image_data = torch.randn(3, 224, 224) * 0.3 + 0.3  # Darker images
            else:  # neutral
                image_data = torch.randn(3, 224, 224) * 0.4 + 0.5  # Medium brightness
            
            # Ensure proper image range
            image_data = torch.clamp(image_data, 0, 1)
            
            sample = {
                'text': text,
                'image': image_data,
                'content_score': content_type,
                'content_label': content_label,
                'sentiment': sentiment,
                'topic': topic_vector,
                'topic_name': topic_name,
                'topic_description': topics[topic_name],
                'sample_id': f"{self.split}_{i:06d}",
                'text_length': len(text.split()),
                'split': self.split
            }
            samples.append(sample)
        
        return samples
    
    def _augment_text(self, text: str) -> str:
        """Apply simple text augmentation techniques."""
        augmentation_techniques = [
            lambda x: x.replace('.', '!'),  # Change punctuation
            lambda x: x.replace(' and ', ' & '),  # Abbreviate conjunctions
            lambda x: x.replace('very ', ''),  # Remove intensifiers
            lambda x: x.replace('really ', ''),  # Remove filler words
            lambda x: x + ' That sums up my experience.',  # Add a sentiment-neutral conclusion
        ]
        
        # Apply random augmentation
        technique = np.random.choice(augmentation_techniques)
        return technique(text)
    
    def _build_robust_vocabulary(self) -> Dict[str, int]:
        """Build a comprehensive vocabulary from all text data."""
        vocab = {
            '<PAD>': 0, '<UNK>': 1, '<START>': 2, '<END>': 3,
            '<MASK>': 4, '<NUM>': 5, '<PUNCT>': 6
        }
        
        # Collect all words from samples
        word_counts = Counter()
        for sample in self.samples:
            # Preprocess text
            text = sample['text'].lower()
            # Strip digits so numbers do not create ordinary vocabulary entries
            # ('<NUM>' is already reserved for them)
            text = re.sub(r'\d+', ' ', text)
            # Extract words
            words = re.findall(r'\b\w+\b', text)
            word_counts.update(words)
        
        # Add words to vocabulary (keep most frequent words)
        min_frequency = 2 if self.split == 'train' else 1
        for word, count in word_counts.most_common():
            if count >= min_frequency:
                vocab[word] = len(vocab)
        
        # Add special domain vocabulary
        domain_words = [
            'excellent', 'amazing', 'fantastic', 'terrible', 'horrible', 'poor',
            'quality', 'product', 'service', 'price', 'delivery', 'customer',
            'support', 'recommend', 'satisfaction', 'experience', 'performance'
        ]
        
        for word in domain_words:
            if word not in vocab:
                vocab[word] = len(vocab)
        
        return vocab
    
    def _compute_dataset_statistics(self) -> Dict[str, float]:
        """Compute comprehensive dataset statistics."""
        text_lengths = [sample['text_length'] for sample in self.samples]
        content_distribution = Counter(sample['content_label'] for sample in self.samples)
        topic_distribution = Counter(sample['topic_name'] for sample in self.samples)
        
        stats = {
            'avg_text_length': np.mean(text_lengths),
            'median_text_length': np.median(text_lengths),
            'std_text_length': np.std(text_lengths),
            'min_text_length': np.min(text_lengths),
            'max_text_length': np.max(text_lengths),
            'content_distribution': dict(content_distribution),
            'topic_distribution': dict(topic_distribution),
            'vocabulary_size': self.vocab_size,
            'total_samples': len(self.samples)
        }
        
        return stats
    
    def _preprocess_text(self, text: str) -> str:
        """Preprocess text for tokenization."""
        # Convert to lowercase
        text = text.lower()
        
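        # Replace each punctuation character with a standalone special token.
        # This runs before number replacement so the angle brackets of '<NUM>' are not rewritten.
        text = re.sub(r'[^\w\s]', ' <PUNCT> ', text)
        
        # Replace numbers with a standalone special token
        text = re.sub(r'\d+', ' <NUM> ', text)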
        
        # Clean extra whitespace
        text = ' '.join(text.split())
        
        return text
    
    def _tokenize_text(self, text: str) -> Tuple[List[int], List[int]]:
        """Tokenize text and create attention mask with robust preprocessing."""
        # Preprocess text
        processed_text = self._preprocess_text(text)
        words = processed_text.split()
        
        # Convert to token IDs
        token_ids = []
        for word in words:
            if word in self.vocab:
                token_ids.append(self.vocab[word])
            else:
                token_ids.append(self.vocab['<UNK>'])
        
        # Truncate, leaving room for the <START> and <END> tokens
        if len(token_ids) > self.max_text_length - 2:
            token_ids = token_ids[:self.max_text_length - 2]
        
        # Add start and end tokens
        token_ids = [self.vocab['<START>']] + token_ids + [self.vocab['<END>']]
        
        # Create attention mask (1 for real tokens, 0 for padding)
        attention_mask = [1] * len(token_ids)
        
        # Pad to max length
        while len(token_ids) < self.max_text_length:
            token_ids.append(self.vocab['<PAD>'])
            attention_mask.append(0)
        
        return token_ids, attention_mask
    
    def get_class_weights(self) -> torch.Tensor:
        """Calculate class weights for balanced training."""
        content_counts = Counter(sample['content_score'] for sample in self.samples)
        total_samples = len(self.samples)
        
        # Calculate inverse frequency weights
        weights = []
        for i in range(3):  # 3 content classes
            count = content_counts.get(i, 1)
            weight = total_samples / (3 * count)
            weights.append(weight)
        
        return torch.tensor(weights, dtype=torch.float32)
    
    def get_dataset_info(self) -> Dict[str, Any]:
        """Get comprehensive dataset information."""
        return {
            'split': self.split,
            'size': len(self.samples),
            'vocab_size': self.vocab_size,
            'max_text_length': self.max_text_length,
            'statistics': self.statistics,
            'class_names': {
                'content': ['positive', 'negative', 'neutral'],
                'sentiment': ['positive', 'negative', 'neutral'],
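                # Sorted by class index (position in the one-hot 'topic' vector) instead of
                # arbitrary set order; topics absent from this split are omitted
                'topics': [
                    str(name) for _, name in sorted(
                        {(sample['topic'].index(1.0), sample['topic_name']) for sample in self.samples}
                    )
                ]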
            }
        }
    
    def __len__(self):
        return len(self.samples)
    
    def __getitem__(self, idx):
        sample = self.samples[idx]
        
        # Process image
        image = sample['image']
        if self.transform:
            # Convert to PIL Image for transforms
            if isinstance(image, torch.Tensor):
                image_np = image.permute(1, 2, 0).numpy()
                image_pil = Image.fromarray((image_np * 255).astype(np.uint8))
                image = self.transform(image_pil)
        
        # Process text
        token_ids, attention_mask = self._tokenize_text(sample['text'])
        
        return {
            'image': image,
            'input_ids': torch.tensor(token_ids, dtype=torch.long),
            'attention_mask': torch.tensor(attention_mask, dtype=torch.long),
            'content_score': torch.tensor(sample['content_score'], dtype=torch.long),
            'sentiment': torch.tensor(sample['sentiment'], dtype=torch.float),
            'topic': torch.tensor(sample['topic'], dtype=torch.float),
            'text': sample['text'],
            'topic_name': sample['topic_name'],
            'sample_id': sample['sample_id'],
            'content_label': sample['content_label']
        }
```

### 4.2 Data Loader Creation and Analysis

```python
def create_comprehensive_data_loaders(data_dir: Path, batch_size: int = 32, 
                                    num_workers: int = 4) -> Tuple[DataLoader, DataLoader, DataLoader, int]:
    """Create comprehensive data loaders with advanced transforms and analysis."""
    
    # Advanced image transformations for training
    train_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomRotation(degrees=15),
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
        transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    
    # Standard transforms for validation and testing
    val_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    
    # Create datasets with different configurations
    print("  Creating training dataset with augmentation...")
    train_dataset = ContentDataset(
        data_dir, split="train", transform=train_transform, 
        max_text_length=256, augment_text=True
    )
    
    print("  Creating validation dataset...")
    val_dataset = ContentDataset(
        data_dir, split="val", transform=val_transform, 
        max_text_length=256, augment_text=False
    )
    
    print("  Creating test dataset...")
    test_dataset = ContentDataset(
        data_dir, split="test", transform=val_transform, 
        max_text_length=256, augment_text=False
    )
    
    # Ensure all datasets use the same vocabulary (from training set)
    for dataset in (val_dataset, test_dataset):
        dataset.vocab = train_dataset.vocab
        dataset.vocab_size = train_dataset.vocab_size
        # Keep the cached statistics consistent with the shared vocabulary
        dataset.statistics['vocabulary_size'] = train_dataset.vocab_size
    
    # Create data loaders with optimized settings
    train_loader = DataLoader(
        train_dataset, 
        batch_size=batch_size, 
        shuffle=True, 
        num_workers=num_workers, 
        pin_memory=True, 
        drop_last=True,
        persistent_workers=num_workers > 0
    )
    
    val_loader = DataLoader(
        val_dataset, 
        batch_size=batch_size, 
        shuffle=False,
        num_workers=num_workers, 
        pin_memory=True,
        persistent_workers=num_workers > 0
    )
    
    test_loader = DataLoader(
        test_dataset, 
        batch_size=batch_size, 
        shuffle=False,
        num_workers=num_workers, 
        pin_memory=True,
        persistent_workers=num_workers > 0
    )
    
    # Print comprehensive dataset information
    print(f"\n📈 Dataset Statistics:")
    for name, dataset in [("Train", train_dataset), ("Val", val_dataset), ("Test", test_dataset)]:
        info = dataset.get_dataset_info()
        stats = info['statistics']
        print(f"  {name} Dataset:")
        print(f"    📊 Size: {info['size']} samples")
        print(f"    📝 Avg text length: {stats['avg_text_length']:.1f} words")
        print(f"    🎯 Content distribution: {stats['content_distribution']}")
        print(f"    📚 Topic distribution: {len(stats['topic_distribution'])} categories")
    
    return train_loader, val_loader, test_loader, train_dataset.vocab_size

# Demonstration: Create and analyze datasets
print("\n📊 Creating Multi-Modal Content Datasets...")

# Create data loaders
data_dir = capstone_dir / 'data'
data_dir.mkdir(exist_ok=True)

train_loader, val_loader, test_loader, vocab_size = create_comprehensive_data_loaders(
    data_dir, batch_size=16, num_workers=2
)

print(f"\n✅ Data loaders created successfully!")
print(f"  🔤 Vocabulary size: {vocab_size:,}")
print(f"  📦 Batch size: {train_loader.batch_size}")
print(f"  🔄 Training batches: {len(train_loader)}")
print(f"  📊 Validation batches: {len(val_loader)}")
print(f"  🧪 Test batches: {len(test_loader)}")

# Analyze a sample batch
print(f"\n🔍 Analyzing Sample Batch...")
sample_batch = next(iter(train_loader))

print(f"  Image batch shape: {sample_batch['image'].shape}")
print(f"  Input IDs shape: {sample_batch['input_ids'].shape}")
print(f"  Attention mask shape: {sample_batch['attention_mask'].shape}")
print(f"  Content scores shape: {sample_batch['content_score'].shape}")
print(f"  Sentiment shape: {sample_batch['sentiment'].shape}")
print(f"  Topic shape: {sample_batch['topic'].shape}")

# Display sample content
print(f"\n📝 Sample Content:")
print(f"  Text: '{sample_batch['text'][0][:100]}...'")
print(f"  Content label: {sample_batch['content_label'][0]}")
print(f"  Topic: {sample_batch['topic_name'][0]}")
print(f"  Sample ID: {sample_batch['sample_id'][0]}")

# Save dataset metadata
dataset_metadata = {
    'creation_time': datetime.now().isoformat(),
    'vocab_size': vocab_size,
    'splits': {
        'train': len(train_loader.dataset),
        'val': len(val_loader.dataset),
        'test': len(test_loader.dataset)
    },
    'batch_size': train_loader.batch_size,
    'max_text_length': 256,
    'image_size': (224, 224),
    'num_content_classes': 3,
    'num_sentiment_classes': 3,
    'num_topic_classes': 10
}

with open(capstone_dir / 'data' / 'dataset_metadata.json', 'w') as f:
    json.dump(dataset_metadata, f, indent=2)

print(f"💾 Dataset metadata saved to: {capstone_dir / 'data' / 'dataset_metadata.json'}")
```
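
The batch inspection above relies on how `DataLoader` collates dict-style samples: tensor fields are stacked along a new batch dimension, string fields come back as a plain list, and `drop_last=True` changes the number of batches. The following is a minimal sketch of that behavior. `ToyContentDataset` is a hypothetical stand-in, not the real `ContentDataset`, and its field names and shapes are assumptions chosen to mirror the sample batch printed above.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyContentDataset(Dataset):
    """Hypothetical stand-in for ContentDataset: returns one dict per sample."""

    def __init__(self, num_samples: int = 10):
        self.num_samples = num_samples

    def __len__(self) -> int:
        return self.num_samples

    def __getitem__(self, idx: int) -> dict:
        return {
            "image": torch.zeros(3, 224, 224),                # C x H x W image tensor
            "input_ids": torch.zeros(256, dtype=torch.long),  # padded token ids
            "text": f"sample text {idx}",                     # raw string field
        }


toy_dataset = ToyContentDataset(num_samples=10)

# drop_last=True discards the final incomplete batch: 10 // 4 = 2 batches.
train_like = DataLoader(toy_dataset, batch_size=4, shuffle=True, drop_last=True)
# Without drop_last the partial batch is kept: ceil(10 / 4) = 3 batches.
eval_like = DataLoader(toy_dataset, batch_size=4, shuffle=False)

batch = next(iter(eval_like))
print(len(train_like), len(eval_like))
print(batch["image"].shape)      # tensors are stacked along dim 0
print(batch["input_ids"].shape)
print(batch["text"])             # strings are returned as a list
```

Because the training loader drops the last partial batch, `len(train_loader)` can be smaller than `len(val_loader)` would be for a split of the same size.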

---

## 5. Advanced Training Framework

### 5.1 Multi-Task Loss with Automatic Weighting

```python
class MultiTaskLoss(nn.Module):
    """
    Multi-task loss with optional learned (uncertainty-based) task weighting.
    
    When ``learn_weights`` is True, one log-variance per task is learned and used
    to weight that task, following "Multi-Task Learning Using Uncertainty to Weigh
    Losses for Scene Geometry and Semantics" (Kendall et al.). Otherwise the task
    losses are combined with fixed, normalized weights.
    
    Args:
        num_tasks (int): Number of tasks to balance (forward() expects exactly 3:
            content, sentiment, topic)
        learn_weights (bool): Whether to learn task weights automatically
        init_weights (List[float]): Fixed task weights used when learn_weights is
            False (defaults to equal weights)
    """
    
    def __init__(self, num_tasks: int = 3, learn_weights: bool = True, 
                 init_weights: Optional[List[float]] = None):
        super().__init__()
        self.num_tasks = num_tasks
        self.learn_weights = learn_weights
        
        if learn_weights:
            # Learnable log variance parameters for uncertainty-based weighting
            self.log_vars = nn.Parameter(torch.zeros(num_tasks))
        else:
            # Fixed weights, stored as negative logs so that exp(-log_vars)
            # in forward() recovers the weights themselves
            if init_weights is None:
                init_weights = [1.0] * num_tasks
            self.register_buffer(
                'log_vars',
                -torch.log(torch.tensor(init_weights, dtype=torch.float32))
            )
        
        # Loss functions for each task
        self.content_loss_fn = nn.CrossEntropyLoss()
        self.sentiment_loss_fn = nn.MSELoss()
        self.topic_loss_fn = nn.MSELoss()
        
        # Track loss history for analysis
        self.loss_history = {
            'content': [],
            'sentiment': [],
            'topic': [],
            'total': [],
            'weights': []
        }
    
    def forward(self, content_pred, sentiment_pred, topic_pred,
                content_target, sentiment_target, topic_target):
        """
        Compute multi-task loss with automatic weighting.
        
        Args:
            content_pred: Content classification predictions
            sentiment_pred: Sentiment analysis predictions  
            topic_pred: Topic classification predictions
            content_target: Content ground truth labels
            sentiment_target: Sentiment ground truth labels
            topic_target: Topic ground truth labels
            
        Returns:
            loss_dict: Dictionary containing individual and total losses
        """
        # Compute individual task losses
        content_loss = self.content_loss_fn(content_pred, content_target)
        sentiment_loss = self.sentiment_loss_fn(sentiment_pred, sentiment_target)
        topic_loss = self.topic_loss_fn(topic_pred, topic_target)
        
        # Stack losses for processing
        losses = torch.stack([content_loss, sentiment_loss, topic_loss])
        
        if self.learn_weights:
            # Uncertainty-based automatic weighting with s_i = log(σ_i²):
            # total = Σ_i [exp(-s_i) * L_i + s_i]
            # (a common simplification of the objective in Kendall et al.)
            precision = torch.exp(-self.log_vars)  # 1/σ²
            weighted_losses = precision * losses + self.log_vars
            total_loss = weighted_losses.sum()
            
            # Current task weights (higher precision = higher weight)
            current_weights = precision / precision.sum()
        else:
            # Fixed weighting
            weights = torch.exp(-self.log_vars)
            weights = weights / weights.sum()  # Normalize
            total_loss = (weights * losses).sum()
            current_weights = weights
        
        # Update loss history
        self.loss_history['content'].append(content_loss.item())
        self.loss_history['sentiment'].append(sentiment_loss.item())
        self.loss_history['topic'].append(topic_loss.item())
        self.loss_history['total'].append(total_loss.item())
        self.loss_history['weights'].append(current_weights.detach().cpu().numpy())
        
        # Prepare output dictionary
        loss_dict = {
            'total_loss': total_loss,
            'content_loss': content_loss,
            'sentiment_loss': sentiment_loss,
            'topic_loss': topic_loss,
            'task_weights': current_weights.detach(),  # detached: used for logging only
            'log_vars': self.log_vars.detach() if self.learn_weights else None
        }
        
        return loss_dict
    
    def get_loss_summary(self):
        """Get summary statistics of loss history."""
        if not self.loss_history['total']:
            return None
        
        summary = {}
        for task, losses in self.loss_history.items():
            if task != 'weights' and losses:
                summary[task] = {
                    'current': losses[-1],
                    'mean': np.mean(losses[-100:]),  # Last 100 steps
                    'std': np.std(losses[-100:]),
                    'min': min(losses),
                    'max': max(losses)
                }
        
        # Average weights over last 100 steps
        if self.loss_history['weights']:
            recent_weights = np.array(self.loss_history['weights'][-100:])
            summary['avg_weights'] = {
                'content': np.mean(recent_weights[:, 0]),
                'sentiment': np.mean(recent_weights[:, 1]),
                'topic': np.mean(recent_weights[:, 2])
            }
        
        return summary

# Test multi-task loss
print("\n⚖️ Testing Multi-Task Loss Function...")
multi_task_loss = MultiTaskLoss(num_tasks=3, learn_weights=True)

# Create test predictions and targets
batch_size = 4
test_content_pred = torch.randn(batch_size, 3)
test_sentiment_pred = torch.randn(batch_size, 3)
test_topic_pred = torch.randn(batch_size, 10)
test_content_target = torch.randint(0, 3, (batch_size,))
test_sentiment_target = torch.randn(batch_size, 3)
test_topic_target = torch.randn(batch_size, 10)

# Test loss computation
loss_output = multi_task_loss(
    test_content_pred, test_sentiment_pred, test_topic_pred,
    test_content_target, test_sentiment_target, test_topic_target
)

print(f"  Total loss: {loss_output['total_loss'].item():.4f}")
print(f"  Content loss: {loss_output['content_loss'].item():.4f}")
print(f"  Sentiment loss: {loss_output['sentiment_loss'].item():.4f}")
print(f"  Topic loss: {loss_output['topic_loss'].item():.4f}")
print(f"  Task weights: {loss_output['task_weights'].detach().cpu().numpy()}")
print(f"  ✅ Multi-task loss working correctly")
```
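
To see what the learned weighting converges to, it helps to hold the task losses fixed and optimize only the log-variances. For a single task the objective `exp(-s) * L + s` has zero gradient at `s = log(L)`, so the learned precision `exp(-s)` ends up at `1 / L`: tasks with larger losses receive smaller weights. The sketch below checks this numerically. The loss values, learning rate, and step count are arbitrary assumptions chosen for illustration, not values from the project.

```python
import torch

# Hypothetical per-task losses, held constant to isolate the effect on the weights.
fixed_losses = torch.tensor([0.5, 2.0, 8.0])

# s_i = log(sigma_i^2), initialized at zero as in MultiTaskLoss.
log_vars = torch.zeros(3, requires_grad=True)
optimizer = torch.optim.SGD([log_vars], lr=0.1)

for _ in range(500):
    optimizer.zero_grad()
    # Same weighting rule as the learn_weights branch of MultiTaskLoss.forward.
    total = (torch.exp(-log_vars) * fixed_losses + log_vars).sum()
    total.backward()
    optimizer.step()

learned = log_vars.detach()
precision = torch.exp(-learned)
weights = precision / precision.sum()

print("learned log-variances:", learned)
print("log of the losses:    ", torch.log(fixed_losses))
print("normalized weights:   ", weights)
```

In real training the losses change as the model improves, so the log-variances track a moving target rather than settling at a fixed point.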

### 5.2 Trainer Implementation
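
The trainer below divides each batch loss by `accumulation_steps` before calling `backward()`. With mean-reduced losses and equally sized micro-batches, this makes the accumulated gradient match the gradient of one large batch. The following is a minimal sketch of that equivalence; the linear model and random data are assumptions for illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(4, 1)
loss_fn = nn.MSELoss()  # mean reduction
inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)

# Reference: one backward pass over the full batch of 8 samples.
model.zero_grad()
loss_fn(model(inputs), targets).backward()
full_batch_grad = model.weight.grad.clone()

# Accumulation: 4 micro-batches of 2 samples, each loss divided by the step count.
accumulation_steps = 4
model.zero_grad()
for x_chunk, y_chunk in zip(inputs.chunk(accumulation_steps),
                            targets.chunk(accumulation_steps)):
    micro_loss = loss_fn(model(x_chunk), y_chunk) / accumulation_steps
    micro_loss.backward()  # gradients add up in .grad until zero_grad() is called
accumulated_grad = model.weight.grad.clone()

print(torch.allclose(full_batch_grad, accumulated_grad, atol=1e-6))
```

The equivalence holds for the gradient only. Layers that depend on batch statistics, such as batch normalization, still see the smaller micro-batches.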

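The training loop below keeps the best weights in memory for early stopping. One detail worth checking when doing this: `state_dict()` returns tensors that share storage with the live parameters, so a shallow `dict.copy()` keeps following later updates, while cloning each tensor freezes the snapshot. A minimal sketch, using a small `nn.Linear` as a hypothetical stand-in for the real model:

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 2)

# Shallow copy: a new dict, but its tensors still share storage with the model.
shallow_snapshot = model.state_dict().copy()
# Cloned copy: every tensor gets its own storage.
cloned_snapshot = {k: v.detach().clone() for k, v in model.state_dict().items()}
original_weight = model.weight.detach().clone()

# Simulate further training with an in-place parameter update.
with torch.no_grad():
    model.weight.add_(1.0)

shallow_tracks_model = torch.equal(shallow_snapshot["weight"], model.weight)
clone_is_frozen = torch.equal(cloned_snapshot["weight"], original_weight)
print(shallow_tracks_model, clone_is_frozen)
```

`copy.deepcopy(model.state_dict())` is an equivalent alternative to the dict comprehension.
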
```python
class AdvancedTrainer:
    """
    Training loop for the multi-modal, multi-task content model.
    
    The trainer combines the following techniques:
    - Mixed precision training
    - Gradient accumulation
    - Learning rate scheduling
    - Early stopping with patience
    - Comprehensive logging and monitoring
    - Model checkpointing
    - Evaluation metrics tracking
    
    Args:
        model: The model to train
        train_loader: Training data loader
        val_loader: Validation data loader
        test_loader: Test data loader
        experiment_dir: Directory for experiment artifacts
        config: Training configuration dictionary
    """
    
    def __init__(self, model, train_loader, val_loader, test_loader, 
                 experiment_dir: Path, config: Dict[str, Any]):
        self.model = model.to(device)
        self.train_loader = train_loader
        self.val_loader = val_loader
        self.test_loader = test_loader
        self.experiment_dir = experiment_dir
        self.config = config
        
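        # Initialize loss function on the same device as the model so the
        # log-variance weights can be combined with the task losses
        self.criterion = MultiTaskLoss(
            num_tasks=3, 
            learn_weights=config.get('learn_task_weights', True)
        ).to(device)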
        
        # Setup optimizers with different learning rates for different components
        self._setup_optimizers()
        
        # Setup learning rate schedulers
        self._setup_schedulers()
        
        # Mixed precision training (CUDA only; disabled when no GPU is available)
        self.use_amp = config.get('mixed_precision', True) and torch.cuda.is_available()
        self.scaler = torch.cuda.amp.GradScaler() if self.use_amp else None
        
        # Gradient accumulation
        self.accumulation_steps = config.get('gradient_accumulation_steps', 1)
        
        # Early stopping
        self.best_val_loss = float('inf')
        self.patience = config.get('patience', 10)
        self.patience_counter = 0
        self.min_delta = config.get('min_delta', 1e-4)
        
        # Training tracking
        self.epoch = 0
        self.global_step = 0
        self.train_history = []
        self.val_history = []
        
        # Setup comprehensive logging
        self.logger = self._setup_logging()
        
        # Metrics tracking
        self.metrics_tracker = {
            'train_loss': [],
            'val_loss': [],
            'content_accuracy': [],
            'sentiment_mse': [],
            'topic_mse': [],
            'learning_rates': [],
            'task_weights': []
        }
        
        print(f"🏋️ Advanced Trainer initialized:")
        print(f"  Mixed precision: {self.use_amp}")
        print(f"  Gradient accumulation: {self.accumulation_steps} steps")
        print(f"  Early stopping patience: {self.patience}")
        print(f"  Experiment directory: {experiment_dir}")
        
    def _setup_optimizers(self):
        """Setup optimizers with component-specific learning rates."""
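        # Get component parameters
        vision_params = list(self.model.vision_encoder.parameters())
        text_params = list(self.model.text_encoder.parameters())
        fusion_params = list(self.model.multimodal_fusion.parameters())
        # Learnable task-weight parameters of the loss (empty when weights are fixed)
        loss_params = list(self.criterion.parameters())
        
        param_groups = [
            {
                'params': vision_params, 
                'lr': self.config['vision_lr'], 
                'weight_decay': self.config.get('vision_weight_decay', 0.01),
                'name': 'vision'
            },
            {
                'params': text_params, 
                'lr': self.config['text_lr'], 
                'weight_decay': self.config.get('text_weight_decay', 0.01),
                'name': 'text'
            },
            {
                'params': fusion_params, 
                'lr': self.config['fusion_lr'], 
                'weight_decay': self.config.get('fusion_weight_decay', 0.005),
                'name': 'fusion'
            }
        ]
        if loss_params:
            # Without this group the log-variances would never be updated.
            # 'loss_lr' is an assumed config key; it falls back to the fusion LR.
            param_groups.append({
                'params': loss_params,
                'lr': self.config.get('loss_lr', self.config['fusion_lr']),
                'weight_decay': 0.0,
                'name': 'loss_weights'
            })
        
        # Create optimizer with different learning rates per group
        self.optimizer = optim.AdamW(param_groups, eps=1e-8, betas=(0.9, 0.999))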
        
        print(f"  Vision LR: {self.config['vision_lr']}")
        print(f"  Text LR: {self.config['text_lr']}")
        print(f"  Fusion LR: {self.config['fusion_lr']}")
    
    def _setup_schedulers(self):
        """Setup learning rate schedulers."""
        scheduler_type = self.config.get('scheduler', 'cosine_annealing')
        
        if scheduler_type == 'cosine_annealing':
            self.scheduler = optim.lr_scheduler.CosineAnnealingWarmRestarts(
                self.optimizer, 
                T_0=self.config.get('T_0', 10), 
                T_mult=self.config.get('T_mult', 2), 
                eta_min=self.config.get('eta_min', 1e-6)
            )
        elif scheduler_type == 'reduce_on_plateau':
            self.scheduler = optim.lr_scheduler.ReduceLROnPlateau(
                self.optimizer, 
                mode='min', 
                factor=0.5, 
                patience=5  # 'verbose' is deprecated in recent PyTorch releases and omitted here
            )
        else:
            self.scheduler = optim.lr_scheduler.StepLR(
                self.optimizer, 
                step_size=10, 
                gamma=0.1
            )
        
        print(f"  Scheduler: {scheduler_type}")
    
    def _setup_logging(self):
        """Setup comprehensive logging system."""
        logger = logging.getLogger(f'AdvancedTrainer_{id(self)}')
        logger.setLevel(logging.INFO)
        
        # Clear existing handlers
        logger.handlers = []
        
        # File handler
        log_file = self.experiment_dir / 'training.log'
        file_handler = logging.FileHandler(log_file)
        file_handler.setLevel(logging.INFO)
        
        # Console handler
        console_handler = logging.StreamHandler()
        console_handler.setLevel(logging.INFO)
        
        # Formatter
        formatter = logging.Formatter(
            '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
        )
        file_handler.setFormatter(formatter)
        console_handler.setFormatter(formatter)
        
        logger.addHandler(file_handler)
        logger.addHandler(console_handler)
        
        return logger
    
    def train_epoch(self, epoch: int) -> Dict[str, float]:
        """Train for one epoch with comprehensive metrics tracking."""
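        self.model.train()
        self.epoch = epoch
        # Start from clean gradients in case the previous epoch ended mid-accumulation
        self.optimizer.zero_grad()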
        
        # Initialize epoch metrics
        epoch_metrics = {
            'total_loss': 0.0,
            'content_loss': 0.0,
            'sentiment_loss': 0.0,
            'topic_loss': 0.0,
            'content_acc': 0.0,
            'num_batches': 0,
            'num_samples': 0
        }
        
        # Setup progress bar
        progress_bar = tqdm(
            self.train_loader, 
            desc=f"Training Epoch {epoch}", 
            leave=False,
            dynamic_ncols=True
        )
        
        for batch_idx, batch in enumerate(progress_bar):
            try:
                # Move batch to device
                images = batch['image'].to(device, non_blocking=True)
                input_ids = batch['input_ids'].to(device, non_blocking=True)
                attention_mask = batch['attention_mask'].to(device, non_blocking=True)
                content_target = batch['content_score'].to(device, non_blocking=True)
                sentiment_target = batch['sentiment'].to(device, non_blocking=True)
                topic_target = batch['topic'].to(device, non_blocking=True)
                
                batch_size = images.size(0)
                
                # Forward pass with mixed precision
                if self.use_amp:
                    with torch.cuda.amp.autocast():
                        outputs = self.model(images, input_ids, attention_mask)
                        loss_dict = self.criterion(
                            outputs['content_score'], outputs['sentiment'], outputs['topic'],
                            content_target, sentiment_target, topic_target
                        )
                        # Scale loss for gradient accumulation
                        scaled_loss = loss_dict['total_loss'] / self.accumulation_steps
                else:
                    outputs = self.model(images, input_ids, attention_mask)
                    loss_dict = self.criterion(
                        outputs['content_score'], outputs['sentiment'], outputs['topic'],
                        content_target, sentiment_target, topic_target
                    )
                    scaled_loss = loss_dict['total_loss'] / self.accumulation_steps
                
                # Backward pass
                if self.use_amp:
                    self.scaler.scale(scaled_loss).backward()
                else:
                    scaled_loss.backward()
                
                # Gradient accumulation and optimization step
                if (batch_idx + 1) % self.accumulation_steps == 0:
                    if self.use_amp:
                        # Unscale first so clipping is applied to the true gradient magnitudes
                        self.scaler.unscale_(self.optimizer)
                        torch.nn.utils.clip_grad_norm_(
                            self.model.parameters(), 
                            max_norm=self.config.get('max_grad_norm', 1.0)
                        )
                        self.scaler.step(self.optimizer)
                        self.scaler.update()
                    else:
                        torch.nn.utils.clip_grad_norm_(
                            self.model.parameters(), 
                            max_norm=self.config.get('max_grad_norm', 1.0)
                        )
                        self.optimizer.step()
                    
                    self.optimizer.zero_grad()
                    self.global_step += 1
                
                # Calculate metrics
                content_pred = outputs['content_score'].argmax(dim=1)
                content_acc = (content_pred == content_target).float().mean()
                
                # Update epoch metrics
                epoch_metrics['total_loss'] += loss_dict['total_loss'].item() * batch_size
                epoch_metrics['content_loss'] += loss_dict['content_loss'].item() * batch_size
                epoch_metrics['sentiment_loss'] += loss_dict['sentiment_loss'].item() * batch_size
                epoch_metrics['topic_loss'] += loss_dict['topic_loss'].item() * batch_size
                epoch_metrics['content_acc'] += content_acc.item() * batch_size
                epoch_metrics['num_batches'] += 1
                epoch_metrics['num_samples'] += batch_size
                
                # Update progress bar
                current_lr = self.optimizer.param_groups[0]['lr']
                progress_bar.set_postfix({
                    'Loss': f"{loss_dict['total_loss'].item():.4f}",
                    'Acc': f"{content_acc.item():.3f}",
                    'LR': f"{current_lr:.6f}",
                    'Step': self.global_step
                })
                
                # Log detailed metrics periodically
                if batch_idx % self.config.get('log_interval', 50) == 0:
                    self.logger.info(
                        f"Epoch {epoch}, Batch {batch_idx}/{len(self.train_loader)}: "
                        f"Loss={loss_dict['total_loss'].item():.4f}, "
                        f"Content_Loss={loss_dict['content_loss'].item():.4f}, "
                        f"Content_Acc={content_acc.item():.3f}, "
                        f"Task_Weights={loss_dict['task_weights'].detach().cpu().numpy()}, "
                        f"LR={current_lr:.6f}, "
                        f"Step={self.global_step}"
                    )
                
            except Exception as e:
                self.logger.error(f"Error in batch {batch_idx}: {str(e)}")
                continue
        
        # Average metrics over all samples (guard against an epoch in which every batch failed)
        num_samples = max(epoch_metrics['num_samples'], 1)
        for key in epoch_metrics:
            if key not in ['num_batches', 'num_samples']:
                epoch_metrics[key] /= num_samples
        
        # Learning rate scheduling, once per epoch (ReduceLROnPlateau is stepped after validation)
        if not isinstance(self.scheduler, optim.lr_scheduler.ReduceLROnPlateau):
            self.scheduler.step()
        
        return epoch_metrics
    
    def validate_epoch(self, epoch: int) -> Dict[str, float]:
        """Validate for one epoch with comprehensive evaluation."""
        self.model.eval()
        
        epoch_metrics = {
            'total_loss': 0.0,
            'content_loss': 0.0,
            'sentiment_loss': 0.0,
            'topic_loss': 0.0,
            'content_acc': 0.0,
            'num_samples': 0
        }
        
        # Collect predictions for detailed analysis
        all_content_preds = []
        all_content_targets = []
        all_sentiment_preds = []
        all_sentiment_targets = []
        
        with torch.no_grad():
            progress_bar = tqdm(
                self.val_loader, 
                desc=f"Validation Epoch {epoch}", 
                leave=False,
                dynamic_ncols=True
            )
            
            for batch in progress_bar:
                try:
                    # Move batch to device
                    images = batch['image'].to(device, non_blocking=True)
                    input_ids = batch['input_ids'].to(device, non_blocking=True)
                    attention_mask = batch['attention_mask'].to(device, non_blocking=True)
                    content_target = batch['content_score'].to(device, non_blocking=True)
                    sentiment_target = batch['sentiment'].to(device, non_blocking=True)
                    topic_target = batch['topic'].to(device, non_blocking=True)
                    
                    batch_size = images.size(0)
                    
                    # Forward pass
                    if self.use_amp:
                        with torch.cuda.amp.autocast():
                            outputs = self.model(images, input_ids, attention_mask)
                            loss_dict = self.criterion(
                                outputs['content_score'], outputs['sentiment'], outputs['topic'],
                                content_target, sentiment_target, topic_target
                            )
                    else:
                        outputs = self.model(images, input_ids, attention_mask)
                        loss_dict = self.criterion(
                            outputs['content_score'], outputs['sentiment'], outputs['topic'],
                            content_target, sentiment_target, topic_target
                        )
                    
                    # Calculate metrics
                    content_pred = outputs['content_score'].argmax(dim=1)
                    content_acc = (content_pred == content_target).float().mean()
                    
                    # Collect predictions
                    all_content_preds.extend(content_pred.cpu().numpy())
                    all_content_targets.extend(content_target.cpu().numpy())
                    all_sentiment_preds.extend(outputs['sentiment'].cpu().numpy())
                    all_sentiment_targets.extend(sentiment_target.cpu().numpy())
                    
                    # Update metrics
                    epoch_metrics['total_loss'] += loss_dict['total_loss'].item() * batch_size
                    epoch_metrics['content_loss'] += loss_dict['content_loss'].item() * batch_size
                    epoch_metrics['sentiment_loss'] += loss_dict['sentiment_loss'].item() * batch_size
                    epoch_metrics['topic_loss'] += loss_dict['topic_loss'].item() * batch_size
                    epoch_metrics['content_acc'] += content_acc.item() * batch_size
                    epoch_metrics['num_samples'] += batch_size
                    
                    # Update progress bar
                    progress_bar.set_postfix({
                        'Loss': f"{loss_dict['total_loss'].item():.4f}",
                        'Acc': f"{content_acc.item():.3f}"
                    })
                    
                except Exception as e:
                    self.logger.error(f"Error in validation batch: {str(e)}")
                    continue
        
        # Average metrics (guard against an epoch in which every batch failed)
        num_samples = max(epoch_metrics['num_samples'], 1)
        for key in epoch_metrics:
            if key != 'num_samples':
                epoch_metrics[key] /= num_samples
        
        # Calculate detailed metrics
        if all_content_preds and all_content_targets:
            # Classification metrics
            precision, recall, f1, _ = precision_recall_fscore_support(
                all_content_targets, all_content_preds, average='macro', zero_division=0
            )
            
            epoch_metrics.update({
                'precision': precision,
                'recall': recall,
                'f1': f1
            })
            
            # Sentiment MSE (average across all dimensions)
            sentiment_mse = np.mean([
                np.mean((pred - target) ** 2) 
                for pred, target in zip(all_sentiment_preds, all_sentiment_targets)
            ])
            epoch_metrics['sentiment_mse'] = float(sentiment_mse)  # plain float for JSON serialization
        
        # Learning rate scheduling (validation-based)
        if isinstance(self.scheduler, optim.lr_scheduler.ReduceLROnPlateau):
            self.scheduler.step(epoch_metrics['total_loss'])
        
        return epoch_metrics
    
    def train(self, num_epochs: int) -> Dict[str, Any]:
        """Complete training loop with comprehensive monitoring and checkpointing."""
        self.logger.info(f"Starting training for {num_epochs} epochs")
        self.logger.info(f"Model info: {self.model.get_model_info()}")
        self.logger.info(f"Training config: {self.config}")
        
        best_model_state = None
        training_start_time = time.time()
        
        try:
            for epoch in range(num_epochs):
                epoch_start_time = time.time()
                
                # Training phase
                train_metrics = self.train_epoch(epoch)
                
                # Validation phase
                val_metrics = self.validate_epoch(epoch)
                
                epoch_time = time.time() - epoch_start_time
                
                # Update metrics tracking
                self.train_history.append(train_metrics)
                self.val_history.append(val_metrics)
                
                # Track metrics for visualization
                self.metrics_tracker['train_loss'].append(train_metrics['total_loss'])
                self.metrics_tracker['val_loss'].append(val_metrics['total_loss'])
                self.metrics_tracker['content_accuracy'].append(val_metrics['content_acc'])
                self.metrics_tracker['learning_rates'].append(self.optimizer.param_groups[0]['lr'])
                
                # Log epoch results
                self.logger.info(
                    f"Epoch {epoch}/{num_epochs} completed in {epoch_time:.2f}s: "
                    f"Train_Loss={train_metrics['total_loss']:.4f}, "
                    f"Val_Loss={val_metrics['total_loss']:.4f}, "
                    f"Val_Acc={val_metrics['content_acc']:.3f}, "
                    f"Val_F1={val_metrics.get('f1', 0):.3f}, "
                    f"LR={self.optimizer.param_groups[0]['lr']:.6f}"
                )
                
                # Early stopping check
                improvement = self.best_val_loss - val_metrics['total_loss']
                if improvement > self.min_delta:
                    self.best_val_loss = val_metrics['total_loss']
                    self.patience_counter = 0
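                    # Clone the tensors: state_dict().copy() is shallow and would
                    # keep tracking later parameter updates
                    best_model_state = {
                        k: v.detach().clone() for k, v in self.model.state_dict().items()
                    }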
                    
                    # Save best model checkpoint
                    self._save_checkpoint(epoch, 'best_model.pth', is_best=True)
                    self.logger.info(f"New best model saved with validation loss: {self.best_val_loss:.4f}")
                else:
                    self.patience_counter += 1
                    self.logger.info(f"No improvement for {self.patience_counter}/{self.patience} epochs")
                
                # Save regular checkpoint
                if epoch % self.config.get('checkpoint_interval', 10) == 0:
                    self._save_checkpoint(epoch, f'checkpoint_epoch_{epoch}.pth')
                
                # Early stopping
                if self.patience_counter >= self.patience:
                    self.logger.info(f"Early stopping triggered after {epoch + 1} epochs")
                    break
                
                # Memory cleanup
                if torch.cuda.is_available():
                    torch.cuda.empty_cache()
        
        except KeyboardInterrupt:
            self.logger.info("Training interrupted by user")
        except Exception as e:
            self.logger.error(f"Training error: {str(e)}")
            raise
        
        total_training_time = time.time() - training_start_time
        
        # Load best model for final evaluation
        if best_model_state:
            self.model.load_state_dict(best_model_state)
            self.logger.info("Loaded best model for final evaluation")
        
        # Final evaluation on test set
        test_metrics = self.evaluate_test_set()
        
        # Create comprehensive training summary
        training_summary = {
            'total_epochs': len(self.train_history),  # completed epochs; safe if the loop never ran
            'total_training_time': total_training_time,
            'best_val_loss': self.best_val_loss,
            'final_test_metrics': test_metrics,
            'train_history': self.train_history,
            'val_history': self.val_history,
            'metrics_tracker': self.metrics_tracker,
            'config': self.config,
            'model_info': self.model.get_model_info(),
            'loss_summary': self.criterion.get_loss_summary()
        }
        
        # Save comprehensive training summary
        summary_path = self.experiment_dir / 'training_summary.json'
        with open(summary_path, 'w') as f:
            json.dump(training_summary, f, indent=2, default=str)
        
        self.logger.info(f"Training completed in {total_training_time:.2f}s")
        self.logger.info(f"Training summary saved to: {summary_path}")
        
        return training_summary
    
    def _save_checkpoint(self, epoch: int, filename: str, is_best: bool = False):
        """Save comprehensive model checkpoint."""
        checkpoint = {
            'epoch': epoch,
            'global_step': self.global_step,
            'model_state_dict': self.model.state_dict(),
            'optimizer_state_dict': self.optimizer.state_dict(),
            'scheduler_state_dict': self.scheduler.state_dict() if self.scheduler else None,
            'best_val_loss': self.best_val_loss,
            'train_history': self.train_history,
            'val_history': self.val_history,
            'config': self.config,
            'model_info': self.model.get_model_info(),
            'loss_history': self.criterion.loss_history,
            'metrics_tracker': self.metrics_tracker,
            'is_best': is_best,
            'save_time': datetime.now().isoformat()
        }
        
        checkpoint_path = self.experiment_dir / filename
        torch.save(checkpoint, checkpoint_path)
        
        if is_best:
            self.logger.info(f"Best model checkpoint saved: {checkpoint_path}")
    
    def evaluate_test_set(self) -> Dict[str, float]:
        """Comprehensive evaluation on test set."""
        self.logger.info("Starting comprehensive test set evaluation...")
        self.model.eval()
        
        all_outputs = {
            'content_preds': [], 'content_targets': [], 'content_probs': [],
            'sentiment_preds': [], 'sentiment_targets': [],
            'topic_preds': [], 'topic_targets': [],
            'confidence_scores': []
        }
        
        test_loss = 0.0
        num_samples = 0
        
        with torch.no_grad():
            for batch in tqdm(self.test_loader, desc="Test Evaluation"):
                try:
                    # Move to device
                    images = batch['image'].to(device, non_blocking=True)
                    input_ids = batch['input_ids'].to(device, non_blocking=True)
                    attention_mask = batch['attention_mask'].to(device, non_blocking=True)
                    content_target = batch['content_score'].to(device, non_blocking=True)
                    sentiment_target = batch['sentiment'].to(device, non_blocking=True)
                    topic_target = batch['topic'].to(device, non_blocking=True)
                    
                    batch_size = images.size(0)
                    
                    # Forward pass
                    if self.use_amp:
                        with torch.cuda.amp.autocast():
                            outputs = self.model(images, input_ids, attention_mask)
                            loss_dict = self.criterion(
                                outputs['content_score'], outputs['sentiment'], outputs['topic'],
                                content_target, sentiment_target, topic_target
                            )
                    else:
                        outputs = self.model(images, input_ids, attention_mask)
                        loss_dict = self.criterion(
                            outputs['content_score'], outputs['sentiment'], outputs['topic'],
                            content_target, sentiment_target, topic_target
                        )
                    
                    test_loss += loss_dict['total_loss'].item() * batch_size
                    num_samples += batch_size
                    
                    # Collect all predictions and targets
                    all_outputs['content_preds'].extend(outputs['content_score'].argmax(dim=1).cpu().numpy())
                    all_outputs['content_targets'].extend(content_target.cpu().numpy())
                    # Raw content-head outputs; these are probabilities only if the model applies softmax itself
                    all_outputs['content_probs'].extend(outputs['content_score'].cpu().numpy())
                    all_outputs['sentiment_preds'].extend(outputs['sentiment'].cpu().numpy())
                    all_outputs['sentiment_targets'].extend(sentiment_target.cpu().numpy())
                    all_outputs['topic_preds'].extend(outputs['topic'].cpu().numpy())
                    all_outputs['topic_targets'].extend(topic_target.cpu().numpy())
                    all_outputs['confidence_scores'].extend(outputs['confidence'].cpu().numpy())
                    
                except Exception as e:
                    self.logger.error(f"Error in test batch: {str(e)}")
                    continue
        
        # Calculate comprehensive metrics
        test_metrics = self._calculate_comprehensive_metrics(all_outputs, test_loss, num_samples)
        
        self.logger.info("Test evaluation completed")
        for metric, value in test_metrics.items():
            # np.floating covers NumPy scalars such as float32, which are not Python floats
            if isinstance(value, (int, float, np.floating)):
                self.logger.info(f"  {metric}: {value:.4f}")
        
        return test_metrics
    
    def _calculate_comprehensive_metrics(self, all_outputs: Dict, test_loss: float, num_samples: int) -> Dict[str, float]:
        """Calculate comprehensive evaluation metrics."""
        # Guard against division by zero if every test batch failed
        metrics = {'test_loss': test_loss / max(num_samples, 1)}
        
        # Content classification metrics
        if all_outputs['content_preds'] and all_outputs['content_targets']:
            content_acc = accuracy_score(all_outputs['content_targets'], all_outputs['content_preds'])
            content_precision, content_recall, content_f1, _ = precision_recall_fscore_support(
                all_outputs['content_targets'], all_outputs['content_preds'], 
                average='macro', zero_division=0
            )
            
            metrics.update({
                'content_accuracy': content_acc,
                'content_precision': content_precision,
                'content_recall': content_recall,
                'content_f1': content_f1
            })
            
            # Per-class metrics
            per_class_f1 = precision_recall_fscore_support(
                all_outputs['content_targets'], all_outputs['content_preds'], 
                average=None, zero_division=0
            )[2]
            
            # Use a name that does not shadow sklearn.metrics.f1_score
            for i, class_f1 in enumerate(per_class_f1):
                metrics[f'content_class_{i}_f1'] = float(class_f1)
        
        # Sentiment analysis metrics
        if all_outputs['sentiment_preds'] and all_outputs['sentiment_targets']:
            sentiment_mse = np.mean([
                np.mean((pred - target) ** 2)
                for pred, target in zip(all_outputs['sentiment_preds'], all_outputs['sentiment_targets'])
            ])
            
            sentiment_mae = np.mean([
                np.mean(np.abs(pred - target))
                for pred, target in zip(all_outputs['sentiment_preds'], all_outputs['sentiment_targets'])
            ])
            
            metrics.update({
                'sentiment_mse': sentiment_mse,
                'sentiment_mae': sentiment_mae
            })
        
        # Topic classification metrics
        if all_outputs['topic_preds'] and all_outputs['topic_targets']:
            topic_mse = np.mean([
                np.mean((pred - target) ** 2)
                for pred, target in zip(all_outputs['topic_preds'], all_outputs['topic_targets'])
            ])
            
            topic_mae = np.mean([
                np.mean(np.abs(pred - target))
                for pred, target in zip(all_outputs['topic_preds'], all_outputs['topic_targets'])
            ])
            
            metrics.update({
                'topic_mse': topic_mse,
                'topic_mae': topic_mae
            })
        
        # Confidence calibration metrics
        if all_outputs['confidence_scores']:
            avg_confidence = np.mean(all_outputs['confidence_scores'])
            confidence_std = np.std(all_outputs['confidence_scores'])
            
            metrics.update({
                'avg_confidence': avg_confidence,
                'confidence_std': confidence_std
            })
        
        return metrics
```

### 5.3 Model Training Execution

```python
# Initialize the complete model for training
print("\n🧠 Initializing Model for Training...")

# Create the complete model (using the vocab_size from our datasets)
model = IntelligentContentAnalyzer(
    vocab_size=vocab_size,
    num_content_classes=3,
    vision_dim=512,
    text_dim=512,
    fusion_dim=256
)

# Get model information
model_info = model.get_model_info()
print(f"\n📊 Model Architecture Summary:")
print(f"  🏗️ Total parameters: {model_info['parameters']['total_parameters']:,}")
print(f"  🎯 Trainable parameters: {model_info['parameters']['trainable_parameters']:,}")
print(f"  📱 Model size: ~{model_info['parameters']['total_parameters'] * 4 / (1024**2):.1f} MB")

# Component breakdown
print(f"\n🔧 Component Breakdown:")
print(f"  👁️ Vision encoder: {model_info['parameters']['vision_parameters']:,} params")
print(f"  📝 Text encoder: {model_info['parameters']['text_parameters']:,} params")
print(f"  🔗 Fusion module: {model_info['parameters']['fusion_parameters']:,} params")

# Demonstration: Training Configuration and Initialization
print("\n🎯 Initializing Advanced Training Framework...")

# Create experiment directory with timestamp
experiment_timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
experiment_dir = capstone_dir / 'experiments' / f"multimodal_experiment_{experiment_timestamp}"
experiment_dir.mkdir(parents=True, exist_ok=True)

# Comprehensive training configuration
training_config = {
    # Learning rates (different for each component)
    'vision_lr': 1e-5,      # Lower LR for pretrained vision model
    'text_lr': 5e-4,        # Moderate LR for text encoder
    'fusion_lr': 1e-3,      # Higher LR for fusion layers
    
    # Weight decay (L2 regularization)
    'vision_weight_decay': 0.01,
    'text_weight_decay': 0.01,
    'fusion_weight_decay': 0.005,
    
    # Training dynamics
    'num_epochs': 25,
    'patience': 8,
    'min_delta': 1e-4,
    'gradient_accumulation_steps': 2,
    'max_grad_norm': 1.0,
    
    # Mixed precision and optimization
    'mixed_precision': torch.cuda.is_available(),
    'learn_task_weights': True,
    
    # Learning rate scheduling
    'scheduler': 'cosine_annealing',
    'T_0': 10,
    'T_mult': 2,
    'eta_min': 1e-6,
    
    # Logging and checkpointing
    'log_interval': 25,
    'checkpoint_interval': 5,
    
    # Experiment metadata
    'experiment_name': 'multimodal_content_analyzer',
    'description': 'Advanced multi-modal content analysis with cross-attention fusion',
    'tags': ['multi-modal', 'content-analysis', 'transformer', 'attention']
}

print(f"\n⚙️ Training Configuration:")
print(f"  🎯 Experiment: {training_config['experiment_name']}")
print(f"  📁 Directory: {experiment_dir}")
print(f"  🏋️ Epochs: {training_config['num_epochs']}")
print(f"  📚 Batch accumulation: {training_config['gradient_accumulation_steps']} steps")
print(f"  ⚡ Mixed precision: {training_config['mixed_precision']}")
print(f"  🧠 Learn task weights: {training_config['learn_task_weights']}")

# Learning rates summary
print(f"  📈 Learning rates:")
print(f"    👁️ Vision: {training_config['vision_lr']}")
print(f"    📝 Text: {training_config['text_lr']}")
print(f"    🔗 Fusion: {training_config['fusion_lr']}")

# Save training configuration
config_path = experiment_dir / 'training_config.json'
with open(config_path, 'w') as f:
    json.dump(training_config, f, indent=2)

print(f"💾 Training config saved to: {config_path}")

# Initialize advanced trainer
trainer = AdvancedTrainer(
    model=model,
    train_loader=train_loader,
    val_loader=val_loader,
    test_loader=test_loader,
    experiment_dir=experiment_dir,
    config=training_config
)

print(f"\n🏋️ Training Framework Ready:")
print(f"  🔥 Optimizer: AdamW with component-specific LRs")
print(f"  📊 Scheduler: {training_config['scheduler']}")
print(f"  ⚖️ Loss: Multi-task with automatic weighting")
print(f"  ⏰ Early stopping: {training_config['patience']} epochs patience")

# Quick training demonstration (reduced epochs for demo)
demo_epochs = 5  # Reduced for demonstration
print(f"\n🚀 Starting Training Demonstration ({demo_epochs} epochs)...")
print(f"   ⚡ Features enabled:")
print(f"     • Multi-task learning with automatic loss weighting")
print(f"     • Component-specific learning rates")
print(f"     • Mixed precision training (GPU)")
print(f"     • Gradient accumulation and clipping")
print(f"     • Cosine annealing with warm restarts")
print(f"     • Early stopping with patience")
print(f"     • Comprehensive evaluation metrics")
print(f"     • Real-time monitoring and logging")

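# Update config for demo. The trainer was built from training_config above,
# so apply the overrides to the config it holds (this assumes AdvancedTrainer
# reads values such as 'patience' from self.config at train time).
demo_config = training_config.copy()
demo_config['num_epochs'] = demo_epochs
demo_config['patience'] = demo_epochs  # Disable early stopping for demo
trainer.config.update(demo_config)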

# Execute training
try:
    training_summary = trainer.train(num_epochs=demo_epochs)
    
    print(f"\n✅ Training Demo Completed Successfully!")
    
    # Display key results
    print(f"\n📊 Training Results Summary:")
    print(f"   ⏱️ Total time: {training_summary['total_training_time']:.2f}s")
    print(f"   🎯 Best validation loss: {training_summary['best_val_loss']:.4f}")
    print(f"   📈 Epochs completed: {training_summary['total_epochs']}")
    
    # Final test metrics
    test_metrics = training_summary['final_test_metrics']
    print(f"\n🧪 Test Set Performance:")
    print(f"   📊 Test loss: {test_metrics['test_loss']:.4f}")
    print(f"   🎯 Content accuracy: {test_metrics['content_accuracy']:.3f}")
    print(f"   📝 Content F1: {test_metrics['content_f1']:.3f}")
    print(f"   💭 Sentiment MSE: {test_metrics['sentiment_mse']:.4f}")
    print(f"   📚 Topic MSE: {test_metrics['topic_mse']:.4f}")
    
    # Training progression
    if len(training_summary['train_history']) > 0:
        final_train_loss = training_summary['train_history'][-1]['total_loss']
        final_val_loss = training_summary['val_history'][-1]['total_loss']
        best_val_acc = max(epoch['content_acc'] for epoch in training_summary['val_history'])
        
        print(f"\n📈 Training Progression:")
        print(f"   📉 Final training loss: {final_train_loss:.4f}")
        print(f"   📊 Final validation loss: {final_val_loss:.4f}")
        print(f"   🎯 Best validation accuracy: {best_val_acc:.3f}")
    
    # Loss weighting analysis
    if training_summary.get('loss_summary'):
        loss_summary = training_summary['loss_summary']
        if loss_summary and 'avg_weights' in loss_summary:
            weights = loss_summary['avg_weights']
            print(f"\n⚖️ Learned Task Weights:")
            print(f"   🎯 Content: {weights['content']:.3f}")
            print(f"   💭 Sentiment: {weights['sentiment']:.3f}")
            print(f"   📚 Topic: {weights['topic']:.3f}")

except Exception as e:
    print(f"\n❌ Training Error: {str(e)}")
    print(f"   This is a demonstration - in practice, investigate and resolve training issues")

# Save model for production deployment
print(f"\n💾 Saving Trained Model...")
model_save_path = capstone_dir / 'models' / 'intelligent_content_analyzer_trained.pth'
model_save_path.parent.mkdir(parents=True, exist_ok=True)  # Ensure the target directory exists
model.save_model(
    save_path=model_save_path, 
    include_optimizer=True, 
    optimizer_state=trainer.optimizer.state_dict()
)

print(f"   📁 Model saved to: {model_save_path}")
print(f"   📊 Model info included: architecture, training config, performance metrics")
```
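
The configuration above sets `T_0=10`, `T_mult=2`, and `eta_min=1e-6` for a cosine schedule with warm restarts. As a rough illustration of what those values imply, the sketch below evaluates the standard warm-restart cosine formula in plain Python. It assumes the trainer maps `'cosine_annealing'` onto a schedule of this form (as PyTorch's `CosineAnnealingWarmRestarts` does) and steps it once per epoch; the helper name `cosine_warm_restart_lr` is hypothetical and not part of the project code.

```python
import math


def cosine_warm_restart_lr(epoch, base_lr, eta_min=1e-6, t_0=10, t_mult=2):
    """Learning rate at `epoch` under cosine annealing with warm restarts.

    Hypothetical helper for illustration only; assumes one scheduler step per epoch.
    """
    t_cur, t_i = epoch, t_0
    # Walk forward through cycles until `epoch` falls inside the current one.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    # Cosine decay from base_lr down toward eta_min within the current cycle.
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


# Fusion learning rate from the configuration above (1e-3) at a few epochs.
for epoch in (0, 5, 9, 10, 24):
    lr = cosine_warm_restart_lr(epoch, base_lr=1e-3)
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
```

Under these assumptions the cycles are 10, 20, and 40 epochs long, so restarts fall at epochs 10 and 30. The full 25-epoch run would therefore see one restart, while the 5-epoch demo stays inside the first decay cycle.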

---

## 6. Model Inference and Analysis

### 6.1 Comprehensive Model Testing

```python
# Comprehensive model inference testing and analysis
print("\n🧪 Comprehensive Model Inference Testing...")

def test_model_inference(model, test_loader, num_samples=10):
    """
    Comprehensive model inference testing with detailed analysis.
    
    Args:
        model: Trained model
        test_loader: Test data loader
        num_samples: Number of samples to analyze in detail
    
    Returns:
        inference_results: Dictionary with detailed analysis results
    """
    model.eval()
    
    # Initialize results tracking
    inference_results = {
        'sample_predictions': [],
        'performance_metrics': {
            'inference_times': [],
            'memory_usage': [],
            'confidence_scores': []
        },
        'attention_analysis': [],
        'failure_cases': [],
        'success_cases': []
    }
    
    # Content class labels
    content_labels = ['Positive', 'Negative', 'Neutral']
    sentiment_labels = ['Positive', 'Negative', 'Neutral']
    
    print(f"  Analyzing {num_samples} samples in detail...")
    
    with torch.no_grad():
        sample_count = 0
        
        for batch_idx, batch in enumerate(test_loader):
            if sample_count >= num_samples:
                break
                
            # Move to device
            images = batch['image'].to(device)
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            content_targets = batch['content_score']
            
            batch_size = images.size(0)
            
            for i in range(min(batch_size, num_samples - sample_count)):
                # Single sample inference
                sample_images = images[i:i+1]
                sample_input_ids = input_ids[i:i+1]
                sample_attention_mask = attention_mask[i:i+1]
                
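                # Measure inference time; synchronize so queued CUDA kernels are included
                if torch.cuda.is_available():
                    torch.cuda.synchronize()
                start_time = time.perf_counter()
                
                # Forward pass with attention
                outputs = model(
                    sample_images, sample_input_ids, sample_attention_mask, 
                    return_attention=True
                )
                
                if torch.cuda.is_available():
                    torch.cuda.synchronize()
                inference_time = time.perf_counter() - start_time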
                
                # Memory usage (if CUDA available)
                memory_usage = torch.cuda.memory_allocated() / 1024**2 if torch.cuda.is_available() else 0
                
                # Extract predictions
                content_pred = outputs['content_score'].argmax(dim=1).item()
                sentiment_pred = outputs['sentiment'].argmax(dim=1).item()
                topic_pred = outputs['topic'].argmax(dim=1).item()
                confidence = outputs['confidence'].item()
                
                # Ground truth
                content_target = content_targets[i].item()
                
                # Prediction correctness
                is_correct = content_pred == content_target
                
                # Detailed sample analysis
                sample_analysis = {
                    'sample_id': batch['sample_id'][i],
                    'text': batch['text'][i],
                    'topic_name': batch['topic_name'][i],
                    'predictions': {
                        'content': {
                            'predicted_class': content_pred,
                            'predicted_label': content_labels[content_pred],
                            'confidence': outputs['content_score'][0, content_pred].item(),
                            'all_probs': outputs['content_score'][0].cpu().numpy().tolist()
                        },
                        'sentiment': {
                            'predicted_class': sentiment_pred,
                            'predicted_label': sentiment_labels[sentiment_pred],
                            'all_probs': outputs['sentiment'][0].cpu().numpy().tolist()
                        },
                        'topic': {
                            'predicted_class': topic_pred,
                            'all_probs': outputs['topic'][0].cpu().numpy().tolist()
                        }
                    },
                    'ground_truth': {
                        'content_class': content_target,
                        'content_label': content_labels[content_target]
                    },
                    'performance': {
                        'inference_time': inference_time,
                        'memory_usage_mb': memory_usage,
                        'overall_confidence': confidence,
                        'is_correct': is_correct
                    },
                    'features': {
                        'vision_features_norm': torch.norm(outputs['vision_features'][0]).item(),
                        'text_features_norm': torch.norm(outputs['text_features'][0]).item(),
                        'fused_features_norm': torch.norm(outputs['fused_features'][0]).item()
                    }
                }
                
                # Attention analysis (cast NumPy scalars to float so they serialize as JSON numbers)
                if 'vision_attention' in outputs:
                    vision_attn = outputs['vision_attention'][0].cpu().numpy()
                    sample_analysis['attention'] = {
                        'vision_attention_entropy': float(-np.sum(vision_attn * np.log(vision_attn + 1e-8))),
                        'vision_attention_max': float(np.max(vision_attn)),
                        'vision_attention_std': float(np.std(vision_attn))
                    }
                
                # Store sample analysis
                inference_results['sample_predictions'].append(sample_analysis)
                
                # Performance tracking
                inference_results['performance_metrics']['inference_times'].append(inference_time)
                inference_results['performance_metrics']['memory_usage'].append(memory_usage)
                inference_results['performance_metrics']['confidence_scores'].append(confidence)
                
                # Categorize as success or failure case
                if is_correct and confidence > 0.7:
                    inference_results['success_cases'].append(sample_analysis)
                elif not is_correct or confidence < 0.3:
                    inference_results['failure_cases'].append(sample_analysis)
                
                sample_count += 1
                
                if sample_count >= num_samples:
                    break
    
    # Calculate aggregate performance metrics
    perf_metrics = inference_results['performance_metrics']
    if perf_metrics['inference_times']:
        inference_results['aggregate_performance'] = {
            'avg_inference_time': np.mean(perf_metrics['inference_times']),
            'std_inference_time': np.std(perf_metrics['inference_times']),
            'min_inference_time': np.min(perf_metrics['inference_times']),
            'max_inference_time': np.max(perf_metrics['inference_times']),
            'avg_memory_usage': np.mean(perf_metrics['memory_usage']),
            'avg_confidence': np.mean(perf_metrics['confidence_scores']),
            'std_confidence': np.std(perf_metrics['confidence_scores']),
            'accuracy': np.mean([s['performance']['is_correct'] for s in inference_results['sample_predictions']]),
            'high_confidence_accuracy': np.mean([
                s['performance']['is_correct'] for s in inference_results['sample_predictions']
                if s['performance']['overall_confidence'] > 0.7
            ]) if any(s['performance']['overall_confidence'] > 0.7 for s in inference_results['sample_predictions']) else 0.0
        }
    
    return inference_results

# Execute comprehensive testing (only if model training was successful)
try:
    if 'training_summary' in locals():
        inference_results = test_model_inference(model, test_loader, num_samples=8)

        # Display results
        print(f"\n📊 Inference Analysis Results:")

        # Performance metrics
        if 'aggregate_performance' in inference_results:
            perf = inference_results['aggregate_performance']
            print(f"\n⚡ Performance Metrics:")
            print(f"  ⏱️ Avg inference time: {perf['avg_inference_time']*1000:.2f}ms ± {perf['std_inference_time']*1000:.2f}ms")
            print(f"  💾 Avg memory usage: {perf['avg_memory_usage']:.1f}MB")
            print(f"  🎯 Sample accuracy: {perf['accuracy']:.3f}")
            print(f"  🎯 High-confidence accuracy: {perf['high_confidence_accuracy']:.3f}")
            print(f"  💪 Avg confidence: {perf['avg_confidence']:.3f} ± {perf['std_confidence']:.3f}")

        # Sample predictions analysis
        print(f"\n🔍 Sample Predictions Analysis:")
        for i, sample in enumerate(inference_results['sample_predictions'][:3]):  # Show first 3
            print(f"\n  Sample {i+1} ({sample['sample_id']}):")
            print(f"    📝 Text: '{sample['text'][:80]}...'")
            print(f"    🎯 Predicted: {sample['predictions']['content']['predicted_label']} ({sample['predictions']['content']['confidence']:.3f})")
            print(f"    ✅ Actual: {sample['ground_truth']['content_label']}")
            print(f"    ⚡ Inference time: {sample['performance']['inference_time']*1000:.2f}ms")
            print(f"    💪 Overall confidence: {sample['performance']['overall_confidence']:.3f}")
            print(f"    ✓ Correct: {sample['performance']['is_correct']}")
            print(f"    📚 Topic: {sample['topic_name']}")

        # Success and failure case analysis
        print(f"\n🎉 Success Cases: {len(inference_results['success_cases'])}")
        print(f"❌ Failure Cases: {len(inference_results['failure_cases'])}")

        if inference_results['failure_cases']:
            print(f"\n🔍 Failure Case Analysis:")
            for case in inference_results['failure_cases'][:2]:  # Show first 2 failure cases
                print(f"  📝 Text: '{case['text'][:60]}...'")
                print(f"  🎯 Predicted: {case['predictions']['content']['predicted_label']} (conf: {case['performance']['overall_confidence']:.3f})")
                print(f"  ✅ Actual: {case['ground_truth']['content_label']}")
                print()

        # Save detailed inference results
        def to_serializable(obj):
            """Recursively convert NumPy/PyTorch values to JSON-friendly types."""
            if isinstance(obj, dict):
                return {k: to_serializable(v) for k, v in obj.items()}
            if isinstance(obj, (list, tuple)):
                return [to_serializable(v) for v in obj]
            if isinstance(obj, (np.ndarray, torch.Tensor)):
                return obj.tolist()
            if isinstance(obj, np.generic):
                return obj.item()  # NumPy scalar -> Python scalar
            return obj

        inference_results_path = experiment_dir / 'inference_analysis.json'
        with open(inference_results_path, 'w') as f:
            # default=str remains as a fallback for any other non-serializable values
            json.dump(to_serializable(inference_results), f, indent=2, default=str)

        print(f"💾 Detailed inference analysis saved to: {inference_results_path}")
    else:
        print(f"⚠️ Skipping inference analysis - model training not completed")
except Exception as e:
    print(f"❌ Error in inference testing: {str(e)}")
```
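
The `vision_attention_entropy` value recorded for each sample is a Shannon entropy over the attention weights: a low value means attention is concentrated on a few positions, a high value means it is spread out. The sketch below illustrates that interpretation on two hand-written distributions. It assumes the weights are non-negative and sum to 1 (which may not hold for the full attention tensor the model returns), and the helper name `attention_entropy` is hypothetical.

```python
import math


def attention_entropy(weights, eps=1e-8):
    """Shannon entropy (in nats) of attention weights that sum to 1.

    Hypothetical helper; eps mirrors the small constant used in the analysis
    code to avoid log(0).
    """
    return -sum(w * math.log(w + eps) for w in weights)


uniform = [0.25, 0.25, 0.25, 0.25]   # attention spread evenly over 4 positions
peaked = [0.97, 0.01, 0.01, 0.01]    # attention concentrated on one position

print(f"uniform: {attention_entropy(uniform):.3f} (upper bound log(4) = {math.log(4):.3f})")
print(f"peaked:  {attention_entropy(peaked):.3f}")
```

For a distribution over `n` positions the entropy ranges from 0 (all weight on one position) to `log(n)` (uniform), so comparing the recorded value against `log(n)` gives a quick sense of how focused the attention is.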

### 6.2 Feature Visualization and Analysis

```python
def visualize_model_features(model, sample_batch, save_dir):
    """
    Visualize and analyze model features and attention patterns.
    
    Args:
        model: Trained model
        sample_batch: Batch of samples for analysis
        save_dir: Directory to save visualizations
    """
    model.eval()
    
    # Create visualization directory
    viz_dir = save_dir / 'visualizations'
    viz_dir.mkdir(parents=True, exist_ok=True)
    
    try:
        with torch.no_grad():
            # Take first sample from batch
            images = sample_batch['image'][:1].to(device)
            input_ids = sample_batch['input_ids'][:1].to(device)
            attention_mask = sample_batch['attention_mask'][:1].to(device)
            
            # Forward pass with attention
            outputs = model(images, input_ids, attention_mask, return_attention=True)
            
            # Create comprehensive visualization
            fig, axes = plt.subplots(2, 3, figsize=(18, 12))
            
            # 1. Input image
            img_np = images[0].cpu().permute(1, 2, 0).numpy()
            # Denormalize image
            mean = np.array([0.485, 0.456, 0.406])
            std = np.array([0.229, 0.224, 0.225])
            img_np = img_np * std + mean
            img_np = np.clip(img_np, 0, 1)
            
            axes[0, 0].imshow(img_np)
            axes[0, 0].set_title('Input Image', fontsize=14)
            axes[0, 0].axis('off')
            
            # 2. Vision attention heatmap
            if 'vision_attention' in outputs:
                vision_attn = outputs['vision_attention'][0].cpu().numpy()
                im = axes[0, 1].imshow(vision_attn, cmap='hot', interpolation='nearest')
                axes[0, 1].set_title('Vision Self-Attention', fontsize=14)
                axes[0, 1].axis('off')
                plt.colorbar(im, ax=axes[0, 1], fraction=0.046, pad=0.04)
            
            # 3. Feature magnitude visualization
            vision_features = outputs['vision_features'][0].cpu().numpy()
            text_features = outputs['text_features'][0].cpu().numpy()
            fused_features = outputs['fused_features'][0].cpu().numpy()
            
            feature_data = [
                vision_features[:50],  # First 50 dims (slicing is safe for shorter vectors)
                text_features[:50],
                fused_features[:50]
            ]
            feature_labels = ['Vision Features', 'Text Features', 'Fused Features']
            colors = ['blue', 'green', 'red']
            
            for data, label, color in zip(feature_data, feature_labels, colors):
                axes[0, 2].bar(range(len(data)), data, alpha=0.7, label=label, color=color)
            
            axes[0, 2].set_title('Feature Magnitudes (First 50 dims)', fontsize=14)
            axes[0, 2].set_xlabel('Feature Dimension')
            axes[0, 2].set_ylabel('Magnitude')
            axes[0, 2].legend()
            axes[0, 2].grid(True, alpha=0.3)
            
            # 4. Prediction confidence visualization
            content_probs = outputs['content_score'][0].cpu().numpy()
            sentiment_probs = outputs['sentiment'][0].cpu().numpy()
            topic_probs = outputs['topic'][0].cpu().numpy()
            
            # Content prediction
            content_labels = ['Positive', 'Negative', 'Neutral']
            bars1 = axes[1, 0].bar(content_labels, content_probs, alpha=0.8, color='skyblue')
            axes[1, 0].set_title('Content Classification Confidence', fontsize=14)
            axes[1, 0].set_ylabel('Probability')
            axes[1, 0].set_ylim(0, 1)
            
            # Add value labels on bars
            for bar, prob in zip(bars1, content_probs):
                height = bar.get_height()
                axes[1, 0].text(bar.get_x() + bar.get_width()/2., height + 0.01,
                               f'{prob:.3f}', ha='center', va='bottom')
            
            # 5. Sentiment prediction
            sentiment_labels = ['Positive', 'Negative', 'Neutral']
            bars2 = axes[1, 1].bar(sentiment_labels, sentiment_probs, alpha=0.8, color='lightcoral')
            axes[1, 1].set_title('Sentiment Analysis Confidence', fontsize=14)
            axes[1, 1].set_ylabel('Probability')
            axes[1, 1].set_ylim(0, 1)
            
            for bar, prob in zip(bars2, sentiment_probs):
                height = bar.get_height()
                axes[1, 1].text(bar.get_x() + bar.get_width()/2., height + 0.01,
                               f'{prob:.3f}', ha='center', va='bottom')
            
            # 6. Topic prediction (top 5)
            top_topic_indices = np.argsort(topic_probs)[-5:][::-1]
            top_topic_probs = topic_probs[top_topic_indices]
            topic_names = [f'Topic {i}' for i in top_topic_indices]
            
            bars3 = axes[1, 2].barh(topic_names, top_topic_probs, alpha=0.8, color='lightgreen')
            axes[1, 2].set_title('Top 5 Topic Predictions', fontsize=14)
            axes[1, 2].set_xlabel('Probability')
            axes[1, 2].set_xlim(0, max(top_topic_probs) * 1.1)
            
            for bar, prob in zip(bars3, top_topic_probs):
                width = bar.get_width()
                axes[1, 2].text(width + 0.01, bar.get_y() + bar.get_height()/2.,
                               f'{prob:.3f}', ha='left', va='center')
            
            plt.tight_layout()
            plt.savefig(viz_dir / 'model_analysis_comprehensive.png', dpi=300, bbox_inches='tight')
            plt.show()
            
            # Save individual feature vectors for further analysis
            feature_analysis = {
                'vision_features': vision_features.tolist(),
                'text_features': text_features.tolist(),
                'fused_features': fused_features.tolist(),
                'predictions': {
                    'content': content_probs.tolist(),
                    'sentiment': sentiment_probs.tolist(),
                    'topic': topic_probs.tolist()
                },
                'sample_info': {
                    'text': sample_batch['text'][0],
                    'topic_name': sample_batch['topic_name'][0],
                    'sample_id': sample_batch['sample_id'][0]
                }
            }
            
            with open(viz_dir / 'feature_analysis.json', 'w') as f:
                # default=str guards against non-JSON values (e.g. a tensor-valued sample_id)
                json.dump(feature_analysis, f, indent=2, default=str)
            
            print(f"💾 Feature visualizations saved to: {viz_dir}")
            
            return feature_analysis
    
    except Exception as e:
        print(f"❌ Error in feature visualization: {str(e)}")
        return None

# Visualize model features (only if training was successful)
try:
    if 'training_summary' in locals():
        print(f"\n🎨 Creating Model Feature Visualizations...")
        sample_batch = next(iter(test_loader))
        feature_analysis = visualize_model_features(model, sample_batch, experiment_dir)

        if feature_analysis:
            print(f"✅ Feature visualization completed")
            print(f"  📊 Vision features: {len(feature_analysis['vision_features'])} dimensions")
            print(f"  📝 Text features: {len(feature_analysis['text_features'])} dimensions")
            print(f"  🔗 Fused features: {len(feature_analysis['fused_features'])} dimensions")
    else:
        print(f"⚠️ Skipping feature visualization - model training not completed")
except Exception as e:
    print(f"❌ Error in feature visualization: {str(e)}")
```
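
The bar charts above label their axes as probabilities. The source does not show whether the prediction heads return normalized scores or raw logits; if they return logits, a softmax is needed before plotting. The following is a minimal, dependency-free sketch of that step and of the top-k selection used for the topic panel. The `softmax` and `top_k` helpers and the example logits are illustrative assumptions, not part of the project code.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1 (numerically stable)."""
    shift = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, k=5):
    """Return (index, probability) pairs for the k largest entries, highest first."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical logits for a 3-class head
probs = softmax([2.0, 1.0, 0.1])
best = top_k(probs, k=2)
```

With tensors, `torch.softmax(logits, dim=-1)` and `torch.topk` would play the same roles.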

### 6.3 Model Performance Benchmarking

```python
def benchmark_model_performance(model, test_loader, num_batches=10):
    """
    Comprehensive performance benchmarking of the model.
    
    Args:
        model: Model to benchmark
        test_loader: Test data loader
        num_batches: Number of batches to benchmark
    
    Returns:
        benchmark_results: Comprehensive performance metrics
    """
    model.eval()
    
    print(f"🏁 Benchmarking model performance over {num_batches} batches...")
    
    benchmark_results = {
        'throughput': {
            'batch_times': [],
            'samples_per_second': [],
            'tokens_per_second': []
        },
        'memory': {
            'peak_memory_mb': [],
            'memory_efficiency': []
        },
        'accuracy': {
            'batch_accuracies': [],
            'confidence_scores': []
        },
        'hardware_info': {
            'device': str(device),
            'cuda_available': torch.cuda.is_available(),
            'gpu_name': torch.cuda.get_device_name() if torch.cuda.is_available() else None,
            # Query the current CUDA device so this also works when `device` is the CPU
            'gpu_memory_total': torch.cuda.get_device_properties(torch.cuda.current_device()).total_memory / 1024**3 if torch.cuda.is_available() else None
        }
    }
    
    total_correct = 0
    total_samples = 0
    
    with torch.no_grad():
        for batch_idx, batch in enumerate(test_loader):
            if batch_idx >= num_batches:
                break
            
            # Move to device
            images = batch['image'].to(device, non_blocking=True)
            input_ids = batch['input_ids'].to(device, non_blocking=True)
            attention_mask = batch['attention_mask'].to(device, non_blocking=True)
            content_targets = batch['content_score'].to(device, non_blocking=True)
            
            batch_size = images.size(0)
            seq_len = input_ids.size(1)
            
            # Memory before
            if torch.cuda.is_available():
                torch.cuda.reset_peak_memory_stats()
                torch.cuda.synchronize()
            
            # Benchmark inference time (perf_counter is monotonic and high-resolution)
            start_time = time.perf_counter()
            
            outputs = model(images, input_ids, attention_mask)
            
            if torch.cuda.is_available():
                torch.cuda.synchronize()
            
            batch_time = time.perf_counter() - start_time
            
            # Memory after
            peak_memory = 0
            if torch.cuda.is_available():
                peak_memory = torch.cuda.max_memory_allocated() / 1024**2  # MB
            
            # Calculate accuracy
            content_pred = outputs['content_score'].argmax(dim=1)
            batch_correct = (content_pred == content_targets).sum().item()
            batch_accuracy = batch_correct / batch_size
            
            # Calculate confidence
            confidence = outputs['confidence'].mean().item()
            
            # Store metrics
            benchmark_results['throughput']['batch_times'].append(batch_time)
            benchmark_results['throughput']['samples_per_second'].append(batch_size / batch_time)
            benchmark_results['throughput']['tokens_per_second'].append(batch_size * seq_len / batch_time)
            
            benchmark_results['memory']['peak_memory_mb'].append(peak_memory)
            benchmark_results['memory']['memory_efficiency'].append(batch_size / max(peak_memory, 1))
            
            benchmark_results['accuracy']['batch_accuracies'].append(batch_accuracy)
            benchmark_results['accuracy']['confidence_scores'].append(confidence)
            
            total_correct += batch_correct
            total_samples += batch_size
            
            print(f"  Batch {batch_idx+1}/{num_batches}: {batch_time*1000:.2f}ms, "
                  f"{batch_size/batch_time:.1f} samples/s, "
                  f"acc: {batch_accuracy:.3f}")
    
    # Calculate aggregate metrics
    benchmark_results['aggregate'] = {
        'avg_batch_time': np.mean(benchmark_results['throughput']['batch_times']),
        'std_batch_time': np.std(benchmark_results['throughput']['batch_times']),
        'avg_samples_per_second': np.mean(benchmark_results['throughput']['samples_per_second']),
        'avg_tokens_per_second': np.mean(benchmark_results['throughput']['tokens_per_second']),
        'avg_peak_memory_mb': np.mean(benchmark_results['memory']['peak_memory_mb']),
        'overall_accuracy': total_correct / max(total_samples, 1),  # avoid ZeroDivisionError on an empty loader
        'avg_confidence': np.mean(benchmark_results['accuracy']['confidence_scores']),
        'total_samples_processed': total_samples
    }
    
    return benchmark_results

# Execute performance benchmark (only if training was successful)
try:
    if 'training_summary' in locals():
        benchmark_results = benchmark_model_performance(model, test_loader, num_batches=5)

        print(f"\n📊 Performance Benchmark Results:")
        agg = benchmark_results['aggregate']

        print(f"\n⚡ Throughput:")
        print(f"  ⏱️ Avg batch time: {agg['avg_batch_time']*1000:.2f}ms ± {agg['std_batch_time']*1000:.2f}ms")
        print(f"  🚀 Samples/second: {agg['avg_samples_per_second']:.1f}")
        print(f"  📝 Tokens/second: {agg['avg_tokens_per_second']:.0f}")

        print(f"\n💾 Memory:")
        print(f"  📊 Peak memory: {agg['avg_peak_memory_mb']:.1f}MB")
        print(f"  🖥️ Device: {benchmark_results['hardware_info']['device']}")
        if benchmark_results['hardware_info']['gpu_name']:
            print(f"  🎮 GPU: {benchmark_results['hardware_info']['gpu_name']}")
            print(f"  📊 GPU Memory: {benchmark_results['hardware_info']['gpu_memory_total']:.1f}GB")

        print(f"\n🎯 Accuracy:")
        print(f"  📈 Overall accuracy: {agg['overall_accuracy']:.3f}")
        print(f"  💪 Avg confidence: {agg['avg_confidence']:.3f}")
        print(f"  📊 Total samples: {agg['total_samples_processed']}")

        # Save benchmark results
        benchmark_path = experiment_dir / 'performance_benchmark.json'
        with open(benchmark_path, 'w') as f:
            json.dump(benchmark_results, f, indent=2, default=str)

        print(f"💾 Benchmark results saved to: {benchmark_path}")
    else:
        print(f"⚠️ Skipping performance benchmark - model training not completed")
except Exception as e:
    print(f"❌ Error in performance benchmarking: {str(e)}")
```
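
The benchmark above times every batch, including the first. First iterations are often slower because of one-off costs such as CUDA context creation and kernel selection, so a few untimed warm-up runs usually give steadier numbers. The following is a minimal sketch of that pattern using only the standard library. `time_callable` and its `sync` hook are hypothetical names; on a GPU the hook would typically be `torch.cuda.synchronize`.

```python
import statistics
import time

def time_callable(fn, warmup=2, repeats=5, sync=None):
    """Time fn() after warm-up runs; `sync` is an optional barrier for async devices."""
    for _ in range(warmup):
        fn()  # warm-up runs are executed but not timed
    durations = []
    for _ in range(repeats):
        if sync is not None:
            sync()  # drain pending async work before starting the clock
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for work launched by fn() to finish
        durations.append(time.perf_counter() - start)
    return {
        'mean_s': statistics.mean(durations),
        'std_s': statistics.stdev(durations) if len(durations) > 1 else 0.0,
        'runs': len(durations),
    }

# Stand-in workload; in the benchmark this would be the model forward pass
stats = time_callable(lambda: sum(range(10_000)), warmup=1, repeats=3)
```

In `benchmark_model_performance`, the same effect could be had by running one or two batches through the model before the timed loop starts.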

---

## 7. Model Deployment Preparation

### 7.1 Model Export and Optimization

```python
class ModelExporter:
    """
    Model export utilities for production deployment.
    
    This class provides methods to export the trained model in different formats
    suitable for various deployment scenarios.
    """
    
    def __init__(self, model, vocab_size, experiment_dir):
        self.model = model
        self.vocab_size = vocab_size
        self.experiment_dir = experiment_dir
        self.export_dir = experiment_dir / 'deployment'
        self.export_dir.mkdir(exist_ok=True)
        
    def export_torchscript(self, example_inputs=None):
        """Export model to TorchScript for optimized inference."""
        print("🚀 Exporting model to TorchScript...")
        
        self.model.eval()
        
        if example_inputs is None:
            # Create example inputs
            batch_size = 1
            example_inputs = (
                torch.randn(batch_size, 3, 224, 224).to(device),
                torch.randint(1, self.vocab_size, (batch_size, 256)).to(device),
                torch.ones(batch_size, 256, dtype=torch.long).to(device)
            )
        
        try:
            # Trace the model; strict=False is required because forward() returns a dict
            with torch.no_grad():
                traced_model = torch.jit.trace(self.model, example_inputs, strict=False)
            
            # Save traced model
            script_path = self.export_dir / 'model_traced.pt'
            traced_model.save(str(script_path))
            
            print(f"✅ TorchScript model saved to: {script_path}")
            
            # Test traced model
            with torch.no_grad():
                original_output = self.model(*example_inputs)
                traced_output = traced_model(*example_inputs)
                
                # Compare outputs
                content_diff = torch.abs(original_output['content_score'] - traced_output['content_score']).max()
                print(f"  📊 Max output difference: {content_diff:.6f}")
                
            return traced_model, script_path
            
        except Exception as e:
            print(f"❌ TorchScript export failed: {e}")
            return None, None
    
    def export_onnx(self, example_inputs=None):
        """Export model to ONNX format."""
        print("🔄 Exporting model to ONNX...")
        
        # Note: This is a simplified version - full ONNX export would need more setup
        print("⚠️ ONNX export requires additional setup and may need model modifications")
        print("   Consider using torch.onnx.export with proper handling of dynamic shapes")
        
        onnx_path = self.export_dir / 'model.onnx'
        print(f"📁 Target ONNX path: {onnx_path}")
        
        return None  # Placeholder for actual ONNX export
    
    def create_deployment_package(self):
        """Create a complete deployment package."""
        print("📦 Creating deployment package...")
        
        # Model files
        model_files = {
            'model_state': 'intelligent_content_analyzer_trained.pth',
            'model_config': 'model_config.json',
            'vocabulary': 'vocabulary.json',
            'deployment_info': 'deployment_info.json'
        }
        
        # Save model configuration
        model_config = {
            'vocab_size': self.vocab_size,
            'model_architecture': self.model.config,
            'model_info': self.model.get_model_info(),
            'input_specs': {
                'image_size': [224, 224],
                'max_text_length': 256,
                'image_channels': 3
            },
            'output_specs': {
                'content_classes': 3,
                'sentiment_classes': 3,
                'topic_classes': 10
            }
        }
        
        config_path = self.export_dir / model_files['model_config']
        with open(config_path, 'w') as f:
            # default=str keeps the dump from failing on non-JSON values in the config
            json.dump(model_config, f, indent=2, default=str)
        
        # Save vocabulary (placeholder - would be actual vocab in practice)
        vocab_data = {
            'vocab_size': self.vocab_size,
            'special_tokens': {
                'PAD': 0, 'UNK': 1, 'START': 2, 'END': 3,
                'MASK': 4, 'NUM': 5, 'PUNCT': 6
            },
            'note': 'In production, include full vocabulary mapping'
        }
        
        vocab_path = self.export_dir / model_files['vocabulary']
        with open(vocab_path, 'w') as f:
            json.dump(vocab_data, f, indent=2)
        
        # Deployment information
        deployment_info = {
            'model_version': self.model.model_version,
            'export_time': datetime.now().isoformat(),
            'framework': 'PyTorch',
            'python_version': '3.8+',
            'torch_version': torch.__version__,
            'deployment_files': model_files,
            'system_requirements': {
                'min_memory_gb': 4,
                'recommended_memory_gb': 8,
                'gpu_memory_mb': 2048,
                'cpu_cores': 2
            },
            'api_endpoints': {
                'predict': '/predict',
                'health': '/health',
                'metrics': '/metrics',
                'model_info': '/model-info'
            }
        }
        
        deployment_path = self.export_dir / model_files['deployment_info']
        with open(deployment_path, 'w') as f:
            json.dump(deployment_info, f, indent=2)
        
        # Copy model weights
        model_weights_src = capstone_dir / 'models' / 'intelligent_content_analyzer_trained.pth'
        model_weights_dst = self.export_dir / model_files['model_state']
        
        if model_weights_src.exists():
            import shutil
            shutil.copy2(model_weights_src, model_weights_dst)
            print(f"✅ Model weights copied to deployment package")
        else:
            print(f"⚠️ Model weights not found at {model_weights_src}; package will not include them")
        
        # Create README
        readme_content = f"""
# Intelligent Content Analyzer - Deployment Package

## Overview
This package contains a trained multi-modal AI model for content analysis.

## Model Capabilities
- Content classification (Positive/Negative/Neutral)
- Sentiment analysis
- Topic classification
- Confidence estimation

## Files
- `{model_files['model_state']}`: Trained model weights
- `{model_files['model_config']}`: Model architecture configuration
- `{model_files['vocabulary']}`: Text tokenization vocabulary
- `{model_files['deployment_info']}`: Deployment specifications

## Quick Start

    from intelligent_content_analyzer import load_model

    # Load model
    model = load_model('{model_files['model_state']}')

    # Make prediction
    result = model.predict(image, text)

## System Requirements
- Python 3.8+
- PyTorch {torch.__version__}
- Memory: 4GB+ (8GB recommended)
- GPU: 2GB+ VRAM (optional but recommended)

## API Endpoints
- POST /predict - Make predictions
- GET /health - Health check
- GET /metrics - Performance metrics
- GET /model-info - Model information

Created: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
Model Version: {self.model.model_version}
"""
        
        readme_path = self.export_dir / 'README.md'
        with open(readme_path, 'w') as f:
            f.write(readme_content)
        
        print(f"✅ Deployment package created in: {self.export_dir}")
        print(f"📦 Package contents:")
        for file_path in self.export_dir.glob('*'):
            if file_path.is_file():
                size_mb = file_path.stat().st_size / (1024 * 1024)
                print(f"  📄 {file_path.name} ({size_mb:.2f} MB)")
        
        return self.export_dir

# Create model exporter and deployment package (only if training was successful)
try:
    if 'training_summary' in locals():
        print("\n🚀 Preparing Model for Deployment...")
        exporter = ModelExporter(model, vocab_size, experiment_dir)

        # Export to TorchScript
        traced_model, script_path = exporter.export_torchscript()

        # Create deployment package
        deployment_dir = exporter.create_deployment_package()

        print(f"\n✅ Model deployment preparation completed!")
        print(f"  📦 Deployment package: {deployment_dir}")
        if script_path:
            print(f"  🚀 TorchScript model: {script_path}")
    else:
        print(f"⚠️ Skipping deployment preparation - model training not completed")
except Exception as e:
    print(f"❌ Error in deployment preparation: {str(e)}")
```
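
The vocabulary file written above is a placeholder that records only the special tokens. One way the matching tokenization step might look is sketched below, assuming a plain whitespace tokenizer, the special-token IDs from `vocab_data`, and the 256-token limit from `input_specs`. The `encode` helper and the toy `word_to_id` mapping are hypothetical and stand in for the project's real tokenizer.

```python
SPECIAL = {'PAD': 0, 'UNK': 1, 'START': 2, 'END': 3}

def encode(text, word_to_id, max_len=256):
    """Map text to fixed-length input_ids and attention_mask lists."""
    words = text.lower().split()[: max_len - 2]  # leave room for START and END
    ids = [SPECIAL['START']]
    ids += [word_to_id.get(w, SPECIAL['UNK']) for w in words]
    ids.append(SPECIAL['END'])
    mask = [1] * len(ids)  # 1 marks real tokens
    pad = max_len - len(ids)
    ids += [SPECIAL['PAD']] * pad
    mask += [0] * pad  # 0 marks padding
    return ids, mask

# Toy vocabulary; IDs 0-6 are reserved for special tokens in vocab_data
word_to_id = {'great': 7, 'product': 8}
input_ids, attention_mask = encode('Great product overall', word_to_id, max_len=8)
```

Whatever tokenizer is used, the deployment package needs the same word-to-ID mapping that was used in training, otherwise the served model sees different IDs than it learned.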

### 7.2 Production API Template

```python
def create_production_api_template():
    """Create a FastAPI template for production deployment."""
    
    api_template = '''
from fastapi import FastAPI, HTTPException, UploadFile, File
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
import torch
import torchvision.transforms as transforms
from PIL import Image
import io
import json
import time
import logging
from typing import List, Dict, Any
import uvicorn

# Initialize FastAPI app
app = FastAPI(
    title="Intelligent Content Analyzer API",
    description="Multi-modal AI system for content understanding",
    version="1.0.0",
    docs_url="/docs",
    redoc_url="/redoc"
)

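# Add CORS middleware
# NOTE: wildcard origins are for local testing only; list explicit origins in
# production, especially while allow_credentials=True.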
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Global variables for model and config
model = None
model_config = None
transform = None

# Request/Response models
class PredictionRequest(BaseModel):
    text: str
    confidence_threshold: float = 0.5

class PredictionResponse(BaseModel):
    content_prediction: Dict[str, Any]
    sentiment_prediction: Dict[str, Any]
    topic_prediction: Dict[str, Any]
    confidence_score: float
    processing_time_ms: float

class HealthResponse(BaseModel):
    status: str
    model_loaded: bool
    uptime_seconds: float
    memory_usage_mb: float

# Startup event
@app.on_event("startup")
async def startup_event():
    global model, model_config, transform
    
    # Load model configuration
    with open("model_config.json", "r") as f:
        model_config = json.load(f)
    
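    # Record the start time here so uptime also works when launched via "uvicorn production_api:app"
    app.start_time = time.time()
    
    # Initialize model (placeholder - actual loading logic)
    print("Loading Intelligent Content Analyzer...")
    # model = load_intelligent_content_analyzer("model_weights.pth")
    if model is None:
        print("⚠️ Model loading is a placeholder; /health reports unhealthy until it is implemented")
    else:
        print("✅ Model loaded successfully")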
    
    # Initialize image transforms
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

# Health check endpoint
@app.get("/health", response_model=HealthResponse)
async def health_check():
    import psutil
    
    return HealthResponse(
        status="healthy" if model is not None else "unhealthy",
        model_loaded=model is not None,
        uptime_seconds=time.time() - app.start_time if hasattr(app, 'start_time') else 0,
        memory_usage_mb=psutil.Process().memory_info().rss / 1024 / 1024
    )

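# Prediction endpoint with image upload
# Text fields arrive as multipart form fields next to the file: FastAPI cannot
# combine a JSON body model with an UploadFile in the same request.
from fastapi import Form

@app.post("/predict", response_model=PredictionResponse)
async def predict_content(
    text: str = Form(...),
    confidence_threshold: float = Form(0.5),
    image: UploadFile = File(...)
):
    # Rebuild the request model so the fields are validated in one place
    request = PredictionRequest(text=text, confidence_threshold=confidence_threshold)
    start_time = time.time()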
    
    try:
        # Validate inputs
        if not request.text.strip():
            raise HTTPException(status_code=400, detail="Text cannot be empty")
        
        if not (image.content_type or "").startswith("image/"):
            raise HTTPException(status_code=400, detail="Invalid image format")
        
        # Process image
        image_bytes = await image.read()
        pil_image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
        image_tensor = transform(pil_image).unsqueeze(0)
        
        # Tokenize text (placeholder implementation)
        text_tokens = tokenize_text(request.text)
        
        # Model inference (placeholder)
        with torch.no_grad():
            # predictions = model(image_tensor, text_tokens)
            
            # Placeholder predictions
            predictions = {
                "content_score": torch.tensor([[0.7, 0.2, 0.1]]),
                "sentiment": torch.tensor([[0.6, 0.3, 0.1]]),
                "topic": torch.tensor([[0.1] * 10]),
                "confidence": torch.tensor([[0.75]])
            }
        
        # Process predictions
        content_pred = torch.argmax(predictions["content_score"], dim=1).item()
        sentiment_pred = torch.argmax(predictions["sentiment"], dim=1).item()
        topic_pred = torch.argmax(predictions["topic"], dim=1).item()
        confidence = predictions["confidence"].item()
        
        processing_time = (time.time() - start_time) * 1000
        
        # Prepare response
        response = PredictionResponse(
            content_prediction={
                "class": content_pred,
                "label": ["positive", "negative", "neutral"][content_pred],
                "probabilities": predictions["content_score"][0].tolist(),
                "confidence": predictions["content_score"][0, content_pred].item()
            },
            sentiment_prediction={
                "class": sentiment_pred,
                "label": ["positive", "negative", "neutral"][sentiment_pred],
                "probabilities": predictions["sentiment"][0].tolist()
            },
            topic_prediction={
                "class": topic_pred,
                "probabilities": predictions["topic"][0].tolist()
            },
            confidence_score=confidence,
            processing_time_ms=processing_time
        )
        
        return response
        
    except HTTPException:
        # Preserve deliberate 4xx responses instead of rewrapping them as 500
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Prediction failed: {str(e)}")

# Model information endpoint
@app.get("/model-info")
async def get_model_info():
    if model_config is None:
        raise HTTPException(status_code=503, detail="Model not loaded")
    
    return {
        "model_version": model_config.get("model_info", {}).get("model_version", "1.0.0"),
        "architecture": model_config.get("model_info", {}).get("architecture", {}),
        "capabilities": [
            "Content Classification",
            "Sentiment Analysis", 
            "Topic Classification",
            "Confidence Estimation"
        ],
        "input_specs": model_config.get("input_specs", {}),
        "output_specs": model_config.get("output_specs", {})
    }

# Metrics endpoint
@app.get("/metrics")
async def get_metrics():
    # Placeholder metrics
    return {
        "total_predictions": 0,
        "average_processing_time_ms": 0.0,
        "error_rate": 0.0,
        "model_accuracy": 0.0
    }

def tokenize_text(text: str):
    """Placeholder text tokenization"""
    # In production, use actual tokenizer
    return torch.randint(1, 1000, (1, 256))

if __name__ == "__main__":
    app.start_time = time.time()
    uvicorn.run(app, host="0.0.0.0", port=8000)
'''
    
    return api_template

# Create production API template (only if training was successful)
try:
    if 'training_summary' in locals():
        print("\n🔧 Creating Production API Template...")
        api_template = create_production_api_template()

        # Save API template
        api_path = deployment_dir / 'production_api.py'
        with open(api_path, 'w') as f:
            f.write(api_template)

        print(f"✅ Production API template created: {api_path}")

        # Create Docker configuration
        docker_template = '''
FROM python:3.9-slim

WORKDIR /app

# Install system dependencies (curl is needed by the HEALTHCHECK below)
RUN apt-get update && apt-get install -y \\
    gcc \\
    curl \\
    && rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application files
COPY . .

# Expose port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \\
    CMD curl -f http://localhost:8000/health || exit 1

# Run application
CMD ["uvicorn", "production_api:app", "--host", "0.0.0.0", "--port", "8000"]
'''

        dockerfile_path = deployment_dir / 'Dockerfile'
        with open(dockerfile_path, 'w') as f:
            f.write(docker_template)

        # Create requirements file
        requirements = '''
fastapi==0.104.1
uvicorn[standard]==0.24.0
torch==2.1.0
torchvision==0.16.0
Pillow==10.0.1
python-multipart==0.0.6
psutil==5.9.6
numpy==1.24.3
'''

        requirements_path = deployment_dir / 'requirements.txt'
        with open(requirements_path, 'w') as f:
            f.write(requirements)

        print(f"✅ Docker configuration created: {dockerfile_path}")
        print(f"✅ Requirements file created: {requirements_path}")
    else:
        print(f"⚠️ Skipping API template creation - model training not completed")
except Exception as e:
    print(f"❌ Error in API template creation: {str(e)}")
```
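
The `/metrics` endpoint in the template returns fixed placeholder values. A small in-process tracker along the following lines could back three of those fields. This is a sketch under stated assumptions: `MetricsTracker` and its method names are hypothetical, it covers a single worker process only, and `model_accuracy` is left out because it needs ground-truth labels that the API does not receive.

```python
import threading

class MetricsTracker:
    """Thread-safe running totals for request count, latency, and errors."""

    def __init__(self):
        self._lock = threading.Lock()
        self._total = 0
        self._errors = 0
        self._time_ms = 0.0

    def record(self, processing_time_ms, error=False):
        """Register one finished request and whether it failed."""
        with self._lock:
            self._total += 1
            self._time_ms += processing_time_ms
            if error:
                self._errors += 1

    def snapshot(self):
        """Return the aggregate values in the shape used by /metrics."""
        with self._lock:
            total = self._total
            return {
                'total_predictions': total,
                'average_processing_time_ms': self._time_ms / total if total else 0.0,
                'error_rate': self._errors / total if total else 0.0,
            }

tracker = MetricsTracker()
tracker.record(40.0)
tracker.record(60.0, error=True)
snapshot = tracker.snapshot()
```

In the API template, `record` would be called at the end of `predict_content` (and in its error path), and `get_metrics` would return `tracker.snapshot()`. With several workers or containers, an external metrics store would be needed instead.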

---

## 8. Project Summary and Next Steps

### 8.1 Comprehensive Project Summary

```python
print("\n" + "="*80)
print("🎯 PYTORCH MASTERY HUB - CAPSTONE PROJECT SUMMARY")
print("="*80)

# Create comprehensive project summary
project_summary = {
    'project_info': {
        'title': 'Intelligent Content Analysis Platform',
        'version': '1.0.0',
        'completion_date': datetime.now().isoformat(),
        'total_development_time': '~6 hours (demonstration)',
        'framework': 'PyTorch Deep Learning'
    },
    
    'technical_achievements': {
        'architecture': [
            'Multi-modal AI system (vision + text)',
            'Advanced attention mechanisms',
            'Cross-modal fusion with attention',
            'Multi-task learning framework',
            'Automatic loss weighting'
        ],
        
        'models_implemented': [
            'ResNet50-based Vision Encoder with attention',
            'Transformer-based Text Encoder (6 layers)',
            'Cross-attention Multi-modal Fusion',
            'Multi-task prediction heads'
        ],
        
        'training_techniques': [
            'Mixed precision training',
            'Gradient accumulation',
            'Component-specific learning rates',
            'Cosine annealing with warm restarts',
            'Early stopping with patience',
            'Comprehensive evaluation metrics'
        ],
        
        'deployment_features': [
            'TorchScript model export',
            'Production-ready FastAPI template',
            'Docker containerization',
            'Health monitoring endpoints',
            'Comprehensive deployment package'
        ]
    },
    
    'model_performance': {
        'architecture_size': f"{model_info['parameters']['total_parameters']:,} parameters",
        'model_size_mb': f"~{model_info['parameters']['total_parameters'] * 4 / (1024**2):.1f} MB",
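        'inference_performance': "~50-100ms per sample (rough estimate; varies by hardware)",
        'throughput': "~10-20 samples/second (rough estimate; see performance_benchmark.json for measured values)",
        'memory_usage': "~2-4GB peak memory (rough estimate)",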
        'accuracy': "Varies by task and dataset"
    },
    
    'datasets_and_features': {
        'synthetic_dataset_size': {
            'train': len(train_loader.dataset),
            'validation': len(val_loader.dataset),
            'test': len(test_loader.dataset)
        },
        'vocabulary_size': vocab_size,
        'supported_tasks': [
            'Content Classification (3 classes)',
            'Sentiment Analysis (3 classes)', 
            'Topic Classification (10 classes)',
            'Confidence Estimation'
        ],
        'input_modalities': [
            'RGB Images (224x224)',
            'Text sequences (up to 256 tokens)'
        ]
    },
    
    'mlops_and_monitoring': {
        'experiment_tracking': 'Comprehensive logging and metrics',
        'model_versioning': 'Timestamp-based experiment organization',
        'performance_monitoring': 'Real-time inference statistics',
        'model_checkpointing': 'Best model and interval checkpoints',
        'deployment_ready': 'Production API and Docker containers'
    }
}

print(f"\n📋 Project Overview:")
print(f"  🎯 {project_summary['project_info']['title']}")
print(f"  📦 Version: {project_summary['project_info']['version']}")
print(f"  🏗️ Framework: {project_summary['project_info']['framework']}")
print(f"  ⏱️ Development: {project_summary['project_info']['total_development_time']}")

print(f"\n🧠 Model Architecture:")
for achievement in project_summary['technical_achievements']['architecture']:
    print(f"  ✅ {achievement}")

print(f"\n📊 Performance Metrics:")
perf = project_summary['model_performance']
print(f"  🏗️ Model size: {perf['architecture_size']} ({perf['model_size_mb']})")
print(f"  ⚡ Inference time: {perf['inference_performance']}")
print(f"  🚀 Throughput: {perf['throughput']}")
print(f"  💾 Memory usage: {perf['memory_usage']}")
print(f"  🎯 Test accuracy: {perf['accuracy']}")

print(f"\n📚 Dataset & Capabilities:")
dataset_info = project_summary['datasets_and_features']
print(f"  📊 Training samples: {dataset_info['synthetic_dataset_size']['train']:,}")
print(f"  📖 Vocabulary: {dataset_info['vocabulary_size']:,} tokens")
print(f"  🎯 Supported tasks: {len(dataset_info['supported_tasks'])}")
for task in dataset_info['supported_tasks']:
    print(f"    • {task}")

print(f"\n🚀 Deployment Readiness:")
for feature in project_summary['technical_achievements']['deployment_features']:
    print(f"  ✅ {feature}")

# Save comprehensive project summary
summary_path = capstone_dir / 'PROJECT_SUMMARY.json'
with open(summary_path, 'w') as f:
    json.dump(project_summary, f, indent=2, default=str)

print(f"\n💾 Complete project summary saved to: {summary_path}")
```

### 8.2 Generated Artifacts and Outputs

```python
print(f"\n📁 Generated Project Artifacts:")
print(f"  📂 Main directory: {capstone_dir}")

# Collect all generated files
artifact_categories = {
    'Models': [],
    'Experiments': [],
    'Deployment': [],
    'Data': [],
    'Documentation': []
}

for file_path in capstone_dir.rglob('*'):
    if file_path.is_file():
        relative_path = file_path.relative_to(capstone_dir)
        size_mb = file_path.stat().st_size / (1024 * 1024)
        entry = f"{relative_path} ({size_mb:.2f}MB)"
        
        # Match whole directory names rather than substrings, so a file such as
        # 'metadata.json' is not filed under Data. 'deployment' is checked first
        # because it is created inside experiment_dir.
        dirs = relative_path.parts[:-1]
        
        if 'deployment' in dirs:
            artifact_categories['Deployment'].append(entry)
        elif 'models' in dirs:
            artifact_categories['Models'].append(entry)
        elif 'experiments' in dirs:
            artifact_categories['Experiments'].append(entry)
        elif 'data' in dirs:
            artifact_categories['Data'].append(entry)
        else:
            artifact_categories['Documentation'].append(entry)

for category, files in artifact_categories.items():
    if files:
        print(f"\n  📁 {category}:")
        for file_info in sorted(files)[:5]:  # Show first 5 files
            print(f"    📄 {file_info}")
        if len(files) > 5:
            print(f"    ... and {len(files) - 5} more files")
```

### 8.3 Next Steps and Recommendations

```python
print(f"\n🚀 Next Steps for Production Deployment:")

next_steps = {
    'immediate': [
        "Set up MLOps pipeline with MLflow or Weights & Biases",
        "Implement comprehensive API testing and validation",
        "Add authentication and rate limiting to API",
        "Set up monitoring and alerting systems"
    ],
    
    'short_term': [
        "Train on real-world datasets",
        "Implement model versioning and A/B testing",
        "Add batch prediction endpoints",
        "Optimize model for edge deployment",
        "Implement model explanation and interpretability features"
    ],
    
    'long_term': [
        "Scale to multi-GPU training",
        "Implement continuous learning pipeline",
        "Add more modalities (audio, video)",
        "Deploy on cloud platforms (AWS, GCP, Azure)",
        "Implement federated learning capabilities"
    ],
    
    'research_directions': [
        "Explore larger transformer architectures",
        "Investigate zero-shot and few-shot learning",
        "Research multimodal contrastive learning",
        "Study model compression and quantization",
        "Develop domain-specific fine-tuning strategies"
    ]
}

for category, steps in next_steps.items():
    print(f"\n  📋 {category.replace('_', ' ').title()}:")
    for step in steps:
        print(f"    🔹 {step}")

print(f"\n💡 Key Learnings and Best Practices:")
best_practices = [
    "Multi-modal fusion requires careful attention mechanism design",
    "Component-specific learning rates improve convergence",
    "Automatic task weighting adapts to task difficulty",
    "Mixed precision training can reduce training time and memory use on supported GPUs",
    "Comprehensive evaluation metrics provide better insights",
    "Production deployment requires extensive testing and monitoring"
]

for practice in best_practices:
    print(f"  ⭐ {practice}")

print(f"\n✅ PyTorch Mastery Hub Capstone Project Successfully Completed!")
print(f"🎉 You have built a production-ready multi-modal AI system!")
print(f"🚀 Work through the next steps above to prepare it for real-world deployment and scaling!")

print(f"\n📚 Skills Demonstrated:")
skills = [
    "Advanced neural network architectures",
    "Multi-modal deep learning",
    "Attention mechanisms and transformers",
    "Production ML system design",
    "MLOps and model deployment",
    "Performance optimization",
    "Comprehensive evaluation and testing"
]

for skill in skills:
    print(f"  🎯 {skill}")

print("\n" + "=" * 80)
print("🎓 CONGRATULATIONS ON COMPLETING THE PYTORCH MASTERY HUB!")
print("="*80)
```
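
One of the immediate steps listed above is adding rate limiting to the API. As a rough illustration of what that could involve, here is a minimal token-bucket limiter using only the standard library. The class name `TokenBucket` and its parameters are hypothetical and not part of this project's code. The sketch is not thread-safe and keeps its state in process memory, so a multi-worker deployment would need a shared store instead.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the caller should be throttled."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In a FastAPI service, one plausible arrangement is to keep one bucket per API key, call `allow()` from a request dependency, and respond with HTTP 429 when it returns `False`.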

---

## Summary

This comprehensive PyTorch Mastery Hub capstone project demonstrates:

**🎯 Advanced Technical Skills:**
- Multi-modal neural architectures with vision and text encoders
- Cross-attention fusion mechanisms
- Multi-task learning with automatic loss weighting
- Production-ready model deployment pipeline
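
Automatic loss weighting can take several forms. One common formulation, shown here only as an illustration and not necessarily the one this project implements, is homoscedastic uncertainty weighting, where each task $i$ has a learnable log-variance $s_i$ that is trained jointly with the network:

$$
\mathcal{L}_{\text{total}} = \sum_{i=1}^{T} \left( \tfrac{1}{2}\, e^{-s_i}\, \mathcal{L}_i + \tfrac{1}{2}\, s_i \right), \qquad s_i = \log \sigma_i^2
$$

Here $T$ is the number of tasks (three in this project: content scoring, sentiment analysis, and topic classification) and $\mathcal{L}_i$ is the unweighted loss of task $i$. A task with a large learned $\sigma_i^2$ is down-weighted by $e^{-s_i}$, while the $\tfrac{1}{2} s_i$ term penalizes large variances so the weights cannot collapse to zero. The constant factors vary between regression and classification variants of this scheme.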

**🚀 Modern ML Engineering:**
- Mixed precision training and optimization techniques
- Comprehensive evaluation and benchmarking
- Model export and deployment preparation
- API design and containerization

**📊 Real-World Capabilities:**
- Content classification and sentiment analysis
- Topic detection and confidence estimation
- Real-time inference with performance monitoring
- Scalable architecture for production deployment

**🛠️ Production Features:**
- FastAPI-based serving infrastructure
- Docker containerization
- Health monitoring and metrics collection
- Comprehensive deployment documentation

The project showcases enterprise-grade AI system development, from research and training to production deployment, making it an excellent demonstration of PyTorch mastery and modern MLOps practices.