# AlgoSpace-8 Integration Tests - Google Colab

This notebook provides comprehensive integration testing for the AlgoSpace-8 MARL trading system. It validates:

1. **Component Integration**: Data flow between all agents and models
2. **Model Compatibility**: Tensor shapes and data types
3. **Memory Usage**: Resource consumption patterns
4. **Checkpoint/Resume**: Save/load functionality
5. **Performance**: Latency and throughput benchmarks
6. **End-to-End**: Complete trading pipeline validation

Designed for Google Colab Pro. Each test runs inside a wrapper that records failures and tracebacks, so a single failing check does not stop the remaining tests.

## 1. Environment Setup & Component Loading

In [None]:
# Core imports and setup
import sys
import os
from pathlib import Path
import time
from datetime import datetime
import json
import yaml
import traceback
from typing import Dict, List, Optional, Any, Tuple, Union
import logging
import warnings
warnings.filterwarnings('ignore')

# Check if running in Colab
try:
    import google.colab
    IN_COLAB = True
    print("🧪 Running Integration Tests in Google Colab")
except ImportError:
    IN_COLAB = False
    print("💻 Running Integration Tests locally")

# Mount Drive and setup paths
if IN_COLAB:
    from google.colab import drive
    drive.mount('/content/drive')
    
    PROJECT_PATH = Path('/content/drive/MyDrive/AlgoSpace-8')
    sys.path.insert(0, str(PROJECT_PATH))
else:
    PROJECT_PATH = Path.cwd().parent.parent
    sys.path.insert(0, str(PROJECT_PATH))

In [None]:
# Install test dependencies
if IN_COLAB:
    !pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
    !pip install -q numpy pandas h5py pyyaml tensorboard
    !pip install -q tqdm matplotlib seaborn psutil gputil memory_profiler
    !pip install -q pytest pytest-benchmark

In [None]:
# Import testing and core libraries
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
from tqdm.notebook import tqdm
import matplotlib.pyplot as plt
import seaborn as sns
import psutil
import gc
from memory_profiler import profile

# Test utilities
import unittest
from dataclasses import dataclass
from collections import defaultdict

# AlgoSpace utilities
from notebooks.utils.colab_setup import ColabSetup, SessionMonitor
from notebooks.utils.drive_manager import DriveManager
from notebooks.utils.checkpoint_manager import CheckpointManager

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger('IntegrationTest')

In [None]:
# Initialize environment
colab_setup = ColabSetup(project_name="AlgoSpace-8")
if IN_COLAB:
    drive_manager = DriveManager(str(PROJECT_PATH))
    device = colab_setup.device
else:
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

print(f"🎮 Using device: {device}")
print(f"📁 Project path: {PROJECT_PATH}")
print(f"💾 Available models: {list(drive_manager.list_available('models').get('models', [])) if IN_COLAB else 'N/A'}")

## 2. Test Framework Setup

In [None]:
@dataclass
class TestResult:
    """Test result container."""
    name: str
    status: str  # 'passed', 'failed', 'skipped'
    duration: float
    message: str = ""
    details: Optional[Dict[str, Any]] = None  # replaced with {} in __post_init__
    error: Optional[str] = None
    
    def __post_init__(self):
        if self.details is None:
            self.details = {}


class TestSuite:
    """Comprehensive test suite for AlgoSpace-8 components."""
    
    def __init__(self, device: torch.device):
        self.device = device
        self.results: List[TestResult] = []
        self.test_data = {}
        self.models = {}
        
    def run_test(self, test_func, test_name: str, *args, **kwargs) -> TestResult:
        """Run a single test with error handling."""
        print(f"\n🧪 Running: {test_name}")
        start_time = time.time()
        
        try:
            result = test_func(*args, **kwargs)
            duration = time.time() - start_time
            
            if isinstance(result, dict):
                test_result = TestResult(
                    name=test_name,
                    status='passed',
                    duration=duration,
                    message=result.get('message', ''),
                    details=result.get('details', {})
                )
            else:
                test_result = TestResult(
                    name=test_name,
                    status='passed',
                    duration=duration,
                    message=str(result) if result else 'Test passed'
                )
            
            print(f"   ✅ PASSED ({duration:.2f}s)")
            
        except Exception as e:
            duration = time.time() - start_time
            error_msg = str(e)
            
            test_result = TestResult(
                name=test_name,
                status='failed',
                duration=duration,
                message=f"Test failed: {error_msg}",
                error=traceback.format_exc()
            )
            
            print(f"   ❌ FAILED ({duration:.2f}s): {error_msg}")
        
        self.results.append(test_result)
        return test_result
    
    def get_summary(self) -> Dict[str, Any]:
        """Get test summary statistics."""
        total = len(self.results)
        passed = sum(1 for r in self.results if r.status == 'passed')
        failed = sum(1 for r in self.results if r.status == 'failed')
        skipped = sum(1 for r in self.results if r.status == 'skipped')
        total_time = sum(r.duration for r in self.results)
        
        return {
            'total': total,
            'passed': passed,
            'failed': failed,
            'skipped': skipped,
            'success_rate': passed / total if total > 0 else 0,
            'total_duration': total_time
        }

# Initialize test suite
test_suite = TestSuite(device)
print("✅ Test framework initialized")
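The `run_test` wrapper above can also be sketched without PyTorch or notebook state. `MiniResult` and `MiniSuite` are hypothetical stand-ins for `TestResult` and `TestSuite`. The sketch makes two small changes you may want to consider. First, it uses `field(default_factory=dict)` in place of the `None`-then-`__post_init__` pattern. Second, it measures durations with `time.perf_counter()`, a monotonic clock, instead of `time.time()`. It also simplifies things: non-dict return values get a fixed "Test passed" message, and there is no `skipped` status.

```python
import time
import traceback
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class MiniResult:
    """Hypothetical stand-in for TestResult."""
    name: str
    status: str  # 'passed' or 'failed'
    duration: float
    message: str = ""
    details: Dict[str, Any] = field(default_factory=dict)  # fresh dict per instance
    error: Optional[str] = None


class MiniSuite:
    """Hypothetical stand-in for TestSuite: runs tests and records failures, never re-raises."""

    def __init__(self) -> None:
        self.results: List[MiniResult] = []

    def run_test(self, test_func: Callable[..., Any], name: str,
                 *args: Any, **kwargs: Any) -> MiniResult:
        start = time.perf_counter()  # monotonic clock for measuring durations
        try:
            outcome = test_func(*args, **kwargs)
            payload = outcome if isinstance(outcome, dict) else {}
            result = MiniResult(name, 'passed', time.perf_counter() - start,
                                message=payload.get('message', 'Test passed'),
                                details=payload.get('details', {}))
        except Exception as exc:  # record the failure so later tests still run
            result = MiniResult(name, 'failed', time.perf_counter() - start,
                                message=f"Test failed: {exc}",
                                error=traceback.format_exc())
        self.results.append(result)
        return result

    def get_summary(self) -> Dict[str, Any]:
        total = len(self.results)
        passed = sum(r.status == 'passed' for r in self.results)
        return {'total': total, 'passed': passed, 'failed': total - passed,
                'success_rate': passed / total if total else 0.0}


def failing_test() -> None:
    raise AssertionError("shape mismatch")


suite = MiniSuite()
suite.run_test(lambda: {'message': 'ok'}, "Passing example")
suite.run_test(failing_test, "Failing example")
print(suite.get_summary())  # {'total': 2, 'passed': 1, 'failed': 1, 'success_rate': 0.5}
```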

## 3. Component Availability Tests

In [None]:
def test_component_availability():
    """Test if all required components are available."""
    
    components = {
        'src/agents/main_core/models.py': 'Main MARL Core models',
        'src/agents/main_core/engine.py': 'Training engine',
        'src/agents/main_core/tactical_embedder.py': 'Tactical embedder',
        'notebooks/utils/colab_setup.py': 'Colab utilities',
        'notebooks/utils/drive_manager.py': 'Drive manager',
        'notebooks/utils/checkpoint_manager.py': 'Checkpoint manager'
    }
    
    results = {}
    missing_components = []
    
    for component_path, description in components.items():
        full_path = PROJECT_PATH / component_path
        exists = full_path.exists()
        results[description] = exists
        
        if not exists:
            missing_components.append(f"{description} ({component_path})")
    
    if missing_components:
        raise AssertionError(f"Missing components: {', '.join(missing_components)}")
    
    return {
        'message': 'All components available',
        'details': {'components': results}
    }

# Run test
test_suite.run_test(test_component_availability, "Component Availability")
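Below is a stand-alone version of the availability check that runs against a throwaway directory tree. `check_components` is a hypothetical helper. It uses `Path.is_file()` where the notebook uses `exists()`, on the assumption that a directory sharing the name of an expected module should count as missing.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def check_components(root: Path, components: Dict[str, str]) -> Tuple[Dict[str, bool], List[str]]:
    """Return (availability keyed by description, list of missing entries)."""
    available: Dict[str, bool] = {}
    missing: List[str] = []
    for rel_path, description in components.items():
        exists = (root / rel_path).is_file()  # a directory with the same name does not count
        available[description] = exists
        if not exists:
            missing.append(f"{description} ({rel_path})")
    return available, missing


# Demo against a hypothetical, partially populated project tree
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src/agents/main_core").mkdir(parents=True)
    (root / "src/agents/main_core/models.py").write_text("# stub\n")
    available, missing = check_components(root, {
        "src/agents/main_core/models.py": "Main MARL Core models",
        "src/agents/main_core/engine.py": "Training engine",
    })

print(available)  # {'Main MARL Core models': True, 'Training engine': False}
print(missing)    # ['Training engine (src/agents/main_core/engine.py)']
```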

In [None]:
def test_model_imports():
    """Test if all model classes can be imported."""
    
    import_results = {}
    
    try:
        # Test core model imports
        sys.path.insert(0, str(PROJECT_PATH / 'src'))
        
        from agents.main_core.models import (
            MarketRegimeDetector, TacticalEmbedder, StructureAgent,
            MainMARLCore, SharedPolicyNetwork
        )
        import_results['Core Models'] = True
        
        # Store model classes for later tests
        test_suite.models.update({
            'MarketRegimeDetector': MarketRegimeDetector,
            'TacticalEmbedder': TacticalEmbedder,
            'StructureAgent': StructureAgent,
            'MainMARLCore': MainMARLCore,
            'SharedPolicyNetwork': SharedPolicyNetwork
        })
        
    except ImportError as e:
        import_results['Core Models'] = False
        raise AssertionError(f"Failed to import core models: {e}") from e
    
    try:
        # Test tactical embedder
        from agents.main_core.tactical_embedder import TacticalEmbedder as TacticalEmbedderNew
        import_results['Tactical Embedder'] = True
        test_suite.models['TacticalEmbedderNew'] = TacticalEmbedderNew
        
    except ImportError:
        import_results['Tactical Embedder'] = False
        # Non-critical: the standalone tactical embedder is optional, so continue
    
    return {
        'message': 'Model imports successful',
        'details': {'imports': import_results}
    }

# Run test
test_suite.run_test(test_model_imports, "Model Imports")
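Critical and optional imports can also be kept apart with `importlib`, which lets you return `None` for a missing module or attribute instead of raising. `try_import` is a hypothetical helper and is not part of the AlgoSpace utilities. In this notebook it might be called as `try_import('agents.main_core.tactical_embedder', ['TacticalEmbedder'])` once `src` is on `sys.path`. The demo below imports from the standard library so it runs anywhere.

```python
import importlib
from typing import Any, Dict, Iterable, Optional


def try_import(module_name: str, names: Iterable[str]) -> Dict[str, Optional[Any]]:
    """Import `names` from `module_name`; a missing module or attribute maps to None."""
    names = list(names)
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return {name: None for name in names}
    return {name: getattr(module, name, None) for name in names}


found = try_import("collections", ["OrderedDict", "NotAThing"])
print({name: obj is not None for name, obj in found.items()})  # {'OrderedDict': True, 'NotAThing': False}
```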

## 4. Model Instantiation Tests

In [None]:
def test_model_instantiation():
    """Test instantiation of all model components."""
    
    # Standard dimensions for testing
    test_config = {
        'market_dim': 128,
        'risk_dim': 64,
        'tactical_dim': 96,
        'hidden_dim': 256,
        'action_dim': 32,
        'num_agents': 3,
        'embedding_dim': 128 + 64 + 96  # market + risk + tactical = 288
    }
    
    instantiated_models = {}
    
    # Test MarketRegimeDetector
    if 'MarketRegimeDetector' in test_suite.models:
        detector = test_suite.models['MarketRegimeDetector'](
            input_dim=100,  # Raw market features
            embedding_dim=test_config['market_dim'],
            hidden_dim=128,
            num_regimes=4
        ).to(device)
        instantiated_models['MarketRegimeDetector'] = detector
    
    # Test TacticalEmbedder
    if 'TacticalEmbedder' in test_suite.models:
        tactical = test_suite.models['TacticalEmbedder'](
            input_dim=test_config['market_dim'],
            hidden_dim=128,
            embedding_dim=test_config['tactical_dim']
        ).to(device)
        instantiated_models['TacticalEmbedder'] = tactical
    
    # Test StructureAgent
    if 'StructureAgent' in test_suite.models:
        structure = test_suite.models['StructureAgent'](
            input_dim=test_config['embedding_dim'],
            hidden_dim=test_config['hidden_dim'],
            risk_dim=test_config['risk_dim']
        ).to(device)
        instantiated_models['StructureAgent'] = structure
    
    # Test SharedPolicyNetwork
    if 'SharedPolicyNetwork' in test_suite.models:
        shared_policy = test_suite.models['SharedPolicyNetwork'](
            state_dim=test_config['embedding_dim'],
            action_dim=test_config['action_dim'],
            hidden_dim=test_config['hidden_dim']
        ).to(device)
        instantiated_models['SharedPolicyNetwork'] = shared_policy
    
    # Test MainMARLCore
    if 'MainMARLCore' in test_suite.models:
        main_core = test_suite.models['MainMARLCore'](
            embedding_dim=test_config['embedding_dim'],
            hidden_dim=test_config['hidden_dim'],
            action_dim=test_config['action_dim'],
            num_agents=test_config['num_agents']
        ).to(device)
        instantiated_models['MainMARLCore'] = main_core
    
    # Store for later tests
    test_suite.test_data['models'] = instantiated_models
    test_suite.test_data['config'] = test_config
    
    return {
        'message': f'Successfully instantiated {len(instantiated_models)} models',
        'details': {
            'models': list(instantiated_models.keys()),
            'device': str(device),
            'config': test_config
        }
    }

# Run test
test_suite.run_test(test_model_instantiation, "Model Instantiation")
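The combined embedding is the concatenation of the market, risk, and tactical embeddings. That means `embedding_dim` must equal their sum, which is 128 + 64 + 96 = 288 for the widths above. One way to enforce this invariant is to derive the value instead of typing it. `build_test_config` is a hypothetical helper that uses the same defaults as `test_config`.

```python
from typing import Dict


def build_test_config(market_dim: int = 128, risk_dim: int = 64, tactical_dim: int = 96,
                      hidden_dim: int = 256, action_dim: int = 32,
                      num_agents: int = 3) -> Dict[str, int]:
    """Derive embedding_dim from its parts so the values cannot drift apart."""
    return {
        'market_dim': market_dim,
        'risk_dim': risk_dim,
        'tactical_dim': tactical_dim,
        'hidden_dim': hidden_dim,
        'action_dim': action_dim,
        'num_agents': num_agents,
        # The combined embedding is torch.cat([market, risk, tactical], dim=-1)
        'embedding_dim': market_dim + risk_dim + tactical_dim,
    }


config = build_test_config()
print(config['embedding_dim'])  # 288
```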

## 5. Data Flow & Compatibility Tests

In [None]:
def test_data_flow_compatibility():
    """Test data flow between components and tensor compatibility."""
    
    if 'models' not in test_suite.test_data:
        raise RuntimeError("Models not instantiated. Run model instantiation test first.")
    
    models = test_suite.test_data['models']
    config = test_suite.test_data['config']
    batch_size = 32
    
    flow_results = {}
    
    # Test 1: Market Regime Detection
    if 'MarketRegimeDetector' in models:
        market_input = torch.randn(batch_size, 100).to(device)
        regime_embedding, regime_probs = models['MarketRegimeDetector'](market_input)
        
        assert regime_embedding.shape == (batch_size, config['market_dim']), \
            f"Expected {(batch_size, config['market_dim'])}, got {regime_embedding.shape}"
        assert regime_probs.shape == (batch_size, 4), \
            f"Expected {(batch_size, 4)}, got {regime_probs.shape}"
        
        flow_results['market_regime'] = {
            'input_shape': market_input.shape,
            'embedding_shape': regime_embedding.shape,
            'probs_shape': regime_probs.shape
        }
    
    # Test 2: Tactical Embedding
    if 'TacticalEmbedder' in models and 'MarketRegimeDetector' in models:
        tactical_output = models['TacticalEmbedder'](regime_embedding)
        
        if isinstance(tactical_output, tuple):
            tactical_embedding = tactical_output[0]
        else:
            tactical_embedding = tactical_output
        
        assert tactical_embedding.shape == (batch_size, config['tactical_dim']), \
            f"Expected {(batch_size, config['tactical_dim'])}, got {tactical_embedding.shape}"
        
        flow_results['tactical'] = {
            'input_shape': regime_embedding.shape,
            'output_shape': tactical_embedding.shape
        }
    
    # Test 3: Combined Embedding
    if 'MarketRegimeDetector' in models and 'TacticalEmbedder' in models:
        # Simulate risk embedding (from M-RMS)
        risk_embedding = torch.randn(batch_size, config['risk_dim']).to(device)
        
        # Combine embeddings
        combined_embedding = torch.cat([
            regime_embedding,
            risk_embedding,
            tactical_embedding
        ], dim=-1)
        
        expected_dim = config['market_dim'] + config['risk_dim'] + config['tactical_dim']
        assert combined_embedding.shape == (batch_size, expected_dim), \
            f"Expected {(batch_size, expected_dim)}, got {combined_embedding.shape}"
        
        flow_results['combined_embedding'] = {
            'shape': combined_embedding.shape,
            'components': {
                'market': regime_embedding.shape[-1],
                'risk': risk_embedding.shape[-1],
                'tactical': tactical_embedding.shape[-1]
            }
        }
        
        # Store for next tests
        test_suite.test_data['combined_embedding'] = combined_embedding
    
    # Test 4: Structure Agent
    if 'StructureAgent' in models and 'combined_embedding' in test_suite.test_data:
        structure_output = models['StructureAgent'](combined_embedding)
        
        if isinstance(structure_output, tuple):
            risk_assessment = structure_output[0]
        else:
            risk_assessment = structure_output
        
        assert risk_assessment.shape == (batch_size, config['risk_dim']), \
            f"Expected {(batch_size, config['risk_dim'])}, got {risk_assessment.shape}"
        
        flow_results['structure_agent'] = {
            'input_shape': combined_embedding.shape,
            'output_shape': risk_assessment.shape
        }
    
    # Test 5: Main MARL Core
    if 'MainMARLCore' in models and 'combined_embedding' in test_suite.test_data:
        core_output = models['MainMARLCore'](combined_embedding)
        
        # Check if output has expected structure
        if isinstance(core_output, dict):
            assert 'actions' in core_output, "Main core output missing 'actions'"
            actions = core_output['actions']
        else:
            actions = core_output
        
        # Actions should be (batch_size, num_agents, action_dim)
        expected_shape = (batch_size, config['num_agents'], config['action_dim'])
        assert actions.shape == expected_shape, \
            f"Expected {expected_shape}, got {actions.shape}"
        
        flow_results['main_core'] = {
            'input_shape': combined_embedding.shape,
            'actions_shape': actions.shape,
            'num_agents': config['num_agents']
        }
    
    return {
        'message': 'Data flow compatibility verified',
        'details': flow_results
    }

# Run test
test_suite.run_test(test_data_flow_compatibility, "Data Flow Compatibility")
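The expected shapes in the data-flow test follow the `torch.cat` rule. Every dimension except the concatenation axis must match, and the sizes along the concatenation axis add up. The hypothetical `cat_shape` helper below predicts the result from shape tuples alone, so a mismatch such as an `embedding_dim` that does not equal the summed widths can be caught before any model is built. It is a sketch of the rule, not a wrapper around PyTorch.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def cat_shape(shapes: Sequence[Shape], dim: int = -1) -> Shape:
    """Shape that concatenating tensors of these shapes along `dim` would produce."""
    if not shapes:
        raise ValueError("need at least one shape")
    ndim = len(shapes[0])
    if any(len(s) != ndim for s in shapes):
        raise ValueError("all shapes must have the same number of dimensions")
    axis = dim % ndim  # normalise negative dims, e.g. -1 -> last axis
    for i in range(ndim):
        if i != axis and len({s[i] for s in shapes}) != 1:
            raise ValueError(f"size mismatch on dimension {i}: {[s[i] for s in shapes]}")
    out = list(shapes[0])
    out[axis] = sum(s[axis] for s in shapes)
    return tuple(out)


# market (128) + risk (64) + tactical (96) for a batch of 32
print(cat_shape([(32, 128), (32, 64), (32, 96)]))  # (32, 288)
```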

## 6. Memory Usage Tests

In [None]:
def test_memory_usage():
    """Test memory consumption patterns."""
    
    if 'models' not in test_suite.test_data:
        raise RuntimeError("Models not instantiated")
    
    models = test_suite.test_data['models']
    config = test_suite.test_data['config']
    
    memory_results = {}
    
    # Get initial memory state
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        initial_gpu_memory = torch.cuda.memory_allocated(device) / 1024**2  # MB
    else:
        initial_gpu_memory = 0
    
    initial_cpu_memory = psutil.Process().memory_info().rss / 1024**2  # MB
    
    # Test model parameter counts
    model_params = {}
    total_params = 0
    
    for name, model in models.items():
        params = sum(p.numel() for p in model.parameters())
        trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
        
        model_params[name] = {
            'total': params,
            'trainable': trainable_params,
            'size_mb': params * 4 / 1024**2  # Assuming float32
        }
        total_params += params
    
    # Test memory with different batch sizes
    batch_memory = {}
    
    for batch_size in [1, 16, 32, 64, 128]:
        try:
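            if torch.cuda.is_available():
                torch.cuda.empty_cache()
                # Reset the peak counter first so the peak below reflects only this batch
                torch.cuda.reset_peak_memory_stats(device)
                start_memory = torch.cuda.memory_allocated(device)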
            
            # Forward pass with batch
            test_input = torch.randn(batch_size, 100).to(device)
            
            with torch.no_grad():
                if 'MarketRegimeDetector' in models:
                    _ = models['MarketRegimeDetector'](test_input)
            
            if torch.cuda.is_available():
                peak_memory = torch.cuda.max_memory_allocated(device)
                memory_used = (peak_memory - start_memory) / 1024**2
                torch.cuda.reset_peak_memory_stats(device)
            else:
                memory_used = 0
            
            batch_memory[batch_size] = memory_used
            
        except RuntimeError as e:
            if "out of memory" in str(e):
                batch_memory[batch_size] = "OOM"
                break
            else:
                raise
    
    # Memory efficiency check
    max_batch_size = max([k for k, v in batch_memory.items() if v != "OOM"], default=1)
    
    memory_results = {
        'initial_gpu_mb': initial_gpu_memory,
        'initial_cpu_mb': initial_cpu_memory,
        'total_parameters': total_params,
        'model_parameters': model_params,
        'batch_memory_usage': batch_memory,
        'max_batch_size': max_batch_size,
        'memory_per_sample': batch_memory.get(32, 0) / 32 if batch_memory.get(32, 0) != "OOM" else "N/A"
    }
    
    # Check for memory efficiency
    # Named memory_warnings so the imported `warnings` module is not shadowed
    memory_warnings = []
    if total_params > 50_000_000:  # 50M parameters
        memory_warnings.append("Model has >50M parameters - consider optimization")
    
    if max_batch_size < 32:
        memory_warnings.append(f"Low max batch size ({max_batch_size}) - memory constraints")
    
    if memory_warnings:
        memory_results['warnings'] = memory_warnings
    
    return {
        'message': f'Memory usage analyzed - {total_params:,} total parameters',
        'details': memory_results
    }

# Run test
test_suite.run_test(test_memory_usage, "Memory Usage Analysis")
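The `size_mb` figure in the memory test is the parameter count × 4 bytes (float32) ÷ 1024². For a linear layer, the count is `in_features × out_features` weights plus `out_features` biases. The sketch below uses a hypothetical 100 → 128 → 128 MLP because the actual `MarketRegimeDetector` layers are not shown here. It counts weights only. Activations, gradients, and optimizer state are not included, which is why the batch-size sweep measures GPU memory separately.

```python
from typing import Iterable

BYTES_PER_PARAM = {'float32': 4, 'float16': 2, 'bfloat16': 2, 'float64': 8}


def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """A linear layer stores an (out, in) weight matrix plus an optional length-out bias."""
    return in_features * out_features + (out_features if bias else 0)


def mlp_params(layer_sizes: Iterable[int]) -> int:
    """Total parameters of consecutive linear layers, e.g. [100, 128, 128]."""
    sizes = list(layer_sizes)
    return sum(linear_params(a, b) for a, b in zip(sizes, sizes[1:]))


def size_mb(n_params: int, dtype: str = 'float32') -> float:
    """Weight storage in MiB for the given dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 1024**2


n = mlp_params([100, 128, 128])  # hypothetical layer sizes
print(n, size_mb(n))  # 29440 0.1123046875
```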

## 7. Checkpoint & Resume Tests

In [None]:
def test_checkpoint_functionality():
    """Test model saving, loading, and state consistency."""
    
    if 'models' not in test_suite.test_data:
        raise RuntimeError("Models not instantiated")
    
    models = test_suite.test_data['models']
    checkpoint_results = {}
    
    # Test each model individually
    for model_name, model in models.items():
        # Get initial state
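        # Snapshot the initial state. state_dict() returns references that share storage
        # with the parameters, so clone them or the in-place edits below would alter the snapshot too.
        initial_state = {k: v.detach().clone() for k, v in model.state_dict().items()}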
        
        # Generate test input
        if model_name == 'MarketRegimeDetector':
            test_input = torch.randn(4, 100).to(device)
        elif model_name == 'TacticalEmbedder':
            test_input = torch.randn(4, test_suite.test_data['config']['market_dim']).to(device)
        else:
            test_input = torch.randn(4, test_suite.test_data['config']['embedding_dim']).to(device)
        
        # Get initial output
        model.eval()
        with torch.no_grad():
            initial_output = model(test_input)
            if isinstance(initial_output, tuple):
                initial_output = initial_output[0]
        
        # Save model
        temp_path = f"/tmp/test_{model_name}.pt"
        torch.save(model.state_dict(), temp_path)
        
        # Modify model (to verify loading works)
        with torch.no_grad():
            for param in model.parameters():
                param.add_(torch.randn_like(param) * 0.1)
        
        # Verify model changed
        with torch.no_grad():
            modified_output = model(test_input)
            if isinstance(modified_output, tuple):
                modified_output = modified_output[0]
        
        output_changed = not torch.allclose(initial_output, modified_output, atol=1e-6)
        
        # Load model back
        model.load_state_dict(torch.load(temp_path, map_location=device, weights_only=True))
        
        # Verify restoration
        with torch.no_grad():
            restored_output = model(test_input)
            if isinstance(restored_output, tuple):
                restored_output = restored_output[0]
        
        output_restored = torch.allclose(initial_output, restored_output, atol=1e-6)
        
        # State dict comparison
        restored_state = model.state_dict()
        state_consistent = all(
            torch.allclose(initial_state[key], restored_state[key], atol=1e-6)
            for key in initial_state.keys()
        )
        
        checkpoint_results[model_name] = {
            'output_changed_after_modification': output_changed,
            'output_restored_after_loading': output_restored,
            'state_dict_consistent': state_consistent,
            'checkpoint_size_bytes': os.path.getsize(temp_path)
        }
        
        # Cleanup
        os.remove(temp_path)
        
        # Assert all tests passed
        assert output_changed, f"{model_name}: Model didn't change after modification"
        assert output_restored, f"{model_name}: Output not restored after loading"
        assert state_consistent, f"{model_name}: State dict not consistent after loading"
    
    return {
        'message': f'Checkpoint functionality verified for {len(models)} models',
        'details': checkpoint_results
    }

# Run test
test_suite.run_test(test_checkpoint_functionality, "Checkpoint & Resume")
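The checkpoint test depends on a save, perturb, restore cycle. Its state comparison only means something if the "before" snapshot is an independent copy. PyTorch documents `Module.state_dict()` as returning references to the module's state. `load_state_dict()` then copies values back into the existing tensors in place, so an uncloned snapshot follows every change. The framework-free sketch below reproduces that aliasing with a dict of Python lists and `pickle`. `save_state` and `load_state_into` are hypothetical stand-ins for `torch.save` and `load_state_dict`.

```python
import copy
import os
import pickle
import tempfile
from typing import Dict, List

State = Dict[str, List[float]]  # hypothetical stand-in for a state_dict


def save_state(state: State, path: str) -> None:
    """Persist the state, analogous to torch.save(model.state_dict(), path)."""
    with open(path, 'wb') as f:
        pickle.dump(state, f)


def load_state_into(target: State, path: str) -> None:
    """Copy saved values into the existing lists in place, analogous to load_state_dict."""
    with open(path, 'rb') as f:
        saved = pickle.load(f)
    for key, values in saved.items():
        target[key][:] = values


params: State = {'layer.weight': [0.5, -1.0], 'layer.bias': [0.1]}
aliased = dict(params)            # shares the inner lists, like an uncloned state_dict()
snapshot = copy.deepcopy(params)  # independent copy, like {k: v.detach().clone() ...}

fd, path = tempfile.mkstemp(suffix='.pkl')
os.close(fd)
try:
    save_state(params, path)
    for values in params.values():  # perturb in place, like param.add_(noise)
        for i in range(len(values)):
            values[i] += 0.1
    print(aliased == params, snapshot == params)  # True False
    load_state_into(params, path)
    print(snapshot == params)  # True
finally:
    os.remove(path)
```

The alias compares equal both before and after the restore. That is why a comparison against an uncloned snapshot can never fail.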

## 8. Performance Benchmarks

In [None]:
def test_inference_performance():
    """Benchmark inference performance."""
    
    if 'models' not in test_suite.test_data:
        raise RuntimeError("Models not instantiated")
    
    models = test_suite.test_data['models']
    config = test_suite.test_data['config']
    
    performance_results = {}
    
    # Warm up GPU
    if torch.cuda.is_available():
        warmup_input = torch.randn(1, 100).to(device)
        for _ in range(10):
            _ = warmup_input @ warmup_input.T
        torch.cuda.synchronize()
    
    # Test each model
    for model_name, model in models.items():
        model.eval()
        
        # Prepare input
        if model_name == 'MarketRegimeDetector':
            batch_input = torch.randn(32, 100).to(device)
        elif model_name == 'TacticalEmbedder':
            batch_input = torch.randn(32, config['market_dim']).to(device)
        else:
            batch_input = torch.randn(32, config['embedding_dim']).to(device)
        
        # Benchmark inference: untimed warm-up passes first, so one-off CUDA/cuDNN setup is excluded
        times = []
        
        with torch.no_grad():
            for _ in range(10):
                _ = model(batch_input)
        
        for _ in range(100):  # 100 timed runs
            if torch.cuda.is_available():
                torch.cuda.synchronize()
            
            start_time = time.perf_counter()
            
            with torch.no_grad():
                _ = model(batch_input)
            
            if torch.cuda.is_available():
                torch.cuda.synchronize()
            
            end_time = time.perf_counter()
            times.append((end_time - start_time) * 1000)  # Convert to ms
        
        # Calculate statistics
        times = np.array(times)
        
        performance_results[model_name] = {
            'mean_latency_ms': float(np.mean(times)),
            'std_latency_ms': float(np.std(times)),
            'min_latency_ms': float(np.min(times)),
            'max_latency_ms': float(np.max(times)),
            'p95_latency_ms': float(np.percentile(times, 95)),
            'throughput_samples_per_sec': 32 / (np.mean(times) / 1000),
            'batch_size': 32
        }
    
    # End-to-end pipeline benchmark
    if all(key in models for key in ['MarketRegimeDetector', 'TacticalEmbedder', 'MainMARLCore']):
        pipeline_times = []
        batch_input = torch.randn(32, 100).to(device)
        # Simulated M-RMS risk embedding, created once so random-number generation is not timed
        risk_emb = torch.randn(32, config['risk_dim']).to(device)
        
        for _ in range(50):  # 50 runs
            if torch.cuda.is_available():
                torch.cuda.synchronize()
            
            start_time = time.perf_counter()
            
            with torch.no_grad():
                # Full pipeline
                regime_emb, _ = models['MarketRegimeDetector'](batch_input)
                tactical_emb = models['TacticalEmbedder'](regime_emb)
                if isinstance(tactical_emb, tuple):
                    tactical_emb = tactical_emb[0]
                
                combined = torch.cat([regime_emb, risk_emb, tactical_emb], dim=-1)
                
                actions = models['MainMARLCore'](combined)
                if isinstance(actions, dict):
                    actions = actions['actions']
            
            if torch.cuda.is_available():
                torch.cuda.synchronize()
            
            end_time = time.perf_counter()
            pipeline_times.append((end_time - start_time) * 1000)
        
        pipeline_times = np.array(pipeline_times)
        
        performance_results['end_to_end_pipeline'] = {
            'mean_latency_ms': float(np.mean(pipeline_times)),
            'std_latency_ms': float(np.std(pipeline_times)),
            'p95_latency_ms': float(np.percentile(pipeline_times, 95)),
            'throughput_samples_per_sec': 32 / (np.mean(pipeline_times) / 1000)
        }
    
    # Performance assessment
    # Named perf_warnings so the imported `warnings` module is not shadowed
    perf_warnings = []
    
    for model_name, perf in performance_results.items():
        if perf['mean_latency_ms'] > 100:  # >100ms for a batch of 32
            perf_warnings.append(f"{model_name}: High latency ({perf['mean_latency_ms']:.1f}ms)")
        
        if perf.get('throughput_samples_per_sec', 0) < 100:  # <100 samples/sec
            perf_warnings.append(f"{model_name}: Low throughput ({perf.get('throughput_samples_per_sec', 0):.1f} samples/sec)")
    
    if perf_warnings:
        performance_results['warnings'] = perf_warnings
    
    return {
        'message': f'Performance benchmarked for {len(models)} models',
        'details': performance_results
    }

# Run test
test_suite.run_test(test_inference_performance, "Inference Performance")
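The latency statistics come down to two formulas. Throughput is batch size ÷ (mean latency in seconds). p95 uses linear interpolation between sorted samples, which is `numpy.percentile`'s default method. The helpers below are a hypothetical, dependency-free sketch. `time_calls` runs untimed warm-up calls first and uses `time.perf_counter()`. CUDA kernels launch asynchronously, so for a GPU model the timed callable would need to end with `torch.cuda.synchronize()`, as the notebook does.

```python
import time
from typing import Callable, Dict, List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile, the same rule as numpy.percentile's default."""
    s = sorted(values)
    if not s:
        raise ValueError("no samples")
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def time_calls(fn: Callable[[], object], n_warmup: int = 10, n_runs: int = 100) -> List[float]:
    """Per-call latency in milliseconds, measured after untimed warm-up calls."""
    for _ in range(n_warmup):
        fn()
    times_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return times_ms


def summarize(times_ms: Sequence[float], batch_size: int) -> Dict[str, float]:
    """Mean and p95 latency plus throughput in samples per second."""
    mean_ms = sum(times_ms) / len(times_ms)
    return {
        'mean_latency_ms': mean_ms,
        'p95_latency_ms': percentile(times_ms, 95),
        'throughput_samples_per_sec': batch_size / (mean_ms / 1000.0),
    }


# Example with hand-picked latencies: mean 25 ms, so 32 samples / 0.025 s ≈ 1280 samples/s
print(summarize([10.0, 20.0, 30.0, 40.0], batch_size=32))
```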

## 9. End-to-End Trading Pipeline Test

In [None]:
def test_end_to_end_trading_pipeline():
    """Test complete trading pipeline simulation."""
    
    if 'models' not in test_suite.test_data:
        raise RuntimeError("Models not instantiated")
    
    models = test_suite.test_data['models']
    config = test_suite.test_data['config']
    
    # Simulate trading session
    n_timesteps = 100
    batch_size = 16
    
    pipeline_results = {
        'timesteps': n_timesteps,
        'batch_size': batch_size,
        'outputs': {},
        'consistency_checks': {}
    }
    
    # Storage for outputs
    all_actions = []
    all_regimes = []
    all_embeddings = []
    
    for t in tqdm(range(n_timesteps), desc="Trading simulation"):
        # Generate market data (simulate real market feed)
        market_data = torch.randn(batch_size, 100).to(device)
        
        with torch.no_grad():
            # Step 1: Market regime detection
            if 'MarketRegimeDetector' in models:
                regime_embedding, regime_probs = models['MarketRegimeDetector'](market_data)
                predicted_regime = regime_probs.argmax(dim=-1)
                all_regimes.append(predicted_regime.cpu())
            else:
                regime_embedding = torch.randn(batch_size, config['market_dim']).to(device)
                predicted_regime = torch.randint(0, 4, (batch_size,))
            
            # Step 2: Tactical analysis
            if 'TacticalEmbedder' in models:
                tactical_output = models['TacticalEmbedder'](regime_embedding)
                if isinstance(tactical_output, tuple):
                    tactical_embedding = tactical_output[0]
                else:
                    tactical_embedding = tactical_output
            else:
                tactical_embedding = torch.randn(batch_size, config['tactical_dim']).to(device)
            
            # Step 3: Risk assessment (simulate M-RMS output)
            risk_embedding = torch.randn(batch_size, config['risk_dim']).to(device)
            
            # Step 4: Combine embeddings
            combined_embedding = torch.cat([
                regime_embedding,
                risk_embedding,
                tactical_embedding
            ], dim=-1)
            all_embeddings.append(combined_embedding.cpu())
            
            # Step 5: Structure analysis
            if 'StructureAgent' in models:
                structure_output = models['StructureAgent'](combined_embedding)
                if isinstance(structure_output, tuple):
                    structure_risk = structure_output[0]
                else:
                    structure_risk = structure_output
            
            # Step 6: Main MARL Core decision
            if 'MainMARLCore' in models:
                core_output = models['MainMARLCore'](combined_embedding)
                if isinstance(core_output, dict):
                    actions = core_output['actions']
                else:
                    actions = core_output
                
                all_actions.append(actions.cpu())
    
    # Analyze outputs
    if all_actions:
        actions_tensor = torch.stack(all_actions)  # (timesteps, batch, agents, action_dim)
        
        pipeline_results['outputs']['actions'] = {
            'shape': list(actions_tensor.shape),
            'mean': float(actions_tensor.mean()),
            'std': float(actions_tensor.std()),
            'min': float(actions_tensor.min()),
            'max': float(actions_tensor.max())
        }
        
        # Check action consistency
        action_std_over_time = actions_tensor.std(dim=0).mean()
        pipeline_results['consistency_checks']['action_stability'] = {
            'std_over_time': float(action_std_over_time),
            'is_stable': float(action_std_over_time) < 2.0  # Heuristic threshold, not calibrated
        }
    
    if all_regimes:
        regimes_tensor = torch.stack(all_regimes)  # (timesteps, batch)
        
        # Regime distribution
        regime_counts = torch.bincount(regimes_tensor.flatten(), minlength=4)
        regime_distribution = (regime_counts.float() / regime_counts.sum()).tolist()
        
        pipeline_results['outputs']['regimes'] = {
            'distribution': regime_distribution,
            'most_common': int(regime_counts.argmax()),
            'diversity': float(torch.std(regime_counts.float()))
        }
        
        # Check regime transitions
        regime_changes = (regimes_tensor[1:] != regimes_tensor[:-1]).float().mean()
        pipeline_results['consistency_checks']['regime_stability'] = {
            'change_rate': float(regime_changes),
            'is_reasonable': 0.1 <= float(regime_changes) <= 0.5  # 10-50% change rate
        }
    
    if all_embeddings:
        embeddings_tensor = torch.stack(all_embeddings)  # (timesteps, batch, embedding_dim)
        
        # Embedding consistency
        embedding_std = embeddings_tensor.std(dim=0).mean()
        pipeline_results['consistency_checks']['embedding_stability'] = {
            'std_over_time': float(embedding_std),
            'is_stable': float(embedding_std) < 1.0
        }
    
    # Overall pipeline health
    health_score = 0
    total_checks = 0
    
    for check_category, checks in pipeline_results['consistency_checks'].items():
        if isinstance(checks, dict) and 'is_stable' in checks:
            health_score += int(checks['is_stable'])
            total_checks += 1
        elif isinstance(checks, dict) and 'is_reasonable' in checks:
            health_score += int(checks['is_reasonable'])
            total_checks += 1
    
    pipeline_health = health_score / total_checks if total_checks > 0 else 0
    pipeline_results['pipeline_health_score'] = pipeline_health
    
    # Assertions for critical functionality
    assert len(all_actions) == n_timesteps, "Missing action outputs"
    assert pipeline_health > 0.5, f"Pipeline health too low: {pipeline_health}"
    
    return {
        'message': f'End-to-end pipeline completed {n_timesteps} timesteps (health: {pipeline_health:.2f})',
        'details': pipeline_results
    }

# Run test
test_suite.run_test(test_end_to_end_trading_pipeline, "End-to-End Trading Pipeline")
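
The consistency checks and the health gate in the test above reduce to plain arithmetic. The following stdlib-only sketch mirrors two of them on toy data: the regime change rate (what `(regimes_tensor[1:] != regimes_tensor[:-1]).float().mean()` computes on a `(timesteps, batch)` tensor) and the pipeline health score (the fraction of checks whose `is_stable` or `is_reasonable` flag is true). The nested-list input format, the helper names `regime_change_rate` and `pipeline_health`, and the sample values are assumptions for illustration, not part of the AlgoSpace-8 code.

```python
def regime_change_rate(regimes):
    """Fraction of (timestep, batch) positions whose regime differs from the previous timestep.

    `regimes` is a list of timesteps, each a list of integer labels (one per batch element).
    """
    pairs = [(prev, cur)
             for prev_step, cur_step in zip(regimes, regimes[1:])
             for prev, cur in zip(prev_step, cur_step)]
    return sum(prev != cur for prev, cur in pairs) / len(pairs) if pairs else 0.0


def pipeline_health(consistency_checks):
    """Fraction of checks whose 'is_stable' (or, failing that, 'is_reasonable') flag is True."""
    flags = []
    for check in consistency_checks.values():
        if not isinstance(check, dict):
            continue
        if 'is_stable' in check:
            flags.append(bool(check['is_stable']))
        elif 'is_reasonable' in check:
            flags.append(bool(check['is_reasonable']))
    return sum(flags) / len(flags) if flags else 0.0


# Toy data: 4 timesteps, batch of 2 (hypothetical values)
regimes = [[0, 1], [0, 2], [1, 2], [1, 2]]
rate = regime_change_rate(regimes)
checks = {
    'action_stability': {'std_over_time': 0.8, 'is_stable': True},
    'regime_stability': {'change_rate': rate, 'is_reasonable': 0.1 <= rate <= 0.5},
    'embedding_stability': {'std_over_time': 1.4, 'is_stable': False},
}
health = pipeline_health(checks)
print(f"change rate {rate:.2f}, health {health:.2f}")  # prints: change rate 0.33, health 0.67
```

With the three checks this test produces, the `pipeline_health > 0.5` assertion requires at least two of them to pass; a run that records no checks scores 0.0 and fails.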

## 10. Test Results Summary & Visualization

In [None]:
# Generate test summary
summary = test_suite.get_summary()

print("\n" + "="*60)
print("🧪 INTEGRATION TEST RESULTS SUMMARY")
print("="*60)

print(f"\n📊 Overall Results:")
print(f"   Total Tests: {summary['total']}")
print(f"   Passed: {summary['passed']} ✅")
print(f"   Failed: {summary['failed']} ❌")
print(f"   Skipped: {summary['skipped']} ⏭️")
print(f"   Success Rate: {summary['success_rate']:.1%}")
print(f"   Total Duration: {summary['total_duration']:.2f}s")

# Status indicator
if summary['success_rate'] >= 0.9:
    status_emoji = "🎉"
    status_text = "EXCELLENT"
elif summary['success_rate'] >= 0.7:
    status_emoji = "✅"
    status_text = "GOOD"
elif summary['success_rate'] >= 0.5:
    status_emoji = "⚠️"
    status_text = "NEEDS ATTENTION"
else:
    status_emoji = "❌"
    status_text = "CRITICAL ISSUES"

print(f"\n{status_emoji} Integration Status: {status_text}")

# Detailed results
print(f"\n📋 Test Details:")
print("-" * 80)
print(f"{'Test Name':<35} {'Status':<10} {'Duration':<10} {'Message':<25}")
print("-" * 80)

result_emojis = {'passed': '✅', 'failed': '❌', 'skipped': '⏭️'}
for result in test_suite.results:
    # Separate name so the overall `status_emoji` (reused in the report below) is not overwritten
    result_emoji = result_emojis[result.status]
    message = result.message[:22] + "..." if len(result.message) > 25 else result.message
    print(f"{result.name:<35} {result_emoji + ' ' + result.status:<10} {result.duration:<9.2f}s {message:<25}")

print("-" * 80)
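
The summary cell maps the success rate onto four tiers, and the report and final-status cells repeat similar thresholds. A minimal sketch of the same mapping as a pure function, assuming the 0.9 / 0.7 / 0.5 cut-offs used above; `integration_status` is a hypothetical helper name:

```python
def integration_status(success_rate):
    """Map a success rate in [0, 1] to the (emoji, label) tiers used in the summary cell."""
    tiers = [
        (0.9, "🎉", "EXCELLENT"),
        (0.7, "✅", "GOOD"),
        (0.5, "⚠️", "NEEDS ATTENTION"),
    ]
    # Tiers are checked from the highest threshold down; the first match wins
    for threshold, emoji, label in tiers:
        if success_rate >= threshold:
            return emoji, label
    return "❌", "CRITICAL ISSUES"


for rate in (1.0, 0.75, 0.5, 0.2):
    print(f"{rate:.0%}", *integration_status(rate))
```

Calling `status_emoji, status_text = integration_status(summary['success_rate'])` once and reusing the result keeps the tiers in one place and avoids reusing the `status_emoji` name for per-test icons.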

In [None]:
# Visualize test results
def visualize_test_results():
    """Create visualizations of test results."""
    
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))
    
    # 1. Test status pie chart
    statuses = [r.status for r in test_suite.results]
    status_counts = pd.Series(statuses).value_counts()
    
    colors = {'passed': 'green', 'failed': 'red', 'skipped': 'orange'}
    pie_colors = [colors.get(status, 'gray') for status in status_counts.index]
    
    ax1.pie(status_counts.values, labels=status_counts.index, autopct='%1.1f%%',
           colors=pie_colors, startangle=90)
    ax1.set_title('Test Status Distribution')
    
    # 2. Test duration bar chart
    test_names = [r.name for r in test_suite.results]
    durations = [r.duration for r in test_suite.results]
    
    y_pos = np.arange(len(test_names))
    bar_colors = [colors.get(r.status, 'gray') for r in test_suite.results]
    
    ax2.barh(y_pos, durations, color=bar_colors)
    ax2.set_yticks(y_pos)
    ax2.set_yticklabels([name[:20] + '...' if len(name) > 20 else name for name in test_names])
    ax2.set_xlabel('Duration (seconds)')
    ax2.set_title('Test Execution Times')
    
    # 3. Memory usage analysis (if available)
    memory_result = next((r for r in test_suite.results if 'Memory' in r.name), None)
    model_params = (memory_result.details or {}).get('model_parameters', {}) if memory_result else {}
    if model_params:
        models = list(model_params.keys())
        params = [model_params[m]['total'] for m in models]
        
        ax3.bar(range(len(models)), params)
        ax3.set_xticks(range(len(models)))
        ax3.set_xticklabels([m[:10] + '...' if len(m) > 10 else m for m in models], rotation=45, ha='right')
        ax3.set_ylabel('Parameters')
        ax3.set_title('Model Parameter Counts')
        ax3.set_yscale('log')
    else:
        # Also covers a memory test that ran but recorded no parameter counts
        ax3.text(0.5, 0.5, 'Memory data not available', ha='center', va='center', transform=ax3.transAxes)
        ax3.set_title('Model Parameters')
    
    # 4. Performance metrics (if available)
    perf_result = next((r for r in test_suite.results if 'Performance' in r.name), None)
    perf_details = (perf_result.details or {}) if perf_result else {}
    perf_data = {k: v for k, v in perf_details.items() if isinstance(v, dict) and 'mean_latency_ms' in v}
    if perf_data:
        models = list(perf_data.keys())
        latencies = [perf_data[m]['mean_latency_ms'] for m in models]
        
        ax4.bar(range(len(models)), latencies)
        ax4.set_xticks(range(len(models)))
        ax4.set_xticklabels([m[:10] + '...' if len(m) > 10 else m for m in models], rotation=45, ha='right')
        ax4.set_ylabel('Latency (ms)')
        ax4.set_title('Inference Latency')
    else:
        ax4.text(0.5, 0.5, 'Performance data not available', ha='center', va='center', transform=ax4.transAxes)
        ax4.set_title('Performance Metrics')
    
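    plt.tight_layout()
    
    # Save before plt.show() so the file is written even on backends that close the figure on display
    if IN_COLAB:
        fig_path = drive_manager.results_path / "plots" / "integration_test_results.png"
        fig_path.parent.mkdir(parents=True, exist_ok=True)  # the plots/ folder may not exist yet
        fig.savefig(fig_path, dpi=300, bbox_inches='tight')
        print(f"\n📊 Saved visualization to: {fig_path}")
    
    plt.show()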

visualize_test_results()
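
The table and plotting cells truncate labels inline (`name[:20] + '...'`, `m[:10] + '...'`), which yields strings three characters longer than the limit. A small hypothetical helper that keeps the result within the requested width, sketched with the standard library only:

```python
def truncate_label(text, width):
    """Truncate `text` to at most `width` characters, marking the cut with '...'."""
    if len(text) <= width:
        return text
    if width <= 3:
        # Too narrow for an ellipsis: hard cut
        return text[:width]
    return text[:width - 3] + "..."


for name in ("Memory Usage", "End-to-End Trading Pipeline"):
    print(repr(truncate_label(name, 20)))
```

For example, `truncate_label(name, 20)` could replace the inline expression in the duration chart. `textwrap.shorten(text, width, placeholder='...')` from the standard library is an alternative that cuts at word boundaries instead of mid-word (it also collapses runs of whitespace).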

In [None]:
# Generate detailed test report
def generate_test_report():
    """Generate a comprehensive test report."""
    
    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    
    report = f"""# AlgoSpace-8 Integration Test Report

**Generated:** {timestamp}  
**Device:** {device}  
**Environment:** {'Google Colab' if IN_COLAB else 'Local'}

## Executive Summary

- **Overall Status**: {status_emoji} {status_text}
- **Success Rate**: {summary['success_rate']:.1%}
- **Tests Passed**: {summary['passed']}/{summary['total']}
- **Total Duration**: {summary['total_duration']:.2f} seconds

## Test Results

| Test | Status | Duration | Details |
|------|--------|----------|----------|
"""
    
    status_emoji_map = {'passed': '✅', 'failed': '❌', 'skipped': '⏭️'}
    for result in test_suite.results:
        emoji = status_emoji_map[result.status]
        
        # Escape pipes and flatten newlines so multi-line messages don't break the markdown table
        details = result.message.replace('|', '\\|').replace('\n', ' ')
        report += f"| {result.name} | {emoji} {result.status} | {result.duration:.2f}s | {details} |\n"
    
    # Add detailed sections for failed tests
    failed_tests = [r for r in test_suite.results if r.status == 'failed']
    if failed_tests:
        report += "\n## Failed Tests Details\n\n"
        
        for test in failed_tests:
            report += f"### {test.name}\n\n"
            report += f"**Error:** {test.message}\n\n"
            if test.error:
                report += f"**Stack Trace:**\n```\n{test.error}\n```\n\n"
    
    # Add performance summary
    perf_result = next((r for r in test_suite.results if 'Performance' in r.name), None)
    if perf_result and perf_result.details:
        report += "\n## Performance Summary\n\n"
        
        for model_name, perf_data in perf_result.details.items():
            if isinstance(perf_data, dict) and 'mean_latency_ms' in perf_data:
                report += f"**{model_name}:**\n"
                report += f"- Mean Latency: {perf_data['mean_latency_ms']:.2f}ms\n"
                report += f"- Throughput: {perf_data['throughput_samples_per_sec']:.1f} samples/sec\n\n"
    
    # Add memory summary
    memory_result = next((r for r in test_suite.results if 'Memory' in r.name), None)
    if memory_result and memory_result.details:
        report += "\n## Memory Analysis\n\n"
        
        total_params = memory_result.details.get('total_parameters', 0)
        max_batch = memory_result.details.get('max_batch_size', 0)
        
        report += f"- **Total Parameters**: {total_params:,}\n"
        report += f"- **Max Batch Size**: {max_batch}\n"
        
        if 'warnings' in memory_result.details:
            report += f"\n**Memory Warnings:**\n"
            for warning in memory_result.details['warnings']:
                report += f"- {warning}\n"
    
    # Recommendations
    report += "\n## Recommendations\n\n"
    
    if summary['success_rate'] == 1.0:
        report += "🎉 All tests passed! The system is ready for production validation and staging deployment.\n\n"
    elif summary['success_rate'] >= 0.8:
        report += "✅ Most tests passed. Review failed tests and address issues before deployment.\n\n"
    else:
        report += "⚠️ Multiple test failures detected. Significant issues need to be resolved.\n\n"
    
    report += "### Next Steps:\n"
    report += "1. Address any failed tests\n"
    report += "2. Optimize performance bottlenecks\n"
    report += "3. Run production validation tests\n"
    report += "4. Deploy to staging environment\n"
    
    return report

# Generate and display report
test_report = generate_test_report()
print(test_report)

# Save report
if IN_COLAB:
    report_path = drive_manager.results_path / "integration_test_report.md"
    with open(report_path, 'w') as f:
        f.write(test_report)
    print(f"\n📄 Saved report to: {report_path}")
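
The report builds its markdown table by string concatenation. In GitHub Flavored Markdown a literal `|` inside a table cell must be escaped as `\|`, and a newline ends the row, so a multi-line error message can break the table. A minimal sketch of row and table helpers that handle both; the names `markdown_row` and `markdown_table` and the sample row are hypothetical:

```python
def markdown_row(cells):
    """Render one markdown table row, escaping pipes and flattening newlines in each cell."""
    escaped = [str(cell).replace('|', '\\|').replace('\n', ' ') for cell in cells]
    return "| " + " | ".join(escaped) + " |"


def markdown_table(headers, rows):
    """Render a header row, a separator row, and one row per entry in `rows`."""
    lines = [markdown_row(headers), "|" + "|".join("---" for _ in headers) + "|"]
    lines.extend(markdown_row(row) for row in rows)
    return "\n".join(lines)


# Toy row with a pipe and a newline in the message
print(markdown_table(
    ["Test", "Status", "Duration", "Details"],
    [["Data Flow", "❌ failed", "0.12s", "shape a|b\nmismatch"]],
))
```

Routing every cell through one escaping function means later columns (for example, a truncated stack trace) cannot corrupt the table either.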

## 11. Cleanup & Final Status

In [None]:
# Cleanup resources
if IN_COLAB:
    colab_setup.optimize_memory()

# Clear test data and release cached GPU memory
test_suite.test_data.clear()
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()

# Final status
print("\n" + "="*60)
print("🧪 INTEGRATION TESTING COMPLETE")
print("="*60)

if summary['success_rate'] >= 0.9:
    print("\n🎉 EXCELLENT: System passed comprehensive integration testing!")
    print("   Review any remaining failures, then proceed to production export.")
elif summary['success_rate'] >= 0.7:
    print("\n✅ GOOD: System passed most integration tests.")
    print("   Review failed tests before deployment.")
else:
    print("\n⚠️ ISSUES DETECTED: Multiple test failures.")
    print("   Resolve critical issues before proceeding.")

print(f"\n📊 Final Score: {summary['passed']}/{summary['total']} tests passed ({summary['success_rate']:.1%})")
print(f"⏱️ Total Testing Time: {summary['total_duration']:.1f} seconds")

if IN_COLAB:
    print(f"\n💾 All results saved to: {drive_manager.results_path}")
    print("\n🚀 Next: Run Production_Export_Colab.ipynb for deployment preparation")

print("\n✨ Integration testing complete!")

## Summary

This Integration Test notebook provides comprehensive validation of the AlgoSpace-8 MARL trading system:

### ✅ Tests Completed:
1. **Component Availability** - Verified all required files exist
2. **Model Imports** - Confirmed all classes can be imported
3. **Model Instantiation** - Tested object creation and GPU placement
4. **Data Flow** - Validated tensor shapes and compatibility
5. **Memory Usage** - Analyzed resource consumption patterns
6. **Checkpoint/Resume** - Verified save/load functionality
7. **Performance** - Benchmarked inference speed and throughput
8. **End-to-End Pipeline** - Tested complete trading simulation

### 📊 Key Metrics:
- Component compatibility across all modules
- Memory efficiency and resource usage
- Inference performance and latency
- Pipeline stability and consistency

### 🎯 Output:
- Detailed test report with pass/fail status
- Performance benchmarks and recommendations
- Memory analysis and warnings
- Production readiness assessment

Once every test passes, the system is ready to move on to production export (Production_Export_Colab.ipynb) and staging deployment.