# 🚀 OLYMPUS AGI2 Training Notebook

## V4 Enhanced Training with MEPT, LEAP, and PRISM

This notebook collects the commands needed to train the OLYMPUS AGI2 ensemble models with the latest V4 enhancements:
- **MEPT**: Memory-Enhanced Progressive Training
- **LEAP**: Learning Enhancement through Adaptive Patterns
- **PRISM**: Program Reasoning through Inductive Synthesis
- **LEAP-PRISM Bridge**: Enhanced pattern learning

### 📋 Requirements:
- GPU: A100 80GB (recommended) or V100 32GB (minimum)
- Runtime: GPU with High-RAM


## 1️⃣ Initial Setup

In [None]:
# Clone the repository
!git clone https://github.com/AutomataControls/AutomataNexus_Olympus_AGI2.git /content/AutomataNexus_Olympus_AGI2

# Install dependencies
!cd /content/AutomataNexus_Olympus_AGI2 && pip install -r requirements.txt -q

print("✅ Repository cloned and dependencies installed!")

In [None]:
# Download ARC datasets
!cd /content/AutomataNexus_Olympus_AGI2 && python scripts/download_arc_datasets.py

print("✅ ARC datasets downloaded!")

In [None]:
# Verify environment
import torch
import os

print(f"PyTorch Version: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    print(f"\nCurrent Memory Usage:")
    print(f"  Allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
    print(f"  Reserved: {torch.cuda.memory_reserved() / 1e9:.2f} GB")
else:
    print("⚠️ No GPU detected! Please enable GPU in Runtime > Change runtime type")

# Check if data exists
data_path = '/content/AutomataNexus_Olympus_AGI2/data'
if os.path.exists(data_path):
    files = os.listdir(data_path)
    print(f"\n📁 Data directory contains {len(files)} files")
else:
    print("⚠️ Data directory not found!")

## 2️⃣ Individual Model Training

Train each model on its own. Each one specializes in a different aspect:

- 🧠 **MINERVA**: Strategic Pattern Analysis
- 🌍 **ATLAS**: Spatial Transformation Specialist
- 👁️ **IRIS**: Feature Extraction Specialist
- ⏰ **CHRONOS**: Temporal Reasoning Specialist
- 🔥 **PROMETHEUS**: Meta-Learning Specialist

In [None]:
# 🧠 Train MINERVA - Strategic Pattern Analysis
!cd /content/AutomataNexus_Olympus_AGI2 && python scripts/training/train_minerva.py

In [None]:
# 🌍 Train ATLAS - Spatial Transformation Specialist
!cd /content/AutomataNexus_Olympus_AGI2 && python scripts/training/train_atlas.py

In [None]:
# 👁️ Train IRIS - Feature Extraction Specialist
!cd /content/AutomataNexus_Olympus_AGI2 && python scripts/training/train_iris.py

In [None]:
# ⏰ Train CHRONOS - Temporal Reasoning Specialist
!cd /content/AutomataNexus_Olympus_AGI2 && python scripts/training/train_chronos.py

In [None]:
# 🔥 Train PROMETHEUS - Meta-Learning Specialist
!cd /content/AutomataNexus_Olympus_AGI2 && python scripts/training/train_prometheus.py

## 3️⃣ Combined Training Options

In [None]:
# Option A: Train all models sequentially
import subprocess
import time
import torch  # needed for torch.cuda.empty_cache() below

models = [
    ('MINERVA', 'train_minerva.py', '🧠'),
    ('ATLAS', 'train_atlas.py', '🌍'),
    ('IRIS', 'train_iris.py', '👁️'),
    ('CHRONOS', 'train_chronos.py', '⏰'),
    ('PROMETHEUS', 'train_prometheus.py', '🔥')
]

for model_name, script_name, emoji in models:
    print(f"\n{'='*60}")
    print(f"{emoji} Starting training for {model_name}")
    print(f"{'='*60}\n")
    
    start_time = time.time()
    
    # Run from the repository root, matching the `cd` used by the single-model cells
    result = subprocess.run(
        ['python', f'scripts/training/{script_name}'],
        cwd='/content/AutomataNexus_Olympus_AGI2',
        capture_output=True, text=True
    )
    
    elapsed_time = time.time() - start_time
    
    print(f"\n{model_name} training run finished in {elapsed_time/3600:.2f} hours")
    
    if result.returncode != 0:
        print(f"⚠️ Error training {model_name}:")
        print(result.stderr)
    else:
        print(f"✅ {model_name} trained successfully!")
    
    # Clear GPU memory between models
    torch.cuda.empty_cache()

print("\n🎉 All models training completed!")
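
Option A captures each script's output and prints only stderr on failure, so nothing appears in the cell while a model trains. A minimal sketch of a streaming alternative follows. `run_streaming` is a hypothetical helper, not part of the repository, and it assumes the training scripts write their progress to stdout or stderr.

```python
import subprocess
import sys


def run_streaming(cmd, cwd=None):
    """Run cmd, echo its combined stdout/stderr line by line, return the exit code."""
    with subprocess.Popen(
        cmd,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so errors appear in order
        text=True,
        bufsize=1,  # line-buffered reads on the parent side
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)
            sys.stdout.flush()
    # Leaving the `with` block waits for the child, so returncode is set here
    return proc.returncode
```

In the loop above, the `subprocess.run(...)` call could be swapped for `run_streaming(['python', '-u', f'scripts/training/{script_name}'], cwd='/content/AutomataNexus_Olympus_AGI2')`, with the returned exit code compared against 0. The `-u` flag asks the child Python process not to buffer its output.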

In [None]:
# Option B: Run the main V4 ensemble training script
!cd /content/AutomataNexus_Olympus_AGI2 && python scripts/training/colab_training_v4_megascale_curriculum.py

## 4️⃣ Monitor Training Progress

In [None]:
# Real-time training monitor
import subprocess
import time
import glob
import os
from IPython.display import clear_output

def monitor_training(duration_minutes=60, update_interval=30):
    """
    Monitor training progress
    
    Args:
        duration_minutes: How long to monitor (default: 60 minutes)
        update_interval: Update frequency in seconds (default: 30 seconds)
    """
    start_time = time.time()
    end_time = start_time + (duration_minutes * 60)
    
    while time.time() < end_time:
        clear_output(wait=True)
        
        # Show GPU usage
        gpu_info = subprocess.run(['nvidia-smi'], capture_output=True, text=True)
        print("🖥️ GPU Status:")
        print(gpu_info.stdout)
        
        # Check for latest checkpoints
        checkpoints = glob.glob('/content/AutomataNexus_Olympus_AGI2/arc_models_v4/*_checkpoint.pt')
        if checkpoints:
            print("\n📦 Latest Checkpoints:")
            for checkpoint in sorted(checkpoints, key=os.path.getmtime, reverse=True)[:5]:
                size = os.path.getsize(checkpoint) / (1024**2)  # Size in MB
                mtime = os.path.getmtime(checkpoint)
                model_name = os.path.basename(checkpoint).split('_')[0].upper()
                print(f"  {model_name}: {size:.1f} MB - {time.ctime(mtime)}")
        
        # Check training reports
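        # Pattern also matches timestamped names such as *_training_report_*.json (used by the evaluation cell)
        reports = glob.glob('/content/AutomataNexus_Olympus_AGI2/arc_models_v4/*_report*.json')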
        if reports:
            print("\n📊 Training Reports:")
            import json
            for report_path in sorted(reports, key=os.path.getmtime, reverse=True)[:3]:
                try:
                    with open(report_path, 'r') as f:
                        report = json.load(f)
                    print(f"  {report['model_name']}: Best Exact = {report.get('best_exact', 0):.2f}%")
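                except (OSError, ValueError, KeyError, TypeError):
                    # Skip reports that are unreadable, partially written, or missing expected fields
                    continue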
        
        # Time remaining
        elapsed = (time.time() - start_time) / 60
        remaining = max(duration_minutes - elapsed, 0)
        print(f"\n⏱️ Monitor time: {elapsed:.1f}/{duration_minutes} minutes ({remaining:.1f} remaining)")
        print(f"   Next update in {update_interval} seconds...")
        
        time.sleep(update_interval)
    
    print("\n✅ Monitoring complete!")

# Run monitor for 60 minutes (adjust as needed)
# monitor_training(duration_minutes=60, update_interval=30)
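
The monitor prints the full `nvidia-smi` table on every refresh. Where only a few numbers are wanted, `nvidia-smi` can also emit selected fields as CSV. The sketch below is one way to read them. `parse_gpu_csv` and `query_gpus` are hypothetical helper names, and the choice of fields is an assumption about what is useful to track.

```python
import subprocess

# Fields requested from nvidia-smi; with `nounits`, memory values are in MiB
QUERY_FIELDS = ["utilization.gpu", "memory.used", "memory.total"]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(QUERY_FIELDS):
            continue  # skip malformed lines
        try:
            util, used, total = (float(v) for v in values)
        except ValueError:
            continue  # e.g. "[N/A]" for a field the driver does not report
        rows.append({"util_pct": util, "mem_used_mib": used, "mem_total_mib": total})
    return rows


def query_gpus():
    """Call nvidia-smi and return the parsed rows (requires an NVIDIA driver)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            f"--query-gpu={','.join(QUERY_FIELDS)}",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Inside `monitor_training`, a call to `query_gpus()` could replace the raw `nvidia-smi` print to show one compact line per GPU.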

## 5️⃣ Evaluate Trained Models

In [None]:
# Load and evaluate all trained models
import torch
import json
import glob
import os
from datetime import datetime

print("🔍 Evaluating Trained Models\n" + "="*50)

# Find all trained models
model_files = glob.glob('/content/AutomataNexus_Olympus_AGI2/arc_models_v4/*_best.pt')

if model_files:
    print(f"\n📦 Found {len(model_files)} trained models:")
    
    model_stats = []
    for model_file in sorted(model_files):
        model_name = os.path.basename(model_file).replace('_best.pt', '').upper()
        try:
            checkpoint = torch.load(model_file, map_location='cpu')
            
            stats = {
                'name': model_name,
                'val_exact': checkpoint.get('val_exact', 0),
                'epoch': checkpoint.get('epoch', 0),
                'stage': checkpoint.get('stage', 0),
                'file_size': os.path.getsize(model_file) / (1024**2)  # MB
            }
            model_stats.append(stats)
            
            emoji_map = {
                'MINERVA': '🧠',
                'ATLAS': '🌍', 
                'IRIS': '👁️',
                'CHRONOS': '⏰',
                'PROMETHEUS': '🔥'
            }
            emoji = emoji_map.get(model_name, '🤖')
            
            print(f"\n{emoji} {model_name}:")
            print(f"  Best Validation Exact Match: {stats['val_exact']:.2f}%")
            print(f"  Training Epoch: {stats['epoch']}")
            print(f"  Curriculum Stage: {stats['stage']}")
            print(f"  Model Size: {stats['file_size']:.1f} MB")
        except Exception as e:
            print(f"  ⚠️ Error loading {model_name}: {e}")
    
    # Summary statistics
    if model_stats:
        avg_exact = sum(m['val_exact'] for m in model_stats) / len(model_stats)
        best_model = max(model_stats, key=lambda x: x['val_exact'])
        
        print(f"\n📊 Summary Statistics:")
        print(f"  Average Exact Match: {avg_exact:.2f}%")
        print(f"  Best Model: {best_model['name']} ({best_model['val_exact']:.2f}%)")
        print(f"  Total Models Trained: {len(model_stats)}")
else:
    print("❌ No trained models found. Please run training first.")

# Load latest training reports
print("\n📄 Latest Training Reports:")
report_files = glob.glob('/content/AutomataNexus_Olympus_AGI2/arc_models_v4/*_training_report_*.json')

if report_files:
    # Get the 5 most recent reports
    recent_reports = sorted(report_files, key=os.path.getmtime, reverse=True)[:5]
    
    for report_file in recent_reports:
        try:
            with open(report_file, 'r') as f:
                report = json.load(f)
            
            timestamp = datetime.fromtimestamp(os.path.getmtime(report_file))
            print(f"\n  📅 {timestamp.strftime('%Y-%m-%d %H:%M:%S')}")
            print(f"     Model: {report['model_name']}")
            print(f"     Best Exact: {report['best_exact']:.2f}%")
            print(f"     Best Val Loss: {report['best_val_loss']:.4f}")
            print(f"     Total Epochs: {report.get('total_epochs', 'N/A')}")
        except Exception as e:
            print(f"  ⚠️ Error loading report: {e}")
else:
    print("  No training reports found.")

print("\n✅ Evaluation complete!")
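
The report listing above shows the five most recent files, which may all belong to the same model if it was trained several times. A small sketch for keeping only the newest report per model is given below. `latest_report_per_model` is a hypothetical helper, and it assumes each report is described by a `(model_name, mtime, path)` tuple.

```python
def latest_report_per_model(reports):
    """Return {model_name: path} keeping only the most recently modified report.

    reports: iterable of (model_name, mtime, path) tuples.
    """
    newest = {}
    for name, mtime, path in reports:
        # Keep this entry if the model is new or this report is more recent
        if name not in newest or mtime > newest[name][0]:
            newest[name] = (mtime, path)
    return {name: path for name, (_, path) in newest.items()}
```

The tuples could be built from the files found above by reading `report['model_name']` from each JSON file and pairing it with `os.path.getmtime(report_file)`.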

## 6️⃣ Troubleshooting

In [None]:
# Clear GPU memory if needed
import torch
import gc

def clear_gpu_memory():
    """Clear GPU memory and show stats"""
    print("🧹 Clearing GPU memory...")
    
    # Show before stats
    if torch.cuda.is_available():
        print(f"\nBefore clearing:")
        print(f"  Allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
        print(f"  Reserved: {torch.cuda.memory_reserved() / 1e9:.2f} GB")
    
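    # Clear memory: collect unreachable Python objects first so their CUDA
    # blocks return to the allocator cache, then release the cache itself
    gc.collect()
    torch.cuda.empty_cache()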
    
    # Show after stats
    if torch.cuda.is_available():
        print(f"\nAfter clearing:")
        print(f"  Allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
        print(f"  Reserved: {torch.cuda.memory_reserved() / 1e9:.2f} GB")
        print(f"  Free: {(torch.cuda.get_device_properties(0).total_memory - torch.cuda.memory_reserved()) / 1e9:.2f} GB")
    
    print("\n✅ GPU memory cleared!")

clear_gpu_memory()

In [None]:
# Check for common issues
import os
import torch  # used by the GPU check below

print("🔍 Checking for common issues...\n")

issues_found = False

# Check 1: GPU availability
if not torch.cuda.is_available():
    print("❌ No GPU detected! Please enable GPU in Runtime > Change runtime type")
    issues_found = True
else:
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"✅ GPU detected: {gpu_name} ({gpu_memory:.1f} GB)")
    
    if gpu_memory < 32:
        print("⚠️  Warning: GPU has less than 32GB memory. Consider reducing batch size.")
        issues_found = True

# Check 2: Repository exists
repo_path = '/content/AutomataNexus_Olympus_AGI2'
if not os.path.exists(repo_path):
    print("❌ Repository not found! Please run the clone command first.")
    issues_found = True
else:
    print("✅ Repository found")

# Check 3: Data exists
data_path = os.path.join(repo_path, 'data')
if not os.path.exists(data_path) or len(os.listdir(data_path)) < 10:
    print("❌ Data directory missing or incomplete! Please run download_arc_datasets.py")
    issues_found = True
else:
    print(f"✅ Data directory found with {len(os.listdir(data_path))} files")

# Check 4: Required packages
try:
    import torch
    import numpy
    import tqdm
    print("✅ Core packages installed")
except ImportError as e:
    print(f"❌ Missing required package: {e}")
    issues_found = True

# Check 5: Model files
models_path = os.path.join(repo_path, 'src/models')
if os.path.exists(models_path):
    model_files = [f for f in os.listdir(models_path) if f.endswith('_model.py')]
    if len(model_files) >= 5:
        print(f"✅ All {len(model_files)} model files found")
    else:
        print(f"⚠️  Only {len(model_files)} model files found (expected 5+)")
        issues_found = True
else:
    print("❌ Models directory not found!")
    issues_found = True

if not issues_found:
    print("\n🎉 All checks passed! Ready to train.")
else:
    print("\n⚠️  Please fix the issues above before training.")

# Quick fix suggestions
print("\n💡 Quick Fixes:")
print("1. For GPU issues: Runtime > Change runtime type > GPU > A100")
print("2. For memory issues: Reduce BATCH_SIZE from 512 to 256 or 128")
print("3. For missing data: Run the data download cell")
print("4. For import errors: Run the dependency installation cell")
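
Quick fix 2 can be automated as a retry loop that steps down through smaller batch sizes whenever a run hits a CUDA out-of-memory error. The sketch below is illustrative only. `train_with_backoff` and the `train_fn(batch_size)` callable are hypothetical, since the notebook launches training through scripts and does not show how `BATCH_SIZE` is set. It also assumes out-of-memory failures surface as a `RuntimeError` whose message contains "out of memory", as PyTorch's CUDA OOM errors typically do.

```python
def train_with_backoff(train_fn, batch_sizes=(512, 256, 128)):
    """Call train_fn(batch_size) with each size until one avoids running out of memory.

    Returns (batch_size, result) for the first attempt that succeeds.
    """
    last_error = None
    for batch_size in batch_sizes:
        try:
            return batch_size, train_fn(batch_size)
        except RuntimeError as err:
            # Re-raise anything that is not an out-of-memory failure
            if "out of memory" not in str(err).lower():
                raise
            last_error = err
            print(f"Out of memory at batch size {batch_size}, trying a smaller one")
    raise RuntimeError("All batch sizes ran out of memory") from last_error
```

When used with real training code, calling `gc.collect()` and `torch.cuda.empty_cache()` between attempts would help ensure that memory from the failed run is released before the next one starts.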