# SAPO Adaptive I/J: 5 GPT-2 Nodes with Gradient-Based Adaptation

This notebook runs an **ADAPTIVE** experiment in which the I/J ratio is optimized dynamically during training.

**Configuration:**
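- **I+J=4** (I = local rollouts, J = external rollouts from the swarm; the total per round is fixed, the split varies dynamically)
- **Initial J=2** (starts in the middle of the range 0 to 4, then adapts based on reward)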
- **G=8** (completions per question)
- **Model**: GPT-2 (124M params)
- **Hardware**: 5 nodes (1 coordinator + 4 workers) on 1× A100 80GB
- **Algorithm**: Gradient-Based Adaptive I/J (Version 2 from ADAPTIVE_IJ_ALGORITHM.md)

**Purpose:** Demonstrate automated discovery of the optimal I/J ratio through gradient ascent.

**Default Mode: TESTING (10 rounds, ~6 minutes)**
- Quick validation that the adaptive algorithm works
- See the Configuration cell (Section 1) to switch to PRODUCTION mode (2000 rounds, ~21 hours)

**Expected Behavior:**
- J starts at 2 (I=2, J=2)
- The algorithm increases J if swarm sharing helps
- The algorithm decreases J if local training is better
- Should converge toward the optimal ratio (the best fixed configuration cited in Section 8 is I=2, J=2; J=4 would mean I=0, i.e. no local rollouts)
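
The actual update rule lives in `rgym_exp.src.adaptive_ij.GradientAdaptiveIJ` and ADAPTIVE_IJ_ALGORITHM.md, neither of which is reproduced here. As a rough illustration only, the sketch below shows one way a gradient-based controller with an EMA reward baseline could adapt J: a REINFORCE-style (score-function) step on the mean of a Gaussian over a continuous J, which is then rounded and clamped to [0, I+J]. The class name `AdaptiveIJSketch`, the Gaussian policy, and the exploration noise `sigma` are assumptions, not the project's implementation.

```python
import random
from typing import Optional, Tuple


class AdaptiveIJSketch:
    """Hypothetical REINFORCE-style I/J controller (not the rgym_exp implementation)."""

    def __init__(self, total: int = 4, initial_j: float = 2.0, alpha: float = 0.1,
                 baseline_alpha: float = 0.95, sigma: float = 0.5, seed: int = 0):
        self.total = total                    # I + J stays fixed
        self.mu = float(initial_j)            # mean of the continuous J policy
        self.alpha = alpha                    # step size, analogous to ADAPTATION_RATE
        self.baseline_alpha = baseline_alpha  # EMA weight, analogous to BASELINE_ALPHA
        self.sigma = sigma                    # exploration noise (assumed)
        self.baseline: Optional[float] = None
        self.j_sample = self.mu               # continuous J drawn for the current round
        self.rng = random.Random(seed)

    def propose(self) -> Tuple[int, int]:
        """Draw a continuous J, round and clamp it to [0, total], and return (I, J)."""
        self.j_sample = self.rng.gauss(self.mu, self.sigma)
        j = min(max(round(self.j_sample), 0), self.total)
        return self.total - j, j

    def update(self, reward: float) -> None:
        """One gradient-ascent step on mu, weighted by the advantage (reward - baseline)."""
        if self.baseline is None:
            self.baseline = reward            # first round: advantage is zero
        advantage = reward - self.baseline
        # Score function of a Gaussian: d/d(mu) log N(j_sample; mu, sigma^2)
        grad_log_prob = (self.j_sample - self.mu) / self.sigma ** 2
        self.mu += self.alpha * advantage * grad_log_prob
        self.mu = min(max(self.mu, 0.0), float(self.total))
        # Exponential moving average of the reward, used as the next baseline
        self.baseline = (self.baseline_alpha * self.baseline
                         + (1 - self.baseline_alpha) * reward)


# Usage with a toy reward (assumed) that simply grows with J
controller = AdaptiveIJSketch()
for _ in range(20):
    i, j = controller.propose()
    controller.update(reward=float(j))
```

Rounding and clamping the sampled J keeps I + J equal to the fixed total every round, while the continuous mean `mu` carries the gradient signal between rounds; it is loosely analogous to the logged `adaptive_J_continuous` value, though the real metric's semantics may differ.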

**Memory Usage:** ~30 GB peak VRAM (4 training workers × ~6.5 GB each, plus ~4 GB for the coordinator, which does not train)

**Thesis Contribution:** A novel adaptive algorithm, compared against the fixed I/J ratios used in the paper

**Run AFTER:** Baseline (EX12.14a) for comparison

## 1. Configuration

**This notebook uses ADAPTIVE I/J selection.**

**⚠️ IMPORTANT: Testing Mode Enabled by Default**
- Default: 10 rounds (~6 minutes) - validates adaptive algorithm
- Production: 2000 rounds (~21 hours) - full training with adaptation

**To switch to production mode:** Uncomment the production `MAX_ROUNDS` line in the cell below.
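
Before a long production run, it can be worth checking that the values in the configuration cell are mutually consistent. The helper below is a hypothetical addition, not part of `rgym_exp`; its checks follow from the definitions in this notebook (I + J equals `TOTAL_SAMPLES`, so J must start inside [0, `TOTAL_SAMPLES`]), and it assumes `BASELINE_ALPHA` is the weight on the previous baseline in an EMA, so it must lie in [0, 1).

```python
from typing import List


def validate_adaptive_config(total_samples: int, initial_j: int,
                             adaptation_rate: float, baseline_alpha: float,
                             max_rounds: int) -> List[str]:
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    if total_samples < 1:
        problems.append("TOTAL_SAMPLES must be at least 1")
    if not 0 <= initial_j <= total_samples:
        problems.append(f"INITIAL_J={initial_j} must lie in [0, {total_samples}]")
    if adaptation_rate <= 0:
        problems.append("ADAPTATION_RATE must be positive")
    if not 0.0 <= baseline_alpha < 1.0:
        problems.append("BASELINE_ALPHA must lie in [0, 1)")
    if max_rounds < 1:
        problems.append("MAX_ROUNDS must be at least 1")
    return problems


# Usage with the default testing values from the configuration cell
issues = validate_adaptive_config(total_samples=4, initial_j=2,
                                  adaptation_rate=0.1, baseline_alpha=0.95,
                                  max_rounds=10)
if issues:
    raise ValueError("; ".join(issues))
```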

In [None]:
# SAPO Adaptive I/J Experiment Configuration
# Dynamically adjusts I/J ratio during training

# ============================================
# ADAPTIVE I/J CONFIGURATION
# ============================================
EXPERIMENT_NAME = 'sapo_gpt2_adaptive_ij'

# Total samples per round (I + J = 4)
TOTAL_SAMPLES = 4            # Total rollouts per round
INITIAL_J = 2                # Starting J value (middle of range)
INITIAL_I = TOTAL_SAMPLES - INITIAL_J  # Computed automatically

# Adaptive algorithm parameters
ADAPTIVE_IJ_ENABLED = True   # Enable adaptive I/J selection
ADAPTATION_RATE = 0.1        # Learning rate (alpha) for gradient updates
BASELINE_ALPHA = 0.95        # EMA smoothing for reward baseline

# ============================================
# TRAINING MODE: TESTING (default) or PRODUCTION
# ============================================
# TESTING MODE (default): Quick validation ~6 minutes
MAX_ROUNDS = 10              # Testing: 10 rounds to verify adaptive algorithm

# PRODUCTION MODE: Full training ~21 hours
# Uncomment line below for production run:
# MAX_ROUNDS = 2000          # Production: Full 2000 rounds for convergence

# ============================================
# FIXED SETTINGS (same for all experiments)
# ============================================
NUM_NODES = 5                # Run 5 nodes (1 coordinator + 4 workers)
MODEL_NAME = 'gpt2'          # GPT-2 (124M params, fits memory)
NUM_GENERATIONS = 8          # G: Completions per question (like paper)
SEED = 42                    # For reproducibility

# Rollout Sharing Configuration
ROLLOUT_PUBLISH_FREQUENCY = 'stage'  # When to share rollouts
ROLLOUT_CLEANUP_ENABLED = True       # Enable cleanup to save space
ROLLOUT_KEEP_LAST_N_ROUNDS = 20      # Keep recent rollouts only
ROLLOUT_ARCHIVE_OLD = False          # Don't archive (saves space)

# Checkpoint Configuration
CHECKPOINT_INTERVAL = 100    # Save checkpoints every 100 rounds
MAX_STAGES = 1               # Stages per round (1=default)

# Optional: HuggingFace Token
HUGGINGFACE_TOKEN = None  # Set to your token or keep None

# ============================================
# DISPLAY CONFIGURATION
# ============================================
mode = "TESTING" if MAX_ROUNDS <= 20 else "PRODUCTION"
estimated_time = "~6 minutes" if MAX_ROUNDS <= 20 else "~21 hours"

print("="*60)
print(f"SAPO Adaptive I/J Experiment - {mode} MODE")
print("="*60)
print(f"✓ Mode: {mode}")
print(f"✓ Nodes: {NUM_NODES} (1 coordinator + {NUM_NODES - 1} workers on a single A100 80GB)")
print(f"✓ Model: {MODEL_NAME}")
print(f"✓ Experiment: {EXPERIMENT_NAME}")
print(f"✓ Max Rounds: {MAX_ROUNDS}")
print()
print("🔄 ADAPTIVE I/J CONFIGURATION:")
print(f"   Total Samples: {TOTAL_SAMPLES} (I + J)")
print(f"   Initial: I={INITIAL_I}, J={INITIAL_J}")
print(f"   Adaptation Rate (α): {ADAPTATION_RATE}")
print(f"   Baseline Smoothing: {BASELINE_ALPHA}")
print()
print(f"Expected VRAM: ~30 GB peak (4 workers train, coordinator doesn't)")
print(f"Expected Time: {estimated_time}")
print()

if mode == "TESTING":
    print("🧪 TESTING MODE ENABLED")
    print("   Quick validation run - verifies:")
    print("   ✓ Adaptive algorithm updates I/J each round")
    print("   ✓ Logs show gradient updates")
    print("   ✓ System handles changing rollout counts")
    print()
    print("   After validation succeeds, uncomment the production line")
    print("   in this cell to run the full 2000-round training.")
else:
    print("📊 PRODUCTION MODE - Adaptive I/J")
    print("   Expected behavior:")
    print("   - J starts at 2 (middle of range)")
    print("   - Increases J if swarm sharing helps (at most J=4, i.e. I=0)")
    print("   - Decreases if local training is better")
    print("   - Should match/exceed best fixed ratio (Config 2)")
    print()
    print("   ⚠️  This is a THESIS CONTRIBUTION - novel algorithm!")

print("="*60)

## 2. Mount Google Drive

In [None]:
from google.colab import drive
import os

# Mount Google Drive
drive.mount('/content/drive')

# Set base path (MUST BE SAME ACROSS ALL NODES)
GDRIVE_BASE_PATH = '/content/drive/MyDrive/rl-swarm'
os.makedirs(GDRIVE_BASE_PATH, exist_ok=True)

print(f"✓ Google Drive mounted at: {GDRIVE_BASE_PATH}")

## 3. System Setup & GPU Verification

In [None]:
import torch

print("="*60)
print("GPU Verification")
print("="*60)

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    total_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
    
    print(f"✓ GPU: {gpu_name}")
    print(f"✓ Total VRAM: {total_memory:.1f} GB")
    print()
    
    required_memory = (NUM_NODES - 1) * 6.5 + 4  # 4 workers + coordinator
    print(f"Memory Requirements:")
    print(f"  Workers: {NUM_NODES - 1} × 6.5 GB = {(NUM_NODES - 1) * 6.5:.1f} GB")
    print(f"  Coordinator: ~4 GB (non-training)")
    print(f"  Total estimated: {required_memory:.1f} GB")
    print(f"  Available: {total_memory:.1f} GB")
    print(f"  Margin: {total_memory - required_memory:.1f} GB")
    print()
    
    if total_memory < required_memory:
        raise RuntimeError(f"Insufficient GPU memory: need {required_memory:.0f} GB, have {total_memory:.1f} GB")
    else:
        print(f"✅ Sufficient VRAM for {NUM_NODES} nodes")
else:
    raise RuntimeError("No GPU detected! Select A100 GPU runtime: Runtime > Change runtime type > A100 GPU")

## 4. Clone Repository & Install Dependencies

In [None]:
%cd /content

# Remove existing directory if it exists
if os.path.exists('/content/rl-swarm'):
    print("Removing existing repository...")
    !rm -rf /content/rl-swarm

# Clone fresh copy
print("Cloning repository...")
!git clone https://github.com/Elrashid/rl-swarm.git /content/rl-swarm

# Change to repo directory
%cd /content/rl-swarm

# Verify clone worked
if not os.path.exists('requirements.txt'):
    raise FileNotFoundError("Repository clone failed - requirements.txt not found")

print("✓ Repository cloned successfully")
print()

# Install dependencies
print("Installing dependencies (this may take 3-5 minutes)...")
!pip install -q -r requirements.txt
!pip install -q 'protobuf>=4.25.0,<5.0'

# Verify installation
try:
    import reasoning_gym
    from rgym_exp.src.adaptive_ij import GradientAdaptiveIJ
    print()
    print("✓ Dependencies installed successfully")
    print("✓ reasoning-gym verified")
    print("✓ Adaptive I/J algorithm verified")
except ImportError as e:
    print()
    print("❌ ERROR: Installation failed!")
    print(f"   {e}")
    raise

## 5. Initialize Experiment
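
`init_experiment` receives its overrides as dotted keys such as `'training.max_round'`. How `rgym_exp.utils.experiment_manager` applies them is not shown in this notebook; the sketch below assumes the common convention that each dot descends one level in a nested config dictionary. `apply_dotted_overrides` is a hypothetical helper for illustration, and the `base` dictionary is a toy example, not the project's real config.

```python
import copy
from typing import Any, Dict


def apply_dotted_overrides(config: Dict[str, Any],
                           overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with 'a.b.c'-style keys written into nested dicts."""
    result = copy.deepcopy(config)           # leave the caller's config untouched
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})  # create missing levels on the way down
        node[leaf] = value
    return result


# Toy base config; real keys and defaults come from the repository's config files
base = {"training": {"max_round": 2000, "seed": 0}}
merged = apply_dotted_overrides(base, {"training.max_round": 10,
                                       "training.adaptive_ij_enabled": True})
```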

In [None]:
from rgym_exp.utils.experiment_manager import init_experiment

# Initialize experiment structure in Google Drive
config_overrides = {
    'training.max_round': MAX_ROUNDS,
    'training.num_generations': NUM_GENERATIONS,
    'training.total_samples': TOTAL_SAMPLES,
    'training.initial_I': INITIAL_I,
    'training.initial_J': INITIAL_J,
    'training.adaptive_ij_enabled': ADAPTIVE_IJ_ENABLED,
    'training.adaptation_rate': ADAPTATION_RATE,
    'training.seed': SEED,
}

init_experiment(
    gdrive_base_path=GDRIVE_BASE_PATH,
    experiment_name=EXPERIMENT_NAME,
    config_overrides=config_overrides
)

print(f"✓ Experiment initialized: {EXPERIMENT_NAME}")
print(f"  Path: {GDRIVE_BASE_PATH}/experiments/{EXPERIMENT_NAME}")
print(f"  Config: Initial I={INITIAL_I}, J={INITIAL_J} (total={TOTAL_SAMPLES})")
print(f"  Adaptive: Enabled with α={ADAPTATION_RATE}")
print()
print("🔄 I/J will adapt dynamically based on reward feedback!")

## 6. Launch 5-Node Adaptive Swarm (KEY CELL)

**Note:** All nodes use the adaptive I/J algorithm, so J may change from round to round based on reward.

In [None]:
import subprocess
import time
from datetime import datetime

mode_label = "TESTING" if MAX_ROUNDS <= 20 else "PRODUCTION"
estimated_duration = "~6 minutes" if MAX_ROUNDS <= 20 else "~21 hours"

print("="*60)
print(f"Launching {NUM_NODES}-Node Adaptive Swarm ({mode_label} MODE)")
print("="*60)
print(f"Experiment: {EXPERIMENT_NAME}")
print(f"Model: {MODEL_NAME}")
print(f"Initial Config: I={INITIAL_I}, J={INITIAL_J}, G={NUM_GENERATIONS}")
print(f"Adaptation: α={ADAPTATION_RATE} (will adjust I/J each round)")
print(f"Rounds: {MAX_ROUNDS} ({estimated_duration})")
print(f"Hardware: All {NUM_NODES} nodes on single GPU (A100 80GB)")
print("="*60)
print()

processes = []
start_time = time.time()

for node_id in range(NUM_NODES):
    # Environment variables for this node
    env = os.environ.copy()
    env['NODE_ID'] = f'node_{node_id}'
    env['NODE_ROLE'] = 'coordinator' if node_id == 0 else 'worker'
    env['MODEL_NAME'] = MODEL_NAME
    env['NUM_TRAIN_SAMPLES'] = str(INITIAL_I)  # Starting I value
    env['NUM_TRANSPLANT_TREES'] = str(INITIAL_J)  # Starting J value
    env['NUM_GENERATIONS'] = str(NUM_GENERATIONS)
    env['MAX_ROUNDS'] = str(MAX_ROUNDS)
    env['EXPERIMENT_NAME'] = EXPERIMENT_NAME
    env['GDRIVE_PATH'] = GDRIVE_BASE_PATH
    env['CUDA_VISIBLE_DEVICES'] = '0'
    env['SEED'] = str(SEED + node_id)
    
    # Adaptive I/J configuration
    env['ADAPTIVE_IJ_ENABLED'] = str(ADAPTIVE_IJ_ENABLED)
    env['ADAPTIVE_IJ_ALPHA'] = str(ADAPTATION_RATE)
    env['ADAPTIVE_IJ_BASELINE_ALPHA'] = str(BASELINE_ALPHA)
    env['ADAPTIVE_IJ_INITIAL_J'] = str(INITIAL_J)
    
    # Rollout sharing configuration
    env['ROLLOUT_PUBLISH_FREQUENCY'] = ROLLOUT_PUBLISH_FREQUENCY
    env['ROLLOUT_CLEANUP_ENABLED'] = str(ROLLOUT_CLEANUP_ENABLED)
    env['ROLLOUT_KEEP_LAST_N_ROUNDS'] = str(ROLLOUT_KEEP_LAST_N_ROUNDS)
    env['ROLLOUT_ARCHIVE_OLD'] = str(ROLLOUT_ARCHIVE_OLD)
    env['CHECKPOINT_INTERVAL'] = str(CHECKPOINT_INTERVAL)
    env['MAX_STAGES'] = str(MAX_STAGES)
    
    if HUGGINGFACE_TOKEN:
        env['HUGGINGFACE_ACCESS_TOKEN'] = HUGGINGFACE_TOKEN
    
    # Launch process. Output goes to a per-node log file: unread PIPEs fill up
    # on long runs and block the child process.
    import sys
    log_file = open(f'/content/node_{node_id}.log', 'w')
    process = subprocess.Popen(
        [sys.executable, '-m', 'rgym_exp.runner.swarm_launcher'],
        env=env,
        cwd='/content/rl-swarm',
        stdout=log_file,
        stderr=subprocess.STDOUT,
        text=True
    )
    processes.append(process)
    
    role = "COORDINATOR" if node_id == 0 else "WORKER     "
    print(f"✓ Started node_{node_id} ({role}) - PID: {process.pid:5d}")
    
    delay = 10 if node_id == 0 else 5
    time.sleep(delay)

print()
print("="*60)
print("✅ All nodes launched with adaptive I/J!")
print("="*60)
print()
print(f"✓ Training will run for {estimated_duration} ({MAX_ROUNDS} rounds)")
print("✓ I/J will adapt each round based on reward")
print(f"✓ Logs location: {GDRIVE_BASE_PATH}/experiments/{EXPERIMENT_NAME}/logs/")
print()
print("Monitor adaptive behavior in Section 7 below...")

## 7. Monitor Adaptive Training Progress

**Watch how I/J adapts in real-time!**
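
The ETA printed by the monitor (once more than 5 rounds have finished) is a linear extrapolation: it assumes every remaining round takes as long as the average completed round. The same arithmetic as a standalone function, where the name `eta_hours` is illustrative:

```python
from typing import Optional


def eta_hours(current_round: int, max_rounds: int, elapsed_hours: float) -> Optional[float]:
    """Remaining time, assuming each remaining round costs the mean time per finished round."""
    if current_round <= 0:
        return None  # no completed rounds yet, so there is no rate to extrapolate from
    hours_per_round = elapsed_hours / current_round
    return max(max_rounds - current_round, 0) * hours_per_round


# At the production pace (~21 h / 2000 rounds), 100 rounds take ~1.05 h,
# leaving about 19.95 h for the remaining 1900 rounds.
remaining = eta_hours(100, 2000, 1.05)
```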

In [None]:
from IPython.display import clear_output
import pandas as pd

print("Starting adaptive training monitor...\n")

try:
    while True:
        clear_output(wait=True)
        
        running = sum(1 for p in processes if p.poll() is None)
        completed = NUM_NODES - running
        
        current_time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        elapsed_hours = (time.time() - start_time) / 3600
        
        print("="*70)
        print(f" Adaptive I/J Training Monitor - {EXPERIMENT_NAME}")
        print(f" Time: {current_time} | Elapsed: {elapsed_hours:.1f}h")
        print("="*70)
        print()
        print(f"Nodes: {running}/{NUM_NODES} running, {completed} completed")
        print()
        
        # GPU memory
        if torch.cuda.is_available():
            reserved = torch.cuda.memory_reserved(0) / 1e9
            total = torch.cuda.get_device_properties(0).total_memory / 1e9
            print(f"GPU: {reserved:.1f} / {total:.1f} GB ({reserved/total*100:.1f}%)")
            print()
        
        # Progress and adaptive I/J metrics
        try:
            from rgym_exp.utils.experiment_manager import get_experiment_status, get_experiment_metrics
            status = get_experiment_status(GDRIVE_BASE_PATH, EXPERIMENT_NAME)
            
            if status:
                current_round = status.get('current_round', 0)
                progress_pct = (current_round / MAX_ROUNDS) * 100
                print(f"Round: {current_round:4d} / {MAX_ROUNDS} ({progress_pct:5.1f}%)")
                
                # ETA
                if current_round > 5:
                    eta_hours = ((MAX_ROUNDS - current_round) * elapsed_hours) / current_round
                    print(f"ETA: {eta_hours:.1f} hours")
                
                # Adaptive I/J metrics
                df = get_experiment_metrics(GDRIVE_BASE_PATH, EXPERIMENT_NAME, aggregate_by_round=True)
                if not df.empty:
                    # Try to get adaptive metrics
                    if 'adaptive_I' in df.columns:
                        latest = df.tail(1).iloc[0]
                        current_I = int(latest.get('adaptive_I', INITIAL_I))
                        current_J = int(latest.get('adaptive_J', INITIAL_J))
                        J_cont = float(latest.get('adaptive_J_continuous', INITIAL_J))
                        baseline = float(latest.get('adaptive_baseline', 0))
                        
                        print(f"\n🔄 Adaptive I/J Status:")
                        print(f"   Current: I={current_I}, J={current_J}")
                        print(f"   J (continuous): {J_cont:.2f}")
                        print(f"   Reward baseline: {baseline:.4f}")
                    
                    # Rewards
                    cumulative = df['my_reward'].sum()
                    recent = df.tail(10)['my_reward'].mean()
                    print(f"\nRewards: Cumulative={cumulative:.2f}, Recent avg={recent:.2f}")
        except Exception as e:
            print(f"(Metrics loading: {e})")
        
        print("\n" + "-"*70)
        print("Press 'Stop' to halt | Next update in 60s")
        
        if running == 0:
            print("\n" + "="*70)
            print("✅ Adaptive training completed!")
            print("="*70)
            break
        
        time.sleep(60)

except KeyboardInterrupt:
    print("\n⚠️  Interrupted - terminating processes...")
    for p in processes:
        if p.poll() is None:
            p.terminate()
    time.sleep(5)
    for p in processes:
        if p.poll() is None:
            p.kill()
    print("✓ Processes terminated")

## 7.5. Check Real-Time Progress from GDrive (Optional)

**Reconnected after disconnect?** Run this cell to check training progress:
- Shows current round for each node
- Displays elapsed time and adaptive I/J values
- Works even if your notebook has disconnected

Progress is saved to GDrive every round, logs flush every 30 seconds.

In [None]:
# === Real-Time Progress Viewer (Adaptive I/J) ===
# Run this cell anytime to check progress from GDrive
# Useful if you reconnect after notebook disconnect

import sys
sys.path.append('/content/rl-swarm')

from rgym_exp.utils.progress_tracker import get_experiment_progress
from rgym_exp.utils.experiment_manager import get_experiment_metrics

progress = get_experiment_progress(GDRIVE_BASE_PATH, EXPERIMENT_NAME)

print("="*70)
print("REAL-TIME PROGRESS FROM GDRIVE (Adaptive I/J)")
print("="*70)
print(f"Experiment: {progress.get('experiment')}")
print()

for node_id, node_data in progress.get('nodes', {}).items():
    if 'error' in node_data:
        print(f"  {node_id}: {node_data['error']}")
    else:
        print(f"  {node_id}:")
        print(f"    Latest event: {node_data.get('latest_event')}")
        print(f"    Current round: {node_data.get('latest_round')}")

        elapsed_sec = node_data.get('elapsed_seconds', 0)
        elapsed_hours = elapsed_sec / 3600
        print(f"    Elapsed time: {elapsed_hours:.1f} hours")
        print()

# Try to get latest adaptive I/J values
print("="*70)
print("ADAPTIVE I/J STATUS")
print("="*70)
try:
    df = get_experiment_metrics(GDRIVE_BASE_PATH, EXPERIMENT_NAME, aggregate_by_round=True)
    if not df.empty and 'adaptive_J' in df.columns:
        latest = df.tail(1).iloc[0]
        current_I = int(latest.get('adaptive_I', INITIAL_I))
        current_J = int(latest.get('adaptive_J', INITIAL_J))
        J_cont = float(latest.get('adaptive_J_continuous', INITIAL_J))
        baseline = float(latest.get('adaptive_baseline', 0))

        print(f"  Started: I={INITIAL_I}, J={INITIAL_J}")
        print(f"  Current: I={current_I}, J={current_J}")
        print(f"  J (continuous): {J_cont:.2f}")
        print(f"  Reward baseline: {baseline:.4f}")
        print(f"  Change: {current_J - INITIAL_J:+d}")
    else:
        print("  (Adaptive metrics not yet available)")
except Exception as e:
    print(f"  (Could not load adaptive metrics: {e})")

print()
print("="*70)
print("Note: Progress updates every round. Logs flush every 30s to GDrive.")
print("You can access logs directly in Google Drive even while training!")

## 8. Analyze Adaptive Behavior

**See how I/J evolved during training!**
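
The analysis cell reduces the metrics to one row per round with `groupby('round').last()`, which keeps whichever node's row comes last in each round. If nodes adapt independently, their J values can differ within a round, so a per-round mean across nodes is a reasonable alternative to compare against. The sketch below is illustrative: `summarize_j` is a hypothetical helper, it assumes the `round` and `adaptive_J` columns used elsewhere in this notebook, and the toy frame contains made-up values, not results.

```python
import pandas as pd


def summarize_j(df: pd.DataFrame, initial_j: int) -> dict:
    """Per-round mean J across nodes, plus start-to-end change statistics."""
    per_round = df.groupby("round")["adaptive_J"].mean()
    final_j = float(per_round.iloc[-1])
    change = final_j - initial_j
    return {
        "mean_J": float(per_round.mean()),
        "final_J": final_j,
        "change": change,
        # Percent change is undefined when starting from J=0
        "change_pct": (100.0 * change / initial_j) if initial_j else None,
    }


# Toy frame only: two nodes over three rounds
toy = pd.DataFrame({
    "round":      [1, 1, 2, 2, 3, 3],
    "node_id":    ["node_1", "node_2"] * 3,
    "adaptive_J": [2, 2, 2, 3, 3, 3],
})
stats = summarize_j(toy, initial_j=2)
```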

In [None]:
from rgym_exp.utils.experiment_manager import get_experiment_metrics
import matplotlib.pyplot as plt
import numpy as np

print("="*70)
print(f"Adaptive I/J Results: {EXPERIMENT_NAME}")
print("="*70)
print(f"Model: {MODEL_NAME}")
print(f"Rounds: {MAX_ROUNDS}")
print(f"Initial: I={INITIAL_I}, J={INITIAL_J}")
print(f"Adaptation Rate: α={ADAPTATION_RATE}")
print()

df = get_experiment_metrics(GDRIVE_BASE_PATH, EXPERIMENT_NAME)

if not df.empty and 'adaptive_J' in df.columns:
    # Aggregate by round (take last value per round)
    round_df = df.groupby('round').last().reset_index()
    
    final_I = int(round_df['adaptive_I'].iloc[-1])
    final_J = int(round_df['adaptive_J'].iloc[-1])
    final_J_cont = round_df['adaptive_J_continuous'].iloc[-1]
    
    total_reward = df['my_reward'].sum()
    
    print("Final Adaptive State:")
    print(f"  I={final_I}, J={final_J} (J_continuous={final_J_cont:.2f})")
    print(f"  Total Reward: {total_reward:.2f}")
    print()
    
    # Statistics
    mean_J = round_df['adaptive_J'].mean()
    std_J = round_df['adaptive_J'].std()
    
    print("Adaptive Statistics:")
    print(f"  Mean J: {mean_J:.2f} ± {std_J:.2f}")
    print(f"  Started at: J={INITIAL_J}")
    print(f"  Ended at: J={final_J}")
    print(f"  Change: {final_J - INITIAL_J:+d} ({(final_J - INITIAL_J) / INITIAL_J * 100:+.0f}%)")
    print()
    
    # Compare to the fixed-ratio results reported in the SAPO paper
    print("Comparison to Fixed Ratios:")
    print("  (Based on SAPO paper expectations)")
    print(f"  Config 1 (I=3, J=1): +52% improvement")
    print(f"  Config 2 (I=2, J=2): +94% improvement (BEST fixed)")
    print(f"  Config 3 (I=1, J=3): +68% improvement")
    print(f"  Adaptive (I={final_I}, J={final_J}): TBD (run baseline first)")
    print()
    
    # Plot adaptive trajectory
    if MAX_ROUNDS >= 20:
        fig, axes = plt.subplots(2, 1, figsize=(12, 10))
        
        # Plot 1: J evolution over time
        ax1 = axes[0]
        ax1.plot(round_df['round'], round_df['adaptive_J_continuous'], 
                label='J (continuous)', linewidth=2, alpha=0.7)
        ax1.plot(round_df['round'], round_df['adaptive_J'], 
                label='J (discrete)', marker='o', markersize=3, alpha=0.5)
        ax1.axhline(y=INITIAL_J, color='gray', linestyle='--', 
                   label=f'Initial J={INITIAL_J}')
        ax1.set_xlabel('Round')
        ax1.set_ylabel('J (External Rollouts)')
        ax1.set_title('Adaptive I/J Evolution Over Training', fontweight='bold')
        ax1.legend()
        ax1.grid(True, alpha=0.3)
        
        # Plot 2: Cumulative rewards
        ax2 = axes[1]
        for node_id in df['node_id'].unique():
            node_df = df[df['node_id'] == node_id].sort_values('round')
            ax2.plot(node_df['round'], node_df['my_reward'].cumsum(), 
                    label=node_id, alpha=0.7)
        ax2.set_xlabel('Round')
        ax2.set_ylabel('Cumulative Reward')
        ax2.set_title('Cumulative Rewards with Adaptive I/J', fontweight='bold')
        ax2.legend()
        ax2.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plot_path = f'{GDRIVE_BASE_PATH}/adaptive_ij_results.png'
        plt.savefig(plot_path, dpi=150, bbox_inches='tight')
        print(f"✓ Plot saved: {plot_path}")
        plt.show()
        
        # Interpretation
        print()
        print("="*70)
        print("INTERPRETATION:")
        print("="*70)
        if final_J > INITIAL_J + 0.5:
            print("✅ Algorithm increased J → Swarm sharing is beneficial!")
            print("   External rollouts help more than local ones.")
        elif final_J < INITIAL_J - 0.5:
            print("⚠️  Algorithm decreased J → Local training preferred")
            print("   More benefit from own rollouts than swarm.")
        else:
            print("➡️  Algorithm kept J near initial → Balanced split optimal")
            print("   Local and external rollouts equally valuable.")
        print()
        
        if final_J >= 2:
            print("📊 THESIS INSIGHT:")
            print("   Adaptive algorithm discovered swarm collaboration value!")
            print("   This validates the need for external rollouts (J > 0).")
        else:
            print("📊 THESIS INSIGHT:")
            print("   Adaptive algorithm favored local training.")
            print("   May indicate: model too weak, swarm too small, or")
            print("   early training phase benefits from focused exploration.")
    else:
        print("(Plots and interpretation skipped: they need MAX_ROUNDS >= 20)")
else:
    print("❌ No adaptive metrics found - verify ADAPTIVE_IJ_ENABLED=True")