# Brain Encoding with RL Features

## Predicting Brain Activity from Agent Representations

**Overview:**
This notebook uses the CNN activations from the RL agent (notebook 02) to predict brain activity during gameplay.

**What we'll cover:**
1. Understanding the encoding model framework
2. Loading and preparing BOLD data
3. Loading CNN activations from the agent
4. Aligning timepoints between BOLD and activations
5. Fitting ridge regression encoding models
6. Comparing layer performance
7. Visualizing brain maps

**Key question:** Which layer of the agent best predicts brain activity, and where?

In [1]:
# @title Environment Setup
# @markdown Run this cell to set up the environment and download the necessary data.

import os
import sys
import subprocess
from pathlib import Path

# Configuration
REPO_URL = "https://github.com/courtois-neuromod/mario.tutorials.git"
PROJECT_PATH = Path("/content/mario.tutorials")
REQUIREMENTS_FILE = "notebooks/03_requirements.txt"
SUBJECT = "sub-01"
SESSION = "ses-001"
TR = 1.49
DOWNLOAD_STIMULI = True

def run_shell(cmd):
    print(f"Running: {cmd}")
    subprocess.check_call(cmd, shell=True)

# Detect Colab
try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    print("🚀 Detected Google Colab. Setting up ephemeral environment...")
    
    # 1. Clone Repository
    if not PROJECT_PATH.exists():
        run_shell(f"git clone {REPO_URL} {PROJECT_PATH}")
    else:
        run_shell(f"cd {PROJECT_PATH} && git pull")
    
    os.chdir(PROJECT_PATH)
    sys.path.insert(0, str(PROJECT_PATH / "src"))
    
    # 2. Run Setup
    from setup_utils import setup_project
    setup_project(REQUIREMENTS_FILE, SUBJECT, SESSION, download_stimuli_flag=DOWNLOAD_STIMULI)

else:
    print("💻 Detected Local Environment.")
    if Path.cwd().name == 'notebooks':
        os.chdir(Path.cwd().parent)
    sys.path.insert(0, str(Path.cwd() / "src"))
    print(f"✅ Ready. Working directory: {os.getcwd()}")

💻 Detected Local Environment.
✅ Ready. Working directory: /home/hyruuk/GitHub/neuromod/mario_analysis/mario.tutorials


In [2]:
# Silent Setup
try:
    from setup_utils import setup_all
    # Ensure data is available (silently checks)
    setup_all(subject=SUBJECT, session=SESSION)
except ImportError:
    print("Setup utils not found. Please ensure src is in path.")
except Exception as e:
    print(f"Setup warning: {e}")


Setup utils not found. Please ensure src is in path.


In [3]:
# Setup - imports and configuration

import sys
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Add src to path
src_dir = Path('..') / 'src'
sys.path.insert(0, str(src_dir))

# Import utilities
from utils import (
    get_sourcedata_path,
    load_events,
    get_session_runs,
    get_bold_path,
    load_bold
)

# Import RL utilities
from rl_utils import (
    create_simple_proxy_features,
    convolve_with_hrf,
    apply_pca
)

# Import RL visualizations
from rl_viz_utils import (
    plot_pca_variance_per_layer,
    plot_layer_activations_sample
)

# Import encoding utilities
from encoding_utils import (
    load_and_prepare_bold,
    fit_encoding_model_per_layer,
    compare_layer_performance
)

# Import encoding visualizations
from encoding_viz_utils import (
    plot_layer_comparison_bars,
    plot_r2_brainmap
)

# Plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 6)
plt.rcParams['font.size'] = 11

# Get sourcedata path
sourcedata_path = get_sourcedata_path()

print("✓ Setup complete!")

✓ Setup complete!


## 1. The Encoding Model Framework

**Goal:** Predict BOLD activity from RL agent features

**Model:** Ridge Regression (linear regression with L2 regularization)

```
BOLD(voxel, time) = Σᵢ βᵢ(voxel) · Feature_i(time) + ε
```

**Why ridge regression?**
- Handles high-dimensional features (up to 1000 random-projection components per layer in this notebook)
- Minimizes `||BOLD - X·β||² + α·||β||²`: the L2 penalty shrinks the weights toward zero and limits overfitting
- Cross-validation selects the regularization strength α
- Fast to fit (~5 min for the whole brain)

**Alternative approaches:**
- Lasso (L1): Sparse feature selection
- Elastic net: L1 + L2
- Nonlinear: Kernel ridge, neural networks

**For interpretability and speed, we use ridge regression.**
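The following is a minimal NumPy sketch of the closed-form ridge solution β = (XᵀX + αI)⁻¹XᵀY, run on synthetic data. Every voxel shares the same design matrix X, so one linear solve fits all voxels at once. This only illustrates the technique. It is not the internals of `fit_encoding_model_per_layer`, which are not shown here, and `ridge_fit` is a hypothetical name.

```python
import numpy as np

def ridge_fit(X, Y, alpha):
    """Closed-form ridge weights for many targets (voxels) at once.

    X: (n_timepoints, n_features) design matrix, e.g. projected activations
    Y: (n_timepoints, n_voxels) BOLD matrix
    Returns B: (n_features, n_voxels)
    """
    n_features = X.shape[1]
    # (X^T X + alpha I) is shared by every voxel, so one solve covers all columns of Y
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))            # 200 TRs, 50 features
true_B = rng.standard_normal((50, 30))        # weights for 30 synthetic "voxels"
Y = X @ true_B + 0.5 * rng.standard_normal((200, 30))

B_small = ridge_fit(X, Y, alpha=1.0)
B_large = ridge_fit(X, Y, alpha=1e5)
# A larger alpha shrinks the weights toward zero
print(np.linalg.norm(B_small), np.linalg.norm(B_large))
```

Comparing the two weight norms shows the shrinkage effect of a large α directly.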

In [None]:
# Load prerequisites

from nilearn.masking import compute_multi_epi_mask

# Get runs
runs = get_session_runs(SUBJECT, SESSION, sourcedata_path)
print(f"Found {len(runs)} runs: {runs}")

# Load events
all_events = []
for run in runs:
    events = load_events(SUBJECT, SESSION, run, sourcedata_path)
    all_events.append(events)
    print(f"  {run}: {len(events)} events")

# Load BOLD images and paths
print("\nLoading BOLD data...")
bold_imgs = []
bold_paths = []
for run in runs:
    bold_path = get_bold_path(SUBJECT, SESSION, run, sourcedata_path)
    bold_img = load_bold(SUBJECT, SESSION, run, sourcedata_path)
    bold_paths.append(str(bold_path))  # Convert Path to string for nilearn
    bold_imgs.append(bold_img)

# Create common mask
print("\nCreating common brain mask...")
common_mask = compute_multi_epi_mask(bold_imgs, n_jobs=1)
n_voxels = int((common_mask.get_fdata() > 0).sum())
print(f"✓ Common mask: {n_voxels:,} voxels")

print("\n✓ All prerequisites loaded!")

Found 4 runs: ['run-1', 'run-2', 'run-3', 'run-4']
  run-1: 712 events
  run-2: 1032 events
  run-3: 1037 events
  run-4: 1030 events

Loading BOLD data...

Creating common brain mask...


## 2. Loading Prerequisites

We need:
- Subject/session info (sub-01, ses-001)
- Run IDs (4 runs)
- BOLD images (preprocessed fMRI data)
- Event files (for alignment)
- Common brain mask (computed across all runs with `compute_multi_epi_mask`)

**Note:** The mask is computed fresh from this session's BOLD images, so this notebook does not require the mask from notebook 01.

In [None]:
# Load and align activations from replays

# First, check if we have a trained model
from pathlib import Path

MODEL_DIR = Path('models/')
MODEL_PATH = MODEL_DIR / 'mario_ppo_agent.pth'

if not MODEL_PATH.exists():
    print(f"✗ No trained model found at: {MODEL_PATH}")
    print("\nYou need a trained RL agent to extract activations.")
    print("Please train an agent first by running:")
    print("  python ../train_mario_agent.py --steps 5000000")
    print("\n⚠ Cannot proceed with encoding analysis without trained model")
    HAS_MODEL = False
else:
    print(f"✓ Found trained model: {MODEL_PATH}")
    HAS_MODEL = True
    
    # Load the model
    from rl_utils import load_pretrained_model, align_activations_to_bold
    
    print("\nLoading model...")
    model = load_pretrained_model(MODEL_PATH, device='cpu')
    print("‚úì Model loaded")
    
    # Align activations to BOLD
    # This will:
    # 1. Load replay files for each game segment
    # 2. Extract RL activations at 60Hz  
    # 3. Downsample to TR (1.49s)
    # 4. Apply HRF convolution
    # 5. Create NaN mask for non-gameplay periods
    
    alignment_results = align_activations_to_bold(
        model=model,
        subject=SUBJECT,
        session=SESSION,
        runs=runs,
        sourcedata_path=sourcedata_path,
        tr=TR,
        device='cpu',
        apply_hrf=True,  # Apply HRF convolution
        bold_imgs=bold_imgs  # Pass BOLD images for exact TR count
    )
    
    # Extract results
    layer_activations = alignment_results['activations']
    valid_mask = alignment_results['mask']
    run_info = alignment_results['run_info']
    
    print(f"\n{'='*70}")
    print("Alignment summary:")
    for info in run_info:
        print(f"  {info['run']}: {info['n_valid_trs']}/{info['n_trs']} TRs "
              f"({info['n_segments']} game segments)")
    print(f"{'='*70}\n")

✓ Found trained model: models/mario_ppo_agent.pth

Loading model...
✓ Model loaded

Aligning RL activations to BOLD for sub-01 ses-001


Processing run-1:
--------------------------------------------------
  Found 11 game trial(s)
  Using actual BOLD length: 451 TRs

  Repetition 0: Level1-1
    Onset: 0.01s, Duration: 86.55s
  Loading replay: sub-01_ses-001_task-mario_level-w1l1_rep-000.bk2
    Level format: Level1-1 -> Level1-1
  Processing 4419 frames...
    Extracted 4419 frames → downsampling to TR...
    → 59 TRs (indices 0-59)

  Repetition 1: Level1-1
    Onset: 86.56s, Duration: 71.11s
  Loading replay: sub-01_ses-001_task-mario_level-w1l1_rep-001.bk2
    Level format: Level1-1 -> Level1-1
  Processing 3906 frames...
    Extracted 3906 frames → downsampling to TR...
    → 48 TRs (indices 58-106)

  Repetition 2: Level1-1
    Onset: 157.68s, Duration: 43.15s
  Loading replay: sub-01_ses-001_task-mario_level-w1l1_rep-002.bk2
    Level format: Level1-1 -> Level1-1
  P

## 3. Loading and Aligning RL Activations

**NEW APPROACH:**

Instead of using pre-extracted activations, we now:

1. **Load replay files** from the human subject's actual gameplay
   - Uses `.bk2` replay files from `sourcedata/mario/`
   - Matches exact stimuli presented during fMRI scanning

2. **Extract activations frame-by-frame** (60Hz)
   - Pass replay frames through trained RL agent
   - Collect CNN activations from all layers

3. **Align to fMRI timing**
   - Use `mario.annotations` files to get game segment timing
   - Downsample from 60Hz to TR (1.49s)
   - Apply HRF convolution

4. **Handle multiple games per run**
   - Concatenate gameplay segments
   - Mask inter-game periods with NaN

**This ties the RL activations to the exact frames the subject saw in the scanner, aligned to the BOLD data at TR resolution.**
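Two pieces of this pipeline can be illustrated with a minimal sketch of per-TR averaging: the downsampling in step 3 and the NaN masking in step 4. The sketch assumes that frames arrive at a constant 60 Hz from the segment onset, and that any TR receiving no gameplay frames is left as NaN. `downsample_to_trs` is a hypothetical helper, not the signature of `align_activations_to_bold`.

```python
import numpy as np

def downsample_to_trs(frame_feats, onset_s, n_trs, tr=1.49, fps=60.0):
    """Hypothetical helper: average 60 Hz frame features within each TR window.

    frame_feats: (n_frames, n_features) activations for one game segment
    onset_s: segment onset in seconds from the start of the run
    Returns (n_trs, n_features); TRs that receive no frames stay NaN.
    """
    out = np.full((n_trs, frame_feats.shape[1]), np.nan)
    frame_times = onset_s + np.arange(len(frame_feats)) / fps   # frame times in run coordinates
    tr_idx = np.floor(frame_times / tr).astype(int)             # TR each frame falls into
    for t in np.unique(tr_idx):
        if 0 <= t < n_trs:
            out[t] = frame_feats[tr_idx == t].mean(axis=0)
    return out

frames = np.ones((600, 4))                      # 10 s of 60 Hz frames, 4 features
tr_acts = downsample_to_trs(frames, onset_s=3.0, n_trs=12)
demo_valid = ~np.isnan(tr_acts).any(axis=1)     # gameplay TRs
print(demo_valid.astype(int))
```

When a run contains several segments, their outputs could be merged with `np.where(np.isnan(a), b, a)`. A TR that straddles two segments needs an explicit rule (for example, frame-weighted averaging), which this sketch does not implement.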

In [None]:
# Clean and prepare BOLD data

from encoding_utils import load_and_prepare_bold

print("Cleaning BOLD data...")
print("This performs:")
print("  1. Confound regression (motion, WM, CSF) - NO global signal")
print("  2. Detrending (remove linear drift)")
print("  3. Standardization (z-score each voxel)")
print("\nNote: High-pass filtering is handled by fMRIPrep confounds")
print("Note: Global signal regression removed (was too aggressive)\n")

bold_data = load_and_prepare_bold(
    bold_paths,  # Use paths instead of images for confound loading
    mask_img=common_mask,
    detrend=True,
    standardize=True,
    t_r=TR,
    load_confounds_from_fmriprep=True  # Automatically load confounds from fMRIPrep
)

print(f"✓ BOLD prepared:")
print(f"  Shape: {bold_data.shape}")
print(f"  Timepoints: {bold_data.shape[0]}")
print(f"  Voxels: {bold_data.shape[1]:,}")

Cleaning BOLD data...
This performs:
  1. Confound regression (motion, WM, CSF) - NO global signal
  2. Detrending (remove linear drift)
  3. Standardization (z-score each voxel)

Note: High-pass filtering is handled by fMRIPrep confounds
Note: Global signal regression removed (was too aggressive)

✓ BOLD prepared:
  Shape: (1794, 213371)
  Timepoints: 1794
  Voxels: 213,371


## 4. Cleaning and Preparing BOLD Data

**Preprocessing steps:**

1. **Confound regression:** Remove nuisance signals from each voxel's timeseries
   - Motion parameters (6 DOF: translation + rotation)
   - White matter signal (non-neural tissue)
   - CSF signal (physiological pulsations)
   - *Not* the global signal (whole-brain average): global signal regression was dropped as too aggressive
   - High-pass filter components (from fMRIPrep, removes slow drifts <1/128 Hz)

2. **Detrending:** Remove linear drift within each run

3. **Standardization:** Z-score each voxel (mean=0, std=1)

**What is confound regression?**

Think of it as "noise cancellation" for fMRI:
- BOLD signal = neural activity + artifacts (motion, heartbeat, breathing, scanner drift)
- For each voxel, we fit a linear model: `BOLD = β₁·motion + β₂·WM + β₃·CSF + ... + ε`
- We keep only the residual (ε) = signal unexplained by confounds
- This "cleaned" signal better reflects neural activity

**Why is this important?**
- Head motion creates spurious correlations between brain regions
- Without cleaning, you might "predict" brain activity that's actually just head movement
- Confound regression removes these artifacts while preserving neural signals

**Output:** `(timepoints × voxels)` matrix ready for regression, with artifacts removed
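The regression described above fits in a few lines of NumPy. The steps are:

1. Stack the confounds, an intercept, and a linear trend into one design matrix.
2. Solve a single least-squares problem for all voxels.
3. Keep the residuals and z-score them.

This is a sketch on synthetic data for a single run. It is not the code inside `load_and_prepare_bold`, and `clean_bold` is a hypothetical name. In practice, nilearn's `signal.clean` function covers the same steps.

```python
import numpy as np

def clean_bold(bold, confounds):
    """Regress confounds (plus intercept and linear trend) out of each voxel, then z-score.

    bold: (n_timepoints, n_voxels) for one run; confounds: (n_timepoints, n_confounds)
    """
    n = bold.shape[0]
    trend = np.linspace(-1.0, 1.0, n)[:, None]              # linear drift regressor
    design = np.hstack([np.ones((n, 1)), trend, confounds])
    # One least-squares fit covers every voxel (column of `bold`)
    beta, *_ = np.linalg.lstsq(design, bold, rcond=None)
    resid = bold - design @ beta                             # the epsilon we keep
    return (resid - resid.mean(axis=0)) / resid.std(axis=0)

rng = np.random.default_rng(0)
motion = rng.standard_normal((300, 6))                       # 6 motion parameters
neural = rng.standard_normal((300, 100))
bold_demo = neural + motion @ rng.standard_normal((6, 100))  # motion leaks into every voxel
cleaned = clean_bold(bold_demo, motion)
print(cleaned.shape)
```

After cleaning, each voxel's time series is uncorrelated with every motion regressor, which is exactly what "removing" a confound means here.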

In [None]:
# Check alignment between BOLD and activations

if HAS_MODEL:
    n_bold = bold_data.shape[0]
    n_acts = list(layer_activations.values())[0].shape[0]
    
    print(f"BOLD timepoints: {n_bold}")
    print(f"Activations timepoints: {n_acts}")
    print(f"Valid (gameplay) timepoints: {valid_mask.sum()}")
    print(f"Invalid (non-gameplay) timepoints: {(~valid_mask).sum()}")
    
    # Ensure dimensions match
    if n_bold != n_acts:
        print(f"\n⚠ Dimension mismatch!")
        print(f"  Truncating to minimum length: {min(n_bold, n_acts)}")
        n_time = min(n_bold, n_acts)
        bold_data = bold_data[:n_time]
        valid_mask = valid_mask[:n_time]
        for layer in layer_activations.keys():
            layer_activations[layer] = layer_activations[layer][:n_time]
    else:
        print("\n✓ Dimensions match!")
else:
    print("⚠ No model available, skipping alignment check")


BOLD timepoints: 1794
Activations timepoints: 1794
Valid (gameplay) timepoints: 1692
Invalid (non-gameplay) timepoints: 102

✓ Dimensions match!


## 5. Alignment Status

**Automatic alignment completed!**

The `align_activations_to_bold()` function has:

1. ✅ **Loaded replay files** for each game segment
2. ✅ **Extracted RL activations** at 60Hz from replay frames
3. ✅ **Downsampled to TR** using temporal averaging within each TR window
4. ✅ **Applied HRF convolution** to account for hemodynamic lag
5. ✅ **Created validity mask** to mark gameplay vs non-gameplay periods

**Key differences from the old approach:**
- OLD: activations from the agent's own (arbitrary) gameplay, which did not match what the subject saw
- NEW: activations from the subject's own gameplay replays, aligned to the BOLD data at TR resolution

**Dimensions should now match:**
- BOLD: Number of TRs across all runs
- Activations: Same number of TRs (with NaN for non-gameplay)

In [None]:
# Create run-based train/test splits

if HAS_MODEL:
    print("Setting up run-based cross-validation...")
    print("\nIMPORTANT: For proper generalization, we should use leave-one-run-out CV.")
    print("This ensures the model is tested on completely unseen runs.\n")
    
    # Calculate run boundaries in concatenated data
    run_boundaries = [0]
    for info in run_info:
        run_boundaries.append(run_boundaries[-1] + info['n_trs'])
    
    print("Run boundaries (in concatenated array):")
    for i, (run, info) in enumerate(zip(runs, run_info)):
        start_idx = run_boundaries[i]
        end_idx = run_boundaries[i+1]
        print(f"  {run}: TRs {start_idx}-{end_idx} ({info['n_trs']} TRs, {info['n_valid_trs']} valid)")
    
    # For simplicity in this tutorial, we'll use first 3 runs for training, last run for testing
    # In a real analysis, you should do full leave-one-run-out cross-validation
    test_run_idx = 3  # Use last run as test set
    
    # Get train indices (first 3 runs) and test indices (last run)
    train_start = run_boundaries[0]
    train_end = run_boundaries[test_run_idx]
    test_start = run_boundaries[test_run_idx]
    test_end = run_boundaries[test_run_idx + 1]
    
    # Get valid (gameplay) indices within train and test sets
    all_indices = np.arange(len(valid_mask))
    train_all_indices = all_indices[train_start:train_end]
    test_all_indices = all_indices[test_start:test_end]
    
    # Filter to only valid (gameplay) TRs
    train_valid_indices = train_all_indices[valid_mask[train_start:train_end]]
    test_valid_indices = test_all_indices[valid_mask[test_start:test_end]]
    
    print(f"\nRun-based split:")
    print(f"  Train runs: {runs[:test_run_idx]}")
    print(f"  Test run: {runs[test_run_idx]}")
    print(f"  Train TRs (gameplay only): {len(train_valid_indices)}")
    print(f"  Test TRs (gameplay only): {len(test_valid_indices)}")
    
    print("\n⚠ Note: For a full analysis, implement leave-one-run-out CV and average results!")
else:
    print("‚ö† No model available, skipping train/test split")

Setting up run-based cross-validation...

IMPORTANT: For proper generalization, we should use leave-one-run-out CV.
This ensures the model is tested on completely unseen runs.

Run boundaries (in concatenated array):
  run-1: TRs 0-451 (451 TRs, 415 valid)
  run-2: TRs 451-896 (445 TRs, 422 valid)
  run-3: TRs 896-1351 (455 TRs, 436 valid)
  run-4: TRs 1351-1794 (443 TRs, 419 valid)

Run-based split:
  Train runs: ['run-1', 'run-2', 'run-3']
  Test run: run-4
  Train TRs (gameplay only): 1273
  Test TRs (gameplay only): 419

⚠ Note: For a full analysis, implement leave-one-run-out CV and average results!


## 6. Run-Based Train/Test Split

**Critical methodological point:** We must use **run-based cross-validation**, not random splitting!

**Why run-based?**
- **Temporal autocorrelation**: Adjacent TRs are correlated (hemodynamic response spans ~15-20 seconds)
- **Random split**: train and test would contain adjacent TRs from the same run → inflated performance
- **Run-based split**: the test set comes from completely unseen runs → an honest estimate of generalization

**Leave-One-Run-Out (LORO) Cross-Validation:**
- Train on N-1 runs, test on the held-out run
- Repeat with each run as the test set
- Average results across folds
- This is a standard choice for fMRI encoding models

**Simplified approach (this notebook):**
- Train: Runs 1-3
- Test: Run 4
- For a real analysis, implement full LORO and average across all folds

**Only use gameplay TRs:**
- Both train and test only include TRs where the subject was actually playing
- Non-gameplay periods (between games) are excluded using the valid_mask
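Full leave-one-run-out CV can be organized with scikit-learn's `LeaveOneGroupOut` by giving every TR its run label as a group. In this minimal sketch, the run lengths are copied from the split output above. The validity mask is a random stand-in for the real `valid_mask`, and the model fitting inside each fold is left as a comment.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

run_lengths = [451, 445, 455, 443]                            # TRs per run, from the split output
groups = np.repeat(np.arange(len(run_lengths)), run_lengths)  # run label for every TR
rng = np.random.default_rng(0)
demo_valid_mask = rng.random(groups.size) > 0.05              # stand-in for the gameplay mask

folds = []
for train_idx, test_idx in LeaveOneGroupOut().split(np.zeros((groups.size, 1)), groups=groups):
    # Keep only gameplay TRs inside each fold, as the single split above does
    train_idx = train_idx[demo_valid_mask[train_idx]]
    test_idx = test_idx[demo_valid_mask[test_idx]]
    folds.append((train_idx, test_idx))
    # Fit ridge on train_idx, compute per-voxel R² on test_idx, then average R² maps across folds

print([(len(tr), len(te)) for tr, te in folds])
```

Each run serves as the test set exactly once. Averaging the per-fold R² maps gives the LORO estimate.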

In [None]:
# Apply Random Projection to layer activations (testing multiple dimensions)

if HAS_MODEL:
    from rl_utils import apply_random_projection_with_nan_handling
    
    # Test multiple projection dimensions
    projection_dims = [10, 50, 100, 1000]
    
    print("Testing Random Projection with multiple dimensions...")
    print("(Random projection is fit only on valid gameplay TRs)\n")
    
    all_projection_results = {}
    
    for n_comp in projection_dims:
        print(f"\n{'='*70}")
        print(f"Testing n_components = {n_comp}")
        print(f"{'='*70}")
        
        projection_results = apply_random_projection_with_nan_handling(
            layer_activations,
            valid_mask,
            n_components=n_comp,
            random_state=42
        )
        
        all_projection_results[n_comp] = projection_results
        
        print(f"\nRandom projection summary (n={n_comp}):")
        for layer, acts in projection_results['reduced_activations'].items():
            print(f"  {layer}: {acts.shape[1]} components")
    
    print(f"\n{'='*70}")
    print("All random projection dimensions tested!")
    print(f"{'='*70}\n")
else:
    print("⚠ No model available, skipping random projection")

Testing Random Projection with multiple dimensions...
(Random projection is fit only on valid gameplay TRs)


Testing n_components = 1000

Applying Random Projection to layer activations (n_components=1000)...

  conv1:
    Original features: 56448
    Random projection: 56448 → 1000 components

  conv2:
    Original features: 14112
    Random projection: 14112 → 1000 components

  conv3:
    Original features: 3872
    Random projection: 3872 → 1000 components

  conv4:
    Original features: 1152
    Random projection: 1152 → 1000 components

  linear:
    Original features: 512
    Random projection: 512 → 512 components

✓ Random projection complete

Random projection summary (n=1000):
  conv1: 1000 components
  conv2: 1000 components
  conv3: 1000 components
  conv4: 1000 components
  linear: 512 components

All random projection dimensions tested!



## 7. Fitting Ridge Regression Encoding Models with Random Projections

**NEW APPROACH: Random Projection instead of PCA**

**Why Random Projection?**
- Computationally efficient (no eigendecomposition)
- Preserves distances approximately (Johnson-Lindenstrauss lemma)
- Works well for high-dimensional data
- No need to fit on training data (just random matrix)
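The NaN handling in the projection cell above can be sketched with scikit-learn's `GaussianRandomProjection`: project only the valid gameplay TRs and keep NaN elsewhere. `project_valid_rows` is a hypothetical name. Whether `apply_random_projection_with_nan_handling` uses a Gaussian or a sparse projection is not shown, so the Gaussian variant here is an assumption.

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

def project_valid_rows(acts, valid_mask, n_components, random_state=42):
    """Hypothetical helper: random-project gameplay TRs only; other rows stay NaN.

    acts: (n_trs, n_features) with NaN rows outside gameplay.
    """
    # Cap k at the feature count (the output above shows 512 -> 512 for the linear layer)
    k = min(n_components, acts.shape[1])
    rp = GaussianRandomProjection(n_components=k, random_state=random_state)
    out = np.full((acts.shape[0], k), np.nan)
    # scikit-learn raises on NaN input, so only finite rows are passed in
    out[valid_mask] = rp.fit_transform(acts[valid_mask])
    return out

rng = np.random.default_rng(0)
acts = rng.standard_normal((100, 3872))     # conv3-sized feature vectors
acts[:10] = np.nan                          # first 10 TRs: no gameplay
valid = ~np.isnan(acts).any(axis=1)
proj = project_valid_rows(acts, valid, n_components=50)
print(proj.shape)
```

The "fit" step only reads the feature count to draw the random matrix, so it does not learn anything from the data. This is why the projection needs no training set.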

**Testing multiple dimensions:**
- 10 components: Very low dimensional
- 50 components: Similar to original PCA
- 100 components: Medium dimensional
- 1000 components: High dimensional

**For each dimension and layer:**
1. Use random-projected activations
2. Cross-validate to find optimal α (regularization strength)
3. Fit ridge regression on training data (gameplay TRs only)
4. Predict BOLD on test data
5. Compute R² per voxel

**Hyperparameter search:** α ∈ [0.1, 1, 10, 100, 1000, 10000, 100000]
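Steps 2-5 can be sketched with scikit-learn's `Ridge` and `r2_score`. In this sketch, one α is shared by all voxels and chosen on a contiguous held-out block of the training data; with real data, that block would ideally be a whole training run. `select_alpha_and_score` is hypothetical, and `fit_encoding_model_per_layer` may select α differently. If the selected α is the largest value in the grid, the optimum may lie beyond it and the grid is worth extending upward; the fitting output below reports 100000 for every layer.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

def select_alpha_and_score(X_tr, Y_tr, X_te, Y_te, alpha_grid, n_val):
    """Hypothetical helper: pick one alpha on a held-out block, refit, return per-voxel test R²."""
    # Hold out the last n_val training TRs as one contiguous block (not random TRs)
    X_fit, Y_fit = X_tr[:-n_val], Y_tr[:-n_val]
    X_val, Y_val = X_tr[-n_val:], Y_tr[-n_val:]
    val_scores = [
        r2_score(Y_val, Ridge(alpha=a).fit(X_fit, Y_fit).predict(X_val))  # mean R² over voxels
        for a in alpha_grid
    ]
    best_alpha = alpha_grid[int(np.argmax(val_scores))]
    # Refit on all training TRs with the chosen alpha, then score each voxel separately
    model = Ridge(alpha=best_alpha).fit(X_tr, Y_tr)
    r2_per_voxel = r2_score(Y_te, model.predict(X_te), multioutput="raw_values")
    return best_alpha, r2_per_voxel

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 20))                                    # 400 TRs, 20 features
Y = X @ rng.standard_normal((20, 5)) + rng.standard_normal((400, 5))  # 5 synthetic voxels
alpha_grid = [0.1, 1, 10, 100, 1000, 10000, 100000]
best_alpha, r2 = select_alpha_and_score(X[:300], Y[:300], X[300:], Y[300:], alpha_grid, n_val=100)
print(best_alpha, r2.round(3))
```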

**Question:** Does the number of projection dimensions affect encoding performance?

In [None]:
# Fit ridge regression encoding models for each projection dimension

if HAS_MODEL:
    from encoding_utils import fit_encoding_model_per_layer
    
    alphas = [0.1, 1, 10, 100, 1000, 10000, 100000]
    
    # Store results for each dimension
    all_encoding_results = {}
    
    for n_comp in projection_dims:
        print(f"\n{'='*70}")
        print(f"Fitting encoding models for n_components = {n_comp}")
        print(f"{'='*70}\n")
        
        reduced_activations = all_projection_results[n_comp]['reduced_activations']
        
        print(f"Fitting ridge regression (5 layers × {n_comp} components × voxels)...")
        print("This may take a few minutes...\n")
        
        encoding_results = fit_encoding_model_per_layer(
            reduced_activations,
            bold_data,
            common_mask,
            train_valid_indices,
            test_valid_indices,
            alphas=alphas,
            valid_mask=valid_mask
        )
        
        all_encoding_results[n_comp] = encoding_results
        print(f"\n✓ Encoding complete for n_components = {n_comp}!")
    
    print(f"\n{'='*70}")
    print("All encoding models fitted!")
    print(f"{'='*70}\n")
else:
    print("⚠ No model available, skipping encoding")


Fitting encoding models for n_components = 1000

Fitting ridge regression (5 layers × 1000 components × voxels)...
This may take a few minutes...

Fitting encoding model for layer: conv1


  Best alpha: 100000.0
  Mean R² (train): 0.0160
  Mean R² (test): 0.0001

Fitting encoding model for layer: conv2
  Best alpha: 100000.0
  Mean R² (train): 0.0163
  Mean R² (test): 0.0001

Fitting encoding model for layer: conv3
  Best alpha: 100000.0
  Mean R² (train): 0.0170
  Mean R² (test): 0.0003

Fitting encoding model for layer: conv4
  Best alpha: 100000.0
  Mean R² (train): 0.0169
  Mean R² (test): 0.0003

Fitting encoding model for layer: linear
  Best alpha: 100000.0
  Mean R² (train): 0.0089
  Mean R² (test): 0.0004


✓ Encoding complete for n_components = 1000!

All encoding models fitted!



In [None]:
# Check shapes at every step
# Inspects one layer's projected activations; conv1 at the largest tested n_components is an arbitrary choice
if HAS_MODEL:
    projected_acts = all_projection_results[projection_dims[-1]]['reduced_activations']['conv1']

    print("Shape diagnostics:")
    print(f"valid_mask length: {len(valid_mask)}")
    print(f"valid_mask.sum(): {valid_mask.sum()}")
    print(f"bold_data shape: {bold_data.shape}")
    print(f"projected_acts shape: {projected_acts.shape}")
    print(f"train_valid_indices length: {len(train_valid_indices)}")
    print(f"test_valid_indices length: {len(test_valid_indices)}")
    print(f"train + test: {len(train_valid_indices) + len(test_valid_indices)}")

    print("\nFiltered shapes:")
    print(f"projected_acts[valid_mask] shape: {projected_acts[valid_mask].shape}")
    print(f"bold_data[valid_mask] shape: {bold_data[valid_mask].shape}")

    print("\nIndexing into projected_acts:")
    print(f"projected_acts[train_valid_indices] shape: {projected_acts[train_valid_indices].shape}")
    print(f"Has NaN: {np.isnan(projected_acts[train_valid_indices]).any()}")

    print("\nIndexing into bold_data:")
    print(f"bold_data[train_valid_indices] shape: {bold_data[train_valid_indices].shape}")
else:
    print("⚠ No model available, skipping shape diagnostics")



In [None]:
# Compare performance across projection dimensions and layers

if HAS_MODEL:
    from encoding_utils import compare_layer_performance
    from encoding_viz_utils import plot_layer_comparison_bars
    import matplotlib.pyplot as plt
    
    print("="*80)
    print("COMPARISON: Performance across different random projection dimensions")
    print("="*80)
    
    # Store all comparisons
    all_comparisons = {}
    
    # Get total number of voxels for percentage calculation
    n_total_voxels = bold_data.shape[1]
    
    for n_comp in projection_dims:
        print(f"\n{'='*70}")
        print(f"n_components = {n_comp}")
        print(f"{'='*70}")
        
        comparison_df = compare_layer_performance(all_encoding_results[n_comp])
        all_comparisons[n_comp] = comparison_df
        
        print(comparison_df.to_string(index=False))
        
        best_layer = comparison_df.iloc[0]['layer']
        best_r2 = comparison_df.iloc[0]['mean_r2']
        print(f"\n⭐ Best: {best_layer.upper()} (R² = {best_r2:.4f})")
    
    # Create comparison plot across dimensions
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    axes = axes.flatten()
    
    layer_order = ['conv1', 'conv2', 'conv3', 'conv4', 'linear']
    
    for idx, n_comp in enumerate(projection_dims):
        ax = axes[idx]
        
        # Plot for this dimension
        encoding_results = all_encoding_results[n_comp]
        
        # Extract mean R² for each layer
        layer_r2 = []
        for layer in layer_order:
            r2_map = encoding_results[layer]['r2_map']
            # Extract data from NIfTI image if needed
            if hasattr(r2_map, 'get_fdata'):
                r2_data = r2_map.get_fdata().flatten()
            else:
                r2_data = r2_map.flatten() if hasattr(r2_map, 'flatten') else r2_map
            
            mean_r2 = np.mean(r2_data[r2_data > 0])  # Mean of positive R¬≤
            layer_r2.append(mean_r2)
        
        # Bar plot
        bars = ax.bar(range(len(layer_order)), layer_r2, color='steelblue', alpha=0.8)
        ax.set_xticks(range(len(layer_order)))
        ax.set_xticklabels(layer_order, rotation=45)
        ax.set_ylabel('Mean R² (positive voxels)')
        ax.set_title(f'n_components = {n_comp}', fontsize=14, fontweight='bold')
        ax.set_ylim([0, max(0.1, max(layer_r2) * 1.2)])
        ax.grid(axis='y', alpha=0.3)
        
        # Highlight best layer
        best_idx = np.argmax(layer_r2)
        bars[best_idx].set_color('darkorange')
    
    plt.tight_layout()
    plt.show()
    
    # Summary table: best R¬≤ for each dimension
    print(f"\n{'='*80}")
    print("SUMMARY: Best performance across dimensions")
    print(f"{'='*80}")
    summary_data = []
    for n_comp in projection_dims:
        comparison_df = all_comparisons[n_comp]
        best_row = comparison_df.iloc[0]
        summary_data.append({
            'n_components': n_comp,
            'best_layer': best_row['layer'],
            'mean_r2': best_row['mean_r2'],
            'median_r2': best_row['median_r2'],
            'pct_positive': (best_row['n_positive_voxels'] / n_total_voxels) * 100
        })
    
    summary_df = pd.DataFrame(summary_data)
    print(summary_df.to_string(index=False))
    print(f"{'='*80}\n")
    
else:
    print("⚠ No model available, skipping layer comparison")

## 8. Comparing Performance Across Dimensions

**Key Questions:**

1. **Does dimensionality matter?** 
   - Do more components always lead to better performance?
   - Is there a sweet spot, or does performance plateau?

2. **Which layer is best?**
   - Does the best layer change with dimensionality?
   - Are results consistent across projection dimensions?

3. **Overfitting vs Underfitting:**
   - Too few components (10): May lose important information
   - Too many components (1000): May introduce noise, harder to regularize
   - Medium (50-100): Potentially optimal balance

**Expected patterns:**
- Performance should increase from 10 → 50 → 100 components
- Beyond 100-1000, performance may plateau or decrease (overfitting)
- Ridge regularization should help prevent overfitting with high dimensions

In [None]:
# Visualize R¬≤ brain maps for best performing dimension

if HAS_MODEL:
    from encoding_viz_utils import plot_r2_brainmap
    
    # Find best overall dimension
    best_n_comp = None
    best_overall_r2 = -np.inf
    
    for n_comp in projection_dims:
        comparison_df = all_comparisons[n_comp]
        top_r2 = comparison_df.iloc[0]['mean_r2']
        if top_r2 > best_overall_r2:
            best_overall_r2 = top_r2
            best_n_comp = n_comp
    
    print(f"Best overall performance: n_components = {best_n_comp} (R² = {best_overall_r2:.4f})")
    print(f"\nShowing brain maps for n_components = {best_n_comp}:\n")
    
    # Get best layer for this dimension
    comparison_df = all_comparisons[best_n_comp]
    best_layer = comparison_df.iloc[0]['layer']
    encoding_results = all_encoding_results[best_n_comp]
    best_r2_map = encoding_results[best_layer]['r2_map']
    
    print(f"Best layer: {best_layer.upper()}\n")
    
    fig = plot_r2_brainmap(
        best_r2_map, 
        f"{best_layer} (n={best_n_comp})",
        threshold=0.01,
        vmax=0.2
    )
    plt.show()
    
    # Also show comparison for a specific layer across dimensions
    print(f"\n{'='*70}")
    print(f"Comparison: conv2 layer across all dimensions")
    print(f"{'='*70}\n")
    
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    axes = axes.flatten()
    
    for idx, n_comp in enumerate(projection_dims):
        ax = axes[idx]
        encoding_results = all_encoding_results[n_comp]
        r2_map = encoding_results['conv2']['r2_map']
        
        # Extract data from NIfTI image
        if hasattr(r2_map, 'get_fdata'):
            r2_data = r2_map.get_fdata().flatten()
        else:
            r2_data = r2_map.flatten() if hasattr(r2_map, 'flatten') else r2_map
        
        # Create simple histogram of R² values
        r2_positive = r2_data[r2_data > 0]
        ax.hist(r2_positive, bins=50, color='steelblue', alpha=0.7, edgecolor='black')
        ax.set_xlabel('R² value')
        ax.set_ylabel('Number of voxels')
        ax.set_title(f'conv2 R² distribution (n={n_comp})', fontweight='bold')
        ax.axvline(np.mean(r2_positive), color='red', linestyle='--', 
                   label=f'Mean = {np.mean(r2_positive):.4f}')
        ax.legend()
        ax.grid(alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    print("\n📍 Interpretation:")
    print("  - Higher n_components may capture more information")
    print("  - But also may introduce more noise")
    print("  - Ridge regularization helps balance this trade-off")
    print("  - Optimal dimension depends on data complexity and sample size")
    
else:
    print("⚠ No model available, skipping brain map visualization")

## Summary: Brain Encoding with Random Projections

**What we accomplished:**

1. ✅ **Loaded RL model:** Trained PPO agent
2. ✅ **Extracted activations from replays:** Used the subject's actual gameplay .bk2 files
3. ✅ **Proper temporal alignment:**
   - Matched replay frames to fMRI TRs using annotations
   - Downsampled from 60Hz to TR (1.49s)
   - Applied HRF convolution
   - Masked non-gameplay periods with NaN
4. ✅ **Applied Random Projection:** Tested multiple dimensions (10, 50, 100, 1000 components)
5. ✅ **Fit encoding models:** Ridge regression trained on gameplay TRs only (NaN periods excluded)
6. ✅ **Compared dimensions:** Evaluated how dimensionality affects prediction
7. ✅ **Visualized brain maps:** Mapped where the best layer predicts BOLD and compared conv2 R² distributions across dimensions

---

### Random Projection vs PCA

**Why we switched from PCA to Random Projection:**

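1. **Computational efficiency:**
   - PCA: requires an SVD/eigendecomposition, roughly O(n·d·min(n, d)) for n TRs and d features
   - Random Projection: a single matrix multiplication, O(n·d·k) for k components

2. **Theoretical foundation:**
   - Johnson-Lindenstrauss lemma: random projections approximately preserve pairwise distances (within a factor of 1 ± ε) when k is large enough
   - No need to fit on training data (the projection matrix depends only on d, k, and the random seed)
   - Works well for high-dimensional spaces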

3. **Flexibility:**
   - Easy to test multiple dimensions
   - No need to compute variance explained
   - Same random seed ensures reproducibility

4. **Performance comparison:**
   - Random projection can perform similarly to PCA for encoding tasks
   - It may even do better when the data are noisy or the sample size is small. Unlike PCA, its components are not estimated from the (possibly noisy) training data
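The contrast above can be checked on synthetic data with scikit-learn. The sketch below uses random Gaussian data as a stand-in for flattened layer activations; this is an assumption, not the notebook's features. It does three things:

- fits both reducers;
- asks `johnson_lindenstrauss_min_dim` for the dimension the JL bound requires at ε = 0.3;
- measures how much pairwise distances actually change under a 1000-dimensional projection.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.decomposition import PCA
from sklearn.random_projection import GaussianRandomProjection, johnson_lindenstrauss_min_dim

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5000))  # synthetic: 200 TRs x 5000 flattened activations

# Random projection: the matrix depends only on n_features and the seed
rp = GaussianRandomProjection(n_components=1000, random_state=42)
X_rp = rp.fit_transform(X)

# PCA: components are estimated from the data via an SVD
pca = PCA(n_components=100, random_state=42)
X_pca = pca.fit_transform(X)

# Dimension at which the JL bound guarantees (1 +/- eps) distortion of squared distances
k_min = johnson_lindenstrauss_min_dim(n_samples=200, eps=0.3)

# Empirical distortion of (non-squared) pairwise distances under the projection
ratio = pdist(X_rp) / pdist(X)
print(f"JL bound: k >= {k_min}; distance ratios in [{ratio.min():.3f}, {ratio.max():.3f}]")
```

`GaussianRandomProjection.fit` only needs the number of input features to draw its matrix when `n_components` is given explicitly. Fitting it inside each cross-validation fold therefore cannot leak information from held-out runs.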

**Trade-offs:**
- PCA: Optimal variance preservation, interpretable components
- Random Projection: Faster, simpler, non-interpretable but effective

---

### Key Findings: Effect of Dimensionality

**Expected patterns:**

1. **Too few components (10):**
   - Information loss from compression
   - May miss important features
   - Lower R² values expected

2. **Medium components (50-100):**
   - Good balance between compression and information
   - Likely optimal for this dataset
   - Ridge regularization helps prevent overfitting

3. **High components (1000):**
   - More capacity to capture variance
   - But also more noise
   - Ridge regularization becomes critical
   - May not improve over medium if signal-to-noise is low

**Interpretation checklist:**
- Did performance increase monotonically with dimensions?
- Is there a plateau or sweet spot?
- Does the best layer change across dimensions?
- Are results consistent or noisy?
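The checklist above is easiest to answer with a small sweep over projection sizes. This is a minimal sketch under several assumptions:

- one training run and one held-out run;
- a fixed ridge penalty (`alpha=100`, an arbitrary choice here; the notebook may select it differently);
- synthetic data.

NaN-aware training is approximated by dropping TRs whose features contain NaN. `sweep_dimensions` is a hypothetical helper, not a function from the repository.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.random_projection import GaussianRandomProjection


def sweep_dimensions(X_train, Y_train, X_test, Y_test, dims, alpha=100.0, seed=0):
    """Mean held-out R² across voxels for each projection size (hypothetical helper)."""
    # NaN-aware: keep only TRs whose features are all finite (non-gameplay TRs are NaN)
    train_ok = np.isfinite(X_train).all(axis=1)
    test_ok = np.isfinite(X_test).all(axis=1)
    scores = {}
    for k in dims:
        rp = GaussianRandomProjection(n_components=k, random_state=seed)
        Z_train = rp.fit_transform(X_train[train_ok])
        Z_test = rp.transform(X_test[test_ok])
        model = Ridge(alpha=alpha).fit(Z_train, Y_train[train_ok])
        r2 = r2_score(Y_test[test_ok], model.predict(Z_test), multioutput="raw_values")
        scores[k] = float(r2.mean())
    return scores


# Synthetic stand-in: 2 runs x 300 TRs, 2000 features, 20 voxels
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 2000))
Y = X[:, :30] @ rng.normal(size=(30, 20)) + rng.normal(size=(600, 20))
X[:25] = np.nan  # e.g. TRs before gameplay starts, masked as in the alignment step
scores = sweep_dimensions(X[:300], Y[:300], X[300:], Y[300:], dims=[10, 50, 100, 1000])
for k, r2 in scores.items():
    print(f"{k:>5} components: mean R² = {r2:.3f}")
```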

---

### Methodological Insights

**What determines optimal dimensionality?**

1. **Sample size:** 
   - More TRs → Can support higher dimensions
   - Our dataset: ~100-200 valid TRs per run
   - Limited sample size may favor lower dimensions

2. **Signal-to-noise ratio:**
   - Clean signal → Higher dimensions helpful
   - Noisy data → Lower dimensions better (acts as regularization)

3. **Feature redundancy:**
   - Highly correlated features → Few components capture most of the information (PCA removes this redundancy explicitly)
   - Independent features → Need more dimensions

4. **Regularization:**
   - Ridge regression compensates for high dimensionality
   - Stronger regularization (higher α) → Can handle more dimensions
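Why stronger regularization lets ridge cope with more dimensions can be quantified with its effective degrees of freedom, df(α) = Σᵢ sᵢ²/(sᵢ² + α). Here the sᵢ are the singular values of the centered feature matrix. With fewer TRs than features, df is bounded by the number of TRs, and it shrinks as α grows. A minimal sketch on synthetic data, where `ridge_effective_df` is a hypothetical helper:

```python
import numpy as np


def ridge_effective_df(X, alphas):
    """Effective degrees of freedom of ridge: sum_i s_i^2 / (s_i^2 + alpha)."""
    X = X - X.mean(axis=0)  # center columns, as an intercept-fitting ridge does internally
    s = np.linalg.svd(X, compute_uv=False)
    return np.array([np.sum(s**2 / (s**2 + a)) for a in alphas])


rng = np.random.default_rng(0)
X = rng.normal(size=(150, 1000))  # ~150 valid TRs, 1000 projected features (synthetic)
alphas = [1e0, 1e1, 1e2, 1e3, 1e4, 1e5]
df = ridge_effective_df(X, alphas)
for a, d in zip(alphas, df):
    print(f"alpha = {a:>8.0f}: effective df = {d:6.1f}")
```

Even 1000 projected features cost only a handful of effective parameters once α is large. This is why ridge can tolerate the 1000-component setting despite the small number of TRs.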

**Recommendations:**
- For small datasets (<1000 samples): Use 50-100 components
- For medium datasets (1000-10000): Try 100-500 components  
- For large datasets (>10000): Can go higher (500-1000+)
- Always validate with cross-validation!
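Cross-validated selection of the dimension can be sketched with scikit-learn's `GridSearchCV` and `LeaveOneGroupOut`. Each fold holds out one whole run, so autocorrelated TRs from the same run never sit on both sides of a split. This is a minimal sketch on synthetic data. It assumes 4 runs of 150 TRs with NaN-masked TRs already dropped (`Ridge` does not accept NaN). The grid values are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut
from sklearn.pipeline import Pipeline
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
n_runs, trs_per_run, n_feat, n_vox = 4, 150, 3000, 10
X = rng.normal(size=(n_runs * trs_per_run, n_feat))
Y = X[:, :20] @ rng.normal(size=(20, n_vox)) + rng.normal(size=(n_runs * trs_per_run, n_vox))
runs = np.repeat(np.arange(n_runs), trs_per_run)  # run label of each TR

pipe = Pipeline([
    ("rp", GaussianRandomProjection(random_state=0)),
    ("ridge", Ridge()),
])
param_grid = {
    "rp__n_components": [10, 50, 100, 1000],
    "ridge__alpha": [1.0, 100.0, 1e4],
}
# Leave-one-run-out: each fold trains on 3 runs and scores R² on the 4th
search = GridSearchCV(pipe, param_grid, cv=LeaveOneGroupOut(), scoring="r2")
search.fit(X, Y, groups=runs)
print(search.best_params_, round(search.best_score_, 3))
```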

---

### Comparison to Original PCA Approach

**Original notebook (PCA with 50 components):**
- Fixed at 50 components based on variance threshold
- Interpretable components (ordered by variance)
- Moderate computational cost

**Current approach (Random Projection with 10/50/100/1000):**
- Tested multiple dimensions systematically
- Non-interpretable but effective
- Faster computation
- Reveals dimensionality-performance relationship

**Which is better?**
- PCA: When you need interpretability and optimal compression
- Random Projection: When speed matters and you want to test many dimensions
- Both: Often give similar encoding performance, so compare them on your own data

---

### Next Steps & Extensions

**To improve these results:**

1. **More sophisticated dimensionality reduction:**
   - Sparse random projections
   - Locality-sensitive hashing
   - Autoencoders
   - t-SNE or UMAP (for visualization)

2. **Cross-validation:**
   - Leave-one-run-out for all 4 runs
   - Average results across folds
   - More robust estimate of optimal dimension

3. **Statistical testing:**
   - Permutation tests for significance
   - Confidence intervals on R² estimates
   - Compare to null models (shuffled features)

4. **Alternative encoding models:**
   - Kernel ridge regression
   - Elastic net (L1 + L2)
   - Neural network encoders
   - Bayesian models with automatic relevance determination

5. **Feature analysis:**
   - Which projected features are most predictive?
   - Can we interpret random projections post-hoc?
   - Stability analysis (bootstrap over random seeds)
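For the permutation tests in item 3, shuffling individual TRs would destroy the autocorrelation of the BOLD and predicted time series. That tends to make p-values too small. One common alternative circularly shifts the predicted time course by a random offset and recomputes R² on the held-out run. The sketch below assumes this circular-shift null. `circular_shift_pvalue` is a hypothetical helper and the data are synthetic.

```python
import numpy as np
from sklearn.metrics import r2_score


def circular_shift_pvalue(y_true, y_pred, n_perm=1000, min_shift=10, seed=0):
    """One-sided p-value for a voxel's held-out R² against circularly shifted predictions."""
    rng = np.random.default_rng(seed)
    observed = r2_score(y_true, y_pred)
    n = len(y_true)
    # Shifts of at least `min_shift` TRs in either direction break the true alignment
    shifts = rng.integers(min_shift, n - min_shift, size=n_perm)
    null = np.array([r2_score(y_true, np.roll(y_pred, s)) for s in shifts])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value, null


# Synthetic voxel: a smooth "predicted" signal plus noise in the measured BOLD
rng = np.random.default_rng(1)
signal = np.convolve(rng.normal(size=204), np.ones(5) / 5, mode="valid")  # 200 samples
y_true = signal + 0.3 * rng.normal(size=signal.size)
obs, p, null = circular_shift_pvalue(y_true, signal)
print(f"R² = {obs:.2f}, p = {p:.4f}")
```

With n TRs there are only n − 2·`min_shift` distinct shifts, so the null for a short run is coarse. Correcting across voxels (e.g. with FDR) is a separate step.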

---

### Practical Takeaways

**When should you use random projection?**

✅ **Good for:**
- Quick exploratory analysis
- Testing multiple dimensions
- Very high-dimensional data (>10,000 features)
- When computational resources are limited
- When interpretability is not critical

❌ **Not ideal for:**
- When you need interpretable components
- Small feature sets (<100 features)
- When optimal variance preservation is critical
- Publication-quality analyses (PCA more standard)

**Best practices:**
1. Always test multiple random seeds and average results
2. Use cross-validation to select optimal dimension
3. Compare to PCA as a baseline
4. Report both methods if results differ substantially
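Best practice 1 can be sketched by refitting the same encoding model with several projection seeds and reporting the spread. This is a minimal sketch on synthetic data with one train/test split and a fixed `alpha` (both assumptions). `r2_across_seeds` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.random_projection import GaussianRandomProjection


def r2_across_seeds(X_train, Y_train, X_test, Y_test, n_components, seeds, alpha=100.0):
    """Mean held-out R² (across voxels) for each random-projection seed (hypothetical helper)."""
    scores = []
    for seed in seeds:
        rp = GaussianRandomProjection(n_components=n_components, random_state=seed)
        model = Ridge(alpha=alpha).fit(rp.fit_transform(X_train), Y_train)
        pred = model.predict(rp.transform(X_test))
        scores.append(r2_score(Y_test, pred, multioutput="raw_values").mean())
    return np.array(scores)


rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2000))  # synthetic activations
Y = X[:, :10] @ rng.normal(size=(10, 15)) + rng.normal(size=(400, 15))
scores = r2_across_seeds(X[:300], Y[:300], X[300:], Y[300:], n_components=50, seeds=range(10))
print(f"R² = {scores.mean():.3f} ± {scores.std(ddof=1):.3f} across {len(scores)} seeds")
```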

---

### Research Questions Enabled

**This systematic comparison allows us to ask:**

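1. **Does encoding performance depend on feature dimensionality?**
   - Performance saturates with few components → The brain-relevant information in the features is low-dimensional
   - Performance keeps rising with more components → Suggests a high-dimensional distributed code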

2. **Is there redundancy in RL representations?**
   - Strong compression (10 dims) works → High redundancy
   - Need many dimensions → Features are diverse/independent

3. **What's the information bottleneck?**
   - Performance plateaus early → Limited by BOLD signal quality
   - Keeps improving → Limited by feature richness
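The questions above hinge on where performance plateaus. One simple, admittedly arbitrary, operationalization is the smallest dimension that reaches a fixed fraction (here 95%) of the best mean R². The helper `smallest_sufficient_dim` and the scores in the usage line are made up for illustration; they are not results from this notebook.

```python
import numpy as np


def smallest_sufficient_dim(dims, scores, frac=0.95):
    """Smallest projection size whose score reaches `frac` of the best score (hypothetical helper)."""
    dims = np.asarray(dims)
    scores = np.asarray(scores, dtype=float)
    if scores.max() <= 0:
        raise ValueError("best score must be positive for a relative threshold")
    order = np.argsort(dims)
    dims, scores = dims[order], scores[order]
    # argmax returns the first index where the condition holds
    return int(dims[np.argmax(scores >= frac * scores.max())])


# Illustrative (made-up) mean R² values for 10, 50, 100 and 1000 components
k_star = smallest_sufficient_dim([10, 50, 100, 1000], [0.010, 0.030, 0.031, 0.032])
print(k_star)
```

If `k_star` is the smallest tested dimension, performance plateaued early. If it equals the largest, extend the sweep before concluding that performance keeps improving.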

**This tutorial demonstrates a complete, systematic approach to evaluating dimensionality in encoding models!**