# Mario fMRI Tutorial
## Complete Analysis Pipeline: From GLM to Brain Encoding

<br>

### Overview of the CNeuromod Mario Dataset

**What we'll cover:**
- Dataset exploration and behavioral annotations
- GLM analysis: Actions and game events
- RL agent: Learning representations from gameplay
- Brain encoding: Predicting fMRI from learned features

<br>

---

*CNeuromod 2025*

In [None]:
# Setup - hidden from presentation
import sys
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import nibabel as nib
from nilearn import plotting
import warnings
warnings.filterwarnings('ignore')

# Add src to path
src_dir = Path('..') / 'src'
sys.path.insert(0, str(src_dir))

from utils import (
    get_sourcedata_path,
    get_derivatives_path,
    load_events,
    get_session_runs,
    load_lowlevel_confounds
)

# Plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 6)
plt.rcParams['font.size'] = 11

# Define constants
SUBJECT = 'sub-01'
SESSION = 'ses-010'
TR = 1.49

print("Setup complete!")

# Introduction

## The CNeuromod Mario Dataset

### A Naturalistic fMRI Paradigm

**Participants:** 5 subjects playing Super Mario Bros (NES) in the scanner

**Task:** Natural gameplay - no constraints on strategy or behavior

**Levels:**
- **22 levels:** water-world and Bowser levels are excluded for gameplay consistency.

**Acquisition:**
- TR = 1.49s (multiband fMRI)
- ~5 runs per session
- ~5 minutes per run (~200 volumes)
- ~25 minutes total gameplay per session

**Key insight:** Real-world complexity with rich behavioral structure

<div style="background-color: #e8f4f8; padding: 10px; border-radius: 5px; margin-top: 20px;">
<b>Why naturalistic paradigms?</b><br>
Traditional fMRI uses simple, repetitive tasks. Naturalistic paradigms like gameplay capture complex, dynamic behavior closer to real-world cognition.
</div>

## Analysis Pipeline Overview

### Two Complementary Approaches

```
┌─────────────────────────────────────────────────────────────────┐
│                         fMRI Data                                │
│                    (BOLD time series)                            │
└────────────┬────────────────────────────┬───────────────────────┘
             │                            │
    ┌────────▼─────────┐         ┌────────▼──────────┐
    │   GLM Analysis   │         │   RL Agent        │
    └────────┬─────────┘         └────────┬──────────┘
             │                            │
    ┌────────▼─────────┐         ┌────────▼──────────┐
    │ Hypothesis-driven│         │ Learned features  │
    │ contrasts        │         │ (CNN activations) │
    │ - LEFT vs RIGHT  │         └────────┬──────────┘
    └────────┬─────────┘                  │
             │                   ┌────────▼──────────┐
             │                   │ Ridge Encoding    │
             │                   │ (Predict BOLD)    │
             │                   └────────┬──────────┘
             │                            │
    ┌────────▼────────────────────────────▼──────────┐
    │         Brain Activity Maps                    │
    │    Which regions? What representations?        │
    └────────────────────────────────────────────────┘
```

**GLM:** Hand-crafted regressors → Interpretable contrasts

**Encoding:** Learned representations → Predictive power
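
To make the encoding branch concrete, here is a minimal sketch of ridge regression in closed form, using NumPy only. The data are synthetic stand-ins (random features and simulated voxel responses), and `fit_ridge` and `columnwise_corr` are illustrative helpers, not functions from this tutorial's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (not real data): 200 time points, 8 features, 5 voxels
n_samples, n_features, n_voxels = 200, 8, 5
X = rng.standard_normal((n_samples, n_features))
true_W = rng.standard_normal((n_features, n_voxels))
Y = X @ true_W + 0.5 * rng.standard_normal((n_samples, n_voxels))

# Split in time order: fit on the first 150 samples, evaluate on the last 50
X_train, X_test = X[:150], X[150:]
Y_train, Y_test = Y[:150], Y[150:]


def fit_ridge(X, Y, alpha):
    """Closed-form ridge weights: (X'X + alpha * I)^-1 X'Y."""
    n_feat = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_feat), X.T @ Y)


def columnwise_corr(a, b):
    """Pearson correlation between matching columns of a and b."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))


W = fit_ridge(X_train, Y_train, alpha=1.0)
Y_pred = X_test @ W

# One score per simulated voxel: how well held-out responses are predicted
scores = columnwise_corr(Y_test, Y_pred)
print(scores.round(2))
```

In a real encoding analysis, `X` would hold the learned features resampled to the TR and `Y` the BOLD time series, and `alpha` would typically be chosen by cross-validation rather than fixed.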

## Today's Focus: sub-01, ses-010

### Single Session Deep Dive

**Why single session?**
- Laptop-friendly analysis (~30-45 min runtime)
- Complete pipeline demonstration
- Easy to extend to multiple subjects/sessions

**Session details:**
- 5 runs × ~5 minutes = ~25 minutes gameplay
- ~1000 fMRI volumes
- ~200+ behavioral events
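
The volume counts follow directly from the TR. A quick back-of-the-envelope check, assuming runs of exactly 5 minutes and exactly 5 runs (the real numbers vary slightly from run to run):

```python
TR = 1.49          # seconds per volume, as stated above
RUN_MINUTES = 5    # approximate run length (assumed exact here)
N_RUNS = 5         # approximate runs per session (assumed exact here)

volumes_per_run = round(RUN_MINUTES * 60 / TR)
volumes_per_session = volumes_per_run * N_RUNS

print(volumes_per_run, volumes_per_session)
```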

**BIDS structure:**
```
sourcedata/
├── mario/                    # Raw fMRI
├── mario.fmriprep/          # Preprocessed BOLD
├── mario.annotations/       # Behavioral events
├── mario.replays/           # Game recordings (.bk2)
└── cneuromod.processed/     # Anatomical templates
    └── smriprep/
        └── sub-01/
```
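
As an illustration of how such a layout is typically navigated, the sketch below builds a path to one run's events file with `pathlib`. The subfolder names and the file name pattern are assumptions based on common BIDS conventions, and `events_path` is a hypothetical helper; in this tutorial the actual lookup is handled by `load_events`.

```python
from pathlib import Path


def events_path(root, subject, session, run):
    """Build a hypothetical path to one run's events file.

    The folder layout and file name follow common BIDS conventions and are
    assumptions for illustration; check the dataset for the real names.
    """
    fname = f"{subject}_{session}_task-mario_{run}_events.tsv"
    return Path(root) / "mario.annotations" / subject / session / "func" / fname


p = events_path("sourcedata", "sub-01", "ses-010", "run-01")
print(p.as_posix())
```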

# Dataset Exploration

## Rich Behavioral Annotations

## Behavioral Annotations

The `mario.annotations` dataset provides three types of events:

**1. Action events (button presses):**
- A, B, LEFT, RIGHT, UP, DOWN
- Precise onset and duration
- **Button mappings:**
  - **A = JUMP** (short taps, mean duration ~0.3s)
  - **B = RUN/FIREBALL** (held continuously, mean duration ~12s)
  - LEFT/RIGHT = Movement
  - UP = Enter pipe, DOWN = Crouch

**2. Game events:**
- Kill/stomp, Kill/kick (defeating enemies)
- Hit/life_lost (player damage)
- Powerup_collected, Coin_collected (rewards)
- Flag_reached (level completion)

**3. Scene information:**
- Level segmentation
- Unique scene codes for each game section
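
As a rough picture of what such an events table looks like, the sketch below builds a toy table with made-up values, using the standard BIDS columns `onset`, `duration`, and `trial_type`. It then separates button presses from game events and computes the mean hold time per button.

```python
import pandas as pd

# Toy events table with made-up values (not real data)
events = pd.DataFrame({
    "onset":      [0.5, 1.0, 1.2, 3.0, 4.1, 6.0],
    "duration":   [12.0, 0.3, 2.0, 0.2, 0.0, 0.4],
    "trial_type": ["B", "A", "RIGHT", "A", "Kill/stomp", "A"],
})

button_names = ["A", "B", "LEFT", "RIGHT", "UP", "DOWN"]
is_button = events["trial_type"].isin(button_names)

# Mean hold time per button (seconds)
mean_duration = events[is_button].groupby("trial_type")["duration"].mean()
print(mean_duration)

# In this toy example, anything that is not a button press is a game event
print(events.loc[~is_button, "trial_type"].tolist())
```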

Let's load and visualize these events!

In [None]:
# Load events for all runs in the session

sourcedata_path = get_sourcedata_path()

# Event categories, defined up front so later cells can use them
# even if loading fails
button_events = ['A', 'B', 'LEFT', 'RIGHT', 'UP', 'DOWN']
game_events = ['Kill/stomp', 'Kill/kick', 'Hit/life_lost',
               'Powerup_collected', 'Coin_collected']

try:
    runs = get_session_runs(SUBJECT, SESSION, sourcedata_path)
    print(f"Found {len(runs)} runs: {runs}\n")

    # Load the events of each run
    all_events = []
    for run in runs:
        events = load_events(SUBJECT, SESSION, run, sourcedata_path)
        all_events.append(events)
        print(f"{run}: {len(events)} events")

    session_events = pd.concat(all_events, ignore_index=True)
    print(f"\nTotal events: {len(session_events)}")

    # Count events per category
    n_buttons = len(session_events[session_events['trial_type'].isin(button_events)])
    n_game = len(session_events[session_events['trial_type'].isin(game_events)])

    print(f"\nButton presses: {n_buttons}")
    print(f"Game events: {n_game}")

    # Top events
    print("\nTop 10 most frequent events:")
    print(session_events['trial_type'].value_counts().head(10))

    EVENTS_LOADED = True

except Exception as e:
    # No fallback data is created here; dependent cells check EVENTS_LOADED
    print(f"Error loading events: {e}")
    print("Cells that depend on the events will be skipped.")
    EVENTS_LOADED = False

Next, plot how often each event type occurs across the session.

In [None]:
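# Visualize event frequencies

from glm_utils import plot_event_frequencies

# Only plot if the events were loaded successfully in the previous cell
if EVENTS_LOADED:
    fig = plot_event_frequencies(
        session_events, n_buttons, n_game,
        SUBJECT, SESSION
    )
    plt.show()
else:
    print("Event frequencies not available.")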

## Timeline Visualization

**Goal:** Understand the temporal structure of gameplay

We'll visualize:
- Button press patterns over time
- Game event occurrences
- Event density (actions per second)

**What to look for:**
- Clusters of activity (intense gameplay moments)
- Gaps (deaths, level transitions)
- Relationships between buttons and game events
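
Event density can be estimated by counting onsets in fixed-width time bins. The sketch below does this with `np.histogram` on made-up onsets; the 1-second bin width and the 10-second run length are arbitrary choices for illustration.

```python
import numpy as np

# Made-up button-press onsets in seconds (not real data)
onsets = np.array([0.2, 0.4, 0.9, 1.1, 1.5, 1.8, 5.2, 5.3, 9.7])
run_length = 10.0   # seconds
bin_width = 1.0     # seconds per bin

# Bin edges at 0, 1, ..., 10 seconds
edges = np.arange(0.0, run_length + bin_width, bin_width)
counts, _ = np.histogram(onsets, bins=edges)
density = counts / bin_width   # events per second

print(density)

# Bins with zero events mark gaps in gameplay; print their start times
gap_bins = np.flatnonzero(counts == 0)
print(edges[gap_bins])
```

Smaller bins give a finer but noisier picture; bins of one TR (1.49 s) would align the density with the fMRI sampling.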

In [None]:
# Event timeline for first run

from glm_utils import plot_event_timeline

if EVENTS_LOADED and len(all_events) > 0:
    fig = plot_event_timeline(all_events[0], runs[0], button_events)
    plt.show()
else:
    print("Timeline not available.")

## Game Replay Data

### Frame-by-frame recordings (.bk2 files)

**What's in a replay?**
- 60 Hz game frames
- Button states for each frame
- RAM variables: player position, score, lives, time, power-up state

**Uses:**
1. **RL training:** Extract frames as visual input for CNN
2. **Validation:** Verify behavioral annotations
3. **Visualization:** Show actual gameplay moments

**For this tutorial:** We'll use simplified proxy features instead of full frame extraction (faster for demonstration)

<div style="background-color: #fff3cd; padding: 10px; border-radius: 5px; margin-top: 20px;">
<b>Note:</b> Full replay processing requires the BizHawk emulator and yields ~18,000 frames per run (60 Hz × ~5 minutes). For efficiency, we use pre-computed features.
</div>
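
To relate replay frames to fMRI volumes, the frame index can be converted to seconds and then divided by the TR. The sketch below assumes that frame 0 coincides with the start of volume 0, which real recordings may not satisfy exactly; `frame_to_volume` is a hypothetical helper, not part of the tutorial code.

```python
FPS = 60     # replay frame rate stated above
TR = 1.49    # seconds per fMRI volume


def frame_to_volume(frame_idx, fps=FPS, tr=TR):
    """Return the index of the fMRI volume being acquired at a given frame.

    Assumes frame 0 coincides with the start of volume 0.
    """
    return int((frame_idx / fps) // tr)


frames_per_run = FPS * 5 * 60    # about 5 minutes of gameplay
frames_per_volume = FPS * TR     # game frames spanned by one TR

print(frames_per_run, frames_per_volume)
print(frame_to_volume(0), frame_to_volume(90), frame_to_volume(17999))
```

Since each volume spans roughly 89 game frames, frame-level features need to be aggregated (for example, averaged) within each TR before being compared with BOLD.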