# Longitudinal RSA Analysis: VOTC Resection Study

## Hypothesis
Bilateral visual categories (Object, House) show greater representational reorganization than unilateral categories (Face, Word) in OTC patients because bilateral representations are **collaborative** (not redundant) across hemispheres, so losing one hemisphere forces the remaining one to compensate.

## Contrast Scheme Justification

### Liu Distinctiveness (Selectivity Change)
- Measures how **selective** a region is for its preferred category
- Uses contrasts that **define** category selectivity (following Liu et al.):
  - FFA: Face > Object (cope 1)
  - VWFA: Word > Scramble (cope 12); Word > Face cannot be used because the face signal dominates VWFA's neighborhood
  - PPA: House > Object (cope 2)
  - LOC: Object > Scramble (cope 3)
- Question: "How correlated is the preferred-category pattern with the non-preferred patterns?" This concerns the ROI's functional **identity**.
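
The score described above can be sketched as follows. This is a minimal illustration, assuming distinctiveness is the mean Fisher-z-transformed Pearson correlation between the preferred-category pattern and each non-preferred pattern, as in `compute_selectivity_change` in Cell 5. The function name `distinctiveness` and the synthetic patterns are hypothetical and not part of the analysis.

```python
import numpy as np

def distinctiveness(patterns, preferred):
    """Mean Fisher-z correlation between the preferred pattern and all others.

    patterns: dict mapping category name -> 1D array of voxel values.
    Lower values mean the preferred pattern is more distinct from the rest.
    """
    pref = patterns[preferred]
    z_vals = []
    for cat, pat in patterns.items():
        if cat == preferred:
            continue
        r = np.corrcoef(pref, pat)[0, 1]
        # Clip so arctanh stays finite when |r| is numerically 1
        z_vals.append(np.arctanh(np.clip(r, -0.999, 0.999)))
    return float(np.mean(z_vals))

# Synthetic data (hypothetical): 100 voxels, four categories
rng = np.random.default_rng(0)
shared = rng.normal(size=100)
patterns = {
    'face': rng.normal(size=100),                  # unrelated to the other three
    'word': shared + 0.5 * rng.normal(size=100),
    'object': shared + 0.5 * rng.normal(size=100),
    'house': shared + 0.5 * rng.normal(size=100),
}

d_face = distinctiveness(patterns, 'face')
d_word = distinctiveness(patterns, 'word')
print(f"face: {d_face:.3f}, word: {d_word:.3f}")
```

In this toy example the face pattern is generated independently of the others, so its score is lower than the word score. The change measure in Cell 5 is the absolute difference of this score between the first and last session.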

### RSA Measures (Geometry Preservation, MDS Shift)
- Measures representational **structure** — how categories relate to each other
- Requires comparing patterns across all four categories simultaneously
- **Must use same baseline** for fair RDM comparison
- All categories use Category > Scramble (Face: cope 10, Word: cope 12, Object: cope 3, House: cope 11):
  - Consistent reference point
  - Each pattern reflects category response above low-level visual baseline
  - RDM comparisons are apples-to-apples
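
A minimal sketch of the RDM comparison described above, assuming the RDM is correlation distance (1 - Pearson r) across the four category patterns and that geometry preservation is the Pearson correlation between the upper triangles of two RDMs, as in Cell 6. The helper names and the synthetic patterns are hypothetical.

```python
import numpy as np

def build_rdm(patterns):
    """Correlation-distance RDM (1 - Pearson r) from an (n_categories, n_voxels) array."""
    return 1.0 - np.corrcoef(patterns)

def geometry_preservation(rdm_a, rdm_b):
    """Pearson correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices(rdm_a.shape[0], k=1)
    return float(np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1])

# Synthetic patterns (hypothetical): rows are face, word, object, house.
# Face/word share one component and object/house share another.
rng = np.random.default_rng(1)
a, b = rng.normal(size=200), rng.normal(size=200)
base = np.stack([a, a, b, b])
t1 = base + 0.5 * rng.normal(size=base.shape)
t2 = t1 + 0.2 * rng.normal(size=base.shape)   # same structure, mild session noise

rdm_t1, rdm_t2 = build_rdm(t1), build_rdm(t2)
r = geometry_preservation(rdm_t1, rdm_t2)
print(f"geometry preservation: {r:.3f}")
```

With four categories each RDM has only six unique off-diagonal entries, so the correlation between sessions rests on six values per ROI.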

**Key distinction:** Selectivity is about a region's *identity*; RSA is about representational *structure*.

## Methodological Note: ROI Definition and Pattern Extraction

### Circularity Consideration

In this analysis, ROIs are defined using contrast maps from each session, and patterns are subsequently extracted from the same data used to define those ROIs. This approach introduces a degree of circularity, as the ROI boundaries are not independent of the data being analyzed.

We adopt this methodology for several reasons:

1. **Precedent in the literature**: This approach is standard in studies of category-selective reorganization following cortical resection. Ayzenberg et al. (2023) used suprathreshold voxels (p < 0.01, uncorrected) within anatomical masks and extracted patterns from the same data, explicitly noting that "a lax threshold [was used] because a relatively limited amount of data was collected for each participant." Similarly, Liu et al. (2025) employed peak-voxel sphere approaches for RSA without cross-validation between ROI definition and pattern extraction.

2. **Equal impact across groups**: Because ROI definition and pattern extraction are identical for all participant groups (OTC, nonOTC, and control), any inflation of effect sizes due to circularity is expected to affect the groups comparably. Our primary hypothesis concerns *group differences* in the bilateral vs. unilateral contrast, so circularity is unlikely to introduce systematic bias favoring our predictions.

3. **Alternative approaches are prohibitively noisy**: We explored leave-one-run-out (LORO) cross-validation, where ROIs were defined on N-1 runs and patterns extracted from the held-out run. This approach yielded geometry preservation values near zero (mean = 0.18) with weak correlation to standard estimates (r = 0.15), suggesting that single-run pattern estimates contain insufficient signal for reliable RSA. The dramatic reduction in effect size likely reflects measurement noise rather than a more accurate estimate of true representational stability.

4. **Anatomical constraints reduce arbitrary circularity**: ROIs are constrained to fall within predefined anatomical search masks for each category (e.g., fusiform for faces, parahippocampal for houses). This ensures that functional peaks reflect category-selective responses in expected cortical locations rather than arbitrary noise-driven activations.
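
The leave-one-run-out scheme mentioned in point 3 can be illustrated schematically. The notebook does not show the LORO implementation, so the sketch below is an assumption about its general shape: voxels are selected on N-1 runs and patterns are read from the held-out run. The function `loro_patterns`, the top-`n_select` selection rule, and the random data are all hypothetical.

```python
import numpy as np

def loro_patterns(run_data, preferred_idx, n_select=50):
    """Leave-one-run-out ROI definition and pattern extraction.

    run_data: array (n_runs, n_categories, n_voxels) of per-run response estimates.
    For each held-out run, voxels are selected using the remaining runs only,
    then patterns for all categories are read from the held-out run.
    Returns a list with one (n_categories, n_select) array per fold.
    """
    n_runs = run_data.shape[0]
    folds = []
    for held_out in range(n_runs):
        train = np.delete(run_data, held_out, axis=0)
        # ROI definition: voxels with the strongest mean preferred-category response
        selectivity = train[:, preferred_idx, :].mean(axis=0)
        roi = np.argsort(selectivity)[-n_select:]
        # Pattern extraction: data that played no role in choosing the voxels
        folds.append(run_data[held_out][:, roi])
    return folds

rng = np.random.default_rng(2)
run_data = rng.normal(size=(4, 4, 500))   # hypothetical: 4 runs, 4 categories, 500 voxels
folds = loro_patterns(run_data, preferred_idx=0)
print(len(folds), folds[0].shape)
```

Each fold's patterns come from a single run, which is why this approach trades circularity for noisier pattern estimates.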

### Interpretation

Results should be interpreted as reflecting *relative* differences between groups and category types rather than absolute estimates of representational change. The key finding, that bilateral categories show greater reorganization than unilateral categories specifically in OTC patients, should be comparatively insensitive to the circularity concern because the methodological approach is identical across all comparisons.

**References**:
- Ayzenberg, V., et al. (2023). *Developmental Cognitive Neuroscience*, 64, 101323.
- Liu, T.T., et al. (2025). *Communications Biology*, 8, 1200.

## Cell 1: Setup & Configuration

In [1]:
import numpy as np
import nibabel as nib
from pathlib import Path
import pandas as pd
from scipy.ndimage import label, center_of_mass
from scipy.stats import pearsonr, ttest_ind, ttest_rel
from scipy.linalg import orthogonal_procrustes
import warnings
warnings.filterwarnings('ignore')

# === PATHS ===
BASE_DIR = Path("/user_data/csimmon2/long_pt")
CSV_FILE = Path('/user_data/csimmon2/git_repos/long_pt/long_pt_sub_info.csv')
OUTPUT_DIR = Path('/user_data/csimmon2/git_repos/long_pt/B_analyses')
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# === SUBJECT INFO ===
df = pd.read_csv(CSV_FILE)
SESSION_START = {'sub-010': 2, 'sub-018': 2, 'sub-068': 2}
EXCLUDE_SUBJECTS = ['sub-025', 'sub-027', 'sub-045', 'sub-072']

# === CATEGORIES ===
CATEGORIES = ['face', 'word', 'object', 'house']
BILATERAL = ['object', 'house']
UNILATERAL = ['face', 'word']

# === CONTRAST SCHEMES ===

# For Liu Distinctiveness (Selectivity Change)
# Uses ROI-defining contrasts per Liu et al.
COPE_MAP_LIU = {
    'face': (1, 1),    # Face > Object
    'word': (12, 1),   # Word > Scramble
    'object': (3, 1),  # Object > Scramble
    'house': (2, 1)    # House > Object
}

# For RSA measures (Geometry, MDS, Drift)
# Consistent baseline across all categories
COPE_MAP_SCRAMBLE = {
    'face': (10, 1),   # Face > Scramble
    'word': (12, 1),   # Word > Scramble
    'object': (3, 1),  # Object > Scramble
    'house': (11, 1)   # House > Scramble
}

print("✓ Configuration loaded")
print(f"  Excluding: {EXCLUDE_SUBJECTS}")
print(f"  Liu Distinctiveness: COPE_MAP_LIU")
print(f"  RSA Measures: COPE_MAP_SCRAMBLE")

✓ Configuration loaded
  Excluding: ['sub-025', 'sub-027', 'sub-045', 'sub-072']
  Liu Distinctiveness: COPE_MAP_LIU
  RSA Measures: COPE_MAP_SCRAMBLE


## Cell 2: Load Subjects

In [2]:
def load_subjects():
    """Load all subjects from CSV, excluding problematic ones"""
    subjects = {}
    
    for _, row in df.iterrows():
        subject_id = row['sub']
        
        if subject_id in EXCLUDE_SUBJECTS:
            continue
            
        subj_dir = BASE_DIR / subject_id
        if not subj_dir.exists():
            continue
        
        sessions = sorted(
            [d.name.replace('ses-', '') for d in subj_dir.glob('ses-*') if d.is_dir()], 
            key=int
        )
        start_session = SESSION_START.get(subject_id, 1)
        sessions = [s for s in sessions if int(s) >= start_session]
        
        if len(sessions) < 2:
            continue
        
        hemi = 'l' if row.get('intact_hemi', 'left') == 'left' else 'r'
        
        subjects[subject_id] = {
            'code': f"{row['group']}{subject_id.split('-')[1]}",
            'sessions': sessions,
            'hemi': hemi,
            'group': row['group'],
            'patient': row['patient'] == 1,
            'surgery_side': row.get('SurgerySide', 'na'),
            'sex': row.get('sex', 'na'),
            'age_1': row.get('age_1', np.nan),
            'age_2': row.get('age_2', np.nan)
        }
    
    return subjects

SUBJECTS = load_subjects()

print(f"✓ Loaded {len(SUBJECTS)} subjects (after exclusions)")
for group in ['OTC', 'nonOTC', 'control']:
    n = sum(1 for s in SUBJECTS.values() if s['group'] == group)
    print(f"  {group}: {n}")

✓ Loaded 20 subjects (after exclusions)
  OTC: 6
  nonOTC: 7
  control: 7


## Cell 3: Helper Functions

In [3]:
def create_sphere(center_coord, affine, brain_shape, radius=6):
    """Create spherical mask around coordinate"""
    grid = np.array(np.meshgrid(
        np.arange(brain_shape[0]),
        np.arange(brain_shape[1]),
        np.arange(brain_shape[2]),
        indexing='ij'
    )).reshape(3, -1).T
    
    world = nib.affines.apply_affine(affine, grid)
    distances = np.linalg.norm(world - center_coord, axis=1)
    
    mask = np.zeros(brain_shape, dtype=bool)
    within = grid[distances <= radius]
    # Set all in-sphere voxels at once instead of looping over coordinates
    mask[tuple(within.T)] = True
    
    return mask


def filter_to_intact_hemisphere(df_results):
    """Filter results to intact hemisphere for patients, keep both for controls"""
    filtered = []
    for _, row in df_results.iterrows():
        sid = row['subject']
        info = SUBJECTS[sid]
        if info['group'] == 'control':
            filtered.append(row)
        elif row['hemi'] == info['hemi']:
            filtered.append(row)
    return pd.DataFrame(filtered)


print("✓ Helper functions defined")

✓ Helper functions defined
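
As a sanity check on the sphere logic, here is a NumPy-only sketch that mirrors `create_sphere` without nibabel by applying the affine directly. The function name `sphere_mask`, the 2 mm isotropic affine, and the volume size are assumptions made for the demonstration.

```python
import numpy as np

def sphere_mask(center_mm, affine, shape, radius=6.0):
    """Boolean mask of voxels whose world coordinate lies within radius (mm)."""
    # Voxel index grid, shape (n_voxels, 3), in C order to match reshape(shape)
    grid = np.indices(shape).reshape(3, -1).T
    # Voxel -> world mapping: rotation/scaling part plus translation
    world = grid @ affine[:3, :3].T + affine[:3, 3]
    inside = np.linalg.norm(world - np.asarray(center_mm), axis=1) <= radius
    return inside.reshape(shape)

# Hypothetical 2 mm isotropic volume with no offset
affine = np.diag([2.0, 2.0, 2.0, 1.0])
shape = (21, 21, 21)
center_mm = np.array([20.0, 20.0, 20.0])   # world position of voxel (10, 10, 10)

mask = sphere_mask(center_mm, affine, shape, radius=6.0)
print(mask.sum(), "voxels in the sphere")
```

With 2 mm voxels a 6 mm radius reaches three voxels from the center along each axis, so the voxel count per sphere is small relative to a typical search mask.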


## Cell 4: ROI Extraction Function

In [4]:
def extract_rois(cope_map, threshold_z=2.3, min_voxels=20):
    """Extract ROIs for all subjects using specified contrast scheme"""
    
    all_rois = {}
    
    for sid, info in SUBJECTS.items():
        first_ses = info['sessions'][0]
        roi_dir = BASE_DIR / sid / f'ses-{first_ses}' / 'ROIs'
        
        if not roi_dir.exists():
            continue
        
        all_rois[sid] = {}
        
        # For controls, extract both hemispheres
        hemis = ['l', 'r'] if info['group'] == 'control' else [info['hemi']]
        
        for hemi in hemis:
            for category in CATEGORIES:
                cope_num, mult = cope_map[category]
                
                # Load search mask
                mask_file = roi_dir / f'{hemi}_{category}_searchmask.nii.gz'
                if not mask_file.exists():
                    continue
                
                try:
                    mask_img = nib.load(mask_file)
                    search_mask = mask_img.get_fdata() > 0
                    affine = mask_img.affine
                except Exception:
                    # Skip unreadable search masks without masking KeyboardInterrupt
                    continue
                
                roi_key = f'{hemi}_{category}'
                all_rois[sid][roi_key] = {}
                
                for session in info['sessions']:
                    feat_dir = BASE_DIR / sid / f'ses-{session}' / 'derivatives' / 'fsl' / 'loc' / 'HighLevel.gfeat'
                    z_name = 'zstat1.nii.gz' if session == first_ses else f'zstat1_ses{first_ses}.nii.gz'
                    cope_file = feat_dir / f'cope{cope_num}.feat' / 'stats' / z_name
                    
                    if not cope_file.exists():
                        continue
                    
                    try:
                        z_data = nib.load(cope_file).get_fdata() * mult
                        suprathresh = (z_data > threshold_z) & search_mask
                        
                        if suprathresh.sum() < min_voxels:
                            continue
                        
                        labeled, n_clusters = label(suprathresh)
                        if n_clusters == 0:
                            continue
                        
                        # Largest cluster
                        sizes = [(labeled == i).sum() for i in range(1, n_clusters + 1)]
                        best_idx = np.argmax(sizes) + 1
                        roi_mask = (labeled == best_idx)
                        
                        if roi_mask.sum() < min_voxels:
                            continue
                        
                        peak_idx = np.unravel_index(np.argmax(z_data * roi_mask), z_data.shape)
                        
                        all_rois[sid][roi_key][session] = {
                            'n_voxels': int(roi_mask.sum()),
                            'peak_z': z_data[peak_idx],
                            'centroid': nib.affines.apply_affine(affine, center_of_mass(roi_mask)),
                            'peak_coord': nib.affines.apply_affine(affine, peak_idx),
                            'roi_mask': roi_mask,
                            'affine': affine,
                            'shape': z_data.shape
                        }
                    except Exception:
                        # Skip sessions whose z-stat map cannot be loaded or processed
                        continue
    
    return all_rois

print("✓ ROI extraction function defined")

✓ ROI extraction function defined


## Cell 5: Liu Distinctiveness (Selectivity Change)

In [5]:
def compute_selectivity_change(rois, pattern_cope_map):
    """
    Selectivity Change (Liu Distinctiveness):
    - Correlation of preferred category with non-preferred categories
    - Change from T1 to T2 (absolute difference)
    """
    results = []
    
    for sid, roi_data in rois.items():
        info = SUBJECTS[sid]
        first_ses = info['sessions'][0]
        
        for roi_key, sessions_data in roi_data.items():
            # Session labels are strings; sort numerically so '10' follows '2'
            sessions = sorted(sessions_data.keys(), key=int)
            if len(sessions) < 2:
                continue
            
            hemi = roi_key.split('_')[0]
            category = roi_key.split('_')[1]
            
            ref_data = sessions_data[sessions[0]]
            affine = ref_data['affine']
            shape = ref_data['shape']
            
            distinctiveness = {}
            for ses in [sessions[0], sessions[-1]]:
                if ses not in sessions_data:
                    continue
                
                centroid = sessions_data[ses]['centroid']
                sphere = create_sphere(centroid, affine, shape, radius=6)
                
                feat_dir = BASE_DIR / sid / f'ses-{ses}' / 'derivatives' / 'fsl' / 'loc' / 'HighLevel.gfeat'
                
                patterns = {}
                valid = True
                for cat in CATEGORIES:
                    cope_num, mult = pattern_cope_map[cat]
                    z_name = 'zstat1.nii.gz' if ses == first_ses else f'zstat1_ses{first_ses}.nii.gz'
                    cope_file = feat_dir / f'cope{cope_num}.feat' / 'stats' / z_name
                    
                    if not cope_file.exists():
                        valid = False
                        break
                    
                    data = nib.load(cope_file).get_fdata() * mult
                    pattern = data[sphere]
                    
                    if len(pattern) == 0 or not np.all(np.isfinite(pattern)):
                        valid = False
                        break
                    
                    patterns[cat] = pattern
                
                if valid and len(patterns) == 4:
                    pref_pattern = patterns[category]
                    nonpref_corrs = []
                    for other_cat in CATEGORIES:
                        if other_cat != category:
                            r, _ = pearsonr(pref_pattern, patterns[other_cat])
                            nonpref_corrs.append(np.arctanh(np.clip(r, -0.999, 0.999)))
                    
                    distinctiveness[ses] = np.mean(nonpref_corrs)
            
            if len(distinctiveness) == 2:
                change = abs(distinctiveness[sessions[-1]] - distinctiveness[sessions[0]])
                results.append({
                    'subject': sid,
                    'code': info['code'],
                    'group': info['group'],
                    'hemi': hemi,
                    'category': category,
                    'selectivity_change': change
                })
    
    return pd.DataFrame(results)

print("✓ Selectivity change function defined")

✓ Selectivity change function defined


## Cell 6: RSA Measures (Geometry Preservation, MDS Shift)

In [6]:
def compute_geometry_preservation(rois, pattern_cope_map, radius=6):
    """
    Geometry Preservation: RDM stability across sessions
    - Extract patterns from sphere at each session's centroid
    - Correlate T1 and T2 RDMs
    - Higher = more stable; lower in bilateral = MORE reorganization
    """
    results = []
    
    for sid, roi_data in rois.items():
        info = SUBJECTS[sid]
        first_ses = info['sessions'][0]
        
        for roi_key, sessions_data in roi_data.items():
            # Session labels are strings; sort numerically so '10' follows '2'
            sessions = sorted(sessions_data.keys(), key=int)
            if len(sessions) < 2:
                continue
            
            hemi = roi_key.split('_')[0]
            category = roi_key.split('_')[1]
            
            ref_data = sessions_data[sessions[0]]
            affine = ref_data['affine']
            shape = ref_data['shape']
            
            rdms = {}
            for ses in [sessions[0], sessions[-1]]:
                if ses not in sessions_data:
                    continue
                
                centroid = sessions_data[ses]['centroid']
                sphere = create_sphere(centroid, affine, shape, radius)
                
                feat_dir = BASE_DIR / sid / f'ses-{ses}' / 'derivatives' / 'fsl' / 'loc' / 'HighLevel.gfeat'
                
                patterns = []
                valid = True
                for cat in CATEGORIES:
                    cope_num, mult = pattern_cope_map[cat]
                    z_name = 'zstat1.nii.gz' if ses == first_ses else f'zstat1_ses{first_ses}.nii.gz'
                    cope_file = feat_dir / f'cope{cope_num}.feat' / 'stats' / z_name
                    
                    if not cope_file.exists():
                        valid = False
                        break
                    
                    data = nib.load(cope_file).get_fdata() * mult
                    pattern = data[sphere]
                    
                    if len(pattern) == 0 or not np.all(np.isfinite(pattern)):
                        valid = False
                        break
                    
                    patterns.append(pattern)
                
                if valid and len(patterns) == 4:
                    corr_matrix = np.corrcoef(patterns)
                    rdm = 1 - corr_matrix
                    rdms[ses] = rdm
            
            if len(rdms) == 2:
                triu = np.triu_indices(4, k=1)
                r, _ = pearsonr(rdms[sessions[0]][triu], rdms[sessions[-1]][triu])
                
                results.append({
                    'subject': sid,
                    'code': info['code'],
                    'group': info['group'],
                    'hemi': hemi,
                    'category': category,
                    'geometry_preservation': r
                })
    
    return pd.DataFrame(results)


def compute_mds_shift(rois, pattern_cope_map, radius=6):
    """
    MDS Shift: Procrustes-aligned embedding distance
    - MDS embed RDMs to 2D
    - Align with Procrustes
    - Measure movement of each category
    """
    def mds_2d(rdm):
        n = rdm.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n
        B = -0.5 * H @ (rdm ** 2) @ H
        eigvals, eigvecs = np.linalg.eigh(B)
        idx = np.argsort(eigvals)[::-1]
        coords = eigvecs[:, idx[:2]] * np.sqrt(np.maximum(eigvals[idx[:2]], 0))
        return coords
    
    results = []
    
    for sid, roi_data in rois.items():
        info = SUBJECTS[sid]
        first_ses = info['sessions'][0]
        
        for roi_key, sessions_data in roi_data.items():
            # Session labels are strings; sort numerically so '10' follows '2'
            sessions = sorted(sessions_data.keys(), key=int)
            if len(sessions) < 2:
                continue
            
            hemi = roi_key.split('_')[0]
            roi_category = roi_key.split('_')[1]
            
            ref_data = sessions_data[sessions[0]]
            affine = ref_data['affine']
            shape = ref_data['shape']
            
            rdms = {}
            for ses in [sessions[0], sessions[-1]]:
                if ses not in sessions_data:
                    continue
                
                centroid = sessions_data[ses]['centroid']
                sphere = create_sphere(centroid, affine, shape, radius)
                
                feat_dir = BASE_DIR / sid / f'ses-{ses}' / 'derivatives' / 'fsl' / 'loc' / 'HighLevel.gfeat'
                
                patterns = []
                valid = True
                for cat in CATEGORIES:
                    cope_num, mult = pattern_cope_map[cat]
                    z_name = 'zstat1.nii.gz' if ses == first_ses else f'zstat1_ses{first_ses}.nii.gz'
                    cope_file = feat_dir / f'cope{cope_num}.feat' / 'stats' / z_name
                    
                    if not cope_file.exists():
                        valid = False
                        break
                    
                    data = nib.load(cope_file).get_fdata() * mult
                    pattern = data[sphere]
                    
                    if len(pattern) == 0 or not np.all(np.isfinite(pattern)):
                        valid = False
                        break
                    
                    patterns.append(pattern)
                
                if valid and len(patterns) == 4:
                    corr_matrix = np.corrcoef(patterns)
                    rdm = 1 - corr_matrix
                    rdms[ses] = rdm
            
            if len(rdms) == 2:
                try:
                    coords_t1 = mds_2d(rdms[sessions[0]])
                    coords_t2 = mds_2d(rdms[sessions[-1]])
                    
                    R, _ = orthogonal_procrustes(coords_t1, coords_t2)
                    coords_t1_aligned = coords_t1 @ R
                    
                    for i, cat in enumerate(CATEGORIES):
                        dist = np.linalg.norm(coords_t1_aligned[i] - coords_t2[i])
                        results.append({
                            'subject': sid,
                            'code': info['code'],
                            'group': info['group'],
                            'hemi': hemi,
                            'roi_category': roi_category,
                            'measured_category': cat,
                            'mds_shift': dist
                        })
                except Exception:
                    # Skip ROIs where the embedding or alignment fails numerically
                    continue
    
    return pd.DataFrame(results)


def compute_spatial_drift(rois):
    """
    Spatial Drift: Euclidean distance (mm) between the first- and last-session ROI centroids
    """
    results = []
    
    for sid, roi_data in rois.items():
        info = SUBJECTS[sid]
        
        for roi_key, sessions_data in roi_data.items():
            # Session labels are strings; sort numerically so '10' follows '2'
            sessions = sorted(sessions_data.keys(), key=int)
            if len(sessions) < 2:
                continue
            
            hemi = roi_key.split('_')[0]
            category = roi_key.split('_')[1]
            
            c1 = sessions_data[sessions[0]]['centroid']
            c2 = sessions_data[sessions[-1]]['centroid']
            drift = np.linalg.norm(np.array(c2) - np.array(c1))
            
            results.append({
                'subject': sid,
                'code': info['code'],
                'group': info['group'],
                'hemi': hemi,
                'category': category,
                'spatial_drift_mm': drift,
                't1_peak_z': sessions_data[sessions[0]]['peak_z']
            })
    
    return pd.DataFrame(results)

print("✓ RSA metric functions defined")

✓ RSA metric functions defined
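
The MDS-shift logic can be checked on a toy configuration. The sketch below reuses the classical MDS from `compute_mds_shift` and replaces `scipy.linalg.orthogonal_procrustes` with an equivalent SVD-based solution so that it depends only on NumPy. The point layouts and the helper names `procrustes_rotation` and `pairwise_dist` are hypothetical.

```python
import numpy as np

def mds_2d(rdm):
    """Classical (Torgerson) MDS: embed a distance matrix in 2D."""
    n = rdm.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * H @ (rdm ** 2) @ H              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:2]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0))

def procrustes_rotation(A, B):
    """Orthogonal R minimizing ||A @ R - B||_F, computed via SVD."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

def pairwise_dist(x):
    """Euclidean distance matrix between the rows of x."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

def mds_shift(points_t1, points_t2):
    """Per-point displacement after MDS embedding and Procrustes alignment."""
    c1 = mds_2d(pairwise_dist(points_t1))
    c2 = mds_2d(pairwise_dist(points_t2))
    R = procrustes_rotation(c1, c2)
    return np.linalg.norm(c1 @ R - c2, axis=1)

# Hypothetical layout of four categories
pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [0.0, 3.0]])
theta = np.deg2rad(40)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])

shift_rotated = mds_shift(pts, pts @ rot)      # pure rotation of the layout
moved = pts.copy()
moved[3] += [1.0, 0.0]                         # one category changes position
shift_moved = mds_shift(pts, moved)

print(np.round(shift_rotated, 6))
print(np.round(shift_moved, 3))
```

A pure rotation of the layout yields shifts at numerical zero, while moving a single point produces nonzero shifts. Because alignment is global, displacing one category also changes the measured shift of the others.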


## Cell 7: Extract ROIs and Compute All Measures

In [7]:
print("="*70)
print("EXTRACTING ROIs")
print("="*70)

# Liu ROIs for Selectivity Change
print("\nExtracting Liu ROIs (for Selectivity Change)...")
rois_liu = extract_rois(COPE_MAP_LIU)
n_liu = sum(len([k for k, v in roi_data.items() if len(v) >= 2]) for roi_data in rois_liu.values())
print(f"  ✓ {len(rois_liu)} subjects, {n_liu} ROIs with 2+ sessions")

# Scramble ROIs for RSA measures
print("\nExtracting Scramble ROIs (for RSA measures)...")
rois_scramble = extract_rois(COPE_MAP_SCRAMBLE)
n_scr = sum(len([k for k, v in roi_data.items() if len(v) >= 2]) for roi_data in rois_scramble.values())
print(f"  ✓ {len(rois_scramble)} subjects, {n_scr} ROIs with 2+ sessions")

print("\n" + "="*70)
print("COMPUTING MEASURES")
print("="*70)

# Selectivity Change (using Liu ROIs and Liu patterns)
print("\nComputing Selectivity Change (Liu ROIs + Liu patterns)...")
selectivity_df = compute_selectivity_change(rois_liu, COPE_MAP_LIU)
print(f"  ✓ {len(selectivity_df)} measurements")

# Geometry Preservation (using Scramble ROIs and Scramble patterns)
print("\nComputing Geometry Preservation (Scramble ROIs + Scramble patterns)...")
geometry_df = compute_geometry_preservation(rois_scramble, COPE_MAP_SCRAMBLE)
print(f"  ✓ {len(geometry_df)} measurements")

# MDS Shift (using Scramble ROIs and Scramble patterns)
print("\nComputing MDS Shift (Scramble ROIs + Scramble patterns)...")
mds_df = compute_mds_shift(rois_scramble, COPE_MAP_SCRAMBLE)
print(f"  ✓ {len(mds_df)} measurements")

# Spatial Drift (using Scramble ROIs)
print("\nComputing Spatial Drift (Scramble ROIs)...")
drift_df = compute_spatial_drift(rois_scramble)
print(f"  ✓ {len(drift_df)} measurements")

EXTRACTING ROIs

Extracting Liu ROIs (for Selectivity Change)...
  ✓ 20 subjects, 107 ROIs with 2+ sessions

Extracting Scramble ROIs (for RSA measures)...
  ✓ 20 subjects, 107 ROIs with 2+ sessions

COMPUTING MEASURES

Computing Selectivity Change (Liu ROIs + Liu patterns)...
  ✓ 107 measurements

Computing Geometry Preservation (Scramble ROIs + Scramble patterns)...
  ✓ 107 measurements

Computing MDS Shift (Scramble ROIs + Scramble patterns)...
  ✓ 428 measurements

Computing Spatial Drift (Scramble ROIs)...
  ✓ 107 measurements


## Cell 8: Statistical Tests - Bilateral vs Unilateral

In [13]:
def cohens_d(g1, g2):
    """Calculate Cohen's d effect size"""
    n1, n2 = len(g1), len(g2)
    var1, var2 = g1.var(), g2.var()
    pooled_std = np.sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))
    return (g1.mean() - g2.mean()) / pooled_std if pooled_std > 0 else 0

def fdr_correction(p_values, alpha=0.05):
    """Benjamini-Hochberg FDR correction"""
    p_array = np.array(p_values)
    n = len(p_array)
    sorted_idx = np.argsort(p_array)
    sorted_p = p_array[sorted_idx]
    
    # BH critical values
    critical = (np.arange(1, n+1) / n) * alpha
    
    # Find largest p-value that is <= critical value
    below = sorted_p <= critical
    if not below.any():
        return p_array, np.zeros(n, dtype=bool)
    
    max_idx = np.max(np.where(below)[0])
    threshold = sorted_p[max_idx]
    
    significant = p_array <= threshold
    return p_array, significant

def test_bilateral_effect(df, metric_col, metric_name, higher_means_more_change=True):
    """Test if bilateral differs from unilateral within each group"""
    
    print(f"\n{'='*70}")
    print(f"{metric_name}")
    if higher_means_more_change:
        print("(Higher = more change; expect bilateral > unilateral in OTC)")
    else:
        print("(Lower = more change; expect bilateral < unilateral in OTC)")
    print("="*70)
    
    filtered_df = filter_to_intact_hemisphere(df)
    filtered_df = filtered_df.copy()
    filtered_df['cat_type'] = filtered_df['category'].apply(
        lambda x: 'Bilateral' if x in BILATERAL else 'Unilateral'
    )
    
    print(f"\n{'Group':<10} {'Bilateral':<16} {'Unilateral':<16} {'Diff':<8} {'d':<8} {'t':<8} {'p':<10}")
    print("-"*80)
    
    results = []
    p_values = []
    
    for group in ['OTC', 'nonOTC', 'control']:
        gd = filtered_df[filtered_df['group'] == group]
        bil = gd[gd['cat_type'] == 'Bilateral'][metric_col].dropna()
        uni = gd[gd['cat_type'] == 'Unilateral'][metric_col].dropna()
        
        if len(bil) > 1 and len(uni) > 1:
            t, p = ttest_ind(bil, uni)
            d = cohens_d(bil, uni)
            diff = bil.mean() - uni.mean()
            p_values.append(p)
            
            results.append({
                'group': group,
                'metric': metric_name,
                'bilateral_mean': bil.mean(),
                'bilateral_std': bil.std(),
                'unilateral_mean': uni.mean(),
                'unilateral_std': uni.std(),
                'difference': diff,
                'cohens_d': d,
                't': t,
                'p': p
            })
    
    # FDR correction across the 3 group tests
    _, sig_fdr = fdr_correction(p_values)
    
    for i, res in enumerate(results):
        res['p_fdr_sig'] = sig_fdr[i]
        # Mark FDR-significant results first, then uncorrected p < 0.05
        if sig_fdr[i]:
            sig_mark = '** (FDR)'
        elif res['p'] < 0.05:
            sig_mark = '*'
        else:
            sig_mark = ''
        
        print(f"{res['group']:<10} {res['bilateral_mean']:.3f}±{res['bilateral_std']:.3f}    "
              f"{res['unilateral_mean']:.3f}±{res['unilateral_std']:.3f}    "
              f"{res['difference']:+.3f}   {res['cohens_d']:+.2f}    {res['t']:.2f}    {res['p']:.4f} {sig_mark}")
    
    return pd.DataFrame(results)


# Run tests for all measures
print("\n" + "#"*70)
print("BILATERAL vs UNILATERAL STATISTICAL TESTS")
print("#"*70)

selectivity_stats = test_bilateral_effect(
    selectivity_df, 'selectivity_change', 'SELECTIVITY CHANGE', 
    higher_means_more_change=True
)

geometry_stats = test_bilateral_effect(
    geometry_df, 'geometry_preservation', 'GEOMETRY PRESERVATION',
    higher_means_more_change=False
)

drift_stats = test_bilateral_effect(
    drift_df, 'spatial_drift_mm', 'SPATIAL DRIFT',
    higher_means_more_change=True
)

# MDS shift needs special handling - aggregate by subject/category first
print(f"\n{'='*70}")
print("MDS SHIFT")
print("(Higher = more change; expect bilateral > unilateral in OTC)")
print("="*70)

# For MDS, we need to average across ROI locations for each measured_category
mds_agg = mds_df.groupby(['subject', 'group', 'hemi', 'measured_category'])['mds_shift'].mean().reset_index()
mds_agg = mds_agg.rename(columns={'measured_category': 'category'})
mds_stats = test_bilateral_effect(
    mds_agg, 'mds_shift', 'MDS SHIFT',
    higher_means_more_change=True
)

# Collect all p-values for global FDR
print("\n" + "="*70)
print("GLOBAL FDR CORRECTION (across all 12 tests)")
print("="*70)

all_results = pd.concat([selectivity_stats, geometry_stats, drift_stats, mds_stats])
all_p = all_results['p'].values
_, all_sig = fdr_correction(all_p)
all_results['global_fdr_sig'] = all_sig

print(f"\n{'Metric':<25} {'Group':<10} {'p':<10} {'FDR sig':<10}")
print("-"*55)
for _, row in all_results.iterrows():
    print(f"{row['metric']:<25} {row['group']:<10} {row['p']:.4f}    {'Yes' if row['global_fdr_sig'] else 'No'}")


######################################################################
BILATERAL vs UNILATERAL STATISTICAL TESTS
######################################################################

SELECTIVITY CHANGE
(Higher = more change; expect bilateral > unilateral in OTC)

Group      Bilateral        Unilateral       Diff     d        t        p         
--------------------------------------------------------------------------------
OTC        0.396±0.273    0.143±0.121    +0.253   +1.18    2.82    0.0102 ** (FDR)
nonOTC     0.148±0.104    0.125±0.112    +0.023   +0.21    0.56    0.5781 
control    0.233±0.179    0.150±0.121    +0.083   +0.54    2.02    0.0482 *

GEOMETRY PRESERVATION
(Lower = more change; expect bilateral < unilateral in OTC)

Group      Bilateral        Unilateral       Diff     d        t        p         
--------------------------------------------------------------------------------
OTC        -0.025±0.465    0.240±0.466    -0.265   -0.57    -1.36    0.1872 
nonOTC    
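
The Benjamini-Hochberg step-up rule used in `fdr_correction` can be verified by hand on the three selectivity-change p-values printed above. The sketch below is a compact restatement of that rule, not a replacement for the notebook's function; the name `bh_significant` is hypothetical.

```python
import numpy as np

def bh_significant(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up: boolean mask of rejected hypotheses."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Critical value for rank i (1-based) is alpha * i / n
    critical = alpha * np.arange(1, n + 1) / n
    passed = p[order] <= critical
    if not passed.any():
        return np.zeros(n, dtype=bool)
    # Reject every hypothesis up to the largest rank that meets its critical value
    k = np.max(np.nonzero(passed)[0])
    return p <= p[order][k]

# Selectivity-change p-values from the output above (OTC, nonOTC, control)
p_vals = [0.0102, 0.5781, 0.0482]
print(bh_significant(p_vals))   # [ True False False]
```

With three tests the critical values are roughly 0.0167, 0.0333, and 0.05. Only the OTC p-value (0.0102) falls below its critical value; the control p-value (0.0482) is below 0.05 uncorrected but exceeds 0.0333, which matches the `** (FDR)` and `*` markers in the table.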

In [1]:
def compute_sum_selectivity_and_activity(rois, cope_map, threshold_z=2.3):
    """
    Compute sum selectivity and mean activation at T1 and T2.
    
    Following Ayzenberg et al. (2023):
    - Mean activation: mean of zstat values within suprathreshold voxels
    - Sum selectivity: sum of zstat values within suprathreshold voxels,
      normalized by total voxels in anatomical search mask, × 1000
    """
    results = []
    
    for sid, roi_data in rois.items():
        info = SUBJECTS[sid]
        first_ses = info['sessions'][0]
        
        for roi_key, sessions_data in roi_data.items():
            sessions = sorted(sessions_data.keys())
            if len(sessions) < 2:
                continue
            
            # ROI keys look like '<hemi>_<category>'
            hemi, category = roi_key.split('_')[:2]
            cope_num, mult = cope_map[category]
            
            # Load anatomical search mask to get total voxel count for normalization
            roi_file = BASE_DIR / sid / f'ses-{first_ses}' / 'ROIs' / f'{hemi}_{category}_searchmask.nii.gz'
            if not roi_file.exists():
                continue
            
            try:
                mask_img = nib.load(roi_file)
                search_mask = mask_img.get_fdata() > 0
                total_mask_voxels = np.sum(search_mask)
                
                if total_mask_voxels == 0:
                    continue
            except Exception:  # unreadable or corrupt search mask: skip this ROI
                continue
            
            # Compute for T1 and T2
            for ses_label, ses in [('T1', sessions[0]), ('T2', sessions[-1])]:
                feat_dir = BASE_DIR / sid / f'ses-{ses}' / 'derivatives' / 'fsl' / 'loc' / 'HighLevel.gfeat'
                z_name = 'zstat1.nii.gz' if ses == first_ses else f'zstat1_ses{first_ses}.nii.gz'
                zstat_file = feat_dir / f'cope{cope_num}.feat' / 'stats' / z_name
                
                if not zstat_file.exists():
                    continue
                
                try:
                    zstat_data = nib.load(zstat_file).get_fdata() * mult
                    
                    # Get suprathreshold voxels within search mask
                    suprathresh_mask = (zstat_data > threshold_z) & search_mask
                    vox_resp = zstat_data[suprathresh_mask]
                    
                    if len(vox_resp) > 0:
                        # Mean activation: mean of suprathreshold voxels
                        mean_act = float(np.mean(vox_resp))
                        
                        # Sum selectivity: sum normalized by search mask size, × 1000
                        sum_selec = (float(np.sum(vox_resp)) / total_mask_voxels) * 1000
                        
                        n_voxels = len(vox_resp)
                    else:
                        mean_act = 0.0
                        sum_selec = 0.0
                        n_voxels = 0
                    
                    results.append({
                        'subject': sid,
                        'code': info['code'],
                        'group': info['group'],
                        'hemi': hemi,
                        'category': category,
                        'session': ses_label,
                        'mean_activation': mean_act,
                        'sum_selectivity': sum_selec,
                        'n_voxels': n_voxels,
                        'total_mask_voxels': total_mask_voxels
                    })
                except Exception:  # unreadable zstat image: skip this session
                    continue
    
    return pd.DataFrame(results)


print("\n" + "#"*70)
print("MEAN ACTIVITY AND SUM SELECTIVITY ANALYSIS")
print("(Following Ayzenberg et al., 2023)")
print("#"*70)

# Compute using Liu ROIs and patterns
print("\nComputing Mean Activation and Sum Selectivity at T1 and T2...")
activity_df = compute_sum_selectivity_and_activity(rois_liu, COPE_MAP_LIU)
print(f"  ✓ {len(activity_df)} measurements")

# Check we have both sessions for each ROI
session_counts = activity_df.groupby(['subject', 'hemi', 'category']).size()
complete_rois = session_counts[session_counts == 2].index
print(f"  ✓ {len(complete_rois)} ROIs with both T1 and T2")

# Pivot to wide format for change calculation
activity_t1 = activity_df[activity_df['session'] == 'T1'].copy()
activity_t2 = activity_df[activity_df['session'] == 'T2'].copy()

# Merge T1 and T2
activity_change = activity_t1.merge(
    activity_t2[['subject', 'hemi', 'category', 'mean_activation', 'sum_selectivity', 'n_voxels']],
    on=['subject', 'hemi', 'category'],
    suffixes=('_T1', '_T2')
)

# Calculate change scores: absolute change for activation and selectivity,
# signed change (T2 - T1) for voxel count
activity_change['mean_activation_change'] = abs(
    activity_change['mean_activation_T2'] - activity_change['mean_activation_T1']
)
activity_change['sum_selectivity_change'] = abs(
    activity_change['sum_selectivity_T2'] - activity_change['sum_selectivity_T1']
)
activity_change['n_voxels_change'] = (
    activity_change['n_voxels_T2'] - activity_change['n_voxels_T1']
)

print(f"  ✓ Change scores computed for {len(activity_change)} ROIs")

# Statistical tests
print(f"\n{'='*70}")
print("MEAN ACTIVATION CHANGE")
print("(Higher = more change in activation level)")
print("="*70)

mean_act_stats = test_bilateral_effect(
    activity_change, 'mean_activation_change', 'MEAN ACTIVATION CHANGE',
    higher_means_more_change=True
)

print(f"\n{'='*70}")
print("SUM SELECTIVITY CHANGE (Ayzenberg-normalized)")
print("(Higher = more change in overall selectivity)")
print("="*70)

sum_selec_stats = test_bilateral_effect(
    activity_change, 'sum_selectivity_change', 'SUM SELECTIVITY CHANGE',
    higher_means_more_change=True
)

# Summary tables
print(f"\n{'='*70}")
print("DESCRIPTIVE STATISTICS: T1 VALUES")
print("="*70)

filtered_act = filter_to_intact_hemisphere(activity_change)
filtered_act = filtered_act.copy()
filtered_act['cat_type'] = filtered_act['category'].apply(
    lambda x: 'Bilateral' if x in BILATERAL else 'Unilateral'
)

print(f"\n--- Mean Activation at T1 ---")
print(f"{'Group':<12} {'Bilateral':<20} {'Unilateral':<20}")
print("-"*55)
for group in ['OTC', 'nonOTC', 'control']:
    gd = filtered_act[filtered_act['group'] == group]
    bil = gd[gd['cat_type'] == 'Bilateral']['mean_activation_T1']
    uni = gd[gd['cat_type'] == 'Unilateral']['mean_activation_T1']
    if len(bil) > 0 and len(uni) > 0:
        print(f"{group:<12} {bil.mean():.2f}±{bil.std():.2f} (n={len(bil)})    "
              f"{uni.mean():.2f}±{uni.std():.2f} (n={len(uni)})")

print(f"\n--- Sum Selectivity at T1 ---")
print(f"{'Group':<12} {'Bilateral':<20} {'Unilateral':<20}")
print("-"*55)
for group in ['OTC', 'nonOTC', 'control']:
    gd = filtered_act[filtered_act['group'] == group]
    bil = gd[gd['cat_type'] == 'Bilateral']['sum_selectivity_T1']
    uni = gd[gd['cat_type'] == 'Unilateral']['sum_selectivity_T1']
    if len(bil) > 0 and len(uni) > 0:
        print(f"{group:<12} {bil.mean():.2f}±{bil.std():.2f} (n={len(bil)})    "
              f"{uni.mean():.2f}±{uni.std():.2f} (n={len(uni)})")

print(f"\n--- N Voxels at T1 ---")
print(f"{'Group':<12} {'Bilateral':<20} {'Unilateral':<20}")
print("-"*55)
for group in ['OTC', 'nonOTC', 'control']:
    gd = filtered_act[filtered_act['group'] == group]
    bil = gd[gd['cat_type'] == 'Bilateral']['n_voxels_T1']
    uni = gd[gd['cat_type'] == 'Unilateral']['n_voxels_T1']
    if len(bil) > 0 and len(uni) > 0:
        print(f"{group:<12} {bil.mean():.1f}±{bil.std():.1f} (n={len(bil)})    "
              f"{uni.mean():.1f}±{uni.std():.1f} (n={len(uni)})")

print(f"\n{'='*70}")
print("CHANGE SCORES SUMMARY")
print("="*70)

summary_change = filtered_act.groupby(['group', 'cat_type']).agg({
    'mean_activation_change': ['mean', 'std'],
    'sum_selectivity_change': ['mean', 'std'],
    'n_voxels_change': ['mean', 'std']
}).round(3)

print(summary_change)


######################################################################
MEAN ACTIVITY AND SUM SELECTIVITY ANALYSIS
(Following Ayzenberg et al., 2023)
######################################################################

Computing Mean Activation and Sum Selectivity at T1 and T2...


NameError: name 'rois_liu' is not defined
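
Because the cell above did not run in this session, a toy example may help show what the two summary measures compute. The sketch below applies the same arithmetic as `compute_sum_selectivity_and_activity` to a small synthetic array. The helper `sum_selectivity` and all input values are hypothetical and stand in for the NIfTI data.

```python
import numpy as np

def sum_selectivity(zstat, search_mask, threshold_z=2.3):
    """Return (mean_activation, sum_selectivity, n_voxels) for one ROI."""
    supra = (zstat > threshold_z) & search_mask    # suprathreshold voxels inside the mask
    vox = zstat[supra]
    if vox.size == 0:
        return 0.0, 0.0, 0
    mean_act = float(vox.mean())
    # Sum of suprathreshold z values, normalized by search-mask size, times 1000
    sum_sel = float(vox.sum()) / int(search_mask.sum()) * 1000
    return mean_act, sum_sel, int(vox.size)

# Toy 1-D "volume": 10 voxels, the first 8 inside the search mask
zstat = np.array([3.0, 4.0, 1.0, 2.0, 5.0, 0.5, -1.0, 2.5, 6.0, 7.0])
mask = np.zeros(10, dtype=bool)
mask[:8] = True

mean_act, sum_sel, n_vox = sum_selectivity(zstat, mask)
print(mean_act, sum_sel, n_vox)
```

The last two voxels have the highest z values but fall outside the search mask, so they contribute to neither measure. Normalizing by the full mask size means sum selectivity reflects both the extent and the strength of suprathreshold activation.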

## Cell 9: Bootstrap Analysis

In [15]:
def bootstrap_group_comparison(df, metric_col, n_boot=10000, seed=42):
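    """
    Percentile-bootstrap comparison of the OTC bilateral advantage vs. each other group.

    The per-subject gap is mean(Bilateral) - mean(Unilateral) of `metric_col`, computed
    after `filter_to_intact_hemisphere`. Subjects are resampled with replacement within
    each group; p-values are FDR-corrected across the two comparisons.
    """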
    np.random.seed(seed)
    
    filtered_df = filter_to_intact_hemisphere(df)
    filtered_df = filtered_df.copy()
    filtered_df['cat_type'] = filtered_df['category'].apply(
        lambda x: 'Bilateral' if x in BILATERAL else 'Unilateral'
    )
    
    # Calculate subject-level bilateral advantage (gap)
    subject_gaps = {}
    for group in ['OTC', 'nonOTC', 'control']:
        gd = filtered_df[filtered_df['group'] == group]
        gaps = []
        for sid in gd['subject'].unique():
            sd = gd[gd['subject'] == sid]
            bil = sd[sd['cat_type'] == 'Bilateral'][metric_col].mean()
            uni = sd[sd['cat_type'] == 'Unilateral'][metric_col].mean()
            if pd.notna(bil) and pd.notna(uni):
                gaps.append(bil - uni)
        subject_gaps[group] = np.array(gaps)
    
    # NOTE: ndarray.std() defaults to ddof=0 (population SD), unlike pandas .std() (ddof=1)
    print(f"\n  Subject-level gaps (Bilateral - Unilateral):")
    for g, gaps in subject_gaps.items():
        if len(gaps) > 0:
            print(f"    {g}: n={len(gaps)}, mean={gaps.mean():.3f}, SD={gaps.std():.3f}")
    
    results = []
    p_values = []
    
    for comp_group in ['nonOTC', 'control']:
        g1 = subject_gaps['OTC']
        g2 = subject_gaps[comp_group]
        
        if len(g1) < 2 or len(g2) < 2:
            continue
        
        observed_diff = np.mean(g1) - np.mean(g2)
        d = cohens_d(pd.Series(g1), pd.Series(g2))
        
        boot_diffs = []
        for _ in range(n_boot):
            s1 = np.random.choice(g1, size=len(g1), replace=True)
            s2 = np.random.choice(g2, size=len(g2), replace=True)
            boot_diffs.append(np.mean(s1) - np.mean(s2))
        
        boot_diffs = np.array(boot_diffs)
        ci_low = np.percentile(boot_diffs, 2.5)
        ci_high = np.percentile(boot_diffs, 97.5)
        
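        # Two-tailed p-value: twice the fraction of bootstrap differences that fall
        # on the opposite side of zero from the observed difference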
        if observed_diff > 0:
            p_val = 2 * np.mean(boot_diffs <= 0)
        else:
            p_val = 2 * np.mean(boot_diffs >= 0)
        p_val = min(p_val, 1.0)  # Cap at 1
        
        p_values.append(p_val)
        results.append({
            'comparison': f'OTC vs {comp_group}',
            'observed_diff': observed_diff,
            'cohens_d': d,
            'ci_low': ci_low,
            'ci_high': ci_high,
            'p_value': p_val
        })
    
    # FDR correction
    if len(p_values) > 0:
        _, sig_fdr = fdr_correction(p_values)
        for i, res in enumerate(results):
            res['fdr_sig'] = sig_fdr[i]
            sig_mark = '** (FDR)' if sig_fdr[i] else ('*' if res['p_value'] < 0.05 else '')
            print(f"\n  {res['comparison']}: diff={res['observed_diff']:.3f}, d={res['cohens_d']:.2f}, "
                  f"95%CI=[{res['ci_low']:.3f}, {res['ci_high']:.3f}], p={res['p_value']:.4f} {sig_mark}")
    
    return pd.DataFrame(results)


print("\n" + "#"*70)
print("BOOTSTRAP ANALYSIS")
print("#"*70)

all_boot_results = []

print("\n--- SELECTIVITY CHANGE ---")
print("(Positive diff = OTC shows MORE bilateral effect than comparison group)")
boot_sel = bootstrap_group_comparison(selectivity_df, 'selectivity_change')
boot_sel['metric'] = 'Selectivity Change'
all_boot_results.append(boot_sel)

print("\n--- GEOMETRY PRESERVATION ---")
print("(Negative diff = OTC shows LESS preservation = more reorganization)")
boot_geom = bootstrap_group_comparison(geometry_df, 'geometry_preservation')
boot_geom['metric'] = 'Geometry Preservation'
all_boot_results.append(boot_geom)

print("\n--- SPATIAL DRIFT ---")
print("(Positive diff = OTC shows MORE spatial drift)")
boot_drift = bootstrap_group_comparison(drift_df, 'spatial_drift_mm')
boot_drift['metric'] = 'Spatial Drift'
all_boot_results.append(boot_drift)

print("\n--- MDS SHIFT ---")
print("(Positive diff = OTC shows MORE MDS shift)")
mds_agg = mds_df.groupby(['subject', 'group', 'hemi', 'measured_category'])['mds_shift'].mean().reset_index()
mds_agg = mds_agg.rename(columns={'measured_category': 'category'})
boot_mds = bootstrap_group_comparison(mds_agg, 'mds_shift')
boot_mds['metric'] = 'MDS Shift'
all_boot_results.append(boot_mds)

# Global FDR across all bootstrap tests
print("\n" + "="*70)
print("GLOBAL FDR CORRECTION (all bootstrap comparisons)")
print("="*70)

all_boot_df = pd.concat(all_boot_results, ignore_index=True)
all_boot_p = all_boot_df['p_value'].values
_, all_boot_sig = fdr_correction(all_boot_p)
all_boot_df['global_fdr_sig'] = all_boot_sig

print(f"\n{'Metric':<25} {'Comparison':<18} {'p':<10} {'Global FDR':<10}")
print("-"*65)
for _, row in all_boot_df.iterrows():
    print(f"{row['metric']:<25} {row['comparison']:<18} {row['p_value']:.4f}    {'Yes' if row['global_fdr_sig'] else 'No'}")


######################################################################
BOOTSTRAP ANALYSIS
######################################################################

--- SELECTIVITY CHANGE ---
(Positive diff = OTC shows MORE bilateral effect than comparison group)

  Subject-level gaps (Bilateral - Unilateral):
    OTC: n=6, mean=0.257, SD=0.257
    nonOTC: n=7, mean=0.023, SD=0.074
    control: n=7, mean=0.083, SD=0.095



  OTC vs nonOTC: diff=0.234, d=1.18, 95%CI=[0.029, 0.454], p=0.0132 ** (FDR)

  OTC vs control: diff=0.174, d=0.85, 95%CI=[-0.034, 0.398], p=0.1142 

--- GEOMETRY PRESERVATION ---
(Negative diff = OTC shows LESS preservation = more reorganization)

  Subject-level gaps (Bilateral - Unilateral):
    OTC: n=6, mean=-0.291, SD=0.353
    nonOTC: n=7, mean=0.075, SD=0.469
    control: n=7, mean=0.051, SD=0.210

  OTC vs nonOTC: diff=-0.366, d=-0.80, 95%CI=[-0.816, 0.079], p=0.1104 

  OTC vs control: diff=-0.342, d=-1.10, 95%CI=[-0.650, -0.016], p=0.0414 *

--- SPATIAL DRIFT ---
(Positive diff = OTC shows MORE spatial drift)

  Subject-level gaps (Bilateral - Unilateral):
    OTC: n=6, mean=-2.410, SD=9.231
    nonOTC: n=7, mean=-2.070, SD=5.260
    control: n=7, mean=-2.792, SD=4.370

  OTC vs nonOTC: diff=-0.340, d=-0.04, 95%CI=[-8.485, 7.998], p=0.9102 

  OTC vs control: diff=0.383, d=0.05, 95%CI=[-7.421, 8.460], p=0.9530 

--- MDS SHIFT ---
(Positive diff = OTC shows MORE MDS shift)



## Cell 10: Category-Level Results

In [10]:
print("\n" + "="*70)
print("CATEGORY-LEVEL RESULTS")
print("="*70)

def print_category_table(df, metric_col, title):
    """Print group-by-category means of `metric_col` after filtering to the intact hemisphere."""
    filtered = filter_to_intact_hemisphere(df)
    
    print(f"\n--- {title} ---")
    print(f"{'Group':<12} {'Face':<10} {'Word':<10} {'Object':<10} {'House':<10}")
    print("-"*55)
    
    for group in ['OTC', 'nonOTC', 'control']:
        gd = filtered[filtered['group'] == group]
        vals = []
        for cat in ['face', 'word', 'object', 'house']:
            cd = gd[gd['category'] == cat][metric_col]
            if len(cd) > 0:
                vals.append(f"{cd.mean():.2f}")
            else:
                vals.append("--")
        print(f"{group:<12} {vals[0]:<10} {vals[1]:<10} {vals[2]:<10} {vals[3]:<10}")

print_category_table(selectivity_df, 'selectivity_change', 'SELECTIVITY CHANGE (higher = more change)')
print_category_table(geometry_df, 'geometry_preservation', 'GEOMETRY PRESERVATION (lower = more change)')
print_category_table(drift_df, 'spatial_drift_mm', 'SPATIAL DRIFT (mm)')

# MDS shift needs special handling (has roi_category and measured_category)
print("\n--- MDS SHIFT (averaged across ROI locations) ---")
mds_filtered = filter_to_intact_hemisphere(
    mds_df.rename(columns={'roi_category': 'category'})
)
mds_avg = mds_filtered.groupby(['group', 'measured_category'])['mds_shift'].mean().reset_index()

print(f"{'Group':<12} {'Face':<10} {'Word':<10} {'Object':<10} {'House':<10}")
print("-"*55)
for group in ['OTC', 'nonOTC', 'control']:
    gd = mds_avg[mds_avg['group'] == group]
    vals = []
    for cat in ['face', 'word', 'object', 'house']:
        cd = gd[gd['measured_category'] == cat]['mds_shift']
        if len(cd) > 0:
            vals.append(f"{cd.values[0]:.2f}")
        else:
            vals.append("--")
    print(f"{group:<12} {vals[0]:<10} {vals[1]:<10} {vals[2]:<10} {vals[3]:<10}")


CATEGORY-LEVEL RESULTS

--- SELECTIVITY CHANGE (higher = more change) ---
Group        Face       Word       Object     House     
-------------------------------------------------------
OTC          0.15       0.14       0.48       0.31      
nonOTC       0.11       0.14       0.17       0.13      
control      0.18       0.12       0.20       0.27      

--- GEOMETRY PRESERVATION (lower = more change) ---
Group        Face       Word       Object     House     
-------------------------------------------------------
OTC          0.32       0.15       0.06       -0.11     
nonOTC       0.55       0.59       0.69       0.60      
control      0.64       0.30       0.55       0.49      

--- SPATIAL DRIFT (mm) ---
Group        Face       Word       Object     House     
-------------------------------------------------------
OTC          6.99       19.01      5.93       13.19     
nonOTC       3.40       8.96       2.52       5.69      
control      5.74       10.00      3.66       6.4

## Cell 11: Export Final Results

In [11]:
print("="*70)
print("EXPORTING FINAL RESULTS")
print("="*70)

# Build comprehensive export DataFrame
# Use selectivity as base (Liu ROIs)
export_data = []

selectivity_filt = filter_to_intact_hemisphere(selectivity_df)
geometry_filt = filter_to_intact_hemisphere(geometry_df)
drift_filt = filter_to_intact_hemisphere(drift_df)

for _, row in selectivity_filt.iterrows():
    sid = row['subject']
    info = SUBJECTS[sid]
    
    # Match geometry and drift (from scramble ROIs)
    geom_match = geometry_filt[
        (geometry_filt['subject'] == sid) & 
        (geometry_filt['hemi'] == row['hemi']) &
        (geometry_filt['category'] == row['category'])
    ]
    
    drift_match = drift_filt[
        (drift_filt['subject'] == sid) & 
        (drift_filt['hemi'] == row['hemi']) &
        (drift_filt['category'] == row['category'])
    ]
    
    # MDS shift (average across ROI locations for this measured category)
    mds_match = mds_df[
        (mds_df['subject'] == sid) & 
        (mds_df['hemi'] == row['hemi']) &
        (mds_df['measured_category'] == row['category'])
    ]['mds_shift'].mean()
    
    export_row = {
        'Subject': row['code'],
        'Group': info['group'],
        'Surgery_Side': info['surgery_side'],
        'Intact_Hemisphere': 'left' if info['hemi'] == 'l' else 'right',
        'Sex': info['sex'],
        'nonpt_hemi': row['hemi'].upper() if info['group'] == 'control' else 'na',
        'Category': row['category'].title(),
        'Category_Type': 'Bilateral' if row['category'] in BILATERAL else 'Unilateral',
        'age_1': info['age_1'],
        'age_2': info['age_2'],
        'yr_gap': info['age_2'] - info['age_1'] if pd.notna(info['age_1']) and pd.notna(info['age_2']) else np.nan,
        'Selectivity_Change': row['selectivity_change'],
        'Spatial_Relocation_mm': drift_match['spatial_drift_mm'].values[0] if len(drift_match) > 0 else np.nan,
        'Geometry_Preservation_6mm': geom_match['geometry_preservation'].values[0] if len(geom_match) > 0 else np.nan,
        'MDS_Shift': mds_match  # .mean() of an empty selection is already NaN
    }
    
    export_data.append(export_row)

export_df = pd.DataFrame(export_data)

# Save
output_file = OUTPUT_DIR / 'results_final_corrected.csv'
export_df.to_csv(output_file, index=False)

print(f"\n✓ Saved to: {output_file}")
print(f"  Shape: {export_df.shape}")
print(f"\nColumns: {list(export_df.columns)}")

# Summary
print("\n" + "-"*70)
print("FINAL SUMMARY")
print("-"*70)
print(f"\nSubjects per group:")
print(export_df.groupby('Group')['Subject'].nunique())
print(f"\nMeasurements per category:")
print(export_df.groupby('Category').size())

EXPORTING FINAL RESULTS

✓ Saved to: /user_data/csimmon2/git_repos/long_pt/B_analyses/results_final_corrected.csv
  Shape: (107, 15)

Columns: ['Subject', 'Group', 'Surgery_Side', 'Intact_Hemisphere', 'Sex', 'nonpt_hemi', 'Category', 'Category_Type', 'age_1', 'age_2', 'yr_gap', 'Selectivity_Change', 'Spatial_Relocation_mm', 'Geometry_Preservation_6mm', 'MDS_Shift']

----------------------------------------------------------------------
FINAL SUMMARY
----------------------------------------------------------------------

Subjects per group:
Group
OTC        6
control    7
nonOTC     7
Name: Subject, dtype: int64

Measurements per category:
Category
Face      27
House     27
Object    27
Word      26
dtype: int64
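
The row-by-row matching in the export cell amounts to a left join on subject, hemisphere, and category. As a hedged alternative, the sketch below does the same join with `DataFrame.merge` on small made-up frames. All values and subject IDs are hypothetical, and unlike the `.values[0]` lookup in the cell, a merge would duplicate rows if a key matched more than once.

```python
import pandas as pd

keys = ['subject', 'hemi', 'category']

# Hypothetical stand-ins for selectivity_filt, geometry_filt, and mds_df
selectivity = pd.DataFrame({
    'subject': ['s01', 's01', 's02'],
    'hemi': ['l', 'l', 'r'],
    'category': ['face', 'object', 'face'],
    'selectivity_change': [0.10, 0.40, 0.20],
})
geometry = pd.DataFrame({
    'subject': ['s01', 's02'],
    'hemi': ['l', 'r'],
    'category': ['face', 'face'],
    'geometry_preservation': [0.30, 0.60],
})
mds = pd.DataFrame({
    'subject': ['s01', 's01', 's01'],
    'hemi': ['l', 'l', 'l'],
    'roi_category': ['face', 'object', 'face'],
    'measured_category': ['face', 'face', 'object'],
    'mds_shift': [1.0, 3.0, 2.0],
})

# Average MDS shift across ROI locations for each measured category
mds_avg = (mds.groupby(['subject', 'hemi', 'measured_category'], as_index=False)['mds_shift']
              .mean()
              .rename(columns={'measured_category': 'category'}))

# Left joins keep every selectivity row; unmatched metrics become NaN
export = (selectivity
          .merge(geometry, on=keys, how='left')
          .merge(mds_avg, on=keys, how='left'))
print(export)
```

Rows without a matching geometry or MDS entry come out as NaN, mirroring the `np.nan` fallbacks in the loop.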


## Cell 12: Summary Statistics

In [12]:
print("="*70)
print("FINAL SUMMARY STATISTICS")
print("="*70)

print("\n--- BY GROUP AND CATEGORY TYPE ---")
summary = export_df.groupby(['Group', 'Category_Type']).agg({
    'Selectivity_Change': ['mean', 'std', 'count'],
    'Geometry_Preservation_6mm': ['mean', 'std'],
    'Spatial_Relocation_mm': ['mean', 'std'],
    'MDS_Shift': ['mean', 'std']
}).round(3)

print(summary)

print("\n" + "="*70)
print("KEY FINDINGS")
print("="*70)

# Extract OTC stats
otc_bil = export_df[(export_df['Group'] == 'OTC') & (export_df['Category_Type'] == 'Bilateral')]
otc_uni = export_df[(export_df['Group'] == 'OTC') & (export_df['Category_Type'] == 'Unilateral')]

print(f"\nOTC Selectivity Change:")
print(f"  Bilateral: {otc_bil['Selectivity_Change'].mean():.3f} ± {otc_bil['Selectivity_Change'].std():.3f}")
print(f"  Unilateral: {otc_uni['Selectivity_Change'].mean():.3f} ± {otc_uni['Selectivity_Change'].std():.3f}")
t, p = ttest_ind(otc_bil['Selectivity_Change'], otc_uni['Selectivity_Change'])
print(f"  Bil - Uni = {otc_bil['Selectivity_Change'].mean() - otc_uni['Selectivity_Change'].mean():.3f}, p = {p:.4f}")

print(f"\nOTC Geometry Preservation:")
print(f"  Bilateral: {otc_bil['Geometry_Preservation_6mm'].mean():.3f} ± {otc_bil['Geometry_Preservation_6mm'].std():.3f}")
print(f"  Unilateral: {otc_uni['Geometry_Preservation_6mm'].mean():.3f} ± {otc_uni['Geometry_Preservation_6mm'].std():.3f}")
t, p = ttest_ind(otc_bil['Geometry_Preservation_6mm'].dropna(), otc_uni['Geometry_Preservation_6mm'].dropna())
print(f"  Bil - Uni = {otc_bil['Geometry_Preservation_6mm'].mean() - otc_uni['Geometry_Preservation_6mm'].mean():.3f}, p = {p:.4f}")

print("\n" + "="*70)
print("ANALYSIS COMPLETE")
print("="*70)

FINAL SUMMARY STATISTICS

--- BY GROUP AND CATEGORY TYPE ---
                      Selectivity_Change               \
                                    mean    std count   
Group   Category_Type                                   
OTC     Bilateral                  0.396  0.273    12   
        Unilateral                 0.143  0.121    11   
control Bilateral                  0.233  0.179    28   
        Unilateral                 0.150  0.121    28   
nonOTC  Bilateral                  0.148  0.104    14   
        Unilateral                 0.125  0.112    14   

                      Geometry_Preservation_6mm        Spatial_Relocation_mm  \
                                           mean    std                  mean   
Group   Category_Type                                                          
OTC     Bilateral                        -0.025  0.465                 9.556   
        Unilateral                        0.240  0.466                12.457   
control Bilateral        