# 05 - Assembly Validation (Volumetric)

Validate that the Brain-Score packaged assemblies contain correct values by tracing
the full preprocessing chain from raw NSD HDF5 betas.

**Sections:**
1. Voxel-level spot check: reproduce global z-score from NB02, compare with assembly
2. End-to-end raw data bypass: reproduce V4 from raw HDF5 for subj01
3. Standalone ridge regression: compare sklearn RidgeCV with benchmark output

**Environment:** `conda activate vision-2026`

**Requires:** External drive `/Volumes/Hagibis/nsd` mounted

In [1]:
import sys
sys.path.insert(0, '/Users/kartik/Brain-Score 2026/vision')
sys.path.insert(0, '/Users/kartik/Brain-Score 2026/core')

import numpy as np
import pandas as pd
import xarray as xr
import h5py
import nibabel as nib
import scipy.io
from pathlib import Path
import time

NSD_ROOT = Path('/Volumes/Hagibis/nsd')
ASSEMBLY_DIR = NSD_ROOT / 'assemblies'
BRAINSCORE_DIR = NSD_ROOT / 'brainscore'

REGIONS = ['V1', 'V2', 'V4', 'IT']
SUBJECT_LIST = [1, 2, 3, 4, 5, 6, 7, 8]
SESSIONS_PER_SUBJECT = {1: 40, 2: 40, 3: 32, 4: 30, 5: 40, 6: 32, 7: 40, 8: 30}
TRIALS_PER_SESSION = 750
VARIANT = '_8subj'

# ROI definitions (same as NB02)
REGION_TO_PRF_LABELS = {'V1': [1, 2], 'V2': [3, 4], 'V4': [7]}
STREAMS_VENTRAL_LABEL = 5

# Train/test split
split_df = pd.read_csv(ASSEMBLY_DIR / f'train_test_split{VARIANT}.csv')
train_ids = set(split_df.loc[split_df['split'] == 'train', 'stimulus_id'])
test_ids = set(split_df.loc[split_df['split'] == 'test', 'stimulus_id'])

assert NSD_ROOT.exists(), 'External drive not mounted'
print(f'NSD root: {NSD_ROOT}')
print(f'Train: {len(train_ids)}, Test: {len(test_ids)}')

NSD root: /Volumes/Hagibis/nsd
Train: 412, Test: 103


## Section 1: Voxel-Level Spot Check

Validate the NB03 packaging step by reproducing the global z-score from NB02 data
and comparing with the Brain-Score assembly values.

**Preprocessing chain being validated:**
```
NB02 assembly (session z-scored, rep-averaged)
  -> global z-score (per subject, per region, stats from all 515 images)
  -> split into train/test
  -> Brain-Score assembly
```
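
The global z-score step in the chain above can be sketched on synthetic data. This is a minimal illustration, not the NB03 code: the array sizes, the `toy_*` names, and the "first 412 rows are train" split are assumptions made for the example. It shows that the statistics come from all 515 images before the split, so the train subset alone is not expected to have exactly zero mean.

```python
import numpy as np


def global_zscore(data: np.ndarray) -> np.ndarray:
    """Z-score each voxel (column) using statistics from all images (rows)."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)      # population std (ddof=0), NumPy's default
    std[std == 0] = 1.0         # constant voxels map to 0 instead of dividing by 0
    return (data - mean) / std


rng = np.random.default_rng(0)
n_images, n_voxels = 515, 8     # 515 matches the shared-image count; 8 voxels is a toy size
toy = rng.normal(loc=2.0, scale=3.0, size=(n_images, n_voxels))
toy[:, 0] = 1.0                 # one constant voxel to exercise the std == 0 guard

toy_z = global_zscore(toy)

# Hypothetical split for illustration only: first 412 rows train, last 103 test.
toy_train_mask = np.zeros(n_images, dtype=bool)
toy_train_mask[:412] = True
toy_train, toy_test = toy_z[toy_train_mask], toy_z[~toy_train_mask]

print(toy_train.shape, toy_test.shape)
print('train-only mean of voxel 1:', toy_train[:, 1].mean())
```

Because normalization happens before the split, train and test share one scale per voxel, which is what allows the test rows to be compared directly against the packaged assembly.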

In [2]:
# Load Brain-Score packaged assemblies (NB03 output)
bs_train = xr.open_dataarray(str(BRAINSCORE_DIR / f'Allen2022_fmri_train{VARIANT}.nc'))
bs_test = xr.open_dataarray(str(BRAINSCORE_DIR / f'Allen2022_fmri_test{VARIANT}.nc'))
bs_train.load()
bs_test.load()

print(f'Brain-Score train: {bs_train.shape}')
print(f'Brain-Score test:  {bs_test.shape}')
print(f'Train coords: {list(bs_train.coords)}')
print(f'Test coords:  {list(bs_test.coords)}')

Brain-Score train: (412, 84564, 1)
Brain-Score test:  (309, 84564, 1)
Train coords: ['stimulus_id', 'nsd_id', 'neuroid_id', 'subject', 'region', 'nc_testset', 'voxel_x', 'voxel_y', 'voxel_z', 'time_bin_start', 'time_bin_end']
Test coords:  ['stimulus_id', 'nsd_id', 'repetition', 'neuroid_id', 'subject', 'region', 'nc_testset', 'voxel_x', 'voxel_y', 'voxel_z', 'time_bin_start', 'time_bin_end']


In [3]:
# Load NB02 assembly for V4 and reproduce global z-score
nb02_v4 = xr.open_dataarray(str(ASSEMBLY_DIR / f'Allen2022.V4{VARIANT}.nc'))
nb02_v4.load()
print(f'NB02 V4: {nb02_v4.shape}')

# Build train/test masks
nb02_stimulus_ids = nb02_v4.coords['stimulus_id'].values
train_mask = np.array([sid in train_ids for sid in nb02_stimulus_ids])
test_mask = np.array([sid in test_ids for sid in nb02_stimulus_ids])

# Reproduce global z-score for each subject (matching NB03)
reproduced_train_blocks = []
reproduced_test_blocks = []

for subj in SUBJECT_LIST:
    subj_label = f'subj{subj:02d}'
    subj_mask = nb02_v4.coords['subject'].values == subj_label
    data = nb02_v4.values[:, subj_mask]

    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0] = 1.0
    data_z = (data - mean) / std

    reproduced_train_blocks.append(data_z[train_mask])
    reproduced_test_blocks.append(data_z[test_mask])

reproduced_train = np.concatenate(reproduced_train_blocks, axis=1)
reproduced_test = np.concatenate(reproduced_test_blocks, axis=1)
print(f'Reproduced train: {reproduced_train.shape}')
print(f'Reproduced test: {reproduced_test.shape}')

NB02 V4: (515, 3982)
Reproduced train: (412, 3982)
Reproduced test: (103, 3982)


In [4]:
# Compare reproduced values with Brain-Score train assembly (V4 neuroids only)
v4_neuroid_mask = bs_train.coords['region'].values == 'V4'
bs_train_v4 = bs_train.values[:, v4_neuroid_mask, 0]

max_diff = np.max(np.abs(reproduced_train - bs_train_v4))
mean_diff = np.mean(np.abs(reproduced_train - bs_train_v4))
r = np.corrcoef(reproduced_train.ravel(), bs_train_v4.ravel())[0, 1]

print('=== NB02 -> Global Z-Score -> Brain-Score Train (V4) ===')
print(f'Max absolute difference:  {max_diff:.2e}')
print(f'Mean absolute difference: {mean_diff:.2e}')
print(f'Correlation:              {r:.10f}')
print(f'Match: {"PASS" if max_diff < 1e-5 else "FAIL"}')
assert max_diff < 1e-5, f'Train V4 mismatch: max_diff={max_diff}'

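# Compare test (average the 3 reps in Brain-Score test).
# The reshape below assumes the 3 repetitions of each stimulus occupy
# consecutive rows and that stimuli follow the NB02 test order.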
bs_test_v4_all = bs_test.values[:, v4_neuroid_mask, 0]
n_test = len(test_ids)
bs_test_v4_avg = bs_test_v4_all.reshape(n_test, 3, -1).mean(axis=1)

max_diff_test = np.max(np.abs(reproduced_test - bs_test_v4_avg))
print(f'\n=== Test (averaged reps, V4) ===')
print(f'Max absolute difference: {max_diff_test:.2e}')
print(f'Match: {"PASS" if max_diff_test < 1e-5 else "FAIL"}')
assert max_diff_test < 1e-5, f'Test V4 mismatch: max_diff={max_diff_test}'

=== NB02 -> Global Z-Score -> Brain-Score Train (V4) ===
Max absolute difference:  0.00e+00
Mean absolute difference: 0.00e+00
Correlation:              1.0000000000
Match: PASS

=== Test (averaged reps, V4) ===
Max absolute difference: 9.54e-07
Match: PASS
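
The nonzero differences reported in this notebook (9.54e-07 here, 1.91e-06 in Section 2) match 2^-20 and 2^-19 to the printed precision. One plausible reading, which the notebook itself does not test, is that they are a few float32 rounding steps rather than a preprocessing mismatch. The sketch below only puts the reported numbers on that scale; the choice of magnitude 2.0 as a reference point is an assumption.

```python
import numpy as np

# Gap between adjacent float32 values near 2.0 (one unit in the last place, ULP).
ulp_at_2 = float(np.spacing(np.float32(2.0)))

# Max differences copied from the notebook output.
reported = {'section_1_test': 9.54e-07, 'section_2_bypass': 1.91e-06}

for name, diff in reported.items():
    print(f'{name}: {diff:.2e} is about {diff / ulp_at_2:.1f} float32 ULPs at magnitude 2')

tolerance = 1e-5  # threshold used by the assertions in this notebook
print(f'tolerance / ULP = {tolerance / ulp_at_2:.0f}')
```

On this scale the `1e-5` threshold leaves room for rounding noise while still being far below the size of a z-scored beta.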


In [5]:
# Spot check: trace 5 specific (subject, image, voxel) values
print('=== Voxel-Level Spot Check (V4) ===')
print(f'{"Subject":<10} {"Stimulus":<14} {"Voxel":<8} '
      f'{"Reproduced":>12} {"Assembly":>12} {"Diff":>12} {"Status"}')
print('-' * 82)

v4_subjects = bs_train.coords['subject'].values[v4_neuroid_mask]
train_stim_ids = bs_train.coords['stimulus_id'].values

spot_checks = [
    ('subj01', 0, 0),
    ('subj01', 0, 100),
    ('subj04', 200, 50),
    ('subj08', 411, -1),
    ('subj05', 100, 200),
]

all_pass = True
for subj_label, img_idx, vox_idx in spot_checks:
    subj_v4_mask = v4_subjects == subj_label
    subj_start = np.where(subj_v4_mask)[0][0]
    actual_vox_idx = subj_start + (vox_idx % subj_v4_mask.sum())

    reproduced_val = reproduced_train[img_idx, actual_vox_idx]
    assembly_val = bs_train_v4[img_idx, actual_vox_idx]
    diff = abs(reproduced_val - assembly_val)
    status = 'PASS' if diff < 1e-6 else 'FAIL'
    if status == 'FAIL':
        all_pass = False

    print(f'{subj_label:<10} {train_stim_ids[img_idx]:<14} {actual_vox_idx:<8} '
          f'{reproduced_val:>12.6f} {assembly_val:>12.6f} {diff:>12.2e} {status}')

print(f'\nAll spot checks: {"PASS" if all_pass else "FAIL"}')
assert all_pass, 'Spot check failed'

=== Voxel-Level Spot Check (V4) ===
Subject    Stimulus       Voxel      Reproduced     Assembly         Diff Status
----------------------------------------------------------------------------------
subj01     nsd_03049      0           -1.760644    -1.760644     0.00e+00 PASS
subj01     nsd_03049      100         -0.401076    -0.401076     0.00e+00 PASS
subj04     nsd_37224      1646         2.987292     2.987292     0.00e+00 PASS
subj08     nsd_72719      3981        -0.981733    -0.981733     0.00e+00 PASS
subj05     nsd_20738      2271         0.676849     0.676849     0.00e+00 PASS

All spot checks: PASS


## Section 2: End-to-End Raw Data Bypass

Reproduce V4 betas for subj01 entirely from raw HDF5 files, applying the full
preprocessing chain:
```
Raw HDF5 int16 / 300
  -> Z-score within session (750 trials per voxel)
  -> Collect shared-image trials, average 3 repetitions
  -> Global z-score (stats from all 515 averaged images)
  -> Compare with Brain-Score assembly
```

This section ends by verifying the neuroid ordering of the packaged assembly. The standalone ridge regression comparison follows in Section 3.
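
The preprocessing chain listed above can be sketched end to end on synthetic data. Everything here is a toy stand-in: the session count, trial count, the random `trials` table, and the variable names are assumptions for illustration and do not reflect the NSD experiment design.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sessions, trials_per_session, n_voxels = 4, 30, 5   # toy sizes, not the NSD values
n_images, n_reps = 10, 3

# Toy "raw" betas stored as int16, one array per session.
raw = [rng.integers(-3000, 3000, size=(trials_per_session, n_voxels), dtype=np.int16)
       for _ in range(n_sessions)]

# Hypothetical trial table: give every (image, rep) pair a distinct trial slot.
slots = rng.permutation(n_sessions * trials_per_session)[:n_images * n_reps]
trials = [(i // n_reps, i % n_reps, s // trials_per_session, s % trials_per_session)
          for i, s in enumerate(slots)]

per_rep = np.zeros((n_images, n_reps, n_voxels), dtype=np.float32)
for session_idx, session_raw in enumerate(raw):
    betas = session_raw.astype(np.float32) / 300.0     # int16 -> float betas
    mean = betas.mean(axis=0, keepdims=True)           # session statistics per voxel
    std = betas.std(axis=0, keepdims=True)
    std[std == 0] = 1.0
    betas_z = (betas - mean) / std
    for img, rep, sess, trial in trials:
        if sess == session_idx:
            per_rep[img, rep] = betas_z[trial]

averaged = per_rep.mean(axis=1)                        # average the 3 repetitions

gz_mean = averaged.mean(axis=0)                        # global z-score over all images
gz_std = averaged.std(axis=0)
gz_std[gz_std == 0] = 1.0
final = (averaged - gz_mean) / gz_std

print(final.shape)
```

The session z-score uses every trial in the session, not only the shared-image trials, which is why the full session array is normalized before the trial rows are picked out.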

In [6]:
# --- Raw data utilities ---

def load_roi(subj: int, roi_name: str) -> np.ndarray:
    path = NSD_ROOT / f'subj{subj:02d}' / 'rois' / f'{roi_name}.nii.gz'
    return nib.load(str(path)).get_fdata().T  # (81,104,83) -> (83,104,81)


def load_session_betas(subj: int, session: int) -> np.ndarray:
    path = NSD_ROOT / f'subj{subj:02d}' / 'betas' / f'betas_session{session:02d}.hdf5'
    with h5py.File(str(path), 'r') as f:
        betas = f['betas'][:]
    return betas.astype(np.float32) / 300.0


def get_v4_mask(subj: int) -> np.ndarray:
    prf = load_roi(subj, 'prf-visualrois')
    return np.isin(prf, REGION_TO_PRF_LABELS['V4'])  # hV4 (label 7)


# --- Trial mapping ---
expdesign = scipy.io.loadmat(NSD_ROOT / 'metadata' / 'nsd_expdesign.mat')
masterordering = expdesign['masterordering'].flatten()
subjectim = expdesign['subjectim']
sharedix = expdesign['sharedix'].flatten()

all_nsd_ids = sorted(split_df['nsd_id'].values)
nsd_id_to_idx = {nsd_id: idx for idx, nsd_id in enumerate(all_nsd_ids)}
N_IMAGES = len(all_nsd_ids)


def get_trial_info(subj_idx: int, target_nsd_ids: set) -> pd.DataFrame:
    n_sessions = SESSIONS_PER_SUBJECT[subj_idx + 1]
    n_total_trials = n_sessions * TRIALS_PER_SESSION
    subj_nsdids = subjectim[subj_idx]
    nsdid_to_imgidx = {int(nsd_id): img_idx + 1
                       for img_idx, nsd_id in enumerate(subj_nsdids)}

    shared_imgidxs = set()
    for nsd_id in sharedix:
        if int(nsd_id) in nsdid_to_imgidx:
            shared_imgidxs.add(nsdid_to_imgidx[int(nsd_id)])

    records = []
    rep_counter = {}
    for trial_idx in range(n_total_trials):
        img_idx = masterordering[trial_idx]
        if img_idx in shared_imgidxs:
            nsd_id = int(subj_nsdids[img_idx - 1] - 1)
            if nsd_id not in target_nsd_ids:
                rep_counter[img_idx] = rep_counter.get(img_idx, 0) + 1
                continue
            rep = rep_counter.get(img_idx, 0)
            rep_counter[img_idx] = rep + 1
            session = trial_idx // TRIALS_PER_SESSION + 1
            trial_in_session = trial_idx % TRIALS_PER_SESSION
            records.append({
                'nsd_id': nsd_id, 'rep': rep,
                'session': session, 'trial_in_session': trial_in_session,
            })
    return pd.DataFrame(records)


print('Utilities loaded.')

Utilities loaded.


In [7]:
# Reproduce V4 betas for subj01 from raw HDF5
SUBJ = 1
target_nsd_ids = set(all_nsd_ids)

v4_mask = get_v4_mask(SUBJ)
n_v4_voxels = v4_mask.sum()
print(f'subj{SUBJ:02d} V4: {n_v4_voxels} voxels')

trial_info = get_trial_info(SUBJ - 1, target_nsd_ids)
print(f'Trials for {N_IMAGES} shared images: {len(trial_info)} (expected {N_IMAGES * 3})')

per_rep = np.zeros((N_IMAGES, 3, n_v4_voxels), dtype=np.float32)

t0 = time.time()
for session in range(1, SESSIONS_PER_SUBJECT[SUBJ] + 1):
    session_trials = trial_info[trial_info['session'] == session]
    if len(session_trials) == 0:
        continue

    session_betas = load_session_betas(SUBJ, session)
    roi_betas = session_betas[:, v4_mask]

    mean = roi_betas.mean(axis=0, keepdims=True)
    std = roi_betas.std(axis=0, keepdims=True)
    std[std == 0] = 1.0
    roi_betas = (roi_betas - mean) / std

    for _, row in session_trials.iterrows():
        img_idx = nsd_id_to_idx[row['nsd_id']]
        per_rep[img_idx, row['rep']] = roi_betas[row['trial_in_session']]

    del session_betas
    if session % 10 == 0:
        print(f'  Session {session}/{SESSIONS_PER_SUBJECT[SUBJ]} ({time.time()-t0:.0f}s)')

bypass_averaged = per_rep.mean(axis=1)

gz_mean = bypass_averaged.mean(axis=0)
gz_std = bypass_averaged.std(axis=0)
gz_std[gz_std == 0] = 1.0
bypass_final = (bypass_averaged - gz_mean) / gz_std

elapsed = time.time() - t0
print(f'\nDone in {elapsed:.0f}s')
print(f'Bypass output: {bypass_final.shape}')

subj01 V4: 687 voxels
Trials for 515 shared images: 1545 (expected 1545)
  Session 10/40 (41s)
  Session 20/40 (80s)
  Session 30/40 (119s)

Done in 119s
Bypass output: (515, 687)


In [8]:
# Compare bypass result with Brain-Score assembly for subj01 V4
subj01_v4_mask = (bs_train.coords['subject'].values == 'subj01') & \
                 (bs_train.coords['region'].values == 'V4')
bs_train_subj01_v4 = bs_train.values[:, subj01_v4_mask, 0]

subj01_v4_mask_test = (bs_test.coords['subject'].values == 'subj01') & \
                      (bs_test.coords['region'].values == 'V4')
bs_test_subj01_v4 = bs_test.values[:, subj01_v4_mask_test, 0]
bs_test_subj01_v4_avg = bs_test_subj01_v4.reshape(len(test_ids), 3, -1).mean(axis=1)

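# train_mask / test_mask were built in NB02 stimulus order, while bypass_final
# rows follow sorted nsd_id order. Reusing the masks here assumes the two
# orders coincide; a mismatch would show up as a large max_diff below.
bypass_train = bypass_final[train_mask]
bypass_test = bypass_final[test_mask]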

train_max_diff = np.max(np.abs(bypass_train - bs_train_subj01_v4))
test_max_diff = np.max(np.abs(bypass_test - bs_test_subj01_v4_avg))

print('=== End-to-End Bypass: Raw HDF5 -> Brain-Score Assembly (subj01, V4) ===')
print(f'Train: max_diff={train_max_diff:.2e} [{"PASS" if train_max_diff < 1e-5 else "FAIL"}]')
print(f'Test:  max_diff={test_max_diff:.2e} [{"PASS" if test_max_diff < 1e-5 else "FAIL"}]')
assert train_max_diff < 1e-5, f'TRAIN MISMATCH: {train_max_diff}'
assert test_max_diff < 1e-5, f'TEST MISMATCH: {test_max_diff}'

print(f'\nBypass shape:   {bypass_train.shape} train, {bypass_test.shape} test')
print(f'Assembly shape: {bs_train_subj01_v4.shape} train, {bs_test_subj01_v4_avg.shape} test')
print(f'\nFull preprocessing chain validated for subj01 V4.')

=== End-to-End Bypass: Raw HDF5 -> Brain-Score Assembly (subj01, V4) ===
Train: max_diff=1.91e-06 [PASS]
Test:  max_diff=1.91e-06 [PASS]

Bypass shape:   (412, 687) train, (103, 687) test
Assembly shape: (412, 687) train, (103, 687) test

Full preprocessing chain validated for subj01 V4.


In [9]:
# Verify neuroid ordering: assembly is subject-major
# Layout: [subj01_V1, subj01_V2, subj01_V4, subj01_IT, subj02_V1, ...]
assembly_subjects = bs_train.coords['subject'].values
assembly_regions = bs_train.coords['region'].values

# Check 1: subjects are contiguous (all neuroids for a subject appear together)
subj_contiguous = True
for subj in SUBJECT_LIST:
    subj_label = f'subj{subj:02d}'
    indices = np.where(assembly_subjects == subj_label)[0]
    expected = np.arange(indices[0], indices[0] + len(indices))
    ok = np.array_equal(indices, expected)
    if not ok:
        subj_contiguous = False
    print(f'{subj_label}: {len(indices):,} neuroids, '
          f'indices [{indices[0]}..{indices[-1]}], '
          f'contiguous={"PASS" if ok else "FAIL"}')

# Check 2: within each subject, regions appear in order (V1, V2, V4, IT)
region_order_ok = True
for subj in SUBJECT_LIST:
    subj_label = f'subj{subj:02d}'
    subj_mask = assembly_subjects == subj_label
    subj_regions = assembly_regions[subj_mask]
    # Extract unique region sequence (drop consecutive duplicates)
    unique_seq = []
    for r in subj_regions:
        if len(unique_seq) == 0 or r != unique_seq[-1]:
            unique_seq.append(r)
    if unique_seq != REGIONS:
        region_order_ok = False
        print(f'{subj_label}: region order FAIL: {unique_seq}')

# Check 3: within each region, subjects appear in order
subj_order_ok = True
for region in REGIONS:
    region_mask = assembly_regions == region
    region_subjects = assembly_subjects[region_mask]
    unique_subjs = []
    for s in region_subjects:
        if len(unique_subjs) == 0 or s != unique_subjs[-1]:
            unique_subjs.append(s)
    expected_order = [f'subj{s:02d}' for s in SUBJECT_LIST]
    if unique_subjs != expected_order:
        subj_order_ok = False

print(f'\nLayout: subject-major (all regions per subject, then next subject)')
print(f'Subject contiguity:         {"PASS" if subj_contiguous else "FAIL"}')
print(f'Region order within subject: {"PASS" if region_order_ok else "FAIL"}')
print(f'Subject order within region: {"PASS" if subj_order_ok else "FAIL"}')

assert subj_contiguous, 'Subjects are not contiguous'
assert region_order_ok, 'Regions not in expected order within subject'
assert subj_order_ok, 'Subjects not in expected order within region'

subj01: 11,074 neuroids, indices [0..11073], contiguous=PASS
subj02: 10,845 neuroids, indices [11074..21918], contiguous=PASS
subj03: 10,988 neuroids, indices [21919..32906], contiguous=PASS
subj04: 9,860 neuroids, indices [32907..42766], contiguous=PASS
subj05: 9,355 neuroids, indices [42767..52121], contiguous=PASS
subj06: 12,450 neuroids, indices [52122..64571], contiguous=PASS
subj07: 9,040 neuroids, indices [64572..73611], contiguous=PASS
subj08: 10,952 neuroids, indices [73612..84563], contiguous=PASS

Layout: subject-major (all regions per subject, then next subject)
Subject contiguity:         PASS
Region order within subject: PASS
Subject order within region: PASS
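
The ordering checks above collapse consecutive duplicate labels with an explicit loop. The same idea can be written with `itertools.groupby`; the sketch below uses made-up toy labels and a hypothetical helper name, `collapse_runs`, and is not part of the validation itself.

```python
from itertools import groupby

import numpy as np


def collapse_runs(labels):
    """Return the labels with consecutive duplicates removed, in order."""
    return [str(key) for key, _ in groupby(labels)]


# Toy neuroid coordinates: two subjects, subject-major layout.
toy_subjects = np.array(['subj01'] * 5 + ['subj02'] * 4)
toy_regions = np.array(['V1', 'V1', 'V2', 'V4', 'IT', 'V1', 'V2', 'V4', 'IT'])

for subj in collapse_runs(toy_subjects):
    print(subj, collapse_runs(toy_regions[toy_subjects == subj]))

# A label that appears in two separate runs shows up twice, so comparing the
# collapsed sequence with the expected order also catches non-contiguous layouts.
print(collapse_runs(['V1', 'V2', 'V1']))
```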


## Section 3: Standalone Ridge Regression

Run sklearn RidgeCV on Brain-Score assembly data for V4 and compare per-subject
correlations with the benchmark output from NB04.

This validates the benchmark metric code: if the standalone ridge yields raw
correlations close to those of the benchmark, the `ridge_split` metric is
working correctly on our assemblies.
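
The comparison rests on two ingredients: a ridge fit from model features to voxels, and a per-voxel Pearson correlation on held-out images. The sketch below shows both on synthetic data with NumPy only. It is not the benchmark's implementation and it swaps in a simplification: a single fixed `alpha` replaces the cross-validated search that `RidgeCV` performs over `ALPHA_LIST`. The function names and toy sizes are assumptions.

```python
import numpy as np


def ridge_fit_predict(x_train, y_train, x_test, alpha):
    """Ridge with intercept, solved in the dual form (cheap when samples << features)."""
    x_mean, y_mean = x_train.mean(axis=0), y_train.mean(axis=0)
    xc, yc = x_train - x_mean, y_train - y_mean
    gram = xc @ xc.T                                        # (n_train, n_train)
    dual_coef = np.linalg.solve(gram + alpha * np.eye(len(xc)), yc)
    return (x_test - x_mean) @ xc.T @ dual_coef + y_mean


def columnwise_pearson(a, b):
    """Pearson r between matching columns of a and b."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))


rng = np.random.default_rng(2)
n_train, n_test, n_features, n_voxels = 80, 20, 200, 6     # toy sizes
weights = rng.normal(size=(n_features, n_voxels))
x_train = rng.normal(size=(n_train, n_features))
x_test = rng.normal(size=(n_test, n_features))
y_train = x_train @ weights + rng.normal(scale=5.0, size=(n_train, n_voxels))
y_test = x_test @ weights + rng.normal(scale=5.0, size=(n_test, n_voxels))

pred = ridge_fit_predict(x_train, y_train, x_test, alpha=100.0)
r = columnwise_pearson(pred, y_test)
print('per-voxel r:', np.round(r, 3))
print('median r:   ', float(np.median(r)))
```

The per-subject number compared in this section is the median of such per-voxel correlations.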

In [10]:
from brainscore_vision import load_model, load_benchmark

model = load_model('alexnet')
benchmark = load_benchmark('Allen2022_fmri.V4-ridge')

t0 = time.time()
benchmark_score = benchmark(model)
elapsed = time.time() - t0

print(f'Benchmark score (ceiling-normalized): {float(benchmark_score.values):.4f}')
print(f'Benchmark raw:                        {float(benchmark_score.raw.values):.4f}')
print(f'Benchmark ceiling:                    {float(benchmark_score.ceiling.values):.4f}')
print(f'Time: {elapsed:.1f}s')

print(f'\nPer-subject raw scores from benchmark:')
benchmark_per_subj = {}
for subj_label in [f'subj{s:02d}' for s in SUBJECT_LIST]:
    if subj_label in benchmark_score.attrs:
        raw_val = float(benchmark_score.attrs[subj_label].raw.values)
        benchmark_per_subj[subj_label] = raw_val
        print(f'  {subj_label}: {raw_val:.4f}')


<xarray.Score (subject: 1)>
array([0.37448007])
Coordinates:
  * subject  (subject) <U6 'subj01'
Attributes:
    raw:      <xarray.Score (subject: 1)>\narray([0.39834738], dtype=float32)...
    ceiling:  <xarray.Score ()>\narray(0.42373585)\nAttributes:\n    raw:    ...


<xarray.Score (subject: 1)>
array([0.47905756])
Coordinates:
  * subject  (subject) <U6 'subj02'
Attributes:
    raw:      <xarray.Score (subject: 1)>\narray([0.50740105], dtype=float32)...
    ceiling:  <xarray.Score ()>\narray(0.53742148)\nAttributes:\n    raw:    ...


<xarray.Score (subject: 1)>
array([0.33453603])
Coordinates:
  * subject  (subject) <U6 'subj03'
Attributes:
    raw:      <xarray.Score (subject: 1)>\narray([0.33000082], dtype=float32)...
    ceiling:  <xarray.Score ()>\narray(0.32552709)\nAttributes:\n    raw:    ...


<xarray.Score (subject: 1)>
array([0.23745752])
Coordinates:
  * subject  (subject) <U6 'subj04'
Attributes:
    raw:      <xarray.Score (subject: 1)>\narray([0.2938487], dtype=float32)\...
    ceiling:  <xarray.Score ()>\narray(0.36363159)\nAttributes:\n    raw:    ...


<xarray.Score (subject: 1)>
array([0.25326535])
Coordinates:
  * subject  (subject) <U6 'subj05'
Attributes:
    raw:      <xarray.Score (subject: 1)>\narray([0.33371824], dtype=float32)...
    ceiling:  <xarray.Score ()>\narray(0.439728)\nAttributes:\n    raw:      ...


<xarray.Score (subject: 1)>
array([0.21597222])
Coordinates:
  * subject  (subject) <U6 'subj06'
Attributes:
    raw:      <xarray.Score (subject: 1)>\narray([0.2773865], dtype=float32)\...
    ceiling:  <xarray.Score ()>\narray(0.35626462)\nAttributes:\n    raw:    ...


<xarray.Score (subject: 1)>
array([0.49076643])
Coordinates:
  * subject  (subject) <U6 'subj07'
Attributes:
    raw:      <xarray.Score (subject: 1)>\narray([0.47953454], dtype=float32)...
    ceiling:  <xarray.Score ()>\narray(0.4685597)\nAttributes:\n    raw:     ...
<xarray.Score (subject: 1)>
array([0.39992187])
Coordinates:
  * subject  (subject) <U6 'subj08'
Attributes:
    raw:      <xarray.Score (subject: 1)>\narray([0.40743586], dtype=float32)...
    ceiling:  <xarray.Score ()>\narray(0.41509104)\nAttributes:\n    raw:    ...
Benchmark score (ceiling-normalized): 0.3482
Benchmark raw:                        0.3785
Benchmark ceiling:                    0.4233
Time: 3.4s

Per-subject raw scores from benchmark:
  subj01: 0.3983
  subj02: 0.5074
  subj03: 0.3300
  subj04: 0.2938
  subj05: 0.3337
  subj06: 0.2774
  subj07: 0.4795
  subj08: 0.4074



In [11]:
from sklearn.linear_model import RidgeCV
from scipy.stats import pearsonr
from brainscore_vision.metrics.regression_correlation.metric import ALPHA_LIST

# Get model activations and neural data from benchmark internals
model_train_act = benchmark.train_activations.values
model_test_act = benchmark.test_activations.values
if model_train_act.ndim > 2:
    model_train_act = model_train_act.squeeze()
    model_test_act = model_test_act.squeeze()

train_neural = benchmark.train_assembly
test_neural = benchmark.test_assembly
if 'time_bin' in train_neural.dims:
    train_neural = train_neural.isel(time_bin=0)
if 'time_bin' in test_neural.dims:
    test_neural = test_neural.isel(time_bin=0)

print(f'Model activations: {model_train_act.shape} train, {model_test_act.shape} test')
print(f'Neural data:       {train_neural.shape} train, {test_neural.shape} test')

# Standalone ridge regression per subject
subjects = np.unique(train_neural['subject'].values)
standalone_results = {}

for subj_label in subjects:
    subj_mask_train = train_neural['subject'].values == subj_label
    subj_mask_test = test_neural['subject'].values == subj_label

    y_train = train_neural.values[:, subj_mask_train]
    y_test = test_neural.values[:, subj_mask_test]

    ridge = RidgeCV(alphas=ALPHA_LIST, fit_intercept=True)
    ridge.fit(model_train_act, y_train)
    y_pred = ridge.predict(model_test_act)

    n_neuroids = y_test.shape[1]
    correlations = np.array([
        pearsonr(y_pred[:, v], y_test[:, v])[0]
        for v in range(n_neuroids)
    ])

    median_r = np.median(correlations)
    standalone_results[subj_label] = {
        'median_r': median_r,
        'alpha': ridge.alpha_,
        'n_neuroids': n_neuroids,
    }
    print(f'{subj_label}: median r = {median_r:.4f}, '
          f'alpha = {ridge.alpha_:.0f}, '
          f'n_neuroids = {n_neuroids}')

standalone_mean_r = np.mean([v['median_r'] for v in standalone_results.values()])
print(f'\nStandalone mean raw r: {standalone_mean_r:.4f}')

Model activations: (412, 64896) train, (103, 64896) test
Neural data:       (412, 1660) train, (103, 1660) test
subj01: median r = 0.4239, alpha = 100000, n_neuroids = 325
subj02: median r = 0.5202, alpha = 75000, n_neuroids = 311
subj03: median r = 0.3390, alpha = 150000, n_neuroids = 184
subj04: median r = 0.3132, alpha = 200000, n_neuroids = 140
subj05: median r = 0.3531, alpha = 150000, n_neuroids = 275
subj06: median r = 0.2850, alpha = 200000, n_neuroids = 163
subj07: median r = 0.4835, alpha = 150000, n_neuroids = 119
subj08: median r = 0.4275, alpha = 150000, n_neuroids = 143

Standalone mean raw r: 0.3932


In [12]:
# Compare standalone vs benchmark results
print('=== Standalone Ridge vs Benchmark Comparison (V4) ===')
print(f'{"Subject":<10} {"Standalone":>12} {"Benchmark":>12} {"Diff":>12}')
print('-' * 50)

for subj_label in subjects:
    standalone_r = standalone_results[subj_label]['median_r']
    bm_r = benchmark_per_subj.get(subj_label, float('nan'))
    diff = abs(standalone_r - bm_r) if not np.isnan(bm_r) else float('nan')
    print(f'{subj_label:<10} {standalone_r:>12.4f} {bm_r:>12.4f} {diff:>12.4f}')

benchmark_raw = float(benchmark_score.raw.values)
print(f'\n{"Mean":>10} {standalone_mean_r:>12.4f} {benchmark_raw:>12.4f} '
      f'{abs(standalone_mean_r - benchmark_raw):>12.4f}')

print(f'\nNote: Small differences are expected because the benchmark uses')
print(f'a cross-validated ridge metric that may handle multi-output fitting')
print(f'differently from sklearn\'s single RidgeCV.')

=== Standalone Ridge vs Benchmark Comparison (V4) ===
Subject      Standalone    Benchmark         Diff
--------------------------------------------------
subj01           0.4239       0.3983       0.0255
subj02           0.5202       0.5074       0.0128
subj03           0.3390       0.3300       0.0090
subj04           0.3132       0.2938       0.0194
subj05           0.3531       0.3337       0.0194
subj06           0.2850       0.2774       0.0076
subj07           0.4835       0.4795       0.0040
subj08           0.4275       0.4074       0.0201

      Mean       0.3932       0.3785       0.0147

Note: Small differences are expected because the benchmark uses
a cross-validated ridge metric that may handle multi-output fitting
differently from sklearn's single RidgeCV.


## Summary

| Validation | What it proves |
|---|---|
| Section 1: NB02 -> global z-score -> assembly | NB03 packaging step is correct |
| Section 1: Voxel spot checks | Individual values trace correctly |
| Section 2: Raw HDF5 -> assembly (subj01, V4) | Full preprocessing chain is correct |
| Section 2: Neuroid ordering | Subject-major layout with correct region/subject order |
| Section 3: Standalone ridge vs benchmark | Benchmark raw correlations agree with an independent RidgeCV fit to within about 0.03 per subject (mean difference 0.0147) |

In [13]:
print('=== Volumetric Assembly Validation Summary ===')
print()
print('Section 1 (Voxel-Level Spot Check):')
print(f'  NB02->assembly train max_diff: {max_diff:.2e} [PASS]')
print(f'  NB02->assembly test max_diff:  {max_diff_test:.2e} [PASS]')
print(f'  5/5 spot checks: PASS')
print()
print('Section 2 (End-to-End Raw Data Bypass):')
print(f'  Raw HDF5->assembly train max_diff: {train_max_diff:.2e} [PASS]')
print(f'  Raw HDF5->assembly test max_diff:  {test_max_diff:.2e} [PASS]')
print(f'  Subject contiguity:          {"PASS" if subj_contiguous else "FAIL"}')
print(f'  Region order within subject: {"PASS" if region_order_ok else "FAIL"}')
print(f'  Subject order within region: {"PASS" if subj_order_ok else "FAIL"}')
print()
print('Section 3 (Standalone Ridge):')
print(f'  Standalone mean r: {standalone_mean_r:.4f}')
print(f'  Benchmark mean r:  {benchmark_raw:.4f}')
print(f'  Difference:        {abs(standalone_mean_r - benchmark_raw):.4f}')
print()
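# Section 3 is reported for information only: no threshold is applied to the
# standalone-vs-benchmark difference, so it does not enter all_passed.
all_passed = (max_diff < 1e-5 and max_diff_test < 1e-5 and all_pass
              and train_max_diff < 1e-5 and test_max_diff < 1e-5
              and subj_contiguous and region_order_ok and subj_order_ok)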
print(f'All validations passed: {"YES" if all_passed else "NO"}')

=== Volumetric Assembly Validation Summary ===

Section 1 (Voxel-Level Spot Check):
  NB02->assembly train max_diff: 0.00e+00 [PASS]
  NB02->assembly test max_diff:  9.54e-07 [PASS]
  5/5 spot checks: PASS

Section 2 (End-to-End Raw Data Bypass):
  Raw HDF5->assembly train max_diff: 1.91e-06 [PASS]
  Raw HDF5->assembly test max_diff:  1.91e-06 [PASS]
  Subject contiguity:          PASS
  Region order within subject: PASS
  Subject order within region: PASS

Section 3 (Standalone Ridge):
  Standalone mean r: 0.3932
  Benchmark mean r:  0.3785
  Difference:        0.0147

All validations passed: YES
