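Importing Libraries:  
The notebook imports the following Python modules at the start:
-	os, sys, time, pprint, pathlib, random, numpy (np alias), torch, pandas (pd alias), matplotlib.pyplot (plt alias), ipywidgets, tqdm.notebook, nibabel (nib alias), bids, scipy.ndimage, h5py, einops, sklearn, fracridge, tc2see, metrics, noise_ceiling. The glmsingle imports are present but commented out.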

In [12]:
%load_ext autoreload
%autoreload 2

import os
import sys
import time
from pprint import pprint
from pathlib import Path
from random import randint

import numpy as np
import torch
import pandas as pd
import matplotlib.pyplot as plt
from ipywidgets import interact
from tqdm.notebook import tqdm
import nibabel as nib
# import glmsingle
# from glmsingle.glmsingle import GLM_single
import bids
from bids import BIDSLayout
from scipy.ndimage import zoom, binary_dilation
import h5py
from einops import rearrange

# Add the directory two levels above the working directory to the import path
dir2 = os.path.abspath('..')
dir1 = os.path.dirname(dir2)
if dir1 not in sys.path:
    sys.path.append(dir1)
    
from tc2see import load_data

from sklearn.model_selection import KFold
from fracridge import FracRidgeRegressorCV
from metrics import (
    cosine_distance, squared_euclidean_distance, r2_score, two_versus_two,
    two_versus_two_slow
)
import warnings

from noise_ceiling import (
    compute_ncsnr,
    compute_nc,
)



Setting Paths and Variables:  
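This cell sets up the directory paths used for loading and saving data:
-	tc2see_version specifies the version of the dataset. dataset_root, derivatives_path, and data_path point to the dataset, its derivatives_TC2See folder, and the fmriprep folder inside it.
-	subject (the subject being analyzed) and tr (the repetition time, i.e. the time in seconds between volume acquisitions in the fMRI data) are set in the analysis cell further down.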

In [13]:
dataset_root = Path('E:\\fmri_processing\\results')
tc2see_version = 3
dataset_path = dataset_root
derivatives_path = dataset_path / 'derivatives_TC2See'
data_path = derivatives_path / 'fmriprep'
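
Because the root path above is specific to one machine, it can help to check the folder layout before loading anything. The following is a minimal sketch that assumes the same folder names as the cell above; `build_paths` and `missing_paths` are hypothetical helpers, not part of tc2see, and `'results'` is a placeholder root.

```python
from pathlib import Path


def build_paths(dataset_root):
    """Derive the folders used by the notebook from a single root (hypothetical helper)."""
    dataset_root = Path(dataset_root)
    derivatives_path = dataset_root / 'derivatives_TC2See'
    return {
        'dataset': dataset_root,
        'derivatives': derivatives_path,
        'data': derivatives_path / 'fmriprep',
    }


def missing_paths(paths):
    """Return the names of all folders that do not exist on disk."""
    return [name for name, path in paths.items() if not path.exists()]


paths = build_paths('results')  # placeholder root; substitute your own location
print(missing_paths(paths))     # lists the names of any missing folders
```

Running a check like this first turns a wrong root into a short list of missing folders instead of an error deep inside the analysis cell.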

Computing Accuracy, Standard Dev, etc:  
This code evaluates how accurately the CLIP embeddings of the stimuli can be decoded from a subject's fMRI responses. It uses leave-one-run-out cross-validation: each run serves once as the test set while the remaining runs are used for training. For each test run, the code loads the preprocessed fMRI data, selects voxels (by noise-ceiling threshold, by threshold within a bounding box mask, or by taking the top voxels ranked by noise ceiling), and uses the selected brain responses as input features for a regression model (FracRidgeRegressorCV) that predicts the CLIP embedding of each stimulus. Predictions are scored with a two-versus-two test on the cosine distances between the ground-truth and predicted embeddings, for which chance is 50%. Accuracy is accumulated over test runs and iterations, and the cell prints the mean accuracy per iteration and overall. Variance, standard deviation, minimum, and maximum are intended outputs as well, but in the version shown the variance lines are commented out. This cell is the main place to edit when trying different voxel selections or tests.  
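
The evaluation loop described above can be sketched on synthetic data. This is an illustration under stated assumptions, not the notebook's implementation: `two_versus_two_sketch` is a hypothetical stand-in for `two_versus_two` from the `metrics` module (whose exact behavior, including its use of `stimulus_ids`, is not shown here), and a closed-form ridge regression stands in for `FracRidgeRegressorCV`.

```python
import numpy as np


def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge weights: (X^T X + alpha I)^-1 X^T Y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)


def cosine_distance_matrix(pred, true):
    """Return D where D[i, j] = 1 - cos(pred[i], true[j])."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    true = true / np.linalg.norm(true, axis=1, keepdims=True)
    return 1.0 - pred @ true.T


def two_versus_two_sketch(distances):
    """Fraction of pairs (i, j) whose matched pairing is closer than the swapped one."""
    diag = np.diag(distances)
    matched = diag[:, None] + diag[None, :]   # d(i, i) + d(j, j)
    swapped = distances + distances.T         # d(i, j) + d(j, i)
    upper = np.triu_indices(distances.shape[0], k=1)  # each unordered pair once
    return float(np.mean(matched[upper] < swapped[upper]))


rng = np.random.default_rng(0)
num_runs, trials_per_run, num_voxels, embed_dim = 6, 30, 40, 16

# Synthetic data: embeddings Y drive voxel responses X through a linear map plus noise
W = rng.normal(size=(embed_dim, num_voxels))
Y = rng.normal(size=(num_runs * trials_per_run, embed_dim))
X = Y @ W + 0.5 * rng.normal(size=(num_runs * trials_per_run, num_voxels))
run_ids = np.repeat(np.arange(num_runs), trials_per_run)

fold_accuracies = []
for test_run_id in range(num_runs):
    train, test = run_ids != test_run_id, run_ids == test_run_id
    weights = ridge_fit(X[train], Y[train])           # fit on the other runs
    distances = cosine_distance_matrix(X[test] @ weights, Y[test])
    fold_accuracies.append(100 * two_versus_two_sketch(distances))

print(np.round(fold_accuracies, 2), round(float(np.mean(fold_accuracies)), 2))
```

Because the synthetic responses are a low-noise linear function of the embeddings, the accuracies here land far above the 50% chance level. Real fMRI data is much noisier.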
Two code cells implement this analysis. The first computes all of the values in a single pass and gives the accuracies on the real data. The second repeats the computation over 9 iterations (this number can easily be changed, and probably should be increased) with shuffled data. The shuffled results form a chance baseline against which the real accuracies are compared, to check whether the real data gives meaningful results.
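
The shuffled-data comparison can be sketched as follows. In the notebook, the shuffle is applied to `Y_train` when `permutation_test` is set to `True`; the sketch below does the same on synthetic data. The helper names, the closed-form ridge model, and the empirical p-value at the end are assumptions added for illustration and do not appear in the notebook.

```python
import numpy as np


def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge weights (stand-in for FracRidgeRegressorCV)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)


def two_versus_two_sketch(pred, true):
    """Two-versus-two accuracy on cosine distances (hypothetical stand-in)."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    true = true / np.linalg.norm(true, axis=1, keepdims=True)
    d = 1.0 - pred @ true.T
    diag = np.diag(d)
    upper = np.triu_indices(d.shape[0], k=1)
    matched = (diag[:, None] + diag[None, :])[upper]
    swapped = (d + d.T)[upper]
    return float(np.mean(matched < swapped))


rng = np.random.default_rng(1)
n_train, n_test, num_voxels, embed_dim = 120, 30, 40, 16
W = rng.normal(size=(embed_dim, num_voxels))
Y = rng.normal(size=(n_train + n_test, embed_dim))
X = Y @ W + 0.5 * rng.normal(size=(n_train + n_test, num_voxels))
X_train, X_test = X[:n_train], X[n_train:]
Y_train, Y_test = Y[:n_train], Y[n_train:]


def decode_accuracy(Y_fit):
    """Train on (X_train, Y_fit) and score on the untouched test set, in percent."""
    weights = ridge_fit(X_train, Y_fit)
    return 100 * two_versus_two_sketch(X_test @ weights, Y_test)


real_accuracy = decode_accuracy(Y_train)

# Shuffling the training targets breaks the link between responses and embeddings
null_accuracies = np.array([
    decode_accuracy(Y_train[rng.permutation(n_train)]) for _ in range(9)
])

# Empirical p-value with the usual +1 correction
p_value = (1 + np.sum(null_accuracies >= real_accuracy)) / (1 + len(null_accuracies))
print(real_accuracy, null_accuracies.mean(), p_value)
```

With only 9 shuffles the smallest attainable p-value is 0.1, which is one reason to increase the number of iterations.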

In [15]:
def bb_mask(mask, vc_height_min, vc_height_max, vc_width, vc_depth):
    brain_width, brain_depth, brain_height = mask.shape
    vc_center = np.array([brain_width//2, 0, 0])
    vc_bl = vc_center + np.array([-vc_width,0,vc_height_min]) # bottom left
    vc_tr = vc_center + np.array([vc_width,vc_depth,vc_height_max]) # top right
    vc_mask = np.zeros_like(mask)
    vc_mask[vc_bl[0]:vc_tr[0], vc_bl[1]:vc_tr[1], vc_bl[2]:vc_tr[2]] = True # boolean array
    return vc_mask[mask] # flattens both 3D arrays into a one dimensional vector (True values inside the bb, False values outside). Intersection of bb and brain

accuracies = {}
# subjs = [str(sub) if sub >= 10 else '0'+str(sub) for sub in range(1,30)] 
subjs = ['31']
for subj in tqdm(subjs):
    # try:
        tr = 2 # 1.97
        subject_no = subj 
        subject = f'sub-{subject_no}'

        bold, stimulus_ids, mask, affine = load_data(
            data_path / f'tc2see-v{tc2see_version}-bold-test-31.hdf5', 
            subject,
            tr_offset=6 / tr,
            run_normalize='linear_trend',
            interpolation=False,
        )

        model_name = 'ViT-B=32'
        embedding_name = 'embedding' 

        # load the clip embeddings
        with h5py.File(derivatives_path / f'{model_name}-features.hdf5', 'r') as f:
            stimulus = f[embedding_name][:]
        Y = stimulus[stimulus_ids] # get the stimulus representations to decode


        subject = f'sub-{subject_no}'
        # 6 runs; each run serves as the test set once (a run is one stretch of approx. 6 minutes during which the participant views stimuli in the scanner)
        results = {}
        permutation_test = False
        nc_threshold = 9
        iterations = 1
        num_runs = 6

        max_tot_acc = 0
        threshold_for_max = 0

        all_itters_avg = 0
        all_itters_var = 0
        all_itters_std = 0
        all_itters_max = 0
        all_itters_min = 0

        for iteration in tqdm(range(iterations)):
            itter_accuracy = 0
            itter_variance = 0
            
            # Cross validation. Use every id as test data once.
            for test_run_id in tqdm(range(num_runs)):
                training_run_ids = list(range(num_runs))
                training_run_ids.remove(test_run_id) # Remove the test data id 

                load_data_params = dict(
                    path = data_path / f'tc2see-v{tc2see_version}-bold-test-31.hdf5', 
                    subject = subject,
                    tr_offset = 6 / tr,  # same offset as the first load_data call above; unrelated to num_runs
                    run_normalize='linear_trend',
                    interpolation=False,
                )

                bold_train, stimulus_ids_train, mask, affine = load_data(
                    **load_data_params,
                    run_ids = training_run_ids
                )

                mask = mask[mask] # flatten mask

                bold_test, stimulus_ids_test, _, _ = load_data(
                    **load_data_params,
                    run_ids = [test_run_id]
                )


                # argsort_ids = np.argsort(-nc_vc) # Default ascending, make descending
                # argsort_ids = argsort_ids[:5000] # Up to 500 voxels (go about by around powers of 2)
                # selection_mask = (nc > nc_threshold) & vc_mask
                # print(f'{nc_threshold=}, num_voxels={(nc > nc_threshold).sum()}')
                # X_train = bold_train[:, selection_mask] # X's are the brain responses (brain numbers in response to images)  (Within noise ceiling threshold and bounding box)
                # X_train = bold_train[:, argsort_ids] # X's are the brain responses (brain numbers in response to images) (With limited voxel amounts)
                # X_test = bold_test[:, argsort_ids]
                # X_test = bold_test[:, vc_mask]
                
                print(bold.shape)
                ncsnr = compute_ncsnr(bold_train, stimulus_ids_train) # Compute noise ceiling noise ratio
                nc = compute_nc(ncsnr, num_averages=1)

                nc_vc = nc.copy() 
                nc_vc[~mask] = 0 # Set values not in mask to zero 
                argsort_ids = np.argsort(-nc_vc) # Default ascending, make descending 
                argsort_ids = argsort_ids[:256] 
                X_train = bold_train[:, argsort_ids]    

                # ##################################
                # bold_train[:, argsort_ids] = True
                # print("Number of ones in mask: ", np.count_nonzero(bold_train == 1))
                # bold_train[:, ~argsort_ids] = False
                # print("Number of zeros in mask: ", np.count_nonzero(bold_train == 0))
                # ###################################

                # flattened_mask = mask[mask]
                # X_train = bold_train[:, flattened_mask]
                # X_train = X_train[:, nc > nc_threshold] # X's are the brain responses (brain numbers in response to images)

                X_nan_train = np.isnan(X_train) # Checks if any not a number values in x and sets those to zero
                X_train[X_nan_train] = 0.

                # X_test = bold_test[:, flattened_mask]
                # X_test = X_test[:, nc > nc_threshold]
                X_test = bold_test[:, argsort_ids]
                X_nan_test = np.isnan(X_test) # Checks if any not a number values in x and sets those to zero
                X_test[X_nan_test] = 0.

                with h5py.File(derivatives_path / f'{model_name}-features.hdf5', 'r') as f:
                    stimulus = f[embedding_name][:]
                Y_train = stimulus[stimulus_ids_train] 
                Y_test = stimulus[stimulus_ids_test]

                if permutation_test:
                    ids = np.arange(Y_train.shape[0])
                    np.random.shuffle(ids)
                    Y_train = Y_train[ids]

                model = FracRidgeRegressorCV()
                model.fit(X_train, Y_train)
                Y_test_pred = model.predict(X_test) # Y_test and Y_test_pred are n x 512 matrices (n is the number of birds).

                distances = cosine_distance(
                    torch.from_numpy(Y_test[None]).float(), 
                    torch.from_numpy(Y_test_pred[:, None]).float()
                ) # Y_test(1, N, 512) & Y_test_pred(N, 1, 512) converted to pytorch arrays from np
                print(distances)
                # Chance is 50% (above 50% is good, below not great, if really close ex. 54% or 52%, prove statistically above chance)
                accuracy = round(two_versus_two(distances, stimulus_ids=stimulus_ids).item() * 100, 2) 
                itter_accuracy += accuracy
                
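                # Track the lowest and highest fold accuracy; the first fold seeds both values
                first_fold = (iteration == 0 and test_run_id == 0)
                if first_fold or accuracy < all_itters_min:
                    all_itters_min = accuracy
                if first_fold or accuracy > all_itters_max:
                    all_itters_max = accuracy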

                # itter_variance = 0
                #### 66.59 is 50 in next cell
                # variance = np.mean([(accuracy - 66.5917) ** 2])
                # total_variance += variance

                # print(f'{test_run_id=}, {accuracy=}')
            
            all_itters_avg += itter_accuracy

            print(f"Iteration {iteration} avg accuracy: ", itter_accuracy/num_runs)
            # variance = total_variance/6
            # std_dev = np.sqrt(variance)
            # print("Standard Dev", std_dev)

            # another level to print total avg and std and min/max
        accuracies[subj] = all_itters_avg/(num_runs*iterations)
        total_accuracy = all_itters_avg/(num_runs*iterations)
        print("Total Accuracy: ", total_accuracy)
    # except Exception as e:
    #     print(f"There was an error for subject {subj}: ", e)


(450, 166634)



tensor([[0.7977, 1.0910, 0.8242,  ..., 0.9765, 0.7973, 0.8976],
        [0.9113, 0.8307, 0.8458,  ..., 1.0671, 0.9270, 0.8633],
        [0.9988, 0.9060, 0.8498,  ..., 1.0243, 0.9998, 0.9083],
        ...,
        [0.9447, 1.1194, 0.9651,  ..., 1.1311, 0.9656, 1.0313],
        [0.8691, 1.0451, 0.9205,  ..., 1.0467, 0.8793, 0.9136],
        [1.1197, 1.0239, 0.9971,  ..., 1.1348, 1.0017, 0.9544]])



KeyboardInterrupt: 

In [None]:
print(accuracies)
# {'30': 50.824999999999996}

{'31': 55.13}
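
The description of the analysis cell mentions variance, standard deviation, minimum, and maximum, but the cell as shown only accumulates a running sum. One way to obtain all of them is to collect the per-fold accuracies in a list and summarize it with NumPy. The sketch below uses made-up accuracy values, and `summarize` is a hypothetical helper.

```python
import numpy as np


def summarize(fold_accuracies):
    """Summary statistics over per-fold accuracies (sample variance, ddof=1)."""
    acc = np.asarray(fold_accuracies, dtype=float)
    return {
        'mean': float(acc.mean()),
        'variance': float(acc.var(ddof=1)),
        'std': float(acc.std(ddof=1)),
        'min': float(acc.min()),
        'max': float(acc.max()),
    }


fold_accuracies = [52.0, 58.0, 55.0, 49.0, 61.0, 55.0]  # illustrative values only
stats = summarize(fold_accuracies)
print(stats)
```

In the notebook, this would mean appending `accuracy` to a list inside the cross-validation loop instead of only adding it to `itter_accuracy`.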


Running the code:  
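To run the code, the only line that needs to be changed in both files is the one that defines the list of subjects. Put the subject whose data should be analyzed inside the square brackets. For example:  
subjects = ['sub-19']  
In the accuracy cell above, the list is named subjs and takes the bare subject number (for example subjs = ['31']), because the code adds the sub- prefix itself. The HDF5 file name passed to load_data is hard-coded in that cell and may also need to be updated.  
The code cells should be run in order.  
Change the paths if your data is stored in a different location.  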

In the Computing Accuracy, Standard Dev, etc. cell, voxels can be selected in three ways, listed below. Leave exactly one X_train line and the X_test line with the same number uncommented when running the cell, and comment out the others. Option (1) selects voxels above the noise-ceiling threshold without a bounding box mask, option (2) applies the threshold within the bounding box mask, and option (3) simply takes the top 256 voxels ranked by noise ceiling. The cell as shown uses option (3).

(1) X_train = bold_train[:, nc > nc_threshold] # X's are the brain responses (brain numbers in response to images)  
(2) X_train = bold_train[:, selection_mask] # X's are the brain responses (brain numbers in response to images)  (Within noise ceiling threshold and bounding box)  
(3) X_train = bold_train[:, argsort_ids] # X's are the brain responses (brain numbers in response to images) (With limited voxel amounts)  

(1) X_test = bold_test[:, nc > nc_threshold]  
(2) X_test = bold_test[:, selection_mask]  
(3) X_test = bold_test[:, argsort_ids]  
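
The three selection options can be compared side by side on synthetic data. This is a sketch under assumptions: the noise-ceiling values are random stand-ins for the output of `compute_nc`, and `vc_mask` is a made-up boolean vector standing in for the output of `bb_mask`. The key point is that the same selector must index both `bold_train` and `bold_test`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels = 1000
bold_train = rng.normal(size=(20, n_voxels))   # trials x voxels
bold_test = rng.normal(size=(5, n_voxels))
nc = rng.uniform(0, 30, size=n_voxels)         # stand-in noise-ceiling values
vc_mask = np.zeros(n_voxels, dtype=bool)       # stand-in bounding box mask
vc_mask[:300] = True
nc_threshold = 9
top_k = 256

# (1) noise-ceiling threshold only (boolean mask)
threshold_mask = nc > nc_threshold
# (2) threshold restricted to the bounding box (boolean mask)
selection_mask = threshold_mask & vc_mask
# (3) top-k voxels by noise ceiling (integer indices, best first)
argsort_ids = np.argsort(-nc)[:top_k]

shapes = {}
selectors = {'threshold': threshold_mask, 'bounding_box': selection_mask, 'top_k': argsort_ids}
for name, selector in selectors.items():
    X_train = bold_train[:, selector]   # identical selector for train ...
    X_test = bold_test[:, selector]     # ... and test, so columns line up
    shapes[name] = (X_train.shape, X_test.shape)
    print(name, X_train.shape, X_test.shape)
```

Options (1) and (2) produce a data-dependent number of voxels, while option (3) always yields exactly `top_k` columns, which makes results easier to compare across subjects.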