# Classifying trialwise CorrectGo and NoGo trials

There are a number of steps to this. Hopefully we can recycle previous code and be up and running fairly quickly!

1. Load beta data. Ideally this step caches the result in a pure Python object so we don't have to reload it each time.
2. Preprocess the data.
3. Do cross-validated training and testing. Ideally this uses an inner loop to select the best parameters, an outer loop to estimate cross-validated performance, and a final fit over all the data to produce an image. The inner loop can probably be handled within the package we use.
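
As a rough illustration of step 3, the nested scheme can be sketched with scikit-learn on synthetic data. Everything here is an assumption for illustration: the random `X`, `y`, and `groups` arrays, the `LinearSVC` estimator, and the `C` grid are stand-ins, not the real beta data or the model used later in this notebook. nilearn's `Decoder` is documented as selecting hyperparameters through its own cross-validation, so in practice the inner loop may not need to be written by hand.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data: 6 "subjects" with 20 trials each (hypothetical sizes).
rng = np.random.default_rng(0)
n_subjects, trials_per_subject, n_features = 6, 20, 10
groups = np.repeat(np.arange(n_subjects), trials_per_subject)
X = rng.normal(size=(groups.size, n_features))
y = rng.integers(0, 2, size=groups.size)


def make_search():
    """Inner loop: grid search over C, with folds that keep subjects together."""
    return GridSearchCV(
        make_pipeline(StandardScaler(), LinearSVC()),
        param_grid={"linearsvc__C": [0.1, 1.0, 10.0]},
        cv=GroupKFold(n_splits=2),
        scoring="roc_auc",
    )


# Outer loop: estimate performance on subjects the inner loop never saw.
outer_scores = []
for train_idx, test_idx in GroupKFold(n_splits=3).split(X, y, groups):
    search = make_search()
    search.fit(X[train_idx], y[train_idx], groups=groups[train_idx])
    outer_scores.append(search.score(X[test_idx], y[test_idx]))

# Final fit over all the data; best_estimator_ is refit on everything.
final_model = make_search().fit(X, y, groups=groups)
print(outer_scores, final_model.best_params_)
```

Because the labels here are random, the outer scores carry no meaning; the point is only the structure of the three stages.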

In [2]:
import sys
import os
import pandas as pd



sys.path.append(os.path.abspath("../../ml/"))
from apply_loocv_and_save import load_and_preprocess
from dev_utils import read_yaml_for_host
import warnings


config_data = read_yaml_for_host("sst_config.yml")



In [12]:
import multiprocessing
import math
import nibabel as nib
import nilearn as nl
from nilearn.decoding import DecoderRegressor,Decoder
from sklearn.model_selection import KFold,GroupKFold,LeaveOneOut
cpus_available = multiprocessing.cpu_count()

cpus_to_use = min(cpus_available-1,math.floor(0.9*cpus_available))
print(cpus_to_use)

25


In [3]:
from dev_wtp_io_utils import cv_train_test_sets, asizeof_fmt

In [4]:
nonbids_data_path = config_data['nonbids_data_path']
ml_data_folderpath = nonbids_data_path + "fMRI/ml"


## Set up the paradigm

In [5]:

def trialtype_resp_trans_func(X):
    # Use the trial type (e.g. 'correct-go', 'correct-stop') as the response variable.
    return X.trial_type


## Loading beta data

Beta data is generally written in `load_multisubject_brain_data_sst_w1.ipynb`.

We just have to load it.
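
Step 1 above mentions caching the loaded data so it isn't reloaded each time. A minimal sketch of one way to do that with the standard library follows. The helper name `load_with_cache`, the cache file name, and the stand-in loader are all hypothetical; the real loader in this notebook is `load_and_preprocess`, and whether its return value pickles cleanly has not been checked here.

```python
import pickle
import tempfile
from pathlib import Path


def load_with_cache(cache_path, loader):
    """Return the cached object if it exists; otherwise call loader() and cache it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    data = loader()
    with cache_path.open("wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return data


# Stand-in for an expensive load; records how often it is actually called.
calls = []


def slow_loader():
    calls.append(1)
    return {"y": ["correct-go", "correct-stop"], "groups": ["DEV005", "DEV005"]}


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "beta_cache.pkl"  # hypothetical file name
    first = load_with_cache(cache_file, slow_loader)   # runs the loader
    second = load_with_cache(cache_file, slow_loader)  # reads the pickle instead
```

The cache is only as fresh as the file on disk, so it would need to be deleted by hand whenever the underlying beta series changes.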

In [14]:
brain_data_filepath = ml_data_folderpath + '/SST/Brain_Data_betaseries_15subs_correct_cond.pkl'
warnings.warn("not sure if this file holds up--it was created in 2021; need to see if it's still valid")
train_test_markers_filepath = ml_data_folderpath + "/train_test_markers_20220818T144138.csv"



In [7]:


all_subjects = load_and_preprocess(
    brain_data_filepath,
    train_test_markers_filepath,
    subjs_to_use = None,
    response_transform_func = trialtype_resp_trans_func,
    clean=None)

warnings.warn("Only the fMRIPrep pipeline has been applied; no additional cleaning has been done to the data at any point.")


checked for intersection and no intersection between the brain data and the subjects was found.
there were 15 subjects overlapping between the subjects marked for train data and the training dump file itself.
test_train_set: 62918
pkl_file: 168
brain_data_filepath: 152
train_test_markers_filepath: 141
response_transform_func: 136
sys: 72
Brain_Data_allsubs: 48
clean: 16
subjs_to_use: 16


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  Brain_Data_allsubs.Y[Brain_Data_allsubs.Y=='NULL']=None


1549
1549
cleaning memory




I'm not 100% sure this is good data, but it looks good; let's try it for now and move ahead. How do we classify it?

In [8]:
all_subjects['metadata'][0:10]

Unnamed: 0,onset,duration,trial_type,subject,wave,beta
0,0.0,2.25834,correct-stop,DEV005,wave1,beta_0001.nii
1,2.75834,0.40082,correct-go,DEV005,wave1,beta_0003.nii
2,5.5139,0.66191,correct-go,DEV005,wave1,beta_0005.nii
3,12.5278,0.51712,correct-go,DEV005,wave1,beta_0009.nii
4,15.90975,0.39906,correct-go,DEV005,wave1,beta_0011.nii
5,18.35212,0.33093,correct-go,DEV005,wave1,beta_0013.nii
6,24.42644,0.72669,correct-go,DEV005,wave1,beta_0017.nii
7,33.87298,0.43048,correct-go,DEV005,wave1,beta_0021.nii
8,38.87993,0.40215,correct-go,DEV005,wave1,beta_0023.nii
9,42.5723,0.50627,correct-go,DEV005,wave1,beta_0025.nii


In [46]:
all_subjects['y'].value_counts()

correct-go      1378
correct-stop     171
Name: trial_type, dtype: int64

In [45]:
type(all_subjects['X'])

nibabel.nifti1.Nifti1Image

We need to select just the trials we want to classify.

In [35]:
# groups_to_classify = ['correct-go','correct-stop']


# # create a vector of booleans that will be used to filter the data
# include_vec = all_subjects['y'].apply(lambda x: True if x in groups_to_classify else False)

# #print the shape of each object in the dictionary
# for key in all_subjects.keys():
#     print(key, all_subjects[key].shape)

# all_sub_go_stop={}
# # great, now use y to pick out the items we want, then filter the other objects accordingly.
# all_sub_go_stop['y'] = all_subjects['y'][include_vec]
# #slice the nifti object with the vector of booleans
# all_sub_go_stop['X'] = all_subjects['X'].slice(include_vec, axis=0)

# all_sub_go_stop['groups'] = all_subjects['groups'][include_vec]
# all_sub_go_stop['metadata'] = all_subjects['metadata'][include_vec]

X (97, 115, 97, 1536)
y (1536,)
groups (1536,)
metadata (1536, 6)


TypeError: 'SpatialFirstSlicer' object is not callable
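
The `TypeError` above is consistent with calling `slicer` as a function: nibabel exposes `slicer` as an attribute that is indexed with square brackets, not called. Below is a minimal sketch of the boolean selection itself, using a small NumPy array as a stand-in for the 4D image data. The `failed-stop` label and the array sizes are hypothetical. For an actual NIfTI image, `nilearn.image.index_img` is the analogous call for picking volumes along the last axis, though it has not been run against this data here.

```python
import numpy as np
import pandas as pd

# Toy labels, one per trial; 'failed-stop' is a made-up extra class to filter out.
y = pd.Series(["correct-stop", "correct-go", "failed-stop", "correct-go"])
groups = pd.Series(["DEV005", "DEV005", "DEV006", "DEV006"])

# Stand-in for the (x, y, z, trial) array; each volume is filled with its trial index.
data_4d = np.zeros((2, 2, 2, len(y)))
data_4d[...] = np.arange(len(y))

groups_to_classify = ["correct-go", "correct-stop"]
include_vec = y.isin(groups_to_classify).to_numpy()

# Apply the same boolean vector to every per-trial object.
y_selected = y[include_vec]
groups_selected = groups[include_vec]
data_selected = data_4d[..., include_vec]  # trials live on the last axis

print(data_selected.shape, list(y_selected))
```

Trials are indexed on the last axis of the 4D array, which is why the mask goes after the ellipsis rather than on `axis=0`.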

In [16]:
# get the PFC mask
mask_nifti = nib.load(ml_data_folderpath + '/prefrontal_cortex.nii.gz')

full_img = all_subjects['X']
print(full_img.shape)

# #now apply the mask to the data using nibabel
# masked_img = nl.masking.apply_mask(full_img, mask_nifti)


(97, 115, 97, 1549)


In [18]:
#convert the y array to an integer array representing the string values of the y array
all_subjects['y_cat'] = all_subjects['y'].astype('category')
all_subjects['y_int']=all_subjects['y_cat'].cat.codes
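
To keep track of which integer stands for which trial type, the mapping can be read back from the categorical. This is a small sketch on a toy series, not the real `all_subjects['y']`. pandas sorts string categories lexically by default, so `correct-go` should come out as 0 and `correct-stop` as 1, which matters when interpreting `roc_auc`.

```python
import pandas as pd

# Toy labels standing in for all_subjects['y'].
y = pd.Series(["correct-stop", "correct-go", "correct-go"])

y_cat = y.astype("category")
y_int = y_cat.cat.codes

# Position in .cat.categories is the integer code.
code_to_label = dict(enumerate(y_cat.cat.categories))
print(code_to_label)
print(list(y_int))
```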

# Training

I'm going to start with `cv_train_test_sets` and see how that goes. It seems likely it'll have to be rewritten somewhat, but it might be a good starting point.

In [20]:
dec_main = Decoder(standardize=True,cv=GroupKFold(3),scoring='roc_auc',n_jobs=cpus_to_use,mask=mask_nifti)
cv_results = cv_train_test_sets(
    trainset_X = all_subjects['X'],
    trainset_y = all_subjects['y_int'],
    trainset_groups = all_subjects['metadata']['subject'],
    decoders = [dec_main],
    cv=KFold(n_splits=3) # KFold rather than GroupKFold, because the split is already made over groups (subjects)
    )

Groups are the same.
fold 1 of 3
In order to test on a training group of 10 items, holding out the following subjects:['DEV019' 'DEV018' 'DEV023' 'DEV010' 'DEV009']. prepping fold data.... fitting.... 8.2 GiB. trying decoder 1 of 1. predicting. test score was:. 0.4849092749222872
fold 2 of 3
In order to test on a training group of 10 items, holding out the following subjects:['DEV005' 'DEV021' 'DEV012' 'DEV006' 'DEV022']. prepping fold data.... fitting.... 8.5 GiB. trying decoder 1 of 1. 



predicting. test score was:. 0.4525170068027211
fold 3 of 3
In order to test on a training group of 10 items, holding out the following subjects:['DEV016' 'DEV014' 'DEV015' 'DEV017' 'DEV013']. prepping fold data.... fitting.... 8.3 GiB. trying decoder 1 of 1. 



predicting. test score was:. 0.5216873706004141
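
The log above shows whole subjects being held out in each fold, which matches the inline comment about using `KFold` over groups. A minimal sketch of that idea follows, assuming `cv_train_test_sets` splits the list of unique subjects and then maps back to trial rows; its actual implementation is not shown in this notebook. The subject IDs are borrowed from the log, but the trial counts are made up.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy per-trial subject labels (trial counts are hypothetical).
trial_subjects = np.array(
    ["DEV005"] * 3 + ["DEV006"] * 2 + ["DEV009"] * 4
    + ["DEV010"] * 3 + ["DEV012"] * 2 + ["DEV013"] * 3
)
unique_subjects = np.unique(trial_subjects)

folds = []
# Split the subjects, not the trials, so no subject appears on both sides.
for train_sub_idx, test_sub_idx in KFold(n_splits=3).split(unique_subjects):
    held_out = unique_subjects[test_sub_idx]
    test_mask = np.isin(trial_subjects, held_out)
    folds.append({
        "held_out": set(held_out),
        "train_rows": np.flatnonzero(~test_mask),
        "test_rows": np.flatnonzero(test_mask),
    })

for fold in folds:
    train_subjects = set(trial_subjects[fold["train_rows"]])
    print(sorted(fold["held_out"]), "overlap with train:", train_subjects & fold["held_out"])
```

Applying plain `KFold` to trial rows instead would let trials from the same subject land in both train and test, which is the leak `GroupKFold` is meant to prevent.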
