# SST Individual Discriminability

This notebook measures discriminability of conditions for each subject, within runs.

It then averages discriminability across runs for any subject with more than one run (I don't think we have any, though).

Then we can compare discriminability against other subject-level variables.



Something similar has previously been done in `rsa_within_subj.ipynb`; it took the similarity between matrices representing the averages of GNG and compared them with intra-class similarity.

That file doesn't take the next step of seeing how that similarity score relates to across-subject variables, apparently because I did not find good evidence that intra-class similarity was actually higher than inter-class similarity.

Conclusions:

 - maybe I should have tried discriminability, not just similarity
 - If we're interested in discriminability, then running an ML algorithm should be superior?
 - But don't rule out going back to similarity.
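
For reference, the similarity comparison described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code from `rsa_within_subj.ipynb`; the function name `within_vs_between_similarity` and the array shapes are assumptions made for the example.

```python
import numpy as np

def within_vs_between_similarity(patterns, labels):
    """Mean pairwise Pearson correlation within classes and between classes.

    patterns: (n_trials, n_voxels) array; labels: (n_trials,) array of class names.
    """
    labels = np.asarray(labels)
    corr = np.corrcoef(patterns)               # trial-by-trial correlation matrix
    same = labels[:, None] == labels[None, :]  # True where both trials share a class
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)  # each pair once, no diagonal
    within = corr[same & upper].mean()
    between = corr[~same & upper].mean()
    return within, between

# Synthetic example (hypothetical data): two classes with different mean patterns.
rng = np.random.default_rng(0)
n_voxels = 200
go_mean, stop_mean = rng.normal(size=n_voxels), rng.normal(size=n_voxels)
patterns = np.vstack([
    go_mean + rng.normal(scale=1.0, size=(20, n_voxels)),
    stop_mean + rng.normal(scale=1.0, size=(10, n_voxels)),
])
labels = np.array(["correct-go"] * 20 + ["correct-stop"] * 10)

within, between = within_vs_between_similarity(patterns, labels)
print(f"within={within:.2f}, between={between:.2f}")
```

If within-class similarity is not reliably higher than between-class similarity in the real data, a classifier-based measure may still pick up differences that are carried by a subset of voxels.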

## TESQ

### Machine learning

First set up.

In [10]:
import pickle
from IPython.display import display, HTML, Markdown

In [11]:
from nilearn.decoding import Decoder

In [12]:
from sklearn.model_selection import StratifiedKFold
from random import randint
import math

In [13]:
import sys
import os
import gc
from os import path

import numpy as np  # used throughout; imported explicitly rather than relying on the wildcard imports below
import pandas as pd
import nibabel as nib

sys.path.append(os.path.abspath("../../ml/"))

from apply_loocv_and_save import *
from dev_wtp_io_utils import *



nonbids_data_path = "/gpfs/projects/sanlab/shared/DEV/nonbids_data/"
ml_data_folderpath = "/gpfs/projects/sanlab/shared/DEV/nonbids_data/fMRI/ml"
train_test_markers_filepath = ml_data_folderpath + "/train_test_markers_20210601T183243.csv"
test_train_df = pd.read_csv(train_test_markers_filepath)

all_sst_events= pd.read_csv(ml_data_folderpath +"/SST/" + "all_sst_events.csv")


dataset_name = 'conditions'

from nilearn.decoding import DecoderRegressor, Decoder

script_path = '/gpfs/projects/sanlab/shared/DEV/DEV_scripts/fMRI/ml'
# HRF 2s

#get a PFC mask
#pfc_mask = create_mask_from_images(get_pfc_image_filepaths(ml_data_folderpath + "/"),threshold=10)


def trialtype_resp_trans_func(X):
    return(X.trial_type)



In [14]:
gc.collect()

100

In [15]:

#CPUS_PER_TASK is a custom variable I have set in my jupyter notebook task; fall back to 1 if it is unset
cpus_available = int(os.getenv('CPUS_PER_TASK', '1'))
print(cpus_available)
cpus_to_use = max(1, cpus_available-1)

4


In [16]:
cpus_to_use

3

In [17]:

dataset_name = 'conditions'


brain_data_filepath = ml_data_folderpath + '/SST/Brain_Data_betaseries_40subs_correct_cond.pkl'
#brain_data_filepath = ml_data_folderpath + '/SST/Brain_Data_conditions_43subs_correct_cond.pkl'

def decoderConstructor(*args, **kwargs):
    return(Decoder(scoring='accuracy',verbose=0, *args, **kwargs))


relevant_mask = None

`load_and_preprocess` has problems, but I'm going to continue using it because I'm having problems with the standardization in the nilearn package, so there's not much point in deviating. We can also use the slicer in `load_and_preprocess` if we get the subject list first.
 
To get the subject list, we'll load the data once, get the list, and then load each subject individually.

In [18]:
all_subjects = load_and_preprocess(
    brain_data_filepath,
    train_test_markers_filepath,
    subjs_to_use = None,
    response_transform_func = trialtype_resp_trans_func,
    clean=None)

all_subjects['groups']

subj_list = np.unique(all_subjects['groups'])

del all_subjects
gc.collect()

checked for intersection and no intersection between the brain data and the subjects was found.
there were 40 subjects overlapping between the subjects marked for train data and the training dump file itself.
test_train_set: 9549
pkl_file: 168
brain_data_filepath: 152
train_test_markers_filepath: 141
response_transform_func: 136
sys: 72
Brain_Data_allsubs: 48
clean: 16
subjs_to_use: 16


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  Brain_Data_allsubs.Y[Brain_Data_allsubs.Y=='NULL']=None


4229
4229


0

This takes too long. For that reason, even though there are accuracy challenges, I'd like to proceed with a shortcut: we do 10-fold cross-validation using the LeaveOneGroupOut feature, and ensure that Go and NoGo values are as evenly distributed across the groups as possible.

But we'll leave the capacity to do full LeaveOneOut in there because it's probably good to go back to in the future.
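
One way to build such evenly distributed groups is sketched below. This is an illustration under assumptions rather than the notebook's own implementation: the helper name `make_stratified_groups` and the toy labels are hypothetical, and it uses scikit-learn's `StratifiedKFold` only to assign fold labels, which are then passed to `LeaveOneGroupOut`.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, StratifiedKFold

def make_stratified_groups(y, n_groups=10, seed=0):
    """Assign each trial a fold label so that classes are spread evenly across folds."""
    y = np.asarray(y)
    groups = np.empty(len(y), dtype=int)
    skf = StratifiedKFold(n_splits=n_groups, shuffle=True, random_state=seed)
    # X is not used for splitting, so a placeholder of the right length is enough.
    for fold_id, (_, test_idx) in enumerate(skf.split(np.zeros(len(y)), y)):
        groups[test_idx] = fold_id
    return groups

# Hypothetical labels with a class imbalance similar to the SST data.
y = np.array(["correct-go"] * 80 + ["correct-stop"] * 20)
groups = make_stratified_groups(y, n_groups=10)

logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(np.zeros(len(y)), y, groups=groups):
    n_stop = np.sum(y[test_idx] == "correct-stop")
    print(len(test_idx), "test trials,", n_stop, "correct-stop")
```

Note that scikit-learn warns when a class has fewer members than the number of folds; in that case some folds contain no trials from that class, which matters for subjects with only a handful of correct-stop trials.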

First to test the technical process, let's try doing just three manually-generated groups.

OK, so this isn't very good for closely understanding what is being classified as what; we're not getting predictions for the scores under the hood. But we can take a look at the overall predictive accuracy on a non-held-out set. Note that we can't use this for assessing model fit (for that, we'd better take the average of the prediction accuracies across the folds, or run an independent train-test analysis), but we can use it to understand things like class bias or to look at the image being predicted.
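
As a rough illustration of checking for class bias, the observed and predicted labels can be cross-tabulated. The labels below are made up for the example; in the notebook the same table could be built from a `y_pred_vs_obs` data frame with `y_obs` and `y_pred` columns.

```python
import pandas as pd

# Hypothetical observed and predicted labels, in the same format as y_pred_vs_obs.
y_pred_vs_obs = pd.DataFrame({
    "y_obs":  ["correct-go"] * 8 + ["correct-stop"] * 2,
    "y_pred": ["correct-go"] * 5 + ["correct-stop"] * 3 + ["correct-stop"] * 2,
})

# Rows are observed classes, columns are predicted classes.
confusion = pd.crosstab(y_pred_vs_obs["y_obs"], y_pred_vs_obs["y_pred"])
print(confusion)

# Recall per class: the share of each observed class that was predicted correctly.
per_class_recall = {
    label: confusion.loc[label, label] / confusion.loc[label].sum()
    for label in confusion.index
}
print(per_class_recall)
```

A large off-diagonal count in one direction (for example, many correct-go trials predicted as correct-stop) would suggest the classifier is biased toward one class.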

In [19]:
def get_subject_discriminability(sample_subject):
    min_splits = 2
    # iterate through each subject; for each subject:
    display(Markdown("loading subject " + sample_subject))

    subj_i_processed_data = load_and_preprocess(
        brain_data_filepath,
        train_test_markers_filepath,
        subjs_to_use = [sample_subject],
        response_transform_func = trialtype_resp_trans_func,
        clean="standardize")
    
    display(subj_i_processed_data['y'].value_counts())
    
    display(Markdown("setting up decoder..."))
    #we use stratified Kfold
    correct_stop_count =np.sum(subj_i_processed_data['y']=='correct-stop')
    
    if correct_stop_count< min_splits:
        return(None)
        #could do one split per correct-stop
        #because there are generally very few of them
    #random_state must be an int in [0, 2**32 - 1]; randint is inclusive at both ends
    skf = StratifiedKFold(n_splits = 3,random_state= randint(0, 2**32 - 1),
                          shuffle=True)
        #for testing for now we'll use 3
        
    #do this separately for each outcome group
    decoder = Decoder(standardize=True, 
                      cv = skf, #mask = mask,
                      n_jobs = cpus_to_use,#verbose=10,
                      scoring='accuracy'
                     )

    display(Markdown("fitting"))
    #get overfit individual predictions--only way we can assess individual predictions
    decoder_result = decoder.fit(X=subj_i_processed_data['X'],y=subj_i_processed_data['y'])
    
    display(Markdown("evaluating"))
    
    predictions = decoder.predict(subj_i_processed_data['X'])
    y_pred_vs_obs = pd.DataFrame({'y_obs':subj_i_processed_data['y'],'y_pred':predictions})
    overfit_accuracy = np.sum(y_pred_vs_obs['y_obs']==y_pred_vs_obs['y_pred'])/len(y_pred_vs_obs['y_obs'])
    
    #get mean_cv_scores: cross-validated scores averaged over classes and folds; I'm not sure exactly what they mean because the package documentation is vague
    mean_cv_scores = np.mean([np.mean(c_scores) for c_name, c_scores in decoder.cv_scores_.items()])
    
    #alternative to this is to do our own cv stuff
    subj_discrim_results = {
        'mean_cv_scores':mean_cv_scores,
        'overfit_accuracy':overfit_accuracy,
        'overfit_y_pred_vs_obs': y_pred_vs_obs,
        'decoder_object' : decoder
    }
    display(subj_discrim_results)


    return(subj_discrim_results)

Not sure it actually matters to overfit here? We're interested in discriminability not to see if we really can discriminate above chance, but as an individual difference; to see if relative discriminability relates to other things we care about.

In that sense, we probably don't have to worry about overfitting.
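
If out-of-fold predictions per trial are wanted after all (the "do our own cv" alternative noted in the code comment), one option is to run the cross-validation directly in scikit-learn. The sketch below uses synthetic data and a plain `LinearSVC` pipeline; it is an assumption about how this could be done, not a drop-in replacement for the nilearn `Decoder`, which applies its own masking and screening steps.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic, imbalanced two-class data (hypothetical stand-in for one subject's trials).
rng = np.random.default_rng(0)
n_go, n_stop, n_features = 90, 15, 50
X = rng.normal(size=(n_go + n_stop, n_features))
X[n_go:, :5] += 1.5  # hypothetical signal: shift a few features for the minority class
y = np.array(["correct-go"] * n_go + ["correct-stop"] * n_stop)

model = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Each trial is predicted by a model that did not see it during training.
y_pred_cv = cross_val_predict(model, X, y, cv=cv)

print("accuracy:", accuracy_score(y, y_pred_cv))
print("balanced accuracy:", balanced_accuracy_score(y, y_pred_cv))
```

Balanced accuracy averages the recall of each class, so it is less affected by the Go/Stop imbalance than plain accuracy and may be a more comparable per-subject score.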

In [20]:
# def get_function_with_cache(function_to_run,function_path):
#     if path.exists(function_path) is False:
#         results = function_to_run()
#     else:
#         results=pickle.load(open(function_path,'rb'))
        
#     return(results)
    
def get_subject_discriminability_with_cache(sample_subject,run_desc):
    results_filepath = ml_data_folderpath + "/SST/discriminability_tt_results_" + run_desc + "_" + sample_subject + ".pkl"
    
    if path.exists(results_filepath) is False:
        subj_discrim_results = get_subject_discriminability(sample_subject)
        with open(results_filepath, 'wb') as handle:
            pickle.dump(subj_discrim_results,handle)
    else:
        with open(results_filepath, 'rb') as handle:
            subj_discrim_results = pickle.load(handle)
        display(Markdown("pre-loaded."))
        
    
    return(subj_discrim_results)

    
    
    

In [21]:
def get_all_subjs_discriminability_whole_brain(subj_list):
    results_dict = {}

    for sample_subject in subj_list:
        display(Markdown(sample_subject))
        run_desc = 'v1_whole_brain'
        results_dict[sample_subject] = get_subject_discriminability_with_cache(sample_subject,run_desc)
        
    #subjects without enough correct-stop trials return None and are skipped here
    summary_results = pd.concat([pd.DataFrame({
        'subid':[k],
        'overfit_accuracy':[v['overfit_accuracy']],
        'mean_cv_scores':[v['mean_cv_scores']]}) 
                                 for k,v in results_dict.items() if v is not None])
    
    return(summary_results)



In [None]:
summary_results = get_all_subjs_discriminability_whole_brain(subj_list)

How to deal with imbalanced classes?
We're using svc. One solution: https://chrisalbon.com/code/machine_learning/support_vector_machines/imbalanced_classes_in_svm/
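
A minimal sketch of the class-weighting approach from that link, using scikit-learn directly. The trial counts are taken from one of the subjects' value counts; whether and how the weighting can be passed through nilearn's `Decoder` depends on the nilearn version, so that part is left out here.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.utils.class_weight import compute_class_weight

y = np.array(["correct-go"] * 95 + ["correct-stop"] * 16)
classes = np.unique(y)

# 'balanced' weights are n_samples / (n_classes * class_count).
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
print(dict(zip(classes, weights)))

# The same weighting applied inside the classifier: errors on the rarer
# class are penalized more heavily during fitting.
weighted_svc = LinearSVC(class_weight="balanced", max_iter=10000)
```

With these counts the minority class receives a weight roughly six times that of the majority class, which should reduce the tendency to optimize for the more frequent label.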

### analysis

In [17]:
from analyze_results import remove_selected_outliers
from scipy.stats import pearsonr,spearmanr
from matplotlib import pyplot  # used for the scatter plots below

In [None]:
# correlate discriminability against the variables we care about.

summary_results2 = summary_results.rename(columns={
    'mean_cv_scores':'discriminability_mean_cv_scores',
    'overfit_accuracy':'discriminability_overfit_accuracy'})


individual_differences = pd.read_csv(ml_data_folderpath + "/" + data_by_ppt_name)
individual_differences = individual_differences.rename(columns={'SID':'subid'})
individual_differences['wave']=1
#individual_differences['wave'] = individual_differences['wave'].astype(object) # for compatibility with the wave column in the dataset
ind_div_combined = summary_results2.merge(individual_differences)

In [None]:

def remove_selected_outliers_tesq_study(ind_div_combined,show_plot=False):
    idc_outliers_removed = remove_selected_outliers(ind_div_combined,
    ['discriminability_overfit_accuracy','discriminability_mean_cv_scores',
        'BFI_extraversion','RMQ_locomotion','ses_aggregate','PLAN_cognitive_strategies',
     'SST_SSRT','BIS_11','BSCS','TESQ_E_suppression', 'TESQ_E_avoidance_of_temptations', 
     'TESQ_E_goal_deliberation', 'TESQ_E_controlling_temptations', 'TESQ_E_distraction',
     'TESQ_E_goal_and_rule_setting','EDM','RS','TRSQ','ROC_Crave_Regulate_Minus_Look',
     'SRHI_unhealthy',
     'cancer_promoting_minus_preventing_FFQ','bf_1'],
    show_plot=show_plot)
    return(idc_outliers_removed)

In [None]:
ind_div_combined_3sd = remove_selected_outliers_tesq_study(
    ind_div_combined,
    show_plot=True)

In [None]:
def display_discriminability_correlations(ind_div_combined_3sd):
    for neural_var in ['discriminability_overfit_accuracy','discriminability_mean_cv_scores']:
        display(Markdown("### " + neural_var))
        for correlate in ['BFI_extraversion','RMQ_locomotion','ses_aggregate','PLAN_cognitive_strategies',
                          'SST_SSRT','BIS_11','BSCS',
                          'TESQ_E_suppression', 'TESQ_E_avoidance_of_temptations', 
                          'TESQ_E_goal_deliberation', 'TESQ_E_controlling_temptations', 'TESQ_E_distraction',
                          'TESQ_E_goal_and_rule_setting',
                        'EDM','RS','TRSQ','ROC_Crave_Regulate_Minus_Look','SRHI_unhealthy']:
            display(Markdown("#### " + correlate))
            nan_rows = np.isnan(ind_div_combined_3sd[correlate]) | np.isnan(ind_div_combined_3sd[neural_var])
            cor2way_df = ind_div_combined_3sd.loc[nan_rows==False,]
            pearson_result = pearsonr(cor2way_df[neural_var],cor2way_df[correlate])
            display(HTML("r=" + format(pearson_result[0],".2f") +"; p-value=" + format(pearson_result[1],".4f")))
            spearman_result = spearmanr(cor2way_df[neural_var],cor2way_df[correlate])
            display(HTML("rho=" + format(spearman_result[0],".2f") +"; p-value=" + format(spearman_result[1],".4f")))
            cplot = pyplot.scatter(cor2way_df[neural_var],cor2way_df[correlate])
            cplot.axes.set_xlabel(neural_var)
            cplot.axes.set_ylabel(correlate)
            pyplot.show()

In [None]:
display_discriminability_correlations(ind_div_combined_3sd)

### Next steps

Probably repeat the whole process with some masks excluding, at a minimum, movement and visual cortices. We are also using a 40-subject dataset. It needs to be extended to 84 subjects, even though this is going to be difficult because the dataset is so much more detailed. We need to get good at storing only a minimal amount of data at a time.

In fact, extending to 84 subjects is probably the very first thing we need to handle.
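
One small step toward storing less per subject would be to keep only the scalar scores once a subject has been processed, rather than holding every fitted decoder and prediction table in memory. The helper below is a hypothetical sketch (`to_minimal_result` is not an existing function in this project), and the numbers are placeholders.

```python
def to_minimal_result(subj_discrim_results):
    """Keep only the scalar scores from a per-subject result dict (or None)."""
    if subj_discrim_results is None:
        return None
    return {
        "mean_cv_scores": float(subj_discrim_results["mean_cv_scores"]),
        "overfit_accuracy": float(subj_discrim_results["overfit_accuracy"]),
    }

# Placeholder result in the same shape as the per-subject dict built earlier;
# object() stands in for the large prediction table and decoder.
full_result = {
    "mean_cv_scores": 0.36,
    "overfit_accuracy": 0.73,
    "overfit_y_pred_vs_obs": object(),
    "decoder_object": object(),
}

minimal = to_minimal_result(full_result)
print(minimal)
```

The full results would still be available from the per-subject pickle files if they are needed later.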

### Repeating the above with 84 subjects

In [22]:
dataset_name = 'conditions'


#brain_data_filepath = ml_data_folderpath + '/SST/Brain_Data_betaseries_84subs_correct_cond.pkl'
#the 58-subject file is the one actually loaded below
brain_data_filepath = ml_data_folderpath + '/SST/Brain_Data_betaseries_58subs_correct_cond.pkl'
#brain_data_filepath = ml_data_folderpath + '/SST/Brain_Data_conditions_43subs_correct_cond.pkl'

def decoderConstructor(*args, **kwargs):
    return(Decoder(scoring='accuracy',verbose=0, *args, **kwargs))


relevant_mask = None

In [23]:
all_subjects = load_and_preprocess(
    brain_data_filepath,
    train_test_markers_filepath,
    subjs_to_use = None,
    response_transform_func = trialtype_resp_trans_func,
    clean=None)

all_subjects['groups']

subj_list = np.unique(all_subjects['groups'])

del all_subjects
gc.collect()

checked for intersection and no intersection between the brain data and the subjects was found.
there were 58 subjects overlapping between the subjects marked for train data and the training dump file itself.
test_train_set: 9549
pkl_file: 168
brain_data_filepath: 152
train_test_markers_filepath: 141
response_transform_func: 136
sys: 72
Brain_Data_allsubs: 48
clean: 16
subjs_to_use: 16


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  Brain_Data_allsubs.Y[Brain_Data_allsubs.Y=='NULL']=None


6219
6219


0

In [None]:
summary_results = get_all_subjs_discriminability_whole_brain(subj_list)

DEV005

pre-loaded.

DEV006

pre-loaded.

DEV009

pre-loaded.

DEV010

pre-loaded.

DEV012

pre-loaded.

DEV013

pre-loaded.

DEV014

pre-loaded.

DEV015

pre-loaded.

DEV016

pre-loaded.

DEV017

pre-loaded.

DEV018

pre-loaded.

DEV019

pre-loaded.

DEV021

pre-loaded.

DEV022

loading subject DEV022

checked for intersection and no intersection between the brain data and the subjects was found.
there were 58 subjects overlapping between the subjects marked for train data and the training dump file itself.
test_train_set: 9549
pkl_file: 168
brain_data_filepath: 152
train_test_markers_filepath: 141
response_transform_func: 136
sys: 72
subjs_to_use: 64
clean: 60
Brain_Data_allsubs: 48


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  Brain_Data_allsubs.Y[Brain_Data_allsubs.Y=='NULL']=None


6219
6219


correct-go      80
correct-stop    19
Name: trial_type, dtype: int64

setting up decoder...

fitting

evaluating

{'mean_cv_scores': 0.3636363636363636,
 'overfit_accuracy': 0.7272727272727273,
 'overfit_y_pred_vs_obs':              y_obs        y_pred
 1342    correct-go    correct-go
 1343    correct-go  correct-stop
 1344    correct-go    correct-go
 1345    correct-go    correct-go
 1346    correct-go    correct-go
 ...            ...           ...
 1436    correct-go    correct-go
 1437    correct-go  correct-stop
 1438  correct-stop  correct-stop
 1439    correct-go    correct-go
 1440    correct-go    correct-go
 
 [99 rows x 2 columns],
 'decoder_object': Decoder(cv=StratifiedKFold(n_splits=3, random_state=2027971282, shuffle=True),
         estimator=LinearSVC(max_iter=10000.0), memory=Memory(location=None),
         n_jobs=3, scoring='accuracy')}

DEV023

loading subject DEV023

correct-go      95
correct-stop    13
Name: trial_type, dtype: int64

setting up decoder...

fitting

evaluating

{'mean_cv_scores': 0.3333333333333333,
 'overfit_accuracy': 0.6944444444444444,
 'overfit_y_pred_vs_obs':              y_obs        y_pred
 1441    correct-go    correct-go
 1442    correct-go    correct-go
 1443    correct-go  correct-stop
 1444    correct-go  correct-stop
 1445    correct-go  correct-stop
 ...            ...           ...
 1544    correct-go    correct-go
 1545    correct-go  correct-stop
 1546    correct-go  correct-stop
 1547    correct-go    correct-go
 1548  correct-stop  correct-stop
 
 [108 rows x 2 columns],
 'decoder_object': Decoder(cv=StratifiedKFold(n_splits=3, random_state=4047585180, shuffle=True),
         estimator=LinearSVC(max_iter=10000.0), memory=Memory(location=None),
         n_jobs=3, scoring='accuracy')}

DEV024

loading subject DEV024

correct-go      95
correct-stop    16
Name: trial_type, dtype: int64

setting up decoder...

fitting

evaluating

{'mean_cv_scores': 0.2972972972972973,
 'overfit_accuracy': 0.6216216216216216,
 'overfit_y_pred_vs_obs':              y_obs        y_pred
 1549    correct-go  correct-stop
 1550    correct-go  correct-stop
 1551    correct-go  correct-stop
 1552    correct-go  correct-stop
 1553    correct-go  correct-stop
 ...            ...           ...
 1655    correct-go    correct-go
 1656    correct-go    correct-go
 1657    correct-go  correct-stop
 1658  correct-stop  correct-stop
 1659    correct-go    correct-go
 
 [111 rows x 2 columns],
 'decoder_object': Decoder(cv=StratifiedKFold(n_splits=3, random_state=1448813713, shuffle=True),
         estimator=LinearSVC(max_iter=10000.0), memory=Memory(location=None),
         n_jobs=3, scoring='accuracy')}

DEV025

loading subject DEV025

correct-go    96
Name: trial_type, dtype: int64

setting up decoder...

DEV026

loading subject DEV026

correct-go      95
correct-stop    16
Name: trial_type, dtype: int64

setting up decoder...

fitting

evaluating

{'mean_cv_scores': 0.37837837837837834,
 'overfit_accuracy': 0.6756756756756757,
 'overfit_y_pred_vs_obs':              y_obs        y_pred
 1756    correct-go    correct-go
 1757    correct-go  correct-stop
 1758    correct-go  correct-stop
 1759    correct-go  correct-stop
 1760    correct-go    correct-go
 ...            ...           ...
 1862    correct-go    correct-go
 1863  correct-stop  correct-stop
 1864    correct-go    correct-go
 1865  correct-stop  correct-stop
 1866    correct-go    correct-go
 
 [111 rows x 2 columns],
 'decoder_object': Decoder(cv=StratifiedKFold(n_splits=3, random_state=3274636757, shuffle=True),
         estimator=LinearSVC(max_iter=10000.0), memory=Memory(location=None),
         n_jobs=3, scoring='accuracy')}

DEV027

loading subject DEV027

correct-go      94
correct-stop    20
Name: trial_type, dtype: int64

setting up decoder...

fitting

evaluating

{'mean_cv_scores': 0.40350877192982454,
 'overfit_accuracy': 0.6491228070175439,
 'overfit_y_pred_vs_obs':              y_obs        y_pred
 1867  correct-stop  correct-stop
 1868    correct-go  correct-stop
 1869  correct-stop  correct-stop
 1870    correct-go  correct-stop
 1871    correct-go  correct-stop
 ...            ...           ...
 1976    correct-go  correct-stop
 1977    correct-go    correct-go
 1978    correct-go    correct-go
 1979  correct-stop  correct-stop
 1980    correct-go    correct-go
 
 [114 rows x 2 columns],
 'decoder_object': Decoder(cv=StratifiedKFold(n_splits=3, random_state=1776701050, shuffle=True),
         estimator=LinearSVC(max_iter=10000.0), memory=Memory(location=None),
         n_jobs=3, scoring='accuracy')}

DEV028

loading subject DEV028

correct-go      96
correct-stop    13
Name: trial_type, dtype: int64

setting up decoder...

fitting

evaluating

{'mean_cv_scores': 0.37662662662662666,
 'overfit_accuracy': 0.7889908256880734,
 'overfit_y_pred_vs_obs':              y_obs        y_pred
 1981    correct-go    correct-go
 1982    correct-go    correct-go
 1983    correct-go    correct-go
 1984    correct-go    correct-go
 1985    correct-go    correct-go
 ...            ...           ...
 2085  correct-stop  correct-stop
 2086    correct-go    correct-go
 2087    correct-go    correct-go
 2088    correct-go    correct-go
 2089    correct-go    correct-go
 
 [109 rows x 2 columns],
 'decoder_object': Decoder(cv=StratifiedKFold(n_splits=3, random_state=2216442768, shuffle=True),
         estimator=LinearSVC(max_iter=10000.0), memory=Memory(location=None),
         n_jobs=3, scoring='accuracy')}

DEV029

loading subject DEV029

correct-go      96
correct-stop    16
Name: trial_type, dtype: int64

setting up decoder...

fitting

evaluating

{'mean_cv_scores': 0.42911332385016593,
 'overfit_accuracy': 0.6517857142857143,
 'overfit_y_pred_vs_obs':              y_obs        y_pred
 2090    correct-go    correct-go
 2091    correct-go  correct-stop
 2092    correct-go  correct-stop
 2093    correct-go  correct-stop
 2094    correct-go    correct-go
 ...            ...           ...
 2197    correct-go  correct-stop
 2198    correct-go    correct-go
 2199  correct-stop  correct-stop
 2200    correct-go    correct-go
 2201    correct-go    correct-go
 
 [112 rows x 2 columns],
 'decoder_object': Decoder(cv=StratifiedKFold(n_splits=3, random_state=4064508541, shuffle=True),
         estimator=LinearSVC(max_iter=10000.0), memory=Memory(location=None),
         n_jobs=3, scoring='accuracy')}

DEV030

loading subject DEV030

correct-go      95
correct-stop    15
Name: trial_type, dtype: int64

setting up decoder...

fitting

evaluating

{'mean_cv_scores': 0.3178178178178178,
 'overfit_accuracy': 0.6909090909090909,
 'overfit_y_pred_vs_obs':              y_obs        y_pred
 2202  correct-stop  correct-stop
 2203    correct-go    correct-go
 2204    correct-go  correct-stop
 2205    correct-go  correct-stop
 2206    correct-go    correct-go
 ...            ...           ...
 2307    correct-go    correct-go
 2308    correct-go  correct-stop
 2309    correct-go    correct-go
 2310    correct-go    correct-go
 2311  correct-stop  correct-stop
 
 [110 rows x 2 columns],
 'decoder_object': Decoder(cv=StratifiedKFold(n_splits=3, random_state=2584696405, shuffle=True),
         estimator=LinearSVC(max_iter=10000.0), memory=Memory(location=None),
         n_jobs=3, scoring='accuracy')}

DEV034

loading subject DEV034

correct-go      96
correct-stop     9
Name: trial_type, dtype: int64

setting up decoder...

fitting

evaluating

{'mean_cv_scores': 0.42857142857142855,
 'overfit_accuracy': 0.6285714285714286,
 'overfit_y_pred_vs_obs':              y_obs        y_pred
 2312    correct-go  correct-stop
 2313  correct-stop  correct-stop
 2314    correct-go  correct-stop
 2315    correct-go  correct-stop
 2316    correct-go  correct-stop
 ...            ...           ...
 2412    correct-go  correct-stop
 2413    correct-go    correct-go
 2414    correct-go    correct-go
 2415    correct-go    correct-go
 2416    correct-go    correct-go
 
 [105 rows x 2 columns],
 'decoder_object': Decoder(cv=StratifiedKFold(n_splits=3, random_state=2357341125, shuffle=True),
         estimator=LinearSVC(max_iter=10000.0), memory=Memory(location=None),
         n_jobs=3, scoring='accuracy')}

DEV035

loading subject DEV035

correct-go      96
correct-stop    15
Name: trial_type, dtype: int64

setting up decoder...

fitting

evaluating

{'mean_cv_scores': 0.3333333333333333,
 'overfit_accuracy': 0.6576576576576577,
 'overfit_y_pred_vs_obs':              y_obs        y_pred
 2417    correct-go    correct-go
 2418    correct-go    correct-go
 2419    correct-go  correct-stop
 2420    correct-go  correct-stop
 2421    correct-go    correct-go
 ...            ...           ...
 2523    correct-go    correct-go
 2524    correct-go    correct-go
 2525    correct-go  correct-stop
 2526    correct-go  correct-stop
 2527  correct-stop  correct-stop
 
 [111 rows x 2 columns],
 'decoder_object': Decoder(cv=StratifiedKFold(n_splits=3, random_state=1747180723, shuffle=True),
         estimator=LinearSVC(max_iter=10000.0), memory=Memory(location=None),
         n_jobs=3, scoring='accuracy')}

DEV036

loading subject DEV036

correct-go      95
correct-stop     8
Name: trial_type, dtype: int64

setting up decoder...

fitting

evaluating

{'mean_cv_scores': 0.37899159663865545,
 'overfit_accuracy': 0.6601941747572816,
 'overfit_y_pred_vs_obs':            y_obs        y_pred
 2528  correct-go  correct-stop
 2529  correct-go  correct-stop
 2530  correct-go    correct-go
 2531  correct-go    correct-go
 2532  correct-go    correct-go
 ...          ...           ...
 2626  correct-go  correct-stop
 2627  correct-go    correct-go
 2628  correct-go    correct-go
 2629  correct-go    correct-go
 2630  correct-go    correct-go
 
 [103 rows x 2 columns],
 'decoder_object': Decoder(cv=StratifiedKFold(n_splits=3, random_state=1731973251, shuffle=True),
         estimator=LinearSVC(max_iter=10000.0), memory=Memory(location=None),
         n_jobs=3, scoring='accuracy')}

DEV039

loading subject DEV039

correct-go      87
correct-stop     9
Name: trial_type, dtype: int64

setting up decoder...

fitting

In [None]:
# correlate discriminability against the variables we care about.

summary_results2 = summary_results.rename(columns={
    'mean_cv_scores':'discriminability_mean_cv_scores',
    'overfit_accuracy':'discriminability_overfit_accuracy'})


individual_differences = pd.read_csv(ml_data_folderpath + "/" + data_by_ppt_name)
individual_differences = individual_differences.rename(columns={'SID':'subid'})
individual_differences['wave']=1
#individual_differences['wave'] = individual_differences['wave'].astype(object) # for compatibility with the wave column in the dataset
ind_div_combined = summary_results2.merge(individual_differences)

In [None]:
ind_div_combined_3sd = remove_selected_outliers_tesq_study(
    ind_div_combined,
    show_plot=True)

In [None]:
display_discriminability_correlations(ind_div_combined_3sd)

In [None]:
## pfc only

In [None]:
pfc_mask = create_mask_from_images(get_pfc_image_filepaths(ml_data_folderpath + "/"),threshold=10)
relevant_mask = pfc_mask
