In [1]:
import pandas as pd
import numpy as np
import os
import nltools as nlt
import nilearn as nil
import nibabel as nib
import warnings
import glob
import random
import pickle
import dev_wtp_io_utils
from dev_wtp_io_utils import cv_train_test_sets
import gc #garbage collection
from nilearn import plotting
from sklearn.model_selection import KFold,GroupKFold


In [2]:
pd.set_option('display.max_rows', 99)

In this doc we're going to repeat the analysis at several sample sizes to work out how large a sample we need to get fairly stable estimates.

### Load brain data

In [3]:
test_train_set = pd.read_csv("../data/train_test_markers_20210601T183243.csv")

In [4]:
with open('../data/Brain_Data_2sns_60subs.pkl', 'rb') as pkl_file:
    Brain_Data_allsubs = pickle.load(pkl_file)
    
dev_wtp_io_utils.check_BD_against_test_train_set(Brain_Data_allsubs,test_train_set)

checked for intersection and no intersection between the brain data and the subjects was found.
there were 60 subjects overlapping between the subjects marked for train data and the training dump file itself.


In [5]:
with open('../data/Brain_Data_2sns_60subs_grouped.pkl', 'rb') as pkl_file:
    Brain_Data_allsubs_grouped = pickle.load(pkl_file)
    
dev_wtp_io_utils.check_BD_against_test_train_set(Brain_Data_allsubs_grouped,test_train_set)

checked for intersection and no intersection between the brain data and the subjects was found.
there were 60 subjects overlapping between the subjects marked for train data and the training dump file itself.


### Preprocess

In [6]:
# Use the behavioural response as the prediction target.
Brain_Data_allsubs.Y = Brain_Data_allsubs.X.response.copy()
print(Brain_Data_allsubs.Y.value_counts())
# Treat the string 'NULL' as a missing response.
Brain_Data_allsubs.Y[Brain_Data_allsubs.Y=='NULL']=None
print(Brain_Data_allsubs.Y.value_counts())
print(Brain_Data_allsubs.Y.isnull().value_counts())
# Keep only the trials that have a response.
Brain_Data_allsubs_nn = Brain_Data_allsubs[Brain_Data_allsubs.Y.notnull()]
print(len(Brain_Data_allsubs_nn))
print(len(Brain_Data_allsubs))

5.0    1164
6.0    1018
7.0     904
8.0     604
Name: response, dtype: int64
5.0    1164
6.0    1018
7.0     904
8.0     604
Name: response, dtype: int64
False    3690
True      150
Name: response, dtype: int64
3690
3840
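
The filtering step above can be sketched on a toy series. This is only an illustration under assumptions: the `response` values below are made up, and `pd.to_numeric(..., errors="coerce")` is swapped in as one possible way to turn a `'NULL'` sentinel into a real missing value; it is not what the cell above does.

```python
import pandas as pd

# Toy stand-in for the response column (values are invented for illustration).
response = pd.Series([5.0, "NULL", 7.0, 8.0, "NULL", 6.0], dtype=object)

# Coerce to numeric: anything that is not a number (here 'NULL') becomes NaN.
y = pd.to_numeric(response, errors="coerce")

# Boolean mask of trials that have a response, then keep only those trials.
keep = y.notna()
y_clean = y[keep].reset_index(drop=True)

print(f"{int(keep.sum())} of {len(y)} trials kept")
```

Resetting the index after filtering keeps positional and label-based indexing in agreement, which matters later when rows are picked out by integer position.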


In [7]:
all_subs_nn_nifti = Brain_Data_allsubs_nn.to_nifti()
all_subs_nn_nifti_Y = Brain_Data_allsubs_nn.Y
all_subs_nn_nifti_groups = Brain_Data_allsubs_nn.X.subject
all_subs_nn_nifti_groups

0       DEV001
1       DEV001
2       DEV001
3       DEV001
4       DEV001
         ...  
3685    DEV089
3686    DEV089
3687    DEV089
3688    DEV089
3689    DEV089
Name: subject, Length: 3690, dtype: object

In [8]:
Brain_Data_allsubs_grouped.Y = Brain_Data_allsubs_grouped.X.response.copy()
print(Brain_Data_allsubs_grouped.Y.value_counts())
all_subs_grouped_nifti = Brain_Data_allsubs_grouped.to_nifti()
all_subs_grouped_nifti_Y = Brain_Data_allsubs_grouped.Y
all_subs_grouped_nifti_groups = Brain_Data_allsubs_grouped.X.subject
all_subs_grouped_nifti_groups

6.0    236
7.0    235
5.0    227
8.0    202
Name: response, dtype: int64


0      DEV001
1      DEV001
2      DEV001
3      DEV001
4      DEV001
        ...  
895    DEV089
896    DEV089
897    DEV089
898    DEV089
899    DEV089
Name: subject, Length: 900, dtype: object

### Predict

Regressing in nilearn:
 - https://nilearn.github.io/decoding/estimator_choice.html
 - http://www.ncbi.nlm.nih.gov/pubmed/20691790







OK, so that's how you do it. It's pretty straightforward.

So... we won't look at nested cross-validation juuust yet, because the next step is to work out how to train on one set and predict on another. That will definitely require a custom pipeline. Let's get started...
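
As a rough picture of what "train on one set and predict on another" could look like, here is a minimal sketch. It is not the implementation of `cv_train_test_sets` (that lives in `dev_wtp_io_utils` and is not shown here). The helper name `cv_two_sets`, the `Ridge` regressor, and the synthetic arrays are all assumptions made for illustration. The idea is to split on subjects, fit on the training-set rows of the retained subjects, and score on the test-set rows of the held-out subjects.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def cv_two_sets(train_X, train_y, train_groups, test_X, test_y, test_groups, cv):
    """Hypothetical helper: in each fold, fit on train_* rows of the retained
    subjects and score on test_* rows of the held-out subjects."""
    subjects = np.unique(train_groups)
    scores = []
    for fit_idx, held_idx in cv.split(subjects):
        fit_rows = np.isin(train_groups, subjects[fit_idx])
        held_rows = np.isin(test_groups, subjects[held_idx])
        model = Ridge(alpha=1.0).fit(train_X[fit_rows], train_y[fit_rows])
        # score() returns R^2 on the held-out subjects' rows.
        scores.append(model.score(test_X[held_rows], test_y[held_rows]))
    return scores

# Synthetic data: two sets that share subjects but have different row counts.
rng = np.random.default_rng(0)
n_subjects, n_features = 10, 5
weights = rng.normal(size=n_features)

def make_set(rows_per_subject):
    groups = np.repeat([f"S{i:02d}" for i in range(n_subjects)], rows_per_subject)
    X = rng.normal(size=(groups.size, n_features))
    y = X @ weights + rng.normal(scale=0.5, size=groups.size)
    return X, y, groups

train_X, train_y, train_groups = make_set(3)    # fewer rows per subject
test_X, test_y, test_groups = make_set(12)      # more rows per subject

scores = cv_two_sets(train_X, train_y, train_groups,
                     test_X, test_y, test_groups,
                     cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(len(scores), np.mean(scores))
```

Splitting on the unique subject labels rather than on rows means no subject contributes to both fitting and scoring within a fold.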

In [9]:
del Brain_Data_allsubs
del Brain_Data_allsubs_grouped
gc.collect()

15

In [10]:
import sys
for name, size in sorted(((name, sys.getsizeof(value)) for name, value in locals().items()),
                         key= lambda x: -x[1])[:10]:
    print(name + ': ' + str(size))

___: 232614
all_subs_nn_nifti_groups: 232614
_7: 232614
__: 56844
all_subs_grouped_nifti_groups: 56844
_8: 56844
all_subs_nn_nifti_Y: 29664
test_train_set: 9549
all_subs_grouped_nifti_Y: 7344
_i6: 448
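
One caveat with the listing above: `sys.getsizeof` reports only the shallow size of an object, so containers that hold large data can look tiny. The sketch below illustrates the difference using the standard library only. `deep_sizeof` is a hypothetical helper written for this example; it handles a few built-in container types and is not a replacement for a proper deep-size tool.

```python
import sys

def deep_sizeof(obj, seen=None):
    """Rough recursive size in bytes for nested lists, tuples, sets and dicts."""
    seen = set() if seen is None else seen
    if id(obj) in seen:          # do not count the same object twice
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size

# A short list that references three 1 MB byte strings.
payload = [bytes(1_000_000) for _ in range(3)]

print("shallow:", sys.getsizeof(payload))   # size of the list object only
print("deep:   ", deep_sizeof(payload))     # list plus the bytes it references
```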


As a control, we'll try this again, this time just training and testing on individual values:

In [11]:
from pympler.asizeof import asizeof

def asizeof_fmt(obj, suffix='B'):
    ''' by Fred Cirera,  https://stackoverflow.com/a/1094933/1870254, modified'''
    num = asizeof(obj)
    for unit in ['','Ki','Mi','Gi','Ti','Pi','Ei','Zi']:
        if abs(num) < 1024.0:
            return "%3.1f %s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f %s%s" % (num, 'Yi', suffix)

# for name, size in sorted(((name, asizeof(value)) for name, value in locals().items()),
#                          key= lambda x: -x[1])[:10]:


In [12]:
asizeof_fmt(Brain_Data_allsubs_nn)

'3.3 GiB'

In [13]:
asizeof_fmt(all_subs_nn_nifti)

'12.4 GiB'

### Sample size of 20

We'll try this with different KFold settings, which give us different numbers of folds.

The more folds, the slower it is, but the more accurate and stable the estimate will be.

If we increase the sample size, I don't think it will slow down too much. I expect the run time to grow roughly linearly with (a) the size of each fold multiplied by (b) the number of folds. So 40 subjects with 2-fold would probably be much more accurate than 20 subjects with 5-fold but would take about the same amount of time.

Buuuuuuut... we can't handle a kernel that large. :-(
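
To make the fold arithmetic concrete, the sketch below counts how many subjects end up in the training group for a given sample size and number of folds. It assumes the split is made over subjects, which is what the "training group of 13 items" and "training group of 16 items" messages in the logs further down suggest. `training_group_sizes` is a hypothetical helper for this illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

def training_group_sizes(n_subjects, n_splits):
    """Number of subjects in the training group for each fold."""
    subjects = np.arange(n_subjects)
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    return sorted(len(train_idx) for train_idx, _ in cv.split(subjects))

for n_subjects, n_splits in [(20, 3), (20, 5), (30, 3), (30, 5)]:
    sizes = training_group_sizes(n_subjects, n_splits)
    print(f"{n_subjects} subjects, {n_splits}-fold: training groups of {sizes}")
```

More folds means each model is fit on a larger share of the subjects, at the cost of fitting more models.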

Get a small sample of subjects to extract:

In [14]:
sample_subject_items = np.unique(all_subs_nn_nifti_groups)[0:20]

In [15]:
sample_subject_vector = [i for i, x in enumerate(all_subs_nn_nifti_groups) if x in sample_subject_items]
sample_grouped_subject_vector = [i for i, x in enumerate(all_subs_grouped_nifti_groups) if x in sample_subject_items]

In [16]:
len(sample_subject_vector)

1208

And now extract them:

In [17]:
first_subs_nifti = nib.funcs.concat_images([all_subs_nn_nifti.slicer[...,s] for s in sample_subject_vector])
first_subs_nifti_Y = all_subs_nn_nifti_Y[sample_subject_vector]
first_subs_nifti = nil.image.clean_img(first_subs_nifti,detrend=False,standardize=True)
first_subs_nifti_groups = all_subs_nn_nifti_groups[sample_subject_vector]

#del all_subs_nn_nifti
#gc.collect()

In [18]:
first_subs_grouped_nifti = nib.funcs.concat_images([all_subs_grouped_nifti.slicer[...,s] for s in sample_grouped_subject_vector])
first_subs_grouped_nifti_Y = all_subs_grouped_nifti_Y[sample_grouped_subject_vector]
first_subs_grouped_nifti = nil.image.clean_img(first_subs_grouped_nifti,detrend=False,standardize=True)
first_subs_grouped_nifti_groups = all_subs_grouped_nifti_groups[sample_grouped_subject_vector]

#del all_subs_grouped_nifti
#gc.collect()
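
The extraction steps above boil down to picking volumes along the last axis of a 4D image by subject, then standardizing. The sketch below shows the same idea on a plain NumPy array. The array shape and subject labels are toy values, and the per-voxel z-scoring is only an approximation of what `standardize=True` is understood to do in `clean_img`; the exact behaviour may differ between nilearn versions.

```python
import numpy as np

rng = np.random.default_rng(0)

# One subject label per volume, and a toy 4D array with volumes on the last axis.
groups = np.repeat(["DEV001", "DEV005", "DEV006", "DEV009"], 3)
data = rng.normal(size=(4, 5, 4, groups.size))

# Select the first two subjects and find the positions of their volumes.
wanted = np.unique(groups)[:2]
keep = np.flatnonzero(np.isin(groups, wanted))

# Slice volumes, labels (and, in the notebook, Y) with the same positions.
subset = data[..., keep]
subset_groups = groups[keep]

# z-score each voxel across the kept volumes.
mean = subset.mean(axis=-1, keepdims=True)
std = subset.std(axis=-1, keepdims=True)
subset_z = (subset - mean) / std

print(subset.shape, np.unique(subset_groups))
```

`np.isin` builds the selection in one vectorized call, which may be faster than a Python-level list comprehension when there are thousands of volumes.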

In [21]:
split_dict = {}
for n_split in [3, 5]:
    print("__________________")
    print("TRYING WITH N SPLIT " + str(n_split))
    attempt = []
    # Repeat each setting 5 times; shuffle=True gives a different split each time.
    for i in range(5):
        print("ATTEMPT " + str(i))
        test_scores_different = cv_train_test_sets(
            trainset_X=first_subs_grouped_nifti,
            trainset_y=first_subs_grouped_nifti_Y,
            trainset_groups=first_subs_grouped_nifti_groups,
            testset_X=first_subs_nifti,
            testset_y=first_subs_nifti_Y,
            testset_groups=first_subs_nifti_groups,
            cv=KFold(n_splits=n_split, shuffle=True)
        )
        # Record the mean test score across folds for this attempt.
        attempt.append(np.mean(test_scores_different))

    print(attempt)
    split_dict[n_split] = attempt


__________________
TRYING WITH N SPLIT 3
ATTEMPT 0
using default regressor. In order to test on a training group of 13 items, holding out the following subjects:['DEV001' 'DEV012' 'DEV010' 'DEV017' 'DEV015' 'DEV026' 'DEV027']. selecting training data. (91, 109, 91, 199). 1.3 GiB. selecting test data. 2.8 GiB. (91, 109, 91, 414). regressing. 1.3 GiB. trying regressor 1 of 1. predicting. test score was:. 0.1213591957682395
In order to test on a training group of 13 items, holding out the following subjects:['DEV005' 'DEV006' 'DEV009' 'DEV021' 'DEV025' 'DEV023' 'DEV022']. selecting training data. (91, 109, 91, 189). 1.3 GiB. selecting test data. 2.9 GiB. (91, 109, 91, 435). regressing. 1.3 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.07168659067695593
In order to test on a training group of 14 items, holding out the following subjects:['DEV016' 'DEV018' 'DEV024' 'DEV013' 'DEV014' 'DEV019']. selecting training data. (91, 109, 91, 208). 1.4 GiB. selecting test data. 2.4 GiB. (91, 109, 91, 359). regressing. 1.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.028535436877964004
ATTEMPT 1
using default regressor. In order to test on a training group of 13 items, holding out the following subjects:['DEV001' 'DEV010' 'DEV018' 'DEV024' 'DEV026' 'DEV022' 'DEV019']. selecting training data. (91, 109, 91, 193). 1.3 GiB. selecting test data. 2.8 GiB. (91, 109, 91, 416). regressing. 1.3 GiB. trying regressor 1 of 1. predicting. test score was:. 0.08675396529590695
In order to test on a training group of 13 items, holding out the following subjects:['DEV012' 'DEV013' 'DEV017' 'DEV015' 'DEV023' 'DEV014' 'DEV027']. selecting training data. (91, 109, 91, 198). 1.3 GiB. selecting test data. 2.8 GiB. (91, 109, 91, 421). regressing. 1.3 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.03860237444608705
In order to test on a training group of 14 items, holding out the following subjects:['DEV005' 'DEV006' 'DEV009' 'DEV016' 'DEV021' 'DEV025']. selecting training data. (91, 109, 91, 205). 1.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 371). regressing. 1.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.104964470665178
ATTEMPT 2
using default regressor. In order to test on a training group of 13 items, holding out the following subjects:['DEV021' 'DEV024' 'DEV015' 'DEV026' 'DEV014' 'DEV027' 'DEV019']. selecting training data. (91, 109, 91, 195). 1.3 GiB. selecting test data. 2.8 GiB. (91, 109, 91, 414). regressing. 1.3 GiB. trying regressor 1 of 1. predicting. test score was:. 0.1169246345302285
In order to test on a training group of 13 items, holding out the following subjects:['DEV009' 'DEV016' 'DEV010' 'DEV013' 'DEV017' 'DEV023' 'DEV022']. selecting training data. (91, 109, 91, 193). 1.3 GiB. selecting test data. 2.8 GiB. (91, 109, 91, 422). regressing. 1.3 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.005769637726260601
In order to test on a training group of 14 items, holding out the following subjects:['DEV005' 'DEV001' 'DEV006' 'DEV012' 'DEV025' 'DEV018']. selecting training data. (91, 109, 91, 208). 1.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 372). regressing. 1.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.11616137280039252
ATTEMPT 3
using default regressor. In order to test on a training group of 13 items, holding out the following subjects:['DEV005' 'DEV012' 'DEV024' 'DEV013' 'DEV023' 'DEV014' 'DEV022']. selecting training data. (91, 109, 91, 191). 1.3 GiB. selecting test data. 3.0 GiB. (91, 109, 91, 441). regressing. 1.3 GiB. trying regressor 1 of 1. predicting. test score was:. -0.02342095354056939
In order to test on a training group of 13 items, holding out the following subjects:['DEV001' 'DEV009' 'DEV016' 'DEV021' 'DEV025' 'DEV015' 'DEV026']. selecting training data. (91, 109, 91, 193). 1.3 GiB. selecting test data. 2.9 GiB. (91, 109, 91, 429). regressing. 1.3 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.13058069913108394
In order to test on a training group of 14 items, holding out the following subjects:['DEV006' 'DEV010' 'DEV018' 'DEV017' 'DEV027' 'DEV019']. selecting training data. (91, 109, 91, 212). 1.4 GiB. selecting test data. 2.3 GiB. (91, 109, 91, 338). regressing. 1.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.10479875101558
ATTEMPT 4
using default regressor. In order to test on a training group of 13 items, holding out the following subjects:['DEV009' 'DEV012' 'DEV018' 'DEV015' 'DEV023' 'DEV014' 'DEV019']. selecting training data. (91, 109, 91, 197). 1.3 GiB. selecting test data. 2.8 GiB. (91, 109, 91, 414). regressing. 1.3 GiB. trying regressor 1 of 1. predicting. test score was:. 0.09922921157700848
In order to test on a training group of 13 items, holding out the following subjects:['DEV006' 'DEV016' 'DEV010' 'DEV025' 'DEV013' 'DEV026' 'DEV027']. selecting training data. (91, 109, 91, 190). 1.3 GiB. selecting test data. 2.9 GiB. (91, 109, 91, 428). regressing. 1.3 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.06615892082442165
In order to test on a training group of 14 items, holding out the following subjects:['DEV005' 'DEV001' 'DEV021' 'DEV024' 'DEV017' 'DEV022']. selecting training data. (91, 109, 91, 209). 1.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 366). regressing. 1.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.11602320890409668
[0.05483678318907714, 0.051038687171665965, 0.07577212320145348, 0.07065283220203152, 0.09380378043517561]
__________________
TRYING WITH N SPLIT 5
ATTEMPT 0
using default regressor. In order to test on a training group of 16 items, holding out the following subjects:['DEV018' 'DEV024' 'DEV015' 'DEV019']. selecting training data. (91, 109, 91, 241). 1.6 GiB. selecting test data. 1.5 GiB. (91, 109, 91, 226). regressing. 1.6 GiB. trying regressor 1 of 1. predicting. test score was:. 0.11476413426513765
In order to test on a training group of 16 items, holding out the following subjects:['DEV005' 'DEV017' 'DEV023' 'DEV027']. selecting training data. (91, 109, 91, 241). 1.6 GiB. selecting test data. 1.6 GiB. (91, 109, 91, 240). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.1086744599941496
In order to test on a training group of 16 items, holding out the following subjects:['DEV001' 'DEV016' 'DEV021' 'DEV013']. selecting training data. (91, 109, 91, 237). 1.6 GiB. selecting test data. 1.7 GiB. (91, 109, 91, 253). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.1761931770139633
In order to test on a training group of 16 items, holding out the following subjects:['DEV006' 'DEV009' 'DEV010' 'DEV014']. selecting training data. (91, 109, 91, 235). 1.6 GiB. selecting test data. 1.6 GiB. (91, 109, 91, 243). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.09651513285360058
In order to test on a training group of 16 items, holding out the following subjects:['DEV012' 'DEV025' 'DEV026' 'DEV022']. selecting training data. (91, 109, 91, 238). 1.6 GiB. selecting test data. 1.7 GiB. (91, 109, 91, 246). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.019130553151902152
ATTEMPT 1
using default regressor. In order to test on a training group of 16 items, holding out the following subjects:['DEV001' 'DEV017' 'DEV022' 'DEV019']. selecting training data. (91, 109, 91, 244). 1.6 GiB. selecting test data. 1.5 GiB. (91, 109, 91, 221). regressing. 1.6 GiB. trying regressor 1 of 1. predicting. test score was:. 0.1323080555282825
In order to test on a training group of 16 items, holding out the following subjects:['DEV021' 'DEV018' 'DEV024' 'DEV014']. selecting training data. (91, 109, 91, 235). 1.6 GiB. selecting test data. 1.7 GiB. (91, 109, 91, 252). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.1348821419515711
In order to test on a training group of 16 items, holding out the following subjects:['DEV009' 'DEV010' 'DEV013' 'DEV026']. selecting training data. (91, 109, 91, 235). 1.6 GiB. selecting test data. 1.6 GiB. (91, 109, 91, 244). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.11514585491283102
In order to test on a training group of 16 items, holding out the following subjects:['DEV005' 'DEV006' 'DEV012' 'DEV025']. selecting training data. (91, 109, 91, 238). 1.6 GiB. selecting test data. 1.6 GiB. (91, 109, 91, 245). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.07060121713395329
In order to test on a training group of 16 items, holding out the following subjects:['DEV016' 'DEV015' 'DEV023' 'DEV027']. selecting training data. (91, 109, 91, 240). 1.6 GiB. selecting test data. 1.7 GiB. (91, 109, 91, 246). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.06379659962179729
ATTEMPT 2
using default regressor. In order to test on a training group of 16 items, holding out the following subjects:['DEV005' 'DEV012' 'DEV010' 'DEV019']. selecting training data. (91, 109, 91, 241). 1.6 GiB. selecting test data. 1.5 GiB. (91, 109, 91, 226). regressing. 1.6 GiB. trying regressor 1 of 1. predicting. test score was:. 0.11461834763515555
In order to test on a training group of 16 items, holding out the following subjects:['DEV006' 'DEV009' 'DEV026' 'DEV022']. selecting training data. (91, 109, 91, 235). 1.6 GiB. selecting test data. 1.7 GiB. (91, 109, 91, 248). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.14145991828558213
In order to test on a training group of 16 items, holding out the following subjects:['DEV025' 'DEV017' 'DEV015' 'DEV027']. selecting training data. (91, 109, 91, 243). 1.6 GiB. selecting test data. 1.5 GiB. (91, 109, 91, 228). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.016030812079597778
In order to test on a training group of 16 items, holding out the following subjects:['DEV016' 'DEV021' 'DEV018' 'DEV024']. selecting training data. (91, 109, 91, 236). 1.6 GiB. selecting test data. 1.7 GiB. (91, 109, 91, 252). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.11566449065569806
In order to test on a training group of 16 items, holding out the following subjects:['DEV001' 'DEV013' 'DEV023' 'DEV014']. selecting training data. (91, 109, 91, 237). 1.6 GiB. selecting test data. 1.7 GiB. (91, 109, 91, 254). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.04325496851082122
ATTEMPT 3
using default regressor. In order to test on a training group of 16 items, holding out the following subjects:['DEV012' 'DEV010' 'DEV015' 'DEV019']. selecting training data. (91, 109, 91, 243). 1.6 GiB. selecting test data. 1.5 GiB. (91, 109, 91, 219). regressing. 1.6 GiB. trying regressor 1 of 1. predicting. test score was:. 0.08695343503553077
In order to test on a training group of 16 items, holding out the following subjects:['DEV006' 'DEV009' 'DEV017' 'DEV027']. selecting training data. (91, 109, 91, 241). 1.6 GiB. selecting test data. 1.6 GiB. (91, 109, 91, 236). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.057796697120590546
In order to test on a training group of 16 items, holding out the following subjects:['DEV001' 'DEV025' 'DEV023' 'DEV014']. selecting training data. (91, 109, 91, 238). 1.6 GiB. selecting test data. 1.7 GiB. (91, 109, 91, 250). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.0849463653263215
In order to test on a training group of 16 items, holding out the following subjects:['DEV005' 'DEV021' 'DEV026' 'DEV022']. selecting training data. (91, 109, 91, 234). 1.6 GiB. selecting test data. 1.7 GiB. (91, 109, 91, 251). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.15999863145343096
In order to test on a training group of 16 items, holding out the following subjects:['DEV016' 'DEV018' 'DEV024' 'DEV013']. selecting training data. (91, 109, 91, 236). 1.6 GiB. selecting test data. 1.7 GiB. (91, 109, 91, 252). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.09427814219938313
ATTEMPT 4
using default regressor. In order to test on a training group of 16 items, holding out the following subjects:['DEV018' 'DEV026' 'DEV027' 'DEV022']. selecting training data. (91, 109, 91, 236). 1.6 GiB. selecting test data. 1.7 GiB. (91, 109, 91, 251). regressing. 1.6 GiB. trying regressor 1 of 1. predicting. test score was:. 0.1404590990083533
In order to test on a training group of 16 items, holding out the following subjects:['DEV001' 'DEV010' 'DEV017' 'DEV023']. selecting training data. (91, 109, 91, 241). 1.6 GiB. selecting test data. 1.6 GiB. (91, 109, 91, 235). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.07365239612218422
In order to test on a training group of 16 items, holding out the following subjects:['DEV005' 'DEV009' 'DEV016' 'DEV025']. selecting training data. (91, 109, 91, 237). 1.6 GiB. selecting test data. 1.7 GiB. (91, 109, 91, 247). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.1035928920818181
In order to test on a training group of 16 items, holding out the following subjects:['DEV012' 'DEV024' 'DEV014' 'DEV019']. selecting training data. (91, 109, 91, 242). 1.6 GiB. selecting test data. 1.6 GiB. (91, 109, 91, 232). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.0792304350068248
In order to test on a training group of 16 items, holding out the following subjects:['DEV006' 'DEV021' 'DEV013' 'DEV015']. selecting training data. (91, 109, 91, 236). 1.6 GiB. selecting test data. 1.6 GiB. (91, 109, 91, 243). regressing. 1.6 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.1259039646895026
[0.05958570745809082, 0.10334677382968704, 0.07979338260153183, 0.07367597537881516, 0.1045677573817366]


In [22]:
split_dict

{3: [0.05483678318907714,
  0.051038687171665965,
  0.07577212320145348,
  0.07065283220203152,
  0.09380378043517561],
 5: [0.05958570745809082,
  0.10334677382968704,
  0.07979338260153183,
  0.07367597537881516,
  0.1045677573817366]}

In [13]:
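# Mean test scores copied from the run above, so they survive a kernel restart.
split_dict_20 = {
    3: [0.05483678318907714,
        0.051038687171665965,
        0.07577212320145348,
        0.07065283220203152,
        0.09380378043517561],
    5: [0.05958570745809082,
        0.10334677382968704,
        0.07979338260153183,
        0.07367597537881516,
        0.1045677573817366],
}
# Spread of the 3-fold attempts.
np.std(split_dict_20[3])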


0.015406379493602391

In [20]:
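def get_summary_stats(dict_results):
    """Return the mean and SD of the attempt scores for each number of folds."""
    return [{k: {'mean': np.mean(v), 'sd': np.std(v)}} for k, v in dict_results.items()]

# Needs split_dict_20 to be defined first; the NameError below comes from
# running this cell before the cell that defines it.
get_summary_stats(split_dict_20)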


NameError: name 'split_dict_20' is not defined
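
For reference, here is a self-contained sketch of the same summary that runs without the rest of the notebook. The scores are the ones recorded for the 20-subject sample above; `summarise` is a hypothetical variant that returns one flat dictionary rather than a list of dictionaries.

```python
import numpy as np

# Mean test scores per attempt, copied from the 20-subject run above.
split_dict_20 = {
    3: [0.05483678318907714, 0.051038687171665965, 0.07577212320145348,
        0.07065283220203152, 0.09380378043517561],
    5: [0.05958570745809082, 0.10334677382968704, 0.07979338260153183,
        0.07367597537881516, 0.1045677573817366],
}

def summarise(results):
    """Mean and population SD (NumPy's default, ddof=0) of the attempts per fold count."""
    return {k: {"mean": float(np.mean(v)), "sd": float(np.std(v))}
            for k, v in results.items()}

summary = summarise(split_dict_20)
for n_split, stats in summary.items():
    print(f"{n_split}-fold: mean={stats['mean']:.4f}, sd={stats['sd']:.4f}")
```

With only five attempts per setting, the SD is itself a noisy estimate, so small differences between fold counts should be read with caution.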

### Sample size 30

In [16]:
sample_subject_items = np.unique(all_subs_nn_nifti_groups)[0:30]

In [17]:
sample_subject_vector = [i for i, x in enumerate(all_subs_nn_nifti_groups) if x in sample_subject_items]
sample_grouped_subject_vector = [i for i, x in enumerate(all_subs_grouped_nifti_groups) if x in sample_subject_items]
len(sample_subject_vector)

first_subs_nifti = nib.funcs.concat_images([all_subs_nn_nifti.slicer[...,s] for s in sample_subject_vector])
first_subs_nifti_Y = all_subs_nn_nifti_Y[sample_subject_vector]
first_subs_nifti = nil.image.clean_img(first_subs_nifti,detrend=False,standardize=True)
first_subs_nifti_groups = all_subs_nn_nifti_groups[sample_subject_vector]

#del all_subs_nn_nifti
#gc.collect()

first_subs_grouped_nifti = nib.funcs.concat_images([all_subs_grouped_nifti.slicer[...,s] for s in sample_grouped_subject_vector])
first_subs_grouped_nifti_Y = all_subs_grouped_nifti_Y[sample_grouped_subject_vector]
first_subs_grouped_nifti = nil.image.clean_img(first_subs_grouped_nifti,detrend=False,standardize=True)
first_subs_grouped_nifti_groups = all_subs_grouped_nifti_groups[sample_grouped_subject_vector]

#del all_subs_grouped_nifti
#gc.collect()

In [18]:
split_dict_30 = {}
for n_split in [3, 5, 10]:
    attempt = []
    # Repeat each setting 5 times; shuffle=True gives a different split each time.
    for i in range(5):
        test_scores_different = cv_train_test_sets(
            trainset_X=first_subs_grouped_nifti,
            trainset_y=first_subs_grouped_nifti_Y,
            trainset_groups=first_subs_grouped_nifti_groups,
            testset_X=first_subs_nifti,
            testset_y=first_subs_nifti_Y,
            testset_groups=first_subs_nifti_groups,
            cv=KFold(n_splits=n_split, shuffle=True)
        )
        # Record the mean test score across folds for this attempt.
        attempt.append(np.mean(test_scores_different))
    print("attempt:" + str(attempt))
    split_dict_30[n_split] = attempt


using default regressor. In order to test on a training group of 20 items, holding out the following subjects:['DEV018' 'DEV030' 'DEV035' 'DEV027' 'DEV014' 'DEV039' 'DEV009' 'DEV042'
 'DEV010' 'DEV015']. selecting training data. (91, 109, 91, 297). 2.0 GiB. selecting test data. 4.1 GiB. (91, 109, 91, 616). regressing. 2.0 GiB. trying regressor 1 of 1. predicting. test score was:. 0.06619781671402536
In order to test on a training group of 20 items, holding out the following subjects:['DEV001' 'DEV041' 'DEV028' 'DEV021' 'DEV034' 'DEV017' 'DEV006' 'DEV025'
 'DEV036' 'DEV024']. selecting training data. (91, 109, 91, 297). 2.0 GiB. selecting test data. 4.1 GiB. (91, 109, 91, 611). regressing. 2.0 GiB. trying regressor 1 of 1. 



4.1 GiB. (91, 109, 91, 617). regressing. 2.0 GiB. trying regressor 1 of 1. predicting. test score was:. 0.10662993325129377
In order to test on a training group of 20 items, holding out the following subjects:['DEV018' 'DEV041' 'DEV029' 'DEV028' 'DEV027' 'DEV013' 'DEV009' 'DEV005'
 'DEV012' 'DEV022']. selecting training data. (91, 109, 91, 294). 2.0 GiB. selecting test data. 4.2 GiB. (91, 109, 91, 625). regressing. 2.0 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.10164896215373154
In order to test on a training group of 20 items, holding out the following subjects:['DEV019' 'DEV023' 'DEV026' 'DEV014' 'DEV039' 'DEV034' 'DEV017' 'DEV042'
 'DEV006' 'DEV025']. selecting training data. (91, 109, 91, 300). 2.0 GiB. selecting test data. 4.0 GiB. (91, 109, 91, 593). regressing. 2.0 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.05839593767530993
using default regressor. In order to test on a training group of 20 items, holding out the following subjects:['DEV019' 'DEV001' 'DEV028' 'DEV026' 'DEV027' 'DEV039' 'DEV010' 'DEV006'
 'DEV015' 'DEV005']. selecting training data. (91, 109, 91, 297). 2.0 GiB. selecting test data. 4.0 GiB. (91, 109, 91, 594). regressing. 2.0 GiB. trying regressor 1 of 1. predicting. test score was:. 0.11635861387234725
In order to test on a training group of 20 items, holding out the following subjects:['DEV023' 'DEV041' 'DEV029' 'DEV013' 'DEV034' 'DEV017' 'DEV022' 'DEV025'
 'DEV036' 'DEV024']. selecting training data. (91, 109, 91, 296). 2.0 GiB. selecting test data. 4.1 GiB. (91, 109, 91, 616). regressing. 2.0 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.005818887935712613
In order to test on a training group of 20 items, holding out the following subjects:['DEV018' 'DEV016' 'DEV030' 'DEV035' 'DEV040' 'DEV014' 'DEV021' 'DEV009'
 'DEV042' 'DEV012']. selecting training data. (91, 109, 91, 299). 2.0 GiB. selecting test data. 4.2 GiB. (91, 109, 91, 625). regressing. 2.0 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.1128512397921414
using default regressor. In order to test on a training group of 20 items, holding out the following subjects:['DEV023' 'DEV026' 'DEV040' 'DEV039' 'DEV021' 'DEV013' 'DEV034' 'DEV015'
 'DEV005' 'DEV012']. selecting training data. (91, 109, 91, 294). 2.0 GiB. selecting test data. 4.2 GiB. (91, 109, 91, 622). regressing. 2.0 GiB. trying regressor 1 of 1. predicting. test score was:. 0.14881587021720322
In order to test on a training group of 20 items, holding out the following subjects:['DEV018' 'DEV001' 'DEV035' 'DEV017' 'DEV042' 'DEV010' 'DEV006' 'DEV022'
 'DEV036' 'DEV024']. selecting training data. (91, 109, 91, 301). 2.0 GiB. selecting test data. 4.1 GiB. (91, 109, 91, 611). regressing. 2.0 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.033363322350682734
In order to test on a training group of 20 items, holding out the following subjects:['DEV019' 'DEV016' 'DEV041' 'DEV029' 'DEV028' 'DEV030' 'DEV027' 'DEV014'
 'DEV009' 'DEV025']. selecting training data. (91, 109, 91, 297). 2.0 GiB. selecting test data. 4.0 GiB. (91, 109, 91, 602). regressing. 2.0 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.052454493732482343
using default regressor. In order to test on a training group of 20 items, holding out the following subjects:['DEV023' 'DEV029' 'DEV026' 'DEV027' 'DEV039' 'DEV021' 'DEV042' 'DEV010'
 'DEV005' 'DEV022']. selecting training data. (91, 109, 91, 294). 2.0 GiB. selecting test data. 4.2 GiB. (91, 109, 91, 624). regressing. 2.0 GiB. trying regressor 1 of 1. predicting. test score was:. 0.07633562210373124
In order to test on a training group of 20 items, holding out the following subjects:['DEV016' 'DEV041' 'DEV030' 'DEV040' 'DEV014' 'DEV009' 'DEV034' 'DEV017'
 'DEV012' 'DEV025']. selecting training data. (91, 109, 91, 298). 2.0 GiB. selecting test data. 4.1 GiB. (91, 109, 91, 609). regressing. 2.0 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.029169417648594487
In order to test on a training group of 20 items, holding out the following subjects:['DEV019' 'DEV018' 'DEV001' 'DEV028' 'DEV035' 'DEV013' 'DEV006' 'DEV015'
 'DEV036' 'DEV024']. selecting training data. (91, 109, 91, 300). 2.0 GiB. selecting test data. 4.0 GiB. (91, 109, 91, 602). regressing. 2.0 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.1469000992666767
attempt:[0.08814339041826484, 0.08889161102677841, 0.07834291386673375, 0.07821122876678943, 0.08413504633966747]
using default regressor. In order to test on a training group of 24 items, holding out the following subjects:['DEV023' 'DEV014' 'DEV013' 'DEV017' 'DEV015' 'DEV025']. selecting training data. (91, 109, 91, 358). 2.4 GiB. selecting test data. 2.4 GiB. (91, 109, 91, 355). regressing. 2.4 GiB. trying regressor 1 of 1. predicting. test score was:. -0.049677300891257214
In order to test on a training group of 24 items, holding out the following subjects:['DEV018' 'DEV026' 'DEV040' 'DEV039' 'DEV012' 'DEV024']. selecting training data. (91, 109, 91, 356). 2.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 375). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.023149904099926788
In order to test on a training group of 24 items, holding out the following subjects:['DEV019' 'DEV041' 'DEV035' 'DEV042' 'DEV010' 'DEV005']. selecting training data. (91, 109, 91, 360). 2.4 GiB. selecting test data. 2.4 GiB. (91, 109, 91, 353). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.05717805814816679
In order to test on a training group of 24 items, holding out the following subjects:['DEV028' 'DEV027' 'DEV021' 'DEV034' 'DEV022' 'DEV036']. selecting training data. (91, 109, 91, 355). 2.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 377). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.07609050223289626
In order to test on a training group of 24 items, holding out the following subjects:['DEV001' 'DEV016' 'DEV029' 'DEV030' 'DEV009' 'DEV006']. selecting training data. (91, 109, 91, 355). 2.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 375). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.21424231016572992
using default regressor. In order to test on a training group of 24 items, holding out the following subjects:['DEV023' 'DEV029' 'DEV030' 'DEV035' 'DEV009' 'DEV024']. selecting training data. (91, 109, 91, 356). 2.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 378). regressing. 2.4 GiB. trying regressor 1 of 1. predicting. test score was:. 0.09424648258131096
In order to test on a training group of 24 items, holding out the following subjects:['DEV019' 'DEV018' 'DEV040' 'DEV039' 'DEV021' 'DEV022']. selecting training data. (91, 109, 91, 356). 2.4 GiB. selecting test data. 2.4 GiB. (91, 109, 91, 358). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.1315032762018966
In order to test on a training group of 24 items, holding out the following subjects:['DEV016' 'DEV034' 'DEV042' 'DEV012' 'DEV025' 'DEV036']. selecting training data. (91, 109, 91, 361). 2.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 374). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.07290642976885842
In order to test on a training group of 24 items, holding out the following subjects:['DEV001' 'DEV041' 'DEV026' 'DEV013' 'DEV006' 'DEV015']. selecting training data. (91, 109, 91, 354). 2.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 369). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.15426448410272997
In order to test on a training group of 24 items, holding out the following subjects:['DEV028' 'DEV027' 'DEV014' 'DEV017' 'DEV010' 'DEV005']. selecting training data. (91, 109, 91, 357). 2.4 GiB. selecting test data. 2.4 GiB. (91, 109, 91, 356). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.006969513544133887
using default regressor. In order to test on a training group of 24 items, holding out the following subjects:['DEV041' 'DEV029' 'DEV028' 'DEV035' 'DEV014' 'DEV012']. selecting training data. (91, 109, 91, 357). 2.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 375). regressing. 2.4 GiB. trying regressor 1 of 1. predicting. test score was:. 0.09068194303317723
In order to test on a training group of 24 items, holding out the following subjects:['DEV023' 'DEV026' 'DEV017' 'DEV042' 'DEV025' 'DEV024']. selecting training data. (91, 109, 91, 361). 2.4 GiB. selecting test data. 2.4 GiB. (91, 109, 91, 360). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.13845470433327534
In order to test on a training group of 24 items, holding out the following subjects:['DEV018' 'DEV001' 'DEV040' 'DEV013' 'DEV034' 'DEV022']. selecting training data. (91, 109, 91, 354). 2.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 378). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.10839181354942673
In order to test on a training group of 24 items, holding out the following subjects:['DEV019' 'DEV030' 'DEV027' 'DEV009' 'DEV006' 'DEV005']. selecting training data. (91, 109, 91, 357). 2.4 GiB. selecting test data. 2.4 GiB. (91, 109, 91, 355). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.1412802640995111
In order to test on a training group of 24 items, holding out the following subjects:['DEV016' 'DEV039' 'DEV021' 'DEV010' 'DEV015' 'DEV036']. selecting training data. (91, 109, 91, 355). 2.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 367). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.06814596292672426
using default regressor. In order to test on a training group of 24 items, holding out the following subjects:['DEV018' 'DEV030' 'DEV035' 'DEV027' 'DEV042' 'DEV022']. selecting training data. (91, 109, 91, 358). 2.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 377). regressing. 2.4 GiB. trying regressor 1 of 1. predicting. test score was:. 0.12701535316258072
In order to test on a training group of 24 items, holding out the following subjects:['DEV001' 'DEV041' 'DEV013' 'DEV017' 'DEV025' 'DEV024']. selecting training data. (91, 109, 91, 358). 2.4 GiB. selecting test data. 2.4 GiB. (91, 109, 91, 362). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.040361638188362026
In order to test on a training group of 24 items, holding out the following subjects:['DEV040' 'DEV014' 'DEV039' 'DEV010' 'DEV012' 'DEV036']. selecting training data. (91, 109, 91, 357). 2.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 371). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.05515637539011908
In order to test on a training group of 24 items, holding out the following subjects:['DEV019' 'DEV028' 'DEV021' 'DEV009' 'DEV034' 'DEV015']. selecting training data. (91, 109, 91, 358). 2.4 GiB. selecting test data. 2.3 GiB. (91, 109, 91, 349). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.1476505004231794
In order to test on a training group of 24 items, holding out the following subjects:['DEV023' 'DEV016' 'DEV029' 'DEV026' 'DEV006' 'DEV005']. selecting training data. (91, 109, 91, 353). 2.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 376). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.19754282472724138
using default regressor. In order to test on a training group of 24 items, holding out the following subjects:['DEV028' 'DEV027' 'DEV039' 'DEV034' 'DEV015' 'DEV012']. selecting training data. (91, 109, 91, 358). 2.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 369). regressing. 2.4 GiB. trying regressor 1 of 1. predicting. test score was:. 0.11685268247322556
In order to test on a training group of 24 items, holding out the following subjects:['DEV019' 'DEV041' 'DEV035' 'DEV009' 'DEV006' 'DEV036']. selecting training data. (91, 109, 91, 359). 2.4 GiB. selecting test data. 2.4 GiB. (91, 109, 91, 358). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.16716038404514977
In order to test on a training group of 24 items, holding out the following subjects:['DEV016' 'DEV030' 'DEV014' 'DEV005' 'DEV022' 'DEV025']. selecting training data. (91, 109, 91, 352). 2.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 373). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.05228208805439871
In order to test on a training group of 24 items, holding out the following subjects:['DEV001' 'DEV029' 'DEV021' 'DEV013' 'DEV017' 'DEV042']. selecting training data. (91, 109, 91, 361). 2.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 365). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.03726006748788924
In order to test on a training group of 24 items, holding out the following subjects:['DEV018' 'DEV023' 'DEV026' 'DEV040' 'DEV010' 'DEV024']. selecting training data. (91, 109, 91, 354). 2.4 GiB. selecting test data. 2.5 GiB. (91, 109, 91, 370). regressing. 2.4 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.0010491902775677797
attempt:[0.0641966947510925, 0.06281546533224261, 0.0540090558551128, 0.09740068310295172, 0.07492088246764621]
using default regressor. In order to test on a training group of 27 items, holding out the following subjects:['DEV018' 'DEV006' 'DEV005']. selecting training data. (91, 109, 91, 398). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 187). regressing. 2.7 GiB. trying regressor 1 of 1. predicting. test score was:. 0.26525255005851345
In order to test on a training group of 27 items, holding out the following subjects:['DEV023' 'DEV016' 'DEV024']. selecting training data. (91, 109, 91, 401). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 190). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.11569688757795826
In order to test on a training group of 27 items, holding out the following subjects:['DEV019' 'DEV026' 'DEV022']. selecting training data. (91, 109, 91, 402). 2.7 GiB. selecting test data. 1.1 GiB. (91, 109, 91, 169). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.12194935012057306
In order to test on a training group of 27 items, holding out the following subjects:['DEV040' 'DEV013' 'DEV034']. selecting training data. (91, 109, 91, 400). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 188). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.07666797018951299
In order to test on a training group of 27 items, holding out the following subjects:['DEV039' 'DEV042' 'DEV025']. selecting training data. (91, 109, 91, 403). 2.7 GiB. selecting test data. 1.2 GiB. (91, 109, 91, 185). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.07476236184137619
In order to test on a training group of 27 items, holding out the following subjects:['DEV028' 'DEV012' 'DEV036']. selecting training data. (91, 109, 91, 404). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 186). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.005080871300414125
In order to test on a training group of 27 items, holding out the following subjects:['DEV029' 'DEV014' 'DEV010']. selecting training data. (91, 109, 91, 399). 2.7 GiB. selecting test data. 1.2 GiB. (91, 109, 91, 183). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.12650100822080224
In order to test on a training group of 27 items, holding out the following subjects:['DEV009' 'DEV017' 'DEV015']. selecting training data. (91, 109, 91, 405). 2.7 GiB. selecting test data. 1.1 GiB. (91, 109, 91, 168). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.07675599212134676
In order to test on a training group of 27 items, holding out the following subjects:['DEV001' 'DEV027' 'DEV021']. selecting training data. (91, 109, 91, 402). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 190). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.1819763281219653
In order to test on a training group of 27 items, holding out the following subjects:['DEV041' 'DEV030' 'DEV035']. selecting training data. (91, 109, 91, 400). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 189). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.10181284649028877
using default regressor. In order to test on a training group of 27 items, holding out the following subjects:['DEV029' 'DEV040' 'DEV024']. selecting training data. (91, 109, 91, 402). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 187). regressing. 2.7 GiB. trying regressor 1 of 1. predicting. test score was:. 0.0010545005092640336
In order to test on a training group of 27 items, holding out the following subjects:['DEV018' 'DEV015' 'DEV022']. selecting training data. (91, 109, 91, 400). 2.7 GiB. selecting test data. 1.2 GiB. (91, 109, 91, 182). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.23544936314641407
In order to test on a training group of 27 items, holding out the following subjects:['DEV016' 'DEV030' 'DEV013']. selecting training data. (91, 109, 91, 399). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 188). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.08853281236098054
In order to test on a training group of 27 items, holding out the following subjects:['DEV019' 'DEV014' 'DEV017']. selecting training data. (91, 109, 91, 406). 2.7 GiB. selecting test data. 1.1 GiB. (91, 109, 91, 157). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.36454040484894334
In order to test on a training group of 27 items, holding out the following subjects:['DEV039' 'DEV034' 'DEV010']. selecting training data. (91, 109, 91, 398). 2.7 GiB. selecting test data. 1.2 GiB. (91, 109, 91, 185). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.17862379773084447
In order to test on a training group of 27 items, holding out the following subjects:['DEV001' 'DEV028' 'DEV009']. selecting training data. (91, 109, 91, 402). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 186). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.2085916991833131
In order to test on a training group of 27 items, holding out the following subjects:['DEV035' 'DEV006' 'DEV005']. selecting training data. (91, 109, 91, 400). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 188). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.2755155670645447
In order to test on a training group of 27 items, holding out the following subjects:['DEV023' 'DEV041' 'DEV025']. selecting training data. (91, 109, 91, 400). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 186). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.1480116356715533
In order to test on a training group of 27 items, holding out the following subjects:['DEV026' 'DEV027' 'DEV021']. selecting training data. (91, 109, 91, 400). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 188). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.1907208562568048
In order to test on a training group of 27 items, holding out the following subjects:['DEV042' 'DEV012' 'DEV036']. selecting training data. (91, 109, 91, 407). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 188). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.7661612760545193
using default regressor. In order to test on a training group of 27 items, holding out the following subjects:['DEV023' 'DEV035' 'DEV025']. selecting training data. (91, 109, 91, 402). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 187). regressing. 2.7 GiB. trying regressor 1 of 1. predicting. test score was:. 0.10404246547832807
In order to test on a training group of 27 items, holding out the following subjects:['DEV017' 'DEV010' 'DEV006']. selecting training data. (91, 109, 91, 402). 2.7 GiB. selecting test data. 1.1 GiB. (91, 109, 91, 168). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.0641310070676443
In order to test on a training group of 27 items, holding out the following subjects:['DEV016' 'DEV026' 'DEV012']. selecting training data. (91, 109, 91, 402). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 187). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.20046275478077002
In order to test on a training group of 27 items, holding out the following subjects:['DEV019' 'DEV001' 'DEV030']. selecting training data. (91, 109, 91, 404). 2.7 GiB. selecting test data. 1.1 GiB. (91, 109, 91, 170). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.286344773524543
In order to test on a training group of 27 items, holding out the following subjects:['DEV018' 'DEV040' 'DEV024']. selecting training data. (91, 109, 91, 401). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 187). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.03008200398682548
In order to test on a training group of 27 items, holding out the following subjects:['DEV028' 'DEV014' 'DEV013']. selecting training data. (91, 109, 91, 399). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 186). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.034828798367610436
In order to test on a training group of 27 items, holding out the following subjects:['DEV029' 'DEV021' 'DEV009']. selecting training data. (91, 109, 91, 400). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 188). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.09827530933725814
In order to test on a training group of 27 items, holding out the following subjects:['DEV041' 'DEV039' 'DEV015']. selecting training data. (91, 109, 91, 400). 2.7 GiB. selecting test data. 1.2 GiB. (91, 109, 91, 183). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.03758348501627762
In order to test on a training group of 27 items, holding out the following subjects:['DEV034' 'DEV042' 'DEV036']. selecting training data. (91, 109, 91, 404). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 190). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.2799945621852211
In order to test on a training group of 27 items, holding out the following subjects:['DEV027' 'DEV005' 'DEV022']. selecting training data. (91, 109, 91, 400). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 189). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.15530496770821545
using default regressor. In order to test on a training group of 27 items, holding out the following subjects:['DEV028' 'DEV039' 'DEV015']. selecting training data. (91, 109, 91, 401). 2.7 GiB. selecting test data. 1.2 GiB. (91, 109, 91, 180). regressing. 2.7 GiB. trying regressor 1 of 1. predicting. test score was:. 0.13308336675259258
In order to test on a training group of 27 items, holding out the following subjects:['DEV018' 'DEV029' 'DEV035']. selecting training data. (91, 109, 91, 401). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 190). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.22592648596356035
In order to test on a training group of 27 items, holding out the following subjects:['DEV041' 'DEV010' 'DEV022']. selecting training data. (91, 109, 91, 398). 2.7 GiB. selecting test data. 1.2 GiB. (91, 109, 91, 183). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.05866554340710772
In order to test on a training group of 27 items, holding out the following subjects:['DEV001' 'DEV025' 'DEV036']. selecting training data. (91, 109, 91, 403). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 187). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.09013338986017327
In order to test on a training group of 27 items, holding out the following subjects:['DEV034' 'DEV017' 'DEV042']. selecting training data. (91, 109, 91, 406). 2.7 GiB. selecting test data. 1.2 GiB. (91, 109, 91, 176). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.17830634866178352
In order to test on a training group of 27 items, holding out the following subjects:['DEV013' 'DEV005' 'DEV012']. selecting training data. (91, 109, 91, 401). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 188). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.07554026741486997
In order to test on a training group of 27 items, holding out the following subjects:['DEV040' 'DEV009' 'DEV006']. selecting training data. (91, 109, 91, 401). 2.7 GiB. selecting test data. 1.2 GiB. (91, 109, 91, 184). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.09670114069023861
In order to test on a training group of 27 items, holding out the following subjects:['DEV026' 'DEV027' 'DEV024']. selecting training data. (91, 109, 91, 401). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 188). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.06694452394977723
In order to test on a training group of 27 items, holding out the following subjects:['DEV019' 'DEV016' 'DEV021']. selecting training data. (91, 109, 91, 403). 2.7 GiB. selecting test data. 1.1 GiB. (91, 109, 91, 170). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.09386401234344521
In order to test on a training group of 27 items, holding out the following subjects:['DEV023' 'DEV030' 'DEV014']. selecting training data. (91, 109, 91, 399). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 189). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.09149777857553221
using default regressor. In order to test on a training group of 27 items, holding out the following subjects:['DEV018' 'DEV021' 'DEV009']. selecting training data. (91, 109, 91, 399). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 188). regressing. 2.7 GiB. trying regressor 1 of 1. predicting. test score was:. 0.15788208881713472
In order to test on a training group of 27 items, holding out the following subjects:['DEV035' 'DEV040' 'DEV025']. selecting training data. (91, 109, 91, 403). 2.7 GiB. selecting test data. 1.2 GiB. (91, 109, 91, 184). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.029752133993287355
In order to test on a training group of 27 items, holding out the following subjects:['DEV030' 'DEV027' 'DEV010']. selecting training data. (91, 109, 91, 400). 2.7 GiB. selecting test data. 1.2 GiB. (91, 109, 91, 182). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.16391818249156898
In order to test on a training group of 27 items, holding out the following subjects:['DEV026' 'DEV005' 'DEV012']. selecting training data. (91, 109, 91, 401). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 187). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.21798652856478684
In order to test on a training group of 27 items, holding out the following subjects:['DEV028' 'DEV006' 'DEV022']. selecting training data. (91, 109, 91, 399). 2.7 GiB. selecting test data. 1.2 GiB. (91, 109, 91, 184). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.08012451031650902
In order to test on a training group of 27 items, holding out the following subjects:['DEV001' 'DEV016' 'DEV014']. selecting training data. (91, 109, 91, 401). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 190). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.18471900678031294
In order to test on a training group of 27 items, holding out the following subjects:['DEV039' 'DEV036' 'DEV024']. selecting training data. (91, 109, 91, 401). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 191). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.023141435272355793
In order to test on a training group of 27 items, holding out the following subjects:['DEV023' 'DEV041' 'DEV029']. selecting training data. (91, 109, 91, 400). 2.7 GiB. selecting test data. 1.3 GiB. (91, 109, 91, 190). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.013325802064383052
In order to test on a training group of 27 items, holding out the following subjects:['DEV013' 'DEV017' 'DEV042']. selecting training data. (91, 109, 91, 406). 2.7 GiB. selecting test data. 1.2 GiB. (91, 109, 91, 175). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. -0.12846896552984344
In order to test on a training group of 27 items, holding out the following subjects:['DEV019' 'DEV034' 'DEV015']. selecting training data. (91, 109, 91, 404). 2.7 GiB. selecting test data. 1.1 GiB. (91, 109, 91, 164). regressing. 2.7 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.21265858691608175
attempt:[0.06120256829613887, -0.010022472032285023, 0.06708969951086005, 0.05737833805751671, 0.08492521711552907]


In [19]:
split_dict_30

{3: [0.08814339041826484,
  0.08889161102677841,
  0.07834291386673375,
  0.07821122876678943,
  0.08413504633966747],
 5: [0.0641966947510925,
  0.06281546533224261,
  0.0540090558551128,
  0.09740068310295172,
  0.07492088246764621],
 10: [0.06120256829613887,
  -0.010022472032285023,
  0.06708969951086005,
  0.05737833805751671,
  0.08492521711552907]}

In [23]:
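# Note: the key-2 entry in this output was added by the n_split=2 cell (In [22]), which ran before this cell.
get_summary_stats(split_dict_30)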

[{3: {'mean': 0.08354483808364678, 'sd': 0.004595441527611343}},
 {5: {'mean': 0.07066855630180915, 'sd': 0.014925038465440786}},
 {10: {'mean': 0.05211467018955194, 'sd': 0.032471846352409746}},
 {2: {'mean': 0.06252641986765033, 'sd': 0.02081765232658925}}]
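
The summary above can be reproduced from the raw per-repeat means. The sketch below is an illustration only: `summarize_attempts` is a hypothetical name (not the notebook's `get_summary_stats`, whose definition is not shown here), and it assumes the reported `sd` is the population standard deviation (dividing by n rather than n - 1), which is what the printed numbers are consistent with. The input values are copied from the `split_dict_30` output for 3, 5 and 10 splits.

```python
from statistics import fmean, pstdev

# Per-repeat mean test scores, copied from the split_dict_30 output above.
# Keys are the number of CV splits; each list holds 5 repeats.
split_dict_30 = {
    3: [0.08814339041826484, 0.08889161102677841, 0.07834291386673375,
        0.07821122876678943, 0.08413504633966747],
    5: [0.0641966947510925, 0.06281546533224261, 0.0540090558551128,
        0.09740068310295172, 0.07492088246764621],
    10: [0.06120256829613887, -0.010022472032285023, 0.06708969951086005,
         0.05737833805751671, 0.08492521711552907],
}


def summarize_attempts(split_dict):
    """Hypothetical helper: mean and population SD of the repeated-CV means."""
    return {
        n_splits: {"mean": fmean(scores), "sd": pstdev(scores)}
        for n_splits, scores in split_dict.items()
    }


summary = summarize_attempts(split_dict_30)
for n_splits, stats in summary.items():
    print(n_splits, stats)
```

With only five repeats per setting, these standard deviations are themselves rough estimates. In this run the spread of the repeated means was smallest for 3 splits and largest for 10.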

In [22]:
# Repeat 2-fold CV five times to see how much the mean test score varies between random splits.
for n_split in [2]:
    attempt = []
    for i in range(5):
        test_scores_different = cv_train_test_sets(
            trainset_X=first_subs_grouped_nifti,
            trainset_y=first_subs_grouped_nifti_Y,
            trainset_groups=first_subs_grouped_nifti_groups,
            testset_X=first_subs_nifti,
            testset_y=first_subs_nifti_Y,
            testset_groups=first_subs_nifti_groups,
            cv=KFold(n_splits=n_split, shuffle=True),  # unseeded, so each repeat draws a new split
        )
        # Store the mean score across folds for this repeat.
        attempt.append(np.mean(test_scores_different))
    print("attempt:" + str(attempt))
    split_dict_30[n_split] = attempt


using default regressor. In order to test on a training group of 15 items, holding out the following subjects:['DEV019' 'DEV001' 'DEV041' 'DEV035' 'DEV040' 'DEV027' 'DEV039' 'DEV021'
 'DEV013' 'DEV009' 'DEV034' 'DEV017' 'DEV005' 'DEV036' 'DEV024']. selecting training data. (91, 109, 91, 226). 1.5 GiB. selecting test data. 6.2 GiB. (91, 109, 91, 915). regressing. 1.5 GiB. trying regressor 1 of 1. predicting. test score was:. 0.03985298630731049
In order to test on a training group of 15 items, holding out the following subjects:['DEV018' 'DEV023' 'DEV016' 'DEV029' 'DEV028' 'DEV026' 'DEV030' 'DEV014'
 'DEV042' 'DEV010' 'DEV006' 'DEV015' 'DEV012' 'DEV022' 'DEV025']. selecting training data. (91, 109, 91, 220). 1.5 GiB. selecting test data. 6.2 GiB. (91, 109, 91, 920). regressing. 1.5 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.07249206329258251
using default regressor. In order to test on a training group of 15 items, holding out the following subjects:['DEV019' 'DEV016' 'DEV041' 'DEV028' 'DEV026' 'DEV035' 'DEV014' 'DEV009'
 'DEV034' 'DEV010' 'DEV015' 'DEV005' 'DEV012' 'DEV025' 'DEV036']. selecting training data. (91, 109, 91, 223). 1.5 GiB. selecting test data. 6.1 GiB. (91, 109, 91, 906). regressing. 1.5 GiB. trying regressor 1 of 1. predicting. test score was:. 0.021567051703714313
In order to test on a training group of 15 items, holding out the following subjects:['DEV018' 'DEV023' 'DEV001' 'DEV029' 'DEV030' 'DEV040' 'DEV027' 'DEV039'
 'DEV021' 'DEV013' 'DEV017' 'DEV042' 'DEV006' 'DEV022' 'DEV024']. selecting training data. (91, 109, 91, 223). 1.5 GiB. selecting test data. 6.2 GiB. (91, 109, 91, 929). regressing. 1.5 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.0345711893964491
using default regressor. In order to test on a training group of 15 items, holding out the following subjects:['DEV019' 'DEV018' 'DEV041' 'DEV029' 'DEV026' 'DEV035' 'DEV027' 'DEV021'
 'DEV013' 'DEV009' 'DEV010' 'DEV015' 'DEV022' 'DEV025' 'DEV024']. selecting training data. (91, 109, 91, 220). 1.5 GiB. selecting test data. 6.1 GiB. (91, 109, 91, 908). regressing. 1.5 GiB. trying regressor 1 of 1. predicting. test score was:. 0.09676325823052268
In order to test on a training group of 15 items, holding out the following subjects:['DEV023' 'DEV001' 'DEV016' 'DEV028' 'DEV030' 'DEV040' 'DEV014' 'DEV039'
 'DEV034' 'DEV017' 'DEV042' 'DEV006' 'DEV005' 'DEV012' 'DEV036']. selecting training data. (91, 109, 91, 226). 1.5 GiB. selecting test data. 6.2 GiB. (91, 109, 91, 927). regressing. 1.5 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.06388964341210968
using default regressor. In order to test on a training group of 15 items, holding out the following subjects:['DEV023' 'DEV001' 'DEV016' 'DEV029' 'DEV026' 'DEV030' 'DEV040' 'DEV014'
 'DEV021' 'DEV013' 'DEV009' 'DEV034' 'DEV015' 'DEV012' 'DEV024']. selecting training data. (91, 109, 91, 220). 1.5 GiB. selecting test data. 6.3 GiB. (91, 109, 91, 935). regressing. 1.5 GiB. trying regressor 1 of 1. predicting. test score was:. 0.13555410849817473
In order to test on a training group of 15 items, holding out the following subjects:['DEV019' 'DEV018' 'DEV041' 'DEV028' 'DEV035' 'DEV027' 'DEV039' 'DEV017'
 'DEV042' 'DEV010' 'DEV006' 'DEV005' 'DEV022' 'DEV025' 'DEV036']. selecting training data. (91, 109, 91, 226). 1.5 GiB. selecting test data. 6.1 GiB. (91, 109, 91, 900). regressing. 1.5 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.03925303808802083
using default regressor. In order to test on a training group of 15 items, holding out the following subjects:['DEV019' 'DEV018' 'DEV001' 'DEV029' 'DEV026' 'DEV014' 'DEV039' 'DEV021'
 'DEV017' 'DEV042' 'DEV010' 'DEV005' 'DEV025' 'DEV036' 'DEV024']. selecting training data. (91, 109, 91, 225). 1.5 GiB. selecting test data. 6.1 GiB. (91, 109, 91, 904). regressing. 1.5 GiB. trying regressor 1 of 1. predicting. test score was:. 0.0480011393916816
In order to test on a training group of 15 items, holding out the following subjects:['DEV023' 'DEV016' 'DEV041' 'DEV028' 'DEV030' 'DEV035' 'DEV040' 'DEV027'
 'DEV013' 'DEV009' 'DEV034' 'DEV006' 'DEV015' 'DEV012' 'DEV022']. selecting training data. (91, 109, 91, 221). 1.5 GiB. selecting test data. 6.3 GiB. (91, 109, 91, 931). regressing. 1.5 GiB. trying regressor 1 of 1. 



predicting. test score was:. 0.07331972035593737
attempt:[0.0561725247999465, 0.028069120550081705, 0.08032645082131618, 0.08740357329309778, 0.060660429873809485]
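
The "training group of N items" figures in the logs follow from dividing the 30 subjects into K folds. The sketch below assumes that `cv_train_test_sets` applies the `KFold` object to the list of unique subject IDs; the log messages suggest this, but the helper's code is not shown in this section. `fold_subject_counts` is a hypothetical name used only for illustration. It relies on the usual K-fold sizing rule, where the first `n % k` folds hold out one extra item.

```python
def fold_subject_counts(n_subjects, n_splits):
    """Return (n_train, n_test) subject counts per fold for a K-fold split."""
    base, extra = divmod(n_subjects, n_splits)
    counts = []
    for fold in range(n_splits):
        # The first `extra` folds hold out one more subject than the rest.
        n_test = base + 1 if fold < extra else base
        counts.append((n_subjects - n_test, n_test))
    return counts


# Train/held-out subject counts for the split settings used in this notebook.
for k in (2, 3, 5, 10):
    print(k, fold_subject_counts(30, k)[0])
```

Under this assumption, more splits mean more training subjects per fold but fewer held-out subjects per test score, so each individual fold score rests on less test data.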
