# 1. Subject averaging.

## Goals:

Subjects viewed one of two slightly different versions of the videos. Rather than averaging the data across all subjects, it is better to do due diligence and average them according to the specific version that they viewed.

I want to determine who viewed what video, average accordingly and then save in the appropriate format.

In [1]:
%load_ext autoreload
%autoreload 2
from cfhcpy.base import AnalysisBase
ab = AnalysisBase()

  from tqdm.autonotebook import tqdm

 | Using Nistats with Nilearn versions >= 0.7.0 is redundant and potentially conflicting.
 | Nilearn versions 0.7.0 and up offer all the functionality of Nistats as well the latest features and fixes.
 | We strongly recommend uninstalling Nistats and using Nilearn's stats & reporting modules.

  from nistats.hemodynamic_models import spm_hrf


Start an analysis base with an arbitrary subject, just so we have access to all the paths etc.

In [2]:
from funcs import HCP_subject
subno=114823

In [3]:
ab.startup(subject=str(subno), experiment_id='movie', yaml_file='/tank/hedger/software/hcp_movie/config.yml')

Starting analysis of subject 114823 on romulus with settings 
{
 "identifier": "node230",
 "base_dir": "/tank/shared/2020/hcp_{experiment}/",
 "code_dir": "/tank/hedger/scripts/HCP_tonotopy",
 "threads": 40
}


In [4]:
len(ab.full_data_subjects)

174

In [15]:
ab.experiment_dict

{'wc_exp': 'MOVIE',
 'runs': [1, 2, 3, 4],
 'test_duration': 103,
 'run_durations': [921, 918, 915, 901],
 'data_file_wildcard': 'tfMRI_{experiment_id}{run}_*_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc.nii'}

Go through all the subjects and use each subject's CSV file to determine which video version they watched. Collect the results in a list, along with each subject's data paths.

In [None]:
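from tqdm import tqdm

movtype = []  # Video-version prefix for each subject
dpaths = []   # Data file paths for each subject

for sub in tqdm(ab.full_data_subjects):
    # Start a fresh analysis base for this subject so its paths resolve correctly.
    tbase = AnalysisBase()
    tbase.startup(subject=str(sub), experiment_id='movie', yaml_file='/tank/hedger/software/hcp_movie/config.yml')
    mysub = HCP_subject(tbase)
    movtype.append(mysub.vidprefix)
    mysub.get_data_paths()
    dpaths.append(mysub.dpaths)
    tbase = None  # Release the per-subject analysis base.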

In [6]:
import numpy as np
np.unique(movtype)

array(['movies/Post_20140821_version/', 'movies/Pre_20140821_version/'],
      dtype='<U29')

In [7]:
from collections import Counter
Counter(movtype).keys() 

dict_keys(['movies/Post_20140821_version/', 'movies/Pre_20140821_version/'])

Count the number of participants that viewed each video.

In [8]:
Counter(movtype).values()

dict_values([132, 42])
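
Reading `.keys()` and `.values()` in separate cells relies on both views listing the labels in the same order. As a minimal sketch on a toy list (the name `toy_movtype` and its contents are illustrative, not the real `movtype`), `Counter.most_common()` returns each label together with its count:

```python
from collections import Counter

# Toy stand-in for movtype; the real list holds one prefix per subject.
toy_movtype = (
    ["movies/Post_20140821_version/"] * 3
    + ["movies/Pre_20140821_version/"] * 2
)

counts = Counter(toy_movtype)

# most_common() yields (label, count) pairs sorted by descending count.
for label, n in counts.most_common():
    print(f"{label}: {n}")
```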

Most participants (132 of 174) viewed the newer, post-2014-08-21 version; the remaining 42 viewed the earlier one. Get the indices of the participants that viewed each version.

In [9]:
earlyvidsubs= [i for i, s in enumerate(movtype) if 'Pre_20140821' in s]
latevidsubs= [i for i, s in enumerate(movtype) if 'Post_20140821' in s]
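
Because the two index lists are built with separate substring tests, it may be worth checking that they do not overlap and that together they cover every subject. A small sketch of that check, using a hypothetical three-subject list in place of the real `movtype`:

```python
# Toy stand-in for movtype (hypothetical values, same label format as above).
toy_movtype = [
    "movies/Post_20140821_version/",
    "movies/Pre_20140821_version/",
    "movies/Post_20140821_version/",
]

early = [i for i, s in enumerate(toy_movtype) if "Pre_20140821" in s]
late = [i for i, s in enumerate(toy_movtype) if "Post_20140821" in s]

# The groups should not overlap and should jointly cover every subject.
assert set(early).isdisjoint(late)
assert sorted(early + late) == list(range(len(toy_movtype)))
print(early, late)  # prints [1] [0, 2]
```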

In [14]:
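import nibabel as nib

def read_array(dpaths,s,movind):
    # Load run `movind` for subject index `s` and return its data as an array.
    data=nib.load(dpaths[s][movind])
    # get_data() is deprecated in nibabel; np.asanyarray(img.dataobj) is the documented replacement.
    darray=np.asanyarray(data.dataobj)
    return darray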

In [23]:
from joblib import Parallel, delayed

We want to write the early and late group averages to different directories. I do this in my own space so as not to disturb the original folder.

In [16]:
import cifti
earlydir='/tank/hedger/DATA/HCP_temp/early'
latedir='/tank/hedger/DATA/HCP_temp/late'

We might as well preserve the same kind of filename so that all of the HCP_movie functionality will still work.

In [25]:
PEDS=['AP','PA','PA','AP'] # Phase-encoding direction label for each of the four runs
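
To see why keeping the filename pattern matters, the sketch below checks that names built with these phase-encoding labels are still matched by the `data_file_wildcard` shown in `ab.experiment_dict`. It assumes the output template is simply that wildcard with `*` replaced by a `{PED}` field; `template` is a name introduced here for illustration.

```python
from fnmatch import fnmatch

# Wildcard copied from ab.experiment_dict['data_file_wildcard'] above.
wildcard = "tfMRI_{experiment_id}{run}_*_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc.nii"
# Assumed output template: the same pattern with '*' replaced by {PED}.
template = "tfMRI_{experiment_id}{run}_{PED}_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc.nii"

PEDS = ["AP", "PA", "PA", "AP"]
runs = [1, 2, 3, 4]

generated = []
for run, ped in zip(runs, PEDS):
    fname = template.format(experiment_id="MOVIE", run=run, PED=ped)
    pattern = wildcard.format(experiment_id="MOVIE", run=run)
    # Each generated name should be found by the pipeline's wildcard.
    assert fnmatch(fname, pattern), fname
    generated.append(fname)
```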

In [31]:
import nibabel as nib
import os

def create_mean_data(dpaths,subs,movienum,outputdir,save):

    # Read in the chosen run for every subject in parallel.
    sub_readin=Parallel(n_jobs=20,verbose=1)(delayed(read_array)(dpaths,s,movienum) for s in subs)
    sub_array=np.array(sub_readin) # Stack along a new leading subject axis.
    sub_meandat=np.mean(sub_array,0) # Take the mean across subjects.

    # Build the output filename up front so that it is defined even when save is False.
    fname=os.path.join(outputdir,'tfMRI_{experiment_id}{run}_{PED}_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc.nii'.format(experiment_id='MOVIE',run=str(ab.experiment_dict['runs'][movienum]),PED=PEDS[movienum]))

    if save:
        # The cifti writer seems to require one name per timepoint for the first axis.
        tps=np.arange(ab.experiment_dict['run_durations'][movienum]).astype('str')
        cifti.write(fname, sub_meandat,(cifti.Scalar.from_names(tps),ab.cifti_brain_model))

    return sub_meandat,fname
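
Stacking every subject's run into one array before averaging can need a lot of memory for the larger group. One alternative, sketched here on synthetic arrays and not what this notebook ran, is to accumulate a running sum and divide at the end. `running_mean` and the toy shapes are illustrative assumptions.

```python
import numpy as np

def running_mean(arrays):
    """Average equally shaped arrays one at a time, without stacking them."""
    total = None
    count = 0
    for arr in arrays:
        arr = np.asarray(arr, dtype=np.float64)
        if total is None:
            total = np.zeros_like(arr)
        total += arr
        count += 1
    if count == 0:
        raise ValueError("no arrays to average")
    return total / count

# Synthetic stand-in: 5 "subjects", each a small 4 x 6 array.
rng = np.random.default_rng(0)
toy = [rng.normal(size=(4, 6)) for _ in range(5)]

stacked_mean = np.mean(np.array(toy), axis=0)
streamed_mean = running_mean(toy)

# Both approaches should agree up to floating-point error.
assert np.allclose(stacked_mean, streamed_mean)
```

The trade-off is that this version reads the arrays sequentially, so it gives up the parallel read in exchange for holding only one subject in memory at a time.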

Now go through the early and late subjects, save the mean data for each movie and then split it into surface GIFTIs and subcortex data.

In [None]:
from cfhcpy.surf_utils import split_cii

def split_dfile(f2split):
    # Calls workbench to split a data file into left and right surface files.
    # The '.nii' suffix is replaced by '_L.gii' and '_R.gii'.
    nfileL,nfileR=f2split[:-4]+'_L.gii',f2split[:-4]+'_R.gii'
    if not os.path.isfile(nfileL):
        split_cii(fn=f2split,workbench_split_command=ab.workbench_split_command,resample=False)
    else:
        print(f'Split files already exist for subject {ab.subject}.','Skipping.')

    return nfileL,nfileR

In [32]:
x=create_mean_data(dpaths,earlyvidsubs,0,earlydir,True)
mysub.split_dfile(x[1])

[Parallel(n_jobs=20)]: Using backend LokyBackend with 20 concurrent workers.
[Parallel(n_jobs=20)]: Done  42 out of  42 | elapsed:   50.6s finished


('/tank/hedger/DATA/HCP_temp/early/tfMRI_MOVIE1_AP_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc_L.gii',
 '/tank/hedger/DATA/HCP_temp/early/tfMRI_MOVIE1_AP_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc_R.gii')

In [33]:
x=create_mean_data(dpaths,earlyvidsubs,1,earlydir,True)
mysub.split_dfile(x[1])

[Parallel(n_jobs=20)]: Using backend LokyBackend with 20 concurrent workers.
[Parallel(n_jobs=20)]: Done  42 out of  42 | elapsed:   53.6s finished


('/tank/hedger/DATA/HCP_temp/early/tfMRI_MOVIE2_PA_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc_L.gii',
 '/tank/hedger/DATA/HCP_temp/early/tfMRI_MOVIE2_PA_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc_R.gii')

In [34]:
x=create_mean_data(dpaths,earlyvidsubs,2,earlydir,True)
mysub.split_dfile(x[1])

[Parallel(n_jobs=20)]: Using backend LokyBackend with 20 concurrent workers.
[Parallel(n_jobs=20)]: Done  42 out of  42 | elapsed:   57.8s finished


('/tank/hedger/DATA/HCP_temp/early/tfMRI_MOVIE3_PA_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc_L.gii',
 '/tank/hedger/DATA/HCP_temp/early/tfMRI_MOVIE3_PA_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc_R.gii')

In [35]:
x=create_mean_data(dpaths,earlyvidsubs,3,earlydir,True)
mysub.split_dfile(x[1])

[Parallel(n_jobs=20)]: Using backend LokyBackend with 20 concurrent workers.
[Parallel(n_jobs=20)]: Done  42 out of  42 | elapsed:   50.9s finished


('/tank/hedger/DATA/HCP_temp/early/tfMRI_MOVIE4_AP_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc_L.gii',
 '/tank/hedger/DATA/HCP_temp/early/tfMRI_MOVIE4_AP_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc_R.gii')

In [38]:
x=None

In [37]:
x=create_mean_data(dpaths,latevidsubs,0,latedir,True)
mysub.split_dfile(x[1])

[Parallel(n_jobs=20)]: Using backend LokyBackend with 20 concurrent workers.
[Parallel(n_jobs=20)]: Done  10 tasks      | elapsed:   16.1s
[Parallel(n_jobs=20)]: Done 132 out of 132 | elapsed:  2.6min finished


('/tank/hedger/DATA/HCP_temp/late/tfMRI_MOVIE1_AP_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc_L.gii',
 '/tank/hedger/DATA/HCP_temp/late/tfMRI_MOVIE1_AP_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc_R.gii')

In [39]:
x=create_mean_data(dpaths,latevidsubs,1,latedir,True)
mysub.split_dfile(x[1])

[Parallel(n_jobs=20)]: Using backend LokyBackend with 20 concurrent workers.
[Parallel(n_jobs=20)]: Done  10 tasks      | elapsed:   16.8s
[Parallel(n_jobs=20)]: Done 132 out of 132 | elapsed:  2.7min finished


('/tank/hedger/DATA/HCP_temp/late/tfMRI_MOVIE2_PA_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc_L.gii',
 '/tank/hedger/DATA/HCP_temp/late/tfMRI_MOVIE2_PA_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc_R.gii')

In [41]:
x=create_mean_data(dpaths,latevidsubs,2,latedir,True)
mysub.split_dfile(x[1])

[Parallel(n_jobs=20)]: Using backend LokyBackend with 20 concurrent workers.
[Parallel(n_jobs=20)]: Done  10 tasks      | elapsed:   15.4s
[Parallel(n_jobs=20)]: Done 132 out of 132 | elapsed:  2.5min finished


('/tank/hedger/DATA/HCP_temp/late/tfMRI_MOVIE3_PA_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc_L.gii',
 '/tank/hedger/DATA/HCP_temp/late/tfMRI_MOVIE3_PA_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc_R.gii')

In [43]:
x=create_mean_data(dpaths,latevidsubs,3,latedir,True)
mysub.split_dfile(x[1])

[Parallel(n_jobs=20)]: Using backend LokyBackend with 20 concurrent workers.
[Parallel(n_jobs=20)]: Done  10 tasks      | elapsed:   15.7s
[Parallel(n_jobs=20)]: Done 132 out of 132 | elapsed:  2.5min finished


('/tank/hedger/DATA/HCP_temp/late/tfMRI_MOVIE4_AP_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc_L.gii',
 '/tank/hedger/DATA/HCP_temp/late/tfMRI_MOVIE4_AP_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc_R.gii')
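
The eight calls above follow one pattern, so they could be driven by a loop. The sketch below passes the averaging and splitting steps in as functions so that it runs on its own. `process_groups`, the stub functions and the `/tmp` paths are hypothetical; in this notebook the two callables would presumably be `create_mean_data` (with `dpaths` and `save=True` bound) and `mysub.split_dfile`.

```python
def process_groups(groups, n_movies, average_fn, split_fn):
    """Run average_fn then split_fn for every (group, movie) combination.

    groups: dict mapping a label to (subject_indices, output_dir).
    average_fn(subs, movie_index, output_dir) must return (mean, filename).
    split_fn(filename) must return the (left, right) surface file names.
    """
    outputs = {}
    for label, (subs, outdir) in groups.items():
        for movie_index in range(n_movies):
            _, fname = average_fn(subs, movie_index, outdir)
            outputs[(label, movie_index)] = split_fn(fname)
    return outputs

# Stubs standing in for the real averaging and splitting steps.
def fake_average(subs, movie_index, outdir):
    return None, f"{outdir}/movie{movie_index + 1}.nii"

def fake_split(fname):
    return fname[:-4] + "_L.gii", fname[:-4] + "_R.gii"

groups = {"early": ([0, 1], "/tmp/early"), "late": ([2, 3, 4], "/tmp/late")}
results = process_groups(groups, 4, fake_average, fake_split)
print(results[("early", 0)])
```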