# 1. Subject averaging.

## Goals:

Each subject viewed one of two slightly different types of video, as described in the HCP 1200 reference manual. Rather than averaging the data across all subjects, it is better to do due diligence and average them according to the specific video that they viewed (even if the differences are minor).

This allows us to investigate the agreement of the parameters estimated from two across-subject folds.

I want to determine which subject viewed what video, average accordingly and then save in the appropriate format.

### Imports

In [None]:
%load_ext autoreload
%autoreload 2
from funcs import HCP_subject
from tqdm import tqdm
import numpy as np
from collections import Counter
from joblib import Parallel, delayed

import nibabel as nib
import os
import cifti

Start an analysis base with an arbitrary subject, just so we have access to all the paths and other constants.

In [None]:
subno=114823
expt_id='movie'
yaml='/tank/hedger/scripts/Tonotopy_2021/config.yml'
tempsub=HCP_subject(str(subno),experiment_id=expt_id,yaml_file=yaml)

Load a cifti brain model for saving out the new files

In [None]:
# cifti.read returns (data, axes); keep the second axis (the brain model) for writing new files.
cifti_brain_model = cifti.read(os.path.join(tempsub.experiment_base_dir, tempsub.brainmodel_cifti_file))[1][1]

Go through all the subjects and use each subject's CSV file to look up the video they watched, collecting the results in a list. Also collect the list of functional files for each subject.

In [None]:
movtype=[]
dpaths=[]

for sub in tqdm(tempsub.full_data_subjects):
    mysub=HCP_subject(str(sub),experiment_id=expt_id,yaml_file=yaml)
    movtype.append(mysub.vidprefix)
    mysub.get_data_paths()
    dpaths.append(mysub.dpaths)
    tbase=None

Count the number of participants that viewed each video.

In [None]:
Counter(movtype) # Show the labels together with the counts, not just the counts.
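
To see which count belongs to which video, it helps to keep the labels next to the counts. A minimal sketch with made-up labels follows; `movtype_example` is a hypothetical stand-in for the real `movtype` list, and the real `vidprefix` strings may look different.

```python
from collections import Counter

# Hypothetical labels standing in for the real `movtype` list.
movtype_example = ['Post_20140821', 'Pre_20140821', 'Post_20140821', 'Post_20140821']

counts = Counter(movtype_example)

# most_common() returns (label, count) pairs sorted by descending count.
for label, n in counts.most_common():
    print(label, n)
```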

It would seem that most participants viewed the newer video (the `Post_20140821` label). Get the indices of the participants that viewed each video.

In [None]:
earlyvidsubs= [i for i, s in enumerate(movtype) if 'Pre_20140821' in s]
latevidsubs= [i for i, s in enumerate(movtype) if 'Post_20140821' in s]
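
Because the two groups are selected by substring matching, it may be worth checking that every subject lands in exactly one group. A small sketch of such a check on toy labels follows; `movtype_example` and its label strings are hypothetical, not the real data.

```python
# Hypothetical labels standing in for the real `movtype` list.
movtype_example = ['Pre_20140821_v1', 'Post_20140821_v2', 'Post_20140821_v2']

early = [i for i, s in enumerate(movtype_example) if 'Pre_20140821' in s]
late = [i for i, s in enumerate(movtype_example) if 'Post_20140821' in s]

# No subject should be in both groups, and none should be left out.
assert not set(early) & set(late)
assert sorted(early + late) == list(range(len(movtype_example)))
```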

Define a function for reading in the data for a specific run/movie.

In [None]:
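def read_array(dpaths,s,movind):
    """Load the data array for subject index s and run index movind."""
    data=nib.load(dpaths[s][movind])
    # get_data() was deprecated in nibabel 3.0 and is no longer available from 5.0; get_fdata() returns a floating-point array.
    darray=np.asarray(data.get_fdata())
    return darray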

We want to output the early and late subjects to different directories. Do this in my own space so as to not disrupt the original folder. 

In [None]:
earlydir=os.path.join(tempsub.agg_path,'early')
latedir=os.path.join(tempsub.agg_path,'late')
os.makedirs(earlydir,exist_ok=True)
os.makedirs(latedir,exist_ok=True)

We need to preserve the original filename pattern so that all of the existing functionality will still work. The phase-encoding direction (PED) of each of the four runs is part of that pattern.

In [None]:
PEDS=['AP','PA','PA','AP']
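
As an illustration of how the PED list feeds into the output names, the sketch below fills in the filename template that `create_mean_data` uses. The run labels in `runs_example` are an assumption standing in for `tempsub.experiment_dict['runs']`, whose actual values are not shown in this notebook.

```python
# Template copied from the naming scheme used in create_mean_data.
TEMPLATE = 'tfMRI_{experiment_id}{run}_{PED}_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc.nii'
PEDS = ['AP', 'PA', 'PA', 'AP']
runs_example = [1, 2, 3, 4]  # Assumption: stand-in for tempsub.experiment_dict['runs'].

# One filename per run, pairing each run label with its phase-encoding direction.
fnames = [TEMPLATE.format(experiment_id='MOVIE', run=run, PED=ped)
          for run, ped in zip(runs_example, PEDS)]

for f in fnames:
    print(f)
```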

Define a function for creating the mean data.

In [None]:
def create_mean_data(dpaths,subs,movienum,outputdir,save):
    """Average one run across the subjects in subs; optionally save the mean as a CIFTI file."""
    
    sub_readin=Parallel(n_jobs=20,verbose=1)(delayed(read_array)(dpaths,s,movienum) for s in subs) # Read in all the data.
    sub_array=np.array(sub_readin) # Stack into one big array (subjects on the first axis).
    sub_meandat=np.mean(sub_array,0) # Take the mean across subjects.
    
    fname=None # Stays None when nothing is saved, so the return below never fails.
    if save:
        
        tps=np.arange(tempsub.experiment_dict['run_durations'][movienum]).astype('str') # The cifti writing seems to require a name for each timepoint.
        
        fname=os.path.join(outputdir,'tfMRI_{experiment_id}{run}_{PED}_Atlas_1.6mm_MSMAll_hp2000_clean.dtseries_sg_psc.nii'.format(experiment_id='MOVIE',run=str(tempsub.experiment_dict['runs'][movienum]),PED=PEDS[movienum]))
        
        cifti.write(fname, sub_meandat,(cifti.Scalar.from_names(tps),cifti_brain_model))
    
    return sub_meandat,fname

Break this task down and do it for each run. It is quite memory intensive given the size of the arrays we are reading in.
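
If memory becomes the limiting factor, one alternative to stacking every subject into a single array is to accumulate a running sum and divide at the end. The sketch below illustrates the idea on toy arrays; `running_mean` is a hypothetical helper, not part of this notebook's codebase, and it gives up the parallel reading that `joblib` provides.

```python
import numpy as np

def running_mean(arrays):
    """Average equally shaped arrays one at a time, keeping only the running sum in memory."""
    total = None
    n = 0
    for arr in arrays:
        arr = np.asarray(arr, dtype=np.float64)
        if total is None:
            total = arr.copy()  # Copy so the first input is not modified in place.
        else:
            total += arr
        n += 1
    if n == 0:
        raise ValueError('no arrays to average')
    return total / n

# Toy data: three "subjects", each 4 timepoints x 5 grayordinates.
rng = np.random.default_rng(0)
toy = [rng.standard_normal((4, 5)) for _ in range(3)]
result = running_mean(iter(toy))
```

With real data, the iterable could be a generator that loads one subject's run at a time, so peak memory is roughly two arrays rather than one per subject.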

In [None]:
x=create_mean_data(dpaths,earlyvidsubs,0,earlydir,True)

In [None]:
x=create_mean_data(dpaths,earlyvidsubs,1,earlydir,True)

In [None]:
x=create_mean_data(dpaths,earlyvidsubs,2,earlydir,True)

In [None]:
x=create_mean_data(dpaths,earlyvidsubs,3,earlydir,True)

In [None]:
x=create_mean_data(dpaths,latevidsubs,0,latedir,True)

In [None]:
x=create_mean_data(dpaths,latevidsubs,1,latedir,True)

In [None]:
x=create_mean_data(dpaths,latevidsubs,2,latedir,True)

In [None]:
x=create_mean_data(dpaths,latevidsubs,3,latedir,True)
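
The eight cells above follow one pattern: every group crossed with every run. A sketch of how the same jobs could be enumerated in a loop is shown below; the index lists and output paths are hypothetical placeholders, and the actual call is left as a comment because it needs the real data.

```python
from itertools import product

# Hypothetical stand-ins for the notebook's index lists and output directories.
groups = {'early': [0, 3], 'late': [1, 2, 4]}
outdirs = {'early': '/path/to/early', 'late': '/path/to/late'}

# Every (group, run) combination, in the same order as the cells above.
jobs = [(name, run) for name, run in product(groups, range(4))]

for name, run in jobs:
    # In the notebook this would be:
    # create_mean_data(dpaths, groups[name], run, outdirs[name], True)
    print(name, run, outdirs[name])
```

Running the jobs one at a time keeps the memory footprint the same as the separate cells, since each call's arrays can be released before the next one starts.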

Also make split-half averages.

In [None]:
from utils import random_splithalf

In [None]:
a,b=random_splithalf(tempsub.full_data_subjects)
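
`random_splithalf` lives in `utils` and is not shown here. Since `create_mean_data` indexes `dpaths[s]`, the two halves need to be positions into `dpaths`. A minimal sketch of one way such a split could work is given below; the name `splithalf_indices` and its `seed` argument are hypothetical, and the real helper may behave differently.

```python
import random

def splithalf_indices(n_subjects, seed=None):
    """Randomly split the positions 0..n_subjects-1 into two non-overlapping halves."""
    idx = list(range(n_subjects))
    random.Random(seed).shuffle(idx)  # Seeded shuffle so the split can be reproduced.
    half = n_subjects // 2
    return sorted(idx[:half]), sorted(idx[half:])

a_example, b_example = splithalf_indices(7, seed=0)
print(a_example, b_example)
```

Fixing the seed makes it possible to regenerate the same two folds later, which matters if the averages need to be recomputed.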

In [None]:
splitpath1=os.path.join(tempsub.agg_path,'subsplit_{splitind}'.format(splitind=1))
splitpath2=os.path.join(tempsub.agg_path,'subsplit_{splitind}'.format(splitind=2))
os.makedirs(splitpath1,exist_ok=True)
os.makedirs(splitpath2,exist_ok=True)

In [None]:
for i in range(4):
    x=create_mean_data(dpaths,a,i,splitpath1,True)
    x=create_mean_data(dpaths,b,i,splitpath2,True)