## Dataset specification

TODO: explain the principle of the dataset, link to it, and describe the paradigm of interest.
TODO: list the (subject, session) pairs that will be used.

### I/O

#### Imports

In [42]:
import gzip, shutil
import time
import yaml
import glob
from nipype.interfaces.matlab import MatlabCommand
from nipype.interfaces.spm import SliceTiming, Realign, Coregister, Normalize12
MatlabCommand.set_default_matlab_cmd('/Applications/MATLAB_R2020a.app/bin/matlab')


#### Preferences Paradigm Dataset Variables

In [2]:
subject_id = ['01','04','05','06']
ses_id = ['19','15']
n_run_per_task = 2
tasks = ['Houses','Food','Faces','Paintings']
path = 'dataset/'
experiment_data = dict()

#### Recursively Extracting Anatomical and Functional Nii.gz archives 

In [3]:
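# Decompress the archives so that the underlying SPM/MATLAB engine can process them

t1 = time.time()

for file in glob.glob(path + "**/*.nii.gz", recursive=True):
    
    # file[20:] drops the 20-character 'dataset/sub-XX/<type>/' directory prefix
    print('- Extracting file {}'.format(file[20:]))
    
    # file[:-3] strips the trailing '.gz' to get the output path
    with gzip.open(file, 'rb') as f_in, open(file[:-3], 'wb') as f_out: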
        shutil.copyfileobj(f_in, f_out)
    
t2 = time.time()

print('Done Successfully in {} minutes !'.format((t2-t1)/60))


- Extracting file sub-01_ses-00_anat_sub-01_ses-00_T1w.nii.gz
- Extracting file sub-01_ses-19_func_sub-01_ses-19_task-PreferenceHouses_dir-ap_sbref.nii.gz
- Extracting file sub-01_ses-19_func_sub-01_ses-19_task-PreferenceFaces_dir-ap_sbref.nii.gz
- Extracting file sub-01_ses-19_func_sub-01_ses-19_task-PreferenceFood_dir-ap_sbref.nii.gz
- Extracting file sub-01_ses-19_func_sub-01_ses-19_task-PreferencePaintings_dir-ap_sbref.nii.gz
- Extracting file sub-01_ses-19_func_sub-01_ses-19_task-PreferenceFood_dir-pa_sbref.nii.gz
- Extracting file sub-01_ses-19_func_sub-01_ses-19_task-PreferenceFaces_dir-pa_sbref.nii.gz
- Extracting file sub-01_ses-19_func_sub-01_ses-19_task-PreferencePaintings_dir-pa_bold.nii.gz
- Extracting file sub-01_ses-19_func_sub-01_ses-19_task-PreferenceFaces_dir-ap_bold.nii.gz
- Extracting file sub-01_ses-19_func_sub-01_ses-19_task-PreferenceFood_dir-ap_bold.nii.gz
- Extracting file sub-01_ses-19_func_sub-01_ses-19_task-PreferenceHouses_dir-pa_sbref.nii.gz
- Extracting f
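
The extraction loop above decompresses every archive each time it runs. The following is a minimal sketch of a re-runnable variant, not the notebook's original code: `extract_nii_gz` is a hypothetical helper name, and skipping archives whose `.nii` file already exists is an assumption about the desired behaviour.

```python
import gzip
import shutil
from pathlib import Path


def extract_nii_gz(root, overwrite=False):
    """Decompress every *.nii.gz under `root` next to its archive.

    Returns the list of .nii paths written during this call.
    """
    written = []
    for archive in sorted(Path(root).rglob("*.nii.gz")):
        target = archive.with_suffix("")  # drops only the trailing ".gz"
        if target.exists() and not overwrite:
            continue  # assumption: an existing .nii is already up to date
        with gzip.open(archive, "rb") as f_in, open(target, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
        written.append(target)
    return written
```

Calling `extract_nii_gz('dataset/')` a second time would then return an empty list instead of rewriting every file.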

#### Experiment Data Files Paths Extraction

In [4]:
# Build a structured dict for each (subject, acquisition_type) pair,
# where acquisition_type is either anatomical ('anat') or functional ('func') data

for sub in subject_id:
    
    experiment_data[sub] = {'func':None,'anat':None}
    experiment_data[sub]['func'] = [f[:-3] for f in glob.glob(path + "**/*func_sub-{}*.nii.gz".format(sub), recursive=True)]
    experiment_data[sub]['anat'] = [f[:-3] for f in glob.glob(path + "**/*anat_sub-{}*.nii.gz".format(sub), recursive=True)]
    

#### Dataset Structure

In [5]:
print(yaml.dump(experiment_data))

'01':
  anat:
  - dataset/sub-01/anat/sub-01_ses-00_anat_sub-01_ses-00_T1w.nii
  func:
  - dataset/sub-01/func/sub-01_ses-19_func_sub-01_ses-19_task-PreferenceHouses_dir-ap_sbref.nii
  - dataset/sub-01/func/sub-01_ses-19_func_sub-01_ses-19_task-PreferenceFaces_dir-ap_sbref.nii
  - dataset/sub-01/func/sub-01_ses-19_func_sub-01_ses-19_task-PreferenceFood_dir-ap_sbref.nii
  - dataset/sub-01/func/sub-01_ses-19_func_sub-01_ses-19_task-PreferencePaintings_dir-ap_sbref.nii
  - dataset/sub-01/func/sub-01_ses-19_func_sub-01_ses-19_task-PreferenceFood_dir-pa_sbref.nii
  - dataset/sub-01/func/sub-01_ses-19_func_sub-01_ses-19_task-PreferenceFaces_dir-pa_sbref.nii
  - dataset/sub-01/func/sub-01_ses-19_func_sub-01_ses-19_task-PreferencePaintings_dir-pa_bold.nii
  - dataset/sub-01/func/sub-01_ses-19_func_sub-01_ses-19_task-PreferenceFaces_dir-ap_bold.nii
  - dataset/sub-01/func/sub-01_ses-19_func_sub-01_ses-19_task-PreferenceFood_dir-ap_bold.nii
  - dataset/sub-01/func/sub-01_ses-19_func_sub-01_ses-1
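
The file names listed above encode the subject, session, task and phase-encoding direction as `key-value` pairs. As a sketch of how these could be read back without relying on fixed character offsets, the hypothetical helper `parse_entities` below splits a name into its entities; the regular expression is an assumption based only on the names shown in this notebook.

```python
import os
import re


def parse_entities(path):
    """Split a file name into its key-value entities and trailing suffix."""
    name = os.path.basename(path)
    stem = name.split(".", 1)[0]  # drop ".nii" or ".nii.gz"
    # Pairs such as "sub-01" or "task-PreferenceHouses"; repeated keys
    # (the names above contain sub/ses twice) simply overwrite each other.
    entities = dict(re.findall(r"([a-zA-Z]+)-([a-zA-Z0-9]+)", stem))
    entities["suffix"] = stem.rsplit("_", 1)[-1]  # e.g. "bold", "sbref", "T1w"
    return entities


example = "dataset/sub-01/func/sub-01_ses-19_func_sub-01_ses-19_task-PreferenceHouses_dir-ap_sbref.nii"
print(parse_entities(example))
```

A check such as `parse_entities(f)['suffix'] == 'bold'` could then select the BOLD runs.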

## fMRI Data Preprocessing Pipeline

#### Preprocessing Strategy

![Preprocessing Streams](preprocessing_streams.png)

The preprocessing strategy depends on the level of analysis performed. As shown in the diagram, the two approaches differ in whether the *Spatial Normalization* step is included. This choice rests on two considerations:

   - Analytical consideration: avoid introducing unnecessary transformations into the first-level analysis pipeline, since Spatial Normalization is intended to correct for structural variability between subjects.
   - Practical consideration: Spatial Normalization is computationally intensive, so skipping it where its benefit is not evident drastically reduces computation time.
   
Each processing stream is structured following Poldrack, R., Mumford, J., & Nichols, T. (2011). Preprocessing fMRI data. In *Handbook of Functional MRI Data Analysis* (pp. 34-52). Cambridge: Cambridge University Press.

Structural-functional co-registration is skipped because it is computationally intensive on my setup and because skipping it does not prevent statistical analysis (cf. Martin Chadwick and Catherine Sebastian's co-registration course).

Distortion correction is implicit. As the data are posterior-anterior phase encoded, the SPM Realign module used for *Motion Correction* accounts for these benign distortions (cf. https://en.wikibooks.org/wiki/Neuroimaging_Data_Processing/Field_map_correction). Moreover, we know from the literature on the neural correlates we want to assess that they typically do not lie at air/tissue interfaces.

Smoothing is performed directly in Python just before analysis, as this gives better computational performance than the SPM interface.

Additionally, the scripts below are run both with and without Slice Timing Correction to assess its potential impact on later statistical analysis.
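
The notebook does not say which smoothing routine is used. Assuming a Gaussian kernel specified by its full width at half maximum (FWHM) in millimetres, which is the usual convention, the sketch below shows the conversion to a standard deviation in voxels that a generic Gaussian filter would need. The function name and the 6 mm kernel width are illustrative assumptions; the 1.5 mm voxel size is the one declared in this notebook.

```python
import math


def fwhm_to_sigma_voxels(fwhm_mm, voxel_size_mm):
    """Convert a Gaussian FWHM in millimetres to a standard deviation in voxels."""
    # For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma
    sigma_mm = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return sigma_mm / voxel_size_mm


# Illustrative values only: 6 mm FWHM is an assumed kernel width,
# 1.5 mm is the isotropic voxel size used in this notebook.
sigma = fwhm_to_sigma_voxels(6.0, 1.5)
print(round(sigma, 3))
```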

#### fMRI Acquisition Specification

In [44]:
t_r = 2.0
voxel_sizes = [1.5,1.5,1.5]
slice_timings = [0.0, 1.0225, 0.0625, 1.085, 0.1275, 1.15, 0.1925, 1.2125, 0.255, 1.2775, 0.32, 1.34, 0.3825, 1.405, 0.4475, 1.47, 0.51, 1.5325, 0.575, 1.5975, 0.6375, 1.66, 0.7025, 1.725, 0.765, 1.7875, 0.83, 1.8525, 0.895, 1.915, 0.9575, 0.0, 1.0225, 0.0625, 1.085, 0.1275, 1.15, 0.1925, 1.2125, 0.255, 1.2775, 0.32, 1.34, 0.3825, 1.405, 0.4475, 1.47, 0.51, 1.5325, 0.575, 1.5975, 0.6375, 1.66, 0.7025, 1.725, 0.765, 1.7875, 0.83, 1.8525, 0.895, 1.915, 0.9575, 0.0, 1.0225, 0.0625, 1.085, 0.1275, 1.15, 0.1925, 1.2125, 0.255, 1.2775, 0.32, 1.34, 0.3825, 1.405, 0.4475, 1.47, 0.51, 1.5325, 0.575, 1.5975, 0.6375, 1.66, 0.7025, 1.725, 0.765, 1.7875, 0.83, 1.8525, 0.895, 1.915, 0.9575]
n_slices = len(slice_timings)  # one onset time per slice (93 here)
slice_ordering = list(range(1, n_slices + 1))
mni_template = 'TPM.nii'
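
The cell above declares the slice onsets and a sequential `slice_ordering`. As a sketch of how an acquisition order could instead be derived from onset times such as `slice_timings`, the hypothetical helpers below sort slice indices by onset. `acquisition_time` mirrors the `t_r - t_r/n_slices` expression used in the pipeline cells, with the count passed explicitly; whether that count should be the number of slices or the number of distinct onsets depends on the acquisition and is not stated in the source.

```python
def acquisition_order(timings):
    """Return 1-based slice indices sorted by acquisition onset.

    Slices that share an onset keep their spatial order relative to
    each other because sorted() is stable.
    """
    return [i + 1 for i, _ in sorted(enumerate(timings), key=lambda p: p[1])]


def acquisition_time(t_r, n_excitations):
    """TR minus the duration of one excitation."""
    return t_r - t_r / n_excitations


# Toy example (not the notebook's data): 4 distinct onsets, each shared by 2 slices.
toy = [0.0, 1.0, 0.5, 1.5] * 2
print(acquisition_order(toy))                # [1, 5, 3, 7, 2, 6, 4, 8]
print(acquisition_time(2.0, len(set(toy))))  # 1.5
```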

### Preprocessing Computation 

#### First Level Analysis Pipeline

##### Slice Timing Correction Included

In [29]:
t1 = time.time()
print('* '*3 + 'fMRI Preprocessing Computation' + ' *'*3 + '\n' )

for sub in subject_id:
    
    print('\t Processing Subject {}'.format(sub))
    
    
    
    for func_data in experiment_data[sub]['func']:
        
        if func_data[-9:-4] == 'sbref':
            
            pass
        
        else:
            
            print('\t \t - {}'.format(func_data[20:]))

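            # SPM prefixes its outputs: 'a' = slice-timing corrected, 'r' = realigned.
            # Index 20 is the end of the 'dataset/sub-XX/func/' directory prefix.
            a_func = func_data[:20] + 'a' + func_data[20:]
            r_func = a_func[:20] + 'r' + a_func[20:]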
            
            
            # Slice Timing
            print('\t \t \t % Slice Timing Correction')
            st = SliceTiming()
            st.inputs.in_files = func_data
            st.inputs.num_slices = n_slices
            st.inputs.time_repetition = t_r
            st.inputs.time_acquisition = t_r - t_r/n_slices
            st.inputs.slice_order = slice_ordering
            st.inputs.ref_slice = 1
            st.run()

            # Realignment (motion correction)
            print('\t \t \t % Realignement Motion Correction')
            realign = Realign()
            realign.inputs.in_files = a_func
            realign.inputs.register_to_mean = True
            realign.run() 
            
            
            # Coregistration (disabled: too computationally intensive, takes an excessive amount of time)
            #print('\t \t \t % Coregistration')
            #coreg = Coregister()
            #coreg.inputs.target = experiment_data[sub]['anat'][0]
            #coreg.inputs.source = r_func
            #coreg.inputs.out_prefix = 'w'
            #coreg.run() 

    

t2 = time.time()

print('Done Successfully in {} minutes !'.format((t2-t1)/60))




* * * fMRI Preprocessing Computation * * *

	 Processing Subject 01
	 	 - sub-01_ses-19_func_sub-01_ses-19_task-PreferencePaintings_dir-pa_bold.nii
	 	 	 % Slice Timing Correction
	 	 	 % Realignement Motion Correction
	 	 - sub-01_ses-19_func_sub-01_ses-19_task-PreferenceFaces_dir-ap_bold.nii
	 	 	 % Slice Timing Correction
	 	 	 % Realignement Motion Correction
	 	 - sub-01_ses-19_func_sub-01_ses-19_task-PreferenceFood_dir-ap_bold.nii
	 	 	 % Slice Timing Correction
	 	 	 % Realignement Motion Correction
	 	 - sub-01_ses-19_func_sub-01_ses-19_task-PreferenceHouses_dir-pa_bold.nii
	 	 	 % Slice Timing Correction
	 	 	 % Realignement Motion Correction
	 	 - sub-01_ses-19_func_sub-01_ses-19_task-PreferencePaintings_dir-ap_bold.nii
	 	 	 % Slice Timing Correction
	 	 	 % Realignement Motion Correction
	 	 - sub-01_ses-19_func_sub-01_ses-19_task-PreferenceFood_dir-pa_bold.nii
	 	 	 % Slice Timing Correction
	 	 	 % Realignement Motion Correction
	 	 - sub-01_ses-19_func_sub-01_ses-19_task

##### Slice Timing Correction Not Included

In [29]:
t1 = time.time()
print('* '*3 + 'fMRI Preprocessing Computation' + ' *'*3 + '\n' )

for sub in subject_id:
    
    print('\t Processing Subject {}'.format(sub))
    
    
    
    for func_data in experiment_data[sub]['func']:
        
        if func_data[-9:-4] == 'sbref':
            
            pass
        
        else:
            
            print('\t \t - {}'.format(func_data[20:]))

            # Slice timing correction is skipped in this variant, so realignment
            # reads the raw run instead of the 'a'-prefixed (slice-timed) file.
            r_func = func_data[:20] + 'r' + func_data[20:]
            
            
            # Slice Timing (disabled)
            #print('\t \t \t % Slice Timing Correction')
            #st = SliceTiming()
            #st.inputs.in_files = func_data
            #st.inputs.num_slices = n_slices
            #st.inputs.time_repetition = t_r
            #st.inputs.time_acquisition = t_r - t_r/n_slices
            #st.inputs.slice_order = slice_ordering
            #st.inputs.ref_slice = 1
            #st.run()

            # Realignment (motion correction)
            print('\t \t \t % Realignement Motion Correction')
            realign = Realign()
            realign.inputs.in_files = func_data
            realign.inputs.register_to_mean = True
            realign.run() 
            
            
            # Coregistration (disabled: too computationally intensive, takes an excessive amount of time)
            #print('\t \t \t % Coregistration')
            #coreg = Coregister()
            #coreg.inputs.target = experiment_data[sub]['anat'][0]
            #coreg.inputs.source = r_func
            #coreg.inputs.out_prefix = 'w'
            #coreg.run() 

    

t2 = time.time()

print('Done Successfully in {} minutes !'.format((t2-t1)/60))




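
The two variants above differ only in whether slice timing correction runs before realignment, which changes the file each step reads and writes. The sketch below makes that chaining explicit without calling SPM. `with_prefix` and `plan_steps` are hypothetical helpers, and the `'a'` and `'r'` prefixes are the ones the cells above assume for SPM's outputs.

```python
import os


def with_prefix(path, prefix):
    """Prepend `prefix` to the file name, keeping the directory unchanged."""
    folder, name = os.path.split(path)
    return os.path.join(folder, prefix + name)


def plan_steps(func_path, slice_timing=True):
    """List (step, input, output) tuples for one functional run.

    Assumes output prefixes 'a' for slice timing and 'r' for realignment.
    """
    steps = []
    current = func_path
    if slice_timing:
        out = with_prefix(current, "a")
        steps.append(("slice_timing", current, out))
        current = out
    out = with_prefix(current, "r")
    steps.append(("realign", current, out))
    return steps


for step in plan_steps("dataset/sub-01/func/run_bold.nii", slice_timing=False):
    print(step)
```

With `slice_timing=True` the final output carries the `ra` prefix that the second-level cell expects; with `slice_timing=False` it carries `r` only, so the two variants do not overwrite each other.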

#### Second Level Analysis Pipeline Extension

In [None]:
t1 = time.time()
print('* '*3 + 'fMRI Preprocessing Computation' + ' *'*3 + '\n' )

for sub in subject_id:
    
    print('\t Processing Subject {}'.format(sub))
    
    
    
    for func_data in experiment_data[sub]['func']:
        
        if func_data[-9:-4] == 'sbref':
            
            pass
        
        else:
            
            print('\t \t - {}'.format(func_data[20:]))

            ra_func = func_data[:20] +'ra'+ func_data[20:]
            
            
            
            # Spatial normalization of the slice-timed, realigned run ('ra' prefix)
            print('\t \t \t % Spatial Normalization')
            norm = Normalize12()
            norm.inputs.tpm = mni_template
            norm.inputs.write_voxel_sizes = voxel_sizes
            norm.inputs.image_to_align = ra_func
            norm.run() 


    

t2 = time.time()

print('Done Successfully in {} minutes !'.format((t2-t1)/60))


In [2]:
st = [0.0, 1.0225, 0.0625, 1.085, 0.1275, 1.15, 0.1925, 1.2125, 0.255, 1.2775, 0.32, 1.34, 0.3825, 1.405, 0.4475, 1.47, 0.51, 1.5325, 0.575, 1.5975, 0.6375, 1.66, 0.7025, 1.725, 0.765, 1.7875, 0.83, 1.8525, 0.895, 1.915, 0.9575, 0.0, 1.0225, 0.0625, 1.085, 0.1275, 1.15, 0.1925, 1.2125, 0.255, 1.2775, 0.32, 1.34, 0.3825, 1.405, 0.4475, 1.47, 0.51, 1.5325, 0.575, 1.5975, 0.6375, 1.66, 0.7025, 1.725, 0.765, 1.7875, 0.83, 1.8525, 0.895, 1.915, 0.9575, 0.0, 1.0225, 0.0625, 1.085, 0.1275, 1.15, 0.1925, 1.2125, 0.255, 1.2775, 0.32, 1.34, 0.3825, 1.405, 0.4475, 1.47, 0.51, 1.5325, 0.575, 1.5975, 0.6375, 1.66, 0.7025, 1.725, 0.765, 1.7875, 0.83, 1.8525, 0.895, 1.915, 0.9575]

In [9]:
st

[0.0,
 1.0225,
 0.0625,
 1.085,
 0.1275,
 1.15,
 0.1925,
 1.2125,
 0.255,
 1.2775,
 0.32,
 1.34,
 0.3825,
 1.405,
 0.4475,
 1.47,
 0.51,
 1.5325,
 0.575,
 1.5975,
 0.6375,
 1.66,
 0.7025,
 1.725,
 0.765,
 1.7875,
 0.83,
 1.8525,
 0.895,
 1.915,
 0.9575,
 0.0,
 1.0225,
 0.0625,
 1.085,
 0.1275,
 1.15,
 0.1925,
 1.2125,
 0.255,
 1.2775,
 0.32,
 1.34,
 0.3825,
 1.405,
 0.4475,
 1.47,
 0.51,
 1.5325,
 0.575,
 1.5975,
 0.6375,
 1.66,
 0.7025,
 1.725,
 0.765,
 1.7875,
 0.83,
 1.8525,
 0.895,
 1.915,
 0.9575,
 0.0,
 1.0225,
 0.0625,
 1.085,
 0.1275,
 1.15,
 0.1925,
 1.2125,
 0.255,
 1.2775,
 0.32,
 1.34,
 0.3825,
 1.405,
 0.4475,
 1.47,
 0.51,
 1.5325,
 0.575,
 1.5975,
 0.6375,
 1.66,
 0.7025,
 1.725,
 0.765,
 1.7875,
 0.83,
 1.8525,
 0.895,
 1.915,
 0.9575]

In [15]:
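# Map each onset time to a slice rank. Repeated onsets overwrite earlier
# entries, so only the last rank per onset survives. According to the
# execution counters, st had already been sorted in place (In [12]) when
# this cell ran, which is why the values below are multiples of 3.
st_o = dict()


for r, t in enumerate(st):
    
    st_o[t] = r+1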

In [16]:
st_o

{0.0: 3,
 0.0625: 6,
 0.1275: 9,
 0.1925: 12,
 0.255: 15,
 0.32: 18,
 0.3825: 21,
 0.4475: 24,
 0.51: 27,
 0.575: 30,
 0.6375: 33,
 0.7025: 36,
 0.765: 39,
 0.83: 42,
 0.895: 45,
 0.9575: 48,
 1.0225: 51,
 1.085: 54,
 1.15: 57,
 1.2125: 60,
 1.2775: 63,
 1.34: 66,
 1.405: 69,
 1.47: 72,
 1.5325: 75,
 1.5975: 78,
 1.66: 81,
 1.725: 84,
 1.7875: 87,
 1.8525: 90,
 1.915: 93}
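
Because a dictionary keyed by onset keeps a single value per key, the mapping above cannot show which slices share an onset. A minimal sketch of a grouping that keeps all of them follows; `slices_by_onset` is a hypothetical helper and the toy list stands in for the unsorted `st`.

```python
from collections import defaultdict


def slices_by_onset(timings):
    """Map each onset time to the 1-based indices of all slices acquired then."""
    groups = defaultdict(list)
    for index, onset in enumerate(timings, start=1):
        groups[onset].append(index)
    return dict(sorted(groups.items()))


# Toy example (not the notebook's data): 3 distinct onsets, 2 slices each.
toy = [0.0, 1.0, 0.5] * 2
print(slices_by_onset(toy))  # {0.0: [1, 4], 0.5: [3, 6], 1.0: [2, 5]}
```

The length of each group gives the number of slices acquired at the same time.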

In [12]:
st.sort()

In [13]:
st

[0.0,
 0.0,
 0.0,
 0.0625,
 0.0625,
 0.0625,
 0.1275,
 0.1275,
 0.1275,
 0.1925,
 0.1925,
 0.1925,
 0.255,
 0.255,
 0.255,
 0.32,
 0.32,
 0.32,
 0.3825,
 0.3825,
 0.3825,
 0.4475,
 0.4475,
 0.4475,
 0.51,
 0.51,
 0.51,
 0.575,
 0.575,
 0.575,
 0.6375,
 0.6375,
 0.6375,
 0.7025,
 0.7025,
 0.7025,
 0.765,
 0.765,
 0.765,
 0.83,
 0.83,
 0.83,
 0.895,
 0.895,
 0.895,
 0.9575,
 0.9575,
 0.9575,
 1.0225,
 1.0225,
 1.0225,
 1.085,
 1.085,
 1.085,
 1.15,
 1.15,
 1.15,
 1.2125,
 1.2125,
 1.2125,
 1.2775,
 1.2775,
 1.2775,
 1.34,
 1.34,
 1.34,
 1.405,
 1.405,
 1.405,
 1.47,
 1.47,
 1.47,
 1.5325,
 1.5325,
 1.5325,
 1.5975,
 1.5975,
 1.5975,
 1.66,
 1.66,
 1.66,
 1.725,
 1.725,
 1.725,
 1.7875,
 1.7875,
 1.7875,
 1.8525,
 1.8525,
 1.8525,
 1.915,
 1.915,
 1.915]

In [7]:
import numpy as np

In [8]:
np.linspace(0,2,93)

array([0.        , 0.02173913, 0.04347826, 0.06521739, 0.08695652,
       0.10869565, 0.13043478, 0.15217391, 0.17391304, 0.19565217,
       0.2173913 , 0.23913043, 0.26086957, 0.2826087 , 0.30434783,
       0.32608696, 0.34782609, 0.36956522, 0.39130435, 0.41304348,
       0.43478261, 0.45652174, 0.47826087, 0.5       , 0.52173913,
       0.54347826, 0.56521739, 0.58695652, 0.60869565, 0.63043478,
       0.65217391, 0.67391304, 0.69565217, 0.7173913 , 0.73913043,
       0.76086957, 0.7826087 , 0.80434783, 0.82608696, 0.84782609,
       0.86956522, 0.89130435, 0.91304348, 0.93478261, 0.95652174,
       0.97826087, 1.        , 1.02173913, 1.04347826, 1.06521739,
       1.08695652, 1.10869565, 1.13043478, 1.15217391, 1.17391304,
       1.19565217, 1.2173913 , 1.23913043, 1.26086957, 1.2826087 ,
       1.30434783, 1.32608696, 1.34782609, 1.36956522, 1.39130435,
       1.41304348, 1.43478261, 1.45652174, 1.47826087, 1.5       ,
       1.52173913, 1.54347826, 1.56521739, 1.58695652, 1.60869

In [14]:
len(st)/3

31.0

In [4]:
list(range(32,0,-1))

[32,
 31,
 30,
 29,
 28,
 27,
 26,
 25,
 24,
 23,
 22,
 21,
 20,
 19,
 18,
 17,
 16,
 15,
 14,
 13,
 12,
 11,
 10,
 9,
 8,
 7,
 6,
 5,
 4,
 3,
 2,
 1]

In [19]:
'ses-19/func/sub-01_ses-19_task-Preference{}_dir-{}_bold.nii.gz'[:12] + 'a' + 'ses-19/func/sub-01_ses-19_task-Preference{}_dir-{}_bold.nii.gz'[12:]

'ses-19/func/asub-01_ses-19_task-Preference{}_dir-{}_bold.nii.gz'

In [23]:
'ses-19/func/sub-01_ses-19_task-Preference{}_dir-{}_bold.nii'[:-4] + '_corrected' + 'ses-19/func/sub-01_ses-19_task-Preference{}_dir-{}_bold.nii'[-4:]

'ses-19/func/sub-01_ses-19_task-Preference{}_dir-{}_bold_corrected.nii'
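
The last cell inserts a tag before a `.nii` extension by slicing at a fixed offset, which would split a `.nii.gz` name in the wrong place. As a sketch under that observation, the hypothetical helper below handles both extensions; the list of recognised extensions is an assumption.

```python
def insert_before_extension(path, tag, extensions=(".nii.gz", ".nii")):
    """Insert `tag` between the file stem and its NIfTI extension."""
    for ext in extensions:  # ".nii.gz" is listed first so it is matched as a whole
        if path.endswith(ext):
            return path[: -len(ext)] + tag + ext
    raise ValueError("unrecognised extension: {}".format(path))


print(insert_before_extension("ses-19/func/run_bold.nii", "_corrected"))
print(insert_before_extension("ses-19/func/run_bold.nii.gz", "_corrected"))
```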