Given
- a sequence of stimuli,
- fMRI data from subjects exposed to this sequence,
- one or several models making quantitative predictions from each stimulus in the sequence
The code allows you to build a flexible analysis pipeline combining:
- data compression methods (already coded or that you can add),
- data transformation methods (standardization, convolution with a kernel, or whatever your heart desires...),
- splitting strategies,
- encoding models,
- or any task that you might find useful (and that you are willing to code),
- functions that allow you to generate R² or r maps.
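To illustrate how such steps chain together, here is a minimal, self-contained sketch; the classes below are illustrative stand-ins, not the repository's actual API (see task.py and regression_pipeline.py for the real interfaces):

```python
import numpy as np

# Illustrative stand-ins only -- the real classes live in task.py and
# regression_pipeline.py and have a richer interface.
class Task:
    def __init__(self, name, func):
        self.name, self.func = name, func

    def run(self, data):
        return self.func(data)

class Pipeline:
    def __init__(self, tasks):
        self.tasks = tasks

    def run(self, data):
        for task in self.tasks:      # each step consumes the previous output
            data = task.run(data)
        return data

pipeline = Pipeline([
    Task('standardize', lambda X: (X - X.mean(0)) / X.std(0)),  # transformation
    Task('compress', lambda X: X[:, :10]),  # dummy compression: keep 10 columns
])
print(pipeline.run(np.random.randn(100, 50)).shape)  # -> (100, 10)
```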
For example, the pipeline programmed in main.py fits the stimuli-representations of several language models to fMRI data obtained from participants listening to an audiobook (in 9 runs). R² values are computed from a nested cross-validated Ridge regression.
The pipeline runs for tuples (1 subject, 1 model), to facilitate distributing the code on a cluster.
- Check requirements.txt for the list of required Python modules.
- For each model to be evaluated, generate the "stimuli-representations" from the sequence of stimuli. A "stimuli-representation" is a set of "features", that is, numeric vectors of the same length as the sequence. It must be saved as a matrix in a .csv file in $LPP/derivatives/fMRI/stimuli-representations/{language}/model_name/, with one file per run.
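For instance, a representation with random placeholder values could be written out like this (a sketch assuming pandas and numpy; model_name and the matrix dimensions are placeholders):

```python
import os
import numpy as np
import pandas as pd

# Placeholder example: one (n_stimuli x n_features) CSV per run, following
# the layout above. The feature values here are random dummies.
base = os.path.join(os.environ['LPP'], 'derivatives', 'fMRI',
                    'stimuli-representations', 'english', 'model_name')
os.makedirs(base, exist_ok=True)
for run in range(1, 10):
    n_stimuli, n_features = 2000, 300   # one row per stimulus (e.g. per word)
    features = pd.DataFrame(np.random.randn(n_stimuli, n_features))
    features.to_csv(os.path.join(base, f'deep_representation_run_{run}.csv'),
                    index=False)
```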
- Provide the appropriate onsets/offsets files of the stimuli for each model included in your analysis (the path to the onsets/offsets folder, and the type of onsets/offsets for each model, should be specified in the yaml template template.yml, described below).
- Optional: if the creation of a regressor for a given model requires specific durations, create the appropriate durations array for each run (the path should be specified in the yaml template); otherwise a vector of ones is used.
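A default durations array of ones could be generated like this (a sketch: the word_run_*.csv files are those from data/onsets-offsets, while the output file name durations_run_*.csv is only a hypothetical choice to be referenced from the yaml template):

```python
import numpy as np
import pandas as pd

# Sketch: one durations vector per run. Replace np.ones(...) with real
# durations if the model needs them; the output file name is hypothetical.
for run in range(1, 10):
    onsets = pd.read_csv(f'word_run_{run}.csv')   # onsets-offsets file for this run
    durations = np.ones(len(onsets))              # default: vector of ones
    pd.DataFrame({'durations': durations}).to_csv(f'durations_run_{run}.csv',
                                                  index=False)
```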
Concretely, main.py goes through the following steps:
- Define variables.
- Fetch (or compute) a global masker (+ a smoothed version) over all subjects, to have a common analysis ground.
- Retrieve the arguments for class instantiations.
- Instantiate the classes.
- Define the dependency relations between classes.
- Load the paths to the stimuli-representations and the fMRI data.
- Extract the needed columns of the stimuli-representations and concatenate them.
- Load the fMRI data and add a small variation* to voxels with constant activation (otherwise they raise errors when computing the Pearson correlation coefficient).
- Fit the pipeline.
- Launch the pipeline.
- Create brain maps from the R²/Pearson maps produced by the pipeline.
*The small variation is added to the first time step only.
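Two of these steps in sketch form (assuming a recent nilearn; mask_strategy and eps are illustrative values, not the repository's settings):

```python
import numpy as np
from nilearn.maskers import MultiNiftiMasker  # nilearn >= 0.9 (assumption)

def fit_common_masker(func_imgs, fwhm=None):
    """Fit one masker over all subjects' runs so every analysis shares the
    same voxel grid. func_imgs: list of 4D NIfTI images or paths."""
    return MultiNiftiMasker(mask_strategy='epi', smoothing_fwhm=fwhm).fit(func_imgs)

def fix_constant_voxels(run_data, eps=1e-6):
    """Add a tiny value to the first time step of constant voxels so the
    Pearson correlation (which divides by the std) is defined everywhere.
    run_data: (n_scans, n_voxels) array, e.g. from masker.transform()."""
    constant = run_data.std(axis=0) == 0
    run_data[0, constant] += eps   # first time step only, as noted above
    return run_data
```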
The nested cross-validation works as follows:
- We split the data into 9 splits for a CV over the R².
- We split each train set of each former split for a CV over the regularization hyperparameter alpha of our encoding model.
- For each split of the inner CV, we compress the stimuli-representations of the models that require it.
- For each split of the inner CV, we compute the regressors by convolving the (compressed or not) stimuli-representations with an HRF kernel.
- For each split of the inner CV, we standardize the newly computed regressors before the Ridge regression.
- For each split of the inner CV, we fit Ridge encoding models for each voxel and each alpha included in our grid search, and compute R²/Pearson values.
- For each split of the outer CV, we compress the stimuli-representations of the models that require it.
- For each split of the outer CV, we compute the regressors by convolving the (compressed or not) stimuli-representations with an HRF kernel.
- For each split of the outer CV, we standardize the newly computed regressors before the Ridge regression.
- For each split of the outer CV, we fit Ridge encoding models with the best alpha* for each voxel and compute R²/Pearson values.
*The best alpha is determined for each voxel.
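The scheme can be summarized with scikit-learn (a simplified stand-in, not the repository's implementation: RidgeCV's efficient leave-one-out search replaces the explicit inner splits, plain KFold replaces the run-based splits, and compression/convolution/standardization are assumed to be already applied to X; requires scikit-learn >= 0.24 for alpha_per_target):

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

def nested_cv_r2(X, Y, alphas=np.logspace(-3, 5, 9), n_splits=9):
    """Per-voxel R² from nested CV.
    X: (n_scans, n_features) design matrix, Y: (n_scans, n_voxels) fMRI data."""
    r2 = np.zeros((n_splits, Y.shape[1]))
    for i, (train, test) in enumerate(KFold(n_splits=n_splits).split(X)):
        # inner loop: select one alpha per voxel on the training data
        inner = RidgeCV(alphas=alphas, alpha_per_target=True).fit(X[train], Y[train])
        # outer loop: refit with the per-voxel best alphas, score on held-out data
        model = Ridge(alpha=inner.alpha_).fit(X[train], Y[train])
        r2[i] = r2_score(Y[test], model.predict(X[test]), multioutput='raw_values')
    return r2.mean(axis=0)   # average over the outer splits
```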
Once the requirements above are satisfied, run the following command:
python main.py --yaml_file path_to_yaml_file --input path_to_representations_folder --output path_to_output_folder --logs path_to_log_file
The files are organized in the following overall folder structure ({language} can take the values french or english):
├── paradigm (scripts to run the experiment and stimuli)
├── oldstuff (older scripts, personal data/code, ...)
├── code (complete code for analyses)
│   ├── MEG (code for MEG analysis pipeline)
│   └── fMRI (code for fMRI analysis pipeline)
│       ├── data_compression.py (Class regrouping methods to compress the representation data)
│       ├── data_transformation.py (Class regrouping methods to transform the data: standardization, creating regressors, ...)
│       ├── encoding_models.py (Class where the linear (regularized or not) model is implemented)
│       ├── logger.py (Logging class to check pipeline status)
│       ├── main.py (Launch the pipeline for the given yaml config file)
│       ├── regression_pipeline.py (Class implementing the pipeline for the regression analysis)
│       ├── requirements.txt (Required libraries + versions)
│       ├── splitter.py (Class regrouping splitting/distributing methods)
│       ├── task.py (Class implementing a Task, which is a step of the pipeline)
│       ├── template.yml (Yaml config file to fill for each call of main.py)
│       └── utils.py (Utility functions: parameter settings, fetching, reading/writing, ...)
├── data (all the raw data acquired from sources)
│   ├── fMRI (fMRI data, 9 runs per subject)
│   │   └── {language}
│   │       └── sub-057
│   │           └── func
│   ├── text (text of each chapter)
│   │   └── {language}
│   │       └── text_{language}_run[1-9].txt
│   ├── onsets-offsets (onsets-offsets data)
│   │   └── {language}
│   │       ├── word_run_1.csv
│   │       ├── ...
│   │       ├── word_run_n.csv
│   │       ├── word+punctuation_run_1.csv
│   │       ├── ...
│   │       └── word+punctuation_run_n.csv
│   └── stimuli-representations (stimuli representation dataframes extracted from the models' activity)
│       └── {language}
│           └── model_of_interest
│               ├── deep_representation_run_1.csv
│               ├── ...
│               └── deep_representation_run_n.csv
└── derivatives (results of the code above)
    ├── MEG
    └── fMRI (results from the fMRI pipeline in code/fMRI/)
        └── maps (maps derived from our pipeline)
            └── {language}
                └── sub-057
                    └── model_of_interest
                        ├── R2.nii.gz
                        └── R2.png
To give more insight into the three main parts of the project:
- code
  - MEG data analysis pipeline
  - fMRI data analysis pipeline
    - splitter.py (Split/distribute the data)
    - data_compression.py (Compress data representations if needed: PCA, etc.)
    - data_transformation.py (Transform data representations: create regressors by convolving with an HRF kernel, and standardize before regression; see the sketch after this list)
    - encoding_models.py ((Regularized) linear model that fits the regressors to the fMRI data)
    - task.py (Step of the pipeline to be executed)
    - regression_pipeline.py (Define and execute the pipeline)
    - requirements.txt (Required libraries)
    - logger.py (Report pipeline progression)
    - main.py (Execute the pipeline with the configuration from the yaml file)
    - template.yml (Yaml file specifying the configuration of the analysis)
    - utils.py (Utility functions)
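The regressor creation done in data_transformation.py boils down to an HRF convolution; here is a sketch with nilearn's compute_regressor (the tr value and the 'spm' HRF model are illustrative assumptions, not necessarily the repository's choices):

```python
import numpy as np
from nilearn.glm.first_level import compute_regressor

def feature_to_regressor(onsets, durations, feature, n_scans, tr=2.0):
    """Convolve one feature column with the canonical HRF.
    onsets/durations in seconds; feature gives the amplitude of each event."""
    frame_times = np.arange(n_scans) * tr                 # scan acquisition times
    exp_condition = np.vstack([onsets, durations, feature])
    regressor, _ = compute_regressor(exp_condition, 'spm', frame_times)
    return regressor                                      # shape (n_scans, 1)
```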
- data
  - the fMRI data, organized following the BIDS standard except for the name of the final file
  - the MEG data, which should be added in a few months
  - the text of LPP divided into 9 runs, the original onsets-offsets of LPP, and training data for the models
- derivatives
  - MEG
  - fMRI (to save memory, we only keep the deep-model representations and the final output maps)
English, French and Chinese participants were scanned with fMRI while listening to the whole audiobook of The Little Prince (~90 min) in their native language. The audio stream was segmented into nine parts, presented in nine successive runs:
- Chapters 1 to 3 --> run 1
- Chapters 4 to 6 --> run 2
- Chapters 7 to 9 --> run 3
- Chapters 10 to 12 --> run 4
- Chapters 13 to 14 --> run 5
- Chapters 15 to 19 --> run 6
- Chapters 20 to 22 --> run 7
- Chapters 23 to 25 --> run 8
- Chapters 26 to 27 --> run 9
Scanning details:
- English: multi-echo
- French: multi-echo, Siemens 3T