<img src="https://www.epfl.ch/about/overview/wp-content/uploads/2020/07/logo-epfl-1024x576.png" style="padding-right:10px;width:140px;float:left">
<h2 style="white-space: nowrap">Neural Signals and Signal Processing (NX-421)</h2>
<hr style="clear:both"></hr>

Welcome to the NX-421 class! This week, you will get more familiar with the preprocessing steps typically conducted in MRI analysis!

# Lab 2: Preprocessing

In this lab, we will have you look at some of the essential preprocessing steps that should be conducted before any analysis. 

You will run the most important steps separately to get to know them. Some more involved steps are covered in detail in the **advanced_preprocessing.ipynb** notebook, which is **purely optional**. 

Once you've studied the most important steps, we will show how to conduct all these in a streamlined fashion.


<div class="warning" style='background-color:#805AD5; color: #90EE90; border-left: solid #805AD5 4px; border-radius: 4px; padding:0.7em;'>
<span>
<p style='margin-top:1em; text-align:center'>
<b>⚠️ Preprocessing warnings ⚠️</b></p>
<p style='text-indent: 10px;'>
The steps of preprocessing you have seen in class are dependent on the analysis you wish to conduct.
In particular, there is not yet a clear consensus on the order in which some steps should be applied, although that order is known to impact subsequent analysis.
As an example, we will teach you how to put your subjects in a common space, the MNI space through a step of <b>normalization</b> (more on that later!), such that subjects can be compared at the group-level. Assume now that for the purpose of your analysis, you are interested only in a single subject, or that the method of choice should be at an individual level for your application. Clearly the normalization is superfluous in this case.
You should also know that most steps have parameters to set. These parameters are often set once for the whole population rather than per subject. Such a choice means the preprocessing won't be optimal for each individual subject.
<br>
</p></span>
</div>
<br>
<div class="warning" style='background-color:#90EE90; color: #805AD5; border-left: solid #805AD5 4px; border-radius: 4px; padding:0.7em;'>
<span>
<p style='margin-top:1em; text-align:center'>
<b>Preprocessing pipelines</b></p>
<p style='text-indent: 10px;'>
We will teach you how to perform each step individually, so that you get a precise understanding of what each step's purpose is. You will then use a dedicated software package, <a href="https://fmriprep.org/en/stable/">fMRIPrep</a>, to perform these steps for you. 
<br>
</p></span>
</div>
<br>
<div class="warning" style='background-color:#C1ECFA; color: #112A46; border-left: solid darkblue 4px; border-radius: 4px; padding:0.7em;'>
<span>
<p style='margin-top:1em; text-align:center'>
<b>💡 Quality Control (QC): the 1 - 10 - 100 dollar rule 💡</b></p>
<p style='text-indent: 10px;'>
The 1-10-100 rule states that it takes 1 dollar to verify and correct data at the start, 10 dollars to identify and clean data after the fact and 100 dollars to correct a failure due to bad data.

In preprocessing this is especially true. It will take you much less effort to first look at your data, detect what might be wrong from the start and deal with it than to apply everything blindly and notice after the fact that something went wrong. Always, ALWAYS look at your data before <u>anything and any analysis</u>. A surgeon always looks at the patient before operating, and you should do the same: you are surgeons to your dataset, so please look at it carefully (it's craving attention, the poor thing 💔).
These intermediate health checks of your dataset are called quality control steps. You do exactly what the name suggests: you check the quality of your data before and after a given step, to ensure that nothing went awry.
<br>
</p></span>
</div>

In [1]:
# General purpose imports to handle paths, files etc
import os
import os.path as op
import glob
import pandas as pd
import numpy as np
import json


# Useful functions to define and import datasets from open neuro
import openneuro
from mne.datasets import sample
from mne_bids import BIDSPath, read_raw_bids, print_dir_tree, make_report


# Useful imports to define the direct download function below
import requests
import urllib.request
from tqdm import tqdm


# FSL function wrappers which we will call from python directly
from fsl.wrappers import fast, bet
from fsl.wrappers.misc import fslroi
from fsl.wrappers import flirt



def reset_overlays():
    """
    Clears view and completely remove visualization. All files opened in FSLeyes are closed.
    The view (along with any color map) is reset to the regular ortho panel.
    """
    l = frame.overlayList
    while(len(l)>0):
        del l[0]
    frame.removeViewPanel(frame.viewPanels[0])
    # Put back an ortho panel in our viz for future displays
    frame.addViewPanel(OrthoPanel)
    
def mkdir_no_exist(path):
    if not op.isdir(path):
        os.makedirs(path)
        
class DownloadProgressBar(tqdm):
    def update_to(self, b=1, bsize=1, tsize=None):
        if tsize is not None:
            self.total = tsize
        self.update(b * bsize - self.n)


def download_url(url, output_path):
    with DownloadProgressBar(unit='B', unit_scale=True,
                             miniters=1, desc=url.split('/')[-1]) as t:
        urllib.request.urlretrieve(url, filename=output_path, reporthook=t.update_to)

def direct_file_download_open_neuro(file_list, file_types, dataset_id, dataset_version, save_dirs):
    # https://openneuro.org/crn/datasets/ds004226/snapshots/1.0.0/files/sub-001:sub-001_scans.tsv
    for i, n in enumerate(file_list):
        subject = n.split('_')[0]
        download_link = 'https://openneuro.org/crn/datasets/{}/snapshots/{}/files/{}:{}:{}'.format(dataset_id, dataset_version, subject, file_types[i],n)
        print('Attempting download from ', download_link)
        download_url(download_link, op.join(save_dirs[i], n))
        print('Ok')
        
def get_json_from_file(fname):
    # The context manager closes the file even if parsing fails
    with open(fname) as f:
        data = json.load(f)
    return data

## 0. Loading a dataset

We have not yet touched upon how you might load datasets. 

### BIDS standard
To do so, we first need to introduce a standard: the Brain Imaging Data Structure (BIDS).
It lets you format a dataset such that other neuroimaging researchers can reuse your data with the smallest overhead possible. It is a way to unify how files and acquisitions are organized in folders.

This unification comes with several advantages including many tools that ease our life.
Indeed, if your dataset is in such a format, it is very easy to conduct any analysis: your scripts will expect a specific structure, so you don't need to play a million times with paths for example!

A whole software ecosystem has evolved around this standard, including tools that enable us to load datasets that are BIDS-compliant!
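To make the idea of a predictable structure concrete, here is a minimal sketch of how a BIDS-style path can be assembled from its parts. The helper `bids_file_path` is hypothetical (it is not part of any BIDS library) and performs no validation against the specification; it only illustrates the naming pattern used by the files in this lab.

```python
import os.path as op


def bids_file_path(root, subject, datatype, suffix, extension, **entities):
    """Build a BIDS-style path, e.g. sub-001/func/sub-001_task-x_run-01_bold.nii.gz.

    Hypothetical helper for illustration only: entities are written in the
    order they are passed, and nothing is checked against the BIDS specification.
    """
    parts = ["sub-{}".format(subject)]
    parts += ["{}-{}".format(key, value) for key, value in entities.items()]
    parts.append(suffix)
    filename = "_".join(parts) + extension
    return op.join(root, "sub-{}".format(subject), datatype, filename)


example = bids_file_path("ds004226", "001", "func", "bold", ".nii.gz",
                         task="sitrep", run="01")
print(example)
```

Because every file follows the same pattern, a script written for one subject can usually be reused for another by changing only the subject label.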

In [2]:
dataset_id = 'ds004226'
subject = '001' 

# Define where the dataset and its derivatives will be stored
sample_path = "dataset"
mkdir_no_exist(sample_path)
bids_root = op.join(os.path.abspath(""),sample_path, dataset_id)
deriv_root = op.join(bids_root, 'derivatives')
preproc_root = op.join(bids_root, 'derivatives','preprocessed_data')

mkdir_no_exist(bids_root)

In [14]:
os.system("""openneuro-py download --dataset {} --include sub-{}/anat/* 
          --include sub-{}/func/sub-{}_task-sitrep_run-01_bold.nii.gz 
          --include sub-{}/func/sub-{}_task-sitrep_run-01_bold.json 
          --target_dir {}""".format(dataset_id, subject, subject, subject, subject, subject, bids_root).replace("\n", " "))

0

Have a look at the terminal in which you launched this notebook. You will see an output that looks like this:
<img src="imgs/download_picture.png"/>

What this tells you is that the download is actually still in progress. Be mindful of this when downloading a dataset: you should avoid opening a file until it is fully downloaded, otherwise you have a high chance of corrupting it!

Luckily here, we have a blocking command above which will only return when the download is finished. Always check that your commands are blocking and design your code accordingly!
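As a small illustration of the difference between blocking and non-blocking calls, the sketch below uses Python's standard `subprocess` module. The "slow command" is just a stand-in for a long download (a child Python process that sleeps for half a second); it is an assumption made for the example, not what the download cell actually runs.

```python
import subprocess
import sys

# Stand-in for a slow download: a child Python process that sleeps briefly.
slow_command = [sys.executable, "-c", "import time; time.sleep(0.5)"]

# Non-blocking: Popen returns immediately, while the child is still running.
process = subprocess.Popen(slow_command)
still_running = process.poll() is None  # poll() returns None until the child exits
process.wait()                          # wait explicitly before touching any output file

# Blocking: run() only returns once the child has exited.
completed = subprocess.run(slow_command, check=True)

print(still_running, process.returncode, completed.returncode)
```

With the non-blocking variant, any code that reads the downloaded files must come after `wait()`; with the blocking variant this ordering is guaranteed for you.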

Above, we rely on openneuro-py to pull our files. Below, we also provide a way to download the files directly, should you wish to pull files from OpenNeuro without openneuro-py:

In [16]:
func_path = op.join(bids_root, 'sub-001', 'func')
anat_path = op.join(bids_root, 'sub-001', 'anat')
mkdir_no_exist(op.join(bids_root, 'sub-001'))
mkdir_no_exist(func_path)
mkdir_no_exist(anat_path)

direct_file_download_open_neuro(file_list=['sub-001_task-sitrep_run-01_bold.nii.gz', 
                                           'sub-001_task-sitrep_run-01_bold.json',
                                           'sub-001_T1w.nii.gz',
                                           'sub-001_T1w.json'], 
                                file_types=['func', 'func', 'anat', 'anat'], 
                                dataset_id=dataset_id, 
                                dataset_version='1.0.0', 
                                save_dirs=[func_path,
                                           func_path,
                                           anat_path,
                                           anat_path])

Attempting download from  https://openneuro.org/crn/datasets/ds004226/snapshots/1.0.0/files/sub-001:func:sub-001_task-sitrep_run-01_bold.nii.gz


sub-001:func:sub-001_task-sitrep_run-01_bold.nii.gz: 169MB [02:20, 1.20MB/s]                              


Ok
Attempting download from  https://openneuro.org/crn/datasets/ds004226/snapshots/1.0.0/files/sub-001:func:sub-001_task-sitrep_run-01_bold.json


sub-001:func:sub-001_task-sitrep_run-01_bold.json: 8.19kB [00:00, 11.2kB/s]


Ok
Attempting download from  https://openneuro.org/crn/datasets/ds004226/snapshots/1.0.0/files/sub-001:anat:sub-001_T1w.nii.gz


sub-001:anat:sub-001_T1w.nii.gz: 14.3MB [00:15, 943kB/s]                             


Ok
Attempting download from  https://openneuro.org/crn/datasets/ds004226/snapshots/1.0.0/files/sub-001:anat:sub-001_T1w.json


sub-001:anat:sub-001_T1w.json: 8.19kB [00:00, 11.9kB/s]

Ok





In [17]:
mkdir_no_exist(op.join(bids_root, 'derivatives'))
preproc_root = op.join(bids_root, 'derivatives','preprocessed_data')
mkdir_no_exist(preproc_root)
mkdir_no_exist(op.join(preproc_root, 'sub-001'))
mkdir_no_exist(op.join(preproc_root, 'sub-001', 'anat'))
mkdir_no_exist(op.join(preproc_root, 'sub-001', 'func'))
mkdir_no_exist(op.join(preproc_root, 'sub-001', 'fmap'))

Once the download is done, we can have a look at the resulting folder structure:

In [18]:
print_dir_tree(bids_root, max_depth=4)

|ds004226/
|--- CHANGES
|--- dataset_description.json
|--- participants.tsv
|--- derivatives/
|------ preprocessed_data/
|--------- sub-001/
|------------ anat/
|------------ fmap/
|------------ func/
|--- sub-001/
|------ anat/
|--------- sub-001_T1w.json
|--------- sub-001_T1w.nii.gz
|------ func/
|--------- sub-001_task-sitrep_run-01_bold.json
|--------- sub-001_task-sitrep_run-01_bold.nii.gz


This organization is typical of a BIDS dataset. Each subject's files are split between anatomical data and functional data. You are already a bit familiar with the .nii.gz file extension, but what might the .json file be? Well, let's open it to figure it out!

In [19]:
data = get_json_from_file(op.join(bids_root, 'sub-001', 'func', 'sub-001_task-sitrep_run-01_bold.json'))
data

{'Modality': 'MR',
 'MagneticFieldStrength': 3,
 'Manufacturer': 'Siemens',
 'ManufacturersModelName': 'Skyra',
 'InstitutionName': 'Princeton_University_-_Neuroscience_Institute',
 'InstitutionalDepartmentName': 'Department',
 'InstitutionAddress': 'Washington_and_Faculty_Rd._-_Building_25_25_Princeton_NJ_US_085',
 'DeviceSerialNumber': '45031',
 'StationName': 'AWP45031',
 'BodyPartExamined': 'BRAIN',
 'PatientPosition': 'HFS',
 'ProcedureStepDescription': 'TamirL_Mark',
 'SoftwareVersions': 'syngo_MR_E11',
 'MRAcquisitionType': '2D',
 'SeriesDescription': 'EPI_2.5mm_1.5TR_32TE_SMS4_Siemens',
 'ProtocolName': 'EPI_2.5mm_1.5TR_32TE_SMS4_Siemens',
 'ScanningSequence': 'EP',
 'SequenceVariant': 'SK',
 'ScanOptions': 'FS',
 'SequenceName': '_epfid2d1_78',
 'ImageType': ['ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM', 'MOSAIC'],
 'SeriesNumber': 9,
 'AcquisitionTime': '16:17:36.350000',
 'AcquisitionNumber': 1,
 'SliceThickness': 2.5,
 'SpacingBetweenSlices': 2.5,
 'SAR': 0.0635751,
 'EchoTime'

This JSON is extremely important: it is what we call a JSON **sidecar**, and it holds valuable acquisition information! Based **only on the text printed above**, are you able to determine any or all of the following?
- [ ] TR?
- [ ] Modality of acquisition?
- [ ] How many Teslas the scanner was?
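Since a sidecar is plain JSON, its fields can be queried like any Python dictionary. The sketch below writes a shortened, hypothetical sidecar (reusing two values from the printout above) to a temporary file and reads it back; the key `SomeAbsentField` is made up to show how a missing field can be handled without raising an error.

```python
import json
import os
import tempfile

# Hypothetical, shortened sidecar reusing two values from the printout above.
sidecar = {"Manufacturer": "Siemens", "SliceThickness": 2.5}

with tempfile.TemporaryDirectory() as tmp_dir:
    sidecar_path = os.path.join(tmp_dir, "sub-001_task-sitrep_run-01_bold.json")
    with open(sidecar_path, "w") as f:
        json.dump(sidecar, f)
    with open(sidecar_path) as f:
        metadata = json.load(f)

# Direct access raises KeyError if the field is missing...
slice_thickness = metadata["SliceThickness"]
# ...whereas .get() lets you supply a fallback value instead.
missing_field = metadata.get("SomeAbsentField", "not provided")

print(slice_thickness, missing_field)
```

Not every sidecar contains every field, so using `.get()` with a fallback is often the safer choice in scripts that loop over many files.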

### Loading more datasets: how to

Great! Note that we've loaded only one subject and one file of each modality for the subject. You can have a look <a href="https://openneuro.org/datasets/ds004226/versions/1.0.0">here</a> for this dataset. As you can see, it is a big dataset; we've restricted our download to the bare minimum to spare your computer's disk space as much as possible.

Notice two things on the web page. The first is the dataset's accession number:
<img src="imgs/openneuro_access_nbr.png">
This number is the one we've put in our code earlier, to specify what dataset we wanted to load from:
```python
dataset_id = 'ds004226'
...
openneuro.download(dataset=dataset_id, ...)

```

Should you wish to download another openneuro dataset that piqued your interest, you'll simply need to change the above variable with the accession number of the dataset on the corresponding page!
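As a sketch of how this could be parameterized, the hypothetical helper below assembles the openneuro-py command as a list of arguments. It only reuses the flags that appear in the download cell above (`--dataset`, `--include`, `--target_dir`); flag names may differ between openneuro-py versions, so check `openneuro-py download --help` on your installation before relying on it.

```python
import shlex


def build_openneuro_command(dataset_id, include_patterns, target_dir):
    """Assemble an openneuro-py download command as a list of arguments.

    Hypothetical helper: it only reuses the flags shown in the download cell
    of this notebook and does not run anything by itself.
    """
    args = ["openneuro-py", "download", "--dataset", dataset_id]
    for pattern in include_patterns:
        args += ["--include", pattern]
    args += ["--target_dir", target_dir]
    return args


command = build_openneuro_command("ds004226", ["sub-001/anat/*"], "dataset/ds004226")
print(shlex.join(command))
```

Passing such a list to `subprocess.run(command, check=True)` would run the download as a blocking call, and avoids the shell expanding the `*` pattern before openneuro-py sees it.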

Secondly, you can inspect the entire folder structure, the size and much other information of interest about this dataset by simply scrolling through its page!
<img src="imgs/openneuro_full_view.png">
You can really decide whether a dataset is what you need this way, before burdening your connection with any heavy download :)

## 0.5 A mysterious command to run

Before going any further, open a terminal and launch the following command:

In [29]:
display("docker run -ti --rm -v {}:/data:ro -v {}:/out -v /home/NX421:/license_path:ro nipreps/fmriprep:latest /data /out/fmiprep_results participant --participant-label {} --fs-license-file /license_path/license.txt".format(bids_root, 
                                                                                                                                               deriv_root, "001"))

'docker run -ti --rm -v /home/guibert/NX_NSSP/Week2/dataset/ds004226:/data:ro -v /home/guibert/NX_NSSP/Week2/dataset/ds004226/derivatives:/out -v /home/NX421:/license_path:ro nipreps/fmriprep:latest /data /out/fmiprep_results participant --participant-label 001 --fs-license-file /license_path/license.txt'

It will keep running while we continue with the lab, and we will come back to it later!

## 1. Anatomical preprocessing

Let's have a look at the nice anatomical we downloaded above.

In [24]:
reset_overlays()
load(op.join(bids_root, 'sub-001', 'anat', 'sub-001_T1w.nii.gz'))

Image(sub-001_T1w, /home/guibert/NX_NSSP/Week2/dataset/ds004226/sub-001/anat/sub-001_T1w.nii.gz)

Take some time to explore the exquisite anatomy of the brain. Notice the skull surrounding the brain. It is full of regions which show up brighter than others **in this contrast**. Based on what you know from class about the T1 contrast, can you identify the different regions annotated below, taken from another T1 image?

As a hint, think about white matter: it is composed of axons and myelin. Contrast this with grey matter, which mostly consists of neuronal cell bodies (somas). Based on your understanding:
<img src="imgs/annotated_regions.png">


- [ ] Tissues high in fat are bright in T1 contrast, which is why white matter is brighter than grey matter
- [ ] Tissues high in fibers are bright in T1 contrast, which is why white matter is brighter than grey matter
- [ ] Region 1 is likely high in fibers and might be tendons and ligaments, which are dense connective tissues full of fibers.
- [ ] Region 1 is likely high in fat, and is probably subcutaneous fat.
- [ ] Region 2 contains a mix of fat and water, hence the slightly darker color. Given its location, it is probably bone marrow.
- [ ] Region 2 contains a mix of fibers and water, hence the slightly darker color. Given its location, it is probably the dura mater, connective tissues that make up the outer-most layer of the meninges.
- [ ] Region 3 contains air, which is why we do not see it in T1.
- [ ] Region 3 contains mostly water, which appears dark in T1.


### Skull stripping

#### Preprocessing and BIDS
An important part of **anatomical** preprocessing is to remove the skull around the brain.
To adhere to the BIDS format, all modified files should be put in a new folder, called derivatives, so that the source directory always contains clean, untouched data. The derivatives folder can hold the outputs of different preprocessing pipelines and treatments, each in its own subfolder. In our case, we've created a single folder, preprocessed_data, hence the following structure:
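The convention described above can be sketched as a small function that maps a subject and data type to its output folder under derivatives. The helper name `derivative_dir` is hypothetical; it uses `os.makedirs(..., exist_ok=True)`, which creates all missing parent folders in one call, and runs here against a temporary directory rather than the real dataset.

```python
import os
import os.path as op
import tempfile


def derivative_dir(bids_root, pipeline_name, subject, datatype):
    """Return (and create if needed) derivatives/<pipeline>/sub-<label>/<datatype>.

    Hypothetical helper for illustration; the source data folder is never touched.
    """
    path = op.join(bids_root, "derivatives", pipeline_name,
                   "sub-{}".format(subject), datatype)
    os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    return path


with tempfile.TemporaryDirectory() as fake_bids_root:
    anat_out = derivative_dir(fake_bids_root, "preprocessed_data", "001", "anat")
    relative = op.relpath(anat_out, fake_bids_root)
    created = op.isdir(anat_out)

print(relative)
```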

In [30]:
print_dir_tree(bids_root, max_depth=4)

|ds004226/
|--- CHANGES
|--- dataset_description.json
|--- participants.tsv
|--- derivatives/
|------ fmiprep_results/
|--------- logs/
|------------ CITATION.bib
|------------ CITATION.html
|------------ CITATION.md
|------------ CITATION.tex
|--------- sourcedata/
|------------ freesurfer/
|--------- sub-001/
|------------ figures/
|------------ func/
|------------ log/
|------ preprocessed_data/
|--------- sub-001/
|------------ anat/
|------------ fmap/
|------------ func/
|--- sub-001/
|------ anat/
|--------- sub-001_T1w.json
|--------- sub-001_T1w.nii.gz
|------ func/
|--------- sub-001_task-sitrep_run-01_bold.json
|--------- sub-001_task-sitrep_run-01_bold.nii.gz


#### Actual skull stripping

Perfect! Let's move on to actually extracting the brain! To make it easier for you to detect what was actually extracted, we will let the brain extraction proceed, using FSL's <a href="https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/BET/UserGuide">BET</a> (brain extraction tool) and show you the mask of the region determined by FSL as brain.

In [31]:
preproc_root

'/home/guibert/NX_NSSP/Week2/dataset/ds004226/derivatives/preprocessed_data'

In [32]:
anatomical_path = op.join(bids_root, 'sub-001', 'anat', 'sub-001_T1w.nii.gz')
betted_brain_path = op.join(preproc_root, 'sub-001', 'anat', 'sub-001_T1w')
resulting_mask_path = op.join(preproc_root, 'sub-001', 'anat', 'sub-001_T1w_mask')
bet(anatomical_path, betted_brain_path, mask=resulting_mask_path)

{}

In [33]:
load(resulting_mask_path)

Image(sub-001_T1w_mask, /home/guibert/NX_NSSP/Week2/dataset/ds004226/derivatives/preprocessed_data/sub-001/anat/sub-001_T1w_mask.nii.gz)

Does the mask fit nicely around the brain? Ideally, the mask should include all parts of the brain and exclude everything else.
To answer this question, play with the mask's opacity in FSLeyes.<br>
*Hint: have a look at the frontal regions, and inspect the superior parietal regions as well*<br>
<br><br>
What you are doing here is simply **Quality Control** (QC). It is a crucial step that you should **NEVER** skip when dealing with data preprocessing. As all steps are dependent on the success of previous steps, always make sure that everything is performing properly before moving on!
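Visual inspection can be complemented by a few simple numbers. The sketch below counts the voxels in a mask and converts the count to a volume. It uses a small synthetic array instead of the real mask, and the 1 mm isotropic voxel size is an assumption for the example (in practice you would read both the data and the voxel size from the NIfTI file).

```python
import numpy as np

# Synthetic stand-in for a binary brain mask (1 = brain, 0 = background).
mask = np.zeros((10, 10, 10), dtype=np.uint8)
mask[2:8, 2:8, 2:8] = 1

voxel_size_mm = (1.0, 1.0, 1.0)  # assumed here; read it from the image header in practice

n_voxels = int(mask.sum())                                      # voxels labelled as brain
volume_ml = n_voxels * float(np.prod(voxel_size_mm)) / 1000.0   # mm^3 -> millilitres
fraction = n_voxels / mask.size                                 # share of the image covered

print(n_voxels, volume_ml, fraction)
```

A brain volume that is implausibly small or large is a cue to go back and look at the mask more closely; it does not replace looking at the overlay itself.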

#### Improving the fit
If you look into BET's documentation, you'll quickly find that there are parameters you can play with, such as robust brain centre estimation and the fractional intensity threshold. To demonstrate the importance and impact of these parameters, let's use robust brain centre estimation.

In [34]:
bet(anatomical_path, betted_brain_path, mask=resulting_mask_path, robust=True)

{}

In [35]:
reset_overlays()

load(op.join(bids_root, 'sub-001', 'anat', 'sub-001_T1w.nii.gz'))
load(resulting_mask_path)

Image(sub-001_T1w_mask, /home/guibert/NX_NSSP/Week2/dataset/ds004226/derivatives/preprocessed_data/sub-001/anat/sub-001_T1w_mask.nii.gz)

How good is the mask now?

Is it perfect? (*Hint: have a look at voxels around 94-209-131*)

#### Hand corrections
If you really want a good fit, you might want to resort to **hand correcting the mask**. 

FSLeyes readily allows you to do such things! While on FSLeyes, press **Alt + E** to open the editing interface.
<img src="imgs/editing_menu_fsl.png">
<center><i>FSLeyes editing menu</i></center>

We will work here on removing some unwanted voxels. Toggle the 'Select mode' first. This way, FSLeyes will show us which voxels we currently have selected, before changing their value.
<img src="imgs/selection_mode_toggle_fsl.png">
<center><i>Make sure to be in Select mode by clicking it</i></center>

Then let's pick the pencil tool, to select the voxels we want.
<img src="imgs/brush_tool.png">

Good, we're set and we can now select voxels. We'll try to select some **unwanted** voxels. Simply paint over them!
<img src="imgs/painted_voxels.png">
<center><i>Selected voxels are shown in purple</i></center>

Now, we are dealing with a mask. We thus want to put the value of our selection to **0**, so as to remove it from the mask. To do so, we must change the fill value to 0, and then click to replace our selection with the provided value:
<img src="imgs/paint_steps.png">
<center><i>The two steps to set selection to a specific fill value.</i></center>
<img src="imgs/mask_painted.png">
<center><i>Painting a mask with zeros means we remove the painted voxels from the mask.</i></center>

It remains now to apply the mask to our anatomical data. This is fortunately something that you now know how to do from the previous lab! Fill in the next cell with the appropriate code **and make sure to save the masked brain in the proper directory**.

In [None]:
# Fill me with the code to use your mask and apply it to the subject's anatomical data. 
# Save the result in the derivatives folder!!
??? 

### Tissue segmentation

For the purpose of analysis, it can be useful to separate the tissues into tissue classes; in particular, extracting the white matter, grey matter and cerebrospinal fluid (abbreviated as CSF) is very useful in fMRI analysis. Consider for example an analysis that you wish to restrict to the somas of your neurons: would it make sense to conduct your analysis on the CSF?

You'll find that the segmentation is not done on fMRI volumes; it is done on the anatomical and the resulting tissue masks are then used on the functional data. Can you imagine why this is the case?

Let's perform tissue segmentation. To do so, we'll use FSL's <a href="https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FAST">FAST</a> (FMRIB's Automated Segmentation Tool).

The underlying idea of FAST is to try and model each voxel's intensity as being a mixture between the different tissue types.
Pay attention in the documentation to the following line:
<div class="warning" style='background-color:#C1ECFA; color: #112A46; border-left: solid darkblue 4px; border-radius: 4px; padding:0.7em;'>
<span>
<p style='margin-top:1em; text-align:center'><b></b></p>
<p style='text-indent: 10px;'>
    Before running FAST an image of a head should first be brain-extracted, using BET. The resulting brain-only image can then be fed into FAST.</p>
</span>
</div>

Based on this, **in the cell below choose which image should be used as fast_target, between the anatomical_path and the bet_path images**.


<div class="warning" style='background-color:orange; color: #112A46; border-left: solid red 4px; border-radius: 4px; padding:0.7em;'>
<span>
<p style='margin-top:1em; text-align:center'><b>🐞 Troubleshooting: FSLeyes stopped responding </b></p>
<p style='text-indent: 10px;'>
    It is perfectly possible (even likely) that FSLeyes will stop responding over the course of this lab. This is perfectly normal! Simply wait for the running function (such as FAST) to finish and FSLeyes should start responding again. Don't worry too quickly, and be patient :)</p>
</span>
</div>

In [36]:
anatomical_path = op.join(bids_root, 'sub-001', 'anat', 'sub-001_T1w.nii.gz')
bet_path = op.join(preproc_root, 'sub-001', 'anat', 'sub-001_T1w')

fast_target = bet_path # Replace with either anatomical_path or bet_path (note: you can try both and decide which is more reasonable!)

# Clean the directory of previous FAST outputs between runs of the cell
for f in glob.glob(op.join(preproc_root, 'sub-001', 'anat', '*fast*')):
    os.remove(f)
segmentation_path = op.join(preproc_root, 'sub-001', 'anat', 'sub-001_T1w_fast')
fast(imgs=[fast_target], out=segmentation_path, n_classes=3)

{}

Let's check the quality of the segmentation, shall we?
We want to extract 3 tissue types here: the white matter, the grey matter and the CSF. How well did FAST perform?

In [37]:
reset_overlays()
load(bet_path)

Image(sub-001_T1w, /home/guibert/NX_NSSP/Week2/dataset/ds004226/derivatives/preprocessed_data/sub-001/anat/sub-001_T1w.nii.gz)

If you look at the directories now, we have new files in our hierarchy:

In [38]:
print_dir_tree(bids_root, max_depth=5)

|ds004226/
|--- CHANGES
|--- dataset_description.json
|--- participants.tsv
|--- derivatives/
|------ fmiprep_results/
|--------- logs/
|------------ CITATION.bib
|------------ CITATION.html
|------------ CITATION.md
|------------ CITATION.tex
|--------- sourcedata/
|------------ freesurfer/
|--------------- fsaverage/
|--------------- sub-001/
|--------- sub-001/
|------------ figures/
|--------------- sub-001_desc-about_T1w.html
|--------------- sub-001_desc-conform_T1w.html
|--------------- sub-001_desc-summary_T1w.html
|--------------- sub-001_task-sitrep_run-01_desc-validation_bold.html
|------------ func/
|--------------- sub-001_task-sitrep_run-01_from-scanner_to-boldref_mode-image_xfm.txt
|------------ log/
|--------------- 20230925-093603_45554640-44ae-4e5f-9e36-f3033f9c69e0/
|------ preprocessed_data/
|--------- sub-001/
|------------ anat/
|--------------- sub-001_T1w.nii.gz
|--------------- sub-001_T1w_fast_mixeltype.nii.gz
|--------------- sub-001_T1w_fast_pve_0.nii.gz
|--

The pve files correspond to our segmented tissues. We have exactly three files, because we set n_classes to 3 above:
```python
fast(..., n_classes=3)
```
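Each pve (partial volume estimate) file gives, for every voxel, the estimated proportion of one tissue class. As a minimal sketch of how such maps can be turned into a single label per voxel, the example below takes the class with the largest proportion. The values are synthetic and stand in for three pve maps stacked along the last axis; which class index corresponds to which tissue is deliberately left for you to find out.

```python
import numpy as np

# Synthetic partial volume estimates for 4 voxels and 3 classes.
# Each row holds the proportions of class 0, 1 and 2 in one voxel and sums to 1.
pve = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
    [0.4, 0.5, 0.1],
])

# Hard segmentation: assign each voxel to the class with the largest proportion.
hard_labels = np.argmax(pve, axis=1)

print(hard_labels)
```

Note that the last voxel is a close call (0.4 versus 0.5): a hard label hides this uncertainty, which is one reason to keep the pve maps themselves.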

Let's try to identify which segmentation is which tissue type in the brain. To do this, you'll have to visualize the tissues and decide for yourself:

- [ ] pve_0 is white matter, pve_1 is grey matter, pve_2 is CSF
- [ ] pve_0 is grey matter, pve_1 is white matter, pve_2 is CSF
- [ ] pve_0 is grey matter, pve_1 is CSF, pve_2 is white matter
- [ ] pve_0 is CSF, pve_1 is grey matter, pve_2 is white matter


To make it easier on you, we will display:

- pve_0 in <span style="color:red;">red</span>
- pve_1 in <span style="color:green;">green</span>
- pve_2 in <span style="color:blue;">blue</span>

In [39]:
load(glob.glob(op.join(preproc_root, 'sub-001', 'anat','*pve_0*'))[0])
load(glob.glob(op.join(preproc_root, 'sub-001', 'anat','*pve_1*'))[0])
load(glob.glob(op.join(preproc_root, 'sub-001', 'anat','*pve_2*'))[0])
displayCtx.getOpts(overlayList[1]).cmap = 'Red'
displayCtx.getOpts(overlayList[2]).cmap = 'Green'
displayCtx.getOpts(overlayList[3]).cmap = 'Blue'

<div class="warning" style='background-color:#90EE90; color: #112A46; border-left: solid #805AD5 4px; border-radius: 4px; padding:0.7em;'>
<span>
<p style='margin-top:1em; text-align:center'><b>🏠 Tissues and contrast: Take home message 🏠</b></p>
<p style='text-indent: 10px;'>
    Tissues in T1 or T2 will show up with different intensities, depending on their content. This is because their content affects their relaxation time and, in turn, the intensity captured during the acquisition. Here are different tissue types for both modalities (from <a href="https://www.researchgate.net/publication/324396120_Basic_MRI_for_the_liver_oncologists_and_surgeons">Vu, Lan N., John N. Morelli, and Janio Szklaruk. "Basic MRI for the liver oncologists and surgeons." Journal of hepatocellular carcinoma 5 (2017): 37.</a>):
    <table>
        <tr>
            <th>MR</th>
            <th>High signal (bright)</th>
            <th>Low signal (dark)</th>
        </tr>
        <tr>
            <th>T1</th>
            <th>Fat, melanin</th>
            <th>Iron</th>
        </tr>
        <tr>
            <th></th>
            <th>Blood</th>
            <th>Water</th>
        </tr>
        <tr>
            <th></th>
            <th>Proteinaceous fluid</th>
            <th>Air, bone</th>
        </tr>
        <tr>
            <th></th>
            <th>Paramagnetic substances</th>
            <th>Collagen</th>
        </tr>
        <tr>
            <th></th>
            <th>Chelated gadolinium contrast</th>
            <th>Most tumors</th>
        </tr>
        <tr>
            <th>T2</th>
            <th>Water</th>
            <th>Air</th>
        </tr>
        <tr>
            <th></th>
            <th>Edema</th>
            <th>Bone</th>
        </tr>
        <tr>
            <th></th>
            <th>Blood</th>
            <th>Hemosiderin, deoxyhemoglobin, methemoglobin</th>
        </tr>
    </table>
    Typical preprocessing of anatomical data starts by extracting the brain, i.e. removing the skull and surrounding tissues. This step can be conducted mostly automatically, but it is perfectly possible to manually correct the extracted brain, to include either more or fewer voxels, when tweaking the parameters does not yield satisfactory results.
    The extracted brain can be segmented into different tissues. Using the difference in brightness due to contrast, we can separate the grey matter, the white matter and the CSF, which is useful for later analysis.</p>
</span>
</div>

### Normalization

This is a step you already conducted last week, if you remember! If not, go back to lab 1 and look for Ducky and his sunglasses. It is also critical to proper preprocessing.

### Anatomical: conclusions

As a final note, all these steps (<u>including</u> non-linear normalization!) can be done automatically for you with a single command: fsl_anat. You might therefore want to use this command instead of running all of the above.
Here's a quick run for you (note: it will take several minutes to complete, so be patient :) ).

Now, there are some subtleties and renamings that are needed because of the way FSL operates. We thus provide you with a wrapper around fsl_anat to do all this.

In [40]:
import shutil
from fsl.wrappers import fsl_anat  # wrapper called below; it was not imported in the first cell

def fsl_anat_wrapped(anatomical_target, output_path):
    fsl_anat(img=anatomical_target, clobber=True, nosubcortseg=True, o=output_path)
    # Now move all files from the output_path.anat folder created by FSL to 
    # the actual output_path
    fsl_anat_path = output_path+'.anat'
    files_to_move = glob.glob(op.join(fsl_anat_path, '*'))
    for f in files_to_move:
        shutil.move(f, op.join(output_path, op.split(f)[1]))
    
    # Remove the output_path.anat folder
    os.rmdir(fsl_anat_path)

In [None]:
fsl_anat_wrapped(anatomical_path, op.join(preproc_root, 'sub-001', 'anat'))

Mon Sep 25 11:48:41 CEST 2023
Reorienting to standard orientation
Mon Sep 25 11:48:53 CEST 2023
Automatically cropping the image
Starting Single Image Segmentation
T1-weighted image
Imagesize : 176 x 256 x 170
Pixelsize : 1 x 1 x 1

1 6.64379
2 6.9344
3 7.23706
KMeans Iteration 0
KMeans Iteration 1
KMeans Iteration 2
KMeans Iteration 3
KMeans Iteration 4
KMeans Iteration 5
KMeans Iteration 6
KMeans Iteration 7
KMeans Iteration 8
KMeans Iteration 9
KMeans Iteration 10
KMeans Iteration 11
KMeans Iteration 12
KMeans Iteration 13
KMeans Iteration 14
Tanaka Iteration 0 bias field 10
Tanaka-inner-loop-iteration=0 MRFWeightsTotal=1.73111e+07 beta=0.02
Tanaka-inner-loop-iteration=1 MRFWeightsTotal=1.75433e+07 beta=0.02
Tanaka-inner-loop-iteration=2 MRFWeightsTotal=1.75483e+07 beta=0.02
Tanaka-inner-loop-iteration=3 MRFWeightsTotal=1.75484e+07 beta=0.02
Tanaka-inner-loop-iteration=4 MRFWeightsTotal=1.75484e+07 beta=0.02
 CLASS 1 MEAN 481.122 STDDEV 436.353 CLASS 2 MEAN 945.703 STDDEV 159.348 CL

Let's inspect the resulting files:

In [None]:
print_dir_tree(bids_root, max_depth=5)

That's a lot of files! Let's focus on two of them. <br>
Notice T1_to_MNI_lin and T1_to_MNI_nonlin?
<br>For the former, FLIRT was run to obtain a linear normalization, whereas for the latter FNIRT was used to obtain a non-linear normalization. But what difference does it make in practice? Well, let's inspect the results, shall we?

*Hint: consider the brain landmarks, such as the ventricles but also the overall shape of the brain to determine if there was a change and if so which one(s)*

In [None]:
reset_overlays()
load(reference)
load(op.join(preproc_root, 'sub-001', 'anat', 'T1_to_MNI_lin'))
load(op.join(preproc_root, 'sub-001', 'anat', 'T1_to_MNI_nonlin'))

## 2. fMRI preprocessing

You are now familiar with the few steps of preprocessing revolving around the T1 anatomical file. The main preprocessing starts now, with the functional data. 

<div class="warning" style='background-color:#C1ECFA; color: #112A46; border-left: solid darkblue 4px; border-radius: 4px; padding:0.7em;'>
<span>
<p style='margin-top:1em; text-align:center'><b>💡 Do not forget QC! 💡</b></p>
<p style='text-indent: 10px;'>
    As always, in all your steps visualize the effect of what you're doing. This is the easiest way to check that what you're doing is actually having an effect and better yet: a <i>correct</i> effect!</p>
</span>
</div>


### 2.0 Problematic volumes removal

A problem can arise in fMRI. To showcase this, please execute the cell below (we'll show you something from another dataset just to drive our point home).

In [None]:
dataset_demo = 'ds004218'
subject_demo = '01'

# Download one functional run of one subject from the demo dataset
bids_root_demo = op.join(op.dirname(sample.data_path()), dataset_demo)
preproc_root_demo = op.join(bids_root_demo, 'derivatives')

mkdir_no_exist(bids_root_demo)
mkdir_no_exist(op.join(bids_root_demo, 'sub-01'))
mkdir_no_exist(op.join(bids_root_demo, 'sub-01', 'func'))


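mkdir_no_exist(preproc_root_demo)
mkdir_no_exist(op.join(preproc_root_demo, 'sub-01'))
mkdir_no_exist(op.join(preproc_root_demo, 'sub-01', 'anat'))
# The trimmed functional run is written to this folder later on
mkdir_no_exist(op.join(preproc_root_demo, 'sub-01', 'func'))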

os.system("""openneuro-py download --dataset {}
          --include sub-{}/func/sub-{}_task-listening_run-1_bold.nii.gz
          --target_dir {}""".format(dataset_demo, subject_demo, 
                                    subject_demo, bids_root_demo).replace("\n", " "))

We've downloaded one functional run from another dataset, because the phenomenon is clearly visible there. 
Before going any further in this tutorial, let's open up our data and have a look at them. <u>You should always look at your data before conducting any sort of analysis</u>. See if you find anything at all that looks strange. You should look for

- [ ] Volumes moving in space (i.e. head motion)
- [ ] Inhomogeneities that do not seem to come from brain activity

To open the run of interest in FSLeyes, simply run:

In [None]:
load(op.join(bids_root_demo, 'sub-01', 'func', 'sub-01_task-listening_run-1_bold.nii.gz'))

Did you find anything?
If so, what volumes would you remove, approximately?

#### 2.0.1 Field stabilization

The scanner's field takes some time to settle. You probably noticed that the volumes had an initially high contrast that quickly decayed to some baseline? It is precisely caused by the scanner's field settling.

There's little to be done in this regard; we can only throw away the volumes that are contaminated in this specific case, as the 'staircase' brain that we observe is not really meaningful and might hurt our analysis later on.

We'll throw away the first 10 volumes (first 20 seconds here), to err on the safe side.

In [None]:
file_to_realign = glob.glob(op.join(bids_root_demo, 'sub-01', 'func', 'sub-01_task-listening_run-1_bold.nii*'))[0]
output_target = op.join(preproc_root_demo, 'sub-01', 'func', 'sub-01_task-listening_run-1_bold_settled')

# We drop volumes 0 to 9 and start from index 10 (i.e. the 11th volume).
# Knowing that there are originally 225 volumes and that you want to throw away the first 10, please fill in
# the following variables
start_vol = 10 # Where should we start? (First volume is 0, not 1 !)
number_of_volumes = 215 # How many volumes should we keep? (225 - 10)

fslroi(file_to_realign, output_target, str(start_vol), str(number_of_volumes))
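
The two values passed to fslroi follow directly from the number of volumes to discard. Here is a minimal sketch of that arithmetic. The helper name `discard_initial_volumes` is hypothetical, and the repetition time of 2 seconds is an assumption deduced from "10 volumes = 20 seconds" above.

```python
def discard_initial_volumes(n_total, n_discard, tr_seconds):
    """Return (start index, number of volumes kept, seconds discarded)."""
    if not 0 <= n_discard < n_total:
        raise ValueError("n_discard must be in [0, n_total)")
    # Indices are 0-based: discarding 10 volumes means starting at index 10
    start_vol = n_discard
    n_keep = n_total - n_discard
    return start_vol, n_keep, n_discard * tr_seconds


# TR of 2 s is an assumption (10 volumes were said to last 20 seconds)
start_vol, n_keep, seconds_lost = discard_initial_volumes(
    n_total=225, n_discard=10, tr_seconds=2.0
)
print(start_vol, n_keep, seconds_lost)
```

The first two returned values are the ones you would convert to strings and hand to fslroi.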

So, the take-away message of this section, which is also the point of all these preprocessing steps: <u>always look at your data!</u>
Let's go back to our original dataset now. :)

### Motion correction

Motion correction here specifically means trying to make it such that a given voxel describes the same brain position in all volumes.

To illustrate why it might be a good idea, let's have a look at the functional data of our participant. Watch the movie. Do you notice anything strange?

<center><img src="imgs/motion.gif"/>
    <p style="text-align:center;"><i>You might want to pay attention to the axial view (right)</i></p></center>


The volumes tend to move a bit around, don't they?
    
This is a problem. Indeed, when we talk about a given voxel, our hope for the analysis is that it represents a specific anatomical location. Imagine if you were trying to find your way with Google Maps, but every now and then the houses would suddenly all move by one kilometer! It would not be so easy to get to the right address, would it? Well, here it's the same. We want a given (X, Y, Z) position to always describe the same portion of the brain, otherwise our analysis will simply not work.

But because of motion, this is not the case.
    
This is one of the core issues of fMRI: the participant simply moved, ever so slightly, during the acquisition. As a consequence, well, we have a recording of a moving participant. This is not a rare phenomenon: imagine having to keep your head perfectly still for several minutes and you'll quickly understand that it is **hard**!
    
Still, we would like to do something about it. This is where motion correction steps in. There are two sides to motion correction. The first, which we'll cover here, attempts to put all volumes back in alignment, so that a given position is indeed consistently describing the same anatomical part for all volumes. The second, which you'll see next week, attempts to correct for the consequences of motion on the magnetic field.
<br><br>
Do you remember Ducky? Well, imagine now that our dear duck has a rare shaking disease.
    <br><img src="imgs/ducky/shakyducky.png" style="width:auto;height:500px;"/>
<br>If we take several consecutive pictures of Ducky, they won't be as aligned as they should be
    <br><img src="imgs/ducky/duckies_before_reg.png" style="width:900px;height:auto;"/>
<br>To correct this, we can apply the idea we used in normalization. Let's pick one Ducky image as reference. Then, all other images of Ducky will be registered to this Ducky
    <br><img src="imgs/ducky/duckies_reg.png" style="width:900px;height:auto;"/>
    <br><img src="imgs/ducky/duckies_after_reg.png" style="width:900px;height:auto;"/>
<br>Now, what if we want to remember how much Ducky had moved? Well, we can keep the parameters of the transformation we had to apply to align each image. These fully encapsulate the motion information.
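
To make the Ducky idea concrete, here is a toy sketch in NumPy. It is not how MCFLIRT works: it only handles whole-pixel shifts of a synthetic 2D image and searches them by brute force. All names (`make_blob`, `estimate_shift`) are made up for the illustration.

```python
import numpy as np


def make_blob(shape=(32, 32), center=(16, 16), sigma=3.0):
    """Synthetic 2D 'Ducky': a Gaussian blob."""
    yy, xx = np.mgrid[: shape[0], : shape[1]]
    return np.exp(-((yy - center[0]) ** 2 + (xx - center[1]) ** 2) / (2 * sigma**2))


def estimate_shift(moving, reference, max_shift=5):
    """Brute-force search for the integer (dy, dx) that best realigns
    `moving` onto `reference` (lowest sum of squared differences)."""
    best, best_cost = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            candidate = np.roll(moving, (dy, dx), axis=(0, 1))
            cost = np.sum((candidate - reference) ** 2)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best


reference = make_blob()
# Simulated motion: each frame is the reference shifted by (rows, columns)
applied_motion = [(0, 0), (2, -1), (-3, 4)]
frames = [np.roll(reference, shift, axis=(0, 1)) for shift in applied_motion]

# Register every frame to the reference and keep the parameters we found
corrections = [estimate_shift(frame, reference) for frame in frames]
realigned = [np.roll(f, c, axis=(0, 1)) for f, c in zip(frames, corrections)]
print(corrections)
```

The list `corrections` plays the role of the motion parameters: one small set of numbers per frame, from which the motion can be read back.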

This is precisely what motion correction sets out to achieve. For this, we first need to define a reference, if possible in fMRI space, and one that does not require overly large transformations. Which option(s) seem reasonable to you?

- [ ] Choosing as reference a volume of the fMRI time series
- [ ] Averaging the fMRI time series and using the mean volume as reference
- [ ] Choosing as reference an anatomical volume, preferably of T1 contrast
- [ ] Choosing as reference an fMRI standard space volume, derived from a cohort of participants


It turns out the first two options usually give equivalent results. Because it saves us one pass of average computation, we will choose the first option: picking a volume and using it as reference! Which one do you think would be good?
- [ ] The first volume of the time series
- [ ] The last volume of the time series
- [ ] The middle volume of the time series
- [ ] Any volume such that the BOLD signal has had time to settle down

Now, let us perform this step, on our **first** dataset (the one without fieldmaps). In FSL, we use <a href="https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/MCFLIRT">MCFLIRT</a> to perform this correction.
<br>
    
<div class="warning" style='background-color:#C1ECFA; color: #112A46; border-left: solid darkblue 4px; border-radius: 4px; padding:0.7em;'>
<span>
<p style='margin-top:1em; text-align:center'><b> 💡 Pay attention ! 💡</b></p>
<p style='text-indent: 10px;'>
    By default, MCFLIRT selects the middle volume of the EPI series as the reference to which all other volumes are realigned.</p>
</span>
</div>

In [1]:
path_original_data = os.path.join(bids_root_fmap, 'sub-001', 'func', 'sub-001_task-sitrep_run-01_bold')
path_moco_data = os.path.join(preproc_root, 'sub-001', 'func', 'sub-001_task-sitrep_run-01_bold_moco')
mcflirt(infile=path_original_data,o=path_moco_data, plots=True, report=True, dof=6, mats=True)


Okay! So, what do we have now?

In [None]:
print_dir_tree(bids_root_fmap, max_depth=5)

In the functional folder, notice that we have two new files:
```
sub-001_task-sitrep_run-01_bold_moco.nii.gz
sub-001_task-sitrep_run-01_bold_moco.par

```

The first one is the corrected EPI time series, with volumes realigned. The second is a file describing the motion parameters that were used to move each volume. It will be useful very shortly to determine which volumes moved a lot.
Notice a new directory as well!
```
sub-001_task-sitrep_run-01_bold_moco.mat/
```
This directory contains one transformation matrix file per volume: the matrix that was used to realign that volume.

<div class="warning" style='background-color:#C1ECFA; color: #112A46; border-left: solid darkblue 4px; border-radius: 4px; padding:0.7em;'>
<span>
<p style='margin-top:1em; text-align:center'><b> 💡 Pay attention ! 💡</b></p>
<p style='text-indent: 10px;'>
    The motion parameters and the transformation matrices are related, but they are not exactly the same thing. While you can recover one from the other, it is not trivial. Applying the transformation matrix to a volume will put it 'in alignment' as you've done with FLIRT. However the motion parameters cannot be applied directly. Loosely, the motion parameters describe how you would move if you first applied a rotation around x, then around y, then around z, followed by a translation along x, then along y, then along z. This ordering of transformations is <b>not</b> really what happens with the transformation matrices. It is a convention adopted by FSL to make it easier to decouple translations and rotations in the motion parameter analysis; it is therefore a <b>convenience</b>.
    <u>Do not confuse transformation matrices and motion parameters</u>!</p>
</span>
</div>

Before going <u>any further</u>, go and have a look at the corrected timeseries.

In [None]:
reset_overlays()
load(path_original_data)
load(path_moco_data)

Did mcflirt help correct motion? Are you convinced it did a reasonably good job?
<br>
It's actually not that easy to tell, right? Well, let's see if we can figure something out to ease our quality control!

#### Motion parameters and degrees of freedom

We told you earlier that motion parameters can be used to estimate the motion along every axis.

In our invocation of mcflirt, notice the following:
```python
mcflirt(..., dof=6)
```
dof stands for <i><b>d</b>egrees <b>o</b>f <b>f</b>reedom</i>: it defines what kind of transformation we allow. In a 3D transformation, we have 3 axes:
<img src="imgs/3d_axis.png"/>
For each axis, we can apply one transformation of each type. Because we only apply **affine** transformations here, the available types are:
- Translation along the axis
- Rotation around the axis
- Shear along the axis
- Scale along the axis

Together, 4 types of transformation on 3 axes give **12** DOF in total.
We've chosen 6 DOF, which is the standard choice for motion correction: we only want to translate and rotate the volumes (a rigid-body transformation), since the head was displaced by motion but did not change size or shape.
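
As an illustration of what 6 DOF means, the sketch below builds a 4x4 rigid-body matrix from three rotation angles and three translations. The function name and the rotation order (Rz @ Ry @ Rx) are assumptions made for this example; they are not claimed to match FSL's internal convention.

```python
import numpy as np


def rigid_matrix(rx, ry, rz, tx, ty, tz):
    """4x4 rigid-body transform from 3 rotations (radians) and 3 translations (mm).

    The composition order Rz @ Ry @ Rx is an illustrative assumption.
    """
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    rot_x = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    rot_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rot_z = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    matrix = np.eye(4)
    matrix[:3, :3] = rot_z @ rot_y @ rot_x
    matrix[:3, 3] = [tx, ty, tz]
    return matrix


M = rigid_matrix(0.01, -0.02, 0.005, 1.0, 0.5, -0.3)
R = M[:3, :3]
# A pure rotation is orthonormal with determinant 1: no scaling, no shearing
print(np.allclose(R @ R.T, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
```

With 12 DOF, the upper-left 3x3 block would also carry scale and shear, and it would in general no longer pass these two checks.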

#### Looking at the resulting correction parameters
Recall the motion parameters are stored in the .par file produced by MCFLIRT. Notice that since each volume moved differently, we have one transformation per volume, thus one set of motion parameters per volume as well. We provide you with a way to load these parameters:

In [None]:
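def load_mot_params_fsl_6_dof(path):
    # One row per volume: 3 rotations (radians) followed by 3 translations (mm).
    # Splitting on any run of whitespace is robust to the spacing used in the file.
    return pd.read_csv(path, sep=r'\s+', header=None, 
            engine='python', names=['Rotation x', 'Rotation y', 'Rotation z','Translation x', 'Translation y', 'Translation z'])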

mot_params = load_mot_params_fsl_6_dof(op.join(preproc_root, 'sub-001', 'func', 'sub-001_task-sitrep_run-01_bold_moco.par'))
mot_params

Based on **translation along X alone**, can you find a volume that was displaced by more than 0.2 mm with respect to the **preceding volume**?

In [None]:
# write your code here to inspect quickly the translation on X :)

Some metrics have been proposed to quantify the displacement of a frame compared to the preceding frame: this is the frame-wise displacement (FD). <br>(see <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3254728/">Power, Jonathan D., et al. "Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion." Neuroimage 59.3 (2012): 2142-2154.</a> for more details).<br>
We can use it to extract a single aggregate measure of motion for each volume. 

In [None]:
def compute_FD_power(mot_params):
    framewise_diff = mot_params.diff().iloc[1:]

    rot_params = framewise_diff[['Rotation x', 'Rotation y', 'Rotation z']]
    # Estimating displacement on a 50mm radius sphere
    # To do so, we can remember the definition of the radian!
    # Let the angle (in radians) be theta, the arc length be s and the radius be r.
    # Then theta = s / r, i.e. s = r * theta
    # We want to determine s for a sphere of 50mm radius, knowing theta. Easy enough!
    
    # Another way to think about it is through the line integral along the circle.
    # Integrating from 0 to theta with radius r = 50 gives, unsurprisingly, r * theta.
    converted_rots = rot_params*50
    trans_params = framewise_diff[['Translation x', 'Translation y', 'Translation z']]
    fd = converted_rots.abs().sum(axis=1) + trans_params.abs().sum(axis=1)
    return fd

fd = compute_FD_power(mot_params).to_numpy()
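
To see the arithmetic behind the frame-wise displacement on numbers you can check by hand, here is a tiny standalone sketch in NumPy with two made-up volumes (the values are invented for the example, and the names are kept distinct from the ones used in this notebook). Rotations are converted to millimetres with the arc length s = r * theta on a 50 mm sphere, then everything is summed in absolute value.

```python
import numpy as np

# Two consecutive volumes (made-up values):
# [rot x, rot y, rot z (radians), trans x, trans y, trans z (mm)]
toy_params = np.array([
    [0.000, 0.000, 0.000, 0.00, 0.00, 0.00],
    [0.010, 0.000, -0.002, 0.10, -0.20, 0.00],
])

# Absolute change from the first volume to the second
delta = np.abs(np.diff(toy_params, axis=0))[0]

radius_mm = 50.0
rot_mm = radius_mm * delta[:3]      # arc length s = r * theta for each rotation
toy_fd = rot_mm.sum() + delta[3:].sum()
print(rot_mm, toy_fd)
```

Here the rotations contribute 0.5 mm and 0.1 mm, the translations 0.1 mm and 0.2 mm, for a frame-wise displacement of 0.9 mm.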

In [None]:
threshold = np.quantile(fd,0.75) + 1.5*(np.quantile(fd,0.75) - np.quantile(fd,0.25))
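
The threshold above is the usual upper fence of a box plot: the third quartile plus 1.5 times the interquartile range. A minimal sketch on made-up values (the helper name `upper_outlier_fence` and the toy data are hypothetical) shows how it singles out an unusually large displacement:

```python
import numpy as np


def upper_outlier_fence(values, k=1.5):
    """Third quartile plus k times the interquartile range."""
    q1, q3 = np.quantile(values, [0.25, 0.75])
    return q3 + k * (q3 - q1)


# Made-up frame-wise displacements (mm) with one obvious spike at the end
fd_toy = np.array([0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 2.0])
fence = upper_outlier_fence(fd_toy)
flagged = np.where(fd_toy > fence)[0]
print(fence, flagged)
```

Note that this threshold is relative to the run itself: a participant who moves a lot throughout will get a high fence, so an absolute threshold in mm can be a useful complement.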

In [None]:
%matplotlib inline
plt.plot(list(range(1, fd.size+1)), fd)
plt.xlabel('Volume')
plt.ylabel('FD displacement (mm)')
plt.hlines(threshold, 1, fd.size, colors='black', linestyles='dashed', label='FD threshold')
plt.legend()
plt.show()

Okay great, but what if we want to know which volumes are actually above threshold? Simply run the cell below!

In [None]:
np.where(fd > threshold)[0] + 1

So, you now know which volumes might present motion that is worth checking. Go back to FSLeyes and contrast the uncorrected volumes with the corrected ones. Can you see what sort of motion was problematic and was eliminated?

### Motion-correction: conclusions

Motion correction should always be conducted. As you've seen, it is extremely easy to do and has many benefits. However, it is not infallible. High motion tends to cause non-linear effects in the signal that the simple motion correction above cannot correct, since it has no awareness of the magnetic field. <br>
<br> Motion parameters can, in this case, come to our rescue. As they represent the effect of motion, including them in our model can help correct the signal. One could for example include them as regressors in a General Linear Model, or regress out the high-motion volumes altogether (censoring). ➡️ More on this next week!
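
As a preview, one common way to censor volumes in a GLM is to add one regressor per flagged volume, equal to 1 at that volume and 0 everywhere else. The sketch below only builds such regressors; the function name and the flagged indices are made up, and this is not necessarily the approach you will see next week.

```python
import numpy as np


def censoring_regressors(n_volumes, flagged_volumes):
    """One column per flagged volume: 1 at that volume, 0 elsewhere."""
    flagged = np.asarray(flagged_volumes, dtype=int)
    regressors = np.zeros((n_volumes, flagged.size))
    regressors[flagged, np.arange(flagged.size)] = 1.0
    return regressors


# Hypothetical run of 8 volumes where volumes 2 and 5 (0-based) are flagged
X_censor = censoring_regressors(n_volumes=8, flagged_volumes=[2, 5])
print(X_censor.shape, X_censor.sum(axis=0))
```

Each such column absorbs the signal of exactly one volume, so that volume no longer influences the estimates of the other regressors.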

### Where are we?

So, let's see what we have done so far:

<table>
    <tr><th style='text-align:justify;'>Data type</th><th style='text-align:justify;'>Step name </th><th style='text-align:justify;'>Details of the step</th><th style='text-align:justify;'>FSL tool </th></tr>
    <tr><th>Anatomical</th><td></td><td></td></tr>
    <tr><td></td><td style='text-align:justify;'>Skull stripping</td><td style='text-align:justify;'>Removing skull and surrounding tissues to keep only the brain</td><td style='text-align:justify;'>BET</td></tr>
    <tr><td></td><td style='text-align:justify;'>Segmentation</td><td style='text-align:justify;'>Segmenting brain tissues based on their contrasts</td><td style='text-align:justify;'>FAST</td></tr>
    <tr><td></td><td style='text-align:justify;'>Normalization</td><td style='text-align:justify;'>Mapping participant's brain to a reference brain, making its orientation and scale match so that comparison across participants becomes feasible.</td><td style='text-align:justify;'>FLIRT</td></tr>
    <tr><th>Functional</th><th></th><th></th></tr>
    <tr><td></td><td style='text-align:left;'>First few volumes removal</td><td style='text-align:justify;'>Removing volumes for which the b0 field is still not stable and that could contaminate all our data if left unchecked.</td><td style='text-align:justify;'>fslroi</td></tr>
    <tr><td></td><td style='text-align:left;'>Motion correction</td><td style='text-align:justify;'>Realignment of fMRI volumes to a common reference - typically one volume or the average of the volumes - to correct for inter-volume motion. The extracted motion parameters can be used for subsequent analysis (see GLM next week!)</td><td style='text-align:justify;'>MCFLIRT (which is one suboption of FLIRT in fact)</td></tr>
</table>


As mentioned earlier:
- Doing motion correction or slice-timing first is still a matter of debate in the literature
- Field unwarping and coregistration (which you'll see now) can be conducted jointly to improve results. It means that a typical pipeline would actually be in the order: Volumes removal > (Motion correction > slice-timing correction) > Coregistration + Fieldmap unwarping > ...


### Coregistration

Just what is coregistration? Well, it is basically a registration between images of different modalities. In our specific case, we want to register fMRI (EPI) to an anatomical image (T1). There are several reasons for this. The first that comes to mind is that if you overlay your fMRI on the anatomy, you can of course reason much more easily on where you are in the brain, what activations you might be looking at and so forth. Imagine a participant has a brain lesion visible on the anatomy and you want to see how this reflects on the fMRI. Being able to put the two together would make it much easier, would it not?

This is the first reason behind coregistration.

The second is because of normalization. Assume you want to compare all fMRI data of participants. Clearly, putting all of them into a common reference frame is a bit trickier, because of how noisy and low-resolution the data is, right? But you know how to map the anatomical to this common space with excellent accuracy, and you've saved this transformation earlier.
If you could figure out how to go from the fMRI space to anatomical, clearly the problem would be solved! You'd only have then to apply the transformation from anatomical to common space and be done with it.


Computing the fMRI space to anatomical transformation is precisely the goal of coregistration.
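
The reasoning above relies on the fact that linear transformations chain by matrix multiplication. Here is a minimal sketch with made-up 4x4 matrices (they do not come from FLIRT or epi_reg). It only covers the linear case: a non-linear warp such as the one produced by FNIRT is not a single matrix.

```python
import numpy as np


def translation(tx, ty, tz):
    """4x4 matrix translating by (tx, ty, tz)."""
    matrix = np.eye(4)
    matrix[:3, 3] = [tx, ty, tz]
    return matrix


def scaling(factor):
    """4x4 matrix scaling all three axes by the same factor."""
    matrix = np.eye(4)
    matrix[:3, :3] *= factor
    return matrix


# Made-up transforms standing in for the two registration results
func_to_anat = translation(2.0, -1.0, 0.5)
anat_to_mni = scaling(1.1) @ translation(0.0, 3.0, 0.0)

# Chaining: apply func_to_anat first, then anat_to_mni
func_to_mni = anat_to_mni @ func_to_anat

point_func = np.array([10.0, 20.0, 30.0, 1.0])  # homogeneous coordinates
two_steps = anat_to_mni @ (func_to_anat @ point_func)
one_step = func_to_mni @ point_func
print(np.allclose(two_steps, one_step))
```

Combining the transformations before applying them also means the data only need to be resampled once.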
<br><br>
To do this step, we will use a wonderful command: <a href="https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FLIRT/UserGuide#epi_reg">epi_reg</a> ! As the name states, it is a command to register an EPI. Hard to make it clearer huh? 

#### What to do

Notice that we want to compute the transformation to use for coregistration.
Now, we have an EPI, here of 364 volumes, each supposedly aligned by motion-correction. How many times should we compute the transformation?
- [ ] 364 times, once for each volume
- [ ] Once, selecting any volume from the EPI

Your task is simple. You should:
- Fill in the name of the EPI target. It should be the **motion-corrected** EPI that you corrected using MCFLIRT (ignore the slice-timing corrected volume). If you want to use a single volume, set the use_first_vol variable to True!
- Fill in the path to the whole head T1 image (**before** skull stripping was conducted!)
- Fill in the path to the skull-stripped T1 image (**after** skull stripping was conducted!)

<div class="warning" style='background-color:#C1ECFA; color: #112A46; border-left: solid darkblue 4px; border-radius: 4px; padding:0.7em;'>
<span>
<p style='margin-top:1em; text-align:center'><b> 💡 Pay attention ! 💡</b></p>
<p style='text-indent: 10px;'>
    Make sure that the whole head T1 and the skull-stripped T1 have the same orientation.
For example, if you ran fsl_anat to extract the brain (which is fine), FSL will change in the headers the orientation of the T1 before skull-stripping. As a consequence, the brain-extracted T1 no longer has the same orientation as the original T1. If you display them on top of each other, they are perfectly matched, but not from the perspective of the <b>headers</b>, which can play nasty tricks on you when performing coregistration.</p>
</span>
</div>

In [None]:
epi_target = ???
use_first_vol = ???
whole_t1 = ???
skull_stripped_t1 = ???
output_path = op.join(preproc_root, 'sub-001', 'func', 'sub-001_task-sitrep_run-01_bold_bbr')
ref_vol_name =  op.join(preproc_root, 'sub-001', 'func', 'sub-001_task-sitrep_run-01_bold_moco_vol_middle')

In [None]:
if use_first_vol:
    # Despite the variable's name, we extract the middle volume (index 182 out of 364)
    # with fslroi, as we've seen before :)
    fslroi(epi_target, ref_vol_name, str(182), str(1))
    # Call epi_reg
    epi_reg(ref_vol_name, whole_t1, skull_stripped_t1, output_path)
    # We keep the extracted volume: it is reused in the next call to epi_reg
else:
    epi_reg(epi_target, whole_t1, skull_stripped_t1, output_path)

Notice how FAST is run?
This is because the specific coregistration cost (boundary-based registration, BBR) uses the anatomical white-matter segmentation from FAST. If no such segmentation is provided to the function, it re-runs FAST to obtain one. If you've already done the anatomical segmentation, clearly there's no need to redo it, right?
In particular, imagine you had to correct the white matter yourself with the help of an expert, because FSL somehow did a poor job on your data. Clearly you'd like this corrected version to be used instead of the result from FAST, right?

Well, you can! We just need a new option in the epi_reg command:
```python
epi_reg(...,wmseg=path_to_your_white_matter_segmentation)
```

In [None]:
epi_reg(ref_vol_name, whole_t1, skull_stripped_t1, output_path, wmseg=op.join(preproc_root, 'sub-001', 'anat', 'T1_fast_pve_2'))

Let's overlay the two (EPI and anatomical) on top of each other to visualize the quality of the coregistration!

In [None]:
reset_overlays()
load(skull_stripped_t1)
load(output_path)

Now, how do we *know* if the registration is good or bad?
Well, there are several things to watch out for, but here are some main leads:
- Is the functional in the right orientation?
- Are the ventricles correctly aligned?
- Are the boundaries of the EPI more or less matching the anatomical?

➡️ You can also check how the white matter of the EPI matches your anatomical's white matter provided you have sufficient resolution

#### Some cleanup
If you have a look, you might notice that your directory got filled with many files. These are temporary files that epi_reg created but did not clean up. The following should help:

In [None]:
def cleanup_epi_reg(path_to_clean):
    patterns = ['*_fast_*', '*_fieldmap*']
    for p in patterns:
        files = glob.glob(op.join(path_to_clean, p))
        for f in files:
            os.remove(f)

In [None]:
cleanup_epi_reg(op.join(preproc_root, 'sub-001', 'func'))

## Smoothing
None of these transformations is perfect. As you've seen in class, a smoothing step is typically applied, with the size of the smoothing kernel depending on your application, starting resolution, etc.
The idea of smoothing is that, by averaging neighbouring voxels, you hopefully increase the signal-to-noise ratio. <br>
A side effect is that the finest patterns of activation are lost in the averaging (we can't have everything: there's no free lunch).

With FSL, smoothing is rather easy to do. One important thing, however, is the size of your filter.
Different software packages might use different conventions. For MRI, it is typical to talk about FWHM (full width at half maximum), expressed in mm.

FSL, however, takes sigma as input instead of FWHM. Fortunately, the conversion is easy:

$$ \sigma = \frac{FWHM}{2.3548}$$
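
The constant 2.3548 is not arbitrary: for a Gaussian, the full width at half maximum equals $2\sqrt{2\ln 2}\,\sigma$. A small sketch of the conversion in both directions (the function names are made up for the example):

```python
import math

# For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma, which is about 2.3548 * sigma
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian FWHM (mm) to its standard deviation (mm)."""
    return fwhm_mm / FWHM_PER_SIGMA


def sigma_to_fwhm(sigma_mm):
    """Convert a Gaussian standard deviation (mm) to its FWHM (mm)."""
    return sigma_mm * FWHM_PER_SIGMA


sigma = fwhm_to_sigma(6.0)
print(round(FWHM_PER_SIGMA, 4), round(sigma, 3))
```

For a 6 mm FWHM, this gives a sigma of roughly 2.548 mm.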

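Here, for example, is the smoothing command for 6mm FWHM smoothing. We use the -s option of fslmaths, which applies Gaussian smoothing with the given sigma (in mm); setting a kernel with -kernel alone would not filter the image:

In [None]:
cmd = 'fslmaths {} -s {} {}_smoothed-6mm'.format(total_epi, 6/2.3548, total_epi)
os.system(cmd)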

Let's observe what we have now:

In [None]:
load(total_epi + '_smoothed-6mm')

Do you feel as though the signal-to-noise ratio was improved?

## MRI + fMRI preprocessing: summary

So, these were all the steps you were meant to study.

You should know by now: preprocessing is extremely important and you will likely spend a lot of time on it. Decisions in preprocessing will affect your analysis, so do not take this step lightly, it is <u>critical</u> to do it as well as possible!

<u>Always perform quality control to ensure everything is okay!</u>

Let's review one last time the different steps you've studied and which FSL tool(s) you used to do it:
<table>
    <tr><th style='text-align:justify;'>Data type</th><th style='text-align:justify;'>Step name </th><th style='text-align:justify;'>Details of the step</th><th style='text-align:justify;'>FSL tool </th></tr>
    <tr><th>Anatomical</th><td></td><td></td></tr>
    <tr><td></td><td style='text-align:justify;'>Skull stripping</td><td style='text-align:justify;'>Removing skull and surrounding tissues to keep only the brain</td><td style='text-align:justify;'>BET</td></tr>
    <tr><td></td><td style='text-align:justify;'>Segmentation</td><td style='text-align:justify;'>Segmenting brain tissues based on their contrasts</td><td style='text-align:justify;'>FAST</td></tr>
    <tr><td></td><td style='text-align:justify;'>Normalization</td><td style='text-align:justify;'>Mapping participant's brain to a reference brain, making its orientation and scale match so that comparison across participants becomes feasible.</td><td style='text-align:justify;'>FLIRT + FNIRT (from last week)</td></tr>
    <tr><th>Functional</th><th></th><th></th></tr>
    <tr><td></td><td style='text-align:left;'>First few volumes removal</td><td style='text-align:justify;'>Removing volumes for which the b0 field is still not stable and that could contaminate all our data if left unchecked.</td><td style='text-align:justify;'>fslroi</td></tr>
    <tr><td></td><td style='text-align:left;'>Motion correction</td><td style='text-align:justify;'>Realignment of fMRI volumes to a common reference - typically one volume or the average of the volumes - to correct for inter-volume motion. The extracted motion parameters can be used for subsequent analysis (see GLM next week!)</td><td style='text-align:justify;'>MCFLIRT (which is one suboption of FLIRT in fact)</td></tr>
    <tr><td></td><td style='text-align:left;'>Coregistration to anatomical</td><td style='text-align:justify;'>Putting the functional volumes in anatomical space</td><td style='text-align:justify;'>FLIRT (epi_reg being a specialized instance)</td></tr>
    <tr><td></td><td style='text-align:left;'>Smoothing</td><td style='text-align:justify;'>Allowing a bit of leeway in the voxels' values to account for the imperfection of the registration</td><td style='text-align:justify;'>fslmaths with smoothing operation</td></tr>
</table>

# Running it all: fmriprep

Performing all of these steps by hand would be tedious, as you can imagine.
Furthermore, differences in exact choices, bugs and software versions could lead to a lack of reproducibility in your research. For this reason (and many others), the neuroscientific community developed a software suite that conducts all preprocessing steps in an automated fashion, called fmriprep!

You actually already launched this step, in step 0.5!
The command you used was:

In [None]:
display("docker run -ti --rm -v {}:/data:ro -v {}:/out -v /home/NX421:/license_path:ro nipreps/fmriprep:latest /data /out/fmriprep_results participant --participant-label {} --fs-license-file /license_path/license.txt".format(bids_root, 
                                                                                                                                               deriv_root, "001"))

Several things to note.

- We go through a container technology called Docker, such that fmriprep and all its subprograms come bundled together and we don't have to compile anything. Very handy!
- We must mount the paths where Docker has to look for data and licenses. These are the -v options. Any path not mounted will not be visible to Docker!
- We point to the root path of the unprocessed data for Docker to fetch data (expected to be BIDS compliant!)
- The derivative path is where we store the output
- We specified a participant label here. Had we not done so, fmriprep would apply the preprocessing to all participants in the folder. Pay attention to the --participant-label flag when you want to preprocess only one participant in a study!
- For a subject named sub-SOMENAME, the subject ID is SOMENAME, not sub-SOMENAME
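
The points above can be sketched as a small helper that assembles the command from its parts. The function name and the example paths are hypothetical; the Docker options and fmriprep arguments are the ones from the command shown above. The helper also strips a leading `sub-` so that the label passed to fmriprep is the bare subject ID.

```python
import shlex


def build_fmriprep_command(bids_root, deriv_root, license_dir, subject):
    """Assemble the docker command line used to launch fmriprep for one subject."""
    # fmriprep expects the bare ID: "001", not "sub-001"
    label = subject[4:] if subject.startswith("sub-") else subject
    parts = [
        "docker", "run", "-ti", "--rm",
        "-v", f"{bids_root}:/data:ro",              # raw BIDS data, read-only
        "-v", f"{deriv_root}:/out",                 # where results are written
        "-v", f"{license_dir}:/license_path:ro",    # folder holding license.txt
        "nipreps/fmriprep:latest",
        "/data", "/out/fmriprep_results", "participant",
        "--participant-label", label,
        "--fs-license-file", "/license_path/license.txt",
    ]
    return shlex.join(parts)


# Example paths are placeholders, not the ones used in this class
cmd = build_fmriprep_command("/tmp/bids", "/tmp/bids/derivatives", "/tmp/license", "sub-001")
print(cmd)
```

Building the command as a list and joining it with `shlex.join` keeps paths containing spaces correctly quoted.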

Beware! Although it is easy to launch, it will still take some time depending on your CPU, the volume of data to be preprocessed, etc. We will not go into more detail on Docker, as it is out of scope for this class.

You can nonetheless get more information on all potential options of fmriprep like so:

In [None]:
os.system("docker run --rm nipreps/fmriprep:latest --help")

Once fmriprep is done (check back in the terminal where you started fmriprep: once done, it will not write anything anymore!), navigate to:

In [None]:
os.path.join(deriv_root,"fmriprep_results")

You should see index.html; click it.

This will open a new page in Firefox displaying the HTML file. Within this page, you will see all results of the preprocessing, including QC steps for you to visually inspect that all went well!

We **strongly** encourage you for the rest of the class to use fmriprep whenever you can, as it will make your life easier. 


<div class="warning" style='background-color:#C1ECFA; color: #112A46; border-left: solid darkblue 4px; border-radius: 4px; padding:0.7em;'>
<span>
<p style='margin-top:1em; text-align:center'><b> 💡 Pay attention ! 💡</b></p>
<p style='text-indent: 10px;'>
    fmriprep is an amazing tool, but it is still not magic. You will still need to inspect your data to make sure everything is fine. Furthermore, you might need to play with fmriprep parameters, such as the template space for the normalization, but also potentially specify a template for the skull stripping, degrees of freedom for the coregistration... You should never use it as a black box and just be done with it! Be critical of what you get.</p>
</span>
</div>