In [None]:
%matplotlib inline

import os
import os.path as op
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Day 5. Surface-Based Morphometry on MRiShare dataset
============================================

This example uses Surface-Based Morphometry (SBM) to study the relationship
between aging, sex and cortical thickness (CT) and/or cortical surface area (CSA).

The data come from the MRiShare database and have been processed with the
Freesurfer v6.0 pipeline inside ABACI to create the surface-based maps.


SBM analysis of aging
---------------------

We run a standard GLM analysis to study the association between age
and surface-based metrics at each vertex of the Freesurfer data.

We will use the same sample_mrishare_subinfo.csv to construct design matrices
and run the GLM analysis using mri_glmfit from Freesurfer. After preparing the necessary input files,
you can run this tool directly in a terminal, from this notebook, or through its nipype
interface.



In [None]:
# Authors:  Ami Tsuchida <atsuch@gmail.com>, July, 2019

Prepare input design files
------------------

The principle of the design matrix is exactly the same when running an SBM analysis with Freesurfer.
However, Freesurfer accepts the design in either of the following two formats, so here you will practice
creating both.

1. FSGD file 

2. Design mat file

The first type is just a text file that lists the categorical variables (classes) in your design, followed by
the continuous variables. From this file, Freesurfer **automatically creates the design matrix used for the actual GLM analysis**.

You can find a description of this input format here (https://surfer.nmr.mgh.harvard.edu/fswiki/FsgdFormat). You can also follow the example links on that page to see what these files look like for different types of design.

The key thing to know about the FSGD input option is that when you have a categorical variable of interest (e.g. sex, healthy controls vs patients), it automatically creates a design that tests for **different offsets and different slopes (DODS)**. You can read this (https://surfer.nmr.mgh.harvard.edu/fswiki/DodsDoss) to understand what it means, but basically each class gets its own intercept and its own slope for **every continuous variable in your design**, so the model includes an interaction between your categorical variable and each continuous variable. This is fine as long as this is what you want to test, but if you have reasons to test a simpler model, you will have to construct the design mat file manually (second option).
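
To make the DODS layout concrete, here is a minimal sketch that builds the same kind of design matrix by hand with numpy. The subject values are made up for illustration (they are not from the MRiShare sample), and the column order (one offset per class, then one slope per class for each variable) is the one assumed by the contrasts used later in this notebook.

```python
import numpy as np

# Toy data: hypothetical values, not from the MRiShare sample.
sex = np.array(['F', 'M', 'F', 'M', 'F', 'M'])
age_c = np.array([-2.0, -1.0, 0.0, 0.5, 1.0, 1.5])
score_c = np.array([0.3, -0.7, 1.1, -0.2, -0.5, 0.0])

classes = ['F', 'M']            # same order as the 'Class' lines of an FSGD file
covariates = [age_c, score_c]   # same order as the 'Variables' line

# One indicator (offset) column per class ...
indicators = [(sex == c).astype(float) for c in classes]
# ... then, for each covariate, one slope column per class.
slopes = [ind * cov for cov in covariates for ind in indicators]

# Columns: F, M, F-x-Age, M-x-Age, F-x-Score, M-x-Score
X_dods = np.column_stack(indicators + slopes)
print(X_dods.shape)  # n_classes * (n_covariates + 1) columns
```

With two classes and two continuous variables this gives six columns, which is why each contrast vector for the FSGD design has six entries.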

I suspect that most published studies using the Freesurfer GLM use the first type of input and never bother to create a simpler model with a design mat file, but it's good to know that it can be done and that it is sometimes more appropriate.

### 1. Load the variables from sample subinfo.csv

In [None]:
dat_dir = '../data/'
sub_info = pd.read_csv(op.join(dat_dir, 'sample_mrishare_subinfo.csv'))
sub_info.head()

In [None]:
n_subjects = len(sub_info)
n_subjects

### 2. Create FSGD input file and associated contrast files.

This can be done with any text editor, but here we will do it with python.

In [None]:
# Create working dir for Freesurfer SBM and store the fsgd file there.

fs_wd = '/home/padawan/fs_sbm'
design1_wd = op.join(fs_wd, 'MyDesign1')
design2_wd = op.join(fs_wd, 'MyDesign2')
os.makedirs(design1_wd, exist_ok=True)
os.makedirs(design2_wd, exist_ok=True)

To be able to look at a meaningful intercept (i.e. mean CT/CSA across groups at mean age/score), let's create demeaned versions of the continuous variables.

In [None]:
sub_info['Age_c'] = sub_info.Age - sub_info.Age.mean()
sub_info['Score_c'] = sub_info.Score - sub_info.Score.mean()

In [None]:
sub_info.head()

In [None]:
fsgd_lines = ['GroupDescriptorFile 1', 'Title SBMtest', 'Class F', 'Class M']

# another line should contain 'Variables', then name of your continuous variables
group_var = ['Sex']
cont_vars = ['Age_c', 'Score_c']

another_line_list = ['Variables'] + cont_vars
another_line = ' '.join(another_line_list)
fsgd_lines.append(another_line)

In [None]:
# Now we grab the columns that contain the id and all the variables from the subinfo DF.
# .copy() gives an independent DataFrame, so adding a column below does not
# trigger pandas' SettingWithCopyWarning.
fsgd_df = sub_info[['mrishare_id'] + group_var + cont_vars].copy()

# We also need an extra column that simply says 'Input'
fsgd_df['Input'] = 'Input'

# Reorder the columns
fsgd_df = fsgd_df[['Input', 'mrishare_id'] + group_var + cont_vars]

In [None]:
# Finally we create a text file and save this information

fsgd_file = op.join(design1_wd, 'SBMtest.fsgd')
with open(fsgd_file, 'w') as f:
    for line in fsgd_lines: # First write the lines
        f.write(line + '\n')
    fsgd_df.to_csv(f, header=False, index=False, sep=' ') # Then add the DF without header
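
As a rough guide to what the saved file should look like, the sketch below assembles the same layout in memory for three hypothetical subjects. The IDs and values are invented for illustration; only the header keywords and the `Input <id> <class> <variables...>` row layout follow the FSGD format described above.

```python
# Toy subject table: IDs and values are made up for illustration.
rows = [
    ('sub001', 'F', -1.5, 0.2),
    ('sub002', 'M', 0.5, -0.4),
    ('sub003', 'F', 1.0, 0.2),
]

header = [
    'GroupDescriptorFile 1',
    'Title SBMtest',
    'Class F',
    'Class M',
    'Variables Age_c Score_c',
]

# One 'Input' line per subject: id, class, then the continuous variables
# in the same order as on the 'Variables' line.
inputs = ['Input {} {} {} {}'.format(*row) for row in rows]

fsgd_text = '\n'.join(header + inputs) + '\n'
print(fsgd_text)
```

The real file should have the same five header lines followed by one `Input` line per subject.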

You can open the file to make sure it looks good. Now, let's also create some contrasts you may be interested in.

In [None]:
contrasts = {
             # Column order: F offset, M offset, F-x-Age, M-x-Age, F-x-Score, M-x-Score
             'group.intercept': [0.5, 0.5, 0, 0, 0, 0], # Does the mean of the group intercepts differ from 0?
             'group.diff': [1, -1, 0, 0, 0, 0], # Is there a sex difference between group intercepts after correcting for age and cognitive score?
             'group-x-age': [0, 0, 1, -1, 0, 0], # Is there a difference between groups in the effect of age?
             'group-x-score': [0, 0, 0, 0, 1, -1], # Is there a difference between groups in the effect of cognitive score?
             'FM-age': [0, 0, 0.5, 0.5, 0, 0], # Is there any average age effect across sex after correcting for cognitive score?
             'FM-score': [0, 0, 0, 0, 0.5, 0.5], # Is there any average score effect across sex after correcting for age?
            }

In [None]:
# Save each contrast as mtx txt file

# also keep contrast file names for a later use
cont_files = []

for contrast_name, contrast_list in contrasts.items():
    contrast_file = op.join(design1_wd, '{}.mtx'.format(contrast_name))
    with open(contrast_file, 'w') as f:
        f.write(' '.join(str(val) for val in contrast_list))
        
    cont_files.append(contrast_file)
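
A common mistake is a contrast vector whose length does not match the number of columns in the design. The helper functions below are a hypothetical sanity check (they are not part of Freesurfer or nipype), assuming the DODS layout of one offset per class plus one slope per class for each continuous variable.

```python
def n_dods_columns(n_classes, n_covariates):
    """Number of regressors in a DODS design: offsets plus per-class slopes."""
    return n_classes * (n_covariates + 1)


def check_contrasts(contrasts, n_columns):
    """Raise ValueError if any contrast vector has the wrong length."""
    bad = {name: len(weights) for name, weights in contrasts.items()
           if len(weights) != n_columns}
    if bad:
        raise ValueError(
            'Expected {} weights per contrast, got: {}'.format(n_columns, bad))


# Toy contrasts for 2 classes (F, M) and 2 continuous variables.
toy_contrasts = {
    'group.diff': [1, -1, 0, 0, 0, 0],
    'group-x-age': [0, 0, 1, -1, 0, 0],
}

n_cols = n_dods_columns(n_classes=2, n_covariates=2)
check_contrasts(toy_contrasts, n_cols)  # passes silently
print(n_cols)
```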

### 3. Create Design mat input file

But let's say you know from prior studies/analyses that there is no Age by Sex interaction on the CT or CSA values, but you still want to test the interaction between Sex and cognitive score. Although you can look at the 'group-x-score' contrast above (and I suspect most people simply do that), technically it's more appropriate to look at this effect in a model that does not include the Age by Sex terms.

To test such a model, you have to create the design matrix file manually and skip the FSGD. But it's not very difficult to do so.

In [None]:
# First you need two columns of 1 and 0 that represent each sex
# (note that the column order here is M then F, unlike the FSGD design above)

sex_M = np.array(sub_info.Sex.values == 'M')
sex_F = np.array(sub_info.Sex.values == 'F')

# Then you need one column for Age
age = sub_info.Age.values

# The last two columns are cognitive scores, one for males and another for females
score_M = np.multiply(sub_info.Score, sex_M)
score_F = np.multiply(sub_info.Score, sex_F)

# Then finally put them in one array, with one column per regressor
design_arr = np.vstack((sex_M, sex_F, age, score_M, score_F)).T
design_arr.shape

This could be saved as a text file, but the official Freesurfer documentation specifies the MATLAB mat file format, so we save it in this format using scipy.io.



In [None]:
import scipy.io as sio

In [None]:
design_mat = {'X': design_arr}
design_file = op.join(design2_wd, 'SBM_test2.mat')
sio.savemat(design_file, design_mat)

Can you create the contrasts that should be used with the design you just created above?
You don't have to include every possible contrast that can be tested. Just make sure to include a contrast testing for the presence of a Sex by Score interaction.
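
If you want to check your reasoning before running Freesurfer, the sketch below simulates toy data with a known difference in score slopes between the sexes, fits the five-column design with ordinary least squares, and applies one possible interaction contrast. All numbers are invented, and plain numpy least squares stands in for mri_glmfit here, so treat it as an illustration of the logic rather than of the Freesurfer output.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Toy covariates (hypothetical values).
is_m = rng.integers(0, 2, size=n).astype(float)
is_f = 1.0 - is_m
age = rng.normal(22.0, 2.0, size=n)
score = rng.normal(0.0, 1.0, size=n)

# Same column layout as design_arr: sex_M, sex_F, age, score_M, score_F
X = np.column_stack([is_m, is_f, age, is_m * score, is_f * score])

# Simulated thickness with a score slope of 0.10 in M and 0.02 in F.
true_beta = np.array([2.5, 2.6, -0.01, 0.10, 0.02])
y = X @ true_beta + rng.normal(0.0, 0.01, size=n)

# Ordinary least-squares fit.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# One possible Sex-by-Score contrast: slope in M minus slope in F.
c = np.array([0, 0, 0, 1, -1])
effect = c @ beta_hat
print(effect)  # should be close to the simulated difference of 0.08
```

Note that the contrast has five entries, one per column of the design, and that the age column gets a weight of zero because age has a single slope shared by both sexes.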


In [None]:
contrasts2 = {}

In [None]:
# Save each contrast as mtx txt file and save these filenames
cont_files2 = []

# Loop over contrasts2 (the contrasts for the second design), not contrasts
for contrast_name, contrast_list in contrasts2.items():
    contrast_file = op.join(design2_wd, '{}.mtx'.format(contrast_name))
    with open(contrast_file, 'w') as f:
        f.write(' '.join(str(val) for val in contrast_list))
        
    cont_files2.append(contrast_file)

Analyse data
------------

### 1. Prepare CT/CSA data

To run a GLM analysis with Freesurfer, you first have to assemble the CT/CSA data of all subjects into a single file using mris_preproc.

Here I will use the nipype interface to demonstrate, but you can check the cmdline attribute to see the equivalent command you would use if running it directly in a terminal.


In [None]:
from nipype.interfaces.freesurfer.model import MRISPreproc

In [None]:
# For many Freesurfer commands, you need to specify $SUBJECTS_DIR where you do all your freesurfer analysis.
# To use pre-computed FS data from example subjects, we provide the following path

fs_subjects_dir = '/data/rw_eleves/Cajal-Morphometry2019/derived_mrishare/freesurfer/'

In [None]:
lhCTpreproc = MRISPreproc()
lhCTpreproc.inputs.surf_measure = 'thickness'
lhCTpreproc.inputs.subjects_dir = fs_subjects_dir
lhCTpreproc.inputs.target = 'fsaverage'
lhCTpreproc.inputs.hemi = 'lh'
lhCTpreproc.inputs.out_file = op.join(fs_wd, 'stacked.lh.thickness.00.mgh')
lhCTpreproc.inputs.subjects = list(sub_info.mrishare_id.values)
lhCTpreproc.cmdline

In [None]:
lhCTpreproc.run()

This command resamples each subject's left hemisphere thickness data to fsaverage. The output is a single file of stacked thickness data in fsaverage space for the specified subjects.

Note that you can specify the subjects either by giving the list of subjects as above, or from an FSGD file (fsgd_file input in nipype or --fsgd in cmdline), or from a file containing a list of subjects (subject_file input in nipype or --f in cmdline).

Next, you need to smooth the data on the surface, which improves the robustness of the statistics, using mri_surf2surf.
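
For the third option, the subject list is simply a text file with one subject ID per line. A minimal sketch, with made-up IDs and a temporary directory standing in for your working directory:

```python
import os
import tempfile

# Hypothetical subject IDs; in the notebook you would use
# list(sub_info.mrishare_id.values) instead.
subject_ids = ['sub001', 'sub002', 'sub003']

out_dir = tempfile.mkdtemp()  # stand-in for your working directory
subject_list_file = os.path.join(out_dir, 'subjects.txt')

# One subject ID per line
with open(subject_list_file, 'w') as f:
    f.write('\n'.join(subject_ids) + '\n')

print(open(subject_list_file).read())
```

You would then pass this path as the subject_file input in nipype (or after --f on the command line) instead of the subjects list.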

In [None]:
from nipype.interfaces.freesurfer.utils import SurfaceSmooth

In [None]:
lhCTsmooth = SurfaceSmooth()
lhCTsmooth.inputs.in_file = op.join(fs_wd, 'stacked.lh.thickness.00.mgh')
lhCTsmooth.inputs.subject_id = 'fsaverage'
lhCTsmooth.inputs.hemi = 'lh'
lhCTsmooth.inputs.subjects_dir = fs_subjects_dir
lhCTsmooth.inputs.fwhm = 10.0
lhCTsmooth.inputs.cortex = True
lhCTsmooth.inputs.out_file = op.join(fs_wd, 'stacked.lh.thickness.10.mgh')
lhCTsmooth.cmdline

In [None]:
# run it
lhCTsmooth.run()

This smooths each subject's resampled data with a 10 mm FWHM kernel.
"--cortex" means that smoothing is restricted to the cortex (the medial wall is excluded).

### 2. Fit GLM 

Now the image data are ready. We will use mri_glmfit to fit the model, first with the FSGD file as input, then with the design mat file.

In [None]:
from nipype.interfaces.freesurfer.model import GLMFit

In [None]:
# FSGD input example 

lhSBMglmfit1 = GLMFit()
lhSBMglmfit1.inputs.subjects_dir = fs_subjects_dir
# Smoothed, stacked thickness data created in the previous step
lhSBMglmfit1.inputs.in_file = op.join(fs_wd, 'stacked.lh.thickness.10.mgh')
lhSBMglmfit1.inputs.surf = True
lhSBMglmfit1.inputs.subject_id = 'fsaverage'
lhSBMglmfit1.inputs.hemi = 'lh'
lhSBMglmfit1.inputs.cortex = True
lhSBMglmfit1.inputs.fsgd = (fsgd_file, 'dods')
lhSBMglmfit1.inputs.contrast = cont_files
# The nipype input for --glmdir is named glm_dir
lhSBMglmfit1.inputs.glm_dir = op.join(design1_wd, 'glm')
lhSBMglmfit1.cmdline

In [None]:
# run it
lhSBMglmfit1.run()

In [None]:
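# design mat input example

lhSBMglmfit2 = GLMFit()
lhSBMglmfit2.inputs.subjects_dir = fs_subjects_dir
# Smoothed, stacked thickness data created in the previous step
lhSBMglmfit2.inputs.in_file = op.join(fs_wd, 'stacked.lh.thickness.10.mgh')
lhSBMglmfit2.inputs.surf = True
lhSBMglmfit2.inputs.subject_id = 'fsaverage'
lhSBMglmfit2.inputs.hemi = 'lh'
lhSBMglmfit2.inputs.cortex = True
lhSBMglmfit2.inputs.design = design_file
# Use the contrasts written for the second design (5 columns)
lhSBMglmfit2.inputs.contrast = cont_files2
# The nipype input for --glmdir is named glm_dir
lhSBMglmfit2.inputs.glm_dir = op.join(design2_wd, 'glm')
lhSBMglmfit2.cmdline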

In [None]:
# run it
lhSBMglmfit2.run()

When they finish running, check out the output directories to see the files that were generated.



Let's try visualizing one of the resulting p-value maps using nilearn plotting.



In [None]:
from nilearn import plotting

In [None]:
age_sig_map = op.join(design1_wd, 'glm', 'FM-age', 'sig.mgh')


Since the interactive surface plotting in nilearn does not seem to support the mgh format yet, we will first read this file using nibabel and pass the image data directly as an array.

In [None]:
import nibabel.freesurfer.mghformat as fsmgh

In [None]:
age_sig_map_im = fsmgh.load(age_sig_map)
# get_data() is deprecated in recent nibabel versions; get_fdata() returns a float array
age_sig_map_dat = age_sig_map_im.get_fdata()

In [None]:
age_sig_map_dat.shape

In addition, when passing a numpy array, the plot_surf function of nilearn expects the surface map to have a shape similar to morphometry data (files that end with .thickness, .curv, .sulc in Freesurfer). To demonstrate what this means, here we will load lh.sulc of the fsaverage and look at the shape.

In [None]:
fsaverage_dir = op.join(fs_subjects_dir, 'fsaverage')
fsaverage_lh_infl = op.join(fsaverage_dir, 'surf', 'lh.inflated')
fsaverage_lh_sulc =  op.join(fsaverage_dir, 'surf', 'lh.sulc')

In [None]:
import nibabel.freesurfer.io as fsio

In [None]:
lh_sulc_dat = fsio.read_morph_data(fsaverage_lh_sulc)
lh_sulc_dat.shape

You see that they both contain data for 163842 vertices, but the array shape is not the same in these two forms of data. To use the plot_surf function, we have to reshape the sig map data by stripping the extra dimensions like below.

In [None]:
age_sig_map_dat_rs = np.reshape(age_sig_map_dat, (age_sig_map_dat.shape[0],))
age_sig_map_dat_rs.shape
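
The same shape mismatch can be reproduced on a tiny toy array. This sketch assumes the mgh data come with trailing singleton dimensions, e.g. (n_vertices, 1, 1), while morphometry data are one-dimensional; np.squeeze is an alternative to the explicit np.reshape used above.

```python
import numpy as np

n_vertices = 5  # toy size; fsaverage has 163842 vertices per hemisphere

# Shape assumed for data loaded from a surface mgh file
mgh_like = np.arange(n_vertices, dtype=float).reshape(n_vertices, 1, 1)
# Shape of data loaded from a morphometry file such as lh.sulc
morph_like = np.arange(n_vertices, dtype=float)

# Both calls drop the singleton dimensions and keep the vertex order
flat_reshape = np.reshape(mgh_like, (mgh_like.shape[0],))
flat_squeeze = np.squeeze(mgh_like)

print(mgh_like.shape, morph_like.shape, flat_squeeze.shape)
```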

To overlay this with nilearn plotting, we just need to provide a background surface mesh (e.g. the fsaverage inflated or pial surface).

In [None]:
plotting.view_surf(fsaverage_lh_infl, age_sig_map_dat_rs)

This map is an unthresholded -log10(p) map (i.e. p < 0.01 corresponds to -log10(p) > 2). Let's try thresholding at 2 so that we can look at the map at this threshold. When thresholding, it's better to provide a background map for shading on the inflated brain (usually the sulc image).

In [None]:
# now plot 
plotting.view_surf(fsaverage_lh_infl, age_sig_map_dat_rs, threshold=2, 
                   bg_map=fsaverage_lh_sulc)
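
To relate the values in the map to p-values, the sketch below converts a few toy sig values back to p. It assumes the usual Freesurfer convention that sig.mgh stores -log10(p) with the sign of the contrast effect, so the absolute value is what matters for thresholding.

```python
import numpy as np

# Toy sig values (signed -log10(p)); not taken from the real map
sig = np.array([-3.0, -1.0, 0.0, 2.0, 4.0])

# Back-transform to p-values, ignoring the sign of the effect
p = 10.0 ** (-np.abs(sig))

# A threshold of 2 on |sig| keeps vertices with p <= 0.01
threshold = 2.0
supra = np.abs(sig) >= threshold

print(p)
print(supra)
```

So a display threshold of 1.3 would correspond to roughly p < 0.05, and a threshold of 3 to p < 0.001.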

Try using this or freeview to check your results for different contrasts.

### Multiple Comparison correction

To correct for multiple comparisons, we can use mri_fdr for FDR correction and mri_glmfit-sim for permutation testing. Unfortunately, nipype does not wrap either of these commands. Although mri_glmfit, which you used earlier, seems to have some functionality for simulation analysis, the latest recommendation is to use permutation testing with mri_glmfit-sim (you can read more about this under resources/multiplecomparisons.pdf).

For this practical, we will simply use the command line directly, either from this notebook or in a terminal. But it's not all that difficult to wrap a missing command in nipype if you want to use it within the context of a pipeline. As an example, I created a custom interface for mri_fdr in ginnipi_tools/interfaces/custom.py, so you can take a look at it to see how it can be done.

In [None]:
! mri_fdr

Here, your main input is the sig.mgh from your contrast of interest. It can accept more than one input file, because you will usually want to correct for the same analysis across both the right and left hemispheres. For the practical, can you try to correct for the one hemisphere you analysed above?

Once it finishes running, check the output and try to visualize it using a viewer of your choice.
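
To get a feel for what FDR correction does, here is a minimal sketch of the Benjamini-Hochberg procedure applied to toy sig values. It is a generic textbook implementation in numpy, not the code used by mri_fdr, and the function name and example values are made up for illustration.

```python
import numpy as np


def bh_threshold(p_values, q=0.05):
    """Benjamini-Hochberg: largest p-value that survives FDR level q.

    Returns 0.0 if no p-value survives.
    """
    p_sorted = np.sort(np.asarray(p_values, dtype=float))
    m = p_sorted.size
    # Compare the k-th smallest p-value with q * k / m
    below = p_sorted <= q * np.arange(1, m + 1) / m
    if not below.any():
        return 0.0
    return p_sorted[np.nonzero(below)[0].max()]


# Toy sig values (-log10(p)) for six vertices
sig = np.array([4.0, 3.0, 2.5, 1.0, 0.5, 0.2])
p = 10.0 ** (-np.abs(sig))

p_thr = bh_threshold(p, q=0.05)
survives = p <= p_thr
print(p_thr, survives)
```

In this toy case the three smallest p-values survive, while an uncorrected threshold of p < 0.05 would keep the same three only by coincidence; with many thousands of vertices the two thresholds usually differ a lot.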

In [None]:
! mri_glmfit-sim

This takes the glm dir of your mri_glmfit run as the input and performs permutations for the contrasts found in that directory. To use the recommended permutation method, you specify --perm and the number of permutations as in the example below.

In [None]:
design1_glm = op.join(design1_wd, 'glm')

!mri_glmfit-sim \
  --glmdir {design1_glm} \
  --perm 1000 4.0 abs \
  --cwp 0.05 \
  --2spaces \
  --bg 1
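
The sketch below spells out how I read the arguments of this command by assembling the same call as a Python list. The glm directory path is a placeholder, and the interpretation in the comments (cluster-forming threshold given as -log10(p), cluster-wise p-value cut-off, correction across two hemispheres, number of background jobs) is my understanding of the options, so check it against the mri_glmfit-sim help output.

```python
import shlex

glm_dir = '/path/to/MyDesign1/glm'  # placeholder path

n_perm = 1000              # number of permutations
cluster_forming_sig = 4.0  # cluster-forming threshold as -log10(p)
sign = 'abs'               # consider both positive and negative effects

cmd = [
    'mri_glmfit-sim',
    '--glmdir', glm_dir,
    '--perm', str(n_perm), str(cluster_forming_sig), sign,
    '--cwp', '0.05',   # keep clusters with cluster-wise p below 0.05
    '--2spaces',       # additional correction for testing two hemispheres
    '--bg', '1',       # run as 1 background job
]

print(' '.join(shlex.quote(arg) for arg in cmd))
# Vertex-wise p-value implied by the cluster-forming threshold
print(10.0 ** -cluster_forming_sig)
```

A list like this can be passed to subprocess.run if you prefer to launch the command from a script rather than with the ! syntax.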

Again, once it finishes running, check the output using a viewer of your choice. 