In [None]:
%matplotlib inline

import os
import os.path as op
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Day 3. Quality control of the sample MRiShare dataset

One of the most important yet least standardized procedures is the **quality control** (QC) of your data. Every lab or researcher has their own method for checking the quality of the acquired data and/or the processing, to make sure they are measuring what they intend to measure. There is often no clear guideline about what should be checked, since it depends on the modality and the processing, but the checks fall into two broad categories.

1. QC on raw acquired data
    * Does the image have the intended FOV?
    * Is there significant artefact/noise?
    * Are there any abnormalities in the brain (incidental findings)?
    --> If any problem is found, either exclude the subject/data or keep them and monitor how they behave in later steps.
    
    
2. QC on processed data
    * Did processing go as intended?
    e.g. Skull-stripping, registration, tissue segmentation...
    --> If any problem is found, either exclude subject/data or modify the processing steps to resolve the issue.
    
The most important thing is to **look at your data** systematically, and save the results of any QC check you do in a spreadsheet.
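
As a minimal sketch of what such a spreadsheet could look like, the snippet below writes one row per subject and per check to a CSV file. The column names (`subject_id`, `check`, `rating`, `notes`), the subject IDs and the ratings are all hypothetical; adapt them to whatever your lab already records.

```python
import csv
import io

# Hypothetical QC records: one row per subject and per check.
qc_records = [
    {"subject_id": "sub-001", "check": "raw_T1_FOV", "rating": "pass", "notes": ""},
    {"subject_id": "sub-001", "check": "brainmask", "rating": "fail",
     "notes": "mask cuts into cerebellum"},
    {"subject_id": "sub-002", "check": "raw_T1_FOV", "rating": "pass", "notes": ""},
]


def write_qc_log(records, handle):
    """Write QC records as CSV rows with a fixed column order."""
    fieldnames = ["subject_id", "check", "rating", "notes"]
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)


def failed_subjects(records):
    """Return the sorted IDs of subjects with at least one failed check."""
    return sorted({r["subject_id"] for r in records if r["rating"] == "fail"})


# An in-memory buffer keeps the example self-contained; to write to disk,
# use open("qc_log.csv", "w", newline="") instead.
buffer = io.StringIO()
write_qc_log(qc_records, buffer)
qc_csv_text = buffer.getvalue()
to_review = failed_subjects(qc_records)
print(qc_csv_text)
print("Subjects to review:", to_review)
```

Keeping one row per check, rather than one row per subject, makes it easy to add new checks later without changing the layout of the file.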

In addition to checking your data visually one by one, there are various QC metrics you can collect to find any outliers. For morphometric studies, the morphometric values themselves should be checked for the presence of any outliers. When you find outliers, you can go back to the image with outlier values to decide whether something went wrong in the processing or not.

### Systematically checking the individual images

You can use the viewer of your choice to check all the images you have (both the raw images and the images after some processing), and that's what many labs do. To make this more efficient, you can include a node in your pipeline that takes an image as input and produces a picture (saved as a PNG file, for example) for every subject and for each processing step that should be checked.

In the ABACI pipeline for MRiShare, we have many such nodes, and we also have a custom script that generates HTML pages gathering the generated PNG files for viewing.

Since you learned to create a basic pipeline yesterday, let's start by creating a simple workflow that does the following:

1. Coregister FLAIR image to T1
2. Skullstrip FLAIR using BET
3. Apply mask to T1
4. Segment the tissues with FSL FAST (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FAST, nipype documentation https://nipype.readthedocs.io/en/latest/interfaces/generated/interfaces.fsl/preprocess.html#fast)

5. Brain mask QC
6. Coregistration QC
7. Tissue segmentation QC

8. Datasink to collect important outputs

To create the 3 QC nodes, you can use the custom interfaces I created in ginnipi_tools, imported below.


In [None]:
from ginnipi_tools.interfaces.custom import MaskOverlayQCplot, CoregQC, VbmQCplot

In [None]:
# now your turn to create your workflow!
from nipype.pipeline.engine import Workflow, Node
from nipype.interfaces.utility import IdentityInterface
from nipype.interfaces.io import DataGrabber, DataSink
from nipype.interfaces.fsl import BET, ApplyMask, FLIRT, FAST

### Freesurfer QC

One useful tool, in particular for checking **Freesurfer** processed results, is called visualQC (https://raamana.github.io/visualqc/). We will try this out on the processed Freesurfer data for the selected MRiShare subjects. 

In [None]:
fs_dat_dir = "/data/rw_eleves/Cajal-Morphometry2019/derived_mrishare/freesurfer/"
sample_dat_dir = '../data/'

This tool has to be used outside of the notebook for its interactive interface to work properly. Open a terminal, go to the Cajal2019_morphometry folder, then type the following:

```bash
visualqc_freesurfer -f /data/rw_eleves/Cajal-Morphometry2019/copy/freesurfer/ -o visQCtest -old
```

I have already run this for you once, so you should see the output folder 'visQCtest'. The first time you run it, it creates and saves a series of snapshots useful for reviewing the Freesurfer output for every subject in the Freesurfer subjects directory you specified.

You can also restrict the review to a specific set of subjects by providing a text file with subject IDs, as below.

```bash
visualqc_freesurfer -i data/simple_sublist.txt -f /data/rw_eleves/Cajal-Morphometry2019/copy/freesurfer/ -o visQCtest -old
```
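
If you want to build such a list yourself, the sketch below collects the sub-directory names of a Freesurfer subjects directory and writes them to a text file. It assumes the list file holds one subject ID per line and that template folders such as `fsaverage` should be skipped; the directory names and the output file name `my_sublist.txt` are hypothetical, and the demo runs on a temporary directory rather than on the real data.

```python
import os
import tempfile


def list_subject_ids(subjects_dir, exclude=("fsaverage",)):
    """Return the sorted names of sub-directories, skipping template folders."""
    return sorted(
        name for name in os.listdir(subjects_dir)
        if os.path.isdir(os.path.join(subjects_dir, name)) and name not in exclude
    )


def write_id_list(subject_ids, out_path):
    """Write one subject ID per line (assumed format of the ID list file)."""
    with open(out_path, "w") as f:
        f.write("\n".join(subject_ids) + "\n")


# Demo on a temporary directory with hypothetical subject folders.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("subj_B", "subj_A", "fsaverage"):
        os.makedirs(os.path.join(tmp, name))
    ids = list_subject_ids(tmp)
    list_path = os.path.join(tmp, "my_sublist.txt")
    write_id_list(ids, list_path)
    with open(list_path) as f:
        written = f.read().splitlines()

print(written)
```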

Once it creates the necessary snapshots, executing the same command will trigger the interactive viewer where you check individual images. You need to rate at least one subject to be able to press 'Quit' to exit the interface.

There are several options for what/how you can review. Try out a few examples from https://raamana.github.io/visualqc/examples_freesurfer.html.

### Checking the distribution and outliers for QC and other metrics

Any metric you collect as part of the analysis should be checked for any outliers. In addition, there are many other QC metrics proposed for structural and functional image processing, as listed here (http://preprocessed-connectomes-project.org/quality-assessment-protocol/).

Here we will use both the morphometric data and some of the QC metrics we computed for MRiShare subjects to see if there are any problematic subjects.
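
Besides inspecting plots, you can flag candidate outliers with a simple numeric rule. The sketch below uses the common 1.5 × IQR rule (the same one behind box-plot whiskers); this is one possible choice, not the rule used in the MRiShare processing. The function name `flag_iqr_outliers`, the column names and the values are hypothetical toy data.

```python
import pandas as pd


def flag_iqr_outliers(df, columns, k=1.5):
    """Return a boolean DataFrame that is True where a value lies outside
    [Q1 - k * IQR, Q3 + k * IQR] of its own column."""
    q1 = df[columns].quantile(0.25)
    q3 = df[columns].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    too_low = df[columns].lt(lower, axis="columns")
    too_high = df[columns].gt(upper, axis="columns")
    return too_low | too_high


# Toy data: subject s5 has a clearly low GM SNR.
toy = pd.DataFrame(
    {
        "GM_SNR": [10.1, 9.8, 10.3, 10.0, 4.2, 9.9],
        "WM_SNR": [15.0, 14.8, 15.2, 15.1, 14.9, 15.0],
    },
    index=pd.Index(["s1", "s2", "s3", "s4", "s5", "s6"], name="subject_id"),
)

flags = flag_iqr_outliers(toy, ["GM_SNR", "WM_SNR"])
# Subjects with at least one flagged metric are the ones to look at again.
suspects = flags.index[flags.any(axis=1)].tolist()
print(suspects)
```

A flagged subject is only a candidate: the next step is always to go back to the images and decide whether something actually went wrong.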

**QC metrics**

1) Tissue SNR

    * computed as mean/sd in each tissue in each compartment

2) Tissue CNR

    * for T1 stats, WMGM (WM mean/GM mean) and GMCSF (GM mean/CSF mean)
    * for T2flair stats, GMWM (GM mean/WM mean) and GMCSF (GM mean/CSF mean)
    
3) Coregistration cost function
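
To make the SNR and CNR definitions above concrete, here is a minimal sketch on a toy 1-D "image" with boolean tissue masks. The function names and the data are hypothetical, and the standard deviation here is the population SD (NumPy's default); the values in the sample file may have been computed with a different convention.

```python
import numpy as np


def tissue_snr(image, mask):
    """Mean divided by standard deviation of the voxels inside the mask."""
    values = image[mask]
    return values.mean() / values.std()


def tissue_contrast(image, mask_a, mask_b):
    """Ratio of mean intensities, e.g. WM mean / GM mean for T1."""
    return image[mask_a].mean() / image[mask_b].mean()


# Toy data: the first four voxels are "WM", the last four are "GM".
image = np.array([100.0, 110.0, 90.0, 100.0, 60.0, 70.0, 50.0, 60.0])
wm_mask = np.array([True] * 4 + [False] * 4)
gm_mask = ~wm_mask

wm_snr = tissue_snr(image, wm_mask)
wmgm = tissue_contrast(image, wm_mask, gm_mask)
print(f"WM SNR = {wm_snr:.2f}, WMGM = {wmgm:.2f}")
```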

**Morphometrics**

1) SPM GM, WM, CSF volume

2) Freesurfer global metrics

In [None]:
qc_dat = pd.read_csv(op.join(sample_dat_dir, 'sample_qc.csv'))
qc_dat.head()

In [None]:
morph_dat = pd.read_csv(op.join(sample_dat_dir, 'sample_mrishare_morphometry.csv'))
morph_dat.head()

There are several Python visualization packages that allow you to inspect your data interactively.

Perhaps the easiest one to use is **Plotly Express** (https://medium.com/plotly/introducing-plotly-express-808df010143d, imported as `plotly.express`), as demonstrated below.

In [None]:
import plotly.express as px

In [None]:
# trendline="ols" requires the statsmodels package to be installed
fig = px.scatter(qc_dat, x="SPM_GM_hemiR_SNR", y="SPM_GM_hemiL_SNR",
                 hover_name="mrishare_id", marginal_x="box", marginal_y="violin",
                 trendline="ols")
fig.show()

In [None]:
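# Select the ID and the two hippocampal volume columns, then index by subject
# (chaining set_index avoids modifying a slice of morph_dat in place)
hippRL_vols = morph_dat[['mrishare_id', 'FS6_gm_R_hippo', 'FS6_gm_L_hippo']].set_index('mrishare_id')
# Stack to long format: one row per (subject, measure) pair
stacked_hippRL = hippRL_vols.stack().reset_index()
stacked_hippRL.columns = ['mrishare_id', 'measure', 'volume']
stacked_hippRL.head()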

In [None]:
fig = px.violin(stacked_hippRL, y="volume", color="measure", box=True, points="all", hover_data=stacked_hippRL.columns)
fig.show()


The generated figures can be saved as an HTML page or as an image.

In [None]:
import plotly.io as pio

In [None]:
pio.write_html(fig, 'FS6_hipp_dist.html')

Another nice Python package for interactive plotting is **bokeh** (https://bokeh.pydata.org/en/latest/index.html). It is slightly more involved, so here we share some functions we created in our lab to produce the two types of plots we use for QC checks:

1. Distribution plot to check for outliers
2. Pairplots to check for asymmetry

These plots can be created with the *plot_hist_box* and *pairplots_by_region* functions, respectively, in the ginnipi_tools package.

Here is the usage example.

In [None]:
from ginnipi_tools.toolbox.plotting_tools import plot_hist_box, pairplots_by_region

First we just need to set the column with the subject ID as the index of the DataFrame.

In [None]:
morph_dat2 = morph_dat.set_index('mrishare_id')

We use the first function to summarize the SPM volumes and save the plot as 'SPM_vol.html'.

In [None]:
plot_hist_box(morph_dat2,
              measure_name='volume',
              col_groupname='tissue',
              cols_to_plot=['SPM_GM_Volume','SPM_WM_Volume', 'SPM_CSF_Volume'],
              title='Distribution of SPM volumes',
              out_html='SPM_vol.html')

Next we use the pairplots_by_region function to summarize the asymmetry of hippocampal volumes and save the plot as 'FS6_hipp_asym.html'.

In [None]:
pairplots_by_region(morph_dat2,
                    measure_name='volume',
                    col1='FS6_gm_R_hippo',
                    col2='FS6_gm_L_hippo',
                    plot_size=(400, 400),
                    bgcolor="white",
                    title='Hippocampal GM R vs L',
                    out_html='FS6_hipp_asym.html')
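
If you also want a number to go with the pair plot, one common option is an asymmetry index such as (L - R) / (L + R). The sketch below is an illustration on toy values and is not part of ginnipi_tools; the function name, the column names and the volumes are hypothetical, and some authors scale the index by a factor of 2.

```python
import pandas as pd


def asymmetry_index(left, right):
    """(L - R) / (L + R): 0 means symmetric, positive means left is larger."""
    return (left - right) / (left + right)


# Toy left/right volumes; subject s3 is strongly asymmetric.
toy = pd.DataFrame(
    {
        "L_hippo": [4000.0, 4100.0, 3000.0],
        "R_hippo": [4000.0, 3900.0, 5000.0],
    },
    index=pd.Index(["s1", "s2", "s3"], name="subject_id"),
)

toy["AI"] = asymmetry_index(toy["L_hippo"], toy["R_hippo"])
# The subject with the largest absolute asymmetry is the first one to inspect.
most_asymmetric = toy["AI"].abs().idxmax()
print(toy)
print("Most asymmetric subject:", most_asymmetric)
```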

Rather than saving each plot separately, you can use bokeh layout tools (https://bokeh.pydata.org/en/latest/docs/user_guide/layout.html) to combine multiple plots and save them as one file. Below is an example.

In [None]:
from bokeh.layouts import gridplot, row, column
from bokeh.models import Div, Spacer
import bokeh.io

In [None]:
dist_plot = plot_hist_box(morph_dat2,
                          measure_name='volume',
                          col_groupname='tissue',
                          cols_to_plot=['SPM_GM_Volume','SPM_WM_Volume', 'SPM_CSF_Volume'],
                          title='Distribution of SPM volumes')
pplot = pairplots_by_region(morph_dat2,
                            measure_name='volume',
                            col1='FS6_gm_R_hippo',
                            col2='FS6_gm_L_hippo',
                            plot_size=(600, 600),
                            bgcolor="white",
                            title='Hippocampal GM R vs L')


In [None]:
combined_plot = column([dist_plot, Spacer(height=100), pplot])
bokeh.io.save(combined_plot,
              'combined_plot.html',
              title='test of combining plots')