In [None]:
%matplotlib inline

import os
import os.path as op
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Day 3. Quality control of the sample MRiShare dataset

One of the most important yet least standardized procedures is the **quality control** (QC) of your data. Every lab or researcher has their own method for checking the quality of the acquired data and/or the processing, to make sure they are measuring what they intend to measure. There is often no clear guideline about what should be checked, since it depends on the modality and the processing, but QC checks can be classified into two broad categories.

1. QC on raw acquired data
    * Does the image have intended FOV?
    * Is there significant artefact/noise?
    * Are there any abnormalities in the brain (incidental findings)?
    --> If any problem is found, either exclude subject/data or keep them and see.
    
    
2. QC on processed data
    * Did processing go as intended?
    e.g. Skull-stripping, registration, tissue segmentation...
    --> If any problem is found, either exclude subject/data or modify the processing steps to resolve the issue.
    
The most important thing is to **look at your data** systematically, and save the results of any QC check you do in a spreadsheet.
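As a minimal sketch of such a QC record, the results can be kept in a pandas table and saved as a CSV file that opens in any spreadsheet program. The subject IDs, check names, column names and the file name `qc_log.csv` below are all hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical QC log: one row per subject and per check.
# IDs, check names and comments are made up for illustration.
qc_records = [
    {"mrishare_id": "SUBJ001", "check": "raw_T1_FOV", "passed": True, "comment": ""},
    {"mrishare_id": "SUBJ001", "check": "skull_stripping", "passed": True, "comment": ""},
    {"mrishare_id": "SUBJ002", "check": "raw_T1_FOV", "passed": False, "comment": "cerebellum cut off"},
]
qc_log = pd.DataFrame(qc_records)

# Save as CSV so the log can be opened and edited as a spreadsheet
qc_log.to_csv("qc_log.csv", index=False)

# List the subjects with at least one failed check
failed_ids = sorted(qc_log.loc[~qc_log["passed"], "mrishare_id"].unique())
print(failed_ids)
```

Keeping one row per subject and check (rather than one column per check) makes it easy to add new checks later without changing the table layout.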

In addition to checking your data visually one by one, there are various QC metrics you can collect to find any outliers. For morphometric studies, the morphometric values themselves should be checked for the presence of any outliers. When you find outliers, you can go back to the image with outlier values to decide whether something went wrong in the processing or not.
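One common way to flag candidate outliers, shown here only as a sketch, is the interquartile range (IQR) rule: values further than `k` times the IQR from the quartiles are flagged. The function name `flag_iqr_outliers`, the threshold `k=1.5` and the toy data are assumptions made for illustration, not part of the MRiShare pipeline.

```python
import pandas as pd

def flag_iqr_outliers(values, k=1.5):
    """Return a boolean Series, True where a value lies outside
    [Q1 - k*IQR, Q3 + k*IQR]. k=1.5 is a conventional but arbitrary choice."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Toy data with hypothetical subject labels and volumes
toy = pd.DataFrame({
    "subject": ["s1", "s2", "s3", "s4", "s5", "s6"],
    "volume": [3.9, 4.1, 4.0, 4.2, 3.8, 7.5],
})
toy["is_outlier"] = flag_iqr_outliers(toy["volume"])

# Subjects whose images should be re-inspected
outlier_subjects = toy.loc[toy["is_outlier"], "subject"].tolist()
print(outlier_subjects)
```

A flagged value is only a prompt to go back and look at the image. It does not by itself mean the processing failed.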

### Systematically checking the individual images

You can use the viewer of your choice to check all the images you have (both the raw images and the images after some processing), and that's what many labs do. But to make it more efficient, you can include a node in your pipeline that takes an image as input and produces a picture (saved as a PNG file, for example) for every subject and for each processing step that should be checked.

In the ABACI pipeline for MRiShare, we have many such nodes, and we also have a custom script that generates HTML pages gathering the generated PNG files for viewing.
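The idea of a snapshot step can be sketched as a small function that takes a 3D array and writes one PNG showing the middle slice along each array axis. This is not the ABACI node. The function name `save_slice_snapshot`, the output folder and the synthetic volume are hypothetical. For a real image, the array could come from, for example, `nibabel.load(path).get_fdata()`.

```python
import os
import numpy as np
from matplotlib.figure import Figure

def save_slice_snapshot(volume, subject_id, out_dir):
    """Save the middle slice along each of the three array axes as one PNG.
    Returns the path of the written file."""
    if volume.ndim != 3:
        raise ValueError("expected a 3D volume")
    os.makedirs(out_dir, exist_ok=True)

    # Use the Figure API directly so no interactive backend is needed
    fig = Figure(figsize=(9, 3))
    axes = fig.subplots(1, 3)
    mid = [n // 2 for n in volume.shape]
    slices = [volume[mid[0], :, :], volume[:, mid[1], :], volume[:, :, mid[2]]]
    for ax, sl, axis_idx in zip(axes, slices, range(3)):
        ax.imshow(sl.T, cmap="gray", origin="lower")
        ax.set_title(f"{subject_id}: axis {axis_idx}, index {mid[axis_idx]}")
        ax.axis("off")

    out_png = os.path.join(out_dir, f"{subject_id}_slices.png")
    fig.savefig(out_png, dpi=100)
    return out_png

# Synthetic volume standing in for a real image (random values, illustration only)
rng = np.random.default_rng(0)
fake_volume = rng.random((32, 40, 24))
png_path = save_slice_snapshot(fake_volume, "demo_subject", "qc_snapshots_demo")
print(png_path)
```

Writing one file per subject and per step with a predictable name makes it straightforward to collect the pictures into a single page later.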

One useful tool, in particular for checking **Freesurfer** processed results, is called visualQC (https://raamana.github.io/visualqc/). We will try this out on the processed Freesurfer data for the selected MRiShare subjects. 

In [None]:
!visualqc_freesurfer -i simple_sublist.txt -f /data/analyses/work_in_progress/freesurfer/fsmrishare-flair6.0 -o visQCtest -old

### Checking the distribution and outliers for QC and other metrics

Any metric you collect as part of the analysis should be checked for any outliers. In addition, there are many other QC metrics proposed for structural and functional image processing, as listed here (http://preprocessed-connectomes-project.org/quality-assessment-protocol/).

Here we will use both the morphometric data and some selected QC metrics we computed for MRiShare subjects to see if there are any problematic subjects.

**QC metrics**

1) Tissue SNR

    * computed as mean/SD for each tissue in each compartment

2) Tissue CNR

    * for T1 stats, WMGM (WM mean/GM mean) and GMCSF (GM mean/CSF mean)
    * for T2flair stats, GMWM (GM mean/WM mean) and GMCSF (GM mean/CSF mean)
    
3) Coregistration cost function
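The SNR and CNR definitions above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the code used for MRiShare: the function names, the toy intensities and the label coding (1 = WM, 2 = GM, 3 = CSF) are hypothetical, and the standard deviation uses NumPy's default (population) formula.

```python
import numpy as np

def tissue_snr(image, mask):
    """SNR of one tissue: mean intensity divided by its standard deviation."""
    values = image[mask]
    return values.mean() / values.std()

def tissue_cnr(image, mask_a, mask_b):
    """CNR as defined above: ratio of the mean intensities of two tissues."""
    return image[mask_a].mean() / image[mask_b].mean()

# Toy "T1-like" image: each row holds the voxels of one tissue
image = np.array([
    [100.0, 110.0, 90.0, 100.0],  # WM voxels
    [60.0, 70.0, 50.0, 60.0],     # GM voxels
    [20.0, 30.0, 10.0, 20.0],     # CSF voxels
])
# Hypothetical label map: 1 = WM, 2 = GM, 3 = CSF
labels = np.array([[1] * 4, [2] * 4, [3] * 4])
wm, gm, csf = labels == 1, labels == 2, labels == 3

snr_wm = tissue_snr(image, wm)      # WM mean / WM SD
wmgm = tissue_cnr(image, wm, gm)    # T1 WMGM = WM mean / GM mean
gmcsf = tissue_cnr(image, gm, csf)  # GMCSF = GM mean / CSF mean
print(snr_wm, wmgm, gmcsf)
```

The same functions work on 3D arrays, since boolean masking does not depend on the array shape.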

**Morphometrics**

1) SPM GM, WM, CSF volume

2) Freesurfer global metrics

There are several Python visualization packages that allow you to interactively inspect your data.

One of the easiest to use is **Plotly Express** (https://medium.com/plotly/introducing-plotly-express-808df010143d), imported here as `plotly.express`, as demonstrated below.

In [None]:

sample_dat_dir = '../data/'
morph_dat = pd.read_csv(op.join(sample_dat_dir, 'sample_mrishare_morphometry.csv'))

In [None]:
morph_dat.head()

In [None]:
import plotly.express as px

In [None]:
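# Compare SPM and Freesurfer GM volumes; hover over a point to see the subject ID.
# Note: trendline="ols" requires the statsmodels package to be installed.
fig = px.scatter(morph_dat, x="SPM_GM_Volume", y="FS6_gm_T_vol", hover_name="mrishare_id",
                 marginal_y="violin", marginal_x="box", trendline="ols")
fig.show()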

In [None]:
# Reshape the right/left hippocampal volumes to long format:
# one row per subject and measure, with columns 'measure' and 'volume'
hippRL_vols = morph_dat[['mrishare_id', 'FS6_gm_R_hippo', 'FS6_gm_L_hippo']].set_index('mrishare_id')
stacked_hippRL = hippRL_vols.stack().reset_index()
stacked_hippRL.columns = ['mrishare_id', 'measure', 'volume']
stacked_hippRL.head()

In [None]:
# One violin per measure, placed along the x axis
fig = px.violin(stacked_hippRL, y="volume", x="measure", color="measure", box=True, points="all", hover_data=stacked_hippRL.columns)
fig.show()

In [None]:
# Same data without an explicit x axis; the two measures are distinguished by color only
fig = px.violin(stacked_hippRL, y="volume", color="measure", box=True, points="all", hover_data=stacked_hippRL.columns)
fig.show()


The generated figures can be saved as an HTML page or as an image.

In [None]:
import plotly.io as pio

In [None]:
pio.write_html(fig, 'FS6_hipp_dist.html')