# Surface Cortical Thickness W-Mapping Pipeline
### Authors: William Drew
### Last updated: October 13, 2022
***
# Introduction
This notebook guides you through our Surface-space Functional Network Mapping Pipeline, which consists of the following steps:
1. FreeSurfer recon-all
2. Creating a GLM for cortical thickness
3. Creating W-maps (in fsaverage5 space)
4. Performing surface functional network mapping with the Preprocessing notebook

#### The surface-space functional network mapping pipeline requires certain covariate data, namely **age (years)** and **sex** for each of your subjects.

*** 
# Step 0: Where will your project be located?

## Please set a project folder below

In [None]:
# Set your project folder here
project_folder = "/data/nimlab/new_ADNI/william_analysis/preprocessing_bug/test_surface_pipeline"
dataset_name = "surface_test"

############################## DO NOT EDIT BELOW #####################################################
import os, shutil
import numpy as np
import pandas as pd
import nibabel as nib

from glob import glob
from termcolor import cprint
from tqdm import tqdm, trange
from natsort import natsorted
from nimlab import surface as nimsurf

project_folder = os.path.join(os.path.abspath(project_folder), dataset_name)
os.makedirs(project_folder, exist_ok=True)
os.makedirs(os.path.join(project_folder, "config"), exist_ok=True)
tmpdir = os.path.join(project_folder, "tmp")
os.makedirs(tmpdir, exist_ok=True)
os.makedirs(os.path.join(project_folder, "scripts"), exist_ok=True)
os.makedirs(os.path.join(project_folder, "scripts","recon_all"), exist_ok=True)
freesurfer_subjects_folder = os.path.join(project_folder, "freesurfer_RA_subjects")
os.makedirs(freesurfer_subjects_folder, exist_ok=True)
nimsurf.copy_fsaverage("fsaverage5", freesurfer_subjects_folder)
outdir = os.path.join(project_folder, "w_maps")
os.makedirs(outdir, exist_ok=True)

***
# Step 1: Prepare your Data

## If you have DICOMs, convert them to Nifti

- If you have DICOMs instead of `.nii` or `.nii.gz` files, convert the DICOMs to Nifti using `dcm2niix`
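If you have many DICOM folders, a small wrapper can run `dcm2niix` once per folder. The sketch below assumes `dcm2niix` is on your `PATH`; the helper names `dcm2niix_command` and `convert_dicom_folder` are hypothetical. It only uses the standard `-z` (gzip), `-f` (output name pattern) and `-o` (output folder) options.

```python
import subprocess
from pathlib import Path


def dcm2niix_command(dicom_dir, out_dir, filename_format="%i_%s", compress=True):
    """Build the argument list for converting one DICOM folder to NIfTI with dcm2niix."""
    return [
        "dcm2niix",
        "-z", "y" if compress else "n",  # "y" writes .nii.gz, "n" writes uncompressed .nii
        "-f", filename_format,           # output file name pattern: %i = patient ID, %s = series number
        "-o", str(out_dir),              # output folder
        str(dicom_dir),                  # input folder containing the DICOM files
    ]


def convert_dicom_folder(dicom_dir, out_dir, **kwargs):
    """Create out_dir if needed, then run dcm2niix (raises CalledProcessError on failure)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(dcm2niix_command(dicom_dir, out_dir, **kwargs), check=True)
```

For example, `convert_dicom_folder("/path/to/dicoms/100890", "/path/to/nifti", filename_format="100890")` would name the output after the subject ID. If a folder holds several series, dcm2niix writes several files, so check that you pick the T1-weighted one for the CSV.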

# Create a Data CSV

## Please create a CSV file with the following data:
- In Column 1, titled **`subid`**, enter your subject identifiers. Each ID must be unique.
- In Column 2, titled **`t1`**, enter full file paths to your subjects' T1 MRI Nifti files.
- In Column 3, titled **`age`**, enter the ages (years) of your subjects. Decimals that account for months are OK (and encouraged).
- In Column 4, titled **`sex`**, enter the sex (M/F) of your subjects. This column must contain only **`M`** or **`F`**.
- In Column 5, titled **`field`**, enter the MRI field strength (in tesla) of each scan. Every row must be either **`1.5`** or **`3`**; you **cannot** mix field strengths.
- In Column 6, titled **`control`**, enter **`1`** if the subject should be used to build the GLM (i.e. is a control). Otherwise, enter **`0`**.
An example of such a formatted file is below:
***
```
subid,t1,age,sex,field,control
100890,/PHShome/wd957/test_subjects/100890.nii,67.9,M,3,1
101039,/PHShome/wd957/test_subjects/101039.nii,72.7,M,3,1
101747,/PHShome/wd957/test_subjects/101747.nii,71,F,3,1
106126,/PHShome/wd957/test_subjects/106126.nii,66.9,F,3,1
106127,/PHShome/wd957/test_subjects/106127.nii,69.5,F,3,0
```
***
Save the CSV to disk and enter its file path below to `input_csv`.
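If your T1 files are already named `<subid>.nii` or `<subid>.nii.gz` (as in the example above) and your demographics live in a separate table, a sketch like the one below can assemble the CSV. The function name `build_subject_csv` and the `demographics` DataFrame (with `subid`, `age`, `sex`, `field`, `control` columns) are hypothetical inputs you would supply.

```python
import os
from glob import glob

import pandas as pd


def build_subject_csv(t1_folder, demographics, out_csv):
    """Pair T1 files named <subid>.nii or <subid>.nii.gz with a demographics table and save the data CSV.

    demographics: DataFrame with columns subid, age, sex, field, control (hypothetical input).
    """
    t1_paths = sorted(glob(os.path.join(t1_folder, "*.nii")) + glob(os.path.join(t1_folder, "*.nii.gz")))
    t1_df = pd.DataFrame({
        # The subject ID is taken from the file name, e.g. 100890.nii -> 100890
        "subid": [os.path.basename(p).split(".nii")[0] for p in t1_paths],
        "t1": [os.path.abspath(p) for p in t1_paths],
    })
    demographics = demographics.astype({"subid": str})
    # Inner join keeps only subjects that have both a T1 and demographics; duplicate IDs raise MergeError
    df = t1_df.merge(demographics, on="subid", how="inner", validate="one_to_one")
    df = df[["subid", "t1", "age", "sex", "field", "control"]]
    df.to_csv(out_csv, index=False)
    return df
```

The inner join silently drops subjects missing from either side, so compare `len(df)` with your expected subject count before using the file.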

In [None]:
# Set your input CSV here
input_csv = "/PHShome/wd957/test_subjects/test_t1_subjects.csv"


############################## DO NOT EDIT BELOW #####################################################
copy_csv = os.path.join(project_folder, "subjects.csv")
shutil.copyfile(input_csv, copy_csv)
subject_df = pd.read_csv(copy_csv, dtype={"subid": str, "t1": str, "age": float, "sex": str, "field": float, "control": int})
for _, r in subject_df.iterrows():
    if not os.path.exists(r["t1"]):
        raise FileNotFoundError(f"{r['t1']} doesn't exist!")
    if r["age"] < 0:
        raise ValueError(f"Age of {os.path.basename(r['t1'])} is invalid!")
    if r["sex"] not in ["M", "F"]:
        raise ValueError(f"Sex of {os.path.basename(r['t1'])} is invalid!")
    if r["field"] not in [1.5, 3]:
        raise ValueError(f"Field strength of {os.path.basename(r['t1'])} is invalid!")
    if r["control"] not in [0, 1]:
        raise ValueError(f"Control flag of {os.path.basename(r['t1'])} must be 0 or 1!")
if len(subject_df["field"].unique()) != 1:
    raise ValueError(f"Only a single field strength is allowed. Either use only 1.5T images or 3T images!")
if len(subject_df["subid"].unique()) != len(subject_df):
    raise ValueError(f"Duplicate Subject ID detected!")
else:
    cprint(f"I found {len(subject_df)} source T1 images!\n", "green", attrs=["bold"])
    print("----- Sample rows of data csv -----\n")
    print(subject_df.head())

***
# Step 2: Run FreeSurfer recon-all

If you have just a couple of subjects, it's OK to run this on the ERIS cluster. However, if you're running hundreds of subjects, ideally you will use a bigger cluster such as [Harvard FAS's Cannon cluster](https://www.rc.fas.harvard.edu/) or [Harvard Medical School's O2 cluster](https://it.hms.harvard.edu/our-services/research-computing/services/high-performance-computing) so that recon-all finishes faster.

### The cell below creates one LSF job script per subject in `scripts/recon_all/`, plus a launcher script (`scripts/launch_recon_all.sh`) that submits all of them on the ERIS cluster.

In [None]:
############################## DO NOT EDIT BELOW #####################################################
recon_all_settings = ['#!/bin/bash',
'#BSUB -q normal',
'#BSUB -n 1',
'#BSUB -M 4000',         
'#BSUB -R "rusage[mem=4000]"',
f'export SUBJECTS_DIR={freesurfer_subjects_folder}']

recon_all_scripts = []
for _, r in subject_df.iterrows():
    subject_id, t1_path = r[["subid","t1"]]
    script_path = os.path.join(project_folder, "scripts","recon_all", f"{subject_id}.sh")
    recon_all_scripts.append(script_path)
    with open(script_path, "w") as fp:
        for item in recon_all_settings:
            fp.write("%s\n" % item)
        recon_str = f"recon-all -s {subject_id} -i {t1_path} -all"
        if r["field"]==3:
            recon_str += " -3T"
        fp.write(recon_str)
launch_recon_all_script = os.path.join(project_folder, "scripts", "launch_recon_all.sh")
with open(launch_recon_all_script, "w") as fp:
    fp.write("#!/bin/bash\n")
    for s in recon_all_scripts:
        fp.write(f"bsub < {s};\nsleep 0.1;\n")
os.chmod(launch_recon_all_script, 0o770)

cprint(f"To run FreeSurfer recon-all on the ERIS cluster, run the following on a login node:\n\n{launch_recon_all_script}", "green", attrs=['bold'])

### If you want to run recon-all on a different cluster, you will need to do a few things:
**Ask William for help with this.**

1. Copy your T1-weighted MRI Nifti files to the external cluster.
2. Edit the `scripts/launch_recon_all.sh` script to use the external cluster's job scheduler (the generated launcher uses LSF's `bsub`).
3. Edit the recon-all job scripts in `scripts/recon_all/` to use the external cluster's job scheduler directives (the generated scripts use LSF `#BSUB` lines), and update the T1 paths and `SUBJECTS_DIR` they reference to their locations on the external cluster.
4. Copy the `scripts` folder to the external cluster.
5. Run the `scripts/launch_recon_all.sh` script on the external cluster and wait for all your jobs to finish.
6. Download the contents of the FreeSurfer `SUBJECTS_DIR` folder from the external cluster to the `freesurfer_RA_subjects` folder in your project directory on the ERIS cluster.
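As an illustration of steps 2 and 3 for a SLURM-based cluster, the sketch below writes SLURM equivalents of the LSF scripts generated above. The function names are hypothetical, and the partition name and time limit are placeholders you must adapt to your cluster. You will probably also need a line that loads FreeSurfer on that cluster (for example through its module system); it is omitted because module names differ between clusters.

```python
import os


def write_slurm_recon_all_script(script_path, subject_id, t1_path, subjects_dir, field=3.0,
                                 partition="normal", mem_mb=4000, time_limit="24:00:00"):
    """Write one SLURM job script that mirrors the LSF recon-all script generated above."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name=recon_{subject_id}",
        f"#SBATCH --partition={partition}",  # placeholder partition name: use your cluster's
        "#SBATCH --ntasks=1",                 # one task, like '#BSUB -n 1'
        f"#SBATCH --mem={mem_mb}",            # memory in MB, like '#BSUB -M 4000'
        f"#SBATCH --time={time_limit}",       # wall-clock limit; recon-all usually runs for many hours
        f"export SUBJECTS_DIR={subjects_dir}",
    ]
    recon_str = f"recon-all -s {subject_id} -i {t1_path} -all"
    if field == 3:
        recon_str += " -3T"
    lines.append(recon_str)
    with open(script_path, "w") as fp:
        fp.write("\n".join(lines) + "\n")


def write_slurm_launcher(launcher_path, script_paths):
    """Write a launcher that submits every job script with sbatch."""
    with open(launcher_path, "w") as fp:
        fp.write("#!/bin/bash\n")
        for s in script_paths:
            fp.write(f"sbatch {s}\nsleep 0.1\n")
    os.chmod(launcher_path, 0o770)
```

Unlike LSF's `bsub < script`, `sbatch` takes the script path as an argument. Set `subjects_dir` and `t1_path` to their locations on the external cluster, not on ERIS.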



## Check FreeSurfer recon-all progress
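The cell below classifies each subject by the last line of its `scripts/recon-all.log`. The same rule, written as a standalone per-subject helper, looks roughly like this; `recon_all_status` is a hypothetical name, and the two status strings are the ones the cell below already searches for.

```python
import os


def recon_all_status(subjects_dir, subject_id):
    """Return 'pending', 'running', 'success', or 'error' for one subject's recon-all run."""
    log_file = os.path.join(subjects_dir, subject_id, "scripts", "recon-all.log")
    if not os.path.exists(log_file):
        return "pending"  # no log yet: the job has not started
    with open(log_file, "r") as f:
        # Ignore blank lines so a trailing newline does not hide the final status line
        lines = [line for line in f.read().splitlines() if line.strip()]
    last_line = lines[-1] if lines else ""  # an empty log means recon-all has only just started
    if "finished without error" in last_line:
        return "success"
    if "exited with ERRORS" in last_line:
        return "error"
    return "running"
```

For example, `Counter(recon_all_status(freesurfer_subjects_folder, s) for s in subject_df["subid"])` (with `from collections import Counter`) gives the same totals as the cell below. A job killed by the scheduler, e.g. for exceeding its time limit, never writes either final message and will keep showing as "running", so check the scheduler's job status for subjects that stay in that state.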

In [None]:
############################## DO NOT EDIT BELOW #####################################################
success_counter = 0
error_counter = 0
pending_counter = 0
running_counter = 0
success_subjects = []
error_subjects = []
for _, r in subject_df.iterrows():
    log_file = f"{freesurfer_subjects_folder}/{r['subid']}/scripts/recon-all.log"
    if os.path.exists(log_file):
        with open(log_file, 'r') as f:
            lines = f.readlines()
            # An empty log means recon-all has only just started
            last_line = lines[-1] if lines else ""
            if "finished without error" in last_line:
                success_counter += 1
                success_subjects.append(r["subid"])
            elif "exited with ERRORS" in last_line:
                error_subjects.append(r["subid"])
                error_counter += 1
            else:
                running_counter += 1
    else:
        pending_counter += 1
if len(success_subjects)>0:
    pd.Series(success_subjects, dtype=object).to_csv(os.path.join(project_folder, "recon_all_success.csv"), index=False, header=None)
if len(error_subjects)>0:
    pd.Series(error_subjects, dtype=object).to_csv(os.path.join(project_folder, "recon_all_error.csv"), index=False, header=None)
cprint(f"Total {len(subject_df)}", "blue")
cprint(f"Pending {pending_counter}", "magenta")
cprint(f"Running {running_counter}", "magenta")
print("---------------------------")
cprint(f"Success {success_counter}", "green")
cprint(f"Error {error_counter}", "red")
print("---------------------------")
recon_all_complete=False

if pending_counter==0 and running_counter==0:
    recon_all_complete=True
    cprint(f"FreeSurfer recon-all complete! Please check recon_all_success.csv and recon_all_error.csv in your project folder for successful/errored subjects!", "green", attrs=["bold"])
else:
    raise RuntimeError(f"FreeSurfer recon-all not yet complete! Please check back again later.")

***
# Step 3: Load a Cortical Thickness GLM

## IMPORTANT: Only run **ONE** of the following two sets of cells depending on whether you already have a GLM you want to use:

## Option 1: If you already have a GLM you want to use...

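1. Set `glm_dir` to the folder that contains your GLM and `model` to the GLM's folder name. The folder `glm_dir/model` must contain a `lh` folder and a `rh` folder, and the GLM must be in fsaverage5 space.
2. Set the FWHM smoothing kernel (in mm) that the GLM was built with.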
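Because an fsaverage7 GLM will not work with the fsaverage5 W-mapping, it can help to check the mesh before loading. Each fsaverage icosahedral mesh has 10 · 4^order + 2 vertices per hemisphere, so the vertex count of a per-vertex GLM output identifies the mesh. The sketch below assumes `lh/rstd.mgh` (one of the files checked in the cell below) holds one value per vertex; the helper names are hypothetical.

```python
import os

# Vertices per hemisphere for FreeSurfer's fsaverage icosahedral meshes: 10 * 4**order + 2
FSAVERAGE_VERTICES = {
    "fsaverage3": 642,
    "fsaverage4": 2562,
    "fsaverage5": 10242,
    "fsaverage6": 40962,
    "fsaverage": 163842,  # icosahedron order 7, often called fsaverage7
}


def mesh_from_vertex_count(n_vertices):
    """Return the fsaverage mesh name with this many vertices, or None if none matches."""
    for name, count in FSAVERAGE_VERTICES.items():
        if count == n_vertices:
            return name
    return None


def glm_mesh(glm_path, hemi="lh"):
    """Guess the mesh of a GLM from the number of vertices in its rstd.mgh file."""
    import nibabel as nib  # imported here so the lookup above works without nibabel installed
    import numpy as np

    img = nib.load(os.path.join(glm_path, hemi, "rstd.mgh"))
    # Surface MGH files are usually shaped (n_vertices, 1, 1); the product also covers reshaped volumes
    return mesh_from_vertex_count(int(np.prod(img.shape[:3])))
```

With the settings below, `glm_mesh(os.path.join(glm_dir, model))` should return `"fsaverage5"`; any other value means the GLM is on the wrong mesh.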

In [None]:
# Set the path to the GLM here
glm_dir = ""
model = ""

# DO NOT USE THIS EXAMPLE (THIS IS AN FSAVERAGE7 MODEL AND WILL NOT WORK WITH OUR FSAVERAGE5 FILES)
# glm_dir = "/data/nimlab/Darby_Data/models/"
# model = "1cn"

# Set the FWHM smoothing kernel of this model, if known
fwhm_kernel = 10

############################## DO NOT EDIT BELOW #####################################################
glm = ""
if recon_all_complete == True:
    check_paths = ["lh/b0000.nii",
                   "lh/b0001.nii",
                   "lh/b0002.nii",
                   "lh/rstd.mgh",
                   "lh/y.fsgd",
                   "rh/b0000.nii",
                   "rh/b0001.nii",
                   "rh/b0002.nii",
                   "rh/rstd.mgh",
                   "rh/y.fsgd",]
    if np.all([os.path.exists(os.path.join(glm_dir, model, p)) for p in check_paths]):
        glm = os.path.join(glm_dir, model)
        cprint(f"GLM at {glm} loaded!", "green", attrs=["bold"])
        fwhm = fwhm_kernel
    else:
        raise RuntimeError(f"GLM incomplete or not found at {os.path.join(glm_dir, model)}!")
    
else:
    raise RuntimeError("FreeSurfer recon-all is incomplete! Please wait till FreeSurfer recon-all is complete before continuing!")

## Option 2 (Part A): If you want to create a GLM with the control subjects in your dataset...
### Create FSGD file
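An FSGD (FreeSurfer Group Descriptor) file is plain text: a `GroupDescriptorFile 1` header, a `Title`, one `Class` line per discrete group, a `Variables` line naming the continuous covariates, and one `Input <subject> <class> <values>` line per subject in `SUBJECTS_DIR`. The cell below writes one with Male/Female classes and a single Age variable. The same layout as a standalone function with a class check (`fsgd_text` is a hypothetical name) might look like:

```python
def fsgd_text(title, rows):
    """Render an FSGD file with Male/Female classes and one continuous Age variable.

    rows: iterable of (subject_id, class_name, age) tuples.
    """
    lines = [
        "GroupDescriptorFile 1",
        f"Title {title}",
        "Class Male",
        "Class Female",
        "Variables Age",
    ]
    for subject_id, class_name, age in rows:
        # Every Input line must name one of the declared classes
        if class_name not in ("Male", "Female"):
            raise ValueError(f"Unknown class {class_name!r} for subject {subject_id}")
        lines.append(f"Input {subject_id} {class_name} {age}")
    return "\n".join(lines) + "\n"
```

Usage: `fsgd_text("surface_test_3.0T", [("100890", "Male", 67.9)])` returns the file contents, which you can write to disk yourself.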

In [None]:
############################## DO NOT EDIT BELOW #####################################################
glm_dir = os.path.join(project_folder, "models")
glms = []
fsgds = []
os.makedirs(glm_dir, exist_ok=True)
if recon_all_complete == True:
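    successful_subids = list(pd.read_csv(os.path.join(project_folder, "recon_all_success.csv"), header = None, dtype=str)[0])
    field = subject_df['field'].unique()[0]
    # Keep control subjects whose recon-all finished without error, so the counts and age range below match the FSGD file
    filtered_subject_df = subject_df[(subject_df['control'] == 1) & (subject_df['subid'].isin(successful_subids))]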
    filtered_subject_df = filtered_subject_df.replace({'sex': {"M": "Male", "F": "Female"}})
    glm_name = f"{dataset_name}_{field}T"
    glm_model_dir = os.path.join(glm_dir, glm_name)
    os.makedirs(glm_model_dir, exist_ok = True)
    fsgd_fname = os.path.join(project_folder, "config", glm_name+".fsgd")
    print(f"GLM ({glm_name}) consists of {len(filtered_subject_df)} control subjects with {field}T T1 imaging.\nIts FSGD file is located at {fsgd_fname}")
    print("=======================================================================")
    if len(filtered_subject_df) < 20:
        cprint("WARNING: This GLM will be built from fewer than 20 subjects!", "red", attrs=["bold"])

    male_count = 0
    female_count = 0
    with open(fsgd_fname, "w") as fp:
        fp.write("GroupDescriptorFile 1\n")
        fp.write(f"Title {glm_name}\n")
        fp.write("Class Male\n")
        fp.write("Class Female\n")
        fp.write("Variables Age\n")
        for i, row in filtered_subject_df.iterrows():
            if row['subid'] in successful_subids:
                fp.write(f"Input {row['subid']} {row['sex']} {row['age']}\n")
                if row['sex'] == "Male":
                    male_count += 1
                elif row['sex'] == "Female":
                    female_count += 1
    if male_count == 0:
        cprint("WARNING: There are 0 Male subjects in this GLM!", "red", attrs=["bold"])
    else:
        cprint(f"There are {male_count} Male subjects in this GLM.", "green", attrs=["bold"])
    if female_count == 0:
        cprint("WARNING: There are 0 Female subjects in this GLM!", "red", attrs=["bold"])
    else:
        cprint(f"There are {female_count} Female subjects in this GLM.", "green", attrs=["bold"])
    cprint(f"Ages span from {np.min(filtered_subject_df['age'])} to {np.max(filtered_subject_df['age'])}", "green", attrs=["bold"])
        
else:
    raise RuntimeError("FreeSurfer recon-all is incomplete! Please wait till FreeSurfer recon-all is complete before continuing!")

## Option 2 (Part B): Create GLM
- Set your FWHM smoothing kernel size (in mm) below 

In [None]:
# GLM options; Select Smoothing Kernel
fwhm_kernel = 10

############################## DO NOT EDIT BELOW #####################################################
if recon_all_complete == True:
    fwhm = fwhm_kernel
    nimsurf.make_cortical_thickness_glm(fsgd_fname, glm_model_dir, freesurfer_subjects_folder, tmpdir, fwhm)
    glm = glm_model_dir
    cprint(f"GLM at {glm} is ready!", "green", attrs=["bold"])
else:
    raise RuntimeError("FreeSurfer recon-all is incomplete! Please wait till FreeSurfer recon-all is complete before continuing!")

# Step 4: W-Mapping

In [None]:
############################## DO NOT EDIT BELOW #####################################################
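if recon_all_complete == True and glm:
    cprint(f"Using GLM Model located at: {glm}", "green", attrs=["bold"])
    # Reload the successful subjects so this step also works after GLM Option 1
    successful_subids = list(pd.read_csv(os.path.join(project_folder, "recon_all_success.csv"), header=None, dtype=str)[0])
    filtered_subject_df = subject_df.replace({'sex': {"M": "Male", "F": "Female"}})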
    wmap_config = os.path.join(project_folder, "config", "wmap_config.txt")
    with open(wmap_config, "w") as fp:
        for i, row in filtered_subject_df.iterrows():
            if row['subid'] in successful_subids:
                fp.write(f"{row['subid']} {row['sex']} {row['age']}\n")
    nimsurf.make_cortical_thickness_wmap(wmap_config, freesurfer_subjects_folder, tmpdir, glm, outdir, fwhm)
    wmap_complete = True
elif recon_all_complete == False:
    raise RuntimeError("FreeSurfer recon-all is incomplete! Please wait till FreeSurfer recon-all is complete before continuing!")
elif not glm:
    raise RuntimeError("No GLM is selected!")
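The cell above delegates to `nimsurf.make_cortical_thickness_wmap`, whose internals are not shown in this notebook. A W-score is conventionally computed per vertex as the subject's observed thickness minus the GLM's prediction for that subject's covariates, divided by the GLM's residual standard deviation. The sketch below shows that arithmetic only. It assumes the betas are ordered [Male offset, Female offset, Age slope] (consistent with three `bXXXX` files, but not confirmed here), and whether Age enters raw or mean-centred depends on how the GLM was fit. The function names are hypothetical.

```python
import numpy as np


def w_score(thickness, betas, design_row, residual_sd):
    """Vertex-wise W-score: (observed - predicted) / residual SD.

    thickness:   (n_vertices,) observed cortical thickness for one subject
    betas:       (n_regressors, n_vertices) GLM coefficients, one row per beta map
    design_row:  (n_regressors,) the subject's covariates in the GLM's regressor order
    residual_sd: (n_vertices,) residual standard deviation of the GLM
    """
    predicted = np.asarray(design_row, dtype=float) @ np.asarray(betas, dtype=float)
    sd = np.asarray(residual_sd, dtype=float)
    w = np.zeros_like(predicted)
    valid = sd > 0  # masked vertices (e.g. the medial wall) can have zero residual SD
    w[valid] = (np.asarray(thickness, dtype=float)[valid] - predicted[valid]) / sd[valid]
    return w


def doss_design_row(sex, age, age_center=0.0):
    """HYPOTHETICAL design row assuming regressors [Male offset, Female offset, Age slope]."""
    return [1.0 if sex == "Male" else 0.0,
            1.0 if sex == "Female" else 0.0,
            age - age_center]
```

Negative W-scores mean thinner cortex than the control GLM predicts for someone of that age and sex, which is why Step 5 defaults to keeping values below -2.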

# Step 5 (Optional): Threshold/Binarize your W-Maps
**Instructions**
1. To binarize your W-maps, set `binarize = True`. To threshold them only, set `binarize = False`.
2. Set the threshold level with `threshold`.
3. Set the threshold/binarization direction with `direction`.

    - If direction is `twosided`, values **outside** ±threshold are kept (or binarized).
    - Example: if threshold is 1 and direction is "twosided", then values **between** -1 and +1 will be zeroed.
    ***
    - If direction is `less`, values **greater than** the threshold are zeroed (keeping values "less" than the threshold).
    - Example: if threshold is -1 and direction is "less", then values **greater** than -1 will be zeroed.
    ***
    - If direction is `greater`, values **less than** the threshold are zeroed (keeping values "greater" than the threshold).
    - Example: if threshold is +1 and direction is "greater", then values **less** than +1 will be zeroed.
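The three directions above can be illustrated on a plain NumPy array. This is a sketch of the described behaviour, not `nimsurf.threshold` itself: `threshold_sketch` is a hypothetical name, and two details are assumptions, namely that values exactly at the threshold are kept and that binarizing sets surviving vertices to 1.

```python
import numpy as np


def threshold_sketch(values, threshold, direction, binarize, replace_val=0.0):
    """Zero out (or binarize) W-map values following the direction rules described above."""
    values = np.asarray(values, dtype=float)
    if direction == "twosided":
        keep = np.abs(values) >= abs(threshold)  # values between -|t| and +|t| are zeroed
    elif direction == "less":
        keep = values <= threshold               # values greater than t are zeroed
    elif direction == "greater":
        keep = values >= threshold               # values less than t are zeroed
    else:
        raise ValueError(f"Unknown direction: {direction}")
    # Surviving vertices become 1 when binarizing (assumption), otherwise keep their W-score
    return np.where(keep, 1.0 if binarize else values, replace_val)
```

With the defaults in the cell below (`threshold = -2`, `direction = "less"`, `binarize = True`), W-scores of -3 and -2 would become 1 and everything above -2 would become 0.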

In [None]:
binarize = True
threshold = -2
direction = "less"


############################## DO NOT EDIT BELOW #####################################################
wmap_complete = globals().get("wmap_complete", False)  # False if Step 4 has not been run yet
if recon_all_complete == True and glm and wmap_complete:
    wmaps = glob(outdir+"/*.gii")
    wmap_dir = "w_maps"
    if binarize:
        wmap_dir += "_bin"
    else:
        wmap_dir += "_thr"
    wmap_dir += f"_{str(threshold)}"
    wmap_dir += f"_{direction}"
    wmap_dir = os.path.join(project_folder, wmap_dir)
    result_string = ""
    if binarize:
        result_string += "W-Maps binarized at "
    else:
        result_string += "W-Maps thresholded at "
    result_string += f"{str(threshold)} "
    result_string += f"({direction}) "
    result_string += f"are located at {wmap_dir}"
    os.makedirs(wmap_dir, exist_ok=True)
    for wmap in tqdm(wmaps):
        fname = os.path.basename(wmap)
        gifti = nib.load(wmap)
        nimsurf.threshold(gifti, threshold, direction, binarize, replace_val=0.0).to_filename(os.path.join(wmap_dir, fname))
    cprint(result_string, "green", attrs=["bold"])
elif recon_all_complete == False:
    raise RuntimeError("FreeSurfer recon-all is incomplete! Please wait till FreeSurfer recon-all is complete before continuing!")
elif not glm:
    raise RuntimeError("No GLM is selected!")
elif not wmap_complete:
    raise RuntimeError("W-Mapping not complete!")

# Step 6: Surface Functional Network Mapping
## Use the Preprocessing notebook on your W-Maps to calculate surface functional connectivity

# Step 7: Clean Up

In [None]:
# Remove this project's temporary working folder
shutil.rmtree(tmpdir)