# PTSD Mouse Data 

The data consist of two data sets, each measuring responses from 20 mice at 4 timepoints: Baseline, Pre-fear exposure, immediately after the Fear exposure, and 9 days later (D9). Ten of the mice are serotonin transporter knockouts (KO) and ten are wildtype (WT). 

The data are: 
- 79 MRIs. The pre-fear, fear, and D9 images are Mn(II)-enhanced MRIs (MEMRI). MEMRI is used to measure neuronal activity (active neurons are thought to take up more Mn(II)). The baseline images are regular fMRI.
- 79 measurements of percent time spent in the light by the mice. 

In both data sets, the KO_04_D9 datapoint is missing (hence 79 instead of 80). 
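As a quick sanity check on the counts above, the sketch below enumerates every genotype/mouse/timepoint combination and drops the missing scan. The mouse numbering (01 to 10 within each genotype) and the timepoint codes are assumptions made for illustration; only `KO_04_D9` is taken from the description.

```python
from itertools import product

genotypes = ["KO", "WT"]
mouse_ids = range(1, 11)                    # assumed numbering 01..10 per genotype
timepoints = ["BL", "PreF", "Fear", "D9"]   # assumed timepoint codes

# every scan we would expect if nothing were missing
expected = [f"{g}_{m:02d}_{t}" for g, m, t in product(genotypes, mouse_ids, timepoints)]

# the one datapoint reported as missing in both data sets
missing = {"KO_04_D9"}
available = [scan for scan in expected if scan not in missing]

print(len(expected), len(available))
```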

In [1]:
# Import needed packages for analysis
import os

from tqdm import tqdm
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import nibabel as nib

import cfl.util.brain_util as BU
import cfl.util.brain_vis as BV
import cfl.util.fear_mice_functions as fm

# load response data 
Y = pd.read_pickle('Y.pkl')

# MRI Data 
Some facts about the data: 
- the dimensions of the brain box in each image are (124, 200, 82) (2,033,600 voxels total)
- the images are originally in RPS orientation. We flip them to RAS orientation so that they share the alignment of the other MRIs we've been looking at 
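The two facts above can be checked on a toy array. This is only a sketch: it assumes the voxel array is stored in the order the axis codes describe, so that RPS and RAS differ only in the direction of the second axis. It does not show how `BU.load_brain` handles orientation, and a real reorientation would also need the image affine updated.

```python
import numpy as np

mri_dims = (124, 200, 82)
n_voxels = int(np.prod(mri_dims))   # total number of voxels in the brain box
print(n_voxels)

# small toy volume standing in for one image
toy = np.arange(2 * 3 * 2).reshape(2, 3, 2)

# RPS -> RAS: reverse the second axis (posterior-to-anterior direction)
toy_ras = np.flip(toy, axis=1)
```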

In [3]:
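# build paths with os.path.join so they work on any OS and avoid
# backslash escape sequences inside the string literals
behav_csv = os.path.join('PTSD_Data_Share', 'Behavior_data', 'PTSD_PerLight.csv')
mri_dir = os.path.join('PTSD_Data_Share', 'MEMRI_data')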

# load one image to check out its dimensions
img = BU.load_brain(os.path.join(mri_dir, "PTSD_KO_03_BL.nii"), ori='RPS')
mri_dims = img.shape

In [4]:
# load all the images in RPS orientation 
X, Y_unused = BU.load_data(mri_dir, behav_csv, mri_dims, ori='RPS')

In [5]:
nib_loaded_img = nib.load(os.path.join(mri_dir, "PTSD_KO_03_BL.nii"))
affine = nib_loaded_img.affine

# Load Masks
We were given two masks to fit the MRIs: a linearly aligned mask and a non-linearly aligned mask. These masks tell which voxels in the image are part of the brain and which are empty space. 

The difference between the non-linear and linear mask (from an email from Taylor): "The non-linearly aligned mask is better aligned to the data, however the non-linear "warping" creates artifacts near the surface of the brain that may result in grabbing more undesired non-brain voxels when masking. The linearly aligned mask may not align to the surface of the brain as well and may miss or cut out brain tissue, but may not grab as many non-brain voxels." 

The non-linear mask leaves 531,632 voxels and the linear mask 482,793 voxels unmasked. 
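To put those voxel counts in context, the sketch below computes the share of the brain box each mask keeps (roughly 26% and 24%) and shows masking on a toy flattened image. It assumes a mask marks brain voxels with nonzero values; the toy arrays are made up for illustration.

```python
import numpy as np

total_voxels = 124 * 200 * 82
for name, kept in [("non-linear", 531_632), ("linear", 482_793)]:
    print(f"{name} mask keeps {kept / total_voxels:.1%} of the box")

# toy flattened image and mask (assumption: nonzero mask value = brain voxel)
toy_image = np.array([3.0, 0.5, 2.0, 7.0, 1.0, 4.0])
toy_mask = np.array([1, 0, 1, 1, 0, 0])

n_kept = int(np.count_nonzero(toy_mask))           # number of unmasked voxels
masked = np.where(toy_mask > 0, toy_image, 0.0)    # zero out non-brain voxels
```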

In [7]:
# load the non-linear mask template
nl_mask_path = os.path.join('PTSD_Data_Share', 'templates', 'MuseTemplate_nonlinear_mask.nii')
nl_mask = BU.load_brain(nl_mask_path, ori='RPS')
nolin_mask_vec = BU.flatten(nl_mask)

# load the linear mask template
l_mask_path = os.path.join('PTSD_Data_Share', 'templates', 'MuseTemplate_linear_mask.nii')
l_mask = BU.load_brain(l_mask_path, ori='RPS')
lin_mask_vec = BU.flatten(l_mask)

# Heatmaps, WT vs KO 

These heatmaps show, for each genotype and timepoint, the average value of each voxel across the mice in that group.
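As a minimal sketch of that averaging, the toy example below takes the per-voxel mean over the rows belonging to each group. The group keys and the random data are hypothetical stand-ins, not the notebook's real `X` or the keys returned by `fm.geno_time_indices_dir`.

```python
import numpy as np

rng = np.random.default_rng(0)
X_toy = rng.random((6, 5))   # 6 flattened images, 5 voxels each

# hypothetical group -> row indices mapping
groups = {"BL_WT": [0, 1, 2], "BL_KO": [3, 4, 5]}

# one heatmap per group: the mean of each voxel over that group's images
heatmaps = {key: X_toy[idx].mean(axis=0) for key, idx in groups.items()}
```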

In [5]:
WT_indices = Y[Y["Genotype"]=="WT"].index.tolist()
KO_indices = Y[Y["Genotype"]=="KO"].index.tolist() 

In [8]:
# create WT and KO heatmaps for each timepoint 

indices_dir = fm.geno_time_indices_dir(Y)
all_HM_dir = fm.empty_geno_time_dir(Y, mri_dims)

for key in indices_dir: 
    # create empty array for heatmap 
    currentHM = all_HM_dir[key]

    #get all the relevant MRI images 
    indices = indices_dir[key]
    n = len(indices)
    #add to the heatmap 
    for brain in tqdm(X[indices]):
        currentHM += brain

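    # divide by number of images to get average 
    # (np.divide returns a new array, so the result has to be assigned)
    currentHM = np.divide(currentHM, n)
    all_HM_dir[key] = currentHM

    # save as '<key>_heatmap.npy', the name the loading cells expect
    save_name = key + '_heatmap.npy'
    np.save(save_name, currentHM) 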

100%|██████████| 10/10 [00:00<00:00, 285.73it/s]
100%|██████████| 10/10 [00:00<00:00, 277.79it/s]
100%|██████████| 10/10 [00:00<00:00, 227.26it/s]
100%|██████████| 10/10 [00:00<00:00, 263.17it/s]
100%|██████████| 10/10 [00:00<00:00, 277.78it/s]
100%|██████████| 10/10 [00:00<00:00, 246.82it/s]
100%|██████████| 9/9 [00:00<00:00, 290.29it/s]
100%|██████████| 10/10 [00:00<00:00, 259.31it/s]


In [7]:
# load heatmaps for each genotype, timepoint 
all_HM_dir = fm.empty_geno_time_dir(Y, mri_dims)
for key in all_HM_dir: 
    all_HM_dir[key] = np.load(str(key) + '_heatmap.npy')

In [9]:
# display WT heatmaps 
wts = []
wt_names = []
for key in all_HM_dir: 
    if 'WT' in key: 
        wts.append(all_HM_dir[key])
        wt_names.append(key)

# specify labels for plot (note the labels below are specifically for RAS orientation)
dir_labels = { 'saggital' :   ['P', 'A', 'D', 'V'],
               'coronal' :    ['L', 'R', 'D', 'V'],
               'horizontal' : ['L', 'R', 'A', 'P']} 

# BV.plot_interactive_panels(np.vstack(wts), mri_dims, nolin_mask_vec, figsize=(17, 3), dir_labels=dir_labels, column_titles=wt_names)

[Output: three interactive slice viewers, one per axis (124, 200, and 82 slices)]

In [9]:
# display KO heatmaps 
kos = []
ko_names = []
for key in all_HM_dir: 
    if 'KO' in key: 
        kos.append(all_HM_dir[key])
        ko_names.append(key)

# specify labels for plot (note the labels below are specifically for RAS orientation)
dir_labels = { 'saggital' :   ['P', 'A', 'D', 'V'],
               'coronal' :    ['L', 'R', 'D', 'V'],
               'horizontal' : ['L', 'R', 'A', 'P']} 

# BV.plot_interactive_panels(np.vstack(kos), mri_dims, nolin_mask_vec, figsize=(17, 3), dir_labels=dir_labels, column_titles=ko_names)

[Output: three interactive slice viewers, one per axis (124, 200, and 82 slices)]

In [5]:
# # load all heatmaps to display side-by-side
# all_HM_dir = fm.empty_geno_time_dir(Y, mri_dims)
# for key in all_HM_dir: 
#     all_HM_dir[key] = np.load(str(key) + '_heatmap.npy')

# # specify labels for plot (note the labels below are specifically for RAS orientation)
dir_labels = { 'saggital' :   ['P', 'A', 'D', 'V'],
               'coronal' :    ['L', 'R', 'D', 'V'],
               'horizontal' : ['L', 'R', 'A', 'P']} 

# # display BL and pre-fear heatmaps 
# # (dict views cannot be sliced, so convert them to lists first)
# BV.plot_interactive_panels(np.vstack(list(all_HM_dir.values())[:4]), mri_dims, nolin_mask_vec, figsize=(17, 3), dir_labels=dir_labels, column_titles=list(all_HM_dir.keys())[:4])

In [11]:
#display post fear and D9 heatmaps 
# BV.plot_interactive_panels(np.vstack(all_HMs[4:]), mri_dims, nolin_mask_vec, figsize=(17, 3), dir_labels=dir_labels, column_titles=["WT Fear", "KO Fear", "WT D9", "KO D9"])

# Variance
These plots show, for each genotype and timepoint, the variance of each voxel across the MRIs in that group.
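A toy illustration of per-voxel variance, assuming images are stored as rows of a 2D array (one column per voxel). Note that a voxel with a constant nonzero value also has zero variance, so filtering on "variance equal to 0" removes more than just empty space outside the brain.

```python
import numpy as np

# 3 toy images (rows) with 3 voxels each (columns)
X_toy = np.array([[1.0, 2.0, 0.0],
                  [3.0, 2.0, 0.0],
                  [5.0, 2.0, 0.0]])

# variance of each voxel across images (population variance, NumPy's default)
var_per_voxel = X_toy.var(axis=0)

# keep only voxels that actually vary
nonzero_var = var_per_voxel[var_per_voxel != 0]
```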

In [13]:
indices_dir = fm.geno_time_indices_dir(Y)

In [14]:
# make a new dict that holds the per-voxel variance array for each group 
var_dir = fm.empty_geno_time_dir(Y, mri_dims)
for group in indices_dir: 
    current_var = np.var(X[indices_dir[group]], axis=0)
    var_dir[group] = current_var


# np.vstack needs a sequence, so convert the dict view to a list first
BV.plot_interactive_panels(np.vstack(list(var_dir.values()))[:4], mri_dims, nolin_mask_vec, figsize=(15, 3), dir_labels=dir_labels, colormap="Reds", column_titles=list(var_dir.keys())[:4]) 

[Output: three interactive slice viewers, one per axis (124, 200, and 82 slices)]

In [15]:
# BV.plot_interactive_panels(np.vstack(list(var_dir.values()))[4:], mri_dims, nolin_mask_vec, figsize=(15, 3), dir_labels=dir_labels, colormap="Reds", column_titles=list(var_dir.keys())[4:]) 

# Remove pieces with high variance 

In [16]:
# variance histograms: one panel per genotype/timepoint group
# (variances equal to 0 are excluded; y is graphed on a log scale)
# full_name = {"BL": "baseline", "PreF": "Before fear experience", "Fear": "Fear experience", "D9": "9 days after fear experience"}

fig, ax = plt.subplots(nrows=4, ncols=2, squeeze=False, figsize=(10, 12), sharex=True, sharey=True)
fig.suptitle("Variance across each timepoint", fontsize=15)
timepoints = var_dir.keys()
for n, timepoint in enumerate(timepoints):
    # fill the 4x2 grid row by row
    row, col = divmod(n, 2)
    # use masked array to exclude variances equal to 0 
    a = np.ma.masked_equal(var_dir[timepoint],0)
    ax[row, col].hist(a.compressed(), bins=20)
    ax[row, col].title.set_text(timepoint)
    ax[row, col].set_ylabel("Frequency", fontsize=12)

# the y axes are shared, so setting the scale once applies to every panel
plt.yscale('log')

ax[3, 1].set_xlabel("Variance")
ax[3, 0].set_xlabel("Variance")
plt.tight_layout()
plt.show()   

KeyboardInterrupt: 

In [68]:
# use a masked array to drop the zero variances (e.g. voxels outside the brain), 
# so that they don't dominate the percentiles 

var_all = np.var(X, axis=0)
var_all_no0 = np.ma.masked_equal(var_all,0).compressed()


# find top 5% of variance across all images 
# top_5 = np.percentile(var_all_no0, 95)
# print(top_5)

In [71]:
# calculate the variance threshold for each percentile 
# (percentiles go from 0 to 100 inclusive)
all_thresholds = np.percentile(var_all_no0, list(range(0, 101)))

# find which percentile bin each voxel's variance falls into 
var_percentiles = np.searchsorted(all_thresholds, var_all)
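To make the binning step above concrete, here is a toy sketch using only three thresholds instead of 101. It uses `np.searchsorted` with its default `side='left'`, which means a value equal to a threshold is placed in the bin below it, and zero-variance voxels land in bin 0 together with the smallest nonzero variance.

```python
import numpy as np

var_toy = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 0.0])

# thresholds are computed on the nonzero variances only
nonzero = var_toy[var_toy != 0]
thresholds = np.percentile(nonzero, [0, 50, 100])

# index of the first threshold that is >= each variance
bins = np.searchsorted(thresholds, var_toy)

print(thresholds)
print(bins)
```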

In [23]:
BV.plot_interactive_panels(var_percentiles, mri_dims, nolin_mask_vec, figsize=(12, 5), dir_labels=dir_labels, column_titles=["Variance percentile for each voxel"])

[Output: three interactive slice viewers, one per axis (124, 200, and 82 slices)]

In [35]:
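# show where the voxels in the top 5% of variance are located 
# (threshold taken over the nonzero variances, to match var_percentiles;
#  everything below it is flattened to a constant low value)
var_percentiles_top_5 = var_percentiles.copy()
var_percentiles_top_5[var_all < np.percentile(var_all_no0, 95)] = 5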

BV.plot_interactive_panels(var_percentiles_top_5, mri_dims, nolin_mask_vec, figsize=(8, 3), dir_labels=dir_labels, column_titles=["Location of top 5% voxels"])

[Output: three interactive slice viewers, one per axis (124, 200, and 82 slices)]

In [38]:
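# show where the voxels in the top 1% of variance are located 
# (threshold taken over the nonzero variances, to match var_percentiles;
#  everything below it is flattened to a constant low value)
var_percentiles_top_1 = var_percentiles.copy()
var_percentiles_top_1[var_all < np.percentile(var_all_no0, 99)] = 5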

BV.plot_interactive_panels(var_percentiles_top_1, mri_dims, nolin_mask_vec, figsize=(8, 3), dir_labels=dir_labels, column_titles=["Location of top 1% variance"])

[Output: three interactive slice viewers, one per axis (124, 200, and 82 slices)]

In [39]:
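# show where the voxels in the top 3% of variance are located 
# (threshold taken over the nonzero variances, to match var_percentiles;
#  everything below it is flattened to a constant low value)
var_percentiles_top_3 = var_percentiles.copy()
var_percentiles_top_3[var_all < np.percentile(var_all_no0, 97)] = 5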

BV.plot_interactive_panels(var_percentiles_top_3, mri_dims, nolin_mask_vec, figsize=(8, 3), dir_labels=dir_labels, column_titles=["Locations of top 3% variance"])

[Output: three interactive slice viewers, one per axis (124, 200, and 82 slices)]

Conclusions: 
- High variance tends to concentrate around the edges of the brain (slice 54).
- It seems that removing the high-variance voxels will also remove some of the really high-activity spots (around slice 60).
- Most high-variance voxels are on the edges, but some notable ones are not (slice 64).
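One way to act on these observations is to zero out the highest-variance voxels in every image. The sketch below does this on synthetic data: the array shapes, the noise model, and the 95% cutoff are assumptions chosen for illustration and are not the notebook's real `X`.

```python
import numpy as np

rng = np.random.default_rng(1)
X_toy = rng.normal(size=(8, 100))   # 8 flattened images, 100 voxels each
X_toy[:, :5] *= 10.0                # make the first five voxels much noisier

# per-voxel variance across images, and the cutoff for the top 5%
var_toy = X_toy.var(axis=0)
threshold = np.percentile(var_toy, 95)
high_var = var_toy > threshold      # boolean mask over voxels (columns)

# zero the selected voxels in every image, leaving the rest untouched
X_filtered = X_toy.copy()
X_filtered[:, high_var] = 0
```

The key point is that the mask indexes voxels (columns), not individual intensity values, so the same voxels are removed from every image.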

In [27]:
# remove voxels in the top 3% of variance from X 
# (select voxels by their variance, not by their intensity values)
X_var3 = X.copy()
X_var3[:, var_all > np.percentile(var_all_no0, 97)] = 0

np.save("X_var3.npy", X_var3)


In [28]:
# reload the images with the high variance removed (to be graphed)
X_var3 = np.load("X_var3.npy")

# remove all data from voxels with high variance, 
# i.e. everywhere the variance exceeds the np.percentile threshold
high_var_voxels = var_all > np.percentile(var_all_no0, 97)
for bi in range(X_var3.shape[0]):
    X_var3[bi, high_var_voxels] = 0

In [63]:
# load all heatmaps to display side-by-side
all_HMs = []
for timepoint in timepoints_dir: 
    all_HMs.append(np.load('wt_heatmap_novar_' + timepoint + '.npy'))
    all_HMs.append(np.load('ko_heatmap_novar_' + timepoint + '.npy'))


#display BL and pre-fear heatmaps 
BV.plot_interactive_panels(np.vstack(all_HMs[:4]), mri_dims, nolin_mask_vec, figsize=(17, 3), dir_labels=dir_labels, column_titles=["WT BL", "KO BL", "WT Pre-fear", "KO Pre-fear"])

[Output: three interactive slice viewers, one per axis (124, 200, and 82 slices)]

In [64]:
#display fear and d9 heatmaps 
BV.plot_interactive_panels(np.vstack(all_HMs[4:]), mri_dims, nolin_mask_vec, figsize=(17, 3), dir_labels=dir_labels, column_titles=["WT Fear", "KO Fear", "WT D9", "KO D9"])

[Output: three interactive slice viewers, one per axis (124, 200, and 82 slices)]

In [None]:
# TODO: the variance section above has not been revamped yet 

# Graphing Fear Minus Pre-Fear for each Genotype

In [10]:
# load the Fear and Pre-fear heatmaps for each genotype 
# (expects '<key>_heatmap.npy' files, named as in the heatmap loading cell)
fear_ko_hm = np.load("Fear_KO_heatmap.npy")
pref_ko_hm = np.load("PreF_KO_heatmap.npy")

fear_wt_hm = np.load("Fear_WT_heatmap.npy")
pref_wt_hm = np.load("PreF_WT_heatmap.npy")

# rows: KO difference, WT difference, and the sum of the two
diffs = np.zeros((3, fear_ko_hm.shape[0]))

diffs[0] = fear_ko_hm - pref_ko_hm
diffs[1] = fear_wt_hm - pref_wt_hm
diffs[2] = diffs[0] + diffs[1]

FileNotFoundError: [Errno 2] No such file or directory: 'Fear_KO_heatmap.npy'
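For reference, the difference-map computation can be tried without the saved files. The sketch below uses made-up flattened heatmaps and a tiny volume shape. Reshaping back to 3D with NumPy's default C order is an assumption, since the ordering used by `BU.flatten` is not shown here.

```python
import numpy as np

dims = (2, 3, 2)                      # toy stand-in for mri_dims
n_vox = int(np.prod(dims))

# hypothetical flattened group-mean heatmaps
pre_fear = np.full(n_vox, 1.0)
fear = np.linspace(0.0, 2.2, n_vox)

# positive where the signal is higher after fear exposure than before it
diff = fear - pre_fear

# back to a 3D volume, e.g. before writing it out as an image
# (assumes C-order flattening)
diff_volume = diff.reshape(dims)
```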

In [7]:
BV.plot_interactive_panels(diffs[:2], mri_dims, nolin_mask_vec, figsize=(17, 3), dir_labels=dir_labels, column_titles=["KO Fear - Pre-F", "WT Fear - Pre-F"])

[Output: three interactive slice viewers, one per axis (124, 200, and 82 slices)]

In [19]:
BV.plot_interactive_panels(diffs[2], mri_dims, nolin_mask_vec, figsize=(8, 3), dir_labels=dir_labels, column_titles=["Overall Fear - Pre-F"])

[Output: three interactive slice viewers, one per axis (124, 200, and 82 slices)]

In [9]:
# save them as NIfTIs
fm.save_as_nifti(diffs[0], 'fear_minus_pref_KO.nii', mri_dims, affine)
fm.save_as_nifti(diffs[1], 'fear_minus_pref_WT.nii', mri_dims, affine)
# fm.save_as_nifti(var_percentiles, 'variance_percentiles.nii', mri_dims, affine)