# Overview 

In the first ~1/3 of this lecture, we will review homework, programming, and fMRI concepts.

In the second ~1/3 of this lecture, we will introduce data normalization and logical indexing.

In the last ~1/3 of this lecture, we will introduce software to display fMRI data on a 3D representation of the cortical surface.

# Goals
* Understand the way pycortex represents a volumetric data set
* Create pycortex Volume objects from 3D data, masked data, and time series data
* Display data on the cortical surface



# Updating resources in your server home directory
(Run the cells in this section once, then restart your kernel, reload the web page, and skip this section the next time through!)

In [None]:
# Updating functions
import neurods
import cortex as cx
# Add figures to notebook
dropbox_link = 'https://www.dropbox.com/s/dkibicpvc13ng27/Archive.zip'
fname = 'figures/'
basedir = '~/cogneuro-connector/weekly_materials/Lecture04_Normalization_Masking_pycortex/'
neurods.io.download_file(dropbox_link, fname, basedir, zipfile=True)
# Update neurods package
neurods.io.update_neurods()
# Update pycortex configuration file
cx.config.set('basic', 'filestore', '/data/shared/cogneuro88/pycortex_store/')
with open(cx.options.usercfg, 'w') as fid:
    cx.config.write(fid)

# Part 1:  Writing reusable functions
& misc. code / efficiency tips!

In [None]:
# Load some necessary libraries
import matplotlib.pyplot as plt
import numpy as np

In [None]:
# Set plotting defaults
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

We created the following functions to make plots of slices of fMRI volumes. In some code libraries, these are called light table or mosaic plots, because in the (literally) dark old days of film photography, photographers used to lay out their film negatives on light tables in a format similar to this to see them.

We will want to make plots like this many times throughout the course, so we would like to formalize these functions a little more to make them more readily reusable. 

In [None]:
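def get_any_slice(volume, slice_number, dimension):
    """Given a 3D volume, a slice index, and a dimension (0, 1, or 2), this 
    function returns the 2D slice at that index along that dimension """ 
    if dimension == 0:
        img = volume[slice_number, :, :]
    elif dimension == 1:
        img = volume[:, slice_number, :]
    elif dimension == 2:
        img = volume[:, :, slice_number]
    else:
        # Without this branch, `img` would be undefined for any other value
        raise ValueError('dimension must be 0, 1, or 2')
    return img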

def plot_any_slice_v2(volume, slice_number, dimension, cmap = 'viridis', vmin=0, vmax=2000,
                     origin = 'lower', interpolation='nearest', aspect='equal'):
    img = get_any_slice( volume, slice_number, dimension)
    _ = plt.imshow(img, cmap = cmap, vmin= vmin, vmax = vmax, origin = origin, interpolation = interpolation,
                  aspect = aspect)
    _ = plt.axis('off')
    
def plot_all_slices(volume, slice_dimension, nrows, ncols , cmap = 'viridis', vmin=0, vmax=2000,
                     origin = 'lower', interpolation='nearest', aspect='equal' ):
    fig = plt.figure(figsize = (8,8))
    # Use the `volume` argument (not a global variable) so this works for any input
    n_slices = volume.shape[slice_dimension]
    for s in range(n_slices):
        ax = fig.add_subplot(nrows, ncols, s+1)
        plot_any_slice_v2(volume, s, slice_dimension, cmap = cmap, vmin= vmin, vmax = vmax, 
                          origin = origin, interpolation = interpolation,aspect = aspect)    
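
As an aside, the three-way `if`/`elif` in `get_any_slice` can also be written with NumPy's `np.take`, which selects an index along an arbitrary axis. Here is a minimal sketch on a small synthetic array; the name `get_slice_take` is hypothetical and is not part of `neurods`.

```python
import numpy as np

def get_slice_take(volume, slice_number, dimension):
    """Return one 2D slice of a 3D array along `dimension` (hypothetical helper)."""
    # np.take selects index `slice_number` along axis `dimension`
    return np.take(volume, slice_number, axis=dimension)

# Small synthetic "volume" with distinct values
toy_volume = np.arange(2 * 3 * 4).reshape(2, 3, 4)
slice_d1 = get_slice_take(toy_volume, 1, 1)
print(slice_d1.shape)  # (2, 4): the sliced dimension has been removed
```

This gives the same result as `toy_volume[:, 1, :]`, without a separate branch for each dimension.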

We will modify the final function to be more production-ready below. Follow along...

* One major addition we need to make is a ***docstring***. See [here](https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt#docstring-standard) for a general description of writing docstrings according to the numpy format, and [here](https://github.com/numpy/numpy/blob/master/doc/example.py) for clear examples.
* We will use the `**kwargs` syntax to pass keyword arguments from function to function
* We will modify argument names to be more similar to arguments in similar functions
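
To illustrate the first two points on something unrelated to plotting, here is a small sketch of a numpy-style docstring and of `**kwargs` forwarding. The functions `clip_array` and `clip_all` are made up for this example; they are not part of `neurods` or of the exercise answer.

```python
import numpy as np

def clip_array(arr, lo=0.0, hi=1.0):
    """Clip the values of an array to a range.

    Parameters
    ----------
    arr : array-like
        Values to clip.
    lo : float
        Lower bound.
    hi : float
        Upper bound.

    Returns
    -------
    clipped : ndarray
        Copy of `arr` with values limited to [lo, hi].
    """
    return np.clip(arr, lo, hi)

def clip_all(arrays, **kwargs):
    """Apply `clip_array` to each array in a list.

    Any keyword arguments (e.g. `lo`, `hi`) are collected in the dict
    `kwargs` and forwarded unchanged to `clip_array`.
    """
    return [clip_array(a, **kwargs) for a in arrays]

# `hi=1.5` is not named in the signature of `clip_all`; it travels via kwargs
clipped = clip_all([np.array([-1.0, 0.5, 2.0])], hi=1.5)
print(clipped[0])
```

The outer function never needs to list the inner function's options, so adding an option to `clip_array` later does not require touching `clip_all`.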

In [None]:
### STUDENT ANSWER


In [None]:
### STUDENT ANSWER


In [None]:
import neurods
neurods.viz.slice_3d_array?

## Survey 1!

In [None]:
# Teacher info
# https://docs.google.com/a/berkeley.edu/forms/d/e/1FAIpQLSfzF7dZ3z2GI8TzrnImXeXf0qIyDmUof8XuB97j0aZ1UyK-mg/viewform

# Load data
Today, we will load data from a common neuroimaging data format ([NIfTI](https://nifti.nimh.nih.gov/nifti-1/) format). If you find open source neuroimaging data online, it will most likely be in this format. To load this data, we need to use a neuroimaging code library called nibabel. 

In [None]:
import nibabel
import os

In [None]:
# Create a nifti (nii) proxy object
fbase = '/Users/mark/Dropbox/data8/' #'/data/shared/cogneuro88'
fname = os.path.join(fbase,'fMRI/categories/s01_categories_01.nii.gz')
nii = nibabel.load(fname) 

# This object stores the information *about* the fMRI data stored in the file. 
# This meta-data can be accessed via attributes of the `nii` object.
print('nii.in_memory :', nii.in_memory)
print('nii.shape :', nii.shape)
print('voxel sizes :', nii.header.get_zooms())

There is also information stored about how the brain was oriented in space as it was scanned, but that is beyond the scope of what we will go into here. 

In [None]:
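# Retrieve actual data as an array
# (Note: `get_data()` is deprecated in newer nibabel releases, which provide `get_fdata()` instead)
data = nii.get_data().T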
print('nii.in_memory : ', nii.in_memory)
print('data shape : ', data.shape)

In [None]:
# Plot a few voxels
_ = plt.plot(data[:,22:, 45, 45])
_ = plt.xlabel('Time (TRs)')
_ = plt.ylabel('BOLD signal')

### HW Recap

In [None]:
# Plot an image of the first 10 time points for all voxels in one horizontal slice
slice_3d_array(data[:10,5], 0, 2, 5, slice_prefix='Time = {}', figsize=(10,4), vmin=0, vmax=1000)

### BREAKOUT SESSION
Make an image plot of all the voxels in one horizontal slice. 

1. Select all time points for one horizontal slice
2. Use `np.reshape` to make a 2D array that is (time x voxels) 
3. Use `plt.imshow` to show that 2D array
4. Make the plot pretty! (label axes, set an appropriate color scale, etc)
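
As a hint for step 2, here is a sketch of the reshape on a tiny synthetic array (not the fMRI data). The array sizes and the name `toy_slice` are made up for illustration.

```python
import numpy as np

n_time, n_y, n_x = 4, 3, 5
# Synthetic stand-in for one horizontal slice over time: (time, y, x)
toy_slice = np.arange(n_time * n_y * n_x).reshape(n_time, n_y, n_x)

# Collapse the two spatial dimensions into one "voxels" dimension;
# -1 asks numpy to work out that dimension's size (here 3 * 5 = 15)
time_by_voxels = toy_slice.reshape(n_time, -1)
print(time_by_voxels.shape)  # (4, 15)

# With the default (row-major) ordering, voxel (y, x) lands in column y * n_x + x
print(time_by_voxels[2, 1 * n_x + 3] == toy_slice[2, 1, 3])
```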

In [None]:
### STUDENT ANSWER


The plot you make should show clearly that there is more variability across voxels than there is across time (some voxels *always* show higher signal than others). This scaling issue complicates some kinds of visualization and analysis. To look at this issue a slightly different way, let's look at histograms of the timecourses for two voxels.

# Part 2: Data normalization

In [None]:
bins = np.linspace(250, 650, 31)
voxel_a = data[:, 5, 50, 43]
voxel_b = data[:, 5, 50, 53]
_ = plt.hist(voxel_a, bins, label='Voxel A')
_ = plt.hist(voxel_b, bins, label='Voxel B')
plt.legend()

## Survey 2!

In [None]:
def minmax_norm(data):
    """Normalize data to range of 0-1 by subtracting min, dividing by range"""
    data_norm = (data-data.min()) / (data.max()-data.min())
    return data_norm

In [None]:
bins01 = np.linspace(0,1,31)
voxel_a_n = minmax_norm(voxel_a)
voxel_b_n = minmax_norm(voxel_b)
_ = plt.hist(voxel_a_n, bins01, label='Voxel A')
_ = plt.hist(voxel_b_n, bins01, label='Voxel B')
plt.legend()

Looks sensible. However, a problem with this normalization method shows up when you have outlying values. What would happen if Voxel A, at one time point, had a (spuriously) very large value? Let's see!

In [None]:
# Aside: a cautionary tale. Assignment does not copy: `b = a` makes `b` a second name for the same array.
a = np.arange(10)
b = a
b[3] = 63
print(a)

In [None]:
# You can avoid changing a variable by using the copy package
import copy
a = np.arange(10)
b = copy.copy(a)
b[3] = 63
print(a)
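
If you are ever unsure whether two variables refer to the same underlying data, NumPy can tell you. The sketch below uses `np.shares_memory` and the array method `.copy()` (an alternative to `copy.copy` for arrays); the variable names are made up for this example.

```python
import numpy as np

a = np.arange(10)
alias = a               # a second name for the same array
view = a[2:5]           # basic slices are views onto the same memory
independent = a.copy()  # a separate array with its own memory

print(np.shares_memory(a, alias))        # True
print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False

view[0] = 99         # writes through to `a` (a[2] is now 99)
independent[0] = -1  # does not touch `a`
print(a)
```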

In [None]:
# Create a variable called voxel_a_wonky and add an outlier to it
voxel_a_wonky = copy.copy(voxel_a)
voxel_a_wonky[5] = voxel_a_wonky.max()*1.5

In [None]:
# See what that outlier does to histogram plots created after our first normalization method!
_ = plt.hist(minmax_norm(voxel_a_wonky), bins01, label='Voxel A (w/ outlier)')
_ = plt.hist(minmax_norm(voxel_b), bins01, label='Voxel B')
plt.legend()

This does not put the data from both voxels into a similar range, because the max value is not stable (it can change a lot depending on only one data point). In general, normalizing data means *centering* it (by subtracting off some value) and *scaling* it (by dividing by some value or, in nonlinear schemes, by applying some nonlinear operation). 
A more robust, stable way to normalize data is to subtract the *mean* of the data instead of the min, and to divide by the *standard deviation* instead of the range. 

The *standard deviation* is a measure of the variability of the data, derived from the whole data set (rather than the two points - the min and the max - that we used before.)

This process converts the data to *standard scores* or *z scores*. 

In [None]:
# You can compute the standard deviation of voxel A and voxel B using 
print(np.std(voxel_a), ',', np.std(voxel_b))
# or: 
print(voxel_a.std(), ',', voxel_b.std())

In [None]:
# We can look at the standard deviation of the data by slice
slice_3d_array(data.std(axis=0), 0, 5, 6)
#slice_3d_array(data.std(axis=0), 0, 5, 6, vmin=0, vmax=200, figsize=(6*2,5*2))
#_ = neurods.viz.slice_3d_array(data.std(axis=0), axis=0)

### Breakout session
Perform normalization of Voxel A and Voxel B by (1) subtracting off the mean (**`array.mean()`** or **`np.mean(array)`**) and (2) dividing by the standard deviation (**`array.std()`** or **`np.std(array)`**)

Then repeat the plots we created above (making histograms of the timecourses for Voxel A, with and without an outlier, and for Voxel B)

In [None]:
### STUDENT ANSWER


z normalization can be performed easily on multi-dimensional arrays using the `zscore` function in scipy.stats. 

In [None]:
from scipy.stats import zscore
zscore?

In [None]:
data_z = zscore(data, axis=0)
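
To make the `axis=0` argument concrete, here is a sketch of the same computation written out with plain NumPy on synthetic (time x voxels) data. By default, `scipy.stats.zscore` uses `ddof=0`, which matches the default of `np.std`. The array `toy` and its baselines and spreads are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic (time x voxels) data: each voxel has its own baseline and spread
toy = rng.normal(loc=[100.0, 500.0, 900.0], scale=[5.0, 20.0, 60.0], size=(120, 3))

# Center and scale each column (voxel) using statistics computed over time (axis 0)
toy_z = (toy - toy.mean(axis=0, keepdims=True)) / toy.std(axis=0, keepdims=True)

print(toy_z.mean(axis=0))  # approximately 0 for every voxel
print(toy_z.std(axis=0))   # approximately 1 for every voxel
```

After this step, every voxel's timecourse is on the same scale regardless of its original baseline.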

Now we can repeat a few of the plots we made above, to show that we now see more variation across time than across voxels.

In [None]:
# Plot time x voxels
plt.figure(figsize=(10,4))
# Reshape to (time, voxels); -1 lets numpy infer the number of voxels in the slice
plt.imshow(data_z[:, 5].reshape(data_z.shape[0], -1), aspect='auto', vmin=-3, vmax=3)
plt.xlabel('Voxels')
plt.ylabel('Time (TRs)')

In [None]:
# Plot 
slice_3d_array(data_z[:10,5], 0, 2, 5, slice_prefix='Time = {}', figsize=(10,4), vmin=-3, vmax=3)

# Part 3: Cortical surface extraction

fMRI studies often focus on the cerebral cortex (the outermost layer of the brain). Consequently, it is common to display the results of statistical analyses of fMRI data on inflated and flattened representations of the cerebral cortex. Such cortical surface maps provide a way to examine all cortical fMRI data at once, with the anatomical location of the functional data made clear. 

The cortical surface must be computationally extracted from high spatial resolution anatomical MRI scans, and often manually edited. (*NOTE: Manual editing to create a good cortical surface can take days or weeks of effort! This data is not free!*)

<img src="figures/MPRAGE.png" align='left' style="height: 200px;">

<img src="figures/MPRAGE_wcortex.png" align='left' style="height: 200px;">

<img src="figures/cortex_3views.png" align='left' style="height: 200px;">

Remember our histogram of values for the data, which shows a ton of voxels with zero values (from outside the brain).

In [None]:
bins = np.linspace(0,2000,31)
_ = plt.hist(data.flatten(), bins)
# xlabel?
# ylabel?

So: how can we extract the data that is only from the region of the scan that contains the brain? We could try to write down an index for each data point in the data that contains a brain voxel (e.g. [25, 33, 33], [25, 33, 34]), but you can see how such a list would get quite long (tens of thousands) and would be difficult to construct. 

One simple way to find data that is in or near the brain is to threshold the data to find only the voxels where the signal is greater than zero. 

In [None]:
# Here, consider only the first volume
brain_voxels = data[0] > 0
print(brain_voxels)

In [None]:
# What do each of these quantities mean?
print('dtype of `brain_voxels`: ', brain_voxels.dtype)
print('Sum of `brain_voxels`: ', brain_voxels.sum())
print('Mean of `brain_voxels`: ', brain_voxels.mean())
print('Shape of `brain_voxels`: ', brain_voxels.shape)
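
A tiny example makes these quantities easier to read. The sketch below thresholds a made-up 2 x 3 array: in arithmetic, `True` counts as 1 and `False` as 0, so the sum is the number of selected elements and the mean is the fraction selected.

```python
import numpy as np

toy = np.array([[0, 0, 3],
                [0, 7, 2]])
toy_mask = toy > 0
print(toy_mask.dtype)   # bool
print(toy_mask.sum())   # 3 elements are greater than zero
print(toy_mask.mean())  # 0.5, i.e. 3 of the 6 elements
print(toy_mask.shape)   # same shape as the thresholded array
```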

### Breakout session
Display the `brain_voxels` variable in some sensible way. What does the array LOOK like?

In [None]:
### STUDENT ANSWER


Now we have an array of True/False values (a boolean array). This array can be directly used to INDEX our data! 

In [None]:
# Logical indices are fun!
a = np.arange(10)
b = np.array([True, False, True, False, True, False, True, False, True, False])
a[b]

In [None]:
# This works in multiple dimensions, too!
a = np.arange(20).reshape(10,2)
print(a)

In [None]:
print(a[b,:])

In [None]:
# or even for brain data!
brain_data = data[:, brain_voxels]
print(brain_data.shape)
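
The same mask can also be used in the other direction: to put one value per selected voxel back into a 3D volume. Here is a sketch on small synthetic data; the array sizes and the names `toy_data`, `toy_mask`, and `volume` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic (time, z, y, x) data with some zeros in it
toy_data = rng.integers(0, 5, size=(6, 2, 3, 3)).astype(float)

toy_mask = toy_data[0] > 0        # 3D boolean mask from the first volume
masked = toy_data[:, toy_mask]    # 2D array: (time, n_selected_voxels)
time_mean = masked.mean(axis=0)   # one value per selected voxel

# Put the per-voxel values back into a 3D volume; unselected voxels stay NaN
volume = np.full(toy_mask.shape, np.nan)
volume[toy_mask] = time_mean
print(masked.shape, volume.shape)
```

Indexing with the mask and assigning with the mask visit the selected voxels in the same order, so each value lands back at its original location.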

### BREAKOUT SESSION
Make a histogram of `brain_data`. Z-score it, and plot it as an image.

In [None]:
### Student answer
plt.hist(brain_data.flatten(), bins)
plt.xlabel('Raw BOLD response')
plt.ylabel('Count (voxels x TRs)')
plt.figure()
plt.imshow(zscore(brain_data, axis=0), aspect='auto')
plt.xlabel("Voxels")
plt.ylabel("Time (TRs)")

# Onward to 3D data visualizations! 

We will use a python module called **`pycortex`** to show data in 3D on the brain. This module was developed here at UC Berkeley in the Gallant lab, mostly by James Gao, with help from Alex Huth, Mark Lescroart, and other lab members. The code is freely available online [here](https://github.com/gallantlab/pycortex), and a paper summarizing the code can be found [here](http://journal.frontiersin.org/article/10.3389/fninf.2015.00023/full). 

To map the functional data onto the cortex, pycortex requires at least two things:

1. The cortical surface of the subject. 
    * pycortex stores cortical surface files (and several other files) for each subject in a reliably structured directory of files. Because of this reliable directory structure, all we need to provide to the code is a subject ID string, and the code will be able to find and load the relevant cortical surface files. 
2. The functional-to-anatomical alignment of the data to that cortical surface
    * Alignment of functional data to anatomical data proceeds by an *affine transform*. How this transformation works is beyond the scope of this class, but you can look it up on [wikipedia](https://en.wikipedia.org/wiki/Affine_transformation) or in your favorite linear algebra textbook if you're curious. The practical upshot is that a 4x4 matrix of numbers is sufficient to store the 3 rotations (around the x, y, and z axes) and the 3 translations (in the x, y, and z dimensions) that will transform the functional data in space such that they are aligned with the anatomical data (with the cortical surface). In the pycortex code (and in other neuroimaging software), "transform" is abbreviated in variable names as `xfm`. Just as with the cortical surface, we only need to specify a name for a transform, and the code will know where to find the file that contains the affine transformation matrix. 
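
For the curious, here is a sketch of what a 4x4 transform does to a single point. The numbers are made up for illustration (a 90 degree rotation about the z axis plus a shift); this is not the actual transform stored for any subject.

```python
import numpy as np

theta = np.pi / 2  # rotate 90 degrees about the z axis
xfm = np.array([
    [np.cos(theta), -np.sin(theta), 0.0, 10.0],  # last column holds the translation
    [np.sin(theta),  np.cos(theta), 0.0,  0.0],
    [0.0,            0.0,           1.0,  5.0],
    [0.0,            0.0,           0.0,  1.0],
])

# A point is written in homogeneous coordinates: (x, y, z, 1)
point = np.array([1.0, 0.0, 0.0, 1.0])
moved = xfm @ point
print(np.round(moved, 3))  # approximately [10, 1, 5, 1]
```

The rotation turns (1, 0, 0) into (0, 1, 0), and the translation then shifts it by (10, 0, 5). The trailing 1 is what lets a single matrix multiplication apply both steps.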

In [None]:
import cortex

In [None]:
# (1) subject (specifies the cortical surface of the brain)
subject = 'S2' 
# (2) transform = functional-to-anatomical alignment
transform = 'S2_category_auto' 

In [None]:
# Create a volume
data_volume = cortex.Volume(data[0], subject, transform) 

In [None]:
# Show the volume in a 3D brain (ooooh)
cortex.webgl.show(data_volume)

In [None]:
cortex.quickflat.make_figure(data_volume)