# Overview 

Today's class will have two parts: 

First, we will review the homework, and describe ways to limit the amount of memory used in loading large data sets. 

Second, we will describe the structure of the experiment that produced the data we have been analyzing, and we will compute averages of activity around the time of specific experimental events. 


# Goals
* Understand ways to reduce the amount of memory used when loading data
* Understand *masking* data with logical indices
* Estimate the average response to an experimental event

# Updating resources in your server home directory
(Run the cells in this section once, then restart your kernel, reload the web page, and skip this section the next time through!)

In [None]:
# Updating functions
import neurods
# Update neurods package
neurods.io.update_neurods()

# Memory management and masking
A big difficulty in last week's homework - and in data science in general - is how to deal with large data sets. 

In [None]:
# Load some necessary libraries
import matplotlib.pyplot as plt
import numpy as np
import nibabel
import neurods
import cortex
import os

In [None]:
# Set plotting defaults
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

In [None]:
# Set matplotlib defaults!
plt.rcParams['image.cmap'] = 'viridis'
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.origin'] = 'lower'
plt.rcParams['image.aspect'] = 'equal'

### Python digression: Floating point vs integer numbers

`numpy` stores numbers in several different formats: as boolean values (True or False), as integers (0, 1, 2...), or as floating-point numbers (2.3256..., 3.63212..., etc). Most programming languages that deal with images or numbers make the same distinction. Different formats use different amounts of memory. For data types that allow decimals (e.g. numpy's float32 and float64), the more bits used to store each number, the more precisely it is represented and the more memory the array takes up: a float64 takes 8 bytes per value, while a float32 takes 4.

Thus, converting from float64 to a less-precise format (np.float32) halves the memory used, which is worthwhile if precision is not critically important.

In [None]:
print(np.float64(np.pi))
print(np.float32(np.pi))

In [None]:
r64 = np.random.rand(30,100,100)
r32 = r64.astype(np.float32)
print('data type of `r64` is: ', r64.dtype)
print('data type of `r32` is: ', r32.dtype)
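To make the memory saving concrete, the sketch below compares the number of bytes used by a float64 array and its float32 copy via the `nbytes` attribute. The names `demo64` and `demo32` are illustrative (a fresh random array with the same shape as the one above), and the size of the rounding error depends on the values stored.

```python
import numpy as np

# Illustrative arrays with the same shape as `r64` / `r32` above
demo64 = np.random.rand(30, 100, 100)
demo32 = demo64.astype(np.float32)

# `nbytes` is the number of elements times the bytes per element
for name, arr in [('float64', demo64), ('float32', demo32)]:
    print('{}: {} elements x {} bytes = {:.1f} MB'.format(
        name, arr.size, arr.itemsize, arr.nbytes / 1e6))

# The price of the smaller format is a small rounding error
max_error = np.abs(demo64 - demo32).max()
print('Largest rounding error:', max_error)
```

The float32 copy uses exactly half the memory of the float64 original, at the cost of a rounding error that is tiny relative to values between 0 and 1.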

In [None]:
whos

### HW Recap

In [None]:
from scipy.stats import zscore

# An OK implementation of load_data:
def load_data_ok(files, do_zscore=False):
    """Load fMRI data from files and optionally z-normalize data"""
    # Start with no data; the first run loaded initializes the array
    data = None
    for f in files:
        nii = nibabel.load(f)
        if data is None:
            data = nii.get_data().T
            if do_zscore:
                data = zscore(data, axis=0)
        else:
            tmp = nii.get_data().T
            if do_zscore:
                tmp = zscore(tmp, axis=0)            
            data = np.vstack([data, tmp])
    return data

# A better implementation
def load_data_better(files, do_zscore=False):
    """Load fMRI data from files and optionally z-normalize data
    
    Parameters
    ----------
    files : list 
        List of file names (absolute paths)
    do_zscore : bool
        Flag that determines whether to zscore data in time or not
    
    Returns
    -------
    data : array
        fMRI data array, in (time, z, y, x) format
    """
    # Create a list to store data
    data = []
    # Loop over files in list
    for f in files:
        nii = nibabel.load(f)
        tmp = nii.get_data().T
        # Optionally zscore each run independently
        if do_zscore:
            tmp = zscore(tmp, axis=0)
        data.append(tmp)
    # Concatenate full data
    data = np.vstack(data)
    return data
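The difference between the two versions can be seen with small synthetic arrays standing in for runs (no files are loaded here; `fake_runs` is a made-up stand-in). Both approaches produce the same array, but stacking inside the loop re-copies everything loaded so far on every iteration, while collecting runs in a list copies each run only once at the end.

```python
import numpy as np

# Three made-up "runs", each with shape (time, z, y, x) = (4, 2, 3, 3)
fake_runs = [np.full((4, 2, 3, 3), i, dtype=np.float32) for i in range(3)]

# Approach 1: grow the array inside the loop (as in load_data_ok).
# Every call to vstack allocates a new array and copies all earlier runs.
grown = None
for run in fake_runs:
    if grown is None:
        grown = run
    else:
        grown = np.vstack([grown, run])

# Approach 2: collect runs in a list, stack once (as in load_data_better)
collected = []
for run in fake_runs:
    collected.append(run)
stacked = np.vstack(collected)

print(grown.shape, stacked.shape)
print('Same result:', np.array_equal(grown, stacked))
```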

In [None]:
### STUDENT ANSWER
neurods.io.data_list['fmri'] = '/Users/mark/Dropbox/data8/fMRI/'
files = ['s01_categories_{:02d}.nii.gz'.format(r) for r in [1, 2, 3]]
files = [os.path.join(neurods.io.data_list['fmri'], 'categories', f) for f in files]

In [None]:
### STUDENT ANSWER
data = load_data_better(files[:1], do_zscore=False)
print(data.shape)

In [None]:
### STUDENT ANSWER
# Add sequentially:
# (0) print statements! (is_verbose?)
# (1) isinstance(files, (list, tuple)):
# (2) *files
# (3) dtype=np.float32
# (4) load_data_lazy(exp, runs, **kwargs)
#  -> show neurods.io.data_list     
 

In [None]:
ls /Users/mark/Dropbox/data8/fMRI/motor/

In [None]:
### STUDENT ANSWER
### IF TIME: Option 2: recursive
def load_data(fname, do_zscore=False, dtype=np.float32):
    """Load fMRI data from nifti file(s), optionally with standardization"""
    # If given a list or tuple of files, load each one recursively and stack in time
    if isinstance(fname, (list, tuple)):
        return np.vstack([load_data(f, do_zscore=do_zscore, dtype=dtype) for f in fname])
    nii = nibabel.load(fname)
    data = nii.get_data().T
    # Convert to the requested dtype (float32 by default) to save memory
    data = data.astype(dtype)
    if do_zscore:
        data = zscore(data, axis=0)
    return data

# Final answer:
def load_data(*files, do_zscore=False, mask=None, dtype=np.float32):
    """Load fMRI data from files and optionally z-normalize data
    
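    Parameters
    ----------
    files : strings 
        absolute path names for files to be loaded
    do_zscore : bool
        Flag that determines whether to zscore data in time or not
    mask : boolean array or None
        Optional (z, y, x) mask; if given, only voxels where mask is True are kept
    dtype : numpy dtype
        Data type to which each run is converted (float32 by default, to save memory)
    
    Returns
    -------
    data : array
        fMRI data array, in (time, z, y, x) format, or (time, voxels) if `mask` is given
    """
    # Create a list to store data
    data = []
    # Loop over files in list
    for f in files:
        print("Loading {}...".format(f))
        nii = nibabel.load(f)
        tmp = nii.get_data().T.astype(dtype)
        # Optionally keep only the voxels selected by the mask
        if mask is not None:
            tmp = tmp[:, mask]
        # Optionally zscore each run independently
        if do_zscore:
            tmp = zscore(tmp, axis=0)
        data.append(tmp)
        del tmp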
    # Concatenate full data
    data = np.vstack(data)
    return data

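def load_data_lazy(*runs, exp='categories', **kwargs):
    """Load runs by number for a given experiment; extra keyword arguments are passed on to load_data"""
    if exp=='categories':
        files = [os.path.join(neurods.io.data_list['fmri'], exp, 's01_categories_%02d.nii.gz'%r) for r in runs]
    elif exp=='motor':
        files = [os.path.join(neurods.io.data_list['fmri'], exp, 's01_motorloc.nii.gz')]
    else:
        raise ValueError("Unknown experiment: {}".format(exp))
    return load_data(*files, **kwargs)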
# hrm = load_data_lazy(1,2,3, mask=cortical_voxels)
# hrm.shape
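The `*files` and `**kwargs` syntax used above can be tried out without any data. In this sketch, `describe_files` and `describe_runs` are hypothetical stand-ins for `load_data` and `load_data_lazy`; they only report what they would load. The file name pattern is copied from the cells above.

```python
def describe_files(*files, do_zscore=False, dtype='float32'):
    """Stand-in for load_data: report the arguments instead of loading"""
    # `files` arrives as a tuple holding all positional arguments
    return ['{} (zscore={}, dtype={})'.format(f, do_zscore, dtype) for f in files]

def describe_runs(*runs, **kwargs):
    """Stand-in for load_data_lazy: build file names, forward other options"""
    files = ['s01_categories_{:02d}.nii.gz'.format(r) for r in runs]
    # `*files` unpacks the list into separate positional arguments;
    # `**kwargs` forwards any keyword arguments unchanged
    return describe_files(*files, **kwargs)

report = describe_runs(1, 2, do_zscore=True)
for line in report:
    print(line)
```

Because the run numbers are collected by `*runs`, the caller can write `describe_runs(1, 2)` rather than `describe_runs([1, 2])`, and any option understood by the inner function can be passed through the outer one without listing it twice.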

# MOVE ME


It is common to display the results of statistical analyses of fMRI data on inflated and flattened representations of the cerebral cortex. Such cortical surface maps provide a way to examine all cortical fMRI data at once, with the anatomical location of the functional data made clear. 

The cortical surface must be computationally extracted from high spatial resolution anatomical MRI scans, and often must be manually edited. (*NOTE: Manual editing to create a good cortical surface can take days or weeks of effort! This data is not free!*)

<img src="figures/MPRAGE.png" align='left' style="height: 200px;">

<img src="figures/MPRAGE_wcortex.png" align='left' style="height: 200px;">

<img src="figures/cortex_3views.png" align='left' style="height: 200px;">

## Masking

As we have discussed, not all of the data in our 4D array is equally interesting to us. We are interested in the fMRI data collected IN the brain (vs outside it), and more specifically in the data collected from the cerebral cortex (the outermost layer of the brain). 

Here, we will show you how to extract (a) the data in the brain, and (b) the data in the cerebral cortex from the whole array. 

Remember our histogram of data values, which shows a ton of voxels with zero values (from outside the brain):

In [None]:
# Specify files
files = ['s01_categories_{:02d}.nii.gz'.format(r) for r in [1, 2, 3]]
files = [os.path.join(neurods.io.data_list['fmri'], 'categories', f) for f in files]

In [None]:
# Load data for only one file
data = load_data(files[0], do_zscore=False)

In [None]:
bins = np.linspace(0,2000,31)
_ = plt.hist(data.flatten(), bins)
plt.xlabel('Raw fMRI Activity')
plt.ylabel('TRs (count)')

So: how can we extract only the data from the region of the scan that contains the brain? We could try to write down the index of every voxel that lies in the brain (e.g. [25, 33, 33], [25, 33, 34]), but such a list would get quite long (tens of thousands of entries) and would be difficult to construct. 

One simple way to find data that is in or near the brain is to threshold the data to find only the voxels where the signal is greater than zero. 

In [None]:
# Here, consider only the first volume
brain_voxels = data[0] > 0
print(brain_voxels)

In [None]:
### STUDENT ANSWER
brain_voxels = data[0] > 250

In [None]:
# What is this thing we have just created?
print('dtype of `brain_voxels`: ', brain_voxels.dtype)
print('Sum of `brain_voxels`: ', brain_voxels.sum())
print('Mean of `brain_voxels`: ', brain_voxels.mean())
print('Shape of `brain_voxels`: ', brain_voxels.shape)

### Breakout session
1. Discuss what each of the values above indicate about the `brain_voxels` array.
2. What happens if you change the cell above to be `brain_voxels = data[0] > X`, where X is greater than zero? (What should the threshold [X] for selecting brain voxels be?)
3. While playing with the threshold value, display the `brain_voxels` variable in some sensible way. What does the array LOOK like for different thresholds (values of X)?

In [None]:
### STUDENT ANSWER
_ = neurods.viz.slice_3d_array(brain_voxels, axis=0, vmin=0, vmax=1)

Now we have an array of True/False values (a boolean array). This array can be directly used to INDEX our data! 

In [None]:
# Logical indices are fun!
a = np.arange(10)
idx = np.array([True, False, True, False, True, False, True, False, True, False])
a[idx]

In [None]:
# This works in multiple dimensions, too!
a = np.arange(20).reshape(2,10)
print(a)

In [None]:
print(a[:,idx])

In [None]:
# or even for brain data!
brain_data = data[:, brain_voxels]
print(brain_data.shape)
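The shape printed above follows a simple rule: indexing a (time, z, y, x) array with a (z, y, x) boolean mask collapses the three spatial dimensions into one, whose length is the number of True values in the mask. A small sketch with made-up data (`toy_data` and `toy_mask` are illustrative names, not part of the course data):

```python
import numpy as np

# Made-up "scan": 5 time points, each a (2, 3, 4) volume
toy_data = np.arange(5 * 2 * 3 * 4).reshape(5, 2, 3, 4)

# Threshold the first volume to get a (2, 3, 4) boolean mask
toy_mask = toy_data[0] > 10

# Index all time points with the same spatial mask
toy_masked = toy_data[:, toy_mask]

print('Voxels selected by mask:', toy_mask.sum())
print('Shape before masking:', toy_data.shape)
print('Shape after masking: ', toy_masked.shape)
```

Each row of `toy_masked` holds the selected voxels for one time point, in the same order for every row.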

### BREAKOUT SESSION
Make a histogram of `brain_data`. Z-score it, and plot it as an image.

In [None]:
### Student answer
plt.hist(brain_data.flatten(), bins)
plt.xlabel('Raw BOLD response')
plt.ylabel('TRs (count)')
plt.figure()
plt.imshow(zscore(brain_data, axis=0), aspect='auto')
plt.xlabel("Voxels")
plt.ylabel("Time (TRs)")

In [None]:
# Undo masking action?
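One possible way to undo the masking (a sketch with made-up data, not necessarily how `neurods` or pycortex handle it): create an empty array with the full (time, z, y, x) shape and use the same boolean mask on the left-hand side of an assignment to put the masked values back in place. Voxels outside the mask are filled with NaN here, which is an arbitrary choice; zeros would also work.

```python
import numpy as np

np.random.seed(0)

# Made-up data and mask (illustrative names)
toy_data = np.random.rand(5, 2, 3, 4).astype(np.float32)
toy_mask = toy_data[0] > 0.5
toy_masked = toy_data[:, toy_mask]          # shape (time, voxels)

# Start from an array of NaNs with the original 4D shape...
toy_unmasked = np.full(toy_data.shape, np.nan, dtype=toy_masked.dtype)
# ...and write the masked values back to their original locations
toy_unmasked[:, toy_mask] = toy_masked

print(toy_unmasked.shape)
print('Values inside mask restored:',
      np.array_equal(toy_unmasked[:, toy_mask], toy_data[:, toy_mask]))
```

The unmasked array has the original shape again, so it can be sliced and displayed like any other volume, but the values outside the mask are lost.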

# Masking with pycortex

In [None]:
# Just like specifying a volume, pycortex needs a subject and a transform to retrieve a mask
# for a particular data set.
cortical_voxels = cortex.db.get_mask('s01', 'catloc', type='cortical')

In [None]:
# Display the same information as the brain mask above
print(cortical_voxels.dtype)
print(cortical_voxels.shape)
print(cortical_voxels.sum())
print(cortical_voxels.mean())

In [None]:
# Plot 
fig1 = plt.figure(figsize=(6,5))
_ = neurods.viz.slice_3d_array(cortical_voxels, axis=0, fig=fig1)

In [None]:
### TEACHER INFO
fig2 = plt.figure(figsize=(10,3))
_ = neurods.viz.slice_3d_array(cortical_voxels, axis=1, fig=fig2)

In [None]:
cortical_data = data[:, cortical_voxels]
print(cortical_data.shape)

In [None]:
whos

# Event-related averages