# Overview

## Lab 4: Writing functions, Data normalization

In class, we talked about writing reusable functions. Specifically, we wrote a plot function that was eventually incorporated into the visualization sub-module of `neurods` (`neurods.viz`). We also talked about data normalization, and introduced pycortex, our library for visualizing brain data on the cortical surface. For the first homework question, we will write a function to load data from NIfTI (.nii.gz) files; for the second homework question, we will use pycortex to show some data. First, a quick review of how to load data and plot it in pycortex:

In [None]:
# Imports
import os
import cortex
import neurods
import nibabel
import numpy as np

In [None]:
%matplotlib inline

In [None]:
if False: # Skip example cell to save memory
    # Load data (as we did in class)
    base_dir = '/data/shared/cogneuro88/fMRI/'
    experiment = 'categories'
    run = 's01_categories_01.nii.gz'
    fpath = os.path.join(base_dir, experiment, run)
    nii = nibabel.load(fpath)
    data = nii.get_data() # Note that data was not transposed in this example cell...
    print('Data shape:') # ... and that this was pointed out in the next two lines!
    print('(X, Y, Z, Time)') 
    print(data.shape)

In [None]:
if False: # Skip example cell
    # Show data in pycortex
    sub = 's01' # Specifies subject (i.e., which surface in the pycortex database to use)
    xfm = 'catloc' # Specifies transform from functional data space to the cortical surface
    cmap = 'Reds' # Pick a good colormap here. `Reds` will do...
    data_volume = cortex.Volume(data.T[0], sub, xfm, cmap=cmap, vmin=0, vmax=1500)
    _ = cortex.quickflat.make_figure(data_volume)

# `1.` (6 points) Write a function to load data
There are a few things we will want to do to the data as we load it (for example, maybe normalize it, and maybe concatenate a few different files' worth of data together into one big array). The intent is that this will be the function we use to load data for the rest of the semester! (Something like this will be incorporated into `neurods`.)
* [1 pt] Name the function `load_data()`. It should take a file name as input and return an array as output. 
* [1 pt] It should have a docstring!
* [2 pts] It should *optionally* normalize the data by z-scoring it (i.e., whether or not the data returned by the function is z-normalized should be determined by an input argument). 
* [2 pts] If the function is given multiple files as input, it should concatenate all the files together in the time dimension and return a single array. It is up to you to determine how you should go about providing multiple files as input! (Note that the data directory (`/data/shared/cogneuro88/fMRI/categories/`) contains three runs of the same experiment: 
`s01_categories_01.nii.gz`, `s01_categories_02.nii.gz` and `s01_categories_03.nii.gz`. Make sure your code can (optionally) load and concatenate all three of these files!)
   
Hints: use `if` statements, and look up `np.vstack` and `np.concatenate`. You should think carefully about the order in which you perform normalization and concatenation (or concatenation and normalization?) of the data. Also remember that you should include some kind of demonstration that your code works as intended!
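
The question of ordering can be explored on synthetic data, without touching the files on disk. The sketch below assumes two made-up runs with different baselines; the array sizes, the baseline values, and the helper `zscore_time` are hypothetical and not part of `neurods`. It compares z-scoring each run before concatenation with z-scoring after concatenation.

```python
import numpy as np

def zscore_time(x):
    """Z-score along axis 0 (time). Hypothetical helper for illustration."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

rng = np.random.default_rng(0)
# Two synthetic "runs" of shape (time, voxels) with different baselines
run1 = rng.normal(loc=1000, scale=10, size=(50, 4))
run2 = rng.normal(loc=1200, scale=10, size=(50, 4))

# Option A: z-score each run, then concatenate in time
per_run = np.concatenate([zscore_time(run1), zscore_time(run2)], axis=0)
# Option B: concatenate in time, then z-score everything together
pooled = zscore_time(np.concatenate([run1, run2], axis=0))

# Mean over the time points of the first run under each option
mean_a = per_run[:50].mean(axis=0)
mean_b = pooled[:50].mean(axis=0)
print(mean_a)
print(mean_b)
```

Under option A each run is centered on its own mean, so the first run averages to zero. Under option B the baseline difference between runs survives as a step in the time series, so the first run sits well below zero.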

In [None]:
### STUDENT ANSWER

# An OK implementation of load_data:
from scipy.stats import zscore
def load_data_ok(files, do_zscore=False):
    """Load fMRI data from files and optionally z-normalize data"""
    # Start with no data; the first file loaded initializes the array
    data = None
    for f in files:
        nii = nibabel.load(f)
        if data is None:
            data = nii.get_data().T
            if do_zscore:
                data = zscore(data, axis=0)
        else:
            tmp = nii.get_data().T
            if do_zscore:
                tmp = zscore(tmp, axis=0)            
            data = np.vstack([data, tmp])
    return data

# A better implementation
def load_data_better(files, do_zscore=False):
    """Load fMRI data from files and optionally z-normalize data
    
    Parameters
    ----------
    files : list 
        List of file names (absolute paths)
    do_zscore : bool
        Flag that determines whether to zscore data in time or not
    
    Returns
    -------
    data : array
        fMRI data array, in (time, z, y, x) format
    """
    # Create a list to store data
    data = []
    # Loop over files in list
    for f in files:
        nii = nibabel.load(f)
        tmp = nii.get_data().T
        # Optionally zscore each run independently
        if do_zscore:
            tmp = zscore(tmp, axis=0)
        data.append(tmp)
    # Concatenate full data
    data = np.vstack(data)
    return data

# A nicer syntax
def load_data(*files, do_zscore=False, dtype=np.float32):
    """Load fMRI data from files and optionally z-normalize data
    
    Parameters
    ----------
    files : strings
        Absolute path names for files to be loaded (one or more)
    do_zscore : bool
        Flag that determines whether to zscore each run in time or not
    dtype : numpy data type
        Data type to which to convert the loaded data

    Returns
    -------
    data : array
        fMRI data array, in (time, z, y, x) format
    """
    # Create a list to store data
    data = []
    # Loop over files in list
    for f in files:
        print("Loading {}...".format(f))
        nii = nibabel.load(f)
        tmp = nii.get_data().T.astype(dtype)
        # Optionally zscore each run independently
        if do_zscore:
            tmp = zscore(tmp, axis=0)
        data.append(tmp)
        del tmp
    # Concatenate full data
    data = np.vstack(data)
    return data

# The extra lazy way to load data (here as an example, not used below)
def load_data_lazy(*runs, exp='categories', **kwargs):
    """Efficient wrapper for load_data
    
    Loads data for a given experiment, after specifying only run number
    
    Parameters
    ----------
    runs : integers {1,2,3}
        Run number to load for a given experiment
    exp : string
        Experiment name
    kwargs : keyword arguments
        (passed to load_data)
    
    Returns
    -------
    data : array
        fMRI data array, in (time, z, y, x) format
    
    """
    fmri_dir = neurods.io.data_list['fmri']
    if exp == 'categories':
        files = [os.path.join(fmri_dir, exp, 's01_categories_%02d.nii.gz' % r) for r in runs]
    elif exp == 'motor':
        files = [os.path.join(fmri_dir, exp, 's01_motorloc.nii.gz')]
    else:
        # Fail clearly instead of hitting an undefined `files` below
        raise ValueError("Unknown experiment: {}".format(exp))
    return load_data(*files, **kwargs)

# See also neurods.io.load_fmri_data()

In [None]:
# Demonstration that the load function works well
# Note: this cell loads up to three runs at a time and uses a lot of memory,
# which could cause errors in subsequent cells; each array is deleted after use.
# Load one to three files
files = ['s01_categories_{:02d}.nii.gz'.format(r) for r in [1, 2, 3]]
files = [os.path.join(neurods.io.data_list['fmri'], 'categories', f) for f in files]    
for n in range(1, 4):
    data = load_data(*files[:n], do_zscore=True)
    print("Loaded {} files, data shape is:".format(n), data.shape)
    print("max={:0.3f}, min={:0.3f}".format(np.nanmax(data), np.nanmin(data)))
    print("")
    del data
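
The demonstration above reports `np.nanmax` and `np.nanmin`, which ignore NaN values. One way NaNs can appear after z-scoring is a voxel whose value never changes over time, so that its standard deviation is zero. The sketch below is a small synthetic check of this, not run on the course data; the array and the plain-NumPy z-score are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic (time, voxels) data; voxel 0 is constant (always zero)
data = rng.normal(size=(20, 3))
data[:, 0] = 0.0

# Plain-NumPy z-score over time; 0/0 for the constant voxel gives NaN
with np.errstate(invalid='ignore', divide='ignore'):
    z = (data - data.mean(axis=0)) / data.std(axis=0)

print(np.isnan(z).all(axis=0))       # which voxels are entirely NaN
print(z[:, 1:].mean(axis=0))         # close to 0 for the valid voxels
print(z[:, 1:].std(axis=0))          # close to 1 for the valid voxels
```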

In the second part of the homework, we will use pycortex to explore the three different scans in the data directory. 

# `2.` (4 points) Data visualization with pycortex
* [2 pts] Make a plot of the standard deviation of each voxel over time for data from two different experiments. Use the first run of the data we have been working with so far (`'/data/shared/cogneuro88/fMRI/categories/s01_categories_01.nii.gz'`), and the data located at `'/data/shared/cogneuro88/fMRI/motor/s01_motorloc.nii.gz'`. First compute the standard deviation over time. The result will be a 3D volume, with one value per voxel, for each data set. Then use pycortex to map this volume to the cortical surface for s01 for each data set. Don't worry about what was going on in each of these two experiments (we will discuss that next week); just know that the stimulus and the subject's behavior were very different for each of the two experiments.
* [2 pts] Describe what you observe. What does this tell you about the brain / the data?
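
As a shape check for the first part: with data stored as (time, z, y, x), taking the standard deviation along axis 0 leaves one value per voxel. A minimal sketch on a small random array (the dimensions and values are made up and much smaller than a real scan):

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic 4D data in (time, z, y, x) order
data = rng.normal(loc=1000, scale=20, size=(30, 4, 5, 6))

# Standard deviation over time: one value per voxel
std_vol = data.std(axis=0)
print(data.shape)
print(std_vol.shape)
```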

**Hint 1**: Mind your color scales! In pycortex, the color scale is set when a Volume object is created, using `vmin` and `vmax`, and the colormap is set using `cmap`:

`data_volume = cortex.Volume(data, sub, xfm, vmin=blah, vmax=deblah, cmap=whatever)`

`cmap` should be a string; the options for colormap names are shown in the drop-down menu of the webgl viewer. 

**Hint 2**: You can also use `cortex.webgl.show(data_volume)` to visualize data on a 3D view of the surface. This may help you answer part 2 of this question.
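
One possible way to pick `vmin` and `vmax` is to look at the distribution of values before plotting. The sketch below assumes a synthetic map with a few extreme voxels and uses the 1st and 99th percentiles as limits; the percentile choice and the synthetic values are only illustrations, not a rule from class. The resulting numbers could then be passed as `vmin` and `vmax` when creating the `cortex.Volume`.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic "standard deviation map": mostly small values, a few large outliers
std_map = np.abs(rng.normal(loc=10, scale=3, size=(10, 10, 10)))
std_map[0, 0, :3] = 200.0  # a handful of extreme voxels

# Percentile-based limits are less sensitive to outliers than min/max
vmin, vmax = np.percentile(std_map, [1, 99])
print("max of map:", std_map.max())
print("vmin={:.1f}, vmax={:.1f}".format(vmin, vmax))
```

With limits taken from the full range instead, the few extreme voxels would stretch the color scale and most of the map would appear as a single color.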

In [None]:
### STUDENT ANSWER
# Note that neurods.io.data_list is a dictionary containing useful paths 
# that we will use repeatedly in class.
cat_exp_data = os.path.join(neurods.io.data_list['fmri'], 'categories', 's01_categories_01.nii.gz')
motor_exp_data = os.path.join(neurods.io.data_list['fmri'], 'motor', 's01_motorloc.nii.gz')
data_cat = load_data(cat_exp_data, do_zscore=False)
# Demonstrate what we've done
print(data_cat.shape)
# Compute STD
std_cat = data_cat.std(0) # STD along TIME axis (first, since data was transposed)
data_mot = load_data(motor_exp_data, do_zscore=False)
# Demonstrate what we've done
print(data_mot.shape)
# Compute STD
std_mot = data_mot.std(0) # STD along TIME axis (first, since data was transposed)

In [None]:
import matplotlib.pyplot as plt

In [None]:
sub, xfm = 's01', 'catloc'
# NOTE that vmax should not be 1500, as it was above; that is appropriate 
# for the raw data, but not for the standard deviation, which only goes up 
# to ~200, and then only for a few voxels. 
std_cat_vol = cortex.Volume(std_cat, sub, xfm, vmin=0, vmax=30, cmap='viridis')
std_mot_vol = cortex.Volume(std_mot, sub, xfm, vmin=0, vmax=30, cmap='viridis') 
_ = cortex.quickflat.make_figure(std_cat_vol)
# For clarity, not necessary for full points:
_ = plt.title('Standard deviation of visual category experiment', fontsize=24)
_ = cortex.quickflat.make_figure(std_mot_vol)
# For clarity, not necessary for full points:
_ = plt.title('Standard deviation of motor experiment', fontsize=24) 

In [None]:
# To see what the above maps look like in a 3D brain
cortex.webgl.show(std_cat_vol)
#cortex.webgl.show(std_mot_vol)

# Discussion

**Answer**

(Much leniency was given here; if you said anything remotely sensible, you got the points). 

The maps of standard deviations over time for both experiments look somewhat similar. Both experiments have voxels with high standard deviations (high variability) in the frontal lobe and in the occipital lobe, near the very back of the brain (the occipital pole). High variability could be due to responses elicited in the experiment, or to noise in the signal that was measured. Since these experiments are very different, it is unlikely that the experiments would elicit large responses in the same locations in the brain. Thus, the regions that are highly variable in both experiments are likely to have high measurement noise. 
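
One way to put a number on "the two maps look similar" is to correlate them voxel by voxel, or to count how many of the most variable voxels are shared. The sketch below uses two synthetic maps built from a common component plus an experiment-specific part; the arrays, the 90th-percentile threshold, and the variable names are assumptions for illustration, not results from the course data.

```python
import numpy as np

rng = np.random.default_rng(4)
n_voxels = 1000
# Shared component (standing in for measurement noise) plus a map-specific part
shared = rng.gamma(shape=2.0, scale=5.0, size=n_voxels)
std_a = shared + rng.gamma(shape=2.0, scale=1.0, size=n_voxels)
std_b = shared + rng.gamma(shape=2.0, scale=1.0, size=n_voxels)

# Voxel-wise correlation between the two maps
r = np.corrcoef(std_a, std_b)[0, 1]

# Fraction of the top-10% voxels in map A that are also top-10% in map B
top_a = std_a > np.percentile(std_a, 90)
top_b = std_b > np.percentile(std_b, 90)
overlap = (top_a & top_b).sum() / top_a.sum()
print("correlation:", round(r, 2))
print("overlap of top 10%:", round(overlap, 2))
```

With real data, `std_a` and `std_b` would be the flattened standard deviation volumes (for example `std_cat.ravel()` and `std_mot.ravel()`), assuming both volumes have the same shape.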