# Overview
This assignment is a review for the final exam. It is not required, but you can earn up to 5 points of extra credit toward your Assignments grade by completing it and submitting it by NEXT Monday night (as you would a homework assignment). You do not necessarily need to complete everything in this notebook to receive all 5 points, but answering every question improves your odds of earning full extra credit. Again, this is NOT MANDATORY. It is intended to indicate what material may appear on the final exam (the question types will be similar to those on the midterm) and to give you a chance to improve your grade. We strongly recommend that you work through the problems and convince yourself that you have reached the right answers. Some solutions / examples *may* be posted next Tuesday (after the deadline to submit answers for extra credit). As always, you are welcome to come to office hours to discuss the questions.

In [None]:
# General imports
import os
import cortex
import numpy as np
import matplotlib.pyplot as plt
import neurods
from scipy.stats import zscore
from neurods import stats as nds
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'viridis'
plt.rcParams['image.aspect'] = 'auto'

# 1. Create artificial data
One way to demonstrate that you have understood an analysis procedure is to generate an artificial data set, and then analyze that data set as if it were real. This is a useful exercise, because the process of creating a fake data set is closely related to the process of making a statistical model of real data. Here, your task will be to fabricate an artificial data set and then analyze it. You will begin by creating a design matrix with arbitrarily chosen onset times for two conditions. You will then specify a weight for each condition, and generate data in which a specific subset of the voxels in the brain shows a difference between those conditions.

Once you have generated the data, you will analyze it to see whether (1) you can correctly recover the weights you made up for each condition, and (2) a contrast between the two conditions shows differences between weights in the locations where you specified there should be differences.

We will walk you through this process step by step. Each step will make use of concepts you have learned in the class.

In [None]:
# Use the data we have already worked with for the MNI brain as a template:
data_file = os.path.join(neurods.io.data_list['fmri'], 'word_picture', 's03.nii.gz')
data_real = neurods.io.load_fmri_data(data_file, mask=None)
print('Loaded data is of shape: ', data_real.shape)

The first step in this process is to create a data set that is of shape (time x voxels) that is simply Gaussian random noise. This represents our starting assumption, that if nothing is happening in the brain, the signals we measure will simply look like samples from a random Gaussian distribution. 

We will create this data as if it is the result of having measured the brain in our fake experiment - meaning, we will have to assume some size for the brain we have measured, and we will have to determine a number of time points that exist in our fake experiment. 

In [None]:
# Create a new data set using np.random.randn that has the same number of voxels as the experiment above,
# but has 100 time points. We will assume a 2 second TR.
data_fake = # ??

In [None]:
### STUDENT ANSWER
data_fake = np.random.randn(100, *data_real.shape[1:])
print(data_fake.shape)

Next, we will create an experimental design matrix ($X$) for our fake experiment, with two conditions (call them "finger tapping" and "rest"). We will dictate that the conditions were on for 10 seconds at a time. The two conditions should be on at different times, and there should be a gap of 4 seconds between conditions. Assume a 2-second TR.

Note that the time axis for $X$ should be the same as the time axis for `data_fake` ($Y$) above!
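
As a rough sketch of the timing arithmetic only (one of many valid designs), the block and gap durations can be converted from seconds to whole TRs before any indexing. The names ending in `_demo` and the fixed, non-randomized schedule are assumptions made for illustration; they are not part of the assignment.

```python
import numpy as np

TR_demo = 2                      # seconds per volume (from the text)
n_TRs_demo = 100                 # time points in the fake experiment
block_s, gap_s = 10, 4           # block and gap durations in seconds (from the text)

# Convert seconds to whole TRs; array slices need integer indices
block_TRs = block_s // TR_demo   # 5 TRs per block
gap_TRs = gap_s // TR_demo       # 2 TRs between blocks

# Hypothetical fixed schedule: a new block starts every block_TRs + gap_TRs samples
onsets_demo = np.arange(0, n_TRs_demo - block_TRs + 1, block_TRs + gap_TRs)

X_demo = np.zeros((n_TRs_demo, 2))
for ii, onset in enumerate(onsets_demo):
    X_demo[onset:onset + block_TRs, ii % 2] = 1   # alternate between the two conditions

# The two conditions should never be on at the same time
assert X_demo.sum(axis=1).max() == 1
print(X_demo.shape)
```

Because the time axis has `n_TRs_demo` rows, a design matrix built this way lines up row for row with a (time x voxels) data array of the same length.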

In [None]:
X = # ?? (This may take more than one line!)

In [None]:
### STUDENT ANSWER
# Make what we're doing clear with some explicit variables:
TR = 2
start = 4 # allow a little time at the onset
cond_length = 10
cond_gap = 4
n_TRs = 100
n_conds = 2
# Convert durations from seconds to whole TRs (slice indices must be integers)
cond_TRs = cond_length // TR
gap_TRs = cond_gap // TR
# Define onsets (in TRs) & randomize their order
onsets = np.arange(start, n_TRs, cond_TRs + gap_TRs)
onsets = np.random.permutation(onsets)
X = np.zeros((n_TRs, n_conds))
for ii, onset in enumerate(onsets):
    # Alternate conditions: even-numbered blocks -> condition 0, odd -> condition 1
    X[onset:onset + cond_TRs, ii % 2] = 1
print(X.shape)

# Display X!
_ = plt.imshow(X)
_ = plt.xlabel("Condition", fontsize=14)
_ = plt.ylabel("Time (TR)", fontsize=14)



# Convolve X with the hemodynamic response function
t, hrf = neurods.fmri.hrf(tr=2)
X_hrf = np.vstack([np.convolve(X[:,i], hrf)[:n_TRs] for i in range(2)]).T
print(X_hrf.shape)

# Display X_hrf!
plt.figure()
_ = plt.imshow(X_hrf)
_ = plt.xlabel("Condition", fontsize=14)
_ = plt.ylabel("Time (TR)", fontsize=14)



Next, we will generate some data that is not purely noise - data that reflects responses to our fake experimental conditions. We don't want to simulate responses over the *whole* brain (that would be unrealistic!), so we first need to choose a particular part of the brain to work with. Remember that this is what we use *masks* for!

Here, we provide a mask for a particular region in the brain.
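
A minimal sketch of what a boolean mask does, using a small made-up array rather than the real brain data. The array sizes and the names `data_4d` and `toy_mask` are hypothetical; the only assumption carried over from the notebook is that the mask is a boolean array with the same spatial shape as one volume.

```python
import numpy as np

rng = np.random.default_rng(0)
vol_shape = (4, 5, 6)                              # hypothetical (z, y, x) grid
data_4d = rng.standard_normal((10, *vol_shape))    # (time, z, y, x)

# Boolean mask with the same spatial shape as a single volume
toy_mask = np.zeros(vol_shape, dtype=bool)
toy_mask[1:3, 2:4, 0:2] = True                     # 2 * 2 * 2 = 8 voxels switched on

# Indexing the spatial axes with the mask flattens them to one "voxel" axis
data_2d = data_4d[:, toy_mask]                     # (time, n_selected_voxels)
n_selected = int(toy_mask.sum())                   # True counts as 1 when summed
print(data_2d.shape, n_selected)
```

The number of `True` entries in the mask always matches the size of the voxel axis after masking, which is a handy consistency check.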

In [None]:
# Download the file containing the mask & load it into memory
if not os.path.exists('hand_area_mask.npz'):
    mask_url = 'https://drive.google.com/file/d/0B_iniuUpMJoGZXdnUHktcGN5OWM/view?usp=sharing'
    neurods.io.download_file(mask_url, 'hand_area_mask.npz', root_destination='./')
roi_mask = np.load('hand_area_mask.npz')['mask']

In [None]:
# Show the mask! (there are at least two functions we have used in class that you can use to display this,
# one in pycortex, and one in neurods)

# (A) Show with neurods function (??)


# or (B) Show with pycortex function (??)
# Note that to show the mask in pycortex, you will need to specify a subject & transform (xfm)
# Since this subject has been transformed to the MNI brain template, which transform should you use? 
# sub, xfm = # ??



In [None]:
### STUDENT ANSWER
import cortex
# neurods
_ = neurods.viz.slice_3d_array(roi_mask, axis=0, cmap='viridis')
# pycortex
sub, xfm = 'MNI', 'atlas336'
# Note: the `np.float` alias was removed in NumPy 1.24; the builtin `float` is equivalent
V = cortex.Volume(roi_mask.astype(float), sub, xfm, vmin=0, vmax=1, cmap='viridis')
_ = cortex.quickflat.make_figure(V, with_curvature=True)

In [None]:
# How many voxels are in this mask? 
n_voxels = # ?

In [None]:
### STUDENT ANSWER
n_voxels = roi_mask.sum()

Next, we will use the design matrix (`X`) that we created above to create simulated responses ($Y$, which we will call `real_signal` below) for some voxels in the brain. We will define our data according to the regression equation:

$Y=X\beta + \epsilon$

$\beta$ specifies the weight for each condition for each voxel. (What size should it be?)

$\epsilon$ is simply random noise; you can use np.random.randn() again to create a value for $\epsilon$ (What size should it be? [hint: you should use the variable `n_voxels` defined above as one of the dimensions])
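
To make the shapes in $Y=X\beta + \epsilon$ concrete, here is a small sketch with made-up sizes. All names ending in `_toy`, the weights 3.0 and 1.0, and the random stand-in design matrix are assumptions for illustration, not the values you are expected to use.

```python
import numpy as np

rng = np.random.default_rng(1)
n_t, n_c, n_v = 100, 2, 8        # hypothetical: time points, conditions, voxels

# Stand-in design matrix: (time, conditions)
X_toy = rng.integers(0, 2, size=(n_t, n_c)).astype(float)

# One weight per condition per voxel: (conditions, voxels)
beta_toy = np.repeat(np.array([[3.0], [1.0]]), n_v, axis=1)

# One noise sample per time point per voxel: (time, voxels)
eps_toy = rng.standard_normal((n_t, n_v))

# (time, conditions) @ (conditions, voxels) -> (time, voxels)
Y_toy = X_toy @ beta_toy + eps_toy
print(X_toy.shape, beta_toy.shape, eps_toy.shape, Y_toy.shape)
```

The inner dimensions of the matrix product (the number of conditions) must match, and the noise must have the same shape as the product so the two can be added element by element.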

In [None]:
real_signal = # ?? (This may be more than one line!)

In [None]:
### STUDENT ANSWER
# There are many ways to create this matrix. You can also use np.tile.
B1, B2 = 3.4, 1.2
B = np.zeros((2, n_voxels))
B[0, :] = B1
B[1, :] = B2
# This next step is not strictly necessary, but it will add a little variability to the "real" weights 
# in the voxels of interest
voxel_wt_variance_std = 0.2
B += (np.random.randn(2, n_voxels) * voxel_wt_variance_std)
# If you don't add a little noise, your results will come out exactly as you hypothesize.
# That would be pretty boring, so let's make it more interesting. Varying epsilon here 
# will make the data (and the results of your analysis) more or less noisy
epsilon = np.random.randn(n_TRs, n_voxels) * 2.0
real_signal = X_hrf.dot(B) + epsilon
_ = plt.imshow(real_signal)
_ = plt.colorbar()

For the last step in creating the data, insert the `real_signal` into the `data_fake` variable in the locations specified by the mask!
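
The sketch below illustrates the two operations involved in this step on a tiny made-up array: assigning a (time x voxels) signal into masked locations, and z-scoring over time. The sizes and names (`data_toy`, `signal_toy`, `toy_mask`) are hypothetical. The z-score is written out in plain NumPy here; `scipy.stats.zscore` with `axis=0` and its default `ddof=0` computes the same quantity.

```python
import numpy as np

rng = np.random.default_rng(2)
data_toy = rng.standard_normal((50, 3, 3, 3))     # (time, z, y, x) background noise

toy_mask = np.zeros((3, 3, 3), dtype=bool)
toy_mask[0, 0, :2] = True                         # 2 "active" voxels

# Signal for the masked voxels, deliberately on a different scale than the noise
signal_toy = 5.0 + 3.0 * rng.standard_normal((50, 2))   # (time, n_masked_voxels)

# Boolean-mask assignment writes each signal column into one masked voxel
data_toy[:, toy_mask] = signal_toy

# Z-score over time (axis 0): every voxel ends up with mean 0 and std 1
scaled_toy = (data_toy - data_toy.mean(axis=0)) / data_toy.std(axis=0)
print(scaled_toy[:, toy_mask].mean(axis=0), scaled_toy[:, toy_mask].std(axis=0))
```

The assignment only works if the second dimension of the signal equals the number of `True` entries in the mask.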

In [None]:
# You should structure the cells above so that this line works!
data_fake[:, roi_mask] = real_signal
# Zscore `data_fake` over time to equate the scale of the data in all voxels
data_fake = zscore(data_fake, axis=0)

Now we analyze the data! Use X_hrf to estimate weights for each condition in `X`. 

Note that `data_fake` is not masked, so the neurods.stats.ols function will not work with the variables we've defined so far! You will have to mask the fake brain data (`data_fake`) from 4D to 2D. What is the appropriate mask to use? Consult Lecture 12...
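
For intuition about what an ordinary least squares fit returns, here is a sketch that uses `np.linalg.lstsq` as a stand-in. This is an assumption for illustration and is not necessarily how `neurods.stats.ols` is implemented; the toy sizes, noise level, and `_toy` names are also made up.

```python
import numpy as np

rng = np.random.default_rng(3)
n_t, n_v = 200, 5                                   # hypothetical time points and voxels

X_toy = rng.standard_normal((n_t, 2))               # stand-in for an HRF-convolved design
true_w = np.repeat(np.array([[3.4], [1.2]]), n_v, axis=1)   # (conditions, voxels)
Y_toy = X_toy @ true_w + 0.1 * rng.standard_normal((n_t, n_v))

# Least-squares solution of X w = Y, solved for all voxels at once
w_est, *_ = np.linalg.lstsq(X_toy, Y_toy, rcond=None)
print(w_est.shape)                                  # (conditions, voxels)
print(np.round(w_est[:, 0], 2))                     # estimates for the first voxel
```

With low noise the estimates land close to the weights used to generate the data. In the notebook itself, `data_fake` is z-scored before fitting, so the estimated weights will likely be on a rescaled axis: expect the relative pattern (condition 1 larger than condition 2 inside the ROI) rather than the exact values you chose.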

In [None]:
### STUDENT ANSWER
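# Note: passing the unmasked 4D `data_fake` straight to neurods.stats.ols won't work.
# We need to mask the data down to 2D (time x voxels) with the subject's brain mask first.
mask_file = os.path.join(neurods.io.data_list['fmri'], 'word_picture', 's03_mask.nii')
# `np.bool` was removed in NumPy 1.24; the builtin `bool` is equivalent
mask = neurods.io.load_fmri_data(mask_file, do_zscore=False, dtype=bool)
B_est = neurods.stats.ols(X_hrf, data_fake[:, mask])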

Plot the difference between condition 1 and condition 2 on the cortical surface! You should see a difference in the same part of the brain as the ROI mask that you displayed above.
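
A contrast is just a weighted difference of the estimated weights, computed per voxel. The sketch below shows that arithmetic, plus one way to put the per-voxel result back into a 3D volume for inspection. The array sizes and names (`w_toy`, `brain_mask_toy`, `vol_toy`) are hypothetical, and the unmasking step is an illustration only; in the notebook, pycortex handles this when a mask is supplied.

```python
import numpy as np

vol_shape = (2, 3, 4)                                # hypothetical (z, y, x) grid
brain_mask_toy = np.zeros(vol_shape, dtype=bool)
brain_mask_toy[:, 1:, :2] = True                     # 2 * 2 * 2 = 8 voxels in the mask

# Made-up estimated weights: (conditions, voxels)
w_toy = np.vstack([np.full(8, 2.0), np.full(8, 0.5)])

# Contrast vector for "condition 1 minus condition 2"
c = np.array([1.0, -1.0])
contrast_toy = c @ w_toy                             # (voxels,), same as w_toy[0] - w_toy[1]

# Put the masked values back into a full volume; voxels outside the mask stay NaN
vol_toy = np.full(vol_shape, np.nan)
vol_toy[brain_mask_toy] = contrast_toy
print(contrast_toy.shape, vol_toy.shape)
```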

In [None]:
### STUDENT ANSWER
dif = B_est[0]-B_est[1]
V_dif = cortex.Volume(dif, sub, xfm, vmin=-3, vmax=3, cmap='RdBu_r', mask=mask)
cortex.quickflat.make_figure(V_dif);

# 2. Create a different hypothesis than the one we used in lecture 12 to analyze the word/picture experiment. Analyze the experiment with your model. (2 points max)

You will have to define new groups (different from the ones we used in the breakout version of Lecture 12), and repeat each step of the analysis with those groups.

Compare the predictions of your model with the model we created in class. (It doesn't matter if your model makes less accurate predictions, as long as you have performed the analysis steps correctly). 

This analysis should follow all the steps from Lecture 12.

In [None]:
### STUDENT ANSWER


# 3. Write a function to analyze a whole experiment (2 points max)
There is currently a big push in many areas of science to increase the replicability of scientific work. One important way to do this is to share the code that is used to analyze scientific data.

We have analyzed several experiments in the lectures of this course (Category localizer, motor localizer [for the midterm], words & pictures, and EEG p300). Choose one of these experiments, and put together a single function that loads the relevant data, analyzes the data (for at least one of the analyses we did in class), and generates at least one figure showing the results of the analysis. 

Call your function `generate_figures()`. Give it one input argument - the path to the data for the experiment (hint: all the data for the class is in `/data/shared/cogneuro88/<experiment_folder>/`). Call the function to demonstrate that it works. Include a docstring that describes what the function does (in some detail!), and comment the code clearly so that someone seeking to replicate this experiment will understand it! 

In [None]:
### STUDENT ANSWER
