# Contrast computation and testing

In this homework, we will use the same dataset we used in the lecture. We will contrast additional conditions from that experiment.

In [None]:
# Imports
import neurods as nds
import numpy as np
import os
import matplotlib.pyplot as plt
# Configure defaults for plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.aspect'] = 'auto'
plt.rcParams['image.cmap'] = 'viridis'
%matplotlib inline

In [None]:
# The ordinary least squares solution for estimating model weights
from numpy.linalg import inv
def OLS(X,Y):
    return np.dot(inv(np.dot(X.T,X)),np.dot(X.T,Y))

### Modeling voxel responses

Remember, we are using regression because we want to model different voxel responses to a set of conditions. 

We do the following steps:
 - Create a design matrix that describes the conditions (or characteristics) of our stimulus time course. (In the lectures we usually provide you with the design matrix. However, when you do an experiment, this is something you have to create in order to draw conclusions. You basically need to specify what the participant observes at each point in time.)
 - Convolve this design matrix with the hemodynamic response function to account for the fMRI BOLD response. 
 - Use linear regression to find out which conditions (e.g. faces, objects, places) drive a voxel's response. (*Side note: This assumes that each voxel's response is a linear combination of the conditions (the convolved time courses of the stimuli); our goal is to recover the parameters (or weights) of this linear combination.*)
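
The first step, building a design matrix, can be sketched on a toy example. The condition names and per-time-point labels below are made up for illustration and are not the ones from this experiment.

```python
import numpy as np

# Hypothetical condition names and per-TR labels (not the real experiment)
toy_conditions = ['faces', 'places']
toy_labels = ['rest', 'faces', 'faces', 'rest', 'places', 'places', 'rest']

# One row per time point, one column per condition:
# 1 where that condition is on screen, 0 otherwise
toy_design = np.array([[float(label == cond) for cond in toy_conditions]
                       for label in toy_labels])
print(toy_design.shape)
```

Each column of `toy_design` is the on/off time course of one condition, which is the form the convolution and regression steps expect.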

**Load the design matrix: stimulus time course**

In [None]:
basedir = os.path.join(nds.io.data_list['fmri'],'categories')
# Load stimulus design matrix
design = np.load(os.path.join(basedir,'experiment_design.npz'))
conditions = design['conditions'].tolist()
print('Conditions: ', conditions)
design_run1 = design['run1']

**Load two runs of fMRI response (run1 and run2) and two runs (run1 and run2) of the stimulus time course:**

When we load two (or more) separate runs of fMRI responses, we normalize (z-score) every run separately. This is in addition to the within-voxel normalization done by the *standardize=True* argument of the *nds.fmri.load_data* function.

Hence, we apply the *zscore* function to each run of fMRI data right after loading it.
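
As a small illustration of per-run normalization, the sketch below z-scores two made-up runs separately before stacking them. The arrays are random stand-ins with arbitrary baselines and scales, not real fMRI data.

```python
import numpy as np
from scipy.stats import zscore

rng = np.random.RandomState(0)
# Two hypothetical runs (time x voxels) with different baselines and scales
toy_run1 = 5.0 + 1.0 * rng.randn(50, 3)
toy_run2 = -2.0 + 4.0 * rng.randn(50, 3)

# z-score each run along time (axis=0 is the default), then stack the runs
toy_Y = np.vstack([zscore(run) for run in (toy_run1, toy_run2)])

# Within each run, every voxel now has mean ~0 and standard deviation ~1
print(toy_Y[:50].mean(axis=0), toy_Y[50:].std(axis=0))
```

Normalizing each run on its own removes run-to-run differences in baseline and scale, so they are not mistaken for stimulus-driven effects once the runs are concatenated.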

In [None]:
from scipy.stats import zscore
import cortex
sub, xfm = 'S2', 'S2_category_auto'
mask = cortex.db.get_mask(sub, xfm, type='cortical')
fname = os.path.join(basedir, 'S2_categories1_{n}.nii.gz')  # {n} is filled in with the run number

# Load fmri responses:
Y = np.vstack([zscore(nds.fmri.load_data(fname.format(n=n), mask=mask, standardize=True))
               for n in [1,2]])

# stimuli:
X = np.vstack([design[run] for run in ['run1','run2']])

plt.figure(figsize=(10,4))
plt.imshow(Y)
plt.title('Voxel responses')

plt.figure(figsize=(10,4))
for i, (cond, label) in enumerate(zip(X.T, conditions)):
    plt.plot(cond+i+0.2*i, label=label, lw=2)
plt.title('Condition labels')
_ = plt.legend(frameon=False, bbox_to_anchor=(1.4, 1))

**Convolve the design matrix with the hemodynamic response function to account for the fMRI BOLD response:**

We learned in the previous lectures that we can use the *np.convolve* function to do this.
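
To see what the convolution does, here is a minimal sketch with a made-up 4-sample response shape standing in for a real HRF and a stimulus that is on for a single time point.

```python
import numpy as np

# Hypothetical 4-sample response shape, standing in for a real HRF
toy_hrf = np.array([0.0, 0.5, 1.0, 0.5])
# Stimulus that is on for a single time point (index 1)
toy_stim = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0])

# The full convolution has len(stim) + len(hrf) - 1 samples;
# keep only the first n so the result lines up with the data
n_timepoints = len(toy_stim)
toy_conv = np.convolve(toy_stim, toy_hrf)[:n_timepoints]
print(toy_conv)  # values: 0, 0, 0.5, 1, 0.5, 0
```

The output is a copy of the response shape that starts at the stimulus onset, which is why the predicted BOLD signal peaks a few samples after each stimulus.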

In [None]:
from neurods.fmri import hrf as generate_hrf
t_hrf, hrf_1 = generate_hrf(tr=2)
n, d = X.shape

# Use a float array so convolved values are not truncated if X holds integers
conv_X = np.zeros_like(X, dtype=float)
for i in range(d):
    conv_X[:,i] = np.convolve(X[:,i], hrf_1)[:n]
    
for i, (cond, label) in enumerate(zip(conv_X.T, conditions)):
    plt.plot(cond+i+0.2*i, label=label, lw=2)
    
plt.title('Convolved condition time courses')
_ = plt.legend(frameon=False, bbox_to_anchor=(1.4, 1))

**Perform linear regression and estimate model weights:**

We want to find the response of all the voxels in the brain to these 5 different conditions (body, faces, object, places, scrambled). 

Instead of a one-dimensional output $Y$, we have a multi-dimensional output ${\bf Y}$: a matrix with one row per time point and one column per voxel.
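
The following sketch shows that the OLS formula handles all voxels at once when the output is a matrix. The sizes and data are synthetic and chosen only for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
n_time, n_cond, n_vox = 200, 5, 4   # hypothetical sizes

toy_X = rng.randn(n_time, n_cond)          # stand-in for the design matrix
true_weights = rng.randn(n_cond, n_vox)    # one column of weights per voxel
toy_Y = toy_X.dot(true_weights) + 0.1 * rng.randn(n_time, n_vox)

# Same formula as the OLS function: (X^T X)^{-1} X^T Y, for all voxels at once
toy_weights = np.linalg.inv(toy_X.T.dot(toy_X)).dot(toy_X.T.dot(toy_Y))
print(toy_weights.shape)  # (5, 4): conditions x voxels
```

Each column of `toy_weights` holds the estimated condition weights for one voxel, so no loop over voxels is needed.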

In [None]:
print("The size of the fMRI responses: {}".format(Y.shape))
print("The size of the convolved design matrix: {}".format(conv_X.shape))

In [None]:
weights = OLS(conv_X, Y)
print('The size of the estimated weights is {}'.format(weights.shape))


# Hypothesis Testing

In order to draw conclusions (or make inferences) about which conditions are represented in the brain data, we first need to estimate an appropriate statistic from the data. Here we will perform a t-test.



### Contrasting conditions
 
In class, we used the following vector to contrast faces - places: **contrast: [0, 1, 0, -1, 0]**.

Now, create a new vector that would test whether a voxel is more responsive to **bodies** than to **faces**.

Then, like we did in class:
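- compute the mean squared error (MSE) of the model residuals
- use the t-stat function from class to compute, for each voxel $v$:
$$ t_v = \frac{c^T \hat\beta_v}{\sqrt{\hat{\sigma}_v^2 c^T (X^T X)^{-1}c}}$$
where $c$ is the contrast vector, $\hat\beta_v$ are the estimated weights of voxel $v$, $\hat{\sigma}_v^2$ is its residual variance (estimated from the MSE), and $X$ is the convolved design matrix.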
- make a flatmap of the t-statistics.
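
As a reminder of how the formula maps onto code, here is a minimal sketch on synthetic data. The function name `toy_t_stat` is hypothetical and may differ from the t-stat function used in class. In particular, it assumes the residual variance is the sum of squared residuals divided by the degrees of freedom (time points minus conditions), and it uses a generic three-condition contrast rather than the one asked for above.

```python
import numpy as np

def toy_t_stat(X, Y, c):
    """t-statistic of contrast c for every column (voxel) of Y."""
    n, d = X.shape
    XtX_inv = np.linalg.inv(X.T.dot(X))
    beta = XtX_inv.dot(X.T.dot(Y))                # (d, n_voxels) weights
    resid = Y - X.dot(beta)
    # Assumption: residual variance uses n - d degrees of freedom
    sigma2 = (resid ** 2).sum(axis=0) / (n - d)
    return c.dot(beta) / np.sqrt(sigma2 * c.dot(XtX_inv).dot(c))

rng = np.random.RandomState(0)
toy_X = rng.randn(300, 3)
# Voxel 0 responds to condition 0 only; voxel 1 is pure noise
toy_B = np.array([[2.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
toy_Y = toy_X.dot(toy_B) + rng.randn(300, 2)
toy_c = np.array([1.0, -1.0, 0.0])   # "condition 0 minus condition 1"
toy_t = toy_t_stat(toy_X, toy_Y, toy_c)
print(toy_t)
```

The first voxel should get a large positive t-value and the second a value near zero, which is the pattern a contrast is meant to pick out.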


In [None]:
### STUDENT ANSWER


Now we would like to find which voxels are significantly more responsive when the subject sees bodies than when they see faces.

- convert the t-statistics to p-values for each voxel, using the *tdistribution.cdf* function we used in class (imported below)
- using a significance level of $\alpha = 0.05$ and using the **Bonferroni** correction, find the threshold of significance to use
- make a flatmap of the voxels thresholded by significance
- is the result different from the one we saw in class?
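
The conversion and correction steps can be sketched on a handful of made-up t-values. The t-statistics and the degrees of freedom below are hypothetical, and the sketch assumes a one-sided test (is the contrast greater than zero?).

```python
import numpy as np
from scipy.stats import t as tdistribution

# Hypothetical t-statistics for 4 voxels and hypothetical degrees of freedom
toy_t = np.array([0.5, 2.0, 4.5, 6.0])
toy_dof = 100

# One-sided p-value: probability of a t at least this large under the null
toy_p = 1 - tdistribution.cdf(toy_t, toy_dof)

# Bonferroni: divide alpha by the number of tests (here, one per voxel)
alpha = 0.05
toy_threshold = alpha / len(toy_t)
toy_significant = toy_p < toy_threshold
print(toy_threshold, toy_significant)
```

Note that the voxel with t = 2.0 would pass an uncorrected threshold of 0.05 but not the corrected one. With tens of thousands of voxels, the corrected threshold becomes far stricter than in this four-voxel example.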

In [None]:
from scipy.stats import t as tdistribution
### STUDENT ANSWER
