# Overview
In the last lab and in the homework, we saw how we can model the response of a voxel $v$ to one stimulus $i$ via the hemodynamic response function, and a weight $w$. To do this we assume that we know the shape of the hemodynamic response, and we assume a value for how well a voxel responds to a stimulus. 

In most experiments, however, the experimenter is interested in multiple stimuli. They want to find the brain responses to these different stimuli and compare them. In this lecture, we will see how we can use regression and hypothesis testing to study hypotheses about how the brain responds to different stimuli. 

# Goals
- Neuroscience concepts
    - How to model and estimate voxel responses to different conditions
    - Computing contrasts between conditions
- Coding concepts
    - Matrix multiplication and inversion 
- Data science concepts
    - Multiple regression
    - Relating regression to ERPs
    - Bootstrap tests for regression and p-values
    - Multiple comparison correction


In [None]:
# Imports
import neurods
import numpy as np
import cortex
import os
import matplotlib.pyplot as plt
# Configure defaults for plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.aspect'] = 'auto'
plt.rcParams['image.cmap'] = 'viridis'
%matplotlib inline

First let's talk about linear regression, and start to work with a simple problem. Let's say you walk into a grocery store, and you buy 3 oranges and 2 apples. You are told the price is 8\$. You go another time, you buy 2 oranges and 5 apples, you pay 9\$. How much do apples and oranges cost?

This is a system of two equations with two unknowns, which you can solve to obtain an exact solution.
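As a quick check of the exact case, here is a minimal sketch that solves the two-visit system with `np.linalg.solve`. The variable names (`counts`, `totals`, `unit_prices`) are illustrative and are not used elsewhere in this lab.

```python
import numpy as np

# Each row is one visit: [number of oranges, number of apples]
counts = np.array([[3, 2],
                   [2, 5]])
# Total paid on each visit, in dollars
totals = np.array([8, 9])

# Solve counts @ unit_prices = totals for the two unknown prices
unit_prices = np.linalg.solve(counts, totals)
orange, apple = unit_prices
print("orange: {0:.2f}, apple: {1:.2f}".format(orange, apple))
```

With exactly two noise-free visits, the two prices are fully determined: an orange costs 2 dollars and an apple costs 1 dollar.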

However, imagine that the cashier doesn't tell you the exact price, but takes the correct price, and then depending on their mood, adds some "noise" to the price: you either pay a little less or a little more. This noise doesn't depend on how much your total is, but on other unrelated factors.

Can you still estimate accurately the prices if you go twice to the store?

What if you go 1000 times?

Let's simulate this with a random sample:

In [None]:
# these are the hidden parameters, we are not supposed to know them:
apple_price = 0.9
orange_price = 1.2
pear_price = 1.5
noise_variance = 5  # note: used below as the noise standard deviation (the variance is 25)

# sample X and Y:
n = 1000
X = np.round(np.random.uniform(low = 0, high = 10, size=[n,3])).astype(int)
real_beta = np.array([apple_price, orange_price, pear_price]).reshape([3,1])
Y = np.dot(X, real_beta) + np.random.normal(size =[n,1] )*noise_variance

We can express the price paid on one visit $j$ as:

\begin{align}
y_j =  X_j W +\epsilon_j
\end{align}

where the row $X_j = [x^a_j,x^o_j,x^p_j]$ contains the counts of apples, oranges and pears bought on visit $j$, and the column vector $W = [w_a,w_o,w_p]^\top$ contains the prices of apples, oranges and pears respectively. 

We can write all the visits together as:

\begin{align}
Y =  {\bf X} W +\epsilon
\end{align}

where
- $Y$ is n x 1
- ${\bf X}$ is n x d, here d = 3
- $W$ is d x 1
- $\epsilon$ is n x 1.

Due to the noise, we cannot exactly recover $W$. However, we would like to find a solution $W$ that minimizes the following error as much as possible:

\begin{align}
error = \sum_{j = 1}^n (y_j - X_j W)^2 = ||Y - {\bf X} W||_2^2
\end{align}

This is the sum of squared errors. To minimize this expression with respect to $W$, we first find its derivative with respect to $W$:

\begin{align}
\frac{\partial \ error}{\partial W} &= \frac{\partial ||Y - {\bf X} W||_2^2}{\partial W}\\
 &= -2{\bf X}^\top (Y - {\bf X} W)
\end{align}

The minimum is achieved when the derivative is zero:

\begin{align}
-2{\bf X}^\top (Y - {\bf X} \hat W) &= 0\\
{\bf X}^\top Y &= {\bf X}^\top{\bf X} \hat W \\
\hat W &= ({\bf X}^\top{\bf X})^{-1}{\bf X}^\top Y
\end{align}

This is the Ordinary Least Squares Solution. Now write it as a function, then use this function to recover the prices of the fruits, using the following cell:

### BREAKOUT SESSION

- Use the space below to implement the OLS function that returns the OLS solution
- Use the function to compute the prices of the fruits, using the X and Y matrices already in your workspace
- What are the estimated prices of apples, oranges and pears? (apple is the first coordinate of X, oranges is the second and pears is the third).

In [None]:
from numpy.linalg import inv
def OLS(X,Y):
### STUDENT ANSWER
    return np.dot(inv(np.dot(X.T,X)),np.dot(X.T,Y))
prices = OLS(X,Y)
print("apple price: real {0} estimated {1}".format(apple_price,prices[0,0]))
print("orange price: real {0} estimated {1}".format(orange_price,prices[1,0]))
print("pear price: real {0} estimated {1}".format(pear_price,prices[2,0]))

Try changing the number of data points and the magnitude of the noise. What do you notice?
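One way to explore this systematically is sketched below. It is only an illustration: the helper `mean_abs_error` is hypothetical, the fruit counts are drawn with `rng.integers` rather than the rounding used above, and `np.linalg.lstsq` is swapped in for the closed-form solution (it minimizes the same squared error).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
true_prices = np.array([0.9, 1.2, 1.5])  # apple, orange, pear

def mean_abs_error(n_visits, noise_sd):
    """Simulate n_visits purchases and return the mean absolute price error."""
    counts = rng.integers(0, 11, size=(n_visits, 3)).astype(float)
    paid = counts @ true_prices + rng.normal(scale=noise_sd, size=n_visits)
    # lstsq minimizes the same squared error as the closed-form OLS solution
    estimate, *_ = np.linalg.lstsq(counts, paid, rcond=None)
    return np.abs(estimate - true_prices).mean()

for n_visits in [10, 100, 1000, 10000]:
    for noise_sd in [0.5, 5.0]:
        err = mean_abs_error(n_visits, noise_sd)
        print("n = {0:5d}, noise sd = {1}: mean abs error = {2:.3f}".format(
            n_visits, noise_sd, err))
```

The error should generally shrink as the number of visits grows and grow with the noise level, although individual runs fluctuate.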

## Intercept Term

Many times, we are interested in modeling $y_j$ as:

\begin{align}
y_j =  w_0 + w_1 x^1_j + w_2 x^2_j + \dots + w_d x^d_j +\epsilon_j
\end{align}

This means there is a constant intercept term that always contributes to the output $y_j$. In our store analogy, this could be, for example, a flat fee that the cashier adds for each customer. How can we integrate the intercept term into our framework?

There is a simple way: notice that we can rewrite the above equation as:

\begin{align}
y_j =  w_0 x^0_j+ w_1 x^1_j + w_2 x^2_j + \dots + w_d x^d_j +\epsilon_j
\end{align}

where $x^0_j$ is always equal to 1. This can be done by creating a matrix $X'$ by adding an additional column to our matrix $X$ which is all ones. Let's try to estimate the intercept term in our fruit example:


In [None]:
X2 = np.hstack([np.ones([n,1]),X])
prices = OLS(X2,Y)
print('intercept term is estimated to be {0}'.format(prices[0,0]))
print("apple price: real {0} estimated {1}".format(apple_price,prices[1,0]))
print("orange price: real {0} estimated {1}".format(orange_price,prices[2,0]))
print("pear price: real {0} estimated {1}".format(pear_price,prices[3,0]))

The intercept term is estimated to be close to 0 (not exactly 0, because of the noise), which makes sense because we didn't include an intercept term when generating the data! Let's sample the data again, this time with an intercept, and estimate it:

In [None]:
intercept_term = 2
Y2 = np.dot(X, real_beta) + np.random.normal(size =[n,1] )*noise_variance + intercept_term

prices = OLS(X2,Y2)
print('intercept term is estimated to be {0}'.format(prices[0,0]))
print("apple price: real {0} estimated {1}".format(apple_price,prices[1,0]))
print("orange price: real {0} estimated {1}".format(orange_price,prices[2,0]))
print("pear price: real {0} estimated {1}".format(pear_price,prices[3,0]))

When we work with fMRI data, we usually remove the mean of each voxel at the beginning of our analysis. The intercept term is then small, and in this lab we leave it out of the design matrix. Strictly speaking, it is exactly zero only when the regressors are mean-centered as well.
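To see when the intercept can safely be dropped, here is a minimal sketch on simulated fruit data (all names and values are illustrative). It compares a model with an explicit column of ones against a model without intercept fitted to mean-centered regressors and outputs.

```python
import numpy as np

rng = np.random.default_rng(1)
n_visits = 500
counts = rng.integers(0, 11, size=(n_visits, 3)).astype(float)
true_prices = np.array([0.9, 1.2, 1.5])
flat_fee = 2.0
paid = flat_fee + counts @ true_prices + rng.normal(scale=1.0, size=n_visits)

# Model A: explicit column of ones, raw data
design_a = np.hstack([np.ones((n_visits, 1)), counts])
coef_a, *_ = np.linalg.lstsq(design_a, paid, rcond=None)

# Model B: no intercept, but both the regressors and the output are mean-centered
counts_centered = counts - counts.mean(axis=0)
paid_centered = paid - paid.mean()
coef_b, *_ = np.linalg.lstsq(counts_centered, paid_centered, rcond=None)

print("slopes with intercept column:", coef_a[1:])
print("slopes after centering:      ", coef_b)
```

The two sets of slopes agree up to numerical precision, which is why centering the data makes an explicit intercept column unnecessary.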

# Modeling voxel responses

Remember, we are using regression because we want to model different voxel responses to a set of stimuli. We learned how to take a stimulus time course and how to convolve it with the hemodynamic response. We then assumed that each voxel's activity was a linear combination of all the convolved time courses of the stimuli. We want to recover the parameters of the linear combination. Let's load some data:

In [None]:
basedir = os.path.join(neurods.io.data_list['fmri'],'categories')
design = np.load(os.path.join(basedir,'experiment_design.npz'))
print('Experiment design variables: ', design.keys())
conditions = design['conditions'].tolist()
print('Conditions: ', conditions)
design_run1 = design['run1']
for i, (cond, label) in enumerate(zip(design_run1.T, conditions)):
    plt.plot(cond+i+0.2*i, label=label, lw=2)
plt.title('Condition labels')
_ = plt.legend(frameon=False, bbox_to_anchor=(1.4, 1))

We are going to use the `neurods.io.load_fmri_data` function to load the data with z-scoring. We will load the first two runs of data.

In [None]:
fmri_files1 = ['s01_categories_{:02d}.nii.gz'.format(run) for run in [1,2]]
fmri_files1 = [os.path.join(neurods.io.data_list['fmri'], 'categories', f) for f in fmri_files1]

sub, xfm = 's01', 'catloc'
cortical_voxels = cortex.db.get_mask(sub, xfm, type='cortical')
# fmri responses:
# np.vstack needs a sequence (a list), not a generator expression
Y = np.vstack([neurods.io.load_fmri_data(f, mask=cortical_voxels, do_zscore=True, dtype=np.float32)
               for f in fmri_files1])
# stimuli:
X = np.vstack([design[run] for run in ['run1','run2']])

plt.figure(figsize=(10,4))
plt.imshow(Y)
plt.title('Voxel responses')

plt.figure(figsize=(10,4))
for i, (cond, label) in enumerate(zip(X.T, conditions)):
    plt.plot(cond+i+0.2*i, label=label, lw=2)
plt.title('Condition labels')
_ = plt.legend(frameon=False, bbox_to_anchor=(1.4, 1))

We need to first build a design matrix that accounts for the hemodynamic response:

In [None]:
from neurods.fmri import hrf as generate_hrf
t_hrf, hrf_1 = generate_hrf(tr=2)
n, d = X.shape

conv_X = np.zeros(X.shape)  # float array, so convolved values are not truncated if X holds integers
for i in range(d):
    conv_X[:,i] = np.convolve(X[:,i], hrf_1)[:n]
    
for i, (cond, label) in enumerate(zip(conv_X.T, conditions)):
    plt.plot(cond+i+0.2*i, label=label, lw=2)
    
plt.title('Condition labels')
_ = plt.legend(frameon=False, bbox_to_anchor=(1.4, 1))
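The `[:n]` slice in the loop above is needed because `np.convolve` returns more samples than the input has. A minimal sketch with a toy kernel (made up for illustration, not the course HRF) shows the length change and the delayed peak:

```python
import numpy as np

n = 20
stimulus = np.zeros(n)
stimulus[5:8] = 1  # condition is "on" for 3 samples

# Toy response kernel, NOT the course HRF: rises, peaks, then decays
toy_kernel = np.array([0.0, 0.5, 1.0, 0.6, 0.3, 0.1])

full = np.convolve(stimulus, toy_kernel)  # length n + len(toy_kernel) - 1
trimmed = full[:n]                        # keep only the n samples we measured

print(len(full), len(trimmed))
print("stimulus onset at sample", np.argmax(stimulus > 0))
print("response peak at sample ", np.argmax(trimmed))
```

The convolved regressor peaks a few samples after stimulus onset, which is the delay we want the design matrix to capture.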

We want to find the response of all the voxels in the brain to these 5 different conditions. Instead of a one dimensional output $Y$, we have a high dimensional output ${\bf Y}$.

For each voxel, we can write the linear model equation as:

\begin{align}
y^{v}_j =  X_j W^v +\epsilon_j^v
\end{align}

and:

\begin{align}
Y^{v} =  {\bf X} W^v +\epsilon^v
\end{align}

Since this model exists for every voxel, we can write all of them together as a single multiple-output regression:
\begin{align}
{\bf Y} =  {\bf X} {\bf W} +\boldsymbol\epsilon
\end{align}

In the above:
- ${\bf Y}$ is n x nVoxels
- ${\bf X}$ is n x d
- ${\bf W}$ is d x nVoxels

Let's minimize the sum of squared errors with respect to ${\bf W}$, as before. The sum now runs over all time points and all voxels, which is the squared Frobenius norm $||\cdot||_F^2$. We first find the derivative:

\begin{align}
\frac{\partial \ error}{\partial {\bf W}} &= \frac{\partial ||{\bf Y} - {\bf X} {\bf W}||_F^2}{\partial  {\bf W}}\\
 &= -2{\bf X}^\top ({\bf Y} - {\bf X} {\bf W})
\end{align}

The minimum is achieved when the derivative is zero:

\begin{align}
-2{\bf X}^\top ( {\bf Y} - {\bf X}{\bf W}) &= 0\\
{\bf X}^\top {\bf Y} &= {\bf X}^\top{\bf X}  {\bf W}\\
{\bf W} &= ({\bf X}^\top{\bf X})^{-1}{\bf X}^\top {\bf Y}
\end{align}

This solution is similar to the single dimensional output solution. The first term $({\bf X}^\top{\bf X})^{-1}{\bf X}^\top$ is independent of the responses ${\bf Y}$: it depends only on the design matrix. Whether we estimate the parameters for one voxel or for a large number of voxels, this term is the same. 

Notice also that each voxel's parameters are estimated independently from each other: each column of ${\bf W}$ corresponds to the parameters of one voxel $v$, and it is obtained by multiplying the matrix $({\bf X}^\top{\bf X})^{-1}{\bf X}^\top$ with the ${\bf Y}$ column that corresponds to voxel $v$.
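The column-by-column independence can be checked numerically. The sketch below uses small random matrices (`X_toy`, `Y_toy` and the sizes are illustrative) and compares fitting all voxels at once with fitting them one at a time.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, n_voxels = 200, 5, 4
X_toy = rng.normal(size=(n, d))
Y_toy = rng.normal(size=(n, n_voxels))

# This d x n matrix depends only on the design, not on the responses
projector = np.linalg.inv(X_toy.T @ X_toy) @ X_toy.T

# All voxels at once: d x n_voxels
W_all = projector @ Y_toy

# One voxel at a time, stacked back into columns
W_by_voxel = np.column_stack([projector @ Y_toy[:, v] for v in range(n_voxels)])

print(W_all.shape)
print(np.allclose(W_all, W_by_voxel))
```

Both routes give the same weights, so a single matrix product is enough for the whole brain.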

Your OLS code for a single voxel should work for multiple outputs as well. Use it to estimate the weights for the 5 conditions for each voxel. Then, using cortex.quickflat.make_figure, make a flatmap plot of the parameters of each of the conditions across the brain.

### Breakout session

- Now, use the OLS function you implemented earlier to find the weight of the response of each voxel to each condition. 
- What should you use as input to the OLS function?
- Print out the shape of the output of the OLS function. What does it correspond to?
- Then, in a for loop, produce a flatmap of the magnitude of the weights for each condition (use the commands below to help you plot them). Use the names of the conditions as titles for your plots (they are all in the variable `conditions`).


In [None]:
# vol = cortex.Volume(some_vector, sub, xfm, mask = cortical_voxels, vmin = -1.5, vmax = 1.5)
# __  = cortex.quickflat.make_figure(vol)
# plt.suptitle('some str', fontsize = 30)

### STUDENT ANSWER
weights = OLS(conv_X, Y)
print('shape of weights is {}'.format(weights.shape))
# plt.imshow(weights)
for idx, condition in enumerate(conditions):
    vol = cortex.Volume(weights[idx], sub, xfm, mask = cortical_voxels,vmin = -1.5, vmax = 1.5)
    __  = cortex.quickflat.make_figure(vol)
    plt.suptitle(condition, fontsize = 30)

Why do we use regression instead of block averages?
- What happens when we have event related design? When the conditions overlap?

SURVEY

From the flatmaps above, it seems like different regions of the brain are responsive to different stimuli. For example, we can try to estimate which parts of the brain respond more to faces than bodies.

In [None]:
vol = cortex.Volume(weights[1] - weights[0], sub, xfm, mask = cortical_voxels,vmin = -1.5, vmax = 1.5)
__  = cortex.quickflat.make_figure(vol)
plt.suptitle('faces - bodies', fontsize = 30)
cortex.webshow(vol)

It seems there are indeed regions that respond more to faces than to bodies. However, this is finite data from an experiment with a limited duration (this is only 8 minutes of stimuli). The amount of noise in fMRI data adds error to our estimation. How can we measure how certain we are of our estimates?

One way to construct a confidence interval for our estimates is to use a procedure called the bootstrap. Bootstrapping refers to repeatedly resampling your N data points. Each time, N points are sampled with replacement from the original N points, and the statistic of interest is computed using those N points. This is repeated a large number of times (e.g. 10000). Then, one can use the resulting set of statistics to build a confidence interval. 
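Before applying this to regression, here is a minimal sketch of the procedure for the simplest possible statistic, the mean of a simulated sample. The data, the number of resamples and the 95% level are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(loc=2.0, scale=1.0, size=200)  # simulated sample of N = 200 points

n_boot = 5000
boot_means = np.zeros(n_boot)
for b in range(n_boot):
    # Draw N indices with replacement, so some points repeat and others are left out
    idx = rng.integers(0, len(data), size=len(data))
    boot_means[b] = data[idx].mean()

# Percentile interval: the middle 95% of the bootstrap statistics
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print("sample mean: {0:.3f}".format(data.mean()))
print("95% bootstrap interval: [{0:.3f}, {1:.3f}]".format(lower, upper))
```

The same loop works for any statistic: replace the mean with whatever quantity you want an interval for.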

Let's go back to the fruit price example:

In [None]:
# these are the hidden parameters, we are not supposed to know them:
apple_price = 0.9
orange_price = 1.2
pear_price = 1.5
noise_variance = 5  # note: used below as the noise standard deviation (the variance is 25)

# sample X and Y:
n = 1000
XFruit = np.round(np.random.uniform(low = 0, high = 10, size=[n,3])).astype(int)
real_beta = np.array([apple_price, orange_price, pear_price]).reshape([3,1])
YFruit = np.dot(XFruit, real_beta) + np.random.normal(size =[n,1] )*noise_variance

In [None]:
prices = OLS(XFruit,YFruit)
print("apple price: real {0} estimated {1}".format(apple_price,prices[0,0]))
print("orange price: real {0} estimated {1}".format(orange_price,prices[1,0]))
print("pear price: real {0} estimated {1}".format(pear_price,prices[2,0]))

Now, we sample with replacement 1000 rows from (XFruit,YFruit). We compute the prices for each fruit using those 1000 points:

In [None]:
def randomize_OLS(X,Y):
    n = X.shape[0]
    sample_index = np.random.choice(n,n)
    return OLS(X[sample_index], Y[sample_index])

#### Breakout session:

- now, run the function randomize_OLS 2000 times, recording the answers in the array `prices_bootstrap` below
- for each of the three prices, plot a separate histogram of the estimated weights for all the bootstrap samples

In [None]:
prices_bootstrap = np.zeros((2000,3))
### STUDENT ANSWER
for i in np.arange(2000):
    prices_bootstrap[i] = np.squeeze(randomize_OLS(XFruit,YFruit))

for i in np.arange(3):
    plt.figure()
    plt.hist(prices_bootstrap[:,i])

What if I want to know whether an orange costs more than an apple? From the OLS weights I can see that:

In [None]:
prices[1] - prices[0]

How can we construct a confidence interval for this?

In the same way!

Run a bootstrap test. At each iteration:
- sample n points with replacement
- compute your statistic of interest, here the price of an orange minus the price of an apple

At the end, construct a confidence interval.


In [None]:
difference_bootstrap = np.zeros((2000,1))

for i in np.arange(2000):
    tmp = randomize_OLS(XFruit,YFruit)
    difference_bootstrap[i] = tmp[1,0] - tmp[0,0]

plt.figure()
plt.hist(difference_bootstrap[:,0])


In the histogram, the sampled differences should fall almost entirely above zero (the exact picture varies with the random sample).

We can use these bootstrap samples to construct a 99% confidence interval for the difference between the prices of oranges and apples. If the interval does not contain zero, the data suggest that oranges cost more than apples.

Let's construct that interval:

In [None]:
confidence_interval_lower_bound = np.percentile(difference_bootstrap, 0.5)
confidence_interval_upper_bound = np.percentile(difference_bootstrap, 99.5)
print('with 99% confidence, the difference in price is between {} and {}.'.format(confidence_interval_lower_bound,
                                                                                  confidence_interval_upper_bound))

Let's think of our faces vs. bodies contrast. Some of the voxels are more responsive to faces than to bodies and we would like to identify them. We already computed that contrast between the weights of faces and the weights of bodies. For every voxel in the brain, we want to run the same bootstrap test as above.

We will use the same approach. However, different fMRI data points are not IID (independent and identically distributed) because of the slow dynamics leading to the signal. By sampling individual points, the dependencies between the consecutive time points will be broken, and therefore the estimated confidence intervals will not be characteristic of the true variance.

Therefore, the test we used above is not adequate here. We will use a slightly modified test that samples blocks of data (5 TRs) instead of individual TRs.
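To see what sampling blocks means in practice, here is a minimal sketch on a toy index of 20 TRs (the sizes and variable names are illustrative). Whole blocks are drawn with replacement, so consecutive TRs inside a block stay together.

```python
import numpy as np

rng = np.random.default_rng(4)
n_trs, block_len = 20, 5
n_blocks = n_trs // block_len

# Row b holds the TR indices of block b: [0..4], [5..9], ...
block_index = np.arange(n_blocks * block_len).reshape(n_blocks, block_len)

# Sample whole blocks with replacement, then flatten back to TR indices
chosen_blocks = rng.integers(0, n_blocks, size=n_blocks)
sample_index = block_index[chosen_blocks].ravel()

print("chosen blocks:", chosen_blocks)
print("TR indices:   ", sample_index)
```

Within each run of 5 indices the TRs are consecutive, so the short-range temporal structure of the signal is preserved in every resample.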

In [None]:
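def randomize_OLS_for_fMRI(X,Y):
    n = X.shape[0]
    block_len = 5  # number of consecutive TRs per block
    n_blocks = n // block_len
    # use only the TRs that fill complete blocks, so the reshape cannot fail
    block_index = np.arange(n_blocks*block_len).reshape([n_blocks,block_len])
    sample_index = np.random.choice(n_blocks,n_blocks)
    sample_index = block_index[sample_index].reshape([-1])
    return OLS(X[sample_index], Y[sample_index])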

In [None]:
n_bootstrap = 100
difference_bootstrap = np.zeros((n_bootstrap,Y.shape[1]))

for i in np.arange(n_bootstrap):
    tmp = randomize_OLS_for_fMRI(conv_X,Y)  # use the HRF-convolved design, as in the OLS fit above
    difference_bootstrap[i,:] = tmp[1,:] - tmp[0,:]



Let's plot on the brain the lower bound of the 99% confidence interval:

In [None]:
confidence_interval_lower_bound = np.percentile(difference_bootstrap, 0.5, axis = 0)


In [None]:
vol = cortex.Volume(confidence_interval_lower_bound, sub, xfm, mask = cortical_voxels, 
                    vmin = -np.abs(confidence_interval_lower_bound).max(),
                    vmax = np.abs(confidence_interval_lower_bound).max())
__  = cortex.quickflat.make_figure(vol)
plt.suptitle('faces - bodies', fontsize = 30)

In [None]:
plt.hist(confidence_interval_lower_bound)

In [None]:
vol = cortex.Volume((confidence_interval_lower_bound>0)*1.0, sub, xfm, mask = cortical_voxels, vmin = -1,
                   vmax = 1)
__  = cortex.quickflat.make_figure(vol)
plt.suptitle('faces - bodies', fontsize = 30)

What do you notice? Why are there so many voxels all around the brain? What did we do here?

In [1]:
from IPython.display import Image
Image(url= "https://imgs.xkcd.com/comics/significant.png")

The comic above illustrates the importance of multiple comparison correction. This problem is very important in fMRI. We will discuss it in the next lecture.

Meanwhile, you can check out the blog post: https://blogs.scientificamerican.com/scicurious-brain/ignobel-prize-in-neuroscience-the-dead-salmon-study/

This is about the following paper: Bennett et al. "Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Proper Multiple Comparisons Correction" Journal of Serendipitous and Unexpected Results, 2010.

Bennett et al. were able to show that a group of voxels in a dead salmon were implicated in processing social cues from images. This study was designed to show the perils of failing to correct for multiple comparisons.
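The same effect can be reproduced with pure noise. The sketch below is a rough illustration under stated assumptions: it simulates voxels that are unrelated to a random regressor, and it uses a normal approximation with a one-sided threshold of 2.576 (the 0.5% tail) instead of the block bootstrap used above. All sizes and names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(5)
n_timepoints, n_voxels = 200, 20000

# One regressor and pure-noise "voxels" that have nothing to do with it
x = rng.normal(size=(n_timepoints, 1))
Y_null = rng.normal(size=(n_timepoints, n_voxels))

# OLS weight for every voxel at once, plus its estimated standard error
xtx = (x.T @ x).item()
w = (x.T @ Y_null) / xtx                      # shape 1 x n_voxels
residuals = Y_null - x @ w
sigma = np.sqrt((residuals ** 2).sum(axis=0) / (n_timepoints - 1))
z = w.ravel() / (sigma / np.sqrt(xtx))

# One-sided test at the 0.5% level, the same tail as a 99% interval's lower bound
threshold = 2.576
n_false_positives = int((z > threshold).sum())
print("voxels passing the threshold: {0} of {1}".format(n_false_positives, n_voxels))
```

Even though no voxel carries any signal, roughly 0.5% of them are expected to pass the threshold by chance, which amounts to on the order of a hundred voxels here.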


#### Thinking about next week:

We will explore these ideas more next week. 

We will also think about how we can use different designs:
- what are the problems with block designs?
- in what other ways do we want to study the brain?