Please fill in the class survey:

https://goo.gl/forms/oOQD84gv3rJVk2yX2

# Overview
In the last lab, we went over how we can use linear regression to estimate how much a voxel responds to each stimulus in an experiment, and how to use hypothesis testing to determine if a specific condition activates a voxel more than another condition. We saw how to estimate a t-statistic, compute a p-value and use multiple comparison correction. 

# Goals
The regression models we build are predictive models: we can use them to predict the activity for each condition. In this lab we will see how to perform this prediction. We will also see how to use complex stimuli that are not neatly categorized into conditions. Learning the brain responses to the different properties of a stimulus will allow us to build models that can predict the activity for new, unseen conditions.

In [None]:
# Imports
import neurods as nds
import numpy as np
import os
import matplotlib.pyplot as plt
import nibabel
import cortex
# Configure defaults for plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.aspect'] = 'auto'
plt.rcParams['image.cmap'] = 'viridis'
%matplotlib inline
from scipy.stats import zscore

In [None]:
if not os.path.lexists('figures'):  # os.symlink raises FileExistsError if the link already exists
    os.symlink('/home/shared/cogneuro-connector/data/fmri/word_picture/','figures')

In [None]:
from numpy.linalg import inv
def OLS(X,Y):
    return np.dot(inv(np.dot(X.T,X)),np.dot(X.T,Y))

# Modeling voxel responses

We load the same data from the past two classes:

In [None]:
basedir = os.path.join(nds.io.data_list['fmri'],'categories')
design = np.load(os.path.join(basedir,'experiment_design.npz'))
print('Experiment design variables: ', design.keys())
conditions = design['conditions'].tolist()
print('Conditions: ', conditions)
design_run1 = design['run1']
for i, (cond, label) in enumerate(zip(design_run1.T, conditions)):
    plt.plot(cond+i+0.2*i, label=label, lw=2)
plt.title('Condition labels')
_ = plt.legend(frameon=False, bbox_to_anchor=(1.4, 1))

We use the zscore function while loading the data to normalize every block:

In [None]:
sub, xfm = 'S2', 'S2_category_auto'
mask = cortex.db.get_mask(sub, xfm, type='cortical')
fname = os.path.join(basedir, 'S2_categories1_{n}.nii.gz') #S2_categories1_{n}.nii.gz
# fmri responses:
Y = np.vstack([zscore(nds.fmri.load_data(fname.format(n=n), mask=mask, standardize=True)) for n in [1,2]])
# stimuli:
X = np.vstack([design[run] for run in ['run1','run2']])

plt.figure(figsize=(10,4))
plt.imshow(Y)
plt.title('Voxel responses')

plt.figure(figsize=(10,4))
for i, (cond, label) in enumerate(zip(X.T, conditions)):
    plt.plot(cond+i+0.2*i, label=label, lw=2)
plt.title('Condition labels')
_ = plt.legend(frameon=False, bbox_to_anchor=(1.4, 1))

We build a design matrix that accounts for the hemodynamic response:

In [None]:
from neurods.fmri import hrf as generate_hrf
t_hrf, hrf_1 = generate_hrf(tr=2)
n, d = X.shape

conv_X = np.zeros_like(X)
for i in range(d):
    conv_X[:,i] = np.convolve(X[:,i], hrf_1)[:n]
    
for i, (cond, label) in enumerate(zip(conv_X.T, conditions)):
    plt.plot(cond+i+0.2*i, label=label, lw=2)
    
plt.title('Condition labels')
_ = plt.legend(frameon=False, bbox_to_anchor=(1.4, 1))
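
To see what the convolution step does in isolation, here is a minimal, self-contained sketch. The HRF used here is a toy gamma-shaped curve chosen for illustration only; it is an assumption and not the function returned by `neurods.fmri.hrf`. The names `toy_hrf` and `boxcar` are likewise hypothetical.

```python
import numpy as np

tr = 2  # seconds per volume, as in the cell above
# Toy gamma-shaped HRF sampled every TR over 30 s (illustrative assumption,
# not the neurods HRF): starts at 0, peaks around 5 s, then decays.
t = np.arange(0, 30, tr, dtype=float)
toy_hrf = t**5 * np.exp(-t) / 120.0

# One condition that is "on" for 5 volumes starting at volume 5
n = 40
boxcar = np.zeros(n)
boxcar[5:10] = 1

# The full convolution has n + len(toy_hrf) - 1 samples; keep the first n
conv = np.convolve(boxcar, toy_hrf)[:n]

print('stimulus onset at volume', 5)
print('predicted response peaks at volume', int(np.argmax(conv)))
```

The predicted response is delayed and smoothed relative to the boxcar, which is why the regressors are convolved before fitting. Truncating to `[:n]` keeps the regressor the same length as the data.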

We estimate the weights for all voxels:

In [None]:
weights = OLS(conv_X, Y)
print('shape of weights is {}'.format(weights.shape))

Last lecture we estimated the mean squared error of our predictions:
- First, we used the weights estimated above to predict the activity $\hat {\bf Y}$.
- Second, we estimated the error ${\bf Y-\hat Y}$.
- Then, we estimated $\boldsymbol \sigma^2$, the mean squared error $\frac{1}{N}\sum_{i=0}^{N-1}(Y_i - \hat Y_i)^2$. This gave us a vector corresponding to the mean squared error at every voxel.

In [None]:
Y_hat = np.dot(conv_X, weights)
error = Y - Y_hat
mse = np.mean((Y - Y_hat)**2, axis=0)
vol = cortex.Volume(mse, sub, xfm, mask = mask)
__  = cortex.quickflat.make_figure(vol)
plt.title('mse', fontsize = 30)

We compute the coefficient of determination, which estimates what fraction of the variance of the data is predicted by the model: $R^2 = 1 - \frac{\bf \boldsymbol \sigma^2}{var({\bf Y})} $.

Since we have already normalized every voxel to have a variance of 1, you can simplify the computation to: $R^2 = 1 - \bf \boldsymbol \sigma^2 $
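
This simplification can be checked numerically. The sketch below uses synthetic data (random numbers standing in for a design matrix and voxel responses; all variable names are illustrative assumptions) and uses `np.linalg.lstsq` in place of the `OLS` function defined earlier. It confirms that, once each column of the response matrix is z-scored, the general formula and the shortcut give the same $R^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, v = 200, 3, 5                     # time points, regressors, voxels (made up)
X_demo = rng.standard_normal((n, d))
Y_demo = X_demo @ rng.standard_normal((d, v)) + rng.standard_normal((n, v))

# z-score each voxel (column): mean 0, variance 1 (computed with N, not N-1)
Y_demo = (Y_demo - Y_demo.mean(axis=0)) / Y_demo.std(axis=0)

# Least-squares fit and per-voxel mean squared error
w_demo = np.linalg.lstsq(X_demo, Y_demo, rcond=None)[0]
mse_demo = np.mean((Y_demo - X_demo @ w_demo) ** 2, axis=0)

r2_general = 1 - mse_demo / Y_demo.var(axis=0)   # general formula
r2_shortcut = 1 - mse_demo                        # valid because var(Y) == 1
print(np.allclose(r2_general, r2_shortcut))
```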

In [None]:
R2 = 1 - mse
vol = cortex.Volume(R2, sub, xfm, mask = mask)
__  = cortex.quickflat.make_figure(vol)
plt.title('R^2', fontsize = 30)

Another way of evaluating the quality of predictions is to see how correlated they are with what we intended to predict. We can look at the time course of the data for one voxel versus the predicted activity for that voxel:

In [None]:
vox_i = 10114 ### try other numbers like 100, 3000, 10114 ...
plt.plot(Y_hat[:,vox_i], label = 'predicted')
plt.plot(Y[:,vox_i], label = 'real')
_ = plt.legend(frameon=False, bbox_to_anchor=(1.4, 1));

The correlation between vectors $A$ and $B$ of length N can be estimated as: 

$$\frac{1}{N} \sum_{i=1}^N \frac{ (a_i - \mu_A) (b_i - \mu_B)  } {\sigma_A \sigma_B}  = \frac{1}{N} \sum_{i=1}^N \frac{ (a_i - \mu_A) }{\sigma_A} \frac{(b_i - \mu_B)  } { \sigma_B}  = \frac{1}{N} \sum_{i=1}^N  a_i' b_i' $$

where $A'$ and $B'$ are normalized versions of $A$ and $B$ that have a mean of 0 and a variance of 1.

(In some textbooks you might find the above expression divided by $N-1$ instead of $N$, but we will not get into this subtlety here.)
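
As a quick numerical check of the identity above, the following sketch (synthetic vectors, illustrative names) compares the mean of the product of the normalized vectors with NumPy's built-in Pearson correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(100)
b = 0.5 * a + rng.standard_normal(100)   # b is partly driven by a

# Normalize: remove the mean, divide by the standard deviation (computed with N)
a_norm = (a - a.mean()) / a.std()
b_norm = (b - b.mean()) / b.std()

r_manual = np.mean(a_norm * b_norm)      # (1/N) * sum of a'_i * b'_i
r_numpy = np.corrcoef(a, b)[0, 1]        # reference value
print(r_manual, r_numpy)
```

The two values agree as long as the mean and the standard deviation use the same normalization (here both divide by $N$).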

To correlate two time series, we therefore effectively remove the mean and variance of each to see if they vary in the same way. We can do that by using the zscore function:

In [None]:
#Y is already mean 0 and variance 1
plt.plot(zscore(Y_hat[:,vox_i]), label = 'predicted')
plt.plot(Y[:,vox_i], label = 'real')
_ = plt.legend(frameon=False, bbox_to_anchor=(1.4, 1));

#### Breakout session:

Write a function that computes the correlation between the predicted and the real brain activity, and returns a vector with the amount of correlation for each voxel: 

- create new versions of the two matrices which have a column mean of 0 and a column variance of 1 (you can use the zscore function)
- compute the **elementwise product** of the two normalized matrices
- compute the mean of the elementwise product down each column (that is, across rows, or time points); this is effectively the correlation for each voxel
- make a flatmap of the correlation between Y and Y_hat

In [None]:
# corr = compute_correlation(Y, Y_hat)
# vol = cortex.Volume(corr, sub, xfm, mask = mask)
# __  = cortex.quickflat.make_figure(vol)
# plt.title('correlation', fontsize = 30)

def compute_correlation(matrix_1, matrix_2):
### STUDENT ANSWER


## Predicting withheld data:

We explored the t-test last week. Another way to check that the weights of a learned model are meaningful, and not due only to chance, is to use them to predict new, unseen data. The idea is that if the weights we estimated are indicative of how the brain responds to the experimental conditions, then we can use them to predict the brain response for new data. Here, we introduce concepts that are very important in statistics and machine learning:

- Training set: the part of the data you use to estimate your model. You can use this data as you wish. We will discuss overfitting next week and see why you might want to be careful about how much of the variance of this data your model predicts.
- Test set: this set should remain untouched until the very end of your analysis, when you use it only to report your results. You should never go back to your analysis and change any parameters based on the performance of your model on the test set.
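
Why the test set matters can be illustrated with a small simulation. In the sketch below, both the regressors and the "voxel responses" are pure random noise, so there is nothing real to learn. All names and sizes are made up for illustration, and `np.linalg.lstsq` stands in for the `OLS` function defined earlier.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d, v = 40, 40, 10, 500   # made-up sizes

def zs(M):
    # column-wise z-score (mean 0, variance 1)
    return (M - M.mean(axis=0)) / M.std(axis=0)

X_tr = rng.standard_normal((n_train, d))
X_te = rng.standard_normal((n_test, d))
Y_tr = zs(rng.standard_normal((n_train, v)))
Y_te = zs(rng.standard_normal((n_test, v)))

# Fit on the training set only
w = np.linalg.lstsq(X_tr, Y_tr, rcond=None)[0]

# R^2 = 1 - MSE because every column of Y has variance 1
r2_train = 1 - np.mean((Y_tr - X_tr @ w) ** 2, axis=0)
r2_test = 1 - np.mean((Y_te - X_te @ w) ** 2, axis=0)

print('mean R2 on training data:', r2_train.mean())
print('mean R2 on held-out data:', r2_test.mean())
```

The training $R^2$ comes out positive even though the data are noise, because the weights were tuned to that particular sample. On held-out data the same weights do no better than predicting zero, which is what the test set is there to reveal.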

We have not yet used the third run of our experiment. We will load it here and use it to test the performance of our model:

In [None]:
sub, xfm = 'S2', 'S2_category_auto'
mask = cortex.db.get_mask(sub, xfm, type='cortical')
fname = os.path.join(basedir, 'S2_categories1_{n}.nii.gz') #S2_categories1_{n}.nii.gz
# fmri responses:
Y_test = np.vstack([zscore(nds.fmri.load_data(fname.format(n=n), mask=mask, standardize=True)) for n in [3]])
# stimuli:
X_test = np.vstack([design[run] for run in ['run3']])

n_test = X_test.shape[0]

plt.figure(figsize=(10,4))
for i, (cond, label) in enumerate(zip(X_test.T, conditions)):
    plt.plot(cond+i+0.2*i, label=label, lw=2)
plt.title('Condition labels')
_ = plt.legend(frameon=False, bbox_to_anchor=(1.4, 1))

conv_X_test = np.zeros_like(X_test)
for i in range(d):
    conv_X_test[:,i] = np.convolve(X_test[:,i], hrf_1)[:n_test]
    

#### Breakout Session

Using the weights you have estimated before:
- First, use conv_X_test to predict the activity $ {\bf \hat Y_{test}}$.
- Second, estimate the error ${\bf Y_{test}-\hat Y_{test}}$.
- Then, estimate $\bf \boldsymbol \sigma^2_{test}$, the mean squared error $\frac{1}{N}\sum_{i=0}^{N-1}({Y_{test}}_i -  {\hat Y_{test}}_i)^2$. This will give you a vector corresponding to the mean squared error at every voxel.
- Compute the coefficient of determination, which estimates what fraction of the variance of the data is predicted:
    $R^2_{\bf test} = 1 - \frac{\bf \boldsymbol \sigma^2_{test}}{var({\bf Y_{test}})} $.
    Since we have already normalized every voxel to have a variance of 1, you can simplify the computation to:
    $R^2_{\bf test} = 1 - \bf \boldsymbol \sigma^2_{test} $
- Produce a flatmap of $R^2_{\bf test}$
- Then, use the $\boldsymbol \sigma^2$ previously computed from the training data to produce a flatmap of $R^2_{\bf train} = 1 - \bf \boldsymbol \sigma^2 $

In [None]:
# start by using the variables conv_X_test and weights to estimate Y_hat_test
# you should estimate a variable mse_test
# then use the following to plot it:
# vol = cortex.Volume(1-mse_test, sub, xfm, mask = mask, vmin = 0, vmax = 1)
# __  = cortex.quickflat.make_figure(vol)
# plt.title('R2 test', fontsize = 30)
# then plot the training R2:
# vol = cortex.Volume(1-mse, sub, xfm, mask = mask, vmin = 0, vmax = 1)
# __  = cortex.quickflat.make_figure(vol)
# plt.title('R2 train', fontsize = 30)

### STUDENT ANSWER


- What is the difference between the two plots?

You can do the same comparisons with the correlation measure:

In [None]:
corr = compute_correlation(Y_test, Y_hat_test)
vol = cortex.Volume(corr, sub, xfm, mask = mask, vmin = 0, vmax = 1)
__  = cortex.quickflat.make_figure(vol)
plt.title('correlation - test', fontsize = 30);

corr = compute_correlation(Y, Y_hat)
vol = cortex.Volume(corr, sub, xfm, mask = mask, vmin = 0, vmax = 1)
__  = cortex.quickflat.make_figure(vol)
plt.title('correlation - training', fontsize = 30)

One of the big fallacies in the fMRI literature is to use the same data to formulate a hypothesis about the involvement of a region and then to test that hypothesis. You should be careful never to use the same data to make and test your hypotheses!

# Complex Stimuli

The approach we have seen so far allows us to estimate the response of each voxel to one of a few conditions. What if we want to find the response of a voxel to new, unseen conditions? What if we are interested in the variety of meanings that language can have? It would take an extremely long time to test every single property one by one.

Another approach to neuroimaging is therefore to image the brain activity while the subject sees a large number of different stimuli that vary along multiple dimensions. The idea is to cover the space of variability so that the contribution of each feature of the stimulus can be recovered from the data.

We will use freely available data from the Mitchell et al. (2008) Science paper: https://www.cs.cmu.edu/afs/cs/project/theo-73/www/science2008/data.html

The experiment consists of subjects looking at words and line drawings that are presented in isolation:

<img src="figures/science.png" style="height: 300px;">


In [None]:
# loading data:
basedir = os.path.join(nds.io.data_list['fmri'],'word_picture')
name = os.path.join(basedir,'subject_1.nii.gz')
volumes = nibabel.load(name)
data = volumes.get_fdata()  # get_data() is deprecated in recent nibabel releases
print(data.shape)
data = data.T
print(data.shape)

In [None]:
#loading mask
name = os.path.join(basedir,'subject_1_mask.nii')
volume = nibabel.load(name)
mask = volume.get_fdata()  # get_data() is deprecated in recent nibabel releases
print(mask.shape)
mask = mask.T
print(mask.shape)

In [None]:
# flatten data to a 2D matrix
data = data[:,mask>0]
print(data.shape)

# zscore the data
data = zscore(data, axis = 0)

In [None]:
# this package allows us to work with matlab data, which we need here to load the variables
import scipy.io as sio

# here we load the 60 words that comprise our stimuli
words = sio.loadmat(os.path.join(basedir,'words.mat'))

words = [s[0][0] for s in words['words']]

print("Here are the stimulus words:\n")
print (" - ".join(words))

In this dataset, a stimulus was presented every 10 seconds, and the activity between 4 and 8 seconds after onset was averaged, resulting in one brain image for every stimulus presentation. Each stimulus was repeated 6 times, and the repetitions of each stimulus were averaged.

In [None]:
word_num = 38 # change the word number
sample_image = np.zeros_like(mask)-2
sample_image[mask>0] = data[word_num]
h = cortex.mosaic(sample_image) # can try with different color map: e.g. h = mosaic(image , cmap= cm.hot)
plt.title(words[word_num],size=30)

This dataset already accounts for the delay of the hemodynamic response, and therefore we should not convolve our design matrix. We will see here how to construct a design matrix appropriate for such an experiment.

How can we represent the activity for items that do not fall into clear conditions?

We could try to make each word a condition, ending up with 60 conditions, each seen only once. How would that help us? We would be able to compute a contrast map between "horse" and "table", but that would not tell us much about why these differences occur, as "horse" and "table" vary in many ways. Also, learning a response per word would not tell us what the activity will be like for new words, such as "goat" or "pen".

However, we know that new words have some features in common with our set of objects. What if we could learn the responses to specific properties of words (e.g. whether or not they are animate, whether or not they are edible, etc.)? Then we could predict the activity for a novel word as a combination of the activities associated with its properties. For example, we can learn how the brain responds to objects that are manmade, inanimate, made of wood, and used as tools, and we can estimate the brain response to "pen" as the combination of these responses.

We will do all this in the multivariate regression framework we have used in the last labs. 
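
The idea of predicting a new word from its properties can be sketched with a toy example. Everything below is invented for illustration: the three binary features, the four "training" words, the response values, and the novel word's feature vector are assumptions, not values from the dataset. `np.linalg.lstsq` stands in for the `OLS` function defined earlier.

```python
import numpy as np

# Rows: 4 training words; columns: 3 hypothetical binary features
# (manmade, animate, edible). Values are invented for this sketch.
F_train = np.array([[1, 0, 0],    # e.g. a tool
                    [0, 1, 0],    # e.g. an animal
                    [0, 0, 1],    # e.g. a vegetable
                    [1, 0, 1]],   # e.g. a manmade food
                   dtype=float)

# Pretend each feature drives 2 voxels by a fixed amount (noise-free toy data)
W_true = np.array([[ 1.0, 0.0],
                   [ 0.0, 2.0],
                   [-1.0, 0.5]])
Y_train = F_train @ W_true

# Learn one weight per (feature, voxel) pair by least squares
W_hat = np.linalg.lstsq(F_train, Y_train, rcond=None)[0]

# A novel word never seen in training (animate and edible),
# described only by its features
f_new = np.array([0.0, 1.0, 1.0])
y_new_pred = f_new @ W_hat       # sum of the responses to its properties
print(y_new_pred)
```

No training word had this exact combination of features, yet the model can still produce a prediction because it learned a response per feature rather than per word.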

First, we need an annotation of the properties of these words. From looking at the list of words, it's clear that there are many properties that different sets of them share.

We have access to a set of 218 questions for which every word has been labeled by multiple users on Amazon Mechanical Turk (Sudre et al., NeuroImage, 2012). These questions were designed to represent the semantic properties of these objects. Additionally, 11 features describing the visual properties of the line drawings are provided.

The scale of the features is 1-5 with 1 being a 100% no and 5 being 100% yes.

Try changing feature_i below to look at different features, including the visual features (indices 218 to 228):

In [None]:
feature_data = sio.loadmat(os.path.join(basedir,'features.mat'))

feature_names = feature_data['feature_names']
features = feature_data['features']
print("We have {0} features that describe the stimulus.\n".format(len(feature_names)))
# print(feature_names)

print("The features matrix therefore has {0} rows and {1} columns.\n".format(len(words),len(feature_names)))


feature_i = 10
print("FEATURE NUMBER {0}".format(feature_i))
print(feature_names[feature_i][0][0])
for i in range(15):
    print(words[i], features[i,feature_i])
    

In [None]:
print ("Features 1 to 218\n")
for i in range(15):
    print (feature_names[i][0][0])
print ("...")

print ("\n\nFeatures 219 to 229\n")
for i in range(218,229):
    print (feature_names[i][0][0])

Let's only stick to the visual features for simplicity for now:

In [None]:
features = features[:,218:]
feature_names = feature_names[218:]

print ("new features size is {0}".format(features.shape))

Every word is therefore characterized by:

1 - a vector of visual properties. For example, for the "apartment" stimulus, word length has a score of 4, the count of white pixels has a score of 4, etc.

and

2 - a 3D brain image. This is basically a set of 21764 numbers. Each dimension corresponds to a voxel location. We can therefore think of the brain image as a vector.

See below how "apartment" is represented:


In [None]:
word_i = 2

print ("\n Word = {0}\n\n Features = \n {1}".format(words[word_i],features[word_i]))

vol = cortex.Volume(data[word_i],'MNI','atlas336', mask = mask)
fig = cortex.quickflat.make_figure(vol)

We have a matrix of 60 words x 11 features, and a matrix of 60 words x 21764 voxels. Let's focus on one voxel. We ultimately want to predict that voxel's activity as a function of the features.

In [None]:
# take random voxel
random_index = 1549
vox = np.reshape(data[:,random_index],[60,1])
show_mat =[ features.T , vox.T]

print (show_mat[1].shape)
fig, axes = plt.subplots(nrows=1, ncols=2,figsize=(20,12))
for cnt, ax in enumerate(axes.flat):
    im = ax.matshow(show_mat[cnt].T)

# print(" Features for all stimuli= \n {0} \n voxel {1} activity for all stimuli: {2}".format(features, random_index,vox))

## BUILDING A PREDICTIVE MODEL

### IT IS VERY IMPORTANT NOT TO USE TEST DATA IN TRAINING!!

To judge whether a model has learned to predict brain activity beyond the data it was fit to, we need to test it on data it has not seen in training.

Imagine you have a small dataset with voxel responses to features, and some of the voxels have some noise that is correlated with one of the features. The probability of such an event becomes smaller as the dataset size increases, but at low sample sizes there is a good chance of finding spurious correlations. Such a correlation actually allows you to build a model that predicts brain activity from the features, but only in that dataset, since the noise is independent of the features and will not repeat in the same way in other datasets. However, for the voxels that show a real and strong enough response to the features, you will be able to learn a model that predicts brain activity from the features, and that model should generalize to new data.

This is why we always test a model on held-out data that was not used in training. This allows us to judge whether the model is really predicting neural activity and not just fitted to noise in the sample.
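
The effect of sample size on spurious correlations can be seen in a short simulation. The sketch below correlates one random "feature" with many random "voxels" that are unrelated to it by construction; the sizes and names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels = 2000

def max_spurious_corr(n_samples):
    # One random feature and many noise-only voxels: any correlation is spurious
    feature = rng.standard_normal(n_samples)
    voxels = rng.standard_normal((n_samples, n_voxels))
    f = (feature - feature.mean()) / feature.std()
    V = (voxels - voxels.mean(axis=0)) / voxels.std(axis=0)
    corr = np.mean(f[:, None] * V, axis=0)   # correlation with each voxel
    return np.max(np.abs(corr))

small = max_spurious_corr(12)     # e.g. a dozen stimuli
large = max_spurious_corr(1000)
print('largest |correlation| with 12 samples:  ', small)
print('largest |correlation| with 1000 samples:', large)
```

With only a dozen samples, some noise voxel will look strongly related to the feature purely by chance; with many samples the largest chance correlation shrinks considerably.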

Here we separate the words into a training set and a test set for you:

In [None]:
Test_index = [0,1,2,3,4,6,7,8,10,13,20,23]
Train_index = list(set(range(60)) - set(Test_index))

Train_X = zscore(features[Train_index,:])
Train_Y = zscore(data[Train_index,:])
print ("shape of training features: {0}, shape of training fMRI data: {1}".format(Train_X.shape, Train_Y.shape))

Test_X = zscore(features[Test_index,:])
Test_Y = zscore(data[Test_index,:])
print ("shape of testing features: {0}, shape of testing fMRI data: {1}".format(Test_X.shape, Test_Y.shape))


### Weight estimation and data prediction

We want to learn a function that predicts the activity for any word in terms of its features. 


#### Breakout session
- Use the OLS function to estimate the brain response to the various features for every voxel.
- Use the estimated weights to predict the activity for the held-out words, using Test_X.
- Use the compute_correlation function to compute the correlation of your predicted activity and the real activity Test_Y
- Plot a flatmap of the prediction performance. Which regions are well predicted, and why?

In [None]:
# cor = compute_correlation(Test_Y, Pred_Y)
# vol = cortex.Volume(cor, 'MNI', 'atlas336', mask=mask, vmin=0, vmax=0.6, cmap='viridis')
# fig = cortex.quickflat.make_figure(vol, height=500)
# plt.title("prediction performance with visual features", fontsize=20)

### STUDENT ANSWER
