# Part 3 - A naturalistic reading experiment

We now switch gears to another experiment. In this experiment, subjects read multiple stories in the scanner. The stories were presented one word at a time, with the words appearing at the rate of natural speech. Each word was presented at the center of the screen by itself, for a few hundred milliseconds. So in every TR, subjects read about 4 to 15 words. 

For these words, one can extract multiple features. For example, some of the features can relate to the semantic properties of those words. We will not deal with such features today. We will look at only one type of feature for these words: the letters that compose the words. Our feature space is a 26-dimensional space in which each dimension corresponds to a letter of the alphabet. At every TR, we count how many times each letter occurred.

For example, if during one TR the subject reads:

"it 

was 

the 

first 

time 

I 

saw

something

so"

then the feature dimension corresponding to the letter "s" will have a count of 5, the feature corresponding to "t" will have 5, the feature corresponding to "a" will have 2, the feature corresponding to "e" will have 3, etc. E.g.:

| a | ... | e | ... | s | t |
|:---:|:---:|:---:|:---:|:---:|:---:|
| 2 | ... | 3 | ... | 5 | 5 |
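
The counting step can be sketched in a few lines. This is only an illustration on the example words above: the helper name `letter_counts` is hypothetical, and ignoring case (so that "I" counts as "i") is an assumption made here, not a statement about how the design matrix loaded below was built.

```python
from collections import Counter
import string

def letter_counts(words):
    """Return a 26-element list of letter counts (a..z) for a list of words."""
    # Assumption: case is ignored and non-letter characters are skipped
    counts = Counter(
        ch for word in words for ch in word.lower()
        if ch in string.ascii_lowercase
    )
    return [counts[letter] for letter in string.ascii_lowercase]

# The words read during the example TR
words_in_tr = ["it", "was", "the", "first", "time", "I", "saw", "something", "so"]
features = letter_counts(words_in_tr)

# Look up a few letters by their position in the alphabet
for letter in "aest":
    print(letter, features[string.ascii_lowercase.index(letter)])
```

The printed counts for "a", "e", "s" and "t" should agree with the table above.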


We would like to learn a model that predicts the activity in the brain as a function of the letters that are read by the subject. Letters are used across words of all meanings, so this letter model captures low-level properties rather than high-level meaning. Thus it may be a good model for brain mechanisms related to processing letters rather than semantics.

In [None]:
# Imports
import neurods as nds
import numpy as np
import h5py, os
import matplotlib.pyplot as plt
# Configure defaults for plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.aspect'] = 'auto'
plt.rcParams['image.cmap'] = 'viridis'
%matplotlib inline

## 3.1 Loading data

Here, we load the data. For simplicity, the data has already been normalized for us.

We will plot both the brain data and the letter features.

Like the example we saw in class, the data is divided into training data and test data.

In [None]:
# Loading the design
basedir = os.path.join(nds.io.data_list['fmri'],'reading')
design = h5py.File(os.path.join(basedir,'s03_X_v2.hdf'), 'r')  # open read-only
X_train = design['X_train'][:]
X_test = design["X_test"][:]

# load the data
import cortex
sub, xfm = 'S3', 's03_reading'
mask = cortex.db.get_mask(sub, xfm, 'thin')
data = h5py.File(os.path.join(basedir,'s03_data_v2.hdf'), 'r')  # open read-only
Y_train = data["Y_train"][:]
Y_test = data["Y_test"][:]

del data, design

### Visualizing X_train and X_test

- print the shapes of the X_train and X_test matrices.
- use plt.imshow to show both matrices.
- what do the rows and columns of X_train and X_test correspond to?

In [None]:
### STUDENT ANSWER


### Normalize the X matrices.

Run the following cell to normalize (z-score) each column of the feature matrices.
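
As a toy illustration of what column-wise z-scoring does, the sketch below uses a small made-up matrix (not the reading data) and plain NumPy. The variable names are placeholders.

```python
import numpy as np

# Toy feature matrix: 4 TRs (rows) x 2 letters (columns), made-up counts
X_toy = np.array([[0., 2.],
                  [1., 4.],
                  [2., 6.],
                  [1., 8.]])

# Column-wise z-score: subtract each column's mean, divide by its standard deviation.
# This matches scipy.stats.zscore(X_toy, axis=0) with its default ddof=0.
X_toy_z = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(X_toy_z.mean(axis=0))  # approximately 0 for every column
print(X_toy_z.std(axis=0))   # approximately 1 for every column
```

Note that a column that is constant over time has zero standard deviation, so z-scoring it yields NaN values.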

In [None]:
from scipy.stats import zscore
X_train = zscore(X_train, axis = 0)
X_test = zscore(X_test, axis = 0)

### Visualizing Y_train and Y_test

- print the shapes of the Y_train and Y_test matrices.
- DO **NOT** USE imshow here because you might run into memory issues.
- what do the rows and columns of Y_train and Y_test correspond to?

In [None]:
### STUDENT ANSWER


## 3.2 Convolution of the design matrix

The TR here is 2 seconds.

- Use the function in the package *neurods* to estimate the hrf
- Plot the hrf
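- Use the np.convolve function to obtain conv_X_train. np.convolve works on 1-D arrays, so convolve each column separately. (Remember to trim the result so that conv_X_train has the same number of rows as X_train.)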
- Use plt.imshow to plot conv_X_train.
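
One way to approach the convolution step is sketched below on toy data. Each feature column is convolved separately, and the output, which is longer than the input by `len(kernel) - 1` samples, is trimmed back to the original number of TRs. The helper name `convolve_columns` and the five-point `toy_kernel` are placeholders for illustration: the kernel is not a real HRF and does not come from *neurods*.

```python
import numpy as np

def convolve_columns(X, kernel):
    """Convolve every column of X (time x features) with a 1-D kernel."""
    n_tr, n_features = X.shape
    out = np.zeros((n_tr, n_features))
    for j in range(n_features):
        # Full convolution has n_tr + len(kernel) - 1 samples; keep the first n_tr
        out[:, j] = np.convolve(X[:, j], kernel)[:n_tr]
    return out

# Placeholder kernel (NOT a real HRF), one value per TR
toy_kernel = np.array([0.0, 0.4, 1.0, 0.6, 0.2])

# Toy design: 10 TRs x 2 features, with a single event in each column
X_toy = np.zeros((10, 2))
X_toy[2, 0] = 1.0
X_toy[5, 1] = 1.0

conv_X_toy = convolve_columns(X_toy, toy_kernel)
print(conv_X_toy.shape)  # (10, 2): same shape as the input
```

Keeping the first `n_tr` samples means the response to an event starts at the TR where the event occurred, and anything that would spill past the end of the run is dropped.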

In [None]:
from neurods.fmri import hrf as generate_hrf
### STUDENT ANSWER


## 3.3 Estimating voxel responses to features

Here, we will again use linear regression to estimate the response of each voxel to each feature.

- Implement OLS or copy the function you implemented earlier.
- Use it to estimate the weights for all voxels.
- Print the shape of the weights matrix. 
- What do the rows and columns correspond to?
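
For reference, ordinary least squares for many voxels at once can be sketched on simulated data as follows. The function name `ols` is a placeholder, and calling `np.linalg.lstsq` rather than writing out the normal equations is a choice made for this sketch, not necessarily the version implemented earlier in the course.

```python
import numpy as np

def ols(X, Y):
    """Least-squares weights W minimizing ||X @ W - Y||^2 for every column of Y."""
    # lstsq returns (solution, residuals, rank, singular values); keep the solution
    W, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
    return W

# Simulated data with known weights (all sizes are made up)
rng = np.random.default_rng(0)
n_tr, n_features, n_voxels = 50, 3, 4
X_sim = rng.standard_normal((n_tr, n_features))
W_true = rng.standard_normal((n_features, n_voxels))
Y_sim = X_sim @ W_true  # noise-free simulated responses

W_hat = ols(X_sim, Y_sim)
print(W_hat.shape)                 # (3, 4)
print(np.allclose(W_hat, W_true))  # True: the weights are recovered without noise
```

Because the simulated responses contain no noise, the estimated weights match the true ones. With real data they will only approximate the underlying relationship.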

In [None]:
### STUDENT ANSWER


## 3.4 Predicting training data

We will first predict the training data, and see how well the predicted data correlates with the real data.

- compute Y_train_hat using the weights matrix and conv_X_train.
- use compute_correlation below to compute the correlation of Y_train_hat and Y_train
- make a flatmap of the correlation value over the brain.
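
The predict-then-correlate step can be sketched on simulated data as shown below. The sketch assumes the weights have shape (features x voxels), so that a matrix product with the design gives a (time x voxels) prediction. The helper `columnwise_correlation` is a stand-in written for this example, and all sizes are made up.

```python
import numpy as np

def columnwise_correlation(A, B):
    """Pearson correlation between matching columns of A and B (time x voxels)."""
    A_z = (A - A.mean(axis=0)) / A.std(axis=0)
    B_z = (B - B.mean(axis=0)) / B.std(axis=0)
    # The mean of the product of z-scored columns is their correlation
    return (A_z * B_z).mean(axis=0)

rng = np.random.default_rng(1)
X_sim = rng.standard_normal((100, 5))   # stands in for conv_X_train
W_sim = rng.standard_normal((5, 3))     # stands in for the estimated weights
Y_sim = X_sim @ W_sim + rng.standard_normal((100, 3))  # signal plus noise

Y_sim_hat = X_sim @ W_sim               # predicted responses
corr = columnwise_correlation(Y_sim_hat, Y_sim)
print(corr.shape)  # (3,): one correlation value per voxel
```

The result has one value per voxel, which is the form needed to display the correlations on the cortical surface.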

In [None]:
from scipy.stats import zscore
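def compute_correlation(matrix_1, matrix_2):
    """Pearson correlation between matching columns of two matrices."""
    # z-score each column (mean 0, standard deviation 1)
    matrix_1_norm = zscore(matrix_1, axis = 0)
    matrix_2_norm = zscore(matrix_2, axis = 0)
    # the mean of the elementwise product of z-scored columns is their correlation
    prod = matrix_1_norm * matrix_2_norm
    corr = np.mean(prod, axis = 0)
    return corr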

### STUDENT ANSWER


## 3.5 Predicting test data

We will now predict the held-out data, and see how well the predicted data correlates with the real held-out data.

- ATTENTION: you cannot use the matrix X_test directly to compute Y_test_hat. Remember, the weights you estimated are a function of the convolved design matrix. You need to use the hrf from above and np.convolve to obtain conv_X_test, and you need to trim it appropriately.
- compute Y_test_hat using the weights matrix and conv_X_test.
- use compute_correlation (defined above) to compute the correlation of Y_test_hat and Y_test
- make a flatmap of the correlation value over the brain.
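
To see why held-out data matter, the simulation below fits weights to pure-noise "voxels" that have no relationship to the features at all. It is a sketch under made-up sizes (26 features, 100 TRs, 200 voxels), and the helper names are placeholders. Correlations on the training set come out clearly positive simply because the weights were fit to that noise, while correlations on the test set hover around zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 100, 100, 26, 200

def zscore_columns(M):
    """Z-score each column (mean 0, standard deviation 1)."""
    return (M - M.mean(axis=0)) / M.std(axis=0)

def columnwise_correlation(A, B):
    """Pearson correlation between matching columns of A and B."""
    return (zscore_columns(A) * zscore_columns(B)).mean(axis=0)

# Random features and pure-noise responses: there is nothing real to learn
X_tr = zscore_columns(rng.standard_normal((n_train, n_features)))
X_te = zscore_columns(rng.standard_normal((n_test, n_features)))
Y_tr = rng.standard_normal((n_train, n_voxels))
Y_te = rng.standard_normal((n_test, n_voxels))

# Fit on the training set only
W = np.linalg.lstsq(X_tr, Y_tr, rcond=None)[0]

train_corr = columnwise_correlation(X_tr @ W, Y_tr)
test_corr = columnwise_correlation(X_te @ W, Y_te)
print("mean training correlation:", train_corr.mean())
print("mean test correlation:", test_corr.mean())
```

A voxel is only convincingly predicted by the letter features if its correlation remains high on data that played no role in estimating the weights.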

In [None]:
### STUDENT ANSWER


### Interpretation

- Which regions appear to be predicted by the letters the subject sees on the screen? Does that make sense?

In [None]:
### STUDENT ANSWER
