# Homework 9

In this homework we'll dig deeper into multiple regression, how it's done, and why it's important.


In [None]:
# Don't change this cell; just run it. 
# The result will give you directions about how to log in to the submission system, called OK.
# Once you're logged in, you can run this cell again, but it won't ask you who you are because
# it remembers you. However, you will need to log in once per assignment.
from client.api.notebook import Notebook
ok = Notebook('hw9.ok')
_ = ok.auth(inline=True)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import nibabel
from nistats.hemodynamic_models import glover_hrf as create_hrf
import cortex
from scipy.stats import zscore
from sklearn.linear_model import LinearRegression

Let's define our helper function to load fMRI data:

In [None]:
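def load_nifti(filename, zscore=True, mask=None):
    """Load a NIfTI file as a (time, ...) array, optionally masked and z-scored."""
    img = nibabel.load(filename)
    # get_fdata() replaces get_data(), which is deprecated since nibabel 3.0
    data = img.get_fdata().T
    if mask is not None:
        # keep only the voxels inside the mask -> shape (time, n_voxels)
        data = data[:, mask]
    if zscore:
        # z-score each voxel's time series; the small constant avoids division by zero
        data = (data - data.mean(0)) / (data.std(0) + 1e-7)
    return data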
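
To get a feel for what the `zscore=True` branch does, here is a minimal sketch on a synthetic array. The shapes and values are made up for illustration and are not taken from the fMRI data. Each column stands in for one voxel's time series and ends up with a mean close to 0 and a standard deviation close to 1.

```python
import numpy as np

# Synthetic stand-in for masked fMRI data: 100 time points x 3 voxels.
# (Hypothetical values; the real data come from load_nifti.)
rng = np.random.default_rng(0)
toy_data = rng.normal(loc=500.0, scale=20.0, size=(100, 3))

# Same normalization as in load_nifti: subtract each column's mean and
# divide by its standard deviation (plus a small constant that avoids
# division by zero for constant voxels).
toy_z = (toy_data - toy_data.mean(0)) / (toy_data.std(0) + 1e-7)

print(toy_z.mean(0))  # each entry is approximately 0
print(toy_z.std(0))   # each entry is approximately 1
```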

And let's create a response design matrix for the motor localizer:

In [None]:
hrf = create_hrf(tr=2, oversampling=1, time_length=32)

motor_categories = np.load("/home/jovyan/motorloc_experimental_conditions.npy")
unique_motor_categories = np.unique(motor_categories)
motor_response_vectors = []
for category in unique_motor_categories:
    stimulus_vector = motor_categories == category
    response_vector = np.convolve(stimulus_vector, hrf)[:len(motor_categories)]
    motor_response_vectors.append(response_vector)
motor_response_design = np.stack(motor_response_vectors, axis=1)
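
The convolve-and-truncate step in the loop above can be sketched on a tiny example. The stimulus vector and the kernel here are hypothetical toy values; in particular, `toy_kernel` is only a stand-in and is not the Glover HRF.

```python
import numpy as np

# Toy stimulus: "on" at TRs 2 and 3 of a 10-TR run (hypothetical values).
toy_stimulus = np.array([0, 0, 1, 1, 0, 0, 0, 0, 0, 0], dtype=float)

# Toy response kernel standing in for the HRF (NOT the Glover HRF).
toy_kernel = np.array([0.0, 0.5, 1.0, 0.5])

# The full convolution is longer than the stimulus by len(kernel) - 1 samples...
full = np.convolve(toy_stimulus, toy_kernel)
# ...so we keep only the first len(stimulus) samples, as in the cell above.
toy_response = full[:len(toy_stimulus)]

print(len(full), len(toy_response))
print(toy_response)
```

The response starts one TR after stimulus onset and outlasts the stimulus, which is the delayed, smeared shape that the real HRF gives to each response vector.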

In [None]:
mask = cortex.db.get_mask('s01', 'catloc', 'cortical')
data = load_nifti('/data/cogneuro/fMRI/motor/s01_motorloc.nii.gz', mask=mask)

voxel_indices = [34854, 37594, 36630, 25004, 12135, 0]
voxels = data[:, voxel_indices]

**1. Simple linear regression** 

**(a)** [1pt] Use the `sklearn` object `LinearRegression` to create a model called `linreg` and use it to perform simple linear regression between the first column of `motor_response_design` (associated with `move eyes`) and the first column of `voxels` - the time series of the first voxel, also associated with `move eyes`. 

Create a figure named `fig_regression1` of size `(10, 10)`, then make a scatterplot of the response vector (store it in `move_eyes_response`) and the voxel time series (store it in `voxel0`). 

Use the `predict` method of `linreg` to compute the predicted voxel values for the motor response vector and display them in the same plot as a line plot.
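
If you haven't used `LinearRegression` with a single regressor before, the sketch below shows the general pattern on synthetic data. All names starting with `toy_` are hypothetical and the numbers are made up. The main thing to notice is that `fit` and `predict` expect the independent variable as a 2D array of shape `(n_samples, n_features)`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
toy_x = rng.normal(size=200)                             # stand-in for a response vector
toy_y = 2.0 * toy_x + 0.5 + 0.1 * rng.normal(size=200)   # stand-in for a voxel

# sklearn expects X with shape (n_samples, n_features), so a single
# regressor has to be reshaped into one column.
toy_X = toy_x.reshape(-1, 1)

toy_linreg = LinearRegression().fit(toy_X, toy_y)
toy_pred = toy_linreg.predict(toy_X)

print(toy_linreg.coef_, toy_linreg.intercept_)  # close to [2.0] and 0.5
```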

**(b)** [1.5pts] Now you'll make scatter plots with the regression line (just as you did in part a) for all 36 pairs of the 6 response vectors and 6 selected voxels. You'll arrange the scatter plots such that each row of subplots will represent a voxel, and each column will represent a response vector.

Create a figure called `fig_all_scatter_plots` and make it of size `(24, 24)`.
Set a counter `counter` to 0 which you'll use to indicate which subplot to plot to. 

Write a `for` loop that iterates over `i` from 0 to the number of voxels. In the for loop:
1. extract the `i`th column of `voxels` and call it `current_voxel`. 
2. Write a second `for` loop that iterates over `j` counting from 0 to the number of response vectors. Inside this second `for` loop:
    1. Increment the counter. 
    2. Store the `j`th column from the design matrix (a response vector) in the name `current_response_vector` 
    3. Store the `j`th category in `current_category`. 
    4. Using `LinearRegression`, fit a regression model with `current_response_vector` as the independent variable and `current_voxel` as the dependent variable.
    5. Select the subplot with index `counter`. 
    6. Make a scatter plot of `current_response_vector` and `current_voxel`. 
    7. Compute the y-values of the regression line and call them `current_y` using the `predict` method of the `LinearRegression` model you just fit. 
    8. Plot the regression line. 
    9. Add a title showing `'voxel {i} / {category}'`. 

You should be able to easily identify which category has a positive regression slope for which voxel by examining these plots.
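
The reason for incrementing the counter before plotting is that matplotlib numbers subplots starting at 1 and fills the grid row by row. The sketch below involves no plotting; it only shows how the counter maps onto (voxel row, response column) positions in a 6 by 6 grid. The `positions` dictionary is a hypothetical helper for illustration.

```python
n_voxels, n_responses = 6, 6

counter = 0
positions = {}
for i in range(n_voxels):
    for j in range(n_responses):
        counter += 1
        # Subplot indices start at 1 and run row by row, so the subplot for
        # voxel i (row) and response j (column) has index i * n_responses + j + 1.
        positions[(i, j)] = counter

print(positions[(0, 0)], positions[(0, 5)], positions[(1, 0)], positions[(5, 5)])
# prints: 1 6 7 36
```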

**(c)** [1.5pts] Now you can also display the corresponding line plots. 

Make a figure of size `(20, 20)` called `fig_line_plots` in which you arrange 6 subplots vertically, one per voxel (they will be wide to accommodate a time series). Each subplot should plot the following for one voxel:
1. The voxel's BOLD timeseries
2. All 6 response vectors multiplied by their estimated slopes and incremented by their respective intercepts. This will display each response vector's best fit to the time series. 
3. Label every plot appropriately and add a legend. 
4. Label the x and y axes appropriately (you can use either time or TRs, just make sure the numbers are correct for the label).
5. Give each subplot a title that indicates the voxel and motor task being plotted.

To do this, you will need two nested for loops just as above. The necessary slopes and intercepts are exactly the same as above, but for each voxel time series the 6 predictions get plotted into the same line plot.
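
"Multiplied by the slope and incremented by the intercept" gives exactly what `predict` returns. Here is a minimal sketch on synthetic data. The `toy_` arrays and the `slopes` / `intercepts` tables are hypothetical names, and storing the fits this way is just one option, not a required design.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
toy_design = rng.normal(size=(50, 3))   # 50 time points x 3 response vectors
toy_voxels = rng.normal(size=(50, 2))   # 50 time points x 2 voxels

n_voxels, n_responses = toy_voxels.shape[1], toy_design.shape[1]
slopes = np.zeros((n_voxels, n_responses))
intercepts = np.zeros((n_voxels, n_responses))

for i in range(n_voxels):
    for j in range(n_responses):
        x = toy_design[:, [j]]           # keep the 2D shape (n_samples, 1)
        model = LinearRegression().fit(x, toy_voxels[:, i])
        slopes[i, j] = model.coef_[0]
        intercepts[i, j] = model.intercept_
        # The scaled-and-shifted response vector is the model's prediction.
        manual = slopes[i, j] * toy_design[:, j] + intercepts[i, j]
        assert np.allclose(manual, model.predict(x))
```

Keeping the slopes and intercepts in arrays indexed by `[voxel, response]` makes them easy to reuse later without refitting.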

**2. Multiple linear regression** You will now perform multiple linear regression on the same voxels and look at the predicted responses.

**(a)** [0.5pts] Create a `LinearRegression` object called `reg_mult`. Fit the full model (all 6 independent variables) to all 6 voxels in one step using `reg_mult.fit` with the `motor_response_design` as the independent variable and `voxels` as the dependent variable.
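
When `fit` receives a 2D dependent variable, it fits one model per column in a single call. The sketch below uses synthetic arrays (the `toy_` names and sizes are made up) to show the shapes you can expect from `coef_`, `intercept_`, and `predict`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
toy_design = rng.normal(size=(120, 6))   # 120 time points x 6 regressors
toy_voxels = rng.normal(size=(120, 4))   # 120 time points x 4 voxels

toy_reg = LinearRegression().fit(toy_design, toy_voxels)

# One row of weights per voxel, one column per regressor.
print(toy_reg.coef_.shape)                # (4, 6)
# One intercept per voxel.
print(toy_reg.intercept_.shape)           # (4,)
# Predictions have the same shape as the dependent variable.
print(toy_reg.predict(toy_design).shape)  # (120, 4)
```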

**(b)** [0.5pts] Use the `reg_mult` object to compute the predictions for the full model on all 6 voxels using `motor_response_design`, and call the resulting predictions `y_hat`. 

Make a figure of size `(20, 2)` and plot the BOLD time series of the first voxel along with the predicted time series for the same voxel. Label both plots appropriately and plot a legend. Also label the x and y axes appropriately (you can use either time or TRs, just make sure the numbers are correct for the label).


**(c)** [1.5pts] Now we're going to pull apart the contributions of each motor task to the predicted time series of the full model. We'll visualize this by plotting the contributions to `y_hat` from every motor task so we can compare them with those we calculated with simple linear regression and get a feeling for how simple and multiple linear regression calculate different solutions. 

1. Use indexing to select the coefficients (weights) for all 6 motor tasks for the first voxel from `reg_mult.coef_`. Store them in a name called `weights_full_voxel0`. Also store the intercept for the first voxel in a name called `intercept_full_voxel0`.
2. Create an empty list called `multiplied_response_vectors`.
3. In a `for` loop that iterates over `i` from 0 to the number of motor tasks, calculate the contribution of each motor task to the predicted time series by multiplying the `i`th column of `motor_response_design` with the `i`th weight from `weights_full_voxel0`. Append this product to `multiplied_response_vectors`. 
4. Sum `multiplied_response_vectors` along `axis=0` and add `intercept_full_voxel0` to this, storing the result in `y_hat_2`. 
5. Use `np.allclose` to check that `y_hat_2` equals the first column of `y_hat` (the prediction for the first voxel). If they are not equal, make sure you selected the correct weights in the first step. 
6. Create a figure of `figsize=(20, 2)`. 
7. In a for loop that iterates over `i` from 0 to the number of motor tasks, plot each of the `multiplied_response_vectors` and give each one an appropriate label.
8. Add a legend.
9. Give the x and y axes appropriate labels.
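
The idea behind steps 3 to 5 is that a linear model's prediction is a sum of one scaled regressor per task plus the intercept. The sketch below checks this on synthetic data. Note that it swaps in `np.linalg.lstsq` with an explicit column of ones instead of `LinearRegression`, and all `toy_` names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
toy_design = rng.normal(size=(80, 3))    # 80 time points x 3 regressors
toy_voxel = (toy_design @ np.array([1.0, -2.0, 0.5])
             + 4.0 + 0.1 * rng.normal(size=80))

# Least squares with an explicit column of ones for the intercept
# (a stand-in for what LinearRegression does with fit_intercept=True).
X = np.column_stack([toy_design, np.ones(len(toy_design))])
beta, *_ = np.linalg.lstsq(X, toy_voxel, rcond=None)
toy_weights, toy_intercept = beta[:-1], beta[-1]

# Full prediction in one step.
toy_y_hat = X @ beta

# The same prediction, built up from one contribution per regressor.
contributions = []
for i in range(toy_design.shape[1]):
    contributions.append(toy_design[:, i] * toy_weights[i])
toy_y_hat_2 = np.sum(contributions, axis=0) + toy_intercept

print(np.allclose(toy_y_hat, toy_y_hat_2))  # True
```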



**(d)** [0.5pts] Now make a similar plot to that from part **c**, but use the slopes and intercepts that you calculated using simple linear regression in question **1**. If you stored the models in question **1**, you can use them directly. Otherwise you can reuse your code from above, copy it to here and make it fit your needs.



For each column in `motor_response_design`, fit a `LinearRegression` to the first voxel, then predict its activity. Plot all of these predictions in one plot, as above.

**(e)** [0.5pts] We observe that the predicted quantities differ noticeably, so let's make a direct comparison in the same plot to get a better feel for this difference. We'll plot the multiple linear regression predictions in red, and the simple linear regression predictions in blue:

Create a figure of size `(20, 2)`, in which you plot the above two plots together. Use `plt.plot(x, y, 'r')` to plot in red and `plt.plot(x, y, 'b')` to plot in blue. Add appropriate axis labels, and, *if you want a small challenge*, a legend (but make sure it only has two entries, one for red and one for blue, not 6 for each).
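
One possible way to get a two-entry legend is sketched here on random stand-in data (the arrays and label strings are hypothetical). It relies on matplotlib leaving lines whose label starts with an underscore out of the automatic legend. There are other ways to do this, so treat it as one option rather than the expected solution.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(50)
rng = np.random.default_rng(4)
red_lines = rng.normal(size=(50, 3))    # stand-ins for one set of predictions
blue_lines = rng.normal(size=(50, 3))   # stand-ins for the other set

fig, ax = plt.subplots(figsize=(20, 2))
for k in range(red_lines.shape[1]):
    # Only the first line of each color gets a real label; labels that
    # start with an underscore are skipped by the automatic legend.
    ax.plot(t, red_lines[:, k], 'r',
            label='multiple regression' if k == 0 else '_nolegend_')
    ax.plot(t, blue_lines[:, k], 'b',
            label='simple regression' if k == 0 else '_nolegend_')
ax.legend()
```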



We observe that the red lines (multiple regression) are more contained around 0 when the voxel is not active and show stronger activation when the voxel is active.
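
One general reason why simple and multiple regression can disagree is that regressors are correlated with each other. The sketch below uses synthetic, noise-free data (not the motor data) in which `y` depends only on `x1`, while `x2` is merely correlated with `x1`. All names and numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # correlated with x1
y = 3.0 * x1                                # depends on x1 only, no noise

# Simple regression of y on x2 alone: slope = cov(x2, y) / var(x2).
slope_simple_x2 = np.cov(x2, y)[0, 1] / np.var(x2, ddof=1)

# Multiple regression of y on x1 and x2 together (with an intercept column).
X = np.column_stack([x1, x2, np.ones(n)])
w1, w2, intercept = np.linalg.lstsq(X, y, rcond=None)[0]

print(slope_simple_x2)  # clearly nonzero, although y does not depend on x2
print(w1, w2)           # approximately 3 and 0
```

Simple regression credits `x2` with variance that really belongs to `x1`, whereas the full model can assign it to the right regressor.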

**3. Motor contrast maps**

In class you saw a series of localizer contrast maps for specialized visual processing regions. Now you will create similar contrast maps for the motor localizer, which will identify regions of motor cortex that control different parts of the body. The motor contrast maps you will create are:

- move hand - rest
- move foot - rest
- move eyes - rest
- speak - rest
- speak - move mouth

Let's start by setting up the first one.

**(a)** [0.5pts] Print `unique_motor_categories` in order to know which column of `motor_response_design` corresponds to which task. Create a `LinearRegression` model called `full_reg` and use it to fit all 6 motor tasks (`motor_response_design`) to all the cortical voxels (`data`). This model will give you a weight for each of the 6 motor tasks across all the cortical voxels. Extract all the weights for each of the 6 motor tasks from  `full_reg.coef_` and store them in corresponding names `weight_move_eyes`, `weight_move_foot`, ..., `weight_speak`.



**(b)** [1pt] Create the first contrast, which localizes the region of motor cortex that controls the hand, by subtracting `weight_rest` from `weight_move_hand`, and call it `c_hand_vs_rest`. Create a cortex volume called `vol_c_hand_vs_rest` and display it as a flatmap. Give it the title "Hand vs Rest".

**(c)** [1pt] Now create the remaining 4 contrasts and make a flatmap for each, just as in part **(b)**, calling the contrasts: `c_foot_vs_rest`, `c_eyes_vs_rest`, `c_speak_vs_rest`, and `c_speak_vs_mouth`.
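
A contrast is just a per-voxel difference between two columns of weights. Here is a minimal sketch with a random stand-in for the coefficient matrix. The category strings are assumed from the contrast list above and may not match the actual contents of `unique_motor_categories`, and `toy_coef` and `column` are hypothetical names. No pycortex calls are shown.

```python
import numpy as np

# Assumed category names, taken from the contrast list in the text; the
# actual strings in unique_motor_categories may differ, so print them first.
toy_categories = np.unique(['rest', 'move hand', 'move foot',
                            'move eyes', 'move mouth', 'speak'])

# Stand-in for a fitted coef_ array: one row per voxel, one column per category.
rng = np.random.default_rng(6)
toy_coef = rng.normal(size=(1000, len(toy_categories)))

# Look up each category's column by name instead of hard-coding indices.
column = {name: k for k, name in enumerate(toy_categories)}
toy_weight_hand = toy_coef[:, column['move hand']]
toy_weight_rest = toy_coef[:, column['rest']]

# A contrast is the per-voxel difference between two sets of weights.
toy_c_hand_vs_rest = toy_weight_hand - toy_weight_rest

print(toy_categories)            # np.unique returns the names sorted
print(toy_c_hand_vs_rest.shape)  # (1000,)
```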

**(d)** [0.5pts] *Extra credit* For the visual category localizer in class we had to contrast two different conditions - contrasting against `nothing/rest` didn't yield the desired result. Why do you think it is OK to use `rest` as a contrast here?

Answer in this cell by double-clicking it!

Make sure to submit your homework to OKpy!

In [None]:
_ = ok.submit()