# Overview
Thus far, we've been calculating statistics of our brain signals. That is, we have transformed our signal (e.g., by filtering or calculating event-related responses) and come up with some number to summarize it (e.g., average activity across time).

However, neuroscience is about **linking the world to brain function**, and the best way to do this is to build a *model* that links the two. This is a more explicit way of defining how a change in the world results in a change in the brain.

Today, we'll cover the basics of **modeling**. We'll start with correlation, move to univariate regression, and we'll finish with multivariate regression.

## Goals for today
- Neuroscience concepts
    - Simulating a signal
    - Correlations between neural signals
- Coding concepts
    - Implementing mathematical functions (e.g. ordinary least squares)
- Data science concepts
    - Correlation
    - Regression
    - Relationship between them
    - Conceptual introduction to multiple regression
---

# Correlation via simulation

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import neurods
import os
%matplotlib inline

In [None]:
# We'll begin by simulating a few signals, this will help with the intuition for what regression means
# First, we'll create a random variable
noise_amp = 5
n_pts = 50
a = 10 * np.random.random(n_pts)

# Now, we'll define a "weight" that causes a second variable to respond to it
weight = 2

# We will add a baseline:
baseline = 10

# Finally, we'll create some noise so that it's not a perfect mapping
noise = noise_amp * np.random.randn(n_pts)

# Then let's mix them together. In this case, b is explicitly created from the values in a
b = baseline + weight * a + noise

# Let's look at the signals
f, ax = plt.subplots()
ax.plot(a)
ax.plot(b)
# Use a list (not a set) so the labels are matched to the lines in plotting order
ax.legend(['a', 'b'])


> **What's the explicit relationship between these two signals?**

Before we get into modeling, we'll begin with *correlation*. Look at the two plots above: they seem to vary in similar ways. When one goes up, the other goes up; when one goes down, the other goes down. How can we quantify this?

First, we'll use a scatterplot to see how related the two signals are

In [None]:
f, ax = plt.subplots()
ax.scatter(a, b)
plt.xlabel('a', fontsize = 16)
plt.ylabel('b', fontsize = 16 )

Our intuition from above seems to hold here as well. When values on the x-axis are large, values on the y-axis also tend to be large. We can quantify this with the correlation coefficient.

*Note: The `corrcoef` function will actually return a "correlation matrix". In this case, every row of `a` is correlated with every row of `b`, and displayed as a matrix. Since our variables are vectors, the output will be a 2 by 2 matrix and the 1st element of the 2nd row will be the correlation coefficient.*

In [None]:
# Calculate the correlation matrix, then take the 1st val of the 2nd row.
corr = np.corrcoef(a, b)[1, 0]
print(corr)
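To make the structure of the correlation matrix concrete, here is a minimal sketch on freshly simulated signals. The names `x` and `y` and the fixed seed are illustrative assumptions, not part of the lecture data.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
x = 10 * rng.random(50)
y = 10 + 2 * x + 5 * rng.standard_normal(50)

corr_matrix = np.corrcoef(x, y)
print(corr_matrix.shape)  # a 2 by 2 matrix for two vectors
# Diagonal entries: each signal correlated with itself (always 1).
# Off-diagonal entries: x with y, repeated because the matrix is symmetric.
print(corr_matrix)
r = corr_matrix[1, 0]
```

Because the matrix is symmetric, indexing with `[1, 0]` or `[0, 1]` gives the same value.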

> **What would happen if we increased the noise parameter when constructing the signal above?**

### Constructing your own function to compute the correlation coefficient:

The correlation between two variables is scale free, i.e. it is not affected by the magnitude of each variable. The correlation coefficient measures the extent to which two variables have the same behavior (they increase or decrease together).
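A quick way to check the scale-free property is to rescale one variable and recompute the correlation. This is a small sketch on simulated data; the variable names and seed are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200)
y = 2 * x + rng.standard_normal(200)

r_original = np.corrcoef(x, y)[0, 1]
# Multiplying by a positive constant and adding an offset leaves r unchanged
r_rescaled = np.corrcoef(1000 * x + 7, y)[0, 1]
# Multiplying by a negative constant flips the sign of r
r_flipped = np.corrcoef(-3 * x, y)[0, 1]
print(r_original, r_rescaled, r_flipped)
```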

1- The first step in computing correlation is to bring both variables to the same scale. This can be done by zscoring:

In [None]:
from scipy.stats import zscore
a_zs = zscore(a)
b_zs = zscore(b)

f, ax = plt.subplots()
ax.plot(a_zs)
ax.plot(b_zs)
# Use a list (not a set) so the labels are matched to the lines in plotting order
ax.legend(['a_zs', 'b_zs'])

2- The next step is to multiply elementwise the two time series:

In [None]:
c = a_zs * b_zs 

f, ax = plt.subplots()
ax.plot(c)
ax.legend(['c']);

3- The last step is to get the average of the elementwise product:

In [None]:
corr = np.mean(c)
print(corr)

### Mathematical expression

The correlation between vectors $A$ and $B$ of length $N$ can therefore be estimated as:
$\frac{1}{N} \sum_{i=1}^N \frac{ (a_i - \mu_A) (b_i - \mu_B)  } {\sigma_A \sigma_B}  = \frac{1}{N} \sum_{i=1}^N \frac{ (a_i - \mu_A) }{\sigma_A} \frac{(b_i - \mu_B)  } { \sigma_B}  = \frac{1}{N} \sum_{i=1}^N  a_i' b_i' $

where $A'$ and $B'$ are normalized versions of $A$ and $B$ that have a 0 mean and a variance of 1.
(In some textbooks you might find the above expression divided by $N-1$ instead of $N$, but we will not get into this subtlety here.)
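For the curious, the two conventions agree as long as the standard deviation and the averaging use the same denominator. The sketch below checks this on simulated data (names and seed are illustrative). Note that `scipy.stats.zscore` uses `ddof=0` by default, which matches the $\frac{1}{N}$ version above.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(30)
y = 0.5 * x + rng.standard_normal(30)
n = len(x)

# Convention 1: population standard deviation (ddof=0), averaged with 1/N
x0 = (x - x.mean()) / x.std(ddof=0)
y0 = (y - y.mean()) / y.std(ddof=0)
r_n = np.sum(x0 * y0) / n

# Convention 2: sample standard deviation (ddof=1), averaged with 1/(N-1)
x1 = (x - x.mean()) / x.std(ddof=1)
y1 = (y - y.mean()) / y.std(ddof=1)
r_n_minus_1 = np.sum(x1 * y1) / (n - 1)

print(r_n, r_n_minus_1, np.corrcoef(x, y)[0, 1])
```

Mixing the two (for example `ddof=1` with a plain mean) would give a slightly different number.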

#### Breakout session
Write a function that:
- takes as input two time series x and y and
- returns their correlation coefficient.
- You should repeat the 3 steps above (without the plots).
- Use your function to compute the correlation of a and b from above.

In [None]:
def my_corr(x,y):
### STUDENT ANSWER


### Simulated fMRI data example

Now let's go back to fMRI responses. Download the following dataset. The code below uses the stim_resp_plot function to plot the saved stimulus and associated data:

In [None]:
# Plotting function from Lecture08 that we will continue to use
def stim_resp_plot(t, stimulus, response, yl=(-0.2, 1.2), label_stim='Stimulus', label_resp='BOLD response (HRF)'):
    """Plot stimulus and response."""
    plt.figure(figsize=(10,4))
    plt.stem(t, stimulus, linefmt='k-', markerfmt='.', basefmt='k-', label=label_stim)
    plt.plot(t, response, 'r.-', label=label_resp)
    plt.ylim(yl)
    plt.xlim([-1,t.max()+1])
    plt.xlabel('Time (seconds)')
    plt.ylabel('Response (arbitrary units)')
    _ = plt.legend()

#Load example data
fileurl = 'https://www.dropbox.com/s/y9qjnno9b3vqoco/example_data_01.npz?dl=0'
filename = 'example_data_01.npz'
neurods.io.download_file(fileurl, filename,
                         root_destination=os.path.abspath(os.curdir),
                         replace = True)
ex_data = np.load(filename)
t = ex_data['t']
n_tps = len(t)
stimulus = ex_data['x']
data_sim = ex_data['data']
stim_resp_plot(t, stimulus, data_sim, yl=(-2, 5), label_stim='Stimulus', label_resp='Simulated data')

In [None]:

f, ax = plt.subplots()
ax.scatter(stimulus, data_sim)
plt.xlabel('stimulus', fontsize = 16)
plt.ylabel('data_sim', fontsize = 16 )
corr2 = my_corr(stimulus, data_sim)

print("The correlation between the stimulus and the data is {}".format(corr2))

What is going on? ==> survey

The presentation of the stimulus should create a hemodynamic response if this voxel is sensitive to that stimulus. We therefore need to convolve the stimulus first with the hemodynamic response function. 

But first, look at the time vector:

In [None]:
t[:10]

The data is sampled with a TR of 2 seconds! We therefore need an HRF that is sampled at the same rate. This is the same curve as before, just sampled differently.

In [None]:
t2, hrf_2 = neurods.fmri.hrf(tr=2)
plt.plot(t2, hrf_2);

**Plot the stimulus and the convolved stimulus**

Convolve the stimulus with the HRF and plot

In [None]:
conv_stimulus = np.convolve(stimulus, hrf_2, mode='full')[:n_tps]
stim_resp_plot(t, stimulus, conv_stimulus, label_stim='Stimulus', label_resp='Convolved stimulus');
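The `mode='full'` followed by `[:n_tps]` pattern above may look odd, so here is a toy sketch of what it does. The kernel below is a made-up stand-in, not a real HRF, and all names are illustrative.

```python
import numpy as np

n_samples = 10
stim = np.zeros(n_samples)
stim[[2, 8]] = 1                          # two brief events
kernel = np.array([0.0, 0.5, 1.0, 0.5])   # toy response shape, NOT a real HRF

# 'full' returns every overlap, so the output is longer than the input
full = np.convolve(stim, kernel, mode='full')
print(len(full))   # n_samples + len(kernel) - 1

# Keep only the samples that fall inside the run; the response to the
# last event is simply cut off at the end of the recording
conv = full[:n_samples]
print(conv)
```

Truncating the `'full'` output keeps each response starting at its event. Using `mode='same'` instead would return the centered part of the output, which would shift the responses earlier in time.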

Zoom into the first 100 time points

In [None]:
stim_resp_plot(t[:100], stimulus[:100], conv_stimulus[:100],
               label_stim='Stimulus', label_resp='Convolved stimulus');

**Plot the convolved stimulus and the simulated voxel data**

In [None]:
stim_resp_plot(t, conv_stimulus, data_sim, yl=(-2, 5),
               label_stim='Convolved stimulus', label_resp='Simulated data')

In [None]:
stim_resp_plot(t[:100], conv_stimulus[:100], data_sim[:100], yl=(-2, 5),
               label_stim='Convolved stimulus', label_resp='Simulated data')

In [None]:
plt.scatter(conv_stimulus, data_sim);
print("The correlation between the convolved stimulus and the data is {}".format(np.corrcoef(conv_stimulus, data_sim)[0,1]))

We can now see that we are able to recover a clear relationship between the stimulus and the data.

What is the variance of the noise that we can guess from this plot? Had this been real data, this would have been a very clean result. Usually in fMRI we are not so lucky to have effects that are this clear. We will study in future lectures how to expand this analysis.
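The effect of convolution on the correlation can be reproduced with a fully synthetic example. This is only a sketch: the kernel is a toy delayed response (not the HRF from `neurods`), and the noise level is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples = 200
stim = np.zeros(n_samples)
stim[::20] = 1                                            # one event every 20 samples
kernel = np.array([0.0, 0.2, 0.7, 1.0, 0.7, 0.3, 0.1])    # toy delayed response

# Simulated "voxel": delayed response to the stimulus plus noise
conv = np.convolve(stim, kernel, mode='full')[:n_samples]
data = conv + 0.1 * rng.standard_normal(n_samples)

r_raw = np.corrcoef(stim, data)[0, 1]    # raw stimulus vs. data
r_conv = np.corrcoef(conv, data)[0, 1]   # convolved stimulus vs. data
print(r_raw, r_conv)
```

The raw stimulus is nonzero only at the event onsets, where the delayed response has not yet started, so its correlation with the data stays weak even though the data is entirely driven by it.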

### Important Observation 
So far we have measured the correlation between one stimulus and brain activity. We are also interested in finding out how much this particular stimulus affects brain activity.

Voxels in different parts of the brain can be differently responsive to a stimulus, or not responsive at all. In fMRI, we are interested in finding how different voxels are responding to an event or stimulus. We will therefore introduce a new parameter: $w^v$, that describes the strength with which a voxel $v$ responds to the stimulus:

$ \text{response}^v(t) = w^v \times  \text{convolved-stimulus} (t) $

We already know how to compute $\text{convolved-stimulus}(t)$. 

In the next steps, we will gradually learn how we can estimate $w^v$ from the data. We will try to find if and how responsive voxel $v$ is to a stimulus.
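To build intuition for $w^v$, the sketch below simulates three hypothetical voxels that share the same convolved stimulus but differ in their weight. The weights, the toy kernel, and the added noise term are all assumptions for illustration (the equation above has no noise term).

```python
import numpy as np

rng = np.random.default_rng(4)
n_samples = 200
stim = np.zeros(n_samples)
stim[::20] = 1
kernel = np.array([0.0, 0.2, 0.7, 1.0, 0.7, 0.3, 0.1])   # toy response shape
conv = np.convolve(stim, kernel, mode='full')[:n_samples]

# Hypothetical w^v values for three voxels
weights = {'unresponsive': 0.0, 'weak': 0.5, 'strong': 2.0}
corrs = {}
for name, w in weights.items():
    # response^v(t) = w^v * convolved-stimulus(t), plus measurement noise
    response = w * conv + 0.2 * rng.standard_normal(n_samples)
    corrs[name] = np.corrcoef(conv, response)[0, 1]
    print("{}: w = {}, correlation = {:.2f}".format(name, w, corrs[name]))
```

With a fixed noise level, a larger weight produces a higher correlation, but the correlation alone does not tell us the value of the weight.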

# Regression via simulation

The simulated signals a and b are highly correlated because we defined one signal to be a *linear function* of the other. In other words:

$signal_b = w_0 + w \times signal_a + noise$

How is correlation related to modeling? Correlation values always lie between -1 and 1. In modeling, however, the weight that defines the relationship between two signals can have an arbitrary size. For example, in the absence of noise we could set the weight above to any positive value, and the correlation would always be 1.

What if we wanted to recover the *actual* weight that we used, instead of a scaled correlation number? For this, we must use *regression*.

In regression, we explicitly model one signal as a linear function of the other signal. The output of regression is a *weight* (not a correlation) that tells us how we can predict values of one signal using the other.

Here's the equation for a linear model:

$$y(t) = w_0 + w x(t)+ \epsilon(t)$$

This says: each output $y(t)$ is predicted by multiplying the feature $x(t)$ by its corresponding weight $w$, adding the intercept $w_0$, and then adding random noise $\epsilon(t)$.

Regression is a technique for inferring what these weights are, given a dataset of inputs and outputs.

Let's go back to the scatter plot we had earlier of a and b:

In [None]:
f, ax = plt.subplots()
ax.scatter(a, b)
plt.xlabel('a', fontsize = 16)
plt.ylabel('b', fontsize = 16 )
# this line adds a line of best fit:
plt.plot(np.unique(a), np.poly1d(np.polyfit(a, b, 1))(np.unique(a)),'r')

Regression minimizes the least squares error:
    
$$\min_{w_0,w} \sum_{t = 0}^{T-1} (y(t) - w_0 - w x(t) )^2$$
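One way to see what this minimization means is to evaluate the error for many candidate parameters and pick the smallest. The brute-force grid below is only an illustrative sketch on simulated data (the function name, grid ranges, and seed are assumptions); it is not how regression is computed in practice.

```python
import numpy as np

rng = np.random.default_rng(5)
x = 10 * rng.random(50)
y = 10 + 2 * x + 5 * rng.standard_normal(50)

def sum_squared_error(w0, w, x, y):
    """Least-squares objective for a candidate intercept w0 and slope w."""
    return np.sum((y - w0 - w * x) ** 2)

# Evaluate the objective on a coarse grid of candidate parameters
w0_grid = np.linspace(0, 20, 201)
w_grid = np.linspace(0, 4, 201)
errors = np.array([[sum_squared_error(w0, w, x, y) for w in w_grid]
                   for w0 in w0_grid])
i, j = np.unravel_index(np.argmin(errors), errors.shape)
print("grid search: slope = {}, intercept = {}".format(w_grid[j], w0_grid[i]))

# The exact least-squares fit lands close to the best grid point
slope, intercept = np.polyfit(x, y, 1)
print("np.polyfit:  slope = {}, intercept = {}".format(slope, intercept))
```

The exact solution can never have a larger error than any grid point, since it is the minimum of the objective.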

There are many packages for regression in Python, but for now we will use the `np.polyfit` function with a polynomial order of 1:

In [None]:
# this function fits a straight line through the points above
slope, intercept = np.polyfit(a, b, 1)
print(slope)
print(intercept)

Going back to our brain data:

In [None]:
plt.scatter(conv_stimulus, data_sim);
plt.plot(np.unique(conv_stimulus), np.poly1d(np.polyfit(conv_stimulus, data_sim, 1))(np.unique(conv_stimulus)),'r')

Can we estimate from this data the magnitude of the weight $w^v$?

In [None]:
# this function fits a straight line through the points above
slope, intercept = np.polyfit(conv_stimulus, data_sim, 1)
print(slope)
print(intercept)

Quiz 2

We saw in class that we can derive the ordinary least squares (OLS) solution by taking the derivative of the objective function with respect to our parameters $W = [w_0, w]$. 

We define X as a matrix with two columns: the first is a column of 1s and the second is our input variable (e.g. conv_stimulus). Y corresponds to the output variable (e.g. data_sim).

The OLS solution is:

$ W = (X^\top X)^{-1}X^\top Y$
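As a reminder, the derivation can be sketched in matrix form by writing the objective as a squared norm and setting its derivative to zero. This assumes that $X^\top X$ is invertible:

$$
\begin{aligned}
E(W) &= \|Y - XW\|^2 = (Y - XW)^\top (Y - XW) \\
\frac{\partial E}{\partial W} &= -2 X^\top (Y - XW) = 0 \\
X^\top X W &= X^\top Y \\
W &= (X^\top X)^{-1} X^\top Y
\end{aligned}
$$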

#### Breakout session:
- using the inv and np.dot functions, implement the OLS solution. Your algorithm should return a vector W. 
- use the OLS solution in the cell below to print the slope and the intercept

In [None]:
from numpy.linalg import inv

def my_OLS(X,Y):
### STUDENT ANSWER


In [None]:
X = np.vstack([np.ones_like(conv_stimulus),conv_stimulus]).T
print(X.shape)
Y = data_sim
print(Y.shape)
### STUDENT ANSWER


Remember, what are the units of this value? The fMRI signal doesn't have a unit and can be rescaled and normalized. The weight therefore depends on how the data is normalized and is only meaningful with respect to the variance of the data.

What's the relationship between correlation and regression? If we convert both the inputs and the outputs into **standard units** (i.e., so that each has a mean of 0 and a variance of 1), then the regression slope is exactly the correlation coefficient.

We have seen this before as zscoring the data.

In [None]:
# Scale our variables
X_zs = np.vstack([np.ones_like(conv_stimulus),zscore(conv_stimulus)]).T
Y_zs = zscore(data_sim)
W_zs = my_OLS(X_zs,Y_zs)
print('the slope is {} and the intercept is {}'.format(W_zs[1], W_zs[0]))
corr_standardized = my_corr(zscore(conv_stimulus),Y_zs)
print('the correlation between the standardized conv_stimulus and response is {}'.format(corr_standardized))
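The same relationship can be checked on simulated data using only `np.polyfit`. This sketch (variable names and seed are illustrative) also shows how the unstandardized slope relates to the correlation: it is the correlation rescaled by the ratio of the two standard deviations.

```python
import numpy as np

rng = np.random.default_rng(6)
x = 10 * rng.random(100)
y = 10 + 2 * x + 5 * rng.standard_normal(100)

r = np.corrcoef(x, y)[0, 1]
slope, intercept = np.polyfit(x, y, 1)

# Raw units: slope = r * (std of y) / (std of x)
print(slope, r * y.std() / x.std())

# Standard units: the slope equals r and the intercept is (numerically) zero
x_zs = (x - x.mean()) / x.std()
y_zs = (y - y.mean()) / y.std()
slope_zs, intercept_zs = np.polyfit(x_zs, y_zs, 1)
print(slope_zs, intercept_zs, r)
```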

Now let's go back to our estimate of the weights before standardization:

In [None]:
X = np.vstack([np.ones_like(conv_stimulus),conv_stimulus]).T
print(X.shape)
Y = data_sim
print(Y.shape)
W = my_OLS(X,Y)

Assume that the stimulus appears again in a different run. Now it appears at times 10, 20, 30 through 40 (in steps of 2), and 70. The run lasts 120s.

#### Breakout session
Using the model estimated above in W, give your best prediction of what the signal would look like in that hypothetical voxel:
- convolve the stimulus variable below with hrf_2
- use the model W to predict activity. You will have to do a similar stacking as we did above.
- use the stim_resp_plot function to plot the stimulus with your predicted response

In [None]:
stimulus_times = [10,20,30,32,34,36,38,40,70]
t = np.arange(120)
stimulus = np.zeros((120))
stimulus[stimulus_times] = 1
### STUDENT ANSWER


We were able to use the regression model to *predict* a new output. This turns out to be really useful, and we'll cover it in future lectures.
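The general idea of prediction can be illustrated on the simple simulated signals from the start of the lecture: fit the weights on one part of the data, then apply them to inputs the model has not seen. This is a minimal sketch with an arbitrary 80/20 split; the names and seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
x = 10 * rng.random(100)
y = 10 + 2 * x + 5 * rng.standard_normal(100)

# Fit the weights on the first 80 points only
slope, intercept = np.polyfit(x[:80], y[:80], 1)

# Predict the held-out outputs from their inputs alone
y_pred = intercept + slope * x[80:]

# Compare predictions with the outputs that were actually observed
r_pred = np.corrcoef(y_pred, y[80:])[0, 1]
print(r_pred)
```

Comparing predictions against held-out data is one way to judge how well a model captures the relationship, rather than how well it fits the points it was trained on.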

What will happen when we have more than one stimulus?