# Overview
Thus far, we've been calculating statistics of our neural signals. That is, we have transformed our signal (e.g., filtering or calculating a TFR), and come up with some number to summarize it (e.g., average activity across trials).

However, neuroscience is about **linking the world to brain function**, and the best way to do this is to build a *model* that links the two. This is a more explicit way of defining how a change in the world results in a change in the brain.

Today, we'll cover the basics of **modeling**. We'll start with correlation, move to univariate regression, and we'll finish with multivariate regression.

![](https://imgs.xkcd.com/comics/linear_regression.png)


## Goals for today
* Understand correlation in the context of simulated signals
* Understand correlation in the context of electrophysiology
* Relate correlation to univariate regression
* Use multivariate regression to ask more complicated questions about our data.

In [None]:
import mne
import numpy as np
import matplotlib.pyplot as plt
import datascience as ds
import neurods as nds
%matplotlib inline

---

# Correlation via simulation
> * In general, what does correlation reflect? If we say that two numbers are correlated, what do we mean?

To dive into this question, we'll create two variables from scratch and see what we can do with them...

In [None]:
# We'll begin by simulating a few signals; this will help build intuition for what regression means
# First, we'll create a random variable
noise_amp = 1
n_pts = 50
a = 10 * np.random.random(n_pts)

# Now, we'll define a "weight" that causes a second variable to respond to it
weight = 2

# Finally, we'll create some noise so that it's not a perfect mapping
noise = noise_amp * np.random.randn(n_pts)

# Then let's mix them together. In this case, b is explicitly created from the values in a
b = weight * a + noise

In [None]:
# Let's look at the signals
f, ax = plt.subplots()
ax.plot(a, label='a')
ax.plot(b, label='b')
plt.legend()
plt.show()

Before we get into modeling, we'll begin with *correlation*. Look at the two plots above: they seem to be varying in similar ways. When one goes up, the other goes up; when one goes down, the other goes down. How can we quantify this?

> * What's the explicit relationship between these two signals?

Often it is better to investigate correlations between two signals by making a **scatterplot**. We'll use one here to see how related the two signals are to one another:

In [None]:
f, ax = plt.subplots()
ax.scatter(a, b)
ax.set_xlabel('Signal A')
ax.set_ylabel('Signal B')

Our intuition from above seems to hold here as well. When values on the x-axis are large, values on the y-axis also tend to be large. We can quantify this with the correlation coefficient.

![](http://www.stat.yale.edu/Courses/1997-98/101/cor.gif)

There are many functions we can use to calculate the correlation coefficient. Below we'll use `np.corrcoef`.

> *Note: The `corrcoef` function will actually return a "correlation matrix". In this case, every row of `a` is correlated with every row of `b`, and displayed as a matrix. Since our variables are vectors, the output will be a 2 by 2 matrix and the 1st element of the 2nd row will be the correlation coefficient.*

In [None]:
# Calculate the correlation matrix
corr = np.corrcoef(a, b)
print(corr)
print('\n\n')

# then take the 1st val of the 2nd row.
print(corr[1, 0])
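
To make the formula less of a black box, here is a minimal sketch that computes the correlation coefficient by hand and compares it with `np.corrcoef`. The names `a_demo` and `b_demo` are hypothetical stand-ins simulated the same way as `a` and `b` above. It uses the population standard deviation with an average over `n` points, which gives the same value as the `n - 1` version as long as both are used consistently.

```python
import numpy as np

# Hypothetical stand-ins for the simulated signals "a" and "b"
rng = np.random.RandomState(0)
a_demo = 10 * rng.random_sample(50)
b_demo = 2 * a_demo + rng.randn(50)

# Convert each signal to standard units (mean 0, standard deviation 1)
a_z = (a_demo - a_demo.mean()) / a_demo.std()
b_z = (b_demo - b_demo.mean()) / b_demo.std()

# The correlation coefficient is the average product of the standardized values
r_manual = np.mean(a_z * b_z)
r_numpy = np.corrcoef(a_demo, b_demo)[1, 0]
print(r_manual, r_numpy)
```

The two printed values should agree up to floating point error.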

> * What would happen if we increased the noise parameter when constructing the signal above?

# Correlation vs. Regression

These two signals are nearly perfectly correlated. This is because above we've defined one signal to be a *linear function* of the other signal. In other words, like this:

$signal_b = w * signal_a + noise$

Having a high correlation basically tells us that the above formula is a "good" description of the relationship between these two variables. However it doesn't tell us anything about what `w` actually is.

But what if we wanted to know the *actual* value of `w` used to generate signal B from signal A?

To do this, we can use a related tool called **regression**. In regression, we find the weight that defines the relationship between two signals. Instead of just asking "is this linear equation a good description of the relationship between the two signals?", we can ask "what is the linear weight that describes the relationship between these two signals?".

For example, we could change the weight value above, and the underlying correlation would stay close to 1 (as long as the noise is small relative to the signal). However, regression will find a different value each time.
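
A quick simulation can illustrate this. The sketch below is only a rough check under assumed settings (a hypothetical small noise level of 0.1 and 200 points). It estimates the weight with the least-squares formula for a line through the origin, which is one simple way to run a regression with a single input.

```python
import numpy as np

rng = np.random.RandomState(1)
a_demo = 10 * rng.random_sample(200)
noise = 0.1 * rng.randn(200)  # hypothetical, small noise level

results = []
for weight in [2, 5, 10]:
    b_demo = weight * a_demo + noise
    # Correlation between the two signals
    r = np.corrcoef(a_demo, b_demo)[1, 0]
    # Least-squares weight for a line through the origin
    w_hat = np.sum(a_demo * b_demo) / np.sum(a_demo ** 2)
    results.append((weight, r, w_hat))
    print('weight=%d  correlation=%.4f  estimated weight=%.4f' % (weight, r, w_hat))
```

The correlation barely moves as the weight changes, while the estimated weight tracks the true one.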


# Using regression on these simulated signals

In regression, we explicitly model one signal as a linear function of the other signal. The output of regression is a *weight* (not a correlation) that tells us how we can predict values of one signal using the other.

Here's the equation for a linear model:

$$y = \sum_{i=1}^{n}w_i x_i + \epsilon$$

This says: each output $y$ is predicted by weighting each feature $x_i$ by its corresponding weight $w_i$, summing them together, and then adding random noise $\epsilon$.

Regression is a technique for inferring what these weights are, given a dataset of inputs and outputs. Because we're using this explicit equation to model the relationship between these signals, linear regression is often called the simplest form of **modeling**.
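
As a small illustration of the equation, the sketch below generates outputs from three made-up features. The weights and noise level are hypothetical, chosen only for the demo. It computes the first output with an explicit sum, then all outputs at once with a matrix product.

```python
import numpy as np

rng = np.random.RandomState(2)
n_samples, n_features = 5, 3

X = rng.randn(n_samples, n_features)   # one row per datapoint, one column per feature
w = np.array([2.0, -1.0, 0.5])         # hypothetical weights, one per feature
eps = 0.1 * rng.randn(n_samples)       # random noise

# The explicit sum from the equation, for the first datapoint only
y_first = sum(w[i] * X[0, i] for i in range(n_features)) + eps[0]

# The same computation for every datapoint, written as a matrix product
y = X @ w + eps
print(y_first, y[0])
```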

> * What's the main difference between regression and correlation?

## Regression in Python: scikit-learn
There are many packages to do regression in Python, but we'll focus on a machine learning package called `scikit-learn`.

Scikit-learn is **the** machine learning tool in Python. It's also probably one of the most popular machine learning libraries across all languages right now. You could take an entire class JUST on machine learning with scikit-learn, but here is a very quick primer:

1. Scikit-learn contains "estimator" objects, which are basically a way of fitting different kinds of models to your data. You can create an estimator object by calling it (similar to a function): `myestimator = EstimatorObject()`.
2. Once you have an estimator object, you can fit it to data. Generally speaking, this means that we need *input* and *output* data. These always have a shape (n_samples, n_features).
3. We fit the estimator by using its `fit` method: `myestimator.fit(X, y)`.
4. After fitting the model to data, we can inspect the model coefficients that have been fit. These exist as attributes created after calling `fit`, and always end in an underscore, e.g.: `myestimator.coef_`.
5. We can also use that model to predict new outputs given some inputs. We do this with the `predict` method, e.g.: `y_new = myestimator.predict(X_new)`.

> * Scikit-Learn also has excellent [tutorials](http://scikit-learn.org/stable/tutorial/) that describe how to do machine learning with the library.
> * Specifically, here's the section on [Linear Regression](http://scikit-learn.org/stable/tutorial/statistical_inference/supervised_learning.html#linear-regression)
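
The steps in the primer can be sketched on toy data as follows. This is only an illustration: the data, the weights (3 and -2), and the variable names are made up for the demo.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: 100 samples, 2 features, with hypothetical true weights 3 and -2
rng = np.random.RandomState(3)
X = rng.randn(100, 2)
y = X @ np.array([3.0, -2.0]) + 0.1 * rng.randn(100)

# 1. Create the estimator object
est = LinearRegression()

# 2-3. Fit it to inputs of shape (n_samples, n_features) and their outputs
est.fit(X, y)

# 4. Inspect the fitted attributes (note the trailing underscore)
print(est.coef_, est.intercept_)

# 5. Predict outputs for new inputs
X_new = np.array([[1.0, 0.0], [0.0, 1.0]])
y_new = est.predict(X_new)
print(y_new)
```

The fitted coefficients should land close to the weights used to simulate the data.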

We'll take a look at how to use this with some actual data below.

# The least-squares solution in linear regression
The simplest way to find the weight that relates two signals to one another is to solve the "least squares" equation for the data.

What does "least-squares" mean in this context? It refers to the *error* of our model. Specifically, we can define the error of the model as the difference between our model predictions, and the "actual" values of the outputs. Since this can be either positive or negative, we square the difference so we can combine across datapoints.

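$$error=\sum_{j=1}^{m}{(y_j - \hat{y}_j)^2}$$

where $m$ is the number of datapoints, and the prediction for datapoint $j$ is

$$\hat{y}_j = \sum_{i=1}^{n}w_i x_{ji}$$

Here $x_{ji}$ is the value of feature $i$ for datapoint $j$.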

When we "solve" for the least squares solution, we mean "determine the value `w` for each feature so that we minimize the above error".
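
To make "minimize the error" concrete, here is a rough sketch for a single feature with no intercept. It tries many candidate weights by brute force, then compares the best one with the closed-form answer and with NumPy's `np.linalg.lstsq`. The data and the grid of candidate weights are assumptions made for the demo.

```python
import numpy as np

rng = np.random.RandomState(4)
x = 10 * rng.random_sample(50)
y = 2 * x + rng.randn(50)

def squared_error(w):
    """Sum of squared differences between actual and predicted outputs."""
    y_hat = w * x
    return np.sum((y - y_hat) ** 2)

# Brute force: evaluate the error for a grid of candidate weights
candidates = np.linspace(0, 4, 401)
errors = np.array([squared_error(w) for w in candidates])
w_grid = candidates[np.argmin(errors)]

# Closed-form least-squares weight for one feature and no intercept
w_closed = np.sum(x * y) / np.sum(x ** 2)

# The same answer from NumPy's general least-squares solver
w_lstsq = np.linalg.lstsq(x[:, np.newaxis], y, rcond=None)[0][0]
print(w_grid, w_closed, w_lstsq)
```

The brute-force answer is limited by the grid spacing, so it only approximates the other two.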

Let's do this below.

In [None]:
# The simplest regression object solves the "Least Squares" problem.
from sklearn.linear_model import LinearRegression

In [None]:
# We'll simulate more random data so it's easier to change parameters
# First, we'll create a random variable
noise_amp = 1
n_pts = 50
a = 10 * np.random.random(n_pts)

# Now, we'll define a "weight" that causes a second variable to respond to it
weight = 2

# Finally, we'll create some noise so that it's not a perfect mapping
noise = noise_amp * np.random.randn(n_pts)

# Then let's mix them together. In this case, b is explicitly created from the values in a
b = weight * a + noise

In [None]:
# First, we'll add an extra dimension to both "a" and "b"
# This ensures that they are shape (n_samples, n_features)
a = a[:, np.newaxis]
b = b[:, np.newaxis]

# We'll create our regression model, and fit it to the data we created
# We won't fit an intercept, though it is easy to do so
reg = LinearRegression(fit_intercept=False)
reg.fit(a, b)

In this case, calling the `fit` method tells the model to find a set of coefficients (in this case just a single number) that, when combined with each value of "a", predicts a value of "b".

Now that the model is fit, we can access its `coef_` attribute, or use it to predict new values:


In [None]:
# This is the relationship that the model has found between a and b
reg.coef_

Now that we have a **model** of signal B, using signal A, we can make predictions about new values. Below, we'll create a range of values to "test" our model, and see what the model outputs for each one.

In [None]:
# To test this out, we'll create a range of values that span the values in "a"
# Let's see what the model predicts for each test value:
test_vals = np.linspace(a.min(), a.max(), 200)
b_preds = reg.coef_[0] * test_vals
f, ax = plt.subplots()
ax.scatter(a, b, color='k')
ax.scatter(test_vals, b_preds, color='r')
ax.set_xlabel('Signal A')
ax.set_ylabel('Signal B')

Scikit-learn also makes this easy by giving you a `predict` method of an estimator object.

In [None]:
# We could have also just used the `predict` method of our estimator:
b_preds = reg.predict(test_vals[:, np.newaxis])

f, ax = plt.subplots()
ax.scatter(a, b, color='k')
ax.plot(test_vals, b_preds, color='r')

We can visualize the error in this model by drawing a line from each "actual" datapoint (the black dots), to each "predicted" datapoint (the red line):

In [None]:
fig, ax = plt.subplots()

# First plot the same data as above
ax.scatter(a, b, color='k')
ax.plot(test_vals, b_preds, color='r')

# Now plot error lines
for a_actual, b_actual in zip(a, b):
    # Find the nearest predicted a to this "actual" a
    ix_a_pred = np.argmin(np.abs(a_actual - test_vals))
    
    # Then look up its value on the line
    closest_predicted_b = b_preds[ix_a_pred]
    
    ax.vlines(a_actual, b_actual, closest_predicted_b)

What's the relationship between correlation and regression? Well, if we were to convert both the inputs and the outputs into **standard units** (AKA, so they had a mean == 0, and a variance == 1), then regression would give us the exact same answer as correlation.

In scikit-learn, this is called **scaling** the data.
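
To see why the two agree, here is a short derivation for the simplest case: one input, no intercept, and $N$ datapoints (the symbols $N$ and $k$ are introduced here just for this sketch). The least-squares weight that predicts $b$ from $a$ is

$$w = \frac{\sum_{k=1}^{N} a_k b_k}{\sum_{k=1}^{N} a_k^2}$$

In standard units both signals have mean 0 and variance 1, so $\sum_{k=1}^{N} a_k^2 = N$, and the weight becomes

$$w = \frac{1}{N}\sum_{k=1}^{N} a_k b_k = r$$

which is the average product of the standardized values, i.e. the correlation coefficient $r$.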

In [None]:
# Scale our variables
a_scaled = (a - np.mean(a)) / np.std(a)
b_scaled = (b - np.mean(b)) / np.std(b)

# Alternatively we could do this with scikit-learn
from sklearn.preprocessing import scale
a_scaled = scale(a)
b_scaled = scale(b)

plt.hist(a, label='raw')
plt.hist(a_scaled, label='scaled')
plt.legend()
plt.show()

In [None]:
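# Now re-run the regression
reg.fit(a_scaled, b_scaled)
coef_scaled = reg.coef_[0, 0]

# Note that the coefficient has changed. Compare it with the correlation
# coefficient, recomputed here because "a" and "b" were re-simulated above
corr = np.corrcoef(a.ravel(), b.ravel())[1, 0]
print(coef_scaled)
print(corr)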

> * Why isn't the coefficient exactly 1?
> * What would happen if we were to increase the noise levels when simulating the signals?

Notice how using the regression model allowed us to *predict* a new output. This turns out to be really useful, and we'll cover it in future lectures. For now, we'll take these methods into our neural data...

# Correlation and Regression in neural signals
One of the most challenging parts of neuroscience is figuring out how much information we can infer about the link between world and brain activity. Something that greatly affects this is how *correlated* all of our neural signals are with one another. Let's take a look at the correlations between our neural signals.

## The dataset
We'll use the same ECoG dataset that you used as homework last week.

The subject is listening to chords: some of them are consonant, some of them are dissonant. We are recording the brain activity from electrodes placed directly on the surface of the subject's brain. Moreover, these electrodes tend to be centered over auditory cortex. We'd like to figure out if the brain processes consonant and dissonant chords differently.

In [None]:
# First we'll load the data. We'll load a downsampled version so that it's faster
path_ecog = '../../data/ecog/chords_task/'
data = mne.io.Raw(path_ecog + 'ecog_resamp-raw.fif', preload=True)

# We'll load the x/y positions of the sensors so we can plot on the brain
melec = ds.Table.read_table(path_ecog + 'meta_chans.csv')
melec = melec.where(~np.isnan(melec['x']))  # Drop electrodes without a position
lt = mne.channels.read_layout(path_ecog + 'brain.lout')
im = plt.imread(path_ecog + 'brain.png')

# We'll only take the first 3 minutes to save space
data.crop(0, 60 * 3)

## Linear relationships between channels
One of the first questions we ask with a new dataset is "how related are channels to one another?" In other words, are they correlated?

We can use the correlation coefficient to answer this question. We'll calculate the linear relationship between each channel and every other channel.

In [None]:
# As always, we'll begin by looking at the raw data.
# This time try to tell if channels are correlated with one another
_ = data.plot(scalings='auto')

Remember - correlations between two signals mean that when one signal is larger, the other also tends to be larger, and vice versa.

In [None]:
# We'll pick two neighboring channels at random, and construct a correlation matrix between them:
ch_a = data._data[10]
ch_b = data._data[11]
np.corrcoef(ch_a, ch_b)[1, 0]

In [None]:
# Let's compare that with a third channel that isn't right next to the other two:
ch_c = data._data[60]
np.corrcoef(ch_a, ch_c)[1, 0]

> * Why do you think these two correlation values are different?

In [None]:
# We can visualize where these electrodes are with respect to one another on the brain...
activity = np.zeros(data._data.shape[0])
activity[[10, 11, 60]] = 1
nds.viz.plot_activity_on_brain(melec['x'], melec['y'], activity, im)

To quickly look at the correlation between all of the channels, we can construct a "correlation matrix".

This is a matrix where each channel is correlated with all the other channels.

The output is a (channels x channels) matrix, that is symmetric about its diagonal:

In [None]:
cc_mat = np.corrcoef(data._data)
print(cc_mat.shape)
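
To build intuition for what such a matrix contains, here is a small sketch on simulated data. It is hypothetical: four made-up "channels", where the first two share a common signal. The result is symmetric, has ones on the diagonal, and shows a large value only for the pair that shares a signal.

```python
import numpy as np

rng = np.random.RandomState(5)
n_channels, n_times = 4, 1000

# Independent noise on every simulated channel...
sim = rng.randn(n_channels, n_times)
# ...plus a shared signal added to the first two channels only
shared = rng.randn(n_times)
sim[:2] += 2 * shared

# Each row is treated as one variable, so the output is (channels x channels)
cc = np.corrcoef(sim)
print(cc.shape)
print(np.round(cc, 2))
```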

Below, we'll plot the correlation matrix for this grid. Each row is a channel, and each column is a channel.

In [None]:
f, ax = plt.subplots()
ax.imshow(cc_mat, vmin=0, vmax=.5, cmap=plt.cm.RdBu_r, interpolation='nearest')

> * What kind of structure do you see in this image?

> * What do you think that it means?

> * Do you think we get more information from correlated channels, or uncorrelated channels?

> * Why do you think two channels are correlated?

In [None]:
f  # Re-display the correlation matrix figure from above

In [None]:
# We can pull out one row of the correlation matrix.
# This is one channel's correlation with all other channels.
# Since we have the 2d locations of channels, we can visualize this
ix_seed = 25
cc_elec = cc_mat[ix_seed]

# This is a helper function to plot activity per electrode on the brain
ax = nds.viz.plot_activity_on_brain(melec['x'], melec['y'], cc_elec, im, vmin=0, vmax=.2)

# Using regression to ask more complex questions
If we want to do something more complex than looking at pairwise relationships between signals (like the correlation matrix above) then we are going to need regression.

Regression is powerful because of how flexible it is. The only thing we need to do is define input variables, and find a set of weights that mixes them together to predict an output variable. This means that we can use multiple input variables for a single output variable, something called **multiple regression**.

In [None]:
# We'll begin by loading event times for this dataset
time = ds.Table.read_table(path_ecog + 'meta_time.csv', index_col=0)

In [None]:
# Convert our time onsets from seconds to time indices
ix_onsets = time['start'] * data.info['sfreq']
ix_onsets = np.round(ix_onsets).astype(int)
ix_types = np.where(time['type'] == 'consonant', 1, 2)
events = np.vstack([ix_onsets, np.ones_like(ix_onsets), ix_types]).T
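
As a small worked example of this conversion, the sketch below uses made-up onset times and a hypothetical sampling rate of 200 Hz. MNE events arrays have one row per event and three columns: the sample index, the previous trigger value (filled with zeros in this sketch), and an integer event type.

```python
import numpy as np

sfreq = 200.  # hypothetical sampling frequency in Hz
onsets_sec = np.array([1.5, 4.25, 7.0])                       # made-up onsets in seconds
types = np.array(['consonant', 'dissonant', 'consonant'])    # made-up event types

# Seconds -> sample indices (time multiplied by samples per second)
ix_onsets = np.round(onsets_sec * sfreq).astype(int)

# Event names -> integer codes (1 for consonant, 2 for dissonant)
ix_types = np.where(types == 'consonant', 1, 2)

# Stack into an (n_events, 3) array
events_demo = np.vstack([ix_onsets, np.zeros_like(ix_onsets), ix_types]).T
print(events_demo)
```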

In [None]:
# As before, we could create an Epochs object, and calculate the average response for each one:
epochs = mne.Epochs(data, events, preload=True)
av = epochs.average()
_ = av.plot()

## Understanding what contributes to Global Field Power with regression
Recall that we calculated Global Field Power as a marker for how much activity there was overall among the electrodes. But how can we tell which electrodes are contributing the most to this global field power?

We can accomplish this by investigating the weights that we fit with a regression. They tell us how strongly each channel's signal is related to the overall GFP, which in turn tells us which channels contribute to it the most.

> * How do we calculate the Global Field Power? What in general does it reflect?

In [None]:
# We'll calculate the global field power for these events
squared_ep = epochs._data ** 2
gfp_ep = squared_ep.mean(1)

# Now horizontally stack the epoched GFP / data so we can fit regressions
gfp = np.hstack(gfp_ep)
squared_raw = np.hstack(squared_ep)
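
The stacking step is easier to follow when tracking array shapes. Here is a sketch on a small random array standing in for the epoched data, with hypothetical sizes of 3 epochs, 4 channels, and 5 timepoints.

```python
import numpy as np

rng = np.random.RandomState(6)
n_epochs, n_channels, n_times = 3, 4, 5
ep = rng.randn(n_epochs, n_channels, n_times)  # stand-in for the epoched data

sq = ep ** 2
gfp_demo = sq.mean(1)            # average over channels -> (n_epochs, n_times)

# hstack places the epochs one after another along the time axis
gfp_flat = np.hstack(gfp_demo)   # -> (n_epochs * n_times,)
sq_flat = np.hstack(sq)          # -> (n_channels, n_epochs * n_times)
print(gfp_demo.shape, gfp_flat.shape, sq_flat.shape)
```

After stacking, the timepoints of `gfp_flat` line up with the columns of `sq_flat`, which is what lets us fit a regression between them.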

In order to get a set of weights that we can interpret, we must **scale** the input features first, so that they all have roughly the same amplitude.

In [None]:
# Here we'll scale the inputs.
X = scale(squared_raw, axis=1)

Finally, we can use the input channel activity we've created to run a regression.

We'll fit a linear model, where the values of each channel are used to predict the global field power of the entire grid.

Don't forget, sklearn expects inputs of shape `(n_samples, n_features)`

In [None]:
# Finally, we can use regression to predict the GFP using channel data
mod = LinearRegression(fit_intercept=False)
mod.fit(X.T, gfp[:, np.newaxis])

The coefficients of this model represent how much each channel "contributes" to the GFP.
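
To get a feel for how to read these weights, here is a sketch on simulated channels with different amplitudes. Everything in it is assumed for the demo: three channels, made-up amplitudes, and `np.linalg.lstsq` standing in for the regression object. With this construction, the weight for each channel equals the standard deviation of its squared signal divided by the number of channels, so channels with larger fluctuations in power get larger weights.

```python
import numpy as np

rng = np.random.RandomState(7)
n_channels, n_times = 3, 2000
amps = np.array([1.0, 2.0, 4.0])  # hypothetical channel amplitudes
sim = amps[:, np.newaxis] * rng.randn(n_channels, n_times)

# Squared activity per channel, and its average across channels
sq = sim ** 2
gfp_demo = sq.mean(0)

# Scale each channel's squared activity to mean 0 and standard deviation 1
X_demo = (sq - sq.mean(1, keepdims=True)) / sq.std(1, keepdims=True)

# Least-squares weights (no intercept) predicting the average from the scaled channels
coefs = np.linalg.lstsq(X_demo.T, gfp_demo, rcond=None)[0]
print(coefs)
print(sq.std(1) / n_channels)
```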

In [None]:
# The coefficients of this model represent how much each channel predicts GFP
coefs_raw = mod.coef_[0]
coefs_raw

In [None]:
# Plotting the coefficients is easier than reading the raw numbers
tab = ds.Table().with_columns([('name', data.ch_names),
                               ('coef', coefs_raw)])
tab.plot(select='coef')

We'll visualize this on the brain with a helper function.

In [None]:
nds.viz.plot_activity_on_brain(melec['x'], melec['y'], mod.coef_[0], im)

> * What can you conclude from the plot above? Which channels are driving the GFP?
> * Why would a channel be a large contributor to Global Field Power?

In [None]:
# Now let's see if that matches with the ERPs for this subject
fig = av.plot_topo(lt, fig_background=im)

> * Do you see a perfect match between the two?

# Lab time!