---
title: Regression
project:
  type: website
format:
  html:
    code-fold: true
    code-tools: true
jupyter: python3
number-sections: false
filters:
    - pyodide
---

In this notebook, we'll work through an example of linear regression applied to behavioral and electrophysiological data.

```{pyodide-python}
# Load modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.stats import pearsonr
import statsmodels.api as sm
```

# Load the data

The example behavioral and electrophysiological data are in [regression_example_data.csv](https://raw.githubusercontent.com/Mark-Kramer/BU-MA665-MA666/master/Data/regression_example_data.csv). Load these data directly from the URL:

```{pyodide-python}
df = pd.read_csv("https://raw.githubusercontent.com/Mark-Kramer/BU-MA665-MA666/master/Data/regression_example_data.csv")

# Extract the variables from the loaded data
task_performance = np.array(df.iloc[:,0])  #Get the values associated with the first column of the dataframe
firing_rate = np.array(df.iloc[:,1])  #Get the values associated with the second column of the dataframe
```

# Visualize the data

```{pyodide-python}
# Plot it ...
plt.figure()
plt.plot(firing_rate, task_performance, '.')
plt.xlabel('Firing rate [Hz]')
plt.ylabel('Task Performance [a.u.]')
plt.show()
```

# Correlation

Compute the [correlation](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) between the firing rate ($x$) and the task performance ($y$).

```{pyodide-python}
N       = np.size(firing_rate)
x       = firing_rate - np.mean(firing_rate)
y       = task_performance - np.mean(task_performance)
sigma_x = 'SOMETHING'    #Standard deviation of x
sigma_y = 'SOMETHING'    #Standard deviation of y

correlation = 'SOMETHING'
print(correlation)
```
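One way to fill in the placeholders is sketched below on synthetic data (not the course dataset). The names ending in `_demo` are hypothetical, and the sketch assumes the population standard deviation (NumPy's default `ddof=0`), which pairs with dividing by `N`. The hand-computed value is compared against `np.corrcoef` as a sanity check.

```python
import numpy as np

# Synthetic stand-in data (hypothetical; not the course dataset)
rng = np.random.default_rng(0)
x_demo = rng.normal(10, 1, size=200)
y_demo = 2.0 * x_demo + rng.normal(0, 1, size=200)

# Subtract the means, as in the cell above
N_demo = np.size(x_demo)
xc = x_demo - np.mean(x_demo)
yc = y_demo - np.mean(y_demo)

# Population standard deviations (ddof=0 is NumPy's default)
sigma_x_demo = np.std(x_demo)
sigma_y_demo = np.std(y_demo)

# Correlation: average product of the mean-subtracted signals,
# scaled by the two standard deviations
correlation_demo = np.sum(xc * yc) / (N_demo * sigma_x_demo * sigma_y_demo)

# Reference value from NumPy's built-in correlation matrix
reference = np.corrcoef(x_demo, y_demo)[0, 1]
print(correlation_demo, reference)
```

The same pattern should carry over to `firing_rate` and `task_performance` once the data are loaded.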

# Regression (compute it)

Model the data using regression.

```{pyodide-python}
from statsmodels.formula.api import ols

data = {"x": firing_rate, "y": task_performance}

res1 = ols("y ~ 1 + x", data=data).fit()
res1.summary()
```
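For a single predictor, the least-squares estimates have a closed form: the slope is the covariance of $x$ and $y$ divided by the variance of $x$, and the intercept is the mean of $y$ minus the slope times the mean of $x$. The sketch below illustrates this on synthetic data with a known intercept and slope. The data and the `_demo` names are hypothetical, and it uses NumPy only (with `np.polyfit` as a cross-check) rather than `statsmodels`.

```python
import numpy as np

# Synthetic data with known intercept (3.0) and slope (0.5); hypothetical values
rng = np.random.default_rng(1)
x_demo = rng.uniform(8, 12, size=500)
y_demo = 3.0 + 0.5 * x_demo + rng.normal(0, 0.2, size=500)

# Closed-form least-squares estimates for one predictor
xc = x_demo - x_demo.mean()
yc = y_demo - y_demo.mean()
slope = np.sum(xc * yc) / np.sum(xc ** 2)
intercept = y_demo.mean() - slope * x_demo.mean()

# Cross-check with a degree-1 polynomial fit (returns slope first, then intercept)
slope_ref, intercept_ref = np.polyfit(x_demo, y_demo, deg=1)
print(intercept, slope)
```

The estimates should land close to the values used to generate the data, which is a useful check to run before interpreting a fit to real data.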

# Regression (plot it)

Plot the estimated regression model with the data.

```{pyodide-python}
# Get model prediction.
fitted_values = res1.fittedvalues

# Sort x values, and reorder the fitted values to match, for plotting the regression line
order = np.argsort(firing_rate)
x_sorted = firing_rate[order]
fitted_sorted = np.asarray(fitted_values)[order]

# Plot the data and the regression line (fitted model)
plt.figure()
plt.scatter(firing_rate, task_performance)
plt.plot(x_sorted, fitted_sorted, label="Fitted Model", color="red")
plt.legend()

plt.xlabel('Firing rate [Hz]')
plt.ylabel('Task Performance [a.u.]')
plt.show()
```
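When drawing a fitted line against sorted predictor values, each fitted value needs to stay paired with its own predictor value. The toy sketch below (hypothetical three-point data with a negative slope) shows how reordering with `np.argsort` keeps the pairs intact, while sorting the two arrays independently does not.

```python
import numpy as np

# Toy predictor and fitted values from a line with negative slope (hypothetical)
x_demo = np.array([3.0, 1.0, 2.0])
fitted_demo = 10.0 - 2.0 * x_demo            # [4, 8, 6]

# Reorder both arrays with the same index so the pairs stay matched
order = np.argsort(x_demo)                   # [1, 2, 0]
x_sorted_demo = x_demo[order]                # [1, 2, 3]
fitted_paired = fitted_demo[order]           # [8, 6, 4]

# Sorting the fitted values on their own breaks the pairing
fitted_independent = np.sort(fitted_demo)    # [4, 6, 8]
print(fitted_paired, fitted_independent)
```

Independent sorting happens to give the same result when the slope is positive, so the mismatch only shows up for a decreasing fit.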

---

# Regression example (Part 2)

We now learn that an additional predictor, age, affects task performance.

```{pyodide-python}
df = pd.read_csv("https://raw.githubusercontent.com/Mark-Kramer/BU-MA665-MA666/master/Data/regression_example_data.csv")

# Extract the variables from the loaded data
task_performance = np.array(df.iloc[:,0])
firing_rate = np.array(df.iloc[:,1])
age = np.array(df.iloc[:,2])
```

# Visualize the new data

```{pyodide-python}
# Plot it ...
plt.figure()
plt.plot(age, task_performance, '.')
plt.xlabel('Age [months]')
plt.ylabel('Task Performance [a.u.]')
plt.show()
```

# Correlation (between task performance and age)

```{pyodide-python}
# Compute the correlation between task performance and age
```

# Visualize all data

```{pyodide-python}
fig = plt.figure(figsize=(12, 12))
ax  = fig.add_subplot(projection='3d')
ax.scatter(age, firing_rate, task_performance)
ax.set_xlabel('Age [months]')
ax.set_ylabel('Firing rate [Hz]')
ax.set_zlabel('Task Performance [a.u.]')
plt.show()
```

# Regression (compute it with all data)

Model all data using regression.

```{pyodide-python}
from statsmodels.formula.api import ols
data = {"firing_rate": firing_rate, "age": age, "y": task_performance}
# Write the model and print out the summary
```
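With two predictors, the model becomes task performance = $\alpha + \beta_1 \cdot$ firing rate $+ \beta_2 \cdot$ age. In the formula syntax used earlier, additional predictors are joined with `+` on the right-hand side. The sketch below writes out the same kind of least-squares fit with a NumPy design matrix on synthetic data, so the estimates can be compared with known values. The data, the `_demo` names, and the generating coefficients are all hypothetical, and `np.linalg.lstsq` stands in for `ols` here.

```python
import numpy as np

# Synthetic data with known parameters (hypothetical; not the course dataset)
rng = np.random.default_rng(2)
n = 400
fr_demo = rng.uniform(8, 12, size=n)     # stand-in for firing rate [Hz]
age_demo = rng.uniform(10, 20, size=n)   # stand-in for age [months]
y_demo = 1.0 + 0.4 * fr_demo - 0.3 * age_demo + rng.normal(0, 0.2, size=n)

# Design matrix: a column of ones (intercept) plus one column per predictor
X = np.column_stack([np.ones(n), fr_demo, age_demo])

# Least-squares solution; coef holds [alpha, beta_1, beta_2]
coef, residuals, rank, sv = np.linalg.lstsq(X, y_demo, rcond=None)
alpha_hat, beta_1_hat, beta_2_hat = coef
print(alpha_hat, beta_1_hat, beta_2_hat)
```

Each coefficient describes the change in the outcome per unit change in its predictor with the other predictor held fixed.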

# Regression (plot it with all data)

Plot the estimated regression model with two predictors and all data.

```{pyodide-python}
# Plot the mean model fit.

# First, plot the data.
fig = plt.figure(figsize=(10, 10))
ax  = fig.add_subplot(projection='3d')
ax.set_xlabel('Age [months]')
ax.set_ylabel('Firing rate [Hz]')
ax.set_zlabel('Task Performance [a.u.]')
ax.scatter(age, firing_rate, task_performance)

# Then, define model parameter estimates.
# REPLACE THESE VALUES WITH YOUR PARAMETER ESTIMATES
alpha  = 1    # Intercept
beta_1 = 1    # Coefficient for firing rate
beta_2 = 1    # Coefficient for age

# Finally, plot the model fit.
x      = np.arange(8, 12, 0.1)          # Firing rate [Hz]
y      = np.arange(10, 20, 0.1)         # Age [months]
xx, yy = np.meshgrid(x, y)              # Two-dimensional grid of (firing rate, age)
zz     = alpha + beta_1*xx + beta_2*yy  # Model predictions on the grid
ax.plot_surface(yy, xx, zz)             # Age on the x-axis, firing rate on the y-axis
plt.show()
```