# Simple Linear Regression

This notebook uses linear regression to find the best-fit line through an artificial 2D scatter plot, generated from a known line with Gaussian noise added.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="ticks", palette="muted", color_codes=True)

from stattools.glm import LinearRegression
from stattools.visualization import abline

In [None]:
# Set NumPy random number generator seed for replicability
np.random.seed(100)

## Create Some Artificial Data

In [None]:
n = 100
slope = 3
intercept = 2

x = np.random.uniform(0, 10, n)
y = slope * x + intercept + np.random.normal(0, 10, n)

## The Ordinary Least Squares Regression Model

### Fitting the Model

In [None]:
%%time
model = LinearRegression().fit(x, y)
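
For a single predictor, the least squares line has a closed form: the slope is the sample covariance of $x$ and $y$ divided by the sample variance of $x$, and the intercept is $\bar{y} - b\bar{x}$. The sketch below computes these with plain NumPy as a cross-check on a fitted model. It does not use `stattools`, and `ols_line` is a hypothetical helper name introduced only for illustration.

```python
import numpy as np


def ols_line(x, y):
    """Return (intercept, slope) of the least squares line through (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations over sum of squared x-deviations
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means
    a = y_mean - b * x_mean
    return a, b


# Noise-free points lying exactly on y = 1 + 2x
a_check, b_check = ols_line([0, 1, 2, 3], [1, 3, 5, 7])
print(f"intercept = {a_check:.3f}, slope = {b_check:.3f}")
```

Calling `ols_line(x, y)` on the data generated above should give values close to `model.intercept` and `model.coef[0]`, assuming the model fits an intercept by default.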

### Plotting the Fit

In [None]:
# Get intercept (a) and slope (b) of model
a = model.intercept
b = model.coef[0]

# Plot the regression line
plt.figure()
plt.scatter(x, y, c="b", alpha=0.7, edgecolor="k")
abline(intercept, slope, lw=3, c="k", label=f"Actual: y = {intercept} + {slope}x")
abline(a, b, lw=3, c="r", label=f"Prediction: y = {a:.3f} + {b:.3f}x")
plt.title("OLS Regression")
plt.legend(loc="best", frameon=True, shadow=True)
plt.show()
plt.close()

### Mean Squared Error

To evaluate this model's performance, we compute its *mean squared error* on the data set:
$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^n \left(y_i - \widehat{y}_i\right)^2.
$$
Here $n$ is the number of observations, $y_i$ is the $i$th observed value, and $\widehat{y}_i$ is the least squares estimate corresponding to the $i$th data point.
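A lower MSE indicates a better fit, with zero meaning every prediction is exact.
Because the noise here has standard deviation 10 (variance 100), even the true line has an expected MSE of about 100, so the fitted model cannot be expected to get close to zero on this data.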
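
The formula can be evaluated directly with NumPy. The sketch below is self-contained: it regenerates data with the same seed and parameters used earlier in this notebook and uses `np.polyfit` as a stand-in for the fitted model, since the prediction API of `stattools` is not shown here. The helper name `mse` is an assumption made for illustration.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)


# Regenerate the artificial data (slope 3, intercept 2, noise sd 10)
rng = np.random.RandomState(100)
n = 100
x = rng.uniform(0, 10, n)
y = 3 * x + 2 + rng.normal(0, 10, n)

# Stand-in for the fitted model: a degree-1 least squares polynomial fit
b_hat, a_hat = np.polyfit(x, y, 1)
y_hat = a_hat + b_hat * x

mse_fit = mse(y, y_hat)        # MSE of the fitted line
mse_true = mse(y, 2 + 3 * x)   # MSE of the line that generated the data
print(f"MSE (fitted line): {mse_fit:.3f}")
print(f"MSE (true line):   {mse_true:.3f}")
```

On the data used for fitting, the least squares line never has a higher MSE than the true line, because it minimizes exactly this quantity over all lines. Comparing the two values shows how much of the error is irreducible noise.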