# Linear Regression Example

In this example, we create a training set and find the linear model that best fits it. First, we generate a training set of $N = 1000$ points $\{(x_n, y_n)\}_{n=1}^N$. The inputs $x_n \in [0, 1]^2$ are independent and uniformly distributed on the unit square. The targets are $y_n = 2\|x_n\|^2 + \varepsilon_n$, where the noise terms $\varepsilon_n$ are independent and uniformly distributed on $[-1, 1]$.

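```python
from matplotlib import pyplot as plt
import numpy as np

N = 1000    # number of training points
sigma = 1   # noise half-width: eps is uniform on [-sigma, sigma]

# Inputs: N points drawn uniformly from the unit square [0, 1]^2
x_train = np.random.rand(N, 2)
# Noise: uniform on [-sigma, sigma]
eps = sigma * 2 * (np.random.rand(N) - 0.5)
# Targets: y = 2 * ||x||^2 + noise
y_train = 2 * x_train[:, 0]**2 + 2 * x_train[:, 1]**2 + eps
```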

Next, we use the *sklearn* (scikit-learn) package to find the best linear model for the training data. We fit without an intercept (`fit_intercept=False`), so the output is a vector $\hat{\beta} \in \mathbb{R}^2$ used in the linear model $f(x) = \langle x, \hat{\beta} \rangle$. If you regenerate the training set and rerun this code several times, you will see that $\hat{\beta}$ changes, because it depends on the training set.
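You can automate the experiment suggested above. The following is a minimal NumPy-only sketch. It uses `np.linalg.lstsq` in place of sklearn's `LinearRegression(fit_intercept=False)`, which is an assumption made to keep the example dependency-free. Both solve the same no-intercept least-squares problem. `make_training_set` and `fit_no_intercept` are hypothetical helper names introduced for illustration. Each seed produces a fresh training set, so the printed $\hat{\beta}$ varies from run to run while staying clustered around a common value.

```python
import numpy as np

def make_training_set(rng, N=1000):
    """Hypothetical helper: x_n uniform on [0, 1]^2, y_n = 2||x_n||^2 + eps_n, eps_n uniform on [-1, 1]."""
    x = rng.random((N, 2))
    eps = rng.uniform(-1.0, 1.0, size=N)
    y = 2 * np.sum(x**2, axis=1) + eps
    return x, y

def fit_no_intercept(x, y):
    """Hypothetical helper: least-squares coefficients for f(x) = <x, beta> (no intercept)."""
    beta_hat, *_ = np.linalg.lstsq(x, y, rcond=None)
    return beta_hat

estimates = []
for seed in range(5):
    rng = np.random.default_rng(seed)  # a new, reproducible training set per seed
    x, y = make_training_set(rng)
    estimates.append(fit_no_intercept(x, y))
    print(f"seed {seed}: beta_hat = {estimates[-1]}")

estimates = np.array(estimates)
print("std of beta_hat across training sets:", estimates.std(axis=0))
```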

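```python
from sklearn.linear_model import LinearRegression

# Fit f(x) = <x, beta> with no intercept term, matching the model above
model = LinearRegression(fit_intercept=False)
model.fit(x_train, y_train)
beta_hat = model.coef_  # fitted coefficient vector, shape (2,)

print("beta_hat = {}".format(beta_hat))
```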

```
beta_hat = [1.39117959 1.42571725]
```
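Why do the coefficients land near 1.4 rather than 2? The true function $2\|x\|^2$ is not linear, so $\hat{\beta}$ estimates the best *linear* approximation to it. Here is a back-of-the-envelope derivation, assuming the usual large-sample limit. The least-squares solution can be written in terms of sample averages, and these converge to expectations. Write $x = (u, v)$ with $u, v$ independent and uniform on $[0, 1]$, so that $y = 2u^2 + 2v^2 + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$:

$$
\hat{\beta} = \Big(\tfrac{1}{N}\textstyle\sum_n x_n x_n^T\Big)^{-1}\Big(\tfrac{1}{N}\textstyle\sum_n x_n y_n\Big)
\;\longrightarrow\;
\beta^* = \mathbb{E}[x x^T]^{-1}\,\mathbb{E}[x\,y],
$$

$$
\mathbb{E}[x x^T] = \begin{pmatrix} 1/3 & 1/4 \\ 1/4 & 1/3 \end{pmatrix},
\qquad
\mathbb{E}[u\,y] = 2\,\mathbb{E}[u^3] + 2\,\mathbb{E}[u]\,\mathbb{E}[v^2] = \tfrac{2}{4} + 2\cdot\tfrac{1}{2}\cdot\tfrac{1}{3} = \tfrac{5}{6} = \mathbb{E}[v\,y].
$$

By symmetry $\beta^* = (b, b)$ with $\left(\tfrac{1}{3} + \tfrac{1}{4}\right) b = \tfrac{5}{6}$, so $b = \tfrac{10}{7} \approx 1.43$. This is consistent with the printed values $1.39$ and $1.43$. The gap between them is the training-set-dependent fluctuation.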


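In class we derived an explicit formula for $\hat{\beta}$, namely $\hat{\beta} = (D^T D)^{-1}D^T y$. Here $D$ is the $N \times 2$ data matrix whose $n$-th row is $x_n^T$ (here `x_train`), and $y$ is the vector of targets $y_n$. We compute $\hat{\beta}$ with this formula and verify that it matches the result from sklearn.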
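For reference, the formula follows from minimizing the squared training error over $\beta$, with $D$ the data matrix (`x_train`). Setting the gradient to zero gives the normal equations. This step assumes $D^T D$ is invertible, i.e. $D$ has full column rank, which holds with probability one for this data:

$$
\nabla_\beta \|y - D\beta\|^2 = -2\,D^T (y - D\beta) = 0
\;\Longrightarrow\;
D^T D\,\hat{\beta} = D^T y
\;\Longrightarrow\;
\hat{\beta} = (D^T D)^{-1} D^T y.
$$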

```python
# Data matrix D: one row per training point (no column of ones, since there is no intercept)
D = x_train
# beta_hat = (D^T D)^{-1} D^T y, evaluated literally
beta_hat_formula = np.linalg.inv(D.T @ D) @ D.T @ y_train

print("beta_hat computed using formula = {}".format(beta_hat_formula))
```

```
beta_hat computed using formula = [1.39117959 1.42571725]
```
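Forming $(D^T D)^{-1}$ explicitly is harmless for this $2 \times 2$ case, but numerical code usually avoids it. One option is to solve the normal equations with `np.linalg.solve`. Another is to call a least-squares routine such as `np.linalg.lstsq`, which works on $D$ directly and never forms $D^T D$, whose condition number is the square of that of $D$. The following is a minimal sketch comparing the three approaches on a freshly generated, seeded training set of the same kind. Because it uses its own random generator, its numbers will differ from the output above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is reproducible
N = 1000
D = rng.random((N, 2))          # data matrix: rows are x_n in [0, 1]^2
y = 2 * np.sum(D**2, axis=1) + rng.uniform(-1.0, 1.0, size=N)

# 1) Literal formula: explicit inverse of D^T D
beta_inv = np.linalg.inv(D.T @ D) @ D.T @ y
# 2) Solve the normal equations D^T D beta = D^T y without forming the inverse
beta_solve = np.linalg.solve(D.T @ D, D.T @ y)
# 3) Least squares directly on D (never forms D^T D)
beta_lstsq, *_ = np.linalg.lstsq(D, y, rcond=None)

print("inv:  ", beta_inv)
print("solve:", beta_solve)
print("lstsq:", beta_lstsq)
print("all agree:", np.allclose(beta_inv, beta_solve) and np.allclose(beta_solve, beta_lstsq))
```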
