# Linear Regression

> The importance of the sweep operation in statistical computing is not so much that it is an inversion technique, but rather that it is a conceptual tool for understanding the least squares process. Without this conceptual tool, it is **extremely difficult** to explain concepts such as absorption and what the R notation is testing in terms of the parameters of the model.  
> --James Goodnight (1978)

In [1]:
import sweepystats as sw
import numpy as np

Let's generate some random data. Here we simulate 10 samples, each with 3 covariates.

In [2]:
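X = np.random.normal(10, 3, size=(10, 3))  # 10 samples, 3 covariates
beta = np.array([1., 2., 3.])              # true coefficients
# note: np.random.normal(5) draws one scalar from N(5, 1),
# so the same offset is added to every sample
y = X @ beta + np.random.normal(5)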

We can form an instance of the `LinearRegression` class and fit it as follows:

In [3]:
ols = sw.LinearRegression(X, y)
ols.fit()

100%|██████████████████████████████████████████████████| 3/3 [00:00<00:00, 8060.80it/s]
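
To make the sweep operation from the opening quote concrete, here is a minimal NumPy sketch of a symmetric sweep applied to the augmented cross-product matrix. It is an illustration under stated assumptions, not the `sweepystats` implementation: the `sweep` helper, the sign convention (the swept block holds $-(X^TX)^{-1}$), and the seeded demo data are all hypothetical choices made for this example.

```python
import numpy as np

def sweep(A, k):
    """Return a copy of the symmetric matrix A swept on its k-th diagonal entry."""
    A = np.array(A, dtype=float)      # work on a float copy, leave the input untouched
    d = A[k, k]
    if d == 0:
        raise ZeroDivisionError("cannot sweep on a zero pivot")
    col = A[:, k].copy()
    row = A[k, :].copy()
    A -= np.outer(col, row) / d       # A_ij - A_ik * A_kj / A_kk
    A[:, k] = col / d                 # A_ik / A_kk
    A[k, :] = row / d                 # A_kj / A_kk
    A[k, k] = -1.0 / d                # -1 / A_kk
    return A

# Hypothetical demo data, seeded so the sketch is reproducible
rng = np.random.default_rng(0)
n, p = 10, 3
X_demo = rng.normal(10, 3, size=(n, p))
y_demo = X_demo @ np.array([1., 2., 3.]) + rng.normal(size=n)

# Augmented cross-product matrix [[X'X, X'y], [y'X, y'y]]
Z = np.column_stack([X_demo, y_demo])
A = Z.T @ Z

# Sweep once per covariate
for k in range(p):
    A = sweep(A, k)

beta_sweep = A[:p, p]        # least squares coefficients
rss_sweep = A[p, p]          # residual sum of squares
XtX_inv_sweep = -A[:p, :p]   # (X'X)^{-1}, up to the sign convention used here
```

After sweeping on each of the `p` covariates, the last column holds the coefficients, the bottom-right entry holds the residual sum of squares, and the top-left block holds the negated inverse of $X^TX$. One sweep per covariate would be consistent with the `3/3` shown in the progress bar above.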


The estimated coefficients $\hat{\beta}$ can be extracted as

In [4]:
beta = ols.coef()
beta

array([1.23248944, 2.18899434, 3.28287515])

The residual sum of squares is

In [5]:
resid = ols.resid()
resid

np.float64(9.464187503488802)
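
For reference, this quantity is the sum of squared differences between `y` and the fitted values `X @ beta`. The sketch below computes it directly with NumPy on hypothetical seeded data (the names `X_demo`, `y_demo`, and `beta_hat` are not from the notebook), solving the normal equations rather than using `sweepystats`.

```python
import numpy as np

# Hypothetical demo data, seeded for reproducibility
rng = np.random.default_rng(1)
X_demo = rng.normal(10, 3, size=(10, 3))
y_demo = X_demo @ np.array([1., 2., 3.]) + rng.normal(size=10)

# Least squares coefficients from the normal equations X'X beta = X'y
beta_hat = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)

# Residual vector and its sum of squares
residuals = y_demo - X_demo @ beta_hat
rss = float(residuals @ residuals)
```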

In addition, we can extract the estimated covariance matrix Var($\hat{\beta}$) and the standard errors of $\hat{\beta}$:

In [6]:
cov = ols.cov()
std = ols.coef_std()
std

array([0.13228845, 0.17715883, 0.10127559])

We can also check $R^2$ (the coefficient of determination):

In [7]:
ols.R2()

np.float64(0.987899852888527)
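
As a rough guide to what this number measures, $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$, where TSS is the total sum of squares of `y`. The source does not state whether `sweepystats` centers `y` when computing TSS, so the hypothetical helper below exposes both conventions; treat it as a sketch, not as the library's definition.

```python
import numpy as np

def r_squared(X, y, centered=True):
    """R^2 = 1 - RSS / TSS for a least squares fit of y on X (no intercept added)."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    rss = np.sum((y - X @ beta_hat) ** 2)
    # centered TSS measures variation around the mean of y;
    # uncentered TSS measures variation around zero
    baseline = y - y.mean() if centered else y
    tss = np.sum(baseline ** 2)
    return 1.0 - rss / tss

# Hypothetical demo data, seeded for reproducibility
rng = np.random.default_rng(2)
X_demo = rng.normal(10, 3, size=(10, 3))
y_demo = X_demo @ np.array([1., 2., 3.]) + rng.normal(size=10)

r2_centered = r_squared(X_demo, y_demo, centered=True)
r2_uncentered = r_squared(X_demo, y_demo, centered=False)
```

Because the uncentered TSS is never smaller than the centered one, the uncentered $R^2$ is always at least as large as the centered $R^2$ for the same fit.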

## Comparison with `numpy`

For comparison, let's check whether these results agree with the least squares solution implemented in `numpy`.

In [8]:
# least squares solution via np.linalg.lstsq (SVD-based, not QR)
beta, resid, _, _ = np.linalg.lstsq(X, y)
beta

array([1.23248944, 2.18899434, 3.28287515])

In [9]:
resid # sum of squared residuals reported by numpy

array([9.4641875])

`numpy` has no built-in method for Var($\hat{\beta}$) or the standard errors of $\hat{\beta}$, but we can compute them manually:

In [10]:
# reference estimate of Var(beta_hat): sigma2 * (X'X)^{-1}
n, p = X.shape  # 10 samples, 3 covariates
sigma2 = resid[0] / (n - p)
beta_cov = sigma2 * np.linalg.inv(X.T @ X)
beta_cov

array([[ 0.01750023, -0.01620656, -0.00247337],
       [-0.01620656,  0.03138525, -0.00973216],
       [-0.00247337, -0.00973216,  0.01025674]])
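
Written out, the cell above uses the usual unbiased variance estimate with $n - p$ degrees of freedom, and the standard errors are the square roots of the diagonal of the resulting matrix:

$$
\hat{\sigma}^2 = \frac{\mathrm{RSS}}{n - p}, \qquad
\widehat{\mathrm{Var}}(\hat{\beta}) = \hat{\sigma}^2 \, (X^T X)^{-1}, \qquad
\mathrm{se}(\hat{\beta}_j) = \sqrt{\big[\widehat{\mathrm{Var}}(\hat{\beta})\big]_{jj}}
$$

With the values in this notebook, $\hat{\sigma}^2 = 9.4641875 / (10 - 3) \approx 1.352$, and the first standard error is $\sqrt{0.01750023} \approx 0.1323$.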

In [11]:
# reference standard errors: square roots of the diagonal of beta_cov
beta_std = np.sqrt(np.diag(beta_cov))
beta_std

array([0.13228845, 0.17715883, 0.10127559])