# Linear Regression
Linear regression models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the observed data.

$$
\mathbf{y} \approx \mathbf{X\beta}
$$

Where:
- $n$: number of observations
- $p$: number of columns of $\mathbf{X}$ (the features plus one intercept column)
- $\mathbf{y} \in \mathbb{R}^{n \times 1}$: observed outputs (target vector)
- $\mathbf{X} \in \mathbb{R}^{n \times p}$: feature matrix (rows are observations, columns are features, plus a column of 1s for the intercept)
- $\mathbf{\beta} \in \mathbb{R}^{p \times 1}$: coefficient (parameter) vector
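The sketch below shows how $\mathbf{X}$ is built with a trailing column of ones and how predictions $\mathbf{X\beta}$ are computed. The toy feature values and the coefficient vector are made up for illustration only.

```python
import numpy as np

# Toy data (made up for illustration): n = 4 observations, 2 raw features
features = np.array([[1.0, 2.0],
                     [2.0, 0.5],
                     [3.0, 1.5],
                     [4.0, 3.0]])
n = features.shape[0]

# Append a column of ones so the last coefficient acts as the intercept,
# giving X of shape (n, p) with p = 2 features + 1 intercept column
X = np.column_stack([features, np.ones(n)])

# Hypothetical coefficients: two slopes followed by the intercept
beta = np.array([0.5, -1.0, 2.0])

# Model predictions y_hat = X beta, one per observation
y_hat = X @ beta
print(X.shape, y_hat)
```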

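```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
import pandas as pd
import numpy as np

# Load the diabetes dataset: 442 observations, 10 (already scaled) features
diabetes = load_diabetes()
df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
Y = np.array(diabetes.target)

# Design matrix: the 10 feature columns plus a trailing column of ones for the intercept
X = np.concatenate((df.to_numpy(), np.ones((df.shape[0], 1))), axis=1)
```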

### Least Squares Method
The Least Squares Method minimizes the cost function, the sum of squared errors (residuals), analytically (in closed form):

$$
J(\beta) = ||\mathbf{y - X\beta}||^2 = (\mathbf{y - X\beta})^{\top}(\mathbf{y - X\beta})
$$
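As a quick numerical check, the norm form and the quadratic form of $J(\beta)$ above should agree, and so should the expanded form used in the derivation. The sketch below uses random synthetic data. The helper name `sse_cost` is hypothetical.

```python
import numpy as np

def sse_cost(X, y, beta):
    """J(beta) = ||y - X beta||^2, the sum of squared residuals."""
    r = y - X @ beta
    return r @ r

rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=(20, 3)), np.ones(20)])  # 3 features + intercept column
y = rng.normal(size=20)
beta = rng.normal(size=4)

j_norm = sse_cost(X, y, beta)
# Expanded quadratic form: y'y - 2 beta'X'y + beta'X'X beta
j_expanded = y @ y - 2 * beta @ X.T @ y + beta @ X.T @ X @ beta
print(j_norm, j_expanded)
```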

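Derivation. Expanding the product (the cross terms $\mathbf{y^{\top}X\beta}$ and $\mathbf{\beta^{\top}X^{\top}y}$ are equal scalars):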
$$
J(\beta) = \mathbf{y^{\top}y} - 2 \mathbf{\beta^{\top}X^{\top}y} + \mathbf{\beta^{\top}X^{\top}X\beta} 
$$

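$$
\nabla_{\beta}J = -2\mathbf{X^{\top}y} + 2\mathbf{X^{\top}X\beta}
$$

Setting the gradient to zero ($J$ is convex because its Hessian $2\mathbf{X^{\top}X}$ is positive semidefinite, so any stationary point is a minimum):

$$
-2\mathbf{X^{\top}y}+2\mathbf{X^{\top}X\beta} = 0
$$

This gives the normal equations:

$$
\mathbf{X^{\top}X\beta=X^{\top}y}
$$

Assuming $\mathbf{X}$ has full column rank, so that $\mathbf{X^{\top}X}$ is invertible, the unique minimizer is:

$$
\beta = \mathbf{(X^{\top}X)^{-1}X^{\top}y}
$$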
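The gradient $\nabla_{\beta}J = -2\mathbf{X^{\top}y} + 2\mathbf{X^{\top}X\beta}$ can be checked against central finite differences. The solution of the normal equations should then make it vanish. This is a sketch on synthetic data with hypothetical helper names. It solves the normal equations with `np.linalg.solve` rather than forming the inverse explicitly.

```python
import numpy as np

def sse_cost(X, y, beta):
    r = y - X @ beta
    return r @ r

def sse_grad(X, y, beta):
    # Analytic gradient: -2 X'y + 2 X'X beta
    return -2 * X.T @ y + 2 * X.T @ X @ beta

rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(size=(30, 3)), np.ones(30)])
y = rng.normal(size=30)
beta = rng.normal(size=4)

# Central finite differences, one coordinate direction e at a time
h = 1e-6
numeric = np.array([
    (sse_cost(X, y, beta + h * e) - sse_cost(X, y, beta - h * e)) / (2 * h)
    for e in np.eye(4)
])
print(np.max(np.abs(numeric - sse_grad(X, y, beta))))

# The solution of the normal equations should zero the gradient
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.max(np.abs(sse_grad(X, y, beta_hat))))
```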

```python
# Solve the normal equations X'X beta = X'y directly;
# np.linalg.solve avoids forming the explicit inverse and is numerically more stable
beta = np.linalg.solve(X.T @ X, X.T @ Y)
print("Least Squares Coefficients:")
for i, b in enumerate(beta[:-1], start=1):
    print(f"beta {i}: {round(b, 3)}")
print("Least Squares Intercept:", round(beta[-1], 3), "\n")
```

```
Least Squares Coefficients:
beta 1: -10.01
beta 2: -239.816
beta 3: 519.846
beta 4: 324.385
beta 5: -792.176
beta 6: 476.739
beta 7: 101.043
beta 8: 177.063
beta 9: 751.274
beta 10: 67.627
Least Squares Intercept: 152.133
```
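The normal equations are one route to the least squares solution. `np.linalg.lstsq` is another: a swapped-in, SVD-based solver that works on $\mathbf{X}$ directly instead of forming $\mathbf{X^{\top}X}$. The sketch below compares the two on synthetic data. The true coefficients and noise level are arbitrary choices. It also shows that forming $\mathbf{X^{\top}X}$ squares the condition number of $\mathbf{X}$, which is the usual reason to prefer an SVD- or QR-based solver when features are highly correlated.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
features = rng.normal(size=(n, 3))
X = np.column_stack([features, np.ones(n)])
true_beta = np.array([1.5, -2.0, 0.5, 3.0])  # last entry is the intercept
y = X @ true_beta + 0.1 * rng.normal(size=n)

# 1) Normal equations, solved without forming an explicit inverse
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# 2) SVD-based least squares on X itself
beta_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal)
print(beta_lstsq)
# cond(X'X) equals cond(X) squared (2-norm condition numbers)
print(np.linalg.cond(X) ** 2, np.linalg.cond(X.T @ X))
```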



### Gradient Descent Method
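The Gradient Descent Method minimizes the cost function iteratively, repeatedly stepping the parameters in the direction of the negative gradient. Here the cost is scaled by $\frac{1}{2n}$. This does not change the minimizer, but it turns the gradient into an average over observations and cancels the factor of 2: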

$$
J(\beta) = \frac{1}{2n}||\mathbf{y - X\beta}||^2 
$$

$$
J(\beta) = \frac{1}{2n}(\mathbf{y - X\beta})^{\top}(\mathbf{y - X\beta})
$$

$$
\nabla_{\beta}J = \frac{1}{2n}\nabla_{\beta}[\mathbf{y^{\top}y}-2\mathbf{\beta^{\top}X^{\top}y+\beta^{\top}X^{\top}X\beta}]
$$

$$
\nabla_{\beta}J = \frac{1}{2n}[-2\mathbf{X^{\top}y}+2\mathbf{X^{\top}X\beta}]=\frac{1}{n}\mathbf{X^{\top}}\mathbf{(X\beta-y)}
$$

Update Rule:
$$
\beta^{(t+1)}=\beta^{(t)}-\alpha\nabla_{\beta}J(\beta^{(t)})
$$
Where:
- $\alpha$ is the learning rate (step size)
- $t$ is the iteration index
- $\nabla_{\beta}J(\beta^{(t)})=\frac{1}{n}\mathbf{X^{\top}}(\mathbf{X}\beta^{(t)}-\mathbf{y})$
- iteration stops once $||\beta^{(t+1)}-\beta^{(t)}||_2<\epsilon$
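The update rule and stopping condition above can be wrapped into a reusable function. This is a sketch under a few assumptions:

- The name `gradient_descent` is hypothetical.
- The `max_iter` cap is an addition not present in the original loop.
- The synthetic, well-conditioned data and `alpha=0.5` were chosen only so the demo converges quickly.

```python
import numpy as np

def gradient_descent(X, y, alpha, epsilon=1e-8, max_iter=100_000):
    """Minimise J(beta) = ||y - X beta||^2 / (2n) with fixed-step gradient descent.

    Stops when ||beta_(t+1) - beta_t||_2 < epsilon or after max_iter steps.
    Returns the final coefficients and the number of iterations taken.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for t in range(1, max_iter + 1):
        grad = X.T @ (X @ beta - y) / n        # (1/n) X'(X beta - y)
        new_beta = beta - alpha * grad
        if np.linalg.norm(new_beta - beta) < epsilon:
            return new_beta, t
        beta = new_beta
    return beta, max_iter

rng = np.random.default_rng(3)
n = 200
X = np.column_stack([rng.normal(size=(n, 2)), np.ones(n)])
y = X @ np.array([2.0, -1.0, 0.5]) + 0.05 * rng.normal(size=n)

beta_gd, iters = gradient_descent(X, y, alpha=0.5)
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)
print(iters, beta_gd, beta_closed)
```

The `max_iter` guard keeps a poorly chosen `alpha` (too large, so the iterates diverge, or too small, so progress crawls) from looping forever.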


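```python
beta = np.zeros(X.shape[1])   # start from all-zero coefficients
prev = np.ones_like(beta)     # dummy value so the loop runs at least once
alpha = 1.97                  # learning rate, manually tuned
epsilon = 1e-5                # stopping tolerance on ||beta_(t+1) - beta_t||
n = X.shape[0]
iterations = 0                # iteration counter, kept for inspection
while np.linalg.norm(beta - prev) > epsilon:
    prev = beta.copy()
    grad = X.T @ (X @ beta - Y) / n   # (1/n) X'(X beta - y)
    beta -= alpha * grad
    iterations += 1
print("Gradient Descent Coefficients:")
for i, b in enumerate(beta[:-1], start=1):
    print(f"beta {i}: {round(b, 3)}")
print("Gradient Descent Intercept:", round(beta[-1], 3), "\n")
```

```
Gradient Descent Coefficients:
beta 1: -10.009
beta 2: -239.815
beta 3: 519.848
beta 4: 324.384
beta 5: -791.99
beta 6: 476.591
beta 7: 100.96
beta 8: 177.039
beta 9: 751.204
beta 10: 67.627
Gradient Descent Intercept: 152.133
```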
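The hand-tuned `alpha = 1.97` can be explained with a standard result. The Hessian of $J$ is $\mathbf{H} = \frac{1}{n}\mathbf{X^{\top}X}$, and fixed-step gradient descent converges when $0 < \alpha < 2/\lambda_{\max}(\mathbf{H})$. The error along eigenvector $i$ shrinks by $|1 - \alpha\lambda_i|$ per step.

For this dataset, `load_diabetes` returns features that are mean-centred and scaled so each column's sum of squares is (approximately) 1. Two consequences follow:

- The ones column is orthogonal to the features and contributes an eigenvalue of 1, so the limit is $\alpha < 2$, just above 1.97.
- The feature block of $\mathbf{H}$ is the feature correlation matrix divided by $n = 442$, so its eigenvalues sum to $10/442$. Every feature direction therefore contracts by a factor no smaller than about $0.955$ per step.

This plausibly explains why the step-size stopping rule halts with small gaps (e.g. beta 5) relative to the closed-form values. The sketch below uses a hypothetical helper, `step_size_limits`, on synthetic data. Calling it on the notebook's `X` should show the limit near 2.

```python
import numpy as np

def step_size_limits(X):
    """Return the largest stable fixed step 2 / lambda_max(H) and the eigenvalues of H = X'X / n.

    H is the Hessian of J(beta) = ||y - X beta||^2 / (2n).
    """
    n = X.shape[0]
    eigvals = np.linalg.eigvalsh(X.T @ X / n)  # symmetric matrix, eigenvalues in ascending order
    return 2.0 / eigvals[-1], eigvals

rng = np.random.default_rng(4)
n = 100
X = np.column_stack([rng.normal(size=(n, 3)), np.ones(n)])

alpha_max, eigvals = step_size_limits(X)
alpha = 0.95 * alpha_max                 # just inside the stable range
rates = np.abs(1 - alpha * eigvals)      # per-step contraction along each eigenvector
print(alpha_max, rates.max())
```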



### Solution from scikit-learn

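```python
# LinearRegression fits the intercept itself (fit_intercept=True by default),
# so it is given the raw features without the column of ones
model = LinearRegression()
model.fit(df, Y)
coef = model.coef_
intercept = model.intercept_
print("scikit-learn Coefficients:")
for i, c in enumerate(coef, start=1):
    print(f"beta {i}: {round(c, 3)}")
print("scikit-learn Intercept:", round(intercept, 3), "\n")
```

```
scikit-learn Coefficients:
beta 1: -10.01
beta 2: -239.816
beta 3: 519.846
beta 4: 324.385
beta 5: -792.176
beta 6: 476.739
beta 7: 101.043
beta 8: 177.063
beta 9: 751.274
beta 10: 67.627
scikit-learn Intercept: 152.133
```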
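`LinearRegression` is fit on `df` without a column of ones, yet it reports the same intercept as the other two methods. A standard way to get this result is to centre the features and target, fit the slopes without an intercept, and recover the intercept from the means. We assume this is similar in spirit to what the library does internally, but the sketch below is only a check of the underlying algebra on synthetic data, not scikit-learn's implementation.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
features = rng.normal(loc=3.0, size=(n, 3))
y = features @ np.array([1.0, -0.5, 2.0]) + 4.0 + 0.1 * rng.normal(size=n)

# Approach A: append a column of ones and solve the normal equations
X = np.column_stack([features, np.ones(n)])
beta_a = np.linalg.solve(X.T @ X, X.T @ y)

# Approach B: centre features and target, fit slopes with no intercept,
# then recover the intercept as y_mean - x_mean . coef
x_mean, y_mean = features.mean(axis=0), y.mean()
Xc, yc = features - x_mean, y - y_mean
coef_b = np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)
intercept_b = y_mean - x_mean @ coef_b

print(beta_a[:-1], coef_b)
print(beta_a[-1], intercept_b)
```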

