# Linear regression
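>Fit a linear equation of the form $h_w(x) = wx+w_0$ to data points $(x_i, y_i)$ where $i = 1, 2, \dots, n$.
>Linear regression fits a line (a hyperplane when there are several features) to the data points, typically by minimizing the squared error. It does not pass through every point in general. Outputs for new inputs, including inputs outside the range of the data, are predicted from the fitted line.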
- Notation per [AIMA](https://aima.cs.berkeley.edu/).
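
For a single feature, the least-squares line has a standard closed-form solution (the one AIMA derives for the univariate case). A minimal sketch; `fit_univariate` is a hypothetical helper name, not part of any library:

```python
import numpy as np


def fit_univariate(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Closed-form least-squares fit of h_w(x) = w*x + w_0 for one feature.

    Returns (w, w_0). Assumes x is not constant (otherwise the slope is undefined).
    """
    n = len(x)
    sum_x, sum_y = x.sum(), y.sum()
    w = (n * (x * y).sum() - sum_x * sum_y) / (n * (x * x).sum() - sum_x**2)
    w_0 = (sum_y - w * sum_x) / n
    return float(w), float(w_0)


x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])  # exactly y = 2x + 1
print(fit_univariate(x, y))  # → (2.0, 1.0)
```

For noisy data the same formula gives the slope and intercept that minimize the sum of squared errors.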

Assumptions for linear regression models per [interpretable-ml-book](https://christophm.github.io/interpretable-ml-book/limo.html):
- Linearity: The label must be a linear combination of the features.
- Normality: The target outcome given the features must be normally distributed.
- Homoscedasticity (constant variance): The variance of the error terms must be the same for all values of $x$.
- Independence: The error terms must be independent of each other.
- Fixed features: Input features are treated as fixed values rather than random variables.
- Absence of multicollinearity: The features must not be strongly correlated with each other (in the extreme case, linearly dependent). With correlated features it is hard to attribute a change in the output to one particular feature, so the individual weights are hard to estimate and interpret.
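
The multicollinearity assumption can be checked numerically: if the design matrix (a column of ones for $w_0$ plus the features) is rank-deficient, the weights are not uniquely determined, and a high pairwise correlation between features is a softer warning sign. A minimal sketch with hypothetical helper names:

```python
import numpy as np


def design_matrix_rank(X: np.ndarray) -> int:
    """Rank of the design matrix [1 | X] (intercept column prepended)."""
    A = np.column_stack([np.ones(len(X)), X])
    return int(np.linalg.matrix_rank(A))


def feature_correlations(X: np.ndarray) -> np.ndarray:
    """Pairwise Pearson correlations between the feature columns of X."""
    return np.corrcoef(X, rowvar=False)


X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
print(design_matrix_rank(X))          # → 3 (intercept + 2 features: full rank)
print(feature_correlations(X)[0, 1])  # about 0.7071: correlated, but not perfectly

X_collinear = np.array([[1, 2], [2, 4], [3, 6]])  # second feature = 2 * first
print(design_matrix_rank(X_collinear))  # → 2 (rank-deficient: weights not unique)
```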

```python
import numpy as np
```

# Data
The data is taken from the [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) `LinearRegression` example so that my implementation can be compared with sklearn afterwards.
- Features: $X$
- Labels: $y$

```python
X: np.ndarray = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
print(X)
# Labels follow y = 1*x_1 + 2*x_2 + 3 exactly (no noise)
y: np.ndarray = np.dot(X, np.array([1, 2])) + 3
print(y)
```

```
[[1 1]
 [1 2]
 [2 2]
 [2 3]]
[ 6  8  9 11]
```


# Model
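- Hypothesis: $h_w(x) = wx+w_0$ (with several features, $wx$ is the dot product of the weight vector and the feature vector)
- Observed outputs: $y_i = h_w(x_i) + \epsilon_i$, where $\epsilon_i$ is the (positive or negative) error of data point $i$
##### Weights
- Intercept: $w_0$ (bias, not multiplied by any feature)
- Weights: $w_i$ (multiplied by feature $x_i$)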
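
With several features the hypothesis is a matrix-vector product. A common trick (AIMA uses it as well) is to prepend a constant input $x_0 = 1$ to every example so that the intercept $w_0$ becomes an ordinary weight. A small sketch; `h` and `h_augmented` are illustrative names:

```python
import numpy as np


def h(X: np.ndarray, w: np.ndarray, w_0: float) -> np.ndarray:
    """Hypothesis h_w(x) = w . x + w_0, evaluated for every row of X."""
    return X @ w + w_0


def h_augmented(X: np.ndarray, w_full: np.ndarray) -> np.ndarray:
    """Same hypothesis with the intercept folded in: x_0 = 1 is prepended
    to each row, so w_full = [w_0, w_1, ..., w_d]."""
    X_aug = np.column_stack([np.ones(len(X)), X])
    return X_aug @ w_full


X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
w, w_0 = np.array([1.0, 2.0]), 3.0
print(h(X, w, w_0))                                # values 6, 8, 9, 11
print(h_augmented(X, np.concatenate([[w_0], w])))  # same values
```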

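```python
# Initial weights
w_0: float = 1.0  # intercept
w_1: float = 1.0
w_2: float = 1.0
W = np.array([w_1, w_2])  # weight vector for the two features

# Hyperparameters
alpha: float = 0.01  # learning rate
epochs: int = 5000   # number of full passes over the training data
```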

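Weight update rule for the intercept: $w_0 \leftarrow w_0 + \alpha \sum_j (y_j - h_w(x_j))$.
Weight update rule for the weights: $w_i \leftarrow w_i + \alpha \sum_j (y_j - h_w(x_j))\, x_{j,i}$, where $x_{j,i}$ is feature $i$ of example $j$.
The sums run over all training examples $j$ (batch gradient descent). The code below computes $h_w(x_j) - y_j$ instead, so it subtracts the same quantities.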
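
These rules are one step of gradient descent on the squared loss $L(w) = \tfrac12 \sum_j (y_j - h_w(x_j))^2$; the factor $\tfrac12$ is a convention that cancels the 2 from differentiation (AIMA instead folds that 2 into $\alpha$). The sketch below performs one step from the initial weights used here and checks the intercept update against a finite-difference gradient. `squared_loss` and `update_step` are hypothetical names:

```python
import numpy as np


def squared_loss(X, y, w, w_0):
    """L(w) = 1/2 * sum_j (y_j - h_w(x_j))^2."""
    r = y - (X @ w + w_0)
    return 0.5 * float(r @ r)


def update_step(X, y, w, w_0, alpha):
    """One batch update: w_0 += alpha * sum(err), w_i += alpha * sum(err * x_i)."""
    err = y - (X @ w + w_0)  # y_j - h_w(x_j) for every example j
    return w + alpha * err @ X, w_0 + alpha * err.sum()


X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
y = X @ np.array([1.0, 2.0]) + 3
w, w_0, alpha = np.array([1.0, 1.0]), 1.0, 0.01

new_w, new_w_0 = update_step(X, y, w, w_0, alpha)
print(new_w, new_w_0)  # approximately [1.25, 1.34] and 1.16

# The intercept update direction equals the negative gradient of the loss
eps = 1e-6
num_grad_w0 = (squared_loss(X, y, w, w_0 + eps) - squared_loss(X, y, w, w_0 - eps)) / (2 * eps)
print(np.isclose(-num_grad_w0, (y - (X @ w + w_0)).sum()))  # → True
```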

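```python
for i in range(epochs):
    y_hat: np.ndarray = np.dot(X, W) + w_0
    error: np.ndarray = y_hat - y       # residuals h_w(x_j) - y_j
    W = W - alpha * error.dot(X)        # w_i <- w_i - alpha * sum_j error_j * x_{j,i}
    w_0 = w_0 - alpha * error.sum()     # w_0 <- w_0 - alpha * sum_j error_j
    if i % 1000 == 0:
        # Sum of residuals (not a squared loss): positive and negative errors can cancel
        print(f"Epoch {i}: Loss {error.sum()}")
```

```
Epoch 0: Loss -16
Epoch 1000: Loss -0.022099870569299185
Epoch 2000: Loss -0.001178929150551511
Epoch 3000: Loss -6.455711109509821e-05
Epoch 4000: Loss -3.5555962654143514e-06
```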
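
The value printed as "Loss" in the training loop is the sum of residuals, which can be close to zero even for a poor fit because positive and negative errors cancel. A sketch of the same loop (same data, initial weights, learning rate and update) that records the mean squared error instead, which is non-negative and zero only for a perfect fit:

```python
import numpy as np

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
y = X @ np.array([1.0, 2.0]) + 3

W, w_0 = np.array([1.0, 1.0]), 1.0
alpha, epochs = 0.01, 5000
mse_history = []

for i in range(epochs):
    residual = X @ W + w_0 - y               # h_w(x_j) - y_j
    mse_history.append(float(np.mean(residual**2)))
    W = W - alpha * residual @ X             # same batch update as above
    w_0 = w_0 - alpha * residual.sum()
    if i % 1000 == 0:
        print(f"Epoch {i}: MSE {mse_history[-1]:.3e}")
```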


```python
pred = np.dot(X, W) + w_0
print(pred)
print(y)
```

```
[ 5.99999977  7.99999969  9.00000021 11.00000014]
[ 6  8  9 11]
```
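
Instead of comparing the two arrays by eye, the fit can be summarized by the mean squared error and the coefficient of determination $R^2$ ($R^2 = 1$ for a perfect fit, $0$ for always predicting the mean; it is also what sklearn's `score` reports for regressors). A sketch with hypothetical helper names, applied to the training predictions printed above:

```python
import numpy as np


def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


def r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)


y = np.array([6, 8, 9, 11])
pred = np.array([5.99999977, 7.99999969, 9.00000021, 11.00000014])  # printed above
print(mse(y, pred), r2(y, pred))
```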


# Test

```python
X_test: np.ndarray = np.array([[3, 5], [4, 5]])
y_test: np.ndarray = np.dot(X_test, np.array([1, 2])) + 3

pred_test = np.dot(X_test, W) + w_0
print(pred_test)
print(y_test)
```

```
[16.00000051 17.00000102]
[16 17]
```
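
The test inputs lie outside the range seen during training ($x_2 = 5$, while the training data has $x_2 \le 3$), so these predictions are extrapolations. They are only as good as the assumption that the relationship stays linear outside the training range. A simple per-feature bounding-box check, sketched as a heuristic with a hypothetical helper name:

```python
import numpy as np


def outside_training_range(X_train: np.ndarray, X_new: np.ndarray) -> np.ndarray:
    """True for each row of X_new with at least one feature outside
    the per-feature [min, max] range of X_train."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    return np.any((X_new < lo) | (X_new > hi), axis=1)


X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
X_test = np.array([[3, 5], [4, 5]])
print(outside_training_range(X, X_test))                  # → [ True  True]
print(outside_training_range(X, np.array([[1.5, 2.5]])))  # → [False]
```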


##### Compare with self-implemented class
Check that my packaged class produces the same predictions as the step-by-step implementation above.

```python
from oli.ml.models.LinearRegression import LinearRegression as LinearRegressionOli

oli_lr = LinearRegressionOli(alpha=0.01, epochs=5000)
oli_lr.fit(X, y)
pred = oli_lr.predict(X_test)

print(pred)
print(y_test)
```

```
[16.00000059 17.00000119]
[16 17]
```
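
The `oli` package is not shown in this notebook. For readers without it, a minimal class with the same `fit`/`predict` usage, wrapping the gradient-descent loop above, could look like the sketch below. `GDLinearRegression` is a hypothetical stand-in, not the actual `oli.ml.models.LinearRegression`:

```python
import numpy as np


class GDLinearRegression:
    """Hypothetical stand-in: batch gradient descent for h_w(x) = w . x + w_0."""

    def __init__(self, alpha: float = 0.01, epochs: int = 5000) -> None:
        self.alpha = alpha
        self.epochs = epochs

    def fit(self, X: np.ndarray, y: np.ndarray) -> "GDLinearRegression":
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.w_ = np.ones(X.shape[1])  # same initialization as above: all weights 1
        self.w_0_ = 1.0
        for _ in range(self.epochs):
            residual = X @ self.w_ + self.w_0_ - y
            self.w_ -= self.alpha * residual @ X
            self.w_0_ -= self.alpha * residual.sum()
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return np.asarray(X, dtype=float) @ self.w_ + self.w_0_


X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = X @ np.array([1, 2]) + 3
model = GDLinearRegression(alpha=0.01, epochs=5000).fit(X, y)
print(model.predict(np.array([[3, 5], [4, 5]])))  # close to [16, 17]
```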


## Compare with sklearn

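```python
from sklearn.linear_model import LinearRegression

lr = LinearRegression().fit(X, y)
pred_sklearn = lr.predict(X_test)  # keep the fitted model in `lr`; store predictions separately
print(pred_sklearn)
print(y_test)
```

Because the four training points lie exactly on $y = x_1 + 2x_2 + 3$ and sklearn solves the least-squares problem in closed form, its predictions match `y_test` (16 and 17) up to floating-point rounding.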


--> Looks good!
:)
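
sklearn's `LinearRegression` does not use gradient descent: its documentation describes it as plain ordinary least squares (`scipy.linalg.lstsq`) wrapped as a predictor. The same closed-form solution can be reproduced with NumPy's `np.linalg.lstsq` on the design matrix with a column of ones for $w_0$:

```python
import numpy as np

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
y = X @ np.array([1.0, 2.0]) + 3
X_test = np.array([[3, 5], [4, 5]], dtype=float)

# Design matrix: the first column of ones multiplies the intercept w_0
A = np.column_stack([np.ones(len(X)), X])
coef, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
w_0, W = coef[0], coef[1:]

print(w_0, W)            # approximately 3.0 and [1, 2]
print(X_test @ W + w_0)  # approximately [16, 17]
```

Unlike gradient descent, this needs no learning rate or epoch count, but it solves a linear system whose cost grows with the number of features.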