# Linear Regression

In this notebook, we implement the Linear Regression model from scratch, trained with gradient descent, to understand the theory behind this commonly used machine learning model.

YouTube video: https://youtu.be/RIg3iuen7MY

## Implementation

The input data (the "X" table) has "m" data points (rows) and "n" columns (features, or independent variables). The model therefore has "n" + 1 betas in total: an intercept "beta_0" plus one coefficient per feature.
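
Written as an equation, each prediction is the intercept plus a weighted sum of that data point's features. In the notebook's names, "beta_0" is $\beta_0$ and "beta_other" holds $\beta_1, \dots, \beta_n$. The code stores these 0-indexed, so $\beta_j$ is `beta_other[j-1]` and pairs with feature column `j-1` of `X`:

$$
\hat{y}_i = \beta_0 + \sum_{j=1}^{n} \beta_j\, x_{ij}, \qquad i = 1, \dots, m
$$

Training looks for the betas that minimize the mean squared error over the "m" data points, $\text{MSE} = \frac{1}{m}\sum_{i=1}^{m} (y_i - \hat{y}_i)^2$.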

In [1]:
# load library
import random

For the main function, we perform the following steps:
- First, initialize the parameters with `initialize_params` based on the number of features in the input data ("n")
- Compute the gradients of the betas using `compute_gradient`
- Use the computed gradients to update the value of each beta using `update_params`
- Repeat the gradient and update steps for a fixed number of iterations ("iterations")

In [104]:
# main function
def linear_regression(X, y, iterations = 100, learning_rate = 0.01):
    # convert the pandas DataFrame / Series to NumPy arrays
    X = X.to_numpy()
    y = y.to_numpy()

    # n = number of features (columns), m = number of data points (rows)
    n, m = len(X[0]), len(X)
    beta_0, beta_other = initialize_params(n)
    
    # batch gradient descent: every iteration uses all m data points
    for _ in range(iterations):
        gradient_beta_0, gradient_beta_other = compute_gradient(X, y, beta_0, beta_other, n, m)
        beta_0, beta_other = update_params(beta_0, beta_other, gradient_beta_0, gradient_beta_other, learning_rate)
    return beta_0, beta_other

For the `initialize_params` function, we initialize "beta_0" as 0 and "beta_other" as a list of size "n" that holds the remaining betas. Each of these is drawn at random from [0, 1) with `random.random()`.

In [3]:
# helper function: initialize params
def initialize_params(n):
    beta_0 = 0  # the intercept starts at zero
    beta_other = [random.random() for _ in range(n)]  # one random value in [0, 1) per feature
    return beta_0, beta_other
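
Because "beta_other" is drawn at random, the starting point changes from run to run. When training stops before full convergence, the final betas change as well. The sketch below is a seeded variant that assumes reproducible runs are wanted. `initialize_params_seeded` and its `seed` parameter are hypothetical names and are not part of the original notebook:

```python
import random


def initialize_params_seeded(n, seed=None):
    """Hypothetical variant of initialize_params that accepts a seed.

    beta_0 starts at 0; the n feature coefficients are drawn uniformly
    from [0, 1) by a private random.Random instance, so the global
    random state used elsewhere is left untouched.
    """
    rng = random.Random(seed)
    beta_0 = 0
    beta_other = [rng.random() for _ in range(n)]
    return beta_0, beta_other


# The same seed gives the same starting betas on every run.
print(initialize_params_seeded(2, seed=42) == initialize_params_seeded(2, seed=42))  # True
```

Using a private `random.Random` instead of `random.seed` keeps the seeding local to this function.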

`compute_gradient` is the core of the algorithm. It computes the gradients for all betas:
- Initialize all gradients to 0
- Loop through all data points and add each data point's contribution to those variables:
    - First, obtain the prediction "y_i_hat" for data point "i"
    - Take the difference between the observation ("y[i]") and the prediction ("y_i_hat")
    - Multiply the difference by 2 to get the derivative of the squared error with respect to the prediction, with its sign flipped because the difference is taken as "y[i]" - "y_i_hat"
    - Multiply by the matching feature value "X[i][j]" (or by 1 for "beta_0") and divide by "m", the number of data points, so that the final gradient is the average over all data points
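
These steps follow from differentiating the mean squared error. Assume the loss is $\text{MSE} = \frac{1}{m}\sum_{i=1}^{m} (y_i - \hat{y}_i)^2$ with $\hat{y}_i = \beta_0 + \sum_{j=1}^{n} \beta_j x_{ij}$, which is consistent with the factor 2 and the division by "m" in the code. The chain rule then gives:

$$
\frac{\partial\,\text{MSE}}{\partial \beta_0} = -\frac{2}{m}\sum_{i=1}^{m} (y_i - \hat{y}_i),
\qquad
\frac{\partial\,\text{MSE}}{\partial \beta_j} = -\frac{2}{m}\sum_{i=1}^{m} (y_i - \hat{y}_i)\, x_{ij}, \quad j = 1, \dots, n
$$

"derror_dy" equals $2(y_i - \hat{y}_i)$. The sums that the code accumulates in "gradient_beta_0" and "gradient_beta_other" are therefore these derivatives without the leading minus sign, that is, the negative gradient.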

In [92]:
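# helper function: compute gradient
def compute_gradient(X, y, beta_0, beta_other, n, m):
    # accumulators for the (negative) gradient of each beta
    gradient_beta_0 = 0
    gradient_beta_other = [0] * n
    
    for i in range(m):
        # prediction for data point i: intercept plus weighted sum of its features
        y_i_hat = sum(X[i][j] * beta_other[j] for j in range(n)) + beta_0
        # 2 * (observation - prediction): the negative derivative of the squared error w.r.t. y_i_hat
        derror_dy = 2 * (y[i] - y_i_hat)
        
        # chain rule: feature j contributes derror_dy * X[i][j]; dividing by m averages over data points
        for j in range(n):
            gradient_beta_other[j] += (derror_dy * X[i][j]) / m
        
        # the intercept's "feature" is always 1
        gradient_beta_0 += (derror_dy / m)
    
    return gradient_beta_0, gradient_beta_other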
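
The two nested loops cost $O(mn)$ Python-level operations per iteration. Below is a minimal sketch of the same computation in vectorized NumPy, which is already in use because `to_numpy()` returns NumPy arrays. `compute_gradient_vectorized` is a hypothetical name. The loop version is copied in only so the block runs on its own:

```python
import numpy as np


def compute_gradient_loop(X, y, beta_0, beta_other, n, m):
    # Copy of the notebook's compute_gradient, included so this block runs on its own.
    gradient_beta_0 = 0
    gradient_beta_other = [0] * n
    for i in range(m):
        y_i_hat = sum(X[i][j] * beta_other[j] for j in range(n)) + beta_0
        derror_dy = 2 * (y[i] - y_i_hat)
        for j in range(n):
            gradient_beta_other[j] += (derror_dy * X[i][j]) / m
        gradient_beta_0 += derror_dy / m
    return gradient_beta_0, gradient_beta_other


def compute_gradient_vectorized(X, y, beta_0, beta_other):
    """Hypothetical vectorized counterpart of compute_gradient.

    Returns the same negative-MSE-gradient values as the loop version:
    (2/m) * sum(residual) for beta_0 and (2/m) * X^T residual for beta_other.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m = X.shape[0]
    residual = y - (X @ np.asarray(beta_other, dtype=float) + beta_0)  # y - y_hat for all rows at once
    gradient_beta_0 = 2 * residual.sum() / m
    gradient_beta_other = 2 * (X.T @ residual) / m
    return gradient_beta_0, gradient_beta_other


# Compare both versions on a small random dataset.
rng = np.random.default_rng(0)
X_demo = rng.random((5, 2))
y_demo = rng.random(5)
g0_loop, g_loop = compute_gradient_loop(X_demo, y_demo, 0.1, [0.2, 0.3], n=2, m=5)
g0_vec, g_vec = compute_gradient_vectorized(X_demo, y_demo, 0.1, [0.2, 0.3])
print(np.isclose(g0_loop, g0_vec), np.allclose(g_loop, g_vec))  # True True
```

Like the original, the vectorized version returns the negative gradient, so `update_params` can consume its output unchanged. Only the call inside `linear_regression` would need to drop the "n" and "m" arguments. Note that `gradient_beta_other` comes back as a NumPy array rather than a list.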

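We use `update_params` to update all the betas with the gradients we obtained. We don't add the raw gradients to the betas. Instead, we first scale each gradient by the learning rate, which sets the step size of each gradient descent update. A learning rate that is too high makes gradient descent unstable (it can overshoot and diverge), while one that is too low makes it slow to converge.

Note: We update the betas with "+=" because of how the gradients are computed in `compute_gradient`. There, "derror_dy" is 2 * ("y[i]" - "y_i_hat"), which is the negative of the derivative of the squared error with respect to the prediction. The accumulated values therefore point downhill, and adding them is equivalent to the usual gradient descent step of subtracting the true gradient. For example, if "y_i_hat" overestimates "y[i]", "derror_dy" is negative. Adding it then lowers the betas (for positive feature values), which lowers the prediction.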

In [5]:
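# helper function: update params
def update_params(beta_0, beta_other, gradient_beta_0, gradient_beta_other, learning_rate):
    # the gradients are already negated in compute_gradient, so "+=" moves the betas downhill
    beta_0 += (gradient_beta_0 * learning_rate)
    
    # beta_other is a list, so its entries are updated in place
    for i in range(len(beta_other)):
        beta_other[i] += (gradient_beta_other[i] * learning_rate)
    return beta_0, beta_other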
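
The learning-rate trade-off can be seen in isolation with a hypothetical one-parameter toy that is not part of the notebook. It minimizes $(b - 3)^2$ with the same "+=" convention as `update_params`, adding the learning rate $\alpha$ times the negative gradient $2(3 - b)$. Each step multiplies the distance to the optimum by $(1 - 2\alpha)$. A rate near 0 therefore shrinks that distance slowly, and a rate above 1 makes it grow:

```python
def run_gradient_descent(learning_rate, steps=50, start=0.0, target=3.0):
    """Hypothetical toy: minimize (b - target) ** 2 starting from b = start."""
    b = start
    for _ in range(steps):
        negative_gradient = 2 * (target - b)  # same sign convention as derror_dy
        b += learning_rate * negative_gradient
    return b


for lr in (0.001, 0.1, 1.1):
    # 0.001: still far from 3 after 50 steps (slow); 0.1: lands very close to 3;
    # 1.1: overshoots back and forth with growing size (diverges)
    print(f"learning_rate={lr}: b = {run_gradient_descent(lr):.4f}")
```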

## Test

### Library & Data Preparation

In [18]:
# load library
import pandas as pd
from sklearn.linear_model import LinearRegression

In [142]:
# load data
df = pd.read_csv('house_price_data.txt', names = ["housesize", "rooms", "price"])
df.head()

|   | housesize | rooms | price |
|---|-----------|-------|-------|
| 0 | 2104 | 3 | 399900 |
| 1 | 1600 | 3 | 329900 |
| 2 | 2400 | 3 | 369000 |
| 3 | 1416 | 2 | 232000 |
| 4 | 3000 | 4 | 539900 |


In [141]:
# normalize data
df_normalized = (df - df.min()) / (df.max() - df.min()) # min-max normalization
df_normalized.head()
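
Because the model is fit on min-max normalized data, its coefficients and predictions are in normalized units, with price rescaled to [0, 1]. If predictions are wanted back in the original price units, one option is to invert the transform with the same per-column minimum and maximum. Below is a minimal sketch of that; `denormalize` and the toy DataFrame are hypothetical:

```python
import pandas as pd


def denormalize(values, column_min, column_max):
    """Invert min-max normalization: x = x_norm * (max - min) + min."""
    return values * (column_max - column_min) + column_min


# Toy data (hypothetical, not the house-price file).
toy = pd.DataFrame({"size": [1000.0, 1500.0, 3000.0],
                    "price": [200000.0, 250000.0, 500000.0]})
toy_normalized = (toy - toy.min()) / (toy.max() - toy.min())  # same formula as the notebook

# Map the normalized price column back to its original scale.
price_back = denormalize(toy_normalized["price"], toy["price"].min(), toy["price"].max())
print(price_back.tolist())
```

For the notebook's data, the bounds would come from the original, unnormalized `df`, e.g. `df["price"].min()` and `df["price"].max()`.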

|   | housesize | rooms | price |
|---|-----------|-------|-------|
| 0 | 0.345284 | 0.5 | 0.433962 |
| 1 | 0.206288 | 0.5 | 0.301887 |
| 2 | 0.426917 | 0.5 | 0.37566 |
| 3 | 0.155543 | 0.25 | 0.11717 |
| 4 | 0.592388 | 0.75 | 0.698113 |


In [99]:
X = df_normalized[["housesize", "rooms"]] # independent variables
y = df_normalized["price"] # dependent variable

### Models Comparison

In [100]:
# sklearn linear regression model
model_sklearn = LinearRegression().fit(X, y)

print("Intercept | Constant of Linear Regression Equation: ", model_sklearn.intercept_)
print("Coefficient of Linear Regression Equation: ", model_sklearn.coef_)

Intercept | Constant of Linear Regression Equation:  0.05578751828959755
Coefficient of Linear Regression Equation:  [ 0.95241114 -0.06594731]


In [140]:
# scratch model
model_scratch = linear_regression(X, y, iterations = 1000, learning_rate = 0.1)

print("Intercept | Constant of Linear Regression Equation: ", model_scratch[0])
print("Coefficient of Linear Regression Equation: ", model_scratch[1])

Intercept | Constant of Linear Regression Equation:  0.05019895636877814
Coefficient of Linear Regression Equation:  [0.9405309080534231, -0.04888038153049782]
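
The scratch coefficients are close to sklearn's but not identical. sklearn's `LinearRegression` solves the ordinary least-squares problem directly, while gradient descent only approaches that solution over iterations. One plausible reason for the remaining gap is that 1000 iterations at learning rate 0.1 were not enough to converge fully, though the notebook does not check this. The sketch below tests the idea on hypothetical synthetic data, comparing gradient descent against `numpy.linalg.lstsq` as the iteration count grows. It uses a vectorized NumPy gradient descent that starts from zero betas rather than the notebook's loops and random initialization:

```python
import numpy as np


def gradient_descent(X, y, iterations, learning_rate=0.1):
    """Vectorized batch gradient descent on the MSE, starting from all-zero betas."""
    m, n = X.shape
    beta_0, beta_other = 0.0, np.zeros(n)
    for _ in range(iterations):
        residual = y - (X @ beta_other + beta_0)  # y - y_hat
        # Same "+=" convention as the notebook: add learning_rate times the negative gradient.
        beta_0 += learning_rate * 2 * residual.sum() / m
        beta_other += learning_rate * 2 * (X.T @ residual) / m
    return beta_0, beta_other


def least_squares(X, y):
    """Exact ordinary least-squares fit; a column of ones models the intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]


# Hypothetical synthetic data with two features in [0, 1), like the normalized house data.
rng = np.random.default_rng(0)
X_syn = rng.random((50, 2))
y_syn = 0.5 + 1.0 * X_syn[:, 0] - 0.3 * X_syn[:, 1] + 0.01 * rng.standard_normal(50)
b0_exact, b_exact = least_squares(X_syn, y_syn)


def gap_after(iterations):
    """Largest absolute difference between the gradient descent and exact betas."""
    b0, b = gradient_descent(X_syn, y_syn, iterations)
    return max(abs(b0 - b0_exact), np.max(np.abs(b - b_exact)))


for iterations in (100, 1000, 20000):
    print(f"{iterations:>6} iterations: max |difference| = {gap_after(iterations):.2e}")
```

The same check could be run on the house data by passing `X.to_numpy()` and `y.to_numpy()` and comparing against `model_sklearn`.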
