# Linear Regression

## 1. Definition
Linear regression is the most basic form of supervised learning used for regression tasks. Its goal is to find the best-fitting straight line (a hyperplane in higher dimensions) that describes the linear relationship between the input features and the target. The model is linear because it assumes that the output is a linear combination of the inputs.


## 2. Core Idea
Linear regression answers the question: “How does Y change when X changes?”

It assumes:
* each feature contributes additively
* the relationship is roughly linear


## 3. Mechanism
The key step in linear regression is determining the "best-fit" line by minimizing the error between the predicted values and the actual observed values. This is achieved using the Ordinary Least Squares (OLS) method.

The linear equation can be expressed as: $\hat{Y} = w_1X_1 + w_2X_2 + \dots + w_nX_n + b$

In vector form, this is written more concisely as: $\hat{Y}=w^TX+b$

The model predicts the target by taking a weighted sum of the input features. The weights ($\mathbf{w}$) and the bias ($b$) are the parameters the model learns.
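As a minimal sketch of this weighted sum, the snippet below computes predictions with NumPy. The feature values, weights, and bias are made-up numbers chosen for illustration, and `predict` is a hypothetical helper rather than the `LinearRegression` class imported later in this notebook. With one sample per row of `X`, the vector form $w^TX+b$ becomes `X @ w + b`.

```python
import numpy as np

def predict(X, w, b):
    """Return one prediction per row of X: the weighted sum X @ w plus the bias b."""
    return X @ w + b

# Two samples with three features each (illustrative values).
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
w = np.array([0.5, -1.0, 2.0])  # one weight per feature
b = 1.0                         # bias (intercept)

y_hat = predict(X, w, b)
print(y_hat)  # first sample: 0.5 - 2 + 6 + 1 = 5.5, second: 2 - 5 + 12 + 1 = 10.0
```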


* The "best-fit" line is the one that minimizes the sum of the squared vertical distances between the data points and the line itself. These vertical distances are called the residuals (or errors, $\epsilon$).
* Minimization: By squaring the residuals, the method heavily penalizes large errors and ensures that positive and negative errors don't cancel each other out.
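The two points above can be checked on a toy example. The targets and predictions below are invented for illustration: the signed residuals sum to zero even though every prediction is off by one, while the squared residuals do not cancel.

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])      # observed targets (illustrative)
y_hat = np.array([4.0, 4.0, 8.0, 8.0])  # predictions from some candidate line

residuals = y - y_hat                   # -1, 1, -1, 1
sum_signed = residuals.sum()            # 0.0: positive and negative errors cancel
sum_squared = (residuals ** 2).sum()    # 4.0: squared errors do not cancel

print(sum_signed, sum_squared)

# Squaring also weights large errors more heavily: a single error of 4
# costs 16, while four errors of 1 cost only 4 in total.
one_large = 4.0 ** 2
four_small = 4 * 1.0 ** 2
print(one_large, four_small)
```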



## 4. Mathematical Details and Training
Training involves optimizing the weights ($\mathbf{w}$) and bias ($b$) to minimize the error defined by the loss function.
* Loss function: 

$$J(w,b) = \frac{1}{2m}\sum_{i=1}^m{(\hat{Y}_i - Y_i)}^2$$

    *  m: number of instances
    *  2m (instead of m): the factor of 2 cancels when the squared term is differentiated, which simplifies the derivative.


* Optimization:
    * Method 1: Closed-form solution (Normal Equation). For smaller datasets, the optimal weights can be calculated directly, without iteration, using linear algebra. This becomes slow in high dimensions because it requires inverting $X^TX$:
    $$\hat{w} = (X^TX)^{-1}X^TY$$
    * Method 2: Iterative solution (Gradient Descent). For larger datasets, the weights are adjusted iteratively by taking steps proportional to the negative of the gradient of the loss function. The gradient points in the direction of steepest ascent, so we move in the opposite direction to minimize the loss.

    $$\frac{\partial J}{\partial w} = -\frac{1}{m}X^T(Y-\hat{Y})$$
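The loss and both optimization methods can be sketched in a few lines of NumPy. This is an illustrative sketch, not the implementation in `src.linear_regression`: the function names, the learning rate, and the iteration count are assumptions, and the bias is handled either by appending a column of ones (normal equation) or by a separate update using the mean error (gradient descent). The data follows the same noise-free line $y = 3x + 5$ used later in this notebook.

```python
import numpy as np

def loss(X, y, w, b):
    """J(w, b) = 1/(2m) * sum((y_hat - y)^2)."""
    m = len(y)
    return ((X @ w + b - y) ** 2).sum() / (2 * m)

def fit_normal_equation(X, y):
    """Closed-form fit; a column of ones lets the bias be solved together with w."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    # Solve (Xb^T Xb) theta = Xb^T y instead of forming the inverse explicitly.
    theta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
    return theta[:-1], theta[-1]

def fit_gradient_descent(X, y, lr=0.02, n_iters=5000):
    """Batch gradient descent on J(w, b); lr and n_iters are illustrative choices."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(n_iters):
        error = X @ w + b - y          # y_hat - y
        w -= lr * (X.T @ error) / m    # dJ/dw = (1/m) X^T (y_hat - y)
        b -= lr * error.mean()         # dJ/db = mean(y_hat - y)
    return w, b

X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 3 * X[:, 0] + 5

w_ne, b_ne = fit_normal_equation(X, y)
w_gd, b_gd = fit_gradient_descent(X, y)

print("normal equation: ", w_ne, b_ne, loss(X, y, w_ne, b_ne))
print("gradient descent:", w_gd, b_gd, loss(X, y, w_gd, b_gd))
```

Both fits should land close to a weight of 3 and a bias of 5. Because the feature is not scaled, gradient descent needs a small learning rate and several thousand iterations here; standardizing the inputs would usually allow a larger step.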

## 5. Pros and Cons
* Pros:
    * Highly interpretable: Weights show the influence of each feature
    * Extremely fast to train and deploy, even with large datasets
    * Acts as a baseline for regression tasks.
* Cons:
    * Assumes a linear relationship, so it fails on complex, non-linear data
    * Very sensitive to outliers
    * Sensitive to multicollinearity
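The outlier sensitivity listed above can be seen on a small synthetic example. The data and the size of the corruption are invented for illustration, and `np.polyfit` with `deg=1` is used here only as a convenient least-squares line fit.

```python
import numpy as np

x = np.linspace(0, 10, 20)
y = 3 * x + 5                       # clean data on the line y = 3x + 5

# Least-squares line through the clean data: returns (slope, intercept).
slope_clean, intercept_clean = np.polyfit(x, y, deg=1)

y_outlier = y.copy()
y_outlier[-1] += 100                # corrupt a single target value

slope_out, intercept_out = np.polyfit(x, y_outlier, deg=1)

print("clean:       ", slope_clean, intercept_clean)
print("with outlier:", slope_out, intercept_out)
```

One corrupted point out of twenty is enough to pull the slope from 3 to well above 5, because its squared residual dominates the loss.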


## 6. Production Considerations
* Must ensure consistent feature preprocessing (scaling, encoding)
* Monitor for data drift
* Outliers can severely degrade performance
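One way to keep preprocessing consistent is to learn the scaling statistics once on the training data and reuse them unchanged at inference time. The sketch below assumes standardization as the preprocessing step; `StandardScalerSketch` is a hypothetical helper written for this example, not part of this repository.

```python
import numpy as np

class StandardScalerSketch:
    """Hypothetical helper: learns mean and std once, then reuses them."""

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        return self

    def transform(self, X):
        return (X - self.mean_) / self.std_

X_train = np.array([[0.0], [5.0], [10.0]])
X_new = np.array([[20.0]])               # data seen after deployment

scaler = StandardScalerSketch().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_new_scaled = scaler.transform(X_new)   # uses the training mean and std

print(X_train_scaled.ravel())
print(X_new_scaled.ravel())
```

Refitting the scaler on `X_new` would map the new value to a different range than the one the model was trained on, which is the inconsistency the first bullet warns about.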



## 7. Other Variants
* Polynomial Regression: Adds polynomial terms ($X^2$, etc.) to the linear equation to model non-linear relationships
* Regularized Regression: Adds a penalty term to the MSE loss function to prevent overfitting, e.g. L1 (Lasso) or L2 (Ridge)
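Both variants can be sketched with the same closed-form machinery. The example below is a rough illustration under stated assumptions: `polynomial_features` and `fit_ridge` are hypothetical helpers, the L2 penalty `lam` is added to the unscaled sum of squared errors, and the bias is left unpenalized by centering the data first.

```python
import numpy as np

def polynomial_features(x, degree):
    """Stack x, x^2, ..., x^degree as columns."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

def fit_ridge(X, y, lam):
    """Closed-form L2-regularized fit on centered data; the bias is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

x = np.linspace(-3, 3, 50)
y = 2 * x ** 2 - x + 1                    # quadratic target (illustrative)

X_poly = polynomial_features(x, degree=2)
w0, b0 = fit_ridge(X_poly, y, lam=0.0)    # no penalty: ordinary least squares
w1, b1 = fit_ridge(X_poly, y, lam=10.0)   # penalty shrinks the weights

print("lam=0: ", w0, b0)
print("lam=10:", w1, b1)
```

With `lam=0.0` the fit recovers the coefficients of the quadratic, which shows that the model stays linear in its weights even though the features are non-linear in `x`. With `lam=10.0` the weight vector has a smaller norm, trading a little bias for lower variance.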

In [1]:
import os, sys
root_path = os.path.abspath("..")
sys.path.append(root_path)

from src.linear_regression import LinearRegression

In [2]:
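import numpy as np

def create_dataset(n_samples=100):
    """Return a noise-free dataset that follows y = 3x + 5."""
    np.random.seed(42)  # has no effect here: nothing below draws random numbers

    X = np.linspace(0, 10, n_samples).reshape(-1, 1)  # shape (n_samples, 1)
    y = 3 * X[:, 0] + 5                               # shape (n_samples,)
    return X, y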

In [3]:
X, y = create_dataset()
print(X.shape, y.shape)

(100, 1) (100,)


In [4]:
model = LinearRegression()
model.fit(X, y)

In [5]:
pred = model.predict(X)
print(pred)

[ 2.29554883  2.63965406  2.98375929  3.32786452  3.67196975  4.01607498
  4.36018021  4.70428544  5.04839067  5.3924959   5.73660113  6.08070636
  6.42481159  6.76891682  7.11302205  7.45712728  7.80123251  8.14533774
  8.48944297  8.8335482   9.17765342  9.52175865  9.86586388 10.20996911
 10.55407434 10.89817957 11.2422848  11.58639003 11.93049526 12.27460049
 12.61870572 12.96281095 13.30691618 13.65102141 13.99512664 14.33923187
 14.6833371  15.02744233 15.37154756 15.71565278 16.05975801 16.40386324
 16.74796847 17.0920737  17.43617893 17.78028416 18.12438939 18.46849462
 18.81259985 19.15670508 19.50081031 19.84491554 20.18902077 20.533126
 20.87723123 21.22133646 21.56544169 21.90954692 22.25365215 22.59775737
 22.9418626  23.28596783 23.63007306 23.97417829 24.31828352 24.66238875
 25.00649398 25.35059921 25.69470444 26.03880967 26.3829149  26.72702013
 27.07112536 27.41523059 27.75933582 28.10344105 28.44754628 28.79165151
 29.13575673 29.47986196 29.82396719 30.16807242 30.5