# Ordinary Least Squares (OLS) Theory and Derivation

This notebook covers the mathematical foundation of linear regression using Ordinary Least Squares.

## 1. Linear Regression Model

The linear regression model assumes a linear relationship between features and target:

$$y = X\beta + \epsilon$$

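Where $n$ is the number of samples, $p$ is the number of features, and:
- $y$ is the target vector (n × 1)
- $X$ is the feature matrix (n × p)
- $\beta$ is the coefficient vector (p × 1)
- $\epsilon$ is the error vector (n × 1), the part of $y$ that the linear term $X\beta$ does not explain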
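
To make the shapes concrete, here is a minimal NumPy sketch of the model. The sizes, seed, and coefficient values are arbitrary choices for illustration and are not taken from the notebook's data.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for illustration
n, p = 5, 2                      # hypothetical sizes: 5 samples, 2 features

X = rng.normal(size=(n, p))      # feature matrix, n x p
beta = np.array([2.0, -1.0])     # coefficient vector, length p
eps = 0.1 * rng.normal(size=n)   # error term, one entry per sample

y = X @ beta + eps               # target vector, length n

print(X.shape, beta.shape, eps.shape, y.shape)
```

NumPy stores `beta`, `eps`, and `y` as 1-D arrays here rather than explicit column vectors; `X @ beta` still computes the matrix-vector product $X\beta$.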

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import sys
# Make the local modules in ../src importable from this notebook
sys.path.append('../src')

from linear_regression import LinearRegression
from visualization import plot_regression_line

## 2. Cost Function

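OLS minimizes the sum of squared residuals. The cost below scales that sum by $\frac{1}{2m}$, which leaves the minimizer unchanged and simplifies the gradient:

$$J(\beta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\beta(x^{(i)}) - y^{(i)})^2$$

Here $m$ is the number of samples (the $n$ of Section 1) and $h_\beta(x^{(i)}) = (x^{(i)})^T\beta$ is the prediction for sample $i$.

In matrix form:
$$J(\beta) = \frac{1}{2m}(X\beta - y)^T(X\beta - y)$$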
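
As a quick check that the two forms agree, the sketch below evaluates both on a tiny hand-made dataset. The function names `cost_sum` and `cost_matrix` are hypothetical helpers written for this example, and the prediction is assumed to be $h_\beta(x) = x^T\beta$ with a column of ones in $X$ for the intercept.

```python
import numpy as np

def cost_sum(X, y, beta):
    """J(beta) as an explicit sum over the m samples."""
    m = X.shape[0]
    total = 0.0
    for i in range(m):
        residual = X[i] @ beta - y[i]   # h_beta(x_i) - y_i
        total += residual ** 2
    return total / (2 * m)

def cost_matrix(X, y, beta):
    """J(beta) in matrix form: (X beta - y)^T (X beta - y) / (2m)."""
    m = X.shape[0]
    r = X @ beta - y                    # residual vector
    return (r @ r) / (2 * m)

# Three points lying exactly on y = 1 + 2x; first column of ones is the intercept
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

beta_exact = np.array([1.0, 2.0])   # the line that generated the data
beta_off = np.array([0.0, 2.0])     # same slope, intercept off by 1

print(cost_matrix(X, y, beta_exact))  # 0.0
print(cost_matrix(X, y, beta_off))    # 0.5
print(np.isclose(cost_sum(X, y, beta_off), cost_matrix(X, y, beta_off)))
```

With `beta_off`, every residual is -1, so the sum of squares is 3 and the cost is 3 / (2 · 3) = 0.5.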

In [None]:
# Generate sample data: true slope 2, true intercept 1, Gaussian noise with std 0.1
np.random.seed(42)
X = np.random.randn(100, 1)
y = 2 * X.flatten() + 1 + 0.1 * np.random.randn(100)

# Fit model
model = LinearRegression()
model.fit(X, y)

print(f"Coefficient: {model.coef_[0]:.3f}")
print(f"Intercept: {model.intercept_:.3f}")

## 3. Normal Equation Derivation

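To minimize the cost function, we take the gradient with respect to $\beta$ and set it to zero. Expanding the matrix form gives

$$J(\beta) = \frac{1}{2m}\left(\beta^TX^TX\beta - 2\beta^TX^Ty + y^Ty\right)$$

and differentiating term by term:

$$\frac{\partial J}{\partial \beta} = \frac{1}{m}X^T(X\beta - y) = 0$$

Solving for $\beta$:
$$X^TX\beta = X^Ty$$
$$\beta = (X^TX)^{-1}X^Ty$$

The last step requires $X^TX$ to be invertible, which holds when the columns of $X$ are linearly independent. Because $J$ is convex in $\beta$, this stationary point is a global minimum.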
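
The normal equation can be sketched directly in NumPy. This is an illustration under stated assumptions, not the implementation behind the notebook's `LinearRegression` class: the intercept is handled by prepending a column of ones to $X$, and the linear system is solved with `np.linalg.solve` instead of forming $(X^TX)^{-1}$ explicitly, which is generally the more numerically stable choice.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 2 * x + 1 + 0.1 * rng.normal(size=100)   # true slope 2, true intercept 1

# Prepend a column of ones so the intercept is part of beta (an assumption;
# the notebook's LinearRegression class may handle the intercept differently).
X = np.column_stack([np.ones_like(x), x])

# Solve X^T X beta = X^T y without forming the inverse explicitly.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the minimizer the gradient (1/m) X^T (X beta - y) should vanish.
grad = X.T @ (X @ beta_normal - y) / len(y)

print(beta_normal)                                 # expected to be close to [1, 2]
print(np.allclose(beta_normal, beta_lstsq))        # both solvers should agree
print(np.allclose(grad, 0.0, atol=1e-10))          # gradient is numerically zero
```

`beta_normal[0]` is the intercept and `beta_normal[1]` is the slope, in the order of the columns of `X`.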