# 📘 Understanding Ordinary Least Squares (OLS) Regression

## 1. Overview

**Ordinary Least Squares (OLS)** is a method for estimating the coefficients of a linear regression model.  
It finds the best-fitting line (or hyperplane) that minimizes the sum of squared differences between observed and predicted values.

For a simple regression model:

$$
y = \beta_0 + \beta_1 x + \epsilon
$$

- $y$: dependent (output) variable  
- $x$: independent (input) variable  
- $\beta_0$: intercept  
- $\beta_1$: slope (coefficient)  
- $\epsilon$: error term (the part of $y$ that the linear relationship does not explain)

---

## 2. Objective Function

OLS minimizes the **sum of squared residuals**:

$$
S = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

where $\hat{y}_i = \beta_0 + \beta_1 x_i$ is the predicted value.

The goal is to find $\beta_0$ and $\beta_1$ that minimize $S$.
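
To make the objective concrete, here is a minimal sketch that evaluates $S$ for a candidate pair $(\beta_0, \beta_1)$. The function name `sum_squared_residuals` and the four data points are made up for illustration; they are not part of any library.

```python
import numpy as np

def sum_squared_residuals(beta0, beta1, x, y):
    """Return S = sum_i (y_i - (beta0 + beta1 * x_i))^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_hat = beta0 + beta1 * x          # predicted values
    return float(np.sum((y - y_hat) ** 2))

# Illustrative data lying exactly on the line y = 1 + 2x
x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = np.array([1.0, 3.0, 5.0, 7.0])

s_exact = sum_squared_residuals(1.0, 2.0, x_demo, y_demo)  # 0.0: the line fits perfectly
s_off = sum_squared_residuals(0.0, 2.0, x_demo, y_demo)    # 4.0: every residual equals 1
```

Any other choice of coefficients gives a larger $S$ on this data, which is exactly what "best-fitting" means here.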

---

## 3. Closed-Form Solution

### For Simple Linear Regression

$$
\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
$$

$$
\beta_0 = \bar{y} - \beta_1 \bar{x}
$$
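
The two formulas translate almost line for line into NumPy. The sketch below uses a hypothetical helper `fit_simple_ols` and a small made-up data set; `np.polyfit` is included only as an independent cross-check.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Closed-form OLS estimates (beta0, beta1) for y = beta0 + beta1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    beta1 = np.sum(x_centered * y_centered) / np.sum(x_centered ** 2)
    # Intercept: forces the fitted line through the point (x_bar, y_bar)
    beta0 = y.mean() - beta1 * x.mean()
    return float(beta0), float(beta1)

# Illustrative data (made up for this example)
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

b0, b1 = fit_simple_ols(x_demo, y_demo)   # approximately 2.2 and 0.6

# Cross-check: polyfit with degree 1 returns [slope, intercept]
slope_ref, intercept_ref = np.polyfit(x_demo, y_demo, 1)
```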

### For Multiple Linear Regression

If we have multiple predictors (features), the model can be written in matrix form as:

$$
\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}
$$

The solution is:

$$
\boldsymbol{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}
$$

where  
- $\mathbf{X}$: matrix of input variables (with a column of 1s for the intercept)  
- $\mathbf{y}$: vector of outputs  
- $\boldsymbol{\beta}$: vector of coefficients
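
As a rough sketch, the matrix solution can be computed on synthetic data as follows. The data, the seed, and the "true" coefficients are assumptions chosen for illustration. The code solves the linear system $\mathbf{X}^T \mathbf{X} \boldsymbol{\beta} = \mathbf{X}^T \mathbf{y}$ rather than forming the inverse explicitly, which gives the same result and is generally better behaved numerically.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the run is repeatable
n = 200
features = rng.normal(size=(n, 2))       # two synthetic predictors
true_beta = np.array([1.0, 2.0, -3.0])   # intercept and two slopes (assumed)

# Design matrix: a column of 1s for the intercept, then the predictors
X = np.column_stack([np.ones(n), features])
y = X @ true_beta + 0.1 * rng.normal(size=n)

# Normal equations: solve (X^T X) beta = X^T y
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Both estimates should agree to numerical precision and land close to `true_beta`, since the noise level here is small.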

---

## 4. Why It's Called "Least Squares"

The method **minimizes the sum of squared residuals**: among all candidate lines (or hyperplanes), it chooses the one for which the total squared error between predictions and observed data is as small as possible.

---

## 5. Limitations of OLS

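OLS works well when the following assumptions hold:
1. The relationship between inputs and outputs is linear.  
2. Errors are independent of one another. Normally distributed errors matter mainly for exact confidence intervals and hypothesis tests, not for the fit itself.  
3. Errors have constant variance (homoscedasticity).  
4. Predictors are not perfectly, or nearly perfectly, collinear.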

However, OLS struggles when:
- The relationship is **nonlinear**.  
- There are **outliers** (since squaring residuals amplifies large errors).  
- Predictors are **highly correlated** (multicollinearity).
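
The outlier point is easy to see numerically. The sketch below uses made-up data that lie exactly on $y = 2x$, then corrupts a single observation; `np.polyfit` with degree 1 performs the least-squares line fit.

```python
import numpy as np

x = np.arange(10, dtype=float)
y_clean = 2.0 * x                 # points exactly on the line y = 2x

y_outlier = y_clean.copy()
y_outlier[-1] += 90.0             # corrupt one observation (illustrative outlier)

# polyfit with degree 1 returns [slope, intercept]
slope_clean, intercept_clean = np.polyfit(x, y_clean, 1)
slope_outlier, intercept_outlier = np.polyfit(x, y_outlier, 1)
```

With the clean data the fitted slope is 2 up to rounding; after one point is moved, the fitted slope rises well above 2 even though the other nine points are unchanged.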

---

## 6. Multicollinearity Explained

### Definition

**Multicollinearity** occurs when two or more independent variables are **highly correlated**, meaning they contain overlapping information.

For example:

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon
$$

If $x_2 \approx 2x_1$, then $x_1$ and $x_2$ are nearly linearly dependent.
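
A quick way to spot this situation is to look at the correlation between predictors. In this sketch the data are synthetic, and the small noise term added to `x2` is an assumption that makes it "approximately" rather than exactly $2x_1$.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 2.0 * x1 + 0.01 * rng.normal(size=500)   # x2 is almost exactly 2 * x1

# Pearson correlation between the two predictors
corr = np.corrcoef(x1, x2)[0, 1]
```

A correlation this close to 1 signals that `x2` adds almost no information beyond what `x1` already provides.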

---

### Mathematical Problem

OLS computes coefficients via:

$$
\boldsymbol{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}
$$
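
How well this formula behaves depends on how close $\mathbf{X}^T \mathbf{X}$ is to being singular, which can be measured by its condition number. The sketch below is a small experiment on synthetic data (all values are assumptions for illustration): it compares an independent second predictor with one that is nearly $2x_1$, and measures how much the fitted $\beta_1$ varies when only the noise in $\mathbf{y}$ is redrawn.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)

def design(x2):
    """Design matrix with an intercept column and predictors x1, x2."""
    return np.column_stack([np.ones(n), x1, x2])

X_indep = design(rng.normal(size=n))                        # unrelated predictor
X_collinear = design(2.0 * x1 + 0.01 * rng.normal(size=n))  # x2 is nearly 2 * x1

# A large condition number means X^T X is close to singular
cond_indep = np.linalg.cond(X_indep.T @ X_indep)
cond_collinear = np.linalg.cond(X_collinear.T @ X_collinear)

def slope_spread(X, n_trials=200):
    """Standard deviation of the fitted beta_1 across repeated noise draws."""
    true_beta = np.array([0.0, 1.0, 1.0])   # assumed coefficients
    estimates = []
    for _ in range(n_trials):
        y = X @ true_beta + 0.1 * rng.normal(size=n)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates.append(beta[1])
    return float(np.std(estimates))

spread_indep = slope_spread(X_indep)
spread_collinear = slope_spread(X_collinear)
```

In this setup the collinear design has a condition number several orders of magnitude larger, and its $\beta_1$ estimates scatter far more widely from one noise draw to the next.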

If predictors are highly correlated, $\mathbf{X}^T \mathbf{X}$ becomes **nearly singular** (ill-conditioned, and non-invertible in the limit of perfect collinearity).  
This causes numerical instability, making the coefficient e
