## Simple Linear Regression

In simple linear regression, we consider a single independent variable and a single dependent variable. The relationship between the variables can be represented by the equation:

$y = mx + c$

Where:

- `y` is the dependent variable
- `x` is the independent variable
- `c` is the y-intercept (the value of `y` when `x` is 0)
- `m` is the slope (the change in `y` for a unit change in `x`)

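In machine learning, the slope is usually called the weight $w$ and the intercept the bias $b$, a notation that carries over to higher dimensions: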

$y = wx + b$

The goal is to estimate the values of $w$ and $b$ that best fit the data.
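
As a minimal sketch of what the model computes once $w$ and $b$ are known, the snippet below evaluates $y = wx + b$ for a few inputs. The function name `predict` and the parameter values are hypothetical, chosen only for illustration.

```python
def predict(x, w, b):
    """Return the predicted value for input x on the line y = w*x + b."""
    return w * x + b


# Hypothetical parameters, not estimated from any data.
w, b = 2.0, 1.0
predictions = [predict(x, w, b) for x in [0.0, 1.0, 2.0]]
print(predictions)  # [1.0, 3.0, 5.0]
```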

### Geometric and Mathematical Intuition

<img src = "https://miro.medium.com/v2/resize:fit:450/1*bACFHSM81VJVr900rLoUog.png">
<img src = "https://miro.medium.com/v2/resize:fit:450/1*7miWhsUBOAYI8VmXuAp32g.png">

Geometrically, linear regression aims to find the line that minimizes the sum of squared vertical distances between the observed data points and the corresponding predicted values on the line. These vertical distances are the residuals (errors) of the fit.

### Linear Regression

$\text{error for } x_1 = y_1 - ŷ_1$

$\text{error for } x_2 = y_2 - ŷ_2$

$\text{error for } x_3 = y_3 - ŷ_3 = 0 \quad (\because y_3 = ŷ_3)$

Mathematically, the goal is to minimize the sum of squared residuals (also known as the residual sum of squares or RSS) given by:

$RSS = Σ(y - ŷ)^2$

Where:

- `y` is the observed value of the dependent variable
- `ŷ` is the predicted value of the dependent variable based on the regression line

The best-fit line is obtained by minimizing RSS, which can be achieved through various optimization techniques.
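
To make the RSS concrete, here is a small sketch that scores two candidate lines against the same points. The data and the helper name `rss` are made up for illustration; the line with the lower RSS fits these points better.

```python
def rss(xs, ys, w, b):
    """Residual sum of squares for the line y_hat = w*x + b."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))


# Made-up data points, for illustration only.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 7.0]

rss_a = rss(xs, ys, w=2.0, b=0.0)  # residuals 0, 0, 1
rss_b = rss(xs, ys, w=1.0, b=1.0)  # residuals 0, 1, 3
print(rss_a, rss_b)  # 1.0 10.0
```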

### Ordinary Least Squares (OLS) Estimation for Simple Linear Regression

<img src = "../../assets/ordinary_least_square.png">

The most common method to estimate the coefficients (`b` and `w`) in linear regression is Ordinary Least Squares (OLS) estimation. It finds the values of `b` and `w` that minimize the sum of squared residuals, written SSE here (the same quantity as RSS above): $SSE = Σ(y - b - wx)^2$. Setting the partial derivatives of SSE with respect to `b` and `w` to zero gives:

<div align="center">

$\frac{∂SSE}{∂b} = -2Σ(y - b - wx) = 0$

$\frac{∂SSE}{∂w} = -2Σx(y - b - wx) = 0$
</div>

Solving these two equations simultaneously yields the estimated coefficients $w$ and $b$:
<div align="center">

$w = \frac{Σ(x - x̄)(y - ȳ)}{Σ(x - x̄)^2}$

$b = ȳ - w * x̄$
</div>

Where:

- `x̄` is the mean of the independent variable `x`
- `ȳ` is the mean of the dependent variable `y`

Both formulas need only sums over the data, so the best-fit line can be computed in time linear in the number of data points.
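
The closed-form formulas for $w$ and $b$ can be sketched in plain Python as follows. The function name `fit_simple_ols` and the sample data are assumptions for illustration; the data lie exactly on $y = 2x + 1$, so the fit should recover those values.

```python
def fit_simple_ols(xs, ys):
    """Estimate slope w and intercept b with the closed-form OLS formulas."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Numerator and denominator of the slope formula.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    w = s_xy / s_xx
    b = y_mean - w * x_mean
    return w, b


# Illustrative data generated from y = 2x + 1 with no noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b = fit_simple_ols(xs, ys)
print(w, b)  # 2.0 1.0
```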

### Ordinary Least Squares (OLS) Estimation for Multiple Linear Regression

Multiple linear regression generalizes simple linear regression by allowing more than one input variable: $x_1, x_2, \ldots, x_d$. The goal is to find a relationship between the input variables and the output variable.

Mathematically,

$\hat{y} = β_0 + β_1x_1 + β_2x_2 + ... +  β_dx_d$

Where:

- $β_0$ is the intercept
- $β_1$ through $β_d$ are the regression coefficients for the independent variables $x_1$ through $x_d$

As in simple linear regression, the observed output differs from the prediction by an error term, so the model for the actual output variable is:

$y = β_0 + β_1x_1 + β_2x_2 + ... +  β_dx_d + ϵ$

Where:

- $ϵ$ is the random error (residual), the difference between the actual output and the predicted output

In matrix form,

<div align="center">

$
\begin{bmatrix}
y_1 \\ 
y_2 \\
\vdots \\
y_n
\end{bmatrix} =   \begin{bmatrix}
  1 & x_{11} & \cdots & x_{1d} \\
  1 & x_{21} & \cdots & x_{2d} \\
  \vdots  & \vdots  & \ddots & \vdots  \\
  1 & x_{n1} & \cdots & x_{nd}
 \end{bmatrix}\begin{bmatrix}
\beta_0 \\ 
\beta_1 \\
\vdots \\
\beta_d
\end{bmatrix}+ \begin{bmatrix}
\epsilon_1 \\ 
\epsilon_2 \\
\vdots \\
\epsilon_n
\end{bmatrix}
$

</div>

This can be written compactly as:
<div align="center">

$y = Xβ + ϵ$
</div>

The sum of squared errors is then:
<div align="center">

$\text{SSE} = \sum_{i=1}^{n}\epsilon_i^2 = \epsilon^T\epsilon = (y - Xβ)^T(y - Xβ)$
</div>

Differentiating SSE with respect to β and setting the gradient to zero gives $-2X^T(y - Xβ) = 0$, that is, $X^TXβ = X^Ty$. Provided $X^TX$ is invertible, this normal equation yields the estimated parameters:
<div align="center">

$\hat{β} = (X^TX)^{-1} X^Ty$
</div>
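
A minimal NumPy sketch of the normal equation is shown below. The function name `fit_normal_equation` and the data are hypothetical. The sketch assumes $X^TX$ is invertible and solves the linear system $X^TXβ = X^Ty$ with `np.linalg.solve` instead of forming the inverse explicitly, which gives the same estimate and is generally more stable numerically.

```python
import numpy as np


def fit_normal_equation(features, y):
    """Return beta_hat = (X^T X)^{-1} X^T y for a feature matrix without the ones column."""
    features = np.asarray(features, dtype=float)
    y = np.asarray(y, dtype=float)

    # Prepend a column of ones so that beta[0] plays the role of the intercept.
    X = np.column_stack([np.ones(len(features)), features])

    # Solve (X^T X) beta = X^T y; assumes X^T X is invertible.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Illustrative data generated from y = 1 + 2*x1 + 3*x2 with no noise.
features = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y = [9.0, 8.0, 19.0, 18.0, 26.0]
beta = fit_normal_equation(features, y)
print(np.round(beta, 6))  # approximately [1, 2, 3]
```

When the columns of $X$ are collinear, $X^TX$ is singular and `np.linalg.solve` raises an error; a least-squares routine such as `np.linalg.lstsq` is a common alternative in that case.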



## Optimization Methods

Two common approaches are used to estimate the coefficients: Gradient Descent, which is iterative, and the Normal Equation derived above, which is a closed-form solution.

### Gradient Descent

Gradient Descent is an iterative optimization algorithm that updates the coefficients gradually by descending along the negative gradient of the cost function. The cost function is typically the RSS, and the algorithm seeks to minimize it. The steps of Gradient Descent include:

1. Initialize the coefficients randomly or with some initial guess.
2. Calculate the predicted values using the current coefficients.
3. Calculate the gradient of the cost function with respect to each coefficient.
4. Update the coefficients by taking a step proportional to the negative gradient.
5. Repeat steps 2-4 until convergence is reached.
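
The steps above can be sketched for simple linear regression as follows. This is an illustrative implementation under a few assumptions: the cost is RSS divided by `n` (it has the same minimizer as RSS but keeps the step size independent of the dataset size), the coefficients start at zero, and a fixed iteration count stands in for a convergence check. The names `gradient_descent`, `learning_rate`, and `n_iters` are hypothetical.

```python
def gradient_descent(xs, ys, learning_rate=0.05, n_iters=5000):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    n = len(xs)
    w, b = 0.0, 0.0  # Step 1: initial guess.

    for _ in range(n_iters):  # Step 5: fixed number of repeats (assumption).
        # Step 2: predictions with the current coefficients.
        preds = [w * x + b for x in xs]

        # Step 3: gradient of (1/n) * sum((y - y_hat)^2) w.r.t. w and b.
        grad_w = (-2.0 / n) * sum(x * (y - p) for x, y, p in zip(xs, ys, preds))
        grad_b = (-2.0 / n) * sum(y - p for y, p in zip(ys, preds))

        # Step 4: move against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

    return w, b


# Illustrative data generated from y = 2x + 1 with no noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))  # should approach w = 2, b = 1
```

The learning rate matters: too large a value makes the updates diverge, while too small a value makes convergence slow. The value used here was picked to suit this small dataset.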

## Model Evaluation
After fitting the linear regression model, it's important to evaluate its performance and assess its quality. Here are some commonly used evaluation metrics:

### Mean Squared Error (MSE)
MSE measures the average squared difference between the predicted and actual values. It is computed as:

$MSE = \frac{Σ(y - ŷ)^2}{n}$

Where `n` is the number of data points.
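
As a quick sketch, MSE can be computed directly from the definition. The function name and the values below are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n


# Made-up actual and predicted values.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
mse = mean_squared_error(y_true, y_pred)  # (1 + 0 + 4) / 3
print(mse)
```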

### R-squared ($R^2$)
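R-squared represents the proportion of the variance in the dependent variable that is predictable from the independent variables. For an OLS model with an intercept, evaluated on its training data, it ranges from 0 to 1, where 1 indicates a perfect fit; on held-out data it can be negative when the model predicts worse than the mean of `y`. It is calculated as: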

$R^2 = 1 - \frac{RSS}{TSS}$

Where RSS is the residual sum of squares and TSS is the total sum of squares.

The RSS (Residual Sum of Squares) represents the sum of squared differences between the observed dependent variable values (y) and the predicted values (ŷ) obtained from the linear regression model. Mathematically, it is calculated as follows:

$RSS = Σ(y - ŷ)^2$

On the other hand, the TSS (Total Sum of Squares) represents the total variation in the dependent variable (y) from its mean (ȳ). It measures the sum of squared differences between each observed dependent variable value (y) and the mean of the dependent variable (ȳ). Mathematically, it is calculated as follows:

$TSS = Σ(y - ȳ)^2$
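
Putting RSS and TSS together, $R^2$ can be sketched as below. The function name `r_squared` and the values are hypothetical, chosen so the arithmetic is easy to follow.

```python
def r_squared(y_true, y_pred):
    """Compute R^2 = 1 - RSS / TSS."""
    y_mean = sum(y_true) / len(y_true)
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    tss = sum((y - y_mean) ** 2 for y in y_true)
    if tss == 0:
        raise ValueError("y_true is constant, so TSS is zero and R^2 is undefined")
    return 1.0 - rss / tss


# Made-up values: RSS = 1 + 0 + 4 = 5, TSS = 4 + 0 + 4 = 8.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
r2 = r_squared(y_true, y_pred)
print(r2)  # 0.375
```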
