# Linear Regression



Each data point is a vector $\mathbf{x} \in \mathbb{R}^p$ consisting of $p$ features, with an associated output $y$. Assuming the relationship between the features and the output is approximately linear, we can predict $y$ from a given $\mathbf{x}$ with a linear model:


$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \epsilon $$

Writing $x_{ij}$ for feature $j$ of data point $i$, the second data point, for example, satisfies:

$$ y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \dots + \beta_p x_{2p} + \epsilon_2 $$

For multiple data points:

$$
\begin{aligned}
&\text{First data point:} \quad y_1 = \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \dots + \beta_p x_{1p} + \epsilon_1 \\
&\text{Second data point:} \quad y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \dots + \beta_p x_{2p} + \epsilon_2 \\
&\text{Third data point:} \quad y_3 = \beta_0 + \beta_1 x_{31} + \beta_2 x_{32} + \dots + \beta_p x_{3p} + \epsilon_3
\end{aligned}
$$

and so on, up to the $n$-th data point.

## Matrix Representation

Stacking the equations for all data points gives the matrix form:

$$ Y = X \beta + \epsilon $$

where:
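- $ Y $ is the $n \times 1$ vector of observed values, for $n$ data points,
- $ X $ is the $n \times (p+1)$ matrix of input features, with a leading column of ones so that $\beta_0$ acts as the intercept,
- $ \beta $ is the $(p+1) \times 1$ vector of coefficients $(\beta_0, \beta_1, \dots, \beta_p)$,
- $ \epsilon $ is the $n \times 1$ vector of error terms.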
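
As a minimal sketch of how these pieces fit together, the snippet below builds $X$ with NumPy for a small made-up dataset. The data values, the coefficients, and the variable names (`features`, `X`, `beta`, `Y`) are illustrative assumptions, and the column of ones is the usual convention for absorbing the intercept $\beta_0$ into the matrix product.

```python
import numpy as np

# Hypothetical toy data: n = 4 data points, p = 2 features each.
features = np.array([
    [1.0, 2.0],
    [2.0, 0.0],
    [3.0, 1.0],
    [4.0, 3.0],
])

# Prepend a column of ones so that beta_0 acts as the intercept.
X = np.column_stack([np.ones(len(features)), features])

# Illustrative coefficients (beta_0, beta_1, beta_2), chosen arbitrarily.
beta = np.array([1.0, 2.0, -1.0])

# Noise-free outputs Y = X beta (epsilon is set to zero for this illustration).
Y = X @ beta

print(X.shape)  # (n, p + 1)
print(Y)
```

Each row of `X` corresponds to one data point, so the single product `X @ beta` evaluates every per-point equation at once.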

## Error in Prediction

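For each data point, the error (residual) is the difference between the observed and the predicted value:

$$ e_i = y_i - \hat{y}_i $$

where $ \hat{y}_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} $ is the predicted value.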

## Residual Sum of Squares (RSS)

The residual sum of squares (RSS) is the sum of the squared errors over all $n$ data points:

$$ RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

Since the vector of predictions is $ \hat{Y} = X\beta $, RSS is a function of $ \beta $:

$$ RSS = (Y - X\beta)^T (Y - X\beta) $$
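
The equivalence of the summation form and the matrix form can be checked numerically. The following is a small sketch under assumed toy values; the function name `rss` and the data are hypothetical and only serve the illustration.

```python
import numpy as np

def rss(X, Y, beta):
    """Residual sum of squares in matrix form: (Y - X beta)^T (Y - X beta)."""
    residuals = Y - X @ beta
    return float(residuals @ residuals)

# Hypothetical toy data: intercept column plus one feature.
X = np.array([
    [1.0, 1.0],
    [1.0, 2.0],
    [1.0, 3.0],
])
Y = np.array([1.0, 2.0, 2.0])
beta = np.array([0.5, 0.5])  # an arbitrary candidate, not the minimizer

# Summation form: sum over data points of (y_i - y_hat_i)^2.
y_hat = X @ beta
rss_sum = float(sum((y_i - y_hat_i) ** 2 for y_i, y_hat_i in zip(Y, y_hat)))

rss_matrix = rss(X, Y, beta)
print(rss_sum, rss_matrix)
```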

## Minimizing RSS

To minimize RSS, we take the derivative with respect to $ \beta $ and set it to zero:

$$ \frac{\partial RSS}{\partial \beta} = -2X^T(Y - X\beta) = 0 $$
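
One way to verify this derivative is to expand the quadratic form and differentiate term by term, using the standard identities $ \partial(\beta^T a)/\partial\beta = a $ and $ \partial(\beta^T A \beta)/\partial\beta = 2A\beta $ for symmetric $ A $:

$$
\begin{aligned}
RSS &= Y^T Y - 2\beta^T X^T Y + \beta^T X^T X \beta \\
\frac{\partial RSS}{\partial \beta} &= -2X^T Y + 2X^T X \beta = -2X^T(Y - X\beta)
\end{aligned}
$$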

Rearranging gives the normal equations $ X^T X \beta = X^T Y $. Solving for $ \beta $:

$$ \beta = (X^T X)^{-1} X^T Y $$
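
The closed-form solution can be sketched in NumPy as follows. The synthetic data, the seed, and the names `true_beta` and `beta_hat` are assumptions made for illustration. The sketch solves the linear system $ X^T X \beta = X^T Y $ with `np.linalg.solve` rather than forming the inverse explicitly, which is generally the more numerically stable choice, and compares the result with NumPy's own least-squares routine.

```python
import numpy as np

# Synthetic data (assumed for illustration): n = 50 points, p = 3 features.
rng = np.random.default_rng(0)
n, p = 50, 3
features = rng.normal(size=(n, p))
X = np.column_stack([np.ones(n), features])  # leading column of ones for beta_0

true_beta = np.array([1.0, 2.0, -1.0, 0.5])
Y = X @ true_beta + 0.1 * rng.normal(size=n)  # small additive noise

# Solve X^T X beta = X^T Y, i.e. beta = (X^T X)^{-1} X^T Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# At the minimizer the gradient -2 X^T (Y - X beta) should vanish.
gradient = -2 * X.T @ (Y - X @ beta_hat)

print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))
```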

### Conditions for Existence

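For $ (X^T X)^{-1} $ to exist, $ X^T X $ must be invertible, which holds exactly when $ X $ has full column rank (its columns are linearly independent). If it is not invertible, we can use:

- **Regularization**: adding a term such as $ \lambda I $ with $ \lambda > 0 $ to $ X^T X $ to make it invertible (ridge regression)
- **Moore-Penrose Pseudo Inverse**: using $ \beta = X^+ Y $, where $ X^+ $ is computed from the singular value decomposition $ X = U \Sigma V^T $:

  $$ X^+ = V \Sigma^+ U^T $$

  Here $ \Sigma^+ $ is obtained by transposing $ \Sigma $ and inverting its nonzero singular values. When $ X $ has full column rank, this reduces to $ X^+ = (X^T X)^{-1} X^T $.

The pseudo inverse exists for every $ X $, and $ \beta = X^+ Y $ is the least-squares solution with the smallest norm, even if $ X^T X $ is not full rank.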
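
Both remedies can be tried on a deliberately rank-deficient example. The sketch below is illustrative only: the data (with a duplicated feature column), the cutoff `rcond=1e-10`, and the regularization strength `lam` are assumed values, and the regularization shown is the common ridge form that adds $ \lambda I $ to $ X^T X $. For simplicity this sketch also penalizes the intercept.

```python
import numpy as np

# Rank-deficient design (assumed toy data): the third column duplicates the second.
X = np.array([
    [1.0, 1.0, 1.0],
    [1.0, 2.0, 2.0],
    [1.0, 3.0, 3.0],
    [1.0, 4.0, 4.0],
])
Y = np.array([3.0, 5.0, 7.0, 9.0])  # equals 1 + 2 * x for x = 1, 2, 3, 4

# X^T X is 3 x 3 but only has rank 2, so it cannot be inverted.
rank = np.linalg.matrix_rank(X.T @ X)

# Moore-Penrose pseudo inverse (computed via SVD inside NumPy).
# Among all least-squares solutions it returns the one with the smallest norm.
beta_pinv = np.linalg.pinv(X, rcond=1e-10) @ Y

# Ridge-style regularization: X^T X + lam * I is invertible for any lam > 0.
lam = 1e-3
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

print(rank)
print(beta_pinv)
print(beta_ridge)
```

In this example the pseudo inverse splits the slope evenly between the two identical columns, and the ridge solution stays close to it for small `lam`.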

---

