# Formulas for Simple Linear Regression (SLR)

## True nature of X and Y
We suppose that there is a true linear dependence between X and Y, written as follows ($\approx$ reads as *is approximately modeled as*):

$$ Y\ \approx\ \beta_0 + \beta_1\cdot X 
\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, 
\mbox{(3.1)}_{\,ISLR\,2} ,$$ 

or also:

$$ Y\ =\ \beta_0 + \beta_1\cdot X + \varepsilon
\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,
\mbox{(3.5)}_{\,ISLR\,2} ,$$ 

where $ \frac{1}{N}\sum_{i=1}^N \varepsilon_i  \mathop{\longrightarrow}\limits_{N\to\infty}0 $ or $ E\varepsilon = 0 $ (here $E\varepsilon$ stands for the *mathematical expectation of the random variable $\varepsilon$*, sampled as $\{\varepsilon_i\}$ for $i\in\{1,2,\ldots N\}$).

**NB!: the existence of such a truly linear dependence is not actually required for the SLR formulas below to be valid!**
The formulas hold in any case, but the prediction given by the SLR model is less useful (less accurate) if no such linearity lies behind the observed samples.
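
To make model (3.5) concrete, here is a minimal simulation sketch. The coefficient values, the noise distribution (Gaussian with unit variance), and the range of X are all assumptions chosen for illustration; the only property taken from the text is that the noise has zero expectation, so its sample mean should shrink toward zero as N grows.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical "true" coefficients, chosen only for illustration.
beta0, beta1 = 2.0, 0.5
N = 20_000

xs = [random.uniform(0.0, 10.0) for _ in range(N)]
eps = [random.gauss(0.0, 1.0) for _ in range(N)]      # noise with E[eps] = 0
ys = [beta0 + beta1 * x + e for x, e in zip(xs, eps)]  # Y = beta0 + beta1*X + eps

# The sample mean of the noise should be close to zero for large N.
eps_mean = sum(eps) / N
print(f"sample mean of eps over N={N}: {eps_mean:.4f}")
```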

## Regressing Y onto X
Using "`^`" on top of a value to mean "`estimated (by the prediction model)`", we can write for a **sample**:

$$ \begin{array}{ll}
\mbox{for any sample}\,(x_i,y_i),\, &\mbox{where}\, i\in\{1,2,\ldots n\} \\
y_i\, =\, \hat\beta_0 + \hat\beta_1\cdot x_i + \hat\epsilon_i, &\mbox{where}\,\{\hat\epsilon_i\}\,\mbox{are called "residuals"} \\
\hat y_i\, =\, \hat\beta_0 + \hat\beta_1\cdot x_i, &\mbox{being the predicted part of $y_i$ for each $i$} \\
\end{array} $$

**$\{\hat y_i\}$ indicates a prediction of Y on the basis of X sampled by $\{x_i\}$:**

$$ \hat y_i\, \stackrel{def}{=}\, \hat\beta_0 + \hat\beta_1\cdot x_i 
\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, 
\mbox{(3.2)}_{\,ISLR\,2} $$

## Residual Sum of Squares `RSS`

$$
\mbox{RSS}\,=\,
\sum_{i=1}^n \hat\epsilon_i^2\,=\,
\sum_{i=1}^n (y_i-\hat y_i)^2\,=\,
\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1\cdot x_i)^2
\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, 
\mbox{(3.3)}_{\,ISLR\,2} 
$$

### Optimization criteria for SLR 
**The task is to find the $\hat\beta_0$ and $\hat\beta_1$ that minimize RSS on a given sample.**
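
The criterion can be illustrated with a short sketch that evaluates RSS for many candidate pairs of coefficients and keeps the pair with the smallest value. The sample and the grid of candidates are made up for illustration, and a brute-force grid search is used here only to show what "minimizing RSS" means; it is not how the coefficients are normally computed.

```python
def rss(b0, b1, xs, ys):
    """Residual sum of squares for the line y = b0 + b1 * x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

# Made-up sample, used only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Coarse grid of candidate coefficients (step 0.1 in both directions).
candidates = [(i / 10, j / 10) for i in range(-10, 11) for j in range(10, 31)]

# Keep the candidate pair that gives the smallest RSS on this sample.
best_b0, best_b1 = min(candidates, key=lambda c: rss(c[0], c[1], xs, ys))
print(best_b0, best_b1, rss(best_b0, best_b1, xs, ys))
```

The grid can only land near the true minimizer, since the exact optimum generally falls between grid points.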

## Analytical formulas for `SLR coefficients`
Using "`-`" on top of a value to mean "`mean value over the sample`", that is:

$$
\bar x\, \stackrel{def}{=}\, \frac{1}{n}\sum_{i=1}^n x_i\,\,\,\,\,\mbox{and}\,\,\,\,\,\bar y\, \stackrel{def}{=}\, \frac{1}{n}\sum_{i=1}^n y_i
$$

we can derive from the optimality conditions  $ \Bigl( \frac{\partial}{\partial\hat\beta_k}\, \mbox{RSS}\,=\,0$ , for $k\in \{0,1\} \Bigr) $  that:

$$ \left\{
\begin{eqnarray}
\,\hat\beta_0 &=& \bar y - \hat\beta_1\cdot\bar x \\
\,\hat\beta_1 &=& \frac
{\sum_{i=1}^n (x_i-\bar x)(y_i-\bar y)}
{\sum_{i=1}^n (x_i-\bar x)^2} & \\
\end{eqnarray} 
\right. 
\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, 
\mbox{(3.4)}_{\,ISLR\,2}$$
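
As a quick check of formulas (3.4), the sketch below computes both coefficients directly from the sample means and centered sums. The helper name `slr_fit` and the data are hypothetical, introduced only for this example.

```python
def slr_fit(xs, ys):
    """Least-squares estimates (beta0_hat, beta1_hat) following formulas (3.4)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta1_hat = s_xy / s_xx                # needs at least two distinct x values
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

# Made-up sample, used only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

beta0_hat, beta1_hat = slr_fit(xs, ys)
print(f"beta0_hat = {beta0_hat:.4f}, beta1_hat = {beta1_hat:.4f}")
```

For this sample the intercept comes out at roughly 0.05 and the slope at roughly 1.99.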

### Some universal invariants to be derived from optimisation criteria for SLR 

#### mean of residuals is exactly zero
Let's sum all residuals:

$$
\sum_{i=1}^n \hat\epsilon_i = \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1\cdot x_i) = n\bar y - n\hat\beta_0 - \hat\beta_1\cdot n\bar x = n\cdot\underbrace{(\bar y - \hat\beta_0 - \hat\beta_1\cdot\bar x)}_{0}
\, ,$$

and the expression in parentheses above is exactly zero because of the first equation of system (3.4).
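
The zero-sum property can be checked numerically. The sketch below fits the line with formulas (3.4) on a made-up sample and sums the residuals; in floating-point arithmetic the result is zero only up to rounding error, so it is compared against a small tolerance rather than against exact zero.

```python
# Made-up sample, used only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Coefficients from formulas (3.4).
beta1_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
beta0_hat = y_bar - beta1_hat * x_bar

# Residuals: observed y minus predicted y.
residuals = [y - (beta0_hat + beta1_hat * x) for x, y in zip(xs, ys)]
residual_sum = sum(residuals)
print(f"sum of residuals = {residual_sum:.2e}")
```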

#### product of $\hat\beta_1$ for regressions of Y onto X and X onto Y `for same sample` is exactly equal to squared correlation of X and Y
Let's use the following notation for the reversed regression:

$$
x_i = \hat\beta_0^{(-1)} + \hat\beta_1^{(-1)}\cdot y_i + \hat\epsilon_i^{(-1)} ,
$$

**NB!: here the superscript `(-1)` does not mean the power `-1`; it marks the coefficients of the reversed regression of X onto Y**.  
Intuitively one might expect something like $\hat\beta_1^{(-1)} = 1/\hat\beta_1$, but by the result below this holds only when the sample correlation of X and Y is exactly $\pm 1$.

Multiplying the slopes $\hat\beta_1$ and $\hat\beta_1^{(-1)}$ of the regressions of Y onto X and X onto Y **for the same sample** (that is, the same $x_i$ and $y_i$), we get:

$$
\,\hat\beta_1\cdot\hat\beta_1^{(-1)} \,=\, \frac
{\Bigl(\sum_{i=1}^n (x_i-\bar x)(y_i-\bar y)\Bigr)^2}
{\Bigl(\sum_{i=1}^n (x_i-\bar x)^2\Bigr)\cdot\Bigl(\sum_{i=1}^n (y_i-\bar y)^2\Bigr)}
\,=\, \Bigl(\mbox{corr (X,Y)}\Bigr)^2
$$
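
The identity above can be verified on a small made-up sample. In the sketch below the helper `centered_sum` is a hypothetical name introduced for this example; the slopes of both regressions and the sample correlation are computed from the same three centered sums.

```python
import math

def centered_sum(us, vs):
    """Sum of (u_i - u_bar) * (v_i - v_bar) over the sample."""
    u_bar = sum(us) / len(us)
    v_bar = sum(vs) / len(vs)
    return sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))

# Made-up sample, used only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

s_xy = centered_sum(xs, ys)
s_xx = centered_sum(xs, xs)
s_yy = centered_sum(ys, ys)

b1 = s_xy / s_xx                    # slope of Y regressed onto X
b1_rev = s_xy / s_yy                # slope of X regressed onto Y
r = s_xy / math.sqrt(s_xx * s_yy)   # sample correlation of X and Y

print(f"b1 * b1_rev = {b1 * b1_rev:.6f}, r^2 = {r ** 2:.6f}")
print(f"b1_rev = {b1_rev:.6f}, 1 / b1 = {1 / b1:.6f}")
```

On this sample the reversed slope is close to, but not equal to, the reciprocal of the forward slope, because the points do not lie exactly on one line.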