# Ordinary Least Squares

### Estimating Single-Independent-Variable Models with OLS

The objective of regression analysis is to start from a theoretical equation:

$$\begin{align}
Y_i = \beta_0 + \beta_1X_i + \epsilon_i
\end{align}$$

and, through the use of data, arrive at the estimated equation:

$$\begin{align}
\hat{Y_i} = \hat{\beta_0}+ \hat{\beta_1}X_i \end{align}
$$

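where $\hat{\beta_0}$ and $\hat{\beta_1}$ are sample estimates of the population values $\beta_0$ and $\beta_1$, and $\hat{Y_i}$ is the estimated value of $Y_i$. In the case of $Y$, the "true population value" being estimated is the conditional mean $\mathbb E[Y|X]$.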

The most widely used method to obtain these estimates is **Ordinary Least Squares (OLS)**. OLS is a regression estimation technique that chooses $\hat{\beta_0}$ and $\hat{\beta_1}$ so as to minimize the sum of the squared residuals. Mathematically, given $N$ observations,

$$\begin{align}\text{Objective of OLS is }\min \sum^{N}_{i=1}{e_i^2}\end{align}$$

where $e_i = Y_i - \hat{Y_i}$ is the residual for observation $i$, with $i = 1, 2, \cdots, N$.

The sum of squared residuals can therefore also be written as $\sum_{i=1}^{N} (Y_i-\hat{Y_i})^2$.
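To make the objective concrete, the sketch below evaluates the sum of squared residuals for two candidate lines. The data set and both candidate coefficient pairs are made up purely for illustration, and the function name `sum_squared_residuals` is an assumption, not something defined in the text.

```python
def sum_squared_residuals(x, y, b0, b1):
    """Return the sum over i of (Y_i - (b0 + b1 * X_i)) ** 2."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical sample, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

ssr_a = sum_squared_residuals(x, y, b0=2.0, b1=0.5)  # candidate line A
ssr_b = sum_squared_residuals(x, y, b0=2.2, b1=0.6)  # candidate line B

# Line B fits this sample better in the least-squares sense.
print(ssr_a, ssr_b)
```

OLS can be read as searching over every possible pair of coefficients for the one that makes this quantity as small as possible.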


### Why use OLS?
There are at least three reasons for using OLS to estimate regression models. First, it is relatively easy to use. Many other estimation techniques rely on iterative procedures, whereas OLS estimates have closed-form formulae and can be calculated by hand.

Second, the goal of minimizing $\sum{e_i^2}$ is appropriate from a theoretical point of view: squaring the residuals means that positive and negative deviations ($Y_i - \hat{Y_i}$) both count toward the total instead of cancelling each other out.

Finally, OLS estimates have a number of useful characteristics. When the model includes an intercept, as it does here, the sum of the residuals is *exactly* zero, and OLS can be shown to be the 'best' estimator under a set of specific assumptions.

### Estimators

An **estimator** is a mathematical technique that is applied to a sample of data to produce real-world **estimates** of the true population regression coefficients (or other parameters). Thus, OLS is an **estimator** and the $\hat{\beta}$s produced by OLS are called **estimates**.

### How does OLS work?

OLS selects the estimates of $\beta_0$ and $\beta_1$ that minimize the squared residuals, summed over all the sample data points.

For an equation with one independent variable, the coefficients are

$$\begin{align}
\hat{\beta_1} = 
\frac{\sum^N_{i=1}\begin{bmatrix}
\begin{pmatrix}
X_i-\bar{X}
\end{pmatrix}
\begin{pmatrix}
Y_i-\bar{Y}
\end{pmatrix}
\end{bmatrix}}
{\sum^N_{i=1}\begin{pmatrix}X_i - \bar{X}\end{pmatrix}^2}\end{align}
$$

and, given the estimate of $\beta_1$,

$$\begin{align}\hat{\beta_0}=\bar{Y}-\hat{\beta_1}\bar{X}\end{align}$$

where $\bar{X}$ is the mean of $X$, or $\frac {\sum X_i}{N}$ and $\bar{Y}$ is the mean of $Y$, or $\frac {\sum Y_i}{N}$. For each data set, we obtain different estimates of $\beta_0$ and $\beta_1$. 
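The two formulae above translate almost line by line into code. The following is a minimal sketch in plain Python; the function name `ols_fit` and both samples are hypothetical and chosen only so that the arithmetic is easy to follow by hand.

```python
def ols_fit(x, y):
    """Return (b0_hat, b1_hat) for the model Y = b0 + b1 * X.

    Raises ZeroDivisionError if every X value is identical, because the
    denominator sum((X_i - X_bar) ** 2) is then zero.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = s_xy / s_xx
    # Intercept, given the slope estimate.
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat


# Two hypothetical samples sharing the same X values.
x = [1, 2, 3, 4, 5]
sample_1 = [2, 4, 5, 4, 5]
sample_2 = [1, 3, 2, 5, 4]

b0_1, b1_1 = ols_fit(x, sample_1)
b0_2, b1_2 = ols_fit(x, sample_2)
print(b0_1, b1_1)  # approximately 2.2 and 0.6
print(b0_2, b1_2)  # approximately 0.6 and 0.8
```

The same estimator applied to the two samples returns two different pairs of estimates, which is the point made in the last sentence above.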

## Proof of OLS Estimators

OLS minimizes the sum of squared residuals or

$$\begin{align}\min \sum^{N}_{i=1}{e_i^2} = \min \sum^{N}_{i=1}(Y_i-\hat{Y_i})^2\end{align}$$

and since

$$\begin{align}\hat{Y_i} = \hat{\beta_0}+ \hat{\beta_1}X_i \end{align}$$

substituting $(7)$ into RHS of $(6)$ yields the objective function:

$$\begin{align}\min \sum_{i=1}^N e_i^2 = \min \sum_{i=1}^N(Y_i-\hat{\beta_0}-\hat{\beta_1}X_i)^2 \end{align}$$

Differentiating $(8)$ w.r.t. $\hat{\beta_0}$,
$$\begin{align}
\frac{\partial} {\partial\hat{\beta_0}}\sum_{i=1}^N(Y_i-\hat{\beta_0}-\hat{\beta_1}X_i)^2 = -2\sum_{i=1}^N (Y_i-\hat{\beta_0}-\hat{\beta_1}X_i)\end{align}$$

Setting RHS of $(9)$ to $0$ and solving for $\hat{\beta_0}$,

$$\begin{align*}
0 &= -2\sum_{i=1}^N (Y_i-\hat{\beta_0}-\hat{\beta_1}X_i) \\
0 &= \sum_{i=1}^N (Y_i-\hat{\beta_0}-\hat{\beta_1}X_i) \\
0 &= \sum_{i=1}^N Y_i -N\hat{\beta_0}  - \hat{\beta_1}\sum_{i=1}^N X_i \\
 N\hat{\beta_0} &= \sum_{i=1}^N Y_i - \hat{\beta_1}\sum_{i=1}^N X_i \\
\end{align*}$$
$$
\begin{align}
  \hat{\beta_0} &= \frac{\sum_{i=1}^N Y_i - \hat{\beta_1}\sum_{i=1}^N X_i}{N} \\
\end{align}
$$

Since $N\bar{Y} = \sum_{i=1}^N Y_i$ and $N\bar{X} = \sum_{i=1}^N X_i$, substituting these into $(10)$ yields:

$$
\begin{align*}
  \hat{\beta_0} &= \frac{\sum_{i=1}^N Y_i - \hat{\beta_1}\sum_{i=1}^N X_i}{N} \\
  \hat{\beta_0}  &= \frac{N\bar{Y} - \hat{\beta_1}N\bar{X}}{N}
\end{align*}
$$

$$
\begin{align}
\hat{\beta_0}  &= \bar{Y} - \hat{\beta_1}\bar{X} 
\end{align}
$$

$\diamond$
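The first-order condition for $\hat{\beta_0}$ used in this derivation, $\sum (Y_i-\hat{\beta_0}-\hat{\beta_1}X_i) = 0$, says that the residuals of the fitted line sum to zero. The sketch below checks this numerically on a made-up sample, using the slope and intercept formulae stated earlier; the helper name `ols_fit` is hypothetical.

```python
def ols_fit(x, y):
    """Closed-form OLS estimates (b0_hat, b1_hat) for one regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = s_xy / s_xx
    return y_bar - b1_hat * x_bar, b1_hat


# Hypothetical sample, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0_hat, b1_hat = ols_fit(x, y)
residuals = [yi - (b0_hat + b1_hat * xi) for xi, yi in zip(x, y)]
residual_sum = sum(residuals)

# Zero up to floating-point rounding error.
print(residual_sum)
```

In floating-point arithmetic the printed value may be a tiny nonzero number such as `1e-16` instead of an exact `0`.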

Next, differentiating $(8)$ w.r.t. $\hat{\beta_1}$,

$$
\begin{align}
\frac{\partial} {\partial\hat{\beta_1}}\sum_{i=1}^N(Y_i-\hat{\beta_0}-\hat{\beta_1}X_i)^2 = -2\sum_{i=1}^N X_i(Y_i-\hat{\beta_0}-\hat{\beta_1}X_i)
\end{align}
$$
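One way to gain confidence in the two partial derivatives derived so far is to compare them with a numerical approximation. The sketch below does this with a central difference at an arbitrary, non-optimal point $(\hat{\beta_0}, \hat{\beta_1}) = (0, 1)$; the data, the evaluation point, the step size `h`, and all function names are assumptions made for illustration.

```python
def ssr(x, y, b0, b1):
    """Objective: sum of squared residuals."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


def grad_b0(x, y, b0, b1):
    """Analytic partial derivative of the objective w.r.t. b0."""
    return -2 * sum(yi - b0 - b1 * xi for xi, yi in zip(x, y))


def grad_b1(x, y, b0, b1):
    """Analytic partial derivative of the objective w.r.t. b1."""
    return -2 * sum(xi * (yi - b0 - b1 * xi) for xi, yi in zip(x, y))


def central_difference(f, t, h=1e-6):
    """Numerical derivative of a one-argument function f at t."""
    return (f(t + h) - f(t - h)) / (2 * h)


# Hypothetical sample and evaluation point.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 0.0, 1.0

analytic_b0 = grad_b0(x, y, b0, b1)
analytic_b1 = grad_b1(x, y, b0, b1)
numeric_b0 = central_difference(lambda t: ssr(x, y, t, b1), b0)
numeric_b1 = central_difference(lambda t: ssr(x, y, b0, t), b1)

print(analytic_b0, numeric_b0)
print(analytic_b1, numeric_b1)
```

The analytic and numerical values should agree to several decimal places; a large gap would point to an algebra slip in the derivative.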

Setting RHS of $(12)$ to $0$ and solving for $\hat{\beta_1}$,

$$\begin{align*}
0 &= -2\sum_{i=1}^N X_i(Y_i-\hat{\beta_0}-\hat{\beta_1}X_i)\\
0 &= \sum_{i=1}^N X_i(Y_i-\hat{\beta_0}-\hat{\beta_1}X_i)\\
0 &= \sum_{i=1}^N X_iY_i- \hat{\beta_0}\sum_{i=1}^NX_i- \hat{\beta_1}\sum_{i=1}^NX^2_i
\end{align*}$$

$$
\begin{align}
\hat{\beta_1}\sum_{i=1}^NX^2_i &= \sum_{i=1}^N X_iY_i- \hat{\beta_0}\sum_{i=1}^NX_i \\
\end{align}
$$

Substituting $(11)$, that is $\hat{\beta_0}  = \bar{Y} - \hat{\beta_1}\bar{X}$, into $(13)$ yields:
$$
\begin{align*}
\hat{\beta_1}\sum_{i=1}^NX^2_i &= \sum_{i=1}^N X_iY_i- (\bar{Y} - \hat{\beta_1}\bar{X})\sum_{i=1}^NX_i \\
\hat{\beta_1}\sum_{i=1}^NX^2_i &= \sum_{i=1}^N X_iY_i- \bar{Y}\sum_{i=1}^NX_i + \hat{\beta_1}\bar{X}\sum_{i=1}^NX_i \\
\hat{\beta_1}\sum_{i=1}^NX^2_i &= \sum_{i=1}^N X_iY_i- N\bar{Y}\bar{X} + N\hat{\beta_1}\bar{X}\bar{X} \\
\hat{\beta_1}\sum_{i=1}^NX^2_i - N\hat{\beta_1}(\bar{X})^2 &= \sum_{i=1}^N X_iY_i- N\bar{X}\bar{Y} \\
\hat{\beta_1}\begin{bmatrix}\sum_{i=1}^NX^2_i - N(\bar{X})^2\end{bmatrix}&= \sum_{i=1}^N X_iY_i- N\bar{X}\bar{Y} \\
\end{align*}
$$

$$
\begin{align}
\hat{\beta_1} = \frac{\sum_{i=1}^N X_iY_i- N\bar{X}\bar{Y} }{\sum_{i=1}^NX^2_i - N(\bar{X})^2}
\end{align}
$$

For the numerator of $(14)$,
$$
\begin{align*}
\sum_{i=1}^N[(X_i -\bar{X})(Y_i -\bar{Y})]&=\sum_{i=1}^N[X_i Y_i-\bar{X}Y_i-X_i\bar{Y}+\bar{X}\bar{Y}]
\\&=\sum_{i=1}^N X_i Y_i- \sum_{i=1}^N\bar{X}Y_i- \sum_{i=1}^NX_ i\bar{Y}+\sum_{i=1}^N\bar{X}\bar{Y}
\\&=\sum_{i=1}^N X_i Y_i- \bar{X}\sum_{i=1}^NY_i- \bar{Y}\sum_{i=1}^NX_ i+\bar{X}\bar{Y}\sum_{i=1}^N 1
\\&=\sum_{i=1}^N X_i Y_i- \bar{X} (N \bar{Y})- \bar{Y}(N \bar{X})+N\bar{X}\bar{Y}
\\&=\sum_{i=1}^N X_iY_i- N\bar{X}\bar{Y}
\end{align*}
$$

Similarly, for the denominator of $(14)$,
$$
\begin{align*}
\sum_{i=1}^N[(X_i -\bar{X})^2]&=\sum_{i=1}^N[X^2_i -\bar{X}X_i-X_i\bar{X}+\bar{X}\bar{X}]
\\&=\sum_{i=1}^N X^2_i - \sum_{i=1}^N\bar{X}X_i- \sum_{i=1}^NX_ i\bar{X}+\sum_{i=1}^N\bar{X}\bar{X}
\\&=\sum_{i=1}^N X^2_i - \bar{X} (N \bar{X})- \bar{X}(N \bar{X})+N\bar{X}\bar{X}
\\&=\sum_{i=1}^N X^2_i - N(\bar{X})^2
\end{align*}
$$
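The two algebraic identities just derived can also be checked numerically. The sketch below computes both sides of each identity on a small made-up sample; the variable names are illustrative assumptions.

```python
# Hypothetical sample, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Numerator: centered cross products vs. raw sums.
centered_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
raw_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar

# Denominator: centered squares vs. raw sums.
centered_xx = sum((xi - x_bar) ** 2 for xi in x)
raw_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2

print(centered_xy, raw_xy)  # both sides of the numerator identity
print(centered_xx, raw_xx)  # both sides of the denominator identity
```

Both forms give the same slope in exact arithmetic. In floating-point code the centered form is usually the safer choice, because the raw-sum form subtracts two large, nearly equal numbers when the mean is large relative to the spread of the data.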

Substituting the left-hand sides of these two identities for the numerator and denominator of $(14)$ yields the OLS estimator $\hat{\beta_1}$:
$$\begin{align}
\hat{\beta_1} = 
\frac{\sum^N_{i=1}\begin{bmatrix}
\begin{pmatrix}
X_i-\bar{X}
\end{pmatrix}
\begin{pmatrix}
Y_i-\bar{Y}
\end{pmatrix}
\end{bmatrix}}
{\sum^N_{i=1}\begin{pmatrix}X_i - \bar{X}\end{pmatrix}^2}
\end{align}
$$

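Setting the derivatives $(9)$ and $(12)$ to zero gives the two first-order conditions, known as the **normal equations**. Their solutions, $(11)$ and $(15)$, are the OLS estimators. They minimise the sum of squared residuals because the objective $(8)$ is a convex function of $\hat{\beta_0}$ and $\hat{\beta_1}$.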

$\diamond$
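As a final sanity check on the result, the sketch below fits a made-up sample with the closed-form estimators and then perturbs the coefficients slightly in every direction on a small grid. Every perturbed line should have a larger sum of squared residuals. The data, the perturbation size `0.1`, and the function names are assumptions for illustration only.

```python
def ols_fit(x, y):
    """Closed-form OLS estimates (b0_hat, b1_hat) for one regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = s_xy / s_xx
    return y_bar - b1_hat * x_bar, b1_hat


def ssr(x, y, b0, b1):
    """Sum of squared residuals for the line b0 + b1 * X."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical sample, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0_hat, b1_hat = ols_fit(x, y)
best = ssr(x, y, b0_hat, b1_hat)

# Second normal equation: residuals are uncorrelated with X in the sample.
x_times_residual = sum(
    xi * (yi - b0_hat - b1_hat * xi) for xi, yi in zip(x, y)
)

# SSR at each of the eight neighbouring points on a 3 x 3 grid.
perturbed = [
    ssr(x, y, b0_hat + d0, b1_hat + d1)
    for d0 in (-0.1, 0.0, 0.1)
    for d1 in (-0.1, 0.0, 0.1)
    if (d0, d1) != (0.0, 0.0)
]

print(best, min(perturbed))
print(all(value > best for value in perturbed))
```

A grid check like this is only a spot check on one sample, not a substitute for the derivation.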