# The Simple Regression Model

## Definition of the Simple Regression Model

$Def\newcommand{\using}[1]{\stackrel{\mathrm{#1}}{=}}
\newcommand{\ffrac}{\displaystyle \frac}
\newcommand{\space}{\text{ }}
\newcommand{\bspace}{\;\;\;\;}
\newcommand{\QQQ}{\boxed{?\:}}
\newcommand{\CB}[1]{\left\{ #1 \right\}}
\newcommand{\SB}[1]{\left[ #1 \right]}
\newcommand{\P}[1]{\left( #1 \right)}
\newcommand{\dd}{\mathrm{d}}
\newcommand{\Tran}[1]{{#1}^{\mathrm{T}}}
\newcommand{\d}[1]{\displaystyle{#1}}
\newcommand{\EE}[2][\,\!]{\mathbb{E}_{#1}\left[#2\right]}
\newcommand{\Var}[2][\,\!]{\mathrm{Var}_{#1}\left[#2\right]}
\newcommand{\Cov}[2][\,\!]{\mathrm{Cov}_{#1}\left(#2\right)}
\newcommand{\Corr}[2][\,\!]{\mathrm{Corr}_{#1}\left(#2\right)}
\newcommand{\I}[1]{\mathrm{I}\left( #1 \right)}
\newcommand{\N}[1]{\mathrm{N} \left( #1 \right)}
\newcommand{\ow}{\text{otherwise}}$

***Simple Linear Regression Model***: explains variable $y$ in terms of variable $x$ as follows:

$$y = \beta_0 + \beta_1 x + u$$

Here, $y$ is called the ***dependent variable***, or ***explained variable***, or ***response variable***, or ***regressand***; $x$ is the ***independent variable***, or ***explanatory variable***, or ***regressor***. Also, $\beta_0$ is the ***intercept parameter***, $\beta_1$ is the ***slope parameter***, and $u$ is the ***error term***, or ***disturbance***, or ***unobservables***.

Since the model already includes an intercept, we can assume without loss of generality that $\boxed{\EE{u} = 0}$: any nonzero mean of $u$ can be absorbed into $\beta_0$.

**Interpretation**

$\ffrac{\partial y} {\partial x} = \beta_1$ as long as $\ffrac{\partial u} {\partial x} = 0$

- The slope $\beta_1$ says by how much the dependent variable changes when the independent variable increases by one unit.
- This holds only if all other factors (collected in $u$) remain fixed.
- That condition is rarely satisfied in practice, though.

***

How, then, are $u$ and $x$ related? A natural measure is their *correlation coefficient*, but that is not enough because it only measures the linear dependence between $u$ and $x$.

$Remark$

>It's possible for $u$ to be uncorrelated with $x$ while being correlated with a function of $x$, say $x^2$.
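
The remark above can be illustrated with a tiny discrete population. This is only a sketch: the three-point distribution of $x$ and the choice $u = x^2 - \EE{x^2}$ are assumptions made up for the illustration, not part of the notes.

```python
import numpy as np

# Hypothetical population: x takes the values -1, 0, 1 with equal probability,
# and u is a centered function of x, u = x^2 - E[x^2].
x = np.array([-1.0, 0.0, 1.0])
p = np.array([1 / 3, 1 / 3, 1 / 3])
u = x**2 - np.sum(p * x**2)

def cov(a, b):
    """Population covariance of a and b under the probabilities p."""
    return np.sum(p * a * b) - np.sum(p * a) * np.sum(p * b)

cov_x_u = cov(x, u)        # zero: no linear dependence between x and u
cov_x2_u = cov(x**2, u)    # positive: u is correlated with x^2

# u is a function of x here, so E[u | x] is just u at each x value.
cond_mean_u = {float(xi): float(ui) for xi, ui in zip(x, u)}
print(cov_x_u, cov_x2_u, cond_mean_u)
```

Here $u$ is uncorrelated with $x$, yet its conditional mean changes with $x$, so zero correlation is a weaker requirement than mean independence.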

A better choice is to work with the expected value of $u$ given $x$ and assume that *the average value of $u$ does not depend on the value of* $x$, that is, $u$ *is **mean independent** of* $x$. Mathematically:

$$\boxed{\EE{u\mid x} = \EE{u}}$$

Combining mean independence with $\EE{u} = 0$ gives the ***zero conditional mean assumption***: $\EE{u \mid x} = 0$. Taking the expectation of the model conditional on $x$ then gives the ***population regression function (PRF)***: $\boxed{ \EE{y \mid x} = \beta_0 + \beta_1 x }$, which is a linear function of $x$.

![](./figs/simpleRegressionModel.png)

We call $\beta_0 + \beta_1 x$ the ***systematic part*** of $y$, and $u$ the ***unsystematic part***.

## Deriving the Ordinary Least Squares Estimates

1. Random sample of $n$ observations: $\CB{\P{x_i,y_i}: i=1,2,\dots, n}$
    - obviously $y_i = \beta_0 + \beta_1 x_i + u_i$ for all $i$
    - $u_i$ is the ***error (disturbance) term*** for observation $i$
2. Fit a regression line through the data points as well as possible

Before that, one important intermediate result: $\EE{u \mid x} = 0 \Longrightarrow \boxed{\Cov{x,u} = 0}\Longrightarrow \boxed{\EE{xu} = 0}$

$Proof$
>$\begin{align}
\EE{u\mid x} &= \int_{-\infty}^{\infty} u\cdot f_{u \mid x} \P{u\mid x} \;\dd{u} \\
&= \int_{-\infty}^{\infty} u\ffrac{f\P{u,x}} {f_{X}\P{x}} \;\dd{u} \\
&= \ffrac{\d{\int_{-\infty}^{\infty} u \cdot f\P{u,x}\;\dd{u}}} {f_X\P{x}} = 0
\end{align}$
>
>$\begin{align}
\Cov{x,u} &= \EE{xu} - \EE{x}\EE{u} = \EE{xu} \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xu\cdot f\P{x,u}\;\dd{x}\dd{u}\\
&= \int_{-\infty}^{\infty}x \cdot{ \int_{-\infty}^{\infty} u\cdot f\P{x,u}\;\dd{u} }\dd{x} = \int_{-\infty}^{\infty} x \cdot 0 \;\dd{x} = 0
\end{align}$

Using $u = y - \beta_0 - \beta_1 x$, the conditions $\EE{u} = 0$ and $\EE{xu} = 0$ can be written as $\boxed{\EE{y - \beta_0 - \beta_1 x} = 0}$ and $\boxed{\EE{x\P{y - \beta_0 - \beta_1 x}} = 0}$. These two equations act as restrictions on the two unknown parameters, so their sample counterparts can be used to obtain $\hat\beta_0$ and $\hat\beta_1$, the estimators of $\beta_0$ and $\beta_1$ respectively. This is the method of moments:

$$\begin{cases}
n^{-1} \sum \P{y_i - \hat\beta_0 - \hat\beta_1 x_i} = 0 \\
n^{-1} \sum x_i\P{y_i - \hat\beta_0 - \hat\beta_1 x_i} = 0
\end{cases}$$

the solution is quite easy to obtain 😉😉😉:

$$\hat{\beta}_1 = \ffrac{\d{\sum_{i=1}^{n} \P{x_i - \bar{x}}\P{y_i - \bar{y}}}} {\d{\sum_{i=1}^{n}\P{x_i - \bar{x}}^2}}, \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

$Remark$

- $\bar y = n^{-1}\sum_{i=1}^{n} y_i$ is the sample average of the $y_i$, and likewise for $\bar x$.
- $\hat\beta_1$ can be interpreted as the sample covariance between $x$ and $y$ divided by the sample variance of $x$.
- Implication: if $x$ and $y$ are positively correlated in the sample, then $\hat\beta_1$ is positive
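
As a quick numerical check of the closed-form solution above, here is a minimal sketch in NumPy. The helper name `ols_simple` and the five data points are made up for the illustration.

```python
import numpy as np

def ols_simple(x, y):
    """Return (b0, b1) from the closed-form simple regression formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    sst_x = np.sum((x - x_bar) ** 2)
    if sst_x == 0:
        raise ValueError("x has no sample variation, the slope is undefined")
    b1 = np.sum((x - x_bar) * (y - y_bar)) / sst_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical toy sample
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

b0, b1 = ols_simple(x, y)

# Cross-check against NumPy's least-squares polynomial fit of degree 1,
# which returns the coefficients as [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
print(b0, b1, intercept, slope)
```

Both routes give the same line here, which anticipates the point made next: the moment conditions and the least-squares first order conditions are the same equations.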

***
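This method is called ***Ordinary Least Squares (OLS)***. It looks a bit different from the least-squares approach we learned before, which minimizes the sum of squared deviations, but the first order conditions of that minimization are exactly the same two equations (with a factor of $-2$ in place of $n^{-1}$):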

$$\begin{cases}
\ffrac{\partial \P{\sum\limits_{i=1}^{n} \P{y_i - \beta_0 - \beta_1 x_i}^2}} {\partial \beta_0} = 0 \Longrightarrow \sum\limits_{i=1}^{n} -2\P{y_i - \hat\beta_0 - \hat\beta_1 x_i} = 0  \\
\ffrac{\partial \P{\sum\limits_{i=1}^{n} \P{y_i - \beta_0 - \beta_1 x_i}^2}} {\partial \beta_1} = 0 \Longrightarrow \sum\limits_{i=1}^{n} -2x_i\P{y_i - \hat\beta_0 - \hat\beta_1 x_i} = 0
\end{cases}$$

- ***fitted value***: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
- ***residual***: $\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$

![](./figs/Residual_fittedValue.png)

$Remark$

>The result can be obtained by differentiating with respect to $\beta_0$ and $\beta_1$. The regression line goes through the point $\P{\bar{x},\bar{y}}$, and a different sample gives different estimates and thus a different regression line. Also, the **residual** $\hat u_i$ is not the same as the **error term** $u_i$: the residual is computed from the estimates, while the error is unobserved.

- ***sum of squared residuals***: $\text{SSR} = \sum\limits_{i=1}^n \hat u_i^2 = \sum\limits_{i=1}^n\P{y_i - \hat\beta_0 - \hat\beta_1 x_i}^2$. The least-squares method we learned before finds exactly the values of $\P{\hat\beta_0, \hat\beta_1}$ that minimize $\text{SSR}$.
- We also call the first set of equations, without the factor $n^{-1}$, the ***first order conditions*** for the OLS estimates.
- ***OLS regression line***: $\hat y = \hat\beta_0 + \hat\beta_1 x$, also known as the ***sample regression function (SRF)*** because it is the estimated version of the *PRF*, $\EE{y \mid x} = \beta_0 + \beta_1 x$

## Properties of OLS on Any Sample of Data

### Algebraic Properties of OLS Statistics

For any sample data, we have the following

- $\sum\limits_{i=1}^{n} \hat{u}_i = 0$, which follows directly from the **first order conditions**
- $\sum\limits_{i=1}^{n} x_i\hat{u}_i = 0$
- $\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$

From the first property, averaging both sides of $y_i = \hat y_i + \hat u_i$ gives $\bar{y} = \bar{\hat y}$; and from the first and second properties together, the sample covariance between $\hat y_i$ and $\hat u_i$ is $0$.
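
These properties hold for any sample, so they are easy to verify numerically. The sketch below uses a made-up five-point sample and checks each property up to floating-point error.

```python
import numpy as np

# Hypothetical toy sample
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# OLS estimates from the closed-form formulas
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x      # fitted values
u_hat = y - y_hat        # residuals

sum_resid = np.sum(u_hat)                    # first property
sum_x_resid = np.sum(x * u_hat)              # second property
line_gap = y.mean() - (b0 + b1 * x.mean())   # third property
mean_gap = y.mean() - y_hat.mean()           # mean of y equals mean of fitted values
cov_fit_resid = np.cov(y_hat, u_hat)[0, 1]   # sample covariance of fitted values and residuals

print(sum_resid, sum_x_resid, line_gap, mean_gap, cov_fit_resid)
```

All five printed numbers should be zero up to rounding.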

Also we define several measures:

- ***Total sum of squares***: $\text{SST} \equiv \sum\limits_{i=1}^{n} \P{y_i - \bar{y}}^2$
- ***Explained sum of squares***: $\text{SSE} \equiv \sum\limits_{i=1}^{n} \P{\hat{y}_i - \bar{y}}^2$
- ***Residual sum of squares***: $\text{SSR} \equiv \sum\limits_{i=1}^{n} \hat{u}_i^2 = \text{SST} - \text{SSE}$

$Proof$

$\bspace \begin{align}
\text{SST} &= \sum_{i=1}^{n} \P{y_i - \bar y}^2 \\
&= \sum_{i=1}^{n} \SB{\P{y_i - \hat y_i}+ \P{\hat y_i - \bar y}}^2 \\
&= \sum_{i=1}^{n} \SB{\hat u_i + \hat y_i - \bar y}^2 \\
&= \sum_{i=1}^{n} \hat u_i^2 + 2\sum_{i=1}^{n}\hat u_i \P{\hat y_i -\bar y}+\sum_{i=1}^{n}\P{\hat y_i -\bar y}^2\\
&= \text{SSR} + 2\sum_{i=1}^{n}\hat u_i \P{\hat y_i -\bar y} + \text{SSE}
\end{align}$

Here the cross term $2\sum_{i=1}^{n}\hat u_i \P{\hat y_i -\bar y} = 0$ because $\ffrac{\sum_{i=1}^{n}\hat u_i \P{\hat y_i -\bar y}} {n-1} = \ffrac{\sum \hat u_i \hat y_i - \sum \hat u_i \bar y} {n-1} = \Cov{\hat u_i, \hat y_i} = \hat\beta_1\Cov{\hat u_i,x_i} = 0$, where $\Cov{\cdot,\cdot}$ here denotes the sample covariance and we use $\sum \hat u_i = 0$. Hence $\text{SST} = \text{SSE} + \text{SSR}$.

### Goodness-of-Fit

We first assume that $\text{SST} \neq 0$, which fails only when all the $y_i$ equal the same value. Then:

$Def$

***Goodness of fit measure*** (***R-squared***), $R^2 = \ffrac{\text{SSE}} {\text{SST}} = 1 - \ffrac{\text{SSR}}{\text{SST}}$.

$Remark$

> R-squared measures the *fraction* of the total variation that is explained by the regression.
>
> And we can also prove that $R^2$ is equal to the square of the sample correlation coefficient between $y_i$ and $\hat y_i$, $\rho^2\P{y_i, \hat y_i}$.
>
> A seemingly low $R^2$ does not necessarily mean that an OLS regression equation is useless. And a high $R^2$ does NOT necessarily mean that the regression has a causal interpretation! (There may be other factors that affect the election outcome)
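
The decomposition $\text{SST} = \text{SSE} + \text{SSR}$ and the two expressions for $R^2$ can be checked on a small sample. This is a sketch with made-up data; it also compares $R^2$ with the squared sample correlation between $y_i$ and $\hat y_i$.

```python
import numpy as np

# Hypothetical toy sample
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
ssr = np.sum(u_hat ** 2)               # residual sum of squares

r2 = sse / sst
r2_alt = 1 - ssr / sst
corr2 = np.corrcoef(y, y_hat)[0, 1] ** 2  # squared sample correlation of y and fitted values

print(sst, sse, ssr, r2, r2_alt, corr2)
```

The three quantities `r2`, `r2_alt` and `corr2` agree up to rounding.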

## Units of Measurement and Functional Form
### The Effects of Changing Units of Measurement on OLS Statistics

If the dependent variable is multiplied by a constant $c$ (a change of units), both the intercept and the slope estimates are multiplied by $c$. If instead the independent variable is multiplied by a nonzero constant $c$, only the slope estimate changes (it is divided by $c$); the intercept stays the same.

$R^2$ does not change when the units of $x$ or $y$ change.
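
A small experiment makes the scaling rules concrete. The data and the scale factors (1000 for $y$, 10 for $x$) are arbitrary choices for the illustration, and the helper `fit` is hypothetical.

```python
import numpy as np

def fit(x, y):
    """Return (b0, b1, R^2) for a simple regression of y on x."""
    b1, b0 = np.polyfit(x, y, 1)           # polyfit returns [slope, intercept]
    # In simple regression R^2 equals the squared sample correlation of x and y.
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return b0, b1, r2

# Hypothetical toy sample
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

b0, b1, r2 = fit(x, y)
b0_y, b1_y, r2_y = fit(x, 1000 * y)   # change the units of y
b0_x, b1_x, r2_x = fit(10 * x, y)     # change the units of x

print(b0, b1, r2)
print(b0_y, b1_y, r2_y)   # both coefficients scaled by 1000
print(b0_x, b1_x, r2_x)   # only the slope changes, divided by 10
```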

### Incorporating Nonlinearities in Simple Regression

***Semi-logarithmic form***

$\log\P{y} = \beta_0 + \beta_1 x + u$

and here $\beta_1$ measures the relative change of $y$ per unit change of $x$: $100\cdot\beta_1$ is approximately the percentage change of $y$ when $x$ increases by one unit.

***Log-log form***

$\log\P{y} = \beta_0 + \beta_1 \log\P{x} + u$

and here $\beta_1$ is *the percentage change of $y$ per one percent change of $x$*, that is, the elasticity of $y$ with respect to $x$. The log-log form postulates a constant elasticity model, whereas the semi-log form assumes a constant semi-elasticity model.
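
To see the two interpretations at work, the sketch below fits both forms to noise-free data generated from assumed relationships: $y = 3x^{0.5}$ for the log-log form and $y = e^{1 + 0.08x}$ for the semi-log form. The numbers are made up for the illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])

# Log-log form: y = 3 * x^0.5 has constant elasticity 0.5.
y_loglog = 3.0 * x ** 0.5
elasticity, log_const = np.polyfit(np.log(x), np.log(y_loglog), 1)

# Semi-log form: y = exp(1 + 0.08 x) has constant semi-elasticity 0.08.
y_semilog = np.exp(1.0 + 0.08 * x)
semi_elasticity, intercept = np.polyfit(x, np.log(y_semilog), 1)

# 100 * beta_1 is only an approximation of the percentage change of y
# for a one-unit increase in x; the exact value is 100 * (exp(beta_1) - 1).
approx_pct = 100 * semi_elasticity
exact_pct = 100 * (np.exp(semi_elasticity) - 1)

print(elasticity, semi_elasticity, approx_pct, exact_pct)
```

The approximation is close when $\beta_1$ is small and becomes worse as $\beta_1$ grows.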

### The Meaning of "linear" Regression

The model (equation: $y = \beta_0 + \beta_1 x + u$) is linear in the *parameters* $\beta_0$ and $\beta_1$.

## Expected Values and Variances of the OLS Estimators

This section covers the statistical properties of OLS estimation. We now view $\hat\beta_0$ and $\hat\beta_1$ as estimators for the parameters $\beta_0$ and $\beta_1$, that is, as random variables, and study their distributions over different random samples from the population.

### Unbiasedness of OLS

$Assumption$ $\text{SLR}.1$ to $\text{SLR}.4$

- Linear in parameters: the population model is $y = \beta_0 + \beta_1 x + u$
- Random sampling: $\CB{\P{x_i,y_i}: i = 1,2,\dots,n}$; and define: $u_i = y_i - \beta_0 - \beta_1 x_i$
- Sample variation in the explanatory variable: $\sum\limits_{i=1}^{n} \P{x_i - \bar{x}}^2 > 0$, or equivalently, the $\CB{x_i, i = 1,2,\dots,n}$ are not all the same value
    - otherwise it would be impossible to study how different values of the explanatory variable lead to different values of the dependent variable
- Zero conditional mean: $\EE{u_i \mid x_i} = 0$

$Theorem.1$ Unbiasedness of OLS

Using assumption $\text{SLR}.1$ through $\text{SLR}.4$, we have $\EE{\hat{\beta}_0} = \beta_0 $ and $\EE{\hat{\beta}_1} = \beta_1$.

To prove this, we first note a property of the estimators that we skipped: linearity in the $y_i$.

$$\hat{\beta}_1 = \ffrac{\d{\sum_{i=1}^{n} \P{x_i - \bar{x}}\P{y_i - \bar{y}}}} {\d{\sum_{i=1}^{n}\P{x_i - \bar{x}}^2}} = \ffrac{\d{\sum_{i=1}^{n} \P{x_i - \bar{x}}y_i}} {\d{\sum_{i=1}^{n}\P{x_i - \bar{x}}^2}} \hat= \sum k_i y_i$$

$$k_i = \ffrac{x_i - \bar{x}} {\d{\sum_{i=1}^{n}\P{x_i - \bar{x}}^2}} \Rightarrow\sum k_i = 0,\bspace\sum k_i x_i = \ffrac{\d{\sum_{i=1}^{n}\P{x_i^2 - x_i\bar{x}}}} {\d{\sum_{i=1}^{n}\P{x_i - \bar{x}}^2}} = 1$$

$$\hat\beta_1 = \sum k_i y_i = \beta_0 \sum k_i + \beta_1 \sum k_i x_i + \sum k_i u_i = \beta_1 + \sum k_i u_i$$

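All expectations here are taken conditional on the sample values $x_1,\dots,x_n$, so the weights $k_i$ are treated as fixed and $\EE{u_i} = 0$ by $\text{SLR}.2$ and $\text{SLR}.4$:

$$\EE{\hat\beta_1} = \EE{\beta_1 + \sum k_i u_i} = \beta_1 + \sum k_i \EE{u_i} = \beta_1$$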

As for $\hat\beta_0$, it's rather easy now.

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x = \beta_0 + \beta_1 \bar x + \bar{u} -\hat\beta_1 \bar x \Longrightarrow \hat\beta_0 = \beta_0 + \P{\beta_1 - \hat\beta_1} \bar x + \bar{u}$$

And here $\bar u = n^{-1}\sum u_i$. Taking expectations and using $\EE{\hat\beta_1} = \beta_1$ and $\EE{\bar u} = 0$ gives $\EE{\hat\beta_0} = \beta_0$.

Also we'd like to mention: $\hat u_i = y_i - \hat\beta_0 -\hat\beta_1 x_i = \P{\beta_0 + \beta_1 x_i + u_i} - \hat\beta_0 -\hat\beta_1 x_i$ so that

$$\hat u_i = u_i - \P{\hat\beta_0 - \beta_0} - \P{\hat\beta_1 - \beta_1}x_i$$
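
Unbiasedness is a statement about averages over repeated samples, so a small Monte Carlo experiment can illustrate it. The sketch below keeps the $x_i$ fixed and redraws the errors; the parameter values, sample size, normal errors and seed are all assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed population parameters and design (illustrative only)
beta0, beta1, sigma = 1.0, 2.0, 1.0
n, reps = 50, 5000

x = np.linspace(0.0, 10.0, n)   # x values held fixed across replications
x_dev = x - x.mean()
sst_x = np.sum(x_dev ** 2)

# One row per simulated sample: y_i = beta0 + beta1 * x_i + u_i
u = rng.normal(0.0, sigma, size=(reps, n))
y = beta0 + beta1 * x + u

# beta1_hat = sum_i k_i y_i with k_i = (x_i - x_bar) / SST_x
b1 = (y @ x_dev) / sst_x
b0 = y.mean(axis=1) - b1 * x.mean()

mean_b0, mean_b1 = b0.mean(), b1.mean()
print(mean_b0, mean_b1)   # should be close to beta0 and beta1
```

Individual estimates scatter around the true values, but their averages over many samples sit close to $\beta_0$ and $\beta_1$.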

### Variances of the OLS Estimators

$Assumption.5$

Homoskedasticity: $\Var{u_i \mid x_i} = \sigma^2$, where $\sigma^2$ is called the ***error variance***. This says that the value of the explanatory variable contains no information about the variability of the unobserved factors. With this assumption we can measure the sampling variability of the estimators, $\Var{\hat\beta_0}$ and $\Var{\hat\beta_1}$.

$Remark$

$\text{SLR}.5$ would make the calculation of variance so much easier.

Under $\text{SLR}.5$ together with $\text{SLR}.4$, we have $\EE{u^2 \mid x} = \Var{u \mid x} + \P{\EE{u \mid x}}^2 = \sigma^2$. Since this does not depend on $x$, $\sigma^2 = \EE{u^2 \mid x} = \EE{u^2} = \Var{u}$. Then

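$Theorem.2$ Sampling Variances of the OLS Estimators

Using $\text{SLR}.1$ to $\text{SLR}.5$, conditional on the sample values of $x$ and writing $\text{SST}_x = \sum_{i=1}^{n}\P{x_i - \bar{x}}^2$,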

$\begin{align}
\Var{\hat\beta_1} &= \Var{\beta_1 + \ffrac{\d{\sum_{i=1}^{n} \P{x_i - \bar{x}}u_i}} {\d{\sum_{i=1}^{n}\P{x_i - \bar{x}}^2}}} = \Var{\ffrac{\d{\sum_{i=1}^{n} \P{x_i - \bar{x}}u_i}} {\d{\sum_{i=1}^{n}\P{x_i - \bar{x}}^2}}} \\
&= \P{\ffrac{1} {\sum_{i=1}^{n}\P{x_i - \bar{x}}^2}}^2 \P{\sum_{i=1}^{n} \P{x_i - \bar{x}}^2 \Var{u_i}}\\
&= \ffrac{1} {\SB{{\sum_{i=1}^{n}\P{x_i - \bar{x}}^2}}^2} \sigma^2 \cdot \text{SST}_x = \ffrac{\sigma^2} {\text{SST}_x}
\end{align}$

or even quicker:

$\Var{\hat\beta_1} = \sigma^2 \sum k_i^2 = \ffrac{ \sigma^2} {\d{\sum_{i=1}^{n}\P{x_i - \bar{x}}^2}}$

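And since we already have $\hat\beta_0 = \beta_0 + \P{\beta_1 - \hat\beta_1} \bar x + \bar{u}$, and $\bar u$ is uncorrelated with $\hat\beta_1$ (because $\Cov{\bar u, \sum k_i u_i} = n^{-1}\sigma^2 \sum k_i = 0$), we have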

$\Var{\hat\beta_0} = 0 + \ffrac{\sum\Var{u}} {n^2} + \P{\bar{x}}^2 \cdot \ffrac{\sigma^2} {\text{SST}_x} = \ffrac{ \text{SST}_x} {n\cdot \text{SST}_x}\sigma^2 + \ffrac{ n\P{\bar{x}}^2} {n\cdot \text{SST}_x}\sigma^2 = \ffrac{\sum x_i^2} {n\cdot \text{SST}_x}\sigma^2$
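
Both variance formulas can be compared with the spread of the estimates across simulated samples. As before, this is only a sketch: the design, the parameter values, the normal errors and the seed are assumptions for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed population parameters and design (illustrative only)
beta0, beta1, sigma = 1.0, 2.0, 1.5
n, reps = 20, 20000

x = np.linspace(1.0, 5.0, n)    # x values held fixed across replications
x_dev = x - x.mean()
sst_x = np.sum(x_dev ** 2)

u = rng.normal(0.0, sigma, size=(reps, n))
y = beta0 + beta1 * x + u

b1 = (y @ x_dev) / sst_x
b0 = y.mean(axis=1) - b1 * x.mean()

# Theoretical variances from the formulas above
var_b1_theory = sigma**2 / sst_x
var_b0_theory = sigma**2 * np.sum(x**2) / (n * sst_x)

# Empirical variances across replications
var_b1_sim = b1.var()
var_b0_sim = b0.var()

print(var_b1_theory, var_b1_sim)
print(var_b0_theory, var_b0_sim)
```

The simulated variances should agree with the formulas up to Monte Carlo noise.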

$Remark$

Assumptions $4$ and $5$ can also be written as $\EE{y \mid x} = \beta_0 + \beta_1 x$ and $\Var{y \mid x} = \sigma^2 = \Var{u \mid x} = \Var{u}$ respectively.

![](./figs/SLRHomoskedasticity.png)

In other words, the conditional expectation of $y$ given $x$ is linear in $x$, but the variance of $y$ given $x$ is constant.

$Remark$

When $\Var{u \mid x}$ depends on $x$, the error term is said to exhibit ***heteroskedasticity*** (nonconstant variance). And since we always have $\Var{u \mid x} = \Var{y \mid x}$, heteroskedasticity is present whenever $\Var{y \mid x}$ is a function of $x$.

$Remark$

1. A larger **error variance** $\sigma^2$ means a larger $\Var{\hat\beta_1}$
2. More variability in the independent variable $x$ is preferred: a larger $\text{SST}_x$ means a smaller $\Var{\hat\beta_1}$
***

Also, in case confidence intervals are needed, the ***standard deviations*** are $\text{sd}\P{\hat\beta_i} = \sqrt{\Var{\hat\beta_i}}$

### Estimating the Error Variance

$\bspace \Var{u_i \mid x_i} \using{independency} \sigma^2 = \Var{u_i}$

But the error variance $\sigma^2$ is usually unknown, so we have to estimate it.

An intuitive estimator would be $\tilde\sigma^2 = \ffrac{1} {n} \sum\limits_{i=1}^{n} \P{\hat{u}_i - \bar{\hat{u}}}^2 = \ffrac{1} {n} \sum\limits_{i=1}^{n} \hat{u}_i^2$ (using $\bar{\hat u} = 0$); however, this is a biased estimator. Here's the unbiased one:

$$\hat\sigma^2 = \ffrac{1} {n-2} \sum_{i=1}^{n} \hat u_i^2$$

$Remark$

>Here $2$ is the number of estimated regression coefficients, so $n-2$ is the degrees of freedom of the residuals.

$Theorem.3$

$\bspace \EE{\hat{\sigma}^2} = \sigma^2$

$Proof$

>We need assumptions $\text{SLR}.1$ to $\text{SLR}.5$ and the preceding results: $\sum \hat u_i = 0$ and $\hat u_i = u_i - \P{\hat\beta_0 - \beta_0} - \P{\hat\beta_1 - \beta_1} x_i$. Averaging the second equation over all $i$ gives $0 = \bar{u} - \P{\hat\beta_0 - \beta_0} - \P{\hat\beta_1 - \beta_1} \bar x$; subtracting this from the second equation leads to:  $\hat u_i = \P{u_i - \bar u} - \P{\hat\beta_1 - \beta_1} \P{x_i - \bar x}$. Therefore,
>
>$$\bspace \begin{align}
\sum_{i=1}^{n} \hat u_i^2 &= \sum_{i=1}^{n} \P{\P{u_i - \bar u}^2 + \P{\hat\beta_1 - \beta_1}^2\P{x_i - \bar x}^2 - 2\P{u_i - \bar{u}}\P{\hat\beta_1 - \beta_1}\P{x_i - \bar x}} \\
&= \P{\sum_{i=1}^{n} \P{u_i - \bar u}^2} + \P{\hat\beta_1 - \beta_1}^2 \sum_{i=1}^{n} \P{x_i - \bar x}^2 - 2\P{\hat\beta_1 - \beta_1} \sum_{i=1}^{n} u_i\P{x_i - \bar{x}}
\end{align}$$
>
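>Now take expectations. The first term has expectation $\P{n-1}\sigma^2$. The second term has expectation $\EE{\P{\hat\beta_1 - \beta_1}^2}\text{SST}_x = \Var{\hat\beta_1}\text{SST}_x = \sigma^2$. For the third term, $\sum u_i\P{x_i - \bar x} = \P{\hat\beta_1 - \beta_1}\text{SST}_x$, so it equals $2\P{\hat\beta_1 - \beta_1}^2\text{SST}_x$ and has expectation $2\sigma^2$. Altogether $\EE{\sum \hat u_i^2} = \P{n-1}\sigma^2 + \sigma^2 - 2\sigma^2 = \P{n-2}\sigma^2$, so $\EE{\hat\sigma^2} = \sigma^2$.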
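
The difference between dividing by $n$ and by $n-2$ shows up clearly in a simulation with a small sample. This is a sketch under assumed values: $n = 10$, $\sigma^2 = 4$, normal errors and a fixed seed are all choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed population parameters and design (illustrative only)
beta0, beta1, sigma2 = 1.0, 2.0, 4.0
n, reps = 10, 20000

x = np.linspace(0.0, 1.0, n)    # x values held fixed across replications
x_dev = x - x.mean()
sst_x = np.sum(x_dev ** 2)

u = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
y = beta0 + beta1 * x + u

b1 = (y @ x_dev) / sst_x
b0 = y.mean(axis=1) - b1 * x.mean()

# Residuals and SSR for every simulated sample
resid = y - b0[:, None] - b1[:, None] * x
ssr = np.sum(resid ** 2, axis=1)

mean_unbiased = np.mean(ssr / (n - 2))   # average of sigma_hat^2
mean_biased = np.mean(ssr / n)           # average of sigma_tilde^2

print(mean_unbiased, mean_biased, sigma2, (n - 2) / n * sigma2)
```

The average of $\hat\sigma^2$ should sit near $\sigma^2$, while the average of $\tilde\sigma^2$ should sit near $\ffrac{n-2}{n}\sigma^2$, which is noticeably smaller when $n$ is small.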

Its square root, $\hat\sigma = \sqrt{\hat\sigma^2}$, is the ***standard error of the regression (SER)***. Plugging $\hat\sigma$ in for $\sigma$ gives an estimate of $\Var{\hat\beta_i}$; the standard deviations and standard errors of $\hat\beta_i$ are:

$\DeclareMathOperator*{\sd}{sd} \sd\P{{\hat\beta_1}} = \sqrt{{\Var{\hat\beta_1}}} = \sigma / \sqrt{\text{SST}_x}$

$\sd\P{\hat\beta_0} = \sqrt{{\Var{\hat\beta_0}}} = \sqrt{\ffrac{\sum x_i^2} {n\cdot \text{SST}_x}} \sigma$

$\DeclareMathOperator*{\se}{se} \se\P{{\hat\beta_1}} = \sqrt{\widehat{\Var{\hat\beta_1}}} = \hat\sigma / \sqrt{\text{SST}_x}$

$\se\P{\hat\beta_0} = \sqrt{\widehat{\Var{\hat\beta_0}}} = \sqrt{\ffrac{\sum x_i^2} {n\cdot \text{SST}_x}}\hat \sigma$
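
Putting the pieces together, the standard errors can be computed directly from a sample. The sketch below uses made-up data; as a cross-check it also evaluates the matrix expression $\hat\sigma^2\P{\Tran{X}X}^{-1}$, a standard equivalent form that is not derived in these notes.

```python
import numpy as np

# Hypothetical toy sample
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(x)

sst_x = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sst_x
b0 = y.mean() - b1 * x.mean()

u_hat = y - b0 - b1 * x
ssr = np.sum(u_hat ** 2)

sigma_hat = np.sqrt(ssr / (n - 2))                          # standard error of the regression
se_b1 = sigma_hat / np.sqrt(sst_x)                          # se of the slope estimate
se_b0 = sigma_hat * np.sqrt(np.sum(x ** 2) / (n * sst_x))   # se of the intercept estimate

# Cross-check: square roots of the diagonal of sigma_hat^2 * (X'X)^{-1},
# where X has a column of ones and a column of x.
X = np.column_stack([np.ones(n), x])
cov_matrix = sigma_hat ** 2 * np.linalg.inv(X.T @ X)
se_matrix = np.sqrt(np.diag(cov_matrix))   # order: [se_b0, se_b1]

print(sigma_hat, se_b0, se_b1, se_matrix)
```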

***