## 4.4 The Least Squares Assumptions

- OLS performs well under a broad variety of circumstances.
- However, some assumptions must be satisfied to ensure that the OLS estimators are normally distributed in large samples.

### The Least Squares Assumptions

Given the linear regression model
\begin{equation}
Y_i = \beta_0 + \beta_1 X_i + u_i \text{, } i = 1,\dots,n
\end{equation}
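
To make the model concrete, the sketch below simulates data from it and computes the OLS estimates with the closed-form formulas $\hat\beta_1 = \sum_i (X_i-\bar X)(Y_i-\bar Y) / \sum_i (X_i-\bar X)^2$ and $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$. It is a minimal Python illustration using only the standard library. The parameter values $\beta_0 = 2$ and $\beta_1 = 3$, the sample size, and the distributions of $X$ and $u$ are hypothetical choices. The errors are drawn independently of $X$, so the assumptions discussed below hold by construction.

```python
import random

def ols(x, y):
    """Closed-form OLS estimates (b0, b1) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical population parameters chosen for illustration.
BETA_0, BETA_1 = 2.0, 3.0

rng = random.Random(42)  # fixed seed for reproducibility
n = 1000
x = [rng.uniform(0, 10) for _ in range(n)]
# u_i ~ N(0, 1) drawn independently of X_i, so E(u_i | X_i) = 0 holds.
u = [rng.gauss(0, 1) for _ in range(n)]
y = [BETA_0 + BETA_1 * xi + ui for xi, ui in zip(x, u)]

b0_hat, b1_hat = ols(x, y)
print(f"b0_hat = {b0_hat:.3f}, b1_hat = {b1_hat:.3f}")
```

With a sample of this size, the estimates should land close to the true values. Changing the seed moves them only slightly.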

### 1. The error term $u_i$ has conditional mean zero given $X_i$: $E(u_i|X_i) = 0$

- This means that no matter which value we choose for $X$, the error term $u$ must not show any systematic pattern and must have a mean of 0.
- Consider the case where, unconditionally, $E(u) = 0$, but the error term tends to be positive for low and high values of $X$ and negative for midrange values of $X$.
<img src="https://www.econometrics-with-r.org/ITER_files/figure-html/unnamed-chunk-161-1.png" width="80%"/>
- Using the quadratic model (black curve), the observations show no systematic deviations from the predicted relation, so it is credible that the assumption holds when such a model is employed.
- Using a simple linear regression model, however, the assumption is probably violated, since $E(u_i|X_i)$ varies with $X_i$.
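
The pattern described above can be reproduced in a small simulation. The sketch assumes a hypothetical quadratic population relation $Y = X^2 + 2X + u$ with $X$ uniform on $[-5, 5]$. It fits a misspecified linear model by OLS and then averages the residuals over low, middle, and high ranges of $X$. Because OLS includes an intercept, the overall residual mean is zero by construction. The range-wise means are not zero, and this is the kind of violation of $E(u_i|X_i) = 0$ that the figure shows.

```python
import random

def ols(x, y):
    """Closed-form OLS estimates (b0, b1) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

def mean(values):
    return sum(values) / len(values)

rng = random.Random(1)
n = 1000
x = [rng.uniform(-5, 5) for _ in range(n)]
# Hypothetical quadratic population relation: Y = X^2 + 2X + u.
y = [xi ** 2 + 2 * xi + rng.gauss(0, 1) for xi in x]

# Misspecified model: regress Y on X only (a straight line).
b0, b1 = ols(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Average residual overall and within low / mid / high ranges of X
# (the cutoffs at -2.5 and 2.5 are arbitrary).
overall = mean(resid)
low = mean([r for xi, r in zip(x, resid) if xi < -2.5])
mid = mean([r for xi, r in zip(x, resid) if -2.5 <= xi <= 2.5])
high = mean([r for xi, r in zip(x, resid) if xi > 2.5])
print(f"overall {overall:.3f} | low {low:.2f} | mid {mid:.2f} | high {high:.2f}")
```

The low and high ranges should show clearly positive average residuals, and the middle range a clearly negative one.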

### 2. $(X_i,Y_i), i = 1,\dots,n$  are independent and identically distributed (i.i.d.) draws from their joint distribution

- Most sampling schemes used when collecting data from populations produce i.i.d. samples.
- For example, we could use R’s random number generator to randomly select student IDs from a university’s enrollment list and record age $X$ and earnings $Y$ of the corresponding students. 
- This is a typical example of simple random sampling and ensures that all the $(X_i, Y_i)$ are drawn randomly from the same population.
- A prominent example where the i.i.d. assumption is not fulfilled is time series data where we have observations on the same unit over time. 
- For example, take $X$ as the number of workers in a production company over time. 
- Due to business transformations, the company periodically cuts jobs by a specific share, but there are also non-deterministic influences related to economics, politics, etc.
- It is evident that the observations on the number of employees cannot be independent in this example: today's level of employment is correlated with tomorrow's.
- Thus, the i.i.d. assumption is violated.
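
The correlation between consecutive observations shows the contrast between i.i.d. draws and time series data. The sketch below compares i.i.d. normal draws with a hypothetical employment series. In that series, the firm keeps 98% of its workforce each period (a 2% cut) and also receives a normally distributed shock. The starting level of 5000 workers, the 2% cut, and the shock size are assumptions made only for this illustration.

```python
import random

def lag1_corr(series):
    """Sample correlation between consecutive observations (t, t+1)."""
    a, b = series[:-1], series[1:]
    m_a, m_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - m_a) * (bi - m_b) for ai, bi in zip(a, b))
    var_a = sum((ai - m_a) ** 2 for ai in a)
    var_b = sum((bi - m_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(7)

# i.i.d. draws: each observation is generated independently of the others.
iid = [rng.gauss(0, 1) for _ in range(1000)]

# Hypothetical employment series: each period the firm keeps 98% of its
# workforce (a 2% cut) plus a random shock; the starting level is 5000.
workers = [5000.0]
for _ in range(99):
    workers.append(0.98 * workers[-1] + rng.gauss(0, 50))

print(f"lag-1 corr, i.i.d. draws:      {lag1_corr(iid):.3f}")
print(f"lag-1 corr, employment series: {lag1_corr(workers):.3f}")
```

For the i.i.d. draws the lag-1 correlation should be close to zero. For the employment series it should be close to one, because today's level says a lot about tomorrow's.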

### 3. Large outliers are unlikely: $X_i$ and $Y_i$ have nonzero finite fourth moments (finite kurtosis)

- Common cases in which we want to exclude or (if possible) correct outliers are apparent typos, conversion errors, or measurement errors.
- Even if extreme observations seem to have been recorded correctly, it is advisable to exclude them before estimating a model, since OLS is sensitive to outliers.
- Extreme observations receive heavy weighting in the estimation of the unknown regression coefficients when using OLS.
- Therefore, outliers can lead to strongly distorted estimates of regression coefficients. 
<img src="https://www.econometrics-with-r.org/ITER_files/figure-html/unnamed-chunk-163-1.png" width="80%"/>
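
A small deterministic example shows how strongly a single outlier can move the OLS estimates. The data are hypothetical. The first set has ten points lying exactly on $Y = 1 + 0.5X$. The second is a copy in which the last value of $Y$ was mis-recorded as 60.0 instead of 6.0, the kind of typo mentioned above.

```python
def ols(x, y):
    """Closed-form OLS estimates (b0, b1) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

x = [float(i) for i in range(1, 11)]
# Hypothetical clean data lying exactly on Y = 1 + 0.5 X.
y_clean = [1.0 + 0.5 * xi for xi in x]
# Same data, but the last Y was mis-recorded as 60.0 instead of 6.0
# (a hypothetical misplaced decimal point).
y_typo = y_clean[:-1] + [60.0]

b0_c, b1_c = ols(x, y_clean)
b0_t, b1_t = ols(x, y_typo)
print(f"clean data:   b0 = {b0_c:.2f}, b1 = {b1_c:.2f}")
print(f"with outlier: b0 = {b0_t:.2f}, b1 = {b1_t:.2f}")
```

The outlier sits at the largest value of $X$, so it carries a lot of weight in the estimation. The estimated slope jumps from 0.5 to about 3.45, and the intercept drops from 1 to about -9.8, even though the other nine observations are unchanged.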
