# More on Specification and Data Issues
## Functional Form Misspecification
The $F$ test for joint exclusion restrictions is usually enough to test whether higher-order terms should be included. However, it can be difficult to pinpoint the precise reason that a functional form is misspecified. Fortunately, using *logarithms* of certain variables and adding *quadratics* is sufficient for detecting many important nonlinear relationships in economics.

### RESET as a General Test for Functional Form Misspecification
Suppose the original model is

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + u
% bbox
% \bbox[#EEF, 5px, border: 2px solid #880015]{E=mc^2}
% \bbox[9px, border:2px solid #880015]{abc}
% text size
% tiny scriptsize small normalsize large Large LARGE huge Huge
% color
% Aquamarine, black, blue, brown, cyan, darkgray, gray, green, lightgray, lime, magenta, olive, orange, pink, purple, red, teal, violet, white, yellow
\DeclareMathOperator*{\argmin}{argmin}
\DeclareMathOperator*{\argmax}{argmax}
\DeclareMathOperator*{\plim}{plim}
\newcommand{\space}{\;\;}
\newcommand{\bspace}{\;\;\;\;}
\newcommand{\Bspace}{\;\;\;\;\;\;}
\newcommand{\bbspace}{\;\;\;\;\;\;\;\;}
\newcommand{\BBspace}{\;\;\;\;\;\;\;\;\;\;}
\newcommand{\QQQ}{\boxed{?\:}}
\newcommand{\void}{\left.\right.}
\newcommand{\myEmphy}[2][#880015]{\color{#1}{#2}}
\newcommand{\myEmphyQ}{\color{#880015}}
\newcommand{\myBox}[2][9px, border:2px solid #880015]{\bbox[#1]{#2}}
\newcommand{\myBoxQ}{\bbox[9px, border:2px solid #880015]}
\newcommand{\ffrac}[2]{\displaystyle{\frac{#1}{#2}}}
\newcommand{\d}[1]{\displaystyle{#1}}
\newcommand{\Tran}[1]{{#1}^{\mathrm{T}}}
\newcommand{\CB}[1]{\left\{ #1 \right\}}
\newcommand{\SB}[1]{\left[ #1 \right]}
\newcommand{\P}[1]{\left( #1 \right)}
\newcommand{\abs}[1]{\left| #1 \right|}
\newcommand{\norm}[1]{\left\| #1 \right\|}
\newcommand{\given}[1]{\left. #1 \right|}
\newcommand{\using}[1]{\stackrel{\mathrm{#1}}{=}}
\newcommand{\asim}{\overset{\text{a}}{\sim}}
\newcommand{\RR}{\mathbb{R}}
\newcommand{\EE}{\mathbb{E}}
\newcommand{\II}{\mathbb{I}}
\newcommand{\NN}{\mathbb{N}}
\newcommand{\ZZ}{\mathbb{Z}}
\newcommand{\QQ}{\mathbb{Q}}
\newcommand{\PP}{\mathbb{P}}
\newcommand{\AcA}{\mathcal{A}}
\newcommand{\FcF}{\mathcal{F}}
\newcommand{\AsA}{\mathscr{A}}
\newcommand{\FsF}{\mathscr{F}}
\newcommand{\dd}{\mathrm{d}}
\newcommand{\I}[1]{\mathrm{I}\left( #1 \right)}
\newcommand{\N}[1]{\mathcal{N}\left( #1 \right)}
\newcommand{\Exp}[1]{\mathrm{E}\left[ #1 \right]}
\newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]}
\newcommand{\Avar}[1]{\mathrm{Avar}\left[ #1 \right]}
\newcommand{\Cov}[1]{\mathrm{Cov}\left( #1 \right)}
\newcommand{\Corr}[1]{\mathrm{Corr}\left( #1 \right)}
\newcommand{\ExpH}{\mathrm{E}}
\newcommand{\VarH}{\mathrm{Var}}
\newcommand{\AVarH}{\mathrm{Avar}}
\newcommand{\CovH}{\mathrm{Cov}}
\newcommand{\CorrH}{\mathrm{Corr}}
\newcommand{\ow}{\text{otherwise}}
\newcommand{\wp}{\text{with probability }}
\newcommand{\FSD}{\text{FSD}}
\newcommand{\SSD}{\text{SSD}}$$

and it satisfies $\text{MLR}.4$. In most applications, to implement RESET we add the squared and cubed OLS fitted values, $\hat y^2$ and $\hat y^3$, which gives the expanded equation

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \delta_1 \hat y^2 + \delta_2 \hat y^3+\text{error}$$

This resembles the **White test** seen earlier; here, however, ***RESET*** is the $F$ statistic for testing $H_0: \delta_1= \delta_2 = 0$ in the expanded equation. A significant $F$ statistic suggests some sort of functional form problem. In large samples, the $F$ statistic is approximately $F_{2,n-k-3}$ distributed under the null hypothesis and the Gauss-Markov assumptions.

Even so, a rejection from **RESET** provides no real direction on how to improve the model. Also, **RESET** is misleading if it is used as a test for unobserved **omitted variables** or for **heteroskedasticity**. It can be shown that if the omitted variables have expectations that are *linear* in the included independent variables, **RESET** will fail to detect them. Further, if the functional form is *properly specified*, **RESET** has no power for detecting **heteroskedasticity**.
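The RESET procedure described here can be sketched on simulated data. This is a minimal illustration assuming only NumPy; the helper names (`ols`, `reset_f`), the sample size, and the data-generating process are all hypothetical choices made for the example, not part of the notes.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and the sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def reset_f(X, y):
    """RESET F statistic for H0: delta_1 = delta_2 = 0.

    X must already contain a constant column, so X has k + 1 columns.
    Returns the F statistic and its denominator degrees of freedom n - k - 3.
    """
    n, p = X.shape
    beta, ssr_r = ols(X, y)                 # restricted (original) model
    yhat = X @ beta
    X_ur = np.column_stack([X, yhat**2, yhat**3])
    _, ssr_ur = ols(X_ur, y)                # expanded model
    q = 2                                   # number of restrictions
    df = n - p - q
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df), df

# Simulated data (assumed for illustration only).
rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(1, 5, n)
x2 = rng.normal(size=n)
u = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

y_linear = 1 + 0.5 * x1 - 0.3 * x2 + u                 # linear model is correct
y_quad = 1 + 0.5 * x1 + 0.4 * x1**2 - 0.3 * x2 + u     # quadratic term neglected

f_lin, df = reset_f(X, y_linear)
f_quad, _ = reset_f(X, y_quad)
print(f"df = (2, {df}); F (linear truth) = {f_lin:.2f}; F (quadratic truth) = {f_quad:.2f}")
```

The statistic should be compared with a critical value from the $F_{2,n-k-3}$ distribution. In this simulation the neglected quadratic produces a much larger statistic than the correctly specified model does.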

### Tests against Nonnested Alternatives
It is possible to test the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$ against $y = \beta_0 + \beta_1 \log x_1 + \beta_2 \log x_2 + u$, and vice versa. However, these are nonnested models (meaning neither equation is a special case of the other), so a standard $F$ test is NOT applicable. Here are two possible approaches.

The first is to construct a comprehensive model that contains each model as a special case and then to test the restrictions that led to each of the models.

$$y = \gamma_0 + \gamma_1 x_1 + \gamma_2 x_2 + \gamma_3 \log x_1 + \gamma_4 \log x_2 + u$$

First test $H_0: \gamma_3 = \gamma_4 = 0$ as a test of the first model: a significant $F$ statistic is a rejection of the first model. Then test $H_0: \gamma_1 = \gamma_2 = 0$ as a test of the second model, interpreted in the same way.
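The comprehensive-model approach can be sketched as two ordinary $F$ tests on the same expanded regression. The code below assumes only NumPy and uses simulated data in which the log model is the true one; the helper names (`ssr`, `f_stat`) and all parameter values are hypothetical.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def f_stat(X_r, X_ur, y):
    """F statistic for the exclusion restrictions that reduce X_ur to X_r."""
    n, p_ur = X_ur.shape
    q = p_ur - X_r.shape[1]
    ssr_r, ssr_ur = ssr(X_r, y), ssr(X_ur, y)
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - p_ur))

# Simulated data (assumed): the model in logs is the true one.
rng = np.random.default_rng(1)
n = 400
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)
y = 1 + 2 * np.log(x1) - 1.5 * np.log(x2) + rng.normal(scale=0.5, size=n)

const = np.ones(n)
X_level = np.column_stack([const, x1, x2])
X_log = np.column_stack([const, np.log(x1), np.log(x2)])
X_comp = np.column_stack([const, x1, x2, np.log(x1), np.log(x2)])

f_test_level = f_stat(X_level, X_comp, y)   # H0: gamma_3 = gamma_4 = 0
f_test_log = f_stat(X_log, X_comp, y)       # H0: gamma_1 = gamma_2 = 0
print(f"F testing the level model: {f_test_level:.2f}")
print(f"F testing the log model:   {f_test_log:.2f}")
```

Each statistic has $2$ and $n-5$ degrees of freedom. With real data it is possible for both models, or neither, to be rejected.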

The second approach is based on the idea that, if the first model is correct, the fitted values from the second model should be insignificant when added to the first. For instance, to test the first model, we first estimate the second one by OLS to obtain the fitted values, named $\hat{\hat y}$. Then the ***Davidson-MacKinnon test*** is based on the $t$ statistic on $\hat{\hat y}$ in the equation $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \theta_1 \hat{\hat y} +\text{error}$. A significant $t$ statistic, against a two-sided alternative, is a rejection of the first model. The same procedure, with the roles of the two models reversed, tests the second model.
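A minimal sketch of the Davidson-MacKinnon test, assuming only NumPy and the usual homoskedasticity-based standard errors. The helper names (`ols_fit`, `dm_t`) and the simulated data, in which the log model is true, are hypothetical choices for illustration.

```python
import numpy as np

def ols_fit(X, y):
    """OLS coefficients and their usual (homoskedastic) standard errors."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

def dm_t(X_null, X_alt, y):
    """t statistic on the alternative model's fitted values in the null model."""
    beta_alt, _ = ols_fit(X_alt, y)
    y_hat_hat = X_alt @ beta_alt                 # fitted values from the other model
    X_aug = np.column_stack([X_null, y_hat_hat])
    beta, se = ols_fit(X_aug, y)
    return beta[-1] / se[-1]

# Simulated data (assumed): the model in logs is the true one.
rng = np.random.default_rng(1)
n = 400
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)
y = 1 + 2 * np.log(x1) - 1.5 * np.log(x2) + rng.normal(scale=0.5, size=n)

const = np.ones(n)
X_level = np.column_stack([const, x1, x2])
X_log = np.column_stack([const, np.log(x1), np.log(x2)])

t_level = dm_t(X_level, X_log, y)   # tests the level model
t_log = dm_t(X_log, X_level, y)     # tests the log model
print(f"t testing the level model: {t_level:.2f}")
print(f"t testing the log model:   {t_log:.2f}")
```

A large $|t|$ in the first line is evidence against the level model. As with the comprehensive approach, a rejection of one model does not by itself establish that the other is correct.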

## Using Proxy Variables for Unobserved Explanatory Variables
A more difficult problem arises when a model excludes a key variable, usually because of data unavailability. To avoid, or at least mitigate, omitted variables bias we can look for a ***proxy variable*** for the omitted variable, which, loosely speaking, is something that is related (*correlated*) to the unobserved variable. The following model with three independent variables illustrates the idea.

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3^* + u$$

Assume that the data are available on $y$, $x_1$ and $x_2$ while $x_3^*$ is unobserved. So we find a proxy variable for it, named $x_3$. Their relation is captured as

$$x_3^* = \delta_0 + \delta_3 x_3 + v_3$$

Typically we think of $x_3^*$ and $x_3$ as being positively related, so that $\delta_3>0$. If $\delta_3 = 0$, then $x_3$ is not a suitable proxy for $x_3^*$. The error $v_3$ reflects the fact that the two variables are not exactly related, and the intercept $\delta_0$ allows them to be measured on different scales.

We then regress $y$ on $x_1$, $x_2$ and $x_3$. We call this the ***plug-in solution to the omitted variables problem*** because $x_3$ is just plugged in for $x_3^*$ before we run OLS. For this procedure to provide consistent estimators of $\beta_1 $ and $\beta_2$, we need two assumptions.

1. The error $u$ is uncorrelated with $x_1$, $x_2$, $x_3^*$ and also with the proxy $x_3$. Including $x_3$ in this list means that $x_3$ is irrelevant in the true model once $x_1$, $x_2$ and $x_3^*$ are controlled for; otherwise, $x_3$ should appear in the true model. We write $\Corr{u,x_1} = \Corr{u,x_2} = \Corr{u,x_3} = \Corr{u,x_3^*} = 0$, or, stated in terms of conditional means, $\Exp{u\mid x_1,x_2,x_3,x_3^*} = 0$.
2. The error $v_3 $ is uncorrelated with $x_1$, $x_2$, $x_3$, so that $x_3$ is a good proxy for $x_3^*$. We write $\Exp{x_3^* \mid x_1,x_2,x_3} = \Exp{x_3^* \mid x_3} = \delta_0 + \delta_3 x_3$. This means that once $x_3$ is controlled for, the expected value of $x_3^*$ does not depend on $x_1$ and $x_2$. Otherwise, $x_1$ and $x_2$ would appear in the equation for $x_3^*$.

Substituting the equation for $x_3^*$ into the original model gives $y = \P{\beta_0 + \beta_3 \delta_0} + \beta_1 x_1 + \beta_2 x_2 + \beta_3 \delta_3 x_3 + u + \beta_3 v_3$. Letting $e = u + \beta_3 v_3$, $\alpha_0 = \beta_0 + \beta_3 \delta_0$ and $\alpha_3 = \beta_3 \delta_3$, we have

$$y = \alpha_0 + \beta_1 x_1 + \beta_2 x_2 + \alpha_3 x_3 + e$$

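and, under the two assumptions above, OLS on this equation gives unbiased (or at least consistent) estimators of $\alpha_0$, $\beta_1$, $\beta_2$ and $\alpha_3$. Note that $\beta_0$ and $\beta_3$ themselves are not recovered, only the combinations $\alpha_0$ and $\alpha_3$.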
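The plug-in solution can be checked in a small simulation in which both proxy assumptions hold by construction. This sketch assumes only NumPy; the parameter values, sample size and the helper `ols_coefs` are hypothetical and chosen only for illustration.

```python
import numpy as np

def ols_coefs(X, y):
    """OLS coefficient vector from regressing y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(2)
n = 20000

# Proxy x3 is correlated with x1 and x2 (assumed data-generating process).
x3 = rng.normal(size=n)
x1 = 0.6 * x3 + rng.normal(size=n)
x2 = -0.4 * x3 + rng.normal(size=n)

# Unobserved variable: x3* = delta0 + delta3 * x3 + v3, with v3 independent of x1, x2, x3.
delta0, delta3 = 2.0, 0.5
x3_star = delta0 + delta3 * x3 + rng.normal(size=n)

# True model; u is independent of all regressors and of the proxy.
beta0, beta1, beta2, beta3 = 1.0, 0.5, -0.3, 2.0
y = beta0 + beta1 * x1 + beta2 * x2 + beta3 * x3_star + rng.normal(size=n)

const = np.ones(n)
b_omit = ols_coefs(np.column_stack([const, x1, x2]), y)       # x3* simply dropped
b_plug = ols_coefs(np.column_stack([const, x1, x2, x3]), y)   # proxy plugged in

print("beta1 true:", beta1, " omitted:", round(b_omit[1], 3), " plug-in:", round(b_plug[1], 3))
print("alpha0 = beta0 + beta3*delta0 =", beta0 + beta3 * delta0, " estimate:", round(b_plug[0], 3))
print("alpha3 = beta3*delta3 =", beta3 * delta3, " estimate:", round(b_plug[3], 3))
```

Dropping $x_3^*$ distorts the coefficient on $x_1$, while the plug-in regression recovers $\beta_1$ along with $\alpha_0$ and $\alpha_3$. If $v_3$ were correlated with $x_1$ or $x_2$, the plug-in estimates would remain inconsistent.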

**e.g.**

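Estimate $\log\P{\text{wage}} = \beta_0 + \beta_1 \cdot\text{educ} + w$ (column $\P 1$) and $\log\P{\text{wage}} = \beta_0 + \beta_1 \cdot \text{educ} + \beta_2\cdot \text{IQ} + u$ (column $\P 2$); column $\P 3$ adds the interaction $\text{educ}\cdot\text{IQ}$. Standard errors are in parentheses. The complete table is in the textbook.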

$$
\begin{array}{cccc} \hline
\text{independent variable} & \P 1 & \P 2 & \P 3\\\hline
\text{educ} & 0.065 & 0.054 & 0.018\\
& \P{0.006} & \P{0.007} & \P{0.041} \\
\text{others} & \vdots & \vdots & \vdots\\
\text{IQ} & \text{N/A} & 0.0036 & -0.0009 \\
& & \P{0.0010} & \P{0.0052}\\
\text{educ}\cdot\text{IQ} & \text{N/A} & \text{N/A} & 0.00034\\
& & & \P{0.00038}\\
\beta_0 &&&\\
n&&&\\
R^2 & 0.253 & 0.263 & 0.263\\\hline
\end{array}$$

$\P 1$ Explain the Omitted Variable ($\text{IQ}$) Bias here

> Omitting $\text{IQ}$ inflates the estimated coefficient on $\text{educ}$. To derive the formula, assume $\text{IQ} = \delta_0 + \delta_1 \text{educ} + v$. Then the true model can be written as $y = \P{\beta_0 + \beta_2 \delta_0} + \P{\beta_1 + \beta_2 \delta_1}\text{educ} + \P{\beta_2 v + u}$. Write the estimated misspecified model as $\tilde y = \tilde \beta_0 + \tilde \beta_1 x_1$, where $x_1 = \text{educ}$; then we have
>
>$$\tilde \beta_1 = \hat \beta_1 + \hat \beta_2 \tilde \delta_1$$
>
>Thus, conditional on the sample values of the regressors, $\text{Bias}\P{\tilde \beta_1} = \Exp{\tilde \beta_1} - \beta_1 = \beta_2 \tilde \delta_1$. Since $\beta_2$ and $\tilde \delta_1$ are both positive here, the omitted variable bias is positive.
>
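>More generally, when the $k$th variable is omitted and $x_j$ is the only included regressor, $\Exp{\tilde \beta_j} = \Exp{\hat\beta_j + \hat\beta_k\tilde\delta_j} = \beta_j + \beta_k\ffrac{\sum_i\P{x_{ij} - \bar x_j}x_{ik}}{\sum_i\P{x_{ij} - \bar x_j}^2}$. Here $\tilde\delta_j$ is the slope from regressing $x_k$ on $x_j$, that is, the sample covariance between $x_j$ and $x_k$ divided by the sample variance of $x_j$.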

$\P 2$ How well does $\text{IQ}$ work as a proxy?

>Adding $\text{IQ}$ raises $R^2$ only slightly, from $0.253$ to $0.263$, and the coefficient on $\text{educ}$ falls only modestly, from $0.065$ to $0.054$. Also, the $\text{black}$ variable remains.

$\P 3$ Why does adding $\text{educ}\cdot\text{IQ}$ result in a small coefficient on $\text{educ}$?

>After adding $\text{educ}\cdot\text{IQ}$, the coefficient on $\text{educ}$ measures its effect on $\log\P{\text{wage}}$ when $\text{IQ} = 0$. The partial effect of education is now $\beta_1 + \beta_{\text{educ}\cdot\text{IQ}}\cdot\text{IQ}$. Since almost no one has zero $\text{IQ}$, it is more informative to evaluate this at the average $\text{IQ}$ of $100$: the estimated return to education from column $\P 3$ is $0.018 + 0.00034\times 100 = 0.052$, which is close to the value in column $\P 2$.
***
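The identity $\tilde \beta_1 = \hat \beta_1 + \hat \beta_2 \tilde \delta_1$ from part $\P 1$ of the example holds exactly in any sample, which can be verified numerically. The sketch below assumes only NumPy and uses simulated data; the variable names mimic the wage example, but the numbers are hypothetical and are not the textbook data.

```python
import numpy as np

def ols_coefs(X, y):
    """OLS coefficient vector from regressing y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Simulated stand-ins for the wage data (assumed values, not the real sample).
rng = np.random.default_rng(3)
n = 1000
iq = rng.normal(100, 15, n)
educ = 8 + 0.05 * iq + rng.normal(0, 2, n)          # educ positively related to IQ
logwage = 5 + 0.054 * educ + 0.0036 * iq + rng.normal(0, 0.4, n)

const = np.ones(n)
b_long = ols_coefs(np.column_stack([const, educ, iq]), logwage)   # includes IQ
b_short = ols_coefs(np.column_stack([const, educ]), logwage)      # omits IQ
d_aux = ols_coefs(np.column_stack([const, educ]), iq)             # IQ on educ

beta1_hat, beta2_hat = b_long[1], b_long[2]
beta1_tilde = b_short[1]
delta1_tilde = d_aux[1]

print("beta1_tilde:                        ", beta1_tilde)
print("beta1_hat + beta2_hat * delta1_tilde:", beta1_hat + beta2_hat * delta1_tilde)
```

The two printed numbers agree up to floating-point error, because the identity is algebraic rather than statistical. The sign of the bias then follows from the signs of $\hat\beta_2$ and $\tilde\delta_1$.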

### Using Lagged Dependent Variables as Proxy Variables
Sometimes we have no idea how to obtain a proxy for certain omitted variables. In such cases, we can include, as a control, the value of the dependent variable from an *earlier time period*. This is especially useful for policy analysis. 

Using a **lagged dependent variable** provides a simple way to account for historical factors that cause current differences in the dependent variable that are difficult to account for in other ways. Some inertial effects are captured by putting in lags of $y$.
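A small simulation can illustrate why a lagged dependent variable helps. It assumes only NumPy, and the whole setup is hypothetical: an unobserved historical factor `a` drives both a policy variable `x` and the outcome, and the earlier outcome `y_lag` is a noisy reflection of `a`.

```python
import numpy as np

def ols_coefs(X, y):
    """OLS coefficient vector from regressing y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(5)
n = 5000
a = rng.normal(size=n)                          # unobserved historical factor
y_lag = 2.0 * a + 0.5 * rng.normal(size=n)      # outcome in an earlier period
x = 0.8 * a + rng.normal(size=n)                # policy variable, related to a

beta_x = 0.5                                    # true policy effect (assumed)
y = 1.0 + beta_x * x + 2.0 * a + rng.normal(size=n)

const = np.ones(n)
b_no_lag = ols_coefs(np.column_stack([const, x]), y)
b_lag = ols_coefs(np.column_stack([const, x, y_lag]), y)

print("true effect:          ", beta_x)
print("estimate without lag: ", round(b_no_lag[1], 3))
print("estimate with lag:    ", round(b_lag[1], 3))
```

In this design the lag absorbs most of the influence of `a`, so the estimated policy effect moves much closer to the true value. Because `y_lag` is only an imperfect proxy, some bias generally remains.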

## Models with Random Slopes
## Properties of OLS under Measurement Error
## Missing Data, Nonrandom Samples, and Outlying Observations
## Least Absolute Deviations Estimation