# VIOLATION OF EXOGENEITY
<br>


## Introduction

<br>
Let's start by saying that what we primarily want to avoid is the issue of endogeneity. In this notebook we will define what endogeneity is, list its possible causes, and finally examine the effects it has on linear regression.

<br>
An endogeneity problem occurs when an <b>independent</b> variable is correlated with the error term $\boldsymbol{\varepsilon}$ (we will see that the same independent variable is uncorrelated with residuals by construction).
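<br>

The distinction between the error term and the residuals can be checked numerically. The sketch below is only an illustration under assumed settings: the data-generating process (a shared `common` component, a true slope of 2, the sample size) is hypothetical and not part of the original discussion.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data-generating process: the regressor x shares a common
# component with the error term eps, so x is endogenous.
common = rng.normal(size=n)
eps = common + rng.normal(size=n)
x = common + rng.normal(size=n)
y = 1.0 + 2.0 * x + eps  # assumed true intercept 1 and true slope 2

# OLS fit via least squares on the design matrix [1, x].
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

corr_x_eps = np.corrcoef(x, eps)[0, 1]        # correlation with the true error
corr_x_res = np.corrcoef(x, residuals)[0, 1]  # correlation with the residuals

print(f"corr(x, error)     = {corr_x_eps:.3f}")
print(f"corr(x, residuals) = {corr_x_res:.3e}")
print(f"estimated slope    = {beta_hat[1]:.3f} (assumed true value: 2)")
```

The residuals are orthogonal to the regressor because the OLS normal equations force them to be, so inspecting residuals cannot reveal endogeneity; the slope estimate, however, drifts away from the assumed true value.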

<br>
Assuming that the relationship between $\mathbf{X}$ and $\mathbf{Y}$ can be described by a linear model, we can express $\mathbf{Y}$ as : 

<br>
<blockquote>
$
   \mathbf{Y} \ = \ \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
$
</blockquote>


<br>
We can think of this equation in a number of ways:

<br>
<ul style="list-style-type:square">
    <li>as a convenient way of predicting $\mathbf{Y}$ based on $\mathbf{X}$ values;</li>
    <br>
    <li>as a convenient way of modeling $\mathbf{E}[\mathbf{Y} \mid \mathbf{X}]$;</li>
    <br>
    <li>
        as embodying causation; for example, we can think of $\boldsymbol{\beta_i}$ 
        as the answer to the question "what would happen to $\mathbf{Y}$ if I reached in to this system 
        and experimentally increased $\boldsymbol{\mathbf{X}_i}$ by one unit?"
    </li> 
</ul>

<br>
In the first two cases endogeneity is not a concern. It only becomes a problem if we want to recover causal effects (as opposed to mere correlations), as in the last case. What is the difference then, in the context of regression ?

<br>
The regression coefficient $\boldsymbol{\beta_i}$ is also called "path coefficient" and quantifies the (direct) causal
effect of $\boldsymbol{\mathbf{X}_i}$ on $\mathbf{Y}$ ; given the numerical values of $\boldsymbol{\beta_j}$ and $\boldsymbol{\varepsilon}$, the equation claims that a unit increase for $\boldsymbol{\mathbf{X}_i}$ would result in $\boldsymbol{\beta_i}$ units increase of $\mathbf{Y}$, regardless of the values taken by other variables in the model, and regardless of whether the increase in $\boldsymbol{\mathbf{X}_i}$ originates from external or internal influences.

<br>
Whenever we draw inferences from the regression coefficients themselves, we are thinking of the model in causal terms. We need not worry about endogeneity if we only care about the predictions and the values of the coefficients do not matter to us (provided the predictions are good).

<br>
It is very common, when talking about linear regression, to use the causal interpretation of the coefficients while pretending not to be introducing causation, thus the understandable confusion; classical textbooks often do not make this distinction, either.


## Causes

<br>
Endogeneity can arise as a result of : 

<br>
<ul style="list-style-type:square">
    <li>
        measurement error <br>
        [<b>C1</b>]
    </li>
    <br>
    <li>
        omitted variables (an uncontrolled confounder causing both independent and dependent variables of a model) <br>
        [<b>C2</b>]
    </li>    
    <br>
    <li>
        simultaneous causality (between the independent and dependent variables) <br>
        [<b>C3</b>]
    </li>
    <br>
    <li>
        autoregression with autocorrelated errors <br>
        [<b>C4</b>]
    </li>
</ul>

<br>
Exogeneity is defined relative to a parameter: a variable (or set of variables) that is exogenous for a certain parameter $\boldsymbol{\beta_j}$ might still be endogenous for another parameter $\boldsymbol{\beta_p}$.


## [C1] Measurement Error

<br>
Variables of interest are sometimes hard to measure accurately. The difficulty may arise from practical constraints in obtaining the data, from conceptual imprecision etc. 

<br>
What is the effect of measurement error on a simple linear regression analysis? The effects are quite different depending on whether the measurement error is in the dependent or independent variable.


### [C1.1] Measurement Error in the dependent variable

<br>
Suppose that, instead of the true values for the dependent variable, we can only observe 
$
\boldsymbol{\mathbf{Y}^*} = \mathbf{Y} + \boldsymbol{\nu}
$; 
<br>
due to the measurement error, $\mathbf{Y}$ is no longer observable.

<br>
Now also suppose $\mathrm{E}[\boldsymbol{\nu}] = 0$. In other words, the measurement error $\boldsymbol{ \nu }$ does not systematically increase or decrease the measured value of the dependent variable ($\boldsymbol{\mathbf{Y}^*}$ will not systematically overestimate or underestimate the actual value $\mathbf{Y}$).

<br>
The original equation specifying the (assumed) linear relationship between the dependent and the explanatory variables was 
$ 
    \mathbf{Y}  
    = \mathbf{X}\boldsymbol{\beta} + {\boldsymbol{\varepsilon}} 
$;
<br>
now, since $\mathbf{Y}$ is unknown, we have to use $\boldsymbol{\mathbf{Y}^*}$ and the equation changes into 
$ 
    \mathbf{Y}  
    = \boldsymbol{\mathbf{Y}^*} - \boldsymbol{\nu}
    = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} 
$.

<br>
Solving for $\boldsymbol{\mathbf{Y}^*}$ (and letting $\boldsymbol{\upsilon} = \boldsymbol{\varepsilon} + \boldsymbol{\nu}$) we obtain 
$ 
{\displaystyle 
    \boldsymbol{\mathbf{Y}^*}
    = \mathbf{X}\boldsymbol{\beta} + (\boldsymbol{ \varepsilon } + \boldsymbol{ \nu })
    = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\upsilon}
} 
$. <br>

<br>
Let's determine whether the explanatory variables and the new disturbance term are correlated. Looking at the last formula, we see that increasing the noise in the dependent variable $\boldsymbol{\nu}$ will also cause the disturbance term $\boldsymbol{\upsilon}$ to increase, but $\mathbf{X}$ will remain unchanged. Hence, as long as the measurement error $\boldsymbol{\nu}$ is itself unrelated to $\mathbf{X}$, the explanatory variables and the new disturbance term are uncorrelated; no bias should result. 

<br>
Experiments confirm that, when measurement error is present in the dependent variable, the average of the estimated coefficient corresponds to the value we would have without error. 

<br>
Measurement error in the dependent variable does not lead to bias, but does it affect our estimation in other ways ?

<br>
As the variance of the measurement error increases, the variance of the estimated coefficient values also increases : we are introducing more "uncertainty" into the process and consequently the ordinary least squares estimates become less reliable.
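<br>

A small Monte Carlo experiment can illustrate both claims (no bias, larger variance). This is a minimal sketch under assumed settings: the true slope of 2, the sample size, the number of repetitions and the helper `slope_estimates` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_sims, beta = 200, 2000, 2.0  # assumed sample size, repetitions, true slope


def slope_estimates(noise_sd):
    """Return OLS slope estimates over repeated samples with a noisy Y."""
    slopes = np.empty(n_sims)
    for s in range(n_sims):
        x = rng.normal(size=n)
        y = beta * x + rng.normal(size=n)                 # true relationship
        y_star = y + rng.normal(scale=noise_sd, size=n)   # observed, noisy Y
        slopes[s] = np.cov(x, y_star)[0, 1] / np.var(x, ddof=1)
    return slopes


# Repeat the experiment for increasing measurement-error standard deviations.
results = {sd: slope_estimates(sd) for sd in (0.0, 1.0, 3.0)}
for sd, est in results.items():
    print(f"noise sd = {sd}: mean slope = {est.mean():.3f}, std = {est.std():.3f}")
```

Under these assumptions the mean of the estimates should stay close to the true slope for every noise level, while their spread grows with the standard deviation of the measurement error.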


### [C1.2] Measurement Error in the independent variable

<br>
Let's now repeat the process and the considerations for the case in which measurement error occurs in the independent variables. Suppose that, instead of the true values for the independent variable, we can only observe 
$
\boldsymbol{\mathbf{X}^*} = \mathbf{X} + \boldsymbol{\nu}
$; 
due to the measurement error, $\mathbf{X}$ is no longer observable.

<br>
We know that the linear assumption was originally described by the equation 
$\mathbf{Y} = \mathbf{X}{\boldsymbol{\beta}} + \boldsymbol{\varepsilon}$;
<br>
which has now changed into <br><br>
$
    \quad
    \begin{align}
    \mathbf{Y}  
    &= (\mathbf{X}^{*} -  \boldsymbol{\nu}) \boldsymbol{ \beta } + \boldsymbol{\varepsilon} 
    \newline
    &=  \mathbf{X}^{*} \boldsymbol{ \beta } - \boldsymbol{\nu}\boldsymbol{ \beta } + \boldsymbol{\varepsilon} 
    \newline
    &=  \mathbf{X}^{*}\boldsymbol{ \beta } + (\boldsymbol{\varepsilon} - \boldsymbol{\nu}\boldsymbol{\beta}) 
    \newline
    &=  \mathbf{X}^{*}\boldsymbol{\beta} + \boldsymbol{\upsilon}
\end{align}
$.

<br>
Again, let's determine if the (observed) explanatory variables and the new disturbance term are correlated or not. Looking at the last formula, we see that the answer depends on the actual coefficient $\boldsymbol{ \beta }$ :

<br>
<ul style="list-style-type:square">
    <li>
        $\boldsymbol{\beta} > 0$ : when the actual coefficient is positive, the correlation between $\mathbf{X}^{*}$ and
        $\boldsymbol{\upsilon}$ is negative;<br>
        consequently, the OLS estimation of the coefficient <b>is biased downward</b> (toward 0).<br>
        It means that increasing the measurement error $\boldsymbol{ \nu }$ will cause $\mathbf{X}^{*}$ to also increase, 
        but $\boldsymbol{\upsilon}$ will decrease.
    </li>
    <br>
    <li>
        $\boldsymbol{\beta} < 0$ : when the actual coefficient is negative, the correlation between $\mathbf{X}^{*}$ and
        $\boldsymbol{\upsilon}$ is positive;<br>
        consequently, the OLS estimation of the coefficient <b>is biased upward</b> (toward 0).<br>
        It means that increasing the measurement error $\boldsymbol{ \nu }$ will cause $\mathbf{X}^{*}$ to increase, 
        and $\boldsymbol{\upsilon}$ ($= \boldsymbol{\varepsilon} - \boldsymbol{\nu}\boldsymbol{\beta}$) will increase as well.
    </li>
    <br>
    <li>
        $\boldsymbol{\beta} = 0$ : when the actual coefficient equals 0, no correlation exists;<br>
        consequently, the OLS estimation is not biased.<br>
        It means that increasing the measurement error $\boldsymbol{\nu}$ will cause $\mathbf{X}^*$ to also increase,
        but $\boldsymbol{\upsilon}$ is not affected;<br>
        the impact of the explanatory variables in this case is none anyway.
    </li>
</ul>

<br>
Experiments have shown that, while measurement error in the explanatory variables leads to bias, the latter never appears to be strong enough to change the sign of the mean of the coefficient estimates. The estimated coefficients will be biased toward zero while retaining their sign : this type of bias is called <b>attenuation bias</b>.
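<br>

Attenuation bias is easy to reproduce in a simulation. The sketch below assumes a single regressor with unit variance, a true slope of 2 and independent Gaussian measurement error; these values and the helper `ols_slope` are hypothetical. For this simple setting, the classical result is that the slope estimate shrinks by roughly the factor $\mathrm{Var}(\mathbf{X}) / (\mathrm{Var}(\mathbf{X}) + \mathrm{Var}(\boldsymbol{\nu}))$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta = 100_000, 2.0  # assumed sample size and true slope

x = rng.normal(size=n)              # true regressor, variance 1
y = beta * x + rng.normal(size=n)   # dependent variable, measured without error


def ols_slope(regressor, response):
    """Simple-regression OLS slope: Cov(regressor, response) / Var(regressor)."""
    return np.cov(regressor, response)[0, 1] / np.var(regressor, ddof=1)


slopes = {}
for noise_sd in (0.0, 0.5, 1.0, 2.0):
    x_star = x + rng.normal(scale=noise_sd, size=n)  # observed, noisy regressor
    slopes[noise_sd] = ols_slope(x_star, y)
    shrink = 1.0 / (1.0 + noise_sd**2)               # Var(X) / (Var(X) + Var(nu))
    print(f"noise sd = {noise_sd}: slope = {slopes[noise_sd]:.3f}, "
          f"beta * shrink = {beta * shrink:.3f}")
```

The estimates move toward zero as the measurement error grows but keep the sign of the true coefficient, which is the pattern described above.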



### [C1 | Further Analysis]

<br>
Why does measurement error in the explanatory variables cause attenuation bias? Even more basic, why does it cause bias at all? After all, the chances that the measured value of the explanatory variable will be too high equal the chances it will be too low. 

<br>
To understand why, let's see how a change in the observed explanatory variable $\mathbf{X}^{*}$ affects the original dependent variable $\mathbf{Y}$.<br> 
In the following analysis we'll assume $\boldsymbol{ \beta } > 0$, but the conclusions hold true for 
$\boldsymbol{ \beta } < 0$ by analogy.

<br>
When $\mathbf{X}^{*}$ ( $= \boldsymbol{ \mathbf{X} } + \boldsymbol{ \nu }$) rises, it may do so for two reasons:

<br>
<ul style="list-style-type:square">
    <li>
        the original regressor $\boldsymbol{ \mathbf{X} }$ has risen;<br>
        in this case the actual value of the dependent variable 
        $\mathbf{Y}$ ( $= \mathbf{X}^* \boldsymbol{ \beta } + \boldsymbol{\upsilon}$) 
        rises because the actual value of the explanatory variable has risen;<br>
        if this were the only source of variation, OLS would produce "correct" estimates of the coefficient
    </li>
    <br>
    <li>
        the measurement error $\boldsymbol{ \nu }$ has risen;<br>
        in this case the actual value of the dependent variable
        $\mathbf{Y}$ ( $= \mathbf{X}^* \boldsymbol{ \beta } + \boldsymbol{\upsilon}$) 
        remains unchanged because the actual value of the explanatory variable has not changed;<br>
        if this were the only source of variation, OLS would estimate the value of the coefficient to be 0. 
    </li>
</ul>

<br>
Overall, the estimation procedure will understate the effect that the actual value of the explanatory variable has on the dependent variable, and consequently underestimate the value of the regression coefficient.

<br>
We have shown that when measurement error occurs in the explanatory variables and the actual coefficients are non-zero, the OLS
estimation procedure for the coefficient value is biased. But perhaps it is at least consistent.

<br>
Experiments have shown that increasing the sample size does not lessen the bias : the OLS estimation of the regression coefficients is biased and not consistent.
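<br>

The lack of consistency can be illustrated by letting the sample size grow while keeping the measurement error fixed. This is only a sketch under assumed settings (true slope 2, regressor and measurement error both with unit variance); none of these numbers come from the original experiments.

```python
import numpy as np

rng = np.random.default_rng(3)
beta, noise_sd = 2.0, 1.0  # assumed true slope and measurement-error std

estimates = {}
for n in (100, 10_000, 1_000_000):
    x = rng.normal(size=n)                           # true regressor
    y = beta * x + rng.normal(size=n)                # dependent variable
    x_star = x + rng.normal(scale=noise_sd, size=n)  # observed regressor
    # OLS slope of y on the noisy regressor.
    estimates[n] = np.cov(x_star, y)[0, 1] / np.var(x_star, ddof=1)
    print(f"n = {n:>9}: estimated slope = {estimates[n]:.3f}")
```

With these settings the estimates settle near half of the true slope instead of approaching it: a larger sample makes the estimate more precise, but precise around the wrong value.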


## MMM...
<br>
For a simple regression with intercept $\boldsymbol{\alpha}$ and measurement error $\boldsymbol{\nu_i}$ in the dependent variable, the observed values satisfy <br>

$
    \quad
    \boldsymbol{ \mathbf{Y}_{i}^{*} }
    = (\boldsymbol{ \mathbf{Y}_{i} } + \boldsymbol{ \nu_{i} })
    = \boldsymbol{ \alpha } + \boldsymbol{ \beta } \boldsymbol{ \mathbf{X}_{i} } + \boldsymbol{\varepsilon_i} + \boldsymbol{\nu_i}
$

<br>
and give rise to the following slope coefficient : <br>

$
    \quad
    \begin{align}
        \boldsymbol{ \hat{\beta} } 
            &= \frac
                {\mathrm{Cov}(\boldsymbol{ \mathbf{Y}_{i}^{*} }, \boldsymbol{ \mathbf{X}_{i} })}
                {\mathrm{Var}(\boldsymbol{ \mathbf{X}_{i} })} 
            = \frac
                {\mathrm{Cov}(
                    \boldsymbol{ \mathbf{Y}_{i} } + \boldsymbol{ \nu_{i} }
                    , \boldsymbol{ \mathbf{X}_{i} }
                    )
                }
                {\mathrm{Var}(\boldsymbol{ \mathbf{X}_{i} })
            } 
            = \frac
                {\mathrm{Cov}(
                    \boldsymbol{ \alpha } 
                    + \boldsymbol{ \beta } \boldsymbol{ \mathbf{X}_{i} }
                    + \boldsymbol{\varepsilon_i} + \boldsymbol{\nu_i},\boldsymbol{ \mathbf{X}_{i} })}
                {\mathrm{Var}(\boldsymbol{ \mathbf{X}_{i} })} 
            \newline
            &= 
                \frac
                    { \mathrm{Cov}(\boldsymbol{\alpha},\boldsymbol{\mathbf{X}_i}) }
                    { \mathrm{Var}(\boldsymbol{\mathbf{X}_i}) } 
                + \boldsymbol{\beta} \frac
                    { \mathrm{Cov}(\boldsymbol{\mathbf{X}_i},\boldsymbol{\mathbf{X}_i}) }
                    { \mathrm{Var}(\boldsymbol{\mathbf{X}_i}) } 
                + \frac
                    { \mathrm{Cov}(\boldsymbol{\varepsilon_i},\boldsymbol{\mathbf{X}_i}) }
                    { \mathrm{Var}(\boldsymbol{\mathbf{X}_i}) }
                + \frac
                    { \mathrm{Cov}(\boldsymbol{\nu_i},\boldsymbol{\mathbf{X}_i}) }
                    { \mathrm{Var}(\boldsymbol{\mathbf{X}_i}) } 
                \newline
            &= \boldsymbol{\beta} 
                \frac
                    { \mathrm{Var}(\boldsymbol{\mathbf{X}_i}) }
                    { \mathrm{Var}(\boldsymbol{\mathbf{X}_i}) } 
            = \boldsymbol{\beta}
    \end{align}
$

<br>
The last step uses three facts: the covariance of a constant ($\boldsymbol{\alpha}$) with anything is zero, $\boldsymbol{\varepsilon_i}$ is uncorrelated with $\boldsymbol{\mathbf{X}_i}$ by the exogeneity assumption, and the measurement error $\boldsymbol{\nu_i}$ is assumed to be uncorrelated with $\boldsymbol{\mathbf{X}_i}$.


## [C2] Omitted Variable

<br>
In statistics, omitted-variable bias (OVB) occurs when a model incorrectly leaves out one or more confounding variables. Two conditions must hold true for omitted-variable bias to exist in linear regression :
<ul style="list-style-type:square">
    <li>
        the omitted variable must be a determinant of the dependent variable (its true regression coefficient is not zero);
    </li>
    <li>
        the omitted variable must be correlated with an independent variable specified in the regression
    </li>
</ul>

<br>
Confounding is a causal concept, and as such, cannot be described in terms of correlations or associations.

<br>
Suppose that the relationship to be estimated is 
$
    \quad
    \mathbf{Y} = 
        \boldsymbol{\alpha} 
        + (\boldsymbol{\beta} \mathbf{X} + \boldsymbol{\gamma} \mathbf{Z}) 
        + \boldsymbol{\varepsilon}
$ 
<br>
and suppose the relation between $\mathbf{X}$ and $\mathbf{Z}$ is given by 
$ 
    \quad
    \mathbf{Z} = \boldsymbol{\delta} + \boldsymbol{\omega} \mathbf{X} + \boldsymbol{\nu}
$ .

<br>
Substituting the second equation into the first one we obtain : <br>

$
    \quad
    \begin{align}
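        \mathbf{Y} 
            &= 
                \boldsymbol{\alpha} 
                + \mathbf{X} \boldsymbol{\beta}
                + \boldsymbol{\gamma} \mathbf{Z}
                + \boldsymbol{\varepsilon}
            \newline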
            &=
                \boldsymbol{\alpha} 
                + \mathbf{X} \boldsymbol{\beta}
                + \boldsymbol{\gamma} (\boldsymbol{\delta} + \mathbf{X} \boldsymbol{\omega} + \boldsymbol{\nu})
                + \boldsymbol{\varepsilon}
            \newline
            &= 
                (\boldsymbol{\alpha} + \boldsymbol{\gamma} \boldsymbol{\delta})
                + \mathbf{X} (\boldsymbol{\beta} + \boldsymbol{\gamma} \boldsymbol{\omega}) 
                + (\boldsymbol{\varepsilon} + \boldsymbol{\gamma}\boldsymbol{\nu})
    \end{align}
$ 

<br>
If $\mathbf{Y}$ is regressed upon $\mathbf{X}$ only, this last equation is what will be estimated. The estimated regression coefficient of $\mathbf{X}$ ($\boldsymbol{\beta} + \boldsymbol{\gamma} \boldsymbol{\omega}$) is not (an estimate of) the desired direct effect of $\mathbf{X}$ upon $\mathbf{Y}$ (which is $\boldsymbol{\beta}$), but rather of its sum with the indirect effect that runs through $\mathbf{Z}$ ($\boldsymbol{\gamma} \boldsymbol{\omega}$). 

<br>
By omitting the confounding variable $\mathbf{Z}$ from the regression, we have estimated the total derivative of $\mathbf{Y}$ with respect to $\mathbf{X}$ rather than its partial derivative with respect to $\mathbf{X}$. The two derivatives differ if both $\boldsymbol{\gamma}$ and $\boldsymbol{\omega}$ are non-zero.

<br>
Omitting a relevant variable violates the assumption of exogeneity (the omitted variable is absorbed into the disturbance term, which is now correlated with the regressors), and this violation causes the OLS estimation to be biased and inconsistent. The <b>direction of the bias</b> depends on the sign of the omitted variable's coefficient ($\boldsymbol{\gamma}$) as well as on the covariance between the regressors and the omitted variable (the sign of $\boldsymbol{\omega}$). A positive covariance of the omitted variable with both a regressor and the dependent variable will lead the OLS estimate of the included regressor coefficient to be greater than the true value of that coefficient. This effect can be seen by taking the expectation of the estimator.
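<br>

The substitution above can be checked numerically by comparing a regression that omits $\mathbf{Z}$ with one that includes it. The sketch is a hedged illustration: all parameter values (`alpha`, `beta`, `gamma`, `delta`, `omega`) and the helper `ols` are hypothetical choices, not values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Assumed structural parameters (hypothetical values for illustration).
alpha, beta, gamma = 1.0, 2.0, 3.0
delta, omega = 0.5, 0.8

x = rng.normal(size=n)
z = delta + omega * x + rng.normal(size=n)             # Z depends on X
y = alpha + beta * x + gamma * z + rng.normal(size=n)  # Y depends on X and Z


def ols(columns, response):
    """Least-squares coefficients for an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(response)), *columns])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


short_fit = ols([x], y)     # Z omitted
full_fit = ols([x, z], y)   # Z included

print(f"slope on X, Z omitted : {short_fit[1]:.3f} "
      f"(beta + gamma * omega = {beta + gamma * omega:.3f})")
print(f"slope on X, Z included: {full_fit[1]:.3f} (beta = {beta:.3f})")
```

In this setup the short regression recovers the total effect $\boldsymbol{\beta} + \boldsymbol{\gamma}\boldsymbol{\omega}$, while the regression that controls for $\mathbf{Z}$ recovers the direct effect $\boldsymbol{\beta}$.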


## [C3] Simultaneity

<br>
Suppose that two variables are jointly determined (or "codetermined"), with each affecting the other. <br>
Suppose we have the two following structural equations (where $\boldsymbol{\mathbf{Z_1}}$ and $\boldsymbol{\mathbf{Z_2}}$ are two exogenous explanatory variables) : 

<br>
$
    \quad
    \begin{align}    
        \mathbf{Y} &= 
              \boldsymbol{\alpha_1} 
            + \boldsymbol{\beta_1}\mathbf{X} 
            + \boldsymbol{\omega_1}\mathbf{Z_1} 
            + \boldsymbol{\varepsilon}
        \newline
        \mathbf{X} &= 
              \boldsymbol{\alpha_2} 
            + \boldsymbol{\beta_2}\mathbf{Y} 
            + \boldsymbol{\omega_2}\mathbf{Z_2} 
            + \boldsymbol{\upsilon}
    \end{align}
$ 

<br>
Substituting each equation into the other, solving the first for $\mathbf{Y}$ (and the second for $\mathbf{X}$), re-arranging (and assuming the denominator $\neq 0$), we obtain : <br>

$
    \quad
    \begin{align}
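        \mathbf{Y} 
        &= 
            \boldsymbol{\alpha_1} 
            + \boldsymbol{\beta_1} 
            (
                \boldsymbol{\alpha_2} 
                + \boldsymbol{\beta_2}\mathbf{Y} 
                + \boldsymbol{\omega_2}\mathbf{Z_2} 
                + \boldsymbol{\upsilon}
            )  
            + \boldsymbol{\omega_1}\mathbf{Z_1} 
            + \boldsymbol{\varepsilon}
        \newline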
        &= \frac
            {
                \boldsymbol{\alpha_1} 
                + \boldsymbol{\beta_1} 
                (
                    \boldsymbol{\alpha_2} 
                    + \boldsymbol{\omega_2}\mathbf{Z_2} 
                    + \boldsymbol{\upsilon}
                )  
                + \boldsymbol{\omega_1}\mathbf{Z_1} 
                + \boldsymbol{\varepsilon}
            }
            {(1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2})}
            \newline
        &= \frac
            {
                (\boldsymbol{\alpha_1} + \boldsymbol{\beta_1} \boldsymbol{\alpha_2})
                + (\boldsymbol{\beta_1} \boldsymbol{\omega_2}\mathbf{Z_2} + \boldsymbol{\omega_1}\mathbf{Z_1})
                + (\boldsymbol{\beta_1} \boldsymbol{\upsilon} + \boldsymbol{\varepsilon})
            }
            {(1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2})}
            \newline
        &= 
            \boldsymbol{\delta_1} 
            + (\boldsymbol{\pi_{11}} \mathbf{Z_1} + \boldsymbol{\pi_{12}} \mathbf{Z_2})
            + \boldsymbol{\mu_1}
    \end{align}
$

<br>

$
    \quad
    \begin{align}
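        \mathbf{X} 
        &= 
            \boldsymbol{\alpha_2} 
            + \boldsymbol{\beta_2} 
            (
                \boldsymbol{\alpha_1} 
                + \boldsymbol{\beta_1}\mathbf{X} 
                + \boldsymbol{\omega_1}\mathbf{Z_1} 
                + \boldsymbol{\varepsilon}
            )  
            + \boldsymbol{\omega_2}\mathbf{Z_2} 
            + \boldsymbol{\upsilon}
        \newline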
        &= \frac
            {
                \boldsymbol{\alpha_2} 
                + \boldsymbol{\beta_2} 
                (
                    \boldsymbol{\alpha_1} 
                    + \boldsymbol{\omega_1}\mathbf{Z_1} 
                    + \boldsymbol{\varepsilon}
                )  
                + \boldsymbol{\omega_2}\mathbf{Z_2} 
                + \boldsymbol{\upsilon}
            }
            {(1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2})}
            \newline
        &= \frac
            {
                (\boldsymbol{\alpha_2} + \boldsymbol{\beta_2} \boldsymbol{\alpha_1})
                + (\boldsymbol{\beta_2} \boldsymbol{\omega_1}\mathbf{Z_1} + \boldsymbol{\omega_2}\mathbf{Z_2})
                + (\boldsymbol{\beta_2} \boldsymbol{\varepsilon} + \boldsymbol{\upsilon})
            }
            {(1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2})}
            \newline
        &= 
            \boldsymbol{\delta_2} 
            + (\boldsymbol{\pi_{21}} \mathbf{Z_1} + \boldsymbol{\pi_{22}} \mathbf{Z_2})
            + \boldsymbol{\mu_2}
    \end{align}
$

<br>
Notice how we assumed $(1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2}) \neq 0 $; whether this assumption is reasonable (or restrictive) depends on the application and its context. 

<br>
These last equations, which express $\mathbf{Y}$ and $\mathbf{X}$ in terms of the exogenous variables and the error terms, are the "reduced form" equations for (respectively) $\mathbf{Y}$ and $\mathbf{X}$. The parameters $\boldsymbol{\pi}$ are called "reduced form parameters"; they are nonlinear functions of the structural parameters which appear in the original structural equations. The reduced form errors $\boldsymbol{\mu}$ are linear functions of the structural error terms $\boldsymbol{\varepsilon}$ and $\boldsymbol{\upsilon}$.

<br>
In the first structural equation $\mathbf{Z_1}$ and $\boldsymbol{\varepsilon}$ are uncorrelated by assumption, so the issue is whether $\mathbf{X}$ and $\boldsymbol{\varepsilon}$ are uncorrelated. From the reduced form of $\mathbf{X}$ we see that the former and the latter are correlated if and only if $\boldsymbol{\mu_2}$ and $\boldsymbol{\varepsilon}$ are correlated (the other terms were assumed exogenous); but $\boldsymbol{\mu_2}$ is a linear function of $\boldsymbol{\varepsilon}$ and $\boldsymbol{\upsilon}$, thus $\mathbf{X}$ is generally correlated with $\boldsymbol{\varepsilon}$.
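<br>

The correlation between $\mathbf{X}$ and $\boldsymbol{\varepsilon}$ can be made visible by generating data from the reduced-form equations. The sketch below is an illustration under assumed values: the structural parameters (`a1`, `b1`, `w1`, `a2`, `b2`, `w2`) are hypothetical, and the two structural errors are assumed independent with unit variance.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

# Assumed structural parameters (hypothetical values, with 1 - b1 * b2 != 0).
a1, b1, w1 = 1.0, 0.5, 1.0
a2, b2, w2 = 2.0, 0.4, 1.0

z1, z2 = rng.normal(size=n), rng.normal(size=n)    # exogenous variables
eps, ups = rng.normal(size=n), rng.normal(size=n)  # structural errors

# Generate Y and X jointly from the reduced-form equations.
denom = 1.0 - b1 * b2
y = (a1 + b1 * (a2 + w2 * z2 + ups) + w1 * z1 + eps) / denom
x = (a2 + b2 * (a1 + w1 * z1 + eps) + w2 * z2 + ups) / denom

# X is correlated with the error of the first structural equation.
cov_x_eps = np.cov(x, eps)[0, 1]

# OLS on the first structural equation: Y on [1, X, Z1].
design = np.column_stack([np.ones(n), x, z1])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

print(f"Cov(X, eps)           = {cov_x_eps:.3f} "
      f"(b2 / (1 - b1 * b2) = {b2 / denom:.3f})")
print(f"OLS estimate of beta_1 = {coef[1]:.3f} (assumed true value: {b1})")
```

Because $\boldsymbol{\varepsilon}$ feeds back into $\mathbf{X}$ through $\boldsymbol{\beta_2}$, the covariance is non-zero whenever $\boldsymbol{\beta_2} \neq 0$, and the OLS estimate of $\boldsymbol{\beta_1}$ stays away from the assumed true value even in a large sample.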

<br>
By now, we already know that this correlation between the independent variables $\mathbf{X}$ and the disturbance term $\boldsymbol{\varepsilon}$ will lead to a biased OLS estimate of the regression coefficients, but <b>will our OLS estimator at least be consistent</b> ? 


<br>
Let's take a look at the partial derivatives of our minimization objective :

<br>
$
    \quad
    \begin{align}
        \boldsymbol{S}
        (
            \boldsymbol{\hat{\alpha}_1}, 
            \boldsymbol{\hat{\beta}_1}, 
            \boldsymbol{\hat{\omega}_1}
        )
        \newline
        &= 
            \min\limits_
            {
                \boldsymbol{\hat{\alpha}_1}, 
                \boldsymbol{\hat{\beta}_1}, 
                \boldsymbol{\hat{\omega}_1}
            } 
            \sum_{i=1}^{m} \boldsymbol{\mathbf{e}_{i}^{\ 2}}
        \newline
        &= 
            \min\limits_
            {
                \boldsymbol{\hat{\alpha}_1}, 
                \boldsymbol{\hat{\beta}_1}, 
                \boldsymbol{\hat{\omega}_1}
            } 
            \sum_{i=1}^{m} (\boldsymbol{\mathbf{Y}_i} - \boldsymbol{\hat{\mathbf{Y}}_i})^\boldsymbol{2}
        \newline
        &= 
            \min\limits_
            {
                \boldsymbol{\hat{\alpha}_1}, 
                \boldsymbol{\hat{\beta}_1}, 
                \boldsymbol{\hat{\omega}_1}
            } 
            \sum_{i=1}^{m} 
            \big[
                \mathbf{Y}_i 
              - \boldsymbol{\hat{\alpha}_1}
              - \boldsymbol{\hat{\beta}_1}  \boldsymbol{\mathbf{X}_i}
              - \boldsymbol{\hat{\omega}_1} \boldsymbol{\mathbf{Z_1}_i}
            \big]^\boldsymbol{2}
      \end{align}
$

<br>
$
    \quad
    \begin{align}
        \dfrac
            {\partial \ S}
            {\partial \ \boldsymbol{\hat{\alpha}_1} }
        &=  \quad
            {
                \sum_{i=1}^{m} (-2) \ 
                (
                    \mathbf{Y}_i 
                    - \boldsymbol{\hat{\alpha}_1} 
                    - \boldsymbol{\hat{\beta}_1}  \boldsymbol{\mathbf{X}_i} 
                    - \boldsymbol{\hat{\omega}_1} \boldsymbol{\mathbf{Z_1}_i}
                )
            } = 0  
        \newline         
        & \Rightarrow \quad
            \sum_{i=1}^{m} (\boldsymbol{\hat{\alpha}_1} )
            =   \sum_{i=1}^{m} (\mathbf{Y}_i) 
              + \sum_{i=1}^{m} (- \boldsymbol{\hat{\beta}_1}  \boldsymbol{\mathbf{X}_i} ) 
              + \sum_{i=1}^{m} (- \boldsymbol{\hat{\omega}_1} \boldsymbol{\mathbf{Z_1}_i} )  
        \newline
        & \Rightarrow \quad
            m \ \boldsymbol{\hat{\alpha}_1} 
            =   m \ \overline{\mathbf{Y}} 
              - m \ \boldsymbol{\hat{\beta}_1}  \overline{\mathbf{X}} 
              - m \ \boldsymbol{\hat{\omega}_1} \overline{\mathbf{Z_1}}
        \newline \newline       
        & \Rightarrow \quad
            \boldsymbol{\hat{\alpha}_1} 
            =   \overline{\mathbf{Y}} 
              - \boldsymbol{\hat{\beta}_1}  \overline{\mathbf{X}} 
              - \boldsymbol{\hat{\omega}_1} \overline{\mathbf{Z_1}}
    \end{align}
$


<br> <br>
$
    \quad
    \begin{align}
        \dfrac
            {\partial \ S}
            {\partial \ \boldsymbol{\hat{\beta}_1} }
        &=  \quad   
            {
                \sum_{i=1}^{m} (-2) 
                \big[
                      \mathbf{Y}_i 
                    - \boldsymbol{\hat{\alpha}_1} 
                    - \boldsymbol{\hat{\beta}_1}  \boldsymbol{\mathbf{X}_i}
                    - \boldsymbol{\hat{\omega}_1} \boldsymbol{\mathbf{Z_1}_i}
                \big]
            } \mathbf{X}_i = 0  
        \newline         
        &   \Rightarrow \quad
            \sum_{i=1}^{m} 
            \big[
                \boldsymbol{\mathbf{X}_i} \boldsymbol{\mathbf{Y}_i}
                \ - \ \boldsymbol{\hat{\alpha}_1} \boldsymbol{\mathbf{X}_i}
                \ - \ \boldsymbol{\hat{\beta}_1}  \boldsymbol{\mathbf{X}_i^{\ 2}}
                \ - \ \boldsymbol{\hat{\omega}_1} \boldsymbol{\mathbf{Z_1}_i} \boldsymbol{\mathbf{X}_i}
            \big] =
        \newline
        &   \qquad =
            \sum_{i=1}^{m} 
            \big[
                  \boldsymbol{\mathbf{X}_i} \boldsymbol{\mathbf{Y}_i}
                  \ - \
                  (
                      \overline{\mathbf{Y}} 
                      - \boldsymbol{\hat{\beta}_1} \overline{\mathbf{X}} 
                      - \boldsymbol{\hat{\omega}_1} \overline{\mathbf{Z_1}}
                  ) \mathbf{X}_i 
                  \ - \ \boldsymbol{\hat{\beta}_1}  \boldsymbol{\mathbf{X}_i^{\ 2}} 
                  \ - \ \boldsymbol{\hat{\omega}_1} \boldsymbol{\mathbf{Z_1}_i} \boldsymbol{\mathbf{X}_i}       
            \big]
        \newline
        &   \qquad =
            \sum_{i=1}^{m} (\boldsymbol{\mathbf{X}_i} \boldsymbol{\mathbf{Y}_i})
            \ - \ \overline{\mathbf{Y}} \sum_{i=1}^{m} (\mathbf{X}_i)
            \ + \ \boldsymbol{\hat{\beta}_1} \overline{\mathbf{X}} \sum_{i=1}^{m} (\boldsymbol{\mathbf{X}_i})
            \ + \ \boldsymbol{\hat{\omega}_1} \overline{\mathbf{Z_1}} \sum_{i=1}^{m} (\boldsymbol{\mathbf{X}_i})
            \ - \ \boldsymbol{\hat{\beta}_1} \sum_{i=1}^{m} \boldsymbol{\mathbf{X}_i^{\ 2}}
            \ - \ \boldsymbol{\hat{\omega}_1} \sum_{i=1}^{m} {\mathbf{Z_1}_i \mathbf{X}_i}  
        \newline         
        &   \Rightarrow \quad
            \boldsymbol{\hat{\beta}_1}
            \big[ \ 
                \overline{\mathbf{X}} m \ \overline{\mathbf{X}}
                - \sum_{i=1}^{m} { \boldsymbol{\mathbf{X}_i^{\ 2}} } 
            \ \big]
            = - \sum_{i=1}^{m} \big[ \boldsymbol{\mathbf{X}_i} \boldsymbol{\mathbf{Y}_i} \big]
              \ + \ \overline{\mathbf{Y}} m \ \overline{\mathbf{X}}
              \ - \ \boldsymbol{\hat{\omega}_1} \overline{\mathbf{Z_1}} m \ \overline{\mathbf{X}}
              \ + \ \boldsymbol{\hat{\omega}_1} \sum_{i=1}^{m} { \boldsymbol{\mathbf{Z_1}_i} \boldsymbol{\mathbf{X}_i} } 
        \newline \newline          
        &   \Rightarrow \quad
            \begin{aligned}[T]
                \boldsymbol{\hat{\beta}_1}
                &= \frac
                    {
                          \big[
                              \sum_{i=1}^{m} 
                                  (\boldsymbol{\mathbf{X}_i} - \overline{\mathbf{X}})
                                  (\boldsymbol{\mathbf{Y}_i} - \overline{\mathbf{Y}})
                          \big] 
                        - \big[
                            \boldsymbol{\hat{\omega}_1} \sum_{i=1}^{m} 
                            (\boldsymbol{\mathbf{X}_i} - \overline{\mathbf{X}})
                            (\boldsymbol{\mathbf{Z_1}_i} - \overline{\mathbf{Z_1}})
                          \big] 
                    }
                    {\sum_{i=1}^{m} (\boldsymbol{\mathbf{X}_i} - \overline{\mathbf{X}})^\boldsymbol{\ 2}}
                \newline
                &= \frac
                    {\mathrm{Cov}(\mathbf{X}, \mathbf{Y}) - \boldsymbol{\hat{\omega}_1} \mathrm{Cov}(\mathbf{X}, \mathbf{Z_1})}
                    {\mathrm{Var}(\mathbf{X})}
                \newline    
                &= \frac
                    {\mathrm{Cov}(\mathbf{X}, \mathbf{Y})}
                    {\mathrm{Var}(\mathbf{X})}
            \end{aligned}       
    \end{align}
$

, assuming that $\mathbf{X}$ and $\mathbf{Z_1}$ are uncorrelated (we haven't omitted any confounding variable from the model).


<br>
Now let's express our estimate $\boldsymbol{\hat{\beta}_1}$ in terms of the correlation between $\mathbf{X}$ and $\boldsymbol{\varepsilon}$ :

<br>
$
\quad
\begin{align}
    \boldsymbol{\hat{\beta}_1}
    &= \frac
        {\sum_{i=1}^{m} (\mathbf{X}_i - \overline{\mathbf{X}})(\mathbf{Y}_i - \overline{\mathbf{Y}})}
        {\sum_{i=1}^{m} (\mathbf{X}_i - \overline{\mathbf{X}})^\boldsymbol{\ 2} }
    \newline
    &= \frac
        {\sum_{i=1}^{m} 
            (\mathbf{X}_i - \overline{\mathbf{X}})
            [
                (
                    \boldsymbol{\alpha_1} 
                    + \boldsymbol{\beta_1}\mathbf{X}_i
                    + \boldsymbol{\omega_1}\mathbf{Z_1}_i
                    + \boldsymbol{\varepsilon}_i 
                )
                - (\overline{
                    \boldsymbol{\alpha_1}
                    + \boldsymbol{\beta_1}\mathbf{X}_i
                    + \boldsymbol{\omega_1}\mathbf{Z_1}_i
                    + \boldsymbol{\varepsilon}_i}
                )
            ]
        }
        {\sum_{i=1}^{m} (\mathbf{X}_i - \overline{\mathbf{X}})^\boldsymbol{\ 2} }
    \newline
    &= \frac
        {\sum_{i=1}^{m} 
            (\mathbf{X}_i - \overline{\mathbf{X}})
            [
                (
                    \boldsymbol{\alpha_1}                    
                    + \boldsymbol{\beta_1}\mathbf{X}_i
                    + \boldsymbol{\omega_1}\mathbf{Z_1}_i
                    + \boldsymbol{\varepsilon}_i 
                )
                - (
                    \boldsymbol{\alpha_1}
                    + \boldsymbol{\beta_1} \overline{\mathbf{X}}
                    + \boldsymbol{\omega_1} \overline{\mathbf{Z_1}}
                    + \overline{\boldsymbol{\varepsilon}}
                )
            ]
        }
        {\sum_{i=1}^{m} (\mathbf{X}_i - \overline{\mathbf{X}})^\boldsymbol{\ 2} }
    \newline
    &= \frac
        {\sum_{i=1}^{m} 
            (\mathbf{X}_i - \overline{\mathbf{X}})
            [
                  \boldsymbol{\beta_1} (\mathbf{X}_i - \overline{\mathbf{X}})
                + \boldsymbol{\omega_1} (\mathbf{Z_1}_i - \overline{\mathbf{Z_1}})
                + (\boldsymbol{\varepsilon}_i - \overline{\boldsymbol{\varepsilon}})
            ]
        }
        {\sum_{i=1}^{m} (\mathbf{X}_i - \overline{\mathbf{X}})^\boldsymbol{\ 2} }
    \newline
    &= \frac
        {\sum_{i=1}^{m} 
              [\boldsymbol{\beta_1} (\mathbf{X}_i - \overline{\mathbf{X}})^\boldsymbol{\ 2} ]
            + [\boldsymbol{\omega_1}(\mathbf{X}_i - \overline{\mathbf{X}})(\mathbf{Z_1}_i - \overline{\mathbf{Z_1}})]
            + [(\mathbf{X}_i - \overline{\mathbf{X}})(\boldsymbol{\varepsilon}_i - \overline{\boldsymbol{\varepsilon}})] 
        }
        {\sum_{i=1}^{m} (\mathbf{X}_i - \overline{\mathbf{X}})^\boldsymbol{\ 2} }
    \newline \newline
    &= \boldsymbol{\beta_1}
       + \boldsymbol{\omega_1}
         \frac
            {\mathrm{Cov}(\mathbf{X}, \mathbf{Z_1})}
            {\mathrm{Var}(\mathbf{X})} 
       + \frac
           {\mathrm{Cov}(\mathbf{X}, \boldsymbol{\varepsilon})}
           {\mathrm{Var}(\mathbf{X})} 
    \newline
    &= \boldsymbol{\beta_1}
       + \frac
           {\mathrm{Cov}(\mathbf{X}, \boldsymbol{\varepsilon})}
           {\mathrm{Var}(\mathbf{X})} 
\end{align}        
$
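
<br>
The decomposition above is an exact algebraic identity in any sample, which can be checked numerically. The following is a minimal sketch, not part of the original derivation: the parameter values, the sample size and the normal distributions are assumptions chosen only for illustration, and the data are generated from the two simultaneous equations by first solving them for $\mathbf{X}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical parameter values, chosen only for illustration.
alpha1, beta1, omega1 = 1.0, 0.4, 1.0   # Y equation
alpha2, beta2, omega2 = 2.0, 0.5, 1.0   # X equation

z1, z2 = rng.normal(size=n), rng.normal(size=n)    # exogenous variables
eps, ups = rng.normal(size=n), rng.normal(size=n)  # structural errors

# Solve the two simultaneous equations for X (reduced form), then build Y.
x = (alpha2 + beta2 * alpha1
     + beta2 * omega1 * z1 + omega2 * z2
     + beta2 * eps + ups) / (1 - beta1 * beta2)
y = alpha1 + beta1 * x + omega1 * z1 + eps

# OLS slope of the simple regression of Y on X.
dx = x - x.mean()
slope = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)

# Right-hand side of the decomposition, using sample moments.
var_x = np.var(x, ddof=1)
decomposed = (beta1
              + omega1 * np.cov(x, z1)[0, 1] / var_x
              + np.cov(x, eps)[0, 1] / var_x)

print(slope, decomposed)
```

The two printed numbers should agree up to floating-point error, and both should differ visibly from the true `beta1`.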



<br>
If the second term on the right-hand side of the equation vanishes, i.e. if the covariance between $\mathbf{X}$ and $\boldsymbol{\varepsilon}$ is zero, then the OLS estimator is consistent; otherwise it is inconsistent. Let's then examine the numerator of this second term:

<br>
$
\quad
\begin{align}
    \mathrm{Cov}(\mathbf{X}, \boldsymbol{\varepsilon})
    & = \mathrm{Cov}
        \left [
            \frac
            {
                (\boldsymbol{\alpha_2} + \boldsymbol{\beta_2} \boldsymbol{\alpha_1})
                + (\boldsymbol{\beta_2} \boldsymbol{\omega_1} \mathbf{Z_1} + \boldsymbol{\omega_2}\mathbf{Z_2})
                + (\boldsymbol{\beta_2} \boldsymbol{\varepsilon} + \boldsymbol{\upsilon})
            }
            {(1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2})}
            , \boldsymbol{\varepsilon}
        \right ]
    \newline
    & = 
          0
        + \frac
            {\boldsymbol{\beta_2} \boldsymbol{\omega_1}}
            {(1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2})}
            \mathrm{Cov}(\mathbf{Z_1}, \boldsymbol{\varepsilon})
        + \frac
            {\boldsymbol{\omega_2}}
            {(1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2})}
            \mathrm{Cov}(\mathbf{Z_2}, \boldsymbol{\varepsilon})
        + \frac
            {\boldsymbol{\beta_2}}
            {(1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2})}
            \mathrm{Cov}(\boldsymbol{\varepsilon} , \boldsymbol{\varepsilon})
        + \frac
            {1}
            {(1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2})}
            \mathrm{Cov}(\boldsymbol{\upsilon} , \boldsymbol{\varepsilon})
    \newline
    & = 
        \frac
            {\boldsymbol{\beta_2}}
            {(1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2})}
            \mathrm{Var}(\boldsymbol{\varepsilon})
\end{align}
$


<br>
And the denominator :

$
\quad
\begin{align}
    \mathrm{Var}(\mathbf{X})
    &= \mathrm{Var}
        \left [
            \frac
                {
                    (\boldsymbol{\alpha_2} + \boldsymbol{\beta_2} \boldsymbol{\alpha_1})
                    + (\boldsymbol{\beta_2} \boldsymbol{\omega_1}\mathbf{Z_1} + \boldsymbol{\omega_2}\mathbf{Z_2})
                    + (\boldsymbol{\beta_2} \boldsymbol{\varepsilon} + \boldsymbol{\upsilon})
                }
                {(1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2})}
        \right ]
    \newline
    &=  \left ( \frac{1}{1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2}} \right )^\boldsymbol{2}
        \mathrm{Var}
        \bigg[
            (\boldsymbol{\alpha_2} + \boldsymbol{\beta_2} \boldsymbol{\alpha_1})
            + (\boldsymbol{\beta_2} \boldsymbol{\omega_1}\mathbf{Z_1} + \boldsymbol{\omega_2}\mathbf{Z_2})
            + (\boldsymbol{\beta_2} \boldsymbol{\varepsilon} + \boldsymbol{\upsilon})
        \bigg]
    \newline \newline
    &= \left ( \frac{1}{1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2}} \right )^\boldsymbol{2}
        \mathrm{Var} [ \ A + B + C \ ]
    \newline
    &=  \left ( \frac{1}{1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2}} \right )^\boldsymbol{2}
        \bigg[
            \mathrm{Var}(A) 
            + \mathrm{Var}(B) 
            + \mathrm{Var}(C) 
            + 2 \ \mathrm{Cov}(A,B) 
            + 2 \ \mathrm{Cov}(B,C) 
            + 2 \ \mathrm{Cov}(A,C)
        \bigg]
    \newline
    &= \left ( \frac{1}{1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2}} \right )^\boldsymbol{2}
        \bigg[
            0 
            + \mathrm{Var}(B)
            + \mathrm{Var}(C) 
            + 0
            + 2 \ \mathrm{Cov}(B,C) 
            + 0
        \bigg]
    \newline \newline
    &=  \left ( \frac{1}{1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2}} \right )^\boldsymbol{2}
        \bigg\{ \ 
            \Big[ \
                  \mathrm{Var}(\boldsymbol{\beta_2} \boldsymbol{\omega_1}\mathbf{Z_1}) 
                + \mathrm{Var}(\boldsymbol{\omega_2}\mathbf{Z_2})
                + 2 \ \mathrm{Cov}(\boldsymbol{\beta_2} \boldsymbol{\omega_1}\mathbf{Z_1}, \boldsymbol{\omega_2}\mathbf{Z_2})
            \ \Big]
            + \Big[ \ 
                  \mathrm{Var}(\boldsymbol{\beta_2} \boldsymbol{\varepsilon})
                + \mathrm{Var}(\boldsymbol{\upsilon})
                + 2 \ \mathrm{Cov}(\boldsymbol{\beta_2} \boldsymbol{\varepsilon}, \boldsymbol{\upsilon})
            \ \Big]
            + 2 \ \Big[ \
                  \boldsymbol{\beta_2} \boldsymbol{\omega_1} \boldsymbol{\beta_2}
                      \mathrm{Cov}(\mathbf{Z_1}, \boldsymbol{\varepsilon})
                + \boldsymbol{\beta_2} \boldsymbol{\omega_1} \mathrm{Cov}(\mathbf{Z_1}, \boldsymbol{\upsilon})
                + \boldsymbol{\omega_2} \boldsymbol{\beta_2} \mathrm{Cov}(\mathbf{Z_2}, \boldsymbol{\varepsilon})
                + \boldsymbol{\omega_2} \mathrm{Cov}(\mathbf{Z_2}, \boldsymbol{\upsilon})
            \ \Big] \   
        \bigg\}
    \newline
    &=  \left ( \frac{1}{1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2}} \right )^\boldsymbol{2}
        \bigg\{
            \Big[ \ 
                  (\boldsymbol{\beta_2} \boldsymbol{\omega_1})^\boldsymbol{2} {\sigma_\mathbf{Z_1}}^\boldsymbol{2}
                + (\boldsymbol{\omega_2})^\boldsymbol{2} {\sigma_\mathbf{Z_2}}^2
            \ \Big]
            + \Big[ \
                  (\boldsymbol{\beta_2})^\boldsymbol{2} {\sigma_{\boldsymbol{\varepsilon}}}^\boldsymbol{2}
                + {\sigma_{\boldsymbol{\upsilon}}}^\boldsymbol{2}
            \ \Big]
        \bigg\}
\end{align}
$
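
<br>
Both closed-form results, the covariance $\mathrm{Cov}(\mathbf{X}, \boldsymbol{\varepsilon})$ and the variance $\mathrm{Var}(\mathbf{X})$, can be compared with sample moments from simulated data. This is only a sketch under stated assumptions: the parameter values and standard deviations are hypothetical, and $\mathbf{Z_1}$, $\mathbf{Z_2}$, $\boldsymbol{\varepsilon}$ and $\boldsymbol{\upsilon}$ are drawn as mutually independent normal variables, which is what makes the cross-covariance terms vanish.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Hypothetical parameters and standard deviations, for illustration only.
alpha1, beta1, omega1 = 1.0, 0.4, 1.0
alpha2, beta2, omega2 = 2.0, 0.5, 1.0
s_z1, s_z2, s_eps, s_ups = 1.0, 2.0, 1.5, 1.0

# Mutually independent exogenous variables and errors.
z1 = rng.normal(scale=s_z1, size=n)
z2 = rng.normal(scale=s_z2, size=n)
eps = rng.normal(scale=s_eps, size=n)
ups = rng.normal(scale=s_ups, size=n)

# Reduced form of X, as used in both derivations.
k = 1.0 / (1.0 - beta1 * beta2)
x = k * (alpha2 + beta2 * alpha1
         + beta2 * omega1 * z1 + omega2 * z2
         + beta2 * eps + ups)

# Closed-form expressions from the two derivations.
cov_theory = beta2 * k * s_eps ** 2
var_theory = k ** 2 * ((beta2 * omega1) ** 2 * s_z1 ** 2
                       + omega2 ** 2 * s_z2 ** 2
                       + beta2 ** 2 * s_eps ** 2
                       + s_ups ** 2)

# Sample counterparts.
cov_sample = np.cov(x, eps)[0, 1]
var_sample = np.var(x, ddof=1)

print("Cov(X, eps):", cov_sample, "vs", cov_theory)
print("Var(X):     ", var_sample, "vs", var_theory)
```

With a sample this large, each sample moment should land close to its theoretical value; the remaining gap is sampling noise.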


<br>
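For the sake of simplicity, from now on we'll ignore the contribution of the exogenous variables $\mathbf{Z_1}$ and $\mathbf{Z_2}$ (that is, we drop the terms in $\boldsymbol{\omega_1}$ and $\boldsymbol{\omega_2}$). Finally, we can put it all together and rewrite $\boldsymbol{\hat{\beta}_1}$ as: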

<br>
$
\quad
\begin{align}
    \boldsymbol{\hat{\beta}_1}
    &= \boldsymbol{\beta_1}
       + \dfrac
           {\mathrm{Cov}(\mathbf{X}, \boldsymbol{\varepsilon})}
           {\mathrm{Var}(\mathbf{X})} 
    \newline
    &=  \boldsymbol{\beta_1}
        + \dfrac
            {\boldsymbol{\beta_2}}
            {(1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2})}
            {\sigma_{\boldsymbol{\varepsilon}}}^\boldsymbol{\ 2}
        \cdot \dfrac
            {(1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2})^\boldsymbol{\ 2} }
            {
                (\boldsymbol{\beta_2})^2 {\sigma_{\boldsymbol{\varepsilon}}}^\boldsymbol{\ 2}
                + {\sigma_{\boldsymbol{\upsilon}}}^\boldsymbol{\ 2}
            }
    \newline
    &=    \boldsymbol{\beta_1}
        + \dfrac
            {\boldsymbol{\beta_2} (1 - \boldsymbol{\beta_1} \boldsymbol{\beta_2}) {\sigma_{\boldsymbol{\varepsilon}}}^2 }
            {
                (\boldsymbol{\beta_2})^\boldsymbol{\ 2} {\sigma_{\boldsymbol{\varepsilon}}}^\boldsymbol{\ 2}
                + {\sigma_{\boldsymbol{\upsilon}}}^\boldsymbol{\ 2}
            }
\end{align}    
$

<br>
We can see that the second term in this last equation is nonzero, so the OLS estimator is inconsistent, unless $\boldsymbol{\beta_2} = 0$, which means that $\mathbf{X}$ does not depend on $\mathbf{Y}$ and the two variables were not jointly determined in the first place.
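
<br>
A small simulation can illustrate that the OLS slope settles on the biased limit derived above rather than on $\boldsymbol{\beta_1}$. This is a sketch under assumptions that are not in the original text: the parameter values are hypothetical, the errors are independent normal draws, and, in line with the simplification above, the intercepts and the exogenous variables are set to zero. The helper `ols_slope` is introduced here only for illustration.

```python
import numpy as np

def ols_slope(n, beta1=0.4, beta2=0.5, s_eps=1.0, s_ups=1.0, seed=0):
    """Simulate the simultaneous system (no intercepts, no exogenous
    variables) and return the OLS slope of Y on X."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(scale=s_eps, size=n)
    ups = rng.normal(scale=s_ups, size=n)
    x = (beta2 * eps + ups) / (1 - beta1 * beta2)  # reduced form of X
    y = beta1 * x + eps                            # structural equation of Y
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Hypothetical values, matching the defaults of ols_slope.
beta1, beta2, s_eps, s_ups = 0.4, 0.5, 1.0, 1.0

# Limit of the OLS slope according to the last equation of the derivation.
limit = beta1 + (beta2 * (1 - beta1 * beta2) * s_eps ** 2
                 / (beta2 ** 2 * s_eps ** 2 + s_ups ** 2))

for n in (100, 10_000, 1_000_000):
    print(n, ols_slope(n), "limit:", limit, "true beta1:", beta1)
```

As `n` grows, the estimate should approach `limit` and stay away from `beta1`: a larger sample does not remove the bias.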


## Consequences

<br>
If an independent variable is correlated with the error term of a regression model, then the ordinary least squares (OLS) estimate of its coefficient is biased; however, if the correlation is not contemporaneous, the coefficient estimate may still be consistent.

<br>
When dealing with simultaneity, the OLS estimator is both biased and inconsistent.


## Detection TODO

<br>

## Correction TODO

<br>
There are many methods of correcting the bias, including instrumental variable regression and Heckman selection correction.
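
<br>
As an illustration of the instrumental variable idea, the sketch below uses $\mathbf{Z_2}$ as an instrument for $\mathbf{X}$ in the simultaneous system studied above. It rests on assumptions that are not stated in the original text: the parameter values are hypothetical, $\mathbf{Z_2}$ is independent of $\mathbf{Z_1}$ and of both errors, and $\boldsymbol{\omega_2} \neq 0$, so that $\mathbf{Z_2}$ actually moves $\mathbf{X}$. The estimator shown is the simple ratio-of-covariances IV estimator, not a full two-stage least squares implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Hypothetical parameter values, for illustration only.
alpha1, beta1, omega1 = 1.0, 0.4, 1.0
alpha2, beta2, omega2 = 2.0, 0.5, 1.0

z1, z2 = rng.normal(size=n), rng.normal(size=n)    # exogenous variables
eps, ups = rng.normal(size=n), rng.normal(size=n)  # structural errors

# Reduced form of X, then the structural equation of Y.
x = (alpha2 + beta2 * alpha1
     + beta2 * omega1 * z1 + omega2 * z2
     + beta2 * eps + ups) / (1 - beta1 * beta2)
y = alpha1 + beta1 * x + omega1 * z1 + eps

# Simple regression of Y on X: inconsistent under simultaneity.
ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# IV estimator with Z2 as instrument: Cov(Z2, Y) / Cov(Z2, X).
iv = np.cov(z2, y)[0, 1] / np.cov(z2, x)[0, 1]

print("true beta1:", beta1, "OLS:", ols, "IV:", iv)
```

Under these assumptions the IV estimate should be close to `beta1`, while the OLS estimate remains biased. The result depends entirely on the instrument being uncorrelated with $\boldsymbol{\varepsilon}$, which cannot be verified from the data alone.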


## References - TODO

<br>
<ul style="list-style-type:square">
    <li>
         ...
    </li>
    <br>
    <li>
        ...
    </li>
</ul>
