# Notes on Co-integration, Error Correction, and the Econometric Analysis of Non-Stationary Data
by Anindya Banerjee, Juan Dolado, J. W. Galbraith, David Hendry

## Chapter 1

An equilibrium state is defined as one in which there is no inherent tendency to change. Equilibria are states to which the system is attracted, other things equal.

"Long-run equilibrium" is also used to denote the equilibrium relationship to which a system converges over time.

The present is the long-run outcome of the distant past, and a long-run relationship will often hold 'on average' over time.

We say that an equilibrium relationship $f(x_1, x_2)=0$ holds between two variables $x_1$ and $x_2$ if the amount $\epsilon_t = f(x_{1t}, x_{2t})$ by which actual observations deviate from this equilibrium is a median-zero **stationary process**.
The short-run deviation $\epsilon_t$ from an equilibrium relationship must have no tendency to grow systematically over time.

This concept of statistical equilibrium is useful in examining equilibrium relationships between variables tending to grow over time. If the actual relationship is $x_1 = \beta x_2$, the discrepancy $x_{1t} - b x_{2t}$ will be non-stationary for $b \neq \beta$. Only the true relationship can yield a stationary discrepancy.
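The point that only the true coefficient yields a stationary discrepancy can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not in the notes: $x_2$ is a Gaussian random walk, $x_1 = \beta x_2$ plus independent standard normal noise, and `beta`, `wrong_b`, `n_obs`, and `seed` are illustrative values.

```python
import random
import statistics

def discrepancy_variances(beta=2.0, wrong_b=2.5, n_obs=2000, seed=0):
    """Sample variance of x1 - b*x2 for b = beta and for b = wrong_b."""
    rng = random.Random(seed)
    x2, level = [], 0.0
    for _ in range(n_obs):
        level += rng.gauss(0.0, 1.0)      # x2 is a random walk (assumed DGP)
        x2.append(level)
    # x1 tracks beta * x2 up to a stationary, zero-centred error
    x1 = [beta * v + rng.gauss(0.0, 1.0) for v in x2]
    var_true = statistics.pvariance([a - beta * v for a, v in zip(x1, x2)])
    var_wrong = statistics.pvariance([a - wrong_b * v for a, v in zip(x1, x2)])
    return var_true, var_wrong

var_true, var_wrong = discrepancy_variances()
print(f"b = beta : variance of discrepancy {var_true:.2f}")
print(f"b != beta: variance of discrepancy {var_wrong:.2f}")
```

With $b \neq \beta$ the discrepancy equals $(\beta - b)x_{2t}$ plus noise, so it inherits the random walk in $x_2$ and its sample variance is dominated by that component.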

If there exists a stable equilibrium $x_1 = \beta x_2$, the discrepancy ${x_{1t} - \beta x_{2t}}$ contains useful information since on average the system will move towards that equilibrium.

In particular, ${x_{1t-1} - \beta x_{2t-1}}$ represents the previous disequilibrium. It is called the *error-correction mechanism* and is included in dynamic regressions. If ${x_{1t-1} - \beta x_{2t-1}}$ is positive, $x_{1t-1}$ is too high and on average we might expect a fall in $x_1$ in future periods.
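The direction of adjustment described above can be checked in a simulated system. The sketch below assumes a hypothetical data-generating process in which $x_2$ is a random walk and $x_1$ closes a fraction `alpha` of the previous disequilibrium each period; `alpha`, `beta`, and the Gaussian shocks are illustrative assumptions, not values from the book.

```python
import random

def simulate_ecm(alpha=0.2, beta=2.0, n_obs=5000, seed=1):
    """Return (previous disequilibrium, subsequent change in x1) pairs."""
    rng = random.Random(seed)
    x1, x2 = 0.0, 0.0
    pairs = []
    for _ in range(n_obs):
        gap = x1 - beta * x2                       # previous disequilibrium
        dx1 = -alpha * gap + rng.gauss(0.0, 1.0)   # error-correction step
        pairs.append((gap, dx1))
        x1 += dx1
        x2 += rng.gauss(0.0, 1.0)                  # x2 follows a random walk
    return pairs

pairs = simulate_ecm()
above = [dx for gap, dx in pairs if gap > 0]
below = [dx for gap, dx in pairs if gap < 0]
mean_change_above = sum(above) / len(above)
mean_change_below = sum(below) / len(below)
print(f"mean change in x1 after a positive gap: {mean_change_above:.3f}")
print(f"mean change in x1 after a negative gap: {mean_change_below:.3f}")
```

On average $x_1$ falls after a positive gap and rises after a negative one, which is the sense in which the lagged discrepancy carries information about the future path.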

The practice of exploiting information contained in the current deviation from an equilibrium relationship, in explaining the path of a variable, has benefited from the formalization of the concept of co-integration by Granger (1981) and Engle and Granger (1987).

A series is said to be integrated of order 1, denoted $I(1)$, if, although it is itself non-stationary, the changes in this series form a stationary series. A stationary series is denoted $I(0)$.
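A Gaussian random walk is a convenient example of an $I(1)$ series. The sketch below (an illustration with assumed parameters `n_paths`, `early`, `late`, and `seed`) simulates many independent paths and compares the variance across paths of the level $y_t$ and of the change $\Delta y_t$ at two dates.

```python
import random
import statistics

def variances_across_paths(n_paths=1000, early=50, late=200, seed=2):
    """Variance across simulated paths of y_t and of its change at two dates."""
    rng = random.Random(seed)
    levels = {early: [], late: []}
    changes = {early: [], late: []}
    for _ in range(n_paths):
        y = 0.0
        for t in range(1, late + 1):
            step = rng.gauss(0.0, 1.0)
            y += step                      # random walk: y_t = y_{t-1} + step
            if t in levels:
                levels[t].append(y)
                changes[t].append(step)    # the first difference is just the step
    level_var = {t: statistics.pvariance(v) for t, v in levels.items()}
    change_var = {t: statistics.pvariance(v) for t, v in changes.items()}
    return level_var, change_var

level_var, change_var = variances_across_paths()
for t in (50, 200):
    print(f"t={t}: var(y_t) = {level_var[t]:.1f}, var(dy_t) = {change_var[t]:.2f}")
```

The variance of the level grows with $t$, so the level has no fixed distribution, while the variance of the change stays close to the shock variance at both dates.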

The definition of co-integration requires stationarity of the deviation ${x_{1t} - \beta x_{2t}}$.

A linear relation yields a stationary deviation.

An *integrated process* is one that can be made stationary by differencing. A discrete process integrated of order $d$ must be differenced $d$ times to reach stationarity.

If $x_t$ is stationary, then so is $\Delta x_t$.

## Chapter 2

**Error-correction model (ECM)**. Error-correction terms were used as a way of capturing adjustments in a dependent variable which depended not on the level of some explanatory variable, but on the extent to which an explanatory variable deviated from an equilibrium relationship with the dependent variable.

When the equilibrium relationship is of the form $y^* = \theta x^*$, then an error-correction term is one such as ($y_t - \theta x_t$).
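A short derivation may help show where such a term comes from. The sketch below assumes a first-order autoregressive distributed-lag model; the coefficients $a_0$, $a_1$, $b$ (with $\vert b \vert < 1$) and the disturbance $e_t$ are illustrative symbols introduced here, not notation taken from these notes.

$$y_t = a_0 x_t + a_1 x_{t-1} + b\, y_{t-1} + e_t$$

Subtracting $y_{t-1}$ from both sides, then adding and subtracting $a_0 x_{t-1}$ on the right, gives

$$\Delta y_t = a_0 \Delta x_t - (1-b)\left(y_{t-1} - \theta x_{t-1}\right) + e_t, \qquad \theta = \frac{a_0 + a_1}{1-b}$$

Setting $y_t = y_{t-1} = y^*$, $x_t = x_{t-1} = x^*$ and $e_t = 0$ in the first equation yields $y^* = \theta x^*$, so the bracketed term is the lagged deviation from equilibrium and $(1-b)$ measures how much of it is corrected each period.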

The error-correction mechanism will be of particular value where the extent of an adjustment to a deviation from equilibrium is especially interesting.

## Chapter 3

One example of the problems that can arise when performing regression with clearly non-stationary series is the problem of *nonsense regression*, or *spurious regression*.

The standard proof of the consistency of ordinary least squares regression uses the assumption that, with increasing sample information, the sample moments of the data settle down to their population values. In order to have fixed population moments to which these sample moments converge, the data must be stationary.

Suppose $x_t$ and $y_t$ are uncorrelated random walks, that is, $x_t$ neither affects nor is affected by $y_t$. One would expect the coefficient $\beta_1$ in $y_t = \beta_0 + \beta_1 x_t + \epsilon_t$ to converge to zero and the coefficient of determination ($R^2$) to tend to zero as well. However, this is not the case. If two time series are each growing, they may be correlated even though they are increasing for entirely different reasons and by increments that are uncorrelated.

Tests based on badly specified models can often be misleading.

Although the $t$- and $F$-statistics for the null hypothesis of interest are grossly misleading, some information which would suggest that the regression is misspecified is provided by a test for residual autocorrelation.

If $x_t$ and $y_t$ were made stationary, the OLS-estimated regression coefficient $\hat{\beta}_1$ would converge to 0.

**Spurious regression problem**: Regression of an integrated series on another unrelated integrated series produces t-ratios on the slope parameter which indicate a relationship much more often than they should at the nominal test level.
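The over-rejection can be seen in a small Monte Carlo experiment. The sketch below is an illustration under assumed settings (independent Gaussian random walks, `n_obs=100`, `n_reps=300`, and the two-sided 5% normal critical value 1.96); it regresses one walk on the other in levels and then in first differences, and records how often the slope t-ratio exceeds the critical value.

```python
import math
import random
from itertools import accumulate

def slope_t_ratio(x, y):
    """t-ratio on the slope in an OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se

def rejection_rates(n_reps=300, n_obs=100, crit=1.96, seed=3):
    """Share of replications with |t| > crit, in levels and in differences."""
    rng = random.Random(seed)
    reject_levels = reject_diffs = 0
    for _ in range(n_reps):
        dx = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        dy = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        x, y = list(accumulate(dx)), list(accumulate(dy))  # unrelated random walks
        reject_levels += abs(slope_t_ratio(x, y)) > crit
        reject_diffs += abs(slope_t_ratio(dx, dy)) > crit
    return reject_levels / n_reps, reject_diffs / n_reps

rate_levels, rate_diffs = rejection_rates()
print(f"rejection rate, regression in levels:      {rate_levels:.2f}")
print(f"rejection rate, regression in differences: {rate_diffs:.2f}")
```

In levels the nominal 5% test rejects the true null of no relationship in most replications, whereas in differences the rejection rate stays near the nominal level.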

For unrelated integrated series, the Durbin-Watson (DW) statistic calculated from the residuals of $y_t = \beta_0 + \beta_1 x_t + \epsilon_t$ converges to zero as the sample size tends to infinity. When the two series are genuinely related, the DW statistic converges to a non-zero value. The DW statistic therefore provides one way of discriminating between spurious and genuine regressions in large samples.
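The contrasting behaviour of the DW statistic can be sketched numerically. The example below assumes Gaussian random walks and, for the genuine case, a hypothetical relation $y_t = 2x_t$ plus independent noise; the sample sizes, `n_reps`, and `seed` are illustrative choices. It reports the average DW statistic over replications.

```python
import random
from itertools import accumulate

def durbin_watson(x, y):
    """DW statistic from the residuals of an OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    resid = [(b - my) - slope * (a - mx) for a, b in zip(x, y)]
    num = sum((e1 - e0) ** 2 for e0, e1 in zip(resid, resid[1:]))
    return num / sum(e * e for e in resid)

def mean_dw(n_obs, related, n_reps=50, seed=4):
    """Average DW statistic over n_reps simulated regressions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        x = list(accumulate(rng.gauss(0.0, 1.0) for _ in range(n_obs)))
        if related:
            # assumed genuine relation: y = 2x + white noise
            y = [2.0 * v + rng.gauss(0.0, 1.0) for v in x]
        else:
            # spurious case: y is an independent random walk
            y = list(accumulate(rng.gauss(0.0, 1.0) for _ in range(n_obs)))
        total += durbin_watson(x, y)
    return total / n_reps

dw_spurious_small = mean_dw(50, related=False)
dw_spurious_large = mean_dw(1000, related=False)
dw_genuine_large = mean_dw(1000, related=True)
print(f"spurious, T=50:   mean DW = {dw_spurious_small:.3f}")
print(f"spurious, T=1000: mean DW = {dw_spurious_large:.3f}")
print(f"genuine,  T=1000: mean DW = {dw_genuine_large:.3f}")
```

With white-noise errors in the genuine relation the DW statistic settles near 2; serially correlated errors would give a different non-zero limit.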

Laws of large numbers guarantee the convergence in probability of the sample mean to the true mean of the process for a class of processes that includes stationary time series, but one of the primary facts about integrated processes is that convergence theorems of this type, where convergence is to constants, generally fail to hold.
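A simple way to see this failure is to look at how dispersed the sample mean is across repeated samples. The sketch below (illustrative settings: Gaussian shocks, `n_reps=300`, sample sizes 100 and 400) compares a white-noise series with a random walk.

```python
import random
import statistics

def spread_of_sample_mean(n_obs, integrated, n_reps=300, seed=7):
    """Std. dev. across replications of the sample mean of a series of length n_obs."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_reps):
        level, total = 0.0, 0.0
        for _ in range(n_obs):
            shock = rng.gauss(0.0, 1.0)
            # integrated: cumulate the shocks; stationary: use the shock itself
            level = level + shock if integrated else shock
            total += level
        means.append(total / n_obs)
    return statistics.pstdev(means)

iid_100 = spread_of_sample_mean(100, integrated=False)
iid_400 = spread_of_sample_mean(400, integrated=False)
rw_100 = spread_of_sample_mean(100, integrated=True)
rw_400 = spread_of_sample_mean(400, integrated=True)
print(f"white noise: spread of mean {iid_100:.3f} (T=100), {iid_400:.3f} (T=400)")
print(f"random walk: spread of mean {rw_100:.3f} (T=100), {rw_400:.3f} (T=400)")
```

For white noise the sample mean concentrates around zero as the sample grows; for the random walk it becomes more dispersed, so it does not converge to a constant.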

Analytical results concerning limiting distributions must therefore be based on an extended asymptotic theory.

## Chapter 4

Since an $I(1)$ series becomes stationary upon being differenced once, it must contain one unit root.

The first difference of a random walk is stationary. By contrast, consider the data-generating process
$$y_t  = \rho_1 y_{t-1} + u_{1t}, \;\; \text{where} \;\; \vert\rho_1\vert > 1.$$
Then
$$y_t - y_{t-1} = \Delta y_t = (\rho_1 - 1)y_{t-1} + u_{1t}$$
and $\Delta y_t$ is no longer stationary: it depends not only upon the stationary process $u_{1t}$, but also upon the non-stationary process $y_{t-1}$ (since $\rho_1 - 1 \neq 0$). Hence an AR(1) process with a coefficient of 1 is $I(1)$. *But the same process with a coefficient of 1.01 is not, since differencing will not reduce this process to stationarity.*
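The contrast between a coefficient of 1 and a coefficient of 1.01 can be checked by simulation. The sketch below assumes $y_0 = 0$ and standard normal shocks, and uses the same seed for both processes so they receive identical shocks; `n_obs` and `window` are illustrative choices. It compares the variance of $\Delta y_t$ over the first and last `window` observations.

```python
import random
import statistics

def differenced_variances(rho, n_obs=2000, window=100, seed=5):
    """Variance of the first difference over the first and last `window` observations."""
    rng = random.Random(seed)
    y_prev, diffs = 0.0, []
    for _ in range(n_obs):
        y = rho * y_prev + rng.gauss(0.0, 1.0)   # AR(1) with coefficient rho
        diffs.append(y - y_prev)                 # first difference
        y_prev = y
    return statistics.pvariance(diffs[:window]), statistics.pvariance(diffs[-window:])

rw_first, rw_last = differenced_variances(rho=1.0)
ex_first, ex_last = differenced_variances(rho=1.01)
print(f"rho = 1.00: var(dy) early {rw_first:.2f}, late {rw_last:.2f}")
print(f"rho = 1.01: var(dy) early {ex_first:.2f}, late {ex_last:.3g}")
```

For $\rho_1 = 1$ the difference is just the shock, so its variance is stable; for $\rho_1 = 1.01$ the difference inherits the explosive level through $(\rho_1 - 1)y_{t-1}$.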

**Testing for unit root**

Consider the simplest data-generation process
$$y_t  = \rho y_{t-1} + u_{t} \; \text{;} \;\; u_t \sim \text{IID}(0,\sigma_u^2) \; \text{;} \;\; y_0 = 0$$

If one were testing the true hypothesis $H_0:\rho = \rho_0$ for $\vert\rho_0\vert < 1$, the test would be easily performed. Running the regression, the t-statistic $(\hat{\rho} - \rho_0)/\text{SE}(\hat{\rho})$ has, asymptotically, a standard normal distribution and can be compared with tables of significance points for $N(0,1)$. In small samples the statistic is approximately t-distributed, although the coefficient estimate $\hat{\rho}$ is biased downward slightly.

For $\rho_0 = 1$, however, this result no longer holds.
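A Monte Carlo sketch can show how the t-statistic behaves in the two cases. The code below simulates the process above with $y_0 = 0$ and Gaussian shocks, estimates $\rho$ by regressing $y_t$ on $y_{t-1}$ without a constant, and computes the t-statistic for the true value of $\rho$. The settings `n_reps=2000`, `n_obs=100`, and the stationary comparison value $\rho = 0.5$ are assumptions made for illustration.

```python
import math
import random

def t_statistic(rho, n_obs, rng):
    """t-statistic for H0: rho equals its true value, from y_t regressed on y_{t-1}."""
    y_prev, lagged, current = 0.0, [], []
    for _ in range(n_obs):
        y = rho * y_prev + rng.gauss(0.0, 1.0)
        lagged.append(y_prev)
        current.append(y)
        y_prev = y
    s_ll = sum(v * v for v in lagged)
    rho_hat = sum(a * b for a, b in zip(lagged, current)) / s_ll
    rss = sum((b - rho_hat * a) ** 2 for a, b in zip(lagged, current))
    se = math.sqrt(rss / (n_obs - 1) / s_ll)     # one estimated parameter
    return (rho_hat - rho) / se

def simulate(rho, n_reps=2000, n_obs=100, seed=6):
    """Mean of the t-statistic and share of negative values across replications."""
    rng = random.Random(seed)
    stats = [t_statistic(rho, n_obs, rng) for _ in range(n_reps)]
    mean_t = sum(stats) / n_reps
    share_negative = sum(s < 0 for s in stats) / n_reps
    return mean_t, share_negative

mean_t_stationary, share_neg_stationary = simulate(rho=0.5)
mean_t_unit, share_neg_unit = simulate(rho=1.0)
print(f"rho = 0.5: mean t = {mean_t_stationary:.2f}, share negative = {share_neg_stationary:.2f}")
print(f"rho = 1.0: mean t = {mean_t_unit:.2f}, share negative = {share_neg_unit:.2f}")
```

Under a standard normal approximation roughly half of the statistics would be negative and their mean would be close to zero. In the unit-root case the simulated distribution is shifted to the left, which is why $N(0,1)$ significance points are not appropriate there.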