# __Unit Roots, Cointegration, and Error-Correction Models (Julia)__

<br>

Finance 5330: Financial Econometrics <br>
Tyler J. Brough <br>
Last Update: April 15, 2021 <br>
<br>

In [None]:
using DataFrames
using GLM
using HypothesisTests
using StatsPlots

## __Unit Roots and Stationarity__

## The Random Walk with Drift

<br>

A simple starting model for efficient log-prices of assets is the ___Random Walk with Drift___ model:

<br>

$$
y_{t} = \mu + y_{t-1} + \epsilon_{t}, \quad \quad \epsilon_{t} \sim N(0, \sigma_{\epsilon}^{2})
$$

<br>

with something like $y_{t} = \ln{(p_{t})}$, where $p_{t}$ is a transaction price observed in some market. The expected value of this process, conditional on the information known at time $t-1$ ($I_{t-1}$), is:

<br>

$$
E(y_{t} \mid I_{t-1}) = \mu + y_{t-1}
$$

<br>

To get the variance it is helpful to solve recursively as follows, assuming $y_{0} = 0$ for simplicity:

<br>

$$
y_{t} = t \mu + \sum\limits_{i=0}^{t-1} \epsilon_{t-i}
$$

<br>

Because the shocks are uncorrelated, we can now state the variance of the process as:

<br>

$$
Var(y_{t}) = \sum\limits_{i=0}^{t-1} Var(\epsilon_{t-i}) = Var(\epsilon_{t}) + Var(\epsilon_{t-1}) + \cdots + Var(\epsilon_{1}) = t \sigma_{\epsilon}^{2}
$$

<br>

From this it is easy to see that this is a non-stationary process: the variance grows in proportion to time and is unbounded as $t \to \infty$. This translates to changes in the process being unpredictable based on the information known up to that time ($I_{t-1}$); the best forecast of $y_{t}$ is simply $\mu + y_{t-1}$. This is a good starting place for a model of informationally efficient prices, since an efficient price is one that incorporates all available information up to that point in time.
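
The variance result can be checked numerically. The following is a minimal sketch, not part of the original notes: the seed, the number of paths `n_paths`, the path length `T`, and $\sigma_{\epsilon} = 1$ are all arbitrary illustrative choices. It simulates many random walks with $\mu = 0$ and $y_{0} = 0$ and compares the variance across paths at a few dates with $t \sigma_{\epsilon}^{2}$.

```julia
using Random, Statistics

Random.seed!(5330)          # arbitrary seed, only for reproducibility

n_paths = 20_000            # number of simulated random walks (illustrative)
T = 100                     # length of each path (illustrative)
σ = 1.0                     # standard deviation of the shocks (illustrative)

# Each column is one random walk with μ = 0 and y_0 = 0
paths = cumsum(σ .* randn(T, n_paths), dims=1)

# Variance across paths at a few dates; theory says Var(y_t) = t σ²
for t in (10, 50, 100)
    println("t = ", t, ": sample variance = ", round(var(paths[t, :]), digits=2),
            ", theory = ", t * σ^2)
end
```

The sample variances should track the theoretical values closely, with some simulation noise.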

<br>

__NB:__ Samuelson's paper: [Proof That Properly Anticipated Prices Fluctuate Randomly](https://www.gyc.com.sg/files/p_samuelson-proof.pdf)

<br>

We can simulate this process as follows (setting $\mu = 0$ for convenience):

<br>

In [None]:
y = cumsum(randn(52*5))
StatsPlots.plot(y, color="orange", lw = 2.5, grid = true)

### __Weak Stationarity__

<br>

In the time series literature a process is known as [weakly stationary](https://en.wikipedia.org/wiki/Stationary_process#Weak_or_wide-sense_stationarity) if its mean and variance are constant over time and its autocovariance depends only on the lag between two observations, not on the date $t$.

<br>

A few notes:

* Clearly the random walk process is __NOT__ weakly stationary.
* A weakly stationary time series process will exhibit mean reversion.
* Mean reversion in a time series implies some predictability in the process.

<br>

It makes sense that informationally efficient prices should behave as a random walk model (with possible other extensions).


<br>
<br>

The random walk model is a special case of the AR(1) model:

<br>

$$
y_{t} = \mu + \phi y_{t-1} + \epsilon_{t}
$$

<br>

It won't be shown here, but a technical requirement for the AR(1) model to be weakly stationary is $|\phi| < 1$. For the random walk model $\phi = 1$, thus the alternative name ___unit root___.
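
<br>

Though a full proof is beyond these notes, a brief sketch conveys the idea. Assume (an assumption not stated above) that the process started in the infinite past, so that recursive substitution can be carried on indefinitely:

$$
\begin{aligned}
y_{t}      &= \mu \sum\limits_{i=0}^{\infty} \phi^{i} + \sum\limits_{i=0}^{\infty} \phi^{i} \epsilon_{t-i} \\
E(y_{t})   &= \frac{\mu}{1 - \phi} \\
Var(y_{t}) &= \sigma_{\epsilon}^{2} \sum\limits_{i=0}^{\infty} \phi^{2i} = \frac{\sigma_{\epsilon}^{2}}{1 - \phi^{2}}
\end{aligned}
$$

Both geometric series converge only when $|\phi| < 1$, in which case the mean and variance are finite constants. At $\phi = 1$ the sums diverge, which is the random walk case.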

<br>
<br>

### __Some Notation__

<br>

A random walk model is also known as a unit-root non-stationary process. In the literature this is often denoted as $y_{t} \sim I(1)$. We state this as: _"the process_ $y_{t}$ _is_ ___integrated of order one___."

<br>
<br>

We can transform an $I(1)$ process into a stationary process by ___first differencing___ it, like so (with $\mu = 0$):

<br>

$$
\begin{aligned}
y_{t} - y_{t-1} &= y_{t-1} - y_{t-1} + \epsilon_{t} \\
y_{t} - y_{t-1} &= \epsilon_{t} \\
\Delta y_{t}    &= \epsilon_{t}
\end{aligned}
$$

<br>
<br>

Since $\Delta y_{t}$ is now weakly stationary, we write $\Delta y_{t} \sim I(0)$, i.e. "$\Delta y_{t}$ _is_ ___integrated of order zero.___"
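
<br>

In Julia, first differencing is available through the built-in `diff` function. A minimal sketch (the seed and sample size are arbitrary choices) showing that differencing a simulated random walk recovers the underlying shocks:

```julia
using Random

Random.seed!(42)             # arbitrary seed for reproducibility

ϵ = randn(260)               # I(0) shocks
y_rw = cumsum(ϵ)             # I(1) random walk built from the shocks (μ = 0)

Δy = diff(y_rw)              # first difference: y[t] - y[t-1] for t = 2, …, 260

# Differencing undoes the cumulative sum, so Δy should match the shocks
# up to floating-point rounding (the first shock is lost because y_0 is
# not in the sample).
println(Δy ≈ ϵ[2:end])
println(length(Δy))
```

Note that the differenced series is one observation shorter than the original.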

<br>

## __Spurious Regression__

<br>

It is important to understand the properties of unit root processes, because they can be problematic to work with in applying econometrics to finance. 

<br>

For example, there is a well-known problem of ___spurious regression___ when one unit root process is regressed on an independent unit root process:

<br>

$$
y_{t} = \alpha + \beta x_{t} + u_{t}, \quad u_{t} \sim N(0, \sigma_{u}^{2})
$$

<br>

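This regression is not valid: under independence the true $\beta$ is zero, so the error term $u_{t}$ inherits the unit root of $y_{t}$. Its variance grows with time (recall that $Var(y_{t}) = t\sigma_{\epsilon}^{2}$) and it is highly serially correlated, so the homoscedasticity and no-autocorrelation assumptions behind the usual OLS inference are violated.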

<br>
<br>

This can be easily demonstrated by a simple Monte Carlo study as follows:

<br>

In [None]:
M = 10_000
N = 52 * 5
β̂ = zeros(M)
R² = zeros(M);
tstats = zeros(M);

In [None]:
## Monte Carlo simulation loop
for i in 1:M
    y = cumsum(randn(N))
    x = cumsum(randn(N))
    data = DataFrame(Y=y, X=x)
    reg = GLM.lm(@formula(Y ~ X), data);
    β̂[i] = coef(reg)[2]
    R²[i] = r2(reg)
    tstats[i] = coef(reg)[2] / stderror(reg)[2]
end

### __The Histogram for $\hat{\beta}$__

<br>


Let's first take a look at the histogram just for the $\hat{\beta}$'s. 

<br>
<br>

In [None]:
StatsPlots.histogram(β̂, bins=100)

<br>

The distribution is centered at zero, but it has pretty thick tails! Way thicker than they should be. Since the two processes are independent, we should basically see the estimates piled up tightly around zero!

<br>

### __The Histogram for the $T$-Statistic__

<br>

For each repetition of the simulation we conduct the following hypothesis test:

<br>

$$
\begin{aligned}
H_{0}: & \beta = 0   \\
       &             \\
H_{a}: & \beta \ne 0
\end{aligned}
$$

<br>

where the test statistic is the $t$-statistic:

<br>

$$
t = \frac{\hat{\beta}}{se(\hat{\beta})}
$$

<br>

Recall that the two-tailed critical value is about $2$ (more precisely $1.96$) for a $95\%$ confidence level. So one quick check will be to see how many of the simulations incorrectly reject the null hypothesis.

<br>

__NB:__ recall that we need to look at the absolute value of the $t$-statistics to account for the two-tailed nature of the test.

<br>

In [None]:
proportion = sum((abs.(tstats) .> 2.0)) / M

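Whoa! A ___whopping___ $85\%$ or so of the repetitions falsely reject the null hypothesis (the exact figure varies from run to run), when a valid test would do so only about $5\%$ of the time. Clearly this is a problem.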

<br>

Let's see the histogram.

<br>

In [None]:
StatsPlots.histogram(tstats, bins=100)

<br>

__Those are some awfully thick tails!!!__

<br>

### __The Histogram for $R^{2}$__

In [None]:
StatsPlots.histogram(R², bins=100)

<br>

We can also see from the histogram of the $R^{2}$ that there are some extremely high values even though we know that the processes are independent!
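
<br>

Given the first-differencing result from earlier in these notes, one standard remedy is to run the regression in differences rather than in levels. The following is a hedged sketch of that comparison, not part of the original simulation. The helper names `slope_tstat` and `rejection_rates` are hypothetical, OLS is done by hand with the backslash operator rather than with `GLM`, and the number of repetitions is reduced to keep it quick.

```julia
using Random, Statistics

Random.seed!(5330)                        # arbitrary seed for reproducibility

# t-statistic on the slope of a simple OLS regression of y on x (with intercept).
# `slope_tstat` is a hypothetical helper written for this sketch.
function slope_tstat(y, x)
    n = length(y)
    X = hcat(ones(n), x)
    b = X \ y                             # OLS coefficients
    e = y - X * b                         # residuals
    s2 = sum(abs2, e) / (n - 2)           # residual variance
    se = sqrt(s2 * inv(X' * X)[2, 2])     # standard error of the slope
    return b[2] / se
end

# Share of repetitions with |t| > crit, in levels and in first differences
function rejection_rates(n_sims, n_obs; crit=1.96)
    t_levels = zeros(n_sims)
    t_diffs = zeros(n_sims)
    for i in 1:n_sims
        y = cumsum(randn(n_obs))          # independent random walks
        x = cumsum(randn(n_obs))
        t_levels[i] = slope_tstat(y, x)               # I(1) on I(1)
        t_diffs[i] = slope_tstat(diff(y), diff(x))    # I(0) on I(0)
    end
    return mean(abs.(t_levels) .> crit), mean(abs.(t_diffs) .> crit)
end

rate_levels, rate_diffs = rejection_rates(2_000, 52 * 5)
println("rejection rate in levels:      ", rate_levels)
println("rejection rate in differences: ", rate_diffs)
```

The rejection rate in differences should land near the nominal $5\%$, while the rate in levels stays far above it.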

<br>

<br>

## __Testing for Unit Roots__

<br>

### __The Dickey-Fuller Test for Unit Roots__

<br>

We can test for unit roots in a time series process with the so-called ___Dickey-Fuller Test___, named for the statisticians who invented it. 

<br>

It would seem natural to test the following hypothesis: 

<br>

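$$
\begin{aligned}
H_{0}: \phi = 1 \\
H_{a}: \phi < 1
\end{aligned}
$$

for the regression: $y_{t} = \phi y_{t-1} + \epsilon_{t}$. However, because $y_{t}$ is non-stationary under the null hypothesis, the usual sampling theory for this regression breaks down (the same problem that underlies spurious regression), so we cannot conduct this direct test in the standard way.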

<br>

D-F had the bright idea to transform the model to render it amenable to such testing. We start by subtracting $y_{t-1}$ from both sides:

<br>

$$
\begin{aligned}
y_{t} - y_{t-1} &= \phi y_{t-1} - y_{t-1} + \epsilon_{t} \\
\Delta y_{t}    &= (\phi - 1) y_{t-1} + \epsilon_{t} \\
\Delta y_{t}    &= \theta y_{t-1} + \epsilon_{t}
\end{aligned}
$$

<br>

where $\theta = \phi - 1$. With this transformation we can now conduct the test:

<br>

$$
\begin{aligned}
H_{0}: \theta = 0 \\
H_{a}: \theta < 0
\end{aligned}
$$

<br>

When $\phi = 1$ we have $\theta = 0$, and the left-hand side satisfies $\Delta y_{t} \sim I(0)$, so the model is valid under the null hypothesis. We can form the standard $t$-ratio as our test statistic, but D-F showed that the asymptotic sampling distribution of $t$ is no longer the Standard Normal distribution. Instead they provide critical values via a Monte Carlo method. The test is left-tailed: we reject the unit root when the $t$-ratio is sufficiently negative.
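
<br>

To see why the Standard Normal critical values do not apply, the distribution of the $t$-ratio under the null can be simulated directly. This is a minimal sketch in the spirit of the Monte Carlo approach just described, not the original D-F procedure. The helper name `df_tstat`, the seed, and the simulation sizes are illustrative assumptions, and the test regression has no constant and no lags.

```julia
using Random, Statistics

Random.seed!(5330)                             # arbitrary seed for reproducibility

# t-ratio on θ in Δy_t = θ y_{t-1} + ϵ_t (no constant, no lagged differences).
# `df_tstat` is a hypothetical helper name used only in this sketch.
function df_tstat(y)
    Δy = diff(y)
    ylag = y[1:end-1]
    θ̂ = sum(ylag .* Δy) / sum(abs2, ylag)      # OLS slope without intercept
    e = Δy .- θ̂ .* ylag                        # residuals
    s2 = sum(abs2, e) / (length(Δy) - 1)       # residual variance
    return θ̂ / sqrt(s2 / sum(abs2, ylag))
end

n_sims, n_obs = 5_000, 52 * 5

# Simulate the t-ratio when the null (a pure random walk) is true
df_stats = [df_tstat(cumsum(randn(n_obs))) for _ in 1:n_sims]

println("simulated 5% quantile of the t-ratio: ", round(quantile(df_stats, 0.05), digits=2))
println("mean of the t-ratio:                  ", round(mean(df_stats), digits=2))
```

The simulated $5\%$ quantile should come out close to the tabulated Dickey-Fuller value for this case (roughly $-1.95$), which is noticeably more negative than the Standard Normal value of about $-1.64$. The distribution is also shifted to the left of zero.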

<br>

### __The Augmented Dickey-Fuller Test__

<br>

D-F added one extension to the test to account for possible serial correlation in $\Delta y_{t}$. The test regression now becomes:

<br>

$$
\Delta y_{t} = \theta y_{t-1} + \sum\limits_{i=1}^{p} \gamma_{i} \Delta y_{t-i} + \epsilon_{t}
$$

<br>

This tends to make the test more robust to short-term serial correlations in the process.

<br>

We can use the `HypothesisTests` package to conduct the ADF test as follows. See [ADFTest](https://juliastats.org/HypothesisTests.jl/stable/time_series/#Augmented-Dickey-Fuller-test-1) for more details.

<br>

In [None]:
?ADFTest

#### A Simple `ADFTest` Example

Let's do a simple test. We will simulate a pure random walk process. The ADF test should fail to reject the null hypothesis.

<br>

__NB:__ Recall that the null hypothesis is $H_{0}: \theta = 0$, where $\theta = (\phi - 1.0)$ which will be zero when $\phi = 1.0$ (which is the definition of a unit root).

In [None]:
y = cumsum(randn(52 * 5))
test = ADFTest(y, :none, 1)

<br>

Not surprisingly, we __fail to reject__!

<br>

In [None]:
## You can get the $p$-value with the `pvalue` method
pvalue(test)

<br>
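
For contrast, here is a sketch of the same test applied to a series that is stationary by construction. The helper `simulate_ar1`, the value $\phi = 0.5$, and the seed are illustrative assumptions. The call to `ADFTest` uses the same arguments as the example above.

```julia
using HypothesisTests, Random

Random.seed!(5330)                 # arbitrary seed for reproducibility

# Simulate y_t = ϕ y_{t-1} + ϵ_t; `simulate_ar1` is a hypothetical helper.
function simulate_ar1(ϕ, n)
    y = zeros(n)
    ϵ = randn(n)
    y[1] = ϵ[1]
    for t in 2:n
        y[t] = ϕ * y[t-1] + ϵ[t]
    end
    return y
end

y_stationary = simulate_ar1(0.5, 52 * 5)            # |ϕ| < 1, so no unit root
test_stationary = ADFTest(y_stationary, :none, 1)   # same settings as above
println("p-value: ", pvalue(test_stationary))
```

With $\phi$ this far from one, the test should reject the unit-root null with a very small $p$-value.

<br>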

#### __Recapitulation__

With the ADF test we now have a reliable way to detect the presence of unit-root non-stationarity. 

<br>

## __Cointegration and Error-Correction Model Forms__

### __Cointegration__

<br>

Fortunately, there is an upside to unit-root non-stationarity for financial modeling. It turns out that an even stronger relationship can exist when pairs of asset prices are each $I(1)$ but move together over time. This concept is ___cointegration___.

<br>

If there is a linear combination of two processes that are separately $I(1)$ that is itself $I(0)$, we say that the two processes are ___cointegrated___. Cointegration is a stronger concept than mere correlation. It has a causal explanation. I often like to say that (at least in financial applications) ___cointegration is the statistical footprint of an arbitrage relationship.___

<br>

We can test for cointegration using the ADF test developed above. To begin, set up and run the following regression:

<br>

$$
y_{t} = \alpha + \beta x_{t} + u_{t}
$$

<br>

If the variables are cointegrated this is not a spurious regression. In fact, the OLS estimator of $\beta$ has the property of superconsistency: it converges to the true value at rate $T$ rather than the usual $\sqrt{T}$.

<br>

Once the model is estimated, we can form the fitted residuals:

<br>

$$
\begin{aligned}
\hat{u}_{t} &= y_{t} - \hat{\alpha} - \hat{\beta} x_{t} \\
\hat{u}_{t} &= y_{t} - \hat{y}_{t}
\end{aligned}
$$

<br>

We can now submit these fitted residuals to the ADF test as above. 

<br>

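__NB:__ the null hypothesis of the ADF test is that there ___is___ a unit-root. Cointegration exists between $y_{t}$ and $x_{t}$ if there ___is not___ a unit-root in $\hat{u}_{t}$. So we conclude that there is cointegration if we reject the null hypothesis of the ADF test. Strictly speaking, because $\hat{u}_{t}$ comes from an estimated regression, the appropriate critical values are the Engle-Granger ones, which are somewhat more negative than the standard ADF values.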

<br>

We can simulate this as follows.


<br>
<br>

In [None]:
N = 52 * 5
x = cumsum(randn(N))
u = randn(N)
y = 0.22 .+ 0.75 .* x .+ u;

In [None]:
StatsPlots.plot(y, color="orange", lw=2.0, grid=true, title="Simulated Cointegration")
StatsPlots.plot!(x, color="purple", lw=2.0)

In [None]:
data = DataFrame(Y=y, X=x)
reg = GLM.lm(@formula(Y ~ X), data)

In [None]:
ϵ̂ = residuals(reg)
StatsPlots.plot(ϵ̂, color="blue", lw=2.0, grid=true)

In [None]:
## Conduct the ADFTest
test = ADFTest(ϵ̂, :none, 1)

<br>

__NB: We strongly reject the null hypothesis! (meaning that we conclude that these variables are cointegrated)__

<br>

In [None]:
pvalue(test)

## __Error-Correction Models__

<br>

Whenever two (or more) asset prices are cointegrated, we can also write down an error-correction model. That is, cointegration implies an error-correction form. 

<br>

We state the simplest form of the error-correction model as follows:

<br>

$$
\begin{aligned}
\Delta y_{t} &= \lambda (y_{t-1} - \alpha - \beta x_{t-1}) + \nu_{t} \\
             &  \\
\Delta y_{t} &= \lambda (z_{t-1}) + \nu_{t}
\end{aligned}
$$

with $z_{t-1} = (y_{t-1} - \alpha - \beta x_{t-1})$, estimated in practice by the lagged fitted residual $\hat{u}_{t-1}$.

<br>

This model form relates the changes in $y_{t}$, that is $\Delta y_{t}$, to the ___spread___ between $y_{t-1}$ and $x_{t-1}$ in ___levels___. This is a valid time series regression, given that $\Delta y_{t} \sim I(0)$ via first differencing, and $z_{t-1} \sim I(0)$ via cointegration.

<br>

Let's see if we can develop some intuition for this model. Let's start by interpreting the coefficient $\boldsymbol{\lambda}$, which we call the ___error-correction coefficient___. Its value will be such that when there is a large past deviation between $y_{t-1}$ and $x_{t-1}$ (i.e. a large error) it will cause an ___error correction___ in the change in $y_{t}$, or $\Delta y_{t}$. In other words, $\Delta y_{t}$ will adjust based on a lagged error in the spread. For the adjustment to pull $y_{t}$ back toward the equilibrium, we expect $\lambda < 0$. There is now a stationary relationship (i.e. mean-reverting) that exists, and can even be predicted. It's easy to see now why we call this an error-correction model, and also its relationship with cointegration.
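
<br>

The two-step logic described above (estimate the spread, then regress $\Delta y_{t}$ on its lag) can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation. The helper `fit_ecm` is hypothetical, OLS is done by hand with the backslash operator, and the simulated data reuse the cointegrated design from earlier in these notes with new variable names.

```julia
using Random

Random.seed!(5330)                           # arbitrary seed for reproducibility

# Two-step estimate of the simplest error-correction model.
# `fit_ecm` is a hypothetical helper written for this sketch.
function fit_ecm(y, x)
    n = length(y)
    # Step 1: cointegrating regression y_t = α + β x_t + u_t
    α̂, β̂ = hcat(ones(n), x) \ y
    ẑ = y .- α̂ .- β̂ .* x                     # estimated spread
    # Step 2: regress Δy_t on the lagged spread z_{t-1} (no intercept)
    Δy = diff(y)
    zlag = ẑ[1:end-1]
    λ̂ = sum(zlag .* Δy) / sum(abs2, zlag)
    return (α = α̂, β = β̂, λ = λ̂)
end

n_obs = 52 * 5
x_sim = cumsum(randn(n_obs))                     # I(1) process
y_sim = 0.22 .+ 0.75 .* x_sim .+ randn(n_obs)    # cointegrated with x_sim

ecm = fit_ecm(y_sim, x_sim)
println("β̂ = ", round(ecm.β, digits=3), ",  λ̂ = ", round(ecm.λ, digits=3))
```

In this particular simulated design the spread is white noise, so any deviation is corrected within one period and $\hat{\lambda}$ should be close to $-1$. With real price data one would typically expect a negative value smaller in magnitude.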

<br>

* __Q:__ What causes the error-correction in $\Delta y_{t}$? 
* __A:__ In financial markets between related asset prices that are cointegrated, the answer is ___arbitrage___!

<br>

We can now think about the error-correction model in terms of some kind of equilibrium concept. When dynamic market forces are such that related asset prices are temporarily driven apart, an arbitrage relationship between the asset prices acts to restore the spread between the two to a long-run equilibrium level. 

<br>

* __Q:__ What kind of equilibrium concept fits this description?
* __Q:__ Is it a static neo-classical equilibrium?
* __Q:__ Is it more like the neo-Austrian type of equilibrium that has been mentioned in this class?

<br>
<br>

### __A More General Error-Correction Model__

<br>

We can also account for possible short-run variation in $\Delta y_{t}$ by adding lagged terms on the right-hand side of the model as follows (as well as a drift term):

$$
\Delta y_{t} = \mu + \sum\limits_{j=1}^{p} \delta_{j} \Delta y_{t-j} + \lambda z_{t-1} + \nu_{t}
$$

<br>
<br>


### __A Vector Error-Correction Model__

Now we can think in terms of systems of equations, and think about a multivariate relationship between $y_{t}$ and $x_{t}$ called a ___vector error-correction model___ (VECM).

<br>

Here is a VECM(1) model in $y_{t}$ and $x_{t}$:

<br>

$$
\begin{aligned}
\Delta y_{t} &= \mu_{1} + \delta_{1} \Delta y_{t-1} + \gamma_{1} \Delta x_{t-1} + \lambda_{1} z_{t-1} + \nu_{1,t} \\
& \\
\Delta x_{t} &= \mu_{2} + \delta_{2} \Delta y_{t-1} + \gamma_{2} \Delta x_{t-1} + \lambda_{2} z_{t-1} + \nu_{2,t}
\end{aligned}
$$
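
<br>

Since both equations share the same right-hand-side variables, one simple way to estimate this system is OLS equation by equation. The following is a hedged sketch rather than a full VECM routine. The helper `fit_vecm1` is hypothetical, the spread is estimated from a first-step cointegrating regression, and the simulated data follow the cointegrated design used earlier in these notes.

```julia
using Random

Random.seed!(5330)                     # arbitrary seed for reproducibility

# Equation-by-equation OLS for a VECM(1).
# `fit_vecm1` is a hypothetical helper written for this sketch.
function fit_vecm1(y, x)
    n = length(y)
    # Step 1: estimate the spread from the cointegrating regression
    a, b = hcat(ones(n), x) \ y
    z = y .- a .- b .* x
    Δy, Δx = diff(y), diff(x)          # Δy[k] = y[k+1] - y[k]
    # Rows correspond to t = 3, …, n
    lhs = hcat(Δy[2:end], Δx[2:end])   # Δy_t, Δx_t
    rhs = hcat(ones(n - 2),            # constant
               Δy[1:end-1],            # Δy_{t-1}
               Δx[1:end-1],            # Δx_{t-1}
               z[2:end-1])             # z_{t-1}
    return rhs \ lhs                   # 4×2 matrix: one column per equation
end

n_obs = 500
x_sim = cumsum(randn(n_obs))                     # I(1) process
y_sim = 0.22 .+ 0.75 .* x_sim .+ randn(n_obs)    # cointegrated with x_sim

B = fit_vecm1(y_sim, x_sim)
println("λ̂₁ (Δy equation) = ", round(B[4, 1], digits=3))
println("λ̂₂ (Δx equation) = ", round(B[4, 2], digits=3))
```

In this simulated design $x_{t}$ is a pure random walk that ignores the spread, so all of the adjustment happens through $y_{t}$: $\hat{\lambda}_{1}$ should be clearly negative and $\hat{\lambda}_{2}$ close to zero.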

<br>
<br>
<br>