# Intro

* price levels are non-stationary
* correlation is only meaningful on returns or differences, not on price levels
* equilibrium (spread) can also be non-linear --> collinearity
* cointegrating vector $\beta_{coint}$



# Multivariate Regression 

Also known as the 'general linear model' (not to be confused with the generalised linear model), it is a regression on multiple predictor variables (regressors). It is best expressed in matrix form:

$$
Y = X \beta + \epsilon
$$

where $\beta$ is the coefficients vector and $\epsilon$ the residuals vector. 

The OLS method (which minimizes the sum of squared residuals) can be used to estimate the parameters:

$$
\begin{align}
\hat{\beta} = (X^\prime X)^{-1} X^\prime Y \\
\hat{\epsilon} = Y - X \hat{\beta}
\end{align}
$$
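A minimal sketch of the estimator above, assuming NumPy and simulated data (the function name `ols` is hypothetical). It solves the normal equations $(X^\prime X)\hat{\beta} = X^\prime Y$ instead of inverting $X^\prime X$ explicitly, which gives the same estimate with better numerical behaviour:

```python
import numpy as np

def ols(X: np.ndarray, Y: np.ndarray):
    """OLS estimate of beta in Y = X beta + eps.

    X is the T x k design matrix (include a column of ones for an intercept),
    Y is the length-T response vector.
    """
    # Solve (X'X) beta = X'Y; algebraically equal to (X'X)^{-1} X'Y
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    resid = Y - X @ beta_hat  # eps_hat = Y - X beta_hat
    return beta_hat, resid

# Usage: recover an intercept of 1.0 and a slope of 2.0 from noisy simulated data
rng = np.random.default_rng(0)
x = rng.normal(size=500)
X = np.column_stack([np.ones_like(x), x])
Y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=500)
beta_hat, resid = ols(X, Y)
print(beta_hat)  # close to [1.0, 2.0]
```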


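For the multivariate case we want to model a $T \times K$ time series, where $T$ denotes the number of observations and $K$ the number of variables.


* Arrange the data as a dependent matrix $Y$ ($K \times T$, one column per observation), an explanatory matrix $Z$ ($m \times T$, one row per regressor), a residual matrix $\epsilon$ ($K \times T$) and a coefficient matrix $B$ ($K \times m$; for a VAR(1) without constant $m = K$ and $B$ is $K \times K$):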

$$
Y = BZ + \epsilon
$$

* Generalising the OLS method to the multivariate case (equivalently, maximising the multivariate Normal log-likelihood) results in:

$$
\begin{align}
\hat{B} = Y Z^\prime (Z Z^\prime)^{-1} \\
\hat{\epsilon} = Y - \hat{B} Z
\end{align}
$$

* The residual covariance matrix is also generalised to:
$$
\hat{\Sigma} = \frac{1}{T} \sum^T_{t=1} \hat{\epsilon}_t \hat{\epsilon}_t^\prime
$$
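A sketch of the multivariate estimator, assuming NumPy and the $K \times T$ / $m \times T$ layout of $Y$ and $Z$ used above; `multivariate_ols` and the simulated coefficients are illustrative only. $\hat{B}$ is obtained by solving $(ZZ^\prime)\hat{B}^\prime = ZY^\prime$, which is the same as $YZ^\prime(ZZ^\prime)^{-1}$:

```python
import numpy as np

def multivariate_ols(Y: np.ndarray, Z: np.ndarray):
    """Estimate B in Y = B Z + eps.

    Y is K x T (one row per dependent variable), Z is m x T (one row per regressor).
    Returns B_hat (K x m), residuals (K x T) and the ML residual covariance (K x K).
    """
    T = Y.shape[1]
    # (Z Z') B' = Z Y'  <=>  B = Y Z' (Z Z')^{-1}
    B_hat = np.linalg.solve(Z @ Z.T, Z @ Y.T).T
    eps_hat = Y - B_hat @ Z
    # (1/T) sum_t eps_t eps_t'  written as a single matrix product
    Sigma_hat = eps_hat @ eps_hat.T / T
    return B_hat, eps_hat, Sigma_hat

# Usage with simulated data: K = 2 equations, m = 3 regressors (constant + 2 predictors)
rng = np.random.default_rng(0)
T = 1000
B_true = np.array([[0.5, 1.0, -1.0],
                   [0.2, 0.0, 2.0]])
Z = np.vstack([np.ones(T), rng.normal(size=(2, T))])
Y = B_true @ Z + 0.1 * rng.normal(size=(2, T))
B_hat, eps_hat, Sigma_hat = multivariate_ols(Y, Z)
```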

The two main applications of multivariate regression here are Vector Autoregression (VAR(p)), to model returns, which are stationary, and the Error Correction Model (ECM), to model prices, which aren't stationary. Both are summarised below.

## Vector Autoregression (VAR(p))

VAR(p), where $p$ is the lag order, is simply a multivariate regression of a vector time series on its own lagged (past) values:

$$
\begin{align}
Y_t = C + \beta_1 Y_{t-1} + \dots + \beta_{p}Y_{t-p} + \epsilon_t \\
\epsilon_t \sim Normal(0, \Sigma_\epsilon)
\end{align}
$$

$Y_t$ is an $n \times 1$ column vector, $C$ an $n \times 1$ vector of constants and each $\beta_i$ ($i = 1, \dots, p$) an $n \times n$ matrix of coefficients.

Each element of $Y_t$ can be estimated with ordinary least squares (OLS). Computationally, this can be done in one go for all elements by stacking the data into the $Y = BZ + \epsilon$ formulation above, with $Z$ holding a row of ones and the $p$ lagged values. Example code on how to construct the above can be found in script XXXX.py in the Appendix.
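The stacked construction for a VAR(p) with constant can be sketched as follows. This is not the XXXX.py script referred to above; the helpers `var_matrices` and `fit_var` are hypothetical names, NumPy is assumed, and the data come from a simulated VAR(1) with arbitrary coefficients:

```python
import numpy as np

def var_matrices(data: np.ndarray, p: int):
    """Stack a T x n series into Y (n x (T-p)) and Z ((n*p + 1) x (T-p))."""
    T, n = data.shape
    Y = data[p:].T                                # y_t for t = p..T-1
    rows = [np.ones((1, T - p))]                  # row of ones for the constant C
    for lag in range(1, p + 1):
        rows.append(data[p - lag:T - lag].T)      # y_{t-lag}, aligned with y_t
    return Y, np.vstack(rows)

def fit_var(data: np.ndarray, p: int):
    """Estimate C, [beta_1, ..., beta_p] and Sigma of a VAR(p) in one OLS step."""
    n = data.shape[1]
    Y, Z = var_matrices(data, p)
    B_hat = np.linalg.solve(Z @ Z.T, Z @ Y.T).T   # = Y Z' (Z Z')^{-1}
    C_hat = B_hat[:, 0]
    # Columns 1..n belong to beta_1, n+1..2n to beta_2, and so on
    betas = [B_hat[:, 1 + n * i:1 + n * (i + 1)] for i in range(p)]
    eps = Y - B_hat @ Z
    Sigma_hat = eps @ eps.T / Y.shape[1]
    return C_hat, betas, Sigma_hat

# Usage: simulate a bivariate VAR(1) and re-estimate it
rng = np.random.default_rng(1)
C = np.array([0.1, -0.2])
beta1 = np.array([[0.5, 0.1],
                  [0.0, 0.3]])
y = np.zeros((5000, 2))
for t in range(1, len(y)):
    y[t] = C + beta1 @ y[t - 1] + rng.normal(size=2)
C_hat, betas, Sigma_hat = fit_var(y, p=1)
```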

Note that VAR(p) models assume the series are stationary. Non-stationary or trending data can often be made stationary by first-differencing or a similar transformation. Differencing discards the long-run relationship between the levels, however, so for direct analysis of non-stationary (cointegrated) series the ECM described below should be used instead.


### Optimal Lag $p$

The lag order is chosen as the $p$ with the lowest value of an information criterion such as:
* Akaike Information Criterion (AIC):
$$
AIC = \log | \hat{\Sigma} | + \frac{2 k ^\prime}{T}
$$
where $k^\prime = n \times (n\times p+1)$ is the total number of estimated coefficients in VAR(p)
* Bayesian Information Criterion (also known as the Schwarz Criterion):
$$
BIC = \log | \hat{\Sigma} | + \frac{k ^\prime}{T} \log(T)
$$

See more in the [statsmodels](http://statsmodels.sourceforge.net/stable/vector_ar.html#lag-order-selection) documentation.
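A sketch of lag selection with the AIC and BIC formulas above, assuming NumPy, a constant in every equation and simulated VAR(1) data. Each candidate $p$ is fitted on the same last $T - p_{max}$ observations so the criteria are comparable; `select_lag` is a hypothetical helper, not the statsmodels API:

```python
import numpy as np

def select_lag(data: np.ndarray, max_p: int):
    """AIC/BIC for p = 1..max_p on a common sample, plus the minimising lags."""
    T_full, n = data.shape
    T = T_full - max_p                          # same number of observations for every p
    results = {}
    for p in range(1, max_p + 1):
        sub = data[max_p - p:]                  # length T + p, so T rows remain after lagging
        Y = sub[p:].T
        lagged = [sub[p - lag:len(sub) - lag].T for lag in range(1, p + 1)]
        Z = np.vstack([np.ones((1, T))] + lagged)
        B = np.linalg.solve(Z @ Z.T, Z @ Y.T).T
        eps = Y - B @ Z
        Sigma = eps @ eps.T / T
        k = n * (n * p + 1)                     # k' = number of estimated coefficients
        logdet = np.linalg.slogdet(Sigma)[1]    # log |Sigma_hat|
        results[p] = {"AIC": logdet + 2 * k / T,
                      "BIC": logdet + k * np.log(T) / T}
    best_aic = min(results, key=lambda q: results[q]["AIC"])
    best_bic = min(results, key=lambda q: results[q]["BIC"])
    return results, best_aic, best_bic

# Usage: data simulated from a VAR(1), candidate lags 1..4
rng = np.random.default_rng(1)
beta1 = np.array([[0.5, 0.1],
                  [0.0, 0.3]])
y = np.zeros((5000, 2))
for t in range(1, len(y)):
    y[t] = beta1 @ y[t - 1] + rng.normal(size=2)
results, best_aic, best_bic = select_lag(y, max_p=4)
```

With a long sample, BIC's heavier penalty usually recovers the true order, while AIC is more prone to choosing a larger $p$.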

### Stability Condition

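A VAR(p) is stable (and hence stationary) if all eigenvalues $\lambda$ of its coefficient matrix lie inside the unit circle, $|\lambda| < 1$. For a VAR(1) the eigenvalues are the roots of:

$$
| \lambda I - \beta_1 | = 0
$$

For $p > 1$ the same condition is applied to the $np \times np$ companion matrix that stacks $\beta_1, \dots, \beta_p$, not to each $\beta_i$ separately.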
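The companion-matrix check can be sketched as follows, assuming NumPy (`is_stable` is a hypothetical helper). The usage example shows why checking each $\beta_i$ on its own is not enough: $\beta_1 = 0.6 I$ and $\beta_2 = 0.5 I$ each have eigenvalues inside the unit circle, yet the VAR(2) is unstable because $\lambda^2 - 0.6\lambda - 0.5 = 0$ has a root near $1.068$:

```python
import numpy as np

def is_stable(betas):
    """Stability of a VAR(p) from the eigenvalues of its companion matrix.

    betas is a list [beta_1, ..., beta_p] of n x n coefficient matrices.
    """
    p = len(betas)
    n = betas[0].shape[0]
    companion = np.zeros((n * p, n * p))
    companion[:n, :] = np.hstack(betas)           # first block row: beta_1 ... beta_p
    companion[n:, :-n] = np.eye(n * (p - 1))      # identity blocks shift the lags down
    eigvals = np.linalg.eigvals(companion)
    return bool(np.all(np.abs(eigvals) < 1)), eigvals

# Usage
I2 = np.eye(2)
stable_var1, _ = is_stable([0.5 * I2])                      # True
stable_var2, eig_var2 = is_stable([0.6 * I2, 0.5 * I2])     # False, max |lambda| ~ 1.068
```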

[comment]: <> (The VAR system satisfies the stability condition if all eigenvalues solving $| \lambda I - \beta | = 0$ lie inside the unit circle. If $p>1$, the check applies to the companion matrix stacking $A_1, \dots, A_p$, not to each $A_p$ separately.)

## Error Correction Model (ECM)

The familiar linear regression:

$$
y_t = a + b x_t
$$

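is only suitable to model _stationary_ variables; regressing one non-stationary series on another can give spurious results unless the two are cointegrated. In general stock prices aren't stationary but their differences (returns) are. If $y_t$ and $x_t$ are cointegrated, we can go from the above model to one for returns:

$$
\Delta y_t = \phi \Delta x_t - (1-\alpha) e_{t-1} + \epsilon_t
$$

where $e_{t-1} = y_{t-1} - a - b x_{t-1}$ is the previous deviation from equilibrium. This is known as the Error Correction Model (ECM).

In practice $e_{t-1}$ is replaced by the estimated spread $\hat{e}_{t-1}$ (see the Engle-Granger procedure below) and the ECM is estimated by OLS:
$$
\Delta y_t = \phi \Delta x_t - (1-\alpha) \hat{e}_{t-1} + \epsilon_t
$$

The coefficient $(1-\alpha)$ measures the speed of correction towards equilibrium, and its significance supports the existence of a stationary spread; $\phi$ captures the short-run co-movement of the returns.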
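The two-step estimation of the ECM can be sketched on simulated data, assuming NumPy. The parameters $a = 1$, $b = 2$ and $\alpha = 0.5$ are arbitrary choices for illustration; with $u_t = \alpha u_{t-1} + \eta_t$ the true ECM is $\Delta y_t = b \Delta x_t - (1-\alpha) u_{t-1} + \eta_t$, so $\hat{\phi}$ should come out close to 2 and $\hat{\alpha}$ close to 0.5:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 2000
x = np.cumsum(rng.normal(size=T))            # non-stationary random walk
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.normal()     # stationary deviation, alpha = 0.5
y = 1.0 + 2.0 * x + u                        # cointegrated with a = 1, b = 2

# Step 1: long-run relation y_t = a + b x_t and the estimated spread e_hat
X = np.column_stack([np.ones(T), x])
a_hat, b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e_hat = y - a_hat - b_hat * x

# Step 2: ECM  dy_t = phi dx_t - (1 - alpha) e_hat_{t-1} + eps_t
dy, dx = np.diff(y), np.diff(x)
W = np.column_stack([dx, e_hat[:-1]])        # e_hat[:-1] is e_hat_{t-1} for each dy_t
phi_hat, ec_coef = np.linalg.lstsq(W, dy, rcond=None)[0]
alpha_hat = 1 + ec_coef                      # ec_coef estimates -(1 - alpha)
```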

## Other TS stuff

* statsmodels: descriptive statistics for time series, e.g. autocorrelation and partial autocorrelation functions, periodogram

# Cointegration

## Engle-Granger Procedure

When two series are cointegrated, a linear combination with weights $\beta_c^\prime$ exists that cancels their common stochastic trend and produces a stationary spread $e_t = \beta_c^\prime Y_t$. The parameters of this linear combination may be estimated by linear (multivariate) regression of one series on the other(s):
$$
\hat{e}_t = y_t - \hat{b} x_t - \hat{a}
$$

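We can then test $\hat{e}_t$ for a unit root using ADF or a similar test to confirm the spread is stationary. Because $\hat{e}_t$ comes from an estimated regression, the critical values are more negative than the standard ADF ones (Engle-Granger/MacKinnon tables).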
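A minimal sketch of the Engle-Granger first step followed by a plain Dickey-Fuller regression on the spread (no augmentation lags, for brevity), assuming NumPy and simulated data; `engle_granger` is a hypothetical helper. The returned statistic has to be compared with Engle-Granger critical values, which are not reproduced here:

```python
import numpy as np

def engle_granger(y: np.ndarray, x: np.ndarray):
    """Regress y on x, form e_hat, and return the DF t-statistic of the spread."""
    X = np.column_stack([np.ones(len(x)), x])
    a_hat, b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    e_hat = y - b_hat * x - a_hat
    # DF regression de_t = phi e_{t-1} + eps_t (no constant: OLS residuals have mean zero)
    de, e_lag = np.diff(e_hat), e_hat[:-1]
    phi = (e_lag @ de) / (e_lag @ e_lag)
    resid = de - phi * e_lag
    s2 = resid @ resid / (len(de) - 1)
    se = np.sqrt(s2 / (e_lag @ e_lag))
    return a_hat, b_hat, e_hat, phi / se

# Usage: a cointegrated pair simulated with a = 1, b = 2
rng = np.random.default_rng(5)
T = 2000
x = np.cumsum(rng.normal(size=T))
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u
a_hat, b_hat, e_hat, df_stat = engle_granger(y, x)
```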

## Augmented Dickey-Fuller Test

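The Augmented Dickey-Fuller (ADF) test can be used to see whether a time series has a unit root (is non-stationary). Starting from the AR(1) model $Y_t = \beta Y_{t-1} + \epsilon_t$ and subtracting $Y_{t-1}$ from both sides gives:
$$
\Delta Y_t = \phi Y_{t-1} + \epsilon_t, \qquad \phi = \beta - 1
$$

If $\phi$ is not significantly different from zero, the hypothesis that the time series has a unit root cannot be rejected:
$$
\phi = \beta - 1 = 0 \implies \beta = 1 \implies \Delta Y_t = \epsilon_t
$$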

To keep the test valid when the residuals are serially correlated, we 'augment' the model with lagged differences $\phi_k \Delta Y_{t-k}$, and optionally a constant or a time trend $\phi_t t$.


The $t$-statistic of $\hat{\phi}$ is compared with critical values of the Dickey-Fuller distribution (not Student's $t$). If it is more negative than the critical value, the null hypothesis of a unit root is rejected.

----

In the Dickey-Fuller test the null hypothesis is that the time series has a unit root. With a constant and one lagged difference the test regression is:

$$
\Delta Y_t = C + \phi Y_{t-1} + \phi_1 \Delta Y_{t-1} + \epsilon_t
$$

and if $\phi$ is not significant the unit-root null cannot be rejected.

The general augmented form includes $p$ lagged differences $\Delta Y_{t-k}$:

$$
\Delta Y_t = \phi Y_{t-1} + \sum^p_{k=1} \phi_k \Delta Y_{t-k} + \epsilon_t
$$

An insignificant $\phi$ is consistent with a unit root in $Y_t$, i.e. $\phi = \beta -1 \approx 0 \implies \beta \approx 1$.

The critical value is taken from the empirically tabulated Dickey-Fuller distribution.
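The ADF regression with a constant can be sketched as below, assuming NumPy; `adf_stat` is a hypothetical helper that returns only the $t$-statistic of $\hat{\phi}$, with no p-value or automatic lag choice. For large samples the 5% Dickey-Fuller critical value with a constant is about $-2.86$; a statistic below it rejects the unit root:

```python
import numpy as np

def adf_stat(y: np.ndarray, p: int = 1) -> float:
    """t-statistic of phi in dY_t = C + phi Y_{t-1} + sum_{k=1}^p phi_k dY_{t-k} + eps_t."""
    dy = np.diff(y)                                  # dy[j] = y[j+1] - y[j]
    n = len(dy) - p                                  # usable observations
    target = dy[p:]                                  # dY_t
    cols = [np.ones(n), y[p:-1]]                     # constant C and Y_{t-1}
    cols += [dy[p - k:len(dy) - k] for k in range(1, p + 1)]  # dY_{t-k}
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    s2 = resid @ resid / (n - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])              # coef[1] is phi_hat

# Usage: a random walk (unit root) versus a stationary AR(1) with coefficient 0.5
rng = np.random.default_rng(3)
noise = rng.normal(size=1000)
random_walk = np.cumsum(noise)
ar1 = np.zeros(1000)
for t in range(1, 1000):
    ar1[t] = 0.5 * ar1[t - 1] + noise[t]
print(adf_stat(random_walk, p=1), adf_stat(ar1, p=1))
```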

## Assessing quality of mean-reversion in cointegrated coefficient

## Granger causality (determination of lead variable)

* See statsmodels

## Johansen procedure

MLE for multivariate cointegration on asset price data (levels, not returns)

# Trading strategies

* trading around the spread $e_t$ where estimated weights $\beta_i$ represent the position to take on each stock
* optimised bounds give entry/exit signals, e.g. $\mu_e \pm 1\sigma_e$. $\sigma_e$ can be obtained by fitting the spread to an OU process or by optimisation
* the speed of mean-reversion $\theta$ of the spread gives an idea of profitability over time and can be converted to a half-life (expected position holding time) as $\tilde{\tau} = \ln 2 / \theta$
* the dollar MtM P&L $\Delta e_t = e_t - e_{t-1} $ is independent of its mean $\mu_e$
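A sketch of a simple band rule on the spread, assuming NumPy: enter short when $e_t > \mu_e + k\sigma_e$, long when $e_t < \mu_e - k\sigma_e$, and exit when the spread crosses $\mu_e$. The rule, the helper name `spread_positions` and the toy spread values are hypothetical. The P&L line uses $\Delta e_t$, which, as noted above, does not depend on $\mu_e$:

```python
import numpy as np

def spread_positions(e, mu, sigma, k=1.0):
    """Position in the spread at each t: -1 short, +1 long, 0 flat."""
    pos = np.zeros(len(e), dtype=int)
    current = 0
    for t, value in enumerate(e):
        if current == 0:
            if value > mu + k * sigma:      # spread rich: sell it
                current = -1
            elif value < mu - k * sigma:    # spread cheap: buy it
                current = 1
        elif current == -1 and value <= mu:
            current = 0                     # short exits once the spread reverts to mu
        elif current == 1 and value >= mu:
            current = 0                     # long exits once the spread reverts to mu
        pos[t] = current
    return pos

# Usage on a toy spread
e = np.array([0.0, 1.5, 0.8, -0.1, -1.2, -0.5, 0.2])
pos = spread_positions(e, mu=0.0, sigma=1.0)
print(pos.tolist())  # [0, -1, -1, 0, 1, 1, 0]
# MtM P&L: the position decided at t-1 earns the spread change de_t = e_t - e_{t-1}
pnl = pos[:-1] * np.diff(e)
print(pnl.sum())     # approximately 3.0, up to floating-point rounding
```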


## Fitting to the OU process

The OU process is represented by the SDE:
$$
dY_t = -\theta(Y_t-\mu)dt + \sigma dX_t
$$
where $\theta$ is the speed of mean-reversion, $\mu$ is the equilibrium level, $\sigma$ the diffusion (volatility) and $X_t$ a standard Brownian motion. Its exact solution, sampled at a fixed interval $\tau$, is an AR(1) process with $B = e^{-\theta \tau}$ and $C = \mu (1 - e^{-\theta \tau})$. In terms of the cointegrated spread this can be written as:

$$
e_t = C + B e_{t-1} + \epsilon_t
$$

where $\theta = -\ln B/ \tau$, $\tau$ being the sampling interval (e.g. $1/252$ for daily data), and $\mu_e = C / (1-B)$.

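Trading bounds are commonly defined as $\mu_e \pm \sigma_{eq}$, where $\sigma_{eq}$ is the standard deviation of the stationary distribution of the spread, a scaled version of $\sigma_{OU}$:
$$
\sigma_{eq} = \sigma_{OU}/\sqrt{2 \theta}
$$

with 

$$
\sigma_{OU} = \sqrt{\frac{2 \theta}{1-e^{-2 \theta \tau}} Var[\epsilon_t]}
$$

where $Var[\epsilon_t]$ is the variance of the residuals of the AR(1) regression above.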
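Fitting the AR(1) by OLS and mapping it to the OU parameters can be sketched as follows, assuming NumPy, $0 < \hat{B} < 1$, and a spread simulated from the exact OU discretisation with arbitrary parameters ($\theta = 5$, $\mu = 0.3$, $\sigma = 0.2$, daily data). `fit_ou` is a hypothetical helper; it also returns the half-life $\ln 2 / \theta$ in the same time units as $\tau$:

```python
import numpy as np

def fit_ou(e: np.ndarray, tau: float):
    """Fit e_t = C + B e_{t-1} + eps_t by OLS and convert to OU parameters."""
    X = np.column_stack([np.ones(len(e) - 1), e[:-1]])
    (C, B), *_ = np.linalg.lstsq(X, e[1:], rcond=None)
    eps = e[1:] - (C + B * e[:-1])                     # AR(1) residuals
    theta = -np.log(B) / tau                           # requires 0 < B < 1
    mu = C / (1 - B)
    sigma_ou = np.sqrt(2 * theta / (1 - np.exp(-2 * theta * tau)) * eps.var())
    sigma_eq = sigma_ou / np.sqrt(2 * theta)           # stationary std of the spread
    half_life = np.log(2) / theta
    return theta, mu, sigma_ou, sigma_eq, half_life

# Usage: simulate the exact OU discretisation, then recover its parameters
rng = np.random.default_rng(4)
theta_true, mu_true, sigma_true, tau = 5.0, 0.3, 0.2, 1 / 252
B_true = np.exp(-theta_true * tau)
step_sd = sigma_true * np.sqrt((1 - B_true**2) / (2 * theta_true))
e = np.empty(50_000)
e[0] = mu_true
for t in range(1, len(e)):
    e[t] = mu_true + B_true * (e[t - 1] - mu_true) + step_sd * rng.normal()
theta, mu, sigma_ou, sigma_eq, half_life = fit_ou(e, tau)
```

Trading bounds would then be `mu - sigma_eq` and `mu + sigma_eq`, and the half-life here is in years because $\tau$ is expressed in years.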