# Multivariate Regression

## Vector Autoregression - VAR(p)

A vector autoregression of lag order $p$, VAR(p), is simply a multivariate regression with past values (lags) of the series as inputs:

$$
Y_t = C + A_1 Y_{t-1} + \dots + A_p Y_{t-p} + \epsilon_t
$$

$Y_t$ is an $n \times 1$ column vector of the time series observed at time $t$, $C$ is an $n \times 1$ vector of intercepts and each $A_i$ is an $n \times n$ matrix of coefficients. Each element of $Y_t$ (one equation of the system) is estimated by ordinary least squares (OLS). Computationally, all equations can be estimated in one go using matrix manipulation [ref].

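* Construct the dependent ($Y$, $n \times T$), explanatory ($Z$, $(np+1) \times T$), residual ($\epsilon$, $n \times T$) and coefficient ($B = [C, A_1, \dots, A_p]$, $n \times (np+1)$) matrices for $T = N_{obs}$ observations, where each column of $Z$ is $(1, Y_{t-1}^\prime, \dots, Y_{t-p}^\prime)^\prime$: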

$$
Y = BZ + \epsilon
$$

* Generalising OLS to the multivariate case (equivalently, maximising the multivariate normal log-likelihood) gives:

$$
\begin{align}
\hat{B} &= Y Z^\prime (Z Z^\prime)^{-1} \\
\hat{\epsilon} &= Y - \hat{B} Z
\end{align}
$$

* The residual covariance matrix generalises to:
$$
\hat{\Sigma} = \frac{1}{T} \sum^T_{t=1} \hat{\epsilon}_t \hat{\epsilon}_t^\prime
$$


Example code showing how to construct the above can be found in script XXXX.py in the Appendix.
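Separately from the appendix script, here is a minimal NumPy sketch of the matrix OLS estimator above. The function name `var_ols` and the data layout (rows are time points, columns are series) are assumptions made for illustration; the first $p$ rows are used only as lags, so $T$ here is the number of usable observations.

```python
import numpy as np

def var_ols(data, p):
    """Estimate a VAR(p) with intercept by multivariate OLS (hypothetical helper).

    data: array of shape (T_total, n), one column per series, rows in time order.
    Returns B_hat (n x (n*p + 1), columns ordered [C, A_1, ..., A_p]),
    the residual matrix (n x T) and Sigma_hat (n x n), with T = T_total - p.
    """
    data = np.asarray(data, dtype=float)
    T_total, n = data.shape
    T = T_total - p
    # Dependent matrix Y (n x T): columns are Y_t for the last T time points
    Y = data[p:].T
    # Explanatory matrix Z ((n*p + 1) x T): each column is (1, Y_{t-1}', ..., Y_{t-p}')'
    rows = [np.ones((1, T))]
    for lag in range(1, p + 1):
        rows.append(data[p - lag:T_total - lag].T)
    Z = np.vstack(rows)
    # B_hat = Y Z' (Z Z')^{-1}, computed with solve() instead of an explicit inverse
    B_hat = np.linalg.solve(Z @ Z.T, Z @ Y.T).T
    resid = Y - B_hat @ Z
    # Residual covariance (1/T) * sum_t eps_t eps_t'
    Sigma_hat = resid @ resid.T / T
    return B_hat, resid, Sigma_hat


# Usage: simulate a bivariate VAR(1) and recover its coefficients
rng = np.random.default_rng(0)
c = np.array([0.2, -0.1])
A1 = np.array([[0.5, 0.1],
               [0.0, 0.3]])
y = np.zeros((5000, 2))
for t in range(1, len(y)):
    y[t] = c + A1 @ y[t - 1] + rng.standard_normal(2)

B_hat, resid, Sigma_hat = var_ols(y, p=1)
print(B_hat[:, 0])    # estimate of C
print(B_hat[:, 1:])   # estimate of A_1
```

Solving the normal equations $(ZZ^\prime)\hat{B}^\prime = ZY^\prime$ gives the same $\hat{B}$ as the explicit inverse but is numerically better behaved.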

### Optimal Lag Selection

The optimal lag $p$ is the one with the lowest AIC or BIC, both statistics being constructed from a penalised likelihood (see [statsmodels](http://statsmodels.sourceforge.net/stable/vector_ar.html#lag-order-selection)):
* Akaike Information Criterion (AIC):
$$
AIC = \log | \hat{\Sigma} | + \frac{2 k ^\prime}{T}
$$
where $k^\prime = n \times (n\times p+1)$ is the total number of estimated coefficients in the VAR(p)
* Bayesian Information Criterion (also Schwarz Criterion):
$$
BIC = \log | \hat{\Sigma} | + \frac{k ^\prime}{T} \log(T)
$$
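The selection rule can be sketched in NumPy as below, assuming the $(1/T)$ residual covariance from the OLS fit. The helper names `sigma_mle` and `select_lag` are hypothetical; one design choice worth stating is that every candidate $p$ is fitted on the same $T$ observations (the first `max_p` rows are reserved for lags), since criteria computed on different samples are not comparable.

```python
import numpy as np

def sigma_mle(data, p):
    """(1/T) sum_t e_t e_t' for a VAR(p) with intercept fitted by OLS."""
    T_total, n = data.shape
    T = T_total - p
    Y = data[p:].T
    Z = np.vstack([np.ones((1, T))] +
                  [data[p - lag:T_total - lag].T for lag in range(1, p + 1)])
    B = np.linalg.solve(Z @ Z.T, Z @ Y.T).T
    resid = Y - B @ Z
    return resid @ resid.T / T

def select_lag(data, max_p):
    """Return (p_aic, p_bic, table) minimising AIC and BIC over p = 1..max_p."""
    data = np.asarray(data, dtype=float)
    T_total, n = data.shape
    T = T_total - max_p                       # common estimation sample for every p
    table = []
    for p in range(1, max_p + 1):
        Sigma = sigma_mle(data[max_p - p:], p)  # same dependent rows for every p
        k = n * (n * p + 1)                     # k' = number of estimated coefficients
        _, logdet = np.linalg.slogdet(Sigma)    # log|Sigma_hat|
        aic = logdet + 2 * k / T
        bic = logdet + k * np.log(T) / T
        table.append((p, aic, bic))
    p_aic = min(table, key=lambda row: row[1])[0]
    p_bic = min(table, key=lambda row: row[2])[0]
    return p_aic, p_bic, table


# Usage: a VAR(2) whose second lag matters
rng = np.random.default_rng(1)
A1 = np.diag([0.4, 0.4])
A2 = np.diag([0.3, -0.3])
y = np.zeros((3000, 2))
for t in range(2, len(y)):
    y[t] = A1 @ y[t - 1] + A2 @ y[t - 2] + rng.standard_normal(2)

p_aic, p_bic, table = select_lag(y, max_p=6)
print(p_aic, p_bic)
```

BIC penalises each extra lag more heavily than AIC (by $\log T$ instead of 2 per coefficient), so it tends to choose the smaller model.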

### Stability Condition

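Stability requires all eigenvalues $\lambda$ of the coefficient matrix $A$ to lie inside the unit circle ($|\lambda| < 1$), where the eigenvalues solve $| \lambda I - A | = 0$. For $p = 1$, $A = A_1$. If $p>1$, the lag matrices cannot be checked one at a time; the condition is applied to the $np \times np$ companion matrix, which rewrites the VAR(p) as a VAR(1):

$$
A = \begin{pmatrix}
A_1 & A_2 & \cdots & A_{p-1} & A_p \\
I_n & 0 & \cdots & 0 & 0 \\
0 & I_n & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & I_n & 0
\end{pmatrix}
$$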
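A short sketch of the stability check through the companion form, using NumPy's eigenvalue routine; the helper names are hypothetical. The usage example is a univariate AR(2) with $\phi_1 = \phi_2 = 0.6$: each lag coefficient is below 1 in absolute value, yet the system is unstable because $\phi_1 + \phi_2 > 1$.

```python
import numpy as np

def companion(A_list):
    """Stack the n x n lag matrices [A_1, ..., A_p] into the np x np companion matrix."""
    p = len(A_list)
    n = A_list[0].shape[0]
    F = np.zeros((n * p, n * p))
    F[:n, :] = np.hstack(A_list)            # first block row: A_1 ... A_p
    if p > 1:
        F[n:, :-n] = np.eye(n * (p - 1))    # identity blocks below the first block row
    return F

def max_modulus(A_list):
    """Largest eigenvalue modulus of the companion matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(companion(A_list)))))

def is_stable(A_list):
    """True if all companion-matrix eigenvalues lie strictly inside the unit circle."""
    return max_modulus(A_list) < 1


# Usage: univariate AR(2) with phi_1 = phi_2 = 0.6
A_list = [np.array([[0.6]]), np.array([[0.6]])]
print(max_modulus(A_list))   # about 1.13: unstable, although each |A_i| < 1
```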

# Cointegration

## Engle-Granger Procedure

When two series are cointegrated, a linear combination of them exists that cancels the common stochastic trend and produces a stationary spread $e_t$. The parameters of this linear combination may be estimated using linear (multivariate) regression:
$$
\hat{e}_t = y_t - \hat{b} x_t - \hat{a}
$$

We can then test $\hat{e}_t$ for a unit root using ADF or a similar test to confirm the spread is stationary.
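The first Engle-Granger step is a plain OLS regression with a constant; a NumPy sketch follows, with the hypothetical helper name `engle_granger_spread`. When $\hat{e}_t$ is then tested for a unit root, the critical values should come from Engle-Granger (MacKinnon) tables rather than the standard Dickey-Fuller ones, because $\hat{a}$ and $\hat{b}$ are estimated; `statsmodels.tsa.stattools.coint` implements the full augmented Engle-Granger test.

```python
import numpy as np

def engle_granger_spread(y, x):
    """Engle-Granger step 1: OLS of y_t on a constant and x_t.

    Returns (a_hat, b_hat, e_hat) with e_hat = y - b_hat * x - a_hat.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])      # regressors [1, x_t]
    (a_hat, b_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
    e_hat = y - b_hat * x - a_hat                  # estimated spread
    return a_hat, b_hat, e_hat


# Usage: x is a random walk, y = 0.5 + 2 x + stationary AR(1) noise
rng = np.random.default_rng(2)
x = np.cumsum(rng.standard_normal(2000))
u = np.zeros(2000)
for t in range(1, 2000):
    u[t] = 0.5 * u[t - 1] + rng.standard_normal()
y = 0.5 + 2.0 * x + u

a_hat, b_hat, e_hat = engle_granger_spread(y, x)
print(a_hat, b_hat)
```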

## Augmented Dickey-Fuller (ADF) Test

The ADF test extends the Dickey-Fuller test by adding lagged differences $\Delta y_{t-k}$, which absorb serial correlation in the residuals:

$$
\Delta y_t = \phi y_{t-1} + \sum^p_{k=1} \phi_k \Delta y_{t-k} + \epsilon_t
$$

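Here $\phi = \beta - 1$, where $\beta$ is the coefficient on $y_{t-1}$ in the levels form $y_t = \beta y_{t-1} + \dots$. An insignificant $\phi$ (i.e. $\phi \approx 0 \to \beta \approx 1$) means the unit-root null cannot be rejected, so $y_t$ is non-stationary.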

The critical value is taken from the empirically tabulated Dickey-Fuller distribution.
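A NumPy sketch of the ADF regression above, returning the t-statistic of $\phi$ to be compared with the tabulated critical values. The helper `adf_tstat` is hypothetical; by default it has no constant, matching the equation as written, and `constant=True` adds an intercept (which changes the applicable critical values). `statsmodels.tsa.stattools.adfuller` is a full implementation with automatic lag selection and p-values.

```python
import numpy as np

def adf_tstat(y, p=1, constant=False):
    """t-statistic of phi in  dy_t = [c +] phi*y_{t-1} + sum_{k=1..p} phi_k*dy_{t-k} + eps_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                          # dy[i] = y[i+1] - y[i]
    T = len(dy) - p                          # number of usable observations
    cols = [y[p:-1]]                         # y_{t-1}
    for k in range(1, p + 1):
        cols.append(dy[p - k:len(dy) - k])   # lagged differences dy_{t-k}
    if constant:
        cols.append(np.ones(T))
    X = np.column_stack(cols)
    target = dy[p:]                          # dy_t
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    s2 = resid @ resid / (T - X.shape[1])    # residual variance
    se_phi = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return beta[0] / se_phi


# Usage: a random walk (unit root) versus a stationary AR(1)
rng = np.random.default_rng(3)
walk = np.cumsum(rng.standard_normal(1000))
ar1 = np.zeros(1000)
for t in range(1, 1000):
    ar1[t] = 0.5 * ar1[t - 1] + rng.standard_normal()
print(adf_tstat(walk), adf_tstat(ar1))
```

The stationary series gives a large negative statistic, far beyond the tabulated critical values, while the random walk's statistic stays close to zero.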

### ECM

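In addition, $\hat{e}_{t-1}$ can be used as an error-correction term to estimate the speed of mean-reversion, $(1-\alpha)$, and test its significance via the regression below, where $\phi$ captures the short-run response of $\Delta y_t$ to $\Delta x_t$:

$$
\Delta y_t = \phi \Delta x_t - (1-\alpha) \hat{e}_{t-1} + \epsilon_t
$$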
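A sketch of the ECM regression by OLS, with no intercept as in the equation above; `fit_ecm` is a hypothetical name. In the usage example the spread is an AR(1) with coefficient 0.8, so $\Delta y_t = 2\Delta x_t + (0.8 - 1)e_{t-1} + u_t$ and the expected speed is $1-\alpha = 0.2$.

```python
import numpy as np

def fit_ecm(y, x, e_hat):
    """OLS fit of  dy_t = phi*dx_t - (1 - alpha)*e_hat_{t-1} + eps_t.

    Returns (phi, speed, t_speed): speed = 1 - alpha, and t_speed is the
    t-statistic of the coefficient on e_hat_{t-1} (negative when significant).
    """
    y, x, e_hat = (np.asarray(v, dtype=float) for v in (y, x, e_hat))
    dy, dx = np.diff(y), np.diff(x)
    X = np.column_stack([dx, e_hat[:-1]])          # regressors [dx_t, e_hat_{t-1}]
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    s2 = resid @ resid / (len(dy) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    phi, gamma = coef                              # gamma estimates -(1 - alpha)
    return phi, -gamma, gamma / se[1]


# Usage: y = 2 x + e, with x a random walk and e an AR(1) spread
rng = np.random.default_rng(4)
n = 2000
x = np.cumsum(rng.standard_normal(n))
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + rng.standard_normal()
y = 2.0 * x + e

phi, speed, t_speed = fit_ecm(y, x, e)
print(phi, speed, t_speed)
```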

## Assessing quality of mean-reversion in cointegrated coefficient

## Determination of lead variable (Granger causality)

## Johansen procedure

MLE for multivariate cointegration on asset price data (levels, not returns)

# Trading strategies

* trading around the spread $e_t$ where estimated weights $\beta_i$ represent the position to take on each stock
* optimised bounds give entry/exit signals, e.g. $\mu_e \pm 1\sigma_e$; $\sigma_e$ can be obtained by fitting to an OU process or by optimisation
* the speed of mean-reversion $\theta$ of the spread indicates profitability over time and converts to a half-life (expected position holding time) of $\tilde{\tau} = \ln 2 / \theta$
* the dollar mark-to-market (MtM) P&L $\Delta e_t = e_t - e_{t-1}$ does not depend on the spread mean $\mu_e$
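The points above can be sketched as a simple band rule. All helper names are hypothetical, and the exit rule (close once the spread crosses back through $\mu_e$) is an assumption, since only the entry bounds are specified here; P&L is per unit of spread and ignores transaction costs. The last check in the usage shows that shifting both the spread and $\mu_e$ by a constant leaves positions and P&L unchanged.

```python
import numpy as np

def half_life(theta):
    """Half-life ln(2)/theta, in the time units of theta (years if tau was in years)."""
    return np.log(2) / theta

def band_positions(e, mu, sigma, k=1.0):
    """Position in the spread (+1 long, -1 short, 0 flat) from bands mu +/- k*sigma.

    Entry: short above mu + k*sigma, long below mu - k*sigma.
    Exit (assumed rule): close once the spread crosses back through mu.
    """
    pos = np.zeros(len(e), dtype=int)
    current = 0
    for t, value in enumerate(e):
        if current == 0:
            if value > mu + k * sigma:
                current = -1
            elif value < mu - k * sigma:
                current = 1
        elif (current == -1 and value <= mu) or (current == 1 and value >= mu):
            current = 0
        pos[t] = current
    return pos

def spread_pnl(e, pos):
    """Per-period MtM P&L of holding pos[t-1] units of the spread over (t-1, t]."""
    return pos[:-1] * np.diff(e)


e = np.array([0.0, 1.5, 0.8, -0.1, -1.2, -0.5, 0.2])
pos = band_positions(e, mu=0.0, sigma=1.0)
print(pos)                          # positions 0, -1, -1, 0, 1, 1, 0
print(spread_pnl(e, pos).sum())     # about 3.0
shifted = band_positions(e + 5.0, mu=5.0, sigma=1.0)
print(np.array_equal(shifted, pos)) # True: the rule and P&L only use e - mu and diff(e)
```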


## Fitting to the OU process

The OU process is represented by the SDE:
$$
dY_t = -\theta(Y_t-\mu)dt + \sigma dX_t
$$
where $\theta$ is the speed of mean-reversion, $\mu$ is the equilibrium level, $\sigma$ the diffusion coefficient and $X_t$ a Brownian motion. The SDE has an analytical solution which, sampled at fixed intervals $\tau$, is an AR(1) process combining a mean-reverting and an autoregressive term. In terms of the cointegrated spread this can be written as:

$$
e_t = C + B e_{t-1} + \epsilon_t
$$

where $\theta = -\ln B/ \tau$, with $\tau$ the sampling interval in years (e.g. $\tau = 1/252$ for daily data), and $\mu_e = C / (1-B)$.
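A minimal sketch of the AR(1) fit and its mapping to OU parameters; `fit_ou_ar1` is a hypothetical name, and it raises an error when $B \notin (0, 1)$ because $\theta = -\ln B/\tau$ is then not a positive mean-reversion speed. The usage example is noise-free so the expected values are exact: $e_t = 1 + 0.9^t$ satisfies $e_t = 0.1 + 0.9\,e_{t-1}$, giving $\mu_e = 0.1/(1-0.9) = 1$.

```python
import numpy as np

def fit_ou_ar1(e, tau=1/252):
    """Fit e_t = C + B e_{t-1} + eps_t by OLS and map (C, B) to OU parameters.

    tau is the sampling interval in years (1/252 for daily data).
    Returns C, B, theta = -ln(B)/tau, mu_e = C/(1 - B) and the residuals eps_t.
    """
    e = np.asarray(e, dtype=float)
    X = np.column_stack([np.ones(len(e) - 1), e[:-1]])   # regressors [1, e_{t-1}]
    (C, B), *_ = np.linalg.lstsq(X, e[1:], rcond=None)
    if not 0 < B < 1:
        raise ValueError(f"B = {B:.4f} is outside (0, 1); spread is not mean-reverting")
    theta = -np.log(B) / tau
    mu_e = C / (1 - B)
    resid = e[1:] - C - B * e[:-1]
    return C, B, theta, mu_e, resid


# Usage: a noise-free spread e_t = 1 + 0.9**t
e = 1.0 + 0.9 ** np.arange(50)
C, B, theta, mu_e, resid = fit_ou_ar1(e)
print(C, B, theta, mu_e)   # theta is about 26.55 per year for daily data
```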

Trading bounds are commonly defined as $\mu_e \pm \sigma_{eq}$, where $\sigma_{eq}$, the standard deviation of the spread's stationary (equilibrium) distribution, is a scaled version of $\sigma_{OU}$:
$$
\sigma_{eq} = \sigma_{OU}/\sqrt{2 \theta}
$$

with $\sigma_{OU}$ recovered from the variance of the AR(1) residuals $\epsilon_t$:

$$
\sigma_{OU} = \sqrt{\frac{2 \theta}{1-e^{-2 \theta \tau}} Var[\epsilon_t]}
$$
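As a consistency check, $e^{-2\theta\tau} = B^2$, so combining the two expressions with the residual variance gives

$$
\sigma_{eq}^2 = \frac{\sigma_{OU}^2}{2\theta} = \frac{Var[\epsilon_t]}{1-B^2},
$$

which is the stationary variance of the AR(1) process. A small sketch computing the bounds from the AR(1) outputs follows; `ou_bounds` is a hypothetical helper that takes $B$, the residual variance and $\mu_e$ as inputs.

```python
import numpy as np

def ou_bounds(B, resid_var, mu_e, tau=1/252):
    """Return sigma_OU, sigma_eq and the bounds (mu_e - sigma_eq, mu_e + sigma_eq).

    B: AR(1) coefficient of e_t on e_{t-1} (0 < B < 1)
    resid_var: variance of the AR(1) residuals eps_t
    tau: sampling interval in years
    """
    theta = -np.log(B) / tau
    sigma_ou = np.sqrt(2 * theta * resid_var / (1 - np.exp(-2 * theta * tau)))
    sigma_eq = sigma_ou / np.sqrt(2 * theta)
    return sigma_ou, sigma_eq, (mu_e - sigma_eq, mu_e + sigma_eq)


# Usage: B = 0.9 and Var[eps] = 0.19 give sigma_eq = sqrt(0.19 / (1 - 0.81)) = 1
sigma_ou, sigma_eq, (lower, upper) = ou_bounds(B=0.9, resid_var=0.19, mu_e=1.0)
print(sigma_eq, lower, upper)
```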