# Intro

# Multivariate Regression 

Also known as multiple linear regression (a special case of the general linear model), it generalises simple linear regression to several predictor variables (regressors) and $N_{obs}$ observations. It is best expressed in matrix form as:

$$
Y = X \beta + \epsilon
$$

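where $Y$ is the vector of the endogenous (dependent) variable, $X$ is the matrix of exogenous (independent) variables (one row per observation, one column per regressor, usually including a column of ones for the intercept), $\beta$ is the coefficient vector and $\epsilon$ is the vector of errors (disturbances).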

The parameters are estimated by ordinary least squares (OLS), which minimises the sum of squared residuals; when the errors are normally distributed this coincides with the Maximum Likelihood Estimate (MLE):

$$
\begin{align}
\hat{\beta} = (X^\prime X)^{-1} X^\prime Y \\
\hat{\epsilon} = Y - X \hat{\beta}
\end{align}
$$
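A minimal sketch of the estimator above, assuming NumPy is available (the function name `ols` is illustrative). It uses `np.linalg.lstsq`, which solves the same least-squares problem as $(X^\prime X)^{-1} X^\prime Y$ without explicitly inverting $X^\prime X$:

```python
import numpy as np


def ols(X: np.ndarray, Y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (beta_hat, eps_hat) for Y = X beta + eps.

    lstsq minimises ||Y - X beta||^2; for full-rank X the solution equals
    (X'X)^{-1} X'Y but avoids forming the inverse explicitly.
    """
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    eps_hat = Y - X @ beta_hat  # residuals
    return beta_hat, eps_hat


# Usage: recover known coefficients from simulated data.
rng = np.random.default_rng(0)
n_obs = 500
x1 = rng.normal(size=n_obs)
x2 = rng.normal(size=n_obs)
# A column of ones makes beta[0] the intercept.
X = np.column_stack([np.ones(n_obs), x1, x2])
true_beta = np.array([1.0, 2.0, -0.5])
Y = X @ true_beta + 0.1 * rng.normal(size=n_obs)

beta_hat, eps_hat = ols(X, Y)
print(beta_hat)
```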

There are several assumptions about the nature of the underlying variables for this model to hold. They are not all covered here, but in particular the variables should be stationary and the residuals should be homoscedastic (have a constant, finite variance) and normally distributed.

The main applications of the multivariate regression are:
* Vector Autoregression - also known as VAR(p), to model stationary series such as returns; autoregressions of this kind also underlie stationarity tests such as the ADF test
* Error Correction Model - also known as ECM, to model prices, which are not stationary but may be cointegrated

These are described in more detail below.

## Vector Autoregression - VAR(p)

Also referred to as VAR(p), where $p$ is the lag order, this is simply a linear regression of a (vector) time series on its own lagged (past) values:

$$
\begin{align}
Y_t = C + \sum^{p}_{i=1} \phi_i Y_{t-i} + \epsilon_t \\
\end{align}
$$

where $\phi_i$ are the coefficient matrices of the model, $C$ is a vector of constants and $\epsilon_t$ is white noise.

Computationally, all equations of the model can be fitted in one go with the OLS method described above, using a design matrix whose rows contain a constant and the $p$ lagged values. Example code for this can be found in script _analysis.py_ in the project repository.
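The stacked-regression fit can be sketched as follows. This is not the code in _analysis.py_, just an illustration assuming NumPy and a hypothetical `fit_var` helper: each row of the design matrix holds a constant and the $p$ lagged observations, and one least-squares call fits all $n$ equations at once.

```python
import numpy as np


def fit_var(Y: np.ndarray, p: int):
    """Fit Y_t = C + sum_{i=1..p} phi_i Y_{t-i} + eps_t by OLS.

    Y has shape (T, n): T time steps of n series.
    Returns C (n,), a list of p coefficient matrices phi_i (n, n),
    and the residuals (T - p, n).
    """
    T, n = Y.shape
    # Row k of X is [1, Y_{t-1}, ..., Y_{t-p}] for t = p + k.
    lags = [Y[p - i : T - i] for i in range(1, p + 1)]
    X = np.column_stack([np.ones(T - p)] + lags)
    target = Y[p:]
    # One least-squares call estimates all n equations (one column each).
    B, *_ = np.linalg.lstsq(X, target, rcond=None)  # shape (1 + n * p, n)
    resid = target - X @ B
    C = B[0]
    # Rows 1 + i*n .. (i+1)*n of B hold phi_{i+1} transposed.
    phis = [B[1 + i * n : 1 + (i + 1) * n].T for i in range(p)]
    return C, phis, resid


# Usage: simulate a stable bivariate VAR(1) and recover its parameters.
rng = np.random.default_rng(1)
phi_true = np.array([[0.5, 0.1], [0.0, 0.3]])
c_true = np.array([0.2, -0.1])
T = 5000
Y = np.zeros((T, 2))
for t in range(1, T):
    Y[t] = c_true + phi_true @ Y[t - 1] + 0.1 * rng.normal(size=2)

C_hat, (phi1_hat,), resid = fit_var(Y, p=1)
print(C_hat, phi1_hat, sep="\n")
```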

As noted, being a linear regression, this model assumes the lagged series are stationary. For direct analysis of non-stationary series this isn't appropriate and instead the ECM method described below should be used.


### Optimal Lag $p$

The lag order $p$ is chosen to minimise an information criterion such as:
* Akaike Information Criterion (AIC):
$$
AIC = \log | \hat{\Sigma} | + \frac{2 k ^\prime}{T}
$$
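where $\hat{\Sigma}$ is the estimated residual covariance matrix, $T$ the number of observations, $n$ the number of series and $k^\prime = n \times (n\times p+1)$ the total number of estimated parameters in VAR(p)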
* Bayesian Information Criterion (also Schwarz Criterion):
$$
BIC = \log | \hat{\Sigma} | + \frac{k ^\prime}{T} \log(T)
$$

See more in the [statsmodels](http://statsmodels.sourceforge.net/stable/vector_ar.html#lag-order-selection) documentation.
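A sketch of lag selection with the two criteria above, assuming NumPy; the helper names are hypothetical. Every candidate $p$ is fitted on the same sample (the first `p_max` rows are dropped for all of them) so the criteria are comparable, and $\hat{\Sigma}$ is the maximum-likelihood residual covariance.

```python
import numpy as np


def var_residuals(Y, p, p_max):
    """OLS residuals of a VAR(p) fitted on rows p_max..T-1 of Y (shape (T, n))."""
    T, n = Y.shape
    lags = [Y[p_max - i : T - i] for i in range(1, p + 1)]
    X = np.column_stack([np.ones(T - p_max)] + lags)
    target = Y[p_max:]
    B, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ B


def select_lag(Y, p_max):
    """Return (best_aic_p, best_bic_p, {p: (aic, bic)}) for p = 1..p_max."""
    T_total, n = Y.shape
    T = T_total - p_max  # effective number of observations
    table = {}
    for p in range(1, p_max + 1):
        resid = var_residuals(Y, p, p_max)
        sigma_hat = resid.T @ resid / T      # ML residual covariance
        _, logdet = np.linalg.slogdet(sigma_hat)
        k = n * (n * p + 1)                  # k' = number of estimated parameters
        aic = logdet + 2 * k / T
        bic = logdet + k * np.log(T) / T
        table[p] = (aic, bic)
    best_aic = min(table, key=lambda p: table[p][0])
    best_bic = min(table, key=lambda p: table[p][1])
    return best_aic, best_bic, table


# Usage: data generated by a VAR(2) with a strong second lag.
rng = np.random.default_rng(4)
phi1, phi2 = 0.5 * np.eye(2), -0.4 * np.eye(2)
Y = np.zeros((3000, 2))
for t in range(2, len(Y)):
    Y[t] = phi1 @ Y[t - 1] + phi2 @ Y[t - 2] + rng.normal(size=2)

best_aic, best_bic, table = select_lag(Y, p_max=4)
print(best_aic, best_bic)
```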

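### Likelihood Function

For a single OLS equation with $T$ observations and fitted values $\hat{Y} = X \hat{\beta}$, the Gaussian log-likelihood, with the error variance replaced by its ML estimate, is:

$$
\log L = -\frac{T}{2}\log\left(\left(Y-\hat{Y}\right)^{\prime}\left(Y-\hat{Y}\right)\right)-\frac{T}{2}\left(1+\log\left(\frac{2\pi}{T}\right)\right)
$$

For a VAR with $n$ series the analogue, in terms of the estimated residual covariance matrix $\hat{\Sigma}$, is $\log L = -\frac{T}{2}\log|\hat{\Sigma}| - \frac{Tn}{2}\left(1 + \log 2\pi\right)$.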
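As a sanity check (a sketch assuming NumPy; `ols_loglik` is a hypothetical name), the concentrated single-equation formula should equal the sum of Gaussian log-densities of the residuals when the variance is set to its ML estimate $RSS/T$:

```python
import numpy as np


def ols_loglik(X, Y):
    """Concentrated Gaussian log-likelihood of an OLS fit with T observations:
    -T/2 * log(RSS) - T/2 * (1 + log(2*pi/T))."""
    T = len(Y)
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    rss = float(np.sum((Y - X @ beta_hat) ** 2))
    return -T / 2 * np.log(rss) - T / 2 * (1 + np.log(2 * np.pi / T))


# Usage: compare with the explicit sum of normal log-densities.
rng = np.random.default_rng(5)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T)])
Y = X @ np.array([0.5, 1.5]) + rng.normal(scale=0.7, size=T)

ll = ols_loglik(X, Y)
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ beta_hat
sigma2 = resid @ resid / T  # ML estimate of the error variance
ll_direct = np.sum(-0.5 * np.log(2 * np.pi * sigma2) - resid**2 / (2 * sigma2))
print(ll, ll_direct)
```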

### Stability Condition

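A VAR(p) is stable (and hence stationary) if all eigenvalues $\lambda$ of its companion matrix $F$ lie inside the unit circle ($|\lambda| < 1$). For $p = 1$, $F = \phi_1$; for $p > 1$ the lag matrices are stacked rather than checked one by one:

$$
F = \begin{bmatrix} \phi_1 & \phi_2 & \cdots & \phi_{p-1} & \phi_p \\ I & 0 & \cdots & 0 & 0 \\ 0 & I & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I & 0 \end{bmatrix}, \qquad | \lambda I - F | = 0
$$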

[comment]: <> (The VAR system satisfies the stability condition if all eigenvalues of its companion matrix lie inside the unit circle. If $p>1$, the lag coefficient matrices are stacked into the companion matrix rather than checked separately.)
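A sketch of the stability check, assuming NumPy (the helper names are illustrative). The usage lines show why the lag matrices cannot be checked separately: in the second AR(2) example both coefficients are below 1 in absolute value, yet the companion matrix has an eigenvalue outside the unit circle.

```python
import numpy as np


def companion(phis):
    """Stack VAR(p) lag matrices phi_1..phi_p (each n x n) into the
    (n*p) x (n*p) companion matrix F."""
    p = len(phis)
    n = phis[0].shape[0]
    F = np.zeros((n * p, n * p))
    F[:n, :] = np.hstack(phis)          # first block row: [phi_1 ... phi_p]
    F[n:, :-n] = np.eye(n * (p - 1))    # identity blocks on the sub-diagonal
    return F


def is_stable(phis):
    """True if every eigenvalue of the companion matrix has modulus < 1."""
    eigvals = np.linalg.eigvals(companion(phis))
    return bool(np.all(np.abs(eigvals) < 1.0))


# Usage with univariate AR(2) examples (1 x 1 lag matrices).
print(is_stable([np.array([[0.5]]), np.array([[-0.4]])]))  # True
print(is_stable([np.array([[0.5]]), np.array([[0.6]])]))   # False
```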

## Error Correction Model (ECM)

The familiar linear regression:

$$
y_t = a + b x_t + e_t
$$

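is only suitable for modelling _stationary_ variables. Stock prices are generally not stationary, but their differences (returns) are. If $y_t$ and $x_t$ are non-stationary but cointegrated, i.e. the error $e_t = y_t - a - b x_t$ is stationary, the model can be rewritten in terms of differences plus an error-correction term: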

$$
\Delta y_t = \beta_1 \Delta x_t - (1-\alpha) e_{t-1} + \epsilon_t
$$

This is known as the Error Correction Model (ECM): the term $-(1-\alpha) e_{t-1}$ pulls $y_t$ back towards the long-run equilibrium $a + b x_t$.
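Assuming, for illustration, that the levels relationship comes from an autoregressive distributed lag model ADL(1,1) (the starting model is an assumption, not stated in these notes), the ECM follows by subtracting $y_{t-1}$ from both sides and regrouping:

$$
\begin{align}
y_t &= m + \alpha y_{t-1} + \beta_1 x_t + \beta_2 x_{t-1} + \epsilon_t \\
\Delta y_t &= m - (1-\alpha) y_{t-1} + \beta_1 \Delta x_t + (\beta_1 + \beta_2) x_{t-1} + \epsilon_t \\
&= \beta_1 \Delta x_t - (1-\alpha)\left(y_{t-1} - a - b x_{t-1}\right) + \epsilon_t
\end{align}
$$

with $a = m/(1-\alpha)$ and $b = (\beta_1+\beta_2)/(1-\alpha)$, so the bracket is exactly $e_{t-1}$.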

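In practice $e_{t-1}$ is replaced by the estimated spread $\hat{e}_{t-1}$ from the levels regression. The regression below then gives the short-run coefficient $\phi$ and the speed of mean-reversion $(1-\alpha)$, whose significance supports the stationarity of the spread: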
$$
\Delta y_t = \phi \Delta x_t - (1-\alpha) \hat{e}_{t-1} + \epsilon_t
$$
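A two-step sketch of estimating the ECM, assuming NumPy (`fit_ecm` and the simulated data are illustrative). The data are built so that $e_t$ is an AR(1) with $\alpha = 0.8$, hence the estimated speed $1-\alpha$ should come out near 0.2 and $\phi$ near $b = 2$:

```python
import numpy as np


def fit_ecm(y, x):
    """Two-step ECM estimate.

    Step 1: levels regression y_t = a + b x_t + e_t gives e_hat.
    Step 2: OLS of dy_t on dx_t and e_hat_{t-1} (no intercept, matching the
    equation above). Returns (a, b, phi, speed) with speed = 1 - alpha.
    """
    X1 = np.column_stack([np.ones_like(x), x])
    (a, b), *_ = np.linalg.lstsq(X1, y, rcond=None)
    e_hat = y - a - b * x
    dy, dx = np.diff(y), np.diff(x)          # dy[k] = y[k+1] - y[k]
    X2 = np.column_stack([dx, e_hat[:-1]])   # e_hat lagged by one step
    (phi, coef_e), *_ = np.linalg.lstsq(X2, dy, rcond=None)
    return a, b, phi, -coef_e


# Usage: simulate a cointegrated pair.
rng = np.random.default_rng(2)
T = 5000
x = np.cumsum(rng.normal(size=T))              # non-stationary random walk
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.8 * e[t - 1] + 0.5 * rng.normal()  # stationary AR(1), alpha = 0.8
y = 1.0 + 2.0 * x + e                          # cointegrated with a = 1, b = 2

a_hat, b_hat, phi_hat, speed_hat = fit_ecm(y, x)
print(a_hat, b_hat, phi_hat, speed_hat)
```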

## Other TS stuff

* statsmodels: descriptive statistics for time series, e.g. autocorrelation and partial autocorrelation functions, periodogram

# Cointegration

## Engle-Granger Procedure

When two series are cointegrated, there exists a linear combination with weights $\beta_c^\prime$ that cancels the common stochastic trend and produces a stationary spread $e_t = \beta_c^\prime Y_t$. The parameters of this linear combination can be estimated by a linear (multivariate) regression in levels:
$$
\hat{e}_t = y_t - \hat{b} x_t - \hat{a}
$$

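We can then test $\hat{e}_t$ for a unit root using the ADF test or similar to confirm the spread is stationary. Because $\hat{e}_t$ is itself estimated, the Engle-Granger (MacKinnon) critical values apply rather than the standard ADF ones.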
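A sketch of the two Engle-Granger steps, assuming NumPy. For simplicity it uses the plain Dickey-Fuller regression $\Delta \hat{e}_t = \gamma \hat{e}_{t-1} + u_t$ (no lagged differences, so not a full ADF) and returns the t-statistic of $\gamma$. For two series with a constant the 5% Engle-Granger critical value is roughly $-3.34$; in practice `statsmodels.tsa.stattools.coint` runs the test with proper critical values and p-values.

```python
import numpy as np


def engle_granger_stat(y, x):
    """Dickey-Fuller t-statistic on the cointegrating-regression residuals."""
    # Step 1: levels regression, e_hat_t = y_t - b_hat x_t - a_hat.
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    e_hat = y - X @ coef
    # Step 2: regress d(e_hat)_t on e_hat_{t-1} without a constant.
    de, e_lag = np.diff(e_hat), e_hat[:-1]
    gamma = (e_lag @ de) / (e_lag @ e_lag)
    u = de - gamma * e_lag
    s2 = (u @ u) / (len(de) - 1)             # one estimated parameter
    se = np.sqrt(s2 / (e_lag @ e_lag))
    return gamma / se


# Usage: a cointegrated pair versus two unrelated random walks.
rng = np.random.default_rng(6)
T = 2000
x = np.cumsum(rng.normal(size=T))
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.8 * e[t - 1] + rng.normal()
y_coint = 1.0 + 2.0 * x + e
y_indep = np.cumsum(rng.normal(size=T))      # unrelated random walk

t_coint = engle_granger_stat(y_coint, x)
t_indep = engle_granger_stat(y_indep, x)
print(t_coint, t_indep)
```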

## Assessing quality of mean-reversion in cointegrated coefficient

## Granger causality (determination of lead variable)

* See statsmodels

## Johansen procedure

MLE for multivariate cointegration on asset price data (levels, not returns)

# Trading strategies

* trading around the spread $e_t$, where the estimated weights $\beta_i$ give the position to take in each stock
* optimised bounds give entry/exit signals, e.g. $\mu_e \pm 1\sigma_e$; $\sigma_e$ can be obtained by fitting an OU process or by optimisation
* the speed of mean-reversion $\theta$ of the spread gives an idea of profitability over time and can be converted to a half-life (a rough guide to the expected position holding time) as $\tilde{\tau} = \ln 2 / \theta$
* the dollar MtM P&L $\Delta e_t = e_t - e_{t-1} $ is independent of its mean $\mu_e$
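A plain-Python sketch of the half-life formula and of a band rule. The rule is a hypothetical example, not a prescribed strategy: enter when the spread leaves $\mu_e \pm k\sigma_e$ and exit when it crosses back through $\mu_e$.

```python
import math


def half_life(theta: float) -> float:
    """Half-life ln(2) / theta, in the same time units as theta."""
    return math.log(2) / theta


def band_positions(spread, mu, sigma, k=1.0):
    """Hypothetical band rule: short the spread above mu + k*sigma, long it
    below mu - k*sigma, flatten when it crosses back through mu.
    Returns one position in {-1, 0, +1} per observation."""
    upper, lower = mu + k * sigma, mu - k * sigma
    pos, positions = 0, []
    for e in spread:
        if pos == 0:
            if e > upper:
                pos = -1
            elif e < lower:
                pos = 1
        elif pos == -1 and e <= mu:
            pos = 0
        elif pos == 1 and e >= mu:
            pos = 0
        positions.append(pos)
    return positions


# Usage: theta per year on daily data, half-life converted to trading days.
print(half_life(17.5) * 252)
print(band_positions([0.0, 1.5, 0.8, -0.1, -1.2, 0.3], mu=0.0, sigma=1.0))
# [0, -1, -1, 0, 1, 0]
```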


## Fitting to the OU process

The OU process is represented by the SDE:
$$
dY_t = -\theta(Y_t-\mu)dt + \sigma dW_t
$$
where $\theta$ is the speed of mean-reversion, $\mu$ is the equilibrium level and $\sigma$ the diffusion (volatility) coefficient. The SDE has an exact solution, and sampling it at a fixed interval $\tau$ gives a discrete AR(1) process (a univariate VAR(1)) with $B = e^{-\theta \tau}$. In terms of the cointegrated spread this can be written as:

$$
e_t = C + B e_{t-1} + \epsilon_t
$$

where $\theta = -\ln B/ \tau$, $\tau$ being the sampling interval in years (e.g. $1/252$ for daily data), and $\mu_e = C / (1-B)$.

Trading bounds are commonly defined as $\mu_e \pm \sigma_{eq}$, where $\sigma_{eq}$ is the equilibrium (stationary) standard deviation of the spread, a scaled version of $\sigma_{OU}$:
$$
\sigma_{eq} = \sigma_{OU}/\sqrt{2 \theta}
$$

with $Var[\epsilon_t]$ the variance of the AR(1) residuals (not of the spread $e_t$ itself):

$$
\sigma_{OU} = \sqrt{\frac{2 \theta}{1-e^{-2 \theta \tau}} Var[\epsilon_t]}
$$
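A sketch of the whole OU fit under the formulas above, assuming NumPy; `fit_ou` and the simulation parameters are illustrative. The simulation uses the exact discretisation of the OU process, so the estimates should land near the true $\theta$, $\mu_e$ and $\sigma_{OU}$:

```python
import numpy as np


def fit_ou(e, tau):
    """Fit OU parameters via the AR(1) e_t = C + B e_{t-1} + eps_t.

    tau is the sampling interval in years (e.g. 1/252 for daily data).
    Returns (theta, mu_e, sigma_ou, sigma_eq). Requires 0 < B < 1.
    """
    X = np.column_stack([np.ones(len(e) - 1), e[:-1]])
    (C, B), *_ = np.linalg.lstsq(X, e[1:], rcond=None)
    eps = e[1:] - X @ np.array([C, B])       # AR(1) residuals
    theta = -np.log(B) / tau
    mu_e = C / (1 - B)
    sigma_ou = np.sqrt(2 * theta / (1 - np.exp(-2 * theta * tau)) * np.var(eps))
    sigma_eq = sigma_ou / np.sqrt(2 * theta)
    return theta, mu_e, sigma_ou, sigma_eq


# Usage: simulate an OU spread with the exact discretisation.
rng = np.random.default_rng(3)
theta_true, mu_true, sigma_true, tau = 10.0, 0.5, 0.3, 1 / 252
B_true = np.exp(-theta_true * tau)
# Innovation std implied by the OU transition density over one step.
eps_sd = sigma_true * np.sqrt((1 - B_true**2) / (2 * theta_true))
e = np.empty(50_000)
e[0] = mu_true
for t in range(1, len(e)):
    e[t] = mu_true + B_true * (e[t - 1] - mu_true) + eps_sd * rng.normal()

theta_hat, mu_hat, sigma_ou_hat, sigma_eq_hat = fit_ou(e, tau)
print(theta_hat, mu_hat, sigma_ou_hat, sigma_eq_hat)
```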

# Other

* price levels non-stationary
* can't use correlation unless on returns or diffs
* equilibrium (spread) can also be non-linear --> collinearity
* coint vector $\beta_{coint}$

