<a href="https://colab.research.google.com/github/deltorobarba/machinelearning/blob/master/causality.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Causality in Time Series

*Author: Alexander Del Toro Barba*

# Import Libraries

In [0]:
# Import packages (duplicates removed, grouped by library)
import collections
import datetime
from copy import copy
from decimal import *

import numpy as np
from numpy import linalg as LA
from numpy import log, sqrt
import pandas as pd
from pandas.plotting import autocorrelation_plot, lag_plot
import matplotlib
import matplotlib.dates as mdates
import matplotlib.pylab as py
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import seaborn as sns
import scipy.stats as stats
import sklearn
import statsmodels
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import adfuller
import tensorflow as tf

# Import Data

In [0]:
# Import from https://webdav.tuebingen.mpg.de/cause-effect/

# Visualize Data

# Stationarity & Unit Root Tests

**Stationarity is a prerequisite for Granger causality tests, but not for the Johansen cointegration test, which is run on the original, untransformed time series.**

## Johansen Cointegration

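The cointegration test on the original time series below uses the augmented Engle-Granger two-step cointegration test (`statsmodels.tsa.stattools.coint`). This is not the Johansen procedure itself, which statsmodels provides separately as `coint_johansen` in `statsmodels.tsa.vector_ar.vecm`.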

The Engle-Granger test checks for no cointegration in a single (univariate) equation. The null hypothesis is no cointegration. If two time series, X and Y, are cointegrated, there must exist Granger causality from X to Y, from Y to X, or in both directions.

The presence of Granger causality in either or both directions between X and Y does not necessarily imply that the series will be cointegrated.

`coint` returns three values: the first is the cointegration test statistic, the second is the p-value, and the third holds the critical values at the 1%, 5% and 10% levels.

In [0]:
statsmodels.tsa.stattools.coint(series, series2, trend='ct', method='aeg', autolag='aic')

## KPSS Stationary Test

* The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test checks the null hypothesis that a time series x is level or trend stationary.
* H0: Series has no unit root (it is stationary around a constant or a deterministic trend). Ha: Series is not stationary (it has a unit root).
* Reject H0 if the p-value is less than 5% and the test statistic is higher than the provided critical value.

**Cautions**

* A major disadvantage of the KPSS test is that it has a high rate of Type I errors (it tends to reject the null hypothesis too often). If attempts are made to control these errors (for example, by using a stricter significance level), the test’s power suffers.

* One way to deal with the potential for high Type I errors is to combine the KPSS test with an ADF test. If the results of both tests suggest that the time series is stationary, then it probably is.

* https://www.statisticshowto.datasciencecentral.com/kpss-test/

**KPSS - Level Stationarity Test**

‘c’ : The data is stationary around a constant (default).

H0 = Level Stationary (stationary around a constant). H1 = Non-Stationary (Unit Root)

Reject H0 if the p-value is less than 5% and the test statistic is higher than the provided critical value.

In [0]:
statsmodels.tsa.stattools.kpss(series, regression='c', store=False)

**KPSS Trend Stationary Test**

‘ct’ : The data is stationary around a trend.

H0 = Trend Stationary (stationary around a deterministic trend). H1 = Non-Stationary (Unit Root)

Reject H0 if the p-value is less than 5% and the test statistic is higher than the provided critical value.

In [0]:
statsmodels.tsa.stattools.kpss(series, regression='ct', store=False)

## Augmented Dickey Fuller

The Augmented Dickey-Fuller test can be used to test for a unit root in a univariate process in the presence of serial correlation.

H0: There is a unit root for the series (the series is non-stationary).

Ha: There is no unit root for the series. The series is stationary.

Reject H0 if the p-value is less than 5% and the test statistic is lower (more negative) than the provided critical value.

**Choose regression {‘c’,’ct’,’ctt’,’n’} - Constant and trend order to include in regression.**

* ‘c’ : constant only (default).
* ‘ct’ : constant and trend.
* ‘ctt’ : constant, and linear and quadratic trend.
* ‘n’ : no constant, no trend (older statsmodels releases named this option ‘nc’).

In [0]:
# no constant, no trend
# (recent statsmodels releases name this option 'n'; older releases used 'nc')
statsmodels.tsa.stattools.adfuller(series, maxlag=None, regression='n', autolag='AIC', store=False, regresults=False)

# Data Transformation
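
Since Granger causality tests require stationary input, one common transformation is to take first differences of the logged series. The sketch below is an assumption about what this step could look like, not the notebook's actual transformation; the helper `log_diff` and the sample values are hypothetical.

```python
import numpy as np

def log_diff(series):
    """Return first differences of the natural log (requires positive values)."""
    series = np.asarray(series, dtype=float)
    return np.diff(np.log(series))

# Illustrative values (assumption): a series growing by 10% per step.
prices = np.array([100.0, 110.0, 121.0])
growth = log_diff(prices)
print(growth)  # each entry equals log(1.1)
```

The differenced series has one observation fewer than the original and can then be passed to the stationarity tests above.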

# Granger Causality

Granger causality means that past values of x2 have a statistically significant effect on the current value of x1, taking past values of x1 into account as regressors. 

* **Null hypothesis: x2 does NOT Granger cause x1**
* **Reject null hypothesis if the p-values are below a desired size of the test**

The null hypothesis for all four tests is that the coefficients corresponding to past values of the second time series are zero.

* If p > .10 → “not significant” 
* If p ≤ .10 → “marginally significant” 
* If p ≤ .05 → “significant” 
* If p ≤ .01 → “highly significant.”

Explanation: With a large number of variables and lags (a large number of lags may be assumed for this time series), the F-test can lose power. An alternative is the chi-square test, constructed with likelihood ratio or Wald tests. Both versions give practically the same result, but the F-test is much easier to run. All four test statistics are computed automatically.

In [0]:
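# Granger Causality
# master: two-column array or DataFrame; the test checks whether the
# second column Granger-causes the first column, for every lag up to maxlag
statsmodels.tsa.stattools.grangercausalitytests(master, maxlag = 10, addconst=True, verbose=True)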