Granger Causality

Wang Cheng-Jun edited this page Dec 19, 2016 · 1 revision



Source from https://www.r-bloggers.com/chicken-or-the-egg-granger-causality-for-the-masses/

When I first learned about Granger causality this past February, I was bemused and quite skeptical of the whole procedure. I felt it belonged on the scrapheap of impractical academic endeavors and would have preferred to use an ARIMA transfer function model for the same task. However, several contemporaries threw the red challenge flag, and upon further review my initial impressions have been overturned. Not only am I fascinated by the technique, but in my attempt to discover its value I have become a raving R fan. As such, my first blog entry provides some simple code to allow anyone to use this obscure econometric technique, but first some background.


Introduction

Given two time series, x and y, Granger causality is a method that attempts to determine whether one series is likely to influence change in the other. This is accomplished by taking different lags of one series and using them to model the change in the second series. We create two models that predict y: one with only past values of y (Ω), and the other with past values of both y and x (π). The models are given below, where k is the number of lags in the time series:

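$$\Omega:\quad y_t = \beta_0 + \beta_1 y_{t-1} + \dots + \beta_k y_{t-k} + e$$
$$\pi:\quad y_t = \beta_0 + \beta_1 y_{t-1} + \dots + \beta_k y_{t-k} + \alpha_1 x_{t-1} + \dots + \alpha_k x_{t-k} + e$$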

The residual sums of squared errors are then compared and a test is used to determine whether the nested model (Ω) is adequate to explain the future values of y or if the full model (π) is better. The F-test, t-test or Wald test (used in R) are calculated to test the following null and alternate hypotheses:

$$H_0:\ \alpha_i = 0 \ \text{for every } i \in \{1, \dots, k\}$$
$$H_1:\ \alpha_i \neq 0 \ \text{for at least one } i \in \{1, \dots, k\}$$
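For reference, the F-test version of this comparison is usually written as below. The symbols RSS_Ω and RSS_π (residual sums of squares of the two models) and n (the number of observations left after the first k are used up as lags) are notation assumed for this sketch, not taken from the original post.

```latex
F = \frac{(RSS_{\Omega} - RSS_{\pi}) / k}{RSS_{\pi} / (n - 2k - 1)}
\;\sim\; F_{k,\; n - 2k - 1} \quad \text{under } H_0
```

The denominator degrees of freedom count the n usable observations minus the 2k + 1 coefficients of the full model. For the differenced chicken and egg series used later in this post, n = 49 and k = 4 give 40, which is the Res.Df that grangertest reports for the full model.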

Essentially, we are trying to determine whether we can say that, statistically, x provides more information about future values of y than past values of y alone. Under this definition it is clear that we are not trying to prove actual causation, only that past values of one series help predict the other. Along those lines, we must also run the model in reverse to verify that y does not provide information about future values of x. If it does, so that both directions are significant, it is likely that there is some exogenous variable, z, which needs to be controlled for or which could be a better candidate for Granger causation.
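The procedure described above can be sketched by hand in base R. This is a minimal illustration on simulated data, not the implementation used by any package: the function name granger_by_hand, the lag count k = 2 and the simulated coefficients are all assumptions made for the example.

```r
# Manual Granger test on simulated data (base R only; all names are illustrative).
set.seed(42)
n <- 200
k <- 2                               # number of lags, assumed for this sketch
x <- rnorm(n)
y <- numeric(n)
for (t in 2:n) {
  # y depends on its own past and on the previous value of x
  y[t] <- 0.3 * y[t - 1] + 0.8 * x[t - 1] + rnorm(1)
}

granger_by_hand <- function(y, x, k) {
  # embed() puts the current value in column 1 and lags 1..k in the columns after it
  ey <- embed(y, k + 1)
  ex <- embed(x, k + 1)
  y_now  <- ey[, 1]
  y_lags <- ey[, -1, drop = FALSE]
  x_lags <- ex[, -1, drop = FALSE]
  restricted <- lm(y_now ~ y_lags)            # model Omega: past y only
  full       <- lm(y_now ~ y_lags + x_lags)   # model pi: past y and past x
  anova(restricted, full)                     # F-test of H0: all alpha_i = 0
}

res_xy <- granger_by_hand(y, x, k)   # does x help predict y?
res_yx <- granger_by_hand(x, y, k)   # does y help predict x?
p_xy <- res_xy[["Pr(>F)"]][2]
p_yx <- res_yx[["Pr(>F)"]][2]
print(c(x_to_y = p_xy, y_to_x = p_yx))
```

Because x was built to drive y, the first p-value should come out very small. The second call runs the same test in reverse, which is the check described in the paragraph above.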

For a detailed explanation, one can read the original paper on the subject: Granger, Clive W., “Investigating Causal Relations by Econometric Models and Cross-Spectral Methods”, Econometrica, 37(1969): 424-38[1]

lmtest

The R package “lmtest” incorporates the Granger causality procedure and includes a data set to answer the age-old question of which came first, “the chicken or the egg”. The data were presented by Walter Thurman and Mark Fisher in the American Journal of Agricultural Economics, May 1988, in a paper titled “Chickens, Eggs, and Causality, or Which Came First?” [2]. The data set consists of two time series from 1930 to 1983, one of U.S. egg production and the other of the estimated U.S. chicken population [3].

Let’s get some code going.

library(lmtest)
data(ChickEgg)
head(ChickEgg)
#   Year chicken  egg
# 1 1930  468491 3581
# 2 1931  449743 3532
# 3 1932  436815 3327
# 4 1933  444523 3255
# 5 1934  433937 3156
# 6 1935  389958 3081
# pull the two series out of the bivariate time series
chicken <- ChickEgg[, "chicken"]
egg <- ChickEgg[, "egg"]
# plot the time series
par(mfrow=c(2,1))
plot.ts(chicken)
plot.ts(egg)

[Figure: time series plots of chicken and egg]

The plots provide little information other than that the data are likely not stationary. I’ve just started using the forecast package, so let’s load it and test how much differencing is needed to achieve stationarity.

library(forecast)
# test for unit root and number of differences required; you can also test for seasonality with nsdiffs
ndiffs(chicken, alpha=0.05, test=c("kpss"))
# [1] 1
ndiffs(egg, alpha=0.05, test=c("kpss"))
# [1] 1
# differenced time series
dchick = diff(chicken)
degg = diff(egg)
plot.ts(dchick)
plot.ts(degg)

[Figure: time series plots of the differenced series dchick and degg]

Much better!

Optimal lag

That’s pretty standard stuff, but this is where the magic happens! There are several ways to find the optimal lag, which I will skip in the interest of time, but let’s say four is the magic number.
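For readers who do not want to skip this step, one common option is to compare an information criterion across candidate lag orders. The sketch below does that with AIC on a single regression equation. The criterion, the upper bound max_lag and the helper aic_for_lag are assumptions for illustration. The post does not say how four was chosen, and this sketch will not necessarily pick the same value.

```r
# Sketch: choose a lag order by AIC (an assumed criterion, not the author's method).
library(lmtest)   # only needed here for the ChickEgg data
data(ChickEgg)
dchick <- as.numeric(diff(ChickEgg[, "chicken"]))
degg   <- as.numeric(diff(ChickEgg[, "egg"]))

max_lag <- 6      # hypothetical upper bound on the lags to try

# embed() returns the current value in column 1 and lags 1..max_lag after it.
# Building the lag matrices once keeps the sample identical for every k,
# so the AIC values are comparable.
ey <- embed(dchick, max_lag + 1)
ex <- embed(degg, max_lag + 1)

aic_for_lag <- function(k) {
  y_now  <- ey[, 1]
  y_lags <- ey[, 2:(k + 1), drop = FALSE]
  x_lags <- ex[, 2:(k + 1), drop = FALSE]
  AIC(lm(y_now ~ y_lags + x_lags))
}

aics <- sapply(1:max_lag, aic_for_lag)
names(aics) <- 1:max_lag
best_lag <- unname(which.min(aics))
print(round(aics, 1))
print(best_lag)
```

With only about fifty annual observations, different criteria can disagree, so it is worth checking whether the Granger test result is stable across a few neighbouring lag orders.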

Granger causality test

# do eggs granger cause chickens?
grangertest(dchick ~ degg, order=4)
# Granger causality test
#
# Model 1: dchick ~ Lags(dchick, 1:4) + Lags(degg, 1:4)
# Model 2: dchick ~ Lags(dchick, 1:4)
#   Res.Df Df      F   Pr(>F)
# 1     40
# 2     44 -4 4.1762 0.006414 **
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# Highly significant p-value, but what about the other direction?
# do chickens granger cause eggs, at lag 4?
grangertest(degg ~ dchick, order=4)
# Granger causality test
#
# Model 1: degg ~ Lags(degg, 1:4) + Lags(dchick, 1:4)
# Model 2: degg ~ Lags(degg, 1:4)
#   Res.Df Df      F Pr(>F)
# 1     40
# 2     44 -4 0.2817 0.8881

The reverse direction is not significant, so we can say that eggs Granger-cause chickens!
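To compare the two directions programmatically rather than by reading the printed tables, the p-values can be pulled out of the test objects. This sketch relies on grangertest returning an anova-style table whose second row holds the test result. The 0.05 cut-off and the names p_values, egg_to_chick and chick_to_egg are choices made here for illustration.

```r
# Sketch: run both directions and collect the p-values in one vector.
library(lmtest)
data(ChickEgg)
dchick <- diff(ChickEgg[, "chicken"])
degg   <- diff(ChickEgg[, "egg"])

egg_to_chick <- grangertest(dchick ~ degg, order = 4)   # do eggs help predict chickens?
chick_to_egg <- grangertest(degg ~ dchick, order = 4)   # do chickens help predict eggs?

# Row 1 of each table is the full model; row 2 carries the F statistic and p-value.
p_values <- c(
  egg_to_chick = egg_to_chick[["Pr(>F)"]][2],
  chick_to_egg = chick_to_egg[["Pr(>F)"]][2]
)
print(round(p_values, 4))
print(p_values < 0.05)   # TRUE marks a direction that is significant at the assumed 5% level
```

A one-directional conclusion is only warranted when exactly one of the two entries is significant. If both are, the caveat about an omitted variable z from the introduction applies.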

This is just the tip of the iceberg, but it should be enough to pique your curiosity and to make you dangerous. I’m working on commodity prices, bond prices and the U.S. stock market, but that is better left for another day.

References

  1. Granger, Clive W., “Investigating Causal Relations by Econometric Models and Cross-Spectral Methods”, Econometrica, 37(1969): 424-38
  2. Thurman, W.N. & Fisher, M.E., “Chickens, Eggs, and Causality, or Which Came First?”, American Journal of Agricultural Economics, 70(1988): 237-238
  3. ChickEgg data from lmtest, http://math.furman.edu/~dcs/courses/math47/R/library/lmtest/html/ChickEgg.html