# BDACA I+II
## Week 8: Time series analysis

To do this exercise, you need:

- The information from this week's guest lecture
- The example code at https://github.com/damian0604/bdaca/blob/master/ipynb/timeseries.ipynb
- The data from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/JU8B9V
- (optionally, for better understanding) The article to which the data belong at https://doi.org/10.1017/S0007123418000145

We will try to find out whether media attention to UKIP increases vote intention for UKIP in polls, and/or whether vote intention in polls increases media attention. The authors of the paper argue that media attention drives party support, but not the other way around.

As you will see when you start analyzing the data, you might actually *not* arrive at the same conclusion, *depending on your model specification*. We can discuss in class why this is the case and what the implications are, but first and foremost, do not focus too much on discrepancies with the authors' original analysis. The authors provide the R Markdown code that allows you to replicate exactly what they did in R, if you want.

## How this notebook works

Insert your code in the empty cells. Add new code cells as needed. Do not forget to also insert Markdown cells with your own interpretation and comments. I sometimes provide some example code, but you need to adapt it (e.g., use the correct filenames or variable names).

## Load and prepare the data

Read the data into a pandas dataframe. For now, we only need the `ukip_media.tab` file. Our main variables of interest are called `UKIP.Vote` and `UKIP.Articles`.
If you want to replicate the authors' analysis exactly, you might also need some of the other files (and merge them, as we have learned in Week 7) for additional control variables. But let's start simple.

In [None]:
import pandas as pd
import statsmodels.api as sm
import statsmodels.tsa as smtsa
import numpy as np
%matplotlib inline

In [None]:
media = pd.read_csv('data/ukip_media.tab', delimiter = '\t')

In [None]:
# insert some code for inspecting the data

In [None]:
# given that these are over-time data, maybe do a plot, e.g. like this:
# media['variablename'].plot()

In [None]:
# a command that allows you to use the Date as index (=row label) instead of a number 0, 1, 2 ... n
# media.index=pd.to_datetime(media['Date'])
# check out what that does to your plots!
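
A minimal sketch of what setting a datetime index does, using a made-up three-row frame. Only the column name `Date` mirrors the real file; the column `value` and all numbers are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data, not taken from ukip_media.tab
toy = pd.DataFrame({
    "Date": ["2015-01-01", "2015-02-01", "2015-03-01"],
    "value": [1.0, 2.0, 3.0],
})

# Parse the date strings into timestamps and use them as row labels
toy.index = pd.to_datetime(toy["Date"])

# Rows can now be selected by date, and .plot() will put dates on the x-axis
print(toy.loc["2015-02":, "value"])
```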

You might have noticed that the poll data have some missing values. We could drop them, but then we would lose the nice property that our time series is evenly spaced. The authors therefore used linear interpolation to replace the missing values. Let's do that as well.

In [None]:
# media['UKIP.Vote'] = media['UKIP.Vote'].interpolate()
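
To see what linear interpolation does, here is a small sketch on invented numbers (the values are not from the poll data). By default, `Series.interpolate()` places each missing value on a straight line between its nearest observed neighbours.

```python
import numpy as np
import pandas as pd

# Toy poll series with one single gap and one double gap (made-up values)
polls = pd.Series([10.0, np.nan, 14.0, np.nan, np.nan, 20.0])

# The default method is linear and treats the rows as evenly spaced
filled = polls.interpolate()
print(filled.tolist())
```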

## Test for stationarity (and potentially difference)
Look again at the plots you created above. Do they look stationary?

Let's test that more formally with a Dickey-Fuller test. Its null hypothesis is that the series has a unit root (i.e., is non-stationary), so we need to be able to reject the null hypothesis. If we cannot (i.e., if p > .05), we need to difference the series.

In [None]:
# your code for DF tests here

In [None]:
# if applicable: your code for differencing here
# if you difference, remove the first row of the resulting dataset because it now contains NaN (can you explain why?)
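
As a rough illustration of first differencing on made-up numbers (the column names `a` and `b` are placeholders, not variables from the dataset):

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({"a": [1.0, 3.0, 6.0, 10.0], "b": [5.0, 4.0, 4.0, 7.0]})

# .diff() replaces every value by its change relative to the previous row
diffed = toy.diff()
print(diffed)

# Drop the first row, which is NaN after differencing
diffed = diffed.dropna()
print(diffed)
```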

In [None]:
# if applicable: do the DF test again after differencing to make sure that you are now able to reject the null

## Run first model

Let's run a model and then test for heteroskedasticity. You can select the number of lags by first displaying the statistics and then selecting it manually, or you can have it selected automatically. The authors used the AIC statistic for that.

In [None]:
# first analysis, no tuning yet
# m1 = sm.tsa.VAR(endog= mydataframe)
# result = m1.fit(maxlags=10, ic='aic')
# result.summary()

In [None]:
# or, to select the lag by hand
# model = sm.tsa.VAR(endog=mydataframe)
# model.select_order(10).summary()

In [None]:
# results = model.fit(3) # if 3 is what you want
# results.summary()

In [None]:
# your code for heteroskedasticity
# something along the lines of (choose better variable names)
# sres_adj_close=[line**2 for line in result.resid['UKIP.Vote']]
# sm.stats.diagnostic.acorr_ljungbox(sres_adj_close,lags=20)

## Transform the input data (and do tests etc again)

The authors log-transformed the variables, arguing that this removes the heteroskedasticity. This basically means that you are no longer asking "what do I have to add to/subtract from y when I increase x by 1, 2, 3, ...?", but "by what factor do I have to multiply y when I multiply x by 1, 2, 3, ...?".

Because log(0) is not defined, we first add 1 to the variable. You can use `.apply()` or `.map()` for your transformation, e.g., `.apply(lambda x: np.log(x+1))`.
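
A short sketch of the transformation on invented article counts (the numbers are placeholders). It also shows `np.log1p`, which computes log(1 + x) in a single vectorised call and gives the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical counts, including a zero
counts = pd.Series([0, 1, 9, 99])

# Add 1 first so that a count of 0 maps to log(1) = 0 instead of -inf
logged = counts.apply(lambda x: np.log(x + 1))
print(logged)

# Equivalent shortcut
logged_alt = np.log1p(counts)
print(np.allclose(logged, logged_alt))
```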


### Also don't forget to do a white noise test!

In [None]:
# your code

In [None]:
# your code

In [None]:
# your code

In [None]:
# ...

In [None]:
#

## Interpret the model

You can now start interpreting the model by:

1. Running Granger-causality tests
2. Plotting IRFs (impulse response functions)

In [None]:
# your code

## Specify alternative models (and test them)

There are a couple of ways in which we could change our models (and, in fact, the authors did use such models). I'll give you some suggestions here:

### Include a trend variable

You can specify whether you want to include a trend variable (basically, include the row number as a variable):

`r = model.fit(maxlags=15, ic='aic', trend='c')  # include a constant`


`r = model.fit(maxlags=15, ic='aic', trend='ct')  # include a constant and a trend`


`r = model.fit(maxlags=15, ic='aic', trend='ctt')  # include a constant, a linear trend, and a quadratic trend`

In [None]:
# your code

### Include control variables

You can include control variables (also called exogenous variables, in contrast to endogenous variables).

`model = sm.tsa.VAR(endog=mydf[mycolumnswithvariablesofinterest], exog=mydf[mycolumnswithcontrols])`


If you create a new variable with an increasing number like this:

`mydf['test_trend']=range(len(mydf))`

and then include it as an exogenous variable, this is equivalent to using the trend shortcut above.

In [None]:
# your code