# Interrupted time series analysis using the Causal Impact package
The idea behind *interrupted time series analysis* is to study the effect of a single intervention, made at a specific point in time, on a time series. Examples of an intervention include a medical treatment, the introduction of a policy or new law, or an advertising campaign intended to boost sales. Possible responses include:

* no effect at all
* an immediate effect only, seen as a change in the level
* a long term effect only, seen as a change in the slope
* both an immediate and long term effect, seen as a change in both the level and the slope
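The level and slope changes listed above can be illustrated with a segmented regression, a common regression-based form of interrupted time series analysis. The following is a minimal sketch on synthetic data, not the method used by the Causal Impact package; the series length, intervention index and effect sizes are all made-up values chosen for illustration.

```python
import numpy as np

# Synthetic monthly series (illustrative values, not taken from the CO2 dataset)
rng = np.random.default_rng(0)
n, t0 = 120, 60                      # series length and intervention index (assumed)
t = np.arange(n)
post = (t >= t0).astype(float)       # 1 from the intervention onwards, 0 before
time_since = post * (t - t0)         # time elapsed since the intervention

# Simulated truth: level 10, slope 0.2, level change -5, slope change -0.1
y = 10 + 0.2 * t - 5.0 * post - 0.1 * time_since + rng.normal(0, 0.5, n)

# Design matrix: intercept, pre-existing slope, level change, slope change
X = np.column_stack([np.ones(n), t, post, time_since])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

for name, b in zip(["intercept", "slope", "level change", "slope change"], beta):
    print(f"{name:>12}: {b: .3f}")
```

The coefficient on `post` estimates the immediate change in level, and the coefficient on `time_since` estimates the long term change in slope; "no effect at all" corresponds to both being indistinguishable from zero.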

The analysis rests on several assumptions:

* the relationship between covariates and treated time series remains stable throughout the post-period
* the covariates were not themselves affected by the intervention
* no other event took place simultaneously

What we are essentially doing is calculating the counterfactual: we extend the time series as if nothing had ever happened, *i.e.* we forecast the data from the moment of the intervention onwards using only the data prior to the intervention. We then compare the counterfactual time series with the real time series data and examine the difference between the two.
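As a rough illustration of the counterfactual idea, the sketch below fits a straight line to the pre-intervention part of a synthetic series, extrapolates it over the post-intervention period, and takes the difference from the observed values. The linear trend is a deliberately simple stand-in, assumed here for clarity, for the Bayesian structural time series model that Causal Impact actually fits; all numbers are invented.

```python
import numpy as np

# Synthetic series with a linear trend and noise (illustrative values only)
rng = np.random.default_rng(1)
n, t0 = 100, 70                      # series length and intervention index (assumed)
t = np.arange(n)
y = 50 + 0.3 * t + rng.normal(0, 1.0, n)
y[t0:] -= 8.0                        # hypothetical intervention: a drop of 8 units

# Fit a linear trend using the pre-intervention data only
slope, intercept = np.polyfit(t[:t0], y[:t0], deg=1)

# Counterfactual: what the trend predicts had nothing happened
counterfactual = intercept + slope * t[t0:]

# Pointwise and cumulative differences between observed and counterfactual
pointwise = y[t0:] - counterfactual
cumulative = np.cumsum(pointwise)

print(f"average pointwise effect: {pointwise.mean():.2f}")
print(f"cumulative effect:        {cumulative[-1]:.2f}")
```

The `pointwise` and `cumulative` arrays play the same role as the 'pointwise' and 'cumulative' panels plotted by the package.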

## Example
For this example we shall use the [Carbon Dioxide Levels in Atmosphere](https://www.kaggle.com/ucsandiego/carbon-dioxide) dataset, which we looked at previously in the notebook ["*Time series decomposition: Naive example*"](https://www.kaggle.com/carlmcbrideellis/time-series-decomposition-naive-example). We then artificially introduce, starting in January 2005, the hypothetical policy change that all combustion engine cars are globally replaced by zero emission electric cars, which we model as an immediate reduction in CO$_2$ of 15 ppm.

Note that we first [de-trend the time series in order to make it stationary](https://www.statsmodels.org/dev/examples/notebooks/generated/stationarity_detrending_adf_kpss.html).
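To see why differencing helps, here is a small sketch on a synthetic series with a linear trend and a 12-month seasonal cycle (the numbers are invented, loosely in the range of CO$_2$ readings). The mean of the raw series drifts between the first and second half, whereas the mean of the first-differenced series stays roughly constant.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend plus an annual cycle (assumed values)
t = np.arange(240)
series = pd.Series(330 + 0.12 * t + 3 * np.sin(2 * np.pi * t / 12))

# First difference: equivalent to series - series.shift(1)
diffed = series.diff().dropna()

half = len(series) // 2
raw_first, raw_second = series.iloc[:half].mean(), series.iloc[half:].mean()
diff_first, diff_second = diffed.iloc[:half].mean(), diffed.iloc[half:].mean()

print(f"raw series mean:   first half {raw_first:.2f}, second half {raw_second:.2f}")
print(f"differenced mean:  first half {diff_first:.3f}, second half {diff_second:.3f}")
```

A formal check would use a unit root test such as ADF or KPSS, as described in the statsmodels notebook linked above.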

In [None]:
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 18})
plt.style.use('fivethirtyeight')
import pandas as pd

Install the [Causal Impact](https://github.com/dafiti/causalimpact) package (published on PyPI as `pycausalimpact`) and import it:

In [None]:
!pip install -q pycausalimpact
from causalimpact import CausalImpact

Take a look at our data

In [None]:
# read in the data and build a monthly date column from the Year and Month columns
data = pd.read_csv("../input/carbon-dioxide/archive.csv")
data["date"] = pd.to_datetime(dict(year=data["Year"], month=data["Month"], day=1))
# remove any rows with a missing CO2 value
data = data.dropna(how='any', subset=['Carbon Dioxide (ppm)'])
# select the data from January 1970 to December 2016
data = data[data['date'].between('1970-01-01', '2016-12-01')].copy()
# keep an untouched copy of the original series
data["raw data"] = data["Carbon Dioxide (ppm)"]
# for plotting: apply the 15 ppm drop to the series from January 2005 onwards
data.loc[data.date > '2004-12-01', "Carbon Dioxide (ppm)"] -= 15
# de-trend the time series by first differencing to make it stationary
data['CO2_detrended'] = data['raw data'].diff()
# introduce the effect of the policy intervention
data.loc[data.date > '2004-12-01', "CO2_detrended"] -= 15
data = data.set_index('date')

# plot the data
data.plot(y='Carbon Dioxide (ppm)', kind='line',figsize=(12,5), lw=2, title="Carbon Dioxide Levels in Atmosphere");
# data.plot(y='CO2_detrended', kind='line',figsize=(12,5), lw=2, title="Carbon Dioxide Levels de-trended");

In [None]:
# define the 'before' and 'after' periods
pre_period  = [ pd.Timestamp('1970-01-01') , pd.Timestamp('2004-12-01') ]
post_period = [ pd.Timestamp('2005-01-01') , pd.Timestamp('2016-12-01') ]

ci = CausalImpact(data.loc[:,"CO2_detrended"], 
                  pre_period, post_period, 
                  nseasons=[{'period': 12}],
                  prior_level_sd=0.05)
# print out a summary
print(ci.summary())
# display the plots
ci.plot(panels=['pointwise','cumulative'], figsize=(12, 8));

We can now clearly see the effect of the policy intervention on our data.
# Related reading
* [Kay H. Brodersen, Fabian Gallusser, Jim Koehler, Nicolas Remy, Steven L. Scott "*Inferring causal impact using Bayesian structural time-series models*", The Annals of Applied Statistics Volume **9** Pages 247-274 (2015)](https://projecteuclid.org/journalArticle/Download?urlid=10.1214%2F14-AOAS788)
* [David McDowall, Richard McCleary, Bradley J. Bartos "*Interrupted Time Series Analysis*", Oxford University Press (2019)](https://doi.org/10.1093/oso/9780190943943.001.0001)
* [Evangelos Kontopantelis, Tim Doran, David A Springate, Iain Buchan, David Reeves "*Regression based quasi-experimental approach when randomisation is not an option: Interrupted time series analysis*", The BMJ 350:h2750 (2015)](https://www.bmj.com/content/bmj/350/bmj.h2750.full.pdf)
* [James Lopez Bernal, Steven Cummins, and Antonio Gasparrini "*Interrupted time series regression for the evaluation of public health interventions: A tutorial*", International Journal of Epidemiology, Volume **46** Pages 348–355 (2017)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5407170/pdf/dyw098.pdf)
* [Andrea L. Schaffer, Timothy A. Dobbins, and Sallie-Anne Pearson "*Interrupted time series analysis using autoregressive integrated moving average (ARIMA) models: A guide for evaluating large-scale health interventions*", BMC Medical Research Methodology Volume **21** Article number 58 (2021)](https://bmcmedresmethodol.biomedcentral.com/track/pdf/10.1186/s12874-021-01235-8.pdf)

### Packages
* [CausalImpact](https://google.github.io/CausalImpact/CausalImpact.html) the original R package by Google
* [Causal Impact](https://github.com/dafiti/causalimpact) a Python port of **CausalImpact** by Dafiti OpenSource