The goal of CausalArima is to estimate the causal effect of an intervention on a univariate time series using ARIMA models.
You can install the development version of CausalArima from GitHub with:
# install.packages("devtools")
devtools::install_github("FMenchetti/CausalArima")
This is a basic example which shows you how to use the package:
library(CausalArima)
#> Loading required package: forecast
#> Registered S3 method overwritten by 'quantmod':
#> method from
#> as.zoo.data.frame zoo
#> Loading required package: ggplot2
#> Loading required package: gridExtra
# simulate data
n<-100
set.seed(1)
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = n)
y <- 1.2 * x1 + rnorm(n)
y[ floor(n*.71):n] <- y[ floor(n*.71):n] + 10
data <- cbind(y, x1)
dates <- seq.Date(from = as.Date("2014-01-05"), by = "days", length.out = n)
start<-as.numeric(strftime(as.Date(dates[1], "%Y-%m-%d"), "%u"))
# Adding a fictional intervention
int.date <- as.Date("2014-03-16")
# fit the model - Causal effect estimation
ce <- CausalArima(y = ts(y, start = start, frequency = 1), dates = dates, int.date = int.date,
xreg =x1, nboot = 1000)
How to obtain the plot of the forecast:
forecasted<-plot(ce, type="forecast")
forecasted
How to obtain the plot of the estimated effects and cumulative effects:
impact_p<-plot(ce, type="impact")
grid.arrange(impact_p$plot, impact_p$cumulative_plot)
How to obtain a quick summary of the estimated effect:
summary(ce)
#>
#> Point causal effect 12.257
#> Standard error 1.211
#> Left-sided p-value 1
#> Bidirectional p-value 0
#> Right-sided p-value 0
#>
#> Cumulative causal effect 310.709
#> Standard error 6.634
#> Left-sided p-value 1
#> Bidirectional p-value 0
#> Right-sided p-value 0
#>
#> Temporal average causal effect 10.357
#> Standard error 0.221
#> Left-sided p-value 1
#> Bidirectional p-value 0
#> Right-sided p-value 0
How to obtain a detailed summary of the results, with an option to produce tables in html format (notice that to proper display the results as html on a Rmarkdown chunk you have to set result as ‘asis’). Other possible format include “numeric”, useful to retrieve the statistics and use them in calculations, and “latex”. Estimated model:
summary_model<-impact(ce, format="html")
summary_model$arima
$arima_order
p | d | q | |
---|---|---|---|
arima_order | 0 | 0 | 0 |
coef | se | t value | |
---|---|---|---|
xreg | 1.199333 | 0.0016581 | 723.3279 |
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
---|---|---|---|---|---|---|---|
Training set | 0.0043228 | 1.202503 | 0.9464393 | -0.0051705 | 0.9072633 | 0.5734012 | 0.1407503 |
loglik | aic | bic | aicc | |
---|---|---|---|---|
metrics | -112.234 | 228.4681 | 232.9651 | 228.6472 |
Causal impact:
summary_model$impact_norm
$average
estimate | sd | p_value_left | p_value_bidirectional | p_value_right |
---|---|---|---|---|
10.35698 | 0.2211311 | 1 | 0 | 0 |
estimate | sd | p_value_left | p_value_bidirectional | p_value_right |
---|---|---|---|---|
310.7095 | 6.633933 | 1 | 0 | 0 |
estimate | sd | p_value_left | p_value_bidirectional | p_value_right |
---|---|---|---|---|
12.25715 | 1.211185 | 1 | 0 | 0 |
Causal impact based on boostrap:
summary_model$impact_boot
$average
estimates | inf | sup | sd | |
---|---|---|---|---|
observed | 117.0485168 | NA | NA | NA |
forecasted | 106.6915345 | 106.2768920 | 107.1550453 | 0.2184921 |
absolute_effect | 10.3569824 | 9.8934716 | 10.7716249 | 0.2184921 |
relative_effect | 0.0970741 | 0.0927297 | 0.1009604 | 0.0020479 |
estimates | inf | sup | sd | |
---|---|---|---|---|
observed | 3511.4555050 | NA | NA | NA |
forecasted | 3200.7460337 | 3188.3067592 | 3214.6513578 | 6.5547623 |
absolute_effect | 310.7094713 | 296.8041472 | 323.1487458 | 6.5547623 |
relative_effect | 0.0970741 | 0.0927297 | 0.1009604 | 0.0020479 |
x | |
---|---|
alpha | 0.05 |
p | 0.00 |
How to inspect the residuals, with the plots of autocorrelation (ACF) and partial autocorrelation (PACF) functions and QQ-plots:
residuals<-plot(ce, type="residuals")
grid.arrange(residuals$ACF, residuals$PACF, residuals$QQ_plot)
# simulate data
n<-100
set.seed(1)
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = n)
y <- 1.2 * x1 + rnorm(n)
y[ floor(n*.71):n] <- y[ floor(n*.71):n] + 10
data <- cbind(y, x1)
dates <- seq.Date(from = as.Date("2014-01-05"), by = "days", length.out = n)
start<-as.numeric(strftime(as.Date(dates[1], "%Y-%m-%d"), "%u"))
# Adding a fictional intervention
int.date <- as.Date("2014-03-16")
horizon<-as.Date(c("2014-03-25", "2014-04-05")) # add horizons
# fit the model - Causal effect estimation
ce <- CausalArima(y = ts(y, start = start, frequency = 1), ic = "aicc", dates = dates, int.date = int.date,
xreg =x1, nboot = 1000)
How to obtain the plot of the estimated effects and cumulative effects:
impact_p<-plot(ce, type="impact", horizon = horizon)
grid.arrange(impact_p$plot, impact_p$cumulative_plot)
How to obtain a quick summary of the estimated effect:
summary(ce, horizon = horizon)
#> 2014-03-25 2014-04-05
#> Point causal effect 9.962 9.673
#> Standard error 1.211 1.211
#> Left-sided p-value 1 1
#> Bidirectional p-value 0 0
#> Right-sided p-value 0 0
#>
#> Cumulative causal effect 104.356 216.327
#> Standard error 3.83 5.55
#> Left-sided p-value 1 1
#> Bidirectional p-value 0 0
#> Right-sided p-value 0 0
#>
#> Temporal average causal effect 10.436 10.301
#> Standard error 0.383 0.264
#> Left-sided p-value 1 1
#> Bidirectional p-value 0 0
#> Right-sided p-value 0 0
How to obtain a detailed summary of the results, with an option to produce tables in html format (notice that to proper display the results as html on a Rmarkdown chunk you have to set result as ‘asis’). Other possible format include “numeric”, useful to retrieve the statistics and use them in calculations, and “latex”. Estimated model:
summary_model<-impact(ce, format="html", horizon = horizon)
summary_model$arima
$arima_order
p | d | q | |
---|---|---|---|
arima_order | 0 | 0 | 0 |
coef | se | t value | |
---|---|---|---|
xreg | 1.199333 | 0.0016581 | 723.3279 |
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | |
---|---|---|---|---|---|---|---|
Training set | 0.0043228 | 1.202503 | 0.9464393 | -0.0051705 | 0.9072633 | 0.5734012 | 0.1407503 |
loglik | aic | bic | aicc | |
---|---|---|---|---|
metrics | -112.234 | 228.4681 | 232.9651 | 228.6472 |
Causal impact:
summary_model$impact_norm
$average
horizon | estimate | sd | p_value_left | p_value_bidirectional | p_value_right |
---|---|---|---|---|---|
2014-03-25 | 10.43560 | 0.3830103 | 1 | 0 | 0 |
2014-04-05 | 10.30127 | 0.2643022 | 1 | 0 | 0 |
horizon | estimate | sd | p_value_left | p_value_bidirectional | p_value_right |
---|---|---|---|---|---|
2014-03-25 | 104.3560 | 3.830103 | 1 | 0 | 0 |
2014-04-05 | 216.3267 | 5.550346 | 1 | 0 | 0 |
horizon | estimate | sd | p_value_left | p_value_bidirectional | p_value_right |
---|---|---|---|---|---|
2014-03-25 | 9.961521 | 1.211185 | 1 | 0 | 0 |
2014-04-05 | 9.673077 | 1.211185 | 1 | 0 | 0 |
Causal impact based on boostrap:
summary_model$impact_boot
$2014-03-25
|
|
|
|
|
|
The plotting functions have some graphical parameters that make easier to personalize the plots:
forecasted_2<-plot(ce, type="forecast", fill_colour="orange",
colours=c("red", "blue"))
forecasted_2
All plotting functions return a ggplot object or a list of ggplot objects, which makes easy to modify any ggplot parameters of the theme. The ggthemes package can be useful to employ directly some pre-customized themes, for example we can use the Wall Street Journal theme simply typing:
library(ggthemes)
forecasted+theme_wsj()
You can read more on Estimating the causal effect of an intervention in a time series setting: the C-ARIMA approach (Fiammetta Menchetti, Fabrizio Cipollini, Fabrizia Mealli, 2021).
It is also available on youtube a video of a webinar on the topic: Fiammetta Menchetti: Estimating the causal effect of an intervention in a time series setting.