Skip to content

Latest commit

 

History

History
1564 lines (1502 loc) · 25.2 KB

README.md

File metadata and controls

1564 lines (1502 loc) · 25.2 KB

CausalArima

The goal of CausalArima is to estimate the causal effect of an intervention on a univariate time series using ARIMA models.

Installation

You can install the development version of CausalArima from GitHub with:

# install.packages("devtools")
devtools::install_github("FMenchetti/CausalArima")

Example

This is a basic example which shows you how to use the package:

library(CausalArima)
#> Loading required package: forecast
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
#> Loading required package: ggplot2
#> Loading required package: gridExtra

# simulate data
n<-100
set.seed(1)
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = n)
y <- 1.2 * x1 + rnorm(n)
y[ floor(n*.71):n] <- y[ floor(n*.71):n] + 10
data <- cbind(y, x1)
dates <- seq.Date(from = as.Date("2014-01-05"), by = "days", length.out = n)
start<-as.numeric(strftime(as.Date(dates[1], "%Y-%m-%d"), "%u"))

# Adding a fictional intervention
int.date <- as.Date("2014-03-16")

# fit the model - Causal effect estimation
ce <- CausalArima(y = ts(y, start = start, frequency = 1), dates = dates, int.date = int.date,
                  xreg =x1, nboot = 1000)

How to obtain the plot of the forecast:

forecasted<-plot(ce, type="forecast")
forecasted

How to obtain the plot of the estimated effects and cumulative effects:

impact_p<-plot(ce, type="impact")
grid.arrange(impact_p$plot, impact_p$cumulative_plot)

How to obtain a quick summary of the estimated effect:

summary(ce)
#>                                       
#> Point causal effect            12.257 
#> Standard error                 1.211  
#> Left-sided p-value             1      
#> Bidirectional p-value          0      
#> Right-sided p-value            0      
#>                                       
#> Cumulative causal effect       310.709
#> Standard error                 6.634  
#> Left-sided p-value             1      
#> Bidirectional p-value          0      
#> Right-sided p-value            0      
#>                                       
#> Temporal average causal effect 10.357 
#> Standard error                 0.221  
#> Left-sided p-value             1      
#> Bidirectional p-value          0      
#> Right-sided p-value            0

How to obtain a detailed summary of the results, with an option to produce tables in html format (notice that to proper display the results as html on a Rmarkdown chunk you have to set result as ‘asis’). Other possible format include “numeric”, useful to retrieve the statistics and use them in calculations, and “latex”. Estimated model:

summary_model<-impact(ce, format="html")
summary_model$arima

$arima_order

p d q
arima_order 0 0 0
$param
coef se t value
xreg 1.199333 0.0016581 723.3279
$accuracy
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.0043228 1.202503 0.9464393 -0.0051705 0.9072633 0.5734012 0.1407503
$log_stats
loglik aic bic aicc
metrics -112.234 228.4681 232.9651 228.6472

Causal impact:

summary_model$impact_norm

$average

estimate sd p_value_left p_value_bidirectional p_value_right
10.35698 0.2211311 1 0 0
$sum
estimate sd p_value_left p_value_bidirectional p_value_right
310.7095 6.633933 1 0 0
$point_effect
estimate sd p_value_left p_value_bidirectional p_value_right
12.25715 1.211185 1 0 0

Causal impact based on boostrap:

summary_model$impact_boot

$average

estimates inf sup sd
observed 117.0485168 NA NA NA
forecasted 106.6915345 106.2768920 107.1550453 0.2184921
absolute_effect 10.3569824 9.8934716 10.7716249 0.2184921
relative_effect 0.0970741 0.0927297 0.1009604 0.0020479
$effect_cum
estimates inf sup sd
observed 3511.4555050 NA NA NA
forecasted 3200.7460337 3188.3067592 3214.6513578 6.5547623
absolute_effect 310.7094713 296.8041472 323.1487458 6.5547623
relative_effect 0.0970741 0.0927297 0.1009604 0.0020479
$p_values
x
alpha 0.05
p 0.00

How to inspect the residuals, with the plots of autocorrelation (ACF) and partial autocorrelation (PACF) functions and QQ-plots:

residuals<-plot(ce, type="residuals")
grid.arrange(residuals$ACF, residuals$PACF, residuals$QQ_plot)

Example with more horizons

# simulate data
n<-100
set.seed(1)
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = n)
y <- 1.2 * x1 + rnorm(n)
y[ floor(n*.71):n] <- y[ floor(n*.71):n] + 10
data <- cbind(y, x1)
dates <- seq.Date(from = as.Date("2014-01-05"), by = "days", length.out = n)
start<-as.numeric(strftime(as.Date(dates[1], "%Y-%m-%d"), "%u"))

# Adding a fictional intervention
int.date <- as.Date("2014-03-16")
horizon<-as.Date(c("2014-03-25", "2014-04-05")) # add horizons

# fit the model - Causal effect estimation
ce <- CausalArima(y = ts(y, start = start, frequency = 1), ic = "aicc", dates = dates, int.date = int.date,
                  xreg =x1, nboot = 1000)

How to obtain the plot of the estimated effects and cumulative effects:

impact_p<-plot(ce, type="impact", horizon = horizon)
grid.arrange(impact_p$plot, impact_p$cumulative_plot)

How to obtain a quick summary of the estimated effect:

summary(ce, horizon = horizon)
#>                                2014-03-25 2014-04-05
#> Point causal effect            9.962      9.673     
#> Standard error                 1.211      1.211     
#> Left-sided p-value             1          1         
#> Bidirectional p-value          0          0         
#> Right-sided p-value            0          0         
#>                                                     
#> Cumulative causal effect       104.356    216.327   
#> Standard error                 3.83       5.55      
#> Left-sided p-value             1          1         
#> Bidirectional p-value          0          0         
#> Right-sided p-value            0          0         
#>                                                     
#> Temporal average causal effect 10.436     10.301    
#> Standard error                 0.383      0.264     
#> Left-sided p-value             1          1         
#> Bidirectional p-value          0          0         
#> Right-sided p-value            0          0

How to obtain a detailed summary of the results, with an option to produce tables in html format (notice that to proper display the results as html on a Rmarkdown chunk you have to set result as ‘asis’). Other possible format include “numeric”, useful to retrieve the statistics and use them in calculations, and “latex”. Estimated model:

summary_model<-impact(ce, format="html", horizon = horizon)
summary_model$arima

$arima_order

p d q
arima_order 0 0 0
$param
coef se t value
xreg 1.199333 0.0016581 723.3279
$accuracy
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.0043228 1.202503 0.9464393 -0.0051705 0.9072633 0.5734012 0.1407503
$log_stats
loglik aic bic aicc
metrics -112.234 228.4681 232.9651 228.6472

Causal impact:

summary_model$impact_norm

$average

horizon estimate sd p_value_left p_value_bidirectional p_value_right
2014-03-25 10.43560 0.3830103 1 0 0
2014-04-05 10.30127 0.2643022 1 0 0
$sum
horizon estimate sd p_value_left p_value_bidirectional p_value_right
2014-03-25 104.3560 3.830103 1 0 0
2014-04-05 216.3267 5.550346 1 0 0
$point_effect
horizon estimate sd p_value_left p_value_bidirectional p_value_right
2014-03-25 9.961521 1.211185 1 0 0
2014-04-05 9.673077 1.211185 1 0 0

Causal impact based on boostrap:

summary_model$impact_boot

$2014-03-25

estimates inf sup sd
observed 117.57705 NA NA NA
forecasted 107.08878 106.336204 107.8783426 0.3887218
absolute_effect 10.48827 9.698704 11.2408425 0.3887218
relative_effect 0.09794 0.090567 0.1049675 0.0036299
estimates inf sup sd
observed 1058.19342 NA NA NA
forecasted 963.79898 957.025838 970.9050836 3.4984965
absolute_effect 94.39444 87.288337 101.1675823 3.4984965
relative_effect 0.09794 0.090567 0.1049675 0.0036299
x
alpha 0.05
p 0.00
$`2014-04-05`
estimates inf sup sd
observed 117.8232981 NA NA NA
forecasted 107.4906190 106.9643496 108.0108449 0.2687184
absolute_effect 10.3326792 9.8124532 10.8589485 0.2687184
relative_effect 0.0961263 0.0912866 0.1010223 0.0024999
estimates inf sup sd
observed 2356.4659629 NA NA NA
forecasted 2149.8123792 2139.2869920 2160.2168984 5.3743690
absolute_effect 206.6535837 196.2490645 217.1789708 5.3743690
relative_effect 0.0961263 0.0912866 0.1010223 0.0024999
x
alpha 0.05
p 0.00

Modify the plots

The plotting functions have some graphical parameters that make easier to personalize the plots:

forecasted_2<-plot(ce, type="forecast", fill_colour="orange",
               colours=c("red", "blue"))
forecasted_2

All plotting functions return a ggplot object or a list of ggplot objects, which makes easy to modify any ggplot parameters of the theme. The ggthemes package can be useful to employ directly some pre-customized themes, for example we can use the Wall Street Journal theme simply typing:

library(ggthemes)
forecasted+theme_wsj()

Learn more

You can read more on Estimating the causal effect of an intervention in a time series setting: the C-ARIMA approach (Fiammetta Menchetti, Fabrizio Cipollini, Fabrizia Mealli, 2021).

It is also available on youtube a video of a webinar on the topic: Fiammetta Menchetti: Estimating the causal effect of an intervention in a time series setting.