# Synthetic control application

This notebook illustrates how the tfp-causalimpact library is used to apply the synthetic control method to understand the impact that the Otis hurricane (which happened in October 2023) had on terminal transactions in Acapulco. Terminal transactions can be used as a proxy of economic activity, so this is a way to measure the impact that the hurricane had in Acapulco's economic activity.

In the synthetic control method, a treatment unit (e.g., a geographic location) is exposed to an intervention and the goal is to estimate what would have happened to that treatment unit if the intervention had not occurred. This is achieved by constructing a synthetic control unit, which is a weighted combination of control units (e.g., other geographic locations) that were not exposed to the intervention. The weights are chosen such that the synthetic control unit closely resembles the treatment unit in the pre-intervention period. By comparing the outcomes of the treatment unit and the synthetic control unit in the post-intervention period, we can estimate the causal effect of the intervention.

In [1]:
import causalimpact
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
import pandas as pd
import os

  if not hasattr(np, "object"):


In [2]:
# Load data
# The treatment unit (Acauplco) has to be the first column and the date column has to be the index. The other columns are the control units.
os.chdir("..")
df = pd.read_csv("data/processed/transactions_clean.csv")
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")
df.head()

Unnamed: 0_level_0,Acapulco de Juárez,Tijuana,Mexicali,Ensenada,La Paz,Los Cabos,Campeche,Ciudad del Carmen,Torreón,Saltillo,...,Apizaco,Tlaxcala,Veracruz,Xalapa,Coatzacoalcos,Boca del Río,Mérida,Zacatecas,Fresnillo,Guadalupe (Zacatecas)
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2011-04-01,439470,1015274,841030,391604,290722,538139,81624,154226,825532,633146,...,55523,43701,328584,361765,179680,298787,804268,103490,38786,52784
2011-05-01,411317,1239107,974104,438863,376113,575380,91734,175460,933421,731530,...,61530,50289,341287,389623,208101,358177,886519,107968,41939,55484
2011-06-01,375101,1200103,949425,423926,364554,539562,84131,175137,884642,707161,...,57993,51956,329791,378310,194557,343286,848640,103299,39697,57457
2011-07-01,416318,1142975,951503,462863,314585,548698,94517,177652,882503,719837,...,60453,55803,338792,402642,207734,315307,899933,113629,42901,60577
2011-08-01,428905,1256928,1027613,496120,332952,537693,100099,182621,898999,723469,...,65322,56790,352790,410032,218224,324414,931138,118381,43338,62325


In [3]:
# Define pre and post intervention periods. The pre intervention period is from April 2011 to September 2023.
# The post intervention period is from October 2023 to June 2025.
pre_period = (pd.to_datetime('2011-04-11'), pd.to_datetime('2023-09-01'))
post_period = (pd.to_datetime('2023-10-01'), pd.to_datetime('2025-06-01'))
# Set global seeds
seed = 0
np.random.seed(seed)
tf.random.set_seed(seed)
# Fit CausalImpact model
impact = causalimpact.fit_causalimpact(
    data=df,
    pre_period=pre_period,
    post_period=post_period
    )

I0000 00:00:1768156558.915673  516942 service.cc:145] XLA service 0x326df5610 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1768156558.915688  516942 service.cc:153]   StreamExecutor device (0): Host, Default Version
I0000 00:00:1768156559.136723  516942 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


In the following plot we can see the observed transactions in Acapulco (orange line) vs the counterfactual (blue line, which represents what would have happened to Acapulco if the hurricane had not occurred). Prior to the hurricane, both series closely track each other, indicating a good pre-treatment fit. Immediately after the hurricane, observed transactions experience a sharp decline relative to the counterfactual. Moreover, observed transactions never converge back to the counterfactual, suggesting that, at least through June 2025, Acapulco continues to experience persistent economic effects from the hurricane.

In [4]:
# Plot the results
causalimpact.plot(impact)

In [5]:
# summary of the results
print(causalimpact.summary(impact, output_format='summary'))


Posterior Inference {CausalImpact}
                          Average            Cumulative
Actual                    737845.8           15494761.0
Prediction (s.d.)         1025805.3 (105601.73)21541912.0 (2217636.5)
95% CI                    [888662.3, 1145919.6][18661908.1, 24064312.2]

Absolute effect (s.d.)    -287959.6 (105601.73)-6047151.0 (2217636.34)
95% CI                    [-408073.8, -150816.5][-8569550.3, -3167146.8]

Relative effect (s.d.)    -25.7% (36.1%)     -25.7% (36.0%)
95% CI                    [-35.6%, -17.0%]   [-35.6%, -17.0%]

Posterior tail-area probability p: 0.009
Posterior prob. of a causal effect: 99.11%

For more details run the command: summary(impact, output_format="report")


  p_value = ci_model.summary["p_value"][0]


In [6]:
# Interpretation of the results
print(causalimpact.summary(impact, output_format='report'))


Analysis report {CausalImpact}


During the post-intervention period, the response variable had
an average value of approx. 737845.8. By contrast, in the absence of an
intervention, we would have expected an average response of 1025805.3.
The 95% interval of this counterfactual prediction is [888662.3, 1145919.6].
Subtracting this prediction from the observed response yields
an estimate of the causal effect the intervention had on the
response variable. This effect is -287959.6 with a 95% interval of
[-408073.8, -150816.5]. For a discussion of the significance of this effect,
see below.


Summing up the individual data points during the post-intervention
period (which can only sometimes be meaningfully interpreted), the
response variable had an overall value of 15494761.0.
By contrast, had the intervention not taken place, we would have expected
a sum of 21541912.0. The 95% interval of this prediction is [18661908.1, 24064312.2].


The above results are given in terms of absolute numb

  p_value = ci_model.summary["p_value"][0]
