# Excess deaths due to COVID-19

In [None]:
import arviz as az
import pandas as pd

import causalpy as cp

In [None]:
%load_ext autoreload
%autoreload 2
# %config InlineBackend.figure_format = 'retina'
seed = 42

## Load data

In [None]:
df = (
    cp.load_data("covid")
    .assign(date=lambda x: pd.to_datetime(x["date"]))
    .set_index("date")
)
df["variable"] = ["deaths"] * len(df)
treatment_time = pd.to_datetime("2020-01-01")
df.head()

The columns are:

* `date` + `year`: self-explanatory
* `month`: month, numerically encoded. It needs to be treated as a categorical variable
* `temp`: average UK temperature (Celsius)
* `t`: time index
* `pre`: Boolean flag indicating whether an observation is pre- or post-intervention
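
To make the column definitions concrete, here is a minimal sketch that builds the same kind of columns on a small synthetic frame. The frame `toy`, the date range, and the rule `pre = date < treatment time` are assumptions for illustration only; they are not taken from the real dataset.

```python
import pandas as pd

# Synthetic stand-in for the real data: six monthly dates spanning the intervention
dates = pd.date_range("2019-10-01", periods=6, freq="MS")
toy = pd.DataFrame({"date": dates}).set_index("date")

toy["year"] = toy.index.year    # calendar year
toy["month"] = toy.index.month  # numerically encoded month (1-12)
toy["t"] = range(len(toy))      # integer time index

# Assumed rule: an observation is "pre" if it falls before the treatment time
toy["pre"] = toy.index < pd.to_datetime("2020-01-01")

# Treating month as categorical means one indicator column per observed month,
# rather than a single numeric slope
month_dummies = pd.get_dummies(toy["month"], prefix="month")
```

In a model formula, wrapping the column as `C(month)` plays the same role as the explicit indicator columns built here.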

## Run the analysis

In this example we standardize the data, so the inferred regression coefficients must be interpreted with care: they refer to standardized units, and the posterior predictions are also in this standardized space.
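
As a rough illustration of what working in standardized space means, the sketch below assumes `standardize` applies the usual z-score transform (subtract the mean, divide by the standard deviation). The exact convention, such as the degrees of freedom used for the standard deviation, depends on the formula backend, and the numbers are made up.

```python
import numpy as np

# Made-up monthly death counts, for illustration only
deaths = np.array([40_000.0, 42_000.0, 45_000.0, 50_000.0])

# Assumed z-score convention: population standard deviation (ddof=0)
mean, sd = deaths.mean(), deaths.std()
z = (deaths - mean) / sd

# A prediction in standardized space maps back to the original units
# by inverting the transform
pred_z = np.array([0.0, 1.0])
pred_deaths = pred_z * sd + mean
```

Under this assumption, a standardized prediction of 0 corresponds to the mean of the observed series, and a prediction of 1 corresponds to one standard deviation above it.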

<div class="alert alert-info">

Note:

The `random_seed` keyword argument for the PyMC sampler is not necessary. We use it here so that the results are reproducible.

</div>
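
The effect of fixing a seed can be sketched with NumPy's random generator. This only illustrates the general principle and does not call the PyMC sampler; the variable names are hypothetical.

```python
import numpy as np

seed = 42

# Two generators created from the same seed produce identical draws
draws_a = np.random.default_rng(seed).normal(size=5)
draws_b = np.random.default_rng(seed).normal(size=5)

# A different seed gives a different sequence
draws_c = np.random.default_rng(seed + 1).normal(size=5)

same_seed_matches = np.array_equal(draws_a, draws_b)
different_seed_matches = np.array_equal(draws_a, draws_c)
```

Here `same_seed_matches` is `True` and `different_seed_matches` is `False`, which is the behaviour that makes seeded runs repeatable.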

In [None]:
result = cp.pymc_experiments.SyntheticControl(
    df,
    "variable",
    treatment_time,
    formula="standardize(deaths) ~ 0 + standardize(t) + C(month) + standardize(temp)",
    model=cp.pymc_models.LinearRegression(sample_kwargs={"random_seed": seed}),
)

In [None]:
fig, ax = result.plot(plot_predictors=False, variable="deaths", hdi_prob=0.94)