Merged
11 changes: 11 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,11 @@
## Checklist

- [ ] I've formatted the new code by running `hatch run dev:format` before committing.
- [ ] I've added tests for new code.
- [ ] I've added docstrings for the new code.

## Description

Please describe your changes here. If this fixes a bug, please link to the issue if possible.

Issue Number: N/A
12 changes: 12 additions & 0 deletions .github/workflows/ruff.yml
@@ -0,0 +1,12 @@
name: Check linting
on:
pull_request:
push:
branches:
- main
jobs:
ruff:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3.5.2
- uses: chartboost/ruff-action@v1
34 changes: 34 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,34 @@
name: Run Tests
on:
pull_request:
push:
branches:
- main

jobs:
unit-tests:
name: Run Tests
runs-on: ubuntu-latest
strategy:
matrix:
# Select the operating systems and Python versions to test against
os: ["ubuntu-latest", "macos-latest"]
python-version: ["3.10", "3.11"]
fail-fast: true
steps:
- name: Check out the code
uses: actions/checkout@v3.5.2
with:
fetch-depth: 1
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}

# Install Hatch
- name: Install Hatch
uses: pypa/hatch@install

# Run the unit tests and build the coverage report
- name: Run Tests
run: hatch run dev:test
65 changes: 56 additions & 9 deletions README.md
@@ -1,17 +1,64 @@
## My Project
# SyntheticCausalDataGen

TODO: Fill this README out!
This package lets you define your own causal data-generating process and then simulate data from it. The package also provides complex components, such as periodic and temporal trends, that you can add to your process, and all of these operations are fully composable with one another.

Be sure to:
A short example is given below:
```python
from causal_validation import Config, simulate
from causal_validation.effects import StaticEffect
from causal_validation.plotters import plot
from causal_validation.transforms import Trend, Periodic
from causal_validation.transforms.parameter import UnitVaryingParameter
from scipy.stats import norm

* Change the title in this README
* Edit your repository description on GitHub
cfg = Config(
n_control_units=10,
n_pre_intervention_timepoints=60,
n_post_intervention_timepoints=30,
)

## Security
# Simulate the base observation
base_data = simulate(cfg)

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
# Apply a linear trend with unit-varying intercept
intercept = UnitVaryingParameter(sampling_dist = norm(0, 1))
trend_component = Trend(degree=1, coefficient=0.1, intercept=intercept)
trended_data = trend_component(base_data)

## License
# Simulate a 5% lift in the treated unit's post-intervention data
effect = StaticEffect(0.05)
inflated_data = effect(trended_data)

This project is licensed under the Apache-2.0 License.
# Plot your data
plot(inflated_data)
```
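For intuition, `StaticEffect(0.05)` conceptually amounts to a multiplicative lift on the treated unit's post-intervention observations. The sketch below is a minimal, package-independent illustration; the function name `apply_static_effect` is hypothetical and not part of the package's API:

```python
def apply_static_effect(series, effect, intervention_idx):
    """Multiply observations from intervention_idx onwards by (1 + effect)."""
    return [
        y * (1 + effect) if t >= intervention_idx else y
        for t, y in enumerate(series)
    ]

treated = [100.0, 100.0, 100.0, 100.0, 100.0]
lifted = apply_static_effect(treated, 0.05, intervention_idx=3)
print(lifted)  # the final two observations are inflated by 5%
```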


## Examples

To supplement the above example, we have two more detailed notebooks that exhaustively present and explain the functionality in this package, along with how the generated data may be integrated with [AZCausal](https://github.com/amazon-science/azcausal).
1. [Basic notebook](): shows the full range of available functions for data generation.
2. [AZCausal notebook](): shows how the generated data may be used within an AZCausal model.

## Installation

In this section we guide the user through the installation of this package. We distinguish here between _users_ of the package who seek to define their own data generating processes, and _developers_ who wish to extend the existing functionality of the package.

### Prerequisites

- Python 3.10 or higher
- [Hatch](https://hatch.pypa.io/) (optional for users, required for developers)

### For Users

1. It's strongly recommended to use a virtual environment. Create and activate one using your preferred method before proceeding with the installation.
2. Clone the package `git clone git@github.com:amazon-science/causal-validation.git`
3. Enter the package's root directory `cd causal-validation`
4. Install the package `pip install -e .`
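After step 4, a quick sanity check confirms the package is importable (the import name `causal_validation` is taken from the example above):

```python
import importlib.util

# find_spec returns None when the package is not visible in the active environment.
spec = importlib.util.find_spec("causal_validation")
print("installed" if spec is not None else "not importable; check your virtual environment")
```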

### For Developers

1. Follow steps 1-3 from `For Users`
2. Create a hatch environment `hatch env create`
3. Open a hatch shell `hatch shell`
4. Validate your installation by running `hatch run dev:test`
113 changes: 113 additions & 0 deletions examples/azcausal.pct.py
@@ -0,0 +1,113 @@
# %%
from azcausal.estimators.panel.sdid import SDID
import scipy.stats as st

from causal_validation import (
Config,
simulate,
)
from causal_validation.effects import StaticEffect
from causal_validation.plotters import plot
from causal_validation.transforms import (
Periodic,
Trend,
)
from causal_validation.transforms.parameter import UnitVaryingParameter

# %% [markdown]
# ## AZCausal Integration
#
# Amazon's [AZCausal](https://github.com/amazon-science/azcausal) library provides the
# functionality to fit synthetic control and difference-in-difference models to your
# data. Integrating the synthetic data generating process of `causal_validation` with
# AZCausal is trivial, as we show in this notebook. To start, we'll simulate a toy
# dataset.

# %%
cfg = Config(
n_control_units=10,
n_pre_intervention_timepoints=60,
n_post_intervention_timepoints=30,
seed=123,
)

linear_trend = Trend(degree=1, coefficient=0.05)
data = linear_trend(simulate(cfg))
plot(data)

# %% [markdown]
# We'll now simulate a 5% lift in the treatment group's observations. This
# will inflate the treated group's observations in the post-intervention window.

# %%
TRUE_EFFECT = 0.05
effect = StaticEffect(effect=TRUE_EFFECT)
inflated_data = effect(data)
plot(inflated_data)

# %% [markdown]
# ### Fitting a model
#
# We now have some very toy data on which we may apply a model. For this demonstration
# we shall use the Synthetic Difference-in-Differences model implemented in AZCausal;
# however, the approach shown here will work for any model implemented in AZCausal. To
# achieve this, we must first coerce the data into a format that is digestible for
# AZCausal. Through the `.to_azcausal()` method implemented here, this is
# straightforward to achieve. Once we have an AZCausal-compatible dataset, the modelling
# is very simple by virtue of the clean design of AZCausal.

# %%
panel = inflated_data.to_azcausal()
model = SDID()
result = model.fit(panel)
print(f"Delta: {TRUE_EFFECT - result.effect.percentage().value / 100}")
print(result.summary(title="Synthetic Data Experiment"))

# %% [markdown]
# We see that SDID has done an excellent job of estimating the treatment
# effect. However, given the simplicity of the data, this is not surprising. With the
# functionality within this package, though, we can easily construct more complex datasets
# in an effort to fully stress-test any new model and identify its limitations.
#
# To achieve this, we'll simulate 10 control units, 60 pre-intervention time points, and
# 30 post-intervention time points according to the following process:
# $$
# \begin{align}
# \mu_{n, t} & \sim \mathcal{N}(20, 0.5^2)\\
# \alpha_{n} & \sim \mathcal{N}(0, 1^2)\\
# \beta_{n} & \sim \mathcal{N}(0.05, 0.01^2)\\
# \nu_n & \sim \mathcal{N}(1, 1^2)\\
# \gamma_n & \sim \operatorname{Student-t}_{10}(1, 1^2)\\
# \mathbf{Y}_{n, t} & = \mu_{n, t} + \alpha_{n} + \beta_{n}t + \nu_n\sin\left(3\times 2\pi t + \gamma_n\right) + \delta_{t, n}
# \end{align}
# $$
# where the true treatment effect $\delta_{t, n}$ is 5% when $n=1$ and $t\geq 60$ and 0
# otherwise. Meanwhile, $\mathbf{Y}$ is the matrix of observations, long in the number of
# time points and wide in the number of units.
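For intuition, the process above can be sketched in plain Python using only the standard library. This is a package-independent illustration (in practice, prefer the package's `simulate` and transforms, as shown in the next cell); the phase shifts here use a normal stand-in for the Student-t draw:

```python
import math
import random

random.seed(123)

N_UNITS, T_PRE, T_POST = 10, 60, 30
T = T_PRE + T_POST
TRUE_EFFECT = 0.05

# One draw per unit, mirroring the role of UnitVaryingParameter
alpha = [random.gauss(0, 1) for _ in range(N_UNITS)]       # intercepts
beta = [random.gauss(0.05, 0.01) for _ in range(N_UNITS)]  # slopes
nu = [random.gauss(1, 1) for _ in range(N_UNITS)]          # seasonal amplitudes
gamma = [random.gauss(0, 1) for _ in range(N_UNITS)]       # phase shifts (normal stand-in)

Y = []
for n in range(N_UNITS):
    row = []
    for t in range(T):
        mu = random.gauss(20, 0.5)
        y = mu + alpha[n] + beta[n] * t + nu[n] * math.sin(3 * 2 * math.pi * t / T + gamma[n])
        if n == 0 and t >= T_PRE:  # treated unit in the post-intervention window
            y *= 1 + TRUE_EFFECT
        row.append(y)
    Y.append(row)

print(len(Y), len(Y[0]))  # 10 units by 90 time points
```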

# %%
cfg = Config(
n_control_units=10,
n_pre_intervention_timepoints=60,
n_post_intervention_timepoints=30,
global_mean=20,
global_scale=1,
seed=123,
)

intercept = UnitVaryingParameter(sampling_dist=st.norm(loc=0.0, scale=1))
coefficient = UnitVaryingParameter(sampling_dist=st.norm(loc=0.05, scale=0.01))
linear_trend = Trend(degree=1, coefficient=coefficient, intercept=intercept)

amplitude = UnitVaryingParameter(sampling_dist=st.norm(loc=1.0, scale=2))
shift = UnitVaryingParameter(sampling_dist=st.t(df=10))
periodic = Periodic(amplitude=amplitude, shift=shift, frequency=3)

data = effect(periodic(linear_trend(simulate(cfg))))
plot(data)

# %% [markdown]
# As before, we may now go about estimating the treatment effect. However, this
# time we see that the delta between the estimated and true effect is much larger than
# before.

# %%
panel = data.to_azcausal()
model = SDID()
result = model.fit(panel)
print(f"Delta: {100*(TRUE_EFFECT - result.effect.percentage().value / 100): .2f}%")
print(result.summary(title="Synthetic Data Experiment"))