From 48f7783a5499a4df81cffb07c1ca98225fffc9ac Mon Sep 17 00:00:00 2001 From: Thomas Pinder Date: Thu, 5 Sep 2024 17:41:01 +0000 Subject: [PATCH 1/3] Add rich formating and multi-model support --- examples/placebo_test.pct.py | 53 ++++++++---------- pyproject.toml | 3 +- src/causal_validation/__about__.py | 2 +- src/causal_validation/models.py | 3 ++ src/causal_validation/validation/placebo.py | 54 +++++++++++++++---- .../test_validation/test_placebo.py | 33 ++++++++++-- 6 files changed, 100 insertions(+), 48 deletions(-) diff --git a/examples/placebo_test.pct.py b/examples/placebo_test.pct.py index 7016483..cd55e93 100644 --- a/examples/placebo_test.pct.py +++ b/examples/placebo_test.pct.py @@ -7,7 +7,7 @@ # extension: .py # format_name: percent # format_version: '1.3' -# jupytext_version: 1.16.4 +# jupytext_version: 1.11.2 # kernelspec: # display_name: causal-validation # language: python @@ -15,22 +15,11 @@ # --- # %% [markdown] -# # Placebo Testing +# # Placebo Testing # -# A placebo test is an approach to assess the validity of a causal model by checking if -# the effect can truly be attributed to the treatment, or to other spurious factors. A -# placebo test is conducted by iterating through the set of control units and at each -# iteration, replacing the treated unit by one of the control units and measuring the -# effect. If the model detects a significant effect, then it suggests potential bias or -# omitted variables in the analysis, indicating that the causal inference is flawed. +# A placebo test is an approach to assess the validity of a causal model by checking if the effect can truly be attributed to the treatment, or to other spurious factors. A placebo test is conducted by iterating through the set of control units and at each iteration, replacing the treated unit by one of the control units and measuring the effect. If the model detects a significant effect, then it suggests potential bias or omitted variables in the analysis, indicating that the causal inference is flawed. # -# A successful placebo test will show no statistically significant results and we may -# then conclude that the estimated effect can be attributed to the treatment and not -# driven by confounding factors. Conversely, a failed placebo test, which shows -# significant results, suggests that the identified treatment effect may not be -# reliable. Placebo testing is thus a critical step to ensure the robustness of findings -# in RCTs. In this notebook, we demonstrate how a placebo test can be conducted in -# `causal-validation`. +# A successful placebo test will show no statistically significant results and we may then conclude that the estimated effect can be attributed to the treatment and not driven by confounding factors. Conversely, a failed placebo test, which shows significant results, suggests that the identified treatment effect may not be reliable. Placebo testing is thus a critical step to ensure the robustness of findings in RCTs. In this notebook, we demonstrate how a placebo test can be conducted in `causal-validation`. # %% from azcausal.core.error import JackKnife @@ -53,9 +42,7 @@ # %% [markdown] # ## Data simulation # -# To demonstrate a placebo test, we must first simulate some data. For the purposes of -# illustration, we'll simulate a very simple dataset containing 10 control units where -# each unit has 60 pre-intervention observations, and 30 post-intervention observations. +# To demonstrate a placebo test, we must first simulate some data. 
For the purposes of illustration, we'll simulate a very simple dataset containing 10 control units where each unit has 60 pre-intervention observations, and 30 post-intervention observations. # %% cfg = Config( @@ -73,10 +60,7 @@ # %% [markdown] # ## Model # -# We'll now define our model. To do this, we'll use the synthetic -# difference-in-differences implementation of AZCausal. This implementation, along with -# any other model from AZCausal, can be neatly wrapped up in our `AZCausalWrapper` to -# make fitting and effect estimation simpler. +# We'll now define our model. To do this, we'll use the synthetic difference-in-differences implementation of AZCausal. This implementation, along with any other model from AZCausal, can be neatly wrapped up in our `AZCausalWrapper` to make fitting and effect estimation simpler. # %% model = AZCausalWrapper(model=SDID(), error_estimator=JackKnife()) @@ -84,18 +68,23 @@ # %% [markdown] # ## Placebo Test Results # -# Now that we have a dataset and model defined, we may conduct our placebo test. With 10 -# control units, the test will estimate 10 individual effects; 1 per control unit when -# it is mocked as the treated group. With those 10 effects, the routine will then -# produce the mean estimated effect, along with the standard deviation across the -# estimated effect, the effect's standard error, and the p-value that corresponds to the -# null-hypothesis test that the effect is 0. +# Now that we have a dataset and model defined, we may conduct our placebo test. With 10 control units, the test will estimate 10 individual effects; 1 per control unit when it is mocked as the treated group. With those 10 effects, the routine will then produce the mean estimated effect, along with the standard deviation across the estimated effect, the effect's standard error, and the p-value that corresponds to the null-hypothesis test that the effect is 0. # -# In the below, we see that expected estimated effect is small at just 0.08. -# Accordingly, the p-value attains a value of 0.5, indicating that we have insufficient -# evidence to reject the null hypothesis and we, therefore, have no evidence to suggest -# that there is bias within this particular setup. +# In the below, we see that expected estimated effect is small at just 0.08. Accordingly, the p-value attains a value of 0.5, indicating that we have insufficient evidence to reject the null hypothesis and we, therefore, have no evidence to suggest that there is bias within this particular setup. # %% result = PlaceboTest(model, data).execute() result.summary() + +# %% [markdown] +# ## Model Comparison +# +# We can also use the results of a placebo test to compare two or more models. Using `causal-validation`, this is as simple as supplying a series of models to the placebo test and comparing their outputs. To demonstrate this, we will compare the previously used synthetic difference-in-differences model with regular difference-in-differences. 
+ +# %% +did_model = AZCausalWrapper(model=DID()) +PlaceboTest([model, did_model], data).execute().summary() + +# %% + +# %% diff --git a/pyproject.toml b/pyproject.toml index 0112fd3..4d7d220 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -31,7 +31,8 @@ dependencies = [ "matplotlib", "numpy", "pandas", - "pandera" + "pandera", + "rich" ] [tool.hatch.build] diff --git a/src/causal_validation/__about__.py b/src/causal_validation/__about__.py index f83f980..c24a156 100644 --- a/src/causal_validation/__about__.py +++ b/src/causal_validation/__about__.py @@ -1,3 +1,3 @@ -__version__ = "0.0.4" +__version__ = "0.0.5" __all__ = ["__version__"] diff --git a/src/causal_validation/models.py b/src/causal_validation/models.py index 5b81fb4..d8b5196 100644 --- a/src/causal_validation/models.py +++ b/src/causal_validation/models.py @@ -13,6 +13,9 @@ class AZCausalWrapper: model: Estimator error_estimator: tp.Optional[Error] = None + def __post_init__(self): + self._model_name = self.model.__class__.__name__ + def __call__(self, data: Dataset, **kwargs) -> Result: panel = data.to_azcausal() result = self.model.fit(panel, **kwargs) diff --git a/src/causal_validation/validation/placebo.py b/src/causal_validation/validation/placebo.py index f12e2eb..2328a25 100644 --- a/src/causal_validation/validation/placebo.py +++ b/src/causal_validation/validation/placebo.py @@ -11,6 +11,8 @@ ) from scipy.stats import ttest_1samp from tqdm import trange +from rich import box +from rich.table import Table from causal_validation.data import Dataset from causal_validation.models import AZCausalWrapper @@ -29,37 +31,67 @@ @dataclass class PlaceboTestResult: - effects: tp.List[Effect] + effects: tp.Dict[str, tp.List[Effect]] - def summary(self) -> pd.DataFrame: - _effects = [effect.value for effect in self.effects] + def _model_to_df(self, model_name: str, effects: tp.List[Effect]) -> pd.DataFrame: + _effects = [effect.value for effect in effects] _n_effects = len(_effects) expected_effect = np.mean(_effects) stddev_effect = np.std(_effects) std_error = stddev_effect / np.sqrt(_n_effects) p_value = ttest_1samp(_effects, 0, alternative="two-sided").pvalue result = { + "Model": model_name, "Effect": expected_effect, "Standard Deviation": stddev_effect, "Standard Error": std_error, "p-value": p_value, } result_df = pd.DataFrame([result]) - PlaceboSchema.validate(result_df) return result_df + def to_df(self) -> pd.DataFrame: + df = pd.concat( + [ + self._model_to_df(model, effects) + for model, effects in self.effects.items() + ] + ) + PlaceboSchema.validate(df) + return df + + def summary(self) -> Table: + table = Table(show_header=True, box=box.MARKDOWN) + df = self.to_df() + + for column in df.columns: + table.add_column(str(column), style="magenta") + + for _, value_list in enumerate(df.values.tolist()): + row = [str(x) for x in value_list] + table.add_row(*row) + + return table + @dataclass class PlaceboTest: - model: AZCausalWrapper + models: tp.Union[AZCausalWrapper, tp.List[AZCausalWrapper]] dataset: Dataset + def __post_init__(self): + if isinstance(self.models, AZCausalWrapper): + self.models: tp.List[AZCausalWrapper] = [self.models] + def execute(self) -> PlaceboTestResult: n_control_units = self.dataset.n_units - results = [] - for i in trange(n_control_units): - placebo_data = self.dataset.to_placebo_data(i) - result = self.model(placebo_data) - result = result.effect.percentage() - results.append(result) + results = {} + for model in self.models: + model_result = [] + for i in trange(n_control_units): + 
placebo_data = self.dataset.to_placebo_data(i) + result = model(placebo_data) + result = result.effect.percentage() + model_result.append(result) + results[model._model_name] = model_result return PlaceboTestResult(effects=results) diff --git a/tests/test_causal_validation/test_validation/test_placebo.py b/tests/test_causal_validation/test_validation/test_placebo.py index 12557b0..92299f6 100644 --- a/tests/test_causal_validation/test_validation/test_placebo.py +++ b/tests/test_causal_validation/test_validation/test_placebo.py @@ -10,6 +10,7 @@ import numpy as np import pandas as pd import pytest +from rich.table import Table from causal_validation.models import AZCausalWrapper from causal_validation.testing import ( @@ -54,11 +55,37 @@ def test_placebo_test( # Check that the structure of result assert isinstance(result, PlaceboTestResult) - assert len(result.effects) == n_control + for _, v in result.effects.items(): + assert len(v) == n_control # Check the results are close to the true effect - summary = result.summary() + summary = result.to_df() PlaceboSchema.validate(summary) assert isinstance(summary, pd.DataFrame) - assert summary.shape == (1, 4) + assert summary.shape == (1, 5) assert summary["Effect"].iloc[0] == pytest.approx(0.0, abs=0.1) + + rich_summary = result.summary() + assert isinstance(rich_summary, Table) + n_rows = result.summary().row_count + assert n_rows == summary.shape[0] + + +@pytest.mark.parametrize("n_control", [9, 10]) +def test_multiple_models(n_control: int): + constants = TestConstants(N_CONTROL=n_control, GLOBAL_SCALE=0.001) + data = simulate_data(global_mean=20.0, seed=123, constants=constants) + trend_term = Trend(degree=1, coefficient=0.1) + data = trend_term(data) + + model1 = AZCausalWrapper(DID()) + model2 = AZCausalWrapper(SDID()) + result = PlaceboTest([model1, model2], data).execute() + + result_df = result.to_df() + result_rich = result.summary() + assert result_df.shape == (2, 5) + assert result_df.shape[0] == result_rich.row_count + assert result_df["Model"].tolist() == ["DID", "SDID"] + for _, v in result.effects.items(): + assert len(v) == n_control From 661e5d1c2fa0e02c279471e9b5de74db2d279c54 Mon Sep 17 00:00:00 2001 From: Thomas Pinder Date: Thu, 5 Sep 2024 17:44:17 +0000 Subject: [PATCH 2/3] Format --- examples/placebo_test.pct.py | 4 ++-- src/causal_validation/validation/placebo.py | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/examples/placebo_test.pct.py b/examples/placebo_test.pct.py index cd55e93..34aa07c 100644 --- a/examples/placebo_test.pct.py +++ b/examples/placebo_test.pct.py @@ -7,7 +7,7 @@ # extension: .py # format_name: percent # format_version: '1.3' -# jupytext_version: 1.11.2 +# jupytext_version: 1.16.4 # kernelspec: # display_name: causal-validation # language: python @@ -15,7 +15,7 @@ # --- # %% [markdown] -# # Placebo Testing +# # Placebo Testing # # A placebo test is an approach to assess the validity of a causal model by checking if the effect can truly be attributed to the treatment, or to other spurious factors. A placebo test is conducted by iterating through the set of control units and at each iteration, replacing the treated unit by one of the control units and measuring the effect. If the model detects a significant effect, then it suggests potential bias or omitted variables in the analysis, indicating that the causal inference is flawed. 
# diff --git a/src/causal_validation/validation/placebo.py b/src/causal_validation/validation/placebo.py index 2328a25..a4e8962 100644 --- a/src/causal_validation/validation/placebo.py +++ b/src/causal_validation/validation/placebo.py @@ -9,10 +9,10 @@ Column, DataFrameSchema, ) -from scipy.stats import ttest_1samp -from tqdm import trange from rich import box from rich.table import Table +from scipy.stats import ttest_1samp +from tqdm import trange from causal_validation.data import Dataset from causal_validation.models import AZCausalWrapper From f2dbf598df6a2bb76a902735826b18a5b58793c0 Mon Sep 17 00:00:00 2001 From: Thomas Pinder Date: Thu, 5 Sep 2024 17:45:55 +0000 Subject: [PATCH 3/3] Lint --- examples/basic.pct.py | 2 +- examples/placebo_test.pct.py | 45 +++++++++++++++++++++++++++--------- 2 files changed, 35 insertions(+), 12 deletions(-) diff --git a/examples/basic.pct.py b/examples/basic.pct.py index a7f566f..900ccf8 100644 --- a/examples/basic.pct.py +++ b/examples/basic.pct.py @@ -51,7 +51,7 @@ scales = [0.1, 0.5] fig, axes = plt.subplots(ncols=2, nrows=2, figsize=(10, 6), tight_layout=True) -for (m, s), ax in zip(product(means, scales), axes.ravel()): +for (m, s), ax in zip(product(means, scales), axes.ravel(), strict=False): cfg = Config( n_control_units=10, n_pre_intervention_timepoints=60, diff --git a/examples/placebo_test.pct.py b/examples/placebo_test.pct.py index 34aa07c..30a8a46 100644 --- a/examples/placebo_test.pct.py +++ b/examples/placebo_test.pct.py @@ -17,9 +17,20 @@ # %% [markdown] # # Placebo Testing # -# A placebo test is an approach to assess the validity of a causal model by checking if the effect can truly be attributed to the treatment, or to other spurious factors. A placebo test is conducted by iterating through the set of control units and at each iteration, replacing the treated unit by one of the control units and measuring the effect. If the model detects a significant effect, then it suggests potential bias or omitted variables in the analysis, indicating that the causal inference is flawed. +# A placebo test is an approach to assess the validity of a causal model by checking if +# the effect can truly be attributed to the treatment, or to other spurious factors. A +# placebo test is conducted by iterating through the set of control units and at each +# iteration, replacing the treated unit by one of the control units and measuring the +# effect. If the model detects a significant effect, then it suggests potential bias or +# omitted variables in the analysis, indicating that the causal inference is flawed. # -# A successful placebo test will show no statistically significant results and we may then conclude that the estimated effect can be attributed to the treatment and not driven by confounding factors. Conversely, a failed placebo test, which shows significant results, suggests that the identified treatment effect may not be reliable. Placebo testing is thus a critical step to ensure the robustness of findings in RCTs. In this notebook, we demonstrate how a placebo test can be conducted in `causal-validation`. +# A successful placebo test will show no statistically significant results and we may +# then conclude that the estimated effect can be attributed to the treatment and not +# driven by confounding factors. Conversely, a failed placebo test, which shows +# significant results, suggests that the identified treatment effect may not be +# reliable. Placebo testing is thus a critical step to ensure the robustness of findings +# in RCTs. 
In this notebook, we demonstrate how a placebo test can be conducted in +# `causal-validation`. # %% from azcausal.core.error import JackKnife @@ -33,16 +44,14 @@ from causal_validation.effects import StaticEffect from causal_validation.models import AZCausalWrapper from causal_validation.plotters import plot -from causal_validation.transforms import ( - Periodic, - Trend, -) from causal_validation.validation.placebo import PlaceboTest # %% [markdown] # ## Data simulation # -# To demonstrate a placebo test, we must first simulate some data. For the purposes of illustration, we'll simulate a very simple dataset containing 10 control units where each unit has 60 pre-intervention observations, and 30 post-intervention observations. +# To demonstrate a placebo test, we must first simulate some data. For the purposes of +# illustration, we'll simulate a very simple dataset containing 10 control units where +# each unit has 60 pre-intervention observations, and 30 post-intervention observations. # %% cfg = Config( @@ -60,7 +69,10 @@ # %% [markdown] # ## Model # -# We'll now define our model. To do this, we'll use the synthetic difference-in-differences implementation of AZCausal. This implementation, along with any other model from AZCausal, can be neatly wrapped up in our `AZCausalWrapper` to make fitting and effect estimation simpler. +# We'll now define our model. To do this, we'll use the synthetic +# difference-in-differences implementation of AZCausal. This implementation, along with +# any other model from AZCausal, can be neatly wrapped up in our `AZCausalWrapper` to +# make fitting and effect estimation simpler. # %% model = AZCausalWrapper(model=SDID(), error_estimator=JackKnife()) @@ -68,9 +80,17 @@ # %% [markdown] # ## Placebo Test Results # -# Now that we have a dataset and model defined, we may conduct our placebo test. With 10 control units, the test will estimate 10 individual effects; 1 per control unit when it is mocked as the treated group. With those 10 effects, the routine will then produce the mean estimated effect, along with the standard deviation across the estimated effect, the effect's standard error, and the p-value that corresponds to the null-hypothesis test that the effect is 0. +# Now that we have a dataset and model defined, we may conduct our placebo test. With 10 +# control units, the test will estimate 10 individual effects; 1 per control unit when +# it is mocked as the treated group. With those 10 effects, the routine will then +# produce the mean estimated effect, along with the standard deviation across the +# estimated effect, the effect's standard error, and the p-value that corresponds to the +# null-hypothesis test that the effect is 0. # -# In the below, we see that expected estimated effect is small at just 0.08. Accordingly, the p-value attains a value of 0.5, indicating that we have insufficient evidence to reject the null hypothesis and we, therefore, have no evidence to suggest that there is bias within this particular setup. +# In the below, we see that expected estimated effect is small at just 0.08. +# Accordingly, the p-value attains a value of 0.5, indicating that we have insufficient +# evidence to reject the null hypothesis and we, therefore, have no evidence to suggest +# that there is bias within this particular setup. # %% result = PlaceboTest(model, data).execute() @@ -79,7 +99,10 @@ # %% [markdown] # ## Model Comparison # -# We can also use the results of a placebo test to compare two or more models. 
Using `causal-validation`, this is as simple as supplying a series of models to the placebo test and comparing their outputs. To demonstrate this, we will compare the previously used synthetic difference-in-differences model with regular difference-in-differences. +# We can also use the results of a placebo test to compare two or more models. Using +# `causal-validation`, this is as simple as supplying a series of models to the placebo +# test and comparing their outputs. To demonstrate this, we will compare the previously +# used synthetic difference-in-differences model with regular difference-in-differences. # %% did_model = AZCausalWrapper(model=DID())
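
For quick reference, below is a minimal, self-contained sketch of the multi-model placebo test this patch series introduces, assembled from the hunks above. It reuses the `simulate_data`/`TestConstants` helpers exercised in the new test and the `AZCausalWrapper`/`PlaceboTest` API added here; the exact azcausal import paths for `DID` and `SDID` are assumed (they are elided from the diff), so treat this as an illustrative sketch rather than verbatim project code.

```python
# Minimal sketch of the multi-model placebo test added in this patch series.
# Data construction borrows the helpers used in the new test; azcausal import
# paths for DID/SDID are assumed to follow the library's standard layout.
from azcausal.core.error import JackKnife
from azcausal.estimators.panel.did import DID
from azcausal.estimators.panel.sdid import SDID

from causal_validation.models import AZCausalWrapper
from causal_validation.testing import TestConstants, simulate_data
from causal_validation.validation.placebo import PlaceboTest

# Simulate a small, low-noise panel with 10 control units.
constants = TestConstants(N_CONTROL=10, GLOBAL_SCALE=0.001)
data = simulate_data(global_mean=20.0, seed=123, constants=constants)

# Wrap one or more azcausal estimators. A single wrapper is also accepted;
# PlaceboTest.__post_init__ normalises it into a one-element list.
sdid = AZCausalWrapper(model=SDID(), error_estimator=JackKnife())
did = AZCausalWrapper(model=DID())

result = PlaceboTest([sdid, did], data).execute()

result.summary()     # rich Table (box.MARKDOWN), one row per model
df = result.to_df()  # DataFrame with Model, Effect, Standard Deviation,
                     # Standard Error and p-value columns
```

`summary()` renders the markdown-style rich table for human inspection, while `to_df()` returns the `PlaceboSchema`-validated DataFrame, which is the convenient form for programmatic comparison of the per-model effects.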