Commit 81efd2b

igerber and claude authored
docs(cs): record decision to keep cluster= (not deprecate); close TODO row (igerber#614)

Resolves the backlog item "decide whether to formally deprecate CallawaySantAnna.cluster=X in favor of survey_design=SurveyDesign(psu=X)" with a decision to KEEP cluster= as the canonical ergonomic single-level clustering kwarg, and removes the TODO row.

Rationale (recorded in REGISTRY.md's CallawaySantAnna cluster-wiring section): cluster= matches the field's universal convention (R fixest cluster=~unit, Stata vce(cluster id), statsmodels cov_type="cluster") and is retained across all IF-based estimators (CS / EfficientDiD / ImputationDiD / TwoStageDiD). The cluster= -> SurveyDesign(psu=cluster) synthesis is an internal implementation detail, not user-facing redundancy; survey_design= is the advanced entry point (strata / FPC / replicate weights) while bare cluster= is the shorthand for the common single-level case. This mirrors the HAD survey-API consolidation, which deprecated only the redundant survey= / weights= entry points and deliberately kept cluster=.

Docs-only: no source or behavior change.

Claude-Session: https://claude.ai/code/session_01LHDijzf8zHXk5T8ahS2mKi
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent d7a6a53 commit 81efd2b
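
The shorthand-versus-advanced relationship described in the commit message can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's code: the `SurveyDesign` class below is a hypothetical stand-in carrying only `psu` and `strata`, `resolve_design` is an invented helper name, and the handling of a design that has no PSU (the "inject" case) is inferred from the REGISTRY.md wording about `cluster_name`.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class SurveyDesign:
    """Hypothetical stand-in; the real class carries more fields (FPC, weights, ...)."""
    psu: Optional[str] = None
    strata: Optional[str] = None


def resolve_design(
    cluster: Optional[str] = None,
    survey_design: Optional[SurveyDesign] = None,
) -> Tuple[Optional[SurveyDesign], Optional[str]]:
    """Return (effective design, reported cluster name).

    Assumed rules, paraphrasing the REGISTRY.md note:
    - bare cluster= synthesizes a minimal SurveyDesign(psu=cluster);
    - an explicit survey_design.psu is the canonical column and is reported;
    - a design without a PSU gets cluster injected as its PSU.
    """
    if survey_design is None:
        if cluster is None:
            return None, None  # no clustering requested
        return SurveyDesign(psu=cluster), cluster  # synthesis
    if survey_design.psu is not None:
        return survey_design, survey_design.psu  # explicit PSU
    if cluster is not None:
        return replace(survey_design, psu=cluster), cluster  # injection
    return survey_design, None


# Shorthand for the common single-level case.
design, name = resolve_design(cluster="state")
print(design, name)

# Advanced entry point: strata plus an explicit PSU.
design, name = resolve_design(survey_design=SurveyDesign(psu="county", strata="region"))
print(design, name)
```

Under these assumptions the two entry points converge on one internal representation, which is why the synthesis can stay an implementation detail while `cluster=` remains the first thing users reach for. How the real estimators treat a `cluster=` value that conflicts with an explicit `psu` is not stated in this commit, so the sketch simply lets the explicit PSU take effect.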

2 files changed

Lines changed: 2 additions & 1 deletion

TODO.md

Lines changed: 0 additions & 1 deletion
@@ -33,7 +33,6 @@ The `Origin` column (Actionable tables) and the `PR` column (Deferred tables) bo
 | `ImputationDiD` LOO conservative-variance refinement (BJS 2024 Supp. Appendix A.9) — a finite-sample improvement to the auxiliary-model residuals reducing overfit of `tau_tilde_g` to `epsilon`. Asymptotic Theorem-3 variance is implemented and matches R `didimputation` (which also omits LOO by default). | `imputation.py` | imputation-validation | Mid | Low |
 | `TwoWayFixedEffects(vcov_type in {hc2, hc2_bm})` with replicate-weight designs raises `NotImplementedError` (`twfe.py:~233`). The replicate path re-demeans per replicate, which doesn't compose with the full-dummy HC2/HC2-BM build — a correct impl needs per-replicate full-dummy refit. Workaround: `hc1` for replicate-weight CR1. | `twfe.py::fit` | follow-up | Heavy | Low |
 | TWFE's HC2/HC2-BM inline full-dummy build (`twfe.py:280-315`) duplicates the dummy-construction logic in `DifferenceInDifferences(fixed_effects=...)` (`estimators.py:478-486`). Extract a shared helper, or delegate TWFE's HC2/HC2-BM path to DiD's `fixed_effects=` branch (with TWFE-specific cluster-default threading), to reduce drift risk on FE naming / survey behavior / result-surface conventions. Substantive refactor — touches both estimators. | `twfe.py::fit`, `estimators.py::DifferenceInDifferences.fit` | follow-up | Heavy | Low |
-| Decide whether to formally deprecate `CallawaySantAnna.cluster=X` in favor of `survey_design=SurveyDesign(psu=X)` (the bare-cluster path already synthesizes a minimal SurveyDesign). Two equivalent paths = redundant surface. Mirrors the question for ImputationDiD / EfficientDiD / TwoStageDiD. | `staggered.py` | follow-up | Mid | Low |

 ### Performance

docs/methodology/REGISTRY.md

Lines changed: 2 additions & 0 deletions
@@ -474,6 +474,8 @@ Prior to the bare-`cluster=` wiring fix, `CallawaySantAnna(cluster="X")` was a s

 The `cluster_name` and `n_clusters` fields on `CallawaySantAnnaResults` report the effective clustering level: `survey_design.psu` (canonical column) when explicit PSU is provided, `self.cluster` when bare cluster synthesizes or injects.

+- **Note (API decision — `cluster=` retained, NOT deprecated):** the `cluster=` → `SurveyDesign(psu=cluster)` synthesis above is an internal implementation detail, not a user-facing redundancy to be consolidated away. `cluster=` is the **canonical ergonomic single-level clustering kwarg** and is intentionally retained on `CallawaySantAnna` (and the sibling IF-based estimators `EfficientDiD` / `ImputationDiD` / `TwoStageDiD`): it matches the field's universal convention (R `fixest::feols(..., cluster = ~unit)`, Stata `vce(cluster id)`, statsmodels `cov_type="cluster"`), so users reach for `cluster=` first. `survey_design=SurveyDesign(psu=X, ...)` is the **advanced** entry point (adds strata / FPC / replicate weights / explicit weights); a bare `cluster=` is the shorthand for the common "just cluster at X" case and would be strictly less ergonomic if forced through `survey_design=`. This mirrors the HAD survey-API consolidation, which deprecated only the *redundant* `survey=` / `weights=` entry points in favor of `survey_design=` while deliberately keeping `cluster=`. Decision recorded 2026-07-04: do not deprecate `cluster=`; the former "decide whether to deprecate `CallawaySantAnna.cluster=X`" `TODO.md` row is closed as resolved (keep).
+
 *Estimator equation (as implemented):*

 Group-time average treatment effect:
