
Deduplicate regime-aware donor fit frames #243

Merged
MaxGhenis merged 1 commit into main from codex/regime-aware-deduplicate-fit-frame-20260606
Jun 6, 2026


@MaxGhenis
Contributor

Summary

Fixes the remaining chained MicroImpute crash after #242 by normalizing duplicate DataFrame column labels at the RegimeAwareDonorImputer boundary. The Stage 05 proof resume still failed because pandas can return duplicate target columns when the incoming fit frame itself has repeated labels, even after target variables are removed from the base predictors.
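For reviewers less familiar with the failure mode, here is a minimal sketch of the pandas behavior described above. The frame and the column names (`age`, `income`) are hypothetical and are not taken from the Stage 05 data; the snippet only illustrates how a repeated label changes what a column selection returns.

```python
import pandas as pd

# Hypothetical fit frame whose "income" label appears twice.
fit_frame = pd.DataFrame(
    [[30, 50_000.0, 51_000.0], [45, 72_000.0, 70_000.0]],
    columns=["age", "income", "income"],
)

# With unique labels this selection would be a Series. With a repeated
# label, pandas returns a DataFrame holding every matching column.
selected = fit_frame["income"]
is_frame = isinstance(selected, pd.DataFrame)
n_cols = selected.shape[1]

# Selecting by a one-element list also pulls in both "income" columns,
# so downstream code expecting one column per target sees two.
list_selected = fit_frame[["income"]]
```

Any code that assumes one column per target name can break on a frame like this, which is consistent with the crash described in this PR.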

Changes:

  • collapse duplicate fit-frame and condition-frame labels by keeping the first occurrence before selecting MicroImpute predictors/targets
  • generate from the deduplicated fitted target list instead of the raw target spec
  • add a regression with duplicate target labels in the fit frame and duplicate target names in the block spec
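The first two changes can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `donor_imputers.py`: the helper names `dedupe_columns` and `dedupe_names` and the sample data are hypothetical, and the actual implementation may differ.

```python
import pandas as pd


def dedupe_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the frame with only the first occurrence of each column label."""
    # Index.duplicated marks the second and later occurrences of a label.
    return frame.loc[:, ~frame.columns.duplicated(keep="first")]


def dedupe_names(names: list[str]) -> list[str]:
    """Drop repeated names while preserving their original order."""
    return list(dict.fromkeys(names))


# Hypothetical fit frame and target spec, both containing repeats.
fit_frame = pd.DataFrame(
    [[30, 50_000.0, 51_000.0], [45, 72_000.0, 70_000.0]],
    columns=["age", "income", "income"],
)
clean = dedupe_columns(fit_frame)
targets = dedupe_names(["income", "income", "wealth"])
```

After deduplication, `clean["income"]` is a plain Series again and each target name appears once, so generation can iterate over the deduplicated list rather than the raw spec.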

Verification

  • uv run ruff format src/microplex_us/pipelines/donor_imputers.py tests/pipelines/test_regime_aware_donor_imputer.py
  • uv run ruff check src/microplex_us/pipelines/donor_imputers.py tests/pipelines/test_regime_aware_donor_imputer.py
  • uv run --extra dev --extra policyengine python -m pytest -q tests/pipelines/test_regime_aware_donor_imputer.py -> 11 passed
  • uv run --extra dev --extra policyengine python -m pytest -q tests/pipelines/test_regime_aware_donor_imputer.py tests/test_variables.py tests/test_pe_source_impute_specs.py tests/test_pe_source_impute_engine.py tests/pipelines/test_zi_qrf_backend.py tests/pipelines/test_donor_imputer_negative_preservation.py tests/data_sources/test_scf_net_worth_components.py -k "imputer or regime_aware or columnwise or donor_imputation_block_specs or pe_source" -> 32 passed, 38 deselected

@MaxGhenis MaxGhenis merged commit db0540c into main Jun 6, 2026
5 checks passed
@MaxGhenis MaxGhenis deleted the codex/regime-aware-deduplicate-fit-frame-20260606 branch June 6, 2026 08:02
