PE-US rebuild final artifact allocation fails when checkpoint paths are inside version directory #179

@anth-volk

Summary

A workstation PE-US rebuild smoke run completed donor integration and entropy calibration, saved both durable checkpoints, then failed during final artifact export because the checkpoint paths were placed under the same versioned output directory that final artifact allocation expects to create.

The CLI accepts explicit checkpoint save paths, but if those paths are inside --output-root/--version-id, saving the checkpoints pre-creates the version directory. Later, _allocate_versioned_output_dir() raises FileExistsError instead of treating the directory as the active run's output directory.
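
The mechanism can be reproduced in isolation. The sketch below is not the microplex_us code: save_checkpoint and allocate_versioned_output_dir are hypothetical stand-ins that assume a checkpoint save creates its parent directories and that allocation refuses any pre-existing version directory, which matches the traceback's error message.

```python
import tempfile
from pathlib import Path


def save_checkpoint(path: Path) -> None:
    """Hypothetical stand-in for a checkpoint save: creates the directory and its parents."""
    path.mkdir(parents=True, exist_ok=True)
    (path / "state.txt").write_text("checkpoint")


def allocate_versioned_output_dir(output_root: Path, version_id: str) -> Path:
    """Hypothetical stand-in for allocation: refuses a pre-existing version directory."""
    output_dir = output_root / version_id
    if output_dir.exists():
        raise FileExistsError(
            f"Versioned artifact directory already exists: {output_dir}"
        )
    output_dir.mkdir(parents=True)
    return output_dir


def reproduce_collision() -> bool:
    """Return True if allocation fails after a checkpoint is saved inside the version dir."""
    with tempfile.TemporaryDirectory() as tmp:
        output_root = Path(tmp) / "local_us_microplex_smoke"
        version_id = "local-smoke-v1-entropy"
        # Saving the checkpoint creates <output_root>/<version_id>/ as a side effect.
        save_checkpoint(output_root / version_id / "checkpoints" / "post_imputation")
        try:
            allocate_versioned_output_dir(output_root, version_id)
        except FileExistsError:
            return True
        return False
```

Under these assumptions, reproduce_collision() returns True: the checkpoint save is what makes the later allocation fail.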

Command shape

.venv/bin/python -m microplex_us.pipelines.pe_us_data_rebuild_checkpoint \
  --output-root artifacts/local_us_microplex_smoke \
  --version-id local-smoke-v1-entropy \
  --baseline-dataset /Users/administrator/Documents/PolicyEngine/policyengine-us-data/policyengine_us_data/storage/enhanced_cps_2024.h5 \
  --targets-db /Users/administrator/Documents/PolicyEngine/policyengine-us-data/policyengine_us_data/storage/calibration/policy_data.db \
  --policyengine-us-data-repo /Users/administrator/Documents/PolicyEngine/policyengine-us-data \
  --policyengine-us-data-python /Users/administrator/Documents/PolicyEngine/worktrees/microplex-us/fix-pe-rebuild-smoke-issues/.venv/bin/python \
  --calibration-backend entropy \
  --donor-imputer-backend zi_qrf \
  --policyengine-materialize-batch-size 100000 \
  --cps-sample-n 1000 \
  --puf-sample-n 1000 \
  --donor-sample-n 1000 \
  --n-synthetic 1000 \
  --no-include-acs \
  --defer-policyengine-harness \
  --defer-policyengine-native-score \
  --defer-native-audit \
  --defer-imputation-ablation \
  --pipeline-checkpoint-save-post-imputation-path artifacts/local_us_microplex_smoke/local-smoke-v1-entropy/checkpoints/post_imputation \
  --pipeline-checkpoint-save-post-microsim-path artifacts/local_us_microplex_smoke/local-smoke-v1-entropy/checkpoints/post_microsim

Progress before failure

The run reached:

US microplex build: post-imputation checkpoint saved [path=artifacts/local_us_microplex_smoke/local-smoke-v1-entropy/checkpoints/post_imputation]
US microplex build: policyengine calibration start [backend=entropy]
US microplex build: post-microsim checkpoint saved [path=artifacts/local_us_microplex_smoke/local-smoke-v1-entropy/checkpoints/post_microsim]
US microplex build: policyengine calibration complete [backend=entropy, calibrated_rows=2741]

Failure

File "/Users/administrator/Documents/PolicyEngine/worktrees/microplex-us/fix-pe-rebuild-smoke-issues/src/microplex_us/pipelines/artifacts.py", line 1834, in _allocate_versioned_output_dir
    raise FileExistsError(
        f"Versioned artifact directory already exists: {output_dir}"
    )
FileExistsError: Versioned artifact directory already exists: artifacts/local_us_microplex_smoke/local-smoke-v1-entropy

Expected behavior

One of these should happen:

  • the rebuild CLI should reject checkpoint save paths inside the final versioned artifact directory before starting the expensive run,
  • final artifact allocation should allow the active run's checkpoint-created version directory and write final outputs into it, or
  • checkpoint paths should default to a separate checkpoint root that cannot collide with versioned artifact allocation.
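
The first option could be sketched as a pre-flight check that runs before any expensive stage. The function names below are hypothetical and not part of the existing CLI; the sketch only assumes that the version directory is output_root joined with version_id, as in the command above.

```python
from pathlib import Path
from typing import Iterable, List


def find_colliding_checkpoint_paths(
    output_root: str, version_id: str, checkpoint_paths: Iterable[str]
) -> List[Path]:
    """Return the checkpoint paths that are the version directory or lie inside it."""
    version_dir = (Path(output_root) / version_id).resolve()
    colliding = []
    for raw in checkpoint_paths:
        candidate = Path(raw).resolve()
        # Compare path components, not string prefixes, so that a sibling
        # such as "<root>/<version_id>-checkpoints" is not flagged.
        if candidate == version_dir or version_dir in candidate.parents:
            colliding.append(candidate)
    return colliding


def validate_checkpoint_paths(
    output_root: str, version_id: str, checkpoint_paths: Iterable[str]
) -> None:
    """Raise ValueError up front if any checkpoint path would pre-create the version dir."""
    colliding = find_colliding_checkpoint_paths(
        output_root, version_id, checkpoint_paths
    )
    if colliding:
        joined = ", ".join(str(path) for path in colliding)
        raise ValueError(
            "Checkpoint save paths must not be inside the versioned artifact "
            f"directory {Path(output_root) / version_id}: {joined}"
        )
```

Calling validate_checkpoint_paths with the two --pipeline-checkpoint-save-* values from the command above would raise immediately, before donor integration or calibration starts.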

This failure happened after the expensive stages had already completed, so it is costly even for a sampled smoke run.
