Context
At the 2026-04-21 meeting with Lars Vilhuber (transcript lines 65-75), the topic of local-copy and caching guidance for researchers using the Python package came up as a piece of reproducibility hygiene distinct from TRACE. Lars's framing: researchers running a simulation on their laptop should be able to preserve exactly what they ran (model + data + reform + environment), so that if they come back to it six months later, they can still reproduce their own work.
This is the non-TRACE version-identification workstream Casper spoke about separately (transcript lines 415-417): TRACE is for citations a reader cannot rerun; local version-identification is for the researcher themselves.
Policyengine-app#2832 implements the webapp-side version badge. This issue is the Python-package-side equivalent: help a researcher running policyengine locally keep a reproducible record of each run.
What to build
- A policyengine CLI command or helper that snapshots everything needed to reproduce a specific local run into a single directory:
  - Pinned package versions (the pip freeze subset for pe.py + country + country-data)
  - The reform JSON (if any)
  - The h5 content hash (already in the release manifest)
  - The simulation output (results + optional per-household frame)
  - A short README documenting how to reproduce the run, with the exact install line
- Documentation in household-api-docs showing researchers how to use this, kept distinct from the TRACE emission flow. The distinction matters because TRACE targets citation durability, while local snapshots target "can I get back to my own work?"
- Default-on behavior for anyone using policyengine.calculate_household or policyengine.simulate via the Python API. A subdirectory under the working directory should be created automatically unless the user opts out. The cost of an extra megabyte of disk is worth the reproducibility gain.
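A minimal sketch of what the snapshot helper and its default-on switch could look like, assuming plain Python with only the standard library. Everything in it is hypothetical: new_run_dir, write_snapshot, the policyengine-runs/<UTC timestamp> layout, the POLICYENGINE_NO_SNAPSHOT environment variable, and the default distribution names standing in for pe.py + country + country-data are illustrative, not existing policyengine APIs. Versions come from importlib.metadata (the same information as filtering pip freeze down to a few packages), and the h5 hash is recomputed locally with SHA-256 rather than read from the release manifest. The optional per-household frame is left out.

```python
# Hypothetical sketch: none of these names are existing policyengine APIs.
from __future__ import annotations

import hashlib
import json
import os
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path
from typing import Any, Iterable, Optional

# Assumed PyPI names for pe.py + country + country-data (US shown); adjust as needed.
DEFAULT_PACKAGES = ("policyengine", "policyengine-core", "policyengine-us", "policyengine-us-data")
OPT_OUT_ENV = "POLICYENGINE_NO_SNAPSHOT"  # hypothetical opt-out switch


def pinned_versions(packages: Iterable[str] = DEFAULT_PACKAGES) -> dict[str, str]:
    """Installed versions of selected distributions (a targeted pip-freeze subset)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed here, so there is nothing to pin
    return versions


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a possibly large .h5 file in chunks instead of reading it all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def new_run_dir(base: Optional[Path] = None, opt_out: bool = False) -> Optional[Path]:
    """Default-on: create <base>/policyengine-runs/<UTC timestamp> unless opted out."""
    if opt_out or os.environ.get(OPT_OUT_ENV) == "1":
        return None
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    run_dir = Path(base or Path.cwd()) / "policyengine-runs" / stamp
    run_dir.mkdir(parents=True)  # raises rather than overwrite an earlier run
    return run_dir


def write_snapshot(
    run_dir: Path,
    results: dict[str, Any],
    reform: Optional[dict[str, Any]] = None,
    dataset: Optional[Path] = None,
    packages: Iterable[str] = DEFAULT_PACKAGES,
) -> Path:
    """Write manifest.json, reform.json (if any), results.json and README.md."""
    versions = pinned_versions(packages)
    manifest: dict[str, Any] = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "packages": versions,
        "dataset": None,
    }
    if dataset is not None:
        manifest["dataset"] = {"file": Path(dataset).name, "sha256": sha256_file(Path(dataset))}
    (run_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    if reform is not None:
        # Sorted keys: the same reform always serializes to the same bytes.
        (run_dir / "reform.json").write_text(json.dumps(reform, indent=2, sort_keys=True))
    (run_dir / "results.json").write_text(json.dumps(results, indent=2))

    pins = " ".join(f"{name}=={ver}" for name, ver in sorted(versions.items()))
    install = f"pip install {pins}" if pins else "(no pinned packages were found)"
    readme = [
        "# Reproducing this run",
        "",
        "1. Install the exact package versions:",
        "",
        f"       {install}",
        "",
        "2. Check that your dataset's SHA-256 matches manifest.json.",
        "3. Re-run with reform.json (if present) and compare against results.json.",
    ]
    (run_dir / "README.md").write_text("\n".join(readme) + "\n")
    return run_dir
```

In this sketch, calculate_household or simulate would call new_run_dir() before running and write_snapshot() afterwards, skipping both when new_run_dir() returns None. Sorting keys in reform.json keeps two snapshots of the same reform byte-identical, which makes them easy to diff.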
Non-goals
- Not TRACE. Local snapshots are not signed, not institutionally attested, and not meant to serve as paper citations. They are a researcher-local cache.
- Not preservation-grade storage. Researchers are responsible for their own backups.
Related
- /tmp/aea-review/transcript.txt
- docs/trace-case-study.md (PR #315, "Add TRACE case study writeup for AEA / TRACE grant team"; discusses this as adjacent to, but not replaced by, TRACE)