Local reproducibility snapshots for researchers using the Python package #320

@MaxGhenis

Context

At the 2026-04-21 meeting with Lars Vilhuber (transcript lines 65-75), the topic of local-copy and caching guidance for researchers using the Python package came up as a piece of reproducibility hygiene distinct from TRACE. Lars's framing: researchers running a simulation on their laptop should be able to preserve exactly what they ran (model + data + reform + environment), so that if they come back to it six months later, they can still reproduce their own work.

This is the non-TRACE version-identification workstream Casper spoke about separately (transcript 415-417): TRACE is for citations a reader cannot rerun; local version-identification is for the researcher themselves.

Policyengine-app#2832 implements the webapp-side version badge. This issue is the Python-package-side equivalent: help a researcher running policyengine locally keep a reproducible record of each run.

What to build

  1. A policyengine CLI command or helper that snapshots everything needed to reproduce a specific local run to a single directory:

    • Pinned package versions (pip freeze subset for pe.py + country + country-data)
    • The reform JSON (if any)
    • The h5 content hash (already in the release manifest)
    • The simulation output (results + optional per-household frame)
    • A short README documenting how to reproduce with the exact install line
  2. Documentation in household-api-docs showing researchers how to use the snapshot command, kept distinct from the TRACE emission flow. The distinction matters because TRACE targets citation durability; local snapshots target "can I get back to my own work?"

  3. Default-on behavior for anyone using policyengine.calculate_household or policyengine.simulate via the Python API. A subdirectory under the working directory should be created automatically unless the user opts out. The cost of an extra megabyte of disk is worth the reproducibility gain.
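
A minimal sketch of what the snapshot helper in item 1 and the opt-out in item 3 might look like, using only the standard library. Every name here is hypothetical and not part of the existing policyengine API: `write_snapshot`, the `POLICYENGINE_NO_SNAPSHOT` environment variable, the file names, and the default package list are all assumptions. The sketch also hashes the h5 file locally, whereas a real implementation could read the hash from the release manifest instead.

```python
import hashlib
import json
import os
import sys
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path

# Hypothetical defaults: the package list and the opt-out variable are assumptions.
PACKAGES = ("policyengine", "policyengine-us", "policyengine-us-data")
OPT_OUT_ENV = "POLICYENGINE_NO_SNAPSHOT"


def installed_versions(packages):
    """Return {package: version}, with None for packages that are not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large h5 datasets are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_snapshot(root, results, reform=None, dataset_path=None, packages=PACKAGES):
    """Write one run's reproducibility record to a new subdirectory of root.

    Returns the snapshot directory, or None if the user has opted out.
    """
    if os.environ.get(OPT_OUT_ENV):
        return None

    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    out = Path(root) / stamp
    out.mkdir(parents=True, exist_ok=False)

    versions = installed_versions(packages)
    pins = [f"{name}=={ver}" for name, ver in versions.items() if ver is not None]
    manifest = {
        "created_utc": stamp,
        "python": sys.version.split()[0],
        "packages": versions,
        "dataset_sha256": sha256_of_file(dataset_path) if dataset_path else None,
    }

    (out / "manifest.json").write_text(json.dumps(manifest, indent=2, sort_keys=True))
    (out / "reform.json").write_text(json.dumps(reform, indent=2, sort_keys=True))
    (out / "results.json").write_text(json.dumps(results, indent=2, sort_keys=True))
    (out / "requirements.txt").write_text("\n".join(pins) + "\n")
    (out / "README.txt").write_text(
        "To reproduce this run:\n"
        "  1. pip install -r requirements.txt\n"
        "  2. Check the dataset against dataset_sha256 in manifest.json\n"
        "  3. Rerun the simulation with reform.json and compare to results.json\n"
    )
    return out
```

A call such as `write_snapshot(".policyengine_snapshots", results, reform=reform_dict)` would write one timestamped directory per run under the working directory. Setting the opt-out variable to any non-empty value skips the write entirely.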

Non-goals

  • Not TRACE. Local snapshots are not signed, not institutionally attested, and not meant to serve as paper citations. They are a researcher-local cache.
  • Not preservation-grade storage. Researchers are responsible for their own backups.
