Capture compute / runtime environment in TROs (container SHA, Python version, cloud region) #319

@MaxGhenis

Context

At the 2026-04-21 meeting with Tim Clark and Casper of the TRACE project, they explicitly flagged compute / runtime-environment capture as their next likely TROv vocabulary increment (transcript lines 503-523). Tim:

"I think Tim you sketched out an example of something that isn't yet formalized but it'll very likely be the next point release of trace to include one version of what computing architecture operating system software or whatever you can capture that goes into the trace with calendar isn't just an idiosyncratic add on."

For PolicyEngine, this matters because:

  • Stochastic imputation (quantile regression forests, QRF) is reproducible within a pinned numpy version, but we have not guaranteed determinism across numpy versions. The TRO should record the exact Python / numpy / cloud-region combination that produced a given h5 or simulation result.
  • Modal-hosted builds run on specific GPU / compute pool configurations. The build-TROs we emit today (us-data PR #746) record pe:ciRunUrl and pe:ciGitSha but not the container image SHA, Python version, or cloud region at execution time.
  • Webapp-run TROs (api#3485) will face the same gap. A CI/deploy SHA documents how a container was built, not which container was running when a specific request was served.

What to build

  1. Extend pe: attestation fields to cover:

    • Container image SHA (not just build commit SHA).
    • Python version.
    • Relevant library versions with nondeterminism risk (numpy, scikit-learn, quantile-forest, huggingface_hub).
    • Cloud region / compute-pool identifier.
    • Instance / pod ID at execution time.
  2. Populate these at emission time both in us-data's build-TRO emission and in the webapp-run TRO emission scoped by api#3485.

  3. Offer the generalized subset upstream to TROv. Some of these fields (image SHA, Python version, region) are likely useful for any statistical-agency-style use case; others (quantile-forest version) are PolicyEngine-specific.
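
A minimal sketch of what the emission-time capture in steps 1 and 2 could look like. This is illustrative, not the planned implementation: the `pe:` field names, the `PE_*` environment variable names, and the helper names are all hypothetical, and it assumes the deploy tooling injects the image digest, region, and compute pool into the container's environment. Library versions are read with the standard-library `importlib.metadata`, so nothing beyond the nondeterminism-relevant subset is enumerated.

```python
import os
import platform
from importlib import metadata

# Libraries this issue names as carrying nondeterminism risk.
TRACKED_LIBRARIES = ("numpy", "scikit-learn", "quantile-forest", "huggingface_hub")

# Hypothetical mapping from pe: field name to the environment variable that
# would carry it. None of these names exist today; the deploy tooling would
# have to set them (HOSTNAME is commonly the pod name under Kubernetes).
ENV_FIELDS = {
    "pe:containerImageDigest": "PE_CONTAINER_IMAGE_DIGEST",
    "pe:cloudRegion": "PE_CLOUD_REGION",
    "pe:computePool": "PE_COMPUTE_POOL",
    "pe:instanceId": "HOSTNAME",
}


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def capture_runtime_environment(environ=None):
    """Collect runtime-environment fields at TRO emission time.

    Missing values are recorded as None rather than guessed, so a gap in
    the attestation stays visible to whoever reads the TRO.
    """
    environ = os.environ if environ is None else environ
    record = {
        "pe:pythonVersion": platform.python_version(),
        "pe:libraryVersions": {
            name: installed_version(name) for name in TRACKED_LIBRARIES
        },
    }
    for field, variable in ENV_FIELDS.items():
        record[field] = environ.get(variable)
    return record
```

Calling `capture_runtime_environment()` inside the running container, rather than in CI, is what separates "which container was running" from "how the container was built". The returned dict could then be merged into the existing TRO payload alongside pe:ciRunUrl and pe:ciGitSha.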

Non-goals

  • Not pinning every transitive Python dependency. TRACE has explicitly not built that in (transcript 399-403) and we should not either. Scope is limited to the nondeterminism-relevant subset.
  • Not blocking on TRACE formalizing runtime-environment fields: we use pe:* in the interim and migrate to TROv once those fields are formalized.
