Merged
MaxGhenis added a commit to PavelMakarchuk/policyengine.py that referenced this pull request on Apr 18, 2026
Builds on PolicyEngine#274's bundle-level TRO and closes the gaps that would surface at an AEA replication review:

- schema:creator is now a schema.org Organization, not a version string
- model wheel is hashed as a fourth composition artifact (read from the manifest when present, otherwise fetched from the PyPI JSON API, degrading silently when unreachable)
- every trov:path resolves over HTTPS (Hugging Face resolve URLs, PyPI download URL) so a reviewer can dereference the TRO without custom clients
- certification metadata moves from prose in schema:description to structured pe:* fields on TrustedResearchPerformance (pe:certifiedForModelVersion, pe:compatibilityBasis, pe:builtWithModelVersion, pe:dataBuildFingerprint, pe:dataBuildId)
- GitHub Actions runs add pe:ciRunUrl / pe:ciGitSha attestation
- JSON Schema ships at data/schemas/trace_tro.schema.json and every generated TRO is validated against it in tests

Adds the per-simulation layer that the bundle-level TRO doesn't cover:

- build_simulation_trace_tro chains a bundle TRO to a reform + results
- policyengine.results.build_results_trace_tro / write_results_with_trace_tro emit a TRO alongside a ResultsJson payload

Wiring:

- policyengine trace-tro CLI (plus release-manifest subcommand)
- TaxBenefitModelVersion.trace_tro property and the build_trace_tro_from_release_bundle / compute_trace_composition_fingerprint / serialize_trace_tro / extract_bundle_tro_reference / build_simulation_trace_tro re-exports from policyengine.core that were dropped when PolicyEngine#276 merged
- scripts/generate_trace_tros.py regenerates bundled TROs before a policyengine.py release
- jsonschema added to dev dependencies

Restores the TRACE TRO tests that PolicyEngine#276 removed as part of the test_release_manifests.py rewrite, now isolated in tests/test_trace_tro.py with coverage for determinism, schema conformance, CI attestation, and per-simulation chaining.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
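A minimal sketch of the wheel-hashing rule in the list above (manifest first, PyPI JSON API second, silent degradation), assuming a manifest entry is a dict with `sha256` and `url` keys. The name `find_wheel_artifact` and its injectable `fetch` parameter are hypothetical, not policyengine.py's API; only the PyPI endpoint `https://pypi.org/pypi/<project>/<version>/json` and its `urls[].packagetype` / `urls[].digests.sha256` / `urls[].url` fields are real.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional


def _fetch_json(url: str) -> Dict[str, Any]:
    """Download and decode a JSON document (requires network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def find_wheel_artifact(
    package: str,
    version: str,
    manifest_entry: Optional[Dict[str, str]] = None,
    fetch: Callable[[str], Dict[str, Any]] = _fetch_json,
) -> Optional[Dict[str, str]]:
    """Hypothetical helper: return {"sha256", "url"} for a model wheel, or None."""
    # Prefer the hash pinned in the release manifest when it is present.
    if manifest_entry and manifest_entry.get("sha256") and manifest_entry.get("url"):
        return {"sha256": manifest_entry["sha256"], "url": manifest_entry["url"]}
    api_url = f"https://pypi.org/pypi/{package}/{version}/json"
    try:
        payload = fetch(api_url)
        for file_info in payload.get("urls", []):
            # Each release file lists its package type and digests; pick the wheel.
            if file_info.get("packagetype") == "bdist_wheel":
                sha256 = file_info.get("digests", {}).get("sha256")
                if sha256:
                    return {"sha256": sha256, "url": file_info["url"]}
    except Exception:
        # Degrade silently: offline, bad JSON, or an unexpected payload shape.
        pass
    return None
```

Returning `None` lets the caller emit the TRO with three artifacts instead of four. The sketch takes the first wheel it finds, which is enough for a pure-Python package that publishes a single wheel per release.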
MaxGhenis added a commit that referenced this pull request on Apr 18, 2026
* Add results.json schema validation and source tracking

New `policyengine.results` module with two pieces:

- `schema.py`: Pydantic models (ResultsJson, ValueEntry, TableEntry, ChartEntry) that validate results.json at generation time. Catches missing source_line/source_url, row/column mismatches in tables, and vague alt text on charts before they reach the blog build step.
- `tracking.py`: `tracked_value()` helper that captures the caller's line number via `inspect` and builds the source_url automatically. Eliminates repetitive inspect.currentframe() boilerplate in analysis scripts.

These support the blog post content pipeline where every number in a published post links back to the exact line of analysis code that produced it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix write() double-serialization and add parent dir creation

- Use model_dump(mode="json") instead of json.loads(model_dump_json()) to avoid an unnecessary serialize→parse→serialize round-trip
- Create parent directories automatically so callers don't need to mkdir first
- Add trailing newline to output file
- Add test for nested directory creation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Harden TRACE TRO export and add per-simulation TROs

Builds on #274's bundle-level TRO and closes the gaps that would surface at an AEA replication review:

- schema:creator is now a schema.org Organization, not a version string
- model wheel is hashed as a fourth composition artifact (read from the manifest when present, otherwise fetched from the PyPI JSON API, degrading silently when unreachable)
- every trov:path resolves over HTTPS (Hugging Face resolve URLs, PyPI download URL) so a reviewer can dereference the TRO without custom clients
- certification metadata moves from prose in schema:description to structured pe:* fields on TrustedResearchPerformance (pe:certifiedForModelVersion, pe:compatibilityBasis, pe:builtWithModelVersion, pe:dataBuildFingerprint, pe:dataBuildId)
- GitHub Actions runs add pe:ciRunUrl / pe:ciGitSha attestation
- JSON Schema ships at data/schemas/trace_tro.schema.json and every generated TRO is validated against it in tests

Adds the per-simulation layer that the bundle-level TRO doesn't cover:

- build_simulation_trace_tro chains a bundle TRO to a reform + results
- policyengine.results.build_results_trace_tro / write_results_with_trace_tro emit a TRO alongside a ResultsJson payload

Wiring:

- policyengine trace-tro CLI (plus release-manifest subcommand)
- TaxBenefitModelVersion.trace_tro property and the build_trace_tro_from_release_bundle / compute_trace_composition_fingerprint / serialize_trace_tro / extract_bundle_tro_reference / build_simulation_trace_tro re-exports from policyengine.core that were dropped when #276 merged
- scripts/generate_trace_tros.py regenerates bundled TROs before a policyengine.py release
- jsonschema added to dev dependencies

Restores the TRACE TRO tests that #276 removed as part of the test_release_manifests.py rewrite, now isolated in tests/test_trace_tro.py with coverage for determinism, schema conformance, CI attestation, and per-simulation chaining.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Make results schema Python 3.9-compatible

Switches PEP 604 `X | None` unions in `ResultsMetadata` and `ResultsJson.write` to `Optional[X]` / `Union[X, Y]`, matching the project-wide pattern enforced for the 3.9 floor (ruff `UP007` is disabled for the same reason in `pyproject.toml`). Without this fix, `ResultsMetadata` class construction on the `content-pipeline-results` branch fails on Python 3.9 with `TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Apply ruff format to results module

Collapses string concatenations that the ruff 0.15.11 formatter in CI wants unified onto single lines. No behaviour change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Conform TRACE TRO to public TROv vocabulary; address reviewer findings

Round two of reviewer fixes. The published TRACE/TROv reference demos use a different vocabulary than the draft this module was originally written against; reviewers caught that our emission would not validate against real TROv SHACL shapes.

TROv vocabulary conformance:

- Switch to the public namespace https://w3id.org/trace/2023/05/trov#
- Flatten the locally invented trov:hash / trov:hashAlgorithm / trov:hashValue wrapper to the vocabulary-native trov:sha256 property
- Rename trov:path -> trov:hasLocation on ArtifactLocation
- Rename the inverse pointer to trov:hasArtifact (was trov:artifact)
- Correct TrustedResearchSystem -> TransparentResearchSystem
- Correct TrustedResearchPerformance -> TransparentResearchPerformance
- Drop the locally invented ArrangementBinding chain; use the vocabulary-native trov:accessedArrangement on the TRP instead
- Emit @type as a single string (not a 2-element array), matching the published trov-demos reference shape

Hardening from reproducibility + code-simplifier reviewers:

- pe:emittedIn is always present ("local" or "github-actions") so a verifier can tell a CI-emitted TRO from a laptop rebuild without inferring from absent fields
- Per-simulation TRO records pe:bundleTroUrl on the performance node; a verifier can fetch that URL, re-hash it, and confirm it matches the bundle_tro artifact hash, so swapping the caller's bundle_tro dict is detectable
- Composition fingerprint joins hashes with \n to prevent hex-length concatenation collisions ("ab" + "cdef" and "abcd" + "ef" both concatenate to "abcdef", so without a separator they would hash identically)
- CertifiedDataArtifact.sha256 is now authoritative when present; us.json ships the real dataset sha256, so bundle TRO emission no longer requires the data release manifest to carry it
- JSON Schema rejects non-HTTPS trov:hasLocation values and requires canonical 64-hex sha256 strings
- Inline the real policyengine-us 1.647.0 / policyengine-uk 2.88.0 wheel sha256 + URL in us.json/uk.json

Extracted shared helpers (_assemble_composition_and_arrangement, _assemble_tro_node, _policyengine_trs, _build_bundle_performance) to collapse the ~120-line duplication between build_trace_tro_from_release_bundle and build_simulation_trace_tro.

Removed dead code flagged by simplifier:

- DataReleaseArtifact.https_uri (zero callers, zero tests)
- _data_release_manifest_url (replaced by https_release_manifest_uri)
- Prose certification_description_parts (metadata is now purely in pe:* structured fields, as the commit message for #274 originally claimed)

CLI + release workflow:

- Dropped the broken --offline flag (never had a working code path)
- Added policyengine trace-tro-validate <path> subcommand that validates a TRO against the shipped JSON Schema
- Versioning CI job now runs scripts/generate_trace_tros.py and commits the generated bundled TROs alongside the changelog, so every released wheel ships with its matching TRACE TRO
- generate_trace_tros.py skips (with warning) countries whose data release manifest is unreachable instead of hard-failing

Tests (34 total in tests/test_trace_tro.py, replacing the prior 20):

- Real determinism: build TRO from two fresh manifest instances, assert bytes equal (previously tested only that json.dumps is deterministic)
- Forgery detection: swap bundle_tro, assert hash in sim TRO changes
- Schema rejects file:// locations
- Schema rejects missing pe:emittedIn
- Hex-length ambiguity test for the fingerprint separator
- All 4 TROv property renames have explicit assertions so a future regression to the wrong names fails loudly
- trace-tro-validate CLI accepts valid TROs and rejects invalid ones

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Close reviewer round-2 gaps: forgery anchor, schema regex, dead kwargs

Reviewers came back with accept / clean / minor-revisions verdicts; this commit picks up the remaining suggestions.
Forgery resistance:

- Bundle TRO takes an optional self_url and records it as pe:selfUrl so a verifier with only the bundle bytes can discover the canonical location it was published at.
- write_results_with_trace_tro now requires bundle_tro_url rather than merely accepting it. A published simulation TRO that omitted the URL would leave reviewers without a pinned fetch target; raising when the caller forgets matches the "adversarial reviewer" expectation.
- docs/release-bundles.md shows the three-step verifier workflow a replication reviewer should run: (1) fetch pe:bundleTroUrl and recompute its sha256, (2) compare that hash to the sim TRO's bundle_tro artifact hash, (3) confirm pe:bundleFingerprint matches the bundle's own CompositionFingerprint. A sim TRO with a swapped bundle_tro dict but a truthful URL fails step 2; both-swapped fails step 3.

CI regression guard:

- scripts/generate_trace_tros.py now exits non-zero if a country that previously shipped a .trace.tro.jsonld fails to regenerate (e.g. HUGGING_FACE_TOKEN expired). The Versioning CI job will block a release rather than silently ship a stale TRO.

Schema tightening:

- trov:hasLocation regex now anchors end-of-string on every legal local path and restricts data/ to data/release_manifests/<country>. data/../../etc/passwd and bundle.trace.tro.jsonld.evil no longer pass. HTTPS locations must contain no whitespace.
- Added a test covering the multi-node @graph path after the filter fix.

extract_bundle_tro_reference filter:

- Locates the trov:TransparentResearchObject node by @type rather than trusting @graph[0]. Future TROs that embed TRS/TSA nodes no longer break reference extraction.

Dead-kwarg cleanup (simplifier):

- Dropped emission_context kwarg from both public builders; tests use monkeypatch on GITHUB_ACTIONS/GITHUB_SHA instead, which is closer to what CI does anyway.
- Dropped tro_id / composition_id / arrangement_id default kwargs from the helpers; hardcoded as module constants.
- Dropped the bundle_tro_path branch from write_results_with_trace_tro: no caller, no test, no actual use case.

Tests (38 total in test_trace_tro.py):

- test__given_fixed_ci_env__then_tro_bytes_match_across_builds locks down determinism under CI with pinned run_id/git_sha
- test__given_self_url__then_tro_records_it covers pe:selfUrl
- test__given_graph_with_multiple_nodes__then_extract_finds_tro exercises the @type filter
- test__given_write_helper_without_url__then_raises locks the required-kwarg contract

Docstring caveat on build_trace_tro_from_release_bundle now states explicitly that pe:compatibilityBasis covers the model and data layers only; Python version, OS, and transitive lockfile are not yet pinned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Unify emission-context plumbing; drop redundant bundle_tro_location

Last two simplifier nits from round 3:

- _build_bundle_performance no longer takes emission_context as a kwarg; like the sim builder, it calls _emission_context() inline at the end of performance construction. One fewer parameter, same ordering behaviour, matches the sim-side pattern.
- write_results_with_trace_tro no longer passes the URL to both bundle_tro_location and bundle_tro_url; the build_simulation_trace_tro fallback (bundle_tro_location or bundle_tro_url or <default>) picks the URL up on its own.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Max Ghenis <mghenis@gmail.com>
Co-authored-by: Max Ghenis <max@policyengine.org>
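The `tracked_value()` helper from the first squashed commit can be sketched with the standard `inspect` module. This is an illustration, not the module's code: the signature, the `source_file_url` parameter, and the GitHub `#L<line>` anchor format are assumptions; only the `source_line` / `source_url` field names come from the commit text.

```python
import inspect
from typing import Any, Dict


def tracked_value(value: Any, source_file_url: str) -> Dict[str, Any]:
    """Hypothetical sketch: pair a value with the caller line that produced it."""
    frame = inspect.currentframe()
    if frame is None or frame.f_back is None:
        # currentframe() returns None on interpreters without stack frame support.
        raise RuntimeError("stack frame inspection is unavailable")
    try:
        line = frame.f_back.f_lineno  # line in the calling script
    finally:
        del frame  # drop the frame reference to avoid a reference cycle
    return {
        "value": value,
        "source_line": line,
        "source_url": f"{source_file_url}#L{line}",
    }
```

An analysis script would call `tracked_value(total_cost, "https://github.com/<org>/<repo>/blob/<sha>/analysis.py")` and get a link to the exact line of that call, with no per-call `inspect` boilerplate.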
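The write() fixes and the Python 3.9 typing change can be illustrated together. This sketch assumes the payload has already been produced by Pydantic v2's `model_dump(mode="json")`, so it is serialized exactly once here; the `write_results` name is hypothetical. It annotates with `Optional[...]` / `Union[...]` because PEP 604 `X | None` raises `TypeError` when evaluated at runtime on Python 3.9.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional, Union


def write_results(
    payload: Dict[str, Any],
    path: Union[str, Path],
    indent: Optional[int] = 2,
) -> Path:
    """Hypothetical sketch: write an already JSON-compatible dict to disk."""
    target = Path(path)
    # Create missing parent directories so callers never need to mkdir first.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Serialize once and end the file with a trailing newline.
    target.write_text(json.dumps(payload, indent=indent) + "\n", encoding="utf-8")
    return target
```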
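The separator fix in the composition fingerprint is easiest to see side by side. This sketch assumes the fingerprint is a sha256 over artifact hashes supplied in a fixed order; `naive_fingerprint` and `composition_fingerprint` are illustrative names, not the module's `compute_trace_composition_fingerprint`.

```python
import hashlib
from typing import Sequence


def naive_fingerprint(artifact_hashes: Sequence[str]) -> str:
    # Plain concatenation: ["ab", "cdef"] and ["abcd", "ef"] both become "abcdef".
    return hashlib.sha256("".join(artifact_hashes).encode("utf-8")).hexdigest()


def composition_fingerprint(artifact_hashes: Sequence[str]) -> str:
    # Hex digests never contain "\n", so distinct lists of non-empty digests
    # produce distinct joined strings and therefore distinct inputs to sha256.
    return hashlib.sha256("\n".join(artifact_hashes).encode("utf-8")).hexdigest()
```

With real 64-character digests every element has the same length, so the collision cannot occur in practice; the separator makes the encoding unambiguous by construction rather than by that coincidence, which is what the hex-length ambiguity test pins down.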
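The `pe:emittedIn` / `pe:ciRunUrl` / `pe:ciGitSha` fields can be derived from the default GitHub Actions environment variables `GITHUB_ACTIONS`, `GITHUB_SHA`, `GITHUB_SERVER_URL`, `GITHUB_REPOSITORY`, and `GITHUB_RUN_ID`. The sketch below is a guess at what `_emission_context()` might do, not its actual body; the `env` parameter exists only so the illustration can run without monkeypatching.

```python
import os
from typing import Dict, Mapping, Optional


def emission_context(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical sketch of the pe:* emission fields on a performance node."""
    env = os.environ if env is None else env
    # pe:emittedIn is always present, so a verifier never infers from absence.
    if env.get("GITHUB_ACTIONS") != "true":
        return {"pe:emittedIn": "local"}
    context = {"pe:emittedIn": "github-actions"}
    sha = env.get("GITHUB_SHA")
    if sha:
        context["pe:ciGitSha"] = sha
    server = env.get("GITHUB_SERVER_URL", "https://github.com")
    repo = env.get("GITHUB_REPOSITORY")
    run_id = env.get("GITHUB_RUN_ID")
    if repo and run_id:
        # Standard URL of a workflow run page on GitHub.
        context["pe:ciRunUrl"] = f"{server}/{repo}/actions/runs/{run_id}"
    return context
```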
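A sketch of the tightened `trov:hasLocation` check, written with Python's `re.fullmatch` so both ends are anchored (a Python pattern ending in `$` alone would still accept a trailing newline, and JSON Schema `pattern` keywords are unanchored unless `^`/`$` are written explicitly). The allow-list is hypothetical: the commit does not spell out every legal local form, so `data/release_manifests/<country>.json` and `<name>.trace.tro.jsonld` are stand-ins.

```python
import re

# Hypothetical allow-list; the shipped JSON Schema pattern may differ.
_HTTPS = r"https://\S+"  # remote locations: HTTPS only, no whitespace
_MANIFEST = r"data/release_manifests/[a-z]{2,}\.json"  # data/ restricted to manifests
_LOCAL_TRO = r"[A-Za-z0-9_-]+\.trace\.tro\.jsonld"  # bare TRO filename, no dots in stem
HAS_LOCATION = re.compile(rf"(?:{_HTTPS}|{_MANIFEST}|{_LOCAL_TRO})")


def is_valid_location(location: str) -> bool:
    # fullmatch anchors both ends, so suffixes such as ".evil" cannot slip through.
    return HAS_LOCATION.fullmatch(location) is not None
```

Under this sketch, `data/../../etc/passwd` fails because `data/` must be followed by `release_manifests/`, and `bundle.trace.tro.jsonld.evil` fails because nothing may follow the `.jsonld` suffix.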
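The three-step verifier workflow and the `@type`-based node lookup can be combined into one check. This is a sketch under assumptions: the caller has already read `pe:bundleTroUrl`, the bundle_tro artifact's `trov:sha256`, and `pe:bundleFingerprint` out of the simulation TRO, and the bundle's own fingerprint is assumed to sit at `trov:hasComposition` / `trov:hasFingerprint` / `trov:sha256` on its TRO node. Those key paths and both function names are hypothetical; `fetch` is injected so the check can run offline.

```python
import hashlib
import json
from typing import Any, Callable, Dict

TRO_TYPE = "trov:TransparentResearchObject"


def find_tro_node(document: Dict[str, Any]) -> Dict[str, Any]:
    """Find the TRO node by @type instead of assuming it is @graph[0]."""
    for node in document.get("@graph", []):
        if node.get("@type") == TRO_TYPE:
            return node
    raise ValueError("no trov:TransparentResearchObject node in @graph")


def verify_simulation_tro(
    bundle_tro_url: str,
    recorded_bundle_sha256: str,
    claimed_bundle_fingerprint: str,
    fetch: Callable[[str], bytes],
) -> None:
    """Hypothetical verifier; raises ValueError naming the failed step."""
    # Step 1: fetch the published bundle TRO and hash its exact bytes.
    bundle_bytes = fetch(bundle_tro_url)
    actual_sha256 = hashlib.sha256(bundle_bytes).hexdigest()
    # Step 2: those bytes must match the sim TRO's bundle_tro artifact hash.
    if actual_sha256 != recorded_bundle_sha256:
        raise ValueError("step 2: fetched bundle TRO does not match recorded sha256")
    # Step 3: pe:bundleFingerprint must equal the bundle's own composition
    # fingerprint (the key path below is an assumption of this sketch).
    node = find_tro_node(json.loads(bundle_bytes))
    actual_fingerprint = node["trov:hasComposition"]["trov:hasFingerprint"]["trov:sha256"]
    if actual_fingerprint != claimed_bundle_fingerprint:
        raise ValueError("step 3: pe:bundleFingerprint does not match the bundle")
```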
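The CI regression guard in scripts/generate_trace_tros.py combines two rules: skip an unreachable country with a warning, but fail the run if a skipped country had previously shipped a TRO. A hypothetical sketch of that decision (the `regenerate_all` name, its parameters, and the 0/1 exit-code convention are assumptions):

```python
import sys
from typing import Callable, Iterable, List, Set


def regenerate_all(
    countries: Iterable[str],
    previously_shipped: Set[str],
    regenerate: Callable[[str], None],
) -> int:
    """Hypothetical sketch: regenerate TROs and return a process exit code."""
    stale: List[str] = []
    for country in countries:
        try:
            regenerate(country)
        except Exception as error:  # e.g. unreachable manifest or expired token
            print(f"warning: skipping {country}: {error}", file=sys.stderr)
            if country in previously_shipped:
                stale.append(country)
    if stale:
        # Block the release rather than ship a stale .trace.tro.jsonld.
        print(f"error: stale TROs for {', '.join(stale)}", file=sys.stderr)
        return 1
    return 0
```

A script entry point would then end with `sys.exit(regenerate_all(...))`, so the Versioning job fails whenever a country loses its TRO.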
Summary
build_metadata

Testing
- `PYTHONPATH=src /Users/maxghenis/PolicyEngine/policyengine.py/.venv/bin/python -m pytest tests/test_release_manifests.py tests/test_models.py tests/test_inequality.py tests/test_us_regions.py tests/test_uk_regions.py -q`
- `PYTHONPATH=src /Users/maxghenis/PolicyEngine/policyengine.py/.venv/bin/python - <<'PY' ... us_ms(); uk_ms(); ...`
- smoke run produced matching managed US/UK baseline metrics with the certified datasets