Add results.json schema validation and source tracking #237

Merged
MaxGhenis merged 9 commits into PolicyEngine:main from PavelMakarchuk:content-pipeline-results
Apr 18, 2026
@PavelMakarchuk
Collaborator

Summary

Adds a policyengine.results module with two small pieces that support the blog post content pipeline:

  • Schema validation (schema.py): Pydantic models that validate results.json at generation time
  • Source tracking (tracking.py): a helper that automatically captures the caller's line number and builds the source URL, for traceability

What it does

Schema validation

from policyengine.results import ResultsJson, ResultsMetadata, ValueEntry

results = ResultsJson(
    metadata=ResultsMetadata(title="SALT Cap Repeal", repo="PolicyEngine/analyses"),
    values={
        "budget_impact": ValueEntry(
            value=-15.2e9,
            display="$15.2 billion",
            source_line=47,
            source_url="https://github.com/.../analysis.py#L47",
        ),
    },
)
results.write("results.json")

Catches errors at generation time:

  • Missing source_line or source_url on any value
  • Table rows with wrong column count
  • Chart alt text that's too short (< 20 chars)
  • Missing required metadata fields
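To make the checks above concrete, here is a minimal stdlib-only sketch of the same rules. It is not the module's code: schema.py uses Pydantic (where these would typically be field or model validators), while this swaps in dataclasses with `__post_init__` checks. The class names and the `columns`/`rows`/`url`/`alt` field names are hypothetical; only the 20-character alt-text threshold comes from this PR. The metadata check is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MIN_ALT_TEXT_CHARS = 20  # threshold stated in the PR description


@dataclass
class ValueEntrySketch:
    """Hypothetical stand-in for ValueEntry: a number plus where it came from."""

    value: float
    display: str
    source_line: Optional[int] = None
    source_url: Optional[str] = None

    def __post_init__(self) -> None:
        # Every value must point back at the line of code that produced it.
        if self.source_line is None or not self.source_url:
            raise ValueError("value is missing source_line or source_url")


@dataclass
class TableEntrySketch:
    """Hypothetical stand-in for TableEntry."""

    columns: List[str]
    rows: List[List[str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Each row must have exactly one cell per column.
        for i, row in enumerate(self.rows):
            if len(row) != len(self.columns):
                raise ValueError(
                    f"row {i} has {len(row)} cells, expected {len(self.columns)}"
                )


@dataclass
class ChartEntrySketch:
    """Hypothetical stand-in for ChartEntry."""

    url: str
    alt: str

    def __post_init__(self) -> None:
        # Reject vague alt text such as "chart" or "figure 1".
        if len(self.alt.strip()) < MIN_ALT_TEXT_CHARS:
            raise ValueError(
                f"alt text must be at least {MIN_ALT_TEXT_CHARS} characters"
            )
```

Because validation runs in the constructor, a malformed entry fails where the analysis script builds it, not later in the blog build.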

Source tracking

from policyengine.results import tracked_value

# reform_revenue and baseline_revenue come from earlier simulation code;
# results is the dict being assembled for results.json
budget = reform_revenue - baseline_revenue
results["values"]["budget_impact"] = tracked_value(
    value=budget,
    display=f"${abs(budget)/1e9:.1f} billion",
    repo="PolicyEngine/analyses",
)
# Automatically captures source_line and builds source_url
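The idea behind tracked_value() can be sketched with the standard `inspect` module: read the caller's frame, take its current line number, and format a GitHub permalink. This is a hypothetical re-implementation, not the module's code; the `path` and `ref` parameters, the default of using the caller's file name, and the `/blob/<ref>/` URL layout are all assumptions.

```python
import inspect
import os
from typing import Any, Dict, Optional


def tracked_value_sketch(
    value: Any,
    display: str,
    repo: str,
    path: Optional[str] = None,  # assumed: repo-relative path of the script
    ref: str = "main",  # assumed: branch or commit used in the permalink
) -> Dict[str, Any]:
    """Hypothetical sketch: record the caller's line and build a source URL."""
    frame = inspect.currentframe()
    caller = frame.f_back if frame is not None else None
    try:
        if caller is None:
            raise RuntimeError("cannot inspect the calling frame")
        line = caller.f_lineno
        # Fall back to the caller's file name when no explicit path is given.
        file_path = path or os.path.basename(caller.f_code.co_filename)
    finally:
        # Drop frame references promptly to avoid reference cycles.
        del frame, caller
    return {
        "value": value,
        "display": display,
        "source_line": line,
        "source_url": f"https://github.com/{repo}/blob/{ref}/{file_path}#L{line}",
    }
```

For a call that spans several lines, as in the example above, the exact line `f_lineno` reports can differ across Python versions, so a project would want a test that pins its convention.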

Why

Every number in a PolicyEngine blog post should link to the exact line of code that produced it. This module ensures the results.json contract is valid before it reaches the resolve-posts build step.

Test plan

  • 11 unit tests passing (schema validation, source tracking, edge cases)
  • CI passes

🤖 Generated with Claude Code

PavelMakarchuk and others added 9 commits February 23, 2026 19:46
New `policyengine.results` module with two pieces:

- `schema.py`: Pydantic models (ResultsJson, ValueEntry, TableEntry,
  ChartEntry) that validate results.json at generation time. Catches
  missing source_line/source_url, row/column mismatches in tables, and
  vague alt text on charts before they reach the blog build step.

- `tracking.py`: `tracked_value()` helper that captures the caller's
  line number via `inspect` and builds the source_url automatically.
  Eliminates repetitive inspect.currentframe() boilerplate in analysis
  scripts.

These support the blog post content pipeline where every number in a
published post links back to the exact line of analysis code that
produced it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use model_dump(mode="json") instead of json.loads(model_dump_json())
  to avoid unnecessary serialize→parse→serialize round-trip
- Create parent directories automatically so callers don't need
  to mkdir first
- Add trailing newline to output file
- Add test for nested directory creation
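A minimal sketch of the write path these changes describe, with a plain dict standing in for `model_dump(mode="json")` output. The function name and `indent=2` are assumptions for illustration, not the module's `ResultsJson.write`.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def write_results_sketch(payload: Dict[str, Any], path: Union[str, Path]) -> Path:
    """Hypothetical stand-in for ResultsJson.write().

    In the real module the dict would come from model.model_dump(mode="json"),
    which already yields JSON-compatible types, so one json.dumps suffices.
    """
    out = Path(path)
    # Create missing parent directories so callers need not mkdir first.
    out.parent.mkdir(parents=True, exist_ok=True)
    # Serialize once and end the file with a trailing newline (indent is arbitrary).
    out.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    return out
```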

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Builds on PolicyEngine#274's bundle-level TRO and closes the gaps that would surface
at an AEA replication review:

- schema:creator is now a schema.org Organization, not a version string
- model wheel is hashed as a fourth composition artifact (read from the
  manifest when present, otherwise fetched from the PyPI JSON API; the
  step degrades silently when PyPI is unreachable)
- every trov:path resolves over HTTPS (Hugging Face resolve URLs, PyPI
  download URL) so a reviewer can dereference the TRO without custom
  clients
- certification metadata moves from prose in schema:description to
  structured pe:* fields on TrustedResearchPerformance
  (pe:certifiedForModelVersion, pe:compatibilityBasis,
  pe:builtWithModelVersion, pe:dataBuildFingerprint, pe:dataBuildId)
- GitHub Actions runs add pe:ciRunUrl / pe:ciGitSha attestation
- JSON Schema ships at data/schemas/trace_tro.schema.json and every
  generated TRO is validated against it in tests
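The PyPI fallback for the wheel hash can be sketched against the public PyPI JSON API (`https://pypi.org/pypi/<project>/<version>/json`), whose `urls` entries carry `packagetype`, `url`, and `digests["sha256"]`. The function names are hypothetical, the manifest-first branch is omitted, and returning `None` is how this sketch "degrades silently".

```python
import json
import urllib.request
from typing import Any, Dict, Optional, Tuple


def pick_wheel(release: Dict[str, Any]) -> Optional[Tuple[str, str]]:
    """Return (https_url, sha256) of the first wheel in a PyPI release payload."""
    for f in release.get("urls", []):
        if f.get("packagetype") == "bdist_wheel":
            return f["url"], f["digests"]["sha256"]
    return None


def wheel_hash_from_pypi(
    project: str, version: str, timeout: float = 10.0
) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: look up a wheel's URL and sha256, or None if unreachable."""
    api = f"https://pypi.org/pypi/{project}/{version}/json"
    try:
        with urllib.request.urlopen(api, timeout=timeout) as resp:
            release = json.load(resp)
    except (OSError, ValueError):
        # URLError subclasses OSError; bad JSON raises ValueError.
        # Degrade silently: the caller simply omits the wheel artifact.
        return None
    return pick_wheel(release)
```

Returning the PyPI download URL alongside the hash keeps the artifact location dereferenceable over HTTPS, which is what the `trov:path` bullet above asks for.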

Adds the per-simulation layer that the bundle-level TRO doesn't cover:

- build_simulation_trace_tro chains a bundle TRO to a reform + results
- policyengine.results.build_results_trace_tro /
  write_results_with_trace_tro emit a TRO alongside a ResultsJson
  payload

Wiring:

- policyengine trace-tro CLI (plus release-manifest subcommand)
- TaxBenefitModelVersion.trace_tro property and the
  build_trace_tro_from_release_bundle / compute_trace_composition_fingerprint /
  serialize_trace_tro / extract_bundle_tro_reference /
  build_simulation_trace_tro re-exports from policyengine.core that
  were dropped when PolicyEngine#276 merged
- scripts/generate_trace_tros.py regenerates bundled TROs before a
  policyengine.py release
- jsonschema added to dev dependencies

Restores the TRACE TRO tests that PolicyEngine#276 removed as part of the
test_release_manifests.py rewrite, now isolated in tests/test_trace_tro.py
with coverage for determinism, schema conformance, CI attestation, and
per-simulation chaining.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switches PEP 604 `X | None` unions in `ResultsMetadata` and
`ResultsJson.write` to `Optional[X]` / `Union[X, Y]`, matching the
project-wide pattern enforced for the 3.9 floor (ruff `UP007` is
disabled for the same reason in `pyproject.toml`).

Without this fix the `content-pipeline-results` branch fails
`ResultsMetadata` class construction on Python 3.9 with
`TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'`.
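A minimal illustration of why the switch matters, using a stdlib dataclass in place of the Pydantic model. `MetadataSketch` and its `description` field are hypothetical; only `title` and `repo` appear in the PR's example.

```python
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataSketch:
    """Hypothetical metadata model that still defines cleanly on Python 3.9."""

    title: str
    repo: str
    # Optional[str] works on 3.9. Writing `str | None` here would raise
    # TypeError when the class is defined, because class-body annotations
    # are evaluated eagerly (absent `from __future__ import annotations`).
    description: Optional[str] = None


def pep604_supported() -> bool:
    """True when `X | None` can be evaluated at runtime (Python 3.10+)."""
    try:
        _ = str | None
    except TypeError:
        return False
    return True


print(sys.version_info[:2], pep604_supported())
```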

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collapses string concatenations that the ruff 0.15.11 formatter in CI
wants unified onto single lines. No behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round two of reviewer fixes. The published TRACE/TROv reference demos
use a different vocabulary than the draft this module was originally
written against; reviewers caught that our emission would not validate
against real TROv SHACL shapes.

TROv vocabulary conformance:
- Switch to the public namespace https://w3id.org/trace/2023/05/trov#
- Flatten the locally-invented trov:hash / trov:hashAlgorithm /
  trov:hashValue wrapper to the vocabulary-native trov:sha256 property
- Rename trov:path -> trov:hasLocation on ArtifactLocation
- Rename the inverse pointer to trov:hasArtifact (was trov:artifact)
- Correct TrustedResearchSystem -> TransparentResearchSystem
- Correct TrustedResearchPerformance -> TransparentResearchPerformance
- Drop the locally-invented ArrangementBinding chain; use the
  vocabulary-native trov:accessedArrangement on the TRP instead
- Emit @type as a single string (not a 2-element array), matching the
  published trov-demos reference shape

Hardening from reproducibility + code-simplifier reviewers:
- pe:emittedIn is always present ("local" or "github-actions") so a
  verifier can tell a CI-emitted TRO from a laptop rebuild without
  inferring from absent fields
- Per-simulation TRO records pe:bundleTroUrl on the performance node;
  a verifier can fetch that URL, re-hash it, and confirm it matches the
  bundle_tro artifact hash - so swapping the caller's bundle_tro dict
  is detectable
- Composition fingerprint joins hashes with \n to prevent hex-length
  concatenation collisions (without a separator, "ab" + "cdef" and
  "abcd" + "ef" both concatenate to "abcdef" and hash identically)
- CertifiedDataArtifact.sha256 is now authoritative when present;
  us.json ships the real dataset sha256, so bundle TRO emission no
  longer requires the data release manifest to carry it
- JSON Schema rejects non-HTTPS trov:hasLocation values and requires
  canonical 64-hex sha256 strings
- Inline the real policyengine-us 1.647.0 / policyengine-uk 2.88.0
  wheel sha256 + URL on us.json/uk.json
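The separator fix and the canonical-hash rule can be sketched in a few lines. The function names are hypothetical, this sketch hashes the inputs in the order given (whether the real compute_trace_composition_fingerprint sorts them is not stated here), and "canonical" is assumed to mean lowercase hex as produced by `hexdigest()`.

```python
import hashlib
import re
from typing import Iterable

# Assumed canonical form: exactly 64 lowercase hex characters.
_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def is_canonical_sha256(s: str) -> bool:
    return _SHA256_HEX.fullmatch(s) is not None


def naive_fingerprint(hashes: Iterable[str]) -> str:
    # Plain concatenation: ["ab", "cdef"] and ["abcd", "ef"] both become "abcdef".
    return hashlib.sha256("".join(hashes).encode("ascii")).hexdigest()


def composition_fingerprint(hashes: Iterable[str]) -> str:
    # Joining with "\n" preserves the boundaries, so those two inputs now differ.
    return hashlib.sha256("\n".join(hashes).encode("ascii")).hexdigest()
```

With only well-formed 64-character digests the boundaries are implicit, so the separator mainly guards against malformed or variable-length values slipping in; the canonical-hex check closes that door from the other side.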

Extracted shared helpers to collapse the ~120-line duplication between
build_trace_tro_from_release_bundle and build_simulation_trace_tro
(_assemble_composition_and_arrangement, _assemble_tro_node,
_policyengine_trs, _build_bundle_performance).

Removed dead code flagged by simplifier:
- DataReleaseArtifact.https_uri (zero callers, zero tests)
- _data_release_manifest_url (replaced by https_release_manifest_uri)
- Prose certification_description_parts (metadata is now purely in pe:*
  structured fields, as the commit message for PolicyEngine#274 originally claimed)

CLI + release workflow:
- Dropped the broken --offline flag (never had a working code path)
- Added policyengine trace-tro-validate <path> subcommand that
  validates a TRO against the shipped JSON Schema
- Versioning CI job now runs scripts/generate_trace_tros.py and
  commits the generated bundled TROs alongside the changelog, so every
  released wheel ships with its matching TRACE TRO
- generate_trace_tros.py skips (with warning) countries whose data
  release manifest is unreachable instead of hard-failing

Tests (34 total in tests/test_trace_tro.py, replacing the prior 20):
- Real determinism: build TRO from two fresh manifest instances,
  assert bytes equal (previously tested only that json.dumps is
  deterministic)
- Forgery detection: swap bundle_tro, assert hash in sim TRO changes
- Schema rejects file:// locations
- Schema rejects missing pe:emittedIn
- Hex-length ambiguity test for the fingerprint separator
- All 4 TROv property renames have explicit assertions so a future
  regression to the wrong names fails loudly
- trace-tro-validate CLI accepts valid TROs and rejects invalid ones

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewers came back accept / clean / minor revisions; this commit
picks up the remaining suggestions.

Forgery resistance:
- Bundle TRO takes an optional self_url and records it as pe:selfUrl
  so a verifier with only the bundle bytes can discover the canonical
  location it was published at.
- write_results_with_trace_tro now requires bundle_tro_url, not
  merely accepts it. A published simulation TRO that omitted the URL
  would leave reviewers without a pinned fetch target; raising when
  the caller forgets matches the "adversarial reviewer" expectation.
- docs/release-bundles.md shows the three-step verifier workflow a
  replication reviewer should run: (1) fetch pe:bundleTroUrl and
  recompute its sha256, (2) compare that hash to the sim TRO's
  bundle_tro artifact hash, (3) confirm pe:bundleFingerprint matches
  the bundle's own CompositionFingerprint. A sim TRO with a swapped
  bundle_tro dict but a truthful URL fails step 2; one with both
  swapped fails step 3.
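A reviewer-side sketch of that workflow, with hypothetical function names. It assumes the recorded bundle_tro hash covers the exact published bytes, and it leaves extracting the fingerprints from the JSON-LD to the caller rather than guessing property paths.

```python
import hashlib
import urllib.request
from typing import List


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Step 1 (fetch): download the bundle TRO at pe:bundleTroUrl as served."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def verify_bundle_link(
    bundle_bytes: bytes,
    recorded_bundle_sha256: str,
    recorded_bundle_fingerprint: str,
    bundle_own_fingerprint: str,
) -> List[str]:
    """Return the failed checks; an empty list means the link checks out.

    recorded_bundle_sha256: the sim TRO's hash for its bundle_tro artifact.
    recorded_bundle_fingerprint: the sim TRO's pe:bundleFingerprint.
    bundle_own_fingerprint: the CompositionFingerprint read from the fetched bundle.
    """
    failures = []
    # Steps 1-2: recompute the hash and compare it to what the sim TRO recorded.
    if hashlib.sha256(bundle_bytes).hexdigest() != recorded_bundle_sha256:
        failures.append("bundle_tro hash mismatch")
    # Step 3: the sim TRO must claim the composition the bundle itself declares.
    if recorded_bundle_fingerprint != bundle_own_fingerprint:
        failures.append("bundle fingerprint mismatch")
    return failures
```

Collecting every failure, rather than stopping at the first, tells a reviewer which of the two forgery cases above they are looking at.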

CI regression guard:
- scripts/generate_trace_tros.py now exits non-zero if a country that
  previously shipped a .trace.tro.jsonld fails to regenerate (e.g.
  HUGGING_FACE_TOKEN expired). The Versioning CI job will block a
  release rather than silently ship a stale TRO.

Schema tightening:
- trov:hasLocation regex now anchors end-of-string on every legal
  local path and restricts data/ to data/release_manifests/<country>.
  data/../../etc/passwd and bundle.trace.tro.jsonld.evil no longer
  pass. HTTPS locations must contain no whitespace.
- Added a test covering the multi-node @graph path after the
  extract_bundle_tro_reference filter fix.
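The effect of anchoring can be sketched with Python's `re`. The allow-list below is hypothetical: the real schema's set of legal local paths may differ, and two-letter country codes with a `.json` suffix are an assumption. JSON Schema `pattern` keywords are not implicitly anchored, which is why the schema needs explicit `^`/`$`; `re.fullmatch` plays that role here.

```python
import re

# Hypothetical allow-list for trov:hasLocation values.
_LOCATION = re.compile(
    r"https://\S+"  # any HTTPS URL containing no whitespace
    r"|bundle\.trace\.tro\.jsonld"  # a bundled TRO file name (assumed)
    r"|data/release_manifests/[a-z]{2}\.json"  # one manifest per country (assumed)
)


def is_legal_location(loc: str) -> bool:
    # fullmatch anchors both ends, so suffixes like ".evil" and
    # traversal paths like "data/../../etc/passwd" are rejected.
    return _LOCATION.fullmatch(loc) is not None
```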

extract_bundle_tro_reference filter:
- Locates the trov:TransparentResearchObject node by @type rather
  than trusting @graph[0]. Future TROs that embed TRS/TSA nodes no
  longer break reference extraction.
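A minimal sketch of the type-based lookup, with a hypothetical function name. It assumes nodes use the compact `trov:` prefix, as the commit messages do; the emitted `@type` is a single string, but the sketch also accepts a list defensively.

```python
from typing import Any, Dict, Optional

TRO_TYPE = "trov:TransparentResearchObject"


def find_tro_node(doc: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the first @graph node typed as a TRO, instead of trusting @graph[0]."""
    for node in doc.get("@graph", []):
        node_type = node.get("@type")
        types = node_type if isinstance(node_type, list) else [node_type]
        if TRO_TYPE in types:
            return node
    return None
```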

Dead-kwarg cleanup (simplifier):
- Dropped emission_context kwarg from both public builders; tests
  use monkeypatch on GITHUB_ACTIONS/GITHUB_SHA instead, which is
  closer to what CI does anyway.
- Dropped tro_id / composition_id / arrangement_id default kwargs
  from the helpers; hardcoded as module constants.
- Dropped the bundle_tro_path branch from write_results_with_trace_tro
  — no caller, no test, no actual use case.
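Reading the emission context straight from the environment, as these commits describe, might look like the sketch below. The function name is hypothetical; `GITHUB_ACTIONS`, `GITHUB_SHA`, `GITHUB_RUN_ID`, `GITHUB_REPOSITORY`, and `GITHUB_SERVER_URL` are standard GitHub Actions variables, and the run-URL layout is GitHub's usual `/actions/runs/<id>` path.

```python
import os
from typing import Dict


def emission_context_sketch() -> Dict[str, str]:
    """Hypothetical version of _emission_context(): where was this TRO emitted?"""
    env = os.environ
    if env.get("GITHUB_ACTIONS") != "true":
        # State "local" explicitly rather than leaving the field absent.
        return {"pe:emittedIn": "local"}
    context = {"pe:emittedIn": "github-actions"}
    server = env.get("GITHUB_SERVER_URL", "https://github.com")
    repo, run_id = env.get("GITHUB_REPOSITORY"), env.get("GITHUB_RUN_ID")
    if repo and run_id:
        context["pe:ciRunUrl"] = f"{server}/{repo}/actions/runs/{run_id}"
    sha = env.get("GITHUB_SHA")
    if sha:
        context["pe:ciGitSha"] = sha
    return context
```

Under pytest, `monkeypatch.setenv` and `monkeypatch.delenv` can drive both branches without any extra keyword argument, which matches the cleanup above.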

Tests (38 total in test_trace_tro.py):
- test__given_fixed_ci_env__then_tro_bytes_match_across_builds locks
  down determinism under CI with pinned run_id/git_sha
- test__given_self_url__then_tro_records_it covers pe:selfUrl
- test__given_graph_with_multiple_nodes__then_extract_finds_tro
  exercises the @type filter
- test__given_write_helper_without_url__then_raises locks the
  required-kwarg contract

Docstring caveat on build_trace_tro_from_release_bundle now states
explicitly that pe:compatibilityBasis covers the model and data
layers only; Python version, OS, and transitive lockfile are not
yet pinned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Last two simplifier nits from round 3:

- _build_bundle_performance no longer takes emission_context as a
  kwarg; like the sim builder, it calls _emission_context() inline
  at the end of performance construction. One fewer parameter, same
  ordering behaviour, matches the sim-side pattern.
- write_results_with_trace_tro no longer passes the URL to both
  bundle_tro_location and bundle_tro_url; the build_simulation_trace_tro
  fallback (bundle_tro_location or bundle_tro_url or <default>) picks
  the URL up on its own.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MaxGhenis marked this pull request as ready for review April 18, 2026 15:13
MaxGhenis merged commit 26c372c into PolicyEngine:main Apr 18, 2026
12 checks passed