Represent restricted-input provenance in TROs (UK FRS, IRS PUF) via external-DOI pinning #318

@MaxGhenis

Context

At the 2026-04-21 meeting with Lars Vilhuber, Tim Clark, and Casper of the TRACE project, the question of how to represent restricted-input provenance in a TRO surfaced explicitly (transcript lines 239-243 and 485-489). Lars's summary:

"Tim, correct me if I'm wrong, but I think we don't have it yet in the trace protocol that that's firmed up. We've talked about it. But that is where it becomes fuzzy again because now you're having to rely on external validation of these things and how to tie them together."

and later:

"in the UK example is a good example... You identify the inputs. You've pinned those inputs. Those are inputs. Again, for now based on checksums in the current trace way of doing it. We've talked about enhancements that point then to external UIs or things like that."

The specific TRACE feature they are discussing: pinning restricted inputs by external DOI + checksum rather than by redistributable content. For PolicyEngine:

  • UK FRS: UKDS-licensed, cannot redistribute; would pin by UKDS study number + checksum once the TRACE vocabulary supports it.
  • IRS PUF: requires an IRS license, cannot redistribute; would pin by the IRS PUF identifier + checksum.
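To make the "pin by checksum" half concrete, here is a minimal sketch of how a licensed holder of the file could check their local copy against a pinned SHA-256 without the file ever being redistributed. The function names (`sha256_of_file`, `matches_pin`) are hypothetical and not part of TRACE or any PolicyEngine package; only the standard-library `hashlib` calls are real.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large microdata files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path, pinned_sha256):
    """Hypothetical helper: True if the local file matches the pinned checksum."""
    # Normalize case and whitespace so a pin copied from a document still compares.
    return sha256_of_file(path) == pinned_sha256.strip().lower()
```

Someone with licensed access would run `matches_pin` on their own copy; the TRO would only need to carry the identifier and the checksum.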

What we want to contribute

The PolicyEngine pipeline is the clearest use case for this TRACE feature — we regularly ingest restricted microdata, compute a calibrated derivative, and need the TRO that cites the derivative to trace back to the restricted input in a verifiable way without redistribution. Two things we can do:

  1. Document our exact requirements. What fields would a restricted-input external reference need? UKDS study number + version + SHA-256 of a reference extract? IRS PUF vintage year + SHA-256 of a canonical file? Write a short memo describing our two real cases so TRACE's vocabulary designers have concrete input.

  2. Prototype a pe:* extension that represents restricted-input provenance in the interim (before TRACE formalizes it), and offer our pe:* fields to TROv as they generalize. This lets us emit meaningful TROs now and swap the namespace later.
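As a starting point for discussion, an interim pe:* record might look roughly like the sketch below. Every field name here (`pe:restrictedInput`, `pe:externalIdentifier`, `pe:version`, `pe:sha256`, `pe:accessNote`) is a placeholder I made up for illustration, not an existing TRACE or TROv term, and the identifier and checksum values are dummies. The `rename_namespace` helper is likewise hypothetical; it only shows that swapping the prefix later can be a mechanical step.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RestrictedInputRef:
    """Hypothetical reference to a restricted input that is pinned, not shipped."""
    identifier: str   # e.g. a UKDS study number or IRS PUF identifier (placeholder)
    version: str      # dataset version or vintage year
    sha256: str       # checksum of the file the licensed holder actually used
    access_note: str  # how to obtain the data, since it cannot be redistributed

    def to_pe_fields(self):
        # All keys use the interim "pe:" prefix; names are illustrative only.
        return {
            "pe:restrictedInput": True,
            "pe:externalIdentifier": self.identifier,
            "pe:version": self.version,
            "pe:sha256": self.sha256,
            "pe:accessNote": self.access_note,
        }


def rename_namespace(record, old, new):
    """Return a copy of record with the `old` key prefix replaced by `new`."""
    return {
        (new + key[len(old):] if key.startswith(old) else key): value
        for key, value in record.items()
    }


if __name__ == "__main__":
    ref = RestrictedInputRef(
        identifier="PLACEHOLDER-STUDY-ID",  # dummy value, not a real study number
        version="PLACEHOLDER-VERSION",
        sha256="0" * 64,                    # dummy checksum
        access_note="Available to licensed users from the data provider.",
    )
    print(json.dumps(ref.to_pe_fields(), indent=2))
```

Keeping the record as a flat prefix-keyed mapping is a deliberate simplification; a real TRO would presumably be JSON-LD with a proper `@context`, which the memo in item 1 should settle.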

Non-goals

  • Not proposing to redistribute IRS PUF or UKDS data under any guise.
  • Not proposing to wait on TRACE — the pe:* interim extension lets us ship today.

Related
