feat(profile): DCAT-AP v3 SHACL validator via pyshacl + multi-resource framework #3912
Merged
…e framework

Closes the final §6 follow-up from profile3-handoff.md. DCAT-AP v3 ships SHACL constraints upstream, not JSON Schema, so qsv's built-in JSON-Schema validator can't validate the emitted block. This commit generalizes the `validation.external` framework introduced in #3911 to support multi-file validators and wires up the canonical Python SHACL engine, `pyshacl`, as the DCAT-AP v3 reference validator.

Framework generalization
- New `ExternalValidatorResource` struct + `resources: Vec<...>` field on `ExternalValidator`. Each entry declares a logical embedded-resource name resolved against a compile-time `EMBEDDED_RESOURCES` table (`include_str!`-bundled), a token name used as `{<name>}` in args, and an optional file suffix.
- New `EMBEDDED_RESOURCES` table in `src/cmd/profile/external_validate.rs` with one entry today: `dcat-ap-v3-shacl-shapes` mapping to the vendored Turtle file. Custom YAML profiles can reference any name listed there but cannot register new ones (qsv-release-time decision).
- `resolve_args` / `substitute_tokens` rewritten for multi-token substitution. `{file}` keeps its "append if absent" fallback; named tokens (`{shapes}`, `{schema}`, etc.) substitute in place; unknown `{tokens}` pass through verbatim so a validator that genuinely wants `{foo}` in its CLI gets it. The OsString path-fidelity guarantee from #3911 is preserved.
- Validation: reserved `name: "file"` and unknown `embedded` values are rejected at spawn time with `Severity::Required` warnings so a misconfigured profile fails fast.

Vendored SHACL bundle
- `resources/dcat-ap-v3/shacl/dcat-ap-SHACL.ttl` (176KB, 2321 lines): canonical SHACL shapes from SEMICeu DCAT-AP v3.0.0, fetched verbatim from https://github.com/SEMICeu/DCAT-AP/blob/master/releases/3.0.0/shacl/dcat-ap-SHACL.ttl
- `resources/dcat-ap-v3/shacl/README.md`: source + re-vendor procedure, similar to the GSA bundle's documentation.

Profile wiring
- `resources/profiles/dcat-ap-v3.yaml` declares the pyshacl external validator with `args: [-s, {shapes}, -sf, turtle, -df, json-ld, -f, human, {file}]`, references the embedded SHACL bundle via `resources: [{name: shapes, embedded: dcat-ap-v3-shacl-shapes, suffix: .ttl}]`, and carries an `install_hint` so users see `pip install pyshacl` exactly when pyshacl is missing.

Docs
- `resources/profiles/README.md`: config table extended with the `resources` field; full DCAT-AP/pyshacl example added; instructions for adding new `EMBEDDED_RESOURCES` entries.

Tests (+7 new in external_validate, embedded smoke extended)
- resolve_args_substitutes_named_extras
- embedded_resources_table_includes_dcat_ap_v3_shapes (sanity-checks the bundle is real Turtle, guarding against bad re-vendor steps)
- lookup_embedded_returns_none_for_unknown
- resource_with_reserved_name_file_is_rejected
- resource_with_unknown_embedded_is_rejected
- resource_tempfile_is_materialized_with_correct_suffix
- embedded_dcat_ap_v3_parses_and_dry_compiles extended

End-to-end verified on a host without pyshacl: profile run with `--validate-dcat --profile dcat-ap-v3` emits one Severity::Info warning carrying the install hint and the projection still ships.

All 161 profile unit tests + 53 integration tests pass. Clippy + docs-drift-check clean. All four binaries build green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
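The token rules described in the commit message (`{file}` appended when absent, named tokens replaced in place, unknown tokens left verbatim) can be illustrated with a minimal std-only sketch. The names `resolve_args_sketch` and `substitute_one_sketch` and their signatures are hypothetical; the actual `resolve_args` / `substitute_tokens` in `external_validate.rs` are not shown in this PR page and may differ.

```rust
use std::collections::HashMap;
use std::ffi::{OsStr, OsString};

/// Hypothetical sketch, not qsv's code: expand one argument template.
/// `named` maps token names (e.g. "shapes") to already-materialized paths.
fn substitute_one_sketch(
    arg: &str,
    file: &OsStr,
    named: &HashMap<&str, OsString>,
    saw_file: &mut bool,
) -> OsString {
    let mut out = OsString::new();
    let mut rest = arg;
    while let Some(open) = rest.find('{') {
        // Copy the literal text before the brace unchanged.
        out.push(&rest[..open]);
        let after = &rest[open + 1..];
        match after.find('}') {
            Some(close) => {
                let name = &after[..close];
                if name == "file" {
                    *saw_file = true;
                    out.push(file);
                } else if let Some(path) = named.get(name) {
                    // Named token: substitute in place, keeping the OsString as-is.
                    out.push(path);
                } else {
                    // Unknown token: pass through verbatim, braces included.
                    out.push(&rest[open..open + close + 2]);
                }
                rest = &after[close + 1..];
            }
            None => {
                // Unbalanced brace: keep the remainder untouched.
                out.push(&rest[open..]);
                rest = "";
            }
        }
    }
    out.push(rest);
    out
}

/// Hypothetical sketch: expand a whole argument list, appending the file
/// path at the end when no `{file}` token appeared anywhere.
pub fn resolve_args_sketch(
    template: &[&str],
    file: &OsStr,
    named: &HashMap<&str, OsString>,
) -> Vec<OsString> {
    let mut saw_file = false;
    let mut out: Vec<OsString> = template
        .iter()
        .map(|arg| substitute_one_sketch(arg, file, named, &mut saw_file))
        .collect();
    if !saw_file {
        out.push(file.to_os_string());
    }
    out
}
```

Building the result as `OsString` rather than `String` is what lets a non-UTF-8 path survive substitution, which is one plausible way to keep the path-fidelity guarantee the commit message refers to.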
Pull request overview
Generalizes the `validation.external` framework introduced in #3911 to support multiple input files (beyond the implicit rendered JSON-LD), and wires DCAT-AP v3 to `pyshacl` using a vendored SEMICeu DCAT-AP 3.0.0 SHACL bundle that is embedded in the qsv binary via `include_str!`.
Changes:
- Add `ExternalValidatorResource` + `resources: Vec<...>` on `ExternalValidator`; `resolve_args`/`substitute_tokens` rewritten for multi-token substitution with reserved-name + unknown-embedded validation.
- Vendor `resources/dcat-ap-v3/shacl/dcat-ap-SHACL.ttl` (2321 lines) and register a single `EMBEDDED_RESOURCES` entry; documented re-vendor procedure.
- Wire `dcat-ap-v3.yaml`'s `validation.external` to `pyshacl` with `{shapes}`/`{file}` tokens; update README and unit/integration tests.
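The reserved-name and unknown-embedded checks summarized above can be sketched roughly as follows. Everything here is illustrative: `EMBEDDED_RESOURCES_SKETCH`, `ResourceSketch`, `ResourceError`, and both function names are hypothetical, and the one-line Turtle body stands in for the real vendored file that qsv bundles via `include_str!`. The PR reports failures as `Severity::Required` warnings; a plain `Result` is used here only to keep the sketch self-contained.

```rust
/// Hypothetical stand-in for the compile-time table of bundled resources.
/// The real table maps the logical name to the vendored Turtle file.
const EMBEDDED_RESOURCES_SKETCH: &[(&str, &str)] = &[(
    "dcat-ap-v3-shacl-shapes",
    "@prefix sh: <http://www.w3.org/ns/shacl#> .\n",
)];

/// Look up a bundled resource body by its logical name.
pub fn lookup_embedded_sketch(name: &str) -> Option<&'static str> {
    EMBEDDED_RESOURCES_SKETCH
        .iter()
        .find(|(n, _)| *n == name)
        .map(|(_, body)| *body)
}

/// Hypothetical mirror of one `resources` entry in a profile.
pub struct ResourceSketch {
    pub name: String,           // token name, used as `{name}` in args
    pub embedded: String,       // logical name in the embedded table
    pub suffix: Option<String>, // optional tempfile suffix, e.g. ".ttl"
}

#[derive(Debug, PartialEq)]
pub enum ResourceError {
    /// `file` is reserved for the rendered output path.
    ReservedName,
    /// The profile referenced a name that is not bundled.
    UnknownEmbedded(String),
}

/// Validate one entry and return the embedded body to materialize.
pub fn validate_resource_sketch(r: &ResourceSketch) -> Result<&'static str, ResourceError> {
    if r.name == "file" {
        return Err(ResourceError::ReservedName);
    }
    lookup_embedded_sketch(&r.embedded)
        .ok_or_else(|| ResourceError::UnknownEmbedded(r.embedded.clone()))
}
```

Keeping the table closed at compile time matches the stated design choice that custom YAML profiles may reference bundled names but cannot register new ones.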
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/cmd/profile/external_validate.rs | Adds EMBEDDED_RESOURCES table, resource validation, multi-token substitute_tokens, and tests. |
| src/cmd/profile/profile_spec.rs | Adds ExternalValidatorResource struct + resources field; extends embedded DCAT-AP v3 smoke test. |
| resources/profiles/dcat-ap-v3.yaml | Wires pyshacl with embedded SHACL shapes via {shapes} token. |
| resources/profiles/README.md | Documents resources field and DCAT-AP/pyshacl example. |
| resources/dcat-ap-v3/shacl/dcat-ap-SHACL.ttl | Vendored SEMICeu DCAT-AP v3.0.0 SHACL shapes (2321 lines). |
| resources/dcat-ap-v3/shacl/README.md | Documents source URL and re-vendoring procedure. |
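Because an external validator receives paths rather than in-memory strings, an embedded resource has to be written to a temporary file carrying the declared suffix before the process is spawned. A minimal std-only sketch of that step, assuming a hypothetical `materialize_sketch` helper and naming scheme (neither is taken from qsv):

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide counter so repeated calls get distinct file names.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Hypothetical sketch: write `body` to a fresh file in the system temp
/// directory, ending in `suffix` when one is given, and return its path.
/// The caller is responsible for removing the file afterwards.
pub fn materialize_sketch(body: &str, suffix: Option<&str>) -> io::Result<PathBuf> {
    let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    let file_name = format!(
        "qsv-resource-sketch-{}-{}{}",
        std::process::id(),
        id,
        suffix.unwrap_or("")
    );
    let path = std::env::temp_dir().join(file_name);
    fs::write(&path, body)?;
    Ok(path)
}
```

The suffix matters for tools that infer the input format from the file extension. A production implementation would likely prefer a dedicated temp-file library for exclusive creation and automatic cleanup; this sketch only shows the shape of the step.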
Summary
Closes the final §6 follow-up from `profile3-handoff.md`. DCAT-AP v3 ships SHACL constraints upstream (not JSON Schema), so qsv's built-in JSON-Schema validator can't validate emitted DCAT-AP v3 blocks. This PR generalizes the `validation.external` framework from #3911 to support multi-file validators and wires up `pyshacl` as the canonical DCAT-AP v3 reference validator.

Framework generalization
- `ExternalValidatorResource` struct + `resources: Vec<...>` on `ExternalValidator`. Each entry declares a logical `embedded` name resolved at spawn time against `EMBEDDED_RESOURCES` (`include_str!`-bundled), plus a token name used as `{<name>}` in args.
- `resolve_args` / `substitute_tokens` rewritten for multi-token substitution. `{file}` keeps its "append if absent" fallback; named tokens (`{shapes}`, etc.) substitute in place; unknown `{tokens}` pass through verbatim. OsString path-fidelity from #3911 preserved.
- `name: "file"` and unknown `embedded` values rejected with `Severity::Required` warnings so a misconfigured profile fails fast.

Vendored SHACL bundle
- `resources/dcat-ap-v3/shacl/dcat-ap-SHACL.ttl` (176KB, 2321 lines): verbatim from SEMICeu DCAT-AP v3.0.0 at https://github.com/SEMICeu/DCAT-AP/blob/master/releases/3.0.0/shacl/dcat-ap-SHACL.ttl
- `resources/dcat-ap-v3/shacl/README.md`: source + re-vendor procedure.

Profile wiring
`resources/profiles/dcat-ap-v3.yaml`'s `validation.external` declares `pyshacl` with `args: [-s, {shapes}, -sf, turtle, -df, json-ld, -f, human, {file}]`, references the embedded shapes via `resources: [{name: shapes, embedded: dcat-ap-v3-shacl-shapes, suffix: .ttl}]`, and carries an `install_hint` pointing at `pip install pyshacl` (https://github.com/RDFLib/pySHACL).

End-to-end verified. On a host without pyshacl installed, a profile run with `--validate-dcat --profile dcat-ap-v3` emits one `Severity::Info` warning carrying the install hint, and the projection still ships.

When pyshacl IS installed, qsv writes the rendered JSON-LD AND the embedded shapes to tempfiles, spawns pyshacl with both substituted, and surfaces findings as ProjectionWarnings with the stable `external_validate` field + `pyshacl: <line>` message format from #3911.

Test plan
- `cargo test --bin qsv -F profile,feature_capable cmd::profile::`: 161 unit tests pass (7 new in `external_validate::tests` + extended embedded smoke for DCAT-AP v3)
- `cargo test --test tests -F profile,feature_capable -- test_profile::`: 53 integration tests pass (incl. trust-gate coverage from #3911)
- All four binaries build green: `qsv`, `qsvmcp`, `qsvlite`, `qsvdp`
- `cargo clippy --bin qsv -F profile,feature_capable`: no new warnings
- `cargo +nightly fmt` applied
- `python3 scripts/docs-drift-check.py`: no drift
- `resources/profiles/README.md` updated with `resources` config + DCAT-AP example + instructions for adding new `EMBEDDED_RESOURCES` entries
- `README.md` for the vendored SHACL bundle, with source + re-vendor procedure (mirrors the GSA bundle pattern)

Notes
With this PR, all 5 §6 follow-ups from PR #3908's handoff are complete:
🤖 Generated with Claude Code