
feat(profile): DCAT-AP v3 SHACL validator via pyshacl + multi-resource framework #3912

Merged
jqnatividad merged 1 commit into master from dcat-ap-v3-shacl-validator
May 27, 2026

@jqnatividad
Collaborator

Summary

Closes the final §6 follow-up from profile3-handoff.md. DCAT-AP v3 ships SHACL constraints upstream (not JSON Schema), so qsv's built-in JSON-Schema validator can't validate emitted DCAT-AP v3 blocks. This PR generalizes the validation.external framework from #3911 to support multi-file validators and wires up pyshacl as the canonical DCAT-AP v3 reference validator.

Framework generalization

  • New ExternalValidatorResource struct + resources: Vec<...> on ExternalValidator. Each entry declares a logical embedded name resolved at spawn time against EMBEDDED_RESOURCES (include_str!-bundled), plus a token name used as {<name>} in args.
  • resolve_args / substitute_tokens rewritten for multi-token substitution. {file} keeps its "append if absent" fallback; named tokens ({shapes}, etc.) substitute in place; unknown {tokens} pass through verbatim. The OsString path-fidelity guarantee from #3911 is preserved.
  • Reserved name: "file" and unknown embedded values rejected with Severity::Required warnings so a misconfigured profile fails fast.
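
The substitution rules above can be illustrated with a minimal sketch. This is not qsv's actual `substitute_tokens`: the signature, the `named` map, and the restriction to whole-argument tokens are assumptions made to keep the example small. It only shows the three behaviors described: `{file}` is appended when absent, named tokens are replaced in place, and unknown `{tokens}` pass through untouched.

```rust
use std::collections::HashMap;
use std::ffi::OsString;
use std::path::Path;

/// Hypothetical helper, not the function from this PR.
/// `named` maps a token name (e.g. "shapes") to the path that replaces `{shapes}`.
/// Simplifying assumption: a token must be the whole argument.
fn substitute_tokens(
    args: &[&str],
    file: &Path,
    named: &HashMap<&str, &Path>,
) -> Vec<OsString> {
    let mut saw_file = false;
    let mut out: Vec<OsString> = Vec::with_capacity(args.len() + 1);
    for arg in args {
        // `Some(name)` only when the argument looks like `{name}`.
        let token = arg.strip_prefix('{').and_then(|s| s.strip_suffix('}'));
        match token {
            Some("file") => {
                saw_file = true;
                // Keep the path as an OsString so non-UTF-8 paths survive.
                out.push(file.as_os_str().to_os_string());
            }
            Some(name) if named.contains_key(name) => {
                out.push(named[name].as_os_str().to_os_string());
            }
            // Unknown `{tokens}` and ordinary arguments pass through verbatim.
            _ => out.push(OsString::from(*arg)),
        }
    }
    // "Append if absent" fallback for the rendered document.
    if !saw_file {
        out.push(file.as_os_str().to_os_string());
    }
    out
}
```

With `named = {"shapes": "/tmp/shapes.ttl"}` and the file `/tmp/out.jsonld`, the args `["-s", "{shapes}", "{file}"]` resolve to `-s /tmp/shapes.ttl /tmp/out.jsonld`, while `["--tag", "{foo}"]` keeps `{foo}` and gains the file path as a trailing argument.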

Vendored SHACL bundle

  • resources/dcat-ap-v3/shacl/dcat-ap-SHACL.ttl (176KB, 2321 lines): canonical SHACL shapes from SEMICeu DCAT-AP v3.0.0, fetched verbatim from upstream.
  • resources/dcat-ap-v3/shacl/README.md: source + re-vendor procedure.

Profile wiring

  • resources/profiles/dcat-ap-v3.yaml's validation.external declares pyshacl with args: [-s, {shapes}, -sf, turtle, -df, json-ld, -f, human, {file}], references the embedded shapes via resources: [{name: shapes, embedded: dcat-ap-v3-shacl-shapes, suffix: .ttl}], and carries an install_hint pointing at pip install pyshacl (https://github.com/RDFLib/pySHACL).

End-to-end verified. On a host without pyshacl installed:

$ qsv profile <input> --profile dcat-ap-v3 --validate-dcat -o out.json
$ jq '.dcat_warnings[] | select(.field=="external_validate")' out.json
{
  "field": "external_validate",
  "severity": "info",
  "message": "`pyshacl` not installed; skipped dcat-ap-v3 validation. Install: pip install pyshacl (https://github.com/RDFLib/pySHACL)"
}

When pyshacl IS installed, qsv writes the rendered JSON-LD AND the embedded shapes to tempfiles, spawns pyshacl with both substituted, and surfaces findings as ProjectionWarnings with the stable external_validate field + pyshacl: <line> message format from #3911.
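
The two outcomes described above (validator missing vs. validator run) can be sketched as follows. This is an illustrative approximation, not the code from this PR: `Outcome` and `run_validator` are hypothetical names, qsv reports `ProjectionWarning`s rather than this enum, and how qsv treats the exit status or stderr is not stated here, so the sketch simply prefixes each non-empty stdout line.

```rust
use std::io::{self, ErrorKind};
use std::process::Command;

/// Hypothetical outcome type for this sketch only.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The validator binary was not found; carries an info-level message.
    Skipped(String),
    /// The validator ran; one entry per non-empty stdout line.
    Findings(Vec<String>),
}

/// Hypothetical helper: spawn `bin` with already-resolved `args`.
fn run_validator(
    bin: &str,
    args: &[&str],
    profile: &str,
    install_hint: &str,
) -> io::Result<Outcome> {
    let output = match Command::new(bin).args(args).output() {
        Ok(o) => o,
        // A missing executable surfaces as `ErrorKind::NotFound` at spawn time.
        Err(e) if e.kind() == ErrorKind::NotFound => {
            return Ok(Outcome::Skipped(format!(
                "`{bin}` not installed; skipped {profile} validation. Install: {install_hint}"
            )));
        }
        Err(e) => return Err(e),
    };
    // Prefix each line with the validator name, mirroring the `pyshacl: <line>` format.
    let text = String::from_utf8_lossy(&output.stdout);
    let findings = text
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty())
        .map(|l| format!("{bin}: {l}"))
        .collect();
    Ok(Outcome::Findings(findings))
}
```

Treating a missing binary as a skip rather than an error is what lets the projection still ship on hosts without pyshacl.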

Test plan

  • cargo test --bin qsv -F profile,feature_capable cmd::profile:: (161 unit tests pass; 7 new in external_validate::tests + extended embedded smoke for DCAT-AP v3)
  • cargo test --test tests -F profile,feature_capable -- test_profile:: (53 integration tests pass, incl. trust-gate coverage from #3911)
  • All 4 binaries build clean: qsv, qsvmcp, qsvlite, qsvdp
  • cargo clippy --bin qsv -F profile,feature_capable — no new warnings
  • cargo +nightly fmt applied
  • python3 scripts/docs-drift-check.py — no drift
  • resources/profiles/README.md updated with resources config + DCAT-AP example + instructions for adding new EMBEDDED_RESOURCES entries
  • Vendored 176KB SHACL bundle includes README.md with source + re-vendor procedure (mirrors the GSA bundle pattern)

Notes

With this PR, all 5 §6 follow-ups from PR #3908's handoff are complete:

  1. ✅ CI gate for embedded profile dry_compile (#3910)
  2. ✅ Per-distribution identity-based discovery merge (#3910)
  3. ✅ Template-driven catalog inheritance (#3910)
  4. ✅ Croissant mlcroissant validator + external-validator framework (#3911)
  5. ✅ SHACL backend for DCAT-AP v3 (this PR)

🤖 Generated with Claude Code

…e framework

Closes the final §6 follow-up from profile3-handoff.md. DCAT-AP v3
ships SHACL constraints upstream, not JSON Schema, so qsv's built-in
JSON-Schema validator can't validate the emitted block. This commit
generalizes the `validation.external` framework introduced in #3911
to support multi-file validators and wires up the canonical Python
SHACL engine, `pyshacl`, as the DCAT-AP v3 reference validator.

Framework generalization
- New `ExternalValidatorResource` struct + `resources: Vec<...>`
  field on `ExternalValidator`. Each entry declares a logical
  embedded-resource name resolved against a compile-time
  `EMBEDDED_RESOURCES` table (`include_str!`-bundled), a token
  name used as `{<name>}` in args, and an optional file suffix.
- New `EMBEDDED_RESOURCES` table in
  `src/cmd/profile/external_validate.rs` with one entry today:
  `dcat-ap-v3-shacl-shapes` mapping to the vendored Turtle file.
  Custom YAML profiles can reference any name listed there but
  cannot register new ones (qsv-release-time decision).
- `resolve_args` / `substitute_tokens` rewritten for multi-token
  substitution. `{file}` keeps its "append if absent" fallback;
  named tokens (`{shapes}`, `{schema}`, etc.) substitute in
  place; unknown `{tokens}` pass through verbatim so a validator
  that genuinely wants `{foo}` in its CLI gets it. The OsString
  path-fidelity guarantee from #3911 is preserved.
- Validation: reserved `name: "file"` and unknown `embedded`
  values are rejected at spawn time with `Severity::Required`
  warnings so a misconfigured profile fails fast.
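
A rough sketch of the lookup and validation rules described above, under stated assumptions: the table body here is a one-line placeholder instead of the `include_str!`-bundled Turtle file, the struct carries only two of its fields (the optional suffix is omitted), the field types are guesses, and problems are returned as plain strings rather than `Severity::Required` warnings.

```rust
/// Placeholder for the compile-time table; the real entry embeds the vendored
/// Turtle file via `include_str!`. The body below is a stand-in.
const EMBEDDED_RESOURCES: &[(&str, &str)] = &[(
    "dcat-ap-v3-shacl-shapes",
    "@prefix sh: <http://www.w3.org/ns/shacl#> .\n",
)];

/// Simplified version of the struct named in this PR (field types assumed).
struct ExternalValidatorResource {
    /// Token name, used as `{<name>}` in args.
    name: String,
    /// Logical key looked up in `EMBEDDED_RESOURCES`.
    embedded: String,
}

/// Returns the embedded body for `key`, or `None` if the key is not registered.
fn lookup_embedded(key: &str) -> Option<&'static str> {
    EMBEDDED_RESOURCES
        .iter()
        .find(|(k, _)| *k == key)
        .map(|(_, body)| *body)
}

/// Hypothetical validation pass: collects one message per problem found.
fn validate_resources(resources: &[ExternalValidatorResource]) -> Vec<String> {
    let mut problems = Vec::new();
    for r in resources {
        // `{file}` is reserved for the rendered document.
        if r.name == "file" {
            problems.push("resource name `file` is reserved".to_string());
        }
        // Profiles may only reference names already in the table.
        if lookup_embedded(&r.embedded).is_none() {
            problems.push(format!("unknown embedded resource `{}`", r.embedded));
        }
    }
    problems
}
```

Because the table is fixed at compile time, a custom profile that names an unregistered resource is caught by the lookup instead of silently running the validator without its shapes.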

Vendored SHACL bundle
- `resources/dcat-ap-v3/shacl/dcat-ap-SHACL.ttl` (176KB, 2321
  lines): canonical SHACL shapes from SEMICeu DCAT-AP v3.0.0,
  fetched verbatim from
  https://github.com/SEMICeu/DCAT-AP/blob/master/releases/3.0.0/shacl/dcat-ap-SHACL.ttl
- `resources/dcat-ap-v3/shacl/README.md`: source + re-vendor
  procedure, similar to the GSA bundle's documentation.

Profile wiring
- `resources/profiles/dcat-ap-v3.yaml` declares the pyshacl
  external validator with `args: [-s, {shapes}, -sf, turtle,
  -df, json-ld, -f, human, {file}]`, references the embedded
  SHACL bundle via `resources: [{name: shapes, embedded:
  dcat-ap-v3-shacl-shapes, suffix: .ttl}]`, and carries an
  `install_hint` so users see `pip install pyshacl` exactly when
  pyshacl is missing.

Docs
- `resources/profiles/README.md`: config table extended with
  the `resources` field; full DCAT-AP/pyshacl example added;
  instructions for adding new `EMBEDDED_RESOURCES` entries.

Tests (+7 new in external_validate, embedded smoke extended)
- resolve_args_substitutes_named_extras
- embedded_resources_table_includes_dcat_ap_v3_shapes (sanity-
  checks the bundle is real Turtle, guarding against bad
  re-vendor steps)
- lookup_embedded_returns_none_for_unknown
- resource_with_reserved_name_file_is_rejected
- resource_with_unknown_embedded_is_rejected
- resource_tempfile_is_materialized_with_correct_suffix
- embedded_dcat_ap_v3_parses_and_dry_compiles extended
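
For readers unfamiliar with what `resource_tempfile_is_materialized_with_correct_suffix` guards, here is a standard-library-only sketch of writing an embedded resource to a temp file with a given suffix. `materialize_resource` and its naming scheme are hypothetical; the PR does not show how qsv creates its tempfiles, and a production version would more likely rely on a dedicated tempfile mechanism with race-free creation and automatic cleanup.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::process;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical helper: writes `body` to a file under the system temp dir
/// whose name ends in `suffix` (e.g. ".ttl"), and returns its path.
/// The caller is responsible for deleting the file afterwards.
fn materialize_resource(name: &str, body: &str, suffix: &str) -> io::Result<PathBuf> {
    // Process id + timestamp keep names distinct across runs; this is a
    // sketch-level scheme, not a secure one.
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let file_name = format!("qsv-sketch-{name}-{}-{nanos}{suffix}", process::id());
    let path = std::env::temp_dir().join(file_name);
    fs::write(&path, body)?;
    Ok(path)
}
```

The returned path is what would be substituted for a named token such as `{shapes}` before the validator is spawned.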

End-to-end verified on a host without pyshacl: profile run with
`--validate-dcat --profile dcat-ap-v3` emits one Severity::Info
warning carrying the install hint and the projection still ships.

All 161 profile unit tests + 53 integration tests pass. Clippy +
docs-drift-check clean. All four binaries build green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codacy-production

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

Contributor

Copilot AI left a comment


Pull request overview

Generalizes the validation.external framework introduced in #3911 to support multiple input files (beyond the implicit rendered JSON-LD), and wires DCAT-AP v3 to pyshacl using a vendored SEMICeu DCAT-AP 3.0.0 SHACL bundle embedded in the qsv binary via include_str!.

Changes:

  • Add ExternalValidatorResource + resources: Vec<...> on ExternalValidator; resolve_args/substitute_tokens rewritten for multi-token substitution with reserved-name + unknown-embedded validation.
  • Vendor resources/dcat-ap-v3/shacl/dcat-ap-SHACL.ttl (2321 lines) and register a single EMBEDDED_RESOURCES entry; documented re-vendor procedure.
  • Wire dcat-ap-v3.yaml's validation.external to pyshacl with {shapes}/{file} tokens; update README and unit/integration tests.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/cmd/profile/external_validate.rs | Adds EMBEDDED_RESOURCES table, resource validation, multi-token substitute_tokens, and tests. |
| src/cmd/profile/profile_spec.rs | Adds ExternalValidatorResource struct + resources field; extends embedded DCAT-AP v3 smoke test. |
| resources/profiles/dcat-ap-v3.yaml | Wires pyshacl with embedded SHACL shapes via {shapes} token. |
| resources/profiles/README.md | Documents resources field and DCAT-AP/pyshacl example. |
| resources/dcat-ap-v3/shacl/dcat-ap-SHACL.ttl | Vendored SEMICeu DCAT-AP v3.0.0 SHACL shapes (2321 lines). |
| resources/dcat-ap-v3/shacl/README.md | Documents source URL and re-vendoring procedure. |

@jqnatividad jqnatividad merged commit 2e6a590 into master May 27, 2026
18 of 19 checks passed
@jqnatividad jqnatividad deleted the dcat-ap-v3-shacl-validator branch May 27, 2026 17:56