feat(generators): add --default-language flag for language-tagged literals #9
Open
…lib serialization

Add a --deterministic / --no-deterministic CLI flag (default off) to OWL, SHACL, JSON-LD Context, and JSON-LD generators that produces byte-identical output across invocations.

Three-phase hybrid pipeline for Turtle generators:
1. RDFC-1.0 canonicalization (W3C Recommendation) via pyoxigraph
2. Weisfeiler-Lehman structural hashing for diff-stable blank node IDs
3. Hybrid rdflib re-serialization for idiomatic Turtle (inline blank nodes, collection syntax, prefix filtering)

JSON generators use deterministic_json() with recursive deep-sort and JSON-LD-aware key ordering that preserves conventional @context structure.

Collection items (owl:oneOf, sh:in, sh:ignoredProperties) are sorted when --deterministic is set to ensure reproducible RDF list order.

pyoxigraph >= 0.4.0 is imported lazily only when --deterministic is used. Tests skip gracefully when pyoxigraph is unavailable.

Refs: linkml#1847

Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
Signed-off-by: jdsika <carlo.van-driesten@bmw.de>
… names

Add an opt-in --normalize-prefixes flag to OWL, SHACL, and JSON-LD Context generators that normalises non-standard prefix aliases to well-known names from a static prefix map (derived from rdflib 7.x defaults, cross-checked against prefix.cc consensus).

Key design decisions:
- Static frozen map (MappingProxyType) instead of runtime Graph().namespaces() lookup eliminates rdflib version dependency
- Both http://schema.org/ and https://schema.org/ map to 'schema'
- Shared normalize_graph_prefixes() helper used by OWL and SHACL
- Two-phase graph normalisation: Phase 1 normalises schema-declared prefixes, Phase 2 cleans up runtime-injected bindings
- Collision detection: skip with warning when standard prefix name is already user-declared for a different namespace
- Phase 2 guard prevents overwriting HTTPS bindings with HTTP variants

The flag defaults to off, preserving existing behaviour.

Tests cover OWL, SHACL, and context generators with sdo->schema, dce->dc, http/https edge case, custom prefix preservation, flag-off backward compatibility, cross-generator consistency, prefix collision detection, schema1 regression prevention, Phase 2 HTTPS guard, empty schema edge case, and static map integrity.

Signed-off-by: jdsika <carlo.van-driesten@bmw.de>
0f263f0 to c3751dd
…erals

Add a --default-language CLI option to gen-owl and gen-shacl that wraps human-readable annotation literals (rdfs:label, rdfs:comment, skos:definition, sh:name, sh:description, dcterms:title) with a BCP 47 language tag.

- Element-level in_language overrides the generator default
- Technical literals (URIs, numerics, XSD facets) are never tagged
- Non-string annotation values preserve their native RDF datatype
- Whitespace-only values are normalised to None

Signed-off-by: Carlo van Driesten <carlo.van-driesten@2last.eu>
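The precedence and normalisation rules listed in this commit message can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the code from the PR: `GeneratorSketch`, `normalise_language`, and `resolve_language` are hypothetical names, and treating an empty element-level `in_language` as unset is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


def normalise_language(tag: Optional[str]) -> Optional[str]:
    """Return a stripped tag, or None for missing, empty, or whitespace-only input."""
    if tag is None:
        return None
    stripped = tag.strip()
    return stripped or None


@dataclass
class GeneratorSketch:
    """Hypothetical stand-in for a generator that carries a default language."""

    default_language: Optional[str] = None

    def __post_init__(self) -> None:
        # A whitespace-only default collapses to None, i.e. "do not tag".
        self.default_language = normalise_language(self.default_language)

    def resolve_language(self, in_language: Optional[str]) -> Optional[str]:
        # Element-level in_language wins; otherwise fall back to the default.
        return normalise_language(in_language) or self.default_language


gen = GeneratorSketch(default_language="en")
print(gen.resolve_language("de"))   # de   (element-level override)
print(gen.resolve_language(None))   # en   (generator default)
print(GeneratorSketch(default_language="   ").resolve_language(None))  # None
```

Normalising in `__post_init__` keeps the "no default" case in a single representation (`None`), so later code only has to test for one value.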
c3751dd to 06ac9fc
Summary
Adds a `--default-language` CLI option to both `gen-owl` and `gen-shacl` that emits BCP 47 language-tagged string literals (e.g. `"Person"@en`) for human-readable annotations.

This enables ontology producers to comply with RDF 1.1 §3.3 (language-tagged strings as `rdf:langString`) and OWL 2 §6.3 (annotation property values) without manual post-processing.

Problem
LinkML generators currently emit all string literals as plain `xsd:string` values, even for human-readable annotations like `rdfs:label`, `rdfs:comment`, `sh:name`, and `sh:description`. This prevents downstream consumers from:

- filtering by language in SPARQL (`FILTER(lang(?label) = "en")`)

The LinkML metamodel already has an `in_language` metaslot, but no generator uses it.

Changes
gen-owl (`owlgen.py`)

- `default_language` field on `OwlSchemaGenerator`
- `_LANGUAGE_TAGGABLE_RANGES` frozenset (`string`, `ncname`) guards tagging; technical types (URI, integer, boolean, datetime) are never tagged
- `_resolve_language()` resolves element-level `in_language` → generator-level `default_language` → `None`
- `_literal()` helper creates properly tagged `Literal` objects
- `add_metadata()` tags string-range and fallback-range annotation literals
- `add_enum()` PV labels respect language tags (constraint values in `owl:oneOf` are correctly NOT tagged)
- `--default-language` Click CLI option

gen-shacl (`shaclgen.py`)

- `default_language` field with `__post_init__` whitespace normalisation
- `rdfs:label` / `rdfs:comment` get language tags
- `sh:name` / `sh:description` get language tags via `prop_pv_text()`
- `_add_annotations()` tags string annotation values
- […] (`sh:order`, `sh:minCount`, etc.) are never tagged
- `--default-language` Click CLI option

Tests
- […] `in_language` override, annotations, empty string, whitespace-only

Backward compatibility
- Default is `None` (no language tags); existing behaviour is completely unchanged
- […] `None`

Standards compliance
- `rdf:langString` vs `xsd:string` distinction
- […] `rdf:langString`
- `sh:name` / `sh:description` range includes `rdf:langString`

Testing
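One way to spot-check the tagging rule by hand is a small, standard-library-only sketch that renders literals in Turtle syntax. It does not call `gen-owl`, `gen-shacl`, or rdflib. `turtle_literal`, `TAGGABLE_RANGES`, and `XSD_TYPES` are hypothetical names, the taggable set is assumed from the (`string`, `ncname`) pair listed under Changes, and the datatype mapping is an illustrative subset.

```python
from typing import Optional

# Ranges treated as human-readable text (assumed from the PR description).
TAGGABLE_RANGES = frozenset({"string", "ncname"})

# Hypothetical mapping from a few technical ranges to XSD datatypes.
XSD_TYPES = {
    "integer": "xsd:integer",
    "boolean": "xsd:boolean",
    "datetime": "xsd:dateTime",
}


def turtle_literal(value: str, range_name: str, language: Optional[str] = None) -> str:
    """Render a literal in Turtle syntax, tagging only taggable ranges."""
    # Minimal escaping: backslashes first, then double quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    if range_name in TAGGABLE_RANGES:
        # Human-readable text: add the language tag when one is resolved.
        return f'"{escaped}"@{language}' if language else f'"{escaped}"'
    # Technical values keep their datatype and ignore any language tag.
    datatype = XSD_TYPES.get(range_name)
    return f'"{escaped}"^^{datatype}' if datatype else f'"{escaped}"'


print(turtle_literal("Person", "string", "en"))  # "Person"@en
print(turtle_literal("Person", "string"))        # "Person"
print(turtle_literal("42", "integer", "en"))     # "42"^^xsd:integer
```

The third call shows the guard at work: a language is supplied, but because `integer` is not in the taggable set the literal keeps its datatype instead of gaining a tag.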