
feat(ontology): bO-1..bO-16 — DOLCE/FIBO/QUDT/ZUGFeRD/SKR hydrators + 2 new Tier-C hydrator types#407

Merged
AdaWorldAPI merged 11 commits into main from claude/hydrate-dolce-dul-owl-Ce9Oa
May 21, 2026

@AdaWorldAPI
Owner

Summary

Eleven of the seventeen planned bO-* ontology hydrators ship in this branch,
plus two brand-new, reusable Tier-C hydrator types (XsdHydrator,
SchematronHydrator) that unlock the remaining XSD-based and rule-based
specs (UBL, ISO 20022, XRechnung, GoBD). A third new hydrator type, the
CSV-driven SkrHydrator, backs the DATEV SKR slots.

| bO-PR | Slot | What hydrates | Hydrator |
|---|---|---|---|
| bO-1 | DOLCE_V1 | DOLCE+DUL upper ontology + Conceptualization + LMM_L2 extensions | OwlHydrator |
| bO-2 | OWLTIME_V1 | W3C OWL-Time | OwlHydrator |
| bO-3 | PROVO_V1 | W3C PROV-O | OwlHydrator |
| bO-4 | QUDT_V1 | QUDT core + units + quantitykinds | OwlHydrator |
| bO-5 | SKOS_V1 | SKOS Core + XL (+ OWL-1-DL variant shipped) | OwlHydrator |
| bO-6 | FIBOFND_V1 | FIBO Foundations (RDF/XML) | OwlHydrator |
| bO-7 | FIBOBE_V1 | FIBO Business Entities (RDF/XML) | OwlHydrator |
| bO-8 | SCHEMAORG_V1 | schema.org | OwlHydrator |
| bO-13 | SKR03_V1 + SKR04_V1 | DATEV SKR 03 + SKR 04 charts of accounts | new SkrHydrator |
| bO-15 | ZUGFERDRULES_V1 | ZUGFeRD EN16931 Schematron rules (~510 IRIs) | new SchematronHydrator |
| bO-16 | ZUGFERD_V1 | ZUGFeRD/Factur-X EN16931 CII XSDs | new XsdHydrator |

New hydrator types (Tier-C unlocks)

  • XsdHydrator (hydrators/xsd.rs): streams XSD via quick-xml,
    interns <xs:element> / <xs:complexType> / <xs:simpleType> /
    <xs:attribute> / <xs:attributeGroup> names as
    {targetNamespace}#{name} IRIs. Reusable for UBL, ISO 20022, etc.
  • SchematronHydrator (hydrators/schematron.rs): walks .sch
    files, interns assert/report/pattern @id plus bracketed
    business-rule IDs from message text ([BR-52], [BR-CO-03],
    [PEPPOL-EN16931-R008]). Reusable for XRechnung, GoBD, and any
    ISO/IEC 19757-3 spec.
  • SkrHydrator (hydrators/skr.rs): CSV-driven hydrator for DATEV
    SKR charts of accounts. SKR 03 and SKR 04 go into separate G slots
    because the same account number means different things across the
    two schemes (account 1000 is Kasse in SKR 03 but RHB, i.e. Roh-,
    Hilfs- und Betriebsstoffe, in SKR 04).

Format-detection improvement

OwlHydrator::detect_format now content-sniffs the first 256 bytes
for <?xml / <rdf:RDF when the extension is ambiguous (.owl files
exist in both Turtle and RDF/XML form in the wild). Existing .rdf /
.ttl extension-based routes are unchanged.
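A minimal sketch of the extension-first, content-sniff-second routing described above. The `RdfFormat` enum and the free-function signature are hypothetical (the real `OwlHydrator::detect_format` signature is not shown in this PR). `.rdf`, `.ttl` and `.nt` route directly; any other extension inspects at most the first 256 bytes and otherwise defaults to Turtle, following the "everything else → oxttl" rule from the FIBO commit.

```rust
use std::path::Path;

/// Hypothetical format enum for this sketch; not the crate's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RdfFormat {
    Turtle,
    RdfXml,
    NTriples,
}

/// Routes by extension first; for ambiguous extensions (e.g. `.owl`),
/// sniffs at most the first 256 bytes for an XML prolog or `<rdf:RDF`.
pub fn detect_format(path: &Path, bytes: &[u8]) -> RdfFormat {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("rdf") => RdfFormat::RdfXml,
        Some("ttl") => RdfFormat::Turtle,
        Some("nt") => RdfFormat::NTriples,
        _ => {
            // Lossy decoding tolerates a multi-byte character cut at byte 256.
            let head = String::from_utf8_lossy(&bytes[..bytes.len().min(256)]);
            if head.contains("<?xml") || head.contains("<rdf:RDF") {
                RdfFormat::RdfXml
            } else {
                // Everything else goes to the Turtle parser.
                RdfFormat::Turtle
            }
        }
    }
}
```

Sniffing only a bounded prefix keeps detection cheap on multi-megabyte files such as `qudt-units.ttl`.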

Data drops

  • data/ontologies/dul-extensions/ — Conceptualization + LMM_L2 (CC-BY)
  • data/ontologies/skos/ — Core + XL + OWL-1-DL variant (W3C)
  • data/ontologies/zugferd/ — 4 CII XSDs + Schematron + codelist
    (Apache-2.0, LandrixSoftware)
  • data/ontologies/skr-datev/ — SKR 03 (1623), SKR 03 Bau (1880),
    SKR 04 (1232) accounts as CSV + parse_pdfs.py reproducibility
    script + README documenting known data-quality gaps

What's NOT here (deferred)

  • bO-9 (XBRL GL) — needs XbrlHydrator (taxonomy-specific extension
    of XsdHydrator)
  • bO-10 (IFRS) — same
  • bO-11 (UBL 2.4) — drop-in via existing XsdHydrator
  • bO-12 (ISO 20022) — drop-in via existing XsdHydrator
  • bO-14 (HGB-Bilanz) — needs a paragraph-extractor or hand-curated TTL
  • bO-17 (AEC3PO) — Tier-A drop-in via OwlHydrator
  • Type-graph projection for XSD (xs:extension base → rdfs:subClassOf)
  • Schematron severity / XPath context resolution
  • Cross-scheme alignment axioms (SKR ⊑ FIBO MonetaryAmount, ZUGFeRD CII
    ⊑ schema.org Invoice). Left to downstream consumers.

Test plan

  • All 114 lance-graph-ontology tests pass (up from 54 before the branch)
  • cargo clippy clean (only pre-existing oxrdf deprecation warnings)
  • Downstream consumers build clean: lance-graph-callcenter,
    lance-graph-consumer-conformance, cognitive-shader-driver
  • Per-hydrator smoke tests verify load-bearing IRIs resolve:
    • DOLCE/DUL upper categories + extension classes
    • SKOS Core classes + XL Label surface + *Match mapping predicates
    • FIBO MonetaryAmount, Currency, Party
    • QUDT units + quantitykinds (Length, Mass, Time, …)
    • schema.org Person, Organization, Invoice
    • PROV-O Entity/Activity/Agent triad
    • OWL-Time Instant/Interval surface
    • ZUGFeRD CrossIndustryInvoice + RAM TradeParty/Address types
    • ZUGFeRD rules: BR-52, BR-CO-03, BR-S-08, BR-Z-08, PEPPOL-EN16931-R008
    • SKR 03 anchors: 1000 Kasse, 1200 Bank, 1400 Forderungen, 8400 Erlöse
    • SKR 04 anchors: 1000 RHB, 1200 Forderungen, 1600 Kasse, 4200 Erlöse
  • Schema separation: SKR03 IRIs do NOT resolve in SKR04 bundle
  • SKR 03 Bau trade extensions resolve under separate IRI base

Consumer impact

  • woa-rs: API surface it imports (NamespaceBridge, OntologyRegistry,
    bridges::WoaBridge) unchanged — its lance-graph integration is
    blocked operationally on Docker build context, not on API. No
    breaking changes.
  • smb-office-rs: SKR 04 chart-of-accounts, ZUGFeRD invoice schemas,
    schema.org Customer/Organization are now available via hydrators —
    can replace hand-rolled smb-ontology schemas with composite hydrator
    calls once consumer chooses to wire them in.

Branch ancestry

375b87a (bO-1 DOLCE) → 67f7dc4 (bO-2/3/4/8) → 465361e (bO-6/7 + RDF/XML)
→ ea9c368 (bO-4 quantitykinds) → e041d0d (bO-5 + DUL extensions)
→ 9e65307 (bO-16 + XsdHydrator) → d4589aa (bO-15 + SchematronHydrator)
→ 6f983db (bO-13 data) → 299e6c4 (bO-13 SkrHydrator).

10 commits, ~12k insertions of mostly data + tests, 114 tests pass.


Generated by Claude Code

claude added 10 commits May 21, 2026 11:55
…logy)

Hydrate DOLCE+DnS Ultralite as the L1 upper ontology at OGIT::DOLCE_V1.0.
DOLCE is the root of the L1-L4 business-logic DAG (inherits_from: None) —
every downstream L2/L3/L4 hydrator (OWL-Time, PROV-O, QUDT, FIBO-FND, ...)
declares inherits_from: Some(OGIT::DOLCE_V1.0) and resolves against this
hydration via rdfs:subClassOf chains.

Additions:
- data/ontologies/dul.ttl — curated extract of canonical DOLCE+DUL with 244
  named entities covering the four upper categories (Endurant/Perdurant/
  Quality/Abstract), the DnS role hierarchy (Role/Agent/Patient/Instrument/
  Location/TimeInterval), and 60+ object/datatype properties. CC-BY 4.0.
- LICENSES/DOLCE.txt — attribution per CC-BY 4.0.
- crates/lance-graph-ontology/src/hydrators/{mod,owl,dolce}.rs — Pattern-D
  scaffolding: OwlHydrator + MetaStructureHydrator trait + ContextBundle +
  OntologySlot, plus the DOLCE-specific glue (~50 LOC) with the 17-edge
  cascade whitelist (subClassOf, classification, role-binding, part-of,
  temporal anchoring).
- OntologyRegistry surface extension: register_bundle / bundle_for /
  register_edge_types / edge_types_for / entity_count_for / resolve_iri_in.
- 4 integration tests (dolce_hydrator_smoke, dolce_upper_category_count,
  dolce_dns_role_resolution, dolce_edge_whitelist_registered) + 2 owl.rs
  unit tests + 2 dolce.rs unit tests; all green (10 new + 54 pre-existing
  pass; cargo clippy clean; hydration <1ms in release).
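To make the registry surface concrete, here is a std-only sketch of slot-scoped interning and resolution. `ContextBundle` and `OntologyRegistry` below are simplified, hypothetical stand-ins that borrow the method names from this commit message; their signatures and the `u16` slot type are assumptions, not the crate's API. It illustrates two properties: `resolve_iri_in` only looks inside one G slot's bundle, and a set-backed `register_edge_types` is naturally idempotent.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for a hydrated bundle: IRIs interned to dense
/// u32 ids in first-seen order.
#[derive(Default, Debug)]
pub struct ContextBundle {
    ids: HashMap<String, u32>,
}

impl ContextBundle {
    /// Interns `iri`, returning its existing id if already present.
    pub fn intern(&mut self, iri: &str) -> u32 {
        let next = self.ids.len() as u32;
        *self.ids.entry(iri.to_string()).or_insert(next)
    }
    pub fn get(&self, iri: &str) -> Option<u32> {
        self.ids.get(iri).copied()
    }
    pub fn len(&self) -> usize {
        self.ids.len()
    }
}

/// Hypothetical registry keyed by G slot (assumed to be a u16 here).
#[derive(Default)]
pub struct OntologyRegistry {
    bundles: HashMap<u16, ContextBundle>,
    edges: HashMap<u16, BTreeSet<String>>,
}

impl OntologyRegistry {
    pub fn register_bundle(&mut self, g: u16, bundle: ContextBundle) {
        self.bundles.insert(g, bundle);
    }
    pub fn bundle_for(&self, g: u16) -> Option<&ContextBundle> {
        self.bundles.get(&g)
    }
    /// Set semantics make re-registration idempotent.
    pub fn register_edge_types(&mut self, g: u16, edges: &[&str]) {
        let set = self.edges.entry(g).or_default();
        set.extend(edges.iter().map(|e| e.to_string()));
    }
    /// Unknown slots return None rather than an empty set.
    pub fn edge_types_for(&self, g: u16) -> Option<&BTreeSet<String>> {
        self.edges.get(&g)
    }
    pub fn entity_count_for(&self, g: u16) -> Option<usize> {
        self.bundle_for(g).map(ContextBundle::len)
    }
    /// Resolution is scoped to one slot: an IRI interned under another G
    /// slot does not resolve here.
    pub fn resolve_iri_in(&self, g: u16, iri: &str) -> Option<u32> {
        self.bundle_for(g)?.get(iri)
    }
}
```

With SKR 03 registered in slot 40 and SKR 04 in slot 41 (the slot numbers from the bO-13 commit), `resolve_iri_in(41, "urn:datev:skr03:account/1000")` returns `None`, which is the schema-separation property the SKR smoke tests check.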
… DUL naming

Replace the hand-curated DUL.ttl extract with the canonical DOLCE+DnS
Ultralite ontology (v4.2, ~187 KB, 78 classes + 113 object properties
+ 5 datatype properties) as supplied by the user.

Test alignment: canonical DUL deliberately departs from DOLCE-Lite-Plus
naming per its own header ("the names of classes and relations have been
made more intuitive"). The two load-bearing renames for the cognitive-
shader L1 slot are Endurant -> Object and Perdurant -> Event. The four
upper categories in DUL are therefore: Object, Event, Quality, Abstract
(all direct sub-classes of Entity).

Likewise, canonical DUL does NOT define `Patient` / `Instrument` /
`Location` as named classes — those are runtime Concept individuals or
live in extension modules (CoreLegal, IOLite, SystemsLite). The DnS
role-resolution test now anchors on the canonical role hierarchy:
Agent, Concept, Role, Task, Parameter, Goal, Method, Plan, TimeInterval.

Test surface (10 tests total, all green):
- dolce_hydrator_smoke (entity_count > 200 confirmed against canonical)
- dolce_upper_category_count (5 root IRIs incl. Entity)
- dolce_dns_role_resolution + dns_role_anchor_concept_is_present +
  dul_extension_role_iris_do_not_resolve (the new test pins the
  expectation that Patient/Instrument/Location are NOT canonical classes)
- dolce_edge_whitelist_registered + edge_types_for_unknown_g_returns_none
  + re_register_edge_types_is_idempotent
- 2 owl.rs + 2 dolce.rs unit tests

All 17 edges in the cascade whitelist verified present in canonical DUL.
LICENSES/DOLCE.txt updated to reflect verbatim-canonical usage and the
DOLCE-LP -> DUL rename rationale.
…DT, schema.org

Ship four L2/L3 hydrators on top of the Pattern-D substrate landed in
PR-bO-1 (DOLCE). Each declares inherits_from: Some(OGIT::DOLCE_V1.0)
and reuses the generic OwlHydrator; per-ontology glue is ~50 LOC plus
cascade-edge whitelist.

bO-2 OWL-Time (OGIT::TIME_V1, slot 10):
- data/ontologies/time.ttl (W3C, 103 KB)
- 22-edge whitelist incl. all 13 Allen interval relations
- assertions: TemporalEntity/Instant/Interval/ProperInterval resolve
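The 13 Allen relations in the OWL-Time whitelist are mutually exclusive and jointly exhaustive for proper intervals, so the relation between two intervals follows from comparing their endpoints. The sketch below assumes integer endpoints with `start < end`; the returned strings are the OWL-Time local names, but the function itself is illustrative and not part of the hydrator, which only interns the predicate IRIs.

```rust
use std::cmp::Ordering::{Equal, Greater, Less};

/// A proper interval on an integer timeline (`start < end`).
#[derive(Debug, Clone, Copy)]
pub struct Interval {
    pub start: i64,
    pub end: i64,
}

/// Returns the OWL-Time local name of the Allen relation holding from `a` to `b`.
pub fn allen_relation(a: Interval, b: Interval) -> &'static str {
    assert!(a.start < a.end && b.start < b.end, "proper intervals only");
    match (a.start.cmp(&b.start), a.end.cmp(&b.end)) {
        (Equal, Equal) => "intervalEquals",
        (Equal, Less) => "intervalStarts",
        (Equal, Greater) => "intervalStartedBy",
        (Greater, Equal) => "intervalFinishes",
        (Less, Equal) => "intervalFinishedBy",
        (Greater, Less) => "intervalDuring",
        (Less, Greater) => "intervalContains",
        // `a` starts and ends first: before, meets or overlaps `b`.
        (Less, Less) if a.end < b.start => "intervalBefore",
        (Less, Less) if a.end == b.start => "intervalMeets",
        (Less, Less) => "intervalOverlaps",
        // Mirror image: `b` starts and ends first.
        (Greater, Greater) if b.end < a.start => "intervalAfter",
        (Greater, Greater) if b.end == a.start => "intervalMetBy",
        (Greater, Greater) => "intervalOverlappedBy",
    }
}
```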

bO-3 PROV-O (OGIT::PROVO_V1, slot 11):
- data/ontologies/provo.ttl (W3C, 70 KB)
- 21-edge whitelist incl. wasGeneratedBy/used/wasDerivedFrom/
  wasAttributedTo/wasAssociatedWith/actedOnBehalfOf
- assertions: Entity/Activity/Agent/Bundle resolve

bO-4 QUDT 2.1 (OGIT::QUDT_V1, slot 12):
- data/ontologies/qudt-core.ttl (137 KB) + qudt-units.ttl (3.85 MB,
  ~2900 unit individuals)
- Multi-file hydration via new OwlHydrator::hydrate_many; both TTL
  artifacts merge into one ContextBundle
- assertions: SI base units (M/SEC/K/A/MOL/CD/KiloGM) resolve
- NOTE: quantitykinds catalogue not included (user-uploaded file was
  byte-identical duplicate of units file); follow-up upload needed

bO-8 schema.org (OGIT::SCHEMAORG_V1, slot 13):
- data/ontologies/schemaorg.ttl (1.13 MB, ~1400 named subjects)
- 18-edge whitelist incl. schema-org-specific flexible-typing surface
  (domainIncludes / rangeIncludes)
- assertions: Thing/Person/Organization/Place/Event/Product resolve

Infrastructure:
- OwlHydrator::hydrate_many for multi-file ontologies
- 4 new canonical slot tokens (TIME=10, PROVO=11, QUDT=12, SCHEMAORG=13)
- 4 new modules/*/manifest.yaml files declaring inherits_from: dolce
- Entity-code allocation in 500-539 range to clear smb-office (200),
  q2-cockpit (300), hubspo (400)
- LICENSES/{OWL-TIME,PROV-O,QUDT,SCHEMAORG}.txt attribution files
- 13 new tests (all green); QUDT hydration of core+units in 0.42s release
- Contract test ALL_G_SLOTS.len assertion loosened from `== 6` to `>= 6`
  with explanatory comment about the bO-* L2 series
… Business Entities (BE)

Ship the L3 financial / business ontology hydrators on the Pattern-D
substrate landed in PR-bO-1. FIBO is the second non-Turtle format the
hydrator supports — it ships as RDF/XML across ~111 modular .rdf files
under data/ontologies/fibo-{fnd,be}/. The generic OwlHydrator now
dispatches by extension: .rdf → oxrdfxml, everything else → oxttl.

Substrate changes:
- oxttl upgraded 0.1 → 0.2, oxrdf 0.2 → 0.3 (to share oxrdf with the
  new oxrdfxml dep — duplicate-version pulls of oxrdf produce confusing
  type mismatches at the parser boundary)
- Added oxrdfxml = "0.2" dep
- OwlHydrator::hydrate_many() now dispatches by file extension via
  detect_format(); the trait surface is unchanged

bO-6 FIBO-FND (OGIT::FIBOFND_V1, slot 20):
- data/ontologies/fibo-fnd/ (~59 RDF/XML files, ~1.5 MB)
- entity_count = 2232 after hydration (covers OMG Commons re-imports
  + FND-native classes)
- 21-edge whitelist incl. OMG Commons hasIdentifier / hasPart, plus
  FND-specific Party / Address / Currency / Contract predicates
- 3 smoke tests verify load-bearing IRIs resolve (MonetaryAmount,
  Currency, AmountOfMoney, ExchangeRate, InterestRate, Person,
  cmns-org:LegalPerson)

bO-7 FIBO-BE (OGIT::FIBOBE_V1, slot 21):
- data/ontologies/fibo-be/ (~52 RDF/XML files, ~1.3 MB)
- entity_count = 1964 after hydration
- 17-edge whitelist incl. ownership / control / corporate-governance
  predicates plus FND-inherited foundation predicates
- 3 smoke tests verify Corporation / Partnership / Trust / LegalEntity
  / BusinessEntity resolve

Both modules declare inherits_from: Some(OGIT::DOLCE_V1.0) directly.
A future PR can chain BE → FND → DOLCE once multi-parent inherits_from
chains land in the registry.

Test surface (6 new tests, all green); cargo clippy clean; full
lance-graph-ontology suite (84 tests) plus downstream consumers
(callcenter, consumer-conformance, cognitive-shader-driver) build clean.

Slot allocation: FIBOFND=20, FIBOBE=21 (above the L2 universal block
at 10-13, leaves 7-9 + 14-19 reserved for future inter-layer ontologies).

License: MIT (EDM Council). Attribution in LICENSES/FIBO.txt.

Adds the canonical QUDT 2.1 quantitykinds catalogue (~1240 quantity-kind
individuals, 1.92 MB) at data/ontologies/qudt-quantitykinds.ttl. This
was the missing third leg of bO-4 — prior PR shipped core + units only
because the uploaded quantitykinds file was a byte-identical duplicate
of units (same MD5).

hydrate_qudt() now passes all three artifacts (core + units + quantity-
kinds) to OwlHydrator::hydrate_many, merging them into one bundle
keyed by OGIT::QUDT_V1.0.

Test additions:
- Smoke test threshold raised from >2000 to >4000 entities (was
  ~3000 with core+units; now substantially higher with quantitykinds)
- New qudt_si_quantitykinds_resolve test asserts the seven SI base
  quantitykinds (Length, Mass, Time, ElectricCurrent, Temperature,
  AmountOfSubstance, LuminousIntensity) plus common derived ones
  (Force, Energy, Power, Pressure, Frequency, Voltage, Velocity)
  resolve under G=QUDT_V1

LICENSES/QUDT.txt updated to reflect full three-file coverage.
All 85 lance-graph-ontology tests pass; cargo clippy clean.

Two additions on the Pattern-D substrate:

bO-5 SKOS (OGIT::SKOS_V1, slot 14):
- data/ontologies/skos/skos-core.rdf — canonical SKOS Core (32 classes
  and properties in total: Concept, ConceptScheme, Collection, OrderedCollection,
  broader/narrower/related/{broader,narrower}Transitive, in/topConcept,
  prefLabel/altLabel/hiddenLabel, notation, the *Match mapping family)
- data/ontologies/skos/skos-xl.rdf — SKOS-XL eXtension for Labels
  (lifts labels to first-class IRIs via Label/literalForm/labelRelation)
- data/ontologies/skos/skos-owl1dl.rdf — OWL-1-DL-conformant variant
  shipped for tools that require strict DL conformance (not used by
  the default hydration)
- 27-edge whitelist (subClassOf + subPropertyOf + 23 SKOS semantic-relation
  / mapping / label predicates) — load-bearing for SKR03/SKR04 alignment
- LICENSES/SKOS.txt (W3C document license)
- 4 smoke tests verify Core classes, XL Label surface, and the *Match
  predicates resolve

DUL extension modules merged into OGIT::DOLCE_V1:
- data/ontologies/dul-extensions/conceptualization.owl — agent
  conceptualization patterns (knows / believes / assumes / adopts +
  InternalRepresentation class). Refines the dul:conceptualizes surface
  used by cognitive-shader agency cascades.
- data/ontologies/dul-extensions/lmm-l2.owl — Lexical MetaModel L2.
  Adds NER surface (NamedEntity / Name / ConceptExpression /
  ContextualExpression / IndividualReference / MultipleReference /
  Gloss / hasSyntacticFunction / hasInstance / isInstanceOf).
- New `hydrate_dolce_from_many` API that merges DUL.ttl + extensions
  into a single ContextBundle keyed by OGIT::DOLCE_V1.0 via
  OwlHydrator::hydrate_many. Extensions are loaded best-effort
  (skipped if missing) so the path works under either deployment.
- 3 smoke tests verify that the Conceptualization and LMM_L2 IRIs resolve
  under G=DOLCE_V1 and that the merged entity count reaches at least 240.

Format detection improvement:
- OwlHydrator::detect_format now accepts bytes and content-sniffs the
  first 256 bytes for `<?xml` / `<rdf:RDF` when the extension is
  ambiguous (.owl files exist in both Turtle and RDF/XML form in the
  wild; the DUL extensions are RDF/XML-with-.owl). Existing routes by
  extension (.rdf / .ttl / .nt) are unchanged.

All 92 lance-graph-ontology tests pass; cargo clippy clean.
… XsdHydrator

First Tier-C hydrator (per spec) — handles XSD-shaped schemas where
the OwlHydrator's TTL/RDF-XML routes don't apply. ZUGFeRD/Factur-X
EN16931 is the German hybrid PDF/A-3 + XML invoice format aligned
with the European e-invoicing standard EN 16931; the underlying XML schema is UN/CEFACT
CrossIndustryInvoice (CII) v100.

New XsdHydrator (minimal name-extraction shape):
- crates/lance-graph-ontology/src/hydrators/xsd.rs
- Walks every <xs:element>/<xs:complexType>/<xs:simpleType>/<xs:attribute>/
  <xs:attributeGroup> declaration via quick-xml streaming, interns each
  as `{targetNamespace}#{name}` IRIs into the existing ContextBundle
  surface
- Adds quick-xml = "0.37" as a direct dep (already transitively present
  via oxrdfxml)
- collect_xsd_files() walks a directory tree for `.xsd`, sorted for
  stable interning order
- Unit test verifies a tiny inline XSD interns 5 named declarations
- Type-graph semantics (xs:extension base / xs:restriction base) NOT
  resolved into rdfs:subClassOf-equivalent edges — deferred follow-up,
  documented in xsd.rs module docs
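For readers unfamiliar with the name-extraction shape, the following std-only sketch approximates it with plain string scanning instead of quick-xml streaming, so it is a simplification rather than the `xsd.rs` implementation. It assumes the `xs:` or `xsd:` prefix is bound to the XML Schema namespace, takes `targetNamespace` from the schema start tag, and skips anonymous declarations and `ref=` uses, which carry no `name` attribute. All function names are hypothetical.

```rust
/// Declaration kinds named in the XsdHydrator description.
const DECL_KINDS: [&str; 5] = ["element", "complexType", "simpleType", "attribute", "attributeGroup"];

/// Value of `attr="..."` inside a start tag, requiring a whitespace boundary
/// so that `name=` does not match inside e.g. `typename=`.
fn attr_value<'a>(tag: &'a str, attr: &str) -> Option<&'a str> {
    let needle = format!("{attr}=\"");
    let mut from = 0;
    while let Some(pos) = tag[from..].find(&needle) {
        let at = from + pos;
        if tag[..at].ends_with(char::is_whitespace) {
            let start = at + needle.len();
            let len = tag[start..].find('"')?;
            return Some(&tag[start..start + len]);
        }
        from = at + needle.len();
    }
    None
}

/// Interns every named declaration as `{targetNamespace}#{name}`, in document order.
pub fn xsd_declaration_iris(xsd: &str) -> Vec<String> {
    let mut tns = "";
    let mut out = Vec::new();
    // Each chunk after a '<' starts with a tag; end tags, comments and
    // processing instructions never match the `xs:`/`xsd:` prefix.
    for chunk in xsd.split('<').skip(1) {
        let tag = chunk.split('>').next().unwrap_or("");
        let Some(rest) = tag.strip_prefix("xs:").or_else(|| tag.strip_prefix("xsd:")) else {
            continue;
        };
        let local = rest
            .split(|c: char| c.is_whitespace() || c == '/')
            .next()
            .unwrap_or("");
        if local == "schema" {
            tns = attr_value(tag, "targetNamespace").unwrap_or("");
        } else if DECL_KINDS.contains(&local) {
            if let Some(name) = attr_value(tag, "name") {
                out.push(format!("{tns}#{name}"));
            }
        }
    }
    out
}
```

A real streaming parser also handles CDATA, namespace-prefix rebinding and attributes split across lines with entity escapes, which this sketch ignores.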

bO-16 ZUGFeRD (OGIT::ZUGFERD_V1, slot 30):
- data/ontologies/zugferd/ — 4 XSD files (top-level CII + RAM + QDT +
  UDT) from the Factur-X 1.08 EN16931 profile, plus FACTUR-X_EN16931.sch
  Schematron rules and FACTUR-X_EN16931_codedb.xml code-list database
  (latter two shipped for future SchematronHydrator / CodeListHydrator
  PRs, not hydrated today)
- Source: LandrixSoftware/validator-configuration-zugferd (Apache-2.0)
- entity_count > 200 after hydration of all 4 XSDs
- 17-edge cascade whitelist covers the three CII top-level relational
  containers (ExchangedDocument / ExchangedDocumentContext /
  SupplyChainTradeTransaction) plus the 14 most load-bearing RAM
  predicates (Seller/Buyer TradeParty, TradeAgreement / TradeDelivery /
  TradeSettlement headers, line-item containers, tax / payment terms,
  monetary summation)
- 5 smoke tests verify CII root, RAM types (Header*TradeAgreementType,
  TradePartyType, TradeAddressType, TradeTaxType, ...), all four CII
  namespaces present, edge whitelist registered

LICENSES/ZUGFERD.txt cites Apache-2.0 (LandrixSoftware config) +
UN/CEFACT IP policy for the underlying CII XSDs.

All 98 lance-graph-ontology tests pass; cargo clippy clean; downstream
consumers build clean.

Second Tier-C hydrator. Schematron (ISO/IEC 19757-3) is the rule-
language layer that sits on top of XSD across every major e-business
standard — EN 16931, PEPPOL, XRechnung, ZUGFeRD, UBL, ISO 20022 all
ship structural XSD + behavioral .sch business rules. This hydrator
generalizes the rule-extraction shape so each future spec only needs
glue, not new hydrator code.

New SchematronHydrator (minimal name-extraction shape):
- crates/lance-graph-ontology/src/hydrators/schematron.rs
- Walks `<assert id="..." test="..." flag="...">` and
  `<report id="...">` and `<pattern id="...">` elements via quick-xml
  streaming, interns each as `{base_iri}/{element}/{id}`
- Additionally scans text bodies for bracketed business-rule IDs
  like [BR-52], [BR-CO-03], [BR-DE-1], [PEPPOL-EN16931-R008] —
  these are the canonical EN16931 / PEPPOL / DE / VAT-category rule
  identifiers used by cross-spec alignment. Each distinct ID interns
  as `{base_iri}/rule/{rule-id}`.
- Strict business-rule ID validator (uppercase-only, dash-segmented,
  no leading digit / double dash / trailing dash) keeps the rule
  namespace clean of false positives from message text.
- 3 unit tests verify the validator + end-to-end interning on a tiny
  in-memory .sch fixture.
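The strict validator and bracket scan can be sketched as follows. The character rules (uppercase letters, digits and dashes; no leading digit, no `--`, no trailing dash) come from this commit message; requiring at least one dash is an added assumption that keeps bracketed words such as `[NOTE]` out of the rule namespace. Function names are hypothetical, and first-seen ordering of the output is an assumption.

```rust
use std::collections::HashSet;

/// Strict business-rule ID check: starts with an uppercase letter, contains
/// only A-Z, 0-9 and '-', has at least one dash, no "--", no trailing dash.
pub fn is_business_rule_id(s: &str) -> bool {
    let starts_with_letter = matches!(s.chars().next(), Some(c) if c.is_ascii_uppercase());
    starts_with_letter
        && s.chars().all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '-')
        && s.contains('-')
        && !s.contains("--")
        && !s.ends_with('-')
}

/// Collects distinct bracketed rule IDs from message text and mints
/// `{base_iri}/rule/{id}` IRIs, each ID once, in first-seen order.
pub fn rule_iris(base_iri: &str, text: &str) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    let mut rest = text;
    while let Some(open) = rest.find('[') {
        let after = &rest[open + 1..];
        let Some(close) = after.find(']') else { break };
        let candidate = &after[..close];
        if is_business_rule_id(candidate) {
            if seen.insert(candidate.to_string()) {
                out.push(format!("{base_iri}/rule/{candidate}"));
            }
            rest = &after[close + 1..];
        } else {
            // Retry just past this '[' so a nested `[[BR-52]` is still found.
            rest = after;
        }
    }
    out
}
```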

bO-15 ZUGFeRD rules (OGIT::ZUGFERDRULES_V1, slot 31):
- hydrate_zugferd_rules() points the SchematronHydrator at the
  FACTUR-X_EN16931.sch file that already shipped with bO-16.
- Base IRI urn:schematron:factur-x-1.08-en16931 — stable URN for
  downstream alignment to PEPPOL / EN16931 / FeRD rule registries.
- inherits_from: Some(OGIT::ZUGFERD_V1.0) — rule namespace is
  meaningless without the structural CII namespace.
- 5 smoke tests verify the EN16931 anchor rules (BR-52 / BR-45 /
  BR-CO-03 / BR-CO-17 / BR-S-08 / BR-Z-08 / BR-DEC-19), PEPPOL
  extension (PEPPOL-EN16931-R008), schema-level FX-SCH-A-* IDs that
  KoSIT validator output references directly, and rule-family
  coverage (BR-CO / BR-DEC / BR-S / BR-Z / PEPPOL).
- ~510 IRIs hydrated: 301 distinct schema assert IDs (428 elements
  collapse to 301 — some IDs are reused across patterns) + ~209
  distinct bracketed business-rule IDs.

Inspection findings documented in tests + LICENSES/ZUGFERD.txt:
- 191 <report> elements present but NONE have @id (Schematron-1
  style; text bodies still contribute business-rule IRIs).
- 0 patterns / 0 phases with @id.

Side fix: factored out the XsdHydrator interning-closure type alias
to clear a clippy type_complexity warning. Now 5 clippy warnings,
all pre-existing oxrdf deprecations.

All 106 lance-graph-ontology tests pass; downstream consumers build
clean.

Converts two DATEV Standardkontenrahmen PDFs (German standard chart of
accounts) to machine-readable CSV, ready for future hydration as the
PR-bO-13 SKR slot in lance-graph-ontology.

Data files (data/ontologies/skr-datev/):

  skr04.csv          1232 accounts. SKR 04 generic (Abschlussgliederungs-
                     prinzip), DATEV Art.-Nr. 11175, valid 2023. Family
                     numbering 0-9 maps to balance-sheet structure:
                     0=Anlagevermögen, 1=Umlaufvermögen, 2=Eigenkapital,
                     3=Fremdkapital, 4=Erträge, 5=Materialaufwand,
                     6=Personalaufwand, 7=Finanzergebnis, 9=Statistische.

  skr03.csv          1476 accounts. Canonical SKR 03 (Prozessgliederungs-
                     prinzip), extracted as the NN=00 subset of the Bau
                     variant. Family numbering follows process-oriented
                     scheme: 0=Anlage/Kapital, 1=Finanz/Privat, 2=Abgrenzung,
                     3=Wareneingang/Bestände, 4=Betriebliche Aufwendungen,
                     5/6=frei, 7=Erzeugnisbestände, 8=Erlöse, 9=Vortrags/
                     Statistische.

  skr03-bau.csv      1686 accounts. Full SKR 03 Bau und Handwerk (DATEV
                     Art.-Nr. 19606, 2026 edition). Uses 6-digit account
                     format NNNN+NN. 210 of these are trade-specific
                     extensions (NN>00) on top of the canonical 1476.

  parse_pdfs.py      Python script that produced the CSVs. Character-
                     column slicing of `pdftotext -layout` output. Depends
                     on poppler-utils for the underlying extraction.

  README.md          Documents file layout, family-numbering differences
                     between SKR 03 and SKR 04, source artifact refs,
                     known data-quality issues (~12% of entries have
                     multi-column bleeding in account_name and may need
                     hand-cleanup; account NUMBERS are 100% reliable).
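The `NNNN+NN` layout described for `skr03-bau.csv` reduces extracting canonical SKR 03 to a one-line filter on the `NN=00` suffix. A sketch, assuming the six digits are stored without separators (the CSV's exact column format is not shown in this PR); `split_bau_account` and `is_canonical_skr03` are hypothetical helpers, not part of `parse_pdfs.py`.

```rust
/// Splits a 6-digit SKR 03 Bau account (`NNNN` + `NN`) into the base SKR 03
/// account and the trade-specific suffix. Returns None for malformed input.
pub fn split_bau_account(acct: &str) -> Option<(u16, u8)> {
    if acct.len() != 6 || !acct.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let base = acct[..4].parse().ok()?;
    let suffix = acct[4..].parse().ok()?;
    Some((base, suffix))
}

/// A Bau row belongs to canonical SKR 03 when its suffix is `00`;
/// any other suffix marks a trade-specific extension.
pub fn is_canonical_skr03(acct: &str) -> bool {
    matches!(split_bau_account(acct), Some((_, 0)))
}
```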

LICENSES/DATEV-SKR.txt documents the legal basis: only the underlying
chart-data is redistributed (account numbers + descriptive names), not
DATEV's copyrighted PDF formatting / layout / typography. Account numbers
are public-domain bookkeeping standard identifiers; account names are
descriptive labels for HGB-defined accounting concepts.

Next step (NOT in this commit): PR-bO-13 hydrator that interns each
account as `urn:datev:skr04:account/{number}` and
`urn:datev:skr03:account/{number}` IRIs into the OntologyRegistry,
with family classifications as SKOS Collections and cross-walks to
FIBO MonetaryAmount + ZUGFeRD invoice projections as separate axioms.

Third Tier-C hydrator. Reads the SKR 03 / SKR 04 CSVs that landed in
the previous commit and interns each account as a stable u32 IRI in
the OntologyRegistry. The two schemes hydrate into SEPARATE G slots
(SKR03_V1=40, SKR04_V1=41) because the same account number means
different things across them — e.g. account 1000 is "Kasse" in SKR 03
but "Roh-, Hilfs- und Betriebsstoffe" in SKR 04.

New SkrHydrator (minimal name-extraction shape):
- crates/lance-graph-ontology/src/hydrators/skr.rs — generic CSV-driven
  hydrator. Streams a CSV via a small hand-rolled parser (handles
  double-quoted fields with embedded commas + "" escape), interns
  `{iri_prefix}/{account_number}` for each row.
- No new external dep — CSV parsing is ~30 lines of state-machine.
- 2 unit tests cover the CSV parser and a tiny end-to-end hydration.
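A quote-aware CSV splitter of the kind described fits in a small state machine. This sketch assumes one record per line (no quoted newlines), a header row, and the account number in the first column; those layout details and the function names are assumptions, not a transcription of `skr.rs`.

```rust
/// Splits one CSV record into fields. Handles double-quoted fields that
/// contain commas and the `""` escape for a literal quote.
pub fn parse_csv_line(line: &str) -> Vec<String> {
    let mut fields = Vec::new();
    let mut field = String::new();
    let mut in_quotes = false;
    let mut chars = line.chars().peekable();
    while let Some(c) = chars.next() {
        match (c, in_quotes) {
            ('"', true) if chars.peek() == Some(&'"') => {
                field.push('"');
                chars.next(); // consume the second quote of the `""` escape
            }
            ('"', true) => in_quotes = false,
            ('"', false) => in_quotes = true,
            (',', false) => fields.push(std::mem::take(&mut field)),
            (c, _) => field.push(c),
        }
    }
    fields.push(field);
    fields
}

/// Mints `{iri_prefix}/{account_number}` for each non-empty data row.
pub fn account_iris(iri_prefix: &str, csv: &str) -> Vec<String> {
    csv.lines()
        .skip(1) // header row
        .filter(|l| !l.trim().is_empty())
        .filter_map(|l| parse_csv_line(l).into_iter().next())
        .map(|number| format!("{iri_prefix}/{}", number.trim()))
        .collect()
}
```

Keeping the parser this small is what lets the hydrator avoid a new external dependency, at the cost of not supporting multi-line quoted fields.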

New glue (hydrators/skr_datev.rs) ships three entry points:
- hydrate_skr03 → OGIT::SKR03_V1, IRI base `urn:datev:skr03:account`
- hydrate_skr04 → OGIT::SKR04_V1, IRI base `urn:datev:skr04:account`
- hydrate_skr03_bau → also OGIT::SKR03_V1 but with a separate IRI base
  `urn:datev:skr03-bau:account` so 6-digit Bau accounts coexist with
  canonical 4-digit ones without clashing.

CSV data regeneration (data/ontologies/skr-datev/parse_pdfs.py):
- Widened SKR 03 Bau column slice from [44, 90) to [38, 100). The old
  bound cut off account numbers preceded by function-code prefixes
  ("F 1000 00 Kasse" started at col 41; slicing at 44 dropped the
  function code AND the first digit). SKR 03 canonical entity count
  grew 1476 -> 1623.
- SKR 04: unchanged.
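The column-slice bug can be reproduced with character arithmetic. In the sketch below, `column_slice` is a hypothetical helper that mirrors Python's `line[start:end]` on code points. A line whose `F 1000 00 Kasse` entry begins at 0-based column 41 loses the function code and the leading `1` when sliced from column 44 (yielding `000 00 Kasse`), while the widened `[38, 100)` slice keeps the whole entry. The padded line is a constructed example, not real `pdftotext` output.

```rust
/// Character-column slice `[start, end)` of one `pdftotext -layout` line,
/// clamped to the line length like Python's `line[start:end]`.
pub fn column_slice(line: &str, start: usize, end: usize) -> String {
    line.chars()
        .skip(start)
        .take(end.saturating_sub(start))
        .collect()
}
```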

Manifests + slots:
- modules/skr03/manifest.yaml — declares anchor accounts (1000 Kasse,
  1200 Bank, 1400 Forderungen LL, 1576 Vorsteuer 19%, 3300/8400/8300
  revenues) with stable u16 entity IDs for external code reference.
- modules/skr04/manifest.yaml — declares balance-sheet-oriented anchors
  (1000 RHB, 1200 Forderungen LL, 1400 Vorsteuer, 1600 Kasse, 1800 Bank,
  2900 Eigenkapital, 3300 Verbindlichkeiten, 4400 Erlöse, 5400 Wareneingang)
- crates/lance-graph-contract/build.rs: adds SKR03=40 + SKR04=41 to the
  G-slot table.

6 smoke tests verify:
- both schemes hydrate into their respective G slots with the right
  domain name and DOLCE inheritance
- entity counts in expected range (~1500 each)
- load-bearing anchor accounts resolve under their scheme-specific
  IRIs (Kasse / Bank / Forderungen / Vorsteuer / Wareneingang / Erlöse)
- SKR 03 / SKR 04 are independent slots — SKR03 IRIs don't resolve in
  the SKR04 bundle and vice versa
- SKR 03 Bau extensions (007510 Sand- und Kiesausbeute, 010010 Bauliche
  Anlagen für stationäre Fertigung) resolve under the Bau IRI base

Known parser coverage gaps (documented in test comments):
- SKR 04 account 2900 "Gezeichnetes Kapital" not extracted (page-8
  column boundary issue)
- SKR 04 account 4400 entangled in multi-account bleed at 4332
Both are CSV data quality issues, not hydrator bugs. The hydrator is
correct; future parse_pdfs.py refinement will close these gaps.

All 114 lance-graph-ontology tests pass; clippy clean; downstream
consumers (callcenter, consumer-conformance, cognitive-shader-driver)
build clean.
@coderabbitai

coderabbitai Bot commented May 21, 2026

Important

Review skipped

Too many files!

This PR contains 189 files, which is 39 over the limit of 150.

To get a review, narrow the scope:
• coderabbit review --type committed # exclude uncommitted changes
• coderabbit review --dir # limit to a subdirectory
• coderabbit review --base # compare against a closer base

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 39bd36e8-5c14-439c-a94f-91bdbf53f94e

📥 Commits

Reviewing files that changed from the base of the PR and between 02b066e and 2e3dd48.

⛔ Files ignored due to path filters (4)
  • Cargo.lock is excluded by !**/*.lock
  • data/ontologies/skr-datev/skr03-bau.csv is excluded by !**/*.csv
  • data/ontologies/skr-datev/skr03.csv is excluded by !**/*.csv
  • data/ontologies/skr-datev/skr04.csv is excluded by !**/*.csv
📒 Files selected for processing (189)
  • LICENSES/DATEV-SKR.txt
  • LICENSES/DOLCE.txt
  • LICENSES/FIBO.txt
  • LICENSES/OWL-TIME.txt
  • LICENSES/PROV-O.txt
  • LICENSES/QUDT.txt
  • LICENSES/SCHEMAORG.txt
  • LICENSES/SKOS.txt
  • LICENSES/ZUGFERD.txt
  • crates/lance-graph-contract/build.rs
  • crates/lance-graph-contract/tests/manifest_codegen.rs
  • crates/lance-graph-ontology/Cargo.toml
  • crates/lance-graph-ontology/src/hydrators/dolce.rs
  • crates/lance-graph-ontology/src/hydrators/fibo.rs
  • crates/lance-graph-ontology/src/hydrators/mod.rs
  • crates/lance-graph-ontology/src/hydrators/owl.rs
  • crates/lance-graph-ontology/src/hydrators/owltime.rs
  • crates/lance-graph-ontology/src/hydrators/provo.rs
  • crates/lance-graph-ontology/src/hydrators/qudt.rs
  • crates/lance-graph-ontology/src/hydrators/schemaorg.rs
  • crates/lance-graph-ontology/src/hydrators/schematron.rs
  • crates/lance-graph-ontology/src/hydrators/skos.rs
  • crates/lance-graph-ontology/src/hydrators/skr.rs
  • crates/lance-graph-ontology/src/hydrators/skr_datev.rs
  • crates/lance-graph-ontology/src/hydrators/xsd.rs
  • crates/lance-graph-ontology/src/hydrators/zugferd.rs
  • crates/lance-graph-ontology/src/lib.rs
  • crates/lance-graph-ontology/src/registry.rs
  • crates/lance-graph-ontology/tests/dolce_dns_role_resolution.rs
  • crates/lance-graph-ontology/tests/dolce_edge_whitelist_registered.rs
  • crates/lance-graph-ontology/tests/dolce_extension_modules.rs
  • crates/lance-graph-ontology/tests/dolce_hydrator_smoke.rs
  • crates/lance-graph-ontology/tests/dolce_upper_category_count.rs
  • crates/lance-graph-ontology/tests/fibo_be_hydrator_smoke.rs
  • crates/lance-graph-ontology/tests/fibo_fnd_hydrator_smoke.rs
  • crates/lance-graph-ontology/tests/owltime_hydrator_smoke.rs
  • crates/lance-graph-ontology/tests/provo_hydrator_smoke.rs
  • crates/lance-graph-ontology/tests/qudt_hydrator_smoke.rs
  • crates/lance-graph-ontology/tests/schemaorg_hydrator_smoke.rs
  • crates/lance-graph-ontology/tests/skos_hydrator_smoke.rs
  • crates/lance-graph-ontology/tests/skr_hydrator_smoke.rs
  • crates/lance-graph-ontology/tests/zugferd_hydrator_smoke.rs
  • crates/lance-graph-ontology/tests/zugferd_rules_hydrator_smoke.rs
  • data/ontologies/dul-extensions/conceptualization.owl
  • data/ontologies/dul-extensions/lmm-l2.owl
  • data/ontologies/dul.ttl
  • data/ontologies/fibo-be/AllBE-Europe.rdf
  • data/ontologies/fibo-be/AllBE-ExampleIndividuals.rdf
  • data/ontologies/fibo-be/AllBE-NorthAmerica.rdf
  • data/ontologies/fibo-be/AllBE-NorthAmericanExamples.rdf
  • data/ontologies/fibo-be/AllBE-ReferenceIndividuals.rdf
  • data/ontologies/fibo-be/AllBE.rdf
  • data/ontologies/fibo-be/Corporations/Corporations.rdf
  • data/ontologies/fibo-be/Corporations/MetadataBECorporations.rdf
  • data/ontologies/fibo-be/FunctionalEntities/FunctionalEntities.rdf
  • data/ontologies/fibo-be/FunctionalEntities/MetadataBEFunctionalEntities.rdf
  • data/ontologies/fibo-be/FunctionalEntities/Publishers.rdf
  • data/ontologies/fibo-be/GovernmentEntities/AsianJurisdiction/CentralAsiaGovernmentEntitiesAndJurisdictions.rdf
  • data/ontologies/fibo-be/GovernmentEntities/AsianJurisdiction/EasternAsiaGovernmentEntitiesAndJurisdictions.rdf
  • data/ontologies/fibo-be/GovernmentEntities/AsianJurisdiction/SoutheasternAsiaGovernmentEntitiesAndJurisdictions.rdf
  • data/ontologies/fibo-be/GovernmentEntities/AsianJurisdiction/SouthernAsiaGovernmentEntitiesAndJurisdictions.rdf
  • data/ontologies/fibo-be/GovernmentEntities/AsianJurisdiction/WesternAsiaGovernmentEntitiesAndJurisdictions.rdf
  • data/ontologies/fibo-be/GovernmentEntities/EuropeanJurisdiction/EUGovernmentEntitiesAndJurisdictions.rdf
  • data/ontologies/fibo-be/GovernmentEntities/EuropeanJurisdiction/EasternEuropeGovernmentEntitiesAndJurisdictions.rdf
  • data/ontologies/fibo-be/GovernmentEntities/EuropeanJurisdiction/NorthernEuropeGovernmentEntitiesAndJurisdictions.rdf
  • data/ontologies/fibo-be/GovernmentEntities/EuropeanJurisdiction/SouthernEuropeGovernmentEntitiesAndJurisdictions.rdf
  • data/ontologies/fibo-be/GovernmentEntities/EuropeanJurisdiction/UKGovernmentEntitiesAndJurisdictions.rdf
  • data/ontologies/fibo-be/GovernmentEntities/EuropeanJurisdiction/WesternEuropeGovernmentEntitiesAndJurisdictions.rdf
  • data/ontologies/fibo-be/GovernmentEntities/GovernmentEntities.rdf
  • data/ontologies/fibo-be/GovernmentEntities/LatinAmericanJurisdiction/CentralAmericanGovernmentEntitiesAndJurisdictions.rdf
  • data/ontologies/fibo-be/GovernmentEntities/LatinAmericanJurisdiction/SouthAmericanGovernmentEntitiesAndJurisdictions.rdf
  • data/ontologies/fibo-be/GovernmentEntities/MetadataBEGovernmentEntities.rdf
  • data/ontologies/fibo-be/GovernmentEntities/NorthAmericanJurisdiction/CAGovernmentEntitiesAndJurisdictions.rdf
  • data/ontologies/fibo-be/GovernmentEntities/NorthAmericanJurisdiction/CaribbeanGovernmentEntitiesAndJurisdictions.rdf
  • data/ontologies/fibo-be/GovernmentEntities/NorthAmericanJurisdiction/MXGovernmentEntitiesAndJurisdictions.rdf
  • data/ontologies/fibo-be/GovernmentEntities/NorthAmericanJurisdiction/USGovernmentEntitiesAndJurisdictions.rdf
  • data/ontologies/fibo-be/LegalEntities/CorporateBodies.rdf
  • data/ontologies/fibo-be/LegalEntities/FormalBusinessOrganizations.rdf
  • data/ontologies/fibo-be/LegalEntities/LEIEntities.rdf
  • data/ontologies/fibo-be/LegalEntities/LegalPersons.rdf
  • data/ontologies/fibo-be/LegalEntities/MetadataBELegalEntities.rdf
  • data/ontologies/fibo-be/LegalEntities/NorthAmericanEntities/USExampleEntities.rdf
  • data/ontologies/fibo-be/LegalEntities/NorthAmericanEntities/USExampleExecutives.rdf
  • data/ontologies/fibo-be/MetadataBE.rdf
  • data/ontologies/fibo-be/OwnershipAndControl/ControlParties.rdf
  • data/ontologies/fibo-be/OwnershipAndControl/CorporateControl.rdf
  • data/ontologies/fibo-be/OwnershipAndControl/CorporateOwnership.rdf
  • data/ontologies/fibo-be/OwnershipAndControl/Executives.rdf
  • data/ontologies/fibo-be/OwnershipAndControl/MetadataBEOwnershipAndControl.rdf
  • data/ontologies/fibo-be/OwnershipAndControl/OwnershipParties.rdf
  • data/ontologies/fibo-be/Partnerships/MetadataBEPartnerships.rdf
  • data/ontologies/fibo-be/Partnerships/Partnerships.rdf
  • data/ontologies/fibo-be/PrivateLimitedCompanies/MetadataBEPrivateLimitedCompanies.rdf
  • data/ontologies/fibo-be/PrivateLimitedCompanies/PrivateLimitedCompanies.rdf
  • data/ontologies/fibo-be/README.md
  • data/ontologies/fibo-be/SoleProprietorships/MetadataBESoleProprietorships.rdf
  • data/ontologies/fibo-be/SoleProprietorships/SoleProprietorships.rdf
  • data/ontologies/fibo-be/Trusts/MetadataBETrusts.rdf
  • data/ontologies/fibo-be/Trusts/Trusts.rdf
  • data/ontologies/fibo-be/catalog-v001.xml
  • data/ontologies/fibo-fnd/Accounting/AccountingEquity.rdf
  • data/ontologies/fibo-fnd/Accounting/CashFlows.rdf
  • data/ontologies/fibo-fnd/Accounting/CurrencyAmount.rdf
  • data/ontologies/fibo-fnd/Accounting/ISO4217-CurrencyCodes.rdf
  • data/ontologies/fibo-fnd/Accounting/MetadataFNDAccounting.rdf
  • data/ontologies/fibo-fnd/AgentsAndPeople/Agents.rdf
  • data/ontologies/fibo-fnd/AgentsAndPeople/MetadataFNDAgentsAndPeople.rdf
  • data/ontologies/fibo-fnd/AgentsAndPeople/People.rdf
  • data/ontologies/fibo-fnd/Agreements/Agreements.rdf
  • data/ontologies/fibo-fnd/Agreements/Contracts.rdf
  • data/ontologies/fibo-fnd/Agreements/MetadataFNDAgreements.rdf
  • data/ontologies/fibo-fnd/AllFND-NorthAmerica.rdf
  • data/ontologies/fibo-fnd/AllFND.rdf
  • data/ontologies/fibo-fnd/Arrangements/Arrangements.rdf
  • data/ontologies/fibo-fnd/Arrangements/Assessments.rdf
  • data/ontologies/fibo-fnd/Arrangements/ClassificationSchemes.rdf
  • data/ontologies/fibo-fnd/Arrangements/Documents.rdf
  • data/ontologies/fibo-fnd/Arrangements/IdentifiersAndIndices.rdf
  • data/ontologies/fibo-fnd/Arrangements/Lifecycles.rdf
  • data/ontologies/fibo-fnd/Arrangements/MetadataFNDArrangements.rdf
  • data/ontologies/fibo-fnd/Arrangements/Ratings.rdf
  • data/ontologies/fibo-fnd/Arrangements/Reporting.rdf
  • data/ontologies/fibo-fnd/DatesAndTimes/BusinessDates.rdf
  • data/ontologies/fibo-fnd/DatesAndTimes/FinancialDates.rdf
  • data/ontologies/fibo-fnd/DatesAndTimes/MetadataFNDDatesAndTimes.rdf
  • data/ontologies/fibo-fnd/DatesAndTimes/Occurrences.rdf
  • data/ontologies/fibo-fnd/GoalsAndObjectives/MetadataFNDGoalsAndObjectives.rdf
  • data/ontologies/fibo-fnd/GoalsAndObjectives/Objectives.rdf
  • data/ontologies/fibo-fnd/Law/LegalCapacity.rdf
  • data/ontologies/fibo-fnd/Law/LegalCore.rdf
  • data/ontologies/fibo-fnd/Law/MetadataFNDLaw.rdf
  • data/ontologies/fibo-fnd/MetadataFND.rdf
  • data/ontologies/fibo-fnd/Organizations/FormalOrganizations.rdf
  • data/ontologies/fibo-fnd/Organizations/MetadataFNDOrganizations.rdf
  • data/ontologies/fibo-fnd/OwnershipAndControl/Control.rdf
  • data/ontologies/fibo-fnd/OwnershipAndControl/MetadataFNDOwnershipAndControl.rdf
  • data/ontologies/fibo-fnd/OwnershipAndControl/Ownership.rdf
  • data/ontologies/fibo-fnd/OwnershipAndControl/OwnershipAndControl.rdf
  • data/ontologies/fibo-fnd/Parties/MetadataFNDParties.rdf
  • data/ontologies/fibo-fnd/Parties/Parties.rdf
  • data/ontologies/fibo-fnd/Places/Addresses.rdf
  • data/ontologies/fibo-fnd/Places/Facilities.rdf
  • data/ontologies/fibo-fnd/Places/MetadataFNDPlaces.rdf
  • data/ontologies/fibo-fnd/Places/NorthAmerica/USPostalServiceAddresses.rdf
  • data/ontologies/fibo-fnd/Places/NorthAmerica/USPostalServiceAddressesIndividuals.rdf
  • data/ontologies/fibo-fnd/Places/RealProperty.rdf
  • data/ontologies/fibo-fnd/Places/VirtualPlaces.rdf
  • data/ontologies/fibo-fnd/ProductsAndServices/MetadataFNDProductsAndServices.rdf
  • data/ontologies/fibo-fnd/ProductsAndServices/PaymentsAndSchedules.rdf
  • data/ontologies/fibo-fnd/ProductsAndServices/ProductsAndServices.rdf
  • data/ontologies/fibo-fnd/README.md
  • data/ontologies/fibo-fnd/Relations/MetadataFNDRelations.rdf
  • data/ontologies/fibo-fnd/Relations/Relations.rdf
  • data/ontologies/fibo-fnd/TransactionsExt/MarketTransactions.rdf
  • data/ontologies/fibo-fnd/TransactionsExt/MetadataFNDTransactionsExt.rdf
  • data/ontologies/fibo-fnd/TransactionsExt/REATransactions.rdf
  • data/ontologies/fibo-fnd/TransactionsExt/SecuritiesTransactions.rdf
  • data/ontologies/fibo-fnd/Utilities/Analytics.rdf
  • data/ontologies/fibo-fnd/Utilities/AnnotationVocabulary.rdf
  • data/ontologies/fibo-fnd/Utilities/MetadataFNDUtilities.rdf
  • data/ontologies/provo.ttl
  • data/ontologies/qudt-core.ttl
  • data/ontologies/qudt-quantitykinds.ttl
  • data/ontologies/qudt-units.ttl
  • data/ontologies/schemaorg.ttl
  • data/ontologies/skos/skos-core.rdf
  • data/ontologies/skos/skos-owl1dl.rdf
  • data/ontologies/skos/skos-xl.rdf
  • data/ontologies/skr-datev/README.md
  • data/ontologies/skr-datev/parse_pdfs.py
  • data/ontologies/time.ttl
  • data/ontologies/zugferd/FACTUR-X_EN16931.sch
  • data/ontologies/zugferd/FACTUR-X_EN16931.xsd
  • data/ontologies/zugferd/FACTUR-X_EN16931_codedb.xml
  • data/ontologies/zugferd/FACTUR-X_EN16931_urn_un_unece_uncefact_data_standard_QualifiedDataType_100.xsd
  • data/ontologies/zugferd/FACTUR-X_EN16931_urn_un_unece_uncefact_data_standard_ReusableAggregateBusinessInformationEntity_100.xsd
  • data/ontologies/zugferd/FACTUR-X_EN16931_urn_un_unece_uncefact_data_standard_UnqualifiedDataType_100.xsd
  • modules/fibobe/manifest.yaml
  • modules/fibofnd/manifest.yaml
  • modules/owltime/manifest.yaml
  • modules/provo/manifest.yaml
  • modules/qudt/manifest.yaml
  • modules/schemaorg/manifest.yaml
  • modules/skos/manifest.yaml
  • modules/skr03-bau/manifest.yaml
  • modules/skr03/manifest.yaml
  • modules/skr04/manifest.yaml
  • modules/zugferd-rules/manifest.yaml
  • modules/zugferd/manifest.yaml

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 299e6c43b6

Comment on lines +67 to +74
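```rust
    // (excerpt: the hydrator's struct literal opens above line +67)
    g: OGIT::SKR03_V1.0, // same G slot that canonical hydrate_skr03 uses
    version: OGIT::SKR03_V1.1,
    domain_name: "skr03-bau".to_string(),
    inherits_from: Some(OGIT::DOLCE_V1.0),
    starting_entity_id: 100,
    iri_prefix: SKR03_BAU_IRI_PREFIX.to_string(),
};
h.hydrate(csv_path, registry)?;
```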

P1: Merge SKR03 Bau rows instead of overwriting the SKR03 bundle

hydrate_skr03_bau_from registers its data into the same G slot (OGIT::SKR03_V1.0) as hydrate_skr03, but register_bundle replaces any existing bundle at that slot. In practice, if a caller hydrates canonical SKR03 and then adds Bau extensions (which the docs/tests describe as coexisting via a different IRI prefix), the second call drops all canonical 4-digit SKR03 accounts and leaves only Bau rows. This breaks mixed consumers that expect both account sets in one SKR03 context.
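To make the failure mode concrete, here is a minimal sketch that assumes `register_bundle` behaves like `HashMap::insert` keyed by G slot. `Registry`, `Bundle`, `merge_bundle`, and the IRIs are hypothetical stand-ins, not the crate's `OntologyRegistry` or `ContextBundle`; slot 40 is the canonical SKR 03 slot named in the fix commit. It contrasts replacing semantics with the merge the review suggests:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: a bundle maps IRI -> entity id; the registry holds one bundle per G slot.
type Bundle = HashMap<String, u16>;

#[derive(Default)]
struct Registry {
    bundles: HashMap<u16, Bundle>,
}

impl Registry {
    /// Replacing semantics (assumed for register_bundle): a second call on the
    /// same slot silently drops the bundle that was there.
    fn register_bundle(&mut self, slot: u16, bundle: Bundle) {
        self.bundles.insert(slot, bundle);
    }

    /// Merging semantics (the review's suggestion): new rows join the existing
    /// bundle, and an IRI that is already present keeps its original id.
    fn merge_bundle(&mut self, slot: u16, bundle: Bundle) {
        let existing = self.bundles.entry(slot).or_default();
        for (iri, id) in bundle {
            existing.entry(iri).or_insert(id);
        }
    }
}

fn bundle(rows: &[(&str, u16)]) -> Bundle {
    rows.iter().map(|(iri, id)| (iri.to_string(), *id)).collect()
}

fn main() {
    const SKR03: u16 = 40; // canonical SKR 03 slot per the fix commit
    let canonical = bundle(&[("skr03/1000", 1)]); // illustrative IRIs
    let bau = bundle(&[("skr03-bau/007510", 100)]);

    let mut replacing = Registry::default();
    replacing.register_bundle(SKR03, canonical.clone());
    replacing.register_bundle(SKR03, bau.clone());
    assert!(!replacing.bundles[&SKR03].contains_key("skr03/1000")); // canonical row lost

    let mut merging = Registry::default();
    merging.merge_bundle(SKR03, canonical);
    merging.merge_bundle(SKR03, bau);
    assert_eq!(merging.bundles[&SKR03].len(), 2); // both rows kept
}
```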

Comment on lines +131 to +135
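```rust
// (excerpt: one arm of the quick-xml event loop; the inner match continues past line +135)
Ok(Event::Start(e)) | Ok(Event::Empty(e)) => {
    // Start and Empty share this arm, so a self-closing <assert/> is treated like an opening tag.
    let qname = e.name();
    let local: Vec<u8> = local_name(qname.as_ref()).to_vec();
    let id = attr_value(&e, b"id");
    match local.as_slice() {
```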

P2: Reset assert/report state for self-closing Schematron nodes

The parser handles Event::Start and Event::Empty with the same branch and always sets in_assert_or_report = true for assert/report, but self-closing elements (<assert .../> or <report .../>) never emit a matching End event. That leaves the state stuck "inside" an assertion, so later unrelated text can be scanned as rule text and produce spurious /rule/... IRIs. This is input-dependent but can corrupt rule extraction for valid XML event streams.

P1 — SKR03 Bau no longer overwrites canonical SKR03 slot:

  Before: hydrate_skr03_bau registered into OGIT::SKR03_V1 (the same slot
  as canonical SKR 03), so a caller that hydrated both lost the canonical
  4-digit account set on the second call. The test invoked Bau alone, so
  the silent overwrite went undetected.

  After: SKR 03 Bau hydrates into its OWN G slot OGIT::SKR03BAU_V1=42,
  with inherits_from: Some(OGIT::SKR03_V1.0) to make the structural
  dependency on canonical SKR 03 explicit. Mixed consumers can now
  hold canonical SKR 03 (4-digit, slot 40) AND Bau (6-digit, slot 42)
  in the same OntologyRegistry without interference.

  - crates/lance-graph-contract/build.rs: allocates SKR03BAU=42
  - crates/lance-graph-ontology/src/hydrators/skr_datev.rs: hydrate_skr03_bau
    writes to OGIT::SKR03BAU_V1.0 with the new inherits_from
  - modules/skr03-bau/manifest.yaml: declares 5 anchor trade-specific
    accounts (including Sand- und Kiesausbeute, Bauliche Anlagen für
    stationäre Fertigung, and Bauten auf dem Bauhof) with stable u16
    entity IDs
  - tests: skr03_bau_extensions_hydrate_into_skr03_slot renamed to
    skr03_bau_extensions_hydrate_into_dedicated_slot (expects slot 42)
  - NEW REGRESSION TEST: skr03_canonical_and_bau_coexist_in_one_registry
    hydrates both schemes in sequence, asserts:
      - canonical bundle still has 1400+ entities (not dropped)
      - canonical SKR 03/1000 (Kasse) resolves in canonical bundle
      - Bau /007510 (Sand- und Kiesausbeute) resolves in Bau bundle
      - Bau IRIs do NOT resolve in canonical bundle (and vice versa)
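The dedicated-slot layout and the coexistence checks of the new regression test can be modeled as follows. This is a hypothetical sketch: `Registry`, `resolve_local`, and the IRIs are illustrative; only the slot numbers (40 canonical, 42 Bau) and the `inherits_from` edge come from the commit text. Canonical SKR 03's own parent is not stated here and is modeled as `None`.

```rust
use std::collections::HashMap;

/// Hypothetical bundle: its own IRI table plus an optional parent slot.
struct Bundle {
    inherits_from: Option<u16>,
    iris: HashMap<String, u16>,
}

/// Hypothetical registry: one bundle per G slot, so distinct slots never collide.
#[derive(Default)]
struct Registry {
    slots: HashMap<u16, Bundle>,
}

impl Registry {
    fn register(&mut self, slot: u16, inherits_from: Option<u16>, rows: &[(&str, u16)]) {
        let iris = rows.iter().map(|(iri, id)| (iri.to_string(), *id)).collect();
        self.slots.insert(slot, Bundle { inherits_from, iris });
    }

    /// Resolve an IRI in exactly one bundle, without falling back to the parent.
    fn resolve_local(&self, slot: u16, iri: &str) -> Option<u16> {
        self.slots.get(&slot)?.iris.get(iri).copied()
    }
}

const SKR03: u16 = 40;
const SKR03_BAU: u16 = 42;

fn main() {
    let mut reg = Registry::default();
    // Canonical parent modeled as None purely for illustration.
    reg.register(SKR03, None, &[("skr03/1000", 1)]);
    reg.register(SKR03_BAU, Some(SKR03), &[("skr03-bau/007510", 100)]);

    assert_eq!(reg.resolve_local(SKR03, "skr03/1000"), Some(1));
    assert_eq!(reg.resolve_local(SKR03_BAU, "skr03-bau/007510"), Some(100));
    // Bau IRIs stay out of the canonical bundle, and vice versa.
    assert_eq!(reg.resolve_local(SKR03, "skr03-bau/007510"), None);
    assert_eq!(reg.resolve_local(SKR03_BAU, "skr03/1000"), None);
    assert_eq!(reg.slots[&SKR03_BAU].inherits_from, Some(SKR03));
}
```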

P2 — Schematron Event::Empty no longer leaks state into later text:

  Before: self-closing <assert .../>, <report .../>, and <pattern .../>
  elements emit Event::Empty, which was handled by the same branch as
  Event::Start and, for assert/report, set in_assert_or_report = true.
  Self-closing elements never emit a matching Event::End, so the flag
  stayed stuck at true and the parser scanned later, unrelated body text
  as rule message text, producing spurious /rule/... IRIs.

  After: Event::Start and Event::Empty are split into separate match
  arms. Empty interns the @id only (no body to collect) and does NOT
  touch in_assert_or_report. Start sets the flag; End extracts rule
  IDs from current_text_buf and resets state. Self-closing elements
  no longer affect state.

  - crates/lance-graph-ontology/src/hydrators/schematron.rs
  - NEW REGRESSION TEST: self_closing_assert_does_not_capture_later_text
    constructs a fixture with a self-closing <assert/> followed by
    a non-assert <bar> element containing "[BR-99]" text, then a
    normal Start/End assert containing "[BR-42]":
      - A-EMPTY-001 and A-NORMAL-001 assert IRIs both resolve
      - BR-42 (real rule from message body) resolves
      - BR-99 (from stray text outside any assert) does NOT resolve
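The split can be exercised without quick-xml. In this simplified model, `Ev` is a hypothetical stand-in for quick-xml's `Event::Start`/`Event::Empty`/`Event::End`/`Event::Text`, and `rule_ids` is an illustrative bracket scanner rather than the hydrator's real extraction code; `main` replays the regression fixture described above.

```rust
/// Hypothetical stand-in for quick-xml's Event::Start / Empty / End / Text.
enum Ev<'a> {
    Start { local: &'a str, id: Option<&'a str> },
    Empty { local: &'a str, id: Option<&'a str> },
    End { local: &'a str },
    Text(&'a str),
}

fn is_rule_node(local: &str) -> bool {
    local == "assert" || local == "report"
}

/// Illustrative scanner for bracketed IDs such as [BR-52] or [BR-CO-03].
fn rule_ids(text: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut rest = text;
    while let Some(open) = rest.find('[') {
        let after = &rest[open + 1..];
        let Some(close) = after.find(']') else { break };
        let candidate = &after[..close];
        let well_formed = !candidate.is_empty()
            && candidate.chars().all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '-');
        if well_formed {
            out.push(candidate.to_string());
        }
        rest = &after[close + 1..];
    }
    out
}

/// Returns (interned node @ids, rule IDs found inside assert/report bodies).
fn extract(events: &[Ev<'_>]) -> (Vec<String>, Vec<String>) {
    let (mut ids, mut rules) = (Vec::new(), Vec::new());
    let mut in_assert_or_report = false;
    let mut text_buf = String::new();
    for ev in events {
        match ev {
            // Self-closing node: intern @id only. No body and no End event
            // follow, so the in_assert_or_report flag is left untouched.
            Ev::Empty { local, id: Some(id) } if is_rule_node(local) || *local == "pattern" => {
                ids.push(id.to_string());
            }
            // Opening tag: intern @id; only assert/report start collecting text.
            Ev::Start { local, id } if is_rule_node(local) || *local == "pattern" => {
                if let Some(id) = id {
                    ids.push(id.to_string());
                }
                if is_rule_node(local) {
                    in_assert_or_report = true;
                    text_buf.clear();
                }
            }
            Ev::Text(t) if in_assert_or_report => text_buf.push_str(t),
            // Closing tag: extract rule IDs from the collected body, then reset.
            Ev::End { local } if in_assert_or_report && is_rule_node(local) => {
                rules.extend(rule_ids(&text_buf));
                in_assert_or_report = false;
                text_buf.clear();
            }
            _ => {}
        }
    }
    (ids, rules)
}

fn main() {
    let events = [
        Ev::Empty { local: "assert", id: Some("A-EMPTY-001") },
        Ev::Start { local: "bar", id: None },
        Ev::Text("stray [BR-99] text"),
        Ev::End { local: "bar" },
        Ev::Start { local: "assert", id: Some("A-NORMAL-001") },
        Ev::Text("[BR-42] message"),
        Ev::End { local: "assert" },
    ];
    let (ids, rules) = extract(&events);
    assert_eq!(ids, ["A-EMPTY-001", "A-NORMAL-001"]);
    assert_eq!(rules, ["BR-42"]); // BR-99 is not captured
}
```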

Side fix: replaced `if !map.contains_key(&iri) { map.insert(iri, id) }`
in SkrHydrator with the Entry::Vacant pattern to clear a clippy
`map_entry` warning. Clippy back to 5 warnings (all pre-existing
oxrdf deprecation warnings, no new ones).
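The `Entry::Vacant` form of that first-writer-wins insert looks roughly like this. It is a sketch over an assumed IRI -> `u16` map; `intern_first` and the IRIs are hypothetical names, not SkrHydrator's actual code.

```rust
use std::collections::hash_map::{Entry, HashMap};

/// Hypothetical helper: intern `iri` with `id` only if it is not present yet.
/// Replaces `if !map.contains_key(&iri) { map.insert(iri, id) }`, which hashes
/// the key twice and triggers clippy's `map_entry` lint.
fn intern_first(map: &mut HashMap<String, u16>, iri: String, id: u16) -> bool {
    match map.entry(iri) {
        Entry::Vacant(slot) => {
            slot.insert(id);
            true
        }
        Entry::Occupied(_) => false,
    }
}

fn main() {
    let mut map = HashMap::new();
    assert!(intern_first(&mut map, "skr03/1000".to_string(), 1));
    // A duplicate IRI keeps its original id.
    assert!(!intern_first(&mut map, "skr03/1000".to_string(), 2));
    assert_eq!(map["skr03/1000"], 1);
}
```

When the caller does not need to know whether a new entry was created, `map.entry(iri).or_insert(id);` is the shorter equivalent; both perform a single lookup, which is what `map_entry` asks for.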

All 116 lance-graph-ontology tests pass (was 114; +2 regression tests);
downstream consumers build clean.
AdaWorldAPI merged commit a891cc4 into main May 21, 2026
7 of 8 checks passed
AdaWorldAPI pushed a commit that referenced this pull request May 21, 2026
… vocabs

Companion to PR #407 (merged). Expands `NamespaceRegistry::seed_defaults`
from 16 to 29 entries, registering the 13 external vocabularies that
PR #407 added hydrators for. This is the O(1) IRI ↔ context_id matching
table backed by `lance_cache.rs`'s Lance dataset; consumers like
smb-office-rs and woa-rs look up entries by namespace shortname instead of
hand-rolling slot constants.
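A minimal in-memory model of that matching table, assuming one hash map per direction. `NamespaceTable` and its methods are hypothetical; the real table is persisted through `lance_cache.rs`, and keying by the namespace strings of this commit's allocation table (for example `FinancialAccounting/SKR03`) is an assumption about what "shortname" means here.

```rust
use std::collections::HashMap;

/// Hypothetical shortname <-> context_id table (lookups only; no persistence).
#[derive(Default)]
struct NamespaceTable {
    by_name: HashMap<String, u16>,
    by_id: HashMap<u16, String>,
}

impl NamespaceTable {
    fn register(&mut self, id: u16, name: &str) {
        self.by_name.insert(name.to_string(), id);
        self.by_id.insert(id, name.to_string());
    }

    /// Average O(1): shortname -> context_id.
    fn id_of(&self, name: &str) -> Option<u16> {
        self.by_name.get(name).copied()
    }

    /// Average O(1): context_id -> shortname.
    fn name_of(&self, id: u16) -> Option<&str> {
        self.by_id.get(&id).map(String::as_str)
    }
}

fn main() {
    let mut t = NamespaceTable::default();
    // A few rows from this commit's allocation table.
    for (id, name) in [(0, "SMB"), (20, "Foundation/DOLCE-DUL"), (34, "FinancialAccounting/SKR03")] {
        t.register(id, name);
    }
    assert_eq!(t.id_of("FinancialAccounting/SKR03"), Some(34));
    assert_eq!(t.name_of(20), Some("Foundation/DOLCE-DUL"));
}
```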

Why this lives in lance-graph-ontology, not in OGIT:
- Public OWL/RDF source files stay pristine in data/ontologies/
  (DOLCE+DUL, FIBO-FND/BE, OWL-Time, PROV-O, QUDT, schema.org, SKOS,
  ZUGFeRD CII XSDs + Schematron). Modifying them taints downstream use.
- The OGIT repo is authoritative for namespace registrations but adding
  new TTL files there with hand-picked contextIds would be drift.
- The matching table belongs in the CLIENT (lance-graph-ontology), keyed
  by namespace shortname, persisted via the existing lance_cache layer.
- Per user direction 2026-05-21: "expand always but drift is probably bad"
  + "deinterlace them locally and keep that matching table in a lance
  table for O(1) and check what lance-graph-ontology has in regards"
  → expansion lives here, OGIT untouched.

Allocation:

  ID    Namespace                            PR / Hydrator
  ─────────────────────────────────────────────────────────
   0    SMB                                  (pre-existing)
   1    WorkOrder                            (pre-existing)
   2    Healthcare                           (pre-existing)
   3    Network                              (pre-existing)
   4    EmailCorrespondance                  (pre-existing)
   5    SharePoint                           (pre-existing)
  10-19 Medical/<sub>                        (pre-existing, dense)
  20    Foundation/DOLCE-DUL                 bO-1   hydrate_dolce
  21    Foundation/OWL-Time                  bO-2   hydrate_owltime
  22    Foundation/PROV-O                    bO-3   hydrate_provo
  23    Foundation/QUDT                      bO-4   hydrate_qudt
  24    Foundation/schema-org                bO-8   hydrate_schemaorg
  25    Foundation/SKOS                      bO-5   hydrate_skos
  30    FinancialAccounting/FIBO-FND         bO-6   hydrate_fibo_fnd
  31    FinancialAccounting/FIBO-BE          bO-7   hydrate_fibo_be
  32    FinancialAccounting/ZUGFeRD          bO-16  hydrate_zugferd
  33    FinancialAccounting/ZUGFeRD-Rules    bO-15  hydrate_zugferd_rules
  34    FinancialAccounting/SKR03            bO-13  hydrate_skr03
  35    FinancialAccounting/SKR04            bO-13  hydrate_skr04
  36    FinancialAccounting/SKR03-Bau        bO-13  hydrate_skr03_bau

Allocation policy matches the existing Medical/<sub> pattern: dense
within family-range, gaps between ranges left as expansion room.
`allocate()` continues to fill gaps 6..=9 and 26..=29 first, then 37+.
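The fill order can be reproduced with a lowest-unused-id scan, which is what the test name `allocate_skips_to_first_unused_id` suggests. This is a sketch, not the crate's `allocate()`: the `BTreeSet` of used ids and the linear scan are assumptions, and the seeded ids mirror the allocation table above.

```rust
use std::collections::BTreeSet;

/// Seeded context ids: 0..=5, 10..=19 (Medical), 20..=25, 30..=36 (29 in total).
fn seeded_ids() -> BTreeSet<u16> {
    (0..=5).chain(10..=25).chain(30..=36).collect()
}

/// Hypothetical allocator: take the lowest id not yet in use and mark it used.
/// Returns None only when every u16 id is taken.
fn allocate(used: &mut BTreeSet<u16>) -> Option<u16> {
    let id = (0..=u16::MAX).find(|id| !used.contains(id))?;
    used.insert(id);
    Some(id)
}

fn main() {
    let mut used = seeded_ids();
    assert_eq!(used.len(), 29);
    // Gaps 6..=9 fill first, then 26..=29, then 37 onward.
    let first: Vec<u16> = (0..9).filter_map(|_| allocate(&mut used)).collect();
    assert_eq!(first, [6, 7, 8, 9, 26, 27, 28, 29, 37]);
}
```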

Notes:
- `next_free_id` doc-comment updated to reflect the new seed layout.
  First dynamic id is now 6 (was already 6 in practice; the prior
  comment said "20" which was off by 14).
- Three regression tests updated:
  * `seed_defaults_has_sixteen_entries` → `_has_twenty_nine_entries`
  * `seed_defaults_assigns_canonical_ids` adds spot-checks at 20/25/30/34/35/36
  * `allocate_skips_to_first_unused_id` len assertion 16 → 29
- One integration test (`tests/context_id_test.rs`) updated to match.

All 116 lance-graph-ontology tests pass; clippy clean (5 pre-existing
oxrdf deprecation warnings, no new); downstream consumers
(callcenter, consumer-conformance, cognitive-shader-driver) build clean.
AdaWorldAPI added a commit that referenced this pull request May 21, 2026
…-Ce9Oa

feat(ontology): seed NamespaceRegistry with bO-* upstream vocabs (PR #407 follow-up)
AdaWorldAPI pushed a commit that referenced this pull request May 21, 2026
…st PR #407)

PR #407 + the ~11 preceding bO-* feature commits shipped the
concrete OWL/DOLCE/OGIT cross-walk hydrators in
lance_graph_ontology::hydrators::*. §4 of this plan referenced
those surfaces abstractly; this commit tightens those references
to concrete type pointers.

§4.1 (OWL/DOLCE cross-walk surface) — table now names the
hydrator that populates each MetaAnchors field:
- owl_upper_class + dolce_marker → hydrate_dolce
  (OGIT::DOLCE_V1, inherits_from: None, 17-IRI edge whitelist
  covering rdfs:subClassOf + owl:equivalentClass + DnS
  classify/role-binding + dul:hasPart/isPartOf +
  dul:hasTimeInterval/isObservableAt; note the canonical
  DOLCE+DUL Endurant→Object/Perdurant→Event rename).
- foundry_object_type → hydrate_schemaorg + hydrate_fibo_be
  for the upper-class anchors Foundry typically maps to.
- wikidata_qid → not yet in the hydrator surface; deferred
  until a tenant requests Wikidata sync (~50 LOC glue).
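The 17-IRI edge whitelist amounts to a predicate filter over parsed triples. A sketch under stated assumptions: triples are modeled as CURIE string tuples, `edges` and `dolce_edge_whitelist` are hypothetical names, and only the six predicates named above are listed; the DnS classify/role-binding properties are omitted rather than guessed.

```rust
use std::collections::HashSet;

/// A triple as (subject, predicate, object) CURIE strings, for illustration only.
type Triple<'a> = (&'a str, &'a str, &'a str);

/// Partial, hypothetical whitelist: only the predicates named in §4.1.
/// The real hydrate_dolce list has 17 IRIs and presumably uses full IRIs.
fn dolce_edge_whitelist() -> HashSet<&'static str> {
    [
        "rdfs:subClassOf",
        "owl:equivalentClass",
        "dul:hasPart",
        "dul:isPartOf",
        "dul:hasTimeInterval",
        "dul:isObservableAt",
    ]
    .into_iter()
    .collect()
}

/// Keep only triples whose predicate is whitelisted as a graph edge.
fn edges<'a>(triples: &[Triple<'a>], whitelist: &HashSet<&str>) -> Vec<Triple<'a>> {
    triples
        .iter()
        .copied()
        .filter(|(_, p, _)| whitelist.contains(*p))
        .collect()
}

fn main() {
    let triples = [
        ("dul:Object", "rdfs:subClassOf", "dul:Entity"),
        ("dul:Object", "rdfs:comment", "a comment"),
        ("ex:Wheel", "dul:isPartOf", "ex:Car"), // ex: is a made-up prefix
    ];
    let kept = edges(&triples, &dolce_edge_whitelist());
    assert_eq!(kept.len(), 2);
    assert!(kept.iter().all(|(_, p, _)| *p != "rdfs:comment"));
}
```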

§4.3 (NEW — Hydrator inventory) — full surface map:
- Generic substrate: OwlHydrator (the bO-* scaffold every
  hydrator instantiates), MetaStructureHydrator trait,
  ContextBundle, EntityId, OntologySlot, HydrateErr.
- Layered ontologies (L1 → L4 sector):
  · L1: hydrate_dolce (root, inherits_from: None)
  · L2: hydrate_owltime / hydrate_provo / hydrate_qudt
    (all inherits_from: Some(OGIT::DOLCE_V1.0))
  · L3: hydrate_schemaorg (commercial-web)
  · Sector: hydrate_skos, hydrate_fibo_fnd,
    hydrate_fibo_be (FIBO BE inherits FND)
- Dedicated (non-OWL): SchematronHydrator,
  XsdHydrator + collect_xsd_files, SkrHydrator +
  hydrate_skr03/skr04/skr03_bau + the three IRI prefix
  constants.
- ZUGFeRD/Factur-X: hydrate_zugferd + hydrate_zugferd_rules
  (XSD + Schematron over EN16931).
- Full re-export surface from the crate root shown as a
  single `use` block for consumer ergonomics.
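The layered inventory implies a parent chain through `inherits_from`. The hypothetical sketch below resolves that chain with a cycle guard; the keys are the hydrator suffixes used above rather than real `OntologySlot` values, and only the parent edges stated in this PR are encoded (bundles whose parent is not stated here, such as skr03 or fibo_fnd, are treated as roots).

```rust
use std::collections::HashMap;

/// Hypothetical parent table built from the inherits_from values stated in this PR.
fn parents() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("owltime", "dolce"),
        ("provo", "dolce"),
        ("qudt", "dolce"),
        ("fibo_be", "fibo_fnd"),
        ("skr03_bau", "skr03"),
    ])
}

/// Walk inherits_from from `start` up to a root, stopping if a cycle appears.
fn ancestry(start: &'static str, parents: &HashMap<&'static str, &'static str>) -> Vec<&'static str> {
    let mut chain = vec![start];
    let mut current = start;
    while let Some(&parent) = parents.get(current) {
        if chain.contains(&parent) {
            break; // cycle: stop rather than loop forever
        }
        chain.push(parent);
        current = parent;
    }
    chain
}

fn main() {
    let p = parents();
    assert_eq!(ancestry("qudt", &p), ["qudt", "dolce"]);
    assert_eq!(ancestry("skr03_bau", &p), ["skr03_bau", "skr03"]);
    assert_eq!(ancestry("dolce", &p), ["dolce"]); // root: inherits_from: None
}
```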

§4.3 also maps each plan deliverable to the hydrators it uses:
- D-UB-1 names the producer-side shape.
- D-UB-2 (SmbBridge) declares no hydrator dep until OGIT/NTO/SMB/
  ships.
- D-UB-3 (lance_cache::ontology_cache_schema) persists each
  hydrator's ContextBundle output as the Lance column rows.
- D-UB-4..6 (per-consumer constructors) take an already-hydrated
  Arc<OntologyRegistry>; deployment chooses the menu:
  · woa-rs: dolce + provo + qudt + schemaorg + fibo_fnd +
    skr03/skr04 + future OGIT/NTO/WorkOrder.
  · smb-office-rs: same minus WorkOrder, plus skr03_bau +
    zugferd + zugferd_rules.
  · MedCare-rs: dolce + owltime + provo + qudt + skos +
    future OGIT/NTO/Healthcare.

No deliverable IDs renumbered; this is a clarification of §4's
referenced surface against the now-shipped types. Other plan
sections unchanged.