# Contract Semantics Toolkit

Ontology-grounded contract authoring and validation for docpact. Demonstrates
document analysis, contract recommendation, alias resolution (AGROVOC + GeoNames),
contract materialization, and SHACL validation — all with real documents and contracts.

## Design

The `contract-semantics` toolkit sits upstream of docpact: it helps **author** contracts,
not execute them. At authoring time, the contract JSON is the sole interface between the two packages.

**Authoring workflow**: analyze raw documents → recommend a contract skeleton → enrich with
ontology aliases (AGROVOC for crops/metrics, GeoNames for regions) → diff alias
coverage → materialize the final contract → validate output records against SHACL shapes.

**Runtime workflow**: `build_semantic_context()` resolves all concept URIs in an annotated
contract and produces a `SemanticContext` — a plain dataclass that the docpact pipeline
consumes for runtime alias enrichment, pre-flight header checks, and post-extraction
value validation. docpact never imports from `contract_semantics`; at runtime, the
`SemanticContext` is the sole interface.
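
As a rough illustration of that boundary, the sketch below models a context object as a plain dataclass and uses it for a pre-flight header check. The class `SketchSemanticContext`, its single field, and `preflight_headers` are hypothetical stand-ins; the real `SemanticContext` fields are not shown in this notebook and may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class SketchSemanticContext:
    """Hypothetical stand-in for SemanticContext (real fields may differ)."""
    # column name -> lower-cased aliases resolved from the ontologies
    resolved_aliases: Dict[str, Set[str]] = field(default_factory=dict)


def preflight_headers(ctx: SketchSemanticContext, headers: List[str]) -> Dict[str, Optional[str]]:
    """Map each document header to the contract column whose aliases contain it."""
    result: Dict[str, Optional[str]] = {}
    for header in headers:
        key = header.strip().lower()
        result[header] = next(
            (col for col, aliases in ctx.resolved_aliases.items() if key in aliases),
            None,  # header is not covered by any column
        )
    return result


ctx = SketchSemanticContext(
    resolved_aliases={
        "Metric": {"area harvested", "crop yield", "yield", "collected"},
        "Crop": {"wheat", "spring wheat"},
    }
)
header_map = preflight_headers(ctx, ["Area harvested", "Spring wheat", "Moisture"])
print(header_map)
```

Because the object holds only plain data, a consumer can read it without importing any resolver code, which is the property the design relies on.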

This notebook walks through each step using the Russian agriculture contract
(`ru_ag_ministry.json`) and Australian shipping stem contract (`au_shipping_stem.json`)
as real-world examples. Ontology adapters use small in-memory fixtures for offline
reproducibility — in production, use `contract-semantics fetch-agrovoc` and
`contract-semantics fetch-geonames` to download full datasets.

In [None]:
import sys
import json
import shutil
from pathlib import Path

sys.path.insert(0, "tools/contract-semantics/src")

# Ensure dependencies are available (prefer uv when present, otherwise fall back to pip)
def _ensure_pkg(import_name, pip_name):
    try:
        __import__(import_name)
    except ImportError:
        import subprocess
        uv = shutil.which("uv")
        if uv:
            subprocess.check_call([uv, "pip", "install", pip_name])
        else:
            subprocess.check_call([sys.executable, "-m", "pip", "install", pip_name])

_ensure_pkg("rdflib", "rdflib>=7.0,<8")
_ensure_pkg("docx", "python-docx")
_ensure_pkg("pyshacl", "pyshacl>=0.26")

import rdflib
print(f"rdflib {rdflib.__version__}")
print("Setup complete.")

In [None]:
# Build small AGROVOC + GeoNames fixtures for offline testing.
# In production: `contract-semantics fetch-agrovoc` / `contract-semantics fetch-geonames`

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import SKOS

AGROVOC = Namespace("http://aims.fao.org/aos/agrovoc/")
g = Graph()

# All crops referenced in ru_ag_ministry.json
for uri_frag, en, ru, alts in [
    ("c_8373",  "wheat",      "\u043f\u0448\u0435\u043d\u0438\u0446\u0430",      ["common wheat"]),
    ("c_631",   "barley",     "\u044f\u0447\u043c\u0435\u043d\u044c",        ["barleycorn"]),
    ("c_12332", "maize",      "\u043a\u0443\u043a\u0443\u0440\u0443\u0437\u0430",      ["corn"]),
    ("c_6599",  "rice",       "\u0440\u0438\u0441",           []),
    ("c_1094",  "buckwheat",  "\u0433\u0440\u0435\u0447\u0438\u0445\u0430",       []),
    ("c_7440",  "sunflowers", "\u043f\u043e\u0434\u0441\u043e\u043b\u043d\u0435\u0447\u043d\u0438\u043a",  ["sunflower"]),
    ("c_14477", "soyabeans",  "\u0441\u043e\u0435\u0432\u044b\u0435 \u0431\u043e\u0431\u044b",   ["soya", "soybean"]),
    ("c_6476",  "rapeseed",   "\u0440\u0430\u043f\u0441",          ["canola", "rape"]),
]:
    c = AGROVOC[uri_frag]
    g.add((c, SKOS.prefLabel, Literal(en, lang="en")))
    g.add((c, SKOS.prefLabel, Literal(ru, lang="ru")))
    for alt in alts:
        g.add((c, SKOS.altLabel, Literal(alt, lang="en")))

# Wheat narrower: durum wheat
durum = AGROVOC["c_33556"]
g.add((AGROVOC["c_8373"], SKOS.narrower, durum))
g.add((durum, SKOS.prefLabel, Literal("durum wheat", lang="en")))
g.add((durum, SKOS.prefLabel, Literal("\u0442\u0432\u0451\u0440\u0434\u0430\u044f \u043f\u0448\u0435\u043d\u0438\u0446\u0430", lang="ru")))

# Harvest metrics
for uri_frag, en, ru, alts in [
    ("c_330918", "area harvested", "\u0443\u0431\u0440\u0430\u043d\u043d\u0430\u044f \u043f\u043b\u043e\u0449\u0430\u0434\u044c", []),
    ("c_5765",   "crop yield",     "\u0443\u0440\u043e\u0436\u0430\u0439\u043d\u043e\u0441\u0442\u044c",     ["yield"]),
    ("c_3486",   "harvesting",     "\u0443\u0431\u043e\u0440\u043a\u0430 \u0443\u0440\u043e\u0436\u0430\u044f",   ["collected"]),
]:
    c = AGROVOC[uri_frag]
    g.add((c, SKOS.prefLabel, Literal(en, lang="en")))
    g.add((c, SKOS.prefLabel, Literal(ru, lang="ru")))
    for alt in alts:
        g.add((c, SKOS.altLabel, Literal(alt, lang="en")))

agrovoc_graph = g
print(f"AGROVOC fixture: {len(g)} triples")

# GeoNames mock: 3 Russian ADM1 regions
geonames_features = {
    524894: {"geoname_id": 524894, "name": "Moscow Oblast", "asciiname": "Moscow Oblast",
             "alternatenames": "Moskovskaya Oblast", "lat": 55.75, "lng": 37.62,
             "feature_class": "A", "feature_code": "ADM1", "country_code": "RU",
             "admin1_code": "48", "population": 7500000},
    472039: {"geoname_id": 472039, "name": "Voronezh Oblast", "asciiname": "Voronezh Oblast",
             "alternatenames": "Voronezhskaya Oblast", "lat": 51.67, "lng": 39.21,
             "feature_class": "A", "feature_code": "ADM1", "country_code": "RU",
             "admin1_code": "86", "population": 2300000},
    511180: {"geoname_id": 511180, "name": "Penza Oblast", "asciiname": "Penza Oblast",
             "alternatenames": "Penzenskaya Oblast", "lat": 53.12, "lng": 44.10,
             "feature_class": "A", "feature_code": "ADM1", "country_code": "RU",
             "admin1_code": "57", "population": 1300000},
}
geonames_alt_names = {
    524894: [("ru", "\u041c\u043e\u0441\u043a\u043e\u0432\u0441\u043a\u0430\u044f \u043e\u0431\u043b\u0430\u0441\u0442\u044c"), ("en", "Moscow Region")],
    472039: [("ru", "\u0412\u043e\u0440\u043e\u043d\u0435\u0436\u0441\u043a\u0430\u044f \u043e\u0431\u043b\u0430\u0441\u0442\u044c")],
    511180: [("ru", "\u041f\u0435\u043d\u0437\u0435\u043d\u0441\u043a\u0430\u044f \u043e\u0431\u043b\u0430\u0441\u0442\u044c")],
}
print(f"GeoNames fixture: {len(geonames_features)} features")

In [None]:
from contract_semantics.agrovoc import AgrovocAdapter
from contract_semantics.geonames import GeoNamesAdapter
from contract_semantics.models import ConceptRef, ResolveConfig
from contract_semantics.resolve import resolve_column
from contract_semantics.diff import diff_aliases, diff_all
from contract_semantics.materialize import materialize_contract
from contract_semantics.context import build_semantic_context
from contract_semantics.validate import generate_shapes, validate_records, records_to_graph

agrovoc = AgrovocAdapter(graph=agrovoc_graph)
geonames = GeoNamesAdapter(features=geonames_features, alternate_names=geonames_alt_names)

print("Contract-semantics modules loaded.")
print(f"  AgrovocAdapter: offline ({len(agrovoc_graph)} triples)")
print(f"  GeoNamesAdapter: offline ({len(geonames_features)} regions)")

---
## 1. Ontology Resolution: AGROVOC

AGROVOC is FAO's multilingual agricultural thesaurus. The contract-semantics toolkit
resolves `concept_uris` in contracts to multilingual labels (`skos:prefLabel`,
`skos:altLabel`) with optional `skos:narrower` traversal.

The Russian agriculture contract uses AGROVOC for two column types:
- **Metric** (harvest output): `area harvested`, `crop yield`, `harvesting`
- **Crop** (planting output): wheat, barley, maize, rice, buckwheat, sunflowers, soyabeans, rapeseed
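
The label lookup and narrower traversal described above can be sketched without an RDF library. This is a minimal illustration over plain dictionaries, not the `AgrovocAdapter` implementation; `LABELS`, `NARROWER`, and `resolve_labels` are hypothetical names.

```python
# Toy SKOS data as plain Python structures (a stand-in for an RDF graph).
LABELS = {
    "c_8373": [
        ("en", "prefLabel", "wheat"),
        ("ru", "prefLabel", "\u043f\u0448\u0435\u043d\u0438\u0446\u0430"),
        ("en", "altLabel", "common wheat"),
    ],
    "c_33556": [("en", "prefLabel", "durum wheat")],
}
NARROWER = {"c_8373": ["c_33556"]}  # broader concept -> narrower concepts


def resolve_labels(concept, languages, label_types, narrower_depth=0):
    """Collect (language, label_type, label) for a concept and, optionally, its narrower concepts."""
    found = [
        entry for entry in LABELS.get(concept, [])
        if entry[0] in languages and entry[1] in label_types
    ]
    if narrower_depth > 0:
        for child in NARROWER.get(concept, []):
            found.extend(resolve_labels(child, languages, label_types, narrower_depth - 1))
    return found


shallow = resolve_labels("c_8373", ["en", "ru"], ["prefLabel", "altLabel"])
deep = resolve_labels("c_8373", ["en", "ru"], ["prefLabel", "altLabel"], narrower_depth=1)
print(len(shallow), len(deep))
```

Bounding the recursion by depth keeps a broad concept from pulling in its entire subtree of labels.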

In [None]:
# Resolve wheat — multilingual labels from AGROVOC
wheat_aliases = agrovoc.resolve_concept(
    "http://aims.fao.org/aos/agrovoc/c_8373",
    languages=["en", "ru"],
    label_types=["prefLabel", "altLabel"],
)

print("Wheat (c_8373) — resolved aliases:")
for a in wheat_aliases:
    print(f"  [{a.language}] {a.label_type}: {a.alias}")

# With narrower traversal — discovers durum wheat
wheat_deep = agrovoc.resolve_concept(
    "http://aims.fao.org/aos/agrovoc/c_8373",
    languages=["en", "ru"],
    label_types=["prefLabel", "altLabel"],
    include_narrower=True,
    narrower_depth=1,
)

print(f"\nWith narrower: {len(wheat_deep)} aliases (vs {len(wheat_aliases)} without)")
known = {a.alias for a in wheat_aliases}
for a in wheat_deep:
    if a.alias not in known:
        print(f"  [narrower] [{a.language}] {a.label_type}: {a.alias}")

In [None]:
# Resolve the Crop column from the planting contract.
# prefix_patterns: ["spring {label}", "{label}"] multiply each resolved label,
# generating "spring wheat", "spring barley" etc. alongside bare "wheat", "barley".

crop_refs = [
    ConceptRef(uri="http://aims.fao.org/aos/agrovoc/c_8373", label="wheat"),
    ConceptRef(uri="http://aims.fao.org/aos/agrovoc/c_631", label="barley"),
    ConceptRef(uri="http://aims.fao.org/aos/agrovoc/c_12332", label="maize"),
]

crop_config = ResolveConfig(
    languages=["en", "ru"],
    label_types=["prefLabel", "altLabel"],
    prefix_patterns=["spring {label}", "{label}"],
)

crop_manual = ["spring wheat", "spring barley", "corn", "grains and grasses"]

crop_result = resolve_column("Crop", crop_refs, crop_manual, crop_config, agrovoc)

print(f"Crop column resolution:")
print(f"  Resolved: {len(crop_result.resolved_aliases)} aliases")
print(f"  Manual:   {len(crop_result.manual_aliases)} aliases")
print(f"  Matched:  {crop_result.matched}")
print(f"  Manual-only: {crop_result.manual_only}")
print(f"  Coverage: {crop_result.coverage:.0%}")

print(f"\nResolved aliases (first 15):")
for a in crop_result.resolved_aliases[:15]:
    pattern = f" (pattern: {a.pattern})" if a.pattern else ""
    print(f"  {a.alias}{pattern}")

---
## 2. Ontology Resolution: GeoNames

GeoNames provides geographic feature metadata. The Russian agriculture contract
uses GeoNames for the **Region** column — 78 Russian ADM1 administrative divisions.
Resolution produces multilingual alternate names (English + Russian), and enrichment
adds lat/lng coordinates for downstream GIS joins.
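
A minimal sketch of the two steps, assuming features and alternate names are held in dictionaries like the fixtures in this notebook. The helper names and the extra German entry are hypothetical and only illustrate language filtering; this is not the `GeoNamesAdapter` code.

```python
import re

# Hypothetical in-memory data in the same shape as the notebook fixtures.
FEATURES = {472039: {"name": "Voronezh Oblast", "lat": 51.67, "lng": 39.21}}
ALT_NAMES = {
    472039: [
        ("ru", "\u0412\u043e\u0440\u043e\u043d\u0435\u0436\u0441\u043a\u0430\u044f \u043e\u0431\u043b\u0430\u0441\u0442\u044c"),
        ("en", "Voronezh Region"),
        ("de", "Oblast Woronesch"),
    ]
}


def geoname_id_from_uri(uri):
    """Extract the numeric id from a URI such as https://sws.geonames.org/472039/."""
    match = re.search(r"geonames\.org/(\d+)", uri)
    if match is None:
        raise ValueError(f"not a GeoNames URI: {uri!r}")
    return int(match.group(1))


def alternate_names(geoname_id, languages):
    """Return the primary name plus alternate names in the requested languages."""
    names = [FEATURES[geoname_id]["name"]]
    names += [name for lang, name in ALT_NAMES.get(geoname_id, []) if lang in languages]
    return names


gid = geoname_id_from_uri("https://sws.geonames.org/472039/")
names = alternate_names(gid, ["en", "ru"])
# Enrichment: coordinates that a downstream GIS join could use.
coords = {"lat": FEATURES[gid]["lat"], "lng": FEATURES[gid]["lng"]}
print(gid, names, coords)
```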

In [None]:
# Resolve alternate names for Moscow Oblast
moscow = geonames.resolve_geoname(524894, languages=["en", "ru"])

print("Moscow Oblast (524894) — alternate names:")
for a in moscow:
    print(f"  [{a.language}] {a.alias}")

# Resolve via concept URI (as used in contracts)
voronezh = geonames.resolve_concept(
    "https://sws.geonames.org/472039/",
    languages=["en", "ru"],
)
print(f"\nVoronezh Oblast (via URI): {[a.alias for a in voronezh]}")

# Geographic enrichment — lat/lng/admin codes for GIS
print("\nGeo enrichment:")
for gid in [524894, 472039, 511180]:
    e = geonames.enrich_geoname(gid)
    print(f"  {e.name}: lat={e.lat}, lng={e.lng}, "
          f"admin1={e.admin1_code}, pop={e.population:,}")

---
## 3. Alias Diff

`diff_aliases()` compares ontology-resolved aliases against the manual aliases in the
contract. This is the key feedback loop for contract authoring: it shows which manual
aliases the ontology can cover automatically, which manual aliases have no ontology
equivalent, and which ontology labels are available but not yet in the contract.
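
The three buckets can be sketched with set operations. This is an illustration only: `diff_sketch` is a hypothetical function, the case-insensitive comparison is an assumption, and "gross harvest" is an invented manual alias used to show the manual-only case.

```python
def diff_sketch(manual, resolved):
    """Split aliases into matched / manual-only / resolved-only (case-insensitive, assumed)."""
    manual_keys = {m.casefold(): m for m in manual}
    resolved_keys = {r.casefold(): r for r in resolved}
    matched = sorted(manual_keys[k] for k in manual_keys.keys() & resolved_keys.keys())
    manual_only = sorted(manual_keys[k] for k in manual_keys.keys() - resolved_keys.keys())
    resolved_only = sorted(resolved_keys[k] for k in resolved_keys.keys() - manual_keys.keys())
    # Coverage: share of manual aliases that the ontology reproduces.
    coverage = len(matched) / len(manual_keys) if manual_keys else 1.0
    return {
        "matched": matched,
        "manual_only": manual_only,
        "resolved_only": resolved_only,
        "coverage": coverage,
    }


manual = ["Area harvested", "collected", "Yield", "gross harvest"]
resolved = ["area harvested", "crop yield", "yield", "harvesting", "collected"]
diff = diff_sketch(manual, resolved)
print(diff)
```

Manual-only entries are candidates to keep by hand; resolved-only entries are labels the contract could adopt.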

In [None]:
# Diff for the Metric column (harvest output)
metric_refs = [
    ConceptRef(uri="http://aims.fao.org/aos/agrovoc/c_330918", label="area harvested"),
    ConceptRef(uri="http://aims.fao.org/aos/agrovoc/c_5765", label="crop yield"),
    ConceptRef(uri="http://aims.fao.org/aos/agrovoc/c_3486", label="harvesting"),
]
metric_config = ResolveConfig(languages=["en", "ru"], label_types=["prefLabel", "altLabel"])
metric_manual = ["Area harvested", "collected", "Yield"]

metric_result = resolve_column("Metric", metric_refs, metric_manual, metric_config, agrovoc)

print(diff_aliases(metric_result))
print("\n" + "="*60 + "\n")

# Diff for the Crop column (planting output) — prefix patterns generate "spring wheat" etc.
crop_refs = [
    ConceptRef(uri="http://aims.fao.org/aos/agrovoc/c_8373", label="wheat"),
    ConceptRef(uri="http://aims.fao.org/aos/agrovoc/c_631", label="barley"),
    ConceptRef(uri="http://aims.fao.org/aos/agrovoc/c_12332", label="maize"),
]
crop_config = ResolveConfig(
    languages=["en", "ru"],
    label_types=["prefLabel", "altLabel"],
    prefix_patterns=["spring {label}", "{label}"],
)
crop_manual = ["spring wheat", "spring barley", "corn", "grains and grasses"]
crop_result = resolve_column("Crop", crop_refs, crop_manual, crop_config, agrovoc)

# Multi-column diff — Metric + Crop together
print(diff_all([metric_result, crop_result]))

---
## 4. Contract Materialization

`materialize_contract()` reads an annotated contract (with `concept_uris` and `resolve`
blocks), resolves all concept URIs via the appropriate adapters, and merges the resolved
aliases into each column's alias list according to the chosen merge strategy (`union`,
`resolved_only`, or `manual_priority`). It then strips the annotation fields and produces a
clean contract consumable by docpact's `load_contract()`.

For GeoNames columns with `enrich_fields`, it also produces a **geo sidecar** JSON
mapping region names to coordinates — enabling post-extraction GIS joins.
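
For a single column, the `union` merge and the annotation stripping might look like the sketch below. It covers only `union`, since the other two strategies are not specified here. `materialize_column`, the case-insensitive de-duplication, and the shape of the `concept_uris` and `resolve` entries are assumptions for illustration.

```python
import copy

ANNOTATION_KEYS = ("concept_uris", "resolve")  # fields removed from the clean contract


def materialize_column(column, resolved_aliases):
    """Return a copy of the column with aliases merged (union) and annotation keys removed."""
    out = copy.deepcopy(column)
    merged = list(out.get("aliases", []))
    seen = {alias.casefold() for alias in merged}
    for alias in resolved_aliases:
        if alias.casefold() not in seen:  # keep manual aliases first, append new ones
            merged.append(alias)
            seen.add(alias.casefold())
    out["aliases"] = merged
    for key in ANNOTATION_KEYS:
        out.pop(key, None)
    return out


annotated = {
    "name": "Crop",
    "type": "string",
    "aliases": ["spring wheat", "corn"],
    # Assumed annotation layout; the real contract schema may differ.
    "concept_uris": [{"uri": "http://aims.fao.org/aos/agrovoc/c_8373", "label": "wheat"}],
    "resolve": {"languages": ["en", "ru"]},
}
clean_col = materialize_column(annotated, ["wheat", "Corn", "common wheat"])
print(clean_col)
```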

In [None]:
# Materialize the Russian contract with union merge strategy
materialized = materialize_contract(
    "contracts/ru_ag_ministry.json",
    agrovoc=agrovoc,
    geonames=geonames,
    merge_strategy="union",
)

# Compare alias counts before/after
with open("contracts/ru_ag_ministry.json") as f:
    original = json.load(f)

print("Alias count comparison (harvest output):")
for col_name in ["Metric", "Region"]:
    orig_col = next(c for c in original["outputs"]["harvest"]["schema"]["columns"] if c["name"] == col_name)
    mat_col = next(c for c in materialized["outputs"]["harvest"]["schema"]["columns"] if c["name"] == col_name)
    orig_n = len(orig_col.get("aliases", []))
    mat_n = len(mat_col.get("aliases", []))
    new = sorted(set(mat_col.get("aliases", [])) - set(orig_col.get("aliases", [])))
    print(f"  {col_name}: {orig_n} -> {mat_n} aliases (+{len(new)})")
    if new:
        print(f"    new: {new[:8]}{'...' if len(new) > 8 else ''}")

print("\nAlias count comparison (planting output):")
for col_name in ["Crop"]:
    orig_col = next(c for c in original["outputs"]["planting"]["schema"]["columns"] if c["name"] == col_name)
    mat_col = next(c for c in materialized["outputs"]["planting"]["schema"]["columns"] if c["name"] == col_name)
    orig_n = len(orig_col.get("aliases", []))
    mat_n = len(mat_col.get("aliases", []))
    new = sorted(set(mat_col.get("aliases", [])) - set(orig_col.get("aliases", [])))
    print(f"  {col_name}: {orig_n} -> {mat_n} aliases (+{len(new)})")
    if new:
        print(f"    new: {new[:10]}{'...' if len(new) > 10 else ''}")

In [None]:
# Inspect materialized contract structure — concept_uris and resolve blocks are stripped
print("Materialized contract:")
print(f"  Provider: {materialized['provider']}")
print(f"  Outputs: {list(materialized['outputs'].keys())}")

for name, spec in materialized["outputs"].items():
    cols = spec["schema"]["columns"]
    print(f"\n  {name} ({len(cols)} columns):")
    for c in cols:
        aliases = c.get("aliases", [])
        src = c.get("source", "")
        if aliases:
            print(f"    {c['name']} ({c['type']}): {len(aliases)} aliases")
        elif src:
            print(f"    {c['name']} ({c['type']}): source={src}")

# Verify annotation fields are stripped
has_concept_uris = any(
    "concept_uris" in col
    for spec in materialized["outputs"].values()
    for col in spec["schema"]["columns"]
)
has_resolve = any(
    "resolve" in col
    for spec in materialized["outputs"].values()
    for col in spec["schema"]["columns"]
)
print(f"\nconcept_uris stripped: {not has_concept_uris}")
print(f"resolve blocks stripped: {not has_resolve}")

---
## 5. Document Structural Analysis

`profile_document()` examines a raw PDF or DOCX and extracts structural signals:
column names, data types, cardinalities, table layout (flat/transposed/pivoted),
section labels, temporal patterns, and unit annotations. These signals feed into
the recommendation engine to produce a contract skeleton.
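
To make the per-column signals concrete, here is a small sketch of type inference, cardinality counting, and year detection for one column. `profile_column` and its heuristics are hypothetical; the dictionary keys mirror the attribute names printed later in this section, but the logic inside `profile_document()` is not shown in this notebook.

```python
import re

YEAR_RE = re.compile(r"\b(19|20)\d{2}\b")


def _is_number(text):
    """Accept plain, space-grouped, and comma-decimal numbers such as '1 234,5'."""
    try:
        float(text.replace(" ", "").replace(",", "."))
        return True
    except ValueError:
        return False


def profile_column(header, values):
    """Summarise one column: inferred type, cardinality, and whether the header names a year."""
    cells = [v.strip() for v in values if v.strip()]  # ignore empty cells
    all_numeric = bool(cells) and all(_is_number(c) for c in cells)
    return {
        "header_text": header,
        "inferred_type": "float" if all_numeric else "string",
        "unique_count": len(set(cells)),
        "total_count": len(cells),
        "year_detected": bool(YEAR_RE.search(header)),
    }


year_col = profile_column("2025", ["1 234,5", "987", ""])
region_col = profile_column("Region", ["Moscow Oblast", "Penza Oblast", "Moscow Oblast"])
print(year_col)
print(region_col)
```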

This section profiles a real Russian agricultural DOCX report — the same documents
that `pipeline.ipynb` processes through the extraction pipeline.

In [None]:
from contract_semantics.analyze import profile_document, merge_profiles, format_analysis_report

# Profile one Russian weekly grain report
docx_files = sorted(Path("inputs/docx/input").glob("*.docx"))
print(f"Found {len(docx_files)} DOCX reports:")
for p in docx_files:
    print(f"  {p.name}")

docx_path = docx_files[0]
print(f"\nProfiling: {docx_path.name}")
ru_profile = profile_document(str(docx_path))

print(format_analysis_report(ru_profile))

In [None]:
# Detailed look at the first 3 table profiles
for tp in ru_profile.tables[:3]:
    print(f"\n{'='*70}")
    print(f"  Table {tp.table_index}: {tp.title or '(no title)'}")
    print(f"  Layout: {tp.layout}, {tp.row_count} rows x {tp.col_count} cols")
    if tp.section_labels:
        print(f"  Section labels: {tp.section_labels[:5]}")
    print(f"  Columns:")
    for cp in tp.column_profiles:
        unit = f" [{cp.unit_annotations[0]}]" if cp.unit_annotations else ""
        year = " (YEAR)" if cp.year_detected else ""
        print(f"    {cp.header_text}: {cp.inferred_type}{unit}{year} "
              f"({cp.unique_count} unique / {cp.total_count} total)")
        if cp.sample_values:
            print(f"      samples: {cp.sample_values[:5]}")

---
## 6. Contract Recommendation

`recommend_contract()` takes a `DocumentProfile` (or `MultiDocumentProfile`) and
produces a draft contract JSON with inline `_recommendation` and `_detected_*`
guidance fields. The draft includes:

- Category keywords extracted from table header tokens
- Output schemas with columns, inferred types, and alias candidates
- Year template aliases (`{YYYY}`) for year columns
- Section labels as dimension column aliases
- `report_date` configuration from temporal patterns

`strip_recommendations()` removes all annotation fields, producing a clean
contract ready for `load_contract()`.
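
One way to picture the stripping step is a recursive filter that drops every key starting with an underscore. This assumes all annotation fields follow that naming convention, as `_recommendation` and `_detected_*` do; `strip_underscore_keys` is a hypothetical helper, not the library function.

```python
def strip_underscore_keys(obj):
    """Return a copy of a JSON-like structure without keys that start with '_'."""
    if isinstance(obj, dict):
        return {
            key: strip_underscore_keys(value)
            for key, value in obj.items()
            if not key.startswith("_")
        }
    if isinstance(obj, list):
        return [strip_underscore_keys(item) for item in obj]
    return obj  # scalars are returned unchanged


draft_example = {
    "provider": "example",
    "_analyzer_version": "0.0",
    "outputs": {
        "harvest": {
            "schema": {
                "columns": [
                    {"name": "Region", "type": "string", "_recommendation": "review aliases"},
                ]
            }
        }
    },
}
clean_example = strip_underscore_keys(draft_example)
print(clean_example)
```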

In [None]:
from contract_semantics.recommend import recommend_contract, strip_recommendations, compare_contract

# Generate draft contract from the DOCX profile
draft = recommend_contract(ru_profile, provider_name="ru_ag_ministry")

print("Draft contract:")
print(f"  _analyzer_version: {draft.get('_analyzer_version')}")
print(f"  _source_documents: {draft.get('_source_documents')}")
print(f"  Categories: {list(draft.get('categories', {}).keys())}")
print(f"  Outputs: {list(draft.get('outputs', {}).keys())}")

if draft.get("report_date"):
    print(f"  report_date: {draft['report_date']}")

# Show first output schema with inline recommendations
first_out_name = next(iter(draft.get("outputs", {})))
first_out = draft["outputs"][first_out_name]
print(f"\n--- {first_out_name} schema ---")
for col in first_out["schema"]["columns"][:6]:
    print(f"  {col['name']} ({col['type']})")
    if "_recommendation" in col:
        print(f"    rec: {col['_recommendation']}")
    if col.get("aliases"):
        print(f"    aliases: {col['aliases'][:4]}{'...' if len(col['aliases']) > 4 else ''}")

In [None]:
# Compare profile against the existing hand-authored contract
report = compare_contract(ru_profile, "contracts/ru_ag_ministry.json")
print(report)

In [None]:
# Strip recommendation fields -> clean contract
clean = strip_recommendations(draft)

def count_underscore_keys(obj, count=0):
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k.startswith("_"):
                count += 1
            count = count_underscore_keys(v, count)
    elif isinstance(obj, list):
        for item in obj:
            count = count_underscore_keys(item, count)
    return count

print(f"Draft: {count_underscore_keys(draft)} annotation fields")
print(f"Clean: {count_underscore_keys(clean)} annotation fields")

print(f"\nClean contract JSON (excerpt):")
print(json.dumps(clean, indent=2, ensure_ascii=False)[:600] + "\n...")

---
## 7. Multi-Document Analysis

The multi-document workflow profiles several documents independently, then merges
them into a `MultiDocumentProfile` that aligns structurally similar tables across
documents. This is the path for multi-provider contracts like `au_shipping_stem.json`
where 6 different PDF formats all need one canonical schema.

`merge_profiles()` groups tables by structural similarity (column-count ratio +
Jaccard token overlap) and aligns columns, collecting all header variants for
each structural position.
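
The similarity test can be sketched as two ratios computed from table headers. The tokenisation, the thresholds (`min_ratio=0.7`, `min_jaccard=0.3`), and the example headers are assumptions chosen for illustration, not the values `merge_profiles()` uses.

```python
import re


def header_tokens(headers):
    """Lower-cased word tokens across all headers of a table."""
    tokens = set()
    for header in headers:
        tokens.update(re.findall(r"\w+", header.lower()))
    return tokens


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b| of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def column_ratio(headers_a, headers_b):
    """Smaller column count divided by the larger one (1.0 means equal width)."""
    longest = max(len(headers_a), len(headers_b))
    return min(len(headers_a), len(headers_b)) / longest if longest else 1.0


def similar(headers_a, headers_b, min_ratio=0.7, min_jaccard=0.3):
    """Two tables are grouped when both measures clear their (assumed) thresholds."""
    overlap = jaccard(header_tokens(headers_a), header_tokens(headers_b))
    return column_ratio(headers_a, headers_b) >= min_ratio and overlap >= min_jaccard


t1 = ["Vessel Name", "Load Port", "Commodity", "Tonnes", "ETA"]
t2 = ["Vessel", "Port", "Commodity", "Quantity (t)", "ETA", "Status"]
t3 = ["Region", "Area harvested", "Yield"]
overlap_12 = jaccard(header_tokens(t1), header_tokens(t2))
print(overlap_12, similar(t1, t2), similar(t1, t3))
```

Combining both measures keeps a narrow table from being grouped with a wide one just because a few header words coincide.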

In [None]:
# Profile 3 Australian shipping stem PDFs (subset for speed)
shipping_pdfs = sorted(Path("inputs").glob("*.pdf"))
shipping_pdfs = [p for p in shipping_pdfs if "shipping" in p.name.lower()
                 or "loading" in p.name.lower()
                 or "bunge" in p.name.lower()
                 or "cbh" in p.name.lower().replace(" ", "")]

print(f"Profiling {len(shipping_pdfs)} shipping PDFs:")
pdf_profiles = []
for p in shipping_pdfs[:3]:  # first 3 for speed
    print(f"  {p.name}... ", end="", flush=True)
    prof = profile_document(str(p))
    pdf_profiles.append(prof)
    print(f"{len(prof.tables)} tables")

# Merge into consolidated multi-document profile
multi = merge_profiles(pdf_profiles)
print(f"\n{format_analysis_report(multi)}")

In [None]:
# Inspect aligned columns — header variants across providers
for i, tg in enumerate(multi.table_groups):
    print(f"\n{'='*60}")
    print(f"  Group {i}: {len(tg.tables)} tables, layout={tg.layout}")
    print(f"  Common tokens: {sorted(tg.common_tokens)[:10]}")
    if tg.all_section_labels:
        print(f"  Section labels: {tg.all_section_labels[:5]}")
    print(f"  Aligned columns:")
    for ac in tg.aligned_columns:
        variants = sorted(set(ac.all_headers))
        print(f"    {ac.canonical_header} ({ac.inferred_type}): {variants}")

In [None]:
# Recommend a contract from the multi-provider profile
shipping_draft = recommend_contract(multi, provider_name="au_shipping_stem")

print("Multi-provider draft contract:")
print(f"  Categories: {list(shipping_draft.get('categories', {}).keys())}")
print(f"  Outputs: {list(shipping_draft.get('outputs', {}).keys())}")

# Compare against the hand-authored shipping contract
shipping_report = compare_contract(multi, "contracts/au_shipping_stem.json")
print(f"\n{shipping_report}")

---
## 8. SHACL Validation

`generate_shapes()` auto-generates SHACL NodeShapes from a contract's output
schemas — `sh:datatype` for typed columns, `sh:in` for string columns with aliases.
`validate_records()` checks tabular records against these shapes.

This lets you catch invalid values (wrong types, unknown categories) before they
propagate downstream.
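
The mapping from schema columns to shape constraints can be sketched by emitting Turtle text directly. The `ex:` namespace, the column type names in `XSD_TYPES`, and both helper functions are assumptions; the shapes produced by `generate_shapes()` may be structured differently.

```python
import json

PREFIXES = (
    "@prefix sh: <http://www.w3.org/ns/shacl#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/contract#> .\n"  # hypothetical namespace
)
# Assumed contract type names mapped to XSD datatypes.
XSD_TYPES = {"int": "xsd:integer", "float": "xsd:double", "date": "xsd:date"}


def property_shape(column):
    """Emit one sh:property block: sh:datatype for typed columns, sh:in for aliased strings."""
    lines = [f"    sh:path ex:{column['name']} ;"]
    if column["type"] in XSD_TYPES:
        lines.append(f"    sh:datatype {XSD_TYPES[column['type']]} ;")
    elif column.get("aliases"):
        values = " ".join(json.dumps(alias) for alias in column["aliases"])
        lines.append(f"    sh:in ( {values} ) ;")
    return "  sh:property [\n" + "\n".join(lines) + "\n  ] ;"


def node_shape(output_name, columns):
    """Emit a NodeShape for one contract output."""
    body = "\n".join(property_shape(column) for column in columns)
    return (
        f"ex:{output_name}Shape a sh:NodeShape ;\n"
        f"  sh:targetClass ex:{output_name} ;\n"
        f"{body}\n."
    )


columns = [
    {"name": "tons", "type": "float"},
    {"name": "commodity", "type": "string", "aliases": ["Wheat", "Barley"]},
    {"name": "vessel_name", "type": "string"},
]
turtle = PREFIXES + "\n" + node_shape("vessels", columns)
print(turtle)
```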

In [None]:
# Generate SHACL shapes from the shipping contract
shapes = generate_shapes("contracts/au_shipping_stem.json")

print(f"Generated shapes for: {list(shapes.keys())}")
for name, path in shapes.items():
    content = Path(path).read_text()
    print(f"\n--- {name} ({path}) ---")
    print(content[:500])
    if len(content) > 500:
        print("...")

In [None]:
# Validate sample records — good and bad
try:
    shapes_path = shapes["vessels"]

    with open("contracts/au_shipping_stem.json") as f:
        au_contract = json.load(f)
    schema_cols = au_contract["outputs"]["vessels"]["schema"]["columns"]

    # Valid records
    good = [
        {"load_port": "GERALDTON", "vessel_name": "MV STAR", "commodity": "Wheat",
         "tons": 50000, "eta": "2025-10-15", "status": "Loading"},
    ]
    ok, report, _ = validate_records(good, shapes_path, schema_columns=schema_cols)
    print(f"Valid records: conforms={ok}")

    # Invalid records — wrong type
    bad = [
        {"load_port": "GERALDTON", "vessel_name": 12345, "tons": "not a number"},
    ]
    ok2, report2, _ = validate_records(bad, shapes_path, schema_columns=schema_cols)
    print(f"Invalid records: conforms={ok2}")
    if not ok2:
        # Show first few lines of the report
        for line in report2.strip().splitlines()[:15]:
            print(f"  {line}")

except ImportError:
    print("SHACL validation requires pyshacl.")
    print("Install with: pip install contract-semantics[shacl]")
except Exception as e:
    print(f"Validation error: {e}")

---
## 9. End-to-End: Russian Agriculture

Complete workflow from raw document to validated extraction. Compares two contracts:
- **Draft** — auto-generated by the analyzer (what you get with zero domain knowledge)
- **Semantic** — hand-authored + ontology-resolved aliases (AGROVOC crops, GeoNames regions)

The difference demonstrates how semantic enforcement improves extraction quality:
more deterministic mappings, enrichment columns, and consistent naming.

In [None]:
# Step 1: Choose and profile a document

from contract_semantics.analyze import profile_document, format_analysis_report

docx_files = sorted(Path("inputs/docx/input").glob("*.docx"))
print("Available DOCX reports:")
for i, p in enumerate(docx_files):
    print(f"  [{i}] {p.name}")

chosen_docx = docx_files[0]
print(f"\nProfiling: {chosen_docx.name}")
e2e_profile = profile_document(str(chosen_docx))
print(format_analysis_report(e2e_profile))

In [None]:
# Step 2: Generate two contracts for comparison
#   (a) Draft — auto-generated from the document profile (zero domain knowledge)
#   (b) Semantic — hand-authored ru_ag_ministry.json enriched with ontology aliases

from contract_semantics.recommend import recommend_contract, strip_recommendations

# (a) Draft contract
draft = recommend_contract(e2e_profile, provider_name="ru_ag_draft")
draft_clean = strip_recommendations(draft)

# (b) Semantic contract — materialize with ontology aliases
semantic = materialize_contract(
    "contracts/ru_ag_ministry.json",
    agrovoc=agrovoc,
    geonames=geonames,
    merge_strategy="union",
)

# Save both to temp files for the pipeline
import tempfile
tmp_dir = Path(tempfile.mkdtemp())
draft_path = tmp_dir / "draft_contract.json"
semantic_path = tmp_dir / "semantic_contract.json"
draft_path.write_text(json.dumps(draft_clean, indent=2, ensure_ascii=False))
semantic_path.write_text(json.dumps(semantic, indent=2, ensure_ascii=False))

# Compare structure
draft_aliases = sum(
    len(c.get("aliases", []))
    for out in draft_clean.get("outputs", {}).values()
    for c in out.get("schema", {}).get("columns", [])
)
semantic_aliases = sum(
    len(c.get("aliases", []))
    for out in semantic.get("outputs", {}).values()
    for c in out.get("schema", {}).get("columns", [])
)
draft_enrichment = sum(
    1 for out in draft_clean.get("outputs", {}).values()
    for c in out.get("schema", {}).get("columns", [])
    if c.get("source")
)
semantic_enrichment = sum(
    1 for out in semantic.get("outputs", {}).values()
    for c in out.get("schema", {}).get("columns", [])
    if c.get("source")
)

print(f"{'='*60}")
print(f"  Contract comparison")
print(f"{'='*60}")
print(f"\n  {'':20s} {'Draft':>10s}  {'Semantic':>10s}")
print(f"  {'Categories':20s} {len(draft_clean.get('categories', {})):>10d}  {len(semantic.get('categories', {})):>10d}")
print(f"  {'Outputs':20s} {len(draft_clean.get('outputs', {})):>10d}  {len(semantic.get('outputs', {})):>10d}")
print(f"  {'Total aliases':20s} {draft_aliases:>10d}  {semantic_aliases:>10d}")
print(f"  {'Enrichment cols':20s} {draft_enrichment:>10d}  {semantic_enrichment:>10d}")

print(f"\nDraft saved to:    {draft_path}")
print(f"Semantic saved to: {semantic_path}")

In [None]:
# Step 3: Run extraction with the semantic contract (preview)

_ensure_pkg("nest_asyncio", "nest_asyncio")  # not installed by the setup cell
import nest_asyncio
nest_asyncio.apply()

from docpact.contracts import load_contract
from docpact.pipeline import process_document_async

semantic_cc = load_contract(str(semantic_path))
print(f"Semantic contract: {semantic_cc.provider}")
print(f"  Model: {semantic_cc.model}")
print(f"  Outputs: {list(semantic_cc.outputs.keys())}")

semantic_result = await process_document_async(str(chosen_docx), semantic_cc)

print(f"\nExtraction complete (report_date: {semantic_result.report_date})")
for name, df in semantic_result.dataframes.items():
    if df is not None and len(df) > 0:
        print(f"\n{'='*60}")
        print(f"  {name}: {len(df)} rows x {len(df.columns)} cols")
        print(f"  Columns: {list(df.columns)}")
        print(f"{'='*60}")
        display(df.head(8))
    else:
        print(f"\n  {name}: EMPTY")

In [None]:
# Step 4: Run extraction with the draft contract and compare

try:
    draft_cc = load_contract(str(draft_path))
    print(f"Draft contract: {draft_cc.provider}")
    print(f"  Outputs: {list(draft_cc.outputs.keys())}")

    draft_result = await process_document_async(str(chosen_docx), draft_cc)
    print(f"Draft extraction complete (report_date: {draft_result.report_date})")
except Exception as e:
    draft_result = None
    print(f"Draft extraction error: {e}")
    print("This is expected — the auto-generated draft may lack structure")
    print("needed for full extraction (e.g. year templates, enrichment rules).")

# Comparison
print(f"\n{'='*60}")
print(f"  Extraction comparison: Draft vs Semantic")
print(f"{'='*60}")

all_outputs = sorted(set(
    list(semantic_result.dataframes.keys()) +
    (list(draft_result.dataframes.keys()) if draft_result else [])
))

print(f"\n  {'Output':20s} {'Draft':>15s}  {'Semantic':>15s}")
print(f"  {'-'*20} {'-'*15}  {'-'*15}")

for name in all_outputs:
    df_s = semantic_result.dataframes.get(name)
    df_d = draft_result.dataframes.get(name) if draft_result else None

    s_shape = f"{len(df_s)}r x {len(df_s.columns)}c" if df_s is not None and len(df_s) > 0 else "empty"
    d_shape = f"{len(df_d)}r x {len(df_d.columns)}c" if df_d is not None and len(df_d) > 0 else "empty"

    print(f"  {name:20s} {d_shape:>15s}  {s_shape:>15s}")

    # Columns only in semantic output (enrichment columns)
    if df_s is not None and df_d is not None and len(df_s) > 0 and len(df_d) > 0:
        extra = sorted(set(df_s.columns) - set(df_d.columns))
        if extra:
            print(f"  {'':20s} Semantic adds: {extra}")

# Coerce to str so the ':>15s' format spec below also works for date objects
rd_s = str(semantic_result.report_date or "N/A")
rd_d = str((draft_result.report_date if draft_result else None) or "N/A")
print(f"\n  {'report_date':20s} {rd_d:>15s}  {rd_s:>15s}")

# Show a sample semantic DataFrame with all enrichment columns visible
print(f"\n{'='*60}")
print(f"  Semantic output sample (with enrichment columns)")
print(f"{'='*60}")
first_name = next(iter(semantic_result.dataframes), None)
if first_name:
    df_show = semantic_result.dataframes[first_name]
    if df_show is not None and len(df_show) > 0:
        display(df_show.head(10))

In [None]:
# Interactive dashboard: Semantic Web Enrichment Benefits

_ensure_pkg("plotly", "plotly>=5.0")

import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px

# ═══════════════════════════════════════════════════════════
# Prepare data
# ═══════════════════════════════════════════════════════════

# 1. Per-column alias breakdown: manual vs ontology-resolved
with open("contracts/ru_ag_ministry.json") as f:
    orig_contract = json.load(f)

alias_rows = []
for out_name, out_spec in semantic.get("outputs", {}).items():
    for col in out_spec.get("schema", {}).get("columns", []):
        all_aliases = col.get("aliases", [])
        if not all_aliases:
            continue
        col_name = col["name"]
        orig_aliases = []
        if out_name in orig_contract.get("outputs", {}):
            for oc in orig_contract["outputs"][out_name]["schema"]["columns"]:
                if oc["name"] == col_name:
                    orig_aliases = oc.get("aliases", [])
                    break
        manual_n = len(orig_aliases)
        ontology_n = max(0, len(all_aliases) - manual_n)
        alias_rows.append(dict(
            column=f"{col_name}", output=out_name,
            manual=manual_n, ontology=ontology_n, total=len(all_aliases),
        ))

# 2. Extraction comparison
extract_rows = []
for name in sorted(semantic_result.dataframes.keys()):
    df_s = semantic_result.dataframes.get(name)
    df_d = draft_result.dataframes.get(name) if draft_result else None
    s_rows = len(df_s) if df_s is not None else 0
    d_rows = len(df_d) if df_d is not None else 0
    s_cols = len(df_s.columns) if df_s is not None and len(df_s) > 0 else 0
    d_cols = len(df_d.columns) if df_d is not None and len(df_d) > 0 else 0
    extract_rows.append(dict(output=name, draft_rows=d_rows, semantic_rows=s_rows,
                             draft_cols=d_cols, semantic_cols=s_cols))

# 3. Crop breakdown (only available via title enrichment)
df_harvest = semantic_result.dataframes.get("harvest")
crop_summary = None
if df_harvest is not None and "Crop" in df_harvest.columns and "Value" in df_harvest.columns:
    crop_summary = (
        df_harvest.dropna(subset=["Crop", "Value"])
        .groupby("Crop")["Value"].sum()
        .sort_values(ascending=True).tail(10)
    )

# 4. Geographic grounding — join harvest data with GeoNames coordinates
geo_rows = []
if df_harvest is not None and {"Region", "Value"} <= set(df_harvest.columns):
    for gid, feat in geonames_features.items():
        region_name = feat["name"]
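        # Match by first word (e.g. "Moscow" matches "Moscow Oblast" or "Moscow Region").
        # regex=False treats the keyword literally, so names with regex metacharacters are safe.
        keyword = region_name.split()[0]
        matches = df_harvest[df_harvest["Region"].str.contains(keyword, case=False, na=False, regex=False)]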
        if len(matches) > 0:
            total = matches["Value"].dropna().sum()
            geo_rows.append(dict(
                region=region_name, lat=feat["lat"], lng=feat["lng"],
                value=total, records=len(matches),
                admin1=feat["admin1_code"], pop=feat["population"],
            ))

# ═══════════════════════════════════════════════════════════
# Figure 1: Main Dashboard (2x2)
# ═══════════════════════════════════════════════════════════

fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=[
        "<b>Contract Structure</b>  Draft vs Semantic",
        "<b>Extraction Results</b>  Rows per Output",
        "<b>Alias Expansion</b>  Manual + Ontology-Resolved",
        "<b>Crop Analysis</b>  Enabled by Title Enrichment",
    ],
    vertical_spacing=0.16, horizontal_spacing=0.14,
)

DRAFT_COLOR = "#EF553B"
SEMANTIC_COLOR = "#636EFA"
ONTOLOGY_COLOR = "#00CC96"
CROP_COLOR = "#AB63FA"

# ── Panel 1: Contract structure ──
struct_labels = ["Categories", "Outputs", "Aliases", "Enrichment<br>Columns"]
struct_draft = [
    len(draft_clean.get("categories", {})),
    len(draft_clean.get("outputs", {})),
    draft_aliases, draft_enrichment,
]
struct_semantic = [
    len(semantic.get("categories", {})),
    len(semantic.get("outputs", {})),
    semantic_aliases, semantic_enrichment,
]
fig.add_trace(go.Bar(
    x=struct_labels, y=struct_draft, name="Draft",
    marker_color=DRAFT_COLOR, legendgroup="a", legendgrouptitle_text="Contract",
), row=1, col=1)
fig.add_trace(go.Bar(
    x=struct_labels, y=struct_semantic, name="Semantic",
    marker_color=SEMANTIC_COLOR, legendgroup="a",
), row=1, col=1)

# ── Panel 2: Extraction results ──
out_names = [r["output"] for r in extract_rows]
fig.add_trace(go.Bar(
    x=out_names, y=[r["draft_rows"] for r in extract_rows], name="Draft",
    marker_color=DRAFT_COLOR, legendgroup="b", legendgrouptitle_text="Extraction",
    showlegend=True,
    text=[f'{r["draft_cols"]}c' for r in extract_rows], textposition="outside",
), row=1, col=2)
fig.add_trace(go.Bar(
    x=out_names, y=[r["semantic_rows"] for r in extract_rows], name="Semantic",
    marker_color=SEMANTIC_COLOR, legendgroup="b",
    text=[f'{r["semantic_cols"]}c' for r in extract_rows], textposition="outside",
), row=1, col=2)

# ── Panel 3: Alias expansion ──
a_cols = [r["column"] for r in alias_rows]
fig.add_trace(go.Bar(
    x=a_cols, y=[r["manual"] for r in alias_rows], name="Manual",
    marker_color=SEMANTIC_COLOR, legendgroup="c", legendgrouptitle_text="Aliases",
), row=2, col=1)
fig.add_trace(go.Bar(
    x=a_cols, y=[r["ontology"] for r in alias_rows], name="Ontology",
    marker_color=ONTOLOGY_COLOR, legendgroup="c",
), row=2, col=1)

# ── Panel 4: Crop breakdown ──
if crop_summary is not None and len(crop_summary) > 0:
    fig.add_trace(go.Bar(
        y=crop_summary.index, x=crop_summary.values,
        orientation="h", marker_color=CROP_COLOR,
        name="Harvest Total", legendgroup="d", legendgrouptitle_text="Crops",
        hovertemplate="%{y}: %{x:,.0f}<extra></extra>",
    ), row=2, col=2)
    fig.add_annotation(
        text="<i>Crop dimension extracted from table titles<br>— only possible with semantic enrichment</i>",
        xref="x4 domain", yref="y4 domain", x=0.95, y=0.02,
        showarrow=False, font=dict(size=9, color="gray"), xanchor="right",
    )
else:
    fig.add_annotation(
        text="No crop data — run Step 3 first",
        xref="x4 domain", yref="y4 domain", x=0.5, y=0.5, showarrow=False,
    )

fig.update_layout(
    height=800, barmode="group",
    title=dict(text="<b>Semantic Web Enrichment — Impact Dashboard</b>", x=0.5),
    template="plotly_white",
    legend=dict(orientation="h", yanchor="bottom", y=-0.12, xanchor="center", x=0.5),
    font=dict(size=11),
    margin=dict(b=100),
)
fig.show()

# ═══════════════════════════════════════════════════════════
# Figure 2: Geographic Grounding (GeoNames)
# ═══════════════════════════════════════════════════════════

if geo_rows:
    import pandas as pd
    df_geo = pd.DataFrame(geo_rows)

    fig_geo = px.scatter_geo(
        df_geo, lat="lat", lon="lng", size="value",
        hover_name="region",
        hover_data={"records": True, "pop": ":,", "value": ":,.0f", "lat": False, "lng": False},
        color="value", color_continuous_scale="Viridis",
        size_max=30, projection="natural earth",
        title="<b>Geographic Grounding</b> — Harvest values mapped via GeoNames coordinates",
    )
    fig_geo.update_geos(
        scope="europe", center=dict(lat=55, lon=45),
        projection_scale=3.5,
        showland=True, landcolor="rgb(243,243,243)",
        showocean=True, oceancolor="rgb(204,224,245)",
        showcountries=True, countrycolor="rgb(204,204,204)",
    )
    fig_geo.update_layout(
        height=500, template="plotly_white",
        margin=dict(l=20, r=20, t=60, b=20),
        font=dict(size=11),
    )
    fig_geo.add_annotation(
        text="<i>3 regions from fixture; with the full GeoNames dataset, Russia's other regions can be mapped the same way</i>",
        xref="paper", yref="paper", x=0.5, y=-0.02,
        showarrow=False, font=dict(size=10, color="gray"), xanchor="center",
    )
    fig_geo.show()
else:
    print("No geographic data available — run extraction (Step 3) first.")
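
The geographic join in the cell above links a GeoNames feature to report rows by the first word of the feature name. A minimal pure-Python sketch of that matching step follows, using made-up rows rather than real harvest data. `match_region_rows` is a hypothetical helper for illustration and is not part of `contract_semantics` or docpact.

```python
def match_region_rows(rows, feature_name):
    """Return rows whose Region contains the first word of feature_name (case-insensitive)."""
    keyword = feature_name.split()[0].casefold()
    return [
        row for row in rows
        if row.get("Region") and keyword in row["Region"].casefold()
    ]

# Made-up rows for illustration only
rows = [
    {"Region": "Moscow Oblast", "Value": 10.0},
    {"Region": "moscow region", "Value": 5.0},
    {"Region": "Kursk Oblast", "Value": 7.0},
    {"Region": None, "Value": 1.0},
]

matches = match_region_rows(rows, "Moscow Oblast")
total = sum(row["Value"] for row in matches)
print(len(matches), total)
```

First-word matching is deliberately loose, so it can over-match: a keyword such as "Altai" would hit both Altai Krai and the Altai Republic. Joining on a stable identifier (for example the GeoNames ID) would avoid that once regions are resolved.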

---
## Summary

### Contract Authoring (contract-semantics)

| Capability | Module | Key function |
|---|---|---|
| Document profiling | `analyze.py` | `profile_document()`, `merge_profiles()` |
| Contract recommendation | `recommend.py` | `recommend_contract()`, `compare_contract()` |
| AGROVOC resolution | `agrovoc.py` | `AgrovocAdapter.resolve_concept()` |
| GeoNames resolution | `geonames.py` | `GeoNamesAdapter.resolve_geoname()` |
| Column resolution | `resolve.py` | `resolve_column()` |
| Alias diff | `diff.py` | `diff_aliases()`, `diff_all()` |
| Materialization | `materialize.py` | `materialize_contract()` |
| SHACL validation | `validate.py` | `generate_shapes()`, `validate_records()` |
| Semantic context builder | `context.py` | `build_semantic_context()` |

### Pipeline Integration (docpact)

| Capability | Module | Key function |
|---|---|---|
| Runtime alias enrichment | `pipeline.py` | `interpret_output_async(..., semantic_context=)` |
| Pre-flight header check | `semantics.py` | `preflight_check()` |
| Post-extraction validation | `semantics.py` | `validate_output()` |
| Semantic context (data) | `semantics.py` | `SemanticContext.from_json()` / `.to_json()` |

**CLI equivalents** for production use:
```bash
contract-semantics analyze report.docx                       # profile + recommend
contract-semantics analyze *.pdf --compare contracts/X.json  # multi-doc + gap report
contract-semantics diff contracts/ru_ag_ministry.json        # alias coverage
contract-semantics materialize contracts/X.json -o out.json  # enrich + strip
contract-semantics build-context contracts/X.json -o ctx.json # build SemanticContext
contract-semantics generate-shapes contracts/X.json          # SHACL shapes
contract-semantics validate output.csv shapes/X.ttl          # SHACL check
```

In [None]:
# Step 1: Build a SemanticContext from the annotated contract.
# This resolves all concept_uris through the AGROVOC and GeoNames adapters
# and packages the results into a plain dataclass that docpact can consume.

ctx = build_semantic_context(
    "contracts/ru_ag_ministry.json",
    agrovoc=agrovoc,
    geonames=geonames,
    merge_strategy="union",
)

print(f"SemanticContext built at: {ctx.resolved_at}")
print(f"  Adapter versions: {ctx.adapter_versions}")

total_aliases = sum(len(a) for cols in ctx.resolved_aliases.values() for a in cols.values())
total_valid = sum(len(v) for cols in ctx.valid_values.values() for v in cols.values())
print(f"  Resolved aliases: {total_aliases} across all columns")
print(f"  Valid values: {total_valid} across all columns")

# Inspect resolved aliases per output/column
for out_name in sorted(ctx.resolved_aliases):
    print(f"\n  {out_name}:")
    for col_name, aliases in sorted(ctx.resolved_aliases[out_name].items()):
        preview = aliases[:5]
        suffix = "..." if len(aliases) > 5 else ""
        print(f"    {col_name}: {len(aliases)} aliases — {preview}{suffix}")
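
The `merge_strategy="union"` argument suggests that manual and ontology-resolved aliases are combined rather than one set replacing the other. The sketch below shows one plausible reading of such a union: order-preserving, with case-insensitive de-duplication. `union_aliases` is a hypothetical helper, and the toolkit's actual merge rules (for example its case handling) may differ.

```python
def union_aliases(manual, resolved):
    """Order-preserving union: manual aliases first, then any new resolved ones."""
    merged, seen = [], set()
    for alias in [*manual, *resolved]:
        cleaned = alias.strip()
        key = cleaned.casefold()
        if key and key not in seen:
            seen.add(key)
            merged.append(cleaned)
    return merged

# Made-up aliases for illustration only
merged = union_aliases(
    ["Region", "Регион"],
    ["регион", "Oblast", " Region "],
)
print(merged)
```

Keeping manual aliases first means a hand-written spelling wins over an ontology label that differs only in case or whitespace.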

In [None]:
# Step 2: Run pipeline WITH and WITHOUT SemanticContext.
# The annotated contract (with concept_uris) is used directly — no materialization needed.

from docpact.contracts import load_contract
from docpact.pipeline import process_document_async

cc = load_contract("contracts/ru_ag_ministry.json")
print(f"Contract: {cc.provider}")
print(f"  has_semantic_annotations: {cc.has_semantic_annotations}")
for out_name, out_spec in cc.outputs.items():
    sem_cols = list(out_spec.semantic_columns.keys())
    print(f"  {out_name}: semantic_columns = {sem_cols}")

# With semantic context — enables preflight + validation + alias enrichment
result_with = await process_document_async(str(chosen_docx), cc, semantic_context=ctx)

# Without semantic context: baseline behavior, unchanged from the pipeline without semantics
result_without = await process_document_async(str(chosen_docx), cc)

print(f"\nExtraction complete: {chosen_docx.name}")
print(f"  report_date: {result_with.report_date}")
print(f"\n  With SemanticContext:")
print(f"    preflight_reports:  {list(result_with.preflight_reports.keys())}")
print(f"    validation_reports: {list(result_with.validation_reports.keys())}")
print(f"\n  Without SemanticContext:")
print(f"    preflight_reports:  {result_without.preflight_reports}")
print(f"    validation_reports: {result_without.validation_reports}")
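
The cell above relies on top-level `await`, which works in Jupyter but not in a plain script. A minimal sketch of the same with/without comparison driven by `asyncio.run` follows. `process_stub` is a hypothetical stand-in, not the docpact API, and the empty-report behavior without a context is an assumption of the stub.

```python
import asyncio

async def process_stub(path, contract, semantic_context=None):
    """Hypothetical stand-in for an async pipeline call."""
    await asyncio.sleep(0)  # yield control, as real I/O would
    # Assumption: reports are only populated when a context is supplied
    reports = {"harvest": "preflight ran"} if semantic_context is not None else {}
    return {"path": path, "preflight_reports": reports}

async def main():
    with_ctx = await process_stub("report.docx", {}, semantic_context={"aliases": {}})
    without_ctx = await process_stub("report.docx", {})
    return with_ctx, without_ctx

# In a script, asyncio.run() replaces the notebook's top-level await
with_ctx, without_ctx = asyncio.run(main())
print(sorted(with_ctx["preflight_reports"]), without_ctx["preflight_reports"])
```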

In [None]:
# Step 3: Inspect pre-flight reports — header coverage before extraction.
# Pre-flight compares document pipe-table headers against all known aliases
# (manual + ontology-resolved). Informational only — never blocks extraction.

for out_name, report in result_with.preflight_reports.items():
    print(f"\n{'='*60}")
    print(f"  Pre-flight: {out_name}")
    print(f"{'='*60}")
    print(f"  Header coverage: {report.header_coverage:.0%}")

    if report.unmatched_headers:
        print(f"  Unmatched doc headers: {report.unmatched_headers}")
    else:
        print(f"  All document headers matched!")

    if report.missing_aliases:
        print(f"  Contract aliases with no doc header: {report.missing_aliases}")

    if report.findings:
        print(f"\n  Findings ({len(report.findings)}):")
        for f in report.findings[:8]:
            print(f"    [{f.severity}] {f.message}")
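
To make the coverage figure concrete, here is a minimal sketch of how a header-coverage check could be computed: the share of document headers that match any known alias after whitespace and case normalization. `header_coverage` and its return shape are assumptions for illustration. The real `preflight_check()` may normalize, match, and report differently.

```python
def header_coverage(doc_headers, aliases_by_column):
    """Return (coverage, unmatched_headers, columns_without_header)."""
    def norm(text):
        return " ".join(text.split()).casefold()

    alias_to_column = {
        norm(alias): column
        for column, aliases in aliases_by_column.items()
        for alias in aliases
    }
    matched_columns, unmatched = set(), []
    for header in doc_headers:
        column = alias_to_column.get(norm(header))
        if column is None:
            unmatched.append(header)
        else:
            matched_columns.add(column)
    coverage = (len(doc_headers) - len(unmatched)) / len(doc_headers) if doc_headers else 0.0
    missing = sorted(set(aliases_by_column) - matched_columns)
    return coverage, unmatched, missing

# Made-up headers and aliases for illustration only
coverage, unmatched, missing = header_coverage(
    ["Region", "Area harvested", "Notes"],
    {"region": ["Region", "Регион"], "area": ["area  harvested"], "yield": ["Yield"]},
)
print(f"{coverage:.0%}", unmatched, missing)
```

Because the check is informational, a low coverage value here would be a prompt to review aliases, not a reason to stop extraction.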

In [None]:
# Step 4: Inspect validation reports — value compliance after extraction.
# For columns with validate=True, checks every extracted value against the
# combined set of concept URI labels + ontology-resolved aliases.

for out_name, report in result_with.validation_reports.items():
    print(f"\n{'='*60}")
    print(f"  Validation: {out_name}")
    print(f"{'='*60}")
    print(f"  Total rows: {report.total_rows}")
    print(f"  Valid: {report.valid_count}  Invalid: {report.invalid_count}")

    if report.column_summaries:
        print(f"\n  Column summaries:")
        for col_name, summary in report.column_summaries.items():
            print(f"    {col_name}: {summary['valid']} valid, {summary['invalid']} invalid")
            if summary.get("unknown_values"):
                unknowns = summary["unknown_values"][:5]
                suffix = "..." if len(summary["unknown_values"]) > 5 else ""
                print(f"      unknown values: {unknowns}{suffix}")

    if report.findings:
        print(f"\n  Findings ({len(report.findings)}):")
        for f in report.findings[:10]:
            print(f"    row {f.row_index}, {f.column_name}: \"{f.value}\" — {f.message}")
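
The per-column summary printed above can be pictured as a simple membership test against the set of valid values. The sketch below is a hypothetical illustration of that idea, with case-insensitive matching and empty cells skipped. Both choices are assumptions, and `validate_column` is not the docpact `validate_output()` implementation.

```python
def validate_column(values, valid_values):
    """Check each value against an allowed set; return (summary, findings)."""
    allowed = {v.casefold() for v in valid_values}
    summary = {"valid": 0, "invalid": 0, "unknown_values": []}
    findings = []
    for row_index, value in enumerate(values):
        if value is None:
            continue  # assumption: empty cells are not validated
        if str(value).casefold() in allowed:
            summary["valid"] += 1
        else:
            summary["invalid"] += 1
            if value not in summary["unknown_values"]:
                summary["unknown_values"].append(value)
            findings.append((row_index, value))
    return summary, findings

# Made-up values for illustration only
summary, findings = validate_column(
    ["wheat", "Barley", "quinoa", None, "quinoa"],
    ["Wheat", "Barley", "Rye"],
)
print(summary, findings)
```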

In [None]:
# Step 5: SemanticContext serialization — cache to JSON for reuse.
# In production: build once (slow, hits ontology adapters), cache, reuse many times.

import tempfile

from docpact.semantics import SemanticContext

ctx_path = Path(tempfile.mkdtemp()) / "semantic_context.json"
ctx.to_json(ctx_path)

# Reload from cache
ctx_reloaded = SemanticContext.from_json(ctx_path)

print(f"Cached to: {ctx_path}")
print(f"  File size: {ctx_path.stat().st_size:,} bytes")
print(f"  Round-trip OK: {ctx_reloaded.resolved_aliases == ctx.resolved_aliases}")
print(f"  resolved_at preserved: {ctx_reloaded.resolved_at == ctx.resolved_at}")
print(f"  adapter_versions preserved: {ctx_reloaded.adapter_versions == ctx.adapter_versions}")

# Show the top-level JSON structure (key -> value type), not the full payload
print("\nJSON structure (top-level key -> type):")
ctx_json = json.loads(ctx_path.read_text())
print(json.dumps({k: type(v).__name__ for k, v in ctx_json.items()}, indent=2))
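
The round trip above works because the context is plain data. A minimal sketch of the same pattern with a stand-in dataclass follows. `MiniContext` and its three fields are hypothetical and only mirror the attributes printed in this notebook. The real `SemanticContext` has its own schema.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path

@dataclass
class MiniContext:
    """Hypothetical stand-in for a JSON-serializable context object."""
    resolved_at: str
    adapter_versions: dict = field(default_factory=dict)
    resolved_aliases: dict = field(default_factory=dict)

    def to_json(self, path):
        payload = json.dumps(asdict(self), ensure_ascii=False, indent=2)
        Path(path).write_text(payload, encoding="utf-8")

    @classmethod
    def from_json(cls, path):
        return cls(**json.loads(Path(path).read_text(encoding="utf-8")))

# Made-up values for illustration only
original = MiniContext(
    resolved_at="2024-01-01T00:00:00Z",
    adapter_versions={"agrovoc": "fixture"},
    resolved_aliases={"harvest": {"Crop": ["wheat", "пшеница"]}},
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ctx.json"
    original.to_json(path)
    reloaded = MiniContext.from_json(path)

print(reloaded == original)
```

Writing with `ensure_ascii=False` and an explicit UTF-8 encoding keeps non-Latin aliases readable in the cached file.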
