feat(cli,context,routing,observability,contracts): adopter-inspection surfaces (#106, #221, #224, #225, #226) by dgenio · Pull Request #231 · dgenio/contextweaver

dgenio · 2026-05-16T16:13:58Z

Lands the 5-issue "decision-surface inspectability" group selected by the
triage pass. Shared blast radius: envelope.py, __main__.py,
routing/router.py + new explanation.py, extras/otel.py, new schemas/
directory. Owner-mode (Mode B) authorised; bumps version to 0.5.0 with
documented public-API deltas under CHANGELOG ## [Unreleased].

#221 — argparse → Typer + Rich rewrite of __main__.py

- typer>=0.9 + rich>=13.0 promoted from [cli] extra to core deps
- [cli] extra kept as empty alias for one cycle (removal: v0.6)
- 7 existing subcommands preserved verbatim (build, route, demo,
  print-tree, init, ingest, replay); 1 new subcommand (stats, #106)
- tests/test_cli.py exit-code-0 assertion relaxed to accept Typer's
  code-2 no-args convention; all other golden assertions unchanged

#106 — BuildStats diagnostic report

- BuildStats.prompt_tokens @property (single source of truth; replaces
  sum(tokens_per_section.values()) + header_footer_tokens across 6+
  inline call sites in extras/otel.py, __main__.py, metrics.py)
- BuildStats.report(format='text'|'rich', *, phase, budget) returns
  deterministic, paste-friendly diagnostic string with sections,
  drop-reasons, and budget-utilisation recommendations
- BuildStats.report_dict(...) returns versioned ({"version": 1, ...})
  structured payload for programmatic consumers
- contextweaver stats CLI subcommand renders against ingested session
  JSON; --format {rich,text}, --phase, --budget flags
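
To illustrate the "single source of truth" point, here is a minimal sketch of the property. `BuildStatsSketch` is a hypothetical stand-in, not the real class: only `tokens_per_section`, `header_footer_tokens`, `prompt_tokens`, and the `"version": 1` key come from the description above; the other payload keys and the rounding are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BuildStatsSketch:
    """Hypothetical stand-in for BuildStats with only the fields named above."""

    tokens_per_section: dict[str, int] = field(default_factory=dict)
    header_footer_tokens: int = 0

    @property
    def prompt_tokens(self) -> int:
        # One definition of the total, instead of repeating this sum inline.
        return sum(self.tokens_per_section.values()) + self.header_footer_tokens

    def report_dict(self, *, budget: Optional[int] = None) -> dict:
        # Versioned payload; every key except "version" is illustrative.
        payload: dict = {
            "version": 1,
            "prompt_tokens": self.prompt_tokens,
            "sections": dict(sorted(self.tokens_per_section.items())),
        }
        if budget:
            payload["budget_utilisation"] = round(self.prompt_tokens / budget, 4)
        return payload


stats = BuildStatsSketch(
    tokens_per_section={"history": 300, "facts": 120},
    header_footer_tokens=30,
)
print(stats.prompt_tokens)
print(stats.report_dict(budget=900))
```

Callers such as metrics or telemetry hooks would then read `stats.prompt_tokens` rather than recomputing the sum.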

#226 — RouteResult.explanation()

- New routing/explanation.py module keeps router.py under soft cap
- RouteResult.explanation(format='md'|'dict') overload pair
- Markdown: top-k table, confidence-gap line, ambiguity flag +
  clarifying question, context-hints / filters sections
- Dict: versioned schema; safe for OTel span attributes
- Privacy: never emits args_schema or full descriptions
- docs/troubleshooting.md gains paste-ready example
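
As a rough picture of the dict form, the sketch below builds a versioned payload with a top-k list, a confidence gap, and an ambiguity flag, while leaving out `args_schema` and descriptions. The `Candidate` type, the `explain` function, the key names, and the 0.05 ambiguity threshold are all assumptions made for illustration; they are not the shipped API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical scored routing candidate."""

    tool_id: str
    score: float
    description: str = ""
    args_schema: Optional[dict] = None


def explain(
    candidates: list[Candidate], *, top_k: int = 3, ambiguity_gap: float = 0.05
) -> dict:
    # Sort by score (descending), then id, so the output is deterministic.
    ranked = sorted(candidates, key=lambda c: (-c.score, c.tool_id))[:top_k]
    gap = round(ranked[0].score - ranked[1].score, 4) if len(ranked) > 1 else None
    return {
        "version": 1,
        # Only id and score are emitted: no args_schema, no descriptions.
        "top_k": [{"tool_id": c.tool_id, "score": c.score} for c in ranked],
        "confidence_gap": gap,
        "ambiguous": gap is not None and gap < ambiguity_gap,
    }


result = explain(
    [
        Candidate("search_docs", 0.82, "long description", {"type": "object"}),
        Candidate("search_code", 0.80),
        Candidate("send_email", 0.10),
    ]
)
print(result)
```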

#225 — JSON Schemas + drift gate (closes #196)

- 6 schemas under schemas/ and docs/schemas/v0/ (mkdocs-published $id
  URLs): catalog, choice_card, result_envelope, route_trace,
  build_stats, graph_manifest
- src/contextweaver/_schema_gen.py — stdlib-only dataclass → Draft
  2020-12 generator; deterministic byte-stable output
- ChoiceCard.kind tightened to Literal[...]; __post_init__ enforces
  gateway-spec §2 size bounds (name ≤64, ≤5 tags each ≤24 chars)
  on every code path including from_dict
- make schemas / make schemas-check (gating in make ci); CI workflow
  runs --check after the scorecard drift gate
- new docs/contracts.md; examples/sample_catalog.yaml gets the
  # yaml-language-server: $schema= header
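
The generator-plus-drift-gate idea can be sketched with the standard library alone. This is a simplified illustration, not the contents of `_schema_gen.py`: `CardSketch`, `type_to_schema`, `dataclass_to_schema`, `render`, and `is_up_to_date` are hypothetical names, and only a handful of types are handled.

```python
import dataclasses
import json
import typing
from typing import List, Literal

_PRIMITIVES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def type_to_schema(tp) -> dict:
    """Map a small subset of Python annotations to JSON Schema fragments."""
    origin = typing.get_origin(tp)
    if origin is Literal:
        return {"enum": list(typing.get_args(tp))}
    if origin is list:
        (item,) = typing.get_args(tp)
        return {"type": "array", "items": type_to_schema(item)}
    if tp in _PRIMITIVES:
        return {"type": _PRIMITIVES[tp]}
    raise TypeError(f"unsupported type: {tp!r}")


def dataclass_to_schema(cls) -> dict:
    hints = typing.get_type_hints(cls)
    fields = dataclasses.fields(cls)
    required = [
        f.name
        for f in fields
        if f.default is dataclasses.MISSING
        and f.default_factory is dataclasses.MISSING
    ]
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": cls.__name__,
        "type": "object",
        "properties": {f.name: type_to_schema(hints[f.name]) for f in fields},
        "required": sorted(required),
    }


def render(schema: dict) -> str:
    # Sorted keys and a fixed indent keep the output byte-stable across runs.
    return json.dumps(schema, indent=2, sort_keys=True) + "\n"


def is_up_to_date(committed_text: str, cls) -> bool:
    # Drift gate: the committed file must equal a fresh render exactly.
    return committed_text == render(dataclass_to_schema(cls))


@dataclasses.dataclass
class CardSketch:
    """Hypothetical dataclass used only to exercise the generator."""

    name: str
    kind: Literal["tool", "agent", "skill", "internal"]
    tags: List[str] = dataclasses.field(default_factory=list)


print(render(dataclass_to_schema(CardSketch)))
```

A `--check` style command would presumably read each committed schema file and exit non-zero when `is_up_to_date` returns False.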

#224 — OTel GenAI semantic conventions

- extras/otel.py rewritten on top of
  opentelemetry.semconv._incubating.attributes.gen_ai_attributes
  (opentelemetry-api>=1.27 floor +
  opentelemetry-semantic-conventions>=0.48b0)
- Span shapes: invoke_agent for build(), execute_tool for route()
- Stable attrs: gen_ai.system='contextweaver', gen_ai.operation.name,
  gen_ai.usage.input_tokens, gen_ai.tool.name
- Engine-specific telemetry under contextweaver.* namespace
- Token-usage histogram renamed to canonical gen_ai.client.token.usage
- otel_emit_experimental flag (default False) gates PII-prone attrs
- tests/test_otel.py uses InMemorySpanExporter for deterministic
  SemConv-name assertions
- new docs/integration_otel.md (Laminar + Phoenix worked examples,
  PII-safety guidance)
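
The gating behaviour can be pictured without importing OpenTelemetry at all. The sketch below only assembles an attribute dict: the `gen_ai.*` names are the ones listed above, while `build_span_attributes`, its parameters, and the `contextweaver.prompt` key are hypothetical and chosen purely for illustration.

```python
from __future__ import annotations

from typing import Optional


def build_span_attributes(
    *,
    operation: str,
    input_tokens: int,
    prompt: Optional[str] = None,
    emit_experimental: bool = False,
) -> dict:
    """Assemble span attributes; PII-prone values are opt-in only."""
    attrs: dict = {
        "gen_ai.system": "contextweaver",
        "gen_ai.operation.name": operation,
        "gen_ai.usage.input_tokens": input_tokens,
    }
    if emit_experimental and prompt is not None:
        # Hypothetical key under the engine-specific namespace.
        attrs["contextweaver.prompt"] = prompt
    return attrs


safe = build_span_attributes(
    operation="invoke_agent", input_tokens=450, prompt="user text"
)
verbose = build_span_attributes(
    operation="invoke_agent",
    input_tokens=450,
    prompt="user text",
    emit_experimental=True,
)
print(sorted(safe))
print(sorted(verbose))
```

With the default flag value the prompt never reaches the attribute dict, which is the property a test around the flag would want to pin.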

Verification (all green on v0.5 branch):
ruff format --check src/ tests/ examples/ scripts/ → clean
ruff check src/ tests/ examples/ scripts/ → clean
mypy src/ → 0 issues / 66 files
pytest --cov=contextweaver -q → 995 passed, 2 skipped
(+41 new tests over baseline)
python scripts/gen_schemas.py --check → schemas up to date
python -m contextweaver demo → completes
make example → all examples clean


Copilot

Pull request overview

Lands the five "decision-surface inspectability" issues (#106, #221, #224, #225, #226) in a single drop: BuildStats diagnostic reports + stats CLI, RouteResult.explanation(), JSON-Schema publishing with a CI drift gate, an argparse→Typer/Rich CLI rewrite, and an OTel rewrite to the GenAI semantic conventions. Bumps to 0.5.0. Promotes typer/rich from [cli] extra to core deps (and keeps [cli] as an empty alias for one cycle). The OTel and CLI changes are user-visible breaking changes documented under CHANGELOG [Unreleased].

Changes:

- BuildStats gains a prompt_tokens property + report()/report_dict() with a new contextweaver stats Typer subcommand; RouteResult.explanation() renders Markdown/dict rationale (extracted into routing/explanation.py).
- Six JSON Schemas committed under schemas/ and docs/schemas/v0/, generated by the new stdlib _schema_gen.py; make schemas/schemas-check wired into make ci and the GitHub Actions workflow; ChoiceCard.kind tightened to Literal[...] with __post_init__ size-bound enforcement.
- extras/otel.py rewritten on opentelemetry.semconv._incubating.attributes.gen_ai_attributes (opentelemetry-api floor bumped to >=1.27 + new opentelemetry-semantic-conventions>=0.48b0); spans renamed to invoke_agent/execute_tool; token-usage histogram renamed to gen_ai.client.token.usage; __main__.py rewritten on Typer + Rich.

Reviewed changes

Copilot reviewed 35 out of 35 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| `src/contextweaver/envelope.py` | Adds `BuildStats.prompt_tokens`, `report()`/`report_dict()`, and ChoiceCard size-bound enforcement + `Literal` kind. |
| `src/contextweaver/_schema_gen.py` | New stdlib dataclass→JSON-Schema generator with per-type extras and deterministic serialisation. |
| `src/contextweaver/routing/router.py` | Adds `RouteResult.explanation()` with overload pair, delegating to `routing.explanation`. |
| `src/contextweaver/routing/explanation.py` | New module rendering routing rationale as Markdown or versioned dict. |
| `src/contextweaver/extras/otel.py` | Rewrite onto GenAI SemConv; `invoke_agent`/`execute_tool` spans; engine-specific attrs under `contextweaver.*` namespace; `otel_emit_experimental` gate. |
| `src/contextweaver/__main__.py` | Argparse→Typer+Rich rewrite, adds `stats` subcommand, factored `_restore_manager_from_session` helper. |
| `src/contextweaver/metrics.py` | Switches to `BuildStats.prompt_tokens`. |
| `scripts/gen_schemas.py` | New regenerator/drift-checker for the six published schemas. |
| `schemas/*.schema.json`, `docs/schemas/v0/*.schema.json` | Six committed schemas mirrored under docs for `$id` publishing. |
| `pyproject.toml` | Version 0.5.0; `typer`/`rich` into core; `[cli]` emptied; OTel floor bumped; mypy carve-out for `__main__`. |
| `Makefile`, `.github/workflows/ci.yml` | Add `schemas`/`schemas-check` targets and wire the drift gate into CI. |
| `mkdocs.yml` | Adds Contracts + Observability nav and excludes schema JSONs from docs nav. |
| `examples/sample_catalog.yaml` | Adds `# yaml-language-server: $schema=...` header. |
| `docs/contracts.md`, `docs/integration_otel.md`, `docs/troubleshooting.md` | New contracts page, OTel guide with Laminar/Phoenix examples, troubleshooting addition for `explanation()`. |
| `AGENTS.md`, `CHANGELOG.md` | Updated module map + comprehensive `[Unreleased]` entries. |
| `tests/test_envelope.py`, `tests/test_router.py`, `tests/test_otel.py`, `tests/test_cli.py`, `tests/test_schema_gen.py` | New + updated tests covering report, explanation, OTel SemConv assertions, stats CLI, and schema round-trips/drift. |

github-actions · 2026-05-16T16:17:24Z

Benchmark delta (vs `main`)

Soft regression feedback only — this comment never blocks the PR.
Latency budget: ⚠️ when head > base × 1.3. Accuracy budget: ⚠️ when head < base - 1pp.
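
For readers checking the markers by hand, the two budgets stated above reduce to simple comparisons. This is a sketch, not the benchmark script: the function names are hypothetical, and it assumes "1pp" means 0.01 on metrics reported as fractions.

```python
def latency_ok(head_ms: float, base_ms: float, factor: float = 1.3) -> bool:
    # Warn only when head exceeds base by more than the allowed factor.
    return head_ms <= base_ms * factor


def accuracy_ok(head: float, base: float, tolerance: float = 0.01) -> bool:
    # Warn only when head drops more than one percentage point below base.
    return head >= base - tolerance


def marker(ok: bool) -> str:
    return "✅" if ok else "⚠️"


# bm25 at size 1000 from the matrix in this comment: 85.467 ms vs base 78.368 ms.
print(marker(latency_ok(85.467, 78.368)))
print(marker(accuracy_ok(0.1575, 0.1575)))
```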

Routing summary (single backend × catalog sizes)

| size | recall@k (head Δ vs base) | MRR (head Δ vs base) | p99 (ms) |
|---|---|---|---|
| 50 | ✅ 0.5649 (+0.0000) | ✅ 0.4978 (+0.0000) | ✅ 0.442 (base 0.463) |
| 83 | ✅ 0.3825 (+0.0000) | ✅ 0.3242 (+0.0000) | ✅ 0.720 (base 0.876) |
| 1000 | ✅ 0.1475 (+0.0000) | ✅ 0.1456 (+0.0000) | ✅ 33.863 (base 31.897) |

Per-backend × per-size matrix

| backend | size | recall@k (Δ) | MRR (Δ) | p99 (ms) |
|---|---|---|---|---|
| bm25 | 100 | ✅ 0.3825 (+0.0000) | ✅ 0.3399 (+0.0000) | ✅ 5.939 (base 5.642) |
| bm25 | 500 | ✅ 0.2250 (+0.0000) | ✅ 0.2165 (+0.0000) | ✅ 28.543 (base 27.538) |
| bm25 | 1000 | ✅ 0.1575 (+0.0000) | ✅ 0.1525 (+0.0000) | ✅ 85.467 (base 78.368) |
| fuzzy | 100 | ✅ 0.0000 (+0.0000) | ✅ 0.0000 (+0.0000) | ✅ 0.000 (base 0.000) |
| fuzzy | 500 | ✅ 0.0000 (+0.0000) | ✅ 0.0000 (+0.0000) | ✅ 0.000 (base 0.000) |
| fuzzy | 1000 | ✅ 0.0000 (+0.0000) | ✅ 0.0000 (+0.0000) | ✅ 0.000 (base 0.000) |
| tfidf | 100 | ✅ 0.3825 (+0.0000) | ✅ 0.3220 (+0.0000) | ✅ 0.983 (base 0.872) |
| tfidf | 500 | ✅ 0.2325 (+0.0000) | ✅ 0.2314 (+0.0000) | ✅ 9.396 (base 8.660) |
| tfidf | 1000 | ✅ 0.1475 (+0.0000) | ✅ 0.1456 (+0.0000) | ✅ 33.875 (base 30.071) |

Context pipeline (per scenario)

| scenario | tokens | dropped | dedup |
|---|---|---|---|
| large_catalog | 1514 (base 1514, Δ+0) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| long_conversation | 2548 (base 2548, Δ+0) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| short_conversation | 496 (base 496, Δ+0) | 0 (base 0, Δ+0) | 0 (base 0, Δ+0) |
| stress_conversation | 6651 (base 6651, Δ+0) | 7 (base 7, Δ+0) | 4 (base 4, Δ+0) |

Numbers come from make benchmark / make benchmark-matrix.
Latency is hardware-dependent — treat the markers as a rough guide.
See benchmarks/scorecard.md for the full picture.

…umented enum

Addresses Phase 2 audit finding on PR #231 — `ChoiceCard.kind` was tightened to `Literal["tool", "agent", "skill", "internal"]` so the published `choice_card.schema.json` carries an enum constraint, but `__post_init__` only enforced name/tags size bounds. The `Literal[...]` annotation only constrains mypy: a Python caller (or `ChoiceCard.from_dict` reading an external JSON payload) could still construct `ChoiceCard(kind="bogus")` and produce an object that violates the published schema — a contract leak between the type system and runtime.

This commit adds a runtime check in `__post_init__` that rejects any value not in `CHOICE_CARD_KINDS`, mirroring the existing size-bound enforcement. The check fires on every construction path including `ChoiceCard.from_dict`, matching the PR description's claim of "enforce ... on every code path including from_dict".

Tests:
- test_choice_card_rejects_unknown_kind — direct ChoiceCard(kind="bogus")
- test_choice_card_from_dict_rejects_unknown_kind — via from_dict()
- test_choice_card_accepts_all_documented_kinds — pins the enum

Verification:
- ruff format/lint/mypy clean
- pytest -q: 995 passed, 5 skipped
- make schemas-check / scorecard-check / llms-check all clean
- make example + make demo clean
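
A minimal sketch of the runtime check described in this commit follows. `ChoiceCardSketch` is a hypothetical reduction of the real class: `CHOICE_CARD_KINDS`, the four kind values, and the size bounds come from the text above, while the field defaults, the error messages, and the use of `ValueError` are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Literal

CHOICE_CARD_KINDS = ("tool", "agent", "skill", "internal")


@dataclass
class ChoiceCardSketch:
    """Hypothetical reduction of ChoiceCard to the validated fields."""

    name: str
    kind: Literal["tool", "agent", "skill", "internal"] = "tool"
    tags: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Literal[...] is only seen by the type checker, so enforce it here too.
        if self.kind not in CHOICE_CARD_KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")
        if len(self.name) > 64:
            raise ValueError("name exceeds 64 characters")
        if len(self.tags) > 5 or any(len(tag) > 24 for tag in self.tags):
            raise ValueError("at most 5 tags, each at most 24 characters")

    @classmethod
    def from_dict(cls, data: dict) -> "ChoiceCardSketch":
        # Goes through __init__, so __post_init__ validates this path as well.
        return cls(
            name=data["name"],
            kind=data.get("kind", "tool"),
            tags=list(data.get("tags", [])),
        )


for payload in ({"name": "search", "kind": "tool"}, {"name": "x", "kind": "bogus"}):
    try:
        print(ChoiceCardSketch.from_dict(payload))
    except ValueError as exc:
        print("rejected:", exc)
```

Because `from_dict` constructs through `__init__`, there is no separate validation path to keep in sync.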

- otel.py: gate PII-prone prompt content behind otel_emit_experimental flag (was dead code — stored but never checked in on_context_built)
- otel.py: add version-guard comment for incubating SemConv import path
- __main__.py: add event-index context to _restore_manager_from_session errors
- _schema_gen.py: document 300-line soft cap exemption in module docstring
- test_otel.py: add test_experimental_flag_gates_prompt_in_span verifying the flag conditionally includes/excludes prompt in span attributes

Copilot AI review requested due to automatic review settings May 16, 2026 16:13

Copilot started reviewing on behalf of dgenio May 16, 2026 16:14

Copilot AI reviewed May 16, 2026


claude and others added 3 commits May 16, 2026 22:00

merge: resolve conflicts with main in AGENTS.md and CHANGELOG.md

2ca6fb4

dgenio merged commit 95d2c88 into main May 17, 2026
4 checks passed

dgenio deleted the claude/triage-issues-DqkjR branch May 17, 2026 07:12

dgenio merged 4 commits into main from claude/triage-issues-DqkjR
