release: bring main up to develop (0.2.17 — release-readiness docs + eval pattern examples + transitive CVE patches) by constk · Pull Request #107 · constk/harness-python-react

constk · 2026-05-26T07:51:13Z

What ships in this release

| PR | Commit | Theme |
|---|---|---|
| #83 | `ea6b8b1` | pin-freshness audit normalises sub-path actions before API call (carried over from prior session) |
| #103 | `d256e32` | Security: transitive-dep CVE patches: `idna 3.13 → 3.16` (CVE-2026-45409), `starlette 1.0.0 → 1.1.0` (PYSEC-2026-161) |
| #104 | `18b4d30` | Feature: eval pattern examples calling Azure OpenAI: 4 worked cases across the existing tolerance modes, new `src/eval/adapters/azure_openai.py` adapter, optional `[eval]` extra |
| #106 | `eb0136e` | Chore: align develop with main (backport #86's Beads guidance + scaffold updates that landed directly on main on 2026-05-25) |
| #101 | `722293d` | Docs: mark admin-merge policy as transitional solo-owner state |
| #99 | `59ad7f0` | Docs: reframe README opener around the human+agent audience |
| #100 | `7c84f18` | Docs: add concrete agent-failure example to "Why a harness" |
| #105 | `8938eb7` | Docs: replace Jaeger screenshot TODO with section scaffold |

Version

0.2.11 → 0.2.17. Six PATCH bumps cascaded as each in-flight PR rebased over the previous one — one bump per merge, as required by the version-bump gate.

Highlights

Open-source release readiness. Issues #90 (docs: reframe README opener around the human+agent audience), #91 (docs: add a concrete agent-failure example to make the harness claim tangible), #92 (docs: replace Jaeger screenshot TODO in README observability section), #93 (docs: mark admin-merge policy as transitional solo-owner state), and #94 (test: strengthen eval slice, realistic cases or explicit scaffold framing), the original release-blocker set, are all addressed and closed on develop. Some closeout work remains: the Jaeger PNG capture (a one-line README edit once captured) and the LEARNING.md / DEMO.md polish items, which are left for a later sprint.
First real eval slice. The eval harness moves from a single toy echo-hello case to four worked-pattern cases that exercise factual recall, numeric reasoning, definitional prose, and structured-output adherence against a real Azure OpenAI deployment. Live cases are gated on AZURE_OPENAI_* env vars; uv run pytest eval/ on a stock checkout still exits 0.
CVE-clean develop. pip-audit returns "No known vulnerabilities found" against the full lock.
Admin-merge policy hardened. CONTRIBUTING.md now explicitly frames the --admin workflow as transitional, with a numbered exit checklist (the enforce_admins: true flip is now required, not optional).
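
As a rough illustration of the tolerance modes the worked-pattern cases rely on, here is a minimal sketch of how `exact_match` and `numeric_close` could be checked. The `check` function, its signature, and the default tolerance are assumptions made for illustration, not the harness's actual runner. `semantic_similar` is left out because it needs a judge model.

```python
import math


def check(mode: str, expected: str, actual: str, rel_tol: float = 1e-6) -> bool:
    """Hypothetical tolerance check; not the harness's real API."""
    if mode == "exact_match":
        # Compare after trimming surrounding whitespace only.
        return expected.strip() == actual.strip()
    if mode == "numeric_close":
        # Parse both sides as floats and compare within a relative tolerance.
        try:
            return math.isclose(float(expected), float(actual), rel_tol=rel_tol)
        except ValueError:
            # A non-numeric answer can never satisfy a numeric case.
            return False
    raise ValueError(f"unsupported tolerance mode: {mode}")
```

Under these assumptions, a seconds-per-day case would accept both `"86400"` and `"86400.0"` as answers, while a free-text reply fails the check rather than raising.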

Test plan

Every component PR went through CI cleanly before merging to develop (each one squash-merged after green checks)
Develop tip is at 8938eb7; full unit suite + mypy --strict + ruff + import-linter all passed on the final tip during the #106 sync (chore: align develop with main, backport #86 content + version sync)
pip-audit clean on develop
CHANGELOG pre-stage workflow runs on this PR's open (changelog-prestage.yml) — verify after creation

Invariants affected

None new. #101 strengthened the wording around the admin-merge exemption (transitional framing) — same invariant content, sharper documentation.

New deps / actions / external surface

New optional Python extra: [eval] with openai>=1.40.0 (pulled in only on uv sync --extra eval).
New external endpoint: Azure OpenAI (per-deployment URL). Only called from eval/test_golden_patterns.py, only when AZURE_OPENAI_* env vars are set.
No new GitHub Actions; no new runtime deps in the default install.
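
The env-var gating described above could look roughly like the sketch below. The three variable names are assumptions (the PR only says `AZURE_OPENAI_*`), and `missing_azure_env` is a hypothetical helper, not code from the repository.

```python
import os
from collections.abc import Mapping

# Assumed names; the PR text only refers to the AZURE_OPENAI_* family.
REQUIRED_ENV = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT",
)


def missing_azure_env(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV if not environ.get(name)]


# In a test module, the gate could then be a module-level marker such as:
#   pytestmark = pytest.mark.skipif(
#       bool(missing_azure_env()),
#       reason="Azure OpenAI env vars not set; live eval cases skipped",
#   )
```

With a marker like this, a stock checkout reports the live cases as skipped instead of failing, which is consistent with `uv run pytest eval/` exiting 0.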

Tagging note

Per .github/workflows/release.yml, the public release (GHCR image push, CycloneDX SBOM, GitHub Release page) is tag-triggered. Tag v0.2.17 against the merge commit when this PR lands to publish.

Linked issues

Closes none directly (all linked issues already closed on develop). This PR fans the closures out to main.

…83)

pip-audit on develop is flagging two transitive-dep CVEs:

- idna 3.13 CVE-2026-45409 (fix in 3.15+)
- starlette 1.0.0 PYSEC-2026-161 (fix in 1.0.1+)

Both are surfaced via fastapi/httpx. Bumps via:

    uv lock --upgrade-package idna --upgrade-package starlette

Resolves to idna 3.16 (3.15 was the listed fix; 3.16 is a further patch with the same fix) and starlette 1.1.0 (minor bump; FastAPI is compatible with it). All 192 unit tests pass on the upgraded lock.

Bumps the project self-version 0.2.10 -> 0.2.11 per docs/DEVELOPMENT.md.

Unblocks the pip-audit CI gate on #99, #100, #101, #102 (and any other PRs currently sitting on develop), all of which inherit the flagged transitive CVEs from develop and cannot pass that gate until this lands.

* feat: eval pattern examples calling Azure OpenAI (#94)

The eval slice previously shipped one toy case (echo-hello) and a disabled-by-default nightly. A reader expecting an LLM-eval story found the infrastructure without conviction.

Adds four worked-pattern cases that exercise the existing three tolerance modes against a real Azure OpenAI deployment. These are not benchmarks; they demonstrate what an eval case *looks like* for the four LLM-eval patterns you most often need to write:

- factual-http-200: exact_match, format-constrained recall
- numeric-seconds-per-day: numeric_close, numeric reasoning + tolerance
- definitional-fastapi-depends: semantic_similar, free-form judge-scored prose
- structured-json-status: exact_match, structured-output adherence

When the template is forked for a real project, replace these four with cases that exercise the project's own prompts; the patterns transfer regardless of what product is bolted on.

Provider choice (Azure OpenAI via the openai SDK with AzureOpenAI client) is intentionally distinct from the rest of the harness (which uses Claude via Claude Code). Demonstrates that the LLMClient Protocol in src/eval/judge.py does its job: the eval core never imports openai, vendor lock-in lives only in the adapter.

Changes:

- src/eval/adapters/azure_openai.py: implements LLMClient via the openai.AzureOpenAI SDK. Reads endpoint/key/deployment/api-version from env. Lazy-imports the SDK so the module is importable without the optional extra installed; the adapter raises a clear AzureOpenAIConfigError if the env or SDK is missing.
- eval/golden_patterns.json: the four cases with notes explaining which pattern each demonstrates.
- eval/test_golden_patterns.py: separate test file gated on the Azure env vars via pytestmark. Skipped on a stock checkout, so `uv run pytest eval/` always exits 0. The toy test_golden_qa.py keeps running as before.
- pyproject.toml: new optional [project.optional-dependencies] eval extra (just `openai>=1.40.0`), mypy override for openai.* matching the existing opentelemetry.* pattern, and a 0.2.10 -> 0.2.11 self-version bump.
- .github/workflows/eval-nightly.yml: env vars renamed from the placeholder LLM_* set to AZURE_OPENAI_*. Header comment updated with the Azure setup recipe. uv sync now passes --extra eval.
- docs/EVAL_HARNESS.md: new "Worked patterns" section with the table mapping case -> tolerance -> pattern, the local setup recipe, and a "Swapping providers" note documenting the Protocol-based extension path.

Local gates: mypy --strict clean on 42 source files (was 31), ruff clean, ruff format clean, import-linter both contracts kept, 192 unit tests pass, eval/ runs 1 passed + 4 skipped without LLM env.

Closes #94

* test: add adapter unit tests + adapters README (#94 review fixes)

Addresses two gate failures on #104 surfaced by code review:

1. "Tests required" gate: the feat: prefix declared a behaviour change but tests/ had no test for the new adapter (the eval/-side test only runs with live Azure credentials). Adds tests/test_eval_azure_openai_adapter.py: 13 fully-offline cases covering _resolve_config (defaults, override, empty-string fallback, missing-env error listing), the constructor (env wiring, explicit API version, missing-env, missing-SDK), and the two SDK call paths (complete_json structured-output mode, complete user-message dispatch, null-content returns "" / "{}"). The SDK is mocked at sys.modules level so the test never hits the network and never requires the openai extra to be installed.
2. "src/ README audit" gate: every src/ package needs a README.md per CLAUDE.md. Adds src/eval/adapters/README.md documenting the layer's purpose, the current adapter, a 7-step "adding a new adapter" recipe, and why the layer lives at the top of the import order.

Also applies the reviewer's non-blocking sentinel-string suggestion: the magic "azure-deployment" string passed as judge_model in eval/test_golden_patterns.py is now the named constant _AZURE_DEPLOYMENT_SENTINEL with a comment explaining why the runner threads it through but the Azure adapter discards it.

Local gates: 205 unit tests pass (was 192, +13 new), mypy clean on 43 source files, ruff/format/import-linter all green.

Refs #94

* docs: add Key interfaces section to adapters README (#94 review)

The src/ README audit gate looks for a `## Key interfaces` (or `## Public surface`) anchor. The existing README had purpose / table / extension recipe / layering rationale, but no exported-names section.

Adds a `## Key interfaces` section listing the two exported names:

- AzureOpenAIClient: the LLMClient implementation with notes on complete() vs complete_json() and the discarded `model` arg (Azure dispatches by deployment, not model).
- AzureOpenAIConfigError: the construction-time error type, noting that it batches every missing env var into a single message instead of failing-and-retrying.

Both are already documented in the adapter docstrings; this section hoists them to the README anchor the audit gate enforces.

Refs #94

* chore: bump version to 0.2.12 (rebase onto develop after #103)

* chore: add optional Beads issue queue guidance

* chore: address PR-86 review feedback (BEADS doc + template + CI-script compile gate)

Applies the actionable items from the PR-86 review:

- docs/BEADS.md: lead with a one-sentence "what Beads is" + upstream link; state the stance explicitly (optional/additive, recommended for agent-driven flows, GitHub remains authoritative); add a YAML example block under Recommended Bead fields; replace the duplicated Closure checklist with a Bead-specific narrowing that cites the PR template + CONTRIBUTING; call out that .beads/ is wiped by git clean -fdx.
- .github/pull_request_template.md: collapse the "Local Beads" section into an HTML-commented opt-in block so it is invisible in the rendered preview until a Beads-using team uncomments it.
- CONTRIBUTING.md: document the one-shot git renormalisation step for Windows clones after the .gitattributes change lands.
- tests/test_scripts_compile.py: regression gate that py_compiles every .github/scripts/*.py. The "scripts unparseable" review finding was based on an older local Python; PEP 758 (3.14) makes the unparenthesised except clauses valid, so the scripts ARE fine on the project pin. The test guards against an actual syntax error landing in future.

* chore: bump version to 0.2.11

---------

Co-authored-by: jakelindsay87 <jacob.b.lindsay@gmail.com>
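
A compile gate like the one tests/test_scripts_compile.py is described as providing could be sketched as below. The helper name `compile_failures` and its return shape are hypothetical; the real test file may be structured differently (for example, as a parametrised pytest case).

```python
import py_compile
from pathlib import Path


def compile_failures(scripts_dir: Path) -> dict[str, str]:
    """Return {script name: error message} for scripts that fail to compile."""
    failures: dict[str, str] = {}
    for script in sorted(scripts_dir.glob("*.py")):
        try:
            # doraise=True surfaces syntax errors as PyCompileError
            # instead of printing them to stderr.
            py_compile.compile(str(script), doraise=True)
        except py_compile.PyCompileError as exc:
            failures[script.name] = exc.msg
    return failures
```

A test could then assert `compile_failures(Path(".github/scripts")) == {}`. The result depends on the interpreter running the test, which matches the commit's point that the scripts parse on the project pin but not on an older local Python.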

* docs: mark admin-merge policy as transitional solo-owner state (#93)

The existing "Solo-owner merge policy" section accurately documented how merges work today, but read as standing policy. From an external contributor's perspective it could look like the maintainer routinely bypasses their own gates.

Adds a leading "Transitional" blockquote framing this as a single-owner workaround, not standing policy, and replaces the closing sentence with a numbered exit checklist (drop --admin, remove the subsection, update CODEOWNERS, optionally flip enforce_admins to true). All four changes land together when a second collaborator is onboarded.

Mechanics of the merge command itself are unchanged.

Closes #93

* chore: bump version to 0.2.11

* docs: make enforce_admins flip required in exit checklist (#93 review)

Code review on #101 pushed back on step 4 of the "When the exemption ends" checklist: "Optionally flip enforce_admins to true". Leaving it false in a 2-person setup keeps the admin-bypass door open even after the single-owner workaround is no longer needed, which defeats the point of having an exit checklist.

Drops "Optionally" and adds a one-line rationale so a future reader understands why the flip is non-optional.

Refs #93

* docs: reframe README opener around the human+agent audience (#90)

The previous opener led with what the harness is (a coding harness for Python+React) and folded the audience into a trailing clause. The new opener leads with who it's for (teams pairing AI agents with human engineers) and keeps the mechanism punchline ("every gate enforced mechanically in CI, not by discipline") that makes the harness story distinctive.

Wording matches the repo's GitHub description for consistency between the two surfaces.

Closes #90

* docs: tighten README opener — harness vocab + 0.2.11 bump (#90)

Review feedback on #99:

- "Production-grade SDLC scaffold" -> "Production-grade SDLC harness". Everywhere else (package name, docs/HARNESS.md, CLAUDE.md) calls it a harness; "scaffold" was an unintentional vocabulary drift.
- "regardless of who's at the keyboard" -> "regardless of who shipped the code". Agents don't have keyboards; the original metaphor leaked. The new phrasing covers humans and agents without forcing the human-only mental model.
- README opener now also mirrors the GitHub repo description verbatim ("human-LLM coding collaborations"), so the two surfaces stay aligned.

Also bumps the project version 0.2.10 -> 0.2.11 (docs change -> PATCH per docs/DEVELOPMENT.md) in pyproject.toml and the self-version line in uv.lock, unblocking the "Version bump check" CI gate that flagged the original commit.

The "enforced mechanically in CI, not by discipline" punchline is preserved verbatim.

Refs #90

* docs: add concrete agent-failure example to "Why a harness" (#91)

The "harness IS the product" claim reads abstract without a worked example. Adds a blockquoted, 3-line sidebar inside the "Why a harness" section showing one realistic failure mode: an agent reaches for a reverse import (src.models → src.tools), import-linter blocks it in CI against the "src.models depends on nothing in src/" contract, the agent's next iteration routes around it via docs/BOUNDARIES.md.

Names a real gate, cites the real contract, links the real doc, so the example is verifiable, not theatre.

Closes #91

* chore: bump version to 0.2.11

* docs: replace Jaeger screenshot TODO with section scaffold (#92)

The observability story in README has one visible loose end: a TODO block where the Jaeger trace screenshot should go. The rest of the section reads cleanly, so the TODO sticks out.

Promotes the placeholder to a real subsection ("Jaeger trace") with the explanatory caption already written: what boots the stack, what endpoint produces the trace, where to view it, and that span attributes use only the constant-defined semconv keys from src/observability/spans.py.

The image itself still needs to be captured. The original capture recipe is preserved as an HTML comment so it remains discoverable, and the comment includes the exact one-line markdown to paste in once docs/images/jaeger-trace.png lands. Hiding the placeholder inside an HTML comment (rather than a broken-image ref) keeps the rendered README clean while the PNG is outstanding.

The image-capture step itself is a follow-up: it needs the maintainer to run docker compose locally and take the screenshot.

Closes #92 (capture step tracked separately as a single-line README edit when the PNG is committed).

* chore: bump version to 0.2.11

constk · 2026-05-26T07:56:14Z

Closing — direct develop → main path conflicts on pyproject.toml/uv.lock because #86 went to main directly and gave the version line two divergent histories. Reopening from release/0.2.17 which is main + a single merge commit with the conflict resolved (take develop's 0.2.17). Same content, mergeable head.

constk and others added 8 commits May 3, 2026 13:56

fix: pin-freshness audit normalises sub-path actions before API call (#…

ea6b8b1

…83)

constk closed this May 26, 2026

constk mentioned this pull request May 26, 2026

release: bring main up to develop (0.2.17 — release-readiness docs + eval pattern examples + transitive CVE patches) #108

Merged

6 tasks

constk wants to merge 8 commits into main from develop
