PR-N4: remove SDK conftest stub + finalize no-doubles cleanup by FluffyAIcode · Pull Request #56 · FluffyAIcode/Kakeya-LLM-Inference-engine

FluffyAIcode · 2026-06-02T02:55:39Z

Why

Final installment of the no-test-doubles cleanup. Closes the sequence PR-N1 → N2 → N3 → N4. After PR-N4 lands, NO test doubles implementing the verifier / engine / tokenizer protocols remain in the Linux test tree.

What was deleted

| File | Δ lines | Tests deleted | What |
|---|---|---|---|
| `tests/sdk/python/conftest.py` | −203 | 0 | `_start_runtime` / `_stop_runtime` helpers + `runtime_address` fixture (FakeVerifier-backed) |
| `tests/sdk/python/test_client.py` | −157 | 13 | Client lifecycle against fake runtime |
| `tests/sdk/python/test_session.py` | −502 | 33 | Session.append + .generate + .info + .close end-to-end against fake runtime |

Total: ~860 lines, 46 tests deleted.

What was added

| File | Lines | Tests | What |
|---|---|---|---|
| `tests/integration/test_sdk_real.py` | +137 | 11 | Client + Session integration against real Qwen3-0.6B-backed gRPC runtime: lifecycle, error mapping (NOT_FOUND / INVALID_ARGUMENT / SessionClosedError), session info round-trip, close idempotency |
| `tests/integration/conftest.py` | +180 | n/a | `pytest_collection_modifyitems` hook + `real_speculative_engine` (Qwen3-0.6B SpeculativeEngine, session-scoped) + NEW `real_grpc_runtime_address` (in-process gRPC server backed by real verifier on background thread, yields `host:port`) |
| `tests/integration/__init__.py` | +0 | n/a | placeholder |
| `scripts/review_pr_n4_on_mac.sh` | +93 | n/a | Mac M4 reviewer aid running the full accumulated integration suite |
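The lifecycle behind `real_grpc_runtime_address` (start a server on a background thread, hand out `host:port`, tear down afterwards) can be sketched roughly as follows. This is not the PR's fixture: a standard-library TCP echo server stands in for the gRPC server and the Qwen3-0.6B verifier so the sketch runs without grpcio or model weights, and the names `background_runtime_address`, `_EchoHandler`, and `ping` are hypothetical.

```python
import contextlib
import socket
import socketserver
import threading


class _EchoHandler(socketserver.BaseRequestHandler):
    """Stand-in for the real servicer: echoes one message back."""

    def handle(self):
        data = self.request.recv(1024)
        self.request.sendall(data)


@contextlib.contextmanager
def background_runtime_address():
    """Serve on an ephemeral port from a background thread; yield 'host:port'."""
    # Port 0 asks the OS for a free port, so parallel runs do not collide.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), _EchoHandler)
    server.daemon_threads = True
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    host, port = server.server_address
    try:
        yield f"{host}:{port}"
    finally:
        server.shutdown()       # stop the serve_forever loop
        server.server_close()   # release the listening socket
        thread.join(timeout=5)


def ping(address: str, payload: bytes) -> bytes:
    """Hypothetical client call: send payload, return the server's reply."""
    host, port = address.rsplit(":", 1)
    with socket.create_connection((host, int(port)), timeout=5) as conn:
        conn.sendall(payload)
        return conn.recv(1024)


if __name__ == "__main__":
    with background_runtime_address() as address:
        print(address, ping(address, b"hello"))
```

In a conftest the same body would presumably sit inside a generator function decorated with `@pytest.fixture(scope="session")`, so the server starts once per test session and the `finally` block runs at session teardown.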

What stays on Linux

tests/sdk/python/test_errors.py (9 tests) — pure _wrap_grpc_error mapping with synthesized grpc.RpcError objects. Verifier-independent; transport-only error-class translation.
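A transport-only mapping of this kind can be tested without any runtime, which is why it stays in the Linux gate. The sketch below is illustrative only: the real `_wrap_grpc_error` signature and mapping table are not shown in this PR, `StatusCode` is a stand-in for `grpc.StatusCode`, and `KakeyaError`, `FakeRpcError`, and `wrap_rpc_error` are hypothetical names. The NOT_FOUND → `SessionNotFoundError` and INVALID_ARGUMENT → `InvalidArgumentError` pairings are assumed from the error names mentioned in this PR.

```python
import enum


class StatusCode(enum.Enum):
    """Stand-in for grpc.StatusCode so the sketch needs no grpcio install."""
    NOT_FOUND = "not_found"
    INVALID_ARGUMENT = "invalid_argument"
    UNAVAILABLE = "unavailable"


class KakeyaError(Exception):
    """Hypothetical base class for SDK errors."""


class SessionNotFoundError(KakeyaError):
    pass


class InvalidArgumentError(KakeyaError):
    pass


class FakeRpcError(Exception):
    """Synthesized error exposing code() and details(), like a gRPC call error."""

    def __init__(self, code: StatusCode, details: str):
        super().__init__(details)
        self._code = code
        self._details = details

    def code(self) -> StatusCode:
        return self._code

    def details(self) -> str:
        return self._details


# Assumed pairing of status codes to SDK error classes.
_CODE_TO_ERROR = {
    StatusCode.NOT_FOUND: SessionNotFoundError,
    StatusCode.INVALID_ARGUMENT: InvalidArgumentError,
}


def wrap_rpc_error(err) -> KakeyaError:
    """Translate a transport error into an SDK error; unknown codes fall back to the base."""
    error_cls = _CODE_TO_ERROR.get(err.code(), KakeyaError)
    return error_cls(err.details())


if __name__ == "__main__":
    wrapped = wrap_rpc_error(FakeRpcError(StatusCode.NOT_FOUND, "no such session"))
    print(type(wrapped).__name__, wrapped)
```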

CI workflow change

.github/workflows/ci.yaml — dropped kakeya.client, kakeya.session from --include= filter. Linux gate now covers ONLY:

inference_engine/server/{auth, config, errors, grpc_app, metrics, schemas, proto_gen}
inference_engine/memory/*
inference_engine/scheduler/{config, session, pooled_verifier}
inference_engine/pipeline/*
inference_engine/session/store
sdks/python/kakeya/{__init__, errors}
training/repr_align/*

This is the verifier-independent boundary, frozen post PR-N4.
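For reviewers who want to check which side of the boundary a file falls on, a small helper along these lines may be useful. It is only a reading of the list above, not the actual `--include=` value from `ci.yaml`; `in_linux_gate` and the constant names are hypothetical, and each listed module is assumed to be either a `.py` file or a package directory.

```python
# Modules listed individually in the Linux gate (brace groups expanded by hand).
_SERVER = ["auth", "config", "errors", "grpc_app", "metrics", "schemas", "proto_gen"]
_SCHEDULER = ["config", "session", "pooled_verifier"]
_KAKEYA = ["__init__", "errors"]

GATE_MODULES = (
    [f"inference_engine/server/{name}" for name in _SERVER]
    + [f"inference_engine/scheduler/{name}" for name in _SCHEDULER]
    + ["inference_engine/session/store"]
    + [f"sdks/python/kakeya/{name}" for name in _KAKEYA]
)

# Directories listed with a trailing /* in the Linux gate.
GATE_TREES = ["inference_engine/memory", "inference_engine/pipeline", "training/repr_align"]


def in_linux_gate(path: str) -> bool:
    """Return True if a repo-relative path falls inside the Linux coverage gate."""
    normalized = path.replace("\\", "/")
    stem = normalized[:-3] if normalized.endswith(".py") else normalized
    if stem in GATE_MODULES:
        return True  # module is a single file
    if any(normalized.startswith(module + "/") for module in GATE_MODULES):
        return True  # module is a package directory
    return any(normalized.startswith(tree + "/") for tree in GATE_TREES)


if __name__ == "__main__":
    for candidate in ("sdks/python/kakeya/errors.py", "sdks/python/kakeya/client.py"):
        print(candidate, in_linux_gate(candidate))
```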

Final state of the no-doubles cleanup

| PR | Scope | Tests deleted | Integration test added |
|---|---|---|---|
| #53 PR-N1 | FakeVerifier hierarchy | ~70 | `test_coordinator_real.py`, `test_generator_real.py` |
| #54 PR-N2 | DeterministicEngine/Tokenizer (scheduler) | 20 | `test_scheduler_real.py` |
| #55 PR-N3 | HTTP shim cluster | 88 | `test_http_shim_real.py`, `test_engine_real.py`, `test_tokenizer_real.py`, `test_streaming_real.py` |
| #56 PR-N4 (this) | SDK conftest stub | 46 | `test_sdk_real.py` |

After all four merge, tests/integration/ contains:

test_inv3_session_determinism_gate.py          (PR-E1)
test_coordinator_real.py                       (PR-N1)
test_generator_real.py                         (PR-N1)
test_scheduler_real.py                         (PR-N2)
test_http_shim_real.py                         (PR-N3)
test_engine_real.py                            (PR-N3)
test_tokenizer_real.py                         (PR-N3)
test_streaming_real.py                         (PR-N3)
test_sdk_real.py                               (PR-N4)

Linux verification

PYTHONPATH=.:sdks/python coverage run -m pytest <Linux gate paths>:
  649 passed (was 695 on main; -46 net = removed 46 SDK runtime tests).
  100% coverage on 999 stmts (was 1660 on main; -661 net = all
  verifier-dependent modules now integration-only).
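The numbers above are internally consistent with the deletion table; a quick arithmetic check, using only figures quoted in this PR:

```python
# Line and test counts per deleted file, as listed under "What was deleted".
deleted_lines = {"conftest.py": 203, "test_client.py": 157, "test_session.py": 502}
deleted_tests = {"conftest.py": 0, "test_client.py": 13, "test_session.py": 33}

total_lines = sum(deleted_lines.values())   # 862, reported as ~860
total_tests = sum(deleted_tests.values())   # 46

passed_after = 695 - total_tests            # tests on main minus deleted tests
stmts_after = 1660 - 661                    # covered statements on main minus removed

print(total_lines, total_tests, passed_after, stmts_after)  # 862 46 649 999
```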

Mac M4 evidence (REQUIRED for merge)

bash scripts/review_pr_n4_on_mac.sh runs the full accumulated integration suite (PR-E1 INV-3 + PR-N1 coordinator/generator + PR-N2 scheduler + PR-N3 http_shim/engine/tokenizer/streaming + PR-N4 SDK) against real Qwen3-0.6B and produces pr-n4-mac-integration-tests-<unix>.json evidence.

Stack & merge order note

PR-N4 is branched off main and is independent of PR-N1 (#53), PR-N2 (#54), and PR-N3 (#55) at the file level, with one exception: all four PRs add to tests/integration/conftest.py (each contributes a fixture), so the conftest needs a small reconciliation after merge. Recommended merge order:

1. PR-N1 (adds the marker hook)
2. PR-N2 (adds real_speculative_engine)
3. PR-N3 (uses real_speculative_engine)
4. PR-N4 (this, adds real_grpc_runtime_address)

PR-N4's tests/integration/conftest.py is a superset of N1/N2's; if N4 lands first, the others need to skip their conftest creation and reuse the merged version.

Final installment of the no-test-doubles cleanup. Closes the sequence PR-N1 → N2 → N3 → N4. After PR-N4 lands, NO test doubles implementing the verifier / engine / tokenizer protocols remain in the Linux test tree.

What was deleted
----------------

tests/sdk/python/conftest.py
  -203 lines. Contained _start_runtime / _stop_runtime helpers that spun up an in-process gRPC server with a FakeVerifier (later replaced by _MinimalVerifierStub in PR-N1's preview cleanup) on a background thread. The runtime_address + runtime_address_no_inspector fixtures are gone with it.

tests/sdk/python/test_client.py
  -157 lines, 13 tests. Exercised Client + Session lifecycle against the FakeVerifier-backed runtime.

tests/sdk/python/test_session.py
  -502 lines, 33 tests. Exercised Session.append + .generate + .info + .close end-to-end against the FakeVerifier-backed runtime.

What was added
--------------

tests/integration/test_sdk_real.py
  +137 lines, 11 tests. SDK Client + Session integration tests against a real Qwen3-0.6B-backed gRPC runtime:
  - Client: create_session round-trip, eos_token_ids round-trip, idempotent close, address property
  - Session: append + generate yield tokens + metadata, info reflects history, close returns final length, close is locally idempotent
  - End-to-end error mapping: SessionNotFoundError on unknown id, InvalidArgumentError on max_tokens=0, SessionClosedError on append-after-close

tests/integration/conftest.py
  +180 lines
  - pytest_collection_modifyitems hook (auto-marks everything under tests/integration/ with @pytest.mark.integration)
  - real_speculative_engine fixture (session-scoped, Qwen3-0.6B)
  - real_grpc_runtime_address fixture (session-scoped, in-process gRPC server backed by real Qwen3-0.6B verifier on a background thread; yields the host:port the SDK can connect to)

tests/integration/__init__.py
  +0 lines (placeholder)

scripts/review_pr_n4_on_mac.sh
  +93 lines. Mac M4 reviewer aid running the full accumulated integration suite (PR-E1 INV-3 + PR-N1 coordinator/generator + PR-N2 scheduler + PR-N3 http_shim/engine/tokenizer/streaming + PR-N4 SDK).

What stays on Linux
-------------------

tests/sdk/python/test_errors.py (unchanged, 9 tests)
  Pure _wrap_grpc_error mapping with synthesized grpc.RpcError objects. Verifier-independent; transport-only error-class translation. Stays on Linux.

CI workflow change
------------------

.github/workflows/ci.yaml: dropped kakeya.client and kakeya.session from the --include= filter. Linux gate now covers ONLY:

  inference_engine/server/{auth, config, errors, grpc_app, metrics, schemas, proto_gen}
  inference_engine/memory/*
  inference_engine/scheduler/{config, session, pooled_verifier}
  inference_engine/pipeline/*
  inference_engine/session/store
  sdks/python/kakeya/{__init__, errors}
  training/repr_align/*

That's the verifier-independent boundary, frozen post PR-N4.

Final state of the no-doubles cleanup
-------------------------------------

PR-N1 (#53): retired FakeVerifier hierarchy (tests/inference_engine/session/test_coordinator.py, test_generator.py, test_grpc_app.py FakeVerifier-using sections).
PR-N2 (#54): retired DeterministicEngine + DeterministicTokenizer (tests/inference_engine/scheduler/conftest.py + test_scheduler.py).
PR-N3 (#55): retired the HTTP shim cluster (server/conftest.py + 6 test files + their subtypes).
PR-N4 (this): retired the SDK conftest stub.

The integration suite at tests/integration/ now contains:

  test_inv3_session_determinism_gate.py  (PR-E1)
  test_coordinator_real.py               (PR-N1)
  test_generator_real.py                 (PR-N1)
  test_scheduler_real.py                 (PR-N2)
  test_http_shim_real.py                 (PR-N3)
  test_engine_real.py                    (PR-N3)
  test_tokenizer_real.py                 (PR-N3)
  test_streaming_real.py                 (PR-N3)
  test_sdk_real.py                       (PR-N4)

Linux verification
------------------

PYTHONPATH=.:sdks/python coverage run -m pytest <Linux gate paths>:
  649 passed (was 695 on main; -46 net = removed 46 SDK runtime tests, kept 9 SDK error-mapping tests).
  100% coverage on 999 stmts (was 1660 on main; -661 net stmts is all verifier-dependent modules now integration-only).

Mac M4 evidence (REQUIRED for merge)
------------------------------------

Per ADR 0008 §9: this PR's runtime correctness lives in the integration suite. Reviewer runs:

  bash scripts/review_pr_n4_on_mac.sh
  git add results/platform-tests/pr-n4-mac-*
  git commit -m 'Mac M4 review evidence for PR-N4'
  git push

Stack
-----

PR-N4 is branched off main, independent of PR-N1 (#53) / PR-N2 (#54) / PR-N3 (#55) at the file level. Conftests in tests/integration/ added by N1/N2/N3/N4 are file-disjoint from each other (each adds one fixture) but the file IS shared, so post-merge the four contributors' fixture defs need to be reconciled. The recommended merge order:

  1. PR-N1 (verifier doubles): adds conftest with marker hook
  2. PR-N2 (engine/tokenizer doubles): adds real_speculative_engine
  3. PR-N3 (HTTP shim doubles): uses real_speculative_engine
  4. PR-N4 (this, SDK doubles): adds real_grpc_runtime_address

If a different order lands first, the integration conftest needs a small merge to combine fixtures.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
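The auto-marking hook described in the commit message could look roughly like this. It is a sketch rather than the PR's conftest: `is_integration_test` is a hypothetical helper, `item.path` assumes pytest 7 or newer, and the `integration` marker is assumed to be registered via the `markers` ini option.

```python
from pathlib import PurePosixPath

# Directory whose tests should carry the "integration" marker.
_INTEGRATION_DIR = ("tests", "integration")


def is_integration_test(path) -> bool:
    """Return True if the path lies under a tests/integration/ directory."""
    parts = PurePosixPath(str(path).replace("\\", "/")).parts
    width = len(_INTEGRATION_DIR)
    return any(
        parts[i:i + width] == _INTEGRATION_DIR
        for i in range(len(parts) - width + 1)
    )


def pytest_collection_modifyitems(config, items):
    """pytest hook: mark every collected test under tests/integration/."""
    # The hook receives every collected item, so filter by file path.
    for item in items:
        if is_integration_test(item.path):
            # add_marker accepts a marker name as a plain string.
            item.add_marker("integration")
```

With the marker applied at collection time, `pytest -m integration` selects the suite and `pytest -m "not integration"` excludes it, without decorating each test by hand.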


…tion workflow

Closes the loop on automated GA gating. After PR-N1..N4 retired all verifier-protocol test doubles from the Linux gate, the integration suite (tests/integration/) became the binding correctness gate for runtime modules: inference_engine.session.coordinator, inference_engine.session.generator, inference_engine.scheduler.scheduler, inference_engine.server.{app,engine,tokenizer,streaming}, and kakeya.{client,session}. Until this PR, that suite ran manually via scripts/review_pr_*_on_mac.sh; PR-E2 wires it into CI on every PR labelled needs-mac-m4.

Three artifacts ship:

.github/workflows/integration.yaml
  +136 lines. Self-hosted runner workflow targeting [self-hosted, macOS, ARM64, kakeya-mac-m4]. Triggers on PR events when the needs-mac-m4 label is present, plus on workflow_dispatch for manual re-runs. Steps:
  1. Checkout (full history).
  2. Verify host shape (chip, memory, python version).
  3. Verify Qwen/Qwen3-0.6B is in HF cache (HF_HUB_OFFLINE=1 at test time: no downloads in CI; cache miss fails fast with a clear pre-warm command).
  4. pip install -e . + pytest dependencies (warm pip cache keeps this <30 s).
  5. pytest -m integration tests/integration/ (expected runtime 60-120 s on M4 with warm cache). 90-min timeout is a safety margin, not the operating point.
  6. Upload JUnit XML artifact.
  7. On failure, inline the test names + first-line error messages into the Action log so triage doesn't require downloading the artifact.
  Concurrency: cancel-in-progress per PR, so a new push supersedes the previous run.

.github/workflows/auto-label-mac.yaml
  +89 lines. pull_request_target workflow that auto-applies (or removes) the needs-mac-m4 label based on which paths the PR touches. Trigger paths:
    inference_engine/    runtime, scheduler, session, server
    sdks/                Python + TypeScript SDK
    proto/               wire contract
    tests/integration/   the integration suite itself
    kv_cache_proposer/   verifier + decoder
  Doc-only or CI-only PRs are NOT labelled; they skip the integration gate entirely, saving runner time. The label is automatically dropped if a subsequent push removes all verifier-dependent edits.

docs/ops/mac-m4-runner-setup.md
  +137 lines. Operator runbook for the self-hosted runner: hardware requirements (24 GB minimum, ~50 GB free disk), runner registration with the kakeya-mac-m4 label, HF cache pre-warm command (Qwen3-0.6B), Python toolchain setup, runtime expectations, cache hygiene cron, runner upgrade procedure, and failure triage steps.

CI workflow split rationale
---------------------------

The pre-existing .github/workflows/ci.yaml stays as the Linux gate (verifier-independent, runs on github-hosted ubuntu-latest, fires on every PR). PR-E2 adds integration.yaml as a SEPARATE workflow because:

  1. Self-hosted runners are slow / few; doc-only PRs shouldn't touch them.
  2. The integration gate is intentionally OPT-IN by label; ci.yaml is non-optional.
  3. Failure semantics differ: Linux gate failure blocks merge unconditionally; Mac M4 gate failure surfaces a structured report but the merge decision is a human one until v0.3.0 final ships.

Together the two workflows form the post-cleanup gating model:

  - Linux gate (ci.yaml): verifier-independent code; 100% coverage; every PR.
  - Mac M4 gate (integration.yaml): verifier-dependent code; binding GA gate; PRs touching runtime / SDK / proto / integration tests.

Stack
-----

PR-E2 is branched off main, independent of the cleanup PRs (#49, #50, #51, #52, #53, #54, #55, #56). The workflow doesn't fail at launch even before PR-E1 lands; it just won't find any tests under tests/integration/ until that PR is merged. Recommended merge order: cleanup PRs first (so the workflow has tests to run), then PR-E2.

Per ADR 0008 §9
----------------

PR-E2 ships ONLY workflow YAML + a runbook, no Python source changes. No Mac M4 evidence required for this PR (the workflow itself becomes the Mac M4 evidence machinery for ALL future PRs).

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
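The path-based labelling rule described for auto-label-mac.yaml amounts to a prefix check over the PR's changed files. The sketch below restates that rule in Python for illustration; the actual workflow is YAML and its implementation is not shown here, and `needs_mac_m4` and `reconcile_labels` are hypothetical names.

```python
# Trigger paths and label name as listed in the PR-E2 commit message.
TRIGGER_PREFIXES = (
    "inference_engine/",
    "sdks/",
    "proto/",
    "tests/integration/",
    "kv_cache_proposer/",
)
LABEL = "needs-mac-m4"


def needs_mac_m4(changed_paths) -> bool:
    """True if any changed file sits under a verifier-dependent trigger path."""
    return any(path.startswith(TRIGGER_PREFIXES) for path in changed_paths)


def reconcile_labels(current_labels, changed_paths) -> set:
    """Return the label set after applying or dropping the Mac M4 label."""
    labels = set(current_labels)
    if needs_mac_m4(changed_paths):
        labels.add(LABEL)
    else:
        # A later push that removes all verifier-dependent edits drops the label.
        labels.discard(LABEL)
    return labels


if __name__ == "__main__":
    print(reconcile_labels({"docs"}, ["docs/ops/mac-m4-runner-setup.md"]))
    print(reconcile_labels(set(), ["sdks/python/kakeya/client.py"]))
```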

cursoragent and others added 2 commits June 2, 2026 02:54

Merge remote-tracking branch 'origin/main' into AgentMemory/v030-pr-n…

f44cc80

…4-sdk-conftest-stub-cleanup-8e7f Co-authored-by: Cursor <cursoragent@cursor.com> # Conflicts: # .github/workflows/ci.yaml # tests/integration/conftest.py # tests/sdk/python/conftest.py

FluffyAIcode marked this pull request as ready for review June 2, 2026 04:06

FluffyAIcode merged commit e8e8415 into main Jun 2, 2026
5 of 6 checks passed

FluffyAIcode deleted the AgentMemory/v030-pr-n4-sdk-conftest-stub-cleanup-8e7f branch June 2, 2026 04:06

FluffyAIcode mentioned this pull request Jun 2, 2026

PR-E2 (ADR 0008 §6.5): self-hosted Mac M4 GitHub Actions integration workflow #57

Merged

4 tasks
