test: regression coverage for code review fixes (CR-01/02/03, WR-04/05/07/08) by W00DSRULES · Pull Request #14 · DavidHavoc/openworkers

W00DSRULES · 2026-05-09T09:19:13Z

Adds 27 regression tests for fixes shipped in #12 that landed without test coverage. Test count: 123 → 150.

What's covered

`tests/test_api.py` (new, 8 tests) — first tests for `apps/api/main.py`

Submit/get/list/delete endpoint happy paths + 404s
CR-03 — `DELETE /tasks/{id}` cancels the in-flight `asyncio.Task` via the `_task_handles` dict
CR-03 secondary — `_run_task` tolerates the `_tasks` entry vanishing mid-flight without `KeyError`
WR-07 — `_tasks` `OrderedDict` is capped at `_MAX_TASKS` via LRU eviction at submit time

`tests/test_blackboard.py` (new, 17 tests) — direct Blackboard coverage

CR-01 — `memory_guidance` is a valid entry type. Parametrized snapshot test over all 11 documented types so future renames/removals are deliberate
WR-08 — Timestamps round-trip through `datetime.fromisoformat` with non-None tzinfo (catches naive `utcnow().isoformat() + 'Z'` regressions)
WR-04 supporting — Per-session prefix isolation works correctly under a shared Redis instance
Invalid type raises `ValueError`, `get_entries_by_type` filters correctly, etc.
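The WR-08 check can be illustrated with the standard library alone. This is a sketch of the idea rather than the test in `tests/test_blackboard.py`; `make_timestamp` and `is_tz_aware_iso` are hypothetical helper names introduced for the example.

```python
from datetime import datetime, timezone


def make_timestamp() -> str:
    """Return a tz-aware ISO 8601 timestamp in UTC."""
    return datetime.now(timezone.utc).isoformat()


def is_tz_aware_iso(value: str) -> bool:
    """True if value parses with datetime.fromisoformat and carries an offset."""
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return False
    return parsed.tzinfo is not None and parsed.utcoffset() is not None
```

A string such as `2026-05-09T09:19:13` parses but has no `tzinfo`, so it fails the check, while `2026-05-09T09:19:13+00:00` passes. Strings ending in `Z` are only accepted by `datetime.fromisoformat` from Python 3.11 onward, so how a `Z`-suffixed value behaves depends on the interpreter version.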

Extension to `tests/test_core.py`

CR-02 — `test_concurrent_executions_do_not_corrupt_self_blackboard`: runs two `execute()` calls concurrently on a shared `ThesisOrchestrator` and asserts `orch.blackboard` reference + `session_id` are unchanged. Verified that re-introducing the pre-fix `self.blackboard = Blackboard(session_id=session_id)` assignment makes this test fail.
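The failure mode behind CR-02 can be reproduced with a toy class. The sketch below is an assumption-laden stand-in, not `ThesisOrchestrator`: `Orchestrator`, `BuggyOrchestrator`, and `run_two` are hypothetical names, and `asyncio.sleep(0)` stands in for the awaits inside the real pipeline.

```python
import asyncio


class Blackboard:
    """Toy stand-in holding only a session id."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id


class Orchestrator:
    """Post-fix pattern: per-call state stays in a local variable."""

    def __init__(self) -> None:
        self.blackboard = Blackboard(session_id="default")

    async def execute(self, session_id: str) -> str:
        board = Blackboard(session_id=session_id)
        await asyncio.sleep(0)  # yield so the two calls interleave
        return board.session_id


class BuggyOrchestrator(Orchestrator):
    """Pre-fix pattern: every call reassigns the shared attribute."""

    async def execute(self, session_id: str) -> str:
        self.blackboard = Blackboard(session_id=session_id)
        await asyncio.sleep(0)  # another call can overwrite self.blackboard here
        return self.blackboard.session_id


async def run_two(orch: Orchestrator) -> tuple:
    """Run two execute() calls concurrently and report what happened."""
    before = orch.blackboard
    results = await asyncio.gather(orch.execute("a"), orch.execute("b"))
    return orch.blackboard is before, list(results)
```

With `Orchestrator`, `asyncio.run(run_two(...))` reports that the shared reference is unchanged and each call sees its own session. With `BuggyOrchestrator`, the reference changes and the first call reads the second call's session id, which is the kind of corruption the regression test is meant to catch.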

Extension to `tests/test_budget.py`

WR-05 — `test_unified_estimate_cost_includes_input_tokens`: asserts `_estimate_cost` reflects long prompt + system tokens, not just response length.
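To show why counting input tokens matters, here is a rough sketch of a cost estimate that includes them. It is not `UnifiedLLM._estimate_cost`: the four-characters-per-token heuristic, the function names, and both prices are placeholder assumptions chosen only to make the arithmetic visible.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (an assumption): about four characters per token."""
    if not text:
        return 0
    return max(1, len(text) // 4)


def estimate_cost(
    prompt: str,
    system: str,
    response: str,
    input_price_per_1k: float = 0.003,   # placeholder price, not a real rate
    output_price_per_1k: float = 0.015,  # placeholder price, not a real rate
) -> float:
    """Estimate spend from prompt, system, and response text."""
    input_tokens = estimate_tokens(prompt) + estimate_tokens(system)
    output_tokens = estimate_tokens(response)
    return (
        input_tokens * input_price_per_1k + output_tokens * output_price_per_1k
    ) / 1000
```

For a 40,000-character prompt, a 4,000-character system message, and a two-character response, the input side dominates the estimate. An estimate based on the response alone would miss almost all of that spend, which is the under-reporting the WR-05 test guards against.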

Verification

150 tests pass (`pytest tests/`)
ruff E/W/F/I clean, black clean
CR-02 test confirmed to fail when the pre-fix `self.blackboard` assignment is re-introduced (regression-detection verified)

🤖 Generated with Claude Code

…, WR-08

New test files cover code paths that previously had no automated coverage:

tests/test_api.py (8 tests) — first tests for apps/api/main.py
- submit/get/list/delete endpoint happy paths and 404s
- CR-03: DELETE /tasks/{id} cancels the in-flight asyncio.Task
- CR-03: _run_task tolerates dict eviction mid-flight without KeyError
- WR-07: _tasks OrderedDict caps at _MAX_TASKS via LRU eviction

tests/test_blackboard.py (17 tests) — direct Blackboard coverage
- CR-01: memory_guidance is in the allowed_types set (parametrized over all 11 documented types as a snapshot guard)
- WR-08: timestamps are tz-aware ISO 8601 (datetime.now(timezone.utc))
- WR-04 supporting: per-session prefix isolation under shared Redis

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Locks in two fixes that shipped without coverage:

CR-02 — test_concurrent_executions_do_not_corrupt_self_blackboard
Runs two execute() calls concurrently on a shared ThesisOrchestrator in DRY_RUN and asserts orch.blackboard reference + session_id are unchanged. Pre-fix, _execute_inner did `self.blackboard = ...` on every call, so the second concurrent call overwrote the first's reference mid-pipeline. Verified the test fails when the pre-fix assignment is reintroduced.

WR-05 — test_unified_estimate_cost_includes_input_tokens
Asserts UnifiedLLM._estimate_cost reflects long prompt + system tokens, not just response length. Pre-fix, the estimate measured only the response string, so post-call BudgetGuard.record_actual systematically under-reported spend on the typical thesis-pipeline pattern (long prompts, short structured responses).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

imer and others added 2 commits May 9, 2026 11:18

W00DSRULES merged commit 63d05e9 into main May 9, 2026
5 checks passed

W00DSRULES merged 2 commits into main from test/concurrency-and-coverage
