test: regression coverage for code review fixes (CR-01/02/03, WR-04/05/07/08) #14
Merged
…, WR-08
New test files cover code paths that previously had no automated coverage:
tests/test_api.py (8 tests) — first tests for apps/api/main.py
- submit/get/list/delete endpoint happy paths and 404s
- CR-03: DELETE /tasks/{id} cancels the in-flight asyncio.Task
- CR-03: _run_task tolerates dict eviction mid-flight without KeyError
- WR-07: _tasks OrderedDict caps at _MAX_TASKS via LRU eviction
tests/test_blackboard.py (17 tests) — direct Blackboard coverage
- CR-01: memory_guidance is in the allowed_types set (parametrized over
all 11 documented types as a snapshot guard)
- WR-08: timestamps are tz-aware ISO 8601 (datetime.now(timezone.utc))
- WR-04 supporting: per-session prefix isolation under shared Redis
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
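The WR-08 timestamp check can be illustrated with the standard library alone. The helper names `utc_timestamp` and `is_tz_aware_iso8601` are hypothetical and are not taken from the Blackboard implementation; only `datetime.now(timezone.utc)` comes from the commit message.

```python
from datetime import datetime, timezone


def utc_timestamp() -> str:
    """Return the current time as a tz-aware ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def is_tz_aware_iso8601(value: str) -> bool:
    """True if value parses as ISO 8601 and carries a UTC offset."""
    parsed = datetime.fromisoformat(value)
    return parsed.tzinfo is not None and parsed.utcoffset() is not None
```

A naive timestamp such as `datetime(2024, 1, 1).isoformat()` parses without error but fails the offset check, which is the regression a test of this kind guards against.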
Locks in two fixes that shipped without coverage:

CR-02: test_concurrent_executions_do_not_corrupt_self_blackboard
Runs two execute() calls concurrently on a shared ThesisOrchestrator in DRY_RUN and asserts orch.blackboard reference + session_id are unchanged. Pre-fix, _execute_inner did `self.blackboard = ...` on every call, so the second concurrent call overwrote the first's reference mid-pipeline. Verified the test fails when the pre-fix assignment is reintroduced.

WR-05: test_unified_estimate_cost_includes_input_tokens
Asserts UnifiedLLM._estimate_cost reflects long prompt + system tokens, not just response length. Pre-fix, the estimate measured only the response string, so post-call BudgetGuard.record_actual systematically under-reported spend on the typical thesis-pipeline pattern (long prompts, short structured responses).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
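The shape of the CR-02 test can be sketched with a stand-in class. `Orchestrator` and `Blackboard` below are hypothetical simplifications, not the project's ThesisOrchestrator, and the call-local blackboard is an assumption about how the fix avoids reassigning `self.blackboard`.

```python
import asyncio


class Blackboard:
    def __init__(self, session_id: str) -> None:
        self.session_id = session_id


class Orchestrator:
    """Hypothetical stand-in for ThesisOrchestrator."""

    def __init__(self) -> None:
        self.blackboard = Blackboard("shared")

    async def execute(self, session_id: str) -> str:
        # Assumed post-fix pattern: work on a call-local blackboard
        # instead of reassigning self.blackboard on every call.
        board = Blackboard(session_id)
        await asyncio.sleep(0)  # yield so the two calls interleave
        return board.session_id


async def run_concurrently() -> tuple:
    orch = Orchestrator()
    before = orch.blackboard
    results = await asyncio.gather(orch.execute("a"), orch.execute("b"))
    # The shared reference and its session_id must be untouched.
    return before is orch.blackboard, orch.blackboard.session_id, results
```

Reintroducing `self.blackboard = board` inside `execute` makes the identity check fail, which mirrors the verification step described in the commit message.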
Adds 27 regression tests for fixes shipped in #12 that landed without test coverage. Test count: 123 → 150.
What's covered
`tests/test_api.py` (new, 8 tests): first tests for `apps/api/main.py`
`tests/test_blackboard.py` (new, 17 tests): direct Blackboard coverage
Extension to `tests/test_core.py`
Extension to `tests/test_budget.py`
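The WR-05 behaviour this extension guards can be illustrated as follows. This is a sketch under stated assumptions, not UnifiedLLM._estimate_cost itself: the four-characters-per-token heuristic, the function names, and the per-token rates are all hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Crude heuristic (assumed): roughly four characters per token."""
    return max(1, len(text) // 4) if text else 0


def estimate_cost(
    prompt: str,
    system: str,
    response: str,
    input_rate: float = 3e-6,    # hypothetical price per input token
    output_rate: float = 15e-6,  # hypothetical price per output token
) -> float:
    """Estimate spend from both input (prompt + system) and output tokens."""
    input_tokens = estimate_tokens(prompt) + estimate_tokens(system)
    output_tokens = estimate_tokens(response)
    return input_tokens * input_rate + output_tokens * output_rate
```

With a long prompt and a short structured response, the input term dominates, so an estimate based on the response string alone would under-report spend.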
Verification
🤖 Generated with Claude Code