fix(cognition): max_concurrency: usize::MAX panics tokio Semaphore at registration#1419
fix(cognition): max_concurrency: usize::MAX panics tokio Semaphore at registration#1419joelteply wants to merge 1 commit into
Conversation
…ation CognitionModule::config() set `max_concurrency: usize::MAX` as a mistaken way to spell "no cap" (PR #1306). The Runtime's `register()` calls `tokio::sync::Semaphore::new(max_concurrency)` whenever `max_concurrency > 0`; tokio's MAX_PERMITS is `usize::MAX >> 3` (~2^61), so `Semaphore::new(usize::MAX)` panics immediately at registration with: thread 'X' panicked at tokio-1.50.0/src/sync/batch_semaphore.rs:141:9: a semaphore may not have more than MAX_PERMITS permits (2305843009213693951) The Runtime's `register()` only instantiates a Semaphore when `max_concurrency > 0` — `0` is the canonical spelling of "unbounded, no semaphore" used by other modules (`inference-llm`, `RecordingModule` test fixtures, etc.). Switching to `0` preserves the original intent ("no cap on cognition; downstream ai_provider enforces the inference serialization") without panicking. This was a latent bug surfaced while building integration tests for the Lane D `persona/turn-execute` command (#1409) that go through `runtime.route_command`. Existing cognition tests bypass the panic by calling `module.handle_command` directly without going through `runtime.register`. * Add `registration_safety_tests::cognition_module_registers_without_semaphore_panic` as a regression guard: this test calls `Runtime::register` with CognitionModule, which panics if `max_concurrency > MAX_PERMITS`. Asserts `config.max_concurrency == 0` to make the dependency explicit for future readers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Superseded by #1417 — codex's PR carries the same max_concurrency: usize::MAX → 0 fix (commit 5bde37e) stacked with the Rust-only dispatch fix on the same Lane D issue. Per manager coordination on airc: keeping #1417 as the single canonical bug-fix PR for #1409 risks, so the bug-fix history stays clean and reviewable. My #1419 was orthogonal in concept but redundant in scope — landing both would just be churn. Closing. |
|
REOPENING — manager confirmed on airc that this Semaphore panic is the root cause for 'site does literally nothing' on canary. Every npm start hits Runtime::register → Semaphore::new(usize::MAX) → tokio MAX_PERMITS overflow → CognitionModule registration panic → worker dies. This standalone fix is the smallest fast-track patch to unblock canary; #1417 carries the same fix but bundled with the Rust-only dispatch fix which is broader scope. Landing this first restores npm start, then #1417 can land cleanly on top of fixed canary. |
Latent bug — CognitionModule panics at Runtime registration
Symptom
`CognitionModule::config()` sets `max_concurrency: usize::MAX` as a mistaken way to spell "no cap". The `Runtime::register()` path calls `tokio::sync::Semaphore::new(max_concurrency)` whenever `max_concurrency > 0`. tokio's MAX_PERMITS is `usize::MAX >> 3` (~2^61), so `Semaphore::new(usize::MAX)` panics immediately:
```
thread '...' panicked at tokio-1.50.0/src/sync/batch_semaphore.rs:141:9:
a semaphore may not have more than MAX_PERMITS permits (2305843009213693951)
```
Why it's been latent
Existing cognition tests bypass the panic by calling `module.handle_command` directly without going through `runtime.register`. The IPC path that production uses (`ipc/mod.rs:944` → `runtime.register(Arc::new(CognitionModule::new(...)))` → `runtime.route_command` for cognition commands) IS affected.
Surfaced while building Lane D `persona/turn-execute` integration tests that go through `runtime.route_command`.
Fix
```
```
The Runtime's `register()` only creates a Semaphore when `max_concurrency > 0` — `0` is the canonical spelling of "unbounded, no semaphore" used by other modules (`inference-llm`, the `RecordingModule` test fixtures, etc.). Preserves the original intent ("no cap on cognition; downstream ai_provider enforces inference serialization") without panicking.
Regression test
Added `registration_safety_tests::cognition_module_registers_without_semaphore_panic` that calls `Runtime::register` with CognitionModule. If `max_concurrency` is ever raised above tokio's MAX_PERMITS, this test panics at the original error message — easy breadcrumb back to this regression.
Provenance
Introduced in PR #1306 "perf(cognition): lift max_concurrency:1 cap so event fanout is not the gate". Author meant "no cap"; the right spelling for that in this codebase is `0`.
Validation
```
cd src/workers/continuum-core
cargo test --features metal,accelerate --lib modules::cognition
```
5/5 pass (4 pre-existing + 1 new regression test).
🤖 Generated with Claude Code