Skip to content

feat(modules/cargo): cargo/build + cargo/test — structured Rust toolchain wrappers#1502

Merged
joelteply merged 2 commits into
canaryfrom
feat/cargo-build-test-module
May 31, 2026
Merged

feat(modules/cargo): cargo/build + cargo/test — structured Rust toolchain wrappers#1502
joelteply merged 2 commits into
canaryfrom
feat/cargo-build-test-module

Conversation

@joelteply
Copy link
Copy Markdown
Contributor

Summary

Closes Priority 2 from PERSONA-AS-DEVELOPER-GAP.md: Rust iteration parity with TypeScript. Personas can now build + test their own scaffolded modules and get the same structured feedback density a human gets from npm run build:ts / cargo test.

What this adds

New stateless cargo ServiceModule at src/workers/continuum-core/src/modules/cargo/:

Command Signature Returns
cargo/build {package?, features?, release?, workingDir?, timeoutMs?} {success, errors: CargoMessage[], warnings: CargoMessage[], exitCode?, durationMs, error?}
cargo/test {package?, filter?, features?, libOnly?, release?, workingDir?, timeoutMs?} {success, passed, failed, ignored, measured, failures: string[], buildErrors: CargoMessage[], exitCode?, durationMs, error?}

Plus 6 ts-rs wire types: CargoBuildParams, CargoBuildResult, CargoTestParams, CargoTestResult, CargoMessage, CargoSpan.

Doctrine followed (per field manual)

  • §3 Module Design Template — typed envelope at entry/exit, ts-rs annotations, camelCase serde, optional fields elided
  • §4.1 Concurrency — stateless module (cargo handles its own target-dir locking; module-level lock unnecessary)
  • §4.2 Concurrency tests — multi-thread tokio (worker_threads = 4) fires 8 parallel real-cargo subprocess invocations; every result consistent
  • Three primitives — both commands are pure Commands. Streaming variants land when Stream cell shape ships (gap report priority 4)
  • Rethink-not-port — designed Rust-first; no TS predecessor

Sharp design decisions (the kinks the tests caught pre-merge)

Bug What broke Fix
parse_summary_counts got 0 every time Positional indices 0,1 broke on "ok. 22 passed" (token 0 is "ok.") Scan WITHIN each chunk for first <int> <label> pair
Failures-block exit condition Exited on lines containing : — but test names ARE module::test Enter on failures:, capture single-token lines containing ::, exit on next test result:
Repeated failures: blocks libtest emits TWO per failing binary (with stdout decorators + without) Capture from both, dedupe by first-seen order, skip ---- ... ---- markers
Subprocess pipe deadlock potential If child fills stdout buffer waiting for us Concurrent tokio tasks for stdout/stderr read alongside wait()
Runaway timeout Persona could hold substrate forever Hard-cap: build 15min, test 30min (configurable lower)

Composability with the grid (the alignment payoff)

Per the gap report's "later parts of the vision" section: both result envelopes are flat camelCase JSON, trivially serializable across airc's grid. A persona on Joel's M-series Mac can call cargo/test against a module a persona on a peer's RTX 5090 just authored — result envelope routes back on the same Commands/Events bus.

Once events/command-completed (gap report priority 3) lands, build/test attribution becomes observable in real time — closing the loop from "I built this" to "the grid knows I built this." See [[alignment-via-substrate-economics]].

Test plan (29/29 pass)

parse_build_messages (5):
  parse_build_extracts_errors_with_codes_and_spans ... ok
  parse_build_separates_warnings_from_errors ... ok
  parse_build_ignores_non_diagnostic_reasons ... ok
  parse_build_tolerates_non_json_lines ... ok
  parse_build_handles_diagnostic_without_primary_span ... ok

parse_test_output (5):
  parse_test_extracts_passing_counts_from_summary ... ok
  parse_test_captures_failure_names_in_order ... ok
  parse_test_aggregates_across_multiple_test_binaries ... ok
  parse_test_dedupes_failures_across_repeated_blocks ... ok
  parse_test_empty_output_returns_zero_counts_not_error ... ok

parse_summary_counts (2):
  summary_counts_handles_filtered_out_field ... ok
  summary_counts_handles_failed_verdict ... ok  ← caught the verdict-prefix bug

timeout (2):
  timeout_uses_default_when_none_provided ... ok
  timeout_clamps_to_max_when_request_exceeds_it ... ok

types (5): round-trip, defaults, optional-omission, lib_only, failure-order

dispatch (2):
  config_advertises_cargo_prefix ... ok
  handle_command_rejects_unknown_command_loud ... ok

end-to-end (1):
  end_to_end_subprocess_pipeline_works ... ok  ← real `cargo --version`

concurrency stress (1):
  concurrent_cargo_invocations_dont_corrupt_subprocess_pipeline ... ok
  ← 8 parallel real cargo invocations, multi-thread tokio

ts-rs exports (6): all 6 binding files generated

What this PR does NOT do

  • Does NOT add TS wrapper commands. Rust ServiceModule + IPC bridge is canonical per rust-is-the-core-node-is-the-shell
  • Does NOT stream output line-by-line. Returns single envelope at end. Streaming = gap report priority 4 (needs Stream cell shape)
  • Does NOT manage per-persona workspaces. Optional working_dir (default cwd); per-persona workspace isolation is orthogonal
  • Does NOT depend on libtest's unstable JSON output. Parses stable human-readable format. Can upgrade when JSON stabilizes
  • Does NOT scaffold via generate/module invocation. Hand-authored matching v2 template shape exactly. Future PR can swap in literal generator invocation

Stacks on

canary directly. Independent of any open chain.

References

Next up

Priority 3 from the gap report: events/command-completed — restores RTOS-brain doctrine by emitting completion events on the bus instead of forcing blocking polls. Largest scope of the three; touches dispatch hot path. Separate PR after this lands.

…hain wrappers

Closes Priority 2 from
[PERSONA-AS-DEVELOPER-GAP.md](docs/planning/PERSONA-AS-DEVELOPER-GAP.md):
Rust iteration parity with TypeScript. Personas can now build +
test their own scaffolded modules and get the same structured
feedback density Joel gets from `npm run build:ts` / `cargo test`.

# What this PR adds

New stateless `cargo` ServiceModule
(`src/workers/continuum-core/src/modules/cargo/`):

| Command | Signature | Returns |
|---|---|---|
| `cargo/build` | `{package?, features?, release?, working_dir?, timeout_ms?}` | `{success, errors: CargoMessage[], warnings: CargoMessage[], exit_code?, duration_ms, error?}` |
| `cargo/test` | `{package?, filter?, features?, lib_only?, release?, working_dir?, timeout_ms?}` | `{success, passed, failed, ignored, measured, failures: string[], build_errors: CargoMessage[], exit_code?, duration_ms, error?}` |

Plus 6 ts-rs-exported wire types: `CargoBuildParams`,
`CargoBuildResult`, `CargoTestParams`, `CargoTestResult`,
`CargoMessage`, `CargoSpan`.

# Doctrine followed (per [field manual](docs/architecture/COMMAND-INFRASTRUCTURE-FIELD-MANUAL.md))

- **Module Design Template §3** — typed `Params/Result` shapes with
  `#[derive(TS)]`, camelCase serde, optional fields with
  `#[serde(skip_serializing_if = "Option::is_none")]` + `#[ts(optional)]`
- **Concurrency doctrine §4.1** — module is stateless; cargo manages
  its own target-dir locking (concurrent invocations on the same
  target dir serialize at cargo's level; different target dirs stay
  parallel). When correctness lives BELOW the module, the
  module-level lock is unnecessary.
- **Concurrency doctrine §4.2** — multi-thread tokio stress test
  (`flavor = "multi_thread", worker_threads = 4`) fires 8 parallel
  real-cargo subprocess invocations through `run_with_timeout` and
  asserts every result is internally consistent (no plumbing
  corruption under concurrent spawn/wait).
- **Three primitives** — both commands are pure **Commands**
  (request/response). When the Stream cell shape lands (gap report
  priority 4), `cargo/build/stream` and `cargo/test/stream` can
  follow as line-by-line variants.
- **Rethink-not-port** — designed Rust-first; no TS predecessor.

# Sharp design decisions (the kinks the tests caught pre-merge)

1. **`parse_summary_counts` had to scan within each chunk** for the
   first `<int> <label>` pair, not require positional indices 0
   and 1. libtest's summary line includes a verdict prefix in the
   first chunk: `"ok. 22 passed; 1 failed"` or
   `"FAILED. 22 passed; 1 failed"`. Positional parsing got 0 every
   time. Test `summary_counts_handles_failed_verdict` pins it.

2. **Failures-block exit condition was wrong.** Initial impl exited
   on lines containing `:` — but test names ARE `module::path::test`
   which contains `::`. Fix: enter on `failures:`, capture single-
   token lines that contain `::` (strong "this is a Rust test
   name" heuristic), exit on next `test result:`. Test
   `parse_test_captures_failure_names_in_order` pins it.

3. **libtest emits TWO `failures:` blocks per failing binary** —
   first with `---- foo::b stdout ----` decorators + panic
   stdout, second with the bare test-name list. Parser captures
   from both forms (skipping decorator lines), then dedupes by
   first-seen order. Test
   `parse_test_dedupes_failures_across_repeated_blocks` pins it.

4. **Timeout clamping is hard-capped at substrate level.**
   `BUILD_MAX_TIMEOUT_MS = 900_000` (15 min); `TEST_MAX_TIMEOUT_MS
   = 1_800_000` (30 min). Higher values silently clamp — prevents
   a runaway persona from holding the substrate forever. Defaults
   (5min / 10min) cover typical iteration loops.

5. **Subprocess output captured concurrently with `wait()`.** Using
   tokio tasks for stdout/stderr read avoids the classic deadlock
   where the child fills its pipe buffer waiting for us to read
   while we wait for it to exit.

# Composability with the grid (the alignment payoff)

Per the gap report's "later parts of the vision" section: both
result envelopes are flat camelCase JSON, trivially serializable
across airc's grid. A persona on Joel's M-series Mac can call
`cargo/test` against a module a persona on a peer's RTX 5090 just
authored — result envelope routes back on the same Commands/Events
bus. The substrate already routes commands across peers; this PR
makes the wire shape grid-friendly.

See [[alignment-via-substrate-economics]] — once
`events/command-completed` (gap report priority 3) lands,
build/test attribution becomes observable in real time, closing
the loop from "I built this" to "the grid knows I built this."

# Tests (29/29 pass)

**parse_build_messages (5)** — fixture cargo JSON lines:
- E0382 with code + primary span + rendered
- Warnings separate from errors
- Non-diagnostic reasons skipped (compiler-artifact, build-finished)
- Non-JSON lines tolerated
- Diagnostic without primary span (linker errors)

**parse_test_output (5)** — fixture libtest output:
- All-pass summary extraction
- Failure-name capture in order
- Multi-binary aggregation (sum across summaries)
- Dedup across repeated failures blocks
- Empty output returns zero counts (vacuously success)

**parse_summary_counts (2)** — edge cases:
- "filtered out" tail field tolerated
- FAILED verdict prefix doesn't break positional parsing

**timeout (2)** — defaults + clamping to max

**types (5)** — camelCase round-trip, defaults, optional-omission,
lib_only flag, failure-order preservation

**dispatch (2)** — config advertises cargo/ prefix; unknown
command surfaces typed error

**end-to-end (1)** — real `cargo --version` subprocess pipeline

**concurrency stress (1)** — 8 parallel real `cargo --version`
invocations on multi-thread tokio, every result consistent

**ts-rs exports (6)** — wire bindings auto-generated

# What this PR does NOT do

- **Does NOT add TS wrapper commands.** Rust ServiceModule + IPC
  bridge is the canonical surface per `rust-is-the-core-node-is-the-shell`.
- **Does NOT stream output.** Returns single envelope at end.
  Streaming is gap report priority 4 — needs Stream cell shape
  implementation.
- **Does NOT manage per-persona workspaces.** Takes optional
  `working_dir` (default: process cwd). Per-persona workspace
  isolation is an orthogonal layer (`workspace/resolve` command
  for a future PR).
- **Does NOT depend on libtest's JSON output** (`-Z
  unstable-options`). Parses stable human-readable test output.
  When libtest stabilizes JSON output, can upgrade to structured
  per-test events in a follow-up.
- **Does NOT scaffold via `generate/module --stateful` invocation**
  for the dogfood demo. Hand-authored matching the v2 template
  shape exactly. A future PR can swap in a literal generator
  invocation as a build-time scaffold step.

# References

- [docs/planning/PERSONA-AS-DEVELOPER-GAP.md](docs/planning/PERSONA-AS-DEVELOPER-GAP.md)
  Priority 2 (this PR) — Priority 1 was code/exists+list+glob (#1501)
- [docs/architecture/COMMAND-INFRASTRUCTURE-FIELD-MANUAL.md](docs/architecture/COMMAND-INFRASTRUCTURE-FIELD-MANUAL.md)
  §3 (Module Design Template) + §4 (Concurrency doctrine)
- [docs/architecture/MODULE-CATALOG.md §0](docs/architecture/MODULE-CATALOG.md)
  — new `cargo` row to add when this lands
- Memories: [[three-primitives-commands-events-persona]],
  [[alignment-via-substrate-economics]],
  [[continuum-thesis-airc-is-the-medium]]

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ands actually dispatch

Adversarial PR review caught: `pub mod cargo;` was added to
`modules/mod.rs` but the production wire-up in `ipc::start_server`
never called `runtime.register(Arc::new(CargoModule::new()))`. Net
effect: `cargo/build` and `cargo/test` would return "Unknown
command — No module registered for this command prefix" at runtime.
The unit tests passed because they instantiate `CargoModule::new()`
directly and call `handle_command`, bypassing the runtime registry
entirely. The PR shipped dead code from the caller's perspective —
the title's deliverable didn't work end-to-end.

Fix: add the missing import + register call alongside the other
ServiceModule registrations in `ipc/mod.rs::start_server`, sandwich
between `ForgeModule` and `EventsModule` for consistency with the
existing ordering.

Per [[every-error-is-an-opportunity-to-battle-harden]]: the proper
substrate-level fix is a CI guard that asserts every `pub mod foo;`
in `modules/mod.rs` is paired with a `runtime.register(Arc::new(
FooModule::new()))` call somewhere in `ipc/mod.rs`. Filed as a
follow-up task — the dispatcher's silent miss on an
"Unknown command" prefix is exactly the class of bug that
mechanical checks should catch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
joelteply added a commit that referenced this pull request May 31, 2026
…event (#1503)

Closes Priority 3 from
[PERSONA-AS-DEVELOPER-GAP.md](docs/planning/PERSONA-AS-DEVELOPER-GAP.md):
restores the RTOS-brain doctrine ("handlers read pre-staged
results, never block on recall/embedding/planning") at the
dispatch layer. Every `CommandExecutor::execute()` now emits a
`command:completed` event on the wired bus after the dispatch
settles — subscribers consume completion events instead of
polling result surfaces.

# What this adds

## `CommandCompletedEvent` (new type)

```rust
pub struct CommandCompletedEvent {
    pub command_name: String,
    pub duration_ms: u64,
    pub success: bool,
    pub error: Option<String>,
}
```

- ts-rs exported to `shared/generated/runtime/CommandCompletedEvent.ts`
- camelCase wire shape, optional `error` elided on success
- Topic constant `COMMAND_COMPLETED_TOPIC = "command:completed"`
  centralized for publishers + subscribers + tests to share

## `CommandExecutor` extensions

- New `bus: Option<Arc<MessageBus>>` field
- Builder `with_message_bus(bus: Arc<MessageBus>) -> Self`
- New init function `init_executor_with_bus_and_interceptors(...)`
  for production startup; existing `init_executor` paths still work
  without a bus (telemetry no-ops)
- `execute()` wraps `execute_inner()` with timing + event emission
  — single `OnceLock`-set path for both production and back-compat

## `MessageBus` change

Added `command:` to the realtime passthrough list. The bus
coalesces non-realtime events with the same prefix in 50ms windows
to prevent floods from bulk ops — but command-completion events
violate the RTOS doctrine if coalesced (a persona's loop would
miss 31 out of 32 events under multi-persona load). Now flows
through uncoalesced, same as `chat:`, `sentinel:`, `presence:`,
`tool:`.

# Sharp design decisions (kinks the tests caught pre-merge)

1. **Coalescing dropped events under load.** Initial
   `concurrent_dispatches_each_emit_their_own_event` test asserted
   32 events from 32 concurrent dispatches — got 1. Root cause: the
   bus's 50ms coalescing window collapses same-prefix events. Fix:
   `command:` joins the realtime passthrough list. The test then
   confirms 32 distinct events arrive (with unique command_names,
   no event loss, no payload corruption).

2. **CommandResult doesn't impl Clone.** Test fixtures need to
   return the same canned result on repeated calls. Solution:
   `CannedModule` stores `Result<Value, String>` (cloneable) and
   wraps in `CommandResult::Json` on each handler call. No
   substrate change.

3. **Event emission is infallible telemetry, not contract.** The
   `emit_command_completed` helper publishes via `publish_async_only`
   (fire-and-forget) and silently logs serialize failures (which
   shouldn't happen for a struct of plain fields, but tolerated).
   Telemetry must never break the dispatch contract.

# Pinned invariants (multi-thread tests)

`runtime::command_executor::tests`:
- `dispatch_emits_completed_event_on_success` — happy path event with
  command_name + duration + success=true + no error
- `dispatch_emits_completed_event_on_handler_error` — failure path
  event with success=false + populated error mirroring the Err msg
- `dispatch_without_wired_bus_is_no_op_telemetry` — back-compat
  path (no bus) doesn't panic + dispatch still works
- `ts_bridge_failure_still_emits_completed_event` — third dispatch
  tier (TS bridge fallthrough) covered for both no-handler and
  failure paths; telemetry is exhaustive
- `concurrent_dispatches_each_emit_their_own_event` —
  `flavor = "multi_thread", worker_threads = 4`; 32 parallel
  dispatches each produce exactly one distinct event (no loss,
  no dupe, no payload interleave)

`runtime::command_events::tests`:
- `event_round_trips_through_wire_with_camel_case`
- `event_with_error_includes_error_on_wire`
- `event_parses_from_wire_shape_subscribers_will_see` — pin the
  exact JSON shape downstream consumers will see
- `topic_constant_is_namespaced_action_format`
- `export_bindings_commandcompletedevent` (ts-rs)

# What this PR does NOT do

- **Does NOT wire production startup to use the new init function.**
  `ipc::start_server` still calls `init_executor_with_interceptors`
  (no bus). A follow-up PR threads the runtime's bus through into
  startup. Safe: with no bus wired, the event emission is a silent
  no-op so production behavior is byte-identical until the wire
  lands.
- **Does NOT emit per-tier events** (interceptor handled vs local
  Rust vs TS bridge). One event per `execute()` call — the
  outermost outcome. Per-tier telemetry can be added later if a
  consumer needs it.
- **Does NOT emit `command:queued` / `command:dispatching`
  lifecycle events.** Just `command:completed`. The Stream cell
  shape (gap report priority 4) is the right home for in-flight
  progress events when it lands.
- **Does NOT add a default subscriber** (a persona loop that
  consumes these events). The substrate ships the publisher;
  consumers wire up per their use case via `bus.receiver()` or
  the existing `bus.subscribe()` path.

# Substrate doctrine reinforced

Per [[three-primitives-commands-events-persona]] +
[[alignment-via-substrate-economics]]: this PR composes the
Commands primitive (dispatch) with the Events primitive
(completion notifications) at the kernel layer. Personas now
have a substrate-level signal for "command X just finished with
outcome Y" — the foundation `code/shell/stream` (gap report
priority 4) extends with line-by-line streaming when the Stream
cell shape activates.

For the alignment economics: once peer dispatches over airc grid
also emit these events on the local bus (transparent via the
GridInterceptor → grid event echo), attribution becomes
substrate-observable across the grid. A peer's `cargo/build`
completing on their machine emits `command:completed` to your
local bus; your persona learns who built what, when.

# References

- [docs/planning/PERSONA-AS-DEVELOPER-GAP.md](docs/planning/PERSONA-AS-DEVELOPER-GAP.md)
  Priority 3 (this PR). Priority 1 was #1501, Priority 2 was #1502.
- [docs/architecture/COMMAND-INFRASTRUCTURE-FIELD-MANUAL.md](docs/architecture/COMMAND-INFRASTRUCTURE-FIELD-MANUAL.md)
  §2 (Substrate primitives) — adds the dispatch-level event hook
- [MODULE-CATALOG.md §0](docs/architecture/MODULE-CATALOG.md) —
  runtime substrate row to add when this lands
- Memories: [[three-primitives-commands-events-persona]],
  [[alignment-via-substrate-economics]],
  [[rtos-brain-no-region-on-hot-path]]

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@joelteply joelteply merged commit e5b0ce1 into canary May 31, 2026
4 checks passed
@joelteply joelteply deleted the feat/cargo-build-test-module branch May 31, 2026 02:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant