feat: Bindu Gateway + bug-tracking infrastructure#463
Conversation
Adds a new TypeScript/Bun workspace at `gateway/` — a task-first
orchestrator that accepts { question, agents[], preferences? } from an
external caller, plans the work with an LLM, calls downstream Bindu
agents over A2A, and streams results back as SSE.
Based on the plan at gateway/plans/ — calibrated against live Bindu
agents via Phase 0 dry-run fixtures captured in scripts/dryrun-fixtures/.
## Phase 0 — Protocol dry-run (scripts/)
- scripts/bindu-dryrun.ts: end-to-end polling client against a local
Bindu echo agent. Captures AgentCard, DID Doc, Skills, Negotiation,
message/send response, and terminal Task including signatures.
- Fixtures in scripts/dryrun-fixtures/echo-agent/ drive Phase 1 Zod
schemas and integration tests.
## Phase 1 — Gateway implementation (gateway/)
Runtime: TypeScript on Bun/Node 22, Effect 4 beta, Hono 4.10,
@supabase/supabase-js 2.58, AI SDK 6, @noble/ed25519 + bs58.
### What's fresh (Bindu-native)
- bus/ typed event bus (Effect Service + PubSub)
- config/ hierarchical config loader with env overrides
- db/ Supabase adapter (sessions, messages, tasks)
- auth/ keystore on disk for downstream credentials
- permission/ wildcard ruleset evaluator
- provider/ thin AI SDK wrapper (Anthropic, OpenAI)
- tool/ Tool.define + scoped registry
- skill/ .md + YAML frontmatter loader
- agent/ agent.md loader + Agent.Info schema
- session/ 9 files — message types, session service,
streamText wrapper, THE LOOP, compaction,
summary, overflow detection, revert
- bindu/protocol/ Zod for Message, Part, Artifact, Task,
HistoryMessage, AgentCard, DID Document,
JSON-RPC envelope, BinduError with code
classification (auth, schema-mismatch, etc.)
- bindu/identity/ ed25519 bootstrap + verify + verifyArtifact
+ DID resolver with TTL cache
- bindu/auth/ PeerAuth (none | bearer | bearer_env) → headers
- bindu/client/ HTTP transport + message/send + tasks/get poll
loop with camelCase-first + -32700/-32602 flip
retry + signature verification when trust.verifyDID
- bindu/index.ts barrel (imports identity first to trigger bootstrap)
- planner/ agent catalog → dynamic tools, orchestrates
SessionPrompt.prompt with compactIfNeeded hook
- api/plan-route.ts POST /plan — bearer auth, Zod request validation,
SSE emitter for session/plan/task.*/final/done
- server/ Hono shell + /health
- index.ts Layer graph (Config → DB/Provider/Agent → Session
→ SessionPrompt/SessionCompaction → Planner) +
ManagedRuntime boot
### What's copied from OpenCode (trimmed, vendored @opencode-ai/shared)
- effect/, util/, id/, global/, _shared/ — Effect runtime glue,
logger, filesystem helpers, ID generators, XDG paths, error types.
~3400 lines of generic infra we don't need to re-derive.
### Migrations
- 001_init.sql: gateway_sessions, gateway_messages, gateway_tasks
with RLS (service-role bypass)
- 002_compaction_revert.sql: compacted/reverted flags on messages
and tasks, compaction_summary on sessions, partial indexes for
active-row lookups
### Tests
Three test files, 20 tests total, all passing:
- tests/bindu/protocol.test.ts (12): fixture parsing, casing normalize,
DID parse, error code classification
- tests/bindu/identity.test.ts (4): REAL signature verification against
Phase 0 echo agent artifact, tamper detection
- tests/bindu/poll.test.ts (4): mock-fetch polling scenarios (submitted
→ working → completed, -32700 casing flip, input-required needsAction,
-32013 InsufficientPermissions)
## Plan documents (gateway/plans/)
- PLAN.md: master plan — architecture, protocol wire spec, config
schema, fork-and-extract plan, risks
- phase-0..5 detail files: preconditions, work breakdown, code
sketches, test plans, phase-specific risks, exit gates
- README.md: index
## What's not done yet (future commits)
- Day 10: E2E tests + demo docker-compose + README top-level
- Phase 2: reconnect, tenancy/RLS enforcement, circuit breakers,
rate limits, observability
- Phase 3: inbound Bindu server + DID signing + mTLS
- Phase 4: registry + trust scoring + cycle limits
- Phase 5: payments, negotiation orchestrator, push notifications
## Statistics
- 128 files, 16,504 insertions
- src/ = ~8700 lines TypeScript, tsc --noEmit green
- 20 tests passing (vitest)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t + README
Wraps Phase 1. The gateway now has a runnable quickstart, a CI-friendly
integration test against an in-process mock Bindu agent, and a project
README for onboarding.
## Added
- tests/helpers/mock-bindu-agent.ts — in-process HTTP server implementing
the minimum Bindu A2A wire surface (.well-known/agent.json, message/send,
tasks/get, tasks/cancel). Configurable respond() function; binds a random
port per invocation.
- tests/integration/bindu-client-e2e.test.ts — 3 tests that spin up the
mock agent and exercise sendAndPoll end-to-end. Covers:
- message/send → tasks/get round-trip yields the expected artifact
- respond() transform runs server-side (uppercase)
- snake_case context_id on the wire normalizes to camelCase contextId
on the parsed Task (Phase 0 finding validated in CI)
- gateway/README.md — quickstart, prerequisites, Supabase migration steps,
architecture overview, test matrix, repo layout, license note.
## Totals
- 23/23 tests passing (vitest)
- tsc --noEmit green
- src/ = ~8700 lines TypeScript
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Concurrent /plan requests shared the global event bus with no filter, so subscribers in one request's SSE stream received text.delta, task.started, task.artifact and final frames from every other in-flight plan. In multi-tenant deployments this was a cross-tenant information disclosure. Split Planner.startPlan into prepareSession + runPlan so the /plan handler learns sessionID BEFORE opening the SSE stream. Every bus.subscribe() is then piped through Stream.filter((e) => e.properties.sessionID === ...), so each request only ever sees its own session's frames. The session row is now emitted as the first SSE event (previously last), letting clients correlate every subsequent frame from the start. Adds tests/api/plan-route-filter.test.ts — two concurrent subscribers with different session IDs; each must see only its own deltas. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
spawnReader in plan-route.ts called Stream.runForEach on an infinite
PubSub-backed stream with no termination condition. The
`ac.signal.aborted` guard inside the callback only suppressed SSE writes;
the underlying fiber kept pulling events from the PubSub forever. Each
/plan request leaked five such fibers plus five PubSub subscriptions,
which accumulated linearly with request volume.
Introduce an `abortEffect(signal)` helper that converts an AbortSignal
into an Effect that resolves when the signal fires (via
`Effect.callback`, the Effect 4.0 replacement for `Effect.async`). Pipe
every reader stream through `Stream.interruptWhen(abortEffect(signal))`
so the fiber terminates cleanly when the handler's `finally { ac.abort() }`
runs — releasing the PubSub subscription and freeing closure-captured
state.
Drops the prior 100ms setTimeout flush hack from the success path; the
interrupt now gates the lifecycle deterministically.
Extends tests/api/plan-route-filter.test.ts with a new case that forks a
reader, publishes an event, aborts the signal, and awaits the fiber. If
interruptWhen is broken, the await hangs and Vitest fails on timeout.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gateway_tasks.remote_task_id (and its index) were always NULL. The column exists to correlate the gateway's internal task row with the peer-assigned task id returned by the downstream Bindu agent — the ID that appears in the peer's own logs and is required for tasks/cancel, resume, or any cross-system debugging. recordTask runs BEFORE the peer has issued an id, so the column stays NULL at insert time. finishTask runs AFTER the peer has responded and has the id in outcome.task.id, but the interface had no field for it — so the update never wrote it through. Adds `remoteTaskId?: string` to FinishTaskInput, writes it into the update patch when supplied, and captures `outcome.task.id` in the planner tool path so every successful Bindu call produces an audit row keyed to the peer's task id. Typecheck-only verification; this is pure plumbing — the interface change guarantees the field flows end-to-end at compile time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Session compaction overwrote gateway_sessions.compaction_summary on every run with only the summary of the newly-added messages. Run #1 summarized turns 1–N into paragraph A; run #2 summarized turns N+1–M into paragraph B, which REPLACED A wholesale. Any load-bearing fact captured in A was permanently lost — long sessions progressively forgot early context (user's original goal, early agent results, translations, pinned facts). An additional latent bug: session.history() prepends the prior summary as a synthetic user-message with a freshly-minted UUID. On pass #2 that synthetic would land in `head` and be re-summarized as part of the body, paraphrase-of-a-paraphrase style; and the subsequent `UPDATE ... WHERE id IN (head_ids)` was a silent no-op for the synthetic id. Fix: - summarize() grows an optional `priorSummary?: string | null`. When present, it's injected as a leading user message tagged `[PRIOR SUMMARY — preserve every fact below]`. - The system prompt gains an explicit fact-preservation clause and "new summary must be a SUPERSET of the prior summary" language. - Closing instruction switches to a union-with-prior variant when a non-empty prior summary is present. - Whitespace-only prior summary is treated as absent (single `hasPrior` flag gates both the marker block and the closing prompt — first regression test caught this edge case). compaction.runCompaction: - Filters synthetic messages out of history before splitHead, so the no-op UPDATE path is gone and the prior summary is not rewritten as part of head. - Reads compaction_summary directly from the session row before summarizing and passes it as priorSummary. - No-ops cleanly when there's nothing new to fold in, avoiding redundant LLM calls that would just re-paraphrase the same content. Overwriting the column is now safe because the new summary is constructed as a SUPERSET of the old one. Adds tests/session/summary.test.ts — three cases covering marker injection, closing-prompt variant selection, and whitespace handling. Verified the "with priorSummary" test catches the regression by temporarily reverting summary.ts and re-running. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
splitHead did a raw `history.slice(0, history.length - keepTail)`. A
single planner turn (user → assistant-with-tool_use → tool_result → ...
→ final-assistant) can span far more than 4 messages — three-tool turns
run 8 messages, ten-tool turns run 22. The naive cut routinely landed
INSIDE a turn, stranding an assistant tool_use in `head` whose matching
tool_result was kept verbatim in `tail`.
On the very next model call the provider (Anthropic, OpenAI) rejected
the request with "tool_use / tool_result mismatch" — a 400 error that
the planner cannot retry its way out of, because the DB state is already
broken (head rows are flagged compacted=true). The session was dead
until someone manually cleared the flag.
Fix: walk LEFT from the naive cut until the message at the split point
is a user turn. Since a user message starts a new turn by definition,
the invariant is that every assistant tool_use is in the same half as
its tool_result. `keepTail` becomes a MINIMUM — we keep more in tail to
reach a safe boundary, never fewer. If no user message exists left of
the naive cut (entire history is one unbroken turn), we bail with
{head: [], tail: history} rather than break the pairing.
Adds tests/session/compaction-split.test.ts — five cases covering
tool-heavy turns near the tail, single-unbroken-turn histories, and
the general keepTail invariant. Verified regression-catching: with the
old raw-slice algorithm restored, two cases fail (mid-turn cut and
tool-pair-terminal turn).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two concurrent /plan requests on the same session_id both triggered compactIfNeeded. Each read the same pre-compaction history, each called the summarizer LLM (doubling cost), and each UPDATE'd gateway_sessions.compaction_summary — last writer wins. Because LLMs are non-deterministic even at low temperature, the two summaries diverged and whichever paragraph lost the race silently dropped its facts from session state. The head-row UPDATE (`SET compacted=true`) is idempotent so that part was harmless, but the summary-column race was a real data-loss path. Fix: application-layer promise dedupe. A per-process Map<SessionID, Promise<CompactOutcome>> records the in-flight compaction for each session. Second caller finds the existing entry and awaits THE SAME promise — no second LLM call, no second UPDATE, both callers receive the identical CompactOutcome. The map entry is cleared in a finally block, so a resolved (or failed) compaction does not block the next one. Limitation: this is per-process state. A horizontally-scaled gateway fronting a single Supabase could still race across processes. Noted in code comment; Phase 2 can add a Postgres version column or stored-proc-wrapped compaction for cross-process safety. Single-process Phase 1 is correct today. Adds tests/session/compaction-dedupe.test.ts — four cases covering the happy path (same promise reused), post-settle behavior (next call kicks off fresh producer), per-session isolation (different keys don't share), and error-path recovery (rejected promise clears the entry so retry works). Verified the happy-path test catches the regression by disabling the map lookup and re-running. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
plan-route.ts authenticated incoming requests with
`authConfig.tokens.includes(token)`, which compares strings byte-by-byte
via `===` and short-circuits on the first mismatch. The time difference
between "first byte matched" and "first byte didn't" is observable over
the network; with enough samples an attacker can recover a bearer token
byte-by-byte. Iterating the tokens array with a short-circuiting match
additionally leaks which token in the list was a prefix of the guess.
Replace with a constant-time validator:
1. SHA-256 both the provided token and each configured token.
Hashing normalizes inputs to 32 bytes — removes the length leak
and lets timingSafeEqual run without throwing on unequal-length
buffers.
2. Run timingSafeEqual against EVERY entry even after a match. Total
time becomes O(tokens.length), independent of which token matched
or whether any did.
3. OR the results into a single boolean at the end.
Exported from plan-route.ts as validateBearerToken so it can be tested
directly. The call site in handleRequest() replaces the `includes`
check — no behavior change for valid inputs, no timing leak for
invalid ones.
Adds tests/api/bearer-token.test.ts — six cases covering: single-token
match, unknown rejection, empty config (no default-accept), tokens of
vastly different lengths (length not leaked), exact-match semantics
(no prefix/suffix/case hits), and a loose timing-variance check that
runs 10k iterations each of a "byte-0 match" and a "byte-0 mismatch"
guess and asserts their ratio stays under 3x. The old includes()
would fail that last test because character-by-character compare
amplifies the byte-depth difference over thousands of iterations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Establishes a three-way split for tracking bugs across the repo:
- GitHub Issues: volatile, where triage and status live. Source of
truth for IS THIS FIXED YET.
- bugs/*.md: durable postmortems for bugs that taught us something
about a class of failure. One file per bug, indefinite retention.
Required template with Symptom / Root cause / Fix sections;
Why-tests-missed and Class-of-bug sections strongly encouraged
(not CI-enforced yet — medium strictness by intent).
- docs/known-issues.md: user-facing heads-up for current limitations.
Entries are REMOVED as issues close; this file grows only for
things that aren't planned to be fixed soon.
Adds bugs/README.md with the template and inclusion rules.
Seeds bugs/ with six postmortems for the critical and security bugs
resolved in commits 484b6b8 through 857197a (the gateway code-review
pass):
- sse-cross-contamination: bus.subscribe without tenancy filter
- spawnreader-fiber-leak: Stream.runForEach on infinite stream,
AbortSignal check inside callback only skipped writes
- compaction-lossy-second-pass: overwriting a lossy-compressed
column compounds loss; must merge-then-write
- compaction-mid-turn-cut: raw-index slice on a message list with
semantic (turn) boundaries broke tool_use/tool_result pairing
- compaction-concurrent-races: non-idempotent UPDATE on a shared
row had no dedupe; LLM non-determinism made the race silent
- timing-unsafe-token-compare: .includes() on secrets short-
circuits on first mismatch, recoverable byte-by-byte via timing
Skipped a postmortem for the remote_task_id fix (commit ad4f1b5)
— pure plumbing with no generalizable lesson beyond the commit
message.
Seeds docs/known-issues.md with thirteen still-open limitations
surfaced by the same review pass but not yet fixed (context-window
hardcoded, abort-signal propagation to the Bindu client, permission
rules not enforced for tool calls, tool-name collisions, agent-
catalog overwrite, signature verification semantics, pagination
truncation, TTL cleanup, rate limiting, token estimation accuracy,
DID resolver stampede, bearer-env error collapse, and the known
single-process-only limitation of the compaction-dedupe fix).
Structure is repo-wide by intent: bugs/ sits at the top level with
`area:` frontmatter tagging the subsystem (gateway/api, gateway/
session, bindu/core, sdks/typescript). docs/known-issues.md has
per-subsystem sections. Single archive across Python core, gateway,
SDKs, and frontend.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Expands docs/known-issues.md from 13 to 36 gateway entries, covering
every unfixed item surfaced in the original code review — not just
the subset seeded with the initial commit.
Additions by severity:
high (new):
- poll-budget-unbounded-wall-clock: sendAndPoll's 60 × 10s backoff
means one hung peer stalls a tool call for up to 5 min with no
overall plan deadline.
- no-session-concurrency-guard: two /plan requests on the same
session_id interleave writes to gateway_messages; compaction
dedupe doesn't cover plan turns themselves.
medium (new):
- tool-input-sent-as-textpart: JSON.stringify(args) goes as
TextPart; skills expecting DataPart fail silently or partially.
- prompt-injection-scrubbing-theater: the regex strip of
"ignore previous" etc. is trivially bypassable and gives false
confidence; real defenses listed in the workaround.
- did-resolver-no-key-id-selection: primaryPublicKeyBase58 picks
first key, wrong during rotation windows.
- no-graceful-shutdown: httpServer.close + runtime.dispose drops
in-flight SSE streams mid-frame; no drain, no deadline.
- assistant-message-lost-on-stream-error: LLM stream errors drop
the partially-completed assistant row even when tool calls have
already been billed. gateway_tasks and gateway_messages drift.
- json-schema-to-zod-incomplete: enum, oneOf, pattern, numeric
bounds, additionalProperties are ignored — planner LLM gets no
signal about valid values.
medium (expanded):
- tool-name-collisions-silent: added note about
parseAgentFromTool's non-greedy regex mis-parsing agent names
that contain underscores (breaks the task.started SSE event).
low (new):
- resume-race-duplicate-session: concurrent first-request on a
fresh session_id hits the UNIQUE constraint on the second
insert; caller sees 500, retry resolves.
- cancel-casing-not-retried: poll-exhaust cancel uses camelCase
only; peers requiring snake_case get a silent leak.
- health-endpoint-no-dependency-probe: /health returns 200 even
if Supabase / provider are down.
- no-request-id-in-logs: no correlation ID; client/server log
joins rely on timestamp + peer URL.
- no-config-hot-reload: changes to agents/planner.md or config
require a full restart.
- resolve-env-limited-to-simple-var: only bare $VAR matches, not
${VAR}/suffix or default-value syntax.
- compaction-summary-injected-as-user-role: synthetic message
uses user role; system role (or tagged marker) would be safer.
- revert-millisecond-ties-nondeterministic: created_at boundary
ambiguity under sub-ms insertions.
- revert-doesnt-cancel-remote-tasks: local-only revert; peer-side
tasks continue consuming resources.
- empty-agents-catalog-no-400: agents: [] default accepted; LLM
attempts phantom tool calls instead of a clear 400.
- no-migration-rollback: migrations are forward-only; paired
down.sql does not exist.
nit (new):
- tasks-recorded-is-dead-state: planner populates an array that's
never returned or persisted.
- map-finish-reason-pointless-ternary: conditional type is always
`any`; simplify in a cleanup pass.
- db-effect-promise-swallows-errors: two paths in db/index.ts use
Effect.promise which treats rejection as defect, silently
resolving. Correctness-adjacent.
- test-coverage-gaps: consolidated entry enumerating the missing
end-to-end cases (concurrent plans, long-session compaction,
revert, non-English, >1000-row sessions, aborted requests,
snake_case cancel retry).
Not added (already fixed in prior commits):
- unused BusEvent import in plan-route.ts (fixed in 484b6b8)
- setTimeout(100) flush hack in plan-route.ts (fixed in 9e49d97)
Organizational change: entries are now grouped by severity header
(High / Medium / Low / Nits) within each subsystem, with a leading
note explaining the ordering convention.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collocates the two bug-tracking artifacts under a single top-level
folder:
bugs/
├── README.md (explains both formats)
├── known-issues.md (user-facing, current limitations)
├── 2026-04-18-sse-cross-contamination.md
├── 2026-04-18-spawnreader-fiber-leak.md
└── ... (dated postmortems, fixed bugs)
Rationale: `bugs/` and `docs/known-issues.md` were answering two
closely-related questions ("what's broken today" / "what broke
historically"); keeping them in separate folders meant readers had
to discover the split and maintainers had to update cross-references
every time. One folder, one README, one place to look.
File naming distinguishes them at a glance: dated `YYYY-MM-DD-*.md`
files are postmortems for FIXED bugs with indefinite retention;
`known-issues.md` is a single living file of CURRENT limitations
whose entries are REMOVED as the underlying issues get fixed.
Changes:
- git mv docs/known-issues.md → bugs/known-issues.md (rename
preserves history).
- Rewrote bugs/README.md intro to describe both artifacts and the
two questions they answer; updated the "Relationship to other
tracking" cross-reference to use a sibling path.
- Updated three postmortem files (compaction-concurrent-races,
compaction-lossy-second-pass, compaction-mid-turn-cut) that
referenced "docs/known-issues.md" in prose to use the new
sibling path.
- Fixed three internal links inside known-issues.md that pointed
at "../bugs/" (now "./" since it lives in the same folder).
No content change in known-issues.md or any postmortem — pure
reorganization + link fixups. Tested with
`grep -rn "known-issues" bugs/` to confirm no stale paths remain.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (2)
📒 Files selected for processing (134)
📝 WalkthroughWalkthroughThe PR introduces a complete Bindu Gateway service—a TypeScript/Bun HTTP server that accepts user questions, orchestrates multi-agent task execution via an LLM-based planner, persists conversation state in Supabase, and streams results back via Server-Sent Events. Implementation spans database schemas, Bindu protocol handling with signature verification, session management including compaction and revert, dynamic tool registration, LLM streaming integration, and comprehensive infrastructure. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
Post-merge CI on main failed on two pre-commit hooks after #463 landed: 1. detect-secrets: 4 high-entropy strings in gateway dry-run fixtures (echo-agent/NOTES.md, did-doc.json, final-task.json) and one hex expectation in gateway/tests/bindu/protocol.test.ts not present in .secrets.baseline. Plus a fifth flag on a docstring example URL in scripts/backfill_owner_did.py. 2. pydocstyle on scripts/backfill_owner_did.py: D301 (docstring contained literal backslashes without r"" prefix) and D103 (main() missing a docstring). Fixes: * .secrets.baseline — re-run `detect-secrets scan --baseline .secrets.baseline` to pick up the four new fixture/test locations. Entries land as unaudited (is_secret unset), same shape as the existing 22 baseline entries, so the pre-commit hook passes; the pre-push verify-secrets-audited hook is orthogonal and was not in the failing CI step. * scripts/backfill_owner_did.py — docstring made raw (``r"""``) so the ``\`` line-continuations in the usage examples no longer need doubling; backslashes now render as actual shell line continuations in ``--help`` output. main() gains a one-line docstring. The placeholder URL in the docstring keeps a ``# pragma: allowlist secret`` inline so it does not need a second baseline entry. Local checks: * detect-secrets-hook --baseline → exit 0 on the flagged files * pydocstyle scripts/backfill_owner_did.py → exit 0 * pytest tests/unit/ → 795 passed, 3 skipped Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What's in this branch
Lands the Bindu Gateway (a new top-level service that sits in front of peer agents and orchestrates planner-driven tool calls against them) plus the bug-tracking infrastructure this repo now uses to track fixes across the whole codebase.
Twelve commits, split into three layers:
Bug-tracking infrastructure (docs only)
79be162bugs/folder — README, postmortem template, six dated postmortems for gateway bugs surfaced in the initial review26fbd6dknown-issues.mdfrom the 13 initial gateway entries to 36, covering every unfixed item from the review1e26bd7known-issues.mdunderbugs/alongside postmortems — one folder answers both "what's broken today" and "what broke historically"Gateway initial landing
0e651dbae77e5dGateway follow-up fixes
Each of these corresponds to a postmortem under
bugs/2026-04-18-*.md:484b6b8sse-cross-contamination9e49d97spawnreader-fiber-leakad4f1b5remote_task_idin audit rowsbbb1474compaction-lossy-second-pass77603dacompaction-mid-turn-cut0655ac1compaction-concurrent-races857197atiming-unsafe-token-compareDownstream PRs that depend on this merge
Three fix branches are open against
mainthat each cherry-picked the three bug-tracking infra commits (sincebugs/doesn't exist on main yet). When this PR lands, each downstream PR's diff will collapse by those three commits via git's cherry detection on rebase:fix/task-ownership-idor— closes IDOR on task/context endpointsfix/did-signature-fail-open— DID middleware fail-openfix/types-populate-by-name— accept snake_case on input, camelCase on wireReview notes
bugs/are documentation-only and pair with the seven fix commits in this branch. Useful reading-order: open the postmortem that matches a given commit's subject to see the root cause analysis alongside the fix.bugs/README.mdexplains the schema for future postmortems.gateway/— no Python core behavior changes.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
POST /planHTTP endpoint for AI-powered task planning across multiple agentsDocumentation