feat: Bindu Gateway + bug-tracking infrastructure #463

Merged
raahulrahl merged 13 commits into main from ft-gateway
Apr 18, 2026
@raahulrahl raahulrahl commented Apr 18, 2026

What's in this branch

Lands the Bindu Gateway (a new top-level service that sits in front of peer agents and orchestrates planner-driven tool calls against them) plus the bug-tracking infrastructure this repo now uses to track fixes across the whole codebase.

Twelve commits, split into three layers:

Bug-tracking infrastructure (docs only)

| Commit | Purpose |
| --- | --- |
| 79be162 | Bootstrap bugs/ folder: README, postmortem template, six dated postmortems for gateway bugs surfaced in the initial review |
| 26fbd6d | Expand known-issues.md from the 13 initial gateway entries to 36, covering every unfixed item from the review |
| 1e26bd7 | Collocate known-issues.md under bugs/ alongside postmortems, so one folder answers both "what's broken today" and "what broke historically" |

Gateway initial landing

| Commit | Purpose |
| --- | --- |
| 0e651db | Phase 0 dry-run + Phase 1 Days 1–9: the gateway itself |
| ae77e5d | Day 10: mock Bindu agent helper, E2E integration test, README |

Gateway follow-up fixes

Each of these except ad4f1b5 (pure plumbing, no postmortem) corresponds to a postmortem under bugs/2026-04-18-*.md:

| Commit | Fix | Postmortem |
| --- | --- | --- |
| 484b6b8 | Isolate SSE events per /plan session | sse-cross-contamination |
| 9e49d97 | Terminate SSE reader fibers on request end | spawnreader-fiber-leak |
| ad4f1b5 | Populate remote_task_id in audit rows | (none) |
| bbb1474 | Preserve prior summary across compaction passes | compaction-lossy-second-pass |
| 77603da | Land compaction cuts on turn boundaries | compaction-mid-turn-cut |
| 0655ac1 | Dedupe concurrent compactions per session | compaction-concurrent-races |
| 857197a | Constant-time bearer token compare | timing-unsafe-token-compare |

Downstream PRs that depend on this merge

Three fix branches are open against main, and each cherry-picked the three bug-tracking infra commits (since bugs/ doesn't exist on main yet). When this PR lands, rebasing each downstream branch onto main should shrink its diff by those three commits, because git rebase skips commits whose patches already exist upstream:

  • fix/task-ownership-idor — closes IDOR on task/context endpoints
  • fix/did-signature-fail-open — DID middleware fail-open
  • fix/types-populate-by-name — accept snake_case on input, camelCase on wire

Review notes

  • The six dated postmortems under bugs/ are documentation-only and pair with six of the seven fix commits in this branch (the remote_task_id fix has none). Suggested reading order: open the postmortem that matches a given commit's subject to see the root-cause analysis alongside the fix.
  • bugs/README.md explains the schema for future postmortems.
  • The gateway lives entirely under gateway/ — no Python core behavior changes.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Introduced Bindu Gateway service with POST /plan HTTP endpoint for AI-powered task planning across multiple agents
    • Added Server-Sent Events (SSE) for real-time streaming of planning progress, task execution, and results
    • Implemented persistent session management supporting multi-turn conversations and session resumption
    • Enabled dynamic agent skill discovery and execution with remote peer communication
    • Added identity verification using ed25519 signatures for secure agent authentication
  • Documentation

    • Added comprehensive guides, architecture specifications, and phased roadmap for gateway development

raahulrahl and others added 12 commits April 17, 2026 22:15
Adds a new TypeScript/Bun workspace at `gateway/` — a task-first
orchestrator that accepts { question, agents[], preferences? } from an
external caller, plans the work with an LLM, calls downstream Bindu
agents over A2A, and streams results back as SSE.

Based on the plan at gateway/plans/ — calibrated against live Bindu
agents via Phase 0 dry-run fixtures captured in scripts/dryrun-fixtures/.

## Phase 0 — Protocol dry-run (scripts/)

- scripts/bindu-dryrun.ts: end-to-end polling client against a local
  Bindu echo agent. Captures AgentCard, DID Doc, Skills, Negotiation,
  message/send response, and terminal Task including signatures.
- Fixtures in scripts/dryrun-fixtures/echo-agent/ drive Phase 1 Zod
  schemas and integration tests.

## Phase 1 — Gateway implementation (gateway/)

Runtime: TypeScript on Bun/Node 22, Effect 4 beta, Hono 4.10,
@supabase/supabase-js 2.58, AI SDK 6, @noble/ed25519 + bs58.

### What's fresh (Bindu-native)

- bus/               typed event bus (Effect Service + PubSub)
- config/            hierarchical config loader with env overrides
- db/                Supabase adapter (sessions, messages, tasks)
- auth/              keystore on disk for downstream credentials
- permission/        wildcard ruleset evaluator
- provider/          thin AI SDK wrapper (Anthropic, OpenAI)
- tool/              Tool.define + scoped registry
- skill/             .md + YAML frontmatter loader
- agent/             agent.md loader + Agent.Info schema
- session/           9 files — message types, session service,
                     streamText wrapper, THE LOOP, compaction,
                     summary, overflow detection, revert
- bindu/protocol/    Zod for Message, Part, Artifact, Task,
                     HistoryMessage, AgentCard, DID Document,
                     JSON-RPC envelope, BinduError with code
                     classification (auth, schema-mismatch, etc.)
- bindu/identity/    ed25519 bootstrap + verify + verifyArtifact
                     + DID resolver with TTL cache
- bindu/auth/        PeerAuth (none | bearer | bearer_env) → headers
- bindu/client/      HTTP transport + message/send + tasks/get poll
                     loop with camelCase-first + -32700/-32602 flip
                     retry + signature verification when trust.verifyDID
- bindu/index.ts     barrel (imports identity first to trigger bootstrap)
- planner/           agent catalog → dynamic tools, orchestrates
                     SessionPrompt.prompt with compactIfNeeded hook
- api/plan-route.ts  POST /plan — bearer auth, Zod request validation,
                     SSE emitter for session/plan/task.*/final/done
- server/            Hono shell + /health
- index.ts           Layer graph (Config → DB/Provider/Agent → Session
                     → SessionPrompt/SessionCompaction → Planner) +
                     ManagedRuntime boot

### What's copied from OpenCode (trimmed, vendored @opencode-ai/shared)

- effect/, util/, id/, global/, _shared/ — Effect runtime glue,
  logger, filesystem helpers, ID generators, XDG paths, error types.
  ~3400 lines of generic infra we don't need to re-derive.

### Migrations

- 001_init.sql: gateway_sessions, gateway_messages, gateway_tasks
  with RLS (service-role bypass)
- 002_compaction_revert.sql: compacted/reverted flags on messages
  and tasks, compaction_summary on sessions, partial indexes for
  active-row lookups

### Tests

Three test files, 20 tests total, all passing:
- tests/bindu/protocol.test.ts (12): fixture parsing, casing normalize,
  DID parse, error code classification
- tests/bindu/identity.test.ts (4): REAL signature verification against
  Phase 0 echo agent artifact, tamper detection
- tests/bindu/poll.test.ts (4): mock-fetch polling scenarios (submitted
  → working → completed, -32700 casing flip, input-required needsAction,
  -32013 InsufficientPermissions)

## Plan documents (gateway/plans/)

- PLAN.md: master plan — architecture, protocol wire spec, config
  schema, fork-and-extract plan, risks
- phase-0..5 detail files: preconditions, work breakdown, code
  sketches, test plans, phase-specific risks, exit gates
- README.md: index

## What's not done yet (future commits)

- Day 10: E2E tests + demo docker-compose + README top-level
- Phase 2: reconnect, tenancy/RLS enforcement, circuit breakers,
  rate limits, observability
- Phase 3: inbound Bindu server + DID signing + mTLS
- Phase 4: registry + trust scoring + cycle limits
- Phase 5: payments, negotiation orchestrator, push notifications

## Statistics

- 128 files, 16,504 insertions
- src/ = ~8700 lines TypeScript, tsc --noEmit green
- 20 tests passing (vitest)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t + README

Wraps Phase 1. The gateway now has a runnable quickstart, a CI-friendly
integration test against an in-process mock Bindu agent, and a project
README for onboarding.

## Added

- tests/helpers/mock-bindu-agent.ts — in-process HTTP server implementing
  the minimum Bindu A2A wire surface (.well-known/agent.json, message/send,
  tasks/get, tasks/cancel). Configurable respond() function; binds a random
  port per invocation.
- tests/integration/bindu-client-e2e.test.ts — 3 tests that spin up the
  mock agent and exercise sendAndPoll end-to-end. Covers:
    - message/send → tasks/get round-trip yields the expected artifact
    - respond() transform runs server-side (uppercase)
    - snake_case context_id on the wire normalizes to camelCase contextId
      on the parsed Task (Phase 0 finding validated in CI)
- gateway/README.md — quickstart, prerequisites, Supabase migration steps,
  architecture overview, test matrix, repo layout, license note.
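
The snake_case-to-camelCase normalization exercised by the integration test above can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual normalize module: the function names `snakeToCamel` and `normalizeKeys` are hypothetical, and the recursive walk over nested objects is an assumption.

```typescript
// Hypothetical helper: converts one snake_case key to camelCase.
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
}

// Hypothetical helper: recursively renames object keys, leaving values untouched.
// Assumes plain JSON input (objects, arrays, primitives).
function normalizeKeys(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => normalizeKeys(item));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[snakeToCamel(key)] = normalizeKeys(source[key]);
    }
    return out;
  }
  return value;
}
```

With this sketch, a wire payload carrying `context_id` would reach the parsed Task as `contextId`, while keys that are already camelCase pass through unchanged.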

## Totals

- 23/23 tests passing (vitest)
- tsc --noEmit green
- src/ = ~8700 lines TypeScript

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Concurrent /plan requests shared the global event bus with no filter, so
subscribers in one request's SSE stream received text.delta, task.started,
task.artifact and final frames from every other in-flight plan. In
multi-tenant deployments this was a cross-tenant information disclosure.

Split Planner.startPlan into prepareSession + runPlan so the /plan handler
learns sessionID BEFORE opening the SSE stream. Every bus.subscribe() is
then piped through Stream.filter((e) => e.properties.sessionID === ...),
so each request only ever sees its own session's frames.

The session row is now emitted as the first SSE event (previously last),
letting clients correlate every subsequent frame from the start.

Adds tests/api/plan-route-filter.test.ts — two concurrent subscribers with
different session IDs; each must see only its own deltas.
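
The per-session filter described above can be illustrated without Effect. The sketch below is an assumption-laden stand-in: the real code pipes `bus.subscribe()` through `Stream.filter`, whereas `Bus`, `BusEvent`, and `subscribeForSession` here are hypothetical names on a plain in-memory listener set. Only the `properties.sessionID` comparison is taken from the commit message.

```typescript
// Hypothetical event shape; only `properties.sessionID` comes from the commit message.
interface BusEvent {
  type: string;
  properties: { sessionID: string; [key: string]: unknown };
}

type Listener = (event: BusEvent) => void;

// Minimal in-memory stand-in for the shared (global) event bus.
class Bus {
  private listeners = new Set<Listener>();

  publish(event: BusEvent): void {
    this.listeners.forEach((listener) => listener(event));
  }

  // Registers a listener and returns an unsubscribe function.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}

// Subscribes to the shared bus but forwards only events for one session.
function subscribeForSession(bus: Bus, sessionID: string, onEvent: Listener): () => void {
  return bus.subscribe((event) => {
    if (event.properties.sessionID === sessionID) onEvent(event);
  });
}
```

Two handlers subscribed with different session IDs each see only their own frames, which is the property the new test file checks.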

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
spawnReader in plan-route.ts called Stream.runForEach on an infinite
PubSub-backed stream with no termination condition. The
`ac.signal.aborted` guard inside the callback only suppressed SSE writes;
the underlying fiber kept pulling events from the PubSub forever. Each
/plan request leaked five such fibers plus five PubSub subscriptions,
which accumulated linearly with request volume.

Introduce an `abortEffect(signal)` helper that converts an AbortSignal
into an Effect that resolves when the signal fires (via
`Effect.callback`, the Effect 4.0 replacement for `Effect.async`). Pipe
every reader stream through `Stream.interruptWhen(abortEffect(signal))`
so the fiber terminates cleanly when the handler's `finally { ac.abort() }`
runs — releasing the PubSub subscription and freeing closure-captured
state.

Drops the prior 100ms setTimeout flush hack from the success path; the
interrupt now gates the lifecycle deterministically.

Extends tests/api/plan-route-filter.test.ts with a new case that forks a
reader, publishes an event, aborts the signal, and awaits the fiber. If
interruptWhen is broken, the await hangs and Vitest fails on timeout.
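
The lifecycle idea behind this fix, tying a subscription to an AbortSignal so it is released rather than merely muted, can be sketched in plain TypeScript. This is not the Effect-based implementation from the commit; `readUntilAborted` and the `Subscribe` type are hypothetical names used only for illustration.

```typescript
type Listener<T> = (event: T) => void;
// Hypothetical subscription contract: registers a listener, returns an unsubscribe function.
type Subscribe<T> = (listener: Listener<T>) => () => void;

// Forwards events to `onEvent` until `signal` aborts, then unsubscribes and resolves.
// The returned promise plays the role of the reader fiber: awaiting it hangs
// if the abort never tears the subscription down.
function readUntilAborted<T>(
  subscribe: Subscribe<T>,
  signal: AbortSignal,
  onEvent: Listener<T>,
): Promise<void> {
  return new Promise<void>((resolve) => {
    if (signal.aborted) {
      resolve();
      return;
    }
    const unsubscribe = subscribe(onEvent);
    signal.addEventListener(
      "abort",
      () => {
        unsubscribe(); // release the subscription instead of just skipping writes
        resolve();
      },
      { once: true },
    );
  });
}
```

The contrast with the original bug is that a guard like `if (signal.aborted) return` inside `onEvent` would stop the writes but leave the listener registered.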

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gateway_tasks.remote_task_id (and its index) were always NULL. The column
exists to correlate the gateway's internal task row with the peer-assigned
task id returned by the downstream Bindu agent — the ID that appears in
the peer's own logs and is required for tasks/cancel, resume, or any
cross-system debugging.

recordTask runs BEFORE the peer has issued an id, so the column stays
NULL at insert time. finishTask runs AFTER the peer has responded and
has the id in outcome.task.id, but the interface had no field for it —
so the update never wrote it through.

Adds `remoteTaskId?: string` to FinishTaskInput, writes it into the
update patch when supplied, and captures `outcome.task.id` in the planner
tool path so every successful Bindu call produces an audit row keyed to
the peer's task id.
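
A rough sketch of the plumbing, assuming a simplified input shape: only `remoteTaskId` and the `remote_task_id` column come from the commit message, while `taskId`, `state`, and `buildFinishPatch` are hypothetical names for illustration.

```typescript
// Simplified stand-in for the real FinishTaskInput; only `remoteTaskId` is from the commit.
interface FinishTaskInput {
  taskId: string; // hypothetical: gateway-internal row id
  state: string; // hypothetical: terminal state to record
  remoteTaskId?: string; // peer-assigned task id, known only after the peer responds
}

// Builds the column patch for the UPDATE; remote_task_id is written only when supplied.
function buildFinishPatch(input: FinishTaskInput): Record<string, string> {
  const patch: Record<string, string> = { state: input.state };
  if (input.remoteTaskId !== undefined) {
    patch.remote_task_id = input.remoteTaskId;
  }
  return patch;
}
```

In the planner tool path the caller would pass `remoteTaskId: outcome.task.id`, so the audit row ends up keyed to the peer's id.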

Typecheck-only verification; this is pure plumbing — the interface change
guarantees the field flows end-to-end at compile time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Session compaction overwrote gateway_sessions.compaction_summary on every
run with only the summary of the newly-added messages. Run #1 summarized
turns 1–N into paragraph A; run #2 summarized turns N+1–M into paragraph
B, which REPLACED A wholesale. Any load-bearing fact captured in A was
permanently lost — long sessions progressively forgot early context
(user's original goal, early agent results, translations, pinned facts).

An additional latent bug: session.history() prepends the prior summary
as a synthetic user-message with a freshly-minted UUID. On pass #2 that
synthetic would land in `head` and be re-summarized as part of the body,
paraphrase-of-a-paraphrase style; and the subsequent `UPDATE ... WHERE
id IN (head_ids)` was a silent no-op for the synthetic id.

Fix:
  - summarize() grows an optional `priorSummary?: string | null`.
    When present, it's injected as a leading user message tagged
    `[PRIOR SUMMARY — preserve every fact below]`.
  - The system prompt gains an explicit fact-preservation clause and
    "new summary must be a SUPERSET of the prior summary" language.
  - Closing instruction switches to a union-with-prior variant when a
    non-empty prior summary is present.
  - Whitespace-only prior summary is treated as absent (single `hasPrior`
    flag gates both the marker block and the closing prompt — first
    regression test caught this edge case).
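
The `hasPrior` gating described in the list above might look roughly like this. It is a sketch under assumptions: the marker string is copied from the commit message, but `buildSummaryMessages`, the `ChatMessage` type, and both closing prompts are hypothetical placeholders rather than the real summary.ts contents.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Marker text copied from the commit message.
const PRIOR_MARKER = "[PRIOR SUMMARY — preserve every fact below]";
// Hypothetical closing prompts; the real wording is not shown in the PR.
const CLOSING_FRESH = "Summarize the conversation above.";
const CLOSING_WITH_PRIOR =
  "Summarize the conversation above. The new summary must be a superset of the prior summary.";

function buildSummaryMessages(head: ChatMessage[], priorSummary?: string | null): ChatMessage[] {
  // A single flag gates both the marker block and the closing prompt,
  // so a whitespace-only prior summary is treated as absent in both places.
  const hasPrior = typeof priorSummary === "string" && priorSummary.trim().length > 0;
  const prior: ChatMessage[] = hasPrior
    ? [{ role: "user", content: `${PRIOR_MARKER}\n${priorSummary}` }]
    : [];
  const closing: ChatMessage = {
    role: "user",
    content: hasPrior ? CLOSING_WITH_PRIOR : CLOSING_FRESH,
  };
  return [...prior, ...head, closing];
}
```

Deriving both decisions from one flag is what prevents the mismatch where the marker block is skipped but the union-with-prior closing prompt is still used.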

compaction.runCompaction:
  - Filters synthetic messages out of history before splitHead, so the
    no-op UPDATE path is gone and the prior summary is not rewritten as
    part of head.
  - Reads compaction_summary directly from the session row before
    summarizing and passes it as priorSummary.
  - No-ops cleanly when there's nothing new to fold in, avoiding
    redundant LLM calls that would just re-paraphrase the same content.

Overwriting the column is now safe because the new summary is
constructed as a SUPERSET of the old one.

Adds tests/session/summary.test.ts — three cases covering marker
injection, closing-prompt variant selection, and whitespace handling.
Verified the "with priorSummary" test catches the regression by
temporarily reverting summary.ts and re-running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
splitHead did a raw `history.slice(0, history.length - keepTail)`. A
single planner turn (user → assistant-with-tool_use → tool_result → ...
→ final-assistant) can span far more than 4 messages — three-tool turns
run 8 messages, ten-tool turns run 22. The naive cut routinely landed
INSIDE a turn, stranding an assistant tool_use in `head` whose matching
tool_result was kept verbatim in `tail`.

On the very next model call the provider (Anthropic, OpenAI) rejected
the request with "tool_use / tool_result mismatch" — a 400 error that
the planner cannot retry its way out of, because the DB state is already
broken (head rows are flagged compacted=true). The session was dead
until someone manually cleared the flag.

Fix: walk LEFT from the naive cut until the message at the split point
is a user turn. Since a user message starts a new turn by definition,
the invariant is that every assistant tool_use is in the same half as
its tool_result. `keepTail` becomes a MINIMUM — we keep more in tail to
reach a safe boundary, never fewer. If no user message exists left of
the naive cut (entire history is one unbroken turn), we bail with
{head: [], tail: history} rather than break the pairing.
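
A minimal sketch of the walk-left rule, assuming messages carry a `role` field and that tool results use a `"tool"` role (both assumptions; the real message types live in the session module). The function name matches the commit message, but the signature is a guess.

```typescript
interface HistoryMessage {
  role: "user" | "assistant" | "tool"; // assumed roles for illustration
}

function splitHead<T extends HistoryMessage>(
  history: T[],
  keepTail: number,
): { head: T[]; tail: T[] } {
  // Naive cut: keep the last `keepTail` messages in tail.
  let cut = Math.max(0, history.length - keepTail);
  // A cut is safe when it falls past the end or lands on a user message (start of a turn).
  const isTurnStart = (index: number): boolean =>
    index >= history.length || history[index]?.role === "user";
  // Walk left until the message at the split point starts a turn.
  while (cut > 0 && !isTurnStart(cut)) cut--;
  // cut === 0 means no safe boundary was found: keep everything in tail.
  if (cut === 0) return { head: [], tail: history };
  return { head: history.slice(0, cut), tail: history.slice(cut) };
}
```

Because the loop only ever moves the cut left, tail can grow beyond `keepTail` but never shrink below it, which is the "minimum" semantics described above.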

Adds tests/session/compaction-split.test.ts — five cases covering
tool-heavy turns near the tail, single-unbroken-turn histories, and
the general keepTail invariant. Verified regression-catching: with the
old raw-slice algorithm restored, two cases fail (mid-turn cut and
tool-pair-terminal turn).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two concurrent /plan requests on the same session_id both triggered
compactIfNeeded. Each read the same pre-compaction history, each called
the summarizer LLM (doubling cost), and each UPDATE'd
gateway_sessions.compaction_summary — last writer wins. Because LLMs
are non-deterministic even at low temperature, the two summaries
diverged and whichever paragraph lost the race silently dropped its
facts from session state.

The head-row UPDATE (`SET compacted=true`) is idempotent so that part
was harmless, but the summary-column race was a real data-loss path.

Fix: application-layer promise dedupe. A per-process
Map<SessionID, Promise<CompactOutcome>> records the in-flight
compaction for each session. Second caller finds the existing entry
and awaits THE SAME promise — no second LLM call, no second UPDATE,
both callers receive the identical CompactOutcome. The map entry is
cleared in a finally block, so a resolved (or failed) compaction
does not block the next one.
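
The promise-dedupe pattern can be sketched as below. `createDeduper` is a hypothetical helper name, and plain string keys stand in for SessionID; the actual compaction module may be structured differently.

```typescript
// Returns a function that runs `producer` at most once per key at a time.
// Concurrent callers with the same key share the same in-flight promise.
function createDeduper<T>(): (key: string, producer: () => Promise<T>) => Promise<T> {
  const inflight = new Map<string, Promise<T>>();
  return (key, producer) => {
    const existing = inflight.get(key);
    if (existing !== undefined) return existing;
    // Clear the entry whether the producer resolves or rejects,
    // so a settled compaction never blocks the next one.
    const promise = producer().finally(() => {
      inflight.delete(key);
    });
    inflight.set(key, promise);
    return promise;
  };
}
```

A second caller that arrives while the first compaction is running gets back the very same promise object, so the summarizer runs once and both callers observe the same outcome.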

Limitation: this is per-process state. A horizontally-scaled gateway
fronting a single Supabase could still race across processes. Noted
in code comment; Phase 2 can add a Postgres version column or
stored-proc-wrapped compaction for cross-process safety. Single-process
Phase 1 is correct today.

Adds tests/session/compaction-dedupe.test.ts — four cases covering the
happy path (same promise reused), post-settle behavior (next call
kicks off fresh producer), per-session isolation (different keys don't
share), and error-path recovery (rejected promise clears the entry
so retry works). Verified the happy-path test catches the regression
by disabling the map lookup and re-running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
plan-route.ts authenticated incoming requests with
`authConfig.tokens.includes(token)`, which compares strings byte-by-byte
via `===` and short-circuits on the first mismatch. The time difference
between "first byte matched" and "first byte didn't" is observable over
the network; with enough samples an attacker can recover a bearer token
byte-by-byte. Iterating the tokens array with a short-circuiting match
additionally leaks which token in the list was a prefix of the guess.

Replace with a constant-time validator:
  1. SHA-256 both the provided token and each configured token.
     Hashing normalizes inputs to 32 bytes — removes the length leak
     and lets timingSafeEqual run without throwing on unequal-length
     buffers.
  2. Run timingSafeEqual against EVERY entry even after a match. Total
     time becomes O(tokens.length), independent of which token matched
     or whether any did.
  3. OR the results into a single boolean at the end.

Exported from plan-route.ts as validateBearerToken so it can be tested
directly. The call site in handleRequest() replaces the `includes`
check — no behavior change for valid inputs, no timing leak for
invalid ones.
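
The three steps above can be sketched with Node's built-in crypto module. The function name matches the commit message, but the signature and the `sha256` helper are assumptions, not the code in plan-route.ts.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hashing normalizes every input to 32 bytes, so timingSafeEqual never
// throws on unequal lengths and the token length is not leaked.
function sha256(value: string): Buffer {
  return createHash("sha256").update(value, "utf8").digest();
}

// Assumed signature: the provided token plus the list of configured tokens.
function validateBearerToken(provided: string, configured: readonly string[]): boolean {
  const providedHash = sha256(provided);
  let matched = false;
  for (const token of configured) {
    // Compare against EVERY entry; never break out early on a match.
    const equal = timingSafeEqual(providedHash, sha256(token));
    matched = matched || equal;
  }
  return matched;
}
```

An empty token list returns `false`, so there is no default-accept path, and a guess that is only a prefix of a configured token fails because the hashes differ.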

Adds tests/api/bearer-token.test.ts — six cases covering: single-token
match, unknown rejection, empty config (no default-accept), tokens of
vastly different lengths (length not leaked), exact-match semantics
(no prefix/suffix/case hits), and a loose timing-variance check that
runs 10k iterations each of a "byte-0 match" and a "byte-0 mismatch"
guess and asserts their ratio stays under 3x. The old includes()
would fail that last test because character-by-character compare
amplifies the byte-depth difference over thousands of iterations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Establishes a three-way split for tracking bugs across the repo:

  - GitHub Issues: volatile, where triage and status live. Source of
    truth for IS THIS FIXED YET.
  - bugs/*.md: durable postmortems for bugs that taught us something
    about a class of failure. One file per bug, indefinite retention.
    Required template with Symptom / Root cause / Fix sections;
    Why-tests-missed and Class-of-bug sections strongly encouraged
    (not CI-enforced yet — medium strictness by intent).
  - docs/known-issues.md: user-facing heads-up for current limitations.
    Entries are REMOVED as issues close; this file grows only for
    things that aren't planned to be fixed soon.

Adds bugs/README.md with the template and inclusion rules.

Seeds bugs/ with six postmortems for the critical and security bugs
resolved in commits 484b6b8 through 857197a (the gateway code-review
pass):

  - sse-cross-contamination: bus.subscribe without tenancy filter
  - spawnreader-fiber-leak: Stream.runForEach on infinite stream,
    AbortSignal check inside callback only skipped writes
  - compaction-lossy-second-pass: overwriting a lossy-compressed
    column compounds loss; must merge-then-write
  - compaction-mid-turn-cut: raw-index slice on a message list with
    semantic (turn) boundaries broke tool_use/tool_result pairing
  - compaction-concurrent-races: non-idempotent UPDATE on a shared
    row had no dedupe; LLM non-determinism made the race silent
  - timing-unsafe-token-compare: .includes() on secrets short-
    circuits on first mismatch, recoverable byte-by-byte via timing

Skipped a postmortem for the remote_task_id fix (commit ad4f1b5)
— pure plumbing with no generalizable lesson beyond the commit
message.

Seeds docs/known-issues.md with thirteen still-open limitations
surfaced by the same review pass but not yet fixed (context-window
hardcoded, abort-signal propagation to the Bindu client, permission
rules not enforced for tool calls, tool-name collisions, agent-
catalog overwrite, signature verification semantics, pagination
truncation, TTL cleanup, rate limiting, token estimation accuracy,
DID resolver stampede, bearer-env error collapse, and the known
single-process-only limitation of the compaction-dedupe fix).

Structure is repo-wide by intent: bugs/ sits at the top level with
`area:` frontmatter tagging the subsystem (gateway/api, gateway/
session, bindu/core, sdks/typescript). docs/known-issues.md has
per-subsystem sections. Single archive across Python core, gateway,
SDKs, and frontend.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Expands docs/known-issues.md from 13 to 36 gateway entries, covering
every unfixed item surfaced in the original code review — not just
the subset seeded with the initial commit.

Additions by severity:

high (new):
  - poll-budget-unbounded-wall-clock: sendAndPoll's 60 × 10s backoff
    means one hung peer stalls a tool call for up to 5 min with no
    overall plan deadline.
  - no-session-concurrency-guard: two /plan requests on the same
    session_id interleave writes to gateway_messages; compaction
    dedupe doesn't cover plan turns themselves.

medium (new):
  - tool-input-sent-as-textpart: JSON.stringify(args) goes as
    TextPart; skills expecting DataPart fail silently or partially.
  - prompt-injection-scrubbing-theater: the regex strip of
    "ignore previous" etc. is trivially bypassable and gives false
    confidence; real defenses listed in the workaround.
  - did-resolver-no-key-id-selection: primaryPublicKeyBase58 picks
    first key, wrong during rotation windows.
  - no-graceful-shutdown: httpServer.close + runtime.dispose drops
    in-flight SSE streams mid-frame; no drain, no deadline.
  - assistant-message-lost-on-stream-error: LLM stream errors drop
    the partially-completed assistant row even when tool calls have
    already been billed. gateway_tasks and gateway_messages drift.
  - json-schema-to-zod-incomplete: enum, oneOf, pattern, numeric
    bounds, additionalProperties are ignored — planner LLM gets no
    signal about valid values.

medium (expanded):
  - tool-name-collisions-silent: added note about
    parseAgentFromTool's non-greedy regex mis-parsing agent names
    that contain underscores (breaks the task.started SSE event).

low (new):
  - resume-race-duplicate-session: concurrent first-request on a
    fresh session_id hits the UNIQUE constraint on the second
    insert; caller sees 500, retry resolves.
  - cancel-casing-not-retried: poll-exhaust cancel uses camelCase
    only; peers requiring snake_case get a silent leak.
  - health-endpoint-no-dependency-probe: /health returns 200 even
    if Supabase / provider are down.
  - no-request-id-in-logs: no correlation ID; client/server log
    joins rely on timestamp + peer URL.
  - no-config-hot-reload: changes to agents/planner.md or config
    require a full restart.
  - resolve-env-limited-to-simple-var: only bare $VAR matches, not
    ${VAR}/suffix or default-value syntax.
  - compaction-summary-injected-as-user-role: synthetic message
    uses user role; system role (or tagged marker) would be safer.
  - revert-millisecond-ties-nondeterministic: created_at boundary
    ambiguity under sub-ms insertions.
  - revert-doesnt-cancel-remote-tasks: local-only revert; peer-side
    tasks continue consuming resources.
  - empty-agents-catalog-no-400: agents: [] default accepted; LLM
    attempts phantom tool calls instead of a clear 400.
  - no-migration-rollback: migrations are forward-only; paired
    down.sql does not exist.

nit (new):
  - tasks-recorded-is-dead-state: planner populates an array that's
    never returned or persisted.
  - map-finish-reason-pointless-ternary: conditional type is always
    `any`; simplify in a cleanup pass.
  - db-effect-promise-swallows-errors: two paths in db/index.ts use
    Effect.promise which treats rejection as defect, silently
    resolving. Correctness-adjacent.
  - test-coverage-gaps: consolidated entry enumerating the missing
    end-to-end cases (concurrent plans, long-session compaction,
    revert, non-English, >1000-row sessions, aborted requests,
    snake_case cancel retry).

Not added (already fixed in prior commits):
  - unused BusEvent import in plan-route.ts (fixed in 484b6b8)
  - setTimeout(100) flush hack in plan-route.ts (fixed in 9e49d97)

Organizational change: entries are now grouped by severity header
(High / Medium / Low / Nits) within each subsystem, with a leading
note explaining the ordering convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collocates the two bug-tracking artifacts under a single top-level
folder:

  bugs/
  ├── README.md                       (explains both formats)
  ├── known-issues.md                 (user-facing, current limitations)
  ├── 2026-04-18-sse-cross-contamination.md
  ├── 2026-04-18-spawnreader-fiber-leak.md
  └── ...                             (dated postmortems, fixed bugs)

Rationale: `bugs/` and `docs/known-issues.md` were answering two
closely-related questions ("what's broken today" / "what broke
historically"); keeping them in separate folders meant readers had
to discover the split and maintainers had to update cross-references
every time. One folder, one README, one place to look.

File naming distinguishes them at a glance: dated `YYYY-MM-DD-*.md`
files are postmortems for FIXED bugs with indefinite retention;
`known-issues.md` is a single living file of CURRENT limitations
whose entries are REMOVED as the underlying issues get fixed.

Changes:
  - git mv docs/known-issues.md → bugs/known-issues.md (rename
    preserves history).
  - Rewrote bugs/README.md intro to describe both artifacts and the
    two questions they answer; updated the "Relationship to other
    tracking" cross-reference to use a sibling path.
  - Updated three postmortem files (compaction-concurrent-races,
    compaction-lossy-second-pass, compaction-mid-turn-cut) that
    referenced "docs/known-issues.md" in prose to use the new
    sibling path.
  - Fixed three internal links inside known-issues.md that pointed
    at "../bugs/" (now "./" since it lives in the same folder).

No content change in known-issues.md or any postmortem — pure
reorganization + link fixups. Tested with
`grep -rn "known-issues" bugs/` to confirm no stale paths remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
coderabbitai Bot commented Apr 18, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a0ee166e-03a4-4ea8-b247-b1fac3a92c97

📥 Commits

Reviewing files that changed from the base of the PR and between a9c4dc3 and a28bf8f.

⛔ Files ignored due to path filters (2)
  • gateway/package-lock.json is excluded by !**/package-lock.json
  • scripts/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (134)
  • gateway/.env.example
  • gateway/.gitignore
  • gateway/README.md
  • gateway/agents/planner.md
  • gateway/migrations/001_init.sql
  • gateway/migrations/002_compaction_revert.sql
  • gateway/package.json
  • gateway/plans/PLAN.md
  • gateway/plans/README.md
  • gateway/plans/phase-0-dryrun.md
  • gateway/plans/phase-1-mvp.md
  • gateway/plans/phase-2-production.md
  • gateway/plans/phase-3-inbound.md
  • gateway/plans/phase-4-public-network.md
  • gateway/plans/phase-5-opportunistic.md
  • gateway/src/_shared/filesystem.ts
  • gateway/src/_shared/global.ts
  • gateway/src/_shared/npm.ts
  • gateway/src/_shared/types.d.ts
  • gateway/src/_shared/util/array.ts
  • gateway/src/_shared/util/binary.ts
  • gateway/src/_shared/util/effect-flock.ts
  • gateway/src/_shared/util/encode.ts
  • gateway/src/_shared/util/error.ts
  • gateway/src/_shared/util/flock.ts
  • gateway/src/_shared/util/fn.ts
  • gateway/src/_shared/util/glob.ts
  • gateway/src/_shared/util/hash.ts
  • gateway/src/_shared/util/identifier.ts
  • gateway/src/_shared/util/iife.ts
  • gateway/src/_shared/util/lazy.ts
  • gateway/src/_shared/util/module.ts
  • gateway/src/_shared/util/path.ts
  • gateway/src/_shared/util/retry.ts
  • gateway/src/_shared/util/slug.ts
  • gateway/src/agent/index.ts
  • gateway/src/api/plan-route.ts
  • gateway/src/auth/index.ts
  • gateway/src/bindu/auth/resolver.ts
  • gateway/src/bindu/client/fetch.ts
  • gateway/src/bindu/client/index.ts
  • gateway/src/bindu/client/poll.ts
  • gateway/src/bindu/identity/bootstrap.ts
  • gateway/src/bindu/identity/index.ts
  • gateway/src/bindu/identity/resolve.ts
  • gateway/src/bindu/identity/verify.ts
  • gateway/src/bindu/index.ts
  • gateway/src/bindu/protocol/agent-card.ts
  • gateway/src/bindu/protocol/identity.ts
  • gateway/src/bindu/protocol/index.ts
  • gateway/src/bindu/protocol/jsonrpc.ts
  • gateway/src/bindu/protocol/normalize.ts
  • gateway/src/bindu/protocol/types.ts
  • gateway/src/bus/bus-event.ts
  • gateway/src/bus/index.ts
  • gateway/src/config/index.ts
  • gateway/src/config/loader.ts
  • gateway/src/config/schema.ts
  • gateway/src/db/index.ts
  • gateway/src/db/types.ts
  • gateway/src/effect/index.ts
  • gateway/src/effect/instance-registry.ts
  • gateway/src/effect/logger.ts
  • gateway/src/effect/runner.ts
  • gateway/src/global/index.ts
  • gateway/src/id/id.ts
  • gateway/src/index.ts
  • gateway/src/permission/index.ts
  • gateway/src/planner/index.ts
  • gateway/src/provider/index.ts
  • gateway/src/server/index.ts
  • gateway/src/session/compaction.ts
  • gateway/src/session/index.ts
  • gateway/src/session/llm.ts
  • gateway/src/session/message.ts
  • gateway/src/session/overflow.ts
  • gateway/src/session/prompt.ts
  • gateway/src/session/revert.ts
  • gateway/src/session/schema.ts
  • gateway/src/session/summary.ts
  • gateway/src/skill/index.ts
  • gateway/src/tool/registry.ts
  • gateway/src/tool/tool.ts
  • gateway/src/util/abort.ts
  • gateway/src/util/archive.ts
  • gateway/src/util/color.ts
  • gateway/src/util/data-url.ts
  • gateway/src/util/defer.ts
  • gateway/src/util/effect-http-client.ts
  • gateway/src/util/effect-zod.ts
  • gateway/src/util/error.ts
  • gateway/src/util/filesystem.ts
  • gateway/src/util/fn.ts
  • gateway/src/util/format.ts
  • gateway/src/util/iife.ts
  • gateway/src/util/index.ts
  • gateway/src/util/lazy.ts
  • gateway/src/util/local-context.ts
  • gateway/src/util/locale.ts
  • gateway/src/util/lock.ts
  • gateway/src/util/log.ts
  • gateway/src/util/process.ts
  • gateway/src/util/queue.ts
  • gateway/src/util/record.ts
  • gateway/src/util/schema.ts
  • gateway/src/util/scrap.ts
  • gateway/src/util/signal.ts
  • gateway/src/util/timeout.ts
  • gateway/src/util/token.ts
  • gateway/src/util/update-schema.ts
  • gateway/src/util/wildcard.ts
  • gateway/tests/api/bearer-token.test.ts
  • gateway/tests/api/plan-route-filter.test.ts
  • gateway/tests/bindu/identity.test.ts
  • gateway/tests/bindu/poll.test.ts
  • gateway/tests/bindu/protocol.test.ts
  • gateway/tests/helpers/mock-bindu-agent.ts
  • gateway/tests/integration/bindu-client-e2e.test.ts
  • gateway/tests/session/compaction-dedupe.test.ts
  • gateway/tests/session/compaction-split.test.ts
  • gateway/tests/session/summary.test.ts
  • gateway/tsconfig.json
  • gateway/vitest.config.ts
  • scripts/.gitignore
  • scripts/bindu-dryrun.ts
  • scripts/dryrun-fixtures/echo-agent/NOTES.md
  • scripts/dryrun-fixtures/echo-agent/agent-card.json
  • scripts/dryrun-fixtures/echo-agent/did-doc.json
  • scripts/dryrun-fixtures/echo-agent/final-task.json
  • scripts/dryrun-fixtures/echo-agent/negotiation.json
  • scripts/dryrun-fixtures/echo-agent/skill-question-answering-v1.json
  • scripts/dryrun-fixtures/echo-agent/skills.json
  • scripts/dryrun-fixtures/echo-agent/submit-response.json
  • scripts/package.json

📝 Walkthrough

The PR introduces a complete Bindu Gateway service: a TypeScript/Bun HTTP server that accepts user questions, orchestrates multi-agent task execution via an LLM-based planner, persists conversation state in Supabase, and streams results back via Server-Sent Events. The implementation spans database schemas, Bindu protocol handling with signature verification, session management (including compaction and revert), dynamic tool registration, LLM streaming integration, and supporting infrastructure such as the event bus, Effect runtime helpers, configuration loading, and shared utilities.
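
To make the streaming step concrete, here is a minimal sketch of how bus events might be narrowed to one session and encoded as Server-Sent Events frames. The `BusEvent` shape, the event type names, and both helper functions are assumptions made for illustration; they are not taken from the gateway source.

```typescript
// Hypothetical event shape; the gateway's real bus types are not shown on this page.
interface BusEvent {
  sessionId: string;
  type: string;
  data: unknown;
}

// Encode one event as an SSE frame: an "event:" field, a "data:" field,
// and a blank line that terminates the frame.
function toSseFrame(event: BusEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event.data)}\n\n`;
}

// Keep only the events that belong to the session that opened this stream,
// so one client never receives another client's frames.
function framesForSession(events: BusEvent[], sessionId: string): string[] {
  return events.filter((e) => e.sessionId === sessionId).map(toSseFrame);
}

const events: BusEvent[] = [
  { sessionId: "s1", type: "text.delta", data: { text: "hello" } },
  { sessionId: "s2", type: "text.delta", data: { text: "other client" } },
  { sessionId: "s1", type: "done", data: {} },
];

const frames = framesForSession(events, "s1");
console.log(frames.join(""));
```

Filtering before encoding keeps the per-session rule in one place; a real handler would presumably apply the same predicate to a live subscription rather than to an array.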

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Core Setup | `gateway/.env.example`, `gateway/.gitignore`, `gateway/package.json`, `gateway/tsconfig.json`, `gateway/vitest.config.ts` | Environment and build configuration with Supabase/API key placeholders, Node ≥22 requirement, TypeScript strict mode, Vitest setup with env file loading. |
| Documentation | `gateway/README.md`, `gateway/agents/planner.md`, `gateway/plans/*.md` | Comprehensive operational and design documentation: MVP quickstart, Anthropic Claude planner config, and detailed phase roadmap (0–5) covering protocol dry-run, MVP, production hardening, inbound exposure, discovery/trust, and opportunistic features. |
| Database & Migrations | `gateway/migrations/001_init.sql`, `gateway/migrations/002_compaction_revert.sql` | Supabase schema: gateway_sessions, gateway_messages, gateway_tasks tables with indexes, RLS-enabled; Phase 2 additions for compaction state and message/task revert tracking. |
| API & Server | `gateway/src/server/index.ts`, `gateway/src/api/plan-route.ts` | Hono app factory with /health route and SSE-streaming POST /plan handler that validates bearer auth, manages sessions, runs the planner, filters bus events by session, and handles abort signals. |
| Bindu Protocol | `gateway/src/bindu/protocol/*.ts` | Zod-based schemas and parsers: AgentCard, Task, Message, Part, Artifact, DIDDocument, SkillDetail; JSON-RPC 2.0 request/response; error classification (schema mismatch, auth, retryable); camelCase/snake_case normalization layer. |
| Bindu Identity & Auth | `gateway/src/bindu/identity/*.ts`, `gateway/src/bindu/auth/resolver.ts` | Ed25519 signature verification with bootstrap, DID document resolution with caching, DID parsing (did:bindu, did:key), authentication header resolution from config. |
| Bindu Client & Polling | `gateway/src/bindu/client/fetch.ts`, `gateway/src/bindu/client/poll.ts`, `gateway/src/bindu/client/index.ts` | HTTP RPC transport with timeout/abort, polling loop with adaptive casing retry on schema-mismatch, task completion detection, signature verification, peer cancellation support. |
| Configuration | `gateway/src/config/schema.ts`, `gateway/src/config/loader.ts`, `gateway/src/config/index.ts` | Zod schemas for server/auth/session/Supabase/limits/agents/permissions; environment and JSON file loading with $VAR substitution and override merging. |
| Database Service | `gateway/src/db/types.ts`, `gateway/src/db/index.ts` | Session/message/task CRUD operations backed by Supabase client; row type interfaces; task state enums and terminal-state checks. |
| Session Management | `gateway/src/session/*.ts` | Multi-file session subsystem: schema (branded UUIDs), message model with text/file/tool parts, LLM streaming wrapper, prompt loop with system prompt building and tool execution, compaction with turn-boundary safety, revert/rollback, overflow detection and token accounting, summary generation. |
| Planning & Tool System | `gateway/src/planner/index.ts`, `gateway/src/tool/*.ts`, `gateway/src/skill/index.ts` | Session prompt orchestration with dynamic tool registration per agent skill; tool context and execution result wrapping in `<remote_content>` to prevent injection; skill markdown parsing with YAML frontmatter; tool registry with scoped lifecycle. |
| Agent & Permission Management | `gateway/src/agent/index.ts`, `gateway/src/permission/index.ts`, `gateway/src/auth/index.ts` | Agent loading from markdown with config overlay; wildcard-based permission evaluation (allow/deny/ask); persistent credential store (bearer/API key/OAuth/DID/mTLS) with JSON file backing. |
| Event Bus | `gateway/src/bus/*.ts` | Typed event bus with per-session filtering, PubSub channels, and stream subscribers for prompt lifecycle events. |
| Effect Infrastructure | `gateway/src/effect/*.ts` | Logger wrapper with Effect context, runner state machine for async task coordination, instance registry for cleanup. |
| Provider & LLM | `gateway/src/provider/index.ts` | Model ID parsing, Anthropic/OpenAI factory with optional custom API keys/base URLs, Effect-based service layer. |
| Shared Utilities | `gateway/src/_shared/*.ts`, `gateway/src/util/*.ts` | Filesystem operations, npm package management with locking, global path computation, encode/hash helpers, binary search, lazy evaluation, glob/path utilities, error formatting, Effect HTTP client wrapping, Zod schema conversion. |
| Filesystem & Global State | `gateway/src/global/index.ts` | XDG-based directory paths (data, cache, config, state) with test home override, cache versioning, flock global state initialization. |
| ID Generation | `gateway/src/id/id.ts` | Prefix-based monotonic IDs with ascending/descending support, timestamp extraction. |
| Index & Entry Points | `gateway/src/index.ts`, `gateway/src/bindu/index.ts`, `gateway/src/**/index.ts` | Barrel modules wiring Effect layers, composing service dependencies, re-exporting subsystems. |
| Tests | `gateway/tests/**/*.test.ts`, `gateway/tests/helpers/mock-bindu-agent.ts` | Unit tests for bearer token auth, SSE event filtering, signature verification; integration tests for polling against mock agent; regression tests for compaction turn boundaries, summary preservation; in-process mock Bindu agent HTTP server. |
| Phase 0 Dry-Run | `scripts/bindu-dryrun.ts`, `scripts/package.json`, `scripts/.gitignore` | Protocol validation script fetching agent card, DID doc, skills, negotiation; submitting message and polling with adaptive casing; signature verification; anomaly detection and reporting. |
| Dry-Run Fixtures | `scripts/dryrun-fixtures/echo-agent/*` | Captured wire payloads: agent card JSON, DID document, skill definitions, task lifecycle (submit/poll response), negotiation payload, and protocol findings (NOTES.md). |
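
The API & Server entry above notes that POST /plan validates bearer auth, and the PR description lists a constant-time token compare among the follow-up fixes. One common way to do that in Node-compatible runtimes is sketched below. The `isAuthorized` helper and its signature are hypothetical, and hashing both sides before comparing is a choice made for this sketch, not necessarily what `plan-route.ts` does.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hash a value to a fixed-length digest. timingSafeEqual throws when its
// inputs differ in length, so comparing digests avoids that case and does
// not reveal the expected token's length.
function sha256(value: string): Buffer {
  return createHash("sha256").update(value, "utf8").digest();
}

// Hypothetical helper: true only when the header is "Bearer <expectedToken>".
function isAuthorized(header: string | undefined, expectedToken: string): boolean {
  if (header === undefined || !header.startsWith("Bearer ")) return false;
  const presented = header.slice("Bearer ".length);
  return timingSafeEqual(sha256(presented), sha256(expectedToken));
}

console.log(isAuthorized("Bearer s3cret", "s3cret")); // true
console.log(isAuthorized("Bearer wrong", "s3cret")); // false
console.log(isAuthorized(undefined, "s3cret")); // false
```

A plain `===` on the raw strings can return as soon as the first differing character is found, which is the timing signal this pattern is meant to remove.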

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes




@raahulrahl raahulrahl merged commit dd25bf7 into main Apr 18, 2026
1 of 3 checks passed
raahulrahl added a commit that referenced this pull request Apr 19, 2026
Post-merge CI on main failed on two pre-commit hooks after #463
landed:

  1. detect-secrets: 4 high-entropy strings in gateway dry-run
     fixtures (echo-agent/NOTES.md, did-doc.json, final-task.json)
     and one hex expectation in gateway/tests/bindu/protocol.test.ts
     not present in .secrets.baseline. Plus a fifth flag on a
     docstring example URL in scripts/backfill_owner_did.py.

  2. pydocstyle on scripts/backfill_owner_did.py: D301 (docstring
     contained literal backslashes without r"" prefix) and D103
     (main() missing a docstring).

Fixes:

  * .secrets.baseline — re-run `detect-secrets scan --baseline
    .secrets.baseline` to pick up the four new fixture/test
    locations. Entries land as unaudited (is_secret unset), same
    shape as the existing 22 baseline entries, so the pre-commit
    hook passes; the pre-push verify-secrets-audited hook is
    orthogonal and was not in the failing CI step.

  * scripts/backfill_owner_did.py — docstring made raw (``r"""``)
    so the ``\`` line-continuations in the usage examples no longer
    need doubling; backslashes now render as actual shell line
    continuations in ``--help`` output. main() gains a one-line
    docstring. The placeholder URL in the docstring keeps a
    ``# pragma: allowlist secret`` inline so it does not need a
    second baseline entry.

Local checks:
  * detect-secrets-hook --baseline → exit 0 on the flagged files
  * pydocstyle scripts/backfill_owner_did.py → exit 0
  * pytest tests/unit/ → 795 passed, 3 skipped

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>