feat(alb-email-poll): hit /api/me/inbox-status before full inbox fetch by whoabuddy · Pull Request #2 · aibtcdev/agent-runtime

whoabuddy · 2026-05-04T18:46:38Z

Pairs with arc0btc/agents-love-bitcoin#18 (cf-cost cleanup campaign, PR 2 — `unread_count` + `/api/me/inbox-status`).

Summary

Idle inbox polls (the common case) currently scan the full ALB inbox table on every run. With 5 active agents on 600s polling cadence that's ~720 scans/day; at 10K agents it's 1.5M.

This adds a wake-up bit:

New `getInboxStatus()` calls `GET /api/me/inbox-status` with the same BIP-137/322 sig pattern as `listInbox()`. Returns `{ unread, total }` from a single 1-row stat lookup on the DO.
Before `listInbox`, fetch inbox-status. When `unread === 0`, emit a `completed` outcome with the wake-up summary and exit. The full inbox fetch only runs on real arrival.
Backwards-compatible: a 404 from the new endpoint (older ALB version that doesn't expose it yet) falls through to the legacy path. Safe to deploy this runtime change before or after ALB PR 2 lands.

Expected effect

At 5 active agents (current fleet): 720 inbox scans/day → ~5/agent/day = 25/day on real arrivals (~96% reduction). At 10K agents: 1.5M → 50K/day.

Test plan

`bunx tsc --noEmit` clean
Run on Spark or Forge once ALB PR 2 deploys; confirm a poll with no inbound mail emits `Polled ALB inbox-status: 0 unread` and exits without the heavy listInbox call
Send a test email; confirm the next poll falls through to listInbox and triages the message normally

🤖 Generated with Claude Code

Pairs with arc0btc/agents-love-bitcoin PR 2 (cf-cost cleanup campaign, PR 2 — unread_count + /api/me/inbox-status). Idle polls (the common case) now read a single 1-row stat from AgentDO instead of scanning the inbox table on every call. - New `getInboxStatus()` calls `GET /api/me/inbox-status` with the same BIP-137/322 sig pattern as `listInbox()`. - Before listInbox, fetch inbox-status. When `unread === 0`, emit a "completed" outcome with the wake-up summary and exit. The full inbox fetch runs only on actual mail arrival. - Backwards compat: if the new endpoint returns 404 (older ALB version), fall through to the legacy listInbox path. Safe to deploy this runtime change before or after ALB PR 2 merges. At 5 active agents on 600s polling cadence, this drops AgentDO inbox scans from 720/day to ~5/agent/day = 25/day on real arrivals — a ~97% reduction. At 10K agents, 1.5M scans/day → 50K. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…it flag, host requirement, write-back guard, idempotency key Addresses 7 inline review findings (Copilot + secret-mars + arc0btc) on PR #5 — all substantive items resolved in-tree, no follow-ups left. **Catch widening (Copilot #4, secret-mars [blocking-risk], arc0btc #1)** `runSubstrateIntakeTick` previously only caught `claimNextJob`. Both `jobRowToTaskInput` and `enqueueTask` ran outside the try, so a malformed JobRow or a local SQLite lock contention escaped as a thrown error from `runOnce`. Each call site now has its own catch with a distinct skip reason — `job-parse-fail` and `local-enqueue-fail` — so the contract's "NEVER throws" guarantee is real. The substrate job stays under lease in both failure modes; `releaseExpiredLeases` reconciles on the next cycle. **`substrateDbInitialized` retry on credential fail (Copilot #1, all 3 reviewers seconded)** The flag was set to `true` BEFORE `createSubstrateConnection` succeeded. A transient credential read miss at first tick (e.g. encrypted blob not yet available) permanently disabled substrate intake for the process lifetime — the contract's documented `[substrate] skip reason=credential-fail` log line would never re-fire. Flag now sets only inside the success branch; next tick retries. **`runSubstrateWriteBack` swallows transient PG blips (arc0btc #3)** Was the only substrate fn whose Postgres calls were unguarded — a mid-write-back connection reset would propagate out of `runOnce` even though the local task already finished cleanly. Wrapped both `completeJob` and `failJob` in a single try; new `[substrate] write-back error=<msg> jobs.id=<id>` log line on transient fail. Lease recovery still reconciles eventually. **Default host removed (Copilot #3, arc0btc seconded)** `host` no longer defaults to a hard-coded private IP. When `substrate.enabled: true`, an explicit `substrate.host` is required — `createSubstrateConnection` throws a clear error if unset. Closes the "dev/test slot misconfigured at enabled:true accidentally connects to prod" footgun. **Self-contained log fallbacks (secret-mars #5)** The empty-else branches on `completeJob`/`failJob` `{ ok: false }` results relied on the substrate-db package emitting its own `[substrate] complete-epoch-mismatch ...` log. If that package's log format ever changes, those branches went silent. Added `[substrate] complete-failed jobs.id=<id> epoch=<n>` and `[substrate] fail-failed ...` fallbacks so this code is self-contained. **Idempotency-key threading (arc0btc #2 design, secret-mars idempotency follow-up)** Closes the "side-effecting jobs can execute twice when a lease expires mid-flight" hazard structurally — the fence on `jobs.claim_epoch` only protects the status write, not the action. `jobRowToTaskInput` now threads `payload.idempotency_key = "substrate-<job_id>-e<claim_epoch>"` so downstream side-effecting handlers (email send, PR open, tx broadcast) can dedup against their own per-handler key store. Bounds the *consequence* from "action runs twice" to "action runs once, second attempt no-ops." Substrate tasks also enqueue with `priority: 1` so they execute on the same or next tick — the lease window now tracks real execution latency instead of waiting behind lower-priority work, which shrinks the "lease expires while task waits in local queue" window further. **Silent null claim (Copilot #5)** Contract said null-claim is silent; impl logged `[substrate] claim ... result=none` every tick. Dropped the log — quiet-tick visibility lives in successful-claim and idle-dispatch event lines, not substrate tick-rate noise. **Other nits (Copilot #2, arc0btc #5)** - Removed unused `getTaskById` import in `substrate.ts`. - Documented `substrate` block's shallow-merge behavior in `types.ts` (unlike `profiles`/`adapters` which deep-merge). Slots that extend a base and want to flip only `isLeaseRecoveryOwner: true` must repeat the whole substrate block. Deferred (already declared out of scope by reviewers): - `pg_advisory_lock` guard on `isLeaseRecoveryOwner` race (secret-mars [suggestion]) — multi-slot mis-config protection is a follow-up. - `_substrate_*` payload-injection vs strict validators (secret-mars [suggestion]) — naming is defensive prefixing already. Contract block in src/substrate.ts updated with all new log lines and a new "Side-effect duplicate-execution guard" section. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(substrate): opt-in substrate intake — config, credential, claim, enqueue Adds @genesis-works/substrate-db as a dependency and wires a substrate dispatch intake path into the runOnce tick. Opt-in per slot: substrate is disabled by default (config.substrate.enabled=false). Flipping this flag live changes zero behavior on any slot that has not set it. Config keys (host config substrate block): substrate.enabled — bool, default false substrate.credential — string, credential id (NEVER plaintext password) substrate.kinds — string[], job kinds this slot handles substrate.slotId — string, this slot's claim identifier substrate.leaseSecs — number, default 300 (5 minutes) substrate.host/port/database/user — Postgres connection params On each tick (when enabled): 1. Credential resolved via existing encrypted-blob pattern. 2. claimNextJob(slotId, kinds) called against shared Postgres. Log: "[substrate] claim slot=<id> kinds=<list> result=<jobs.id|none>" 3. Claimed job converted to TaskInput (source="substrate:<jobId>"). 4. TaskInput enqueued as a local runtime task for next-tick execution. Failure conditions (all logged, none crash): - substrate unreachable → "[substrate] skip reason=db-unreachable error=..." - credential fail → "[substrate] skip reason=credential-fail error=..." - claimNextJob null → "[substrate] claim ... result=none" (silent no-op) Pinned substrate-db SHA: d458200b008efe8be3d37912a71b990d73ff2b17 (arc0btc/substrate-db — private, deployed alongside agent-runtime on slots) Co-Authored-By: Claude <noreply@anthropic.com> * feat(substrate): lease semantics — epoch-fenced write-back, max_attempts, lease recovery Adds the epoch-fenced write-back hook and lease recovery cadence to the substrate intake adapter. Write-back (after finalizeTaskAttempt): - If task.source starts with "substrate:", calls runSubstrateWriteBack. - Reads _substrate_job_id and _substrate_claim_epoch from task payload. - Calls completeJob(epoch) on success, failJob(epoch) on failure. - Epoch mismatch (stale executor woke after lease recovery) → logged no-op: "[substrate] complete-epoch-mismatch ... stale executor, ignored" - Pre-fencing row (epoch=0, expected>0) → distinct log line: "pre-fencing row, ignored" - completeJob/failJob return WriteBackResult — conflict is logged, not thrown. Lease recovery: - Runs releaseExpiredLeases on the substrate DB when isLeaseRecoveryOwner=true. - Cadence: leaseRecoveryCadenceSecs (default 60s) using a module-level timer. - ONE nominated owner runs this — not all slots — to avoid stampede. - Log: "[substrate] lease-recovery released=<n>" - Lease expiry measured against Postgres now() — NTP drift irrelevant. Log lines contract (stable, parseable): [substrate] claim slot=<id> kinds=<list> result=<jobs.id|none> [substrate] complete jobs.id=<id> epoch=<n> [substrate] fail jobs.id=<id> epoch=<n> reason=<...> [substrate] lease-recovery released=<n> [substrate] skip reason=credential-fail error=<...> [substrate] skip reason=db-unreachable error=<...> [substrate] complete-epoch-mismatch jobs.id=<id> expected=<n> actual=<m> — ... Tests: 125/125 pass (10 new substrate tests + 115 existing). Co-Authored-By: Claude <noreply@anthropic.com> * fix(substrate): review-feedback correctness pass — catch widening, init flag, host requirement, write-back guard, idempotency key Addresses 7 inline review findings (Copilot + secret-mars + arc0btc) on PR #5 — all substantive items resolved in-tree, no follow-ups left. **Catch widening (Copilot #4, secret-mars [blocking-risk], arc0btc #1)** `runSubstrateIntakeTick` previously only caught `claimNextJob`. Both `jobRowToTaskInput` and `enqueueTask` ran outside the try, so a malformed JobRow or a local SQLite lock contention escaped as a thrown error from `runOnce`. Each call site now has its own catch with a distinct skip reason — `job-parse-fail` and `local-enqueue-fail` — so the contract's "NEVER throws" guarantee is real. The substrate job stays under lease in both failure modes; `releaseExpiredLeases` reconciles on the next cycle. **`substrateDbInitialized` retry on credential fail (Copilot #1, all 3 reviewers seconded)** The flag was set to `true` BEFORE `createSubstrateConnection` succeeded. A transient credential read miss at first tick (e.g. encrypted blob not yet available) permanently disabled substrate intake for the process lifetime — the contract's documented `[substrate] skip reason=credential-fail` log line would never re-fire. Flag now sets only inside the success branch; next tick retries. **`runSubstrateWriteBack` swallows transient PG blips (arc0btc #3)** Was the only substrate fn whose Postgres calls were unguarded — a mid-write-back connection reset would propagate out of `runOnce` even though the local task already finished cleanly. Wrapped both `completeJob` and `failJob` in a single try; new `[substrate] write-back error=<msg> jobs.id=<id>` log line on transient fail. Lease recovery still reconciles eventually. **Default host removed (Copilot #3, arc0btc seconded)** `host` no longer defaults to a hard-coded private IP. When `substrate.enabled: true`, an explicit `substrate.host` is required — `createSubstrateConnection` throws a clear error if unset. Closes the "dev/test slot misconfigured at enabled:true accidentally connects to prod" footgun. **Self-contained log fallbacks (secret-mars #5)** The empty-else branches on `completeJob`/`failJob` `{ ok: false }` results relied on the substrate-db package emitting its own `[substrate] complete-epoch-mismatch ...` log. If that package's log format ever changes, those branches went silent. Added `[substrate] complete-failed jobs.id=<id> epoch=<n>` and `[substrate] fail-failed ...` fallbacks so this code is self-contained. **Idempotency-key threading (arc0btc #2 design, secret-mars idempotency follow-up)** Closes the "side-effecting jobs can execute twice when a lease expires mid-flight" hazard structurally — the fence on `jobs.claim_epoch` only protects the status write, not the action. `jobRowToTaskInput` now threads `payload.idempotency_key = "substrate-<job_id>-e<claim_epoch>"` so downstream side-effecting handlers (email send, PR open, tx broadcast) can dedup against their own per-handler key store. Bounds the *consequence* from "action runs twice" to "action runs once, second attempt no-ops." Substrate tasks also enqueue with `priority: 1` so they execute on the same or next tick — the lease window now tracks real execution latency instead of waiting behind lower-priority work, which shrinks the "lease expires while task waits in local queue" window further. **Silent null claim (Copilot #5)** Contract said null-claim is silent; impl logged `[substrate] claim ... result=none` every tick. Dropped the log — quiet-tick visibility lives in successful-claim and idle-dispatch event lines, not substrate tick-rate noise. **Other nits (Copilot #2, arc0btc #5)** - Removed unused `getTaskById` import in `substrate.ts`. - Documented `substrate` block's shallow-merge behavior in `types.ts` (unlike `profiles`/`adapters` which deep-merge). Slots that extend a base and want to flip only `isLeaseRecoveryOwner: true` must repeat the whole substrate block. Deferred (already declared out of scope by reviewers): - `pg_advisory_lock` guard on `isLeaseRecoveryOwner` race (secret-mars [suggestion]) — multi-slot mis-config protection is a follow-up. - `_substrate_*` payload-injection vs strict validators (secret-mars [suggestion]) — naming is defensive prefixing already. Contract block in src/substrate.ts updated with all new log lines and a new "Side-effect duplicate-execution guard" section. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(substrate): expand coverage via mock.module — claim/complete/fail/lease-recovery paths Closes the test-gap finding from arc0btc #4 and secret-mars: prior suite covered only `resolveSubstrateConfig`, `jobRowToTaskInput`, and two write-back *early-return* guards via a fakeDb that threw if touched. Claim, complete, fail, epoch-mismatch no-op, transient write-back throw, and lease recovery — the "highest correctness surface" per the PR's own Commit-2 description — were untested. Rewires the file to use `mock.module("@genesis-works/substrate-db", ...)` with mock fns per substrate-db export. Local DB is a real `openDb(:memory:)` so `enqueueTask` exercises the actual sqlite write path. console.{info,error,warn} are spied per-test for log-line assertions. New tests (18, alongside 10 existing for 28 total): - `runSubstrateIntakeTick`: - happy claim → enqueues local task, returns `claimed=true` with correct epoch + jobId, emits `[substrate] claim ...` log - null claim → silent (no `[substrate] claim` log line; just `claimed: false`) - db-unreachable → catches `claimNextJob` throw, logs `reason=db-unreachable` - local-enqueue-fail → forces `enqueueTask` to throw by closing the local db, asserts catch fires + log line includes `reason=local-enqueue-fail jobs.id=<id>` (proves Copilot #4 / arc0btc #1 fix is wired) - `runSubstrateWriteBack`: - non-substrate source no-op (existing — kept) - missing `_substrate_job_id` no-op (existing — kept) - complete happy → calls `completeJob` with `claim_epoch` arg, `[substrate] complete jobs.id=... epoch=...` log - complete `{ ok: false }` → emits self-contained fallback `[substrate] complete-failed` log (proves secret-mars #5 fix) - complete throws → catches transient PG blip, emits `[substrate] write-back error=... jobs.id=...` (proves arc0btc #3 fix is wired — write-back no longer propagates out of `runOnce`) - fail happy → calls `failJob`, logs reason snippet - fail `{ ok: false }` → emits `[substrate] fail-failed` fallback - `runSubstrateLeaseRecovery`: - released > 0 → logs `[substrate] lease-recovery released=<n>` - released = 0 → no log (quiet success) - throws → catches, logs `[substrate] lease-recovery error=...`, does NOT propagate - `createSubstrateConnection`: - throws when `substrate.host` is unset (proves Copilot #3 fix — no implicit default to a hard-coded private IP) - throws on whitespace-only host - `jobRowToTaskInput`: - asserts `payload.idempotency_key = "substrate-<job_id>-e<epoch>"` (proves idempotency-key threading is wired) - asserts `priority = 1` - asserts `_substrate_*` payload fields threaded - asserts `max_attempts = 1` (substrate handles retry) - existing source / subject-fallback tests kept Header comment rewritten to match what's actually covered (Copilot #7). Unused `mock`/`beforeEach`/`afterEach` imports replaced with the ones actually used (Copilot #6 — though now they ARE used, so the prior nit naturally resolves). Test count: 143 total (28 substrate + 115 existing) — up from 125. `bunx tsc --noEmit` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Jason Schrader <jason@joinfreehold.com> Co-authored-by: Claude <noreply@anthropic.com>

arc0btc mentioned this pull request May 4, 2026

fix(cf-cost): replace metering middleware with ratelimits binding arc0btc/agents-love-bitcoin#15

Merged

6 tasks

whoabuddy merged commit a4487af into main May 4, 2026

whoabuddy deleted the feat/alb-inbox-status-wakeup branch May 4, 2026 19:28

secret-mars mentioned this pull request May 30, 2026

feat(substrate): opt-in substrate dispatch intake (Phase 5) #5

Merged

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(alb-email-poll): hit /api/me/inbox-status before full inbox fetch#2

feat(alb-email-poll): hit /api/me/inbox-status before full inbox fetch#2
whoabuddy merged 1 commit into
mainfrom
feat/alb-inbox-status-wakeup

whoabuddy commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

whoabuddy commented May 4, 2026

Summary

Expected effect

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant