fix(ci): resolve eslint peer dep conflict breaking npm install#116

Open
trevormil wants to merge 1 commit into main from fix/ci-eslint-peer-deps

@trevormil
Collaborator

Summary

Test plan

  • npm install succeeds without ERESOLVE errors
  • All 518 tests pass locally (npm run test)
  • CI "Run Tests" workflow passes on this PR
  • CI "CodeQL" workflow passes on this PR

🤖 Generated with Claude Code

Remove unused eslint-config-standard-with-typescript (requires
@typescript-eslint/eslint-plugin ^6.4.0, conflicts with ^8.56.1 from
PR #107). Also remove its unused peer deps eslint-plugin-n and
eslint-plugin-promise. Update actions/checkout to v4, add
actions/setup-node@v4, and bump github/codeql-action from v1 to v3.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
trevormil added a commit that referenced this pull request Apr 22, 2026
…e throws

Greptile P1 on indexer PR #116 surfaced that the indexer's
TokenLedger.checkQuota() throws were being silently swallowed when
invoked from inside the onTokenUsage hook. Root cause: the SDK was
treating every hook as fire-and-forget via `fireHook()`, which wraps
the callback in `Promise.resolve().catch(() => {})`.

Distinction now documented + enforced:
- onTokenUsage is LOAD-BEARING. Awaited directly; rejections
  propagate out of runAgentLoop so consumers can enforce quotas.
  Matches the legacy indexer agentLoop contract.
- onToolCall / onStatusUpdate / onCompletion stay fire-and-forget —
  they're observability-only; a misbehaving logger must not hang
  a build.
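
The split between the two hook classes can be sketched roughly as below. This is a minimal illustration reconstructed from this message, not the SDK's code: the `AgentHooks` and `TokenUsage` shapes, the `reportRound` helper, and the exact `fireHook` body are assumptions.

```typescript
// Hypothetical hook shapes, assumed for illustration only.
type TokenUsage = { inputTokens: number; outputTokens: number };

interface AgentHooks {
  onTokenUsage?: (usage: TokenUsage) => void | Promise<void>;
  onToolCall?: (toolName: string) => void | Promise<void>;
}

// Fire-and-forget: synchronous throws and async rejections are both swallowed,
// and the caller never waits for the hook to finish.
function fireHook(fn: (() => void | Promise<void>) | undefined): void {
  if (!fn) return;
  Promise.resolve()
    .then(fn)
    .catch(() => {});
}

// One loop round (hypothetical helper): the observability hook is detached,
// the quota hook is awaited.
async function reportRound(
  hooks: AgentHooks,
  usage: TokenUsage,
  toolName: string
): Promise<void> {
  const onToolCall = hooks.onToolCall;
  fireHook(onToolCall ? () => onToolCall(toolName) : undefined);
  // Load-bearing: a throw or rejection here rejects reportRound itself,
  // so it reaches whoever awaits the loop.
  await hooks.onTokenUsage?.(usage);
}
```

With this shape, a quota check that throws inside `onTokenUsage` stops the loop, while a logger that throws inside `onToolCall` has no effect on it.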

Tests: 17/17 pass unchanged (no test depended on the old swallowed-
throw behavior).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
trevormil added a commit that referenced this pull request Apr 22, 2026
Addresses all critical issues from the parallel code-review + E2E smoke
runs on PRs #172 / #116. Tests: 146/146 pass.

**P0 blockers (users could not run a build before this):**
- anthropicClient: `Anthropic.Anthropic ?? Anthropic` resolved to
  `BaseAnthropic` (an internal parent class with no `.messages`
  resource), producing "Cannot read properties of undefined (reading
  'create')" the first time any agent ran. Verified in Anthropic
  SDK >=0.82: `mod.default.Anthropic` exists but points at BaseAnthropic.
  Fixed by using the module export directly (it IS the Anthropic
  class).
- anthropicClient: OAuth bearer tokens were rejected 401 — Anthropic
  requires `anthropic-beta: oauth-2025-04-20` header for OAuth creds.
  Now auto-applied when `authToken` is provided. API-key path unchanged.
- BitBadgesAgent: creatorAddress didn't land on the final tx if the
  LLM's first tool call was a non-session tool (search_knowledge_base,
  fetch_docs). The SDK's session was created lazily without the
  creator, resulting in empty `value.creator`. Now we pre-init the
  session via `getOrCreateSession(sid, creator)` up front — mirrors
  the legacy indexer handler's explicit init.
- BitBadgesAgent: sessionId used `Math.random()` for the random
  suffix — CodeQL flagged as insecure-randomness in a security
  context. Replaced with `crypto.randomUUID()`.
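
A minimal sketch of the sessionId change, assuming a `ses_` prefix followed by the UUID. The helper name `newSessionId` and the exact ID format are assumptions; only the switch to `crypto.randomUUID()` comes from this message.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: the "ses_" prefix and overall format are assumed.
function newSessionId(): string {
  // randomUUID() draws from a cryptographically secure generator,
  // unlike Math.random(), which is what CodeQL objected to.
  return `ses_${randomUUID()}`;
}
```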

**P1 correctness:**
- toolAdapter.mergeDefaults: `{ ...defaults, ...incoming }` was a
  classic footgun — an `incoming` key set to `undefined` would knock
  out the default. Now strips undefined from incoming before merge.
- BitBadgesAgent: concurrent `build()` calls racing through
  `this.client ??= await getAnthropicClient()` each fired their own
  init and the last-winning result silently discarded the others'
  errors. Shared `clientInitPromise` now deduplicates. Promise
  cleared on rejection so transient failures don't poison future
  retries.
- BitBadgesAgent: `systemPromptAppend` was concatenated into the
  system prompt with zero screening. Hosted/untrusted deployments
  could inject "ignore previous instructions" via this field.
  `containsInjection` check now runs at construction and throws a
  clear error if the append contains obvious injection patterns.
- BitBadgesAgent: `exportPrompt` was skipping the ctor's
  `systemPromptAppend` — builds saw the append, exports didn't.
  Parity restored.
- BitBadgesAgent.runBuild: unguarded `txResponse?.transaction ??
  txResponse` could leave `transaction = undefined` if
  `get_transaction` unexpectedly returned nothing. Falls back to
  `{ messages: [] }` so downstream validation + sanity checks
  process a well-formed shell instead of throwing on `undefined`.
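
The `mergeDefaults` footgun and its fix can be illustrated with a short sketch. The signature and the `Args` type are assumptions; the behavior (strip `undefined` from incoming, then spread) follows the description in the list.

```typescript
type Args = Record<string, unknown>;

// Sketch of the fixed merge: keys explicitly set to undefined are dropped
// from `incoming` first, so they can no longer erase a default.
function mergeDefaults(defaults: Args, incoming: Args): Args {
  const defined = Object.fromEntries(
    Object.entries(incoming).filter(([, value]) => value !== undefined)
  );
  // Remaining incoming keys (including explicit null) still win.
  return { ...defaults, ...defined };
}
```

In this sketch `null` is treated as a deliberate override and `undefined` as "not provided", so `mergeDefaults({ limit: 10 }, { limit: undefined })` keeps `limit: 10`.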

**Local-dev ergonomics:**
- createBitBadgesCommunitySkillsFetcher now detects localhost /
  127.0.0.1 / *.localhost URLs and skips the "no API key → return
  []" gate in dev. Mirrors how the indexer itself relaxes auth for
  local development — third-party devs iterating against a local
  indexer don't need a BitBadges API key to exercise the community-
  skills path.
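
One way to detect the local hosts listed above is to parse the URL and compare the hostname. The function name `isLocalUrl` is hypothetical and the real fetcher may detect local mode differently.

```typescript
// Hypothetical helper: true for localhost, 127.0.0.1, and *.localhost hosts.
function isLocalUrl(rawUrl: string): boolean {
  let hostname: string;
  try {
    hostname = new URL(rawUrl).hostname;
  } catch {
    // Unparseable URLs are never treated as local.
    return false;
  }
  return (
    hostname === "localhost" ||
    hostname === "127.0.0.1" ||
    hostname.endsWith(".localhost")
  );
}
```

Comparing the parsed hostname rather than searching the raw string keeps a host such as `localhost.example.com` from passing the check.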

Tests added to cover: OAuth header presence, creator pre-init,
sessionId shape, mergeDefaults undefined filter, client init
dedup, systemPromptAppend injection rejection, exportPrompt
append parity, local-mode fetcher. (Most already in place from
the subagent's unit-test pass — tweaked a few to lock in the new
behavior.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
trevormil added a commit that referenced this pull request Apr 22, 2026
* feat(builder): open-source BitBadgesAgent — BYO Anthropic key

Adds a programmatic AI builder so anyone can build BitBadges
collections from a prompt without going through the BitBadges
frontend or API. Consumers install @anthropic-ai/sdk as a peer
dep and pass their own key — BitBadges never sees credentials.

Three-tier surface:
- bitbadges/builder/agent  → BitBadgesAgent class (stable)
- bitbadges/builder/internals → prompt, loop, validation, adapters
  (unstable — for DIY consumers, may break between minors)
- existing bitbadges-builder MCP stdio bin unchanged

BitBadgesAgent features:
- Zero-config: new BitBadgesAgent({ anthropicKey }).build('…')
- Model picker (haiku/sonnet/opus) with per-model cost reporting
- Validation modes: strict (default), lenient, off
- Skills filter, systemPromptAppend, full systemPrompt replace
- tools.add / tools.remove for bounded customization
- Hooks: onTokenUsage, onToolCall, onStatusUpdate, onCompletion
- Pluggable KVStore (MemoryStore + FileStore ship, consumers can
  BYO Redis/etc.)
- Typed errors (ValidationFailedError, QuotaExceededError, etc.)
- substituteImages / collectImageReferences helpers
- healthCheck() + validate() QoL methods
- OAuth token support in addition to API key (ANTHROPIC_OAUTH_TOKEN)
- BITBADGES_API_KEY passthrough into every query tool

Ported from the indexer: prompt assembly, agent loop with
retry/compression, validation gate + fix-loop driver, simulation
error patterns. Prompt/system-prompt behavior byte-identical to
indexer today.

Package changes:
- Exports map adds ./builder/agent and ./builder/internals
- @anthropic-ai/sdk as optional peerDependency (>=0.80.0 <1.0.0)
- Three example scripts under examples/builder-agent/
- 17 unit tests covering zero-config, OAuth, env vars, tool
  filtering, image substitution, hooks, typed errors

Backlog: #0298.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(builder/agent): address Greptile review on #172

Four findings:

- P1: double `handleGetTransaction` call — the `??` fallback invoked
  the tool twice when `.transaction` was nullish. Collapse to one
  call and extract the tx from either wrapping shape.
- P1: shared `abortController` field clobbered by concurrent `build()`
  calls. Move to a per-build controller tracked in a Set;
  `agent.abort()` now aborts every in-flight build on the instance.
  Restructured into `build() → runBuild()` so the Set entry is always
  cleaned up via try/finally.
- P2: `SUPPORTED_RANGE` was only used in error messages, never
  compared against the installed SDK. Parse `mod.VERSION` and throw
  a clear PeerDependencyError if outside >=0.80.0 <1.0.0. Silently
  skip when VERSION is absent/unparseable so future SDK renames
  don't brick builds.
- P2: document why raw `prompt` is intentionally not run through
  `containsInjection` — agent is BYO-key, caller controls the key +
  prompt. Server consumers exposing this to untrusted users apply
  `containsInjection` at their own trust boundary (indexer already
  does). Community-skill text from third parties IS still sanitized.
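
The version gate from the first P2 finding might look roughly like this. `PeerDependencyError` and the `>=0.80.0 <1.0.0` range are named in this message; the parsing details and the error text are assumptions, and pre-release suffixes are ignored for simplicity.

```typescript
class PeerDependencyError extends Error {}

// Sketch: accept only 0.x versions with minor >= 80, i.e. >=0.80.0 <1.0.0.
function assertSupportedVersion(version: unknown): void {
  // Absent VERSION: skip silently so a future SDK rename does not brick builds.
  if (typeof version !== "string") return;
  const match = /^(\d+)\.(\d+)\.(\d+)/.exec(version);
  // Unparseable VERSION: skip for the same reason.
  if (!match) return;
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const inRange = major === 0 && minor >= 80;
  if (!inRange) {
    throw new PeerDependencyError(
      `Installed SDK version ${version} is outside the supported range >=0.80.0 <1.0.0`
    );
  }
}
```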

Tests: 17/17 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(builder/agent): onTokenUsage is load-bearing — await and propagate throws

* feat(builder/agent): Anthropic prompt caching — addresses backlog #0303

Builds on top of the load-bearing onTokenUsage fix. Rolls caching
into the agent rather than shipping as a standalone PR per user
request ("append to existing PRs").

Changes:
- assemblePromptParts now returns `userContent: Array<{text, cache_control?}>`
  in addition to the legacy `userMessage` string. The stable skills
  prefix (selectedSkillsSection + promptSkillsSection) is marked
  `cache_control: ephemeral`; the per-request tail (context,
  metadata, permissions, refinement history, prompt) sits in the
  trailing block with no cache mark.
- Skill ordering canonicalized: `[...new Set(selectedSkills)].sort()`
  in both the skills section and the request header so different
  orderings of the same skill set hit the same cache entry.
- runAgentLoop accepts `userContent`, parses Anthropic's
  `cache_creation_input_tokens` / `cache_read_input_tokens`, and
  threads them through to hooks and the result.
- Added cacheCreationTokens / cacheReadTokens fields to TokenUsage,
  BuildTrace, and AgentLoopResult — consumers can now monitor cache
  hit rate without re-parsing provider responses.
- computeCostUsd now takes cache counters and applies Anthropic's
  multipliers: cache write = 1.25x input, cache read = 0.10x input.
- Fix-loop rounds intentionally skip `userContent` — the fix prompt
  is dynamic error guidance with no cache value.
- result.toString() surfaces cache counts when non-zero.
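
A rough sketch of the cost math and the skill canonicalization described in this list. The `Pricing` and `Usage` shapes and the per-million-token price fields are assumptions; the 1.25x and 0.10x multipliers and the `[...new Set(...)].sort()` expression come from the text. It also assumes `inputTokens` excludes cached tokens.

```typescript
// Hypothetical shapes; real prices depend on the selected model.
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

interface Usage {
  inputTokens: number; // assumed to exclude cached tokens
  outputTokens: number;
  cacheCreationTokens?: number;
  cacheReadTokens?: number;
}

function computeCostUsd(pricing: Pricing, usage: Usage): number {
  // Cache writes bill at 1.25x the input rate, cache reads at 0.10x.
  const cacheWrite = (usage.cacheCreationTokens ?? 0) * 1.25;
  const cacheRead = (usage.cacheReadTokens ?? 0) * 0.1;
  const inputEquivalent = usage.inputTokens + cacheWrite + cacheRead;
  return (
    (inputEquivalent * pricing.inputPerMTok +
      usage.outputTokens * pricing.outputPerMTok) /
    1_000_000
  );
}

// Dedupe and sort so the same skill set always yields the same prompt prefix.
function canonicalSkills(selectedSkills: string[]): string[] {
  return [...new Set(selectedSkills)].sort();
}
```

When both cache counters are zero or absent, the formula reduces to plain input plus output cost, which is why mocked responses without cache fields leave existing tests unaffected.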

Tests: 17/17 still pass (cache counters default to 0 in mocked
responses, cost math degrades to the old input+output formula).

Expected production impact (per backlog #0303): ~60-80% cost
reduction on the repeated stable prefix, faster time-to-first-token.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(builder/agent): restore BUILDER_SYSTEM_PROMPT_FOR_EXPORT + preserve version-check errors

Two Greptile P1s on re-review:

1. loadAnthropicSdk swallowed the PeerDependencyError that
   assertSupportedVersion throws for out-of-range SDKs. The generic
   try/catch wrapped both "module not installed" AND "version
   mismatch" into the same "not installed" message, masking the
   real issue. Split into two stages: the import is the only thing
   in the try/catch; the version check runs outside so its
   PeerDependencyError surfaces verbatim.

2. The DRY refactor accidentally deleted BUILDER_SYSTEM_PROMPT_FOR_EXPORT
   and collapsed assemblePromptParts's `forExport` option to a no-op.
   The export prompt is NOT the same as the hosted-session prompt
   — it's for pasting into Claude.ai / ChatGPT where no tools are
   available, so it swaps the tool-calling workflow for an explicit
   Output Format section describing the `MsgUniversalUpdateCollection`
   JSON shape + metadataPlaceholders sidecar layout. Restored:
   - BUILDER_SYSTEM_PROMPT_FOR_EXPORT constant (with updated
     Output Format section matching current shape)
   - `forExport: boolean` option on assemblePromptParts that swaps
     the system prompt
   - assembleExportPrompt helper for callers that want the
     concatenated string (indexer /export-prompt route)
   - Both exported from bitbadges/builder/internals

This is load-bearing for the frontend's self-host flow: the "paste
this into Claude.ai" path needs the export prompt; the SDK + MCP
paths use the tool-calling prompt.

Tests: 17/17 still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(builder/agent): community skills fetcher + skill discovery + exportPrompt stable API

Backlog #0309 items 1, 2, 3, 7.

- createBitBadgesCommunitySkillsFetcher: prebuilt fetcher that hits the
  public /api/v0/builder/community-skills endpoint. Power-user path —
  callers bring skill IDs, get the same community skill injection the
  hosted flow does. API-key gated; silently returns [] when no key is
  configured or the endpoint is unreachable.
- agent.listSkills(): returns SkillInstruction[] (filtered by the
  constructor skills whitelist when set). Sync, no network.
- agent.describeSkill(id): lookup one skill by ID. null when unknown
  or outside the whitelist.
- Debug-mode warning when selectedSkills contains unknown IDs. Drops
  unknown IDs silently (matches legacy behavior) but logs to stderr
  when debug: true so callers can catch typos.
- Construction-time warning when skills reference on-chain collections
  but no bitbadgesApiKey is configured. query_collection calls would
  fail mid-loop; the warning steers users to set the key.
- agent.exportPrompt(prompt, options) promoted from /internals to
  stable. Returns { prompt: string; communitySkillsIncluded: string[] }
  ready for paste-into-Claude.ai flows. Used by the frontend's "Pure
  prompt" path.
- Export getAllSkillInstructions + SkillInstruction from
  bitbadges/builder/agent for discovery-UI builders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(builder/agent): unit coverage for BitBadgesAgent SDK surface

Adds 129 new Jest tests across seven spec files covering the public
agent surface introduced on feat/bitbadges-agent. All mocks — no
real network, no real Anthropic calls.

New specs:
- models.spec.ts — resolveModel fallbacks + computeCostUsd with the
  1.25x cache-write / 0.10x cache-read multipliers verified on
  worked examples; zero-token inputs don't NaN.
- prompt.spec.ts — buildSystemPrompt(create|update|refine) section
  composition, BUILDER_SYSTEM_PROMPT_FOR_EXPORT contains Output
  Format, getSystemPromptHash is deterministic + 12 hex chars,
  findMatchingErrorPatterns, buildFixPrompt attempt header,
  assemblePromptParts cache-boundary layout + canonicalized skill
  ordering, assembleExportPrompt shape.
- sessionStore.spec.ts — parameterized over MemoryStore + FileStore
  (Date.now mock for memory TTL, mtime-based for file TTL),
  large-value round-trip, key sanitization, clear helper.
- toolAdapter.spec.ts — zero-config >40 builtins, remove/add/override
  by name, defaultArgs merged + explicit-args-win, >100KB truncation
  marker, unknown tool returns serialized error, handler throw is
  caught.
- images.spec.ts — nested placeholder walk, non-IMAGE_N strings
  preserved, partial substitution, no-mutation guarantee,
  lexicographic sort in collectImageReferences.
- communitySkills.spec.ts — empty IDs + no key short-circuit (no
  fetch), success path, 500 + network + timeout all return [],
  filters out entries missing name/promptText, honors
  BITBADGES_API_KEY/URL env.
- errors.spec.ts — instanceof dispatch across every subclass,
  ValidationFailedError carries errors/tx/advisory, QuotaExceededError
  carries tokensUsed/tokenCap, AbortedError carries partialTokens.

BitBadgesAgent.spec.ts extended with listSkills / describeSkill
whitelist semantics, exportPrompt round-trip, concurrent build
isolation, and agent.abort() cancelling every in-flight build.

Final: 8 suites, 146 agent tests, all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(builder/agent): resolve E2E blockers + correctness findings

* fix(builder/agent): peer-dep resolution under bun-link + onCompletion always fires + systemPrompt injection check

Round-two review findings from the parallel subagent pass. All 155
tests pass (+9 new, targeting the fixes).

**P0 (ship blocker for BYO-key flow):**
- Peer-dep resolver failed under bun-link. The Function('s','return
  import(s)') trick resolves against the SDK's own file URL. When
  the SDK is bun-linked into a consumer, the SDK sits at
  `bitbadgesjs/packages/bitbadgesjs-sdk/` which has no
  `@anthropic-ai/sdk` — the consumer's does, but the loader never
  looks there. Replaced with a three-strategy loader:
    1. bare dynamic import (normal installs)
    2. createRequire anchored at process.cwd() (bun-link + npm-link,
       consumer running from their project root — the common case)
    3. createRequire anchored at the SDK's own __filename (hoisted
       monorepos)
  Verified E2E: Strategy 2 resolves the dep when running from the
  indexer directory even though the SDK is symlinked.
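
The three strategies can be sketched as a single fallback chain. This is an illustration under assumptions, not the shipped loader: `loadPeerDep` is a hypothetical name, and the anchor directories (for example `process.cwd()` followed by the SDK's own directory) are passed in by the caller as absolute paths.

```typescript
import { createRequire } from "node:module";
import { join } from "node:path";

// Hypothetical loader: try a bare import, then resolve from each anchor dir.
async function loadPeerDep(name: string, anchorDirs: string[]): Promise<unknown> {
  // Strategy 1: bare dynamic import, resolved relative to this file.
  try {
    return await import(name);
  } catch {
    // Fall through to anchored resolution.
  }
  // Strategies 2 and 3: createRequire anchored at each candidate directory.
  for (const dir of anchorDirs) {
    try {
      // The anchor file need not exist; only its directory drives resolution.
      const requireFromDir = createRequire(join(dir, "__anchor__.js"));
      return requireFromDir(name);
    } catch {
      // Try the next anchor.
    }
  }
  throw new Error(`Could not resolve peer dependency "${name}"`);
}
```

Anchoring at the consumer's working directory is what lets a symlinked SDK find a package that is installed only in the consumer's `node_modules`.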

**P1 (correctness):**
- onCompletion now fires on EVERY exit path, not just success. Prior
  spec documented it as "observability-only, fire-and-forget" but
  implementation skipped it on thrown errors (ValidationFailedError,
  QuotaExceededError, AbortedError). Restructured runBuild with a
  try/finally + accumulator; the hook fires once (idempotent) with
  whatever state was reached before the throw.
- systemPrompt full-replace field now gets the same containsInjection
  check that systemPromptAppend got. Previously only the append was
  guarded — a caller passing an untrusted full-replace could bypass
  every base-prompt protection.
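
The try/finally + accumulator pattern for `onCompletion` can be sketched as follows. It is kept synchronous for brevity, and the `BuildState` shape and the step list are assumptions made for illustration.

```typescript
// Hypothetical accumulator: whatever state was reached before an exit.
interface BuildState {
  rounds: number;
  error?: unknown;
}

function runBuild(
  steps: Array<() => void>,
  onCompletion: (state: BuildState) => void
): BuildState {
  const state: BuildState = { rounds: 0 };
  try {
    for (const step of steps) {
      step();
      state.rounds += 1;
    }
    return state;
  } catch (err) {
    state.error = err;
    throw err; // the original error still reaches the caller
  } finally {
    // Runs exactly once on both success and throw. The hook is
    // observability-only, so its own failures are swallowed.
    try {
      onCompletion(state);
    } catch {
      // Ignore hook errors.
    }
  }
}
```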

**Tests added (9):**
- Injection rejection on systemPromptAppend AND systemPrompt (3 cases).
- exportPrompt picks up the constructor's systemPromptAppend (parity
  with build() — regression-guard for the prior gap).
- onCompletion fires once on success and once on ValidationFailedError
  (regression-guard for the contract fix).
- Community-skills localhost bypass works + non-localhost still
  requires a key.
- toolAdapter mergeDefaults: undefined doesn't knock out a default,
  null explicitly overrides it (pins the earlier fix).

**E2E verified (production settings, model=haiku, validation=lenient):**
- anthropic.ok: true (OAuth + beta header path clean)
- creator on final tx: bb1q0qsr... (pre-init propagation confirmed)
- cache read/write ratios healthy (559k/21k tokens on a second build)
- healthCheck / listSkills / describeSkill / exportPrompt all clean

**Known issue (not a regression, deferred):**
The LLM repeatedly omits `collectionPermissions` neutral-array fields,
exhausting the fix loop. This is a pre-existing validator/model-output
mismatch in the SDK tool schema layer — needs its own ticket to either
auto-coerce missing neutral arrays in the validator or strengthen the
system prompt's permissions section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(builder): basic builds work without set_permissions + restore mid-build onLog hook

Two regressions surfaced during code review. Both fixed; all 155
tests pass.

**collectionPermissions regression — "basic stuff failing when it
wasn't before":**
The session template at sessionState.ts:92 started as
`collectionPermissions: {}`. If the LLM skipped calling
`set_permissions` (more common post-caching-refactor for model-
behavior reasons), the validator rejected with 11 missing-field
errors and the fix loop burned 3 rounds trying to recover — ~$0.33
and 2 minutes on a trivial prompt.

Fix: default all 11 permission fields to `[]` (neutral) on the
session template. A build that never touches permissions is now
valid by default, and calling `set_permissions` still overwrites
the whole object identically to before. Matches the old indexer's
implicit autoFixTrivialIssues behavior that was removed earlier on
a "throw at producer, not consumer" rationale — but the real
producer problem here was the template shape, not the tool
handler. Fixing at the template is the cleanest place.

E2E verified: a bare build that doesn't touch permissions now
produces `collectionPermissions: { canDeleteCollection: [], ... }`
and passes validation.

**Dev-console log regression — we lost mid-build `info`/`ai_text`/
`validation` entries:**
Old `runAgentLoop` emitted round-start, AI-text, validation-result,
and error entries via `onLog` that fed `sessionLog → Redis +
fileLog`. My SDK port only kept `onToolCall` — dev-replay JSONL
and the frontend's log-polling route saw tool calls but not the
round boundaries or AI text between them.

Fix: added a generic `onLog` hook to the SDK's AgentHooks
contract. Fire-and-forget (same as onToolCall/onStatusUpdate).
Emitted from:
- loop.ts: round-N start (info + token counts) and AI-text
  responses.
- validation.ts: pass/fail with hard-error counts (already
  existed as gate-local `onLog`, now forwarded).
Indexer wires it to sessionLog() just like the pre-refactor code.

**Tests:**
- Three existing tests assumed "empty session = invalid" — updated
  them to force failure via a `simulate` hook that returns
  `valid: false` instead of relying on empty permissions.
- No new test surface needed; onLog is an additive observability
  hook mirrored from the audited hook contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(builder/agent): inject ctx.sessionId into tool args — critical regression

Session `ses_1g6lo9elkwp6` surfaced this: 8 rounds of successful
session-tool calls produced an EMPTY final tx (no approvals, no
standards, no tokens, no metadata) and a blank apply on the
frontend.

Root cause: session-mutating tools in the SDK
(handleSetPermissions, handleAddApproval, handleSetStandards, etc.)
internally call `getOrCreateSession(input.sessionId, input.creatorAddress)`
— they read sessionId from the ARGS object, not the ToolExecutionContext.
The pre-refactor indexer's toolRegistry explicitly merged ctx into
args (`{ ...args, sessionId: ctx.sessionId }`) before calling the
handler. My toolAdapter.ts dropped that merge, so tools were
mutating the SDK's default (no-sessionId) session while the agent's
`handleGetTransaction({ sessionId })` read its explicitly-bound
session — which got zero mutations.

Fix: `createAgentToolRegistry` now injects ctx.sessionId + ctx.callerAddress
into every tool call's args before handler execution. Ordering:
  { ...args, sessionId: ctx.sessionId, creatorAddress: ctx.callerAddress }
then mergeDefaults on top. LLM-supplied args can't override the
agent's session binding (they shouldn't — the LLM doesn't know
the correct sessionId).
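
The injection ordering can be shown in a few lines. The `ToolExecutionContext` fields used here are the two named in this message; the `withContext` helper name and the `ToolArgs` type are assumptions.

```typescript
interface ToolExecutionContext {
  sessionId: string;
  callerAddress: string;
}

type ToolArgs = Record<string, unknown>;

// Context values are spread last, so anything the LLM supplied for
// sessionId or creatorAddress is overwritten by the agent's binding.
function withContext(args: ToolArgs, ctx: ToolExecutionContext): ToolArgs {
  return {
    ...args,
    sessionId: ctx.sessionId,
    creatorAddress: ctx.callerAddress,
  };
}
```

Because every handler then sees the same `sessionId`, mutations and the final `get_transaction` read operate on one session instead of two.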

Two existing tests updated to assert the new injected-context shape
(the contract is now: ctx values always land on args). 155/155 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(builder/agent): deterministic cache key + export AgentToolRegistry + bump to 0.35.0

Round-3 review findings:

- P1: prompt.ts formatContextHelpers() used Object.entries(publicParams)
  without sorting. Object.entries() follows insertion order for string
  keys, so objects holding the same keys built in different orders
  iterate differently. Two logically-identical claim configs could
  produce byte-different prompt prefixes, silently busting Anthropic's
  prompt-cache on the cache_control ephemeral block. Added .sort() on
  the key pairs before joining, which matches the existing
  canonicalization on selectedSkills.
- P1: AgentToolRegistry / AgentTool / AnthropicTool types were only
  re-exported from /internals (the unstable subpath). Third-party devs
  using `agent.tools` in TypeScript couldn't import the type. Now
  exported from the public bitbadges/builder/agent entry.
- Version bump 0.34.3 → 0.35.0 for the BitBadgesAgent release. Minor
  bump reflects the new subpath + peerDep + fetcher + agent class.
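
To make the cache-key finding concrete, here is a small sketch of sorting entries before joining. `formatParams` and its `key=value` output format are hypothetical; only the idea of sorting the key pairs comes from this message.

```typescript
// Hypothetical formatter: output is independent of key insertion order.
function formatParams(params: Record<string, string>): string {
  return Object.entries(params)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, value]) => `${key}=${value}`)
    .join(", ");
}
```

Two configs with the same keys and values now serialize to the same bytes, so they share one prompt-cache entry.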

155/155 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(builder/agent): rename BitBadgesAgent to BitBadgesBuilderAgent

The class is scoped to collection building, not a generic docs or
protocol agent. Renaming makes the surface name match its intent.

- Class: BitBadgesAgent -> BitBadgesBuilderAgent
- Errors: BitBadgesAgentError -> BitBadgesBuilderAgentError
- Options: BitBadgesAgentOptions -> BitBadgesBuilderAgentOptions
- Files: BitBadgesAgent.ts / .spec.ts renamed via git mv
- Log prefix: [bitbadges-agent] -> [bitbadges-builder-agent]
- Export path /builder/agent and examples dir unchanged

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(builder/agent): review polish — opus 4-7, doc accuracy, valid-JSON truncation, quota tests (#173)

Stacked on top of feat/bitbadges-agent. Addresses findings from deep
review of #172 that are unambiguous wins (no design debate needed).

Blockers
- models.ts: opus ID bumped claude-opus-4-6 → claude-opus-4-7 (latest).
  Updated pinned expectations in models.spec.ts + BitBadgesBuilderAgent.spec.ts
  so the tests actually enforce the current model.

Greptile-flagged polish
- images.ts: JSDoc for `substituteImages` previously claimed only fields
  named `image` were rewritten. Implementation matches every string
  anywhere in the tx. Doc now describes the real behavior.
- errors.ts: `AnthropicAuthError` message hardcoded ANTHROPIC_API_KEY.
  Rewritten to cover both API-key and OAuth credential paths.

Correctness nits
- loop.ts COMPRESSIBLE_TOOLS: add simulate_transaction + validate_transaction
  so the existing summarizeToolResult branches actually fire (dead code
  before).
- loop.ts partial-tokens: guard `err.partialTokens = …` in a try/catch.
  Some caught errors (frozen, primitive) would turn into a cryptic
  TypeError instead of propagating the original.
- toolAdapter.ts truncation: stop emitting "slice + suffix" which the
  LLM can't parse. Wrap in {_truncated, originalBytes, preview} — valid
  JSON, stays well under the 100KB cap.
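
The truncation envelope might be built along these lines. The three field names and the 100KB cap come from the text; the `capToolResult` name, the `_truncated: true` value, and the preview length are assumptions.

```typescript
const MAX_RESULT_BYTES = 100 * 1024; // 100KB cap mentioned above
const PREVIEW_CHARS = 2_000; // assumed preview length

function byteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Returns the result unchanged when small enough, otherwise a valid-JSON
// envelope that tells the model the payload was cut.
function capToolResult(serialized: string): string {
  const originalBytes = byteLength(serialized);
  if (originalBytes <= MAX_RESULT_BYTES) return serialized;
  return JSON.stringify({
    _truncated: true,
    originalBytes,
    preview: serialized.slice(0, PREVIEW_CHARS),
  });
}
```

Unlike a raw slice with a suffix, the envelope always parses, so the model can read `originalBytes` and decide whether to request a narrower query.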

Small API additions
- BitBadgesBuilderAgentOptions: new optional `sessionTtlSeconds` (default
  stays 7200s). Multi-day refinement flows no longer hit a hardcoded TTL.
- agent.validate() signature: second arg is now an options bag with
  `{ creatorAddress?, existingCollectionId?, abortSignal? }`. When
  `existingCollectionId` is set and an `onChainSnapshotFetcher` is
  configured, the snapshot is pulled for diff-based review — matches
  update-mode `build()` behavior.

Test coverage
- healthCheck() success + failure paths
- validate() with/without snapshot fetcher
- maxTokensPerBuild quota → QuotaExceededError
- sessionTtlSeconds threads through to store
- toolAdapter truncation envelope is valid JSON

All 162 agent tests pass locally (serial run).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>