Skip to content

[131-1] feat: add POST /session/:id/tool/call direct tool execution endpoint#1

Merged
andreiships-bot merged 2 commits intodevfrom
131-phase-1
Feb 18, 2026
Merged

[131-1] feat: add POST /session/:id/tool/call direct tool execution endpoint#1
andreiships-bot merged 2 commits intodevfrom
131-phase-1

Conversation

@andreiships-bot
Copy link
Copy Markdown
Collaborator

Summary

Adds a direct tool execution endpoint to the OpenCode fork. The Rust DO SandboxMcpExecutor will call this endpoint for state-mutating tools (edit, write, patch) instead of MCP JSON-RPC transport.

Changes

  • Problem: No way to call tools directly without going through the LLM loop
  • Solution: POST /session/:sessionID/tool/call validates session, looks up tool via ToolRegistry, executes with session context, returns MCP content format

Files

  • packages/opencode/src/server/routes/tool-call.ts — New route handler
  • packages/opencode/src/server/server.ts — Register ToolCallRoutes at /session
  • packages/opencode/test/server/tool-call.test.ts — 4 unit tests

Test Plan

  • TDD-RED commit proves endpoint was missing before implementation
  • SessionNotFound → 404
  • MalformedBody → 400
  • UnknownTool → 200 with isError:true
  • Success → 200 with content array

Related

  • Spec 131 / ADR-060 from pistachiorama

Adds 4 failing test cases proving the endpoint doesn't exist yet:
- 404 for non-existent session
- 400 for malformed request body
- 200 with isError:true for unknown tool name
- 200 with content array for successful tool execution (glob)

All tests use test.fails() to prove the endpoint is missing.
…ndpoint

Adds a direct tool execution endpoint that bypasses the LLM loop.
The Rust DO's SandboxMcpExecutor will call this endpoint for
edit/write/patch tools instead of going through MCP JSON-RPC.

- tool-call.ts: POST /session/:sessionID/tool/call handler
  - Validates session exists (404 if not found)
  - Parses request body with Zod (400 if malformed)
  - Looks up tool via ToolRegistry (returns isError for unknown tools)
  - Executes tool with session context, returns MCP content format
- server.ts: register ToolCallRoutes at /session
- tool-call.test.ts: remove test.fails() markers (4 tests now pass)
@andreiships-bot andreiships-bot marked this pull request as ready for review February 18, 2026 23:14
@andreiships-bot andreiships-bot merged commit 19bcf3d into dev Feb 18, 2026
@andreiships-bot
Copy link
Copy Markdown
Collaborator Author

Claude Single-Pass Review

Summary

This PR adds a solid Vitest integration test infrastructure using @cloudflare/vitest-pool-workers for workerd runtime testing. The 46 tests provide good coverage across streams, nodes, edges, usage, and attention APIs. The mock auth system and test setup are well-structured, though there are a few correctness and security concerns worth addressing.

Findings

[FINDING-1] issue: P1 | api/src/middleware/auth.ts:117 | TEST_MODE guard uses a truthy check on a boolean env binding but the Env type declares it as boolean | undefined. When TEST_MODE is the string "true" (wrangler.toml bindings are often strings), the check works, but when it is the boolean false set intentionally to disable test mode it also evaluates correctly. However the real risk is that TEST_MODE is never gated to non-production environments — there is no NODE_ENV or wrangler environment check, so if TEST_MODE binding is accidentally present in a production wrangler.toml, mock tokens would be accepted in production. Verified via Read: if (c.env.TEST_MODE) { at line 117. Fix: Add an explicit guard such as if (c.env.TEST_MODE === true && process.env.NODE_ENV !== 'production') or document that this binding must never appear in production wrangler environments.

[FINDING-2] issue: P1 | api/src/middleware/auth.ts:31 | The parseJwt function uses any for the header type: function parseJwt(token: string): { header: any; payload: FirebaseTokenPayload }. This violates the project's strict no-any rule. Verified via Read: function parseJwt(token: string): { header: any; payload: FirebaseTokenPayload } at line 31. Fix: Define a JwtHeader interface with alg: string; kid: string; typ: string and use it instead of any.

[FINDING-3] issue: P1 | api/test/setup.ts:121 | Inline SQL migration is duplicated from (presumably) the real migration files. If the production schema drifts from what is defined in MIGRATIONS here, tests will pass against a schema that no longer matches production, creating a false sense of correctness. Verified via Read: const MIGRATIONS = \...CREATE TABLE IF NOT EXISTS streams...starting at line 12. Fix: Either import and run the actual migration SQL files from a canonical path (e.g.migrations/*.sql`), or add a CI check that validates the test schema matches the production schema.

[FINDING-4] nit: P2 | api/test/setup.ts:140 | clearTables does not clear usage_limits. Default limits inserted in migrations (lines 107-111) remain between test files' beforeAll calls but are cleared between individual tests only via clearTables. Since clearTables runs beforeEach and usage_limits is never cleared, limit rows accumulate from duplicate INSERT OR IGNORE calls if applyMigrations is called from multiple test files in the same run. This is benign with INSERT OR IGNORE but the inconsistency is worth noting.

[FINDING-5] nit: P2 | api/test/nodes.test.ts:296 | Uses any in test code: const updatedNode2 = stream.nodes.find((n: any) => n.id === node2.id). Verified via Read: const updatedNode2 = stream.nodes.find((n: any) => n.id === node2.id); at line 296. Fix: Define an inline type or shared Node interface for the test response shape.

[FINDING-6] nit: P2 | api/test/edges.test.ts:134 | The cross-user edge deletion test expects 403, but the equivalent cross-user node deletion and stream access tests in nodes.test.ts and streams.test.ts expect 404. This inconsistency in status codes for access control across the same API surface may indicate a bug in either the edge handler or the node/stream handlers. Fix: Verify and align the status code convention — either all unauthorized access returns 404 (to avoid leaking existence) or 403 consistently.

[FINDING-7] nit: P2 | api/vitest.config.ts:12 | The TEST_MODE binding is set as a boolean true in vitest config but Env.TEST_MODE is typed as boolean | undefined. Wrangler miniflare bindings declared under bindings: are typically treated as arbitrary values, so this is fine at runtime, but the inconsistency with DEV_MODE (which is typed as string) is worth noting for clarity. Fix: Document the intentional boolean type in a comment, or align with DEV_MODE as a string "true" for consistency.

Code Quality

  • Test infrastructure is well-structured with reusable helpers
  • Tests are hermetic via clearTables in beforeEach
  • Good coverage of auth, CRUD, ownership, and edge cases
  • vitest.config.ts correctly uses wrangler.toml config path
  • issue: TEST_MODE has no production environment guard (FINDING-1)
  • issue: Inline schema duplication diverges from production migrations (FINDING-3)
  • issue: any type in auth.ts parseJwt return type (FINDING-2)
  • nit: Inconsistent HTTP status codes for unauthorized access (FINDING-6)

Recommendation

[x] Approve with changes

The test infrastructure is valuable and well-written. FINDING-1 (TEST_MODE production guard) and FINDING-3 (schema duplication) are the two issues most worth addressing before merge. FINDING-2 (any type) is a strict convention violation. The nits are low risk.

@andreiships-bot
Copy link
Copy Markdown
Collaborator Author

Claude Review - Out-of-Diff Findings

The following findings are on lines outside the PR diff:

api/src/middleware/auth.ts:117
[FINDING-1] issue: P1 | TEST_MODE guard uses a truthy check on a boolean env binding but the Env type declares it as boolean | undefined. When TEST_MODE is the string "true" (wrangler.toml bindings are often strings), the check works, but when it is the boolean false set intentionally to disable test mode it also evaluates correctly. However the real risk is that TEST_MODE is never gated to non-production environments — there is no NODE_ENV or wrangler environment check, so if TEST_MODE binding is accidentally present in a production wrangler.toml, mock tokens would be accepted in production. Verified via Read: if (c.env.TEST_MODE) { at line 117. Fix: Add an explicit guard such as if (c.env.TEST_MODE === true && process.env.NODE_ENV !== 'production') or document that this binding must never appear in production wrangler environments.

api/src/middleware/auth.ts:31
[FINDING-2] issue: P1 | The parseJwt function uses any for the header type: function parseJwt(token: string): { header: any; payload: FirebaseTokenPayload }. This violates the project's strict no-any rule. Verified via Read: function parseJwt(token: string): { header: any; payload: FirebaseTokenPayload } at line 31. Fix: Define a JwtHeader interface with alg: string; kid: string; typ: string and use it instead of any.

api/test/setup.ts:121
[FINDING-3] issue: P1 | Inline SQL migration is duplicated from (presumably) the real migration files. If the production schema drifts from what is defined in MIGRATIONS here, tests will pass against a schema that no longer matches production, creating a false sense of correctness. Verified via Read: const MIGRATIONS = \...CREATE TABLE IF NOT EXISTS streams...starting at line 12. Fix: Either import and run the actual migration SQL files from a canonical path (e.g.migrations/*.sql`), or add a CI check that validates the test schema matches the production schema.

api/test/setup.ts:140
[FINDING-4] nit: P2 | clearTables does not clear usage_limits. Default limits inserted in migrations (lines 107-111) remain between test files' beforeAll calls but are cleared between individual tests only via clearTables. Since clearTables runs beforeEach and usage_limits is never cleared, limit rows accumulate from duplicate INSERT OR IGNORE calls if applyMigrations is called from multiple test files in the same run. This is benign with INSERT OR IGNORE but the inconsistency is worth noting.

api/test/nodes.test.ts:296
[FINDING-5] nit: P2 | Uses any in test code: const updatedNode2 = stream.nodes.find((n: any) => n.id === node2.id). Verified via Read: const updatedNode2 = stream.nodes.find((n: any) => n.id === node2.id); at line 296. Fix: Define an inline type or shared Node interface for the test response shape.

api/test/edges.test.ts:134
[FINDING-6] nit: P2 | The cross-user edge deletion test expects 403, but the equivalent cross-user node deletion and stream access tests in nodes.test.ts and streams.test.ts expect 404. This inconsistency in status codes for access control across the same API surface may indicate a bug in either the edge handler or the node/stream handlers. Fix: Verify and align the status code convention — either all unauthorized access returns 404 (to avoid leaking existence) or 403 consistently.

api/vitest.config.ts:12
[FINDING-7] nit: P2 | The TEST_MODE binding is set as a boolean true in vitest config but Env.TEST_MODE is typed as boolean | undefined. Wrangler miniflare bindings declared under bindings: are typically treated as arbitrary values, so this is fine at runtime, but the inconsistency with DEV_MODE (which is typed as string) is worth noting for clarity. Fix: Document the intentional boolean type in a comment, or align with DEV_MODE as a string "true" for consistency.

andreiships-bot pushed a commit that referenced this pull request Feb 21, 2026
Fixes issues found during retroactive review of PRs #1, #2, #3.

tool-call.ts:
- Agent.defaultAgent() returns a name string, not Agent.Info — resolve
  with Agent.get() before passing to ToolRegistry.tools()
- Fix agent: agent.id (undefined on string) → agent: agentName
- Use agent's configured model instead of hardcoded opencode/default

Dockerfile:
- Quote $(find ...) and add existence check to prevent cryptic failures
  when binary is missing
- Remove BUN_RUNTIME_TRANSPILER_CACHE_PATH env var — irrelevant for a
  compiled native binary that does not use Bun's transpiler

fly.toml:
- min_machines_running: 0 → 1 to avoid cold starts on interactive sessions

scripts/ci/test-opencode-integration.sh:
- Replace hardcoded /tmp/ paths with mktemp tmpdir + EXIT trap to
  avoid collisions in parallel CI runs

.github/workflows/build-push.yml:
- Gate :latest push on stable tags only (no pre-release suffix like -rc1)
- Add provenance: false to prevent OCI attestation manifests from
  breaking fly deploy digest resolution
- Explicitly set platforms: linux/amd64 to avoid silent arch mismatches
andreiships-bot added a commit that referenced this pull request Feb 23, 2026
…, CI) (#4)

* fix(131): address P1 review findings from retroactive review

Fixes issues found during retroactive review of PRs #1, #2, #3.

tool-call.ts:
- Agent.defaultAgent() returns a name string, not Agent.Info — resolve
  with Agent.get() before passing to ToolRegistry.tools()
- Fix agent: agent.id (undefined on string) → agent: agentName
- Use agent's configured model instead of hardcoded opencode/default

Dockerfile:
- Quote $(find ...) and add existence check to prevent cryptic failures
  when binary is missing
- Remove BUN_RUNTIME_TRANSPILER_CACHE_PATH env var — irrelevant for a
  compiled native binary that does not use Bun's transpiler

fly.toml:
- min_machines_running: 0 → 1 to avoid cold starts on interactive sessions

scripts/ci/test-opencode-integration.sh:
- Replace hardcoded /tmp/ paths with mktemp tmpdir + EXIT trap to
  avoid collisions in parallel CI runs

.github/workflows/build-push.yml:
- Gate :latest push on stable tags only (no pre-release suffix like -rc1)
- Add provenance: false to prevent OCI attestation manifests from
  breaking fly deploy digest resolution
- Explicitly set platforms: linux/amd64 to avoid silent arch mismatches

* ci: replace blacksmith runners with ubicloud-standard-2

All blacksmith-* runner labels replaced:
- blacksmith-4vcpu-ubuntu-2404     → ubicloud-standard-2
- blacksmith-8vcpu-ubuntu-2404-arm → ubicloud-standard-8-arm
- blacksmith-4vcpu-ubuntu-2404-arm → ubicloud-standard-2-arm
- blacksmith-4vcpu-windows-2025    → windows-latest (no Ubicloud Windows runner)

---------

Co-authored-by: Andrei Cojocaru <andrei@pistachiorama.ai>
Co-authored-by: andreiships-bot <andreiships-bot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants