[131-1] feat: add POST /session/:id/tool/call direct tool execution endpoint by andreiships-bot · Pull Request #1 · andreiships/opencode

andreiships-bot · 2026-02-18T19:25:33Z

Summary

Adds a direct tool execution endpoint to the OpenCode fork. The Rust DO SandboxMcpExecutor will call this endpoint for state-mutating tools (edit, write, patch) instead of MCP JSON-RPC transport.

Changes

Problem: No way to call tools directly without going through the LLM loop
Solution: POST /session/:sessionID/tool/call validates session, looks up tool via ToolRegistry, executes with session context, returns MCP content format

Files

packages/opencode/src/server/routes/tool-call.ts — New route handler
packages/opencode/src/server/server.ts — Register ToolCallRoutes at /session
packages/opencode/test/server/tool-call.test.ts — 4 unit tests

Test Plan

TDD-RED commit proves endpoint was missing before implementation
SessionNotFound → 404
MalformedBody → 400
UnknownTool → 200 with isError:true
Success → 200 with content array

…ndpoint Adds a direct tool execution endpoint that bypasses the LLM loop. The Rust DO's SandboxMcpExecutor will call this endpoint for edit/write/patch tools instead of going through MCP JSON-RPC. - tool-call.ts: POST /session/:sessionID/tool/call handler - Validates session exists (404 if not found) - Parses request body with Zod (400 if malformed) - Looks up tool via ToolRegistry (returns isError for unknown tools) - Executes tool with session context, returns MCP content format - server.ts: register ToolCallRoutes at /session - tool-call.test.ts: remove test.fails() markers (4 tests now pass)

andreiships-bot · 2026-02-21T14:26:56Z

Claude Single-Pass Review

Summary

This PR adds a solid Vitest integration test infrastructure using @cloudflare/vitest-pool-workers for workerd runtime testing. The 46 tests provide good coverage across streams, nodes, edges, usage, and attention APIs. The mock auth system and test setup are well-structured, though there are a few correctness and security concerns worth addressing.

Findings

[FINDING-1] issue: P1 | api/src/middleware/auth.ts:117 | TEST_MODE guard uses a truthy check on a boolean env binding but the Env type declares it as boolean | undefined. When TEST_MODE is the string "true" (wrangler.toml bindings are often strings), the check works, but when it is the boolean false set intentionally to disable test mode it also evaluates correctly. However the real risk is that TEST_MODE is never gated to non-production environments — there is no NODE_ENV or wrangler environment check, so if TEST_MODE binding is accidentally present in a production wrangler.toml, mock tokens would be accepted in production. Verified via Read: if (c.env.TEST_MODE) { at line 117. Fix: Add an explicit guard such as if (c.env.TEST_MODE === true && process.env.NODE_ENV !== 'production') or document that this binding must never appear in production wrangler environments.

[FINDING-2] issue: P1 | api/src/middleware/auth.ts:31 | The parseJwt function uses any for the header type: function parseJwt(token: string): { header: any; payload: FirebaseTokenPayload }. This violates the project's strict no-any rule. Verified via Read: function parseJwt(token: string): { header: any; payload: FirebaseTokenPayload } at line 31. Fix: Define a JwtHeader interface with alg: string; kid: string; typ: string and use it instead of any.

[FINDING-3] issue: P1 | api/test/setup.ts:121 | Inline SQL migration is duplicated from (presumably) the real migration files. If the production schema drifts from what is defined in MIGRATIONS here, tests will pass against a schema that no longer matches production, creating a false sense of correctness. Verified via Read: const MIGRATIONS = \...CREATE TABLE IF NOT EXISTS streams...starting at line 12. Fix: Either import and run the actual migration SQL files from a canonical path (e.g.migrations/*.sql`), or add a CI check that validates the test schema matches the production schema.

[FINDING-4] nit: P2 | api/test/setup.ts:140 | clearTables does not clear usage_limits. Default limits inserted in migrations (lines 107-111) remain between test files' beforeAll calls but are cleared between individual tests only via clearTables. Since clearTables runs beforeEach and usage_limits is never cleared, limit rows accumulate from duplicate INSERT OR IGNORE calls if applyMigrations is called from multiple test files in the same run. This is benign with INSERT OR IGNORE but the inconsistency is worth noting.

[FINDING-5] nit: P2 | api/test/nodes.test.ts:296 | Uses any in test code: const updatedNode2 = stream.nodes.find((n: any) => n.id === node2.id). Verified via Read: const updatedNode2 = stream.nodes.find((n: any) => n.id === node2.id); at line 296. Fix: Define an inline type or shared Node interface for the test response shape.

[FINDING-6] nit: P2 | api/test/edges.test.ts:134 | The cross-user edge deletion test expects 403, but the equivalent cross-user node deletion and stream access tests in nodes.test.ts and streams.test.ts expect 404. This inconsistency in status codes for access control across the same API surface may indicate a bug in either the edge handler or the node/stream handlers. Fix: Verify and align the status code convention — either all unauthorized access returns 404 (to avoid leaking existence) or 403 consistently.

[FINDING-7] nit: P2 | api/vitest.config.ts:12 | The TEST_MODE binding is set as a boolean true in vitest config but Env.TEST_MODE is typed as boolean | undefined. Wrangler miniflare bindings declared under bindings: are typically treated as arbitrary values, so this is fine at runtime, but the inconsistency with DEV_MODE (which is typed as string) is worth noting for clarity. Fix: Document the intentional boolean type in a comment, or align with DEV_MODE as a string "true" for consistency.

Code Quality

Test infrastructure is well-structured with reusable helpers
Tests are hermetic via clearTables in beforeEach
Good coverage of auth, CRUD, ownership, and edge cases
vitest.config.ts correctly uses wrangler.toml config path
issue: TEST_MODE has no production environment guard (FINDING-1)
issue: Inline schema duplication diverges from production migrations (FINDING-3)
issue: any type in auth.ts parseJwt return type (FINDING-2)
nit: Inconsistent HTTP status codes for unauthorized access (FINDING-6)

Recommendation

[x] Approve with changes

The test infrastructure is valuable and well-written. FINDING-1 (TEST_MODE production guard) and FINDING-3 (schema duplication) are the two issues most worth addressing before merge. FINDING-2 (any type) is a strict convention violation. The nits are low risk.

andreiships-bot · 2026-02-21T14:26:58Z

Claude Review - Out-of-Diff Findings

The following findings are on lines outside the PR diff:

api/src/middleware/auth.ts:117
[FINDING-1] issue: P1 | TEST_MODE guard uses a truthy check on a boolean env binding but the Env type declares it as boolean | undefined. When TEST_MODE is the string "true" (wrangler.toml bindings are often strings), the check works, but when it is the boolean false set intentionally to disable test mode it also evaluates correctly. However the real risk is that TEST_MODE is never gated to non-production environments — there is no NODE_ENV or wrangler environment check, so if TEST_MODE binding is accidentally present in a production wrangler.toml, mock tokens would be accepted in production. Verified via Read: if (c.env.TEST_MODE) { at line 117. Fix: Add an explicit guard such as if (c.env.TEST_MODE === true && process.env.NODE_ENV !== 'production') or document that this binding must never appear in production wrangler environments.

api/src/middleware/auth.ts:31
[FINDING-2] issue: P1 | The parseJwt function uses any for the header type: function parseJwt(token: string): { header: any; payload: FirebaseTokenPayload }. This violates the project's strict no-any rule. Verified via Read: function parseJwt(token: string): { header: any; payload: FirebaseTokenPayload } at line 31. Fix: Define a JwtHeader interface with alg: string; kid: string; typ: string and use it instead of any.

api/test/setup.ts:121
[FINDING-3] issue: P1 | Inline SQL migration is duplicated from (presumably) the real migration files. If the production schema drifts from what is defined in MIGRATIONS here, tests will pass against a schema that no longer matches production, creating a false sense of correctness. Verified via Read: const MIGRATIONS = \...CREATE TABLE IF NOT EXISTS streams...starting at line 12. Fix: Either import and run the actual migration SQL files from a canonical path (e.g.migrations/*.sql`), or add a CI check that validates the test schema matches the production schema.

api/test/setup.ts:140
[FINDING-4] nit: P2 | clearTables does not clear usage_limits. Default limits inserted in migrations (lines 107-111) remain between test files' beforeAll calls but are cleared between individual tests only via clearTables. Since clearTables runs beforeEach and usage_limits is never cleared, limit rows accumulate from duplicate INSERT OR IGNORE calls if applyMigrations is called from multiple test files in the same run. This is benign with INSERT OR IGNORE but the inconsistency is worth noting.

api/test/nodes.test.ts:296
[FINDING-5] nit: P2 | Uses any in test code: const updatedNode2 = stream.nodes.find((n: any) => n.id === node2.id). Verified via Read: const updatedNode2 = stream.nodes.find((n: any) => n.id === node2.id); at line 296. Fix: Define an inline type or shared Node interface for the test response shape.

api/test/edges.test.ts:134
[FINDING-6] nit: P2 | The cross-user edge deletion test expects 403, but the equivalent cross-user node deletion and stream access tests in nodes.test.ts and streams.test.ts expect 404. This inconsistency in status codes for access control across the same API surface may indicate a bug in either the edge handler or the node/stream handlers. Fix: Verify and align the status code convention — either all unauthorized access returns 404 (to avoid leaking existence) or 403 consistently.

api/vitest.config.ts:12
[FINDING-7] nit: P2 | The TEST_MODE binding is set as a boolean true in vitest config but Env.TEST_MODE is typed as boolean | undefined. Wrangler miniflare bindings declared under bindings: are typically treated as arbitrary values, so this is fine at runtime, but the inconsistency with DEV_MODE (which is typed as string) is worth noting for clarity. Fix: Document the intentional boolean type in a comment, or align with DEV_MODE as a string "true" for consistency.

Fixes issues found during retroactive review of PRs #1, #2, #3. tool-call.ts: - Agent.defaultAgent() returns a name string, not Agent.Info — resolve with Agent.get() before passing to ToolRegistry.tools() - Fix agent: agent.id (undefined on string) → agent: agentName - Use agent's configured model instead of hardcoded opencode/default Dockerfile: - Quote $(find ...) and add existence check to prevent cryptic failures when binary is missing - Remove BUN_RUNTIME_TRANSPILER_CACHE_PATH env var — irrelevant for a compiled native binary that does not use Bun's transpiler fly.toml: - min_machines_running: 0 → 1 to avoid cold starts on interactive sessions scripts/ci/test-opencode-integration.sh: - Replace hardcoded /tmp/ paths with mktemp tmpdir + EXIT trap to avoid collisions in parallel CI runs .github/workflows/build-push.yml: - Gate :latest push on stable tags only (no pre-release suffix like -rc1) - Add provenance: false to prevent OCI attestation manifests from breaking fly deploy digest resolution - Explicitly set platforms: linux/amd64 to avoid silent arch mismatches

…, CI) (#4) * fix(131): address P1 review findings from retroactive review Fixes issues found during retroactive review of PRs #1, #2, #3. tool-call.ts: - Agent.defaultAgent() returns a name string, not Agent.Info — resolve with Agent.get() before passing to ToolRegistry.tools() - Fix agent: agent.id (undefined on string) → agent: agentName - Use agent's configured model instead of hardcoded opencode/default Dockerfile: - Quote $(find ...) and add existence check to prevent cryptic failures when binary is missing - Remove BUN_RUNTIME_TRANSPILER_CACHE_PATH env var — irrelevant for a compiled native binary that does not use Bun's transpiler fly.toml: - min_machines_running: 0 → 1 to avoid cold starts on interactive sessions scripts/ci/test-opencode-integration.sh: - Replace hardcoded /tmp/ paths with mktemp tmpdir + EXIT trap to avoid collisions in parallel CI runs .github/workflows/build-push.yml: - Gate :latest push on stable tags only (no pre-release suffix like -rc1) - Add provenance: false to prevent OCI attestation manifests from breaking fly deploy digest resolution - Explicitly set platforms: linux/amd64 to avoid silent arch mismatches * ci: replace blacksmith runners with ubicloud-standard-2 All blacksmith-* runner labels replaced: - blacksmith-4vcpu-ubuntu-2404 → ubicloud-standard-2 - blacksmith-8vcpu-ubuntu-2404-arm → ubicloud-standard-8-arm - blacksmith-4vcpu-ubuntu-2404-arm → ubicloud-standard-2-arm - blacksmith-4vcpu-windows-2025 → windows-latest (no Ubicloud Windows runner) --------- Co-authored-by: Andrei Cojocaru <andrei@pistachiorama.ai> Co-authored-by: andreiships-bot <andreiships-bot@users.noreply.github.com>

andreiships added 2 commits February 18, 2026 14:22

andreiships-bot mentioned this pull request Feb 18, 2026

[131-2] feat: Dockerfile (Bun), fly.toml, and integration smoke test #2

Merged

3 tasks

andreiships-bot marked this pull request as ready for review February 18, 2026 23:14

andreiships-bot merged commit 19bcf3d into dev Feb 18, 2026

andreiships-bot mentioned this pull request Feb 21, 2026

fix(131): address P1 review findings (tool-call, Dockerfile, fly.toml, CI) #4

Merged

4 tasks

This was referenced Mar 25, 2026

chore: upgrade fork to upstream v1.3.2 #27

Closed

chore: apply remaining CI customizations + fix test API #28

Merged

chore: add repo guards to upstream-only workflows #29

Merged

feat: add minify + bytecode for faster startup (Spec 146 Phase 5g) #34

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[131-1] feat: add POST /session/:id/tool/call direct tool execution endpoint#1

[131-1] feat: add POST /session/:id/tool/call direct tool execution endpoint#1
andreiships-bot merged 2 commits intodevfrom
131-phase-1

andreiships-bot commented Feb 18, 2026

Uh oh!

andreiships-bot commented Feb 21, 2026

Uh oh!

andreiships-bot commented Feb 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

andreiships-bot commented Feb 18, 2026

Summary

Changes

Files

Test Plan

Related

Uh oh!

andreiships-bot commented Feb 21, 2026

Claude Single-Pass Review

Summary

Findings

Code Quality

Recommendation

Uh oh!

andreiships-bot commented Feb 21, 2026

Claude Review - Out-of-Diff Findings

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants