@ethanndickson ethanndickson commented Dec 18, 2025

Overview

Implements Programmatic Tool Calling (PTC) - a code_execution tool that enables AI models to orchestrate multi-tool workflows via JavaScript code in a sandboxed QuickJS environment. Instead of N inference round-trips for N tool calls, the model writes code that executes all tools in a single round-trip.
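
To make the round-trip saving concrete, here is a minimal sketch of the kind of script a model might submit to code_execution. The `mux` object is a local stub so the snippet runs on its own; the argument names (`filePath`, `script`) and result shapes are assumptions for illustration, not the real tool schemas, and the stub is synchronous for brevity.

```typescript
// Stand-in for the sandbox's mux.* namespace. The real bridge forwards to Mux
// tools; argument and result shapes here are hypothetical.
type FileReadResult = { content: string };
type BashResult = { exitCode: number; output: string };

const callLog: string[] = [];

const mux = {
  file_read(args: { filePath: string }): FileReadResult {
    callLog.push(`file_read:${args.filePath}`);
    return { content: `contents of ${args.filePath}` };
  },
  bash(args: { script: string }): BashResult {
    callLog.push(`bash:${args.script}`);
    return { exitCode: 0, output: "ok" };
  },
};

// What the model would write: several tool calls plus glue logic in a single
// script, so only one inference round-trip is needed for all three calls.
function agentScript(): { filesRead: number; testsPassed: boolean } {
  const paths = ["package.json", "tsconfig.json"];
  const files = paths.map((filePath) => mux.file_read({ filePath }));
  const test = mux.bash({ script: "make test" });
  return { filesRead: files.length, testsPassed: test.exitCode === 0 };
}

const summary = agentScript();
console.log(summary, callLog);
```

Without PTC, the same work would take three separate tool calls, each with its own inference round-trip.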

Gated behind experiment flags (disabled by default):

  • PROGRAMMATIC_TOOL_CALLING - Adds code_execution alongside existing tools
  • PROGRAMMATIC_TOOL_CALLING_EXCLUSIVE - Replaces all tools not available within code_execution with just code_execution

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        code_execution tool                       │
├─────────────────────────────────────────────────────────────────┤
│  Static Analysis     →  TypeScript Validation  →  QuickJS Runtime │
│  (syntax, globals)      (type checking)           (sandboxed)     │
└─────────────────────────────────────────────────────────────────┘
                                  │
                    ┌─────────────┴─────────────┐
                    │       Tool Bridge          │
                    │  mux.bash(), mux.file_read │
                    │  mux.file_edit_*(), etc.   │
                    └───────────────────────────┘

Key Components

| Component | File | Purpose |
| --- | --- | --- |
| IJSRuntime | ptc/runtime.ts | Abstract interface for JS runtimes |
| QuickJSRuntime | ptc/quickjsRuntime.ts | QuickJS-emscripten implementation with Asyncify |
| ToolBridge | ptc/toolBridge.ts | Exposes Mux tools under mux.* namespace |
| staticAnalysis | ptc/staticAnalysis.ts | Pre-execution validation (syntax, forbidden patterns) |
| typeGenerator | ptc/typeGenerator.ts | Generates .d.ts from Zod schemas |
| code_execution | tools/code_execution.ts | Entry point tool definition |

Streaming Flow

Nested tool calls stream to the UI in real-time:

streamText() calls code_execution.execute()
    ↓
JS code runs: mux.file_read({...})
    → emit tool-call-start {parentToolCallId: "abc123"}
    → file_read executes
    → emit tool-call-end {parentToolCallId: "abc123", result: ...}
    ↓
JS code runs: mux.bash({...})
    → emit tool-call-start {parentToolCallId: "abc123"}
    → bash executes  
    → emit tool-call-end {parentToolCallId: "abc123", result: ...}
    ↓
code_execution returns final result

The StreamingMessageAggregator handles parentToolCallId to nest calls within the parent tool part.
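
As a rough illustration of that nesting, the sketch below attaches child calls to their parent by parentToolCallId. The event and part shapes, and the `NestingAggregator` class itself, are assumptions made for this example; the real StreamingMessageAggregator may be structured differently.

```typescript
// Hypothetical event shapes modeled on the flow above.
type ToolEvent =
  | { type: "tool-call-start"; toolCallId: string; toolName: string; parentToolCallId?: string }
  | { type: "tool-call-end"; toolCallId: string; result: unknown; parentToolCallId?: string };

interface ToolPart {
  toolCallId: string;
  toolName: string;
  state: "running" | "done";
  result?: unknown;
  nestedCalls: ToolPart[];
}

class NestingAggregator {
  private readonly byId = new Map<string, ToolPart>();
  readonly topLevel: ToolPart[] = [];

  handle(event: ToolEvent): void {
    if (event.type === "tool-call-start") {
      const part: ToolPart = {
        toolCallId: event.toolCallId,
        toolName: event.toolName,
        state: "running",
        nestedCalls: [],
      };
      this.byId.set(part.toolCallId, part);
      const parent = event.parentToolCallId ? this.byId.get(event.parentToolCallId) : undefined;
      // Nest under the parent when one is known; otherwise treat as top-level.
      (parent ? parent.nestedCalls : this.topLevel).push(part);
      return;
    }
    // tool-call-end: mark the matching part as finished and store its result.
    const part = this.byId.get(event.toolCallId);
    if (part) {
      part.state = "done";
      part.result = event.result;
    }
  }
}

const agg = new NestingAggregator();
agg.handle({ type: "tool-call-start", toolCallId: "abc123", toolName: "code_execution" });
agg.handle({ type: "tool-call-start", toolCallId: "n1", toolName: "file_read", parentToolCallId: "abc123" });
agg.handle({ type: "tool-call-end", toolCallId: "n1", result: "file text", parentToolCallId: "abc123" });
agg.handle({ type: "tool-call-start", toolCallId: "n2", toolName: "bash", parentToolCallId: "abc123" });
agg.handle({ type: "tool-call-end", toolCallId: "n2", result: { exitCode: 0 }, parentToolCallId: "abc123" });
```

After these events, `agg.topLevel` holds a single code_execution part whose `nestedCalls` contain the file_read and bash calls in order.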

UI Components

  • CodeExecutionToolCall - Main container with fieldset layout, collapsible code/console sections
  • NestedToolRenderer - Routes nested calls to specialized tool components (BashToolCall, FileReadToolCall, etc.)
  • ConsoleOutput - Displays console.log/warn/error output with appropriate styling

Security & Resource Limits

| Resource | Limit |
| --- | --- |
| Memory | 64MB |
| Timeout | 5 minutes |
| Sandbox | QuickJS WASM (no fs/net access except via tools) |

Excluded from bridge: code_execution (prevents recursion), ask_user_question, propose_plan, todo_*, status_set (UI-specific), provider-native tools (no execute function)
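
The exclusion rules can be pictured as a simple predicate over the tool map. This is only a sketch: the `ToolLike` shape and the helper names are hypothetical, and `todo_write` and `web_search` are made-up tool names used to exercise the todo_* and provider-native cases.

```typescript
// Hypothetical tool shape: provider-native tools are modeled as having no execute function.
interface ToolLike {
  execute?: (args: unknown) => unknown;
}

// Names taken from the exclusion list above; todo_* is handled as a prefix.
const EXCLUDED_NAMES = new Set<string>([
  "code_execution", // prevents recursion
  "ask_user_question",
  "propose_plan",
  "status_set",
]);

function isBridgeable(name: string, tool: ToolLike): boolean {
  if (EXCLUDED_NAMES.has(name) || name.startsWith("todo_")) return false;
  return typeof tool.execute === "function";
}

// Returns the names of the tools that would be exposed under mux.*.
function selectBridgeTools(tools: Record<string, ToolLike>): string[] {
  return Object.keys(tools).filter((name) => {
    const tool = tools[name];
    return tool !== undefined && isBridgeable(name, tool);
  });
}

const noop = (): null => null;
const bridged = selectBridgeTools({
  bash: { execute: noop },
  file_read: { execute: noop },
  code_execution: { execute: noop },
  todo_write: { execute: noop },
  status_set: { execute: noop },
  web_search: {}, // provider-native: no execute function
});
console.log(bridged);
```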

Test Coverage

136 tests across:

  • QuickJS runtime (49 tests) - marshaling, async functions, abort, limits
  • Static analysis (33 tests) - syntax, forbidden patterns, unavailable globals
  • Type generation (14 tests) - Zod → .d.ts conversion, caching
  • Type validation (13 tests) - TypeScript error detection
  • code_execution tool (20 tests) - end-to-end execution
  • StreamingMessageAggregator (5 nested call tests) - parent/child handling

Known Limitations

  1. Sequential execution - Promise.all() runs sequentially due to Asyncify single-stack limitation
  2. Console output not streamed - Appears only after code completes

Generated with mux • Model: anthropic:claude-opus-4-5 • Thinking: high

@ethanndickson ethanndickson force-pushed the ethan/programmatic-tool-calling branch from 59751b5 to 050e8a2 Compare December 18, 2025 06:13
@ethanndickson

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ethanndickson

@codex review

@ethanndickson

@codex review

@ethanndickson ethanndickson force-pushed the ethan/programmatic-tool-calling branch from 833f747 to f7fdb69 Compare December 18, 2025 07:33

Programmatic Tool Calling (PTC) infrastructure for executing JavaScript
code in a sandboxed QuickJS environment with access to Mux tools.

Phases completed:
- Phase 1: Experiment flag (PROGRAMMATIC_TOOL_CALLING)
- Phase 2: Runtime abstraction (IJSRuntime interface)
- Phase 3: QuickJS implementation with Asyncify
- Phase 4: Tool bridge exposing tools under mux.* namespace
- Phase 4.5: Static analysis (syntax, forbidden patterns, AST-based globals)
- Phase 5: code_execution tool definition
- Phase 5.5: TypeScript type generation & validation from Zod schemas
- Phase 5.6: Cleanup fixes (JSDoc, cache unification, test improvements)

Key features:
- Sandboxed execution with 64MB memory / 5min timeout limits
- TypeScript type checking before execution (blocking errors)
- Partial results on failure for debugging
- Event streaming for nested tool calls
- Abort signal support for cancellation

131 tests passing, make static-check passes.

---
_Generated with `mux` • Model: `anthropic:claude-opus-4-5` • Thinking: `high`_

Adds code_execution tool that enables AI models to execute multiple tool
calls in a single inference round-trip via JavaScript code in a sandboxed
QuickJS environment.

Key features:
- QuickJS-emscripten runtime with Asyncify for async host functions
- Tool bridge exposing all Mux tools under mux.* namespace
- Static analysis: syntax validation, forbidden patterns, unavailable globals
- TypeScript type generation from Zod schemas for model guidance
- Real-time streaming of nested tool calls to UI
- Partial results on failure for debuggability
- Resource limits: 64MB memory, 5-minute timeout

UI components:
- CodeExecutionToolCall with collapsible code/console sections
- NestedToolRenderer routing to specialized tool components
- ConsoleOutput for log/warn/error display
- Storybook stories for all states

Gated behind PROGRAMMATIC_TOOL_CALLING experiment flag with optional
PROGRAMMATIC_TOOL_CALLING_EXCLUSIVE mode for PTC-only tool availability.

136 tests passing across runtime, static analysis, type generation,
streaming aggregator, and tool execution.

- Move typescript from devDependencies to dependencies (needed at runtime for PTC static analysis)
- Lazy-load PTC modules in aiService.ts using dynamic import()
- Only load code_execution, quickjsRuntime, and toolBridge when PTC experiments are enabled

This fixes two CI failures:
1. Integration tests failing due to prettier's dynamic imports requiring --experimental-vm-modules
2. Smoke tests failing because typescript wasn't in production bundle

The import chain (aiService → code_execution → staticAnalysis/typeGenerator → typescript/prettier)
now only executes when PTC is actually used.
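
A minimal sketch of the lazy-loading pattern described above, assuming a memoized loader. The `lazy` helper, `buildToolNames`, and the import path in the comment are hypothetical; a counter stands in for the real dynamic import() so the snippet runs on its own.

```typescript
// Memoize an async loader so the underlying import runs at most once.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) cached = load();
    return cached;
  };
}

// Stand-in for something like `() => import("./tools/code_execution")`
// (the path is an assumption). The counter records how often the load runs.
let loadCount = 0;
const loadPtcModule = lazy(async () => {
  loadCount += 1;
  return { createCodeExecutionTool: (): string => "code_execution" };
});

async function buildToolNames(ptcEnabled: boolean): Promise<string[]> {
  const names = ["bash", "file_read"];
  if (ptcEnabled) {
    // Only reached when the experiment is on, so heavy dependencies such as
    // typescript and prettier stay unloaded otherwise.
    const ptc = await loadPtcModule();
    names.push(ptc.createCodeExecutionTool());
  }
  return names;
}
```

With the flag off, `buildToolNames(false)` never touches the loader, which is the property the CI fix relies on.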

- Move muxTypes after agent code in typeValidator so error positions match
- Return TypeValidationError objects with message, line, column instead of strings
- Update staticAnalysis to propagate line/column from type errors
- Add tests for line number reporting in code_execution.test.ts and typeValidator.test.ts

- Extract lineNumber directly from QuickJS error object instead of regex parsing
- Add bounds checking to only report lines within agent code
- Display column numbers for type errors (line N, col M)
- Simplify typeValidator line number logic with clearer comments
- Add tests for line numbers on first/last lines and column display
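
The position handling from the two commits above can be sketched as follows. Because the generated types come after the agent code, raw line numbers map one-to-one onto the agent's lines, and anything past the last agent line is reported without a position. The field names follow the commit message (message, line, column); the helper names and exact output format are assumptions.

```typescript
interface TypeValidationError {
  message: string;
  line?: number;
  column?: number;
}

// Keep a position only when it falls inside the agent's own code.
function toAgentPosition(
  message: string,
  rawLine: number,
  rawColumn: number,
  agentCode: string,
): TypeValidationError {
  const agentLineCount = agentCode.split("\n").length;
  if (rawLine >= 1 && rawLine <= agentLineCount) {
    return { message, line: rawLine, column: rawColumn };
  }
  // The error points into the appended type declarations: drop the position.
  return { message };
}

function formatError(error: TypeValidationError): string {
  if (error.line === undefined) return error.message;
  const col = error.column !== undefined ? `, col ${error.column}` : "";
  return `${error.message} (line ${error.line}${col})`;
}

const agentCode = "const a = 1;\nconst b: string = a;\nreturn b;";
console.log(formatError(toAgentPosition("Type mismatch", 2, 7, agentCode)));
console.log(formatError(toAgentPosition("Type mismatch", 40, 1, agentCode)));
```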

Previously, the type cache hashed only tool names, so schema changes
(e.g., an updated MCP server, or a different workspace with the same tool names)
wouldn't invalidate cached types. It now hashes names, schemas, and descriptions.
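
One way to build such a cache key is sketched below, under stated assumptions: `ToolSpec` is a hypothetical shape, schemas are treated as plain JSON-like objects, and FNV-1a is used only to keep the example dependency-free. The PR does not say which hash function it uses.

```typescript
interface ToolSpec {
  description: string;
  schema: unknown; // e.g. a JSON-like object derived from the Zod schema
}

// Serialize with sorted object keys so equivalent inputs give identical strings.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) {
    return "[" + value.map((item) => stableStringify(item)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const body = Object.keys(record)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + stableStringify(record[key]));
    return "{" + body.join(",") + "}";
  }
  return JSON.stringify(value);
}

// 32-bit FNV-1a over UTF-16 code units (illustrative choice of hash).
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// The key covers names, descriptions, and schemas, not just names.
function typeCacheKey(tools: Record<string, ToolSpec>): string {
  const entries = Object.keys(tools)
    .sort()
    .map((name) => [name, tools[name]]);
  return fnv1a(stableStringify(entries));
}

const base = {
  bash: { description: "v1", schema: { maxLength: 1 } },
  file_read: { description: "read", schema: {} },
};
console.log(typeCacheKey(base));
```

Two tool sets with the same names but a changed schema or description now produce different keys, so stale generated types are not reused.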

The sandbox's 5-minute timeout interrupted QuickJS execution only via the
interrupt handler, but that handler doesn't fire during async suspension
(when waiting for host functions like mux.bash()).

Changes:
- Add setTimeout timer that fires independently of QuickJS execution
- Tools now use runtime.getAbortSignal() instead of AI SDK's signal
- This signal is aborted both by timeout AND external abort requests
- Add abortRequested flag to handle abort() before eval() starts
- Fix edge case where addEventListener on already-aborted signal is a no-op

When timeout fires during a tool call:
1. setTimeout aborts the runtime's AbortController
2. Tool sees aborted signal and can cancel (if it respects abort)
3. When tool returns, interrupt handler also stops QuickJS

Note: JavaScript cannot preemptively cancel Promises - this is cooperative
cancellation. Tools that check their abort signal will cancel; pure async
operations complete but subsequent calls fail fast.
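
The cooperative cancellation described above can be sketched with a timer that aborts a shared AbortController. The `SandboxAbort` class, `onAbort` helper, and `cooperativeTool` function are hypothetical names for this example; only setTimeout, AbortController, and the getAbortSignal() idea come from the commit message.

```typescript
class SandboxAbort {
  private readonly controller = new AbortController();
  private timer: ReturnType<typeof setTimeout> | undefined;

  // The timer runs independently of the JS engine, so it also fires while the
  // sandbox is suspended waiting on a host function.
  start(timeoutMs: number): void {
    this.timer = setTimeout(() => this.controller.abort(), timeoutMs);
  }

  // External abort requests go through the same controller as the timeout.
  abort(): void {
    this.controller.abort();
  }

  getAbortSignal(): AbortSignal {
    return this.controller.signal;
  }

  dispose(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
  }
}

// An "abort" listener added to an already-aborted signal never fires, so the
// aborted flag has to be checked first.
function onAbort(signal: AbortSignal, callback: () => void): void {
  if (signal.aborted) {
    callback();
    return;
  }
  signal.addEventListener("abort", callback, { once: true });
}

// A tool that respects its abort signal: it rejects as soon as the signal fires.
function cooperativeTool(signal: AbortSignal, durationMs: number): Promise<string> {
  return new Promise((resolve, reject) => {
    const work = setTimeout(() => resolve("done"), durationMs);
    onAbort(signal, () => {
      clearTimeout(work);
      reject(new Error("aborted"));
    });
  });
}
```

A tool that ignores the signal would simply run to completion, which matches the note above about cancellation being cooperative.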

Previously, the sandbox bridge was built from allTools before applyToolPolicy ran,
so the mux.* API inside code_execution ignored allow/deny filters.

Now the policy is applied first, and policyFilteredTools is passed to
ToolBridge, ensuring the sandbox only exposes policy-allowed tools.
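
The ordering fix can be illustrated with a small sketch. The names applyToolPolicy and policyFilteredTools come from the commit message, but the policy shape (a plain deny list) and both function bodies are assumptions for this example.

```typescript
type Tool = { execute: () => string };

// Hypothetical policy shape: a simple deny list.
interface ToolPolicy {
  deny: string[];
}

function applyToolPolicy(tools: Record<string, Tool>, policy: ToolPolicy): Record<string, Tool> {
  const allowed: Record<string, Tool> = {};
  for (const name of Object.keys(tools)) {
    const tool = tools[name];
    if (tool !== undefined && !policy.deny.includes(name)) allowed[name] = tool;
  }
  return allowed;
}

// Stand-in for the bridge: lists what the sandbox would see under mux.*.
function exposedSandboxApi(tools: Record<string, Tool>): string[] {
  return Object.keys(tools).map((name) => `mux.${name}`);
}

const allTools: Record<string, Tool> = {
  bash: { execute: () => "ran bash" },
  file_read: { execute: () => "read file" },
};

// Apply the policy first, then build the bridge from the filtered set.
const policyFilteredTools = applyToolPolicy(allTools, { deny: ["bash"] });
const sandboxApi = exposedSandboxApi(policyFilteredTools);
console.log(sandboxApi);
```

Building the bridge from `allTools` instead would still expose `mux.bash`, which is the bug the commit describes.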

@ethanndickson ethanndickson force-pushed the ethan/programmatic-tool-calling branch from c43b02c to 2ff0e02 Compare December 19, 2025 00:16
@ethanndickson

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🚀

@ethanndickson ethanndickson force-pushed the ethan/programmatic-tool-calling branch from 11b7d8e to 758ce5b Compare December 19, 2025 00:36

Changed border-white/20 to border-border so the fieldset border
is visible in light mode. Also updated inner containers to use
bg-code-bg and border-border/50 for theme consistency.

@ethanndickson ethanndickson force-pushed the ethan/programmatic-tool-calling branch from 758ce5b to 125b755 Compare December 19, 2025 00:45
@ethanndickson ethanndickson added this pull request to the merge queue Dec 19, 2025
Merged via the queue into main with commit 32712c6 Dec 19, 2025
20 checks passed
@ethanndickson ethanndickson deleted the ethan/programmatic-tool-calling branch December 19, 2025 01:04