[FEATURE]: Add MCP sampling support (createMessage) #11948

@cbcoutinho

Description

Feature hasn't been suggested before.

  • I have verified this feature I'm about to request hasn't been suggested before.

Describe the enhancement you want to request

Add support for MCP sampling (Protocol Revision: 2025-11-25), which allows MCP servers to request LLM completions from the client. This enables agentic workflows where MCP servers can leverage the host application's LLM access for complex, multi-step tasks.

Why is this needed?

MCP sampling enables powerful use cases:

  • Agentic MCP servers: Servers can break down complex tasks and request LLM assistance for subtasks
  • Tool-augmented reasoning: MCP servers can use the LLM to interpret results, make decisions, or generate content
  • Recursive tool use: Servers can request completions that may themselves invoke tools (see Tool Calling below)
  • Context-aware processing: Servers can leverage the LLM's understanding without maintaining their own model access
  • Structured outputs: Servers can request JSON-schema-conformant outputs via tool choice

Current state

The MCP client in packages/opencode/src/mcp/index.ts creates clients without sampling capabilities:

const client = new Client({
  name: "opencode",
  version: Installation.VERSION,
})

MCP Sampling Specification (2025-11-25)

Capabilities

Basic sampling:

{
  "capabilities": {
    "sampling": {}
  }
}

With tool use support (SEP-1577, now Final):

{
  "capabilities": {
    "sampling": {
      "tools": {}
    }
  }
}
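A client that advertises only basic sampling should not receive tool-enabled requests. The sketch below shows one way a client could reject such requests defensively. checkSamplingRequest and its local types are hypothetical and are not MCP SDK APIs.

```typescript
// Hypothetical local types mirroring the capability JSON above (not MCP SDK types).
interface ClientCapabilities {
  sampling?: {
    tools?: Record<string, unknown>;
  };
}

interface SamplingRequestParams {
  tools?: unknown[];
  toolChoice?: { mode?: "auto" | "required" | "none" };
}

// Returns a reason to reject the request, or null if the declared
// capabilities cover everything the request asks for.
function checkSamplingRequest(
  caps: ClientCapabilities,
  params: SamplingRequestParams,
): string | null {
  if (!caps.sampling) {
    return "client did not declare the sampling capability";
  }
  // tools and toolChoice both belong to the sampling.tools extension
  const usesTools = (params.tools?.length ?? 0) > 0 || params.toolChoice !== undefined;
  if (usesTools && !caps.sampling.tools) {
    return "request uses tools but client did not declare sampling.tools";
  }
  return null;
}
```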

Request Schema

interface CreateMessageRequest {
  method: "sampling/createMessage";
  params: {
    messages: SamplingMessage[];
    modelPreferences?: {
      hints?: Array<{ name: string }>;
      costPriority?: number;      // 0-1, higher = prefer cheaper
      speedPriority?: number;     // 0-1, higher = prefer faster
      intelligencePriority?: number; // 0-1, higher = prefer more capable
    };
    systemPrompt?: string;
    maxTokens: number;
    temperature?: number;
    stopSequences?: string[];
    // Tool calling (requires sampling.tools capability)
    tools?: Tool[];
    toolChoice?: {
      mode?: "auto" | "required" | "none";
    };
  };
}
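One possible way to turn modelPreferences into a concrete model, assuming the client keeps a small catalogue of configured models with hand-assigned 0-1 scores. The CandidateModel type, its score fields, and the model ids in the usage example are all hypothetical. Hints are tried first as substring matches, since they are advisory name fragments; otherwise the priorities act as weights.

```typescript
interface ModelPreferences {
  hints?: Array<{ name: string }>;
  costPriority?: number;
  speedPriority?: number;
  intelligencePriority?: number;
}

// Hypothetical catalogue entry. Each score is normalized to 0-1, where
// higher means cheaper / faster / more capable respectively.
interface CandidateModel {
  id: string;
  cheapness: number;
  speed: number;
  intelligence: number;
}

function selectModel(
  prefs: ModelPreferences | undefined,
  candidates: CandidateModel[],
): CandidateModel {
  if (candidates.length === 0) throw new Error("no models configured");

  // Hints are evaluated in order; the first one that matches a model id wins.
  for (const hint of prefs?.hints ?? []) {
    const match = candidates.find((m) => m.id.includes(hint.name));
    if (match) return match;
  }

  // Otherwise use the priorities as weights over the candidate scores.
  const cost = prefs?.costPriority ?? 0;
  const speed = prefs?.speedPriority ?? 0;
  const intel = prefs?.intelligencePriority ?? 0;
  let best = candidates[0];
  let bestScore = -Infinity;
  for (const m of candidates) {
    const score = cost * m.cheapness + speed * m.speed + intel * m.intelligence;
    // Strict ">" keeps the earliest candidate on ties.
    if (score > bestScore) {
      best = m;
      bestScore = score;
    }
  }
  return best;
}
```

With no preferences at all, every score ties at 0 and the first catalogue entry wins, so listing the user's default model first gives a sensible fallback.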

Response Schema

interface CreateMessageResult {
  role: "assistant";
  content: AssistantMessageContent | AssistantMessageContent[];
  model: string;
  stopReason?: "endTurn" | "stopSequence" | "toolUse" | "maxTokens" | string;
}
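Provider stop reasons have to be translated into these MCP values. The sketch below assumes Anthropic Messages API naming (end_turn, stop_sequence, tool_use, max_tokens), which matches the response.stop_reason field used in the proposed request handler; other providers would need a different table. mapStopReason is a hypothetical helper.

```typescript
// Provider stop_reason -> MCP stopReason for the values the spec names.
const ANTHROPIC_TO_MCP = new Map<string, string>([
  ["end_turn", "endTurn"],
  ["stop_sequence", "stopSequence"],
  ["tool_use", "toolUse"],
  ["max_tokens", "maxTokens"],
]);

// Unknown values (e.g. "refusal") pass through unchanged, which the
// open-ended string type of stopReason permits.
function mapStopReason(providerReason: string | null | undefined): string | undefined {
  if (providerReason == null) return undefined;
  return ANTHROPIC_TO_MCP.get(providerReason) ?? providerReason;
}
```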

Content Types

Text, Image, Audio (standard):

{ type: "text", text: string }
{ type: "image", data: string, mimeType: string }
{ type: "audio", data: string, mimeType: string }

Tool Use (in assistant messages):

{
  type: "tool_use",
  id: string,
  name: string,
  input: object
}

Tool Result (in user messages; a user message carrying tool_result blocks must not contain any other content type):

{
  type: "tool_result",
  toolUseId: string,
  content: ContentBlock[],
  isError?: boolean
}
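In TypeScript these content shapes map naturally onto a discriminated union keyed on type. A minimal sketch, simplified so that tool_result content only holds text, image, or audio blocks. toBlocks and describe are hypothetical helpers; describe could feed an approval prompt or a log line.

```typescript
type TextContent = { type: "text"; text: string };
type ImageContent = { type: "image"; data: string; mimeType: string };
type AudioContent = { type: "audio"; data: string; mimeType: string };
type ToolUseContent = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultContent = {
  type: "tool_result";
  toolUseId: string;
  content: Array<TextContent | ImageContent | AudioContent>;
  isError?: boolean;
};

type SamplingContent = TextContent | ImageContent | AudioContent | ToolUseContent | ToolResultContent;

// content may be a single block or an array; normalize to an array.
function toBlocks(content: SamplingContent | SamplingContent[]): SamplingContent[] {
  return Array.isArray(content) ? content : [content];
}

// One-line, human-readable rendering of a block.
function describe(block: SamplingContent): string {
  switch (block.type) {
    case "text":
      return block.text;
    case "image":
    case "audio":
      return `[${block.type} ${block.mimeType}, ${block.data.length} base64 chars]`;
    case "tool_use":
      return `tool_use ${block.name}(${JSON.stringify(block.input)})`;
    case "tool_result":
      return `tool_result for ${block.toolUseId}${block.isError ? " (error)" : ""}`;
  }
}
```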

Tool Calling in Sampling

Important: Server-Defined, Server-Executed Tools

The tools array in a sampling request contains tools defined and executed by the MCP server itself - not MCP tools exposed via tools/list, and not tools from the client or other MCP servers.

┌─────────────────┐                      ┌─────────────────┐
│   MCP Server    │                      │  MCP Client     │
│                 │                      │  (opencode)     │
└────────┬────────┘                      └────────┬────────┘
         │                                        │
         │  1. sampling/createMessage             │
         │     { tools: [get_weather, ...] }      │
         │───────────────────────────────────────>│
         │                                        │
         │                                        │──> LLM provider
         │                                        │<── (with tools)
         │                                        │
         │  2. Response: tool_use                 │
         │     { name: "get_weather", input: {} } │
         │<───────────────────────────────────────│
         │                                        │
    ┌────┴────┐                                   │
    │ Server  │  3. Server executes tool          │
    │ runs    │     (e.g., calls weather API)     │
    │ tool    │                                   │
    └────┬────┘                                   │
         │                                        │
         │  4. sampling/createMessage             │
         │     { messages: [..., tool_result] }   │
         │───────────────────────────────────────>│
         │                                        │
         │                                        │──> LLM provider
         │                                        │<── (continues)
         │                                        │
         │  5. Response: endTurn                  │
         │     { content: "The weather is..." }   │
         │<───────────────────────────────────────│

Key points:

  • The MCP server defines the tools it wants the LLM to use
  • When the LLM returns stopReason: "toolUse", the server executes those tools
  • The server sends another sampling request with tool_result blocks
  • The client (opencode) only routes requests to the LLM and returns responses

What's NOT Supported (Possible Future Extensions)

Per SEP-1577, these are explicitly out of scope but mentioned as possible follow-ups:

  1. Client calling server's MCP tools: Would let the client execute the server's own tools/list tools during sampling, removing the need for the server to run its own tool loop.

  2. Client calling other MCP servers' tools: Would let users allowlist tools from any MCP server for use in sampling requests from other servers.

These may be added in future spec revisions but are not part of the current (2025-11-25) specification.


Multi-Turn Tool Loop

When a sampling response has stopReason: "toolUse", the server:

  1. Executes the requested tools
  2. Sends a new sampling/createMessage with tool results appended
  3. Receives LLM response (may contain more tool uses)
  4. Repeats until stopReason is "endTurn" or an iteration limit is reached
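This loop runs on the server, not in opencode. A minimal sketch of it, assuming sample is whatever sends sampling/createMessage to the client and runTool is the server's own tool implementation; both are hypothetical parameters, not SDK functions.

```typescript
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: "tool_result"; toolUseId: string; content: TextBlock[]; isError?: boolean };
type Block = TextBlock | ToolUseBlock | ToolResultBlock;

interface Message {
  role: "user" | "assistant";
  content: Block[];
}

interface SampleResult {
  content: Block[];
  stopReason?: string;
}

async function toolLoop(
  initial: Message[],
  sample: (messages: Message[]) => Promise<SampleResult>,
  runTool: (name: string, input: Record<string, unknown>) => Promise<string>,
  maxIterations = 10,
): Promise<Message[]> {
  const history = [...initial];
  for (let i = 0; i < maxIterations; i++) {
    const result = await sample([...history]);
    history.push({ role: "assistant", content: result.content });
    if (result.stopReason !== "toolUse") return history; // endTurn, maxTokens, ...

    // Answer every tool_use block (parallel calls included) in one user
    // message that contains nothing but tool_result blocks.
    const results: ToolResultBlock[] = [];
    for (const block of result.content) {
      if (block.type !== "tool_use") continue;
      try {
        const text = await runTool(block.name, block.input);
        results.push({ type: "tool_result", toolUseId: block.id, content: [{ type: "text", text }] });
      } catch (err) {
        // Report tool failures to the model instead of aborting the loop.
        results.push({
          type: "tool_result",
          toolUseId: block.id,
          content: [{ type: "text", text: String(err) }],
          isError: true,
        });
      }
    }
    history.push({ role: "user", content: results });
  }
  throw new Error(`tool loop exceeded ${maxIterations} iterations`);
}
```

A variation would send toolChoice: { mode: "none" } on the last permitted iteration to force a final text answer instead of throwing.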

Constraints:

  • Every tool_use block MUST be matched by a tool_result block whose toolUseId equals that tool_use block's id
  • User messages with tool_result MUST NOT contain other content types
  • Parallel tool calls are supported (array of tool_use blocks)
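A client could check these constraints before forwarding a request to the provider. This sketch is deliberately lenient: it verifies that every tool_use id is eventually answered and that tool_result blocks are never mixed with other content, but not that results arrive in the immediately following message. The validateToolBlocks name and the simplified block types are assumptions.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "image" | "audio"; data: string; mimeType: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> }
  | { type: "tool_result"; toolUseId: string; content: unknown[]; isError?: boolean };

interface Message {
  role: "user" | "assistant";
  content: Block | Block[];
}

// Returns human-readable violations; an empty array means the history passes.
function validateToolBlocks(messages: Message[]): string[] {
  const errors: string[] = [];
  const pending = new Set<string>(); // tool_use ids not yet answered
  messages.forEach((msg, i) => {
    const blocks = Array.isArray(msg.content) ? msg.content : [msg.content];
    const results = blocks.filter((b) => b.type === "tool_result");
    if (results.length > 0 && results.length !== blocks.length) {
      errors.push(`message ${i}: tool_result mixed with other content`);
    }
    for (const b of blocks) {
      if (b.type === "tool_use") {
        if (msg.role !== "assistant") errors.push(`message ${i}: tool_use outside an assistant message`);
        pending.add(b.id);
      } else if (b.type === "tool_result") {
        if (msg.role !== "user") errors.push(`message ${i}: tool_result outside a user message`);
        if (!pending.delete(b.toolUseId)) {
          errors.push(`message ${i}: tool_result ${b.toolUseId} has no matching tool_use`);
        }
      }
    }
  });
  pending.forEach((id) => errors.push(`tool_use ${id} has no matching tool_result`));
  return errors;
}
```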

Tool Choice Modes

  • auto (default): Model decides whether to use tools
  • required: Model MUST use at least one tool
  • none: Model MUST NOT use any tools (useful for forcing final text response)
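Provider APIs name these modes differently, so the MCP toolChoice object cannot be forwarded as-is. A sketch assuming Anthropic-style tool_choice objects, where "required" corresponds to { type: "any" }; mapToolChoice is a hypothetical helper. The Vercel AI SDK's toolChoice option, by contrast, accepts "auto", "required", and "none" directly.

```typescript
type McpToolChoice = { mode?: "auto" | "required" | "none" };
type AnthropicToolChoice = { type: "auto" } | { type: "any" } | { type: "none" };

const TOOL_CHOICE: Record<"auto" | "required" | "none", AnthropicToolChoice> = {
  auto: { type: "auto" },
  required: { type: "any" }, // "must call at least one tool"
  none: { type: "none" },
};

// A missing toolChoice or missing mode falls back to "auto", the MCP default.
function mapToolChoice(choice?: McpToolChoice): AnthropicToolChoice {
  return TOOL_CHOICE[choice?.mode ?? "auto"];
}
```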

Proposed Implementation

1. Declare sampling capability

const client = new Client({
  name: "opencode",
  version: Installation.VERSION,
}, {
  capabilities: {
    sampling: {
      tools: {}  // Enable tool calling support
    }
  }
})

2. Register request handler

import { CreateMessageRequestSchema } from "@modelcontextprotocol/sdk/types.js"

// selectModel, mapMessages, mapTool, mapToolChoice, mapContent, mapStopReason
// and llmProvider are placeholders for opencode-side glue in this proposal.
client.setRequestHandler(CreateMessageRequestSchema, async (request) => {
  const { params } = request

  // Map model preferences to a configured provider model
  const model = selectModel(params.modelPreferences)

  // Convert MCP sampling messages (including tool_use / tool_result blocks)
  const messages = mapMessages(params.messages)

  // Include tools if provided (server-defined tools, NOT tools/list tools)
  const tools = params.tools?.map(mapTool)

  // Call the LLM provider. MCP's toolChoice ({ mode }) does not match provider
  // tool_choice formats, so translate it instead of passing it through.
  const response = await llmProvider.createMessage({
    model,
    system: params.systemPrompt,
    messages,
    tools,
    tool_choice: tools ? mapToolChoice(params.toolChoice) : undefined,
    max_tokens: params.maxTokens,
  })

  return {
    role: "assistant",
    content: mapContent(response.content),
    model: response.model,
    stopReason: mapStopReason(response.stop_reason),
  }
})

3. User approval flow

Per the spec, there SHOULD always be a human in the loop:

┌─────────────────────────────────────────────────────────┐
│  MCP Sampling Request from: my-mcp-server               │
├─────────────────────────────────────────────────────────┤
│  System: You are a helpful assistant.                   │
│                                                         │
│  User: What's the weather in Paris?                     │
│                                                         │
│  Tools requested (server-defined):                      │
│    • get_weather(city: string)                          │
│                                                         │
│  Model preference: claude-3-sonnet (speed: high)        │
│  Max tokens: 1000                                       │
├─────────────────────────────────────────────────────────┤
│  [Approve]  [Edit]  [Deny]                              │
└─────────────────────────────────────────────────────────┘
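The dialog's contents can be derived directly from the request parameters. A hypothetical formatter that produces lines like those shown above; the summarizeSamplingRequest name and the simplified parameter types are assumptions, and rendering is left to the UI.

```typescript
type Content = { type: string; text?: string };

interface SamplingRequestParams {
  messages: Array<{ role: "user" | "assistant"; content: Content | Content[] }>;
  systemPrompt?: string;
  tools?: Array<{ name: string; inputSchema?: { properties?: Record<string, { type?: string }> } }>;
  modelPreferences?: { hints?: Array<{ name: string }> };
  maxTokens: number;
}

function summarizeSamplingRequest(server: string, p: SamplingRequestParams): string[] {
  const lines = [`MCP Sampling Request from: ${server}`];
  if (p.systemPrompt) lines.push(`System: ${p.systemPrompt}`);

  // One line per content block; non-text blocks are shown by type only.
  for (const m of p.messages) {
    const label = m.role === "user" ? "User" : "Assistant";
    const blocks = Array.isArray(m.content) ? m.content : [m.content];
    for (const b of blocks) {
      lines.push(`${label}: ${b.type === "text" ? b.text : `[${b.type}]`}`);
    }
  }

  // Server-defined tools with their argument names and JSON Schema types.
  if (p.tools && p.tools.length > 0) {
    lines.push("Tools requested (server-defined):");
    for (const t of p.tools) {
      const args = Object.entries(t.inputSchema?.properties ?? {}).map(
        ([name, schema]) => `${name}: ${schema.type ?? "unknown"}`,
      );
      lines.push(`  • ${t.name}(${args.join(", ")})`);
    }
  }

  const hints = p.modelPreferences?.hints?.map((h) => h.name).join(", ");
  if (hints) lines.push(`Model preference: ${hints}`);
  lines.push(`Max tokens: ${p.maxTokens}`);
  return lines;
}
```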

4. Configuration options

{
  "mcp": {
    "my-server": {
      "type": "local",
      "command": ["..."],
      "sampling": {
        "enabled": true,
        "tools": true,              // Allow tool calling in sampling
        "requireApproval": "always", // "always" | "first" | "never"
        "maxTokens": 4096,
        "maxToolIterations": 10     // Prevent infinite tool loops
      }
    }
  }
}
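These options could be enforced by a small policy object consulted before each request reaches the provider. A sketch assuming the config keys are exactly those proposed above; SamplingPolicy and its methods are hypothetical, and maxToolIterations is left out of this sketch. Treating an unset requireApproval as "always" is a deliberately conservative choice.

```typescript
interface SamplingConfig {
  enabled?: boolean;
  tools?: boolean;
  requireApproval?: "always" | "first" | "never";
  maxTokens?: number;
}

type Decision =
  | { action: "reject"; reason: string }
  | { action: "ask" | "allow"; maxTokens: number };

class SamplingPolicy {
  private readonly config: Record<string, SamplingConfig | undefined>;
  private readonly approved = new Set<string>(); // servers approved once ("first" mode)

  constructor(config: Record<string, SamplingConfig | undefined>) {
    this.config = config;
  }

  decide(server: string, requestedMaxTokens: number, usesTools: boolean): Decision {
    const cfg = this.config[server];
    if (!cfg?.enabled) return { action: "reject", reason: "sampling is disabled for this server" };
    if (usesTools && !cfg.tools) return { action: "reject", reason: "tool calling in sampling is not enabled" };

    // Never exceed the user's configured cap, whatever the server asks for.
    const maxTokens = Math.min(requestedMaxTokens, cfg.maxTokens ?? requestedMaxTokens);
    const mode = cfg.requireApproval ?? "always";
    const ask = mode === "always" || (mode === "first" && !this.approved.has(server));
    return { action: ask ? "ask" : "allow", maxTokens };
  }

  // Call after the user approves a request, so "first" mode stops asking.
  recordApproval(server: string): void {
    this.approved.add(server);
  }
}
```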

5. Security considerations

Per the spec:

  • User approval controls (SHOULD)
  • Validate message content (SHOULD)
  • Respect model preference hints (SHOULD)
  • Rate limiting (SHOULD)
  • Handle sensitive data appropriately (MUST)
  • Implement iteration limits for tool loops (SHOULD)
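For the rate-limiting item, a per-server sliding-window limiter is one simple option. The class name and the limits in the usage example are illustrative assumptions; a real deployment might also cap tokens rather than only request counts.

```typescript
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>(); // request timestamps per server
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the request if the server is under its limit.
  tryAcquire(server: string, now: number = Date.now()): boolean {
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(server) ?? []).filter((t) => now - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(server, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(server, recent);
    return true;
  }
}
```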

Additional considerations for opencode:

  • Display sampling requests in UI similar to tool calls
  • Allow viewing/editing prompts before sending
  • Present responses for review before delivery
  • Track token usage from sampling requests

Related Issues

References
