A voice-enabled MCP (Model Context Protocol) server built with Bun, React, and ElevenLabs. This server exposes three tools for LLMs to enable voice interaction: `speak`, `listen`, and `action`.
- 🎤 **Speech-to-Text**: Web Speech API with three modes:
  - **Manual**: Click to record, click send button
  - **PTT (Push-to-Talk)**: Hold button while speaking, release to send
  - **Auto**: Automatically sends after 1.5 s of silence
- 🔊 **Text-to-Speech**: ElevenLabs streaming audio
- 💬 **Chat Interface**: Facebook Messenger-style UI
- 📋 **Action Tracking**: Collapsible action logs attached to LLM responses
- 🔌 **WebSocket**: Real-time audio streaming and status updates
- ⚡ **MCP Tools**: Three tools exposed via stdio transport
- Bun v1.0.0 or later
- ElevenLabs API Key
- Modern browser with Web Speech API support (Chrome, Edge recommended)
The easiest way to use this MCP server is via npx:
```bash
# Install globally
npm install -g voice-mcp

# Or run directly with npx
npx voice-mcp
```
Then configure in your MCP client (Claude Desktop, Claude Code, etc.):
```json
{
  "mcpServers": {
    "voice-mcp": {
      "command": "npx",
      "args": ["voice-mcp"],
      "env": {
        "ELEVENLABS_API_KEY": "your_api_key_here",
        "MCP_HTTP_PORT": "53245"
      }
    }
  }
}
```
Open your browser to http://localhost:53245 to access the voice interface!
```bash
# Clone the repository
git clone https://github.com/codingbutter/simple-voice-mcp.git
cd simple-voice-mcp

# Install dependencies
bun install

# Copy environment example
cp .env.example .env

# Edit .env and add your ElevenLabs API key
# ELEVENLABS_API_KEY=your_api_key_here
```
Start the development server (HTTP/WebSocket + MCP stdio):
```bash
# Set your API key
export ELEVENLABS_API_KEY="your_api_key_here"

# Run in development mode with HMR
bun dev
```
The server will:

- Start an HTTP server on port 3000 (configurable via `MCP_HTTP_PORT`)
- Serve the React UI at http://localhost:3000
- Listen for MCP requests via stdio
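The port lookup described above can be sketched as follows. This is a minimal illustration of the documented behavior (`MCP_HTTP_PORT` with a default of 3000); `resolveHttpPort` is a hypothetical helper, not the project's actual function.

```typescript
// Read MCP_HTTP_PORT from the environment, falling back to the
// documented default of 3000, and reject values that cannot be a port.
function resolveHttpPort(env: Record<string, string | undefined>): number {
  const raw = env.MCP_HTTP_PORT;
  if (raw === undefined) return 3000; // documented default
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid MCP_HTTP_PORT: ${raw}`);
  }
  return port;
}

// Example: resolveHttpPort(process.env) with MCP_HTTP_PORT=53245 → 53245
```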
```bash
# Build the frontend
bun run build

# Run in production mode
NODE_ENV=production ELEVENLABS_API_KEY="your_key" bun start
```
This project includes a `.mcp.json` file that automatically configures the server with Claude Code:
1. Add your ElevenLabs API key to `.mcp.json`:

   ```json
   {
     "env": {
       "ELEVENLABS_API_KEY": "your_api_key_here"
     }
   }
   ```

2. Restart Claude Code: the server will auto-start.

3. Open your browser to http://localhost:53245.

The server is configured with `autoStart: true`, so it starts automatically when Claude Code launches.
To use this as an MCP server with Claude Desktop or another MCP client, add this to your MCP configuration:
```json
{
  "mcpServers": {
    "voice-mcp": {
      "command": "bun",
      "args": ["run", "/absolute/path/to/simple-voice-mcp/src/index.tsx"],
      "env": {
        "ELEVENLABS_API_KEY": "your_api_key_here",
        "MCP_HTTP_PORT": "53245",
        "ELEVEN_VOICE_ID": "21m00Tcm4TlvDq8ikWAM",
        "ELEVEN_MODEL_ID": "eleven_flash_v2_5"
      }
    }
  }
}
```
The server exposes three tools:
**`speak`**: Generate and stream text-to-speech audio to connected clients.
Parameters:

- `text` (string, required): The text to convert to speech
- `listen` (boolean, optional): If true, wait for a user response after speaking
- `timeout_ms` (number, optional): Timeout when `listen` is true (default: 60000 ms)
- `voiceId` (string, optional): ElevenLabs voice ID (default: Rachel)
- `modelId` (string, optional): ElevenLabs model ID (default: `eleven_flash_v2_5`)

Returns: `{ ok: true, message, messages? }`. If `listen` is true, the result includes the user's messages.
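The parameter list above, with its documented defaults, can be expressed as a TypeScript shape. The interface mirrors the docs; `applySpeakDefaults` is an illustrative helper, not the project's actual code.

```typescript
interface SpeakParams {
  text: string;          // required: the text to convert to speech
  listen?: boolean;      // wait for a user response after speaking
  timeout_ms?: number;   // timeout when listen is true (default 60000)
  voiceId?: string;      // ElevenLabs voice ID (default: Rachel)
  modelId?: string;      // ElevenLabs model ID
}

// Fill in the documented defaults for any omitted optional fields.
function applySpeakDefaults(p: SpeakParams): Required<SpeakParams> {
  if (!p.text) throw new Error("speak: 'text' is required");
  return {
    text: p.text,
    listen: p.listen ?? false,
    timeout_ms: p.timeout_ms ?? 60000,
    voiceId: p.voiceId ?? "21m00Tcm4TlvDq8ikWAM", // Rachel (from the config example above)
    modelId: p.modelId ?? "eleven_flash_v2_5",
  };
}
```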
**`listen`**: Wait for text input from clients (blocks until the user sends text or the timeout expires).
Parameters:

- `timeout_ms` (number, optional): Timeout in milliseconds (default: 60000)

Returns: `{ messages: string[] }`. An array of messages (empty on timeout).
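The blocking semantics described above (resolve with the user's messages, or with an empty array after the timeout) can be sketched with a small queue. This is a hypothetical illustration, not the project's implementation; `MessageQueue` and its method names are invented.

```typescript
type Resolver = (result: { messages: string[] }) => void;

class MessageQueue {
  private pending: string[] = [];
  private waiter: { resolve: Resolver; timer: ReturnType<typeof setTimeout> } | null = null;

  // Called when the browser UI sends text over the WebSocket.
  push(text: string): void {
    if (this.waiter) {
      clearTimeout(this.waiter.timer);
      const { resolve } = this.waiter;
      this.waiter = null;
      resolve({ messages: [text] });
    } else {
      this.pending.push(text);
    }
  }

  // "Blocks" (as a promise) until a message arrives or timeout_ms elapses.
  listen(timeout_ms = 60000): Promise<{ messages: string[] }> {
    if (this.pending.length > 0) {
      return Promise.resolve({ messages: this.pending.splice(0) });
    }
    return new Promise((resolve) => {
      const timer = setTimeout(() => {
        this.waiter = null;
        resolve({ messages: [] }); // empty array on timeout, as documented
      }, timeout_ms);
      this.waiter = { resolve, timer };
    });
  }
}
```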
**`action`**: Send a status or action update to the client UI. It appears as a collapsible section attached to the LLM's response.
Parameters:

- `text` (string, required): The action/status text to display (e.g., "Reading file X", "Running tests")

Returns: `{ ok: true }`
Note: Only send concrete actions being performed, not commentary or explanations.
```bash
# Install MCP Inspector globally
npm install -g @modelcontextprotocol/inspector

# Test the server
export ELEVENLABS_API_KEY="your_key"
npx @modelcontextprotocol/inspector bun src/index.tsx
```
```
┌────────────────────────────────────────┐
│  MCP Client (Claude Desktop, etc.)     │
└───────────────┬────────────────────────┘
                │ stdio (JSON-RPC)
                ▼
┌────────────────────────────────────────┐
│  MCP Server (Bun Process)              │
│  ├─ stdio transport                    │
│  ├─ Three tools: speak/listen/action   │
│  └─ HTTP/WebSocket server              │
└───────────────┬────────────────────────┘
                │ HTTP + WebSocket
                ▼
┌────────────────────────────────────────┐
│  Browser UI (React)                    │
│  ├─ Chat interface                     │
│  ├─ Web Speech API (STT)               │
│  ├─ Audio playback (TTS)               │
│  └─ WebSocket client                   │
└────────────────────────────────────────┘
```
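The server-to-browser channel in the diagram above carries audio, action, and status traffic. A discriminated union is one way to model such messages; the type and field names below are purely illustrative assumptions, not the project's actual wire protocol.

```typescript
// Hypothetical message shapes for the WebSocket channel (names assumed).
type ServerMessage =
  | { kind: "audio-chunk"; data: ArrayBuffer } // streamed TTS audio
  | { kind: "action"; text: string }           // collapsible action log entry
  | { kind: "status"; text: string };          // transient status update

type ClientMessage = { kind: "user-text"; text: string };

// A client-side dispatcher can switch exhaustively on `kind`.
function describe(msg: ServerMessage): string {
  switch (msg.kind) {
    case "audio-chunk":
      return `audio ${msg.data.byteLength} bytes`;
    case "action":
      return `action: ${msg.text}`;
    case "status":
      return `status: ${msg.text}`;
  }
}
```

A tagged union like this lets TypeScript verify that every message kind is handled at compile time.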
| Variable | Required | Default | Description |
|---|---|---|---|
| `ELEVENLABS_API_KEY` | ✅ | – | Your ElevenLabs API key |
| `MCP_HTTP_PORT` | ❌ | 3000 | Port for HTTP/WS server |
| `ELEVEN_VOICE_ID` | ❌ | Rachel voice | Default voice ID |
| `ELEVEN_MODEL_ID` | ❌ | eleven_flash_v2_5 | Default model |
| `NODE_ENV` | ❌ | development | Environment mode |
```
src/
├── index.tsx                    # Main entry point (MCP + HTTP server)
├── App.tsx                      # React root component
├── frontend.tsx                 # React DOM setup
├── mcp/
│   └── tools.ts                 # MCP tool implementations
├── server/
│   ├── http.ts                  # HTTP + WebSocket server
│   ├── websocket.ts             # WebSocket manager
│   └── tts.ts                   # ElevenLabs TTS manager
├── hooks/
│   ├── useWebSocket.ts          # WebSocket client hook
│   └── useSpeechRecognition.ts  # Web Speech API hook
└── components/
    ├── chat/
    │   ├── ChatInterface.tsx    # Main chat UI
    │   └── ChatMessage.tsx      # Message bubble component
    └── ui/                      # shadcn/ui components
```
- **stdio Constraint**: The server uses stdout for MCP JSON-RPC, so all logging goes to stderr.
- **Browser Compatibility**: The Web Speech API works best in Chrome and Edge.
- **Multi-Instance**: Each MCP server instance needs a unique port (set via `MCP_HTTP_PORT`).
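The stdio constraint above has a practical consequence: any diagnostic output must bypass stdout. A minimal sketch of a stderr-only logger follows; `makeLogger` and the `[voice-mcp]` prefix are illustrative assumptions, not the project's code.

```typescript
// Stdout carries MCP JSON-RPC frames, so logging must go to stderr.
// The writer is injectable so the logger can be tested without
// actually touching the process streams.
function makeLogger(
  write: (line: string) => void = (line) => process.stderr.write(line)
) {
  return (...args: unknown[]) => write(`[voice-mcp] ${args.join(" ")}\n`);
}

// Usage: const log = makeLogger(); log("server started on port", 3000);
```

`console.error` also writes to stderr in Bun and Node, so it is equally safe here; the key point is never to use `console.log` in an stdio-transport MCP server.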
See Project_Scope.md for detailed technical specifications. See QUICKSTART.md for a quick setup guide.
MIT - See LICENSE for details.
- Bun - Runtime & bundler
- React 19 - UI framework
- Tailwind CSS v4 - Styling
- shadcn/ui - UI components
- ElevenLabs - Text-to-speech
- Model Context Protocol - MCP SDK