Voice MCP Server with STT/TTS capabilities - Elysia backend + React PWA frontend
A Model Context Protocol (MCP) server that adds voice capabilities to Claude Code and other MCP clients. Speak text using high-quality TTS (ElevenLabs) with SSML enrichment (OpenAI), and listen to voice input with STT (Google Gemini).
- Voice Input: Capture and transcribe voice using Google Gemini STT with VAD and chunking
- Voice Output: Convert text to speech using ElevenLabs with OpenAI-powered SSML enrichment
- MCP Integration: Two tools (speak and listen) accessible from Claude Code and other MCP clients
- PWA Interface: React-based operator console with conversation history and audio replay
- Multi-Session: Support multiple concurrent MCP connections with separate conversation histories
- Tested: 81 tests covering schemas, tools, storage, and session management
This is a Bun monorepo with:
- Backend (apps/backend): Elysia server with MCP SSE endpoints
- Frontend (apps/frontend): React PWA with audio controls and conversation UI
- Packages:
  - core: MCP tools, AI services (TTS/STT/SSML), session management
  - database: Prisma storage layer for conversations and messages
  - shared: Zod schemas and TypeScript types
  - platform: Web/Electron adapters
  - ui: Shared React components
Requirements:
- Bun (v1.0+)
- API keys:
  - OpenAI (for SSML enrichment)
  - ElevenLabs (for TTS)
  - Google Gemini (for STT)
# Clone the repo
git clone https://github.com/CodingButter/speak2me-mcp.git
cd speak2me-mcp
# Install dependencies
bun install
# Set up database
cd packages/database
bun run db:generate
bun run db:push
cd ../..
# Configure API keys (backend)
cp apps/backend/.env.example apps/backend/.env
# Edit apps/backend/.env with your API keys
# Start both backend and frontend
bun run dev
# Or start individually
bun run dev:backend # Backend on http://localhost:3000
bun run dev:frontend # Frontend on http://localhost:5173
# Run all tests
bun test
# Watch mode
bun test:watch
# Coverage
bun test:coverage
Connect Claude Code (or other MCP clients) to the voice server. First, start the backend:
bun run dev:backend
Then add the server to your MCP client configuration:
{
"mcpServers": {
"voice": {
"url": "http://localhost:3000/sse/my-project-id"
}
}
}
Each project can use its own conversationId (the last path segment of the URL) to keep a separate conversation history.
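To illustrate how the last path segment can key separate histories, here is a minimal sketch. It assumes URLs of the exact form /sse/<conversationId> and an in-memory store; conversationIdFromUrl, ConversationStore, and Message are hypothetical names, not the project's SessionManager API or its Prisma-backed storage.

```typescript
// Hypothetical sketch, not the project's SessionManager: per-conversation history
// keyed by the last path segment of an SSE URL such as
// http://localhost:3000/sse/my-project-id

type Message = { role: "user" | "assistant"; text: string };

/** Extract the conversationId from a URL of the form /sse/<conversationId>. */
function conversationIdFromUrl(rawUrl: string): string {
  const segments = new URL(rawUrl).pathname.split("/").filter((s) => s.length > 0);
  if (segments.length !== 2 || segments[0] !== "sse") {
    throw new Error(`Expected /sse/<conversationId>, got ${rawUrl}`);
  }
  return decodeURIComponent(segments[1]);
}

/** In-memory stand-in for persistent storage: one message list per conversation. */
class ConversationStore {
  private histories = new Map<string, Message[]>();

  /** Return the history for a conversation, creating an empty one on first use. */
  history(conversationId: string): Message[] {
    let h = this.histories.get(conversationId);
    if (!h) {
      h = [];
      this.histories.set(conversationId, h);
    }
    return h;
  }

  append(conversationId: string, message: Message): void {
    this.history(conversationId).push(message);
  }
}
```

Two clients connected to /sse/project-a and /sse/project-b would resolve to different keys, so messages appended for one never appear in the other's history.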
Claude Code will auto-discover two tools:
speak - Convert text to speech
{
text: string, // Required: text to speak
ssml?: string, // Optional: provide your own SSML
voiceId?: string, // Optional: ElevenLabs voice ID
model?: string, // Optional: ElevenLabs model
stream?: boolean // Optional: stream audio (default: true)
}
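The speak arguments above can be normalized roughly as follows. This is a plain TypeScript sketch for illustration only: the project validates with Zod schemas in packages/shared, and the fallback `<speak>` wrapper used when no ssml is supplied is an assumption (the README says OpenAI enriches SSML but does not describe the fallback). normalizeSpeak and escapeXml are hypothetical names.

```typescript
// Hypothetical sketch of normalizing `speak` arguments; the real project uses Zod.

interface SpeakArgs {
  text: string;
  ssml?: string;
  voiceId?: string;
  model?: string;
  stream?: boolean;
}

interface NormalizedSpeak {
  text: string;
  ssml: string;
  voiceId?: string;
  model?: string;
  stream: boolean;
}

/** Escape the five XML special characters so plain text is safe inside SSML. */
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function normalizeSpeak(args: SpeakArgs): NormalizedSpeak {
  if (typeof args.text !== "string" || args.text.trim() === "") {
    throw new Error("speak: `text` is required");
  }
  return {
    text: args.text,
    // Caller-supplied SSML wins. Otherwise use a minimal <speak> wrapper
    // (assumption: the real server would run OpenAI enrichment at this point).
    ssml: args.ssml ?? `<speak>${escapeXml(args.text)}</speak>`,
    voiceId: args.voiceId,
    model: args.model,
    stream: args.stream ?? true, // documented default: stream audio
  };
}
```

Escaping matters because raw text such as "a < b" would otherwise produce invalid SSML markup.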
listen - Capture and transcribe voice
{
mode: "auto" | "manual" | "ptt", // Required: listening mode
vadThreshold?: number, // Optional: VAD threshold (0-1)
minSilenceMs?: number, // Optional: minimum silence duration (ms)
maxUtteranceMs?: number, // Optional: maximum utterance length (ms)
locale?: string // Optional: e.g., "en-US"
}
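One way vadThreshold, minSilenceMs, and maxUtteranceMs could interact in "auto" mode is sketched below. It assumes the input is a list of per-frame speech probabilities (0 to 1), such as a VAD model emits, with a fixed frame duration; the default values and the name segmentUtterances are illustrative assumptions, not the server's actual behavior. In "manual" and "ptt" modes, boundaries would come from user actions instead.

```typescript
// Hypothetical endpointing sketch for "auto" mode; not the project's implementation.

interface ListenArgs {
  mode: "auto" | "manual" | "ptt";
  vadThreshold?: number;
  minSilenceMs?: number;
  maxUtteranceMs?: number;
  locale?: string;
}

interface Segment { startMs: number; endMs: number } // endMs is exclusive

function segmentUtterances(probs: number[], frameMs: number, args: ListenArgs): Segment[] {
  // Illustrative defaults only; the real defaults are not documented here.
  const threshold = args.vadThreshold ?? 0.5;
  const minSilenceMs = args.minSilenceMs ?? 800;
  const maxUtteranceMs = args.maxUtteranceMs ?? 30_000;
  if (threshold < 0 || threshold > 1) throw new RangeError("vadThreshold must be in [0, 1]");

  const segments: Segment[] = [];
  let start = -1;      // frame index where the current utterance began, -1 when idle
  let lastSpeech = -1; // most recent frame classified as speech
  for (let i = 0; i < probs.length; i++) {
    const isSpeech = probs[i] >= threshold;
    if (start < 0) {
      if (isSpeech) { start = i; lastSpeech = i; } // speech onset opens an utterance
      continue;
    }
    if (isSpeech) lastSpeech = i;
    const silenceMs = (i - lastSpeech) * frameMs;
    const lengthMs = (i + 1 - start) * frameMs;
    if (silenceMs >= minSilenceMs) {
      // Enough trailing silence: close the utterance at the end of the last speech frame.
      segments.push({ startMs: start * frameMs, endMs: (lastSpeech + 1) * frameMs });
      start = -1;
    } else if (lengthMs >= maxUtteranceMs) {
      // Too long: force a chunk boundary so long speech is split into pieces.
      segments.push({ startMs: start * frameMs, endMs: (i + 1) * frameMs });
      start = -1;
    }
  }
  // Flush an utterance still open when the audio ends.
  if (start >= 0) segments.push({ startMs: start * frameMs, endMs: (lastSpeech + 1) * frameMs });
  return segments;
}
```

A lower vadThreshold makes quiet speech count, a larger minSilenceMs tolerates longer pauses inside one utterance, and maxUtteranceMs bounds each chunk sent for transcription.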
speak2me-mcp/
├── apps/
│   ├── backend/             # Elysia MCP server
│   │   ├── src/
│   │   │   ├── index.ts     # Main server
│   │   │   ├── mcp/         # SSE transport, tool handlers
│   │   │   └── api/         # REST endpoints
│   │   └── package.json
│   └── frontend/            # React PWA
│       ├── src/
│       │   ├── components/  # UI components
│       │   ├── hooks/       # Audio capture hooks
│       │   └── services/    # Audio encoding
│       └── package.json
├── packages/
│   ├── core/                # MCP tools & services
│   │   └── src/
│   │       ├── mcp/         # handleSpeak, handleListen
│   │       ├── services/    # TTS, STT, SSML enhancer
│   │       ├── session/     # SessionManager
│   │       └── operations/  # CoreOperations
│   ├── database/            # Prisma storage
│   │   ├── prisma/
│   │   │   └── schema.prisma
│   │   └── src/storage.ts
│   ├── shared/              # Schemas & types
│   │   └── src/
│   │       ├── schemas.ts   # Zod schemas
│   │       └── types.ts     # TypeScript types
│   ├── platform/            # Web/Electron adapters
│   ├── ui/                  # Shared components
│   └── config/              # Shared config
└── package.json             # Root workspace
Root workspace scripts:
- bun run dev - Start both apps in dev mode
- bun run dev:backend - Start backend only
- bun run dev:frontend - Start frontend only
- bun run build - Build all apps
- bun test - Run all tests
- bun run typecheck - Type check all packages
- bun run lint - Lint all packages
- bun run format - Format code with Prettier
Backend (apps/backend):
- bun run dev - Dev with hot reload
- bun run build - Build for production
- bun run start - Start production build
- bun test - Run backend tests
Frontend (apps/frontend):
- bun run dev - Dev server
- bun run build - Build for production
- bun run preview - Preview production build
- bun test - Run frontend tests
Database (packages/database):
- bun run db:generate - Generate Prisma client
- bun run db:push - Push schema to database
- bun run db:migrate - Create migration
- bun run db:studio - Open Prisma Studio
This project uses pre-push hooks to ensure code quality:
- Pre-push: Runs all tests before allowing push to remote
- Tests must pass before code can be pushed
- Located in .git/hooks/pre-push
Keys can be stored in two ways:
- Environment file: create apps/backend/.env:
OPENAI_API_KEY=sk-...
ELEVENLABS_API_KEY=...
GEMINI_API_KEY=...
- PWA Settings: users can enter keys in the PWA Settings panel. Keys are stored per conversation.
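Since keys can come from either place, the server has to pick one per request. The sketch below assumes, without confirmation from this README, that a key entered in the PWA Settings panel for a conversation overrides the corresponding apps/backend/.env value; resolveApiKey and Provider are hypothetical names.

```typescript
// Hypothetical sketch: resolve one provider key for a conversation.
// Assumed precedence: per-conversation key (PWA Settings) > environment variable.

type Provider = "openai" | "elevenlabs" | "gemini";

// Environment variable names as listed in apps/backend/.env above.
const ENV_VARS: Record<Provider, string> = {
  openai: "OPENAI_API_KEY",
  elevenlabs: "ELEVENLABS_API_KEY",
  gemini: "GEMINI_API_KEY",
};

function resolveApiKey(
  provider: Provider,
  conversationKeys: Partial<Record<Provider, string>>,
  env: Record<string, string | undefined>,
): string {
  // Blank strings are treated as "not set" so an empty settings field falls through.
  const key = conversationKeys[provider]?.trim() || env[ENV_VARS[provider]]?.trim();
  if (!key) {
    throw new Error(`No ${provider} key: set ${ENV_VARS[provider]} or add it in PWA Settings`);
  }
  return key;
}
```

In a Bun server process the env argument would typically be process.env, and conversationKeys would come from whatever storage holds the per-conversation settings.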
- CLAUDE.md - Instructions for Claude Code when working in this repo
- Project Scope Document.md - Full product requirements and architecture
- Runtime: Bun
- Backend: Elysia, @modelcontextprotocol/sdk, Prisma
- Frontend: React 18, Zustand, TailwindCSS, @ricky0123/vad-web
- AI Services: OpenAI, ElevenLabs, Google Gemini
- Validation: Zod
- Testing: Bun Test
- Fork the repo
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run tests (bun test)
- Commit (git commit -m 'Add amazing feature')
- Push to your fork (git push origin feature/amazing-feature)
- Open a Pull Request
License: MIT
Built with Claude Code