Aasrith65/voice_layer
Voice Layer Service

A small FastAPI service that accepts a frontend WebSocket connection, streams audio to ElevenLabs realtime speech-to-text, forwards committed transcripts to your existing backend /v1/chat endpoint, synthesizes the assistant reply with ElevenLabs text-to-speech, and streams the audio back to the frontend.

What this service does

  • Accepts a browser or mobile WebSocket connection at /v1/voice/ws
  • Lets the frontend pass VAD settings per session
  • Uses ElevenLabs realtime speech-to-text for transcription
  • Calls your backend /v1/register and /v1/chat
  • Uses ElevenLabs text-to-speech for assistant playback
  • Returns structured WS events for partial transcripts, committed transcripts, assistant text, and assistant audio

Setup

  1. Create a virtual environment and install dependencies:

python3 -m venv .venv
./.venv/bin/pip install -r requirements.txt

  2. Copy .env.example to .env and fill in the values:

cp .env.example .env

At minimum set:

  • ELEVENLABS_API_KEY
  • ELEVENLABS_VOICE_ID
  • LLM_API_KEY if you want this service to auto-register
  • BACKEND_BASE_URL should be the backend service root, for example https://your-backend.example.com, not a full /v1/register or /v1/chat URL
  • BACKEND_API_KEY should be set on the service for production deployments

If you already have a stable backend registration_id, set DEFAULT_REGISTRATION_ID and you can skip the LLM registration env values.

For production, ALLOW_FRONTEND_BACKEND_AUTH=false is recommended so browser clients cannot override the deployed backend credentials.
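Putting the variables above together, a production-leaning .env might look like this (all values are placeholders):

```
ELEVENLABS_API_KEY=your-elevenlabs-api-key
ELEVENLABS_VOICE_ID=your-voice-id
LLM_API_KEY=your-llm-api-key
BACKEND_BASE_URL=https://your-backend.example.com
BACKEND_API_KEY=your-backend-api-key
ALLOW_FRONTEND_BACKEND_AUTH=false
# Or skip the LLM registration values entirely if you have a stable registration:
# DEFAULT_REGISTRATION_ID=abc123-def456
```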

Run

uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

Open the browser test console at:

http://localhost:8000/test

Frontend handoff document:

FRONTEND_HANDOFF.md

Frontend WebSocket Contract

Connect to:

ws://localhost:8000/v1/voice/ws

1. Configure the session

Send this first:

{
  "type": "session.configure",
  "session_id": "voice-test-1",
  "backend": {
    "registration_id": "abc123-def456"
  },
  "stt": {
    "model_id": "scribe_v2_realtime",
    "sample_rate": 16000,
    "audio_format": "pcm_16000",
    "language_code": "en",
    "commit_strategy": "vad",
    "vad_threshold": 0.62,
    "vad_silence_threshold_secs": 1.4,
    "min_speech_duration_ms": 520,
    "min_silence_duration_ms": 800,
    "include_timestamps": false
  },
  "tts": {
    "voice_id": "YOUR_ELEVENLABS_VOICE_ID",
    "model_id": "eleven_flash_v2_5",
    "output_format": "pcm_24000",
    "voice_settings": {
      "stability": 0.4,
      "similarity_boost": 0.8,
      "speed": 1.0
    }
  }
}
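The configure frame can be assembled client-side before serializing it onto the socket. A sketch in Python (the helper name is illustrative, not part of the service):

```python
import json

def build_session_configure(session_id: str, registration_id: str, voice_id: str) -> dict:
    """Build a session.configure frame mirroring the example above.

    VAD and voice settings are the sample values from the contract;
    tune them per session as needed.
    """
    return {
        "type": "session.configure",
        "session_id": session_id,
        "backend": {"registration_id": registration_id},
        "stt": {
            "model_id": "scribe_v2_realtime",
            "sample_rate": 16000,
            "audio_format": "pcm_16000",
            "language_code": "en",
            "commit_strategy": "vad",
            "vad_threshold": 0.62,
            "vad_silence_threshold_secs": 1.4,
            "min_speech_duration_ms": 520,
            "min_silence_duration_ms": 800,
            "include_timestamps": False,
        },
        "tts": {
            "voice_id": voice_id,
            "model_id": "eleven_flash_v2_5",
            "output_format": "pcm_24000",
            "voice_settings": {"stability": 0.4, "similarity_boost": 0.8, "speed": 1.0},
        },
    }

# Serialize and send as the first text frame after connecting.
frame = json.dumps(build_session_configure("voice-test-1", "abc123-def456", "YOUR_ELEVENLABS_VOICE_ID"))
```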

You can also omit backend.registration_id and instead pass:

{
  "backend": {
    "provider": "custom",
    "api_key": "YOUR_LLM_API_KEY",
    "model": "llama-3.3-70b-versatile",
    "base_url": "https://api.groq.com/openai",
    "system_prompt": "You are a helpful assistant."
  }
}

The service will call your backend /v1/register and cache the returned registration. If your backend does not expose /v1/register, provide an existing registration_id instead.

2. Stream audio

Send JSON frames with base64 PCM audio:

{
  "type": "audio.append",
  "audio": "BASE64_PCM_CHUNK",
  "sample_rate": 16000
}

You can also send binary WebSocket frames directly after configuration. Binary frames are treated as raw PCM audio and forwarded using the configured sample rate.
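Wrapping a raw PCM chunk into an audio.append frame is a one-liner around base64; a sketch assuming 16-bit little-endian mono PCM:

```python
import base64
import json

def build_audio_append(pcm_bytes: bytes, sample_rate: int = 16000) -> str:
    """Wrap raw 16-bit mono PCM bytes in an audio.append JSON frame."""
    return json.dumps({
        "type": "audio.append",
        "audio": base64.b64encode(pcm_bytes).decode("ascii"),
        "sample_rate": sample_rate,
    })

# Example: 100 ms of silence at 16 kHz, 16-bit mono is 3200 bytes.
frame = build_audio_append(b"\x00\x00" * 1600)
```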

If you use manual commit instead of VAD, send:

{
  "type": "audio.commit"
}

3. Optional direct text prompt

{
  "type": "text.prompt",
  "text": "Summarize what I just said."
}

Server Events

Examples:

{ "type": "session.ready", "message": "Send session.configure to begin." }
{ "type": "session.configured", "session_id": "voice-test-1" }
{ "type": "stt.session_started", "session_id": "elevenlabs-session-id", "config": { "...": "..." } }
{ "type": "transcript.partial", "text": "hello wor" }
{ "type": "transcript.committed", "text": "hello world" }
{ "type": "assistant.started", "text": "hello world" }
{ "type": "assistant.message", "text": "Hi there!", "metadata": { "...": "..." } }
{ "type": "assistant.audio.start", "turn_id": 1, "content_type": "application/octet-stream", "output_format": "pcm_24000" }
{ "type": "assistant.audio.chunk", "turn_id": 1, "audio": "BASE64_AUDIO_CHUNK", "content_type": "application/octet-stream", "output_format": "pcm_24000", "chunk_index": 1 }
{ "type": "assistant.audio.end", "turn_id": 1, "content_type": "application/octet-stream", "output_format": "pcm_24000", "chunk_count": 42 }
{ "type": "assistant.audio", "audio": "BASE64_AUDIO", "content_type": "audio/mpeg", "output_format": "mp3_44100_128" }
{ "type": "error", "message": "Readable error for the frontend" }
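A minimal client-side dispatcher over these event types could look like the following (the function and accumulator names are illustrative, not part of the contract):

```python
import json

def handle_event(raw: str, transcript: list, audio_chunks: list) -> str:
    """Route one server event by its `type` field; returns the type handled."""
    event = json.loads(raw)
    etype = event.get("type", "")
    if etype == "transcript.committed":
        transcript.append(event["text"])          # final text for this turn
    elif etype == "assistant.audio.chunk":
        audio_chunks.append(event["audio"])       # base64 PCM to feed the player
    elif etype == "error":
        print("server error:", event["message"])  # surface readable errors
    return etype

transcript, chunks = [], []
handle_event('{ "type": "transcript.committed", "text": "hello world" }', transcript, chunks)
```

Partial transcripts (transcript.partial) are typically rendered and replaced in place rather than accumulated, so they are omitted from the sketch.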

Notes for the frontend engineer

  • Prefer 16 kHz mono PCM for the simplest STT path.
  • Send chunks around 0.1 to 1.0 seconds long for smoother streaming and lower latency.
  • commit_strategy: "vad" is the default recommended path for the browser test client. The backend now owns turn boundaries by default.
  • Use commit_strategy: "manual" only for debugging or custom clients that want to force commits explicitly.
  • Only send previous_text with the first audio chunk after a new segment starts.
  • For low-latency browser playback, prefer pcm_24000 so the client can start speaking from streamed chunks immediately.
  • Assistant audio may arrive as streamed assistant.audio.start / assistant.audio.chunk / assistant.audio.end events, or as a fallback single assistant.audio blob.

Browser test page

The service includes a built-in browser console at /test.

Recommended test order:

  1. Start the API server.
  2. Open http://localhost:8000/test.
  3. Paste a registration_id into the page if you already have one.
  4. Click Connect.
  5. Click Send session.configure.
  6. Click Send text.prompt first to verify backend chat plus TTS.
  7. Click Start Mic to test realtime STT and voice playback.

If you need to force a commit while testing, click Send audio.commit.
