
VA — Voice Agent Pipeline

A self-hosted voice agent built with LiveKit Agents, wired to local STT, LLM, and TTS services. Everything runs on localhost.

┌──────────────────────────┐
│        agent/            │  LiveKit Agents worker (Python)
│  agents.py + plugins     │  ── glues STT → LLM → TTS together
└──────────┬───────────────┘
           │ HTTP
   ┌───────┼─────────┬──────────────┐
   ▼                 ▼              ▼
┌──────────┐    ┌──────────┐   ┌──────────┐
│  stt/    │    │  llm/    │   │   tts/   │
│ Parakeet │    │  vLLM    │   │  Kokoro  │
│  :8989   │    │  :8080   │   │  :8880   │
└──────────┘    └──────────┘   └──────────┘

Layout

VA/
├── agent/    LiveKit agent + custom STT/TTS LiveKit plugins (was VAtest/)
├── stt/      Parakeet FastAPI service (was parakeet-FastAPI/)
├── llm/      vLLM start scripts (Llama-3 / Qwen)
└── tts/      Kokoro FastAPI service (was kokoro-FastAPI/)
Service   Folder    Default URL                 Source repo origin
Agent     agent/    connects to LiveKit Cloud   (custom)
STT       stt/      http://localhost:8989       NVIDIA NeMo Parakeet TDT 0.6B
LLM       llm/      http://localhost:8080/v1    vLLM serving Llama-3.1-8B-Instruct or Qwen3.5-9B (AWQ)
TTS       tts/      http://localhost:8880       Kokoro 82M

Prerequisites

  • Linux with NVIDIA GPU(s), CUDA 12.4 toolkit
  • Python 3.11+ (Parakeet) / 3.12+ (agent), uv package manager
    curl -LsSf https://astral.sh/uv/install.sh | sh
  • espeak-ng for Kokoro TTS:
    sudo apt install espeak-ng
  • A LiveKit Cloud project (or self-hosted LiveKit server). Credentials live in agent/.env.

Note: the original .venv/ folders were intentionally not copied — recreate them with uv sync inside each subfolder (instructions below). All other config files (.env, pyproject.toml, uv.lock, requirements.txt) were preserved.

First-time setup

Run each of these once. They each create a fresh .venv inside the corresponding folder.

# 1. Agent
cd VA/agent
uv sync

# 2. STT (Parakeet)
cd ../stt
uv sync                         # or: pip install -r requirements.txt

# 3. LLM (vLLM) — see llm/README.md for details
cd ../llm
uv venv --python 3.12
uv pip install vllm

# 4. TTS (Kokoro) — installed by start-gpu.sh on first run, no manual step needed
cd ../tts

Running the pipeline

You'll need 4 terminals, one per service. Bring them up in this order:

1) STT — Parakeet (port 8989)

cd VA/stt
chmod +x start.sh
./start.sh

The script sets CUDA_VISIBLE_DEVICES=0, downloads/loads the model, and serves POST /v1/transcribe/parakeet on http://0.0.0.0:8989.
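
Since the endpoint takes raw PCM, a WAV file needs its header stripped before posting. A minimal sketch using only the stdlib `wave` module (the endpoint path is from this README; the helper name is ours):

```python
import io
import wave

def wav_to_pcm(wav_bytes: bytes) -> tuple[bytes, int]:
    """Extract raw 16-bit PCM frames and the sample rate from a WAV blob."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        assert wf.getsampwidth() == 2, "expected 16-bit samples"
        return wf.readframes(wf.getnframes()), wf.getframerate()

# The PCM bytes can then be POSTed to:
#   http://localhost:8989/v1/transcribe/parakeet?sample_rate=<rate>
```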

2) TTS — Kokoro (port 8880)

GPU:

cd VA/tts
./start-gpu.sh

CPU-only (slower):

cd VA/tts
./start-cpu.sh

The first run downloads the Kokoro v1.0 voice model into api/src/models/v1_0/. Server listens on http://0.0.0.0:8880 with an OpenAI-compatible /v1/audio/speech endpoint.
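
Note that `response_format="pcm"` (as the agent's plugin uses) returns headerless audio; to save such a response as a playable file, wrap it in a WAV container. A sketch assuming Kokoro's output is mono 16-bit at 24 kHz (the sample rate is an assumption — check tts/README.md):

```python
import io
import wave

def pcm_to_wav(pcm: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw mono 16-bit PCM in a WAV container so audio players can open it."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)       # mono
        wf.setsampwidth(2)       # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()
```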

3) LLM — vLLM (port 8080)

cd VA/llm
source .venv/bin/activate     # if you used plain pip; uv users can skip
./start-llama.sh              # Llama-3.1-8B-Instruct-AWQ  (recommended)
# or
./start-qwen.sh               # Qwen3.5-9B-AWQ             (strongest)

Both serve on port 8080. See llm/README.md for installation, VRAM math, and per-model tuning.
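
The VRAM math boils down to one subtraction (the weight figure below is an illustrative assumption, not a measurement — llm/README.md has the real numbers):

```python
def kv_cache_budget_gb(total_gb: float, utilization: float, weights_gb: float) -> float:
    """Memory vLLM can spend on KV-cache blocks after loading the model weights."""
    return total_gb * utilization - weights_gb

# e.g. a 12 GB card at --gpu-memory-utilization 0.80 with ~5.5 GB of AWQ
# weights (assumed figure) leaves roughly 4.1 GB for the KV cache.
```

If the result goes negative or near zero, vLLM fails with the "No available memory for the cache blocks" error covered under Troubleshooting.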

4) Agent — LiveKit worker

Once STT, TTS, and LLM are all reachable, start the agent:

cd VA/agent
uv run python agents.py dev      # interactive dev mode
# or
uv run python agents.py start    # production worker
# or
uv run python agents.py console  # local terminal session, no LiveKit room

The agent registers with LiveKit Cloud using credentials in agent/.env (LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET) and waits for incoming sessions.

How the pieces talk to each other

  • agent/parakeet.py — custom LiveKit STT plugin → POST {STT_URL}/v1/transcribe/parakeet with raw 16-bit PCM (sample rate 24 kHz).
  • agent/kokoro.py — custom LiveKit TTS plugin (uses the openai Python client) → POST {TTS_URL}/v1/audio/speech with response_format="pcm". Includes a ClauseTokenizer that splits LLM output at clause boundaries so TTS starts speaking before the LLM finishes.
  • agent/agents.py — wires ParakeetSTT, openai.LLM, KokoroTTS, and Silero VAD into a livekit.agents.AgentSession.
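
The clause-splitting idea can be sketched like this — a simplified stand-in for the ClauseTokenizer in agent/kokoro.py, not its actual code:

```python
import re

# Split streamed LLM text at clause boundaries (, ; : . ! ?) so each chunk
# can be handed to TTS while the rest of the response is still generating.
CLAUSE_BOUNDARY = re.compile(r"(?<=[,;:.!?])\s+")

def split_clauses(text: str) -> list[str]:
    return [c for c in CLAUSE_BOUNDARY.split(text) if c]
```

Smaller chunks mean lower time-to-first-audio at the cost of more TTS requests per reply.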

If you change ports, update the URLs in agents.py:

stt_plugin = ParakeetSTT(server_url="http://localhost:8989", language="en")
llm        = openai.LLM(base_url="http://localhost:8080/v1", ...)
tts        = KokoroTTS(base_url="http://localhost:8880/v1", ...)

Quick health checks

# STT
curl -X POST "http://localhost:8989/v1/transcribe/parakeet?sample_rate=16000" \
     --data-binary @stt/test_audio.wav

# TTS
curl -X POST http://localhost:8880/v1/audio/speech \
     -H "Content-Type: application/json" \
     -d '{"model":"tts-1","voice":"af_sky","input":"hello world","response_format":"mp3"}' \
     --output /tmp/hello.mp3

# LLM
curl http://localhost:8080/v1/models
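
The same checks can be scripted with the stdlib only. The URLs below come from this README, except the `/docs` paths, which assume the FastAPI services expose their default docs route:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def is_up(url: str, timeout: float = 2.0) -> bool:
    """True if anything answers HTTP on this URL."""
    try:
        with urlopen(url, timeout=timeout):
            return True
    except HTTPError:
        return True   # got an HTTP error response, so something is listening
    except (URLError, OSError):
        return False

for name, url in [("STT", "http://localhost:8989/docs"),
                  ("LLM", "http://localhost:8080/v1/models"),
                  ("TTS", "http://localhost:8880/docs")]:
    print(f"{name}: {'up' if is_up(url) else 'DOWN'}")
```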

Troubleshooting

  • ModuleNotFoundError after uv sync: make sure you ran uv sync inside each service folder and you're using uv run ... (or activated .venv/bin/activate).
  • STT or TTS connection refused: confirm the service is up on the expected port — the agent uses hard-coded URLs in agents.py.
  • vLLM No available memory for the cache blocks: --gpu-memory-utilization is too small for the model. On a 12GB card with STT also running, use 0.80. See llm/README.md for the math.
  • GPU OOM at runtime: most likely Kokoro TTS is running on GPU and crowding out vLLM. Switch TTS to tts/start-cpu.sh. Kokoro on CPU is still real-time on modern x86. Default layout assumes a single 12GB GPU; with two GPUs, edit llm/start-qwen.sh to use CUDA_VISIBLE_DEVICES=1 and raise utilization to 0.92.
  • NVMLError_InvalidArgument from vLLM: CUDA_VISIBLE_DEVICES points at a GPU index that doesn't exist. Run nvidia-smi -L and pick a valid index.
  • espeak missing for Kokoro: sudo apt install espeak-ng. On non-Debian distros, set ESPEAK_DATA_PATH in tts/start-cpu.sh to your espeak-ng data directory.
  • LiveKit auth error: check agent/.env and that the project URL/keys match a live LiveKit Cloud project.

Subfolder docs

Each service keeps its own README and config:

  • agent/pyproject.toml — agent dependencies (LiveKit Agents, OpenAI plugin, etc.)
  • stt/README.md, stt/requirements.txt — Parakeet service details
  • llm/README.md — vLLM install, model options, GPU layout
  • tts/README.md — full Kokoro docs (Docker, voices, options)
