A self-hosted voice agent built with LiveKit Agents, wired
to local STT, LLM, and TTS services. Everything runs on localhost.
```
      ┌──────────────────────────┐
      │ agent/                   │  LiveKit Agents worker (Python)
      │ agents.py + plugins      │  ── glues STT → LLM → TTS together
      └────────────┬─────────────┘
                   │ HTTP
     ┌─────────────┼─────────────┐
     ▼             ▼             ▼
┌──────────┐  ┌──────────┐  ┌──────────┐
│   stt/   │  │   llm/   │  │   tts/   │
│ Parakeet │  │   vLLM   │  │  Kokoro  │
│  :8989   │  │  :8080   │  │  :8880   │
└──────────┘  └──────────┘  └──────────┘
```
```
VA/
├── agent/   LiveKit agent + custom STT/TTS LiveKit plugins   (was VAtest/)
├── stt/     Parakeet FastAPI service                         (was parakeet-FastAPI/)
├── llm/     vLLM start scripts (Llama-3 / Qwen)
└── tts/     Kokoro FastAPI service                           (was kokoro-FastAPI/)
```
| Service | Folder | Default URL | Source repo origin |
|---|---|---|---|
| Agent | agent/ | connects to LiveKit Cloud | (custom) |
| STT | stt/ | http://localhost:8989 | NVIDIA NeMo Parakeet TDT 0.6B |
| LLM | llm/ | http://localhost:8080/v1 | vLLM serving Llama-3.1-8B-Instruct or Qwen3.5-9B (AWQ) |
| TTS | tts/ | http://localhost:8880 | Kokoro 82M |
- Linux with NVIDIA GPU(s), CUDA 12.4 toolkit
- Python 3.11+ (Parakeet) / 3.12+ (agent)
- `uv` package manager: `curl -LsSf https://astral.sh/uv/install.sh | sh`
- `espeak-ng` for Kokoro TTS: `sudo apt install espeak-ng`
- A LiveKit Cloud project (or self-hosted LiveKit server). Credentials live in `agent/.env`.

Note: the original `.venv/` folders were intentionally not copied; recreate them with `uv sync` inside each subfolder (instructions below). All other config files (`.env`, `pyproject.toml`, `uv.lock`, `requirements.txt`) were preserved.
Run each of these once. They each create a fresh .venv inside the corresponding folder.
```bash
# 1. Agent
cd VA/agent
uv sync

# 2. STT (Parakeet)
cd ../stt
uv sync            # or: pip install -r requirements.txt

# 3. LLM (vLLM) — see llm/README.md for details
cd ../llm
uv venv --python 3.12
uv pip install vllm

# 4. TTS (Kokoro) — installed by start-gpu.sh on first run, no manual step needed
cd ../tts
```

You'll need 4 terminals, one per service. Bring them up in this order:
```bash
cd VA/stt
chmod +x start.sh
./start.sh
```

The script sets `CUDA_VISIBLE_DEVICES=0`, downloads/loads the model, and serves `POST /v1/transcribe/parakeet` on http://0.0.0.0:8989.
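The endpoint consumes raw 16-bit PCM in the request body (the agent's plugin sends the same format). A minimal sketch of preparing such a payload — the helper name and the commented `requests` call are illustrative, not part of the repo:

```python
import struct

# Hypothetical helper: clamp float samples to [-1, 1] and pack them as
# 16-bit little-endian PCM, the body format the Parakeet service expects.
STT_URL = "http://localhost:8989/v1/transcribe/parakeet"

def floats_to_pcm16(samples: list[float]) -> bytes:
    """Convert float audio samples to raw 16-bit LE PCM bytes."""
    clamped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack(f"<{len(clamped)}h", *(int(s * 32767) for s in clamped))

# Sending it (requires the STT service to be running):
# import requests
# pcm = floats_to_pcm16(samples)
# r = requests.post(STT_URL, params={"sample_rate": 16000}, data=pcm)
# print(r.json())
```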
GPU:

```bash
cd VA/tts
./start-gpu.sh
```

CPU-only (slower):

```bash
cd VA/tts
./start-cpu.sh
```

The first run downloads the Kokoro v1.0 voice model into `api/src/models/v1_0/`. The server listens on http://0.0.0.0:8880 with an OpenAI-compatible `/v1/audio/speech` endpoint.
```bash
cd VA/llm
source .venv/bin/activate   # if you used plain pip; uv users can skip
./start-llama.sh            # Llama-3.1-8B-Instruct-AWQ (recommended)
# or
./start-qwen.sh             # Qwen3.5-9B-AWQ (strongest)
```

Both serve on port 8080. See `llm/README.md` for installation, VRAM math, and per-model tuning.
Once STT, TTS, and LLM are all reachable, start the agent:
```bash
cd VA/agent
uv run python agents.py dev       # interactive dev mode
# or
uv run python agents.py start     # production worker
# or
uv run python agents.py console   # local terminal session, no LiveKit room
```

The agent registers with LiveKit Cloud using credentials in `agent/.env` (`LIVEKIT_URL`, `LIVEKIT_API_KEY`, `LIVEKIT_API_SECRET`) and waits for incoming sessions.
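The variable names are confirmed by the agent; the values below are placeholders — substitute your own project's URL and keys:

```bash
# agent/.env — placeholder values, use your LiveKit Cloud project's credentials
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret
```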
- `agent/parakeet.py` — custom LiveKit `STT` plugin → `POST {STT_URL}/v1/transcribe/parakeet` with raw 16-bit PCM (sample rate 24 kHz).
- `agent/kokoro.py` — custom LiveKit `TTS` plugin (uses the `openai` Python client) → `POST {TTS_URL}/v1/audio/speech` with `response_format="pcm"`. Includes a `ClauseTokenizer` that splits LLM output at clause boundaries so TTS starts speaking before the LLM finishes.
- `agent/agents.py` — wires `ParakeetSTT`, `openai.LLM`, `KokoroTTS`, and Silero VAD into a `livekit.agents.AgentSession`.
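The clause-splitting idea can be sketched roughly as follows. This is a simplified stand-in, not the repo's actual `ClauseTokenizer`; the boundary regex and the `min_len` threshold are assumptions:

```python
import re
from typing import Iterable, Iterator

# Emit a chunk as soon as a clause boundary (punctuation + whitespace)
# appears in the buffered stream, so TTS can start before the LLM finishes.
_CLAUSE_END = re.compile(r"[.!?;:,]\s")

def clause_chunks(token_stream: Iterable[str], min_len: int = 12) -> Iterator[str]:
    """Yield clause-sized text chunks from an incremental token stream."""
    buf = ""
    for token in token_stream:
        buf += token
        while True:
            # First boundary that leaves a reasonably long clause; short
            # fragments are held back to avoid choppy audio.
            cut = next((m.end() for m in _CLAUSE_END.finditer(buf)
                        if m.end() >= min_len), None)
            if cut is None:
                break
            yield buf[:cut].strip()
            buf = buf[cut:]
    if buf.strip():
        yield buf.strip()  # flush whatever remains when the stream ends
```

Each yielded chunk would be handed to the TTS request as soon as it is complete, while the LLM keeps streaming.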
If you change ports, update the URLs in agents.py:
```python
stt_plugin = ParakeetSTT(server_url="http://localhost:8989", language="en")
llm = openai.LLM(base_url="http://localhost:8080/v1", ...)
tts = KokoroTTS(base_url="http://localhost:8880/v1", ...)
```

Smoke-test each service from the command line:

```bash
# STT
curl -X POST "http://localhost:8989/v1/transcribe/parakeet?sample_rate=16000" \
  --data-binary @stt/test_audio.wav

# TTS
curl -X POST http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model":"tts-1","voice":"af_sky","input":"hello world","response_format":"mp3"}' \
  --output /tmp/hello.mp3

# LLM
curl http://localhost:8080/v1/models
```

- `ModuleNotFoundError` after `uv sync`: make sure you ran `uv sync` inside each service folder and that you're using `uv run ...` (or activated `.venv/bin/activate`).
- STT or TTS connection refused: confirm the service is up on the expected port — the agent uses hard-coded URLs in `agents.py`.
- vLLM `No available memory for the cache blocks`: `--gpu-memory-utilization` is too small for the model. On a 12 GB card with STT also running, use 0.80. See `llm/README.md` for the math.
- GPU OOM at runtime: most likely Kokoro TTS is running on GPU and crowding out vLLM. Switch TTS to `tts/start-cpu.sh`; Kokoro on CPU is still real-time on modern x86. The default layout assumes a single 12 GB GPU; with two GPUs, edit `llm/start-qwen.sh` to use `CUDA_VISIBLE_DEVICES=1` and raise utilization to 0.92.
- `NVMLError_InvalidArgument` from vLLM: `CUDA_VISIBLE_DEVICES` points at a GPU index that doesn't exist. Run `nvidia-smi -L` and pick a valid index.
- espeak missing for Kokoro: `sudo apt install espeak-ng`. On non-Debian distros, set `ESPEAK_DATA_PATH` in `tts/start-cpu.sh` to your espeak-ng data directory.
- LiveKit auth error: check `agent/.env` and that the project URL/keys match a live LiveKit Cloud project.
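The budget arithmetic behind `--gpu-memory-utilization` can be sketched like this. All figures are illustrative assumptions (the checkpoint size in particular); the authoritative numbers are in `llm/README.md`:

```python
# Illustrative VRAM budget for a single 12 GB card shared with STT.
total_vram_gb = 12.0
gpu_memory_utilization = 0.80            # value suggested above for a 12 GB card

vllm_budget_gb = total_vram_gb * gpu_memory_utilization   # what vLLM may claim
leftover_gb = total_vram_gb - vllm_budget_gb              # left for STT et al.

weights_gb = 5.5                         # assumed size of an 8B AWQ checkpoint
kv_cache_gb = vllm_budget_gb - weights_gb                 # remainder → KV cache

print(f"vLLM budget {vllm_budget_gb:.1f} GB = weights {weights_gb:.1f} GB "
      f"+ KV cache {kv_cache_gb:.1f} GB; {leftover_gb:.1f} GB for other processes")
```

If the KV-cache remainder goes negative (or near zero), vLLM raises the "No available memory for the cache blocks" error above.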
Each service keeps its own README and config:
- `agent/pyproject.toml` — agent dependencies (LiveKit Agents, OpenAI plugin, etc.)
- `stt/README.md`, `stt/requirements.txt` — Parakeet service details
- `llm/README.md` — vLLM install, model options, GPU layout
- `tts/README.md` — full Kokoro docs (Docker, voices, options)