Realtime voice assistant prototype for phone calls and spoken conversations. VoicePlus captures two audio streams, sends low-latency transcription to Speechmatics, displays live S1/S2 transcript lines, and can generate short response suggestions through local Ollama models or optional OpenAI helpers.
- Dual-stream call transcription for speaker/customer sides.
- Speechmatics realtime WebSocket transcription.
- Terminal UI with live partials, final transcript history, pause state, and colored output.
- Echo suppression heuristics between S1 and S2 streams.
- Optional translation pipeline.
- Local LLM reply suggestions through reply_engine.py and Ollama.
- Optional OpenAI helper calls for translating a Russian goal into English and generating useful phrases.
- Per-call logs written to call_logs/.
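
The echo suppression heuristics listed above are not described in detail here. As a rough illustration only, one possible approach is to drop a final transcript line when a near-identical line arrived on the other stream a moment earlier. The `EchoFilter` class, the 3-second window, and the 0.8 similarity threshold are all hypothetical and are not taken from the project code.

```python
from collections import deque
from difflib import SequenceMatcher


class EchoFilter:
    """Hypothetical text-similarity echo check between two streams (e.g. "S1" and "S2")."""

    def __init__(self, window_seconds=3.0, threshold=0.8):
        self.window_seconds = window_seconds  # assumed look-back window
        self.threshold = threshold            # assumed similarity cutoff
        self.recent = deque()                 # (timestamp, stream, normalized text)

    @staticmethod
    def _normalize(text):
        # Lowercase and collapse whitespace so trivial differences do not matter.
        return " ".join(text.lower().split())

    def is_echo(self, stream, text, timestamp):
        """Return True if `text` closely matches a recent line from the other stream.

        Timestamps are assumed to be non-decreasing seconds.
        """
        norm = self._normalize(text)
        # Forget lines that fell out of the look-back window.
        while self.recent and timestamp - self.recent[0][0] > self.window_seconds:
            self.recent.popleft()
        echo = any(
            other != stream
            and SequenceMatcher(None, norm, prev).ratio() >= self.threshold
            for _, other, prev in self.recent
        )
        if not echo:
            # Only genuine lines are remembered as possible echo sources.
            self.recent.append((timestamp, stream, norm))
        return echo
```

A caller would create one filter per call and skip displaying any final line for which `is_echo(...)` returns True. A text-based check like this is simple but can misfire when both parties really do say the same short phrase, so the thresholds would need tuning.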

main.py is the current entry point.
reply_engine.py contains local LLM prompt routing and cleanup logic.
vp.py and vp_2.py are earlier application variants kept for reference.
test.py and test_2.py are local benchmark/scratch scripts for model latency testing.
Create a local .env file from .env.example:
cp .env.example .env

Required for transcription:
SPEECHMATICS_API=replace-with-speechmatics-api-key
SPEECHMATICS_WSS=wss://eu2.rt.speechmatics.com/v2/

Optional AI helpers:
OPENAI_API=replace-with-openai-api-key
OLLAMA_HOST=http://127.0.0.1:11434
ENABLE_SUGGEST=1

API keys and call logs must stay local. Do not commit .env, logs, audio captures, or generated transcripts.
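
As an illustration of how these settings might be read and validated at startup, here is a minimal sketch using only the standard library. The helper names `parse_env_lines` and `load_settings` are hypothetical, and the fallback values (suggestions off unless ENABLE_SUGGEST=1, Ollama on 127.0.0.1:11434) are assumptions; main.py may load its configuration differently.

```python
# Keys listed above as required for transcription.
REQUIRED_KEYS = ("SPEECHMATICS_API", "SPEECHMATICS_WSS")


def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(env):
    """Validate required keys and apply assumed defaults for optional ones."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return {
        "speechmatics_api": env["SPEECHMATICS_API"],
        "speechmatics_wss": env["SPEECHMATICS_WSS"],
        "openai_api": env.get("OPENAI_API") or None,  # optional helper key
        "ollama_host": env.get("OLLAMA_HOST", "http://127.0.0.1:11434"),
        "enable_suggest": env.get("ENABLE_SUGGEST", "0") == "1",
    }
```

For example, `load_settings(parse_env_lines(open(".env")))` would fail fast with a clear message if either Speechmatics value is absent, instead of failing later when the WebSocket connection is opened.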
Install dependencies:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run:
python main.py

The app expects local audio routing/capture to be configured on the host. For local suggestion generation, run Ollama and pull the models referenced by reply_engine.py.
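
To check that a local Ollama server is reachable before starting a call, a small standalone script along these lines may help. It is a sketch, not the logic in reply_engine.py: the function names and the prompt wording are hypothetical, and the model name must be replaced with one you have actually pulled. It uses Ollama's `/api/generate` HTTP endpoint in non-streaming mode.

```python
import json
import urllib.request


def build_generate_request(host, model, prompt):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        host.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def suggest_reply(host, model, transcript_line, timeout=10.0):
    """Ask the model for one short reply to a transcript line (illustrative prompt)."""
    prompt = "Suggest one short reply to: " + transcript_line
    request = build_generate_request(host, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # In non-streaming mode the generated text is returned in the "response" field.
    return body.get("response", "").strip()
```

Calling `suggest_reply("http://127.0.0.1:11434", "<model-name>", "Can you call back tomorrow?")` requires a running Ollama instance; a connection error here usually means the server is not started or OLLAMA_HOST points to the wrong address.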
This is an experimental realtime assistant. The public version keeps runtime configuration external and focuses on the transcription, UI, and response-suggestion workflow.