A fully local, privacy-first voice assistant that answers questions from your documents using Retrieval Augmented Generation (RAG). Speak naturally and get conversational responses based on your ingested PDFs.
- 100% Local: All processing happens on your machine - no cloud APIs required
- Voice Interface: Speak questions naturally and hear spoken responses
- RAG Pipeline: Retrieves relevant context from your documents before generating answers
- Hybrid Search: Combines vector similarity with BM25 keyword search for better results
- Apple Silicon Optimized: MLX-accelerated Whisper for fast speech recognition on Mac
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Microphone │────▶│ Silero VAD │────▶│ MLX Whisper │
│ (Input) │ │ (Detect) │ │ (STT) │
└─────────────┘ └─────────────┘ └─────────────┘
│
▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Speaker │◀────│ Coqui TTS │◀────│ LM Studio │
│ (Output) │ │ (Speak) │ │ (LLM) │
└─────────────┘ └─────────────┘ └─────────────┘
▲
│
┌───────────────────┴───────────────────┐
│ Weaviate │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ BGE Vectors │ │ BM25 Index │ │
│ └─────────────┘ └─────────────┘ │
└───────────────────────────────────────┘
| Component | Technology | Purpose |
|---|---|---|
| Audio I/O | sounddevice | Microphone input and speaker output |
| VAD | Silero VAD | Detects when you start/stop speaking |
| STT | MLX Whisper | Converts speech to text (Apple Silicon) |
| Embeddings | BGE-base-en-v1.5 | Converts text to vectors for search |
| Vector DB | Weaviate | Stores and searches document chunks |
| LLM | LM Studio | Generates conversational responses |
| TTS | Coqui TTS | Converts response text to speech |
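For a concrete picture of the hybrid search step, here is roughly what a query against Weaviate can look like with the v4 Python client. This is a sketch, not the repo's code: the `DocumentChunk` collection name and `text` property are assumptions, and the real logic lives in src/rag/search.py.

```python
# Hybrid search sketch: blend BM25 keyword matching with BGE vector similarity.
import weaviate
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-base-en-v1.5")
question = "What does the warranty cover?"

client = weaviate.connect_to_local(port=8081)  # HTTP port from docker-compose
try:
    chunks = client.collections.get("DocumentChunk")   # assumed collection name
    results = chunks.query.hybrid(
        query=question,                                 # BM25 keyword side
        vector=embedder.encode(question).tolist(),      # vector similarity side
        alpha=0.5,                                      # 0 = pure BM25, 1 = pure vector
        limit=5,
    )
    for obj in results.objects:
        print(obj.properties.get("text"))               # assumed property name
finally:
    client.close()
```

The `alpha` value here corresponds to the `SEARCH_ALPHA` setting described in the configuration table below.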
- Python 3.10+ - Required for all dependencies
- Docker Desktop - For running Weaviate vector database
- LM Studio - For running local LLMs
- Git - For cloning the repository
- Xcode Command Line Tools: `xcode-select --install`
- Homebrew (recommended): For installing system dependencies
- Visual Studio Build Tools: Required for compiling some Python packages
- CUDA Toolkit (optional): For GPU acceleration with NVIDIA cards
This setup uses MLX for hardware-accelerated speech recognition.
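For reference, transcription with the `mlx-whisper` package looks roughly like the sketch below; the audio file name is a placeholder, and the repo wraps this logic in src/stt/whisper_mlx.py.

```python
# Minimal MLX Whisper transcription sketch (Apple Silicon only).
import mlx_whisper

result = mlx_whisper.transcribe(
    "question.wav",  # placeholder audio file
    path_or_hf_repo="mlx-community/whisper-large-v3-turbo",
)
print(result["text"])
```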
git clone https://github.com/InsightStream-Dev/local-voice-rag.git
cd local-voice-rag
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
Note: If you encounter numpy/transformers version conflicts, run:
pip install numpy==1.26.4 transformers==4.49.0
docker compose -f docker/docker-compose.yml up -d
Verify it's running:
curl http://localhost:8081/v1/.well-known/ready
Next, set up LM Studio:
- Download from lmstudio.ai
- Launch LM Studio
- Download a model (recommended: `qwen2.5-3b-instruct` for speed)
- Go to the Developer tab → Start the local server
- Ensure it's running on `http://localhost:1234`
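Before wiring up the assistant, you can sanity-check the server from Python, since LM Studio exposes an OpenAI-compatible API. A sketch using the `openai` package (the repo's own client lives in src/llm/lmstudio.py and may differ):

```python
# Quick LM Studio connectivity check via its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

# List whatever models LM Studio currently has loaded
for model in client.models.list():
    print(model.id)

# Ask for a short completion to confirm generation works
reply = client.chat.completions.create(
    model="local-model",  # LM Studio auto-detects the loaded model
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(reply.choices[0].message.content)
```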
cp .env.example .env
Edit .env if needed (defaults work for most setups):
# Weaviate
WEAVIATE_URL=http://localhost:8081
# LM Studio
LMSTUDIO_URL=http://localhost:1234/v1
LMSTUDIO_MODEL=local-model
# Models
WHISPER_MODEL=mlx-community/whisper-large-v3-turbo
EMBEDDING_MODEL=BAAI/bge-base-en-v1.5
TTS_MODEL=tts_models/en/ljspeech/tacotron2-DDC
Intel Macs and Linux systems cannot use MLX. Use CPU Whisper instead.
Edit requirements.txt and uncomment the CPU fallback:
# openai-whisper>=20231117  ← Uncomment this line
Then install:
pip install openai-whisper
Edit .env:
WHISPER_MODEL=base.en  # or small.en, medium.en, large-v3
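For reference, the CPU path uses the standard `openai-whisper` API, roughly as below; the audio file name is a placeholder, and the repo wraps this in src/stt/whisper_cpu.py.

```python
# Minimal CPU Whisper transcription sketch.
import whisper

model = whisper.load_model("base.en")       # matches WHISPER_MODEL above
result = model.transcribe("question.wav")   # placeholder audio file
print(result["text"])
```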
Windows setup requires additional steps for audio and some Python packages.
- Python 3.10+: Download from python.org
  - Check "Add Python to PATH" during installation
- Docker Desktop: Download from docker.com
  - Enable WSL 2 backend when prompted
- Visual Studio Build Tools:
  - Download from visualstudio.microsoft.com
  - Install "Desktop development with C++"
- Git: Download from git-scm.com
Open PowerShell or Command Prompt:
git clone https://github.com/InsightStream-Dev/local-voice-rag.git
cd local-voice-rag
python -m venv venv
.\venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
For audio support, you may need to install PortAudio:
# Using pip (try this first)
pip install sounddevice
# If that fails, install manually:
# Download PortAudio from http://www.portaudio.com/download.html
Windows cannot use MLX. Edit requirements.txt:
# Comment out MLX lines:
# mlx>=0.12.0
# mlx-whisper>=0.4.0
# Uncomment CPU whisper:
openai-whisper>=20231117
Install:
pip install openai-whisper
Create .env file:
WEAVIATE_URL=http://localhost:8081
LMSTUDIO_URL=http://localhost:1234/v1
LMSTUDIO_MODEL=local-model
# Use CPU Whisper model
WHISPER_MODEL=base.en
Start Weaviate:
docker compose -f docker/docker-compose.yml up -d
Then set up LM Studio:
- Download from lmstudio.ai
- Install and launch
- Download a model (e.g., `qwen2.5-3b-instruct`)
- Start the local server on port 1234
First, add your PDF documents to the knowledge base:
# Activate virtual environment
source venv/bin/activate # macOS/Linux
.\venv\Scripts\activate # Windows
# Ingest a PDF
python scripts/ingest_document.py /path/to/your/document.pdf
Example output:
Ingesting: /path/to/document.pdf
Extracted 50 pages
Created 120 chunks
Embedded and stored in Weaviate
Done! Document ingested successfully.
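Under the hood, ingestion extracts the PDF text, chunks it, embeds each chunk with BGE, and stores everything in Weaviate. The condensed sketch below illustrates that flow; it assumes pypdf, sentence-transformers, and the Weaviate v4 client, uses a naive word-based chunker, and invents the `DocumentChunk` collection name, so treat it as an outline of src/rag/ingest.py rather than its actual code.

```python
# Condensed ingestion flow: PDF -> text -> chunks -> embeddings -> Weaviate.
import weaviate
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive word-based chunking with overlap (the real chunker works on tokens)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

reader = PdfReader("document.pdf")
full_text = "\n".join(page.extract_text() or "" for page in reader.pages)

embedder = SentenceTransformer("BAAI/bge-base-en-v1.5")
chunks = chunk_text(full_text)

client = weaviate.connect_to_local(port=8081)
try:
    collection = client.collections.get("DocumentChunk")  # assumed collection name
    with collection.batch.dynamic() as batch:
        for chunk in chunks:
            batch.add_object(
                properties={"text": chunk, "source": "document.pdf"},
                vector=embedder.encode(chunk).tolist(),
            )
finally:
    client.close()
```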
Verify your documents are searchable:
python scripts/test_search.py "your search query"python -m src.assistant.pipelineOutput:
Warming up voice assistant...
Loading VAD...
Loading STT (Whisper)...
Loading TTS...
Connecting to LLM...
Model: qwen2.5-3b-instruct
Connecting to vector database...
Initializing audio...
Voice assistant ready!
==================================================
Voice Assistant Active
Speak to ask questions about your documents.
Press Ctrl+C to exit.
==================================================
[Listening...]
Now speak your question! The assistant will:
- Detect when you start speaking
- Transcribe your speech
- Search for relevant document chunks
- Generate a response using the LLM
- Speak the answer back to you
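Put together, the stages above form a simple loop. The sketch below shows the shape of that loop with placeholder callables; the actual orchestration lives in src/assistant/pipeline.py and its signatures will differ.

```python
# Simplified conversation loop: listen -> transcribe -> retrieve -> generate -> speak.
from typing import Callable, Iterable

def run_assistant(
    record_until_silence: Callable[[], bytes],      # microphone capture + Silero VAD
    transcribe: Callable[[bytes], str],             # Whisper speech-to-text
    hybrid_search: Callable[[str], Iterable[str]],  # Weaviate hybrid retrieval
    generate: Callable[[str], str],                 # LM Studio chat completion
    speak: Callable[[str], None],                   # Coqui TTS + speaker playback
) -> None:
    while True:
        audio = record_until_silence()
        question = transcribe(audio).strip()
        if not question:
            continue
        context = "\n\n".join(hybrid_search(question))
        prompt = (
            "Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
        speak(generate(prompt))
```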
local-voice-rag/
├── docker/
│ └── docker-compose.yml # Weaviate configuration
├── docs/ # Your PDF documents
├── scripts/
│ ├── ingest_document.py # PDF ingestion CLI
│ ├── test_search.py # Test vector search
│ ├── test_llm.py # Test LM Studio connection
│ ├── test_stt.py # Test speech-to-text
│ ├── test_tts.py # Test text-to-speech
│ ├── test_vad.py # Test voice activity detection
│ └── test_audio.py # Test audio I/O
├── src/
│ ├── audio/
│ │ ├── microphone.py # Async microphone capture
│ │ ├── speaker.py # Audio playback
│ │ └── vad.py # Silero VAD wrapper
│ ├── stt/
│ │ ├── whisper_mlx.py # MLX Whisper (Apple Silicon)
│ │ └── whisper_cpu.py # CPU Whisper fallback
│ ├── tts/
│ │ └── coqui.py # Coqui TTS wrapper
│ ├── rag/
│ │ ├── embeddings.py # BGE embeddings
│ │ ├── weaviate_client.py # Weaviate connection
│ │ ├── chunker.py # Document chunking
│ │ ├── ingest.py # PDF ingestion pipeline
│ │ └── search.py # Hybrid search
│ ├── llm/
│ │ └── lmstudio.py # LM Studio client
│ ├── assistant/
│ │ ├── pipeline.py # Main voice assistant
│ │ └── prompts.py # System prompts
│ └── config.py # Configuration management
├── requirements.txt
├── .env.example
└── README.md
All settings can be configured via environment variables or the .env file:
| Variable | Default | Description |
|---|---|---|
| `WEAVIATE_URL` | `http://localhost:8081` | Weaviate HTTP endpoint |
| `LMSTUDIO_URL` | `http://localhost:1234/v1` | LM Studio API endpoint |
| `LMSTUDIO_MODEL` | `local-model` | Model identifier (auto-detected) |
| `WHISPER_MODEL` | `mlx-community/whisper-large-v3-turbo` | Speech recognition model |
| `EMBEDDING_MODEL` | `BAAI/bge-base-en-v1.5` | Text embedding model |
| `TTS_MODEL` | `tts_models/en/ljspeech/tacotron2-DDC` | Text-to-speech model |
| `AUDIO_SAMPLE_RATE` | `16000` | Audio sample rate (Hz) |
| `VAD_THRESHOLD` | `0.5` | Voice activity detection sensitivity |
| `VAD_SILENCE_DURATION_MS` | `700` | Silence before speech end (ms) |
| `CHUNK_SIZE` | `500` | Tokens per document chunk |
| `CHUNK_OVERLAP` | `50` | Overlap between chunks (tokens) |
| `SEARCH_LIMIT` | `5` | Number of search results |
| `SEARCH_ALPHA` | `0.5` | Hybrid search balance (0=BM25, 1=Vector) |
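A minimal way to read these settings from Python, loosely mirroring what src/config.py does (a sketch assuming python-dotenv; the real module may load and validate settings differently):

```python
# Minimal settings loader sketch: read .env, fall back to the documented defaults.
import os
from dataclasses import dataclass

from dotenv import load_dotenv  # python-dotenv (assumed dependency)

load_dotenv()  # copies values from a local .env file into the environment

@dataclass(frozen=True)
class Settings:
    weaviate_url: str = os.getenv("WEAVIATE_URL", "http://localhost:8081")
    lmstudio_url: str = os.getenv("LMSTUDIO_URL", "http://localhost:1234/v1")
    whisper_model: str = os.getenv("WHISPER_MODEL", "mlx-community/whisper-large-v3-turbo")
    search_limit: int = int(os.getenv("SEARCH_LIMIT", "5"))
    search_alpha: float = float(os.getenv("SEARCH_ALPHA", "0.5"))

settings = Settings()
print(settings.weaviate_url, settings.search_limit)
```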
Error: Connection refused or Cannot connect to Weaviate
- Check if Docker is running: `docker ps`
- Check Weaviate container status: `docker compose -f docker/docker-compose.yml logs`
- Verify Weaviate is healthy: `curl http://localhost:8081/v1/.well-known/ready`
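You can run the same readiness check from Python with the Weaviate v4 client (a sketch; it assumes the gRPC port from docker-compose is left at its default):

```python
# Weaviate readiness check sketch using the v4 Python client.
import weaviate

client = weaviate.connect_to_local(port=8081)  # raises if Weaviate is unreachable
try:
    print("Weaviate ready:", client.is_ready())
finally:
    client.close()
```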
Error: LM Studio is not available
- Ensure LM Studio is running
- Check the Developer tab - server must be started
- Verify a model is loaded
- Test the connection: `curl http://localhost:1234/v1/models`
Error: No input device found or PortAudio error
macOS:
- Grant microphone permission in System Preferences → Security & Privacy → Microphone
Windows:
- Check Windows audio settings
- Try installing PortAudio manually
- Run as Administrator if permission issues persist
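To see which audio devices Python can actually find (the assistant uses sounddevice for all audio I/O), a quick check:

```python
# List the audio devices visible to sounddevice / PortAudio.
import sounddevice as sd

print(sd.query_devices())                         # every device with its index
print("Default input/output:", sd.default.device)
```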
Error: Various import errors or deprecation warnings
pip install numpy==1.26.4 transformers==4.49.0
Error: MLX is only supported on Apple Silicon
Use CPU Whisper instead:
- Comment out `mlx` and `mlx-whisper` in `requirements.txt`
- Uncomment `openai-whisper`
- Set `WHISPER_MODEL=base.en` in `.env`
If responses take too long (>30s):
- Use a smaller LLM: In LM Studio, try `qwen2.5-1.5b-instruct` or `phi-3-mini`
- Use a smaller Whisper model: Set `WHISPER_MODEL=base.en`
- Reduce search results: Set `SEARCH_LIMIT=3` in `.env`
If you run out of RAM:
- Close other applications
- Use smaller models:
  - LLM: 1-3B parameter models
  - Whisper: `base.en` or `small.en`
- Reduce chunk size: `CHUNK_SIZE=300`
Test each component independently to isolate issues:
# Test audio input/output
python scripts/test_audio.py
# Test voice activity detection
python scripts/test_vad.py
# Test speech-to-text
python scripts/test_stt.py
# Test text-to-speech
python scripts/test_tts.py
# Test LLM connection
python scripts/test_llm.py
# Test document search
python scripts/test_search.py "test query"Typical response times on Apple Silicon M1/M2:
| Stage | Time |
|---|---|
| Speech-to-Text | 1-3s |
| Vector Search | 0.1-0.3s |
| LLM Generation | 5-15s |
| Text-to-Speech | 2-5s |
| Total | 8-23s |
Times vary based on:
- Input audio length
- LLM model size and complexity
- Response length
- Hardware specifications
MIT License - See LICENSE file for details.
Contributions are welcome! Please open an issue or submit a pull request.
- Fork the repository
- Create a feature branch: `git checkout -b feature/my-feature`
- Commit changes: `git commit -am 'Add my feature'`
- Push to branch: `git push origin feature/my-feature`
- Open a Pull Request