Local Voice RAG Assistant

A fully local, privacy-first voice assistant that answers questions from your documents using Retrieval Augmented Generation (RAG). Speak naturally and get conversational responses based on your ingested PDFs.

Features

  • 100% Local: All processing happens on your machine - no cloud APIs required
  • Voice Interface: Speak questions naturally and hear spoken responses
  • RAG Pipeline: Retrieves relevant context from your documents before generating answers
  • Hybrid Search: Combines vector similarity with BM25 keyword search for better results
  • Apple Silicon Optimized: MLX-accelerated Whisper for fast speech recognition on Mac

Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Microphone │────▶│  Silero VAD │────▶│ MLX Whisper │
│   (Input)   │     │  (Detect)   │     │    (STT)    │
└─────────────┘     └─────────────┘     └─────────────┘
                                               │
                                               ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Speaker   │◀────│  Coqui TTS  │◀────│  LM Studio  │
│  (Output)   │     │  (Speak)    │     │    (LLM)    │
└─────────────┘     └─────────────┘     └─────────────┘
                                               ▲
                                               │
                           ┌───────────────────┴───────────────────┐
                           │              Weaviate                 │
                           │  ┌─────────────┐  ┌─────────────┐    │
                           │  │ BGE Vectors │  │ BM25 Index  │    │
                           │  └─────────────┘  └─────────────┘    │
                           └───────────────────────────────────────┘

Components

Component    Technology         Purpose
Audio I/O    sounddevice        Microphone input and speaker output
VAD          Silero VAD         Detects when you start/stop speaking
STT          MLX Whisper        Converts speech to text (Apple Silicon)
Embeddings   BGE-base-en-v1.5   Converts text to vectors for search
Vector DB    Weaviate           Stores and searches document chunks
LLM          LM Studio          Generates conversational responses
TTS          Coqui TTS          Converts response text to speech
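
Read end to end, these components form a single loop. The sketch below is illustrative only; the function names are placeholders, and the real implementation lives in src/assistant/pipeline.py:

# Illustrative loop, not the project's actual API
while True:
    audio = record_until_silence()       # sounddevice capture; Silero VAD decides when speech ends
    question = transcribe(audio)         # MLX Whisper (or CPU Whisper) speech-to-text
    context = hybrid_search(question)    # Weaviate: BGE vectors + BM25, top-k chunks
    answer = ask_llm(question, context)  # LM Studio generates a grounded response
    speak(answer)                        # Coqui TTS renders and plays the reply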

Prerequisites

All Platforms

  1. Python 3.10+ - Required for all dependencies
  2. Docker Desktop - For running Weaviate vector database
  3. LM Studio - For running local LLMs
  4. Git - For cloning the repository

macOS Additional Requirements

  • Xcode Command Line Tools: xcode-select --install
  • Homebrew (recommended): For installing system dependencies

Windows Additional Requirements

  • Visual Studio Build Tools: Required for compiling some Python packages
  • CUDA Toolkit (optional): For GPU acceleration with NVIDIA cards

Installation

macOS (Apple Silicon - Recommended)

This setup uses MLX for hardware-accelerated speech recognition.

1. Clone the Repository

git clone https://github.com/InsightStream-Dev/local-voice-rag.git
cd local-voice-rag

2. Create Virtual Environment

python3 -m venv venv
source venv/bin/activate

3. Install Dependencies

pip install --upgrade pip
pip install -r requirements.txt

Note: If you encounter numpy/transformers version conflicts, run:

pip install numpy==1.26.4 transformers==4.49.0

4. Start Weaviate (Vector Database)

docker compose -f docker/docker-compose.yml up -d

Verify it's running:

curl http://localhost:8081/v1/.well-known/ready

5. Install and Configure LM Studio

  1. Download from lmstudio.ai
  2. Launch LM Studio
  3. Download a model (recommended: qwen2.5-3b-instruct for speed)
  4. Go to Developer tab → Start the local server
  5. Ensure it's running on http://localhost:1234
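
Once the server is up, you can also check it from Python. The repo ships scripts/test_llm.py for this; the snippet below is just an independent sanity check, assuming the openai client package is installed and pointed at LM Studio's OpenAI-compatible endpoint:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="local-model",  # LM Studio serves whichever model is currently loaded
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
)
print(response.choices[0].message.content)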

6. Configure Environment

cp .env.example .env

Edit .env if needed (defaults work for most setups):

# Weaviate
WEAVIATE_URL=http://localhost:8081

# LM Studio
LMSTUDIO_URL=http://localhost:1234/v1
LMSTUDIO_MODEL=local-model

# Models
WHISPER_MODEL=mlx-community/whisper-large-v3-turbo
EMBEDDING_MODEL=BAAI/bge-base-en-v1.5
TTS_MODEL=tts_models/en/ljspeech/tacotron2-DDC
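
These variables are read at startup (configuration management lives in src/config.py). The snippet below is not the project's actual config code, just an illustration of how the values could be loaded with python-dotenv:

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory into the environment

WEAVIATE_URL = os.getenv("WEAVIATE_URL", "http://localhost:8081")
LMSTUDIO_URL = os.getenv("LMSTUDIO_URL", "http://localhost:1234/v1")
WHISPER_MODEL = os.getenv("WHISPER_MODEL", "mlx-community/whisper-large-v3-turbo")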

macOS (Intel) / Linux

Intel Macs and Linux systems cannot use MLX. Use CPU Whisper instead.

Follow steps 1-6 above, then:

7. Install CPU Whisper

Edit requirements.txt and uncomment the CPU fallback:

# openai-whisper>=20231117  ← Uncomment this line

Then install:

pip install openai-whisper

8. Configure for CPU

Edit .env:

WHISPER_MODEL=base.en  # or small.en, medium.en, large-v3
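
To sanity-check the CPU model outside the assistant, you can run openai-whisper directly; the audio file name below is just an example:

import whisper

model = whisper.load_model("base.en")    # downloaded on first use
result = model.transcribe("sample.wav")  # any short audio clip
print(result["text"])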

Windows

Windows setup requires additional steps for audio and some Python packages.

1. Install Prerequisites

  1. Python 3.10+: Download from python.org
    • Check "Add Python to PATH" during installation
  2. Docker Desktop: Download from docker.com
    • Enable WSL 2 backend when prompted
  3. Visual Studio Build Tools: Required for compiling some Python packages
  4. Git: Download from git-scm.com

2. Clone and Setup

Open PowerShell or Command Prompt:

git clone https://github.com/InsightStream-Dev/local-voice-rag.git
cd local-voice-rag

python -m venv venv
.\venv\Scripts\activate

pip install --upgrade pip
pip install -r requirements.txt

3. Windows-Specific Dependencies

For audio support, you may need to install PortAudio:

# Using pip (try this first)
pip install sounddevice

# If that fails, install manually:
# Download PortAudio from http://www.portaudio.com/download.html

4. Configure for CPU Whisper

Windows cannot use MLX. Edit requirements.txt:

# Comment out MLX lines:
# mlx>=0.12.0
# mlx-whisper>=0.4.0

# Uncomment CPU whisper:
openai-whisper>=20231117

Install:

pip install openai-whisper

5. Update Environment Configuration

Create a .env file (you can copy .env.example as a starting point):

WEAVIATE_URL=http://localhost:8081
LMSTUDIO_URL=http://localhost:1234/v1
LMSTUDIO_MODEL=local-model

# Use CPU Whisper model
WHISPER_MODEL=base.en

6. Start Weaviate

docker compose -f docker/docker-compose.yml up -d

7. Install and Start LM Studio

  1. Download from lmstudio.ai
  2. Install and launch
  3. Download a model (e.g., qwen2.5-3b-instruct)
  4. Start the local server on port 1234

Usage

1. Ingest Documents

First, add your PDF documents to the knowledge base:

# Activate virtual environment
source venv/bin/activate  # macOS/Linux
.\venv\Scripts\activate   # Windows

# Ingest a PDF
python scripts/ingest_document.py /path/to/your/document.pdf

Example output:

Ingesting: /path/to/document.pdf
  Extracted 50 pages
  Created 120 chunks
  Embedded and stored in Weaviate
Done! Document ingested successfully.
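
Internally the ingestion pipeline (src/rag/ingest.py) chunks the extracted text, embeds each chunk with BGE, and stores the results in Weaviate. A rough, simplified sketch of the chunk-and-embed stage, assuming word-based splitting and a sentence-transformers BGE model (the project's own chunker is token-based; see CHUNK_SIZE and CHUNK_OVERLAP below):

from sentence_transformers import SentenceTransformer

def chunk_text(text, chunk_size=500, overlap=50):
    # Simplified word-based splitting with overlap between consecutive chunks
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

embedder = SentenceTransformer("BAAI/bge-base-en-v1.5")
text = "Replace this with text extracted from your PDF."
chunks = chunk_text(text)
vectors = embedder.encode(chunks)  # one vector per chunk, stored alongside it in Weaviate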

2. Test Search (Optional)

Verify your documents are searchable:

python scripts/test_search.py "your search query"

3. Run the Voice Assistant

python -m src.assistant.pipeline

Output:

Warming up voice assistant...
  Loading VAD...
  Loading STT (Whisper)...
  Loading TTS...
  Connecting to LLM...
    Model: qwen2.5-3b-instruct
  Connecting to vector database...
  Initializing audio...
Voice assistant ready!

==================================================
Voice Assistant Active
Speak to ask questions about your documents.
Press Ctrl+C to exit.
==================================================

[Listening...]

Now speak your question! The assistant will:

  1. Detect when you start speaking
  2. Transcribe your speech
  3. Search for relevant document chunks
  4. Generate a response using the LLM
  5. Speak the answer back to you

Project Structure

local-voice-rag/
├── docker/
│   └── docker-compose.yml    # Weaviate configuration
├── docs/                     # Your PDF documents
├── scripts/
│   ├── ingest_document.py    # PDF ingestion CLI
│   ├── test_search.py        # Test vector search
│   ├── test_llm.py           # Test LM Studio connection
│   ├── test_stt.py           # Test speech-to-text
│   ├── test_tts.py           # Test text-to-speech
│   ├── test_vad.py           # Test voice activity detection
│   └── test_audio.py         # Test audio I/O
├── src/
│   ├── audio/
│   │   ├── microphone.py     # Async microphone capture
│   │   ├── speaker.py        # Audio playback
│   │   └── vad.py            # Silero VAD wrapper
│   ├── stt/
│   │   ├── whisper_mlx.py    # MLX Whisper (Apple Silicon)
│   │   └── whisper_cpu.py    # CPU Whisper fallback
│   ├── tts/
│   │   └── coqui.py          # Coqui TTS wrapper
│   ├── rag/
│   │   ├── embeddings.py     # BGE embeddings
│   │   ├── weaviate_client.py # Weaviate connection
│   │   ├── chunker.py        # Document chunking
│   │   ├── ingest.py         # PDF ingestion pipeline
│   │   └── search.py         # Hybrid search
│   ├── llm/
│   │   └── lmstudio.py       # LM Studio client
│   ├── assistant/
│   │   ├── pipeline.py       # Main voice assistant
│   │   └── prompts.py        # System prompts
│   └── config.py             # Configuration management
├── requirements.txt
├── .env.example
└── README.md

Configuration Reference

All settings can be configured via environment variables or the .env file:

Variable                 Default                                 Description
WEAVIATE_URL             http://localhost:8081                   Weaviate HTTP endpoint
LMSTUDIO_URL             http://localhost:1234/v1                LM Studio API endpoint
LMSTUDIO_MODEL           local-model                             Model identifier (auto-detected)
WHISPER_MODEL            mlx-community/whisper-large-v3-turbo    Speech recognition model
EMBEDDING_MODEL          BAAI/bge-base-en-v1.5                   Text embedding model
TTS_MODEL                tts_models/en/ljspeech/tacotron2-DDC    Text-to-speech model
AUDIO_SAMPLE_RATE        16000                                   Audio sample rate (Hz)
VAD_THRESHOLD            0.5                                     Voice activity detection sensitivity
VAD_SILENCE_DURATION_MS  700                                     Silence before speech end (ms)
CHUNK_SIZE               500                                     Tokens per document chunk
CHUNK_OVERLAP            50                                      Overlap between chunks (tokens)
SEARCH_LIMIT             5                                       Number of search results
SEARCH_ALPHA             0.5                                     Hybrid search balance (0 = BM25, 1 = vector)
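
SEARCH_ALPHA blends the two rankers at query time. Below is a minimal sketch of a hybrid query with the v4 weaviate-client, assuming a locally computed BGE query vector; the collection name is hypothetical and not the project's actual schema:

import weaviate
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-base-en-v1.5")
client = weaviate.connect_to_local(port=8081)  # assumes the default gRPC port (50051) is also exposed

chunks = client.collections.get("DocumentChunk")  # hypothetical collection name
query = "what does the warranty cover?"
results = chunks.query.hybrid(
    query=query,                             # keyword (BM25) side
    vector=embedder.encode(query).tolist(),  # vector side
    alpha=0.5,                               # 0 = pure BM25, 1 = pure vector
    limit=5,
)
for obj in results.objects:
    print(obj.properties)

client.close()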

Troubleshooting

Weaviate Connection Issues

Error: Connection refused or Cannot connect to Weaviate

  1. Check if Docker is running:

    docker ps
  2. Check Weaviate container status:

    docker compose -f docker/docker-compose.yml logs
  3. Verify Weaviate is healthy:

    curl http://localhost:8081/v1/.well-known/ready

LM Studio Not Responding

Error: LM Studio is not available

  1. Ensure LM Studio is running
  2. Check the Developer tab - server must be started
  3. Verify a model is loaded
  4. Test the connection:
    curl http://localhost:1234/v1/models

Audio Device Issues

Error: No input device found or PortAudio error

macOS:

  • Grant microphone permission in System Preferences → Security & Privacy → Microphone

Windows:

  • Check Windows audio settings
  • Try installing PortAudio manually
  • Run as Administrator if permission issues persist

Numpy/Transformers Version Conflict

Error: Various import errors or deprecation warnings

pip install numpy==1.26.4 transformers==4.49.0

MLX Not Available (Non-Apple Silicon)

Error: MLX is only supported on Apple Silicon

Use CPU Whisper instead:

  1. Comment out mlx and mlx-whisper in requirements.txt
  2. Uncomment openai-whisper
  3. Set WHISPER_MODEL=base.en in .env

Slow Response Times

If responses take too long (>30s):

  1. Use a smaller LLM: In LM Studio, try qwen2.5-1.5b-instruct or phi-3-mini
  2. Use a smaller Whisper model: Set WHISPER_MODEL=base.en
  3. Reduce search results: Set SEARCH_LIMIT=3 in .env

Out of Memory

If you run out of RAM:

  1. Close other applications
  2. Use smaller models:
    • LLM: 1-3B parameter models
    • Whisper: base.en or small.en
  3. Reduce chunk size: CHUNK_SIZE=300

Testing Individual Components

Test each component independently to isolate issues:

# Test audio input/output
python scripts/test_audio.py

# Test voice activity detection
python scripts/test_vad.py

# Test speech-to-text
python scripts/test_stt.py

# Test text-to-speech
python scripts/test_tts.py

# Test LLM connection
python scripts/test_llm.py

# Test document search
python scripts/test_search.py "test query"

Performance Benchmarks

Typical response times on Apple Silicon M1/M2:

Stage            Time
Speech-to-Text   1-3s
Vector Search    0.1-0.3s
LLM Generation   5-15s
Text-to-Speech   2-5s
Total            8-23s

Times vary based on:

  • Input audio length
  • LLM model size and complexity
  • Response length
  • Hardware specifications

License

MIT License - See LICENSE file for details.


Contributing

Contributions are welcome! Please open an issue or submit a pull request.

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Commit changes: git commit -am 'Add my feature'
  4. Push to branch: git push origin feature/my-feature
  5. Open a Pull Request
