🎓 EduRAG Voice Assistant

An AI-powered educational voice assistant that answers questions from your knowledge base using Retrieval-Augmented Generation.

🌟 What is this?

EduRAG ingests your educational documents (NCERT textbooks, notes, PDFs converted to text, and so on), creates semantic embeddings, and lets you ask questions by voice. A LiveKit-powered voice agent retrieves the most relevant passages and answers in natural language, either in English or in a bilingual Hindi/English mode.

Key Features

| Feature | Description |
| --- | --- |
| 🔍 Semantic Search | Find answers by meaning, not just keywords; powered by `all-MiniLM-L6-v2` sentence-transformers |
| 🎤 Voice Agent | Talk to your knowledge base using LiveKit real-time voice infrastructure |
| 🌐 Bilingual | English-only or Hindi/English code-switching agent |
| 📄 Auto-Ingest | Drop `.txt`, `.md`, `.py`, `.json`, or `.csv` files into `documents/` and ingest with one command |
| 🔒 Change Detection | MD5 hashing skips re-processing unchanged files |
| 💾 Local-first | SQLite database; no external vector DB required |
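
The Change Detection feature can be illustrated with a short sketch. This is not the project's actual code: the helper names `file_md5` and `needs_ingest` and the in-memory `known_hashes` mapping are assumptions made for illustration, and the real package presumably stores its hashes in SQLite.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, block_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, read in blocks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def needs_ingest(path: Path, known_hashes: dict[str, str]) -> bool:
    """Return True if the file is new or its content changed since the last call.

    `known_hashes` is a hypothetical {filename: md5} mapping used only in
    this sketch; it is updated in place when a change is detected.
    """
    current = file_md5(path)
    if known_hashes.get(path.name) == current:
        return False
    known_hashes[path.name] = current
    return True
```

Calling `needs_ingest` twice on an unchanged file returns `True` and then `False`, which is the behaviour that lets a repeated ingest skip files it has already processed.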

🏗️ Architecture

graph LR
    subgraph Client
        A[🎤 User Voice]
    end

    subgraph LiveKit Cloud
        B[WebRTC Room]
    end

    subgraph Voice Agent
        C[English Agent]
        D[Hindi/English Agent]
    end

    subgraph RAG Engine
        E[Embeddings<br/>all-MiniLM-L6-v2]
        F[Text Chunking<br/>Sentence Boundaries]
        G[SQLite Database]
    end

    subgraph AI Services
        H[Google Gemini<br/>LLM]
        I[Deepgram<br/>STT]
        J[Cartesia<br/>English TTS]
        K[Sarvam AI<br/>Hindi STT & TTS]
    end

    A <-->|WebRTC| B
    B <--> C
    B <--> D
    C --> E
    D --> E
    E <--> G
    F --> G
    C <--> H
    D <--> H
    C <--> I
    C <--> J
    D <--> K

Tech Stack: Python 3.10+ · LiveKit Agents · Sentence-Transformers · Deepgram · Google Gemini · Cartesia TTS · Sarvam AI · SQLite
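
The retrieval path in the diagram (agent, then embeddings, then SQLite) can be sketched as a brute-force cosine-similarity search. Everything below is an assumption for illustration: the `chunks` table layout, the JSON-encoded vectors, and the function names are hypothetical, and toy 3-dimensional vectors stand in for the 384-dimensional vectors that `all-MiniLM-L6-v2` produces. The defaults mirror `RAG_TOP_K` and `RAG_SIMILARITY_THRESHOLD` from the Configuration section.

```python
import json
import math
import sqlite3


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(conn, query_vec, top_k=5, threshold=0.3):
    """Score every stored chunk, drop those below threshold, return the best top_k."""
    rows = conn.execute("SELECT text, embedding FROM chunks").fetchall()
    scored = [(cosine(query_vec, json.loads(emb)), text) for text, emb in rows]
    hits = [pair for pair in scored if pair[0] >= threshold]
    return sorted(hits, reverse=True)[:top_k]


# Hypothetical schema: one row per chunk, embedding stored as a JSON list.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (text TEXT, embedding TEXT)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [
        ("Photosynthesis converts light into chemical energy.", json.dumps([0.9, 0.1, 0.0])),
        ("The French Revolution began in 1789.", json.dumps([0.0, 0.2, 0.9])),
    ],
)
results = search(conn, [1.0, 0.0, 0.0])  # list of (score, text) pairs
```

A linear scan like this is a reasonable fit for a local-first design with a modest number of chunks; it trades query speed for having no external vector database to run.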

🤖 AI Service Providers

The project uses a modular provider architecture. Here are the current and alternative options:

LLM (Large Language Model)

| Provider | Model | Used In | API Key Env |
| --- | --- | --- | --- |
| Google Gemini ✅ | `gemini-2.5-flash` | Both agents | `GOOGLE_API_KEY` |
| Groq | `llama-3.3-70b`, `mixtral-8x7b` | None (alternative) | `GROQ_API_KEY` |
| OpenAI | `gpt-4o`, `gpt-4o-mini` | None (alternative) | `OPENAI_API_KEY` |
| Anthropic | `claude-sonnet-4-20250514` | None (alternative) | `ANTHROPIC_API_KEY` |

STT (Speech-to-Text)

| Provider | Model | Used In | API Key Env |
| --- | --- | --- | --- |
| Groq ✅ | `whisper-large-v3-turbo` | English agent | `GROQ_API_KEY` |
| Sarvam AI ✅ | `saarika:v2.5` | Hindi agent | `SARVAM_API_KEY` |
| Deepgram | `nova-2` | None (alternative) | `DEEPGRAM_API_KEY` |
| Google Cloud | `chirp` | None (alternative) | `GOOGLE_API_KEY` |

TTS (Text-to-Speech)

| Provider | Model / Voice | Used In | API Key Env |
| --- | --- | --- | --- |
| Deepgram ✅ | `aura-2-thalia-en` | English agent | `DEEPGRAM_API_KEY` |
| Sarvam AI ✅ | `anushka` (Hindi) | Hindi agent | `SARVAM_API_KEY` |
| Cartesia | `sonic-2` | None (alternative) | `CARTESIA_API_KEY` |
| ElevenLabs | `eleven_multilingual_v2` | None (alternative) | `ELEVENLABS_API_KEY` |

VAD (Voice Activity Detection)

| Provider | Model | Used In |
| --- | --- | --- |
| Silero ✅ | `silero-vad` | Both agents (runs locally, no API key needed) |

✅ = currently configured in the codebase. To switch providers, update the `entrypoint()` function in the respective agent file.
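
Because each agent depends on a different set of provider keys, a quick pre-flight check can save a failed start. The sketch below is illustrative only: `REQUIRED_KEYS` and `missing_keys` are hypothetical names, the key lists are read from the ✅ rows of the tables above, and LiveKit's own credentials are left out because their variable names are not listed here.

```python
import os

# Env variable names copied from the ✅ rows of the provider tables.
REQUIRED_KEYS = {
    "english": ["GOOGLE_API_KEY", "GROQ_API_KEY", "DEEPGRAM_API_KEY"],
    "hindi": ["GOOGLE_API_KEY", "SARVAM_API_KEY"],
}


def missing_keys(language: str, env=None) -> list[str]:
    """Return the provider keys that are unset or empty for the given agent."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS[language] if not env.get(name)]
```

For example, `missing_keys("hindi")` returns an empty list once both `GOOGLE_API_KEY` and `SARVAM_API_KEY` are set in the environment.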

📦 Project Structure

Rag-Mcp/
├── rag_mcp/                  # Core Python package
│   ├── config.py             #   Centralized config (frozen dataclass)
│   ├── database.py           #   SQLite layer (context-managed)
│   ├── embeddings.py         #   Embedding model (lazy-loaded)
│   ├── chunking.py           #   Sentence-boundary text chunking
│   ├── rag_engine.py         #   MCPRAGTool — high-level RAG API
│   ├── cli.py                #   CLI (ingest / search / stats / list)
│   └── agents/               #   LiveKit voice agents
│       ├── base.py           #     Shared logic + search tool
│       ├── english_agent.py  #     English-only agent
│       └── multi_agent.py    #     Hindi/English bilingual agent
├── documents/                # Knowledge base (NCERT textbooks)
├── voice-agent.py            # ★ Primary entry point
├── pyproject.toml            # Project metadata & tool config
├── requirements.txt          # pip dependencies
├── .env.example              # API key template
└── .gitignore

Note

The `documents/` folder currently contains textbooks from NCERT (National Council of Educational Research and Training) covering subjects such as Science, Maths, History, Economics, English Literature, and Business Studies. To build a custom knowledge base, replace these files or add your own `.txt`, `.md`, or `.csv` files.
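
The tree above describes `chunking.py` as sentence-boundary chunking. A minimal sketch of that idea follows; it is not the project's implementation. It assumes that chunk size and overlap are measured in characters (the README does not say), and `chunk_text` is a hypothetical name.

```python
import re


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Pack whole sentences into chunks that target chunk_size characters.

    Each new chunk begins with the last `overlap` characters of the previous
    chunk so context carries across the boundary. A chunk can exceed
    chunk_size when the carried-over text or a single long sentence does
    not fit, because sentences are never split.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > chunk_size:
            chunks.append(current)
            tail = current[-overlap:] if overlap > 0 else ""
            current = f"{tail} {sentence}".strip()
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

The overlap here is a plain character slice, so it may begin mid-word; a production version might instead carry over whole trailing sentences.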

🚀 Getting Started

Prerequisites

- Python 3.10+
- API keys for LiveKit, Deepgram, and Google AI; optionally Cartesia, Sarvam AI, and Groq

Installation

# 1. Clone the repository
git clone https://github.com/0xMihirK/EduRAG.git
cd EduRAG

# 2. Create a virtual environment
python -m venv .venv
.venv\Scripts\activate        # Windows
# source .venv/bin/activate   # macOS / Linux

# 3. Install dependencies
pip install -r requirements.txt

# 4. Download required model files
python voice-agent.py download-files

# 5. Set up API keys
copy .env.example .env        # Windows
# cp .env.example .env        # macOS / Linux
# Then edit .env with your actual keys

📖 Usage

🎤 Voice Agent

Start the voice assistant with LiveKit:

# Download model files (first time only)
python voice-agent.py download-files

# English agent (default)
python voice-agent.py dev

# Hindi / English bilingual agent
python voice-agent.py --language hindi dev

Then connect to the LiveKit room via the LiveKit Playground or your own frontend.

🔧 CLI — Document Management

# Ingest all documents from the documents/ folder
python -m rag_mcp --action ingest

# Semantic search
python -m rag_mcp --action search --query "Explain photosynthesis"

# Database statistics
python -m rag_mcp --action stats

# List ingested documents
python -m rag_mcp --action list
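
For readers curious how a command line like this can be parsed, here is an `argparse` sketch that accepts the same `--action` and `--query` flags. It is an assumption about the shape of the interface, not a copy of `cli.py`; `build_parser` and `parse_args` are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the --action / --query flags shown above."""
    parser = argparse.ArgumentParser(prog="rag_mcp")
    parser.add_argument(
        "--action",
        required=True,
        choices=["ingest", "search", "stats", "list"],
    )
    parser.add_argument("--query", help="search text; only used with --action search")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv and reject a search that has no query text."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.action == "search" and not args.query:
        parser.error("--query is required when --action is search")
    return args
```

Passing `["--action", "search", "--query", "Explain photosynthesis"]` yields a namespace with `action` and `query` set, while an unknown action makes `argparse` exit with a usage message.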

⚙️ Configuration

All settings can be overridden via environment variables:

| Setting | Env Variable | Default |
| --- | --- | --- |
| Database path | `RAG_DB_PATH` | `rag_database.db` |
| Embedding model | `RAG_MODEL_NAME` | `all-MiniLM-L6-v2` |
| Documents dir | `RAG_DOCUMENTS_DIR` | `documents` |
| Chunk size | `RAG_CHUNK_SIZE` | `500` |
| Chunk overlap | `RAG_CHUNK_OVERLAP` | `50` |
| Similarity threshold | `RAG_SIMILARITY_THRESHOLD` | `0.3` |
| Top-K results | `RAG_TOP_K` | `5` |

See .env.example for the full list of required API keys.
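
The project structure describes `config.py` as a frozen dataclass. One way such a class could read the variables listed above is sketched below. The class name `RAGConfig`, its field names, and the `from_env` helper are assumptions for illustration; only the environment variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RAGConfig:
    """Settings from the Configuration table; field names are illustrative."""

    db_path: str = "rag_database.db"
    model_name: str = "all-MiniLM-L6-v2"
    documents_dir: str = "documents"
    chunk_size: int = 500
    chunk_overlap: int = 50
    similarity_threshold: float = 0.3
    top_k: int = 5

    @classmethod
    def from_env(cls, env=None) -> "RAGConfig":
        """Build a config, letting RAG_* variables override the defaults."""
        env = os.environ if env is None else env
        return cls(
            db_path=env.get("RAG_DB_PATH", cls.db_path),
            model_name=env.get("RAG_MODEL_NAME", cls.model_name),
            documents_dir=env.get("RAG_DOCUMENTS_DIR", cls.documents_dir),
            chunk_size=int(env.get("RAG_CHUNK_SIZE", cls.chunk_size)),
            chunk_overlap=int(env.get("RAG_CHUNK_OVERLAP", cls.chunk_overlap)),
            similarity_threshold=float(
                env.get("RAG_SIMILARITY_THRESHOLD", cls.similarity_threshold)
            ),
            top_k=int(env.get("RAG_TOP_K", cls.top_k)),
        )
```

Freezing the dataclass means a loaded configuration cannot be mutated by accident at runtime; changing a setting requires building a new instance.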

🤝 Contributing

1. Fork the repository.
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Commit your changes: `git commit -m "Add amazing feature"`
4. Push to the branch: `git push origin feature/amazing-feature`
5. Open a Pull Request.

🙏 Acknowledgements

- Sarvam AI: Hindi speech-to-text (`saarika:v2.5`) and text-to-speech models powering the bilingual agent
- LiveKit: Real-time voice infrastructure
- Deepgram: English speech-to-text and text-to-speech (aura-2)
- Google Gemini: Large language model for response generation
- Groq: Ultra-fast Whisper STT inference
- Sentence-Transformers: Semantic embedding models

📄 License

This project is licensed under the MIT License — see the LICENSE file for details.

Built with ❤️ for education
