
πŸ” Knowledge Assistant β€” Enterprise RAG System


A production-grade Retrieval-Augmented Generation (RAG) system that answers questions grounded in your private document corpus β€” not the open internet.

Features Β· Architecture Β· Quickstart Β· Usage Β· Configuration Β· API Reference Β· Roadmap


πŸ“Œ Overview

Knowledge Assistant is an end-to-end RAG pipeline that lets you ask natural language questions against your own documents (PDFs, DOCX, TXT, Markdown) and receive grounded, source-cited answers powered by LLMs.

Built for enterprise use cases β€” private documentation, policy libraries, technical knowledge bases β€” where accuracy and traceability matter more than generality.

Key principle: The system answers from your documents first. When the answer isn't in your corpus, it says so β€” no hallucination.


✨ Features

  • πŸ“„ Multi-format ingestion β€” PDF, DOCX, TXT, Markdown via LangChain document loaders
  • βœ‚οΈ Intelligent chunking β€” Configurable RecursiveCharacterTextSplitter with overlap for context continuity
  • 🧠 Pluggable embeddings β€” Sentence Transformers (local), OpenAI, or Cohere embeddings
  • πŸ—ƒοΈ Swappable vector stores β€” ChromaDB (default, persisted), FAISS, Pinecone
  • πŸ€– Flexible LLM backends β€” Ollama (Mistral / Llama 3, fully local), OpenAI GPT-4, Anthropic Claude
  • 🌐 REST API β€” FastAPI with async endpoints, ready for integration
  • 🐳 Dockerized β€” Single docker-compose up launches the API + Ollama service
  • πŸ“Š Retrieval metrics β€” Retrieval precision benchmarking and qualitative analysis utilities
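The chunk-overlap idea behind the "intelligent chunking" feature can be illustrated with a minimal sketch. This is a deliberate simplification of what LangChain's RecursiveCharacterTextSplitter does (the real splitter prefers paragraph and sentence boundaries); `chunk_size` and `chunk_overlap` mirror the keys in `config.yaml`:

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks whose tails overlap, so a sentence
    cut at one chunk boundary still appears intact in the next chunk."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = chunk_text("a" * 1200)
print(len(chunks))  # 3 chunks covering [0:500], [450:950], [900:1200]
```

The overlap is what the feature list calls "context continuity": each chunk repeats the last 50 characters of its predecessor, so retrieval never loses a sentence that straddles a boundary.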

πŸ—οΈ Architecture

                        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                        β”‚         INGESTION PIPELINE       β”‚
                        β”‚                                  β”‚
  Raw Documents ──────► β”‚  Load β†’ Chunk β†’ Embed β†’ Store   β”‚
  (PDF/DOCX/TXT)        β”‚                                  β”‚
                        β”‚  run_ingest.py                   β”‚
                        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                       β”‚  persisted vectors
                                       β–Ό
                               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                               β”‚   Vector DB   β”‚
                               β”‚  (ChromaDB /  β”‚
                               β”‚    FAISS)     β”‚
                               β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
                                       β”‚  similarity search (top-k)
                        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                        β”‚         QUERY PIPELINE           β”‚
                        β”‚                                  β”‚
  User Query ─────────► β”‚  Embed Query                     β”‚
                        β”‚  β†’ Retrieve Chunks               β”‚
                        β”‚  β†’ Build Prompt                  β”‚
                        β”‚  β†’ LLM (Ollama / GPT / Claude)   β”‚
                        β”‚  β†’ Return Answer + Sources       β”‚
                        β”‚                                  β”‚
                        β”‚  app.py  (FastAPI)               β”‚
                        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
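The "Build Prompt" step in the query pipeline amounts to wrapping the retrieved chunks in a grounding instruction. A minimal sketch of the idea (the actual template lives in `src/rag/` and may differ):

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Inject retrieved chunks as numbered, source-tagged context and
    instruct the model to answer only from that context."""
    context = "\n\n".join(
        f"[{i + 1}] ({c['source']}) {c['text']}" for i, c in enumerate(chunks)
    )
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What is the refund window?",
    [{"source": "policy_handbook.pdf", "text": "Refunds are accepted within 30 days."}],
)
print(prompt)
```

The explicit "say you don't know" instruction is what backs the key principle above: when the corpus lacks an answer, the model is told to admit it rather than improvise.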

Embedding Model

all-MiniLM-L6-v2 (default) β€” 384-dimensional vectors, fast, runs fully offline via Sentence Transformers.

Retrieval Strategy

Cosine similarity search over the vector store. Top-k chunks (configurable) are injected into the LLM prompt as grounding context.
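In pure Python, this retrieval step reduces to ranking stored vectors by cosine similarity against the query embedding. The sketch below is conceptual only; in the real pipeline the search is delegated to ChromaDB or FAISS, which index the vectors instead of scanning them:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 4) -> list[str]:
    """store: (chunk_text, embedding) pairs. Returns the k most similar chunks."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

store = [
    ("refund policy", [1.0, 0.0]),
    ("onboarding",    [0.0, 1.0]),
    ("returns",       [0.9, 0.1]),
]
print(top_k([1.0, 0.1], store, k=2))  # ['returns', 'refund policy']
```

The default `top_k: 4` in the config trades context breadth against prompt length: more chunks give the LLM more evidence but consume more of its context window.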


πŸš€ Quickstart

Prerequisites

  • Python 3.10+
  • Docker + Docker Compose
  • Ollama (for local LLM inference)

1. Clone the repo

git clone https://github.com/Sumit1673/knowledge-assistant.git
cd knowledge-assistant

2. Install dependencies

pip install -r requirements.txt

3. Start Ollama and pull the model

# Install Ollama (macOS / Linux)
curl -fsSL https://ollama.com/install.sh | sh

# In terminal 1 β€” start the Ollama server
ollama serve

# In terminal 2 β€” pull Mistral (used by default)
ollama pull mistral

4. Configure the system

cp config/config.example.yaml config/config.yaml
# Edit config.yaml to point to your documents directory and set your preferences

5. Ingest your documents

python run_ingest.py

This reads documents from the configured source_dir, chunks them, generates embeddings, and persists the vector store to output/vector_store/.
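Conceptually, the ingestion script performs a load β†’ chunk β†’ embed β†’ store loop over the source directory. A toy end-to-end sketch of that loop, with a deterministic stand-in for the embedding model (the real pipeline uses LangChain loaders, Sentence Transformers, and a persisted ChromaDB collection):

```python
from pathlib import Path

def fake_embed(text: str, dim: int = 8) -> list[float]:
    """Deterministic stand-in for a real embedding model (illustration only)."""
    return [float(sum(ord(c) for c in text[i::dim]) % 97) for i in range(dim)]

def ingest(source_dir: str, chunk_size: int = 500) -> list[dict]:
    """Load .txt files, split into chunks, embed each chunk, collect records."""
    records = []
    for path in sorted(Path(source_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        for i in range(0, len(text), chunk_size):
            chunk = text[i:i + chunk_size]
            records.append({"source": path.name, "text": chunk, "embedding": fake_embed(chunk)})
    return records
```

Each record keeps the source filename alongside the chunk, which is what makes the source-cited answers in the query response possible.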

6. Start the API server

uvicorn app:app --host 0.0.0.0 --port 8000 --reload

Or with Docker:

docker-compose up --build

The API will be available at http://localhost:8000. Interactive docs at http://localhost:8000/docs.


🐳 Docker Setup

Docker Compose runs two services:

| Service   | Description                            |
|-----------|----------------------------------------|
| `rag_api` | FastAPI application (port 8000)        |
| `ollama`  | Local LLM inference server (port 11434) |

Note: Document ingestion (run_ingest.py) is intentionally separate from the API startup to keep cold-start time fast. Run it manually after adding new documents, or automate it via a file-upload trigger or cron job.

# Start all services
docker-compose up -d

# Ingest documents (run once, or after adding new docs)
docker-compose exec rag_api python run_ingest.py

# View logs
docker-compose logs -f rag_api

# Stop
docker-compose down

πŸ’¬ Usage

Via REST API

Ask a question:

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the refund policy?"}'

Response:

{
  "answer": "The refund policy allows returns within 30 days of purchase...",
  "sources": [
    {"document": "policy_handbook.pdf", "page": 12, "score": 0.91}
  ]
}

Via Python client

import requests

response = requests.post(
    "http://localhost:8000/query",
    json={"question": "Summarize the onboarding process"}
)
print(response.json()["answer"])

βš™οΈ Configuration

Edit config/config.yaml to customise the pipeline:

# Document ingestion
ingestion:
  source_dir: "data/documents"       # Folder with your source documents
  chunk_size: 500                    # Characters per chunk
  chunk_overlap: 50                  # Overlap between consecutive chunks

# Embeddings
embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"   # or "openai", "cohere"

# Vector store
vector_store:
  type: "chroma"                     # "chroma" | "faiss" | "pinecone"
  persist_dir: "output/vector_store"

# LLM
llm:
  provider: "ollama"                 # "ollama" | "openai" | "anthropic"
  model: "mistral"                   # Model name for the chosen provider
  temperature: 0.1

# Retrieval
retrieval:
  top_k: 4                           # Number of chunks to retrieve per query
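One way application code can consume this file is to map the parsed YAML onto typed settings objects with validation. A sketch, assuming the file has already been parsed into a dict (e.g. via `yaml.safe_load`); the dataclass names here are illustrative, not the repo's actual classes, but the keys and defaults match the file above:

```python
from dataclasses import dataclass

@dataclass
class LLMSettings:
    provider: str = "ollama"
    model: str = "mistral"
    temperature: float = 0.1

@dataclass
class RetrievalSettings:
    top_k: int = 4

def load_settings(raw: dict) -> tuple[LLMSettings, RetrievalSettings]:
    """raw: the dict a YAML parser produced from config.yaml.
    Missing sections fall back to the defaults shown in the example config."""
    llm = LLMSettings(**raw.get("llm", {}))
    retrieval = RetrievalSettings(**raw.get("retrieval", {}))
    if retrieval.top_k < 1:
        raise ValueError("retrieval.top_k must be >= 1")
    return llm, retrieval

llm, retrieval = load_settings({"llm": {"provider": "openai", "model": "gpt-4"}, "retrieval": {"top_k": 6}})
print(llm.provider, retrieval.top_k)  # openai 6
```

Validating once at startup keeps a bad `top_k` or provider name from surfacing as a confusing failure deep inside the query pipeline.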

πŸ“‚ Project Structure

knowledge-assistant/
β”œβ”€β”€ app.py                  # FastAPI application & query endpoint
β”œβ”€β”€ run_ingest.py           # Document ingestion pipeline (run once / on update)
β”œβ”€β”€ config/
β”‚   └── config.yaml         # Runtime configuration
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ ingestion/          # Document loaders, text splitters
β”‚   β”œβ”€β”€ embeddings/         # Embedding model wrappers
β”‚   β”œβ”€β”€ vector_store/       # VectorStoreManager (Chroma / FAISS)
β”‚   β”œβ”€β”€ llm/                # LLM handler (Ollama / OpenAI / Anthropic)
β”‚   └── rag/                # RAGQueryHandler β€” retrieval + prompt + generation
β”œβ”€β”€ data/
β”‚   └── documents/          # ← Drop your source documents here
β”œβ”€β”€ output/
β”‚   └── vector_store/       # Auto-generated persisted vector store
β”œβ”€β”€ docker-compose.yml
β”œβ”€β”€ Dockerfile
└── requirements.txt

πŸ“‘ API Reference

| Method | Endpoint  | Description                                 |
|--------|-----------|---------------------------------------------|
| POST   | `/query`  | Ask a question against the document corpus  |
| GET    | `/health` | Health check                                |
| GET    | `/docs`   | Interactive Swagger UI                      |

Full schema available at /docs when the server is running.


πŸ”­ Roadmap

  • Streaming responses via Server-Sent Events
  • File upload endpoint that auto-triggers ingestion
  • Hybrid search (dense + BM25 sparse retrieval)
  • Multi-tenant document namespacing
  • Evaluation harness (RAGAS metrics: faithfulness, context recall)
  • HuggingFace Spaces demo with Groq backend

🀝 Contributing

Contributions, issues and feature requests are welcome. Please open an issue first to discuss what you'd like to change.

  1. Fork the repo
  2. Create a feature branch (git checkout -b feature/your-feature)
  3. Commit your changes (git commit -m 'Add your feature')
  4. Push to the branch (git push origin feature/your-feature)
  5. Open a Pull Request

πŸ“„ License

Distributed under the MIT License. See LICENSE for details.


πŸ‘€ Author

Sumit Vaise β€” Senior ML Engineer

LinkedIn GitHub Portfolio


If this project was useful, consider giving it a ⭐
