RAG Knowledge Assistant

A Python-based Retrieval-Augmented Generation (RAG) system that answers user queries using private document collections while minimizing hallucinations.

πŸš€ Live Demo

Deploy on Streamlit Cloud: Streamlit App

Features

  • Document Upload & Indexing: Support for PDF, DOCX, TXT files
  • Semantic Search: Vector-based similarity search using FAISS
  • Grounded Answers: LLM responses with source citations
  • Hallucination Mitigation: Strict grounding in retrieved documents (see the prompt sketch after this list)
  • Interactive UI: Streamlit-based chat interface
  • Cloud Deployment: Ready for Streamlit Cloud deployment
  • Multiple AI Models: OpenRouter integration with 50+ models
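
The grounding behavior listed above can be pictured with a prompt template along these lines. This is only a sketch; the chunk fields ("source", "text") and the exact wording are assumptions, not the repository's actual template.

GROUNDED_PROMPT = """Answer the question using ONLY the context below.
If the context does not contain the answer, say so instead of guessing.
Cite the source file for every statement.

Context:
{context}

Question: {question}
Answer (with citations):"""

def build_prompt(question, chunks):
    # Each chunk is assumed to be a dict carrying its source filename for citation.
    context = "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return GROUNDED_PROMPT.format(context=context, question=question)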

πŸš€ Quick Deploy on Streamlit Cloud

  1. Fork this repository to your GitHub account

  2. Deploy on Streamlit Cloud:

    • Go to share.streamlit.io
    • Connect your GitHub account
    • Select this repository
    • Set main file path: streamlit_app.py
    • Add your OpenRouter API key in Streamlit secrets
  3. Configure Secrets in Streamlit Cloud:

    OPENROUTER_API_KEY = "your_openrouter_api_key_here"
    OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
    LLM_MODEL = "meta-llama/llama-3.2-3b-instruct:free"

πŸ’» Local Development

  1. Clone and Setup

    git clone https://github.com/Sane219/RAD-Knowledge-Assistant.git
    cd RAD-Knowledge-Assistant
    pip install -r requirements.txt
  2. Configure Environment

    cp .env.example .env
    # Edit .env and add your OpenRouter API key (see the example .env below)
  3. Run Locally

    streamlit run streamlit_app.py
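
For reference, a minimal .env mirrors the keys from the Streamlit secrets example above (values here are placeholders):

OPENROUTER_API_KEY=your_openrouter_api_key_here
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
LLM_MODEL=meta-llama/llama-3.2-3b-instruct:free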

🐳 Docker Deployment (Optional)

For containerized deployment:

docker build -t rag-assistant .
docker run -p 8501:8501 --env-file .env rag-assistant

πŸ”§ Advanced Setup (Separate Backend/Frontend)

For development with separate services:

# Terminal 1 - Backend:
python -m uvicorn app.main:app --host 127.0.0.1 --port 8001

# Terminal 2 - Frontend:
streamlit run frontend/app.py

System Requirements

  • Python: 3.8 or higher
  • Dependencies: Automatically installed via pip
  • API Key: OpenRouter API key (required)
  • Storage: Local file system (no external database needed)

Architecture

  • Frontend: Streamlit for interactive UI
  • Backend: FastAPI for document processing and retrieval
  • Embeddings: SentenceTransformers for vector generation
  • Vector DB: FAISS for local similarity search (no external DB required)
  • LLM: OpenRouter API for answer generation (supports multiple models)
  • Deployment: Runs locally with Python or containerized with Docker
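
A minimal sketch of how these pieces fit together is shown below. The embedding model name and the function names are illustrative assumptions, not the repository's actual module layout.

import faiss
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def build_index(chunks):
    # Encode every chunk and store the vectors in a flat L2 FAISS index.
    vectors = embedder.encode(chunks).astype("float32")
    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)
    return index

def retrieve(index, chunks, question, k=5):
    # Embed the question and return the k most similar chunks for the LLM prompt.
    query = embedder.encode([question]).astype("float32")
    _, ids = index.search(query, k)
    return [chunks[i] for i in ids[0]]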

Usage

  1. Upload documents (PDF, DOCX, TXT) via the web interface
  2. Ask questions in the chat interface
  3. Receive grounded answers with source citations
  4. View retrieved document chunks for transparency

Configuration

Edit config/settings.py to customize:

  • Chunk size and overlap
  • Number of retrieved documents
  • LLM model selection
  • Vector database settings

Testing the System

  1. Upload Sample Documents

    • Use the provided sample documents in sample_documents/
    • Or upload your own PDF, DOCX, or TXT files
  2. Ask Questions

    • "What is RAG and how does it work?"
    • "What are Python best practices for error handling?"
    • "How should I organize my Python code?"
  3. Verify Grounded Responses

    • Check that answers include source citations
    • Review retrieved document chunks
    • Test with questions not covered in documents

API Endpoints

  • POST /upload - Upload and process documents
  • POST /query - Query documents and get AI answers
  • GET /health - Check system status
  • DELETE /documents - Clear all documents
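
A quick way to exercise these endpoints from Python, assuming the backend from the Advanced Setup above is listening on port 8001. The payload field names and the sample file path are assumptions; check the FastAPI docs page (/docs) for the exact schema.

import requests

BASE = "http://127.0.0.1:8001"

# Upload a document for indexing (the file name here is only an example).
with open("sample_documents/example.txt", "rb") as f:
    requests.post(f"{BASE}/upload", files={"file": f}).raise_for_status()

# Ask a question against the indexed documents.
resp = requests.post(f"{BASE}/query", json={"question": "What is RAG and how does it work?"})
resp.raise_for_status()
print(resp.json())  # expected to contain the answer plus source citations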

Customization

Modify Chunk Settings

Edit config/settings.py:

CHUNK_SIZE = 500  # Tokens per chunk
CHUNK_OVERLAP = 50  # Overlap between chunks
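
To see how these two values interact, here is a rough sliding-window chunker of the kind these settings control (simplified to whitespace tokens; the actual implementation may tokenize differently):

def chunk_text(text, chunk_size=500, chunk_overlap=50):
    # Emit windows of chunk_size tokens, stepping by chunk_size - chunk_overlap,
    # so consecutive chunks share some context at their boundaries.
    tokens = text.split()
    step = chunk_size - chunk_overlap
    return [" ".join(tokens[i:i + chunk_size]) for i in range(0, len(tokens), step)]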

Change LLM Model

LLM_MODEL = "openai/gpt-4"  # or "openai/gpt-3.5-turbo"

Adjust Retrieval

MAX_RETRIEVED_DOCS = 5  # Number of chunks to retrieve

Troubleshooting

Common Issues

  1. "Backend Offline" in UI

    • Ensure the FastAPI backend is running and that its port matches the one the frontend expects (the Advanced Setup above starts it on port 8001)
    • Check console for error messages
  2. OpenRouter API Errors

    • Verify your API key in .env file
    • Check account credits and billing
    • Ensure the model name is correct (e.g., openai/gpt-3.5-turbo)
  3. Document Upload Fails

    • Ensure file format is PDF, DOCX, or TXT
    • Check file size (very large files may time out)
  4. Poor Answer Quality

    • Try uploading more relevant documents
    • Adjust chunk size for your document type
    • Increase number of retrieved documents

Performance Tips

  • Use smaller chunk sizes for technical documents
  • Increase chunk overlap for better context
  • Upload documents with clear structure and headings
  • Test with different question phrasings

🌐 Deployment Options

Streamlit Cloud (Recommended)

  1. Fork this repository
  2. Deploy on share.streamlit.io
  3. Main file path: streamlit_app.py
  4. Add secrets in Streamlit Cloud dashboard

Local Development

streamlit run streamlit_app.py

Docker Deployment

docker build -t rag-assistant .
docker run -p 8501:8501 --env-file .env rag-assistant

Production Considerations

  1. Security

    • Use environment variables for all secrets
    • Implement authentication and authorization
    • Add rate limiting and input validation
  2. Scalability

    • Consider using Pinecone or Weaviate for vector storage
    • Implement document caching
    • Use load balancing for multiple instances
  3. Monitoring

    • Add logging and metrics
    • Monitor API response times
    • Track document processing success rates

OpenRouter Configuration

Getting Your OpenRouter API Key

  1. Sign up at OpenRouter.ai
  2. Get API Key from your dashboard
  3. Add credits to your account for API usage

Available Models

You can use any model supported by OpenRouter by changing the LLM_MODEL in your .env file:

# Popular options:
LLM_MODEL=openai/gpt-3.5-turbo          # Fast and cost-effective
LLM_MODEL=openai/gpt-4                  # Higher quality
LLM_MODEL=anthropic/claude-3-haiku      # Anthropic's fast model
LLM_MODEL=anthropic/claude-3-sonnet     # Anthropic's balanced model
LLM_MODEL=meta-llama/llama-3-8b-instruct # Open source option
LLM_MODEL=google/gemini-pro             # Google's model
LLM_MODEL=mistralai/mistral-7b-instruct # Mistral AI model
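
Because OpenRouter exposes an OpenAI-compatible API at the base URL shown earlier, a minimal answer-generation call can be sketched with the openai Python package; the exact call used by this repository may differ.

import os
from openai import OpenAI

# OpenRouter speaks the OpenAI chat-completions protocol at its own base URL.
client = OpenAI(
    base_url=os.environ.get("OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1"),
    api_key=os.environ["OPENROUTER_API_KEY"],
)

completion = client.chat.completions.create(
    model=os.environ.get("LLM_MODEL", "meta-llama/llama-3.2-3b-instruct:free"),
    messages=[{"role": "user", "content": "Summarize the retrieved context in two sentences."}],
)
print(completion.choices[0].message.content)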

Cost Comparison

OpenRouter provides transparent pricing for all models:

  • GPT-3.5-turbo: ~$0.002/1K tokens
  • GPT-4: ~$0.03/1K tokens
  • Claude-3-Haiku: ~$0.00025/1K tokens
  • Llama-3-8B: ~$0.0002/1K tokens

Benefits of OpenRouter

  • βœ… Multiple Models: Access to 50+ AI models
  • βœ… Transparent Pricing: Clear cost per model
  • βœ… No Vendor Lock-in: Switch models easily
  • βœ… Reliability: Automatic failover between providers
  • βœ… Usage Analytics: Detailed usage tracking
