A Python-based Retrieval-Augmented Generation (RAG) system that answers user queries using private document collections while minimizing hallucinations.
- Document Upload & Indexing: Support for PDF, DOCX, TXT files
- Semantic Search: Vector-based similarity search using FAISS
- Grounded Answers: LLM responses with source citations
- Hallucination Mitigation: Strict grounding in retrieved documents
- Interactive UI: Streamlit-based chat interface
- Cloud Deployment: Ready for Streamlit Cloud deployment
- Multiple AI Models: OpenRouter integration with 50+ models
1. Fork this repository to your GitHub account
2. Deploy on Streamlit Cloud:
   - Go to share.streamlit.io
   - Connect your GitHub account
   - Select this repository
   - Set the main file path: streamlit_app.py
   - Add your OpenRouter API key in Streamlit secrets
3. Configure Secrets in Streamlit Cloud:

   ```toml
   OPENROUTER_API_KEY = "your_openrouter_api_key_here"
   OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
   LLM_MODEL = "meta-llama/llama-3.2-3b-instruct:free"
   ```
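When running outside Streamlit Cloud, the same values can come from environment variables. A minimal sketch of a loader that checks st.secrets first and falls back to the environment (the helper name and fallback behavior are assumptions, not the project's actual code):

```python
import os
from typing import Optional

import streamlit as st


def get_setting(name: str, default: Optional[str] = None) -> Optional[str]:
    """Read a setting from Streamlit secrets if available, else from the environment."""
    try:
        return st.secrets[name]
    except (KeyError, FileNotFoundError):  # key missing or no secrets.toml present
        return os.getenv(name, default)


OPENROUTER_API_KEY = get_setting("OPENROUTER_API_KEY")
OPENROUTER_BASE_URL = get_setting("OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1")
LLM_MODEL = get_setting("LLM_MODEL", "meta-llama/llama-3.2-3b-instruct:free")
```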
1. Clone and Setup

   ```bash
   git clone https://github.com/Sane219/RAD-Knowledge-Assistant.git
   cd RAD-Knowledge-Assistant
   pip install -r requirements.txt
   ```

2. Configure Environment (see the example .env below)

   ```bash
   cp .env.example .env
   # Edit .env and add your OpenRouter API key
   ```

3. Run Locally

   ```bash
   streamlit run streamlit_app.py
   ```
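For reference, a minimal .env mirrors the Streamlit secrets shown earlier (assuming the variable names match .env.example):

```bash
OPENROUTER_API_KEY=your_openrouter_api_key_here
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
LLM_MODEL=meta-llama/llama-3.2-3b-instruct:free
```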
For containerized deployment:

```bash
docker build -t rag-assistant .
docker run -p 8501:8501 --env-file .env rag-assistant
```

For development with separate services:

```bash
# Terminal 1 - Backend:
python -m uvicorn app.main:app --host 127.0.0.1 --port 8001

# Terminal 2 - Frontend:
streamlit run frontend/app.py
```

Requirements

- Python: 3.8 or higher
- Dependencies: Automatically installed via pip
- API Key: OpenRouter API key (required)
- Storage: Local file system (no external database needed)
- Frontend: Streamlit for interactive UI
- Backend: FastAPI for document processing and retrieval
- Embeddings: SentenceTransformers for vector generation
- Vector DB: FAISS for local similarity search (no external DB required)
- LLM: OpenRouter API for answer generation (supports multiple models)
- Deployment: Runs locally with Python or containerized with Docker
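To make the flow concrete, here is a minimal sketch of the retrieve-then-generate loop these components implement (the function names, embedding model, and prompt are illustrative, not the project's actual code):

```python
# Minimal RAG loop: embed the query, retrieve the nearest chunks from FAISS,
# and ask the LLM to answer using only those chunks.
import os
from typing import List

import faiss
from openai import OpenAI
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model
client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])


def answer(query: str, index: faiss.Index, chunks: List[str], k: int = 5) -> str:
    """Embed the query, retrieve the k nearest chunks, and generate a grounded answer."""
    query_vec = embedder.encode([query]).astype("float32")
    _, ids = index.search(query_vec, k)  # nearest-neighbor search over chunk vectors
    context = "\n\n".join(f"[{i}] {chunks[i]}" for i in ids[0])
    prompt = (
        "Answer strictly from the context below and cite chunk numbers. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    response = client.chat.completions.create(
        model="meta-llama/llama-3.2-3b-instruct:free",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

The strict "answer only from the context" instruction is what produces the grounded, citation-bearing answers described above.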
- Upload documents (PDF, DOCX, TXT) via the web interface
- Ask questions in the chat interface
- Receive grounded answers with source citations
- View retrieved document chunks for transparency
Edit config/settings.py to customize:
- Chunk size and overlap
- Number of retrieved documents
- LLM model selection
- Vector database settings
Testing the System
1. Upload Sample Documents
   - Use the provided sample documents in sample_documents/
   - Or upload your own PDF, DOCX, or TXT files
2. Ask Questions
   - "What is RAG and how does it work?"
   - "What are Python best practices for error handling?"
   - "How should I organize my Python code?"
3. Verify Grounded Responses
   - Check that answers include source citations
   - Review the retrieved document chunks
   - Test with questions not covered in the documents
- POST /upload - Upload and process documents
- POST /query - Query documents and get AI answers
- GET /health - Check system status
- DELETE /documents - Clear all documents
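A quick way to exercise these endpoints from Python; the field names, payload shapes, and sample filename are assumptions, so check the interactive docs FastAPI serves at /docs for the actual schema:

```python
import requests

BASE = "http://127.0.0.1:8001"  # match the port your backend runs on

# Upload a document (the "file" field name is an assumption; see /docs)
with open("sample_documents/example.pdf", "rb") as f:
    print(requests.post(f"{BASE}/upload", files={"file": f}).json())

# Query the indexed documents (the "question" key is an assumption)
print(requests.post(f"{BASE}/query", json={"question": "What is RAG?"}).json())

# Health check, then clear all documents
print(requests.get(f"{BASE}/health").json())
requests.delete(f"{BASE}/documents")
```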
Edit config/settings.py:

```python
CHUNK_SIZE = 500        # Tokens per chunk
CHUNK_OVERLAP = 50      # Overlap between chunks
LLM_MODEL = "gpt-4"     # or "gpt-3.5-turbo"
MAX_RETRIEVED_DOCS = 5  # Number of chunks to retrieve
```
"Backend Offline" in UI
- Ensure FastAPI backend is running on port 8000
- Check console for error messages
-
OpenRouter API Errors
- Verify your API key in
.envfile - Check account credits and billing
- Ensure the model name is correct (e.g.,
openai/gpt-3.5-turbo)
- Verify your API key in
-
Document Upload Fails
- Ensure file format is PDF, DOCX, or TXT
- Check file size (large files may timeout)
-
Poor Answer Quality
- Try uploading more relevant documents
- Adjust chunk size for your document type
- Increase number of retrieved documents
- Use smaller chunk sizes for technical documents
- Increase chunk overlap for better context (see the sketch below)
- Upload documents with clear structure and headings
- Test with different question phrasings
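To illustrate how chunk size and overlap interact, here is a simplified word-based chunker (the real implementation may split on tokens or sentences instead):

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into overlapping chunks so context isn't lost at chunk boundaries."""
    words = text.split()
    step = chunk_size - overlap  # each new chunk re-reads the last `overlap` words
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]
```

Smaller chunk_size values keep a technical detail within a single retrieved unit, while a larger overlap preserves context that spans chunk boundaries.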
- Fork this repository
- Deploy on share.streamlit.io
- Main file path: streamlit_app.py
- Add secrets in the Streamlit Cloud dashboard

Run locally:

```bash
streamlit run streamlit_app.py
```

Or with Docker:

```bash
docker build -t rag-assistant .
docker run -p 8501:8501 --env-file .env rag-assistant
```
1. Security
   - Use environment variables for all secrets
   - Implement authentication and authorization
   - Add rate limiting and input validation

2. Scalability
   - Consider using Pinecone or Weaviate for vector storage
   - Implement document caching
   - Use load balancing across multiple instances

3. Monitoring
   - Add logging and metrics
   - Monitor API response times (see the middleware sketch below)
   - Track document processing success rates
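As a starting point for response-time monitoring, a small FastAPI middleware can log per-request latency (a sketch to adapt to your logging setup; the logger name is arbitrary):

```python
import logging
import time

from fastapi import FastAPI, Request

logger = logging.getLogger("rag.metrics")
app = FastAPI()


@app.middleware("http")
async def log_timing(request: Request, call_next):
    """Log method, path, status code, and elapsed time for every request."""
    start = time.perf_counter()
    response = await call_next(request)
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("%s %s -> %d in %.1f ms",
                request.method, request.url.path, response.status_code, elapsed_ms)
    return response
```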
OpenRouter Configuration
- Sign up at OpenRouter.ai
- Get API Key from your dashboard
- Add credits to your account for API usage
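OpenRouter exposes an OpenAI-compatible API, so a quick way to verify your key and model is with the standard openai client pointed at the OpenRouter base URL:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

completion = client.chat.completions.create(
    model=os.getenv("LLM_MODEL", "meta-llama/llama-3.2-3b-instruct:free"),
    messages=[{"role": "user", "content": "Reply with one word: ready"}],
)
print(completion.choices[0].message.content)
```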
You can use any model supported by OpenRouter by changing LLM_MODEL in your .env file:

```bash
# Popular options:
LLM_MODEL=openai/gpt-3.5-turbo            # Fast and cost-effective
LLM_MODEL=openai/gpt-4                    # Higher quality
LLM_MODEL=anthropic/claude-3-haiku        # Anthropic's fast model
LLM_MODEL=anthropic/claude-3-sonnet       # Anthropic's balanced model
LLM_MODEL=meta-llama/llama-3-8b-instruct  # Open-source option
LLM_MODEL=google/gemini-pro               # Google's model
LLM_MODEL=mistralai/mistral-7b-instruct   # Mistral AI model
```

OpenRouter provides transparent pricing for all models:
- GPT-3.5-turbo: ~$0.002/1K tokens
- GPT-4: ~$0.03/1K tokens
- Claude-3-Haiku: ~$0.00025/1K tokens
- Llama-3-8B: ~$0.0002/1K tokens
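As a rough illustration, per-query cost can be estimated from the figures above (prices are approximate and change over time; check OpenRouter's model pages for current rates, which usually also differ between prompt and completion tokens):

```python
# Approximate $/1K tokens from the list above, treating prompt and
# completion tokens at one blended rate for simplicity.
PRICE_PER_1K = {
    "openai/gpt-3.5-turbo": 0.002,
    "openai/gpt-4": 0.03,
    "anthropic/claude-3-haiku": 0.00025,
    "meta-llama/llama-3-8b-instruct": 0.0002,
}


def estimate_cost(model: str, tokens: int) -> float:
    """Estimate the dollar cost of a query consuming `tokens` tokens in total."""
    return PRICE_PER_1K[model] * tokens / 1000


# e.g., a query using ~2,000 tokens of context plus answer:
print(f"${estimate_cost('openai/gpt-3.5-turbo', 2000):.4f}")  # ≈ $0.0040
```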
- Multiple Models: Access to 50+ AI models
- Transparent Pricing: Clear cost per model
- No Vendor Lock-in: Switch models easily
- Reliability: Automatic failover between providers
- Usage Analytics: Detailed usage tracking