A Python-based Retrieval-Augmented Generation (RAG) system that answers user queries using private document collections while minimizing hallucinations.
- Document Upload & Indexing: Support for PDF, DOCX, TXT files
- Semantic Search: Vector-based similarity search using FAISS
- Grounded Answers: LLM responses with source citations
- Hallucination Mitigation: Strict grounding in retrieved documents
- Interactive UI: Streamlit-based chat interface
- Cloud Deployment: Ready for Streamlit Cloud deployment
- Multiple AI Models: OpenRouter integration with 50+ models
1. Fork this repository to your GitHub account
2. Deploy on Streamlit Cloud:
   - Go to share.streamlit.io
   - Connect your GitHub account
   - Select this repository
   - Set the main file path: streamlit_app.py
   - Add your OpenRouter API key in Streamlit secrets
3. Configure Secrets in Streamlit Cloud:

   ```toml
   OPENROUTER_API_KEY = "your_openrouter_api_key_here"
   OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
   LLM_MODEL = "meta-llama/llama-3.2-3b-instruct:free"
   ```
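When running outside Streamlit Cloud, the same values can come from environment variables. A minimal sketch of a loader that checks st.secrets first and falls back to the environment (the helper name and fallback behavior are assumptions, not the project's actual code):

```python
import os
from typing import Optional

import streamlit as st


def get_setting(name: str, default: Optional[str] = None) -> Optional[str]:
    """Read a setting from Streamlit secrets if available, else from the environment."""
    try:
        return st.secrets[name]
    except (KeyError, FileNotFoundError):  # key missing or no secrets.toml present
        return os.getenv(name, default)


OPENROUTER_API_KEY = get_setting("OPENROUTER_API_KEY")
OPENROUTER_BASE_URL = get_setting("OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1")
LLM_MODEL = get_setting("LLM_MODEL", "meta-llama/llama-3.2-3b-instruct:free")
```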
1. Clone and Setup

   ```bash
   git clone https://github.com/Sane219/RAD-Knowledge-Assistant.git
   cd RAD-Knowledge-Assistant
   pip install -r requirements.txt
   ```

2. Configure Environment (see the example .env below)

   ```bash
   cp .env.example .env
   # Edit .env and add your OpenRouter API key
   ```

3. Run Locally

   ```bash
   streamlit run streamlit_app.py
   ```
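For reference, a minimal .env mirrors the Streamlit secrets shown earlier (assuming the variable names match .env.example):

```bash
OPENROUTER_API_KEY=your_openrouter_api_key_here
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
LLM_MODEL=meta-llama/llama-3.2-3b-instruct:free
```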
For containerized deployment:

```bash
docker build -t rag-assistant .
docker run -p 8501:8501 --env-file .env rag-assistant
```

For development with separate services:

```bash
# Terminal 1 - Backend:
python -m uvicorn app.main:app --host 127.0.0.1 --port 8001

# Terminal 2 - Frontend:
streamlit run frontend/app.py
```

Requirements

- Python: 3.8 or higher
- Dependencies: Automatically installed via pip
- API Key: OpenRouter API key (required)
- Storage: Local file system (no external database needed)
- Frontend: Streamlit for interactive UI
- Backend: FastAPI for document processing and retrieval
- Embeddings: SentenceTransformers for vector generation
- Vector DB: FAISS for local similarity search (no external DB required)
- LLM: OpenRouter API for answer generation (supports multiple models)
- Deployment: Runs locally with Python or containerized with Docker
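To make the flow concrete, here is a minimal sketch of the retrieve-then-generate loop these components implement (the function names, embedding model, and prompt are illustrative, not the project's actual code):

```python
# Minimal RAG loop: embed the query, retrieve the nearest chunks from FAISS,
# and ask the LLM to answer using only those chunks.
import os
from typing import List

import faiss
from openai import OpenAI
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model
client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])


def answer(query: str, index: faiss.Index, chunks: List[str], k: int = 5) -> str:
    """Embed the query, retrieve the k nearest chunks, and generate a grounded answer."""
    query_vec = embedder.encode([query]).astype("float32")
    _, ids = index.search(query_vec, k)  # nearest-neighbor search over chunk vectors
    context = "\n\n".join(f"[{i}] {chunks[i]}" for i in ids[0])
    prompt = (
        "Answer strictly from the context below and cite chunk numbers. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    response = client.chat.completions.create(
        model="meta-llama/llama-3.2-3b-instruct:free",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

The strict "answer only from the context" instruction is what produces the grounded, citation-bearing answers described above.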
- Upload documents (PDF, DOCX, TXT) via the web interface
- Ask questions in the chat interface
- Receive grounded answers with source citations
- View retrieved document chunks for transparency
Edit config/settings.py to customize:
- Chunk size and overlap
- Number of retrieved documents
- LLM model selection
- Vector database settings
Testing the System
1. Upload Sample Documents
   - Use the provided sample documents in sample_documents/
   - Or upload your own PDF, DOCX, or TXT files
2. Ask Questions
   - "What is RAG and how does it work?"
   - "What are Python best practices for error handling?"
   - "How should I organize my Python code?"
3. Verify Grounded Responses
   - Check that answers include source citations
   - Review the retrieved document chunks
   - Test with questions not covered in the documents
- POST /upload - Upload and process documents
- POST /query - Query documents and get AI answers
- GET /health - Check system status
- DELETE /documents - Clear all documents
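A quick way to exercise these endpoints from Python; the field names, payload shapes, and sample filename are assumptions, so check the interactive docs FastAPI serves at /docs for the actual schema:

```python
import requests

BASE = "http://127.0.0.1:8001"  # match the port your backend runs on

# Upload a document (the "file" field name is an assumption; see /docs)
with open("sample_documents/example.pdf", "rb") as f:
    print(requests.post(f"{BASE}/upload", files={"file": f}).json())

# Query the indexed documents (the "question" key is an assumption)
print(requests.post(f"{BASE}/query", json={"question": "What is RAG?"}).json())

# Health check, then clear all documents
print(requests.get(f"{BASE}/health").json())
requests.delete(f"{BASE}/documents")
```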
Edit config/settings.py:

```python
CHUNK_SIZE = 500        # Tokens per chunk
CHUNK_OVERLAP = 50      # Overlap between chunks
LLM_MODEL = "gpt-4"     # or "gpt-3.5-turbo"
MAX_RETRIEVED_DOCS = 5  # Number of chunks to retrieve
```
"Backend Offline" in UI
- Ensure FastAPI backend is running on port 8000
- Check console for error messages
-
OpenRouter API Errors
- Verify your API key in
.envfile - Check account credits and billing
- Ensure the model name is correct (e.g.,
openai/gpt-3.5-turbo)
- Verify your API key in
-
Document Upload Fails
- Ensure file format is PDF, DOCX, or TXT
- Check file size (large files may timeout)
-
Poor Answer Quality
- Try uploading more relevant documents
- Adjust chunk size for your document type
- Increase number of retrieved documents
- Use smaller chunk sizes for technical documents
- Increase chunk overlap for better context (see the sketch below)
- Upload documents with clear structure and headings
- Test with different question phrasings
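To illustrate how chunk size and overlap interact, here is a simplified word-based chunker (the real implementation may split on tokens or sentences instead):

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into overlapping chunks so context isn't lost at chunk boundaries."""
    words = text.split()
    step = chunk_size - overlap  # each new chunk re-reads the last `overlap` words
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]
```

Smaller chunk_size values keep a technical detail within a single retrieved unit, while a larger overlap preserves context that spans chunk boundaries.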
- Fork this repository
- Deploy on share.streamlit.io
- Main file path: streamlit_app.py
- Add secrets in the Streamlit Cloud dashboard

Run locally:

```bash
streamlit run streamlit_app.py
```

Or with Docker:

```bash
docker build -t rag-assistant .
docker run -p 8501:8501 --env-file .env rag-assistant
```
1. Security
   - Use environment variables for all secrets
   - Implement authentication and authorization
   - Add rate limiting and input validation

2. Scalability
   - Consider using Pinecone or Weaviate for vector storage
   - Implement document caching
   - Use load balancing across multiple instances

3. Monitoring
   - Add logging and metrics
   - Monitor API response times (see the middleware sketch below)
   - Track document processing success rates
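As a starting point for response-time monitoring, a small FastAPI middleware can log per-request latency (a sketch to adapt to your logging setup; the logger name is arbitrary):

```python
import logging
import time

from fastapi import FastAPI, Request

logger = logging.getLogger("rag.metrics")
app = FastAPI()


@app.middleware("http")
async def log_timing(request: Request, call_next):
    """Log method, path, status code, and elapsed time for every request."""
    start = time.perf_counter()
    response = await call_next(request)
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("%s %s -> %d in %.1f ms",
                request.method, request.url.path, response.status_code, elapsed_ms)
    return response
```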
OpenRouter Configuration
- Sign up at OpenRouter.ai
- Get API Key from your dashboard
- Add credits to your account for API usage
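OpenRouter exposes an OpenAI-compatible API, so a quick way to verify your key and model is with the standard openai client pointed at the OpenRouter base URL:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

completion = client.chat.completions.create(
    model=os.getenv("LLM_MODEL", "meta-llama/llama-3.2-3b-instruct:free"),
    messages=[{"role": "user", "content": "Reply with one word: ready"}],
)
print(completion.choices[0].message.content)
```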
You can use any model supported by OpenRouter by changing LLM_MODEL in your .env file:

```bash
# Popular options:
LLM_MODEL=openai/gpt-3.5-turbo            # Fast and cost-effective
LLM_MODEL=openai/gpt-4                    # Higher quality
LLM_MODEL=anthropic/claude-3-haiku        # Anthropic's fast model
LLM_MODEL=anthropic/claude-3-sonnet       # Anthropic's balanced model
LLM_MODEL=meta-llama/llama-3-8b-instruct  # Open-source option
LLM_MODEL=google/gemini-pro               # Google's model
LLM_MODEL=mistralai/mistral-7b-instruct   # Mistral AI model
```

OpenRouter provides transparent pricing for all models:
- GPT-3.5-turbo: ~$0.002/1K tokens
- GPT-4: ~$0.03/1K tokens
- Claude-3-Haiku: ~$0.00025/1K tokens
- Llama-3-8B: ~$0.0002/1K tokens
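As a rough illustration, per-query cost can be estimated from the figures above (prices are approximate and change over time; check OpenRouter's model pages for current rates, which usually also differ between prompt and completion tokens):

```python
# Approximate $/1K tokens from the list above, treating prompt and
# completion tokens at one blended rate for simplicity.
PRICE_PER_1K = {
    "openai/gpt-3.5-turbo": 0.002,
    "openai/gpt-4": 0.03,
    "anthropic/claude-3-haiku": 0.00025,
    "meta-llama/llama-3-8b-instruct": 0.0002,
}


def estimate_cost(model: str, tokens: int) -> float:
    """Estimate the dollar cost of a query consuming `tokens` tokens in total."""
    return PRICE_PER_1K[model] * tokens / 1000


# e.g., a query using ~2,000 tokens of context plus answer:
print(f"${estimate_cost('openai/gpt-3.5-turbo', 2000):.4f}")  # ≈ $0.0040
```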
- Multiple Models: Access to 50+ AI models
- Transparent Pricing: Clear cost per model
- No Vendor Lock-in: Switch models easily
- Reliability: Automatic failover between providers
- Usage Analytics: Detailed usage tracking