A production-ready Retrieval-Augmented Generation (RAG) system built with Python, featuring MongoDB Cosmos DB as the vector store and support for multiple LLM providers including Google Gemini and OpenAI.
- **Dual Engine Support**: Seamlessly switch between Google Gemini and OpenAI models
- **Vector Database**: MongoDB Cosmos DB for scalable, high-performance vector storage
- **Flexible Configuration**: Environment-based configuration with validation
- **Smart Document Processing**: Intelligent text chunking and metadata handling
- **Production Ready**: Comprehensive error handling, logging, and connection management
- **Interactive CLI**: User-friendly command-line interface for testing and management
- **Hybrid Search**: Vector similarity search with text search fallback
- **Scalable Architecture**: Modular design for easy extension and maintenance
**MongoDB Cosmos DB Account**
- Create an Azure Cosmos DB account with the MongoDB API
- Enable vector search capabilities in your Cosmos DB account
- Obtain your connection string from the Azure portal
**API Keys**
- Google Cloud API Key: for Gemini models (available from Google AI Studio)
- OpenAI API Key: for GPT models (available from the OpenAI Platform)
**Development Environment**
- Python 3.8 or higher
- pip package manager
- Git (for cloning the repository)
- Clone the repository:

```bash
git clone https://github.com/your-username/RAG-python.git
cd RAG-python
```

- Create and activate a virtual environment:

```bash
# Create virtual environment
python -m venv venv

# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```
- Set up environment variables: create a `.env` file in the project root, copy the template from the Configuration section below, and fill in your credentials.

Create a `.env` file in the project root with the following variables:
```env
# Engine Selection: choose "google" or "openai"
ENGINE=google

# API Keys (set the one matching your chosen engine)
GOOGLE_API_KEY=your-google-api-key-here
OPENAI_API_KEY=your-openai-api-key-here

# MongoDB Cosmos DB Configuration
COSMOS_CONNECTION_STRING=mongodb://your-cosmos-connection-string
COSMOS_DATABASE_NAME=rag_database
COSMOS_COLLECTION_NAME=documents
COSMOS_VECTOR_INDEX_NAME=vector_index

# Document Processing Configuration
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
EMBEDDING_DIMENSION=768
TOP_K_RESULTS=5
```
| Variable | Description | Default | Options |
|---|---|---|---|
| `ENGINE` | LLM provider to use | `google` | `google`, `openai` |
| `CHUNK_SIZE` | Maximum characters per text chunk | `1000` | Integer |
| `CHUNK_OVERLAP` | Overlap between chunks | `200` | Integer |
| `EMBEDDING_DIMENSION` | Vector embedding dimensions | `768` | `768` (Gemini), `1536` (OpenAI) |
| `TOP_K_RESULTS` | Number of similar documents to retrieve | `5` | Integer |
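As an illustration of how these variables can be read and validated, here is a minimal, dependency-free sketch. The `load_settings` helper and `DEFAULTS` mapping are hypothetical names for this example; the project's actual `config.py` may be structured differently (for instance, around pydantic models).

```python
import os

# Hypothetical validation sketch; key names mirror the table above,
# but the real config.py may differ.
DEFAULTS = {
    "ENGINE": "google",
    "CHUNK_SIZE": "1000",
    "CHUNK_OVERLAP": "200",
    "EMBEDDING_DIMENSION": "768",
    "TOP_K_RESULTS": "5",
}

def load_settings(env=os.environ):
    """Read settings from an environment mapping, applying defaults."""
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    if settings["ENGINE"] not in ("google", "openai"):
        raise ValueError(f"ENGINE must be 'google' or 'openai', got {settings['ENGINE']!r}")
    for key in ("CHUNK_SIZE", "CHUNK_OVERLAP", "EMBEDDING_DIMENSION", "TOP_K_RESULTS"):
        settings[key] = int(settings[key])  # raises ValueError on non-integer input
    if settings["CHUNK_OVERLAP"] >= settings["CHUNK_SIZE"]:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings
```

Failing fast on an invalid `ENGINE` or an overlap larger than the chunk size surfaces misconfiguration at startup rather than mid-query.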
- Start the interactive CLI:

```bash
python main.py
```
- Follow the prompts to:
- Add documents to your knowledge base
- Ask questions about your documents
- View system status and statistics
Once the application starts, you can use these commands:
- Add text: Input text directly into the system
- Ask questions: Query your knowledge base
- Show stats: View system statistics
- Clear database: Reset your knowledge base
- Exit: Quit the application
For integration into other applications:
```python
from rag_chain import RAGSystem

# Initialize the RAG system
rag = RAGSystem()

# Add a document to the knowledge base
document_text = "Your document content here..."
rag.add_text(
    text=document_text,
    metadata={"source": "example.pdf", "category": "research"}
)

# Query the system
question = "What is the main topic of the document?"
result = rag.query(question)

print(f"Answer: {result['answer']}")
print(f"Sources: {[doc.metadata for doc in result['source_documents']]}")

# Clean up resources
rag.close()
```
```python
# Batch document processing
documents = [
    {"text": "Document 1 content", "metadata": {"source": "doc1.txt"}},
    {"text": "Document 2 content", "metadata": {"source": "doc2.txt"}},
]
for doc in documents:
    rag.add_text(doc["text"], doc["metadata"])

# Custom query with parameters
result = rag.query(
    question="Your question here",
    top_k=10,              # Retrieve more documents
    include_metadata=True
)
```
```
┌─────────────────┐      ┌──────────────────┐      ┌─────────────────┐
│     main.py     │      │   rag_chain.py   │      │    config.py    │
│ (CLI Interface) │─────►│   (RAG System)   │─────►│ (Configuration) │
└─────────────────┘      └──────────────────┘      └─────────────────┘
                                   │
                          ┌────────┴─────────┐
                          │                  │
              ┌───────────▼────────┐  ┌──────▼──────────────┐
              │ document_processor │  │vector_store_manager │
              │        .py         │  │        .py          │
              │ (Text Processing)  │  │  (MongoDB Cosmos)   │
              └────────────────────┘  └─────────────────────┘
```
- **`config.py`**: Configuration Management
  - Environment variable handling
  - Settings validation
  - Default value management
- **`vector_store_manager.py`**: Vector Database Operations
  - MongoDB Cosmos DB integration
  - Vector index management
  - Similarity search implementation
  - Connection pooling
- **`document_processor.py`**: Text Processing
  - Document chunking using LangChain's `RecursiveCharacterTextSplitter`
  - Metadata preservation
  - Batch processing support
- **`rag_chain.py`**: RAG System Orchestration
  - LLM integration (Google Gemini / OpenAI)
  - Retrieval and generation pipeline
  - Context management
  - Response formatting
- **`main.py`**: Interactive CLI Interface
  - User interaction handling
  - Command processing
  - System status display
```
1. Document Input → 2. Text Chunking → 3. Embedding Generation → 4. Vector Storage
                                                                        ↓
8. Response ← 7. Answer Generation ← 6. Context Retrieval ← 5. Query Processing
```
Detailed Process:
- Document Ingestion: Text documents are input through CLI or API
- Text Chunking: Documents are split using RecursiveCharacterTextSplitter
- Embedding Generation: Chunks are converted to vectors using selected model
- Vector Storage: Embeddings stored in MongoDB Cosmos DB with metadata
- Query Processing: User queries are embedded using the same model
- Context Retrieval: Vector similarity search retrieves relevant chunks
- Answer Generation: LLM generates responses using retrieved context
- Response Delivery: Formatted answer returned with source information
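The retrieval steps above (embed, store, search by similarity) can be sketched in miniature without any external services. This toy example swaps real Gemini/OpenAI embeddings and Cosmos DB vector search for term-frequency vectors and in-memory cosine similarity, purely to show the mechanics; none of these helper names come from the project itself.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, corpus, top_k=2):
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:top_k]

docs = [
    "Cosmos DB stores embeddings for vector search",
    "The CLI accepts user questions",
    "Gemini generates answers from retrieved context",
]
print(retrieve("where are embeddings stored", docs, top_k=1))
```

In the real pipeline, the same query embedding is also handed to the LLM step (7) alongside the retrieved chunks as context.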
**Google Gemini**
- Model: `gemini-pro`
- Embeddings: `models/embedding-001`
- Dimensions: 768
- Context Window: 32,768 tokens

**OpenAI**
- Model: `gpt-3.5-turbo` / `gpt-4`
- Embeddings: `text-embedding-ada-002`
- Dimensions: 1,536
- Context Window: 4,096 / 8,192 tokens
The project uses the following key dependencies:
```text
langchain==0.1.0                 # Core LangChain framework
langchain-openai==0.0.2          # OpenAI integration
langchain-google-genai==0.0.5    # Google Gemini integration
langchain-community==0.0.10      # Community integrations
pymongo==4.6.1                   # MongoDB driver
azure-cosmos==4.5.1              # Azure Cosmos DB client
python-dotenv==1.0.0             # Environment variable management
tiktoken==0.5.2                  # OpenAI tokenization
numpy==1.24.3                    # Numerical operations
pydantic==2.5.0                  # Data validation
```
- Optimal Chunk Size: 1000 characters works well for most content types
- Chunk Overlap: 200 characters ensures context continuity
- Metadata Usage: Include source, timestamp, and category information
- Batch Processing: Add multiple documents in batches for better performance
- Index Optimization: Ensure vector indexes are properly configured in Cosmos DB
- Connection Pooling: System automatically manages database connections
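To make the chunk-size and overlap advice concrete, here is a minimal sketch of fixed-size chunking with overlap. The project itself uses LangChain's `RecursiveCharacterTextSplitter`, which additionally tries to split on natural separators (paragraphs, sentences); this simplified version only shows why overlap preserves context across chunk boundaries.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size chunks where each chunk repeats the
    last `chunk_overlap` characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

With the defaults above, a sentence that straddles a 1000-character boundary still appears intact in the 200-character overlap of the next chunk, so retrieval never loses it entirely.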
- Environment Variables: Store all sensitive data in `.env` files
- API Key Rotation: Regularly rotate API keys
- Access Control: Implement proper database access controls
- Data Validation: All inputs are validated before processing
- RU Consumption: Monitor Cosmos DB Request Units usage
- Response Times: Track query performance
- Error Rates: Monitor API call success rates
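One lightweight way to start tracking query response times is a timing decorator around the query path. This is a minimal sketch, not the project's own instrumentation; a production setup would export these timings to a metrics backend rather than the log.

```python
import functools
import logging
import time

def timed(fn):
    """Log the wall-clock duration of each call to fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            logging.getLogger(__name__).info("%s took %.3fs", fn.__name__, elapsed)
    return wrapper

@timed
def slow_query():
    # Hypothetical stand-in for rag.query(...)
    time.sleep(0.01)
    return "answer"
```

Applying `@timed` to the query method gives per-call latencies that can be correlated with Cosmos DB RU consumption.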
**Issue: Cannot connect to Cosmos DB**

```text
ConnectionError: Unable to connect to MongoDB Cosmos DB
```

Solutions:
- Verify connection string format and credentials
- Check firewall settings and IP whitelisting
- Ensure vector search is enabled in your Cosmos DB account
- Verify network connectivity
**Issue: API key authentication failed**

```text
AuthenticationError: Invalid API key
```

Solutions:
- Verify API keys are correctly set in the `.env` file
- Check API key permissions and quotas
- Ensure there are no extra spaces or characters in the keys
- Verify the selected engine matches the available API keys
**Issue: Slow query responses**

Solutions:
- Optimize chunk size for your content type
- Reduce `TOP_K_RESULTS` if retrieving too many documents
- Monitor Cosmos DB RU consumption
- Consider upgrading your Cosmos DB tier
**Issue: Out of memory errors during document processing**

Solutions:
- Process documents in smaller batches
- Reduce chunk size temporarily
- Increase system memory allocation
- Implement document streaming for large files
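A simple way to act on the "smaller batches" advice is a batching generator that streams documents through fixed-size groups instead of holding everything in memory at once. The helper name `batched` is illustrative, not part of the project's API.

```python
def batched(iterable, batch_size=50):
    """Yield items from iterable in lists of at most batch_size."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch
```

Feeding each yielded batch to `rag.add_text` in turn keeps peak memory proportional to the batch size rather than the corpus size.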
**Issue: Module import errors**

```text
ModuleNotFoundError: No module named 'langchain'
```

Solutions:
- Ensure the virtual environment is activated
- Run `pip install -r requirements.txt`
- Check Python version compatibility (3.8+)
If you encounter issues not covered here:
- Check the Issues page
- Enable debug logging by setting `LOG_LEVEL=DEBUG` in your `.env` file
- Review system logs for detailed error messages
- Create a new issue with error details and system information
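For reference, a `LOG_LEVEL`-driven logging setup can be as small as the sketch below. The `configure_logging` helper is hypothetical; the project may wire its logging differently, but the environment-variable contract is the same.

```python
import logging
import os

def configure_logging():
    """Configure root logging from the LOG_LEVEL environment variable."""
    name = os.getenv("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, name, logging.INFO)  # fall back to INFO on unknown names
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return level
```

Calling this once at startup makes `LOG_LEVEL=DEBUG` surface the detailed messages referenced above without any code changes.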
- Never commit `.env` files to version control
- Use environment variables in production deployments
- Implement proper access controls for your databases
- Regularly rotate API keys and connection strings
- All data is processed locally or through secure API endpoints
- Documents are stored with encryption at rest in Cosmos DB
- API communications use HTTPS/TLS encryption
- Use role-based access control (RBAC) for Cosmos DB
- Implement API rate limiting
- Monitor for unusual access patterns
- Keep dependencies updated for security patches
We welcome contributions! Please follow these steps:
- Fork and clone the repository:

```bash
git clone https://github.com/your-username/RAG-python.git
cd RAG-python
```

- Create a development environment:

```bash
python -m venv dev-env
source dev-env/bin/activate  # On Windows: dev-env\Scripts\activate
pip install -r requirements.txt
```

- Create a feature branch:

```bash
git checkout -b feature/your-feature-name
```
- Write clear, descriptive commit messages
- Add tests for new functionality
- Update documentation for new features
- Ensure code follows PEP 8 style guidelines
- Test your changes thoroughly
- Update the README.md with details of changes if applicable
- Ensure your code includes appropriate error handling
- Add or update tests as needed
- Submit a pull request with a clear description
- Use type hints where possible
- Follow PEP 8 style guidelines
- Include docstrings for functions and classes
- Use meaningful variable and function names
This project is licensed under the MIT License - see the LICENSE file for details.
- Commercial use allowed
- Modification allowed
- Distribution allowed
- Private use allowed
- No warranty provided
- No liability assumed
- LangChain: For the excellent RAG framework
- MongoDB: For Cosmos DB vector search capabilities
- Google: For Gemini AI model access
- OpenAI: For GPT model access
- Azure: For cloud infrastructure
- Email: [your-email@example.com]
- Issues: GitHub Issues
- Documentation: Wiki
- Discussions: GitHub Discussions
```bash
# Setup
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env  # Edit with your credentials

# Run
python main.py

# Development
python -m pytest tests/
python -m black .
python -m flake8
```
```env
ENGINE=google
GOOGLE_API_KEY=your-key-here
COSMOS_CONNECTION_STRING=mongodb://your-connection-string
COSMOS_DATABASE_NAME=rag_database
```
Ready to get started? Follow the Installation guide and start building your RAG system!