PatternRAG is an advanced Retrieval-Augmented Generation system designed to identify non-obvious connections and patterns across documents. It combines vector search, knowledge graph analysis, and LLM reasoning to discover relationships between concepts that might be missed by traditional RAG systems.
Key features:
- Multi-perspective Retrieval: Expands queries to look for connections across domains
- Knowledge Graph Integration: Uses entity and relationship extraction to build a knowledge graph
- Pattern Detection: Specialized prompting to identify meaningful patterns and connections
- Hierarchical Chunking: Processes documents at both paragraph and sentence levels
- OpenAI-compatible API: Drop-in replacement for OpenAI's chat completions API
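To make the hierarchical chunking feature concrete, here is a minimal sketch of splitting a document at both paragraph and sentence level. It is an illustration only, not PatternRAG's actual implementation: the `Chunk` class, the `hierarchical_chunks` function, and the regex-based sentence splitter are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    text: str
    level: str             # "paragraph" or "sentence"
    parent: Optional[int]  # index of the parent paragraph chunk; None for paragraphs


def hierarchical_chunks(document: str) -> List[Chunk]:
    """Split a document into paragraph chunks, each followed by its sentence chunks."""
    chunks: List[Chunk] = []
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    for paragraph in paragraphs:
        parent_index = len(chunks)
        chunks.append(Chunk(paragraph, "paragraph", None))
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            if sentence:
                chunks.append(Chunk(sentence, "sentence", parent_index))
    return chunks


if __name__ == "__main__":
    sample = "Music has rhythm. Rhythm is ratio.\n\nMathematics studies ratio."
    for chunk in hierarchical_chunks(sample):
        print(chunk.level, chunk.parent, repr(chunk.text))
```

Keeping a `parent` index on each sentence chunk lets a retriever match on a precise sentence and still hand the surrounding paragraph to the LLM as context.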
Requirements:
- Python 3.8+
- An LLM API service like Ollama
- At least 8GB RAM (16GB+ recommended)
- 10GB+ storage space for document processing
- A Docker installation of OpenWebUI, or an equivalent, to serve as a front end
- Clone the repository:
git clone https://github.com/Robert-Beken/PatternRAG.git
cd PatternRAG
- Install dependencies:
pip install -r requirements.txt
- Install the spaCy model:
python -m spacy download en_core_web_sm
- Create directories:
mkdir -p data/db data/metadata data/graph documents
- Configure settings:
cp config/default_config.yaml config/config.yaml
# Edit config.yaml as needed
- Add documents: Place your documents in the documents directory, or specify a custom location in the config file.
- Process documents:
python -m patternrag.ingest --config config/config.yaml
- Start the API service:
python -m patternrag.service --config config/config.yaml
- Query the system:
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "pattern-rag",
"messages": [
{"role": "user", "content": "What connections exist between mathematics and music?"}
]
}'
Documentation:
- Installation Guide
- Configuration Options
- API Reference
- Pattern Detection
- Architecture Overview
- Performance Tuning
PatternRAG works by:
- Document Processing: Documents are loaded, chunked, and embedded into a vector database. Entities and relationships are extracted to build a knowledge graph.
- Query Analysis: User queries are analyzed for entities and expanded to look for potential connections across domains.
- Multi-angle Retrieval:
  - Vector similarity search with expanded queries
  - Knowledge graph traversal to find related entities
  - Predefined pattern-based searches
- Pattern Identification: An LLM analyzes retrieved documents to identify meaningful patterns and connections.
- Response Generation: The system synthesizes findings into a coherent response that highlights discovered patterns.
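The query-expansion and retrieval steps above can be sketched with a toy example. This is a hedged illustration, not the project's code: the corpus `DOCS`, the `GRAPH` dictionary, and the functions `embed`, `cosine`, `expand_query`, and `retrieve` are all hypothetical. A bag-of-words vector stands in for a real embedding model, and a one-hop neighbour lookup stands in for knowledge graph traversal.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set

# Hypothetical toy corpus and knowledge graph; a real deployment would
# build both during document ingestion.
DOCS: Dict[str, str] = {
    "d1": "Pythagoras related musical harmony to ratios of string lengths.",
    "d2": "Fourier analysis decomposes a sound wave into sine components.",
    "d3": "Crop rotation improves soil fertility over several seasons.",
}
GRAPH: Dict[str, Set[str]] = {
    "music": {"harmony", "sound"},
    "mathematics": {"ratios", "fourier"},
}


def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_query(query: str) -> List[str]:
    """Add one variant per recognised entity, built from its graph neighbours."""
    variants = [query]
    for entity, neighbours in GRAPH.items():
        if entity in query.lower():
            variants.append(" ".join(sorted(neighbours)))
    return variants


def retrieve(query: str, top_k: int = 2) -> List[str]:
    """Score each document against every query variant and keep its best score."""
    best: Dict[str, float] = {}
    for variant in expand_query(query):
        q_vec = embed(variant)
        for doc_id, text in DOCS.items():
            score = cosine(q_vec, embed(text))
            best[doc_id] = max(best.get(doc_id, 0.0), score)
    ranked = sorted(best, key=lambda d: best[d], reverse=True)
    # Drop documents that matched no variant at all.
    return [d for d in ranked[:top_k] if best[d] > 0]


if __name__ == "__main__":
    print(retrieve("connections between mathematics and music"))
```

In this toy corpus the unexpanded query shares no words with any document, so every match comes from the graph-derived variants. That is the kind of indirect connection the expansion step is meant to surface.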
PatternRAG offers two search modes:
- Pattern Mode (default): Full pattern-finding capabilities, query expansion, and connection analysis.
- Standard Mode: Simple retrieval without extensive pattern finding. Activate by prefixing your query with "standard search".
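As a rough sketch, the two modes can be selected from a Python client by building the same request body as the curl example and adding the prefix when Standard Mode is wanted. The helper names `build_payload` and `ask` are made up for this example. The exact prefix format (the phrase followed by a space) and the 120-second timeout are assumptions. The URL and model name are copied from the curl example.

```python
import json
import urllib.request
from typing import Any, Dict

# Endpoint and model name as used in the curl example; adjust for your deployment.
API_URL = "http://localhost:8000/v1/chat/completions"


def build_payload(question: str, standard: bool = False) -> Dict[str, Any]:
    """Build a chat-completions request body, optionally selecting Standard Mode."""
    # Assumption: the prefix is followed by a single space and then the question.
    content = f"standard search {question}" if standard else question
    return {
        "model": "pattern-rag",
        "messages": [{"role": "user", "content": content}],
    }


def ask(question: str, standard: bool = False, timeout: float = 120.0) -> Dict[str, Any]:
    """POST the request to the running service and return the decoded JSON reply."""
    body = json.dumps(build_payload(question, standard)).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the request body, so it runs without the service being up.
    question = "What connections exist between mathematics and music?"
    print(json.dumps(build_payload(question, standard=True), indent=2))
```

Calling `ask(...)` requires the API service to be running. If the service mirrors the OpenAI response shape, the answer text should be under `choices[0]["message"]["content"]`, but check this against the API Reference.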
PatternRAG is highly configurable. Key configuration options include:
- Custom Pattern Templates: Define patterns to guide the system's search
- Embedding Model: Choose the embedding model for vector search
- Chunking Parameters: Adjust document chunking for different document types
- LLM Settings: Configure which model to use for reasoning
See the Configuration Guide for detailed options.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.