GraphRAG combines knowledge graphs with vector embeddings to answer questions about tech business news with richer context. Unlike traditional RAG systems that rely solely on semantic similarity, GraphRAG also uses entity relationships to bring in the broader context of your queries.
- Hybrid Retrieval — Combines vector search (Qdrant) with graph traversal (Neo4j) using RRF fusion
- Smart Entity Extraction — Automatically extracts companies, people, products, and events from documents
- 2-Hop Graph Context — Retrieves related entities for comprehensive context
- Intelligent Reranking — Uses Cohere reranker with Gemini fallback for optimal results
- PDF Processing — Native support for PDF document ingestion
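As an illustration of the hybrid retrieval step, here is a minimal sketch of Reciprocal Rank Fusion (RRF) over two ranked lists. The function name `rrf_fuse`, the chunk IDs, and the constant `k=60` (a commonly used default) are assumptions for this example and are not taken from the project's code.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of IDs with Reciprocal Rank Fusion.

    Each item scores 1 / (k + rank) per list it appears in (rank starts at 1);
    the scores are summed and items are returned best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists: one from vector search, one from graph traversal.
vector_hits = ["chunk_a", "chunk_b", "chunk_c"]
graph_hits = ["chunk_b", "chunk_d", "chunk_a"]

fused = rrf_fuse([vector_hits, graph_hits])
print(fused)
```

Items that rank well in both lists (here `chunk_b` and `chunk_a`) rise above items found by only one retriever, which is the property that makes RRF a reasonable choice when the two score scales are not comparable.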
```mermaid
flowchart TB
    subgraph Input
        PDF[📄 PDF Documents]
    end
    subgraph Processing
        Loader[PDF Loader] --> Chunker[Text Splitter]
        Chunker --> Extractor[Entity Extractor]
        Chunker --> Embedder[Gemini Embeddings]
    end
    subgraph Storage
        Extractor --> |Entities & Relations| Neo4j[(Neo4j)]
        Embedder --> |Vectors| Qdrant[(Qdrant)]
    end
    subgraph Retrieval
        Query[🔍 User Query] --> VecSearch[Vector Search]
        Query --> GraphSearch[Graph Traversal]
        VecSearch --> Qdrant
        GraphSearch --> Neo4j
        Qdrant --> |Semantic Matches| Fusion[RRF Fusion]
        Neo4j --> |2-Hop Context| Fusion
        Fusion --> Rerank[Reranker]
        Rerank --> Response[📝 Response]
    end
    PDF --> Loader
```
```
GraphRAG/
├── 📄 docker-compose.yml        # Neo4j & Qdrant infrastructure
├── 📄 requirements.txt          # Python dependencies
├── 📄 pyproject.toml            # Project metadata
├── 📄 .env.example              # Environment template
├── 📁 src/
│   ├── config.py                # Settings management
│   ├── 📁 schema/               # Graph schema definitions
│   ├── 📁 extraction/           # PDF loading & entity extraction
│   ├── 📁 storage/              # Neo4j & Qdrant clients
│   └── 📁 retrieval/            # Hybrid retriever
├── 📁 scripts/
│   └── ingest_documents.py      # Document ingestion script
├── 📁 tests/                    # Test suite
└── 📁 data/sample/              # Sample documents
```
- Python 3.11+
- Docker & Docker Compose
- Google Gemini API Key

```bash
git clone <repository-url>
cd GraphRAG
docker-compose up -d
```

This starts:

| Service | Port | Purpose |
|---|---|---|
| Neo4j Browser | 7474 | Web UI |
| Neo4j Bolt | 7687 | Driver connection |
| Qdrant HTTP | 6333 | REST API |
| Qdrant gRPC | 6334 | gRPC API |

```bash
cp .env.example .env
```

Edit `.env` with your API keys:

```env
# Required
GOOGLE_API_KEY=your_google_api_key_here
# Optional (for enhanced features)
OPENAI_API_KEY=your_openai_api_key_here
COHERE_API_KEY=your_cohere_api_key_here
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

Ingest the sample documents:

```bash
python scripts/ingest_documents.py data/sample/
```

Query the retriever from Python:

```python
from src.retrieval import HybridRetriever

retriever = HybridRetriever()
results = retriever.retrieve("What companies did Apple acquire?")
for result in results:
    print(result)
```

| Type | Properties |
|---|---|
| Company | name, ticker, industry, headquarters |
| Person | name, title, role |
| Product | name, category, launch_date |
| Event | name, date, type, location |
| Article | title, published_date, source |

```
Person  ──[LEADS|FOUNDED|WORKS_AT]──▶ Company
Company ──[ACQUIRED|INVESTED_IN|SUED_BY|PARTNERS_WITH|COMPETES_WITH]──▶ Company
Company ──[LAUNCHED]──▶ Product
*       ──[MENTIONED_IN]──▶ Article
```
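
To make the 2-hop graph context concrete, the sketch below walks a tiny in-memory graph built from the relationship types listed above. It is only an approximation of what a graph query would do: the entity names, the `build_adjacency` and `neighbors_within` helpers, and the choice to follow edges in both directions are assumptions for illustration, not the project's Neo4j queries.

```python
from collections import deque

# Hypothetical toy edges: (source, relationship, target).
EDGES = [
    ("Person A", "LEADS", "Company X"),
    ("Company X", "ACQUIRED", "Company Y"),
    ("Company Y", "LAUNCHED", "Product P"),
    ("Product P", "MENTIONED_IN", "Article 1"),
]


def build_adjacency(edges):
    """Index edges by node, following each relationship in both directions."""
    adjacency = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))
        adjacency.setdefault(target, []).append((relation, source))
    return adjacency


def neighbors_within(adjacency, start, max_hops=2):
    """Breadth-first search returning {node: hop distance} up to max_hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, other in adjacency.get(node, []):
            if other not in distance:
                distance[other] = distance[node] + 1
                queue.append(other)
    distance.pop(start)  # the start entity is not part of its own context
    return distance


context = neighbors_within(build_adjacency(EDGES), "Company X")
print(context)
```

In this toy graph, `Article 1` is three hops from `Company X`, so it is left out of the context while `Product P` (two hops) is included.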
| Variable | Default | Description |
|---|---|---|
| `GOOGLE_API_KEY` | required | Gemini API key |
| `OPENAI_API_KEY` | optional | For GPT-4o extraction |
| `NEO4J_URI` | `bolt://localhost:7687` | Neo4j connection |
| `NEO4J_USERNAME` | `neo4j` | Neo4j username |
| `NEO4J_PASSWORD` | `graphrag_password` | Neo4j password |
| `QDRANT_HOST` | `localhost` | Qdrant host |
| `QDRANT_PORT` | `6333` | Qdrant port |
| `COHERE_API_KEY` | optional | For reranking |
| `GEMINI_MODEL` | `models/gemini-2.0-flash` | LLM model |
| `CHUNK_SIZE` | `1024` | Text chunk size |
| `CHUNK_OVERLAP` | `128` | Chunk overlap |
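
The interaction between `CHUNK_SIZE` and `CHUNK_OVERLAP` can be sketched as a sliding window. This is a simplified illustration, assuming the sizes are measured in characters (the table does not state the unit, and the project's splitter may count tokens instead); `load_chunk_settings` and `split_text` are hypothetical helpers, not functions from `src/`.

```python
import os


def load_chunk_settings(env=os.environ):
    """Read chunk settings from the environment, using the documented defaults."""
    size = int(env.get("CHUNK_SIZE", "1024"))
    overlap = int(env.get("CHUNK_OVERLAP", "128"))
    if not 0 <= overlap < size:
        raise ValueError("CHUNK_OVERLAP must be >= 0 and smaller than CHUNK_SIZE")
    return size, overlap


def split_text(text, size, overlap):
    """Split text into windows of `size` characters that share `overlap` characters."""
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Example with the defaults (an empty mapping stands in for an unset environment).
size, overlap = load_chunk_settings({})
sample = "".join(str(i % 10) for i in range(2000))
chunks = split_text(sample, size, overlap)
print([len(chunk) for chunk in chunks])
```

With the defaults, each chunk starts 896 characters after the previous one, so the last 128 characters of one chunk are repeated at the start of the next.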

```bash
# Run all tests
pytest
# Run with verbose output
pytest -v
# Skip integration tests
pytest -m "not integration"
```

| Component | Technology |
|---|---|
| LLM | Google Gemini 2.0 Flash |
| Embeddings | Gemini text-embedding-004 |
| Graph Store | Neo4j 5.15 |
| Vector Store | Qdrant 1.7 |
| Framework | LlamaIndex 0.10+ |
| Reranking | Cohere (primary), Gemini (fallback) |
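
The primary/fallback arrangement for reranking can be sketched as a small wrapper that tries one reranker and switches to another on failure. Everything here is a stand-in: `rerank_with_fallback`, `failing_primary`, and `keyword_fallback` are hypothetical, and no real Cohere or Gemini API calls are made.

```python
def rerank_with_fallback(query, docs, primary, fallback):
    """Try the primary reranker; if it raises, use the fallback instead."""
    try:
        return primary(query, docs)
    except Exception:
        # For example, a missing API key or a network error in the primary service.
        return fallback(query, docs)


def failing_primary(query, docs):
    """Stand-in for a primary reranker that is currently unavailable."""
    raise RuntimeError("primary reranker unavailable")


def keyword_fallback(query, docs):
    """Stand-in fallback: order documents by the number of shared query words."""
    terms = set(query.lower().split())
    return sorted(docs, key=lambda doc: -len(terms & set(doc.lower().split())))


docs = [
    "Qdrant stores vectors",
    "Neo4j stores graph relationships",
    "graph traversal in Neo4j",
]
ranked = rerank_with_fallback("neo4j graph", docs, failing_primary, keyword_fallback)
print(ranked)
```

Passing the rerankers in as callables keeps the fallback logic independent of any particular provider, which also makes it easy to test without network access.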
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️ using LlamaIndex, Neo4j, and Qdrant