🔗 GraphRAG

A graph-enhanced Retrieval-Augmented Generation (RAG) system for analyzing tech business news

📖 Overview

GraphRAG combines a knowledge graph with vector embeddings to answer questions about tech business news. Traditional RAG systems retrieve passages by semantic similarity alone. GraphRAG also follows entity relationships (for example, which company acquired, sued, or partnered with which), so related facts reach the context even when their text does not resemble the query.

✨ Key Features

Hybrid Retrieval: Combines vector search (Qdrant) with graph traversal (Neo4j), merging both result lists with Reciprocal Rank Fusion (RRF)
Smart Entity Extraction: Automatically extracts companies, people, products, and events from documents
2-Hop Graph Context: Retrieves entities up to two hops away in the graph for broader context
Intelligent Reranking: Reranks the fused results with Cohere, using Gemini as a fallback
PDF Processing: Native support for PDF document ingestion
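
As an illustration of the fusion step, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of chunk IDs. It is not taken from this repository: the function name, the chunk IDs, and the constant `k=60` (a commonly used default) are assumptions for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list of (id, score) pairs.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1 for the top result of each list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID so the output is deterministic
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the two retrievers
vector_hits = ["chunk_a", "chunk_b", "chunk_c"]
graph_hits = ["chunk_b", "chunk_d", "chunk_a"]

fused = reciprocal_rank_fusion([vector_hits, graph_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Because RRF uses only rank positions, it does not need the vector similarity scores and the graph scores to be on a comparable scale.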

🏗️ Architecture

flowchart TB
    subgraph Input
        PDF[📄 PDF Documents]
    end
    
    subgraph Processing
        Loader[PDF Loader] --> Chunker[Text Splitter]
        Chunker --> Extractor[Entity Extractor]
        Chunker --> Embedder[Gemini Embeddings]
    end
    
    subgraph Storage
        Extractor --> |Entities & Relations| Neo4j[(Neo4j)]
        Embedder --> |Vectors| Qdrant[(Qdrant)]
    end
    
    subgraph Retrieval
        Query[🔍 User Query] --> VecSearch[Vector Search]
        Query --> GraphSearch[Graph Traversal]
        VecSearch --> Qdrant
        GraphSearch --> Neo4j
        Qdrant --> |Semantic Matches| Fusion[RRF Fusion]
        Neo4j --> |2-Hop Context| Fusion
        Fusion --> Rerank[Reranker]
        Rerank --> Response[📝 Response]
    end
    
    PDF --> Loader
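
The "2-Hop Context" edge in the diagram can be pictured as a breadth-first expansion limited to two hops. The sketch below runs on an in-memory adjacency dict rather than Neo4j, and the entity names and the `neighbors_within` helper are hypothetical; the project's actual traversal query may differ.

```python
from collections import deque


def neighbors_within(graph, start, max_hops=2):
    """Return {entity: hop_distance} for entities within max_hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the start entity is not part of its own context
    return distances


# Hypothetical toy graph (undirected adjacency lists)
toy_graph = {
    "Acme": ["Alice", "Widget"],
    "Alice": ["Acme"],
    "Widget": ["Acme", "Globex"],
    "Globex": ["Widget", "Initech"],
    "Initech": ["Globex"],
}

context = neighbors_within(toy_graph, "Acme")
print(context)
```

In this toy graph, `Initech` is three hops from `Acme` and is therefore left out of the context.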

📂 Project Structure

GraphRAG/
├── 📄 docker-compose.yml     # Neo4j & Qdrant infrastructure
├── 📄 requirements.txt       # Python dependencies
├── 📄 pyproject.toml         # Project metadata
├── 📄 .env.example           # Environment template
├── 📁 src/
│   ├── config.py             # Settings management
│   ├── 📁 schema/            # Graph schema definitions
│   ├── 📁 extraction/        # PDF loading & entity extraction
│   ├── 📁 storage/           # Neo4j & Qdrant clients
│   └── 📁 retrieval/         # Hybrid retriever
├── 📁 scripts/
│   └── ingest_documents.py   # Document ingestion script
├── 📁 tests/                 # Test suite
└── 📁 data/sample/           # Sample documents

🚀 Quick Start

Prerequisites

Python 3.11+
Docker & Docker Compose
Google Gemini API Key

1. Clone & Setup

git clone <repository-url>
cd GraphRAG

2. Start Infrastructure

docker-compose up -d

This starts:

| Service | Port | Purpose |
|---|---|---|
| Neo4j Browser | 7474 | Web UI |
| Neo4j Bolt | 7687 | Driver connection |
| Qdrant HTTP | 6333 | REST API |
| Qdrant gRPC | 6334 | gRPC API |

3. Configure Environment

cp .env.example .env

Edit .env with your API keys:

# Required: Gemini LLM and embeddings
GOOGLE_API_KEY=your_google_api_key_here

# Optional: GPT-4o entity extraction
OPENAI_API_KEY=your_openai_api_key_here

# Optional: Cohere reranking
COHERE_API_KEY=your_cohere_api_key_here

4. Install Dependencies

pip install -r requirements.txt

5. Ingest Documents

python scripts/ingest_documents.py data/sample/
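
During ingestion, documents are split into overlapping chunks controlled by `CHUNK_SIZE` and `CHUNK_OVERLAP` (see Configuration). The sketch below shows the idea with a character-based sliding window. Measuring in characters is an assumption made for this example; the project's text splitter may count tokens instead, and `split_text` is a hypothetical helper, not a function from this repository.

```python
def split_text(text, chunk_size=1024, chunk_overlap=128):
    """Split text into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2000))
chunks = split_text(sample)
print([len(chunk) for chunk in chunks])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated text.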

6. Query the System

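# Requires the services from step 2 to be running and documents ingested in step 5
from src.retrieval import HybridRetriever

retriever = HybridRetriever()
results = retriever.retrieve("What companies did Apple acquire?")

# Print each retrieved result
for result in results:
    print(result)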

📊 Graph Schema

Node Types

| Type | Properties |
|---|---|
| Company | name, ticker, industry, headquarters |
| Person | name, title, role |
| Product | name, category, launch_date |
| Event | name, date, type, location |
| Article | title, published_date, source |

Relationships

Person  ──[LEADS|FOUNDED|WORKS_AT]──▶ Company
Company ──[ACQUIRED|INVESTED_IN|SUED_BY|PARTNERS_WITH|COMPETES_WITH]──▶ Company
Company ──[LAUNCHED]──▶ Product
*       ──[MENTIONED_IN]──▶ Article
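
One way to use this schema is to check extracted triples against it before writing them to the graph. The sketch below encodes the relationships listed above as a lookup table. The names `ALLOWED_RELATIONS`, `NODE_TYPES`, and `is_valid_relation` are hypothetical and not part of this repository.

```python
NODE_TYPES = {"Company", "Person", "Product", "Event", "Article"}

# (source type, target type) -> relationship types allowed between them
ALLOWED_RELATIONS = {
    ("Person", "Company"): {"LEADS", "FOUNDED", "WORKS_AT"},
    ("Company", "Company"): {
        "ACQUIRED", "INVESTED_IN", "SUED_BY", "PARTNERS_WITH", "COMPETES_WITH",
    },
    ("Company", "Product"): {"LAUNCHED"},
}


def is_valid_relation(source_type, relation, target_type):
    """Return True if the triple matches the graph schema."""
    if source_type not in NODE_TYPES or target_type not in NODE_TYPES:
        return False
    # Any node type may be MENTIONED_IN an Article
    if relation == "MENTIONED_IN":
        return target_type == "Article"
    return relation in ALLOWED_RELATIONS.get((source_type, target_type), set())


print(is_valid_relation("Person", "FOUNDED", "Company"))
print(is_valid_relation("Company", "LEADS", "Person"))
```

Rejecting triples that do not fit the schema keeps malformed extractor output, such as a reversed direction, out of the graph.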

⚙️ Configuration

| Variable | Default | Description |
|---|---|---|
| `GOOGLE_API_KEY` | required | Gemini API key |
| `OPENAI_API_KEY` | optional | For GPT-4o extraction |
| `NEO4J_URI` | `bolt://localhost:7687` | Neo4j connection |
| `NEO4J_USERNAME` | `neo4j` | Neo4j username |
| `NEO4J_PASSWORD` | `graphrag_password` | Neo4j password |
| `QDRANT_HOST` | `localhost` | Qdrant host |
| `QDRANT_PORT` | `6333` | Qdrant port |
| `COHERE_API_KEY` | optional | For reranking |
| `GEMINI_MODEL` | `models/gemini-2.0-flash` | LLM model |
| `CHUNK_SIZE` | `1024` | Text chunk size |
| `CHUNK_OVERLAP` | `128` | Chunk overlap |
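
To show how these variables and defaults fit together, here is a minimal sketch that reads them from the environment using only the standard library. It is not the project's `src/config.py`: the `Settings` dataclass and `load_settings` function are assumptions for the example, and it expects the variables to be exported already (loading the `.env` file itself is not shown).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    openai_api_key: Optional[str]
    cohere_api_key: Optional[str]
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    qdrant_host: str
    qdrant_port: int
    gemini_model: str
    chunk_size: int
    chunk_overlap: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, applying the documented defaults."""
    google_api_key = env.get("GOOGLE_API_KEY")
    if not google_api_key:
        raise RuntimeError("GOOGLE_API_KEY is required")
    return Settings(
        google_api_key=google_api_key,
        openai_api_key=env.get("OPENAI_API_KEY") or None,  # optional
        cohere_api_key=env.get("COHERE_API_KEY") or None,  # optional
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_username=env.get("NEO4J_USERNAME", "neo4j"),
        neo4j_password=env.get("NEO4J_PASSWORD", "graphrag_password"),
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        gemini_model=env.get("GEMINI_MODEL", "models/gemini-2.0-flash"),
        chunk_size=int(env.get("CHUNK_SIZE", "1024")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "128")),
    )


# Passing a plain dict keeps the example independent of the real environment
settings = load_settings({"GOOGLE_API_KEY": "example-key", "CHUNK_SIZE": "512"})
print(settings.neo4j_uri, settings.chunk_size)
```

Failing fast on a missing `GOOGLE_API_KEY` surfaces a configuration problem at startup rather than on the first API call.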

🧪 Testing

# Run all tests
pytest

# Run with verbose output
pytest -v

# Skip integration tests
pytest -m "not integration"

📚 Tech Stack

| Component | Technology |
|---|---|
| LLM | Google Gemini 2.0 Flash |
| Embeddings | Gemini text-embedding-004 |
| Graph Store | Neo4j 5.15 |
| Vector Store | Qdrant 1.7 |
| Framework | LlamaIndex 0.10+ |
| Reranking | Cohere (primary), Gemini (fallback) |
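
The primary/fallback arrangement for reranking can be sketched as a small wrapper that tries one reranker and switches to another on failure. The sketch uses stand-in functions rather than the Cohere or Gemini clients. All names are hypothetical, and the trigger for falling back (any exception from the primary) is an assumption; the project may instead switch on a missing `COHERE_API_KEY`.

```python
from typing import Callable, List, Sequence

Reranker = Callable[[str, Sequence[str]], List[str]]


def rerank_with_fallback(
    query: str,
    documents: Sequence[str],
    primary: Reranker,
    fallback: Reranker,
) -> List[str]:
    """Return the primary reranker's ordering, or the fallback's if it fails."""
    try:
        return primary(query, documents)
    except Exception:
        # Assumption: any failure of the primary triggers the fallback
        return fallback(query, documents)


def unavailable_reranker(query, documents):
    """Stand-in for a primary reranker whose service cannot be reached."""
    raise ConnectionError("primary reranker unavailable")


def keyword_overlap_reranker(query, documents):
    """Stand-in fallback: order documents by words shared with the query."""
    terms = set(query.lower().split())
    return sorted(documents, key=lambda doc: -len(terms & set(doc.lower().split())))


docs = ["Qdrant stores vectors", "Neo4j stores the entity graph", "Unrelated text"]
ranked = rerank_with_fallback(
    "entity graph", docs, unavailable_reranker, keyword_overlap_reranker
)
print(ranked)
```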

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Built with ❤️ using LlamaIndex, Neo4j, and Qdrant
