GraphRAG vs Naive RAG: CV Knowledge Graph Comparison

A side-by-side demonstration of GraphRAG and Naive RAG on realistic, synthetically generated PDF CVs, with the knowledge graph extracted by an LLM. The project shows how a knowledge graph supports structured queries (counting, filtering, aggregation, sorting, multi-hop traversal) that traditional vector-based RAG systems answer unreliably or not at all.

🚀 Quick Start

Prerequisites

Python 3.11+ with uv package manager
Docker Desktop (for Neo4j database)
OpenAI API Key (set in .env file)

One-Command Demo

# Complete end-to-end comparison
uv run python 5_compare_systems.py

Step-by-Step Workflow

# 1. Initial setup and validation
uv run python 0_setup.py

# 2. Start Neo4j database
./start_session.sh

# 3. Generate 30 realistic CV PDFs
uv run python 1_generate_data.py

# 4. Extract knowledge graph from CVs using LLMGraphTransformer
uv run python 2_data_to_knowledge_graph.py

# 5. Run complete comparison
uv run python 5_compare_systems.py
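
The five steps above can also be chained from Python. This is a minimal sketch, not a script that ships with the project: the `STEPS` list simply mirrors the commands listed above, and `run_workflow` is a hypothetical helper name. It stops at the first step that exits with a non-zero status.

```python
import subprocess

# Commands copied from the step-by-step workflow above, in order.
STEPS = [
    ["uv", "run", "python", "0_setup.py"],
    ["./start_session.sh"],
    ["uv", "run", "python", "1_generate_data.py"],
    ["uv", "run", "python", "2_data_to_knowledge_graph.py"],
    ["uv", "run", "python", "5_compare_systems.py"],
]


def run_workflow(steps=STEPS, runner=subprocess.run):
    """Run each step in order and return how many completed successfully.

    `runner` is injectable (hypothetical design choice) so the sequence
    can be dry-run without starting Docker or calling the OpenAI API.
    """
    completed = 0
    for cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            # Later steps depend on earlier ones, so stop here.
            break
        completed += 1
    return completed
```

Calling `run_workflow()` from the project root would run the real commands; passing a stub `runner` makes it easy to check the ordering first.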

Loading Data into the Database

1. Copy the dump file into the Neo4j container

docker cp mydump.dump neo4j-graphrag:/var/lib/neo4j/import/

2. Stop the Neo4j container

docker stop neo4j-graphrag

3. Load the dump into the database

# Note: `neo4j-admin load --from ... --force` is the Neo4j 4.x syntax;
# Neo4j 5.x images use `neo4j-admin database load` instead.
docker run --rm \
  -v 06_graphrag_neo4j_data:/data \
  -v 06_graphrag_neo4j_import:/var/lib/neo4j/import \
  neo4j:latest \
  neo4j-admin load --from=/var/lib/neo4j/import/mydump.dump --database=neo4j --force

4. Start the container again

docker start neo4j-graphrag

🎯 Problem Addressed

Traditional RAG systems struggle with structured queries requiring:

Query Type	Example	Traditional RAG Issue
Counting	"How many Python developers?"	❌ Estimates from text chunks
Filtering	"Find people with Docker AND Kubernetes"	❌ Limited to semantic similarity
Aggregation	"Average years of experience?"	❌ Cannot calculate across entities
Sorting	"Top 3 most experienced developers"	❌ No structured ranking
Multi-hop	"People who attended same university"	❌ Cannot traverse relationships
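
The counting row can be illustrated with a toy sketch. Everything here is a hypothetical stand-in: the five made-up CV snippets, the word-overlap scoring, and the `top_k` cutoff are not the project's ChromaDB pipeline. The point is only that a retriever returning k chunks can surface at most k people, while a structured store counts every match.

```python
from collections import Counter

# Hypothetical data: one text chunk per (made-up) person.
CHUNKS = {
    "Ana": "Python developer with Docker experience",
    "Ben": "Java engineer, some Kubernetes",
    "Cy": "Data scientist using Python and SQL",
    "Di": "Backend developer: Python, Docker, Kubernetes",
    "Ed": "Frontend developer, React and TypeScript",
}


def tokenize(text):
    """Lowercase words with trailing punctuation stripped."""
    return [w.strip(",.:").lower() for w in text.split()]


def retrieve(query, top_k):
    """Rank chunks by word overlap with the query and keep the top_k."""
    q = Counter(tokenize(query))
    ranked = sorted(
        CHUNKS,
        key=lambda name: sum((Counter(tokenize(CHUNKS[name])) & q).values()),
        reverse=True,
    )
    return ranked[:top_k]


def count_from_chunks(skill, top_k):
    """Count people with `skill`, seeing only the retrieved chunks."""
    hits = retrieve(f"{skill} developers", top_k)
    return sum(skill.lower() in tokenize(CHUNKS[name]) for name in hits)


def count_structured(skill):
    """Count people with `skill` across the whole dataset."""
    return sum(skill.lower() in tokenize(text) for text in CHUNKS.values())
```

With `top_k=2`, `count_from_chunks("Python", 2)` can never exceed 2, whereas `count_structured("Python")` sees all three matching people.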

🏗️ Architecture

Knowledge Graph Schema

Auto-extracted from PDF CVs using LLMGraphTransformer:

Nodes:
├── Person (id, name, location, bio)
├── Skill (id, category)
├── Company (id, industry, location)
├── University (id, location, type)
└── Certification (id, provider, field)

Relationships:
├── (Person)-[HAS_SKILL]->(Skill)
├── (Person)-[WORKED_AT]->(Company)
├── (Person)-[STUDIED_AT]->(University)
├── (Person)-[EARNED]->(Certification)
└── (Person)-[MENTIONS]->(Person)
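
The schema above can be written down as plain Python data, which is handy for checking extracted triples before they reach Neo4j. This is a sketch under assumptions: the labels and relationship types are copied from the diagram, while `is_valid_triple` and `filter_triples` are hypothetical helpers, not functions from the project. (LangChain's LLMGraphTransformer also accepts `allowed_nodes` and `allowed_relationships` arguments; whether this project passes them is not stated here.)

```python
# Node labels from the schema diagram.
ALLOWED_NODES = {"Person", "Skill", "Company", "University", "Certification"}

# (source label, relationship type, target label) from the schema diagram.
ALLOWED_RELATIONSHIPS = {
    ("Person", "HAS_SKILL", "Skill"),
    ("Person", "WORKED_AT", "Company"),
    ("Person", "STUDIED_AT", "University"),
    ("Person", "EARNED", "Certification"),
    ("Person", "MENTIONS", "Person"),
}


def is_valid_triple(source_label, rel_type, target_label):
    """True if the triple matches one of the schema's relationships."""
    return (source_label, rel_type, target_label) in ALLOWED_RELATIONSHIPS


def filter_triples(triples):
    """Keep only the triples that conform to the schema."""
    return [t for t in triples if is_valid_triple(*t)]
```

A reversed edge such as `("Skill", "HAS_SKILL", "Person")` is rejected, which catches a common direction mistake in LLM-extracted graphs.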

System Components

PDF Processing: Realistic CV generation with reportlab
Knowledge Extraction: LangChain LLMGraphTransformer
Graph Database: Neo4j with Docker
GraphRAG: LangChain GraphCypherQAChain with custom prompts
Naive RAG: ChromaDB vector search baseline
Evaluation: GPT-5 ground truth generation

📊 Example Results

Query: "How many people have Python programming skills?"

GraphRAG (✅ Accurate):

MATCH (p:Person)-[:HAS_SKILL]->(s:Skill)
WHERE toLower(s.id) = toLower("Python")
RETURN count(p) AS pythonProgrammers

Result: 7 people (exact count)

Naive RAG (❌ Incomplete): Result: "Based on context, only Amanda Smith is mentioned" (missed 6 people)

Query: "List people with both React and Node.js skills"

GraphRAG (✅ Complete): Result: 4 people - Christine Rodriguez, Joseph Fuller, Krystal Castillo, William Bonilla

Naive RAG (❌ Limited): Result: 1 person - Christine Rodriguez (missed 3 people)
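
An "all of these skills" question maps to a short parameterized Cypher query. The sketch below is one way to write it, not necessarily the query GraphCypherQAChain generates in this project: `build_all_skills_query` is a hypothetical helper, and it reuses the case-insensitive `toLower(s.id)` comparison from the Python example above.

```python
def build_all_skills_query(skills):
    """Return (cypher, params) matching people who have every listed skill."""
    if not skills:
        raise ValueError("at least one skill is required")
    # Lowercase and de-duplicate so size($skills) equals the number of
    # distinct skills a person must match.
    wanted = sorted({s.lower() for s in skills})
    query = (
        "MATCH (p:Person)-[:HAS_SKILL]->(s:Skill)\n"
        "WHERE toLower(s.id) IN $skills\n"
        "WITH p, count(DISTINCT toLower(s.id)) AS matched\n"
        "WHERE matched = size($skills)\n"
        "RETURN p.id AS person\n"
        "ORDER BY person"
    )
    return query, {"skills": wanted}
```

With the official `neo4j` Python driver, the returned pair could be passed to `session.run(query, params)`. Passing skills as a parameter, rather than formatting them into the query string, avoids quoting problems with names like "Node.js".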

📁 Project Structure

06_GraphRAG/
├── 0_setup.py                 # Environment validation
├── 1_generate_data.py          # Synthetic PDF CV generation
├── 2_data_to_knowledge_graph.py  # LLM graph extraction
├── 3_query_knowledge_graph.py  # GraphRAG implementation
├── 4_naive_rag_cv.py          # Naive RAG baseline
├── 5_compare_systems.py       # System comparison
├── docker-compose.yml         # Neo4j setup
├── start_session.sh           # Neo4j management
├── utils/                     # Utility files
│   ├── generate_ground_truth.py  # GPT-5 ground truth
│   ├── test_questions.json    # Evaluation questions
│   └── config.toml           # Configuration
├── data/programmers/          # Generated CV PDFs
└── results/                   # Comparison results
    ├── ground_truth_answers.json
    └── comparison_report.md

🔧 Technical Stack

Language: Python 3.11+
Package Manager: uv
LLM: OpenAI GPT-4o (queries), GPT-5 (ground truth)
Graph Database: Neo4j 5.x with Docker
Vector Store: ChromaDB (baseline comparison)
Frameworks: LangChain, LangChain Experimental
Document Processing: Unstructured, ReportLab

🎓 Key Learnings

GraphRAG excels at structured queries requiring precise relationships
LLMGraphTransformer enables real-world PDF-to-knowledge-graph workflows
Custom Cypher prompts solve case sensitivity and result interpretation issues
GPT-5-generated ground truth gives both systems a common reference answer for evaluation
Hybrid approaches can combine both strengths for optimal results
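
The hybrid idea in the last point can be sketched as a small router. This is an illustrative assumption, not part of the project: the cue-word list, `route`, and `answer` are all hypothetical, and a real system would more likely classify questions with an LLM than with a regular expression.

```python
import re

# Hypothetical cue phrases that suggest a structured (graph) question.
STRUCTURED_CUES = re.compile(
    r"\b(how many|count|average|top \d+|both|same|at least|more than|list)\b",
    re.IGNORECASE,
)


def route(question):
    """Return "graph" for structured-looking questions, else "vector"."""
    return "graph" if STRUCTURED_CUES.search(question) else "vector"


def answer(question, graph_qa, vector_qa):
    """Dispatch to one of two callables that each take a question string."""
    backend = graph_qa if route(question) == "graph" else vector_qa
    return backend(question)
```

Here `graph_qa` and `vector_qa` stand in for whatever callables wrap the GraphRAG chain and the Naive RAG baseline.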

🔍 Advanced Usage

Browse Knowledge Graph

Neo4j Browser: http://localhost:7474 (neo4j/password123)

Individual Components

# Test GraphRAG only
uv run python 3_query_knowledge_graph.py

# Test Naive RAG only
uv run python 4_naive_rag_cv.py

# Generate ground truth only
uv run python utils/generate_ground_truth.py

🤝 Real-World Applications

This approach applies to any domain with:

Structured relationships between entities
Precise counting/filtering requirements
Multi-hop reasoning needs
Complex business queries

Examples: Staffing, inventory management, medical records, financial risk analysis.
