# Understanding Knowledge Graphs for GraphRAG

In this notebook, we'll explore how to build and structure knowledge graphs for effective GraphRAG applications. We'll cover:

1. Understanding knowledge graph concepts
2. Different approaches to graph modeling
3. Using the Neo4j LLM Knowledge Graph Builder
4. Best practices for GraphRAG data modeling

In [None]:
from neo4j import GraphDatabase
from dotenv import load_dotenv
import os
import openai
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.embedder import OpenAIEmbedder
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline

# Load environment variables and connect
load_dotenv()
driver = GraphDatabase.driver(
    os.getenv('NEO4J_URI'),
    auth=(os.getenv('NEO4J_USERNAME'), os.getenv('NEO4J_PASSWORD'))
)

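# Initialize OpenAI components
openai.api_key = os.getenv('OPENAI_API_KEY')
# OpenAILLM requires a model name; "gpt-4o" is an assumption, use any chat model you have access to.
llm = OpenAILLM(
    model_name="gpt-4o",
    model_params={"temperature": 0, "response_format": {"type": "json_object"}},
)
embedder = OpenAIEmbedder()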

## 1. Knowledge Graph Concepts

A knowledge graph is a structured representation of information where:
- Nodes represent entities (e.g., products, customers, documents)
- Relationships connect these entities (e.g., PURCHASED, CONTAINS, MENTIONS)
- Properties store attributes about entities and relationships

In GraphRAG, knowledge graphs serve two key purposes:
1. Providing structured context for LLM queries
2. Enabling graph-based retrieval patterns
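
To make the three building blocks concrete, here is a minimal, library-free sketch of a property graph in plain Python. The `Node` and `Relationship` classes and the `neighbors` helper are hypothetical names used only for illustration; they are not part of Neo4j or `neo4j-graphrag`. The lookup at the end mimics, in a very simplified way, how a graph-based retrieval step could gather context around an entity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                   # entity type, e.g. "Product"
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str                                   # key of the start node
    type: str                                    # e.g. "PURCHASED"
    end: str                                     # key of the end node
    properties: dict = field(default_factory=dict)


# A tiny graph: a customer purchased a product, and a document mentions it.
nodes = {
    "c1": Node("Customer", {"name": "John Doe"}),
    "p1": Node("Product", {"name": "Laptop"}),
    "d1": Node("Document", {"title": "Laptop review"}),
}
relationships = [
    Relationship("c1", "PURCHASED", "p1", {"quantity": 1}),
    Relationship("d1", "MENTIONS", "p1"),
]


def neighbors(node_key, rel_type=None):
    """Return (relationship type, neighbor node) pairs touching node_key,
    in either direction, optionally filtered by relationship type."""
    found = []
    for rel in relationships:
        if rel_type is not None and rel.type != rel_type:
            continue
        if rel.start == node_key:
            found.append((rel.type, nodes[rel.end]))
        elif rel.end == node_key:
            found.append((rel.type, nodes[rel.start]))
    return found


# Everything one hop away from the laptop could serve as context for an LLM.
context = neighbors("p1")
```

In a real application the graph lives in Neo4j and the one-hop lookup becomes a Cypher `MATCH`; the sketch only shows the shape of the data.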

## 2. Graph Modeling Approaches

There are two main approaches to building knowledge graphs for GraphRAG:

### 2.1 Domain Graph (Structured Data)
- Built from existing structured data
- Clear schema and relationships
- Example: Customer-Product-Order graph

In [None]:
def create_domain_graph_example():
    with driver.session() as session:
        # Clear existing data (WARNING: deletes every node and relationship in the database)
        session.run("MATCH (n) DETACH DELETE n")
        
        # Create example domain graph
        session.run("""
        CREATE (c:Customer {name: 'John Doe', id: '123'})
        CREATE (p1:Product {name: 'Laptop', id: 'P1', description: 'High-performance laptop'})
        CREATE (p2:Product {name: 'Mouse', id: 'P2', description: 'Wireless mouse'})
        CREATE (o:Order {id: 'O1', date: date()})
        CREATE (c)-[:PLACED_ORDER]->(o)
        CREATE (o)-[:CONTAINS]->(p1)
        CREATE (o)-[:CONTAINS]->(p2)
        """)

create_domain_graph_example()
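
Because a domain graph has a clear schema, it can be queried with a fixed traversal. The sketch below follows the Customer, Order, and Product path created above. The helper name `products_for_customer` is hypothetical; it assumes a `session` object like the one returned by `driver.session()`, whose `run` method accepts query parameters as keyword arguments.

```python
def products_for_customer(session, customer_id):
    """Return the names of products contained in orders placed by a customer."""
    query = """
    MATCH (c:Customer {id: $customer_id})-[:PLACED_ORDER]->(:Order)-[:CONTAINS]->(p:Product)
    RETURN DISTINCT p.name AS name
    ORDER BY name
    """
    # Pass the id as a parameter instead of formatting it into the query string.
    result = session.run(query, customer_id=customer_id)
    return [record["name"] for record in result]


# Usage, assuming the `driver` created earlier in this notebook:
# with driver.session() as session:
#     print(products_for_customer(session, "123"))
```

Passing `customer_id` as a parameter avoids Cypher injection and lets the database reuse the query plan.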

### 2.2 Lexical Graph (Unstructured Data)
- Built from documents and text
- Flexible schema
- Example: Document-Entity-Concept graph

In [None]:
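# Initialize SimpleKGPipeline for the lexical graph
# (newer neo4j-graphrag releases may prefer a `schema` argument over entities/relations)
pipeline = SimpleKGPipeline(
    driver=driver,
    llm=llm,
    embedder=embedder,
    entities=["Entity", "Concept"],
    relations=["RELATED_TO", "MENTIONS"],
    from_pdf=False,  # the input is raw text, not a path to a PDF file
)

# Example text to process
sample_text = """
The new laptop features a powerful processor and comes with a wireless mouse.
It's perfect for both gaming and professional work.
"""

# Process text to create the lexical graph.
# run_async is a coroutine: top-level await works in a notebook cell;
# in a plain script, wrap the call in asyncio.run(...).
await pipeline.run_async(text=sample_text)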

## 3. Using Neo4j LLM Knowledge Graph Builder

The code in this notebook uses `SimpleKGPipeline` from the `neo4j-graphrag` Python package as its knowledge graph builder. It helps you:
1. Extract entities and relationships from text
2. Create graph structures automatically
3. Maintain context between documents

In [None]:
# Configure SimpleKGPipeline with a custom schema
custom_pipeline = SimpleKGPipeline(
    driver=driver,
    llm=llm,
    embedder=embedder,
    entities=["Product", "Feature", "UseCase"],
    relations=["HAS_FEATURE", "SUITABLE_FOR"],
    from_pdf=False,  # the input is raw text, not a path to a PDF file
)

# Process product documentation
product_doc = """
Our laptop features 32GB RAM and an NVIDIA GPU.
It's ideal for video editing and 3D rendering.
"""

# run_async is a coroutine: await it in a notebook, or use asyncio.run(...) in a script
await custom_pipeline.run_async(text=product_doc)
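
A custom schema narrows what the LLM is asked to extract, but LLM output can still contain types you did not ask for. The following library-free sketch shows one way to check extracted triples against the entity and relation types listed above. The `Triple` type, the `conforms` helper, and the sample triples are hypothetical and are not part of `neo4j-graphrag`.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head_label: str   # label of the start node
    relation: str     # relationship type
    tail_label: str   # label of the end node


# The same types passed to the custom pipeline above.
ALLOWED_ENTITIES = {"Product", "Feature", "UseCase"}
ALLOWED_RELATIONS = {"HAS_FEATURE", "SUITABLE_FOR"}


def conforms(triple):
    """True if both node labels and the relationship type are in the schema."""
    return (
        triple.head_label in ALLOWED_ENTITIES
        and triple.relation in ALLOWED_RELATIONS
        and triple.tail_label in ALLOWED_ENTITIES
    )


# Made-up extraction output, used only to exercise the check.
extracted = [
    Triple("Product", "HAS_FEATURE", "Feature"),
    Triple("Product", "SUITABLE_FOR", "UseCase"),
    Triple("Product", "MADE_BY", "Company"),  # outside the schema
]

off_schema = [t for t in extracted if not conforms(t)]
```

Triples in `off_schema` can be logged or dropped, depending on how strict the graph needs to be.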

## 4. Best Practices for GraphRAG Data Modeling

When building knowledge graphs for GraphRAG:

1. **Start Simple**
   - Begin with core entities and relationships
   - Add complexity iteratively

2. **Maintain Context**
   - Connect related information
   - Preserve document sources

3. **Balance Structure**
   - Combine domain and lexical approaches
   - Use flexible schemas for unstructured data

4. **Think About Queries**
   - Design for intended use cases
   - Consider traversal patterns
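
As one possible illustration of "Maintain Context" and "Balance Structure", the sketch below attaches a text chunk to an existing domain `Product` node and records where the text came from. The `Chunk` label, the `source` property, and the helper name `link_chunk_to_product` are assumptions made for this example, not names required by Neo4j or `neo4j-graphrag`.

```python
def link_chunk_to_product(session, chunk_id, text, source, product_id):
    """Store a text chunk with its source and connect it to a domain Product node."""
    query = """
    MATCH (p:Product {id: $product_id})
    MERGE (ch:Chunk {id: $chunk_id})
    SET ch.text = $text, ch.source = $source
    MERGE (ch)-[:MENTIONS]->(p)
    """
    # MERGE keeps the operation idempotent: re-running it does not duplicate
    # the chunk or the relationship.
    session.run(
        query,
        chunk_id=chunk_id,
        text=text,
        source=source,
        product_id=product_id,
    )


# Usage, assuming the `driver` created earlier and a hypothetical file name:
# with driver.session() as session:
#     link_chunk_to_product(
#         session, "chunk-1", "Our laptop features 32GB RAM.", "product_doc.txt", "P1"
#     )
```

With the source stored on the chunk, an answer generated from this context can cite the document it came from, and a single traversal can move from unstructured text to structured order data.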

In [None]:
def explore_graph():
    with driver.session() as session:
        # Summarize node counts and outgoing relationship types per label combination
        result = session.run("""
        MATCH (n)
        OPTIONAL MATCH (n)-[r]->()
        RETURN
            labels(n) as node_types,
            count(DISTINCT n) as node_count,
            collect(DISTINCT type(r)) as relationship_types
        """)
        
        for record in result:
            print(f"Node type: {record['node_types']}")
            print(f"Count: {record['node_count']}")
            print(f"Relationships: {record['relationship_types']}\n")

explore_graph()

## Next Steps

Now that you understand how to build and structure knowledge graphs, move on to the next notebook to learn about document processing and vector search!