# Graph RAG with Neo4j Aura and LangChain

This notebook shows an **end-to-end example** of how to build a **Graph RAG** (Graph-based Retrieval-Augmented Generation) system using:

- **Neo4j Aura** as a managed graph database
- **LangChain** for LLM + graph tools
- **Groq** (ChatGroq) as the LLM provider

We use a **DevOps / incident management** scenario:

- Services (e.g. `checkout-service`, `payment-service`)
- Incidents with severities (P1, P2, etc.)
- On-call engineers who handled incidents

You will learn:

1. How to connect Colab to **Neo4j Aura**
2. How to create an **incident knowledge graph**
3. How to use an LLM to **extract a graph** from unstructured incident text
4. How to ask **natural language questions** and let the LLM generate **Cypher** to query Neo4j

## 1. Install dependencies

This cell installs the Python packages we need:

- `langchain`, `langchain-community`, `langchain-experimental` – core LangChain + extra utilities
- `langchain-groq` – LangChain integration for Groq's LLMs
- `neo4j` – official Neo4j Python driver

We use `!` to run a shell command from the notebook, and `--quiet` to keep the output clean.

In [None]:
!pip install --upgrade --quiet \
  langchain langchain-community langchain-experimental langchain-groq neo4j

## 2. Configure Neo4j Aura connection

In this section, we configure how the notebook connects to Neo4j Aura.

You need three values from your **Neo4j Aura** instance:

- `NEO4J_URI` – the connection URL (starts with `neo4j+s://...`)
- `NEO4J_USERNAME` – usually `neo4j` by default
- `NEO4J_PASSWORD` – the password you chose when creating the database

For safety, you should **not commit real passwords** to GitHub.  
When publishing, replace the real password with a placeholder (as we do here).

In [None]:
import os

# TODO: replace the placeholder values with your real Aura connection details
NEO4J_URI = "neo4j+s://xxxxxxxx.databases.neo4j.io"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "xxxxxxxxxxxxxx"

# Store these in environment variables so that libraries can read them easily
os.environ["NEO4J_URI"] = NEO4J_URI
os.environ["NEO4J_USERNAME"] = NEO4J_USERNAME
os.environ["NEO4J_PASSWORD"] = NEO4J_PASSWORD

## 3. Configure Groq (LLM) API key

We will use **Groq** as the LLM provider through `langchain-groq`.

1. Create an account at Groq and generate an API key.
2. Paste your key into the `GROQ_API_KEY` variable below (or use a Colab secret).

Again, do **not** commit real keys to GitHub. Use environment variables or Colab secrets in real projects.

In [None]:
# TODO: replace the placeholder with your real Groq API key
GROQ_API_KEY = "gsk_xxxxxxxxxxxxxx"

os.environ["GROQ_API_KEY"] = GROQ_API_KEY
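To avoid pasting secrets into cells at all, you can read them from the environment and prompt only when they are missing. This is a minimal sketch. `load_secret` is a hypothetical helper, and in Colab you could fill the environment from the Secrets panel instead of typing values.

```python
import os
from getpass import getpass


def load_secret(name: str) -> str:
    """Return a secret from the environment, prompting only if it is missing.

    getpass hides the input, so the value never appears in the saved notebook.
    """
    value = os.environ.get(name)
    if not value:
        value = getpass(f"{name}: ")
        os.environ[name] = value  # later cells and libraries can read it from here
    return value


# Usage (instead of hard-coding the values above):
# GROQ_API_KEY = load_secret("GROQ_API_KEY")
# NEO4J_PASSWORD = load_secret("NEO4J_PASSWORD")
```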

## 4. Initialize Neo4j client and LLM

Here we:

1. Create a `Neo4jGraph` object so LangChain can talk to Neo4j.
2. Create a `ChatGroq` LLM instance.

We will use:

- `Neo4jGraph` to run Cypher and inspect the schema.
- `ChatGroq` as the brain that:
  - Helps extract graphs from text.
  - Translates natural language questions into Cypher.

In [None]:
from langchain_community.graphs import Neo4jGraph
from langchain_groq import ChatGroq

# Initialize the Neo4j graph connection
graph = Neo4jGraph(
    url=NEO4J_URI,
    username=NEO4J_USERNAME,
    password=NEO4J_PASSWORD,
)

# Initialize the LLM from Groq
llm = ChatGroq(
    groq_api_key=GROQ_API_KEY,
    model_name="llama-3.3-70b-versatile",  # Groq retires models over time; check its model list if this one is unavailable
    temperature=0.0,
)

graph, llm

(<langchain_community.graphs.neo4j_graph.Neo4jGraph at 0x7e26355f2150>,
 ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x7e263547ba10>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x7e262fe1f020>, model_name='llama-3.3-70b-versatile', temperature=1e-08, model_kwargs={}, groq_api_key=SecretStr('**********')))

## 5. Our realistic scenario: DevOps incident management

Imagine we are an SRE / DevOps team.  
We want to answer questions like:

- *"Which services had P1 incidents?"*  
- *"Who handled incident INC-003?"*  
- *"Which engineer has handled the most incidents?"*  

This is a good fit for a **knowledge graph** with entities and relationships like:

- `(:Service {name})`
- `(:Incident {id, severity, summary, started_at, ended_at})`
- `(:Engineer {name})`
- `(:Service)-[:HAS_INCIDENT]->(:Incident)`
- `(:Incident)-[:HANDLED_BY]->(:Engineer)`
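To make the model concrete, the plain-Python sketch below uses no Neo4j at all. It stores the same services, incidents, and engineers that section 6 inserts as edge lists, then answers the three questions by following edges. It only illustrates the traversal each question implies. The notebook itself answers them with Cypher, and the function names here are hypothetical.

```python
from collections import Counter

# (:Service)-[:HAS_INCIDENT]->(:Incident)
has_incident = [
    ("checkout-service", "INC-001"),
    ("payment-service", "INC-002"),
    ("user-service", "INC-003"),
]
# (:Incident)-[:HANDLED_BY]->(:Engineer)
handled_by = [
    ("INC-001", "Alice"),
    ("INC-002", "Bob"),
    ("INC-003", "Alice"),
]
# Incident.severity property
severity = {"INC-001": "P1", "INC-002": "P2", "INC-003": "P1"}


def services_with_severity(level):
    """Follow HAS_INCIDENT edges and keep services whose incident has `level`."""
    return sorted({svc for svc, inc in has_incident if severity[inc] == level})


def handlers_of(incident_id):
    """Follow HANDLED_BY edges out of one incident."""
    return [eng for inc, eng in handled_by if inc == incident_id]


def busiest_engineer():
    """Count HANDLED_BY edges per engineer and return the top (name, count)."""
    return Counter(eng for _, eng in handled_by).most_common(1)[0]


print(services_with_severity("P1"))  # ['checkout-service', 'user-service']
print(handlers_of("INC-003"))        # ['Alice']
print(busiest_engineer())            # ('Alice', 2)
```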

## 6. Create a small incident knowledge graph in Neo4j

We now create a tiny dataset directly in Neo4j using a Cypher query.

This query will:

- Create a few **services** (e.g. `checkout-service`, `payment-service`, `user-service`)
- Create some **incidents** (with `id`, `severity`, `summary`, and start/end timestamps)
- Create **engineers** (e.g. `Alice`, `Bob`, `Carlos`)
- Connect them with relationships:
  - `(:Service)-[:HAS_INCIDENT]->(:Incident)`
  - `(:Incident)-[:HANDLED_BY]->(:Engineer)`

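Before inserting, the cell runs two cleanup queries. They delete all `Engineer` nodes and every `Service` that has incidents, along with those incidents, so use a database dedicated to this demo. The insert itself uses `MERGE` instead of `CREATE`, which matches existing nodes and relationships rather than duplicating them.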

In [None]:
cleanup_query_1 = """
MATCH (n:Service)-[r1:HAS_INCIDENT]->(i:Incident)
DETACH DELETE n, i
"""

cleanup_query_2 = """
MATCH (e:Engineer)
DETACH DELETE e
"""

data_creation_query = """
// Create services
MERGE (s1:Service {name: "checkout-service"})
MERGE (s2:Service {name: "payment-service"})
MERGE (s3:Service {name: "user-service"})

// Create engineers
MERGE (e1:Engineer {name: "Alice"})
MERGE (e2:Engineer {name: "Bob"})
MERGE (e3:Engineer {name: "Carlos"})

// Create incidents
MERGE (i1:Incident {
  id: "INC-001",
  severity: "P1",
  summary: "Checkout failures for EU customers",
  started_at: datetime("2024-08-01T09:15:00Z"),
  ended_at: datetime("2024-08-01T09:45:00Z")
})
MERGE (i2:Incident {
  id: "INC-002",
  severity: "P2",
  summary: "Intermittent payment timeouts",
  started_at: datetime("2024-08-05T11:00:00Z"),
  ended_at: datetime("2024-08-05T12:30:00Z")
})
MERGE (i3:Incident {
  id: "INC-003",
  severity: "P1",
  summary: "User profile service returning 500 errors",
  started_at: datetime("2024-08-10T15:20:00Z"),
  ended_at: datetime("2024-08-10T16:05:00Z")
})

// Connect services to incidents
MERGE (s1)-[:HAS_INCIDENT]->(i1)
MERGE (s2)-[:HAS_INCIDENT]->(i2)
MERGE (s3)-[:HAS_INCIDENT]->(i3)

// Connect incidents to engineers who handled them
MERGE (i1)-[:HANDLED_BY]->(e1)
MERGE (i2)-[:HANDLED_BY]->(e2)
MERGE (i3)-[:HANDLED_BY]->(e1)
"""

# Run the Cypher queries against Neo4j
graph.query(cleanup_query_1)
graph.query(cleanup_query_2)
graph.query(data_creation_query)

[]
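The query above hard-codes every value. It also merges each incident on its full property map, so without the cleanup step, editing one property (for example the summary) and re-running would create a second `INC-001` node. A common alternative sends the rows as a query parameter and `UNWIND`s them. It then `MERGE`s on the key property only (`id` or `name`) and `SET`s the rest. This is a sketch under the assumption that your data arrives as Python records. `INCIDENT_UPSERT` and `build_incident_params` are hypothetical names.

```python
# Rows arrive as the $incidents parameter; the driver handles quoting.
INCIDENT_UPSERT = """
UNWIND $incidents AS row
MERGE (s:Service {name: row.service})
MERGE (i:Incident {id: row.id})
SET i.severity = row.severity,
    i.summary = row.summary,
    i.started_at = datetime(row.started_at),
    i.ended_at = datetime(row.ended_at)
MERGE (e:Engineer {name: row.engineer})
MERGE (s)-[:HAS_INCIDENT]->(i)
MERGE (i)-[:HANDLED_BY]->(e)
"""

KEYS = ("service", "id", "severity", "summary", "started_at", "ended_at", "engineer")


def build_incident_params(records):
    """Turn tuples ordered like KEYS into the {'incidents': [...]} parameter map."""
    return {"incidents": [dict(zip(KEYS, rec)) for rec in records]}


params = build_incident_params([
    ("checkout-service", "INC-001", "P1", "Checkout failures for EU customers",
     "2024-08-01T09:15:00Z", "2024-08-01T09:45:00Z", "Alice"),
])
# With a live connection: graph.query(INCIDENT_UPSERT, params)
```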

## 7. Inspect the graph schema

Now we ask Neo4j (through LangChain's `Neo4jGraph`) to:

1. Refresh the schema (scan labels, relationships, and properties).
2. Print the schema so we can see what the graph looks like.

This schema will later help the LLM generate better Cypher queries.

In [None]:
graph.refresh_schema()
print(graph.schema)

Node properties:
Service {name: STRING}
Engineer {name: STRING}
Incident {severity: STRING, summary: STRING, started_at: DATE_TIME, id: STRING, ended_at: DATE_TIME}
Relationship properties:

The relationships:
(:Service)-[:HAS_INCIDENT]->(:Incident)
(:Incident)-[:HANDLED_BY]->(:Engineer)
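The schema is plain text, so you can also inspect it programmatically. This is a small sketch that assumes the `(:A)-[:R]->(:B)` line format printed above, which may differ between LangChain versions. It extracts relationship triples, which are useful later for checking generated Cypher. Recent versions of `Neo4jGraph` also expose `graph.structured_schema`, which may be a more robust source. `relationship_triples` is a hypothetical helper.

```python
import re

# The relationship lines printed by graph.schema above, copied as a string
SCHEMA_TEXT = """
The relationships:
(:Service)-[:HAS_INCIDENT]->(:Incident)
(:Incident)-[:HANDLED_BY]->(:Engineer)
"""

# Matches lines shaped like (:Source)-[:TYPE]->(:Target)
REL_PATTERN = re.compile(r"\(:(\w+)\)-\[:(\w+)\]->\(:(\w+)\)")


def relationship_triples(schema_text):
    """Return (source_label, rel_type, target_label) for every relationship line."""
    return REL_PATTERN.findall(schema_text)


print(relationship_triples(SCHEMA_TEXT))
# [('Service', 'HAS_INCIDENT', 'Incident'), ('Incident', 'HANDLED_BY', 'Engineer')]
```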


## 8. From unstructured incident text to a graph (Graph RAG idea)

In many real systems, incident knowledge does not arrive as neat, structured rows.  
We usually have **incident reports**, **postmortems**, and **Slack messages**.

Here we:

1. Create a short **incident report** as plain text.
2. Wrap it in a LangChain `Document`.
3. Use `LLMGraphTransformer` to **extract entities and relationships** from the text.

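This gives us a **graph view** of an unstructured document. Building such a graph from text is the first half of **Graph RAG**. The retrieval half then queries the graph to ground the LLM's answers.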

In [None]:
from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer

incident_report_text = """On August 10th, 2024, the user-service experienced a P1 incident.
Users were receiving HTTP 500 errors when viewing their profile page.
The on-call engineer Carlos investigated but later handed over to Alice,
who rolled back a faulty deployment and restored the service.
The root cause was a misconfigured feature flag affecting user-service reads.
"""

documents = [Document(page_content=incident_report_text)]

# Use the LLM to transform text into graph-structured data
graph_transformer = LLMGraphTransformer(llm=llm)
graph_documents = graph_transformer.convert_to_graph_documents(documents)

graph_documents

[GraphDocument(nodes=[Node(id='P1 Incident', type='Incident', properties={}), Node(id='User-Service', type='Service', properties={}), Node(id='Http 500 Errors', type='Error', properties={}), Node(id='Carlos', type='Person', properties={}), Node(id='Alice', type='Person', properties={}), Node(id='Faulty Deployment', type='Deployment', properties={}), Node(id='Feature Flag', type='Feature', properties={})], relationships=[Relationship(source=Node(id='User-Service', type='Service', properties={}), target=Node(id='P1 Incident', type='Incident', properties={}), type='EXPERIENCED', properties={}), Relationship(source=Node(id='Users', type='Person', properties={}), target=Node(id='Http 500 Errors', type='Error', properties={}), type='RECEIVED', properties={}), Relationship(source=Node(id='Carlos', type='Person', properties={}), target=Node(id='On-Call', type='Role', properties={}), type='INVESTIGATED', properties={}), Relationship(source=Node(id='Alice', type='Person', properties={}), target=

## 9. Inspect extracted nodes and relationships

`graph_documents` contains the LLM-extracted graph representation of the incident report.

We will:

- Look at the **nodes** (entities)
- Look at the **relationships** between those entities

This step helps you visually understand what the LLM "saw" in the text.

In [None]:
first_graph_doc = graph_documents[0]

print("=== Nodes ===")
for node in first_graph_doc.nodes:
    print(node)

print("\n=== Relationships ===")
for rel in first_graph_doc.relationships:
    print(rel)

=== Nodes ===
id='P1 Incident' type='Incident' properties={}
id='User-Service' type='Service' properties={}
id='Http 500 Errors' type='Error' properties={}
id='Carlos' type='Person' properties={}
id='Alice' type='Person' properties={}
id='Faulty Deployment' type='Deployment' properties={}
id='Feature Flag' type='Feature' properties={}

=== Relationships ===
source=Node(id='User-Service', type='Service', properties={}) target=Node(id='P1 Incident', type='Incident', properties={}) type='EXPERIENCED' properties={}
source=Node(id='Users', type='Person', properties={}) target=Node(id='Http 500 Errors', type='Error', properties={}) type='RECEIVED' properties={}
source=Node(id='Carlos', type='Person', properties={}) target=Node(id='On-Call', type='Role', properties={}) type='INVESTIGATED' properties={}
source=Node(id='Alice', type='Person', properties={}) target=Node(id='Faulty Deployment', type='Deployment', properties={}) type='ROLLED_BACK' properties={}
source=Node(id='Feature Flag', type=
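Extraction output is worth checking before you use it. In the result above, two relationships point at ids (`Users`, `On-Call`) that are missing from the node list. Carlos also "INVESTIGATED" the on-call role rather than the incident. The minimal stdlib sketch below implements one such check, using ids copied from the visible part of the output (it was truncated). With the real object you would build the same inputs from `first_graph_doc.nodes` (`n.id`) and `first_graph_doc.relationships` (`r.source.id`, `r.type`, `r.target.id`). `dangling_endpoints` is a hypothetical helper.

```python
# Node ids and relationships copied from the printed output above
node_ids = {
    "P1 Incident", "User-Service", "Http 500 Errors", "Carlos",
    "Alice", "Faulty Deployment", "Feature Flag",
}
relationships = [
    ("User-Service", "EXPERIENCED", "P1 Incident"),
    ("Users", "RECEIVED", "Http 500 Errors"),
    ("Carlos", "INVESTIGATED", "On-Call"),
    ("Alice", "ROLLED_BACK", "Faulty Deployment"),
]


def dangling_endpoints(node_ids, relationships):
    """Return endpoint ids used by relationships but missing from the node list."""
    missing = set()
    for source, _rel_type, target in relationships:
        for endpoint in (source, target):
            if endpoint not in node_ids:
                missing.add(endpoint)
    return sorted(missing)


print(dangling_endpoints(node_ids, relationships))  # ['On-Call', 'Users']
```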

> Note: In a real production system, the next step would be to **persist these extracted nodes and relationships into Neo4j**.
>
> For this beginner-friendly demo we focus on:
> - Understanding the extraction step
> - Asking questions against the **structured incident graph** we manually inserted earlier.
>
> Combining both steps (auto-ingesting extracted graphs into Neo4j) is a natural next improvement.
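For the persisting step, LangChain's `Neo4jGraph.add_graph_documents(graph_documents)` can write extracted graphs directly. However, it stores labels and ids exactly as extracted, so Carlos would land in Neo4j as a `Person` node, separate from the `Engineer` node created in section 6. This is a minimal sketch of a normalization pass, assuming you only want to keep entities that fit the section 6 schema. `LABEL_MAP`, `MERGE_BY_LABEL`, and `to_merge_rows` are hypothetical names.

```python
# Map extracted node types onto section 6 labels; unmapped types are skipped.
LABEL_MAP = {"Service": "Service", "Person": "Engineer"}

# Labels cannot be Cypher parameters, so keep one fixed query per label.
MERGE_BY_LABEL = {
    "Service": "MERGE (:Service {name: $name})",
    "Engineer": "MERGE (:Engineer {name: $name})",
}


def to_merge_rows(nodes):
    """Map extracted (id, type) pairs onto schema labels and property values."""
    rows = []
    for node_id, node_type in nodes:
        label = LABEL_MAP.get(node_type)
        if label is None:
            continue  # e.g. 'Incident' (no INC id), 'Error', 'Deployment'
        # Services in section 6 are lowercase ('user-service', not 'User-Service')
        name = node_id.lower() if label == "Service" else node_id
        rows.append({"label": label, "name": name})
    return rows


# (id, type) pairs copied from the printed nodes; with the real object use
# [(n.id, n.type) for n in first_graph_doc.nodes]
extracted = [
    ("P1 Incident", "Incident"), ("User-Service", "Service"),
    ("Http 500 Errors", "Error"), ("Carlos", "Person"), ("Alice", "Person"),
    ("Faulty Deployment", "Deployment"), ("Feature Flag", "Feature"),
]

for row in to_merge_rows(extracted):
    print(row)
    # With a live connection: graph.query(MERGE_BY_LABEL[row["label"]], {"name": row["name"]})
```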

## 10. Build a GraphCypherQAChain (NL → Cypher → Neo4j → Answer)

Now we build a `GraphCypherQAChain`:

- Input: **natural language question** (`"Which services had P1 incidents?"`)
- Internal steps:
  1. LLM looks at the **graph schema**.
  2. LLM generates a **Cypher query**.
  3. The query runs against Neo4j.
  4. LLM turns the raw result rows into a **friendly natural language answer**.

We set `verbose=True` so we can see the generated Cypher for learning purposes.
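The four steps above can be sketched as plain function composition. This is not LangChain's implementation. `graph_qa` and the lambdas are hypothetical stubs standing in for ChatGroq and Neo4j, so the data flow runs offline. The sketch also makes one design choice: it answers an empty result in code, whereas the notebook otherwise leaves that to the QA prompt.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def graph_qa(
    question: str,
    schema: str,
    generate_cypher: Callable[[str, str], str],
    run_query: Callable[[str], List[Row]],
    summarize: Callable[[List[Row], str], str],
) -> Dict[str, str]:
    """Hypothetical stand-in for the NL -> Cypher -> Neo4j -> answer flow."""
    cypher = generate_cypher(schema, question)  # steps 1-2: schema + question -> Cypher
    rows = run_query(cypher)                    # step 3: execute against the graph
    if not rows:                                # empty result: no LLM call needed
        answer = "I don't know based on the data provided."
    else:
        answer = summarize(rows, question)      # step 4: rows -> natural language
    return {"query": question, "cypher": cypher, "result": answer}


demo = graph_qa(
    "Which services had P2 incidents?",
    schema="(:Service)-[:HAS_INCIDENT]->(:Incident)",
    generate_cypher=lambda s, q: ("MATCH (s:Service)-[:HAS_INCIDENT]->(i:Incident) "
                                  "WHERE i.severity = 'P2' RETURN DISTINCT s.name"),
    run_query=lambda c: [{"s.name": "payment-service"}],
    summarize=lambda rows, q: "The payment-service had P2 incidents.",
)
print(demo["result"])  # The payment-service had P2 incidents.
```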

In [None]:
from langchain_core.prompts import PromptTemplate
from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain

qa_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "You are an assistant that turns Neo4j query results into natural language answers.\n\n"
        "READ THIS CAREFULLY:\n"
        "The `context` shown below is the EXACT and COMPLETE output of the Cypher query\n"
        "that already answers the question. You must ONLY base your answer on these rows.\n\n"
        "context:\n{context}\n\n"
        "question:\n{question}\n\n"
        "RULES (follow exactly):\n"
        "1. If `context` is an empty list `[]`, answer exactly:\n"
        "   I don't know based on the data provided.\n"
        "2. If `context` is NOT empty, you MUST answer using the values inside it.\n"
        "3. NEVER say the data is missing, incomplete, or does not mention something.\n"
        "4. NEVER claim you don't know when `context` is NOT empty.\n"
        "5. NEVER reinterpret or doubt the meaning of the context. It is ALWAYS correct.\n"
        "6. Provide a CLEAR and direct answer based solely on the rows.\n\n"
        "Now provide the final answer:"

    ),
)

qa_chain = GraphCypherQAChain.from_llm(
    llm=llm,
    graph=graph,
    verbose=True,
    qa_prompt=qa_prompt,
    allow_dangerous_requests=True,  # explicit opt-in: the chain runs LLM-generated Cypher against your database
)

qa_chain

GraphCypherQAChain(verbose=True, graph=<langchain_community.graphs.neo4j_graph.Neo4jGraph object at 0x7e26355f2150>, cypher_generation_chain=LLMChain(verbose=False, prompt=PromptTemplate(input_variables=['question', 'schema'], input_types={}, partial_variables={}, template='Task:Generate Cypher statement to query a graph database.\nInstructions:\nUse only the provided relationship types and properties in the schema.\nDo not use any other relationship types or properties that are not provided.\nSchema:\n{schema}\nNote: Do not include any explanations or apologies in your responses.\nDo not respond to any questions that might ask anything else than for you to construct a Cypher statement.\nDo not include any text except the generated Cypher statement.\n\nThe question is:\n{question}'), llm=ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x7e263547ba10>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x7e262fe1f020>, model_name='llama-3.3-7

## 11. Ask natural language questions about incidents

Now we are ready to ask questions in plain English and let the LLM:

1. Translate them into Cypher
2. Query Neo4j
3. Summarize the results

We will test a few examples.

In [None]:
# 11.1 Which services had P2 incidents?
response_1 = qa_chain.invoke({
    "query": "Which services had P2 incidents?"
})
response_1



[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (s:Service)-[:HAS_INCIDENT]->(i:Incident) 
WHERE i.severity = 'P2' 
RETURN DISTINCT s.name[0m
Full Context:
[32;1m[1;3m[{'s.name': 'payment-service'}][0m

[1m> Finished chain.[0m


{'query': 'Which services had P2 incidents?',
 'result': 'The payment-service had P2 incidents.'}

In [None]:
# 11.2 Who handled incident INC-003?
response_2 = qa_chain.invoke({
    "query": "Who handled incident INC-003?"
})
response_2



[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (i:Incident {id: 'INC-003'})-[:HANDLED_BY]->(e:Engineer) RETURN e.name[0m
Full Context:
[32;1m[1;3m[{'e.name': 'Alice'}][0m

[1m> Finished chain.[0m


{'query': 'Who handled incident INC-003?',
 'result': 'Alice handled incident INC-003.'}

In [None]:
# 11.3 List all incidents handled by Alice with their severity.
response_3 = qa_chain.invoke({
    "query": "List all incidents handled by Alice with their severity."
})
response_3



[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (e:Engineer {name: 'Alice'})-[:HANDLED_BY]->(i:Incident) RETURN i.severity, i.id[0m
Full Context:
[32;1m[1;3m[][0m

[1m> Finished chain.[0m


{'query': 'List all incidents handled by Alice with their severity.',
 'result': "I don't know based on the data provided."}
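Why did 11.3 return nothing? The generated Cypher walks `(e:Engineer)-[:HANDLED_BY]->(i:Incident)`, but the schema stores the relationship as `(:Incident)-[:HANDLED_BY]->(:Engineer)`. The pattern therefore matches no rows, and the QA prompt correctly answers that it does not know. `GraphCypherQAChain.from_llm` accepts `validate_cypher=True`, which attempts to correct relationship directions using the schema. The sketch below is a deliberately simplified, regex-based direction check written for illustration. It is not LangChain's corrector, and `check_directions` is a hypothetical name.

```python
import re

# Allowed (source, type, target) triples from the schema printed in section 7
SCHEMA = {
    ("Service", "HAS_INCIDENT", "Incident"),
    ("Incident", "HANDLED_BY", "Engineer"),
}

# One right-pointing hop: (var:Label ...)-[:TYPE]->(var:Label. The target node
# sits in a lookahead so chained hops are all found. Simplified: ignores
# left-pointing arrows, multiple labels and variable-length paths.
HOP = re.compile(r"\(\w*:(\w+)[^)]*\)-\[:(\w+)\]->(?=\(\w*:(\w+))")


def check_directions(cypher):
    """Return a message for every hop that contradicts the schema."""
    problems = []
    for src, rel, dst in HOP.findall(cypher):
        if (src, rel, dst) in SCHEMA:
            continue
        if (dst, rel, src) in SCHEMA:
            problems.append(f"{rel} points the wrong way: expected ({dst})-[:{rel}]->({src})")
        else:
            problems.append(f"({src})-[:{rel}]->({dst}) is not in the schema")
    return problems


generated = ("MATCH (e:Engineer {name: 'Alice'})-[:HANDLED_BY]->(i:Incident) "
             "RETURN i.severity, i.id")
print(check_directions(generated))
# ['HANDLED_BY points the wrong way: expected (Incident)-[:HANDLED_BY]->(Engineer)']
```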

In [None]:
# 11.4 For each service, list its incidents and the engineers who handled them.
response_4 = qa_chain.invoke({
    "query": "For each service, list its incidents and the engineers who handled them."
})
response_4



[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mcypher
MATCH (s:Service)-[:HAS_INCIDENT]->(i:Incident)-[:HANDLED_BY]->(e:Engineer)
RETURN s.name AS Service, collect(i.id) AS Incidents, collect(e.name) AS Engineers
[0m
Full Context:
[32;1m[1;3m[{'Service': 'checkout-service', 'Incidents': ['INC-001'], 'Engineers': ['Alice']}, {'Service': 'payment-service', 'Incidents': ['INC-002'], 'Engineers': ['Bob']}, {'Service': 'user-service', 'Incidents': ['INC-003'], 'Engineers': ['Alice']}][0m

[1m> Finished chain.[0m


{'query': 'For each service, list its incidents and the engineers who handled them.',
 'result': 'The checkout-service had incidents INC-001, which were handled by Alice. \nThe payment-service had incidents INC-002, which were handled by Bob. \nThe user-service had incidents INC-003, which were handled by Alice.'}
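The generated query in 11.4 returns incidents and engineers as two parallel lists, so the pairing is only implied by position. An incident handled by two engineers would also appear twice in `Incidents`. If you write this query by hand, collecting one map per pair keeps the association explicit. The sketch below shows that Cypher as a string, plus the equivalent grouping in Python for rows fetched without aggregation. `PAIRED_QUERY` and `group_flat_rows` are hypothetical names.

```python
from collections import defaultdict

# One map per (incident, engineer) pair, grouped by service
PAIRED_QUERY = """
MATCH (s:Service)-[:HAS_INCIDENT]->(i:Incident)-[:HANDLED_BY]->(e:Engineer)
RETURN s.name AS service, collect({incident: i.id, engineer: e.name}) AS handled
ORDER BY service
"""


def group_flat_rows(rows):
    """Group flat {service, incident, engineer} rows per service, keeping pairs intact."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["service"]].append(
            {"incident": row["incident"], "engineer": row["engineer"]}
        )
    return dict(grouped)


flat = [
    {"service": "checkout-service", "incident": "INC-001", "engineer": "Alice"},
    {"service": "payment-service", "incident": "INC-002", "engineer": "Bob"},
    {"service": "user-service", "incident": "INC-003", "engineer": "Alice"},
]
print(group_flat_rows(flat)["user-service"])  # [{'incident': 'INC-003', 'engineer': 'Alice'}]
```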

## 12. (Optional) Manual Cypher examples

Sometimes you want to write Cypher by hand to debug or explore the data.

Below are some manual queries you can run.  
These do not use the LLM; they are pure Neo4j Cypher.

In [None]:
# All services and their incidents
graph.query("""
MATCH (s:Service)-[:HAS_INCIDENT]->(i:Incident)
RETURN s.name AS service, i.id AS incident_id, i.severity AS severity, i.summary AS summary
ORDER BY service, incident_id
""")

In [None]:
# Count incidents per engineer
graph.query("""
MATCH (i:Incident)-[:HANDLED_BY]->(e:Engineer)
RETURN e.name AS engineer, count(i) AS incident_count
ORDER BY incident_count DESC
""")
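When part of a manual query comes from user input, such as a severity or a service name, pass it as a parameter rather than formatting it into the string. `Neo4jGraph.query` takes a `params` dictionary as its second argument. Below is a sketch with a hypothetical helper, `incidents_by_severity`, and an offline `FakeGraph` stand-in so it runs without Aura.

```python
# $severity is filled in by the database driver, not by string formatting
SEVERITY_QUERY = """
MATCH (s:Service)-[:HAS_INCIDENT]->(i:Incident {severity: $severity})
RETURN s.name AS service, i.id AS incident_id
ORDER BY service
"""


def incidents_by_severity(graph, severity):
    """`graph` is anything with a .query(query, params) method, e.g. Neo4jGraph."""
    return graph.query(SEVERITY_QUERY, {"severity": severity})


class FakeGraph:
    """Offline stand-in that records each call and returns canned rows."""

    def __init__(self, rows):
        self.rows, self.calls = rows, []

    def query(self, query, params=None):
        self.calls.append((query, params))
        return self.rows


fake = FakeGraph([{"service": "checkout-service", "incident_id": "INC-001"}])
print(incidents_by_severity(fake, "P1"))
# With the real connection: incidents_by_severity(graph, "P1")
```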

## 13. Summary and next steps

In this notebook you:

1. Connected Colab to **Neo4j Aura**.
2. Created a small **DevOps incident knowledge graph**.
3. Used an LLM to **extract a graph** from an unstructured incident report (Graph RAG idea).
4. Built a **GraphCypherQAChain** to:
   - Convert natural language questions into Cypher.
   - Query Neo4j.
   - Generate friendly answers.

**Ideas to extend this demo:**

- Ingest **real incident postmortems** from your organization.
- Automatically **persist** extracted nodes and relationships into Neo4j.
- Combine **vector search (embeddings)** and **graph queries** for hybrid RAG.
- Add more entities: teams, runbooks, on-call rotations, services in multiple regions, etc.