# Advanced GraphRAG for Agentic Retrieval
**Graph ML, Algorithms, and Structured Knowledge Flows**

Welcome to the workshop! In this notebook, we’ll:
- Load a pre-built GraphRAG graph in Neo4j Aura
- Create secondary relationships that reveal hidden chunk patterns
- Apply graph algorithms (PageRank, betweenness) to highlight retrieval anchors
- Blend in Bayesian link prediction to estimate the next-best chunk
- Rerank retrieval results for smarter LLM grounding
- Show why graph insights are essential for real-world agentic systems

In [None]:
from neo4j import GraphDatabase
import os
import pandas as pd

# Connection settings come from environment variables
NEO4J_URI = os.getenv('NEO4J_URI')
NEO4J_USER = os.getenv('NEO4J_USERNAME')
NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
# Fail fast on a bad URI or credentials instead of reporting success blindly
driver.verify_connectivity()

def run_query(query, parameters=None):
    """Run a Cypher query and return all result records as a list."""
    with driver.session() as session:
        return list(session.run(query, parameters or {}))

print('Connected to Neo4j!')


Connected to Neo4j!


## Basic Exploration: Node Types and Counts

In [4]:
run_query("""
MATCH (n)
RETURN labels(n) AS labels, count(*) AS count
ORDER BY count DESC;
""")

[<Record labels=['Document'] count=15292>,
 <Record labels=['Source'] count=1214>,
 <Record labels=['Message'] count=669>,
 <Record labels=['Conversation'] count=236>,
 <Record labels=['Message', 'Assistant'] count=205>,
 <Record labels=['Session'] count=167>,
 <Record labels=['_Neodash_Dashboard'] count=2>]

## 🔗 Creating Secondary Relationships
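These reveal co-occurrence patterns beyond static retrieval: two `Document` chunks are linked whenever both were retrieved as context (`HAS_CONTEXT`) for the same `Message`, and the edge weight counts how many messages they share.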

In [25]:
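# Create CO_OCCURS_WITH edges between chunks retrieved for the same message.
# Aggregating first and then SET-ing the weight keeps the cell idempotent:
# re-running it recomputes the weights instead of incrementing them again.
run_query("""
MATCH (c1:Document)-[:HAS_CONTEXT]-(m:Message)-[:HAS_CONTEXT]-(c2:Document)
WHERE elementId(c1) < elementId(c2)
WITH c1, c2, count(DISTINCT m) AS shared_messages
MERGE (c1)-[co:CO_OCCURS_WITH]->(c2)
SET co.weight = shared_messages
""")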

[]
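To make the weighting concrete outside the database, here is a minimal pure-Python sketch of the same counting. The `(message_id, document_id)` pairs and the function name `co_occurrence_weights` are hypothetical stand-ins for the `HAS_CONTEXT` relationships; each message adds one count to every unordered pair of documents in its context.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_weights(context_pairs):
    """Count how many messages each unordered pair of documents shares.

    context_pairs: iterable of (message_id, document_id) tuples, a toy
    stand-in for (:Message)-[:HAS_CONTEXT]-(:Document) relationships.
    """
    # Group documents by message; a set ignores duplicate links to the same doc
    docs_per_message = {}
    for message_id, doc_id in context_pairs:
        docs_per_message.setdefault(message_id, set()).add(doc_id)

    weights = Counter()
    for docs in docs_per_message.values():
        # sorted() gives each pair one canonical order, like elementId(c1) < elementId(c2)
        for a, b in combinations(sorted(docs), 2):
            weights[(a, b)] += 1
    return weights


pairs = [("m1", "docA"), ("m1", "docB"),
         ("m2", "docA"), ("m2", "docB"), ("m2", "docC")]
weights = co_occurrence_weights(pairs)
# (docA, docB) -> 2, (docA, docC) -> 1, (docB, docC) -> 1
print(weights)
```

`docA` and `docB` share two messages, so their edge is twice as heavy as the pairs that only meet in `m2`.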

## Graph Algorithms: PageRank & Betweenness Centrality

In [24]:
# Drop any existing projection first so this cell can be re-run
# (failIfMissing = false: no error if 'docs_co' does not exist yet)
run_query("CALL gds.graph.drop('docs_co', false) YIELD graphName RETURN graphName")

# Project Document nodes and undirected CO_OCCURS_WITH edges into memory.
# The embedding property is carried along but not used by the centrality algorithms.
run_query("""
CALL gds.graph.project(
  'docs_co',
  {
    Document: {
      properties: ['embedding']
    }
  },
  {
    CO_OCCURS_WITH: {
      type: 'CO_OCCURS_WITH',
      orientation: 'UNDIRECTED'
    }
  }
)
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
""")
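The section title also names PageRank. On the same projection, GDS runs it with `gds.pageRank.stream('docs_co')`. To show what that computes, here is a textbook power-iteration sketch on a toy undirected graph; the function `pagerank` and the graph are illustrative, not GDS code. It uses a damping factor of 0.85 and 20 iterations and normalizes scores to sum to 1. GDS may scale its scores differently, so compare rankings rather than raw values.

```python
def pagerank(adjacency, damping=0.85, iterations=20):
    """Power-iteration PageRank on an undirected, unweighted graph.

    adjacency: dict mapping each node to the set of its neighbours.
    Returns a dict of scores that sum to 1.
    """
    nodes = list(adjacency)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the "teleport" share (1 - damping) / n
        new_scores = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            neighbours = adjacency[node]
            if neighbours:
                # Split this node's damped rank evenly among its neighbours
                share = damping * scores[node] / len(neighbours)
                for nb in neighbours:
                    new_scores[nb] += share
            else:
                # Dangling node: spread its damped rank over all nodes
                for other in nodes:
                    new_scores[other] += damping * scores[node] / n
        scores = new_scores
    return scores


# A hub chunk linked to three leaf chunks
star_graph = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
}
ranked = sorted(pagerank(star_graph).items(), key=lambda kv: kv[1], reverse=True)
print(ranked[0][0])  # hub
```

The hub ranks first because every leaf passes all of its rank to it, which is the intuition behind treating high-PageRank chunks as retrieval anchors.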

In [30]:
# Betweenness centrality: chunks that lie on many shortest paths between other chunks
df_betweenness = pd.DataFrame(run_query("""
CALL gds.betweenness.stream('docs_co')
YIELD nodeId, score
RETURN score, gds.util.asNode(nodeId).text AS node_text
ORDER BY score DESC LIMIT 10
"""), columns=['score', 'node_text'])
df_betweenness.head()

|   | score | node_text |
|---|---|---|
| 0 | 141771.374239 | GDS Degree Centrality algorithm is useful for ... |
| 1 | 104392.804654 | Exploring projected graphs after loading them ... |
| 2 | 100949.048321 | Perhaps you are a data scientist, or you aspir... |
| 3 | 70493.711938 | Unlock Enterprise Data: LLMs + Knowledge Graph... |
| 4 | 60613.134404 | Graph Algorithms Path Finding Procedures Schem... |
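The top-ranked chunks act as bridges: many shortest paths between other chunks in the co-occurrence graph run through them. Raw scores grow with the number of node pairs, which explains their large magnitude. A minimal pure-Python sketch of Brandes' algorithm for unweighted, undirected graphs shows the computation on a toy graph of two triangles joined by the single edge `c-d`. The function name and the halving step (so each unordered pair counts once) are this sketch's choices; GDS's scaling of undirected scores may differ.

```python
from collections import deque


def betweenness(adjacency):
    """Exact betweenness centrality (Brandes) for an unweighted, undirected graph.

    adjacency: dict mapping each node to the set of its neighbours (symmetric).
    """
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # BFS from source: count shortest paths (sigma) and record predecessors
        dist = {node: -1 for node in adjacency}
        sigma = {node: 0 for node in adjacency}
        preds = {node: [] for node in adjacency}
        dist[source], sigma[source] = 0, 1
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's dependency
        delta = {node: 0.0 for node in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Every unordered pair was counted once from each endpoint
    return {node: score / 2 for node, score in scores.items()}


# Two triangles joined by the bridge edge c-d
bridge_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}
# c and d score 6.0 (the bridge endpoints); a, b, e, f score 0.0
print(betweenness(bridge_graph))
```

Only `c` and `d` sit on paths between the two clusters, so they alone get nonzero scores, just as bridging chunks rise to the top of the GDS ranking.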


## 🛠️ Final Engineering Use: Reranking with Bayesian + Vector Similarity

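The cell below blends a vector-similarity score $v(c)$ and a Bayesian link probability $p(c)$ for each candidate chunk $c$ with a convex weight $\alpha$. Working through the toy numbers shows why `chunk2` overtakes `chunk1` despite its lower vector score:

$$
\begin{aligned}
s(c) &= \alpha\, v(c) + (1 - \alpha)\, p(c), \qquad \alpha = 0.7 \\
s(\text{chunk1}) &= 0.7 \times 0.92 + 0.3 \times 0.5 = 0.644 + 0.150 = 0.794 \\
s(\text{chunk2}) &= 0.7 \times 0.87 + 0.3 \times 0.8 = 0.609 + 0.240 = 0.849
\end{aligned}
$$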
In [31]:
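# Toy scores for two candidate chunks
vector_scores = {'chunk1': 0.92, 'chunk2': 0.87}    # vector-similarity scores
bayesian_probs = {'chunk1': 0.5, 'chunk2': 0.8}     # graph-based link probabilities

alpha = 0.7  # weight on vector similarity; 1 - alpha goes to the graph signal
final_scores = {chunk: alpha * v + (1 - alpha) * bayesian_probs.get(chunk, 0)
                for chunk, v in vector_scores.items()}
sorted(final_scores.items(), key=lambda x: x[1], reverse=True)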

[('chunk2', 0.849), ('chunk1', 0.794)]
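The `bayesian_probs` above are hard-coded. One simple way to derive such a link probability, offered here as an assumption rather than the workshop's method, is a Beta-Binomial estimate: if an anchor chunk appeared in the context of $n$ messages and $k$ of them also included a candidate chunk (its `CO_OCCURS_WITH` weight), the posterior mean under a uniform Beta(1, 1) prior is $(k + 1) / (n + 2)$. The function names and counts below are hypothetical, chosen so the output matches the toy values.

```python
def beta_posterior_mean(successes, trials, prior_alpha=1.0, prior_beta=1.0):
    """Posterior mean of a Bernoulli rate under a Beta(prior_alpha, prior_beta) prior."""
    return (successes + prior_alpha) / (trials + prior_alpha + prior_beta)


def next_chunk_probs(anchor_message_count, co_occurrence_weights):
    """Estimate P(candidate in context | anchor in context) for each candidate.

    anchor_message_count: messages whose context included the anchor chunk (n).
    co_occurrence_weights: candidate -> messages shared with the anchor (k),
    e.g. the CO_OCCURS_WITH weight between the two chunks.
    """
    return {
        candidate: beta_posterior_mean(k, anchor_message_count)
        for candidate, k in co_occurrence_weights.items()
    }


# Hypothetical counts: the anchor appeared in 8 messages
estimated_probs = next_chunk_probs(8, {"chunk1": 4, "chunk2": 7})
print(estimated_probs)  # {'chunk1': 0.5, 'chunk2': 0.8}
```

The prior keeps rarely seen pairs from getting probabilities of exactly 0 or 1; a stronger prior (larger `prior_alpha + prior_beta`) pulls sparse estimates further toward the prior mean.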

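✅ Graph signals (co-occurrence weights, centrality, link probabilities) give retrieval context that vector similarity alone misses, making it more robust for agentic systems.
✅ Precompute graph signals offline so serving adds only a lookup, not a graph computation, at query time.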
✅ Graph lineage and bridging chunks: the missing link for agents in real-world retrieval.
**You can’t AI without graph!**
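Precomputing means the graph work (GDS runs, probability estimates) happens offline, and query-time reranking reduces to dictionary lookups. Raw betweenness scores, which run into the hundreds of thousands in the table above, sit on a very different scale from similarity scores, so this sketch min-max normalizes them to [0, 1] before blending. The function names and scores are hypothetical, and min-max scaling is only one reasonable choice; rank-based or log scaling would also work.

```python
def min_max_normalize(scores):
    """Rescale raw scores to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def rerank(vector_scores, graph_signal, alpha=0.7):
    """Blend query-time vector scores with a precomputed graph signal (lookup only)."""
    blended = {
        chunk: alpha * score + (1 - alpha) * graph_signal.get(chunk, 0.0)
        for chunk, score in vector_scores.items()
    }
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)


# Offline, once: hypothetical raw betweenness scores, normalized and cached
graph_signal = min_max_normalize({"chunk1": 120000.0, "chunk2": 30000.0, "chunk3": 0.0})

# Online, per query: only dictionary lookups, no graph traversal
ranking = rerank({"chunk2": 0.87, "chunk3": 0.90}, graph_signal)
print([chunk for chunk, _ in ranking])  # ['chunk2', 'chunk3']
```

Without normalization, a raw score of 30000 would swamp any similarity value below 1 and the vector signal would effectively be ignored.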