# 🧠 Advanced GraphRAG for Agentic Retrieval
**Graph ML, Algorithms, and Structured Knowledge Flows**

Welcome to the workshop! In this notebook, we’ll:
- Load a pre-built GraphRAG graph in Neo4j Aura
- Create secondary relationships that reveal hidden chunk patterns
- Apply graph algorithms (PageRank, betweenness) to highlight retrieval anchors
- Add Bayesian-style link prediction to pick the next-best chunk
- Rerank retrieval results for smarter LLM grounding
- Show why graph insights are essential for real-world agentic systems

In [ ]:
# Setup Neo4j connection
from neo4j import GraphDatabase

uri = "bolt+s://<AURA_URI>"
driver = GraphDatabase.driver(uri, auth=("neo4j", "<password>"))

def run_cypher(query):
    with driver.session() as session:
        result = session.run(query)
        return [r.data() for r in result]


## 📊 Basic Exploration: Node Types and Counts

In [ ]:
run_cypher("""
MATCH (n)
RETURN labels(n) AS labels, count(*) AS count
ORDER BY count DESC;
""")

## 🔗 Creating Secondary Relationships
These reveal co-occurrence and sequential patterns beyond static retrieval.

In [ ]:
# Create CO_OCCURS_WITH edges
run_cypher("""
MATCH (r:Response)-[:HAS_CONTEXT]->(c1:Chunk),
      (r)-[:HAS_CONTEXT]->(c2:Chunk)
WHERE id(c1) < id(c2)
MERGE (c1)-[co:CO_OCCURS_WITH]->(c2)
ON CREATE SET co.weight = 1
ON MATCH SET co.weight = co.weight + 1
""")
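To make the `weight` property concrete, here is a minimal pure-Python sketch of the same counting logic, runnable without a database. The `response_contexts` data and the `co_occurrence_weights` helper are hypothetical, made up for illustration; they are not part of the workshop graph.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: response id -> chunk ids it used as context.
response_contexts = {
    "r1": ["c1", "c2", "c3"],
    "r2": ["c2", "c3"],
    "r3": ["c1", "c3"],
}

def co_occurrence_weights(contexts):
    """Count how many responses share each unordered pair of chunks."""
    weights = Counter()
    for chunks in contexts.values():
        # sorted(set(...)) yields each pair once per response in a canonical
        # order, playing the role of the id(c1) < id(c2) filter in the Cypher.
        for a, b in combinations(sorted(set(chunks)), 2):
            weights[(a, b)] += 1
    return weights

weights = co_occurrence_weights(response_contexts)
for pair, weight in weights.most_common():
    print(pair, weight)
```

One difference worth knowing: the Python version starts from zero on every call, while the Cypher `MERGE ... ON MATCH SET` increments existing weights, so re-running that cell adds the counts again.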

In [ ]:
# Create CROSS_WINDOW_FOLLOWS edges:
# a chunk used by response r0 is followed by a chunk used by the next response r1
run_cypher("""
MATCH (r0:Response)-[:NEXT]->(r1:Response),
      (r0)-[:HAS_CONTEXT]->(c0:Chunk),
      (r1)-[:HAS_CONTEXT]->(c1:Chunk)
WHERE c0 <> c1
MERGE (c0)-[f:CROSS_WINDOW_FOLLOWS]->(c1)
ON CREATE SET f.count = 1
ON MATCH SET f.count = f.count + 1
""")
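One way to read `CROSS_WINDOW_FOLLOWS` is as a transition count: how often a chunk in one response's context is followed by a different chunk in the next response's context. Here is a rough sketch of that reading in plain Python. The `ordered_contexts` list and the `follow_counts` helper are hypothetical, and skipping self-transitions is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical toy data: responses in NEXT order, each with its context chunks.
ordered_contexts = [
    ["c1", "c2"],  # r0
    ["c2", "c3"],  # r1
    ["c3"],        # r2
]

def follow_counts(windows):
    """Count (earlier_chunk, later_chunk) pairs across consecutive responses."""
    counts = Counter()
    for prev, nxt in zip(windows, windows[1:]):
        for a in set(prev):
            for b in set(nxt):
                if a != b:  # assumption: a chunk following itself is not a transition
                    counts[(a, b)] += 1
    return counts

counts = follow_counts(ordered_contexts)
for pair, count in counts.most_common():
    print(pair, count)
```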

## 🧠 Graph Algorithms: PageRank & Betweenness Centrality

In [ ]:
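# PageRank
# GDS 2.x no longer accepts inline (anonymous) projections, so project a named graph first.
run_cypher("CALL gds.graph.drop('chunks_pr', false)")  # clear any stale projection
run_cypher("""
CALL gds.graph.project('chunks_pr', 'Chunk', {
  CO_OCCURS_WITH: {type: 'CO_OCCURS_WITH', orientation: 'UNDIRECTED'},
  CROSS_WINDOW_FOLLOWS: {type: 'CROSS_WINDOW_FOLLOWS', orientation: 'UNDIRECTED'}
})
""")
run_cypher("""
CALL gds.pageRank.stream('chunks_pr', {
  maxIterations: 20,
  dampingFactor: 0.85
})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).id AS chunkId, score
ORDER BY score DESC LIMIT 10;
""")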
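For intuition about what the PageRank score rewards, here is a small textbook power-iteration sketch on a toy graph. It is not the GDS implementation: this variant normalizes scores to sum to 1, so absolute values need not match GDS output and only the ranking is comparable. The `pagerank` helper and `toy_graph` are hypothetical.

```python
def pagerank(adjacency, damping=0.85, max_iterations=20):
    """Plain power-iteration PageRank over an adjacency dict (node -> neighbors)."""
    nodes = list(adjacency)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(max_iterations):
        # Every node receives the teleport share first.
        new_scores = {node: (1.0 - damping) / n for node in nodes}
        for node, neighbors in adjacency.items():
            if not neighbors:
                # Dangling node: spread its score evenly over all nodes.
                for other in nodes:
                    new_scores[other] += damping * scores[node] / n
                continue
            share = damping * scores[node] / len(neighbors)
            for neighbor in neighbors:
                new_scores[neighbor] += share
        scores = new_scores
    return scores

# Hypothetical star graph: "hub" co-occurs with every other chunk.
toy_graph = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
scores = pagerank(toy_graph)
print(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```

The hub ranks first because every other chunk passes its whole score to it, which is the behavior that makes high-PageRank chunks useful retrieval anchors.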

In [ ]:
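# Betweenness Centrality
# GDS 2.x no longer accepts inline (anonymous) projections, so project a named graph first.
run_cypher("CALL gds.graph.drop('chunks_bc', false)")  # clear any stale projection
run_cypher("""
CALL gds.graph.project('chunks_bc', 'Chunk', {
  CO_OCCURS_WITH: {type: 'CO_OCCURS_WITH', orientation: 'UNDIRECTED'},
  CROSS_WINDOW_FOLLOWS: {type: 'CROSS_WINDOW_FOLLOWS', orientation: 'UNDIRECTED'}
})
""")
run_cypher("""
CALL gds.betweenness.stream('chunks_bc')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).id AS chunkId, score
ORDER BY score DESC LIMIT 10;
""")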
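Betweenness centrality highlights "bridging" chunks that sit on shortest paths between otherwise separate clusters. Below is a compact sketch of Brandes' algorithm for unweighted, undirected graphs, run on a toy graph. The `betweenness` helper and `toy_graph` are hypothetical, and the halving at the end is one common convention for undirected graphs, so the scaling may differ from what GDS reports.

```python
from collections import deque

def betweenness(adjacency):
    """Brandes' algorithm for an unweighted, undirected adjacency dict."""
    centrality = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # BFS from source, counting shortest paths (sigma) and predecessors.
        order = []
        predecessors = {node: [] for node in adjacency}
        sigma = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        sigma[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Back-propagate dependencies from the farthest nodes inward.
        delta = {node: 0.0 for node in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2.0 for node, value in centrality.items()}

# Hypothetical toy graph: two triangles that share the node "bridge".
toy_graph = {
    "a": ["b", "bridge"],
    "b": ["a", "bridge"],
    "bridge": ["a", "b", "x", "y"],
    "x": ["bridge", "y"],
    "y": ["bridge", "x"],
}
scores = betweenness(toy_graph)
print(scores)
```

Only `bridge` gets a non-zero score here, because every shortest path between the two triangles passes through it.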

## 🔗 Bayesian Ranking: Next Chunk Prediction

In [ ]:
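# Compute conditional probabilities P(to_chunk | from_chunk):
# each transition count divided by the total count leaving from_chunk
run_cypher("""
MATCH (c1:Chunk)-[f:CROSS_WINDOW_FOLLOWS]->(c2:Chunk)
WITH c1, sum(f.count) AS total, collect({chunk: c2, count: f.count}) AS nexts
UNWIND nexts AS n
RETURN c1.id AS from_chunk, n.chunk.id AS to_chunk, n.count AS count,
       toFloat(n.count) / total AS probability
ORDER BY probability DESC, count DESC LIMIT 20;
""")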
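The conditional probability P(B | A) can be estimated from `CROSS_WINDOW_FOLLOWS` counts by dividing each count by the total count leaving A. The sketch below works through that arithmetic on made-up numbers. The `follow_counts` values and the `transition_probabilities` helper are hypothetical, and this is a plain maximum-likelihood estimate with no prior or smoothing.

```python
from collections import defaultdict

# Hypothetical transition counts: (from_chunk, to_chunk) -> f.count
follow_counts = {
    ("c1", "c2"): 3,
    ("c1", "c3"): 1,
    ("c2", "c3"): 2,
}

def transition_probabilities(counts):
    """Estimate P(to | from) = count(from -> to) / total count leaving `from`."""
    totals = defaultdict(int)
    for (src, _dst), count in counts.items():
        totals[src] += count
    return {
        (src, dst): count / totals[src]
        for (src, dst), count in counts.items()
    }

probs = transition_probabilities(follow_counts)
for (src, dst), p in sorted(probs.items()):
    print(f"P({dst} | {src}) = {p:.2f}")
```

A dictionary like `probs`, looked up by the chunk retrieved last, is one possible source for per-candidate Bayesian scores at rerank time.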

## 🛠️ Final Engineering Use: Reranking with Bayesian + Vector Similarity

In [ ]:
vector_scores = {'chunk1': 0.92, 'chunk2': 0.87}
bayesian_probs = {'chunk1': 0.5, 'chunk2': 0.8}

alpha = 0.7
final_scores = {chunk: alpha * v + (1 - alpha) * bayesian_probs.get(chunk, 0) 
                 for chunk, v in vector_scores.items()}
sorted(final_scores.items(), key=lambda x: x[1], reverse=True)
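The blend above can be wrapped in a small helper to see how sensitive the ranking is to `alpha`. This is only a sketch: the `rerank` function name is hypothetical, and it assumes both score types are already on a comparable 0 to 1 scale.

```python
def rerank(vector_scores, graph_probs, alpha=0.7):
    """Blend vector similarity with a graph-derived probability per chunk.

    alpha weights the vector score and (1 - alpha) weights the graph signal.
    Chunks missing from graph_probs get a graph score of 0.
    """
    blended = {
        chunk: alpha * v + (1 - alpha) * graph_probs.get(chunk, 0.0)
        for chunk, v in vector_scores.items()
    }
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)

# Same illustrative numbers as the cell above.
vector_scores = {"chunk1": 0.92, "chunk2": 0.87}
bayesian_probs = {"chunk1": 0.5, "chunk2": 0.8}

blended_ranking = rerank(vector_scores, bayesian_probs, alpha=0.7)
vector_only_ranking = rerank(vector_scores, bayesian_probs, alpha=1.0)
print(blended_ranking)
print(vector_only_ranking)
```

With `alpha=0.7` the graph signal lifts `chunk2` above `chunk1`, while `alpha=1.0` falls back to pure vector order, so `alpha` is worth tuning against real retrieval quality.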

✅ Graph insights make retrieval smarter and more robust for agentic systems.
✅ Precompute graph signals offline so that serving adds no graph computation at query time.
✅ Graph lineage and bridging chunks: the missing link for agents in real-world retrieval.
**You can’t AI without graph!**