# Advanced Graph RAG with Real Data
This notebook demonstrates how to combine a knowledge graph with a vector database so that an LLM can reason over both structured relations and semantic text.

## What is Graph RAG?
*Retrieval-Augmented Generation* (RAG) typically retrieves semantically similar documents from a vector store. **Graph RAG** adds a symbolic layer: we first traverse a knowledge graph to find entities and relationships, then use semantic search for additional context.

Real-world applications include academic research assistants, enterprise knowledge bases, legal search, and biological research. The diagram below shows the full flow:

```mermaid
graph TD
  Q[User question] --> G[Graph traversal]
  G --> V[Vector search]
  V --> L[LLM]
  L --> A[Answer]
```
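
The flow in the diagram can be sketched as a single function that receives each stage as a callable, which makes the stages easy to swap or stub out. The names `run_pipeline`, `traverse`, `search`, and `generate` are hypothetical and do not come from any library; the lambdas below only stand in for the real graph, vector store, and LLM.

```python
from typing import Callable, List


def run_pipeline(
    question: str,
    traverse: Callable[[str], List[str]],
    search: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Chain the stages in the order shown in the diagram."""
    graph_context = traverse(question)   # Graph traversal
    vector_context = search(question)    # Vector search
    prompt = (
        f"Graph result: {graph_context}\n"
        f"Vector search result: {vector_context}\n"
        f"Question: {question}"
    )
    return generate(prompt)              # LLM produces the answer


# Stub stages for illustration only (assumed data, no real graph, store, or LLM).
stub_answer = run_pipeline(
    "Who is connected to node 0?",
    traverse=lambda q: ["Member 0", "Member 1"],
    search=lambda q: ["Member 2"],
    generate=lambda prompt: f"(stub answer based on {len(prompt.splitlines())} prompt lines)",
)
print(stub_answer)
```

Keeping the stages separate like this also lets you inspect the retrieved context before spending an LLM call on it.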

In [None]:
# Install required libraries
# !pip install networkx chromadb sentence-transformers openai matplotlib

In [None]:
import networkx as nx
import chromadb
from sentence_transformers import SentenceTransformer
from openai import OpenAI
import matplotlib.pyplot as plt

In [None]:
# Create a small real-world graph: Zachary's karate club network, which ships with networkx
G = nx.karate_club_graph()
# Add simple textual descriptions. In a real project you would pull abstracts or Wikipedia descriptions.
for node in G.nodes():
    G.nodes[node]['description'] = f'Member {node} of the karate club.'

In [None]:
# Build a vector store of node descriptions
db = chromadb.PersistentClient(path='graph_db')
collection = db.get_or_create_collection('karate')
model = SentenceTransformer('all-MiniLM-L6-v2')
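for node, data in G.nodes(data=True):
    text = data['description']
    emb = model.encode(text).tolist()  # convert the NumPy array to a plain list of floats
    # Chroma needs a unique string id per record; upsert keeps notebook re-runs idempotent
    collection.upsert(ids=[str(node)], documents=[text], embeddings=[emb], metadatas=[{'node': node}])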
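
Conceptually, the query step later in the notebook ranks the stored embeddings by how close they are to the query embedding. The sketch below shows that idea with cosine similarity and a brute-force scan in plain Python. The vectors and the helper names `cosine` and `top_k` are made up for illustration; a real vector store such as Chroma relies on an approximate index and a configurable distance metric rather than this loop.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    """Return the ids of the k stored vectors most similar to `query`."""
    ranked = sorted(store, key=lambda doc_id: cosine(query, store[doc_id]), reverse=True)
    return ranked[:k]


# Toy 3-dimensional "embeddings" (assumed values); real sentence embeddings
# have hundreds of dimensions.
store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.05, 0.0], store, k=2))
```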

In [None]:
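def graph_neighbors(start, depth=1):
    """Return every node within `depth` hops of `start`, including `start` itself."""
    visited = {start}
    frontier = {start}
    for _ in range(depth):
        next_frontier = set()
        for node in frontier:
            for nbr in G.neighbors(node):
                if nbr not in visited:
                    next_frontier.add(nbr)
        # Mark the new layer as visited so it is not expanded twice
        visited.update(next_frontier)
        frontier = next_frontier
    return sorted(visited)  # sorted so the prompt is deterministic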
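
If the prompt should favor closer neighbors, it can help to record how many hops each node is from the start. The following is a minimal sketch of that variant using a plain adjacency dictionary instead of the networkx graph; `hop_distances` and the toy `adjacency` data are hypothetical and not part of the notebook's code.

```python
from collections import deque


def hop_distances(adjacency, start, depth=1):
    """Breadth-first search returning {node: hops from start} for nodes within `depth` hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == depth:
            continue  # do not expand beyond the depth limit
        for nbr in adjacency[node]:
            if nbr not in distances:
                distances[nbr] = distances[node] + 1
                queue.append(nbr)
    return distances


# Toy undirected graph with edges 0-1, 0-2, 1-3, 3-4 (assumed data)
adjacency = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]}
print(hop_distances(adjacency, start=0, depth=2))
```

On a networkx graph, `nx.single_source_shortest_path_length(G, source, cutoff=depth)` should give the same mapping without a hand-written loop.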

In [None]:
def graph_rag(question, start_node, depth=1, top_k=3):
    # 1. Symbolic retrieval: descriptions of all nodes within `depth` hops
    nodes = graph_neighbors(start_node, depth)
    texts = [G.nodes[n]['description'] for n in nodes]

    # 2. Semantic retrieval: the top_k stored descriptions closest to the question
    query_emb = model.encode(question).tolist()
    results = collection.query(query_embeddings=[query_emb], n_results=top_k)
    vect_texts = results['documents'][0]

    # 3. Generation: pass both context sets to the LLM
    prompt = (
        f'Graph result: {texts}\n'
        f'Vector search result: {vect_texts}\n'
        f'Question: {question}'
    )
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    completion = client.chat.completions.create(
        model='gpt-4o',
        messages=[{'role': 'user', 'content': prompt}],
    )
    return completion.choices[0].message.content, nodes, results
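
Because the graph traversal and the vector search can return the same description, the prompt may contain duplicates, and a deep traversal can make it long. One way to handle this, sketched below under the assumption that context items are plain strings, is to merge the two lists, drop repeats, and cap the total. `build_prompt` and `max_items` are hypothetical names, not part of the notebook.

```python
def build_prompt(graph_texts, vector_texts, question, max_items=10):
    """Merge both context lists, drop duplicates (keeping first occurrence), and cap the size."""
    seen = set()
    context = []
    for text in list(graph_texts) + list(vector_texts):
        if text not in seen:
            seen.add(text)
            context.append(text)
    context = context[:max_items]  # graph results come first, so they survive the cap
    lines = [f"- {text}" for text in context]
    return "Context:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


prompt = build_prompt(
    ["Member 0 of the karate club.", "Member 1 of the karate club."],
    ["Member 1 of the karate club.", "Member 2 of the karate club."],
    "Who is connected to node 0?",
)
print(prompt)
```

Putting graph results ahead of vector results is a design choice in this sketch: when the cap applies, the structurally related nodes are kept and the semantic matches are trimmed first.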

### Example query

In [None]:
answer, nodes, results = graph_rag('Who is connected to node 0?', start_node=0, depth=2)
print(answer)

In [None]:
# Visualize traversal
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, node_color='lightgray', with_labels=True)
nx.draw_networkx_nodes(G, pos, nodelist=nodes, node_color='red')
plt.show()