# Week 8: GraphRAG Implementation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Digital-AI-Finance/agentic-artificial-intelligence/blob/main/L08_GraphRAG_Knowledge/L08_GraphRAG.ipynb)

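This notebook builds a simple GraphRAG system in four steps: LLM-based entity and relationship extraction, knowledge graph construction with NetworkX, multi-hop graph retrieval, and answer generation from the retrieved context.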

In [None]:
# Colab setup
import sys
if 'google.colab' in sys.modules:
    !pip install -q langchain-openai networkx python-dotenv
    from google.colab import userdata
    import os
    os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

In [None]:
import os
import json
from typing import List, Dict, Tuple
import networkx as nx
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv

load_dotenv()
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
print("Environment ready")

## 1. Entity and Relationship Extraction

In [None]:
def extract_entities_and_relations(text: str) -> Dict:
    """Extract entities and relationships from text using LLM."""
    prompt = f"""Extract entities and relationships from this text.

Text: {text}

Return JSON with:
{{
  "entities": [{{"name": "...", "type": "PERSON|ORG|PRODUCT|CONCEPT"}}],
  "relationships": [{{"source": "...", "target": "...", "relation": "..."}}]
}}"""
    
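    response = llm.invoke(prompt).content
    # The model may wrap the JSON in prose or a code fence, so parse only
    # the span from the first '{' to the last '}'
    start = response.find('{')
    end = response.rfind('}') + 1
    try:
        return json.loads(response[start:end])
    except json.JSONDecodeError:
        # No valid JSON object found: fall back to an empty result
        return {"entities": [], "relationships": []}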

# Test
sample_text = """OpenAI released GPT-4 in March 2023. Sam Altman is the CEO of OpenAI. 
GPT-4 is a large language model that powers ChatGPT. Microsoft invested in OpenAI."""

result = extract_entities_and_relations(sample_text)
print("Entities:", result.get("entities", []))
print("Relations:", result.get("relationships", []))
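
The slicing in `extract_entities_and_relations` accepts any JSON object, including one that lacks the keys requested in the prompt. A minimal sketch of a stricter parser follows. The helper name `parse_extraction`, the `ALLOWED_TYPES` set, and the filtering rules are assumptions made for illustration and are not part of the notebook; the sample reply is hand-written rather than real model output.

```python
import json
from typing import Any, Dict, List

# Entity types named in the extraction prompt
ALLOWED_TYPES = {"PERSON", "ORG", "PRODUCT", "CONCEPT"}


def _as_list(value: Any) -> List[Any]:
    """Return value if it is a list, otherwise an empty list."""
    return value if isinstance(value, list) else []


def parse_extraction(response: str) -> Dict[str, List[Dict[str, Any]]]:
    """Parse an LLM reply and keep only well-formed entities and relationships.

    Hypothetical helper: tolerates prose around the JSON object and drops
    items that lack the keys requested in the prompt.
    """
    empty: Dict[str, List[Dict[str, Any]]] = {"entities": [], "relationships": []}
    start, end = response.find("{"), response.rfind("}") + 1
    try:
        data = json.loads(response[start:end])
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty

    entities = []
    for item in _as_list(data.get("entities")):
        if isinstance(item, dict) and isinstance(item.get("name"), str):
            etype = item.get("type")
            # Map anything outside the prompt's type list to UNKNOWN
            valid = isinstance(etype, str) and etype in ALLOWED_TYPES
            entities.append({"name": item["name"], "type": etype if valid else "UNKNOWN"})

    keys = ("source", "target", "relation")
    relationships = [
        {key: item[key] for key in keys}
        for item in _as_list(data.get("relationships"))
        if isinstance(item, dict) and all(isinstance(item.get(key), str) for key in keys)
    ]
    return {"entities": entities, "relationships": relationships}


# Hand-written reply with one malformed entity and one malformed relationship
reply = (
    'Here is the JSON:\n'
    '{"entities": [{"name": "OpenAI", "type": "ORG"}, {"type": "ORG"}], '
    '"relationships": [{"source": "OpenAI", "target": "GPT-4", "relation": "released"}, '
    '{"source": "OpenAI"}]}'
)
parsed = parse_extraction(reply)
print(parsed)
```

Filtering at parse time keeps malformed items out of the graph, so the later cells can index `name`, `source`, `target`, and `relation` without extra checks.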

## 2. Build Knowledge Graph

In [None]:
def build_knowledge_graph(extraction_result: Dict) -> nx.DiGraph:
    """Build a NetworkX graph from extracted entities and relations."""
    G = nx.DiGraph()
    
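    # Add entities as nodes (skip malformed entries without a name)
    for entity in extraction_result.get("entities", []):
        if "name" in entity:
            G.add_node(entity["name"], type=entity.get("type", "UNKNOWN"))
    
    # Add relationships as edges; add_edge implicitly creates any endpoint
    # node not added above, and such nodes carry no "type" attribute
    for rel in extraction_result.get("relationships", []):
        if "source" in rel and "target" in rel:
            G.add_edge(rel["source"], rel["target"], relation=rel.get("relation", "related to"))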
    
    return G

G = build_knowledge_graph(result)
print(f"Graph: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")
print("Nodes:", list(G.nodes(data=True))[:5])
print("Edges:", list(G.edges(data=True))[:5])
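
Because `G.add_edge` creates missing endpoint nodes implicitly, an entity that appears only in a relationship ends up in the graph without a `type` attribute. One possible way to avoid this is to declare such endpoints before building the graph. The sketch below assumes a hypothetical helper `declare_missing_entities` and a hand-written extraction result; neither comes from the notebook.

```python
from typing import Any, Dict, List

Extraction = Dict[str, List[Dict[str, Any]]]


def declare_missing_entities(extraction: Extraction) -> Extraction:
    """Return a copy in which every relationship endpoint is a declared entity.

    Hypothetical helper: endpoints that are not listed under "entities" are
    appended with type "UNKNOWN". The input dictionary is left unchanged.
    """
    entities = [dict(e) for e in extraction.get("entities", [])]
    known = {e.get("name") for e in entities}
    for rel in extraction.get("relationships", []):
        for endpoint in (rel.get("source"), rel.get("target")):
            if endpoint is not None and endpoint not in known:
                entities.append({"name": endpoint, "type": "UNKNOWN"})
                known.add(endpoint)
    relationships = [dict(r) for r in extraction.get("relationships", [])]
    return {"entities": entities, "relationships": relationships}


# Hand-written example: only OpenAI is declared as an entity
example = {
    "entities": [{"name": "OpenAI", "type": "ORG"}],
    "relationships": [
        {"source": "OpenAI", "target": "GPT-4", "relation": "released"},
        {"source": "GPT-4", "target": "ChatGPT", "relation": "powers"},
    ],
}
completed = declare_missing_entities(example)
for entity in completed["entities"]:
    print(entity)
```

Passing `completed` to `build_knowledge_graph` would then give every node a `type`, which helps if later steps filter nodes by type.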

## 3. Graph-Based Retrieval

In [None]:
def query_knowledge_graph(G: nx.DiGraph, query_entity: str, hops: int = 2) -> str:
    """Retrieve context from knowledge graph around a query entity."""
    if query_entity not in G:
        return f"Entity '{query_entity}' not found in graph."
    
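    # Collect all nodes within `hops` steps, ignoring edge direction.
    # Only the newly discovered frontier is expanded on each hop.
    neighbors = {query_entity}
    frontier = {query_entity}
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            next_frontier.update(G.predecessors(node))
            next_frontier.update(G.successors(node))
        frontier = next_frontier - neighbors
        neighbors.update(frontier)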
    
    # Build context from subgraph
    context_parts = []
    for source, target, data in G.edges(data=True):
        if source in neighbors and target in neighbors:
            context_parts.append(f"{source} {data.get('relation', 'related to')} {target}")
    
    return ". ".join(context_parts)

# Test query
context = query_knowledge_graph(G, "GPT-4")
print(f"Context for 'GPT-4': {context}")
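
The same N-hop neighborhood can also be obtained with NetworkX's built-in `nx.ego_graph`, which makes the effect of the `hops` parameter easy to inspect. The sketch below is an alternative formulation, not the notebook's method. The graph `G_demo` is built by hand from the facts in the sample text, with relation labels chosen for illustration (an actual LLM extraction may word them differently), and `neighborhood_triples` is a hypothetical helper name.

```python
import networkx as nx

# Hand-built graph mirroring the sample text; relation labels are assumptions
G_demo = nx.DiGraph()
G_demo.add_edge("OpenAI", "GPT-4", relation="released")
G_demo.add_edge("Sam Altman", "OpenAI", relation="is CEO of")
G_demo.add_edge("GPT-4", "ChatGPT", relation="powers")
G_demo.add_edge("Microsoft", "OpenAI", relation="invested in")


def neighborhood_triples(G: nx.DiGraph, entity: str, hops: int) -> list:
    """List (source, relation, target) triples within `hops` of `entity`."""
    # undirected=True measures distance while ignoring edge direction; the
    # returned subgraph still holds the original directed edges and their data
    sub = nx.ego_graph(G, entity, radius=hops, undirected=True)
    return sorted((s, d["relation"], t) for s, t, d in sub.edges(data=True))


one_hop = neighborhood_triples(G_demo, "GPT-4", hops=1)
two_hop = neighborhood_triples(G_demo, "GPT-4", hops=2)
print(len(one_hop), "triples within 1 hop;", len(two_hop), "within 2 hops")
for source, relation, target in two_hop:
    print(f"{source} {relation} {target}")
```

With one hop, only the edges touching GPT-4 directly are returned; the second hop pulls in the facts about OpenAI's CEO and investor, which is what allows questions about GPT-4 to reach information stated about OpenAI.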

In [None]:
def graphrag_answer(G: nx.DiGraph, query: str) -> str:
    """Answer query using GraphRAG approach."""
    # Extract key entity from query
    entity_prompt = f"What is the main entity in this question? Return just the entity name.\nQuestion: {query}"
    main_entity = llm.invoke(entity_prompt).content.strip()
    
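    # Get graph context. The entity name must match a node name exactly;
    # otherwise the "not found" message is passed to the LLM as context.
    context = query_knowledge_graph(G, main_entity)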
    
    # Generate answer
    answer_prompt = f"""Based on this knowledge graph context, answer the question.
    
Context: {context}
Question: {query}

If the context doesn't contain the answer, say so."""
    
    return llm.invoke(answer_prompt).content

# Test
answer = graphrag_answer(G, "Who created GPT-4?")
print(f"Answer: {answer}")
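
`graphrag_answer` looks up the entity name exactly as the LLM returns it, so a difference in case, spacing, or surrounding punctuation leads to a "not found" context. A minimal sketch of a lookup step that maps the returned name onto an existing node is shown below. The helper `resolve_entity`, its `cutoff` default, and the node list are assumptions for illustration; fuzzy matching uses `difflib` from the standard library.

```python
import difflib
from typing import Iterable, Optional


def resolve_entity(name: str, node_names: Iterable[str], cutoff: float = 0.8) -> Optional[str]:
    """Map a free-text entity name onto an existing node name, if any.

    Hypothetical helper: tries a case-insensitive exact match first, then a
    fuzzy match via difflib; returns None when nothing is close enough.
    """
    by_key = {node.casefold().strip(): node for node in node_names}
    # Normalize case and strip surrounding whitespace, quotes, and periods
    key = name.casefold().strip().strip("\"'.")
    if key in by_key:
        return by_key[key]
    close = difflib.get_close_matches(key, list(by_key), n=1, cutoff=cutoff)
    return by_key[close[0]] if close else None


# Node names as they might appear in the graph built from the sample text
nodes = ["OpenAI", "GPT-4", "Sam Altman", "ChatGPT", "Microsoft"]
print(resolve_entity("gpt-4", nodes))
print(resolve_entity("Open AI", nodes))
print(resolve_entity("Anthropic", nodes))
```

In `graphrag_answer`, such a step could sit between the entity prompt and the call to `query_knowledge_graph`, for example `resolve_entity(main_entity, G.nodes)`, with a fallback when it returns `None`. The cutoff trades missed matches against wrong ones and would need tuning on real queries.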

## Summary

This notebook demonstrated:
1. **Entity Extraction**: LLM-based extraction of entities and relationships as JSON
2. **Graph Construction**: Building a directed NetworkX knowledge graph (`nx.DiGraph`)
3. **Graph Retrieval**: Multi-hop traversal around a query entity to collect context
4. **GraphRAG Query**: Combining the retrieved graph context with LLM generation