# 🕸️ Tutorial 4: Building Knowledge Graphs

**Welcome to the world of Knowledge Graphs!** This is where we teach AI to understand relationships between things.

## 🎯 What You'll Learn:
- What a knowledge graph is (think: mind map for AI)
- How to extract entities (people, places, things) from text
- How to find relationships between entities
- How to explore and query connections
- Why knowledge graphs make AI smarter

## ⏱️ Time: 25-30 minutes
## 📚 Level: Beginner
## 📋 Prerequisites: Tutorials 1, 2 & 3 completed

## 🤔 What is a Knowledge Graph?

**Knowledge Graph = A smart way to show how things are connected**

### 🧠 Think of it like:
- **Your brain**: Knows that "Paris" is connected to "France" and "Eiffel Tower"
- **Knowledge Graph**: Shows these connections visually with lines and dots

### 🌟 Example:
```
"Einstein" ←→ "developed" ←→ "Theory of Relativity"
"Einstein" ←→ "worked at" ←→ "Princeton University"
"Einstein" ←→ "born in" ←→ "Germany"
```
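
As a minimal sketch, the three example connections above can be stored in a NetworkX graph, with each relationship kept as an edge attribute. The names `triples` and `example_graph` are illustrative only and are not used later in this tutorial.

```python
import networkx as nx

# Each connection is a (source, relationship, target) triple
triples = [
    ("Einstein", "developed", "Theory of Relativity"),
    ("Einstein", "worked at", "Princeton University"),
    ("Einstein", "born in", "Germany"),
]

# Undirected graph: connections can be followed from either end
example_graph = nx.Graph()
for source, relationship, target in triples:
    example_graph.add_edge(source, target, relationship=relationship)

# Follow every connection that starts at "Einstein"
for neighbor in example_graph.neighbors("Einstein"):
    label = example_graph["Einstein"][neighbor]["relationship"]
    print(f"Einstein --{label}--> {neighbor}")
```

Each entity becomes a node (a "dot") and each relationship becomes an edge (a "line") that carries its label.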

### 🎯 Why Knowledge Graphs are Powerful:
- **Connections**: See how everything relates
- **Discovery**: Find unexpected relationships
- **Smart Questions**: "Who worked with Einstein?" 
- **Navigation**: Follow connections to explore topics

## 📚 Step 1: Setup for Knowledge Graphs

Let's import the tools we need to build knowledge graphs.

In [None]:
# Import tools for knowledge graphs
import sys
sys.path.append('..')

from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
import networkx as nx
import json
import re

print("🕸️ Knowledge Graph tools imported!")
print("🧠 NetworkX: For building and analyzing graphs")
print("🤖 LangChain: For extracting entities and relationships")
print("📊 Ready to build intelligent knowledge networks!")

## 🏷️ Step 2: Understanding Entities

**Entities** are the "things" in our knowledge graph - people, places, concepts, etc.

### 🎯 Types of Entities:
- **People**: Einstein, Marie Curie, authors
- **Places**: Paris, MIT, laboratories
- **Concepts**: Machine Learning, Chemistry, Physics
- **Objects**: Books, experiments, tools
- **Organizations**: Universities, companies

In [None]:
# Let's practice identifying entities in simple text
print("🏷️ PRACTICE: Identifying Entities")
print("=" * 40)

# Simple example text
example_text = """
Albert Einstein worked at Princeton University and developed the Theory of Relativity. 
He collaborated with Niels Bohr on quantum mechanics research. 
Einstein was born in Germany but later moved to the United States.
"""

print("📝 Example Text:")
print(example_text)

print("\n🏷️ Can you spot the entities? (Hint: look for names, places, concepts)")
print("\n📋 Entities I can see:")
print("   • People: Albert Einstein, Niels Bohr")
print("   • Places: Princeton University, Germany, United States")
print("   • Concepts: Theory of Relativity, quantum mechanics")
print("\n🔗 Relationships I can see (these connect the entities):")
print("   • worked at, developed, collaborated with, born in, moved to")

print("\n💡 These entities and their connections will become our knowledge graph!")

## 🤖 Step 3: Using AI to Extract Entities

Let's teach AI to automatically find entities in text for us!

In [None]:
# Create AI assistant for entity extraction
ai_assistant = ChatOllama(
    model="llama3.1:8b",
    temperature=0.1  # Low temperature for consistent extraction
)

# Create a prompt for finding entities
entity_prompt = ChatPromptTemplate.from_template("""
Extract entities from the following text. Return ONLY a JSON object with these categories:

{{
  "people": ["person names"],
  "places": ["locations, institutions"],
  "concepts": ["ideas, theories, topics"],
  "methods": ["techniques, approaches"]
}}

Text: {text}

JSON:
""")

# Create the entity extraction chain
entity_chain = entity_prompt | ai_assistant | StrOutputParser()

print("🤖 AI entity extractor created!")
print("🎯 It will find people, places, concepts, and methods")

In [None]:
# Test entity extraction
print("🧪 Testing AI Entity Extraction")
print("=" * 35)

test_text = """
The research paper by John Smith from MIT describes a new machine learning algorithm 
called DeepNet. The algorithm uses neural networks and was tested on datasets from 
Stanford University. The results show improved accuracy in image recognition tasks.
"""

print("📝 Test Text:")
print(test_text)

print("\n🤖 AI is extracting entities...")
result = entity_chain.invoke({"text": test_text})

print("\n📊 Raw AI Response:")
print(result)

# Try to parse the JSON
try:
    # Extract JSON from response
    json_start = result.find('{')
    json_end = result.rfind('}') + 1
    if json_start != -1 and json_end > json_start:  # rfind() + 1 is 0, not -1, when '}' is missing
        json_str = result[json_start:json_end]
        entities = json.loads(json_str)
        
        print("\n✅ Extracted Entities:")
        for category, items in entities.items():
            print(f"   {category.title()}: {items}")
    else:
        print("\n⚠️ Could not parse JSON from AI response")
except json.JSONDecodeError as e:
    print(f"\n❌ Error parsing entities: {e}")
    print("💡 AI responses can vary. Check the raw response above and try running this cell again.")
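
The parsing steps used above (find the first opening bracket, find the last closing bracket, then decode what lies between) can be wrapped in one small helper. This is a sketch under stated assumptions: `extract_json` and `sample_reply` are hypothetical names, and the sample reply is hand-written rather than real model output.

```python
import json

def extract_json(response, open_char="{", close_char="}"):
    """Return the JSON value between the first open_char and the last
    close_char in response, or None if nothing parseable is found."""
    start = response.find(open_char)
    end = response.rfind(close_char) + 1  # 0 when close_char is missing
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(response[start:end])
    except json.JSONDecodeError:
        return None

# A reply with extra chatter around the JSON, as chat models often produce
sample_reply = 'Here are the entities:\n{"people": ["John Smith"], "places": ["MIT"]}\nHope that helps!'

print(extract_json(sample_reply))                 # parsed dictionary
print(extract_json("Sorry, I found nothing."))    # None
```

Returning `None` instead of raising keeps the calling code simple: it can check for `None` and decide whether to retry or skip.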

## 🔗 Step 4: Finding Relationships

Now let's teach AI to find how entities are connected to each other!

In [None]:
# Create a prompt for finding relationships
relationship_prompt = ChatPromptTemplate.from_template("""
Find relationships between entities in this text. Return ONLY a JSON array of relationships:

[
  {{"source": "Entity 1", "target": "Entity 2", "relationship": "verb or connection"}},
  {{"source": "Entity 3", "target": "Entity 4", "relationship": "verb or connection"}}
]

Use relationships like: "works at", "developed", "studied", "located in", "collaborated with"

Text: {text}

JSON:
""")

# Create relationship extraction chain
relationship_chain = relationship_prompt | ai_assistant | StrOutputParser()

print("🔗 AI relationship finder created!")
print("🎯 It will find connections between entities")

In [None]:
# Test relationship extraction
print("🔗 Testing AI Relationship Extraction")
print("=" * 40)

relationship_result = relationship_chain.invoke({"text": test_text})

print("🤖 AI found these relationships:")
print(relationship_result)

# Try to parse relationships
try:
    json_start = relationship_result.find('[')
    json_end = relationship_result.rfind(']') + 1
    if json_start != -1 and json_end > json_start:  # rfind() + 1 is 0, not -1, when ']' is missing
        json_str = relationship_result[json_start:json_end]
        relationships = json.loads(json_str)
        
        print("\n✅ Extracted Relationships:")
        for rel in relationships:
            if isinstance(rel, dict) and all(key in rel for key in ['source', 'target', 'relationship']):
                print(f"   • {rel['source']} --{rel['relationship']}--> {rel['target']}")
    else:
        print("\n⚠️ Could not parse JSON from AI response")
except json.JSONDecodeError as e:
    print(f"\n💡 Parsing issue (common with AI output): {e}")
    print("   Check the raw response above and try running this cell again.")
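
AI output often contains incomplete or repeated relationships, so it can help to clean the list before using it. The sketch below is one possible approach, not part of the tutorial's own code: `clean_relationships` is a hypothetical helper, and the `raw` list is hand-written sample data.

```python
def clean_relationships(raw_relationships):
    """Keep well-formed, de-duplicated relationship records."""
    required = ("source", "target", "relationship")
    seen = set()
    cleaned = []
    for rel in raw_relationships:
        # Skip anything that is not a dict with three non-empty string fields
        if not isinstance(rel, dict):
            continue
        if not all(isinstance(rel.get(key), str) and rel[key].strip() for key in required):
            continue
        record = {key: rel[key].strip() for key in required}
        # Compare case-insensitively so "MIT" and "mit" count as one entity
        fingerprint = tuple(record[key].lower() for key in required)
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(record)
    return cleaned

raw = [
    {"source": "John Smith", "target": "MIT", "relationship": "works at"},
    {"source": "john smith ", "target": "MIT", "relationship": "works at"},  # duplicate
    {"source": "DeepNet", "target": "neural networks"},                      # missing field
    "not a dict",
]
print(clean_relationships(raw))
```

Only the first record survives: the second is a case and whitespace variant of it, and the last two are malformed.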

## 📊 Step 5: Building the Knowledge Graph

Now let's combine entities and relationships into a real knowledge graph using NetworkX!

In [None]:
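# Create a simple knowledge graph builder

# Hand-written entities and relationships for test_text, used for the walkthrough
DEMO_ENTITIES = {
    "people": ["John Smith"],
    "places": ["MIT", "Stanford University"],
    "concepts": ["machine learning", "neural networks", "image recognition"],
    "methods": ["DeepNet"]
}
DEMO_RELATIONSHIPS = [
    {"source": "John Smith", "target": "MIT", "relationship": "works_at"},
    {"source": "John Smith", "target": "DeepNet", "relationship": "developed"},
    {"source": "DeepNet", "target": "neural networks", "relationship": "uses"},
    {"source": "DeepNet", "target": "image recognition", "relationship": "improves"}
]

def parse_ai_json(response, open_char, close_char):
    """Parse the JSON between the first open_char and last close_char, or return None"""
    start = response.find(open_char)
    end = response.rfind(close_char) + 1
    try:
        return json.loads(response[start:end]) if start != -1 and end > start else None
    except json.JSONDecodeError:
        return None

def build_knowledge_graph(text, use_demo_data=False):
    """Build a knowledge graph from text.

    use_demo_data=True skips the AI and uses the hand-written demo data,
    so the walkthrough produces the same graph on every run.
    """
    print("🏗️ Building knowledge graph from text...")
    
    # Create empty graph (undirected: connections can be followed from either end)
    graph = nx.Graph()
    
    if use_demo_data:
        entities, relationships = DEMO_ENTITIES, DEMO_RELATIONSHIPS
    else:
        # Ask the AI for entities and relationships; use empty results if parsing fails
        print("   🏷️ Extracting entities...")
        entities = parse_ai_json(entity_chain.invoke({"text": text}), '{', '}')
        entities = entities if isinstance(entities, dict) else {}
        print("   🔗 Extracting relationships...")
        relationships = parse_ai_json(relationship_chain.invoke({"text": text}), '[', ']')
        relationships = relationships if isinstance(relationships, list) else []
    
    # Add entities to graph
    for category, entity_list in entities.items():
        if not isinstance(entity_list, list):
            continue
        for entity in entity_list:
            if isinstance(entity, str):
                graph.add_node(entity, category=category)
    
    # Add relationships to graph (only between entities that are already nodes)
    print("   🔗 Adding relationships...")
    for rel in relationships:
        if not (isinstance(rel, dict) and all(key in rel for key in ('source', 'target', 'relationship'))):
            continue
        if rel['source'] in graph and rel['target'] in graph:
            graph.add_edge(rel['source'], rel['target'], relationship=rel['relationship'])
    
    print("✅ Knowledge graph built!")
    print(f"   📊 Nodes: {graph.number_of_nodes()}")
    print(f"   🔗 Edges: {graph.number_of_edges()}")
    
    return graph, entities, relationships

# Build the walkthrough graph from the demo data for test_text
knowledge_graph, entities, relationships = build_knowledge_graph(test_text, use_demo_data=True)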
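
Step 5 uses `nx.Graph()`, which is undirected: it records that two entities are connected, but not which one is the source. If direction matters (for example, "DeepNet uses neural networks" and not the reverse), a directed graph is one alternative. The sketch below is an illustration using `nx.DiGraph`; the names `demo_relationships` and `directed_graph` are assumptions and are not used elsewhere in this tutorial.

```python
import networkx as nx

demo_relationships = [
    {"source": "John Smith", "target": "MIT", "relationship": "works_at"},
    {"source": "John Smith", "target": "DeepNet", "relationship": "developed"},
    {"source": "DeepNet", "target": "neural networks", "relationship": "uses"},
    {"source": "DeepNet", "target": "image recognition", "relationship": "improves"},
]

# A DiGraph stores each edge as source -> target
directed_graph = nx.DiGraph()
for rel in demo_relationships:
    directed_graph.add_edge(rel["source"], rel["target"], relationship=rel["relationship"])

# Edges keep their source -> target order
for source, target, data in directed_graph.edges(data=True):
    print(f"{source} --{data['relationship']}--> {target}")

# successors() follows outgoing edges, predecessors() follows incoming ones
print("DeepNet points to:", list(directed_graph.successors("DeepNet")))
print("Points to DeepNet:", list(directed_graph.predecessors("DeepNet")))
```

The trade-off is that `neighbors()` on a `DiGraph` returns only outgoing connections, so exploring an entity such as "neural networks" would also need `predecessors()`.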

## 🔍 Step 6: Exploring the Knowledge Graph

Let's explore what's in our knowledge graph and how to navigate it!

In [None]:
# Explore the knowledge graph
print("🔍 EXPLORING THE KNOWLEDGE GRAPH")
print("=" * 40)

# Show all nodes (entities)
print("🏷️ All Entities in the Graph:")
for node in knowledge_graph.nodes(data=True):
    entity_name = node[0]
    category = node[1].get('category', 'unknown')
    print(f"   • {entity_name} ({category})")

# Show all relationships
# (the graph is undirected, so the order of the two entities is not meaningful)
print("\n🔗 All Relationships:")
for source, target, data in knowledge_graph.edges(data=True):
    relationship = data.get('relationship', 'connected_to')
    print(f"   • {source} <--{relationship}--> {target}")

print("\n📊 Graph Statistics:")
print(f"   • Total entities: {knowledge_graph.number_of_nodes()}")
print(f"   • Total connections: {knowledge_graph.number_of_edges()}")
print(f"   • Graph density: {nx.density(knowledge_graph):.3f}")
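
Graph density is the share of possible connections that actually exist. For an undirected graph with `n` nodes and `m` edges it is `2m / (n(n-1))`. The sketch below checks this by hand, assuming the demo graph from Step 5 (7 entities, 4 relationships); `density_demo` is an illustrative name.

```python
import networkx as nx

# Rebuild the demo graph from Step 5: 7 entities, 4 relationships
density_demo = nx.Graph()
density_demo.add_nodes_from([
    "John Smith", "MIT", "Stanford University", "machine learning",
    "neural networks", "image recognition", "DeepNet",
])
density_demo.add_edges_from([
    ("John Smith", "MIT"), ("John Smith", "DeepNet"),
    ("DeepNet", "neural networks"), ("DeepNet", "image recognition"),
])

n = density_demo.number_of_nodes()
m = density_demo.number_of_edges()
manual_density = 2 * m / (n * (n - 1))   # edges present / edges possible

print(f"Manual density:   {manual_density:.3f}")
print(f"NetworkX density: {nx.density(density_demo):.3f}")
```

With 7 nodes there are 21 possible connections, and 4 of them exist, so both lines print 0.190. A density near 0 means a sparse graph; a density of 1 means every entity is connected to every other.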

In [None]:
# Find connections for a specific entity
def explore_entity(graph, entity_name):
    """Find all connections for a specific entity"""
    
    if entity_name not in graph:
        print(f"❌ '{entity_name}' not found in graph")
        return
    
    print(f"🔍 Exploring: {entity_name}")
    print("=" * 30)
    
    # Get all connected entities
    neighbors = list(graph.neighbors(entity_name))
    
    if not neighbors:
        print("   No connections found")
        return
    
    print(f"   Connected to {len(neighbors)} entities:")
    
    for neighbor in neighbors:
        # Get the relationship
        edge_data = graph.get_edge_data(entity_name, neighbor)
        relationship = edge_data.get('relationship', 'connected_to') if edge_data else 'connected_to'
        
        # Get neighbor category
        neighbor_category = graph.nodes[neighbor].get('category', 'unknown')
        
        # The graph is undirected, so ↔ is used instead of a one-way arrow
        print(f"      • {relationship} ↔ {neighbor} ({neighbor_category})")

# Test entity exploration
print("🔍 ENTITY EXPLORATION")
print("=" * 25)
explore_entity(knowledge_graph, "John Smith")

print("\n")
explore_entity(knowledge_graph, "DeepNet")
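
Exploring direct neighbors shows one "hop". To see what is reachable in two or more hops, one option is NetworkX's `single_source_shortest_path_length` with a `cutoff`. This is a sketch: `entities_within` and `hop_demo` are hypothetical names, and the graph repeats the four demo relationships from Step 5.

```python
import networkx as nx

hop_demo = nx.Graph()
hop_demo.add_edges_from([
    ("John Smith", "MIT"), ("John Smith", "DeepNet"),
    ("DeepNet", "neural networks"), ("DeepNet", "image recognition"),
])

def entities_within(graph, entity_name, max_hops=2):
    """Map each entity reachable within max_hops to its distance in hops."""
    distances = dict(nx.single_source_shortest_path_length(graph, entity_name, cutoff=max_hops))
    distances.pop(entity_name, None)  # drop the starting entity (distance 0)
    return distances

# MIT -> John Smith is 1 hop, MIT -> John Smith -> DeepNet is 2 hops
print(entities_within(hop_demo, "MIT"))
print(entities_within(hop_demo, "MIT", max_hops=3))
```

Raising `max_hops` widens the search: at 3 hops, "neural networks" and "image recognition" also become reachable from "MIT".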

## 🧪 Step 7: Real Paper Analysis

Let's apply our knowledge graph skills to a real research paper!

In [None]:
# Load our research paper and extract a sample
from langchain_community.document_loaders import PyPDFLoader

print("📄 Analyzing Real Research Paper")
print("=" * 35)

# Load the paper
pdf_path = "../examples/d4sc03921a.pdf"
loader = PyPDFLoader(pdf_path)
documents = loader.load()

# Use first page for analysis
sample_content = documents[0].page_content[:2000]  # First 2000 characters

print(f"📝 Analyzing sample from research paper...")
print(f"📊 Sample length: {len(sample_content)} characters")

print("\n📖 Sample text preview:")
print(sample_content[:300] + "...")

In [None]:
# Extract entities from real paper
print("🏷️ Extracting entities from research paper...")
print("⏳ This might take 10-15 seconds...")

paper_entities = entity_chain.invoke({"text": sample_content})

print("\n🤖 AI Entity Extraction Result:")
print(paper_entities)

# Try to parse and display nicely
try:
    json_start = paper_entities.find('{')
    json_end = paper_entities.rfind('}') + 1
    if json_start != -1 and json_end > json_start:  # rfind() + 1 is 0, not -1, when '}' is missing
        json_str = paper_entities[json_start:json_end]
        parsed_entities = json.loads(json_str)
        
        print("\n✅ Parsed Entities from Research Paper:")
        for category, items in parsed_entities.items():
            if items:  # Only show categories with items
                print(f"   {category.title()}: {items}")
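except json.JSONDecodeError:
    print("\n💡 Could not parse the response as JSON. The raw output above still lists the entities.")
    print("   A stricter prompt or more robust parsing can improve this.")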

## 🎯 Step 8: Smart Graph Queries

Now let's learn how to ask intelligent questions about our knowledge graph!

In [None]:
# Create functions to query the knowledge graph
def find_shortest_path(graph, source, target):
    """Find the shortest connection between two entities"""
    try:
        path = nx.shortest_path(graph, source, target)
        return path
    except (nx.NetworkXNoPath, nx.NodeNotFound):  # no route, or entity not in graph
        return None

def find_most_connected(graph, top_n=3):
    """Find the most connected entities"""
    degrees = dict(graph.degree())
    sorted_entities = sorted(degrees.items(), key=lambda x: x[1], reverse=True)
    return sorted_entities[:top_n]

def get_entities_by_category(graph, category):
    """Get all entities of a specific category"""
    entities = []
    for node, data in graph.nodes(data=True):
        if data.get('category') == category:
            entities.append(node)
    return entities

print("🎯 SMART GRAPH QUERIES")
print("=" * 25)

# Test our query functions
print("🔗 Most Connected Entities:")
most_connected = find_most_connected(knowledge_graph)
for entity, connections in most_connected:
    print(f"   • {entity}: {connections} connections")

print("\n👥 People in the Graph:")
people = get_entities_by_category(knowledge_graph, 'people')
print(f"   {people}")

print("\n📍 Places in the Graph:")
places = get_entities_by_category(knowledge_graph, 'places')
print(f"   {places}")

print("\n💡 Concepts in the Graph:")
concepts = get_entities_by_category(knowledge_graph, 'concepts')
print(f"   {concepts}")
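
A shortest path answers questions like "How is MIT connected to image recognition?". The sketch below goes one step further and labels each step of the path with its relationship. `describe_path` and `path_demo` are hypothetical names, and the graph repeats the Step 5 demo data.

```python
import networkx as nx

path_demo = nx.Graph()
path_demo.add_node("Stanford University")  # no relationships, so it is isolated
path_demo.add_edge("John Smith", "MIT", relationship="works_at")
path_demo.add_edge("John Smith", "DeepNet", relationship="developed")
path_demo.add_edge("DeepNet", "neural networks", relationship="uses")
path_demo.add_edge("DeepNet", "image recognition", relationship="improves")

def describe_path(graph, source, target):
    """Describe the shortest chain of relationships between two entities."""
    try:
        path = nx.shortest_path(graph, source, target)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return None  # no route, or one of the entities is not in the graph
    parts = [path[0]]
    for left, right in zip(path, path[1:]):
        label = graph[left][right].get("relationship", "connected_to")
        parts.append(f"--{label}-- {right}")
    return " ".join(parts)

print(describe_path(path_demo, "MIT", "image recognition"))
print(describe_path(path_demo, "MIT", "Stanford University"))  # None: no route
print(describe_path(path_demo, "MIT", "Harvard"))              # None: not in graph
```

The first call walks MIT, John Smith, DeepNet, image recognition. The other two return `None` because "Stanford University" has no connections and "Harvard" is not an entity in the graph.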

## 🎮 Step 9: Interactive Knowledge Graph

Your turn to explore! Try different queries and see what you can discover.

In [None]:
# 🎯 YOUR TURN: Explore the Knowledge Graph

# Choose an entity to explore
entity_to_explore = "neural networks"  # 👈 Change this to any entity in the graph!

print(f"🔍 YOUR EXPLORATION: {entity_to_explore}")
print("=" * 40)

explore_entity(knowledge_graph, entity_to_explore)

print("\n🎯 Try changing 'entity_to_explore' to:")
print("   • 'John Smith' - to see who/what he's connected to")
print("   • 'MIT' - to see what's associated with MIT")
print("   • 'machine learning' - to see related concepts")
print("   • 'DeepNet' - to see what this method connects to")

## 🎓 What You've Learned

**Excellent work!** You've built and explored your first knowledge graphs.

### ✅ **Key Concepts:**
- **Knowledge Graphs**: Visual networks showing how things connect
- **Entities**: The "things" in our graph (people, places, concepts)
- **Relationships**: How entities are connected ("works at", "developed")
- **Graph Analysis**: Finding patterns and connections
- **Entity Exploration**: Following connections to discover information

### ✅ **Skills You've Gained:**
- Using AI to extract entities from text
- Finding relationships between entities
- Building knowledge graphs with NetworkX
- Querying graphs to find information
- Exploring entity connections

### 🚀 **What's Next:**
In **Tutorial 5**, you'll learn to **Combine RAG + Knowledge Graphs**:
- Using both RAG and graphs together
- Smart routing: when to use RAG vs graphs
- Building hybrid AI systems
- Enhanced question answering

### 🎯 **Practice Ideas:**
- Try extracting entities from your own text
- Build knowledge graphs from different domains
- Experiment with relationship types
- Explore graph analysis functions
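
As a starting point for exploring graph analysis functions, here is a small sketch using two built-in NetworkX functions, `degree_centrality` and `connected_components`. The graph `analysis_demo` is an illustrative copy of the Step 5 demo data.

```python
import networkx as nx

analysis_demo = nx.Graph()
analysis_demo.add_nodes_from([
    "John Smith", "MIT", "Stanford University", "machine learning",
    "neural networks", "image recognition", "DeepNet",
])
analysis_demo.add_edges_from([
    ("John Smith", "MIT"), ("John Smith", "DeepNet"),
    ("DeepNet", "neural networks"), ("DeepNet", "image recognition"),
])

# Degree centrality: share of the other entities each entity is linked to
centrality = nx.degree_centrality(analysis_demo)
top_entities = sorted(centrality.items(), key=lambda item: item[1], reverse=True)[:3]
for entity, score in top_entities:
    print(f"{entity}: {score:.2f}")

# Connected components: groups of entities that can reach each other
components = list(nx.connected_components(analysis_demo))
print(f"{len(components)} separate groups")
for group in components:
    print("  ", sorted(group))
```

"DeepNet" scores highest because it is linked to 3 of the 6 other entities. The two entities with no relationships ("Stanford University" and "machine learning") each form their own group.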

## 🏆 Final Challenge

Build a knowledge graph from your own text!

In [None]:
# 🏆 CHALLENGE: Create Your Own Knowledge Graph
print("🏆 FINAL CHALLENGE: Your Personal Knowledge Graph")
print("=" * 50)

# TODO: Write your own text about something you're interested in
your_text = """
Marie Curie was a physicist and chemist who conducted research on radioactivity. 
She worked at the University of Paris and discovered the elements polonium and radium. 
Marie Curie won Nobel Prizes in both Physics and Chemistry. 
She collaborated with her husband Pierre Curie on many experiments.
"""  # 👈 Replace this with your own text!

print("📝 Your Text:")
print(your_text)

print("\n🏗️ Building your knowledge graph...")
your_graph, your_entities, your_relationships = build_knowledge_graph(your_text)

print("\n🔍 Exploring your knowledge graph:")
# Show the most connected entity
most_connected = find_most_connected(your_graph, top_n=1)
if most_connected:
    top_entity = most_connected[0][0]
    print(f"\n⭐ Most connected entity: {top_entity}")
    explore_entity(your_graph, top_entity)

print("\n🎉 Challenge Complete! You've built your own knowledge graph!")
print("🚀 Ready for Tutorial 5: Combining RAG + Knowledge Graphs")