# GraphStore Example: Extracting and Managing Knowledge Graphs

This notebook demonstrates how to:
1. Extract knowledge graphs from text using SpindleExtractor
2. Store extracted triples in a persistent GraphStore
3. Perform CRUD operations (Create, Read, Update, Delete) on nodes and edges
4. Run Cypher queries against the stored graph
5. Export stored triples back to Triple objects

The GraphStore uses Kùzu, an embedded graph database, so no separate server is needed.


In [1]:
import os
import json
from datetime import datetime, timedelta
from dotenv import load_dotenv
from spindle import (
    SpindleExtractor,
    create_ontology,
    GraphStore
)

# Load environment variables (API keys)
load_dotenv()

# Check if API key is set
if not os.getenv("ANTHROPIC_API_KEY"):
    print("Error: ANTHROPIC_API_KEY environment variable not set.")
    print("Please set it in a .env file or as an environment variable.")


## Part 1: Setting Up the Ontology and Extractor

First, we'll define an ontology (entity types and relation types) and create an extractor.


In [2]:
# Define entity types with attributes
entity_types = [
    {
        "name": "Person", 
        "description": "A human being",
        "attributes": [
            {"name": "title", "type": "string", "description": "Job title"},
            {"name": "years_experience", "type": "int", "description": "Years at company"}
        ]
    },
    {
        "name": "Organization", 
        "description": "A company or institution",
        "attributes": [
            {"name": "founded_year", "type": "int", "description": "Year founded"},
            {"name": "industry", "type": "string", "description": "Industry sector"}
        ]
    },
    {
        "name": "Location", 
        "description": "A geographic place",
        "attributes": []
    },
    {
        "name": "Technology", 
        "description": "A programming language or tool",
        "attributes": []
    }
]

# Define relation types
relation_types = [
    {
        "name": "works_at",
        "description": "Employment relationship",
        "domain": "Person",
        "range": "Organization"
    },
    {
        "name": "located_in",
        "description": "Physical location",
        "domain": "Organization",
        "range": "Location"
    },
    {
        "name": "uses",
        "description": "Technology usage",
        "domain": "Person",
        "range": "Technology"
    }
]

# Create ontology and extractor
ontology = create_ontology(entity_types, relation_types)
extractor = SpindleExtractor(ontology)

print("✓ Ontology and extractor created successfully!")
print(f"Entity types: {[et.name for et in ontology.entity_types]}")
print(f"Relation types: {[rt.name for rt in ontology.relation_types]}")


✓ Ontology and extractor created successfully!
Entity types: ['Person', 'Organization', 'Location', 'Technology']
Relation types: ['works_at', 'located_in', 'uses']
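
The `domain` and `range` fields on each relation type say which entity types may appear as the subject and the object. The sketch below expresses that constraint as a plain-Python check. `is_valid_triple` and `RELATIONS` are hypothetical names written for illustration; they are not part of Spindle, whose own validation is not shown in this notebook.

```python
# Hypothetical helper, not part of Spindle: checks a (subject type, predicate,
# object type) combination against relation definitions shaped like the
# relation_types list above.
RELATIONS = {
    "works_at": {"domain": "Person", "range": "Organization"},
    "located_in": {"domain": "Organization", "range": "Location"},
    "uses": {"domain": "Person", "range": "Technology"},
}


def is_valid_triple(subject_type: str, predicate: str, object_type: str) -> bool:
    """Return True if the predicate exists and its domain/range match."""
    relation = RELATIONS.get(predicate)
    if relation is None:
        return False
    return relation["domain"] == subject_type and relation["range"] == object_type


print(is_valid_triple("Person", "works_at", "Organization"))  # True
print(is_valid_triple("Location", "works_at", "Organization"))  # False
```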


## Part 2: Extracting Knowledge from Text

Now we'll extract triples from some sample text. This demonstrates the extraction process before storing in the graph database.
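
The print loop in the next cell reads a handful of attributes from each extracted triple. As a reading aid, here is a stand-in for that shape built with dataclasses. Only the attribute names (`subject`, `predicate`, `object`, `name`, `type`, `custom_atts`) come from this notebook. The classes are hypothetical, the values are illustrative, and Spindle's real types may carry more fields.

```python
from dataclasses import dataclass, field


@dataclass
class EntitySketch:  # hypothetical stand-in for a Spindle entity
    name: str
    type: str
    custom_atts: dict = field(default_factory=dict)


@dataclass
class TripleSketch:  # hypothetical stand-in for a Spindle triple
    subject: EntitySketch
    predicate: str
    object: EntitySketch


def format_triple(triple: TripleSketch) -> str:
    """Render a triple in the same layout as the notebook's print loop."""
    s, o = triple.subject, triple.object
    return f"{s.name} ({s.type}) --[{triple.predicate}]--> {o.name} ({o.type})"


# Illustrative values only; real extraction output depends on the model.
example = TripleSketch(
    subject=EntitySketch("Alice Johnson", "Person", {"years_experience": 3}),
    predicate="works_at",
    object=EntitySketch("TechCorp", "Organization"),
)
print(format_triple(example))
```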


In [3]:
# Sample text about employees and companies
text1 = """
Alice Johnson works at TechCorp in San Francisco. She primarily uses Python
for backend development and has been with the company for 3 years.
"""

print("Extracting triples from first source...")
result1 = extractor.extract(
    text=text1,
    source_name="Employee Database",
    source_url="https://example.com/employees"
)

print(f"\nExtracted {len(result1.triples)} triples:\n")
for i, triple in enumerate(result1.triples, 1):
    print(f"{i}. {triple.subject.name} ({triple.subject.type}) --[{triple.predicate}]--> {triple.object.name} ({triple.object.type})")
    if triple.subject.custom_atts:
        print(f"   Subject attributes: {triple.subject.custom_atts}")


Extracting triples from first source...
2025-11-05T13:27:55.729 [BAML INFO] Function ExtractTriples:
    Client: CustomGPT5Mini (gpt-5-mini-2025-08-07) - 31249ms. StopReason: completed. Tokens(in/out): 1364/2529
    ---PROMPT---
    system: You are a knowledge graph extraction expert. Your task is to extract structured triples (subject-predicate-object) from the provided text, with rich entity metadata, custom attributes, and supporting evidence.
    user: ONTOLOGY:
    You must extract triples that conform to the following ontology:
    
    Valid Entity Types:
    - Person: A human being
      Custom Attributes:
        * title (string): Job title
        * years_experience (int): Years at company
    - Organization: A company or institution
      Custom Attributes:
        * founded_year (int): Year founded
        * industry (string): Industry sector
    - Location: A geographic place
    - Technology: A programming 

## Part 3: Creating and Using GraphStore

Now we'll create a GraphStore and store our extracted triples. The GraphStore automatically converts all node names and edge predicates to UPPERCASE for consistency.
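
One consequence of this normalization is that lookups do not depend on the casing you type. The sketch below illustrates the idea with a dictionary keyed by normalized names. `normalize`, `add_node`, and `get_node` here are hypothetical stand-ins that assume plain `str.upper()`, which matches the output shown later in this notebook but may not be exactly what GraphStore does internally.

```python
from typing import Optional


def normalize(name: str) -> str:
    """Assumed normalization: uppercase the whole string."""
    return name.upper()


nodes: dict = {}


def add_node(name: str, entity_type: str) -> None:
    """Store a node under its normalized name."""
    key = normalize(name)
    nodes[key] = {"name": key, "type": entity_type}


def get_node(name: str) -> Optional[dict]:
    """Look a node up by name, ignoring the caller's casing."""
    return nodes.get(normalize(name))


add_node("Alice Johnson", "Person")
print(get_node("alice johnson"))
print(normalize("works_at"))
```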


In [4]:
# Create a GraphStore with a unique name
graph_name = f"example_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
print(f"Creating graph: {graph_name}")
print(f"Graph files will be stored in: graphs/{graph_name}/\n")

# Use GraphStore as a context manager (automatically closes when done)
with GraphStore(db_path=graph_name) as store:
    print("✓ GraphStore initialized")
    
    # Store the extracted triples
    count = store.add_triples(result1.triples)
    print(f"✓ Added {count} triples to graph database")
    
    # Get initial statistics
    stats = store.get_statistics()
    print("\nGraph Statistics:")
    print(f"  Nodes: {stats['node_count']}")
    print(f"  Edges: {stats['edge_count']}")
    print(f"  Sources: {', '.join(stats['sources'])}")
    print(f"  Predicates: {', '.join(stats['predicates'])}")


Creating graph: example_20251105_132811
Graph files will be stored in: graphs/example_20251105_132811/

✓ GraphStore initialized
✓ Added 3 triples to graph database

Graph Statistics:
  Nodes: 4
  Edges: 3
  Sources: Employee Database
  Predicates: WORKS_AT, LOCATED_IN, USES


## Part 4: Adding More Data

Let's extract and add more triples from a second source to build up the graph.
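
Both sample texts state that TechCorp is located in San Francisco, so the second batch repeats a fact that is already in the graph. The edge counts and the `supporting_evidence` field used later in this notebook suggest that a repeated triple adds evidence to the existing edge instead of creating a second edge. The sketch below shows that merge strategy on plain dicts. It is an assumption about the behavior, not a description of GraphStore's implementation, and `merge_triple` is a hypothetical name.

```python
# Hypothetical sketch: edges keyed by (subject, predicate, object), each
# holding a list of evidence records, one per source.
edges: dict = {}


def merge_triple(subject: str, predicate: str, obj: str, source_name: str) -> bool:
    """Add a triple; return True if a new edge was created, False if merged."""
    key = (subject.upper(), predicate.upper(), obj.upper())
    evidence = {"source_nm": source_name}
    if key in edges:
        if evidence not in edges[key]:
            edges[key].append(evidence)
        return False
    edges[key] = [evidence]
    return True


merge_triple("TechCorp", "located_in", "San Francisco", "Employee Database")
merge_triple("TechCorp", "located_in", "San Francisco", "Company Profile")
merge_triple("TechCorp", "located_in", "New York", "Company Profile")
print(len(edges))  # 2
print(edges[("TECHCORP", "LOCATED_IN", "SAN FRANCISCO")])
```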


In [5]:
# Extract from a second source
text2 = """
TechCorp is located in San Francisco and has recently expanded to New York.
Bob Smith, who also works at TechCorp, specializes in TypeScript development.
"""

result2 = extractor.extract(
    text=text2,
    source_name="Company Profile",
    source_url="https://example.com/company",
    existing_triples=result1.triples  # Pass existing triples for entity consistency
)

print(f"Extracted {len(result2.triples)} triples from second source:\n")
for i, triple in enumerate(result2.triples, 1):
    print(f"{i}. {triple.subject.name} ({triple.subject.type}) --[{triple.predicate}]--> {triple.object.name} ({triple.object.type})")

# Store the second batch
with GraphStore(db_path=graph_name) as store:
    count = store.add_triples(result2.triples)
    print(f"\n✓ Added {count} more triples to graph database")
    
    # Updated statistics
    stats = store.get_statistics()
    print("\nUpdated Graph Statistics:")
    print(f"  Nodes: {stats['node_count']}")
    print(f"  Edges: {stats['edge_count']}")
    print(f"  Sources: {', '.join(stats['sources'])}")


2025-11-05T13:28:39.298 [BAML INFO] Function ExtractTriples:
    Client: CustomHaiku (claude-3-5-haiku-20241022) - 13701ms. StopReason: end_turn. Tokens(in/out): 1695/1127
    ---PROMPT---
    user: You are a knowledge graph extraction expert. Your task is to extract structured triples (subject-predicate-object) from the provided text, with rich entity metadata, custom attributes, and supporting evidence.ONTOLOGY:
    You must extract triples that conform to the following ontology:
    
    Valid Entity Types:
    - Person: A human being
      Custom Attributes:
        * title (string): Job title
        * years_experience (int): Years at company
    - Organization: A company or institution
      Custom Attributes:
        * founded_year (int): Year founded
        * industry (string): Industry sector
    - Location: A geographic place
    - Technology: A programming language or tool
    
    Valid Relation Types:
    - works_at: Em

## Part 5: READ Operations - Querying the Graph

Now we'll demonstrate various ways to query the graph store.
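
Pattern queries treat each omitted argument as a wildcard. A minimal sketch of that matching logic over a list of edge dicts follows. `match_pattern` is a hypothetical stand-in rather than GraphStore code, and the edge keys (`subject`, `predicate`, `object`) mirror those printed in the next cell.

```python
from typing import Optional

EDGES = [
    {"subject": "ALICE JOHNSON", "predicate": "WORKS_AT", "object": "TECHCORP"},
    {"subject": "ALICE JOHNSON", "predicate": "USES", "object": "PYTHON"},
    {"subject": "TECHCORP", "predicate": "LOCATED_IN", "object": "SAN FRANCISCO"},
]


def match_pattern(
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> list:
    """Return edges matching every argument that is not None."""
    wanted = {"subject": subject, "predicate": predicate, "object": obj}
    return [
        edge
        for edge in EDGES
        if all(v is None or edge[k] == v.upper() for k, v in wanted.items())
    ]


print(len(match_pattern(subject="Alice Johnson")))  # 2
print(match_pattern(predicate="works_at"))
```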


In [6]:
with GraphStore(db_path=graph_name) as store:
    # Query 1: Find all "works_at" relationships
    print("Query 1: All 'works_at' relationships")
    print("-" * 70)
    works_at = store.query_by_pattern(predicate="works_at")
    print(f"Found {len(works_at)} relationship(s):")
    for edge in works_at:
        print(f"  - {edge['subject']} works at {edge['object']}")
        print(f"    Source: {(edge.get('supporting_evidence') or [{}])[0].get('source_nm', 'Unknown')}")
    print()
    
    # Query 2: Find all relationships involving a specific subject
    print("Query 2: All relationships with Alice Johnson as subject")
    print("-" * 70)
    alice_rels = store.query_by_pattern(subject="Alice Johnson")
    print(f"Found {len(alice_rels)} relationship(s):")
    for edge in alice_rels:
        print(f"  - {edge['subject']} --[{edge['predicate']}]--> {edge['object']}")
    print()
    
    # Query 3: Find all relationships where TechCorp is the subject
    print("Query 3: All relationships with TechCorp as subject")
    print("-" * 70)
    techcorp_rels = store.query_by_pattern(subject="TechCorp")
    print(f"Found {len(techcorp_rels)} relationship(s):")
    for edge in techcorp_rels:
        print(f"  - {edge['subject']} --[{edge['predicate']}]--> {edge['object']}")
    print()
    
    # Query 4: Query by source
    print("Query 4: All triples from 'Employee Database' source")
    print("-" * 70)
    employee_db = store.query_by_source("Employee Database")
    print(f"Found {len(employee_db)} edge(s):")
    for edge in employee_db:
        print(f"  - {edge['subject']} --[{edge['predicate']}]--> {edge['object']}")
    print()
    
    # Query 5: Query by date range
    print("Query 5: All triples from the last hour")
    print("-" * 70)
    one_hour_ago = datetime.utcnow() - timedelta(hours=1)
    recent = store.query_by_date_range(start=one_hour_ago)
    print(f"Found {len(recent)} edge(s):")
    for edge in recent[:5]:  # Show first 5
        print(f"  - {edge['subject']} --[{edge['predicate']}]--> {edge['object']}")
    print()
    
    # Query 6: Get specific node information
    print("Query 6: Get node information for Alice Johnson")
    print("-" * 70)
    alice = store.get_node("Alice Johnson")
    if alice:
        print(f"Node: {alice['name']}")
        print(f"Type: {alice['type']}")
        print(f"Description: {alice['description']}")
        print(f"Metadata: {json.dumps(alice['metadata'], indent=2)}")
        print(f"Custom Attributes: {json.dumps(alice['custom_atts'], indent=2)}")
    print()
    
    # Query 7: Get specific edge
    print("Query 7: Get specific edge")
    print("-" * 70)
    edge = store.get_edge("Alice Johnson", "works_at", "TechCorp")
    if edge:
        print(f"Found edge: {edge[0]['subject']} --[{edge[0]['predicate']}]--> {edge[0]['object']}")
        print(f"Supporting evidence sources: {len(edge[0].get('supporting_evidence', []))}")


Query 1: All 'works_at' relationships
----------------------------------------------------------------------
Found 2 relationship(s):
  - ALICE JOHNSON works at TECHCORP
    Source: Employee Database
  - BOB SMITH works at TECHCORP
    Source: Company Profile

Query 2: All relationships with Alice Johnson as subject
----------------------------------------------------------------------
Found 2 relationship(s):
  - ALICE JOHNSON --[WORKS_AT]--> TECHCORP
  - ALICE JOHNSON --[USES]--> PYTHON

Query 3: All relationships with TechCorp as subject
----------------------------------------------------------------------
Found 2 relationship(s):
  - TECHCORP --[LOCATED_IN]--> SAN FRANCISCO
  - TECHCORP --[LOCATED_IN]--> NEW YORK

Query 4: All triples from 'Employee Database' source
----------------------------------------------------------------------
Found 3 edge(s):
  - ALICE JOHNSON --[USES]--> PYTHON
  - ALICE JOHNSON --[WORKS_AT]--> TECHCORP
  - TECHCORP --[LOCATED_IN]--> SAN FRANCISCO

Quer

## Part 6: CREATE Operations - Adding Individual Nodes and Edges

You can add nodes and edges individually without extracting from text.
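
The two calls in the next cell report their outcome differently: `add_node` returns a truthy or falsy value, while `add_edge` returns a dict with `success` and `message` keys. If you script many manual insertions, a small adapter can normalize both shapes. `as_outcome` below is a hypothetical helper, and the sample inputs are made up to mirror those two shapes.

```python
from typing import Tuple, Union


def as_outcome(result: Union[bool, dict]) -> Tuple[bool, str]:
    """Normalize a bool or a {'success', 'message'} dict to (ok, message)."""
    if isinstance(result, dict):
        return bool(result.get("success")), str(result.get("message", ""))
    return bool(result), ""


print(as_outcome(True))
print(as_outcome({"success": True, "message": "Created new edge"}))
# The failure message below is invented for illustration.
print(as_outcome({"success": False, "message": "example failure"}))
```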


In [7]:
with GraphStore(db_path=graph_name) as store:
    # Create a new node
    print("Creating a new node: Carol Davis")
    print("-" * 70)
    success = store.add_node(
        name="Carol Davis",
        entity_type="Person",
        metadata={"title": "Data Scientist", "verified": True},
        description="A data scientist at TechCorp",
        custom_atts={}
    )
    
    if success:
        print("✓ Added Carol Davis node")
    else:
        print("Node already exists or failed to add")
    print()
    
    # Create a new edge
    print("Creating a new edge: Carol Davis works_at TechCorp")
    print("-" * 70)
    result = store.add_edge(
        subject="Carol Davis",
        predicate="works_at",
        obj="TechCorp",
        metadata={
            "source": "Manual Entry",
            "extraction_datetime": datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
        }
    )
    
    if result.get("success"):
        print(f"✓ Added edge: {result.get('message')}")
    else:
        print(f"✗ Failed to add edge: {result.get('message')}")
    print()
    
    # Show updated statistics
    stats = store.get_statistics()
    print("Updated Graph Statistics:")
    print(f"  Nodes: {stats['node_count']}")
    print(f"  Edges: {stats['edge_count']}")


Creating a new node: Carol Davis
----------------------------------------------------------------------
✓ Added Carol Davis node

Creating a new edge: Carol Davis works_at TechCorp
----------------------------------------------------------------------
✓ Added edge: Created new edge

Updated Graph Statistics:
  Nodes: 8
  Edges: 7


## Part 7: UPDATE Operations - Modifying Node and Edge Metadata

You can update node and edge properties after they've been created.
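
The node update in the next cell resends the existing `sources` and `first_seen` keys together with the new ones. That is the safe choice when you do not know whether an update replaces the stored `metadata` dict or merges into it. The difference, in plain Python (this does not call GraphStore and does not claim which behavior it implements):

```python
existing = {"sources": ["Employee Database"], "first_seen": "2025-11-05T19:27:55Z"}
update = {"verified": True, "employee_id": "E12345"}

# Replace semantics: the stored dict becomes exactly the update.
replaced = dict(update)

# Merge semantics: keys in the update override, all other keys survive.
merged = {**existing, **update}

print(sorted(replaced))  # ['employee_id', 'verified']
print(sorted(merged))  # ['employee_id', 'first_seen', 'sources', 'verified']
```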


In [8]:
with GraphStore(db_path=graph_name) as store:
    # Update node metadata
    print("Updating Alice Johnson's node metadata")
    print("-" * 70)
    
    # First, get the current node to see existing metadata
    alice_before = store.get_node("Alice Johnson")
    print("Before update:")
    print(f"  Metadata: {json.dumps(alice_before['metadata'], indent=2)}")
    print()
    
    # Update the node
    success = store.update_node(
        "Alice Johnson",
        updates={
            "metadata": {
                "sources": ["Employee Database"],
                "first_seen": result1.triples[0].extraction_datetime,
                "verified": True,
                "employee_id": "E12345"
            }
        }
    )
    
    if success:
        print("✓ Updated Alice Johnson's metadata")
        alice_after = store.get_node("Alice Johnson")
        print("After update:")
        print(f"  Metadata: {json.dumps(alice_after['metadata'], indent=2)}")
    print()
    
    # Update edge metadata
    print("Updating edge metadata")
    print("-" * 70)
    
    # Get the edge first
    edge_before = store.get_edge("Alice Johnson", "uses", "Python")
    if edge_before:
        print("Before update:")
        print(f"  Metadata: {json.dumps(edge_before[0]['metadata'], indent=2)}")
        print()
        
        # Update the edge
        success = store.update_edge(
            "Alice Johnson",
            "uses",
            "Python",
            updates={
                "metadata": {
                    "proficiency": "expert",
                    "years_used": 5,
                    "verified": True
                }
            }
        )
        
        if success:
            print("✓ Updated edge metadata")
            edge_after = store.get_edge("Alice Johnson", "uses", "Python")
            print("After update:")
            print(f"  Metadata: {json.dumps(edge_after[0]['metadata'], indent=2)}")


Updating Alice Johnson's node metadata
----------------------------------------------------------------------
Before update:
  Metadata: {
  "sources": [
    "Employee Database"
  ],
  "first_seen": "2025-11-05T19:27:55Z"
}

✓ Updated Alice Johnson's metadata
After update:
  Metadata: {
  "sources": [
    "Employee Database"
  ],
  "first_seen": "2025-11-05T19:27:55Z",
  "verified": true,
  "employee_id": "E12345"
}

Updating edge metadata
----------------------------------------------------------------------
Before update:
  Metadata: {}

✓ Updated edge metadata
After update:
  Metadata: {
  "proficiency": "expert",
  "years_used": 5,
  "verified": true
}


## Part 8: DELETE Operations - Removing Nodes and Edges

You can delete individual edges or entire nodes. Deleting a node also deletes every edge connected to it.
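
The cascading behavior of node deletion can be pictured with a small in-memory model. This is a sketch under the assumption that "connected" means edges where the node is either subject or object; `delete_node` here is a hypothetical function operating on plain Python collections, not the GraphStore method.

```python
nodes = {"CAROL DAVIS", "TECHCORP", "ALICE JOHNSON"}
edges = [
    ("CAROL DAVIS", "WORKS_AT", "TECHCORP"),
    ("ALICE JOHNSON", "WORKS_AT", "TECHCORP"),
]


def delete_node(name: str) -> bool:
    """Remove a node and every edge that starts or ends at it."""
    key = name.upper()
    if key not in nodes:
        return False
    nodes.remove(key)
    # Rebuild the edge list in place, keeping edges that do not touch the node.
    edges[:] = [(s, p, o) for (s, p, o) in edges if key not in (s, o)]
    return True


print(delete_node("Carol Davis"))  # True
print(edges)  # [('ALICE JOHNSON', 'WORKS_AT', 'TECHCORP')]
```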


In [9]:
with GraphStore(db_path=graph_name) as store:
    # Get current statistics before deletion
    stats_before = store.get_statistics()
    print("Before deletion:")
    print(f"  Nodes: {stats_before['node_count']}")
    print(f"  Edges: {stats_before['edge_count']}")
    print()
    
    # Delete a specific edge
    print("Deleting an edge: Alice Johnson --[uses]--> Python")
    print("-" * 70)
    
    # Check if edge exists
    edge = store.get_edge("Alice Johnson", "uses", "Python")
    if edge:
        print(f"Edge exists: {edge[0]['subject']} --[{edge[0]['predicate']}]--> {edge[0]['object']}")
        
        # Delete the edge
        success = store.delete_edge("Alice Johnson", "uses", "Python")
        if success:
            print("✓ Deleted edge successfully")
            
            # Verify deletion
            edge_after = store.get_edge("Alice Johnson", "uses", "Python")
            if not edge_after:
                print("✓ Verified: Edge no longer exists")
            else:
                print("✗ Warning: Edge still exists")
        else:
            print("✗ Failed to delete edge")
    else:
        print("Edge not found")
    print()
    
    # Show updated statistics
    stats_after = store.get_statistics()
    print("After edge deletion:")
    print(f"  Nodes: {stats_after['node_count']}")
    print(f"  Edges: {stats_after['edge_count']}")
    print()
    
    # Delete a node (this also deletes all connected edges)
    print("Deleting a node: Carol Davis (and all connected edges)")
    print("-" * 70)
    
    # Check if node exists
    carol = store.get_node("Carol Davis")
    if carol:
        print(f"Node exists: {carol['name']} ({carol['type']})")
        
        # Find edges where this node is the subject (incoming edges are not listed here)
        carol_edges = store.query_by_pattern(subject="Carol Davis")
        print(f"Found {len(carol_edges)} edge(s) connected to this node")
        
        # Delete the node (this automatically deletes all connected edges)
        success = store.delete_node("Carol Davis")
        if success:
            print("✓ Deleted node successfully (and all connected edges)")
            
            # Verify deletion
            carol_after = store.get_node("Carol Davis")
            if carol_after is None:
                print("✓ Verified: Node no longer exists")
            else:
                print("✗ Warning: Node still exists")
        else:
            print("✗ Failed to delete node")
    else:
        print("Node not found")
    print()
    
    # Final statistics
    stats_final = store.get_statistics()
    print("Final Graph Statistics:")
    print(f"  Nodes: {stats_final['node_count']}")
    print(f"  Edges: {stats_final['edge_count']}")


Before deletion:
  Nodes: 8
  Edges: 7

Deleting an edge: Alice Johnson --[uses]--> Python
----------------------------------------------------------------------
Edge exists: ALICE JOHNSON --[USES]--> PYTHON
✓ Deleted edge successfully
✓ Verified: Edge no longer exists

After edge deletion:
  Nodes: 8
  Edges: 6

Deleting a node: Carol Davis (and all connected edges)
----------------------------------------------------------------------
Node exists: CAROL DAVIS (Person)
Found 1 edge(s) connected to this node
✓ Deleted node successfully (and all connected edges)
✓ Verified: Node no longer exists

Final Graph Statistics:
  Nodes: 7
  Edges: 5


## Part 9: Cypher Queries

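For queries that the helper methods do not cover, `query_cypher` runs a Cypher query against the graph and returns rows keyed by the `AS` aliases in the `RETURN` clause. Because names and predicates are stored in UPPERCASE, string literals in the query should be uppercase too.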
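
The queries in the next cell match on `Entity` nodes and `Relationship` edges and filter on the `predicate` property. If you generate such queries for several predicates, it helps to normalize and validate the predicate before interpolating it into the query text. `predicate_query` below is a hypothetical helper that only builds a string. It assumes the `Entity` and `Relationship` labels used in this notebook and does not execute anything.

```python
import re


def predicate_query(predicate: str) -> str:
    """Build a Cypher query listing subject/object names for one predicate."""
    normalized = predicate.upper()
    # Allow only letters, digits and underscores so the value cannot break
    # out of the quoted string literal.
    if not re.fullmatch(r"[A-Z0-9_]+", normalized):
        raise ValueError(f"Unsupported predicate: {predicate!r}")
    return (
        f"MATCH (s:Entity)-[r:Relationship {{predicate: '{normalized}'}}]->(o:Entity) "
        "RETURN s.name AS subject, o.name AS object"
    )


print(predicate_query("works_at"))
```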


In [10]:
with GraphStore(db_path=graph_name) as store:
    # Example 1: Find all people and what organizations they work at
    print("Cypher Query 1: Find all people and their organizations")
    print("-" * 70)
    cypher_query1 = """
    MATCH (p:Entity)-[r:Relationship {predicate: 'WORKS_AT'}]->(o:Entity)
    RETURN p.name AS person, o.name AS organization
    """
    
    results = store.query_cypher(cypher_query1)
    print(f"Found {len(results)} result(s):")
    for row in results:
        print(f"  - {row['person']} works at {row['organization']}")
    print()
    
    # Example 2: Find all organizations and their locations
    print("Cypher Query 2: Find all organizations and their locations")
    print("-" * 70)
    cypher_query2 = """
    MATCH (o:Entity)-[r:Relationship {predicate: 'LOCATED_IN'}]->(l:Entity)
    RETURN o.name AS organization, l.name AS location
    """
    
    results = store.query_cypher(cypher_query2)
    print(f"Found {len(results)} result(s):")
    for row in results:
        print(f"  - {row['organization']} is located in {row['location']}")
    print()
    
    # Example 3: Find all nodes of a specific type
    print("Cypher Query 3: Find all Person nodes")
    print("-" * 70)
    cypher_query3 = """
    MATCH (p:Entity {type: 'Person'})
    RETURN p.name AS name, p.description AS description
    """
    
    results = store.query_cypher(cypher_query3)
    print(f"Found {len(results)} person(s):")
    for row in results:
        print(f"  - {row['name']}: {row['description']}")


Cypher Query 1: Find all people and their organizations
----------------------------------------------------------------------
Found 2 result(s):
  - ALICE JOHNSON works at TECHCORP
  - BOB SMITH works at TECHCORP

Cypher Query 2: Find all organizations and their locations
----------------------------------------------------------------------
Found 2 result(s):
  - TECHCORP is located in NEW YORK
  - TECHCORP is located in SAN FRANCISCO

Cypher Query 3: Find all Person nodes
----------------------------------------------------------------------
Found 2 person(s):
  - ALICE JOHNSON: An employee at TechCorp who primarily does backend development and is mentioned as having spent multiple years with the company.
  - BOB SMITH: An employee at TechCorp who specializes in TypeScript development


## Part 10: Exporting Triples

You can export all triples from the graph store back to Triple objects for use with other Spindle functions.
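
A common reason to export is to write the triples to a file or hand them to another tool. The sketch below flattens triple-like objects into JSON Lines. The attribute names match those read in the next cell (`subject.name`, `predicate`, `object.name`, `source.source_name`, `extraction_datetime`). `SimpleNamespace` objects stand in for real Triple objects, and `to_json_lines` is a hypothetical helper, not a Spindle function.

```python
import json
from types import SimpleNamespace


def to_json_lines(triples) -> str:
    """Serialize triple-like objects to one JSON object per line."""
    rows = []
    for t in triples:
        rows.append(json.dumps({
            "subject": t.subject.name,
            "predicate": t.predicate,
            "object": t.object.name,
            "source": t.source.source_name,
            "extraction_datetime": t.extraction_datetime,
        }))
    return "\n".join(rows)


# Stand-in object built from values printed in this notebook's output.
sample = SimpleNamespace(
    subject=SimpleNamespace(name="ALICE JOHNSON"),
    predicate="WORKS_AT",
    object=SimpleNamespace(name="TECHCORP"),
    source=SimpleNamespace(source_name="Employee Database"),
    extraction_datetime="2025-11-05T19:27:55Z",
)
print(to_json_lines([sample]))
```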


In [11]:
with GraphStore(db_path=graph_name) as store:
    # Export all triples
    print("Exporting all triples from graph store")
    print("-" * 70)
    
    all_triples = store.get_triples()
    print(f"Exported {len(all_triples)} triple(s) from database")
    print("\nThese can be used with any Spindle functions that accept Triple objects.")
    print("\nExample exported triples:")
    for i, triple in enumerate(all_triples[:3], 1):  # Show first 3
        print(f"\n{i}. {triple.subject.name} ({triple.subject.type}) --[{triple.predicate}]--> {triple.object.name} ({triple.object.type})")
        print(f"   Source: {triple.source.source_name}")
        print(f"   Extraction datetime: {triple.extraction_datetime}")


Exporting all triples from graph store
----------------------------------------------------------------------
Exported 6 triple(s) from database

These can be used with any Spindle functions that accept Triple objects.

Example exported triples:

1. ALICE JOHNSON (Person) --[WORKS_AT]--> TECHCORP (Organization)
   Source: Employee Database
   Extraction datetime: 2025-11-05T19:27:55Z

2. TECHCORP (Organization) --[LOCATED_IN]--> NEW YORK (Location)
   Source: Company Profile
   Extraction datetime: 2025-11-05T19:28:39Z

3. TECHCORP (Organization) --[LOCATED_IN]--> SAN FRANCISCO (Location)
   Source: Employee Database
   Extraction datetime: 2025-11-05T19:27:55Z


## Summary

This notebook demonstrated:

1. **Extraction**: Using `SpindleExtractor` to extract knowledge graph triples from text
2. **Storage**: Using `GraphStore` to persistently store triples in a Kùzu database
3. **READ Operations**: 
   - Querying by pattern (subject, predicate, object)
   - Querying by source
   - Querying by date range
   - Getting node and edge information
   - Using Cypher queries for advanced queries
4. **CREATE Operations**: 
   - Adding individual nodes
   - Adding individual edges
5. **UPDATE Operations**: 
   - Updating node metadata
   - Updating edge metadata
6. **DELETE Operations**: 
   - Deleting specific edges
   - Deleting nodes (which also deletes connected edges)
7. **Export**: Exporting triples back to Triple objects

### Key Points:

- GraphStore automatically converts all node names and edge predicates to UPPERCASE for consistency
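- Lookups through the GraphStore helper methods (`get_node`, `query_by_pattern`, and so on) are case-insensitive; raw Cypher queries should match the stored UPPERCASE values, as in Part 9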
- The graph database is stored locally (no server needed)
- You can use the graph store as a context manager for automatic cleanup
- All triples maintain source metadata and supporting evidence
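
The context-manager point above relies on Python's `__enter__`/`__exit__` protocol: the `with` block guarantees that cleanup runs even if the body raises. A minimal sketch with a hypothetical `DemoStore` class follows (GraphStore's actual cleanup steps are not shown in this notebook).

```python
class DemoStore:
    """Hypothetical stand-in that tracks whether it has been closed."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "DemoStore":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False  # do not swallow exceptions from the with-body


store = DemoStore()
try:
    with store:
        raise RuntimeError("simulated failure inside the with-block")
except RuntimeError:
    pass

print(store.closed)  # True
```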

### Cleanup

When you're done with the example graph, you can delete it:

```python
with GraphStore(db_path=graph_name) as store:
    store.delete_graph()
```
