# 🚀 iText2KG + FalkorDB Quickstart Guide

This notebook demonstrates how to use **iText2KG** with **FalkorDB** for building and visualizing knowledge graphs from text documents.

## Features Covered:
- Document distillation with custom schemas
- Knowledge graph construction using iText2KG_Star (recommended)
- Storing the graph in FalkorDB and querying it with Cypher
- Graph statistics and analytics
- Error handling and fallback mechanisms

**Updated:** August 2025 with full FalkorDB integration

## 📦 Installation

Install the required packages:

```bash
pip install itext2kg falkordb langchain-openai
```

**Prerequisites:**
- Python 3.9+
- OpenAI API key
- FalkorDB server running (default: localhost:6379)

**FalkorDB Setup:**
```bash
# Using Docker (recommended)
docker run -p 6379:6379 -it --rm falkordb/falkordb:latest
```
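
Before running the cells below, it can help to confirm that something is listening on the FalkorDB port. The following is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper name, and a successful TCP connect only shows that the port is reachable, not that the server behind it is FalkorDB.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if port_is_open():
    print("Something is listening on localhost:6379")
else:
    print("Nothing reachable on localhost:6379; is the FalkorDB container running?")
```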

In [None]:
import os
import asyncio
from typing import List, Optional
from pydantic import BaseModel, Field

# iText2KG imports
from itext2kg import DocumentDistiller, iText2KG_Star, FalkorDBStorage
from itext2kg.logging_config import setup_logging, get_logger

# LangChain imports
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Setup logging
setup_logging(level="INFO")
logger = get_logger(__name__)

print("All imports successful!")
print("Logging configured successfully")

## 🔧 Configuration

Set up your API keys and database connection:

In [None]:
print("Starting iText2KG + FalkorDB Example")
print("=" * 50)

# OpenAI Configuration (getpass keeps a typed key out of the notebook output)
from getpass import getpass
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass('Enter your OpenAI API key: ')

# Initialize LLM and Embeddings
llm_model = ChatOpenAI(
    api_key=OPENAI_API_KEY,
    model="gpt-4o-mini",
    temperature=0,
    max_retries=2
)

embeddings_model = OpenAIEmbeddings(
    api_key=OPENAI_API_KEY,
    model="text-embedding-3-small"
)

# FalkorDB Configuration
FALKORDB_CONFIG = {
    "host": os.getenv("FALKORDB_HOST", "localhost"),
    "port": int(os.getenv("FALKORDB_PORT", 6379)),
    "password": os.getenv("FALKORDB_PASSWORD", None),
    "graph_name": "NewsGraph"
}

print("🔧 Configuration complete!")
print(f"FalkorDB: {FALKORDB_CONFIG['host']}:{FALKORDB_CONFIG['port']}")
print(f"LLM Model: {llm_model.model_name}")
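
The configuration cell reads connection settings from environment variables and falls back to local defaults. The sketch below shows the same pattern as a function, so it can be checked against a plain dictionary instead of the real environment. `load_falkordb_config` is a hypothetical helper, not part of iText2KG or FalkorDB, and treating an empty password as "no password" is an assumption made here.

```python
import os
from typing import Mapping, Optional


def load_falkordb_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build the FalkorDB settings dict from environment-style variables."""
    env = os.environ if env is None else env
    port_text = env.get("FALKORDB_PORT", "6379")
    if not port_text.isdigit():
        raise ValueError(f"FALKORDB_PORT must be an integer, got {port_text!r}")
    return {
        "host": env.get("FALKORDB_HOST", "localhost"),
        "port": int(port_text),
        # Assumption: an unset or empty password means "connect without authentication".
        "password": env.get("FALKORDB_PASSWORD") or None,
        "graph_name": "NewsGraph",
    }


# Defaults only, then an override (the host name is made up for illustration).
print(load_falkordb_config({}))
print(load_falkordb_config({"FALKORDB_HOST": "db.internal", "FALKORDB_PORT": "6380"}))
```

Failing early on a malformed port gives a clearer message than the `int()` traceback the reader would otherwise see.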

## 📋 Define Data Schema

Create a Pydantic schema for structured information extraction:

In [None]:
class NewsArticle(BaseModel):
    """Schema for extracting structured information from news articles."""
    title: str = Field(default="", description="The title of the article")
    companies: List[str] = Field(default_factory=list, description="Companies mentioned in the article")
    people: List[str] = Field(default_factory=list, description="People mentioned in the article")
    locations: List[str] = Field(default_factory=list, description="Locations mentioned in the article")
    key_events: str = Field(default="", description="Main events described in the article")
    technologies: List[str] = Field(default_factory=list, description="Technologies or products mentioned")
    funding_info: str = Field(default="", description="Any funding or financial information mentioned")

print("📋 NewsArticle schema defined successfully!")
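
Because every field has a default, the schema validates even when only part of the information is found. The sketch below shows this with hand-written values and no LLM call. It assumes Pydantic v2 (`model_dump`, `model_json_schema`), and `MiniArticle` is a trimmed, hypothetical stand-in for `NewsArticle` so the block runs on its own without overwriting the real schema.

```python
from typing import List

from pydantic import BaseModel, Field


class MiniArticle(BaseModel):
    """Trimmed stand-in for NewsArticle with the same default pattern."""
    title: str = Field(default="", description="The title of the article")
    companies: List[str] = Field(default_factory=list, description="Companies mentioned in the article")
    people: List[str] = Field(default_factory=list, description="People mentioned in the article")


# Only some fields are supplied; the rest fall back to their defaults.
partial = MiniArticle(title="Apple acquires Emotient", companies=["Apple Inc.", "Emotient"])
print(partial.model_dump())

# Field descriptions are carried into the generated JSON schema.
schema = MiniArticle.model_json_schema()
print(schema["properties"]["people"]["description"])
```

Clear field descriptions matter because they are the only per-field guidance the schema itself carries.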

## 📰 Sample Data

Let's use a sample news article about a tech acquisition:

In [None]:
# Sample article text
article_text = """
Apple Inc. announced today that it has acquired Emotient, a San Diego-based artificial
intelligence startup specializing in facial expression recognition technology. The acquisition
was led by Apple's Senior Vice President of Software Engineering, Craig Federighi.

Emotient's technology will be integrated into Apple's machine learning initiatives. The startup
was founded in 2012 by Dr. Ken Denman and has raised $8 million in funding from venture capital firms.

The acquisition represents Apple's continued investment in artificial intelligence and machine
learning capabilities. Industry analysts believe this technology could be integrated into future
iPhone and iPad applications for emotion recognition and user experience enhancement.

Tim Cook, Apple's CEO, stated that this acquisition aligns with the company's strategy to develop
more intelligent and intuitive user interfaces. The Emotient team will join Apple's AI research
division in Cupertino, California.
"""

# Information extraction query
extraction_query = """
# DIRECTIVES:
- Act like a professional business news analyst
- Extract key information from this business article
- Focus on factual information only
- If information is not found, leave it empty
"""

print(f"Article loaded ({len(article_text)} characters)")
print("Extraction query prepared")

## 🔍 Step 1: Document Distillation

Extract structured information from the raw text:

In [None]:
async def distill_document():
    """Extract structured information from the article."""
    
    logger.info("Starting document distillation...")
    
    # Initialize document distiller
    distiller = DocumentDistiller(llm_model=llm_model)
    
    # Extract structured information
    distilled = await distiller.distill(
        documents=[article_text],
        IE_query=extraction_query,
        output_data_structure=NewsArticle
    )
    
    # Convert to dictionary
    doc_dict = (
        distilled.model_dump() 
        if hasattr(distilled, 'model_dump') 
        else distilled.dict()
    )
    
    # Create semantic blocks
    semantic_blocks = [
        f"{k}: {', '.join(v) if isinstance(v, list) else v}"
        for k, v in doc_dict.items() if v
    ]
    
    logger.info(f"Generated {len(semantic_blocks)} semantic blocks")
    
    return doc_dict, semantic_blocks

# Run distillation
doc_data, semantic_blocks = await distill_document()

print("\nExtracted Information:")
for key, value in doc_data.items():
    if value:
        print(f"  • {key}: {value}")

print(f"\nGenerated {len(semantic_blocks)} semantic blocks for KG construction")
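
To make the flattening step concrete, the sketch below applies the same comprehension to a hand-written dictionary. The values are illustrative stand-ins for what the distiller might return, not actual model output, and the names `sample_dict` and `sample_blocks` are chosen so the notebook's own variables are left untouched.

```python
# Hypothetical distilled output, shaped like NewsArticle.model_dump().
sample_dict = {
    "title": "Apple acquires Emotient",
    "companies": ["Apple Inc.", "Emotient"],
    "people": ["Tim Cook", "Craig Federighi", "Ken Denman"],
    "locations": [],   # empty list: skipped by the `if v` filter
    "key_events": "",  # empty string: skipped as well
    "funding_info": "Emotient raised $8 million",
}

# One "key: value" string per non-empty field; lists are joined with commas.
sample_blocks = [
    f"{k}: {', '.join(v) if isinstance(v, list) else v}"
    for k, v in sample_dict.items() if v
]

for block in sample_blocks:
    print(block)
```

Each resulting string becomes one section passed to `build_graph`, so empty fields never reach the graph builder.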

## 🕸️ Step 2: Knowledge Graph Construction

Build a knowledge graph using iText2KG_Star (recommended approach):

In [None]:
async def build_knowledge_graph(sections):
    """Build knowledge graph from semantic blocks."""
    
    logger.info("Building knowledge graph with iText2KG_Star...")
    
    # Initialize iText2KG_Star
    itext2kg_star = iText2KG_Star(
        llm_model=llm_model, 
        embeddings_model=embeddings_model
    )
    
    try:
        # Build the knowledge graph
        knowledge_graph = await itext2kg_star.build_graph(
            sections=sections,
            ent_threshold=0.7,      # Entity similarity threshold
            rel_threshold=0.7,      # Relationship similarity threshold
            max_tries=3,            # Max attempts for extraction
            entity_name_weight=0.6, # Weight for entity name in matching
            entity_label_weight=0.4, # Weight for entity label in matching
            observation_date="2025-08-05"  # Optional: temporal context
        )
        
        logger.info(f"Built KG: {len(knowledge_graph.entities)} entities, {len(knowledge_graph.relationships)} relationships")
        
    except ValueError as e:
        logger.warning(f"Extraction failed: {e}. Using fallback data...")
        
        # Fallback: simple test sections
        fallback_sections = [
            "Apple Inc. is a technology company based in Cupertino, California.",
            "Tim Cook is the CEO of Apple Inc.",
            "Craig Federighi is the Senior Vice President of Software Engineering at Apple Inc.",
            "Emotient is an AI startup specializing in facial expression recognition.",
            "Apple Inc. acquired Emotient in 2016.",
            "Dr. Ken Denman founded Emotient in 2012.",
            "Emotient raised $8 million in funding."
        ]
        
        knowledge_graph = await itext2kg_star.build_graph(
            sections=fallback_sections,
            ent_threshold=0.7,
            rel_threshold=0.7,
            max_tries=1
        )
        
        logger.info(f"Fallback KG: {len(knowledge_graph.entities)} entities, {len(knowledge_graph.relationships)} relationships")
    
    return knowledge_graph

# Build knowledge graph
kg = await build_knowledge_graph(semantic_blocks)

print("\n🕸️ Knowledge Graph Summary:")
print(f"  • Entities: {len(kg.entities)}")
print(f"  • Relationships: {len(kg.relationships)}")

print("\nEntities by Type:")
entity_types = {}
for entity in kg.entities:
    entity_types[entity.label] = entity_types.get(entity.label, 0) + 1
    
for label, count in sorted(entity_types.items()):
    print(f"  • {label}: {count}")
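
The `ent_threshold`, `entity_name_weight`, and `entity_label_weight` arguments control when two extracted entities are treated as the same node. The sketch below illustrates one way such a weighted match can work: blend a name vector and a label vector, then compare entities by cosine similarity against the threshold. It illustrates the idea under that assumption and is not iText2KG's actual implementation. The three-dimensional vectors are invented for readability, and `blend`, `cosine`, and `same_entity` are hypothetical names.

```python
import math
from typing import List, Sequence


def blend(name_vec: Sequence[float], label_vec: Sequence[float],
          name_weight: float = 0.6, label_weight: float = 0.4) -> List[float]:
    """Weighted element-wise sum of a name embedding and a label embedding."""
    return [name_weight * n + label_weight * l for n, l in zip(name_vec, label_vec)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def same_entity(e1: Sequence[float], e2: Sequence[float], threshold: float = 0.7) -> bool:
    """Treat two entities as duplicates when their similarity reaches the threshold."""
    return cosine(e1, e2) >= threshold


# Toy vectors: two company mentions with similar names, and one person.
apple_inc = blend([0.9, 0.1, 0.0], [0.0, 1.0, 0.0])
apple = blend([0.8, 0.2, 0.0], [0.0, 1.0, 0.0])
tim_cook = blend([0.0, 0.1, 0.9], [1.0, 0.0, 0.0])

print("apple_inc vs apple merged:", same_entity(apple_inc, apple))
print("apple_inc vs tim_cook merged:", same_entity(apple_inc, tim_cook))
```

Under this reading, raising the threshold merges fewer entities (more near-duplicates survive), while lowering it merges more aggressively.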

## 🗄️ Step 3: FalkorDB Integration

Store and visualize the knowledge graph in FalkorDB:

In [None]:
async def integrate_with_falkordb(knowledge_graph):
    """Store knowledge graph in FalkorDB with enhanced features."""
    
    logger.info("Connecting to FalkorDB...")
    
    # Initialize FalkorDB storage
    fdb = FalkorDBStorage(**FALKORDB_CONFIG)
    
    try:
        # Get initial statistics
        initial_stats = fdb.get_graph_stats()
        print(f"Initial graph stats: {initial_stats}")
        
        # Optional: Clear existing data
        clear_data = input("\nClear existing graph data? (y/N): ")
        if clear_data.lower() == 'y':
            fdb.clear_graph()
            print("Cleared existing graph data")
            initial_stats = {'nodes': 0, 'relationships': 0}
        
        # Store the knowledge graph in FalkorDB
        print("\nPushing knowledge graph to FalkorDB...")
        fdb.visualize_graph(knowledge_graph, parent_node_type="Article")
        
        # Get final statistics
        final_stats = fdb.get_graph_stats()
        
        # Calculate changes
        nodes_added = final_stats['nodes'] - initial_stats['nodes']
        rels_added = final_stats['relationships'] - initial_stats['relationships']
        
        print("\nGraph successfully stored in FalkorDB!")
        print(f"Added: {nodes_added} nodes, {rels_added} relationships")
        print(f"Total: {final_stats['nodes']} nodes, {final_stats['relationships']} relationships")
        
        # Run sample analytics query
        print("\n🔍 Running analytics query...")
        try:
            sample_query = "MATCH (n) RETURN labels(n) as entity_type, count(n) as count ORDER BY count DESC LIMIT 10"
            result = fdb.run_query(sample_query)
            
            print("Entity distribution:")
            if hasattr(result, 'result_set') and result.result_set:
                for row in result.result_set:
                    print(f"  • {row[0]}: {row[1]} entities")
        except Exception as e:
            print(f"Analytics query failed: {e}")
        
        return fdb
        
    except Exception as e:
        logger.error(f"Failed to integrate with FalkorDB: {e}")
        raise

# Integrate with FalkorDB
falkor_storage = await integrate_with_falkordb(kg)
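
The cells above call `await` at the top level, which works in Jupyter because the notebook already runs an event loop. In a plain `.py` script the same steps would need to be wrapped in a coroutine and started with `asyncio.run`. The sketch below shows that structure with placeholder coroutines standing in for `distill_document` and `build_knowledge_graph`, so it runs without API keys or a database; the stub names and return values are made up.

```python
import asyncio


async def distill_stub():
    """Placeholder for distill_document(); returns (doc_dict, semantic_blocks)."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return {"title": "demo"}, ["title: demo"]


async def build_stub(sections):
    """Placeholder for build_knowledge_graph(sections)."""
    await asyncio.sleep(0)
    return {"sections_used": len(sections)}


async def main():
    # Same order as the notebook: distill first, then build the graph.
    # The real pipeline would store the result in FalkorDB afterwards.
    _, blocks = await distill_stub()
    return await build_stub(blocks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```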

## 🚀 Advanced FalkorDB Features

Run a few Cypher queries against the stored graph: relationship-type frequencies, short paths between two entities, and a sample of node properties:

In [None]:
# Advanced Graph Analytics with FalkorDB
print("FalkorDB Graph Analysis:\n")

# Query 1: Find all relationship types and their frequencies
try:
    query1 = "MATCH ()-[r]->() RETURN type(r) as relationship_type, count(r) as count ORDER BY count DESC"
    result1 = falkor_storage.run_query(query1)
    
    print("Relationship types in your graph:")
    if hasattr(result1, 'result_set') and result1.result_set:
        for row in result1.result_set[:5]:  # Show top 5
            print(f"  • {row[0]}: {row[1]} instances")
except Exception as e:
    print(f"Relationship query failed: {e}")

print("\n" + "="*50)

# Query 2: Find connection paths between key entities
try:
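    # Compare in lower case so the match does not depend on how entity names were stored
    query2 = "MATCH path = (a)-[*1..2]-(b) WHERE toLower(a.name) CONTAINS 'apple' AND toLower(b.name) CONTAINS 'tim' RETURN length(path) as path_length, count(*) as paths"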
    result2 = falkor_storage.run_query(query2)
    
    print("Connection paths between Apple and Tim:")
    if hasattr(result2, 'result_set') and result2.result_set:
        for row in result2.result_set:
            print(f"  • Path length {row[0]}: {row[1]} paths")
    else:
        print("  • No paths of length 1-2 found")
except Exception as e:
    print(f"Path query failed: {e}")

print("\n" + "="*50)

# Query 3: Show sample nodes and their properties
try:
    query3 = "MATCH (n) RETURN n.name, labels(n), n.description LIMIT 5"
    result3 = falkor_storage.run_query(query3)
    
    print("Sample entities in the graph:")
    if hasattr(result3, 'result_set') and result3.result_set:
        for row in result3.result_set:
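            name = row[0] if row[0] else "Unknown"
            # labels(n) returns a list, so join it for display
            labels = ", ".join(row[1]) if isinstance(row[1], list) and row[1] else "No label"
            desc = str(row[2]) if len(row) > 2 and row[2] else "No description"
            # Add an ellipsis only when the description was actually truncated
            suffix = "..." if len(desc) > 50 else ""
            print(f"  • {name} ({labels}): {desc[:50]}{suffix}")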
except Exception as e:
    print(f"Entity query failed: {e}")

print("\n" + "="*50)
print("Advanced analysis complete!")
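
Each query above follows the same pattern: run Cypher, check for a `result_set`, then print rows. The sketch below factors that pattern into a hypothetical helper, `rows_to_lines`, which also joins list-valued cells such as the output of `labels(n)`. It is exercised with a stand-in result object so it runs without a database; the row values are invented.

```python
from types import SimpleNamespace


def rows_to_lines(result, limit=5):
    """Format up to `limit` rows of a query result as printable bullet lines."""
    rows = getattr(result, "result_set", None) or []
    lines = []
    for row in rows[:limit]:
        cells = [
            # List-valued cells (for example labels(n)) are joined with "/".
            "/".join(map(str, cell)) if isinstance(cell, list) else str(cell)
            for cell in row
        ]
        lines.append("  • " + ": ".join(cells))
    return lines


# Stand-in for the object returned by falkor_storage.run_query(...).
fake_result = SimpleNamespace(result_set=[[["Company"], 2], [["Person"], 3]])
for line in rows_to_lines(fake_result):
    print(line)
```

A result with no `result_set` attribute yields an empty list, which replaces the repeated `hasattr` checks in the cell above.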

## 🧹 Cleanup

Clean up resources:

In [None]:
# Close FalkorDB connection
try:
    falkor_storage.close()
    print("🔌 FalkorDB connection closed successfully")
except Exception as e:
    print(f"Error closing connection: {e}")

print("\nQuickstart completed successfully!")
print("\nNext Steps:")
print("  • Try with your own documents")
print("  • Experiment with different schemas")
print("  • Explore advanced Cypher queries")
print("  • Build dynamic knowledge graphs with temporal data")
print("  • Scale up with larger document collections")
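
If a cell fails midway, the explicit `close()` call above may never run. One way to guarantee cleanup is `contextlib.closing`, which calls `.close()` on exit even when an exception is raised. The sketch uses a dummy class in place of `FalkorDBStorage`; it assumes only that the storage object exposes `close()`, as the cell above does.

```python
from contextlib import closing


class DummyStorage:
    """Stand-in for a storage object that exposes close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


storage = DummyStorage()
try:
    with closing(storage):
        # Simulate a query failing while the connection is open.
        raise RuntimeError("simulated failure while querying")
except RuntimeError as exc:
    print(f"Handled: {exc}")

# close() ran even though the body of the with-block raised.
print(f"Connection closed: {storage.closed}")
```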

## 📚 More Examples

**Try different document types:**

```python
# Scientific articles
from itext2kg.models.schemas import Article

# CV/Resume processing  
from itext2kg.models.schemas import CV

# Custom schemas for your domain
class YourCustomSchema(BaseModel):
    # Define your fields here
    pass
```

**Dynamic Knowledge Graphs:**
```python
# Build evolving graphs with temporal data
kg = await itext2kg_star.build_graph(
    sections=sections,
    existing_knowledge_graph=previous_kg,  # Incremental updates
    observation_date="2025-08-05"
)
```

## 🔧 Troubleshooting

**Common Issues:**

1. **FalkorDB Connection Error**: Ensure FalkorDB is running on the specified host/port
2. **API Key Error**: Verify your OpenAI API key is set correctly
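3. **Import Error**: Re-run the `pip install` command from the Installation section in the same environment as the notebook kernel
4. **Extraction Failures**: If `build_graph` raises a `ValueError`, Step 2 retries with a small set of hand-written fallback sections; check the log warning for the original error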
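
The API-key and import checks from this list can be automated before running the notebook. The sketch below uses only the standard library. `preflight` is a hypothetical helper, and the module names assume the import names used earlier in this guide (`itext2kg`, `falkordb`, `langchain_openai`).

```python
import importlib.util
import os
from typing import List, Mapping, Optional, Sequence

REQUIRED_MODULES = ("itext2kg", "falkordb", "langchain_openai")


def preflight(env: Optional[Mapping[str, str]] = None,
              modules: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    for name in modules:
        # find_spec returns None for a top-level module that is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package '{name}' is not importable")
    return problems


for problem in preflight() or ["No problems found"]:
    print(problem)
```

This only confirms that the key is present and the packages are importable; it does not validate the key or check package versions.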

**Resources:**
- [iText2KG Documentation](https://github.com/auvalab/itext2kg)
- [FalkorDB Website](https://www.falkordb.com/)
- [FalkorDB Documentation and Cypher Guide](https://docs.falkordb.com/)
- [OpenAI API Documentation](https://platform.openai.com/docs)

**Support:**
- GitHub Issues: [itext2kg/issues](https://github.com/auvalab/itext2kg/issues)
- Community: [Discussions](https://github.com/auvalab/itext2kg/discussions)
- FalkorDB Community: [FalkorDB Discord](https://discord.gg/falkordb)