# COLLIE: Complete Workflow Demo

This notebook demonstrates the complete COLLIE workflow for cultural heritage information extraction and analysis:

1. **AI-powered Information Extraction** using PydanticAI
2. **Canonical JSON Serialization** for async/future processing
3. **Markdown Rendering** for human-readable reports
4. **NetworkX Graph Analysis** for social network analysis
5. **Interactive Visualization** with Plotly
6. **Cypher Export** for graph database integration

## Setup

First, let's import all necessary libraries and set up our environment.


In [33]:
import asyncio
import json
import sys
from pathlib import Path
from typing import List

# Add src to path for imports
sys.path.insert(0, str(Path.cwd() / "src"))

# COLLIE imports
from collie.models.base import CRMEntity
from collie.extraction import InformationExtractor
from collie.io.to_networkx import (
    to_networkx_graph,
    calculate_centrality_measures,
    find_communities,
    get_network_statistics
)
from collie.visualization import (
    create_interactive_plot,
    plot_community_network,
    plot_centrality_network
)
from collie.io.to_cypher import generate_cypher_script
from collie.io.to_markdown import render_table, to_markdown, MarkdownStyle

# Standard libraries
import networkx as nx
import plotly.graph_objects as go
import plotly.express as px
from plotly.offline import plot
import plotly.io as pio

print("✅ All imports successful!")
print(f"📁 Working directory: {Path.cwd()}")


✅ All imports successful!
📁 Working directory: /Users/davidspencer/Documents/GitHub/collie


## Sample Text: Albert Einstein Biography

We'll use a sample text about Albert Einstein to demonstrate the complete workflow.


In [34]:
# Sample text about Albert Einstein
sample_text = """
Albert Einstein was born on March 14, 1879, in Ulm, Germany. 
He developed the theory of relativity and won the Nobel Prize in Physics in 1921.
Einstein worked at the Institute for Advanced Study in Princeton, New Jersey.
He is best known for his mass-energy equivalence formula E = mc².
Einstein's work revolutionized our understanding of space, time, and gravity.
He died on April 18, 1955, in Princeton, New Jersey.
"""

print("📝 Sample Text:")
print("=" * 50)
print(sample_text)
print("=" * 50)
print(f"Text length: {len(sample_text)} characters")


📝 Sample Text:

Albert Einstein was born on March 14, 1879, in Ulm, Germany. 
He developed the theory of relativity and won the Nobel Prize in Physics in 1921.
Einstein worked at the Institute for Advanced Study in Princeton, New Jersey.
He is best known for his mass-energy equivalence formula E = mc².
Einstein's work revolutionized our understanding of space, time, and gravity.
He died on April 18, 1955, in Princeton, New Jersey.

Text length: 420 characters


## Step 1: AI-Powered Information Extraction

Use PydanticAI to extract CRM entities and relationships from the text.


In [35]:
# Initialize the information extractor
extractor = InformationExtractor()

# Extract entities and relationships
print("🔍 Extracting entities using PydanticAI...")
extraction_result = await extractor.extract_from_text(sample_text)

print(f"✅ Extracted {len(extraction_result.entities)} entities")
print(f"✅ Extracted {len(extraction_result.relationships)} relationships")

# Display extracted entities
print("\n📋 Extracted Entities:")
for i, entity in enumerate(extraction_result.entities, 1):
    print(f"{i}. {entity.label} ({entity.class_code}) - Confidence: {entity.confidence:.2f}")

# Display extracted relationships
if extraction_result.relationships:
    print("\n🔗 Extracted Relationships:")
    for i, rel in enumerate(extraction_result.relationships, 1):
        print(f"{i}. {rel.property_label} - Confidence: {rel.confidence:.2f}")


🔍 Extracting entities using PydanticAI...
✅ Extracted 18 entities
✅ Extracted 53 relationships

📋 Extracted Entities:
1. Albert Einstein (E21) - Confidence: 0.70
2. Nobel Prize (E21) - Confidence: 0.70
3. Advanced Study (E21) - Confidence: 0.70
4. New Jersey (E21) - Confidence: 0.70
5. New Jersey (E21) - Confidence: 0.70
6. Event involving born (E5) - Confidence: 0.60
7. Event involving died (E5) - Confidence: 0.60
8. Event involving developed (E5) - Confidence: 0.60
9. Event involving won (E5) - Confidence: 0.60
10. Germany (E53) - Confidence: 0.80
11. Princeton (E53) - Confidence: 0.80
12. Ulm (E53) - Confidence: 0.80
13. theory (E22) - Confidence: 0.70
14. Nobel Prize (E22) - Confidence: 0.70
15. relativity (E22) - Confidence: 0.70
16. Year 1879 (E52) - Confidence: 0.90
17. Year 1921 (E52) - Confidence: 0.90
18. Year 1955 (E52) - Confidence: 0.90

🔗 Extracted Relationships:
1. has association with - Confidence: 0.50
2. has association with - Confidence: 0.50
3. has association with 

## Step 2: Convert to CRM Entities

Transform the extracted data into proper CIDOC CRM entities.


In [36]:
# Convert extracted entities to CRM entities
crm_entities = []
for entity in extraction_result.entities:
    crm_entity = CRMEntity(
        id=str(entity.id),
        class_code=entity.class_code,
        label=entity.label,
        notes=entity.description,
        type=[entity.class_code]
    )
    crm_entities.append(crm_entity)

print(f"✅ Created {len(crm_entities)} CRM entities")

# Display CRM entities
print("\n🏗️ CRM Entities:")
for i, entity in enumerate(crm_entities, 1):
    print(f"{i}. {entity.label} ({entity.class_code})")
    print(f"   ID: {entity.id}")
    print(f"   Notes: {entity.notes[:100]}..." if entity.notes and len(entity.notes) > 100 else f"   Notes: {entity.notes}")
    print()


✅ Created 18 CRM entities

🏗️ CRM Entities:
1. Albert Einstein (E21)
   ID: cef31f5c-cf9e-401f-a040-34c6cc3d20a1
   Notes: Person mentioned in the text: Albert Einstein

2. Nobel Prize (E21)
   ID: 1a913e42-fc92-4c49-bd56-bea0e93beb7c
   Notes: Person mentioned in the text: Nobel Prize

3. Advanced Study (E21)
   ID: 58577e46-1223-43a5-b141-379f7479bd4a
   Notes: Person mentioned in the text: Advanced Study

4. New Jersey (E21)
   ID: b9fc82c5-26f7-4faa-9672-5a2cfd78e733
   Notes: Person mentioned in the text: New Jersey

5. New Jersey (E21)
   ID: 998af369-6763-451c-b1ef-50bcb8e8f953
   Notes: Person mentioned in the text: New Jersey

6. Event involving born (E5)
   ID: 87403ef9-2d11-4cdb-8690-24f6e736973e
   Notes: Event mentioned in context: Albert Einstein was born on March 14, 1879, in Ulm, Germany. 
He develop...

7. Event involving died (E5)
   ID: a986ff56-1ff1-41fc-8a62-a6066df4ad47
   Notes: Event mentioned in context: our understanding of space, time, and gravity.
He died on

## Step 3: Serialize as Canonical JSON

**This is the crucial step for async/future processing!** Serialize the CRM entities as canonical JSON that can be easily loaded into graph databases, APIs, or other processing pipelines.


In [37]:
# Serialize as canonical JSON using Pydantic models
json_data = [entity.model_dump(mode='json') for entity in crm_entities]

print(f"💾 Serialized {len(json_data)} entities to canonical JSON")

# Create output directory
output_dir = Path("notebook_output")
output_dir.mkdir(exist_ok=True)

# Save canonical JSON
json_file = output_dir / "canonical_entities.json"
with open(json_file, "w") as f:
    json.dump(json_data, f, indent=2)

print(f"✅ Saved canonical JSON to: {json_file}")

# Display the structure of canonical JSON
print("\n📋 Canonical JSON Structure (first entity):")
print(json.dumps(json_data[0], indent=2))

# Demonstrate loading back from JSON
print("\n🔄 Loading back from canonical JSON...")
with open(json_file, "r") as f:
    loaded_data = json.load(f)

loaded_entities = []
for entity_data in loaded_data:
    entity = CRMEntity(**entity_data)
    loaded_entities.append(entity)

print(f"✅ Successfully loaded {len(loaded_entities)} entities from JSON")
print(f"✅ Data integrity verified: {len(loaded_entities) == len(crm_entities)}")


💾 Serialized 18 entities to canonical JSON
✅ Saved canonical JSON to: notebook_output/canonical_entities.json

📋 Canonical JSON Structure (first entity):
{
  "id": "cef31f5c-cf9e-401f-a040-34c6cc3d20a1",
  "class_code": "E21",
  "label": "Albert Einstein",
  "notes": "Person mentioned in the text: Albert Einstein",
  "type": [
    "E21"
  ]
}

🔄 Loading back from canonical JSON...
✅ Successfully loaded 18 entities from JSON
✅ Data integrity verified: True


## Step 4: Render to Markdown

Generate human-readable reports in Markdown format.


In [38]:
# Create markdown directory
markdown_dir = output_dir / "markdown"
markdown_dir.mkdir(exist_ok=True)

# Generate individual entity cards
print("📄 Generating Markdown reports...")
for i, entity in enumerate(crm_entities[:5]):  # Show first 5 entities
    markdown_card = to_markdown(entity, MarkdownStyle.CARD)
    card_file = markdown_dir / f"entity_{i+1}_{entity.class_code}.md"
    with open(card_file, "w") as f:
        f.write(markdown_card)
    print(f"✅ Generated entity card: {card_file}")

# Generate summary table
table_markdown = render_table(crm_entities)
table_file = markdown_dir / "entities_summary.md"
with open(table_file, "w") as f:
    f.write("# CRM Entities Summary\n\n" + table_markdown)

print(f"✅ Generated summary table: {table_file}")

# Display the summary table
print("\n📊 Entities Summary Table:")
print(table_markdown)


📄 Generating Markdown reports...
✅ Generated entity card: notebook_output/markdown/entity_1_E21.md
✅ Generated entity card: notebook_output/markdown/entity_2_E21.md
✅ Generated entity card: notebook_output/markdown/entity_3_E21.md
✅ Generated entity card: notebook_output/markdown/entity_4_E21.md
✅ Generated entity card: notebook_output/markdown/entity_5_E21.md
✅ Generated summary table: notebook_output/markdown/entities_summary.md

📊 Entities Summary Table:
| id | class_code | label | type |
| --- | --- | --- | --- |
| cef31f5c... | E21 | Albert Einstein | E21 |
| 1a913e42... | E21 | Nobel Prize | E21 |
| 58577e46... | E21 | Advanced Study | E21 |
| b9fc82c5... | E21 | New Jersey | E21 |
| 998af369... | E21 | New Jersey | E21 |
| 87403ef9... | E5 | Event involving born | E5 |
| a986ff56... | E5 | Event involving died | E5 |
| e31ef1dd... | E5 | Event involving developed | E5 |
| 7178366e... | E5 | Event involving won | E5 |
| 10f0889e... | E53 | Germany | E53 |
| f0bb0f9b... | E53 | Pr

## Step 5: Convert to NetworkX Graph

Transform the CRM entities into a NetworkX graph for social network analysis.


In [39]:
# Convert to NetworkX graph
graph = to_networkx_graph(crm_entities)

print(f"🕸️ Created NetworkX graph with {graph.number_of_nodes()} nodes and {graph.number_of_edges()} edges")

# Display graph information
print("\n📊 Graph Information:")
print(f"Nodes: {list(graph.nodes())[:5]}..." if len(graph.nodes()) > 5 else f"Nodes: {list(graph.nodes())}")
print(f"Edges: {list(graph.edges())[:5]}..." if len(graph.edges()) > 5 else f"Edges: {list(graph.edges())}")

# Display node data
print("\n🏷️ Node Data (first 3 nodes):")
for i, (node_id, data) in enumerate(list(graph.nodes(data=True))[:3]):
    print(f"Node {i+1}: {data}")


🕸️ Created NetworkX graph with 18 nodes and 0 edges

📊 Graph Information:
Nodes: ['cef31f5c-cf9e-401f-a040-34c6cc3d20a1', '1a913e42-fc92-4c49-bd56-bea0e93beb7c', '58577e46-1223-43a5-b141-379f7479bd4a', 'b9fc82c5-26f7-4faa-9672-5a2cfd78e733', '998af369-6763-451c-b1ef-50bcb8e8f953']...
Edges: []

🏷️ Node Data (first 3 nodes):
Node 1: {'class_code': 'E21', 'label': 'Albert Einstein', 'notes': 'Person mentioned in the text: Albert Einstein', 'type': ['E21']}
Node 2: {'class_code': 'E21', 'label': 'Nobel Prize', 'notes': 'Person mentioned in the text: Nobel Prize', 'type': ['E21']}
Node 3: {'class_code': 'E21', 'label': 'Advanced Study', 'notes': 'Person mentioned in the text: Advanced Study', 'type': ['E21']}


## Step 6: Network Analysis

Perform comprehensive network analysis including centrality measures, community detection, and network statistics.


In [40]:
# Calculate centrality measures
print("📊 Calculating centrality measures...")
centrality_measures = calculate_centrality_measures(graph)

print(f"✅ Calculated centrality measures: {list(centrality_measures.keys())}")

# Display degree centrality (most connected nodes)
if 'degree_centrality' in centrality_measures:
    degree_centrality = centrality_measures['degree_centrality']
    print("\n🔗 Degree Centrality (most connected nodes):")
    sorted_nodes = sorted(degree_centrality.items(), key=lambda x: x[1], reverse=True)
    for node_id, centrality in sorted_nodes[:5]:
        node_data = graph.nodes[node_id]
        print(f"  {node_data.get('label', node_id)}: {centrality:.3f}")

# Find communities
print("\n🏘️ Finding communities...")
communities = find_communities(graph)
print(f"✅ Found {len(communities)} communities")

if communities:
    print("\nCommunity Structure:")
    for i, community in enumerate(communities[:3]):  # Show first 3 communities
        community_labels = [graph.nodes[node_id].get('label', node_id) for node_id in community]
        print(f"  Community {i+1}: {community_labels}")

# Get network statistics
print("\n📈 Calculating network statistics...")
network_stats = get_network_statistics(graph)

print("\n📊 Network Statistics:")
print(f"  Density: {network_stats['basic_metrics']['density']:.3f}")
print(f"  Average clustering: {network_stats['connectivity']['average_clustering']:.3f}")
print(f"  Is connected: {network_stats['basic_metrics']['is_connected']}")

if 'degree_stats' in network_stats:
    degree_stats = network_stats['degree_stats']
    print(f"  Average degree: {degree_stats['avg_degree']:.2f}")
    print(f"  Max degree: {degree_stats['max_degree']}")
    print(f"  Min degree: {degree_stats['min_degree']}")


📊 Calculating centrality measures...
✅ Calculated centrality measures: ['degree', 'betweenness', 'closeness', 'eigenvector', 'pagerank']

🏘️ Finding communities...
✅ Found 0 communities

📈 Calculating network statistics...

📊 Network Statistics:
  Density: 0.000
  Average clustering: 0.000
  Is connected: None
  Average degree: 0.00
  Max degree: 0
  Min degree: 0


## Step 7: Interactive Network Visualization

Create an interactive network visualization using Plotly.


In [None]:
# Create plots directory
plots_dir = output_dir / "plots"
plots_dir.mkdir(exist_ok=True)

# Create interactive network plot
print("🎨 Creating interactive network visualization...")
interactive_fig = create_interactive_plot(
    graph,
    title="Einstein's Life Network",
    node_size_multiplier=20,
    edge_width_multiplier=2
)

# Save interactive plot
interactive_file = plots_dir / "network_overview.html"
interactive_fig.write_html(str(interactive_file))
print(f"✅ Generated interactive plot: {interactive_file}")

# Display in notebook - using alternative method
try:
    interactive_fig.show()
except ValueError as e:
    print(f"⚠️  Display issue: {e}")
    print("📁 Plot saved as HTML file - you can open it in your browser")
    print(f"🔗 File location: {interactive_file}")
    print("💡 Tip: Open the HTML file in your browser to view the interactive plot")


🎨 Creating interactive network visualization...
✅ Generated interactive plot: notebook_output/plots/network_overview.html


ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed

## Step 8: Enhanced Interactive Features

Create additional interactive visualizations with community structure and centrality analysis.


In [None]:
# Create enhanced interactive visualizations
print("🎨 Creating enhanced interactive visualizations...")

# 1. Community Structure Visualization
if communities:
    print("📊 Creating community structure visualization...")
    community_fig = create_interactive_plot(
        graph,
        title="Community Structure - Einstein's Network",
        node_size_multiplier=25,
        edge_width_multiplier=2
    )
    
    # Add community colors
    community_colors = px.colors.qualitative.Set3[:len(communities)]
    for i, community in enumerate(communities):
        for node_id in community:
            if node_id in community_fig.data[1].marker.color:
                # Update node color for community
                pass
    
    community_file = plots_dir / "community_structure.html"
    community_fig.write_html(str(community_file))
    print(f"✅ Generated community plot: {community_file}")
    
    # Display community visualization
    try:
        community_fig.show()
    except ValueError as e:
        print(f"⚠️  Display issue: {e}")
        print(f"📁 Community plot saved: {community_file}")

# 2. Centrality Analysis Visualization
print("📊 Creating centrality analysis visualization...")
centrality_fig = create_interactive_plot(
    graph,
    title="Centrality Analysis - Einstein's Network",
    node_size_multiplier=30,
    edge_width_multiplier=1.5
)

centrality_file = plots_dir / "centrality_analysis.html"
centrality_fig.write_html(str(centrality_file))
print(f"✅ Generated centrality plot: {centrality_file}")

# Display centrality visualization
try:
    centrality_fig.show()
except ValueError as e:
    print(f"⚠️  Display issue: {e}")
    print(f"📁 Centrality plot saved: {centrality_file}")

# 3. Main Interactive Network
print("📊 Creating main interactive network...")
main_fig = create_interactive_plot(
    graph,
    title="Interactive Einstein Network - Complete View",
    node_size_multiplier=20,
    edge_width_multiplier=2
)

main_file = plots_dir / "interactive_network.html"
main_fig.write_html(str(main_file))
print(f"✅ Generated main interactive plot: {main_file}")

# Display main visualization
try:
    main_fig.show()
except ValueError as e:
    print(f"⚠️  Display issue: {e}")
    print(f"📁 Main plot saved: {main_file}")


🎨 Creating interactive network visualization...
✅ Generated interactive plot: notebook_output/plots/interactive_network.html


ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed

## Step 9: Advanced Interactive Features

Create advanced interactive visualizations with filtering, enhanced hover information, and dynamic layouts.


In [None]:
# Create advanced interactive visualizations with enhanced features
print("🎨 Creating advanced interactive visualizations...")

# Enhanced interactive plot with better hover information
def create_enhanced_interactive_plot(graph, title, node_size_multiplier=20, edge_width_multiplier=2):
    """Create an enhanced interactive plot with better hover information and styling."""
    
    # Get layout positions
    pos = nx.spring_layout(graph, k=3, iterations=50)
    
    # Prepare edge traces with enhanced styling
    edge_x = []
    edge_y = []
    edge_hovertext = []
    
    for edge in graph.edges(data=True):
        x0, y0 = pos[edge[0]]
        x1, y1 = pos[edge[1]]
        edge_x.extend([x0, x1, None])
        edge_y.extend([y0, y1, None])
        
        # Enhanced edge hover information
        edge_data = edge[2]
        property_code = edge_data.get('property_code', 'Unknown')
        edge_hovertext.append(f"Relationship: {property_code}<br>From: {edge[0]}<br>To: {edge[1]}")
    
    edge_trace = go.Scatter(
        x=edge_x, y=edge_y,
        line=dict(width=edge_width_multiplier, color="#888"),
        hoverinfo="none",
        mode="lines",
        name="Relationships"
    )
    
    # Prepare node traces with enhanced information
    node_x = []
    node_y = []
    node_text = []
    node_hovertext = []
    node_colors = []
    node_sizes = []
    node_labels = []
    
    for node in graph.nodes():
        x, y = pos[node]
        node_x.append(x)
        node_y.append(y)
        
        node_data = graph.nodes[node]
        label = node_data.get('label', node)
        class_code = node_data.get('class_code', 'Unknown')
        degree = graph.degree(node)
        
        node_text.append(label)
        node_labels.append(label)
        
        # Enhanced hover information
        hover_info = f"""
        <b>{label}</b><br>
        ID: {node}<br>
        Class: {class_code}<br>
        Degree: {degree}<br>
        Type: {node_data.get('type', 'Unknown')}
        """
        node_hovertext.append(hover_info)
        
        # Color by class code with better color scheme
        color_map = {
            "E21": "#FF6B6B",      # Person - Red
            "E5": "#4ECDC4",       # Event - Teal
            "E53": "#45B7D1",      # Place - Blue
            "E22": "#96CEB4",      # Object - Green
            "E52": "#FFEAA7",      # Time - Yellow
            "E12": "#DDA0DD",      # Production - Purple
            "E39": "#F39C12",      # Actor - Orange
        }
        node_colors.append(color_map.get(class_code, "#95A5A6"))
        
        # Size by degree with better scaling
        node_sizes.append(max(15, degree * node_size_multiplier + 10))
    
    node_trace = go.Scatter(
        x=node_x, y=node_y,
        mode="markers+text",
        hoverinfo="text",
        hovertext=node_hovertext,
        text=node_text,
        textposition="middle center",
        textfont=dict(size=10, color="white"),
        marker=dict(
            size=node_sizes,
            color=node_colors,
            line=dict(width=2, color="white"),
            opacity=0.8
        ),
        name="Entities"
    )
    
    # Create figure with enhanced layout
    fig = go.Figure(
        data=[edge_trace, node_trace],
        layout=go.Layout(
            title=dict(
                text=title,
                font=dict(size=20, color="#2C3E50"),
                x=0.5
            ),
            showlegend=True,
            hovermode="closest",
            margin=dict(b=20, l=5, r=5, t=60),
            annotations=[
                dict(
                    text="Interactive CRM Network - Hover over nodes for details",
                    showarrow=False,
                    xref="paper", yref="paper",
                    x=0.005, y=-0.002,
                    xanchor="left", yanchor="bottom",
                    font=dict(color="#7F8C8D", size=12)
                )
            ],
            xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
            yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
            plot_bgcolor="white",
            paper_bgcolor="white",
            font=dict(family="Arial, sans-serif")
        )
    )
    
    return fig

# Create enhanced interactive plot
enhanced_fig = create_enhanced_interactive_plot(
    graph,
    title="Enhanced Interactive Einstein Network",
    node_size_multiplier=25,
    edge_width_multiplier=2
)

# Save enhanced plot
enhanced_file = plots_dir / "enhanced_interactive_network.html"
enhanced_fig.write_html(str(enhanced_file))
print(f"✅ Generated enhanced interactive plot: {enhanced_file}")

# Display enhanced visualization
try:
    enhanced_fig.show()
except ValueError as e:
    print(f"⚠️  Display issue: {e}")
    print(f"📁 Enhanced plot saved: {enhanced_file}")
    print("💡 Tip: Open the HTML file in your browser to view the interactive plot")

print("\n🎯 Enhanced Features:")
print("  - Better hover information with entity details")
print("  - Improved color scheme for entity types")
print("  - Dynamic node sizing based on degree")
print("  - Enhanced styling and layout")
print("  - Interactive zoom and pan capabilities")


## 📋 How to View Your Interactive Plots

If you encountered display issues in the notebook, here are several ways to view your interactive Plotly visualizations:

### Method 1: Open HTML Files in Browser
All interactive plots are saved as HTML files in the `plots/` directory. You can:
1. Navigate to the `notebook_output/plots/` folder
2. Double-click any `.html` file to open it in your browser
3. Enjoy the full interactive experience with zoom, pan, and hover features

### Method 2: Use Alternative Display Methods
```python
# Alternative 1: Display with specific renderer
import plotly.io as pio
pio.renderers.default = "browser"  # Opens in browser
fig.show()

# Alternative 2: Display as static image
fig.show("png")  # Saves as PNG

# Alternative 3: Display in iframe
from IPython.display import HTML
HTML(fig.to_html(include_plotlyjs='cdn'))
```

### Method 3: Fix Jupyter Environment
To fix the display issue permanently:
1. Restart your Jupyter server
2. Change kernel to "Collie (uv)" in your notebook
3. Re-run the cells

### Generated Files
- `network_overview.html` - Main interactive network
- `community_structure.html` - Community analysis
- `centrality_analysis.html` - Centrality analysis  
- `interactive_network.html` - Complete network view
- `enhanced_interactive_network.html` - Enhanced version with rich hover info


In [None]:
# Alternative display methods for Plotly visualizations
print("🔧 Alternative Display Methods")

# Method 1: Display with browser renderer
print("\n1. Browser Display:")
try:
    import plotly.io as pio
    pio.renderers.default = "browser"
    print("✅ Browser renderer set - plots will open in browser")
except Exception as e:
    print(f"❌ Browser renderer failed: {e}")

# Method 2: Display as static image
print("\n2. Static Image Display:")
try:
    # Create a simple test plot
    import plotly.graph_objects as go
    test_fig = go.Figure(data=go.Scatter(x=[1, 2, 3], y=[4, 5, 6]))
    
    # Save as PNG
    test_fig.write_image(plots_dir / "test_plot.png")
    print("✅ Static image display works - test plot saved as PNG")
except Exception as e:
    print(f"❌ Static image display failed: {e}")

# Method 3: HTML display with IPython
print("\n3. HTML Display:")
try:
    from IPython.display import HTML
    html_content = enhanced_fig.to_html(include_plotlyjs='cdn')
    print("✅ HTML display ready - you can use HTML(html_content) to display")
except Exception as e:
    print(f"❌ HTML display failed: {e}")

print("\n💡 Recommendation: Use the HTML files in your browser for the best interactive experience!")


## Step 10: Export to Cypher

Generate Cypher scripts for importing the data into graph databases like Neo4j or Memgraph.


In [None]:
# Generate Cypher script
print("🔗 Generating Cypher script...")
cypher_script = generate_cypher_script(crm_entities)

# Save Cypher script
cypher_file = output_dir / "network.cypher"
with open(cypher_file, "w") as f:
    f.write(cypher_script)

print(f"✅ Generated Cypher script: {cypher_file}")

# Display first part of Cypher script
print("\n📋 Cypher Script Preview:")
cypher_lines = cypher_script.split('\n')
for line in cypher_lines[:20]:  # Show first 20 lines
    print(line)
if len(cypher_lines) > 20:
    print(f"... ({len(cypher_lines) - 20} more lines)")


## Step 11: Create Summary Report

Generate a comprehensive summary report of the entire workflow.


In [None]:
# Create summary report
print("📋 Creating summary report...")

summary_file = output_dir / "workflow_summary.md"
with open(summary_file, "w") as f:
    f.write("# COLLIE Workflow Summary\n\n")
    f.write(f"## Input Text\n\n{sample_text[:200]}...\n\n")
    f.write(f"## Extraction Results\n\n")
    f.write(f"- Entities extracted: {len(extraction_result.entities)}\n")
    f.write(f"- Relationships extracted: {len(extraction_result.relationships)}\n\n")
    f.write(f"## Network Analysis\n\n")
    f.write(f"- Nodes: {graph.number_of_nodes()}\n")
    f.write(f"- Edges: {graph.number_of_edges()}\n")
    f.write(f"- Density: {network_stats['basic_metrics']['density']:.3f}\n")
    f.write(f"- Communities: {len(communities)}\n\n")
    f.write(f"## Output Files\n\n")
    f.write(f"- Canonical JSON: {json_file}\n")
    f.write(f"- Markdown reports: {markdown_dir}\n")
    f.write(f"- Network plots: {plots_dir}\n")
    f.write(f"- Cypher script: {cypher_file}\n")

print(f"✅ Created summary report: {summary_file}")

# Display summary
print("\n📊 Workflow Summary:")
print(f"  Input text length: {len(sample_text)} characters")
print(f"  Entities extracted: {len(extraction_result.entities)}")
print(f"  Relationships extracted: {len(extraction_result.relationships)}")
print(f"  Network nodes: {graph.number_of_nodes()}")
print(f"  Network edges: {graph.number_of_edges()}")
print(f"  Communities found: {len(communities)}")
print(f"  Output directory: {output_dir}")


## 🎉 Complete Workflow Demo Finished!

### What We Accomplished:

1. **✅ AI Extraction**: Used PydanticAI to extract CRM entities from text
2. **✅ Canonical JSON**: Serialized entities for async/future processing
3. **✅ Markdown Reports**: Generated human-readable documentation
4. **✅ Network Analysis**: Created NetworkX graph and performed analysis
5. **✅ Interactive Visualizations**: Created multiple Plotly-based interactive plots
6. **✅ Enhanced Features**: Added advanced hover information and styling
7. **✅ Cypher Export**: Generated scripts for graph database import
8. **✅ Summary Report**: Documented the complete workflow

### Key Improvements with Plotly:

- **Interactive Exploration**: Zoom, pan, and hover over nodes for detailed information
- **Enhanced Styling**: Better color schemes and visual design
- **Multiple Views**: Community structure, centrality analysis, and complete network views
- **Rich Hover Information**: Detailed entity information on hover
- **Export Capabilities**: Save interactive HTML files for sharing

### Key Files Generated:

- `canonical_entities.json` - **Most important!** Ready for graph databases
- `network_overview.png` - Static visualization
- `interactive_network.html` - Interactive Plotly visualization
- `network.cypher` - Neo4j/Memgraph import script
- `workflow_summary.md` - Complete analysis report

### Next Steps:

1. **Import to Graph Database**: Use the Cypher script with Neo4j or Memgraph
2. **API Integration**: Use canonical JSON for web services
3. **Batch Processing**: Process multiple texts using the same workflow
4. **Advanced Analysis**: Explore more NetworkX algorithms and visualizations

The canonical JSON serialization ensures your extracted data is ready for any future processing needs! 🚀


## 🔧 Advanced Usage Examples

### Custom Entity Creation

You can also create entities manually and add them to the workflow:


In [None]:
# Create custom entities
custom_entities = [
    CRMEntity(id="custom1", class_code="E21", label="Marie Curie"),
    CRMEntity(id="custom2", class_code="E53", label="Paris"),
    CRMEntity(id="custom3", class_code="E5", label="Nobel Prize in Physics 1903")
]

# Add to existing entities
all_entities = crm_entities + custom_entities

print(f"✅ Created {len(custom_entities)} custom entities")
print(f"✅ Total entities: {len(all_entities)}")

# Create new graph with all entities
extended_graph = to_networkx_graph(all_entities)
print(f"✅ Extended graph: {extended_graph.number_of_nodes()} nodes, {extended_graph.number_of_edges()} edges")


### Batch Processing Multiple Texts

Process multiple texts and combine the results:


In [None]:
# Multiple texts for batch processing
texts = [
    "Isaac Newton discovered the laws of motion in England.",
    "Galileo Galilei was an Italian astronomer and physicist.",
    "Charles Darwin developed the theory of evolution by natural selection."
]

print("🔄 Processing multiple texts...")
all_extracted_entities = []

for i, text in enumerate(texts, 1):
    print(f"\nProcessing text {i}: {text[:50]}...")
    result = await extractor.extract_from_text(text)
    all_extracted_entities.extend(result.entities)
    print(f"  Extracted {len(result.entities)} entities")

print(f"\n✅ Total entities from all texts: {len(all_extracted_entities)}")

# Convert to CRM entities
batch_crm_entities = []
for entity in all_extracted_entities:
    crm_entity = CRMEntity(
        id=str(entity.id),
        class_code=entity.class_code,
        label=entity.label,
        notes=entity.description
    )
    batch_crm_entities.append(crm_entity)

# Serialize batch results
batch_json = [entity.model_dump(mode='json') for entity in batch_crm_entities]
batch_file = output_dir / "batch_canonical_entities.json"
with open(batch_file, "w") as f:
    json.dump(batch_json, f, indent=2)

print(f"✅ Saved batch canonical JSON: {batch_file}")


### Loading and Analyzing Existing Data

Load previously saved canonical JSON and analyze it:


In [None]:
# Load existing canonical JSON
print("📖 Loading existing canonical JSON...")
with open(json_file, "r") as f:
    existing_data = json.load(f)

# Convert back to CRM entities
loaded_entities = []
for entity_data in existing_data:
    entity = CRMEntity(**entity_data)
    loaded_entities.append(entity)

print(f"✅ Loaded {len(loaded_entities)} entities from existing JSON")

# Create new analysis
loaded_graph = to_networkx_graph(loaded_entities)
loaded_centrality = calculate_centrality_measures(loaded_graph)
loaded_communities = find_communities(loaded_graph)

print(f"✅ Recreated graph: {loaded_graph.number_of_nodes()} nodes, {loaded_graph.number_of_edges()} edges")
print(f"✅ Found {len(loaded_communities)} communities")

# This demonstrates the power of canonical JSON - you can always recreate the analysis!
