# COLLIE: Complete Workflow Demo

This notebook demonstrates the complete COLLIE workflow for cultural heritage information extraction and analysis:

1. **AI-powered Information Extraction** using PydanticAI
2. **Canonical JSON Serialization** for async/future processing
3. **Markdown Rendering** for human-readable reports
4. **NetworkX Graph Analysis** for social network analysis
5. **Interactive Visualization** with Plotly
6. **Cypher Export** for graph database integration

## Setup

First, let's import all necessary libraries and set up our environment.


In [1]:
import asyncio
import json
import sys
from pathlib import Path
from typing import List

# Add src to path for imports
sys.path.insert(0, str(Path.cwd() / "src"))

# COLLIE imports
from collie.models.base import CRMEntity
from collie.extraction import InformationExtractor
from collie.io.to_networkx import (
    to_networkx_graph,
    calculate_centrality_measures,
    find_communities,
    get_network_statistics
)
from collie.visualization import (
    create_interactive_plot,
    plot_community_network,
    plot_centrality_network
)
from collie.io.to_cypher import generate_cypher_script
from collie.io.to_markdown import render_table, to_markdown, MarkdownStyle

# Standard libraries
import networkx as nx
import plotly.graph_objects as go
import plotly.express as px
from plotly.offline import plot
import plotly.io as pio

print("✅ All imports successful!")
print(f"📁 Working directory: {Path.cwd()}")


✅ All imports successful!
📁 Working directory: /Users/davidspencer/Documents/GitHub/collie


## Sample Text: Albert Einstein Biography

We'll use a sample text about Albert Einstein to demonstrate the complete workflow.


In [2]:
# Sample text about Albert Einstein - Full Biography
sample_text = """
Albert Einstein was born on March 14, 1879, in Ulm, Württemberg, Germany. He grew up in a secular Jewish family. His father, Hermann Einstein, was a salesman and engineer who, with his brother, founded Elektrotechnische Fabrik J. Einstein & Cie, a Munich-based company that mass-produced electrical equipment. Einstein's mother, the former Pauline Koch, ran the family household. Einstein had one sister, Maja, born two years after him.

Einstein attended elementary school at the Luitpold Gymnasium in Munich. However, he felt alienated there and struggled with the institution's rigid pedagogical style. He also had what were considered speech challenges. However, he developed a passion for classical music and playing the violin, which would stay with him into his later years. Most significantly, Einstein's youth was marked by deep inquisitiveness and inquiry.

Toward the end of the 1880s, Max Talmud, a Polish medical student who sometimes dined with the Einstein family, became an informal tutor to young Einstein. Talmud had introduced his pupil to a children's science text that inspired Einstein to dream about the nature of light. Thus, during his teens, Einstein penned what would be seen as his first major paper, "The Investigation of the State of Aether in Magnetic Fields."

Hermann relocated the family to Milan, Italy, in the mid-1890s after his business lost out on a major contract. Einstein was left at a relative's boarding house in Munich to complete his schooling at the Luitpold.

Faced with military duty when he turned of age, Einstein allegedly withdrew from classes, using a doctor's note to excuse himself and claim nervous exhaustion. With their son rejoining them in Italy, his parents understood Einstein's perspective but were concerned about his future prospects as a school dropout and draft dodger.

Einstein was eventually able to gain admission into the Swiss Federal Institute of Technology in Zurich, specifically due to his superb mathematics and physics scores on the entrance exam. He was still required to complete his pre-university education first and thus attended a high school in Aarau, Switzerland, helmed by Jost Winteler. Einstein lived with the schoolmaster's family and fell in love with Winteler's daughter Marie. Einstein later renounced his German citizenship and became a Swiss citizen at the dawn of the new century.

After graduating from university, Einstein faced major challenges in terms of finding academic positions, having alienated some professors over not attending class more regularly in lieu of studying independently.

Einstein eventually found steady work in 1902 after receiving a referral for a clerk position in a Swiss patent office. While working at the patent office, Einstein had the time to further explore ideas that had taken hold during his university studies and thus cemented his theorems on what would be known as the principle of relativity.

In 1905—seen by many as a "miracle year" for the theorist—Einstein had four papers published in the Annalen der Physik, one of the best-known physics journals of the era. Two focused on the photoelectric effect and Brownian motion. The two others, which outlined E=MC2 and the special theory of relativity, were defining for Einstein's career and the course of the study of physics.

As a physicist, Einstein had many discoveries, but he is perhaps best known for his theory of relativity and the equation E=MC2, which foreshadowed the development of atomic power and the atomic bomb.

Einstein first proposed a special theory of relativity in 1905 in his paper "On the Electrodynamics of Moving Bodies," which took physics in an electrifying new direction. The theory explains that space and time are actually connected, and Einstein called this joint structure space-time.

By November 1915, Einstein completed the general theory of relativity, which accounted for gravity's relationship to space-time. Einstein considered this theory the culmination of his life research. He was convinced of the merits of general relativity because it allowed for a more accurate prediction of planetary orbits around the sun, which fell short in Isaac Newton's theory. It also offered a more expansive, nuanced explanation of how gravitational forces worked.

Einstein's assertions were affirmed via observations and measurements by British astronomers Sir Frank Dyson and Sir Arthur Eddington during the 1919 solar eclipse, and thus a global science icon was born. Today, the theories of relativity underpin the accuracy of GPS technology, among other phenomena.

Einstein's 1905 paper on the matter-energy relationship proposed the equation E=MC²: the energy of a body (E) is equal to the mass (M) of that body times the speed of light squared (C²). This equation suggested that tiny particles of matter could be converted into huge amounts of energy, a discovery that heralded atomic power.

Famed quantum theorist Max Planck backed up the assertions of Einstein, who thus became a star of the lecture circuit and academia, taking on various positions before becoming director of the Kaiser Wilhelm Institute for Physics (today is known as the Max Planck Institute for Physics) from 1917 to 1933.

In 1921, Einstein won the Nobel Prize in Physics for his explanation of the photoelectric effect, since his ideas on relativity were still considered questionable. He wasn't actually given the award until the following year due to a bureaucratic ruling, and during his acceptance speech, he still opted to speak about relativity.

Einstein married Mileva Maric on January 6, 1903. While attending school in Zurich, Einstein met Maric, a Serbian physics student. Einstein continued to grow closer to Maric, but his parents were strongly against the relationship due to her ethnic background.

Nonetheless, Einstein continued to see her, with the two developing a correspondence via letters in which he expressed many of his scientific ideas. Einstein's father passed away in 1902, and the couple married shortly thereafter.

Einstein and Maric had three children. Their daughter, Lieserl, was born in 1902 before their wedding and might have been later raised by Maric's relatives or given up for adoption. Her ultimate fate and whereabouts remain a mystery. The couple also had two sons: Hans Albert Einstein, who became a well-known hydraulic engineer, and Eduard "Tete" Einstein, who was diagnosed with schizophrenia as a young man.

The Einsteins' marriage would not be a happy one, with the two divorcing in 1919 and Maric having an emotional breakdown in connection to the split. Einstein, as part of a settlement, agreed to give Maric any funds he might receive from possibly winning the Nobel Prize in the future.

During his marriage to Maric, Einstein had also begun an affair some time earlier with a cousin, Elsa Löwenthal. The couple wed in 1919, the same year of Einstein's divorce. He would continue to see other women throughout his second marriage, which ended with Löwenthal's death in 1936.

In 1933, Einstein took on a position at the Institute for Advanced Study in Princeton, New Jersey, where he would spend the rest of his life.

At the time the Nazis, led by Adolf Hitler, were gaining prominence with violent propaganda and vitriol in an impoverished post-World War I Germany. The Nazi Party influenced other scientists to label Einstein's work "Jewish physics." Jewish citizens were barred from university work and other official jobs, and Einstein himself was targeted to be killed. Meanwhile, other European scientists also left regions threatened by Germany and immigrated to the United States, with concern over Nazi strategies to create an atomic weapon.

Not long after moving and beginning his career at IAS, Einstein expressed an appreciation for American meritocracy and the opportunities people had for free thought, a stark contrast to his own experiences coming of age. In 1935, Einstein was granted permanent residency in his adopted country and became an American citizen five years later.

In America, Einstein mostly devoted himself to working on a unified field theory, an all-embracing paradigm meant to unify the varied laws of physics. However, during World War II, he worked on Navy-based weapons systems and made big monetary donations to the military by auctioning off manuscripts worth millions.

In 1939, Einstein and fellow physicist Leo Szilard wrote to President Franklin D. Roosevelt to alert him of the possibility of a Nazi bomb and to galvanize the United States to create its own nuclear weapons.

The United States would eventually initiate the Manhattan Project, though Einstein wouldn't take a direct part in its implementation due to his pacifist and socialist affiliations. Einstein was also the recipient of much scrutiny and major distrust from FBI director J. Edgar Hoover. In July 1940, the U.S. Army Intelligence office denied Einstein a security clearance to participate in the project, meaning J. Robert Oppenheimer and the scientists working in Los Alamos were forbidden from consulting with him.

Einstein had no knowledge of the U.S. plan to use atomic bombs in Japan in 1945. When he heard of the first bombing at Hiroshima, he reportedly said, "Ach! The world is not ready for it."

Einstein became a major player in efforts to curtail usage of the A-bomb. The following year, he and Szilard founded the Emergency Committee of Atomic Scientists, and in 1947, via an essay for The Atlantic Monthly, Einstein espoused working with the United Nations to maintain nuclear weapons as a deterrent to conflict.

After World War II, Einstein continued to work on his unified field theory and key aspects of his general theory of relativity, including time travel, wormholes, black holes, and the origins of the universe.

However, he felt isolated in his endeavors since the majority of his colleagues had begun focusing their attention on quantum theory. In the last decade of his life, Einstein, who had always seen himself as a loner, withdrew even further from any sort of spotlight, preferring to stay close to Princeton and immerse himself in processing ideas with colleagues.

In the late 1940s, Einstein became a member of the National Association for the Advancement of Colored People (NAACP), seeing the parallels between the treatment of Jews in Germany and Black people in the United States. He corresponded with scholar and activist W.E.B. Du Bois as well as performer Paul Robeson and campaigned for civil rights, calling racism a "disease" in a 1946 Lincoln University speech.

Einstein was very particular about his sleep schedule, claiming he needed 10 hours of sleep per day to function well. His theory of relativity allegedly came to him in a dream about cows being electrocuted. He was also known to take regular naps. He is said to have held objects like a spoon or pencil in his hand while falling asleep. That way, he could wake up before hitting the second stage of sleep—a hypnagogic process believed to boost creativity and capture sleep-inspired ideas.

Although sleep was important to Einstein, socks were not. He was famous for refusing to wear them. According to a letter he wrote to future wife Elsa, he stopped wearing them because he was annoyed by his big toe pushing through the material and creating a hole.

One of the most recognizable photos of the 20th century shows Einstein sticking out his tongue while leaving his 72nd birthday party on March 14, 1951.

According to Discovery.com, Einstein was leaving his party at Princeton when a swarm of reporters and photographers approached and asked him to smile. Tired from doing so all night, he refused and rebelliously stuck his tongue out at the crowd for a moment before turning away. UPI photographer Arthur Sasse captured the shot.

Einstein was amused by the picture and ordered several prints to give to his friends. He also signed a copy of the photo that sold for $125,000 at a 2017 auction.

Einstein died on April 18, 1955, at age 76 at the University Medical Center at Princeton. The previous day, while working on a speech to honor Israel's seventh anniversary, Einstein suffered an abdominal aortic aneurysm.

He was taken to the hospital for treatment but refused surgery, believing that he had lived his life and was content to accept his fate. "I want to go when I want," he stated at the time. "It is tasteless to prolong life artificially. I have done my share, it is time to go. I will do it elegantly."

According to the BBC, Einstein muttered a few words in German at the moment of his death. However, the nurse on duty didn't speak German so their translation was lost forever.

In a 2014 interview, Life magazine photographer Ralph Morse said the hospital was swarmed by journalists, photographers, and onlookers once word of Einstein's death spread. Morse decided to travel to Einstein's office at the Institute for Advanced Studies, offering the superintendent alcohol to gain access. He was able to photograph the office just as Einstein left it.

After an autopsy, Einstein's corpse was moved to a Princeton funeral home later that afternoon and then taken to Trenton, New Jersey, for a cremation ceremony. Morse said he was the only photographer present for the cremation, but Life managing editor Ed Thompson decided not to publish an exclusive story at the request of Einstein's son Hans.

During Einstein's autopsy, pathologist Thomas Stoltz Harvey had removed his brain, reportedly without his family's consent, for preservation and future study by doctors of neuroscience.

However, during his life, Einstein participated in brain studies, and at least one biography claimed he hoped researchers would study his brain after he died. Einstein's brain is now located at the Princeton University Medical Center. In keeping with his wishes, the rest of his body was cremated and the ashes scattered in a secret location.

In 1999, Canadian scientists who were studying Einstein's brain found that his inferior parietal lobe, the area that processes spatial relationships, 3D-visualization, and mathematical thought, was 15 percent wider than in people who possess normal intelligence. According to The New York Times, the researchers believe it might help explain why Einstein was so intelligent.

In 2011, the Mütter Museum in Philadelphia received thin slices of Einstein's brain from Dr. Lucy Rorke-Adams, a neuropathologist at the Children's Hospital of Philadelphia, and put them on display. Rorke-Adams said she received the brain slides from Harvey.
"""

print("📝 Sample Text:")
print("=" * 50)
print(sample_text[:500] + "..." if len(sample_text) > 500 else sample_text)
print("=" * 50)
print(f"Text length: {len(sample_text)} characters")
print("Preview: First 500 characters shown above")


📝 Sample Text:

Albert Einstein was born on March 14, 1879, in Ulm, Württemberg, Germany. He grew up in a secular Jewish family. His father, Hermann Einstein, was a salesman and engineer who, with his brother, founded Elektrotechnische Fabrik J. Einstein & Cie, a Munich-based company that mass-produced electrical equipment. Einstein's mother, the former Pauline Koch, ran the family household. Einstein had one sister, Maja, born two years after him.

Einstein attended elementary school at the Luitpold Gymnasium...
Text length: 14569 characters
Preview: First 500 characters shown above


## Step 1: AI-Powered Information Extraction

Use PydanticAI to extract CIDOC CRM entities and relationships from the text.


In [3]:
# Initialize the information extractor
extractor = InformationExtractor()

# Extract entities and relationships
print("🔍 Extracting entities using PydanticAI...")
extraction_result = await extractor.extract_from_text(sample_text)

print(f"✅ Extracted {len(extraction_result.entities)} entities")
print(f"✅ Extracted {len(extraction_result.relationships)} relationships")

# Display extracted entities
print("\n📋 Extracted Entities:")
for i, entity in enumerate(extraction_result.entities, 1):
    print(f"{i}. {entity.label} ({entity.class_code}) - Confidence: {entity.confidence:.2f}")

# Display extracted relationships
if extraction_result.relationships:
    print("\n🔗 Extracted Relationships:")
    for i, rel in enumerate(extraction_result.relationships, 1):
        print(f"{i}. {rel.property_label} - Confidence: {rel.confidence:.2f}")


UserError: Unknown keyword arguments: `result_type`
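
The `UserError` above is what recent PydanticAI releases raise when an agent is constructed with the older `result_type` keyword, which was renamed to `output_type`. Assuming `InformationExtractor` builds its agent with `result_type=...` (an assumption, since the extractor source is not shown in this notebook), updating that keyword or pinning an older PydanticAI release should let the cell run. The helper below is a hypothetical sketch, not part of COLLIE: it takes the agent constructor as a parameter, tries `output_type` first, and falls back to `result_type`. The stand-in factories and the model name exist only so the example runs without PydanticAI installed.

```python
from typing import Any, Callable


def build_agent(agent_factory: Callable[..., Any], model: str, schema: type) -> Any:
    """Create an agent with `output_type`, falling back to `result_type`.

    In practice `agent_factory` would be `pydantic_ai.Agent`; it is passed in
    so the helper works with whichever release is installed.
    """
    try:
        return agent_factory(model, output_type=schema)
    except Exception as exc:
        # Only fall back when the failure is about the keyword itself.
        if "output_type" not in str(exc):
            raise
        return agent_factory(model, result_type=schema)


# Stand-in factories (hypothetical) that mimic the two keyword conventions.
def newer_style(model, *, output_type):
    return {"model": model, "schema": output_type.__name__}


def older_style(model, *, result_type):
    return {"model": model, "schema": result_type.__name__}


print(build_agent(newer_style, "test-model", dict))
print(build_agent(older_style, "test-model", dict))
```

Because the remaining cells depend on `extraction_result`, they stay unexecuted (`In [None]`) until this cell succeeds.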

## Step 2: Convert to CRM Entities

Transform the extracted data into proper CIDOC CRM entities.


In [None]:
# Convert extracted entities to CRM entities
crm_entities = []
for entity in extraction_result.entities:
    crm_entity = CRMEntity(
        id=str(entity.id),
        class_code=entity.class_code,
        label=entity.label,
        notes=entity.description,
        type=[entity.class_code]
    )
    crm_entities.append(crm_entity)

print(f"✅ Created {len(crm_entities)} CRM entities")

# Display CRM entities
print("\n🏗️ CRM Entities:")
for i, entity in enumerate(crm_entities, 1):
    print(f"{i}. {entity.label} ({entity.class_code})")
    print(f"   ID: {entity.id}")
    print(f"   Notes: {entity.notes[:100]}..." if entity.notes and len(entity.notes) > 100 else f"   Notes: {entity.notes}")
    print()


## Step 3: Serialize as Canonical JSON

**This step decouples extraction from everything downstream.** Serialize the CRM entities as canonical JSON so that graph databases, APIs, and later (including asynchronous) pipeline stages can load them without re-running the extraction.


In [None]:
# Serialize as canonical JSON using Pydantic models
json_data = [entity.model_dump(mode='json') for entity in crm_entities]

print(f"💾 Serialized {len(json_data)} entities to canonical JSON")

# Create output directory
output_dir = Path("notebook_output")
output_dir.mkdir(exist_ok=True)

# Save canonical JSON
json_file = output_dir / "canonical_entities.json"
with open(json_file, "w") as f:
    json.dump(json_data, f, indent=2)

print(f"✅ Saved canonical JSON to: {json_file}")

# Display the structure of canonical JSON
print("\n📋 Canonical JSON Structure (first entity):")
print(json.dumps(json_data[0], indent=2))

# Demonstrate loading back from JSON
print("\n🔄 Loading back from canonical JSON...")
with open(json_file, "r") as f:
    loaded_data = json.load(f)

loaded_entities = []
for entity_data in loaded_data:
    entity = CRMEntity(**entity_data)
    loaded_entities.append(entity)

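print(f"✅ Successfully loaded {len(loaded_entities)} entities from JSON")
# Compare the full round-tripped content, not just the entity count
round_trip_ok = [e.model_dump(mode='json') for e in loaded_entities] == json_data
print(f"✅ Data integrity verified: {round_trip_ok}")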
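
`json.dump(..., indent=2)` keeps whatever key order the model emits. If "canonical" is meant to imply byte-for-byte stable output (useful for hashing, caching, or diffing between runs), sorted keys and fixed separators are one common way to get it. The following is a minimal sketch under that assumption; `canonical_json` and `fingerprint` are illustrative names, not COLLIE functions, and the records are toy data.

```python
import hashlib
import json


def canonical_json(records):
    """Serialize with sorted keys and compact separators for stable output."""
    return json.dumps(records, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def fingerprint(records):
    """SHA-256 of the canonical form, usable as a change-detection key."""
    return hashlib.sha256(canonical_json(records).encode("utf-8")).hexdigest()


# Same content, different key order
a = [{"id": "1", "label": "Albert Einstein", "class_code": "E21"}]
b = [{"class_code": "E21", "label": "Albert Einstein", "id": "1"}]

print(canonical_json(a))
print(fingerprint(a) == fingerprint(b))
```

With this form, two runs that produce the same entities yield identical files regardless of field order.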


## Step 4: Render to Markdown

Generate human-readable reports in Markdown format.


In [None]:
# Create markdown directory
markdown_dir = output_dir / "markdown"
markdown_dir.mkdir(exist_ok=True)

# Generate individual entity cards
print("📄 Generating Markdown reports...")
for i, entity in enumerate(crm_entities[:5]):  # Show first 5 entities
    markdown_card = to_markdown(entity, MarkdownStyle.CARD)
    card_file = markdown_dir / f"entity_{i+1}_{entity.class_code}.md"
    with open(card_file, "w", encoding="utf-8") as f:  # labels may contain non-ASCII text
        f.write(markdown_card)
    print(f"✅ Generated entity card: {card_file}")

# Generate summary table
table_markdown = render_table(crm_entities)
table_file = markdown_dir / "entities_summary.md"
with open(table_file, "w", encoding="utf-8") as f:  # labels may contain non-ASCII text
    f.write("# CRM Entities Summary\n\n" + table_markdown)

print(f"✅ Generated summary table: {table_file}")

# Display the summary table
print("\n📊 Entities Summary Table:")
print(table_markdown)
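
The output format of `render_table` is not shown here, so as a rough illustration of what a summary table involves, the sketch below renders a pipe table from plain dictionaries. `render_markdown_table` is a hypothetical stand-in, not the COLLIE implementation, and the rows are toy data.

```python
def render_markdown_table(rows, columns):
    """Render a list of dicts as a Markdown pipe table."""

    def cell(value):
        # Escape pipes and flatten newlines so a value cannot break the row
        return str(value).replace("|", "\\|").replace("\n", " ")

    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = [
        "| " + " | ".join(cell(row.get(col, "")) for col in columns) + " |"
        for row in rows
    ]
    return "\n".join([header, divider, *body])


rows = [
    {"label": "Albert Einstein", "class_code": "E21"},
    {"label": "Ulm", "class_code": "E53"},
]
print(render_markdown_table(rows, ["label", "class_code"]))
```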


## Step 5: Convert to NetworkX Graph

Transform the CRM entities into a NetworkX graph for social network analysis.


In [None]:
# Convert to NetworkX graph
graph = to_networkx_graph(crm_entities)

print(f"🕸️ Created NetworkX graph with {graph.number_of_nodes()} nodes and {graph.number_of_edges()} edges")

# Display graph information
print("\n📊 Graph Information:")
print(f"Nodes: {list(graph.nodes())[:5]}..." if len(graph.nodes()) > 5 else f"Nodes: {list(graph.nodes())}")
print(f"Edges: {list(graph.edges())[:5]}..." if len(graph.edges()) > 5 else f"Edges: {list(graph.edges())}")

# Display node data
print("\n🏷️ Node Data (first 3 nodes):")
for i, (node_id, data) in enumerate(list(graph.nodes(data=True))[:3]):
    print(f"Node {i+1}: {data}")


## Step 6: Network Analysis

Perform comprehensive network analysis including centrality measures, community detection, and network statistics.


In [None]:
# Calculate centrality measures
print("📊 Calculating centrality measures...")
centrality_measures = calculate_centrality_measures(graph)

print(f"✅ Calculated centrality measures: {list(centrality_measures.keys())}")

# Display degree centrality (most connected nodes)
if 'degree_centrality' in centrality_measures:
    degree_centrality = centrality_measures['degree_centrality']
    print("\n🔗 Degree Centrality (most connected nodes):")
    sorted_nodes = sorted(degree_centrality.items(), key=lambda x: x[1], reverse=True)
    for node_id, centrality in sorted_nodes[:5]:
        node_data = graph.nodes[node_id]
        print(f"  {node_data.get('label', node_id)}: {centrality:.3f}")

# Find communities
print("\n🏘️ Finding communities...")
communities = find_communities(graph)
print(f"✅ Found {len(communities)} communities")

if communities:
    print("\nCommunity Structure:")
    for i, community in enumerate(communities[:3]):  # Show first 3 communities
        community_labels = [graph.nodes[node_id].get('label', node_id) for node_id in community]
        print(f"  Community {i+1}: {community_labels}")

# Get network statistics
print("\n📈 Calculating network statistics...")
network_stats = get_network_statistics(graph)

print("\n📊 Network Statistics:")
print(f"  Density: {network_stats['basic_metrics']['density']:.3f}")
print(f"  Average clustering: {network_stats['connectivity']['average_clustering']:.3f}")
print(f"  Is connected: {network_stats['basic_metrics']['is_connected']}")

if 'degree_stats' in network_stats:
    degree_stats = network_stats['degree_stats']
    print(f"  Average degree: {degree_stats['avg_degree']:.2f}")
    print(f"  Max degree: {degree_stats['max_degree']}")
    print(f"  Min degree: {degree_stats['min_degree']}")
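
To make the reported numbers easier to interpret, here is a small standard-library sketch of two of them under their usual definitions for an undirected graph: degree centrality is a node's degree divided by n - 1, and density is 2m / (n(n - 1)). How `calculate_centrality_measures` and `get_network_statistics` compute them internally is not shown in this notebook, and the edge list below is toy data, not extractor output.

```python
from collections import defaultdict


def degree_centrality(nodes, edges):
    """Fraction of the other nodes that each node is directly connected to."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops in this sketch
            neighbors[u].add(v)
            neighbors[v].add(u)
    denominator = max(len(nodes) - 1, 1)
    return {node: len(neighbors[node]) / denominator for node in nodes}


def density(nodes, edges):
    """Share of possible undirected edges that are present: 2m / (n(n-1))."""
    n = len(nodes)
    unique_edges = {frozenset((u, v)) for u, v in edges if u != v}
    return 0.0 if n < 2 else 2 * len(unique_edges) / (n * (n - 1))


nodes = ["Albert Einstein", "Mileva Maric", "Elsa Löwenthal", "Hermann Einstein"]
edges = [
    ("Albert Einstein", "Mileva Maric"),
    ("Albert Einstein", "Elsa Löwenthal"),
    ("Albert Einstein", "Hermann Einstein"),
]

centrality = degree_centrality(nodes, edges)
print(centrality["Albert Einstein"])  # connected to all three other nodes
print(density(nodes, edges))
```

In a star-shaped network like this one, the hub has centrality 1.0 while overall density stays moderate, which is the pattern to expect when one person dominates a biography.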


## Step 7: Interactive Network Visualization

Create an interactive network visualization using Plotly.


In [None]:
# Create plots directory
plots_dir = output_dir / "plots"
plots_dir.mkdir(exist_ok=True)

# Create interactive network plot
print("🎨 Creating interactive network visualization...")
interactive_fig = create_interactive_plot(
    graph,
    title="Einstein's Life Network",
    node_size_multiplier=20,
    edge_width_multiplier=2
)

# Save interactive plot
interactive_file = plots_dir / "network_overview.html"
interactive_fig.write_html(str(interactive_file))
print(f"✅ Generated interactive plot: {interactive_file}")

# Display in notebook - using alternative method
try:
    interactive_fig.show()
except ValueError as e:
    print(f"⚠️  Display issue: {e}")
    print("📁 Plot saved as HTML file - you can open it in your browser")
    print(f"🔗 File location: {interactive_file}")
    print("💡 Tip: Open the HTML file in your browser to view the interactive plot")


## Step 8: Enhanced Interactive Features

Create additional interactive visualizations with community structure and centrality analysis.


In [None]:
# Create enhanced interactive visualizations
print("🎨 Creating enhanced interactive visualizations...")

# 1. Community Structure Visualization
if communities:
    print("📊 Creating community structure visualization...")
    community_fig = create_interactive_plot(
        graph,
        title="Community Structure - Einstein's Network",
        node_size_multiplier=25,
        edge_width_multiplier=2
    )
    
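    # Add community colors. Assumes data[1] is the node trace and that its points
    # follow graph.nodes() order; adjust if create_interactive_plot differs.
    palette = px.colors.qualitative.Set3
    node_colors = {
        node_id: palette[i % len(palette)]  # cycle if there are more communities than colors
        for i, community in enumerate(communities)
        for node_id in community
    }
    community_fig.data[1].marker.color = [
        node_colors.get(node_id, "#95A5A6") for node_id in graph.nodes()
    ]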
    
    community_file = plots_dir / "community_structure.html"
    community_fig.write_html(str(community_file))
    print(f"✅ Generated community plot: {community_file}")
    
    # Display community visualization
    try:
        community_fig.show()
    except ValueError as e:
        print(f"⚠️  Display issue: {e}")
        print(f"📁 Community plot saved: {community_file}")

# 2. Centrality Analysis Visualization
print("📊 Creating centrality analysis visualization...")
centrality_fig = create_interactive_plot(
    graph,
    title="Centrality Analysis - Einstein's Network",
    node_size_multiplier=30,
    edge_width_multiplier=1.5
)

centrality_file = plots_dir / "centrality_analysis.html"
centrality_fig.write_html(str(centrality_file))
print(f"✅ Generated centrality plot: {centrality_file}")

# Display centrality visualization
try:
    centrality_fig.show()
except ValueError as e:
    print(f"⚠️  Display issue: {e}")
    print(f"📁 Centrality plot saved: {centrality_file}")

# 3. Main Interactive Network
print("📊 Creating main interactive network...")
main_fig = create_interactive_plot(
    graph,
    title="Interactive Einstein Network - Complete View",
    node_size_multiplier=20,
    edge_width_multiplier=2
)

main_file = plots_dir / "interactive_network.html"
main_fig.write_html(str(main_file))
print(f"✅ Generated main interactive plot: {main_file}")

# Display main visualization
try:
    main_fig.show()
except ValueError as e:
    print(f"⚠️  Display issue: {e}")
    print(f"📁 Main plot saved: {main_file}")


## Step 9: Advanced Interactive Features

Create advanced interactive visualizations with filtering, enhanced hover information, and dynamic layouts.


In [None]:
# Create advanced interactive visualizations with enhanced features
print("🎨 Creating advanced interactive visualizations...")

# Enhanced interactive plot with better hover information
def create_enhanced_interactive_plot(graph, title, node_size_multiplier=20, edge_width_multiplier=2):
    """Create an enhanced interactive plot with better hover information and styling."""
    
    # Get layout positions
    pos = nx.spring_layout(graph, k=3, iterations=50)
    
    # Prepare edge traces with enhanced styling
    edge_x = []
    edge_y = []
    edge_hovertext = []
    
    for edge in graph.edges(data=True):
        x0, y0 = pos[edge[0]]
        x1, y1 = pos[edge[1]]
        edge_x.extend([x0, x1, None])
        edge_y.extend([y0, y1, None])
        
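        # Per-edge hover text is collected here but not attached to edge_trace
        # below (which sets hoverinfo="none"), so edges show no hover label
        edge_data = edge[2]
        property_code = edge_data.get('property_code', 'Unknown')
        edge_hovertext.append(f"Relationship: {property_code}<br>From: {edge[0]}<br>To: {edge[1]}")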
    
    edge_trace = go.Scatter(
        x=edge_x, y=edge_y,
        line=dict(width=edge_width_multiplier, color="#888"),
        hoverinfo="none",
        mode="lines",
        name="Relationships"
    )
    
    # Color by CIDOC CRM class code (defined once, outside the node loop)
    color_map = {
        "E21": "#FF6B6B",      # Person - Red
        "E5": "#4ECDC4",       # Event - Teal
        "E53": "#45B7D1",      # Place - Blue
        "E22": "#96CEB4",      # Object - Green
        "E52": "#FFEAA7",      # Time - Yellow
        "E12": "#DDA0DD",      # Production - Purple
        "E39": "#F39C12",      # Actor - Orange
    }

    # Prepare node traces with enhanced information
    node_x = []
    node_y = []
    node_text = []
    node_hovertext = []
    node_colors = []
    node_sizes = []

    for node in graph.nodes():
        x, y = pos[node]
        node_x.append(x)
        node_y.append(y)

        node_data = graph.nodes[node]
        label = node_data.get('label', node)
        class_code = node_data.get('class_code', 'Unknown')
        degree = graph.degree(node)

        node_text.append(label)

        # Enhanced hover information, one field per line
        hover_info = "<br>".join([
            f"<b>{label}</b>",
            f"ID: {node}",
            f"Class: {class_code}",
            f"Degree: {degree}",
            f"Type: {node_data.get('type', 'Unknown')}",
        ])
        node_hovertext.append(hover_info)

        # Unknown class codes fall back to gray
        node_colors.append(color_map.get(class_code, "#95A5A6"))

        # Size by degree, with a minimum marker size of 15
        node_sizes.append(max(15, degree * node_size_multiplier + 10))
    
    node_trace = go.Scatter(
        x=node_x, y=node_y,
        mode="markers+text",
        hoverinfo="text",
        hovertext=node_hovertext,
        text=node_text,
        textposition="middle center",
        textfont=dict(size=10, color="white"),
        marker=dict(
            size=node_sizes,
            color=node_colors,
            line=dict(width=2, color="white"),
            opacity=0.8
        ),
        name="Entities"
    )
    
    # Create figure with enhanced layout
    fig = go.Figure(
        data=[edge_trace, node_trace],
        layout=go.Layout(
            title=dict(
                text=title,
                font=dict(size=20, color="#2C3E50"),
                x=0.5
            ),
            showlegend=True,
            hovermode="closest",
            margin=dict(b=20, l=5, r=5, t=60),
            annotations=[
                dict(
                    text="Interactive CRM Network - Hover over nodes for details",
                    showarrow=False,
                    xref="paper", yref="paper",
                    x=0.005, y=-0.002,
                    xanchor="left", yanchor="bottom",
                    font=dict(color="#7F8C8D", size=12)
                )
            ],
            xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
            yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
            plot_bgcolor="white",
            paper_bgcolor="white",
            font=dict(family="Arial, sans-serif")
        )
    )
    
    return fig

# Create enhanced interactive plot
enhanced_fig = create_enhanced_interactive_plot(
    graph,
    title="Enhanced Interactive Einstein Network",
    node_size_multiplier=25,
    edge_width_multiplier=2
)

# Save enhanced plot
enhanced_file = plots_dir / "enhanced_interactive_network.html"
enhanced_fig.write_html(str(enhanced_file))
print(f"✅ Generated enhanced interactive plot: {enhanced_file}")

# Display enhanced visualization
try:
    enhanced_fig.show()
except ValueError as e:
    print(f"⚠️  Display issue: {e}")
    print(f"📁 Enhanced plot saved: {enhanced_file}")
    print("💡 Tip: Open the HTML file in your browser to view the interactive plot")

print("\n🎯 Enhanced Features:")
print("  - Better hover information with entity details")
print("  - Improved color scheme for entity types")
print("  - Dynamic node sizing based on degree")
print("  - Enhanced styling and layout")
print("  - Interactive zoom and pan capabilities")


## 📋 How to View Your Interactive Plots

If you encountered display issues in the notebook, here are several ways to view your interactive Plotly visualizations:

### Method 1: Open HTML Files in Browser
All interactive plots are saved as HTML files in the `plots/` directory. You can:
1. Navigate to the `notebook_output/plots/` folder
2. Double-click any `.html` file to open it in your browser
3. Enjoy the full interactive experience with zoom, pan, and hover features
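
The steps above can also be scripted. The following is a minimal sketch using only the standard library; the helper names `list_html_plots` and `open_plot` and the `notebook_output/plots` location are assumptions for illustration, not part of COLLIE.

```python
import webbrowser
from pathlib import Path


def list_html_plots(plots_dir: Path) -> list[Path]:
    """Return the saved .html plots in plots_dir, sorted by file name."""
    return sorted(plots_dir.glob("*.html"))


def open_plot(path: Path) -> bool:
    """Open one saved plot in the default browser via a file:// URI."""
    return webbrowser.open(path.resolve().as_uri())


# Assumed location; adjust to wherever the notebook wrote its plots.
saved_plots_dir = Path("notebook_output") / "plots"
for html_file in list_html_plots(saved_plots_dir):
    print(html_file.name)

# Uncomment to open a specific plot:
# open_plot(saved_plots_dir / "enhanced_interactive_network.html")
```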

### Method 2: Use Alternative Display Methods
```python
# Alternative 1: Display with specific renderer
import plotly.io as pio
pio.renderers.default = "browser"  # Opens in browser
fig.show()

# Alternative 2: Display as static image
fig.show("png")  # Renders a static PNG inline (requires the kaleido package); does not save a file

# Alternative 3: Display in iframe
from IPython.display import HTML
HTML(fig.to_html(include_plotlyjs='cdn'))
```

### Method 3: Fix Jupyter Environment
To fix the display issue permanently:
1. Restart your Jupyter server
2. Change kernel to "Collie (uv)" in your notebook
3. Re-run the cells

### Generated Files
- `network_overview.html` - Main interactive network
- `community_structure.html` - Community analysis
- `centrality_analysis.html` - Centrality analysis  
- `interactive_network.html` - Complete network view
- `enhanced_interactive_network.html` - Enhanced version with rich hover info


In [None]:
# Alternative display methods for Plotly visualizations
print("🔧 Alternative Display Methods")

# Method 1: Display with browser renderer
print("\n1. Browser Display:")
try:
    import plotly.io as pio
    pio.renderers.default = "browser"
    print("✅ Browser renderer set - plots will open in browser")
except Exception as e:
    print(f"❌ Browser renderer failed: {e}")

# Method 2: Display as static image
print("\n2. Static Image Display:")
try:
    # Create a simple test plot
    import plotly.graph_objects as go
    test_fig = go.Figure(data=go.Scatter(x=[1, 2, 3], y=[4, 5, 6]))
    
    # Save as PNG (static export requires the kaleido package)
    test_fig.write_image(str(plots_dir / "test_plot.png"))
    print("✅ Static image export works - test plot saved as PNG")
except Exception as e:
    print(f"❌ Static image display failed: {e}")

# Method 3: HTML display with IPython
print("\n3. HTML Display:")
try:
    from IPython.display import HTML
    html_content = enhanced_fig.to_html(include_plotlyjs='cdn')
    print("✅ HTML display ready - you can use HTML(html_content) to display")
except Exception as e:
    print(f"❌ HTML display failed: {e}")

print("\n💡 Recommendation: Use the HTML files in your browser for the best interactive experience!")


## Step 10: Export to Cypher

Generate Cypher scripts for importing the data into graph databases like Neo4j or Memgraph.
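
To give a feel for what such a script contains, here is a hedged sketch of turning one entity dictionary into a `MERGE` statement. It is not COLLIE's `generate_cypher_script`; the function name `entity_to_cypher` and the `CRMEntity` node label are assumptions made for illustration, and real output may differ.

```python
import json


def entity_to_cypher(entity: dict) -> str:
    """Build one MERGE statement for a single entity dict (illustrative only)."""
    # json.dumps yields a double-quoted, backslash-escaped literal, which works
    # for simple strings; query parameters are the safer choice in production.
    entity_id = json.dumps(entity["id"])
    label = json.dumps(entity["label"], ensure_ascii=False)
    class_code = json.dumps(entity["class_code"])
    return (
        f"MERGE (n:CRMEntity {{id: {entity_id}}}) "
        f"SET n.label = {label}, n.class_code = {class_code};"
    )


print(entity_to_cypher({"id": "custom1", "class_code": "E21", "label": "Marie Curie"}))
```

`MERGE` rather than `CREATE` keeps the import idempotent: running the same script twice matches the existing node by `id` instead of duplicating it.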


In [None]:
# Generate Cypher script
print("🔗 Generating Cypher script...")
cypher_script = generate_cypher_script(crm_entities)

# Save Cypher script
cypher_file = output_dir / "network.cypher"
with open(cypher_file, "w", encoding="utf-8") as f:  # explicit encoding: labels may contain non-ASCII text
    f.write(cypher_script)

print(f"✅ Generated Cypher script: {cypher_file}")

# Display first part of Cypher script
print("\n📋 Cypher Script Preview:")
cypher_lines = cypher_script.split('\n')
for line in cypher_lines[:20]:  # Show first 20 lines
    print(line)
if len(cypher_lines) > 20:
    print(f"... ({len(cypher_lines) - 20} more lines)")

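Once the script is on disk it can be sent to a running database. The sketch below assumes the `neo4j` Python driver is installed and that the script separates statements with `;`; the helper names, the Bolt URI, and the credentials are all hypothetical. Memgraph also speaks Bolt, so the same approach may work there.

```python
def split_statements(script: str) -> list[str]:
    """Split a Cypher script on ';' (naive: assumes no ';' inside string literals)."""
    return [part.strip() for part in script.split(";") if part.strip()]


def run_cypher_script(script: str, uri: str, user: str, password: str) -> int:
    """Run each statement against a Bolt endpoint and return how many were sent."""
    from neo4j import GraphDatabase  # imported lazily; requires the `neo4j` package

    statements = split_statements(script)
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            for statement in statements:
                # consume() waits for the statement to finish before continuing
                session.run(statement).consume()
    return len(statements)


# Example call (placeholder connection details):
# run_cypher_script(cypher_script, "bolt://localhost:7687", "neo4j", "<password>")
```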

## Step 11: Create Summary Report

Generate a comprehensive summary report of the entire workflow.


In [None]:
# Create summary report
print("📋 Creating summary report...")

summary_file = output_dir / "workflow_summary.md"
# Explicit UTF-8 so non-ASCII characters in the sample text are written reliably
with open(summary_file, "w", encoding="utf-8") as f:
    f.write("# COLLIE Workflow Summary\n\n")
    f.write(f"## Input Text\n\n{sample_text[:200]}...\n\n")
    f.write("## Extraction Results\n\n")
    f.write(f"- Entities extracted: {len(extraction_result.entities)}\n")
    f.write(f"- Relationships extracted: {len(extraction_result.relationships)}\n\n")
    f.write("## Network Analysis\n\n")
    f.write(f"- Nodes: {graph.number_of_nodes()}\n")
    f.write(f"- Edges: {graph.number_of_edges()}\n")
    f.write(f"- Density: {network_stats['basic_metrics']['density']:.3f}\n")
    f.write(f"- Communities: {len(communities)}\n\n")
    f.write("## Output Files\n\n")
    f.write(f"- Canonical JSON: {json_file}\n")
    f.write(f"- Markdown reports: {markdown_dir}\n")
    f.write(f"- Network plots: {plots_dir}\n")
    f.write(f"- Cypher script: {cypher_file}\n")

print(f"✅ Created summary report: {summary_file}")

# Display summary
print("\n📊 Workflow Summary:")
print(f"  Input text length: {len(sample_text)} characters")
print(f"  Entities extracted: {len(extraction_result.entities)}")
print(f"  Relationships extracted: {len(extraction_result.relationships)}")
print(f"  Network nodes: {graph.number_of_nodes()}")
print(f"  Network edges: {graph.number_of_edges()}")
print(f"  Communities found: {len(communities)}")
print(f"  Output directory: {output_dir}")


## 🎉 Complete Workflow Demo Finished!

### What We Accomplished:

1. **✅ AI Extraction**: Used PydanticAI to extract CRM entities from text
2. **✅ Canonical JSON**: Serialized entities for async/future processing
3. **✅ Markdown Reports**: Generated human-readable documentation
4. **✅ Network Analysis**: Created NetworkX graph and performed analysis
5. **✅ Interactive Visualizations**: Created multiple Plotly-based interactive plots
6. **✅ Enhanced Features**: Added advanced hover information and styling
7. **✅ Cypher Export**: Generated scripts for graph database import
8. **✅ Summary Report**: Documented the complete workflow

### Key Improvements with Plotly:

- **Interactive Exploration**: Zoom, pan, and hover over nodes for detailed information
- **Enhanced Styling**: Better color schemes and visual design
- **Multiple Views**: Community structure, centrality analysis, and complete network views
- **Rich Hover Information**: Detailed entity information on hover
- **Export Capabilities**: Save interactive HTML files for sharing

### Key Files Generated:

- `canonical_entities.json` - **Most important!** Ready for graph databases
- `network_overview.html` - Main interactive network overview
- `interactive_network.html` - Interactive Plotly visualization
- `network.cypher` - Neo4j/Memgraph import script
- `workflow_summary.md` - Complete analysis report

### Next Steps:

1. **Import to Graph Database**: Use the Cypher script with Neo4j or Memgraph
2. **API Integration**: Use canonical JSON for web services
3. **Batch Processing**: Process multiple texts using the same workflow
4. **Advanced Analysis**: Explore more NetworkX algorithms and visualizations

The canonical JSON serialization ensures your extracted data is ready for any future processing needs! 🚀


## 🔧 Advanced Usage Examples

### Custom Entity Creation

You can also create entities manually and add them to the workflow:


In [None]:
# Create custom entities
custom_entities = [
    CRMEntity(id="custom1", class_code="E21", label="Marie Curie"),
    CRMEntity(id="custom2", class_code="E53", label="Paris"),
    CRMEntity(id="custom3", class_code="E5", label="Nobel Prize in Physics 1903")
]

# Add to existing entities
all_entities = crm_entities + custom_entities

print(f"✅ Created {len(custom_entities)} custom entities")
print(f"✅ Total entities: {len(all_entities)}")

# Create new graph with all entities
extended_graph = to_networkx_graph(all_entities)
print(f"✅ Extended graph: {extended_graph.number_of_nodes()} nodes, {extended_graph.number_of_edges()} edges")


### Batch Processing Multiple Texts

Process multiple texts and combine the results:


In [None]:
# Multiple texts for batch processing
texts = [
    "Isaac Newton discovered the laws of motion in England.",
    "Galileo Galilei was an Italian astronomer and physicist.",
    "Charles Darwin developed the theory of evolution by natural selection."
]

print("🔄 Processing multiple texts...")
all_extracted_entities = []

for i, text in enumerate(texts, 1):
    print(f"\nProcessing text {i}: {text[:50]}...")
    result = await extractor.extract_from_text(text)
    all_extracted_entities.extend(result.entities)
    print(f"  Extracted {len(result.entities)} entities")

print(f"\n✅ Total entities from all texts: {len(all_extracted_entities)}")

# Convert to CRM entities
batch_crm_entities = []
for entity in all_extracted_entities:
    crm_entity = CRMEntity(
        id=str(entity.id),
        class_code=entity.class_code,
        label=entity.label,
        notes=entity.description
    )
    batch_crm_entities.append(crm_entity)

# Serialize batch results
batch_json = [entity.model_dump(mode='json') for entity in batch_crm_entities]
batch_file = output_dir / "batch_canonical_entities.json"
with open(batch_file, "w") as f:
    json.dump(batch_json, f, indent=2)

print(f"✅ Saved batch canonical JSON: {batch_file}")

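The loop above awaits each text in turn. If the extractor can safely handle concurrent calls (an assumption that should be checked against its rate limits), the requests can overlap. The sketch below uses a hypothetical helper `extract_all` and a stand-in `fake_extract` in place of `extractor.extract_from_text`.

```python
import asyncio


async def extract_all(extract, texts, max_concurrency=3):
    """Run `extract(text)` for every text with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(text):
        async with semaphore:
            return await extract(text)

    # gather preserves input order, so results[i] corresponds to texts[i]
    return await asyncio.gather(*(bounded(t) for t in texts))


async def fake_extract(text):
    """Stand-in for the real extractor; returns the word count of the text."""
    await asyncio.sleep(0)
    return len(text.split())
```

In a notebook cell this would be called as `results = await extract_all(extractor.extract_from_text, texts)`; the semaphore caps how many requests run at once.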

### Loading and Analyzing Existing Data

Load previously saved canonical JSON and analyze it:


In [None]:
# Load existing canonical JSON
print("📖 Loading existing canonical JSON...")
with open(json_file, "r") as f:
    existing_data = json.load(f)

# Convert back to CRM entities (model_validate is the Pydantic v2 way to build a model from a dict)
loaded_entities = [CRMEntity.model_validate(entity_data) for entity_data in existing_data]

print(f"✅ Loaded {len(loaded_entities)} entities from existing JSON")

# Create new analysis
loaded_graph = to_networkx_graph(loaded_entities)
loaded_centrality = calculate_centrality_measures(loaded_graph)
loaded_communities = find_communities(loaded_graph)

print(f"✅ Recreated graph: {loaded_graph.number_of_nodes()} nodes, {loaded_graph.number_of_edges()} edges")
print(f"✅ Found {len(loaded_communities)} communities")

# This demonstrates the power of canonical JSON - you can always recreate the analysis!
