# Winston Churchill - breaking down a memorable speech

In [1]:
RAG_QUERY="What role did King Leopold play during the evacuation of Dunkirk?"

SRC_FILE_URL = "https://raw.githubusercontent.com/donbr/kg_rememberall/refs/heads/main/references/winston_churchill_we_shall_fight_speech_june_1940.txt"

## Setup and Configuration

In [2]:
%pip install -q ipywidgets lightrag-hku openai aioboto3 tiktoken nano_vectordb networkx pyvis
%pip install -q arize-phoenix-otel openinference-instrumentation-openai 'httpx<0.28'


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [3]:
import os
from pathlib import Path

# Define configuration constants
DATA_DIR = Path("../data")  # Base data directory
INTERIM_DIR = DATA_DIR / "interim"  # Interim data directory
PROCESSED_DIR = DATA_DIR / "processed"  # Processed data directory

In [4]:
import requests
import shutil

# Define URL and local file path
SRC_FILE_URL = "https://raw.githubusercontent.com/donbr/kg_rememberall/refs/heads/main/references/winston_churchill_we_shall_fight_speech_june_1940.txt"
file_name = SRC_FILE_URL.split("/")[-1].replace(".", "_").lower()

WORKING_DIR = INTERIM_DIR / file_name

# Replace operation: ensure WORKING_DIR is fresh
if os.path.exists(WORKING_DIR):
    shutil.rmtree(WORKING_DIR)  # Remove the existing directory and its contents
os.makedirs(WORKING_DIR)        # Create a new, empty directory (plus any missing parents)

In [5]:
# Fetch and save the file
response = requests.get(SRC_FILE_URL, timeout=30)  # Fail fast instead of hanging on a stalled connection
response.raise_for_status()  # Raise an exception for HTTP errors
local_file_path = WORKING_DIR / f"{file_name}.txt"
local_file_path.write_text(response.text)

# Define file paths
GRAPHML_FILE = WORKING_DIR / "graph_chunk_entity_relation.graphml"
PROCESSED_DIR.mkdir(parents=True, exist_ok=True)  # Make sure the output directory exists
PYVIS_HTML_FILE = PROCESSED_DIR / f"{file_name}.html"

### Arize Phoenix - telemetry

- UI endpoint: http://localhost:6006
- NOTE: the Docker container is started with `--rm`, so it is removed as soon as it stops (for example, when you shut down the notebook).

In [6]:
# for more information refer to https://docs.arize.com/phoenix/tracing/integrations-tracing/autogen-support#docker
# !docker run -p 6006:6006 -p 4317:4317 arizephoenix/phoenix:latest

import subprocess

# Run the Docker container without interactive mode
subprocess.Popen([
    "docker", "run", "-p", "6006:6006", "-p", "4317:4317",
    "--rm", "arizephoenix/phoenix:latest"
])

<Popen: returncode: None args: ['docker', 'run', '-p', '6006:6006', '-p', '4...>
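
The container is started in the background, so the next cell can run before Phoenix is listening. A minimal sketch of a readiness check follows, assuming that an open TCP port is a good enough signal. `wait_for_port` is a hypothetical helper, not part of Phoenix or LightRAG, and an open port does not guarantee that the server has finished its own startup.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet (or the connect timed out); retry shortly.
            time.sleep(interval)
    return False
```

In this notebook it could be called as `wait_for_port("localhost", 4317)` before registering the tracer, since 4317 is the gRPC port published by the `docker run` command above.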

In [7]:
from phoenix.otel import register

# defaults to endpoint="http://localhost:4317"
tracer_provider = register(
  project_name="lightrag-openai", # Default is 'default'
  endpoint="http://localhost:4317",  # Sends traces using gRPC
)

🔭 OpenTelemetry Tracing Details 🔭
|  Phoenix Project: lightrag-openai
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: localhost:4317
|  Transport: gRPC
|  Transport Headers: {'authorization': '****', 'user-agent': '****'}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.



In [8]:
from openinference.instrumentation.openai import OpenAIInstrumentor

OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

## Populate the Graph

- Initialize LightRAG and OpenAI connection
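
The OpenAI connection depends on credentials being available to the kernel. The sketch below checks for them up front, assuming the key is supplied through the `OPENAI_API_KEY` environment variable (the OpenAI Python client's default). `missing_env_vars` is a hypothetical helper introduced only for illustration.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Example check before initializing LightRAG:
# missing = missing_env_vars(["OPENAI_API_KEY"])
# if missing:
#     raise RuntimeError(f"Set these environment variables first: {missing}")
```

Failing early here gives a clearer message than an authentication error raised part-way through the insert step.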

In [9]:
import os
from lightrag import LightRAG, QueryParam
from lightrag.llm import gpt_4o_mini_complete
import nano_vectordb

# The next two lines are required in a Jupyter notebook: nest_asyncio lets the async code behind rag.insert() run inside the notebook's already-running event loop
import nest_asyncio
nest_asyncio.apply()

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=gpt_4o_mini_complete
)

🏃‍♀️‍➡️ Running migrations on the database.
---------------------------
2025-01-21 00:01:44,253 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-01-21 00:01:44,253 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("alembic_version")
2025-01-21 00:01:44,253 INFO sqlalchemy.engine.Engine [raw sql] ()
2025-01-21 00:01:44,254 INFO sqlalchemy.engine.Engine PRAGMA temp.table_info("alembic_version")
2025-01-21 00:01:44,254 INFO sqlalchemy.engine.Engine [raw sql] ()
2025-01-21 00:01:44,254 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("alembic_version")
2025-01-21 00:01:44,254 INFO sqlalchemy.engine.Engine [raw sql] ()
2025-01-21 00:01:44,254 INFO sqlalchemy.engine.Engine PRAGMA temp.table_info("alembic_version")
2025-01-21 00:01:44,254 INFO sqlalchemy.engine.Engine [raw sql] ()
2025-01-21 00:01:44,255 INFO sqlalchemy.engine.Engine 
CREATE TABLE alembic_version (
	version_num VARCHAR(32) NOT NULL, 
	CONSTRAINT alembic_version_pkc PRIMARY KEY (version_num)
)


2025-01-21 00:01:44,2

INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:6006 (Press CTRL+C to quit)




██████╗ ██╗  ██╗ ██████╗ ███████╗███╗   ██╗██╗██╗  ██╗
██╔══██╗██║  ██║██╔═══██╗██╔════╝████╗  ██║██║╚██╗██╔╝
██████╔╝███████║██║   ██║█████╗  ██╔██╗ ██║██║ ╚███╔╝
██╔═══╝ ██╔══██║██║   ██║██╔══╝  ██║╚██╗██║██║ ██╔██╗
██║     ██║  ██║╚██████╔╝███████╗██║ ╚████║██║██╔╝ ██╗
╚═╝     ╚═╝  ╚═╝ ╚═════╝ ╚══════╝╚═╝  ╚═══╝╚═╝╚═╝  ╚═╝ v7.7.1

|
|  🌎 Join our Community 🌎
|  https://join.slack.com/t/arize-ai/shared_invite/zt-1px8dcmlf-fmThhDFD_V_48oU7ALan4Q
|
|  ⭐️ Leave us a Star ⭐️
|  https://github.com/Arize-ai/phoenix
|
|  📚 Documentation 📚
|  https://docs.arize.com/phoenix
|
|  🚀 Phoenix Server 🚀
|  Phoenix UI: http://0.0.0.0:6006
|  Authentication: False
|  Websockets: True
|  Log traces:
|    - gRPC: http://0.0.0.0:4317
|    - HTTP: http://0.0.0.0:6006/v1/traces
|  Storage: sqlite:////root/.phoenix/phoenix.db



INFO:lightrag:Logger initialized for working directory: ../data/interim/winston_churchill_we_shall_fight_speech_june_1940_txt
INFO:lightrag:Load KV llm_response_cache with 0 data
INFO:lightrag:Load KV full_docs with 0 data
INFO:lightrag:Load KV text_chunks with 0 data
INFO:nano-vectordb:Init {'embedding_dim': 1536, 'metric': 'cosine', 'storage_file': '../data/interim/winston_churchill_we_shall_fight_speech_june_1940_txt/vdb_entities.json'} 0 data
INFO:nano-vectordb:Init {'embedding_dim': 1536, 'metric': 'cosine', 'storage_file': '../data/interim/winston_churchill_we_shall_fight_speech_june_1940_txt/vdb_relationships.json'} 0 data
INFO:nano-vectordb:Init {'embedding_dim': 1536, 'metric': 'cosine', 'storage_file': '../data/interim/winston_churchill_we_shall_fight_speech_june_1940_txt/vdb_chunks.json'} 0 data
INFO:lightrag:Loaded document status storage with 0 records


In [10]:
with open(local_file_path) as f:
    rag.insert(f.read())

INFO:lightrag:Processing 1 new unique documents
Processing batch 1:   0%|          | 0/1 [00:00<?, ?it/s]INFO:lightrag:Inserting 5 vectors to chunks
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Generating embeddings: 100%|██████████| 1/1 [00:01<00:00,  1.07s/batch]
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


⠙ Processed 1 chunks, 4 entities(duplicated), 5 relations(duplicated)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


⠹ Processed 2 chunks, 22 entities(duplicated), 16 relations(duplicated)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


⠸ Processed 3 chunks, 40 entities(duplicated), 28 relations(duplicated)



INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


⠼ Processed 4 chunks, 61 entities(duplicated), 43 relations(duplicated)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


⠴ Processed 5 chunks, 90 entities(duplicated), 68 relations(duplicated)

Extracting entities from chunks: 100%|██████████| 5/5 [00:41<00:00,  8.35s/chunk]
INFO:lightrag:Inserting entities into storage...
Inserting entities: 100%|██████████| 79/79 [00:00<00:00, 24367.56entity/s]
INFO:lightrag:Inserting relationships into storage...
Inserting relationships: 100%|██████████| 63/63 [00:00<00:00, 16139.82relationship/s]
INFO:lightrag:Inserting 79 vectors to entities
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Generating embeddings: 100%|██████████| 3/3 [00:01<00:00,  1.51batch/s]
INFO:lightrag:Inserting 63 vectors to relationships
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Generating embeddings: 100%|██████████| 2/2 [00:01<00:00, 

### Query the Graph

In [11]:
# Perform hybrid search
print("\n## APPROACH 4\n")
print(rag.query(RAG_QUERY, param=QueryParam(mode="hybrid")))


## APPROACH 4



INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:lightrag:kw_prompt result:
INFO:lightrag:Using hybrid mode for query processing


{
  "high_level_keywords": ["King Leopold", "Evacuation of Dunkirk", "Historical role"],
  "low_level_keywords": ["World War II", "Dunkirk evacuation", "Belgium", "Military leadership", "Historical events"]
}


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:Local query uses 60 entites, 57 relations, 3 text units
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:Global query uses 72 entites, 60 relations, 4 text units
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


King Leopold of Belgium played a pivotal role during the evacuation of Dunkirk, significantly influencing the dynamics of the Allied forces' efforts during World War II. His decisions directly impacted the military strategies of both the Belgian and Allied armies.

At the outset of the German invasion, King Leopold had appealed to the Allies for assistance, leading to the deployment of British and French forces to Belgian territory. The initial intent was to coordinate a defense against the rapidly advancing German Army. However, this situation changed dramatically when King Leopold unilaterally decided to surrender the Belgian Army to the German Command without prior consultation with his advisers or the Allies. This decision took place shortly after the German forces penetrated deeper into Belgium, which left the British and French troops at a strategic disadvantage.

The surrender of the Belgian Army meant that the British forces were compelled to cover a flank of more than 30 miles
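
The query above uses only the hybrid mode. To see how the retrieval mode changes the answer, the same question can be run through several modes side by side. This is a rough sketch: `compare_modes` is a hypothetical helper, and the mode names `naive`, `local`, `global`, and `hybrid` are assumed to be accepted by the installed LightRAG version.

```python
from typing import Callable, Dict, Iterable


def compare_modes(
    ask: Callable[[str, str], str],
    question: str,
    modes: Iterable[str] = ("naive", "local", "global", "hybrid"),
) -> Dict[str, str]:
    """Run the same question through each retrieval mode and collect the answers."""
    return {mode: ask(question, mode) for mode in modes}


# In this notebook, `ask` could wrap the existing objects, for example:
# answers = compare_modes(
#     lambda q, mode: rag.query(q, param=QueryParam(mode=mode)),
#     RAG_QUERY,
# )
# for mode, answer in answers.items():
#     print(f"\n## {mode}\n{answer}")
```

Passing the query function in as an argument keeps the helper independent of LightRAG, so it can be exercised with a stub before spending API calls.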

### Display the Graph

- graph visualization

In [12]:
import networkx as nx
from pyvis.network import Network
import json

def create_interactive_visualization(graphml_file, output_file):
    # Load the GraphML file
    G = nx.read_graphml(graphml_file)
    
    # Create Pyvis network
    net = Network(height='900px', width='100%', bgcolor='#ffffff', 
                 font_color='black', notebook=False)
    
    # Define color scheme for entity types
    entity_colors = {
        'PERSON': '#e41a1c',        # Bright red
        'ORGANIZATION': '#377eb8',   # Blue
        'GEO': '#4daf4a',           # Green
        'EVENT': '#984ea3',         # Purple
        'CONCEPT': '#ff7f00',       # Orange
        'TECHNOLOGY': '#a65628',     # Brown
        'CATEGORY': '#f781bf',      # Pink
        'NUMBER': '#999999',        # Gray
        'UNKNOWN': '#808080'        # Dark Gray
    }
    
    # Add nodes with colors before adding edges
    for node_id, node_data in G.nodes(data=True):
        # Get entity type (removing quotes if present)
        entity_type = node_data.get('entity_type', 'UNKNOWN').replace('"', '')
        color = entity_colors.get(entity_type, '#808080')
        
        # Create hover text
        hover_info = f"""
        Entity: {node_id}
        Type: {entity_type}
        Description: {node_data.get('description', 'N/A')}
        Source ID: {node_data.get('source_id', 'N/A')}
        """
        
        # Add node with properties
        net.add_node(node_id, 
                    title=hover_info,
                    color=color,
                    size=30)

    # Add edges
    for source, target, edge_data in G.edges(data=True):
        weight = edge_data.get('weight', 1)
        description = edge_data.get('description', '')
        
        hover_info = f"""
        Weight: {weight}
        Description: {description}
        Keywords: {edge_data.get('keywords', 'N/A')}
        """
        
        net.add_edge(source, target,
                    title=hover_info,
                    width=float(weight),
                    color={'color': '#666666', 'highlight': '#ff0000'})

    # Configure the force-directed physics layout
    physics_options = {
        "physics": {
            "forceAtlas2Based": {
                "gravitationalConstant": -100,
                "centralGravity": 0.01,
                "springLength": 200,
                "springConstant": 0.08,
                "damping": 0.4,
                "avoidOverlap": 1
            },
            "solver": "forceAtlas2Based",
            "stabilization": {
                "enabled": True,
                "iterations": 1000,
                "updateInterval": 25
            }
        }
    }
    
    net.set_options(json.dumps(physics_options))
    
    # Save and add legend
    net.write_html(output_file)
    
    # Add legend HTML
    legend_html = """
    <div style="position: absolute; top: 10px; left: 10px; background-color: rgba(255, 255, 255, 0.9); 
                padding: 10px; border-radius: 5px; border: 1px solid #ccc;">
        <h3>Entity Types</h3>
        <ul style="list-style-type: none; padding: 0;">
    """
    
    for entity_type, color in entity_colors.items():
        legend_html += f"""
            <li style="margin: 5px 0;">
                <span style="display: inline-block; width: 20px; height: 20px; 
                           background-color: {color}; border-radius: 50%; margin-right: 5px;"></span>
                {entity_type}
            </li>
        """
    
    legend_html += """
        </ul>
    </div>
    """
    
    with open(output_file, 'r', encoding='utf-8') as file:
        content = file.read()
    content = content.replace('</body>', f'{legend_html}</body>')
    with open(output_file, 'w', encoding='utf-8') as file:
        file.write(content)

In [13]:
# Generate Visualization
create_interactive_visualization(GRAPHML_FILE, str(PYVIS_HTML_FILE))
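
Alongside the interactive view, a quick tally of entity types shows how the legend colors are distributed across the graph. The sketch below reads the GraphML file with the standard library only. `count_entity_types` is a hypothetical helper, and it assumes the file stores each node's type in an attribute named `entity_type`, possibly wrapped in quotes, as the visualization code above also assumes.

```python
import xml.etree.ElementTree as ET
from collections import Counter

GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def count_entity_types(graphml_path):
    """Count nodes per entity_type in a GraphML file, stripping stray quotes."""
    root = ET.parse(str(graphml_path)).getroot()
    # GraphML declares attribute names on <key> elements; find the ids used for entity_type.
    key_ids = {
        key.get("id")
        for key in root.findall("g:key", GRAPHML_NS)
        if key.get("for") == "node" and key.get("attr.name") == "entity_type"
    }
    counts = Counter()
    for node in root.iterfind(".//g:node", GRAPHML_NS):
        entity_type = "UNKNOWN"  # Same fallback label as the visualization code
        for data in node.findall("g:data", GRAPHML_NS):
            if data.get("key") in key_ids and data.text:
                entity_type = data.text.replace('"', "")
        counts[entity_type] += 1
    return counts
```

For example, `count_entity_types(GRAPHML_FILE).most_common()` would list the types from most to least frequent.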


## Review Telemetry data

- Access the Arize Phoenix UI at [http://localhost:6006](http://localhost:6006)
- Both LLM inference and embedding telemetry are captured.

# 🔍 LightRAG Validation with Arize Phoenix

### Overview

This notebook shows one approach to validating and monitoring LightRAG's interaction with LLMs and embedding models. Using [Arize Phoenix](https://docs.arize.com/phoenix/tracing/llm-traces-1), it provides insight into what is a very complex data ingestion pipeline.

It also makes the concepts covered in the LightRAG paper more tangible.

### Purpose
- **System Monitoring**: Validate LightRAG's integration with telemetry pipelines to ensure robust tracking of model inference and embedding use.
- **Performance Tuning**: Identify bottlenecks and optimize configurations using insights from telemetry data.
- **Proactive Debugging**: Quickly detect and resolve anomalies through real-time analysis.

### Key Features
- **Dockerized Deployment**: Simplifies setup with preconfigured Docker containers for Arize Phoenix.
- **Telemetry Integration**: Uses the OpenTelemetry standard to integrate with external systems and provide detailed system traces.
- **Customizable Dashboards**: Enables interactive exploration of model metrics and error logs.

### Usage Instructions
1. **Setup**: 
    - Install required dependencies:
      ```bash
      pip install arize-phoenix-otel openinference-instrumentation-openai
      ```
    - Run the Docker container for Arize Phoenix:
      ```bash
      docker run -p 6006:6006 -p 4317:4317 --rm arizephoenix/phoenix:latest
      ```

2. **Execute the Notebook**: Follow the provided steps in the notebook to validate your LightRAG setup against telemetry data.

3. **Explore Metrics**:
    - Access the Phoenix UI at [http://localhost:6006](http://localhost:6006).
    - Analyze detailed traces, latencies, and throughput metrics.