# 🔍 LightRAG Validation with Arize Phoenix

### Overview

This notebook shows one approach to validating and monitoring LightRAG's interaction with LLMs and embedding models. It uses [Arize Phoenix](https://docs.arize.com/phoenix/tracing/llm-traces-1) to provide insight into what is a very complex data ingestion pipeline.

It will also make the concepts covered in the LightRAG paper more tangible.

### Purpose
- **System Monitoring**: Validate LightRAG's integration with telemetry pipelines to ensure robust tracking of model inference and embedding use.
- **Performance Tuning**: Identify bottlenecks and optimize configurations using insights from telemetry data.
- **Proactive Debugging**: Quickly detect and resolve anomalies through real-time analysis.

### Key Features
- **Dockerized Deployment**: Simplifies setup with preconfigured Docker containers for Arize Phoenix.
- **Telemetry Integration**: Supports integration with external systems through the OpenTelemetry standard, which provides detailed system traces.
- **Customizable Dashboards**: Enables interactive exploration of model metrics and error logs.

### Usage Instructions
1. **Setup**: 
    - Install required dependencies:
      ```bash
      pip install arize-phoenix-otel
      ```
    - Run the Docker container for Arize Phoenix:
      ```bash
      docker run -p 6006:6006 -p 4317:4317 --rm arizephoenix/phoenix:latest
      ```

2. **Execute the Notebook**: Follow the provided steps in the notebook to validate your LightRAG setup against telemetry data.

3. **Explore Metrics**:
    - Access the Phoenix UI at [http://localhost:6006](http://localhost:6006).
    - Analyze detailed traces, latencies, and throughput metrics.

In [1]:
%pip install -q ipywidgets lightrag-hku openai aioboto3


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [2]:
import os
from pathlib import Path

# Define configuration constants
DATA_DIR = Path("../data")  # Base data directory
INTERIM_DIR = DATA_DIR / "interim"  # Interim data directory
PROCESSED_DIR = DATA_DIR / "processed"  # Processed data directory

In [3]:
import requests

# Define URL and local file path
src_file_url = "https://raw.githubusercontent.com/donbr/kg_rememberall/refs/heads/main/references/winston_churchill_we_shall_fight_speech_june_1940.txt"
file_name = src_file_url.split("/")[-1].replace(".", "_").lower()

WORKING_DIR = INTERIM_DIR / file_name

# Create the working directory, including any missing parent directories
WORKING_DIR.mkdir(parents=True, exist_ok=True)
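
The cell above derives the working directory name from the last segment of the source URL. The same rule can be wrapped in a small helper so it can be reused and checked in isolation. This is a minimal sketch: `working_dir_for` is a hypothetical name that does not appear in the notebook, and the use of `urlparse` to ignore query strings is an added assumption.

```python
from pathlib import Path
from urllib.parse import urlparse


def working_dir_for(url: str, base_dir: Path) -> Path:
    """Derive a per-document working directory from the last URL path segment."""
    # Take the file name from the URL path, ignoring any query string.
    file_name = Path(urlparse(url).path).name
    # Same normalization as the notebook cell: dots become underscores, lowercase.
    slug = file_name.replace(".", "_").lower()
    return base_dir / slug


url = (
    "https://raw.githubusercontent.com/donbr/kg_rememberall/refs/heads/main/"
    "references/winston_churchill_we_shall_fight_speech_june_1940.txt"
)
working_dir = working_dir_for(url, Path("../data") / "interim")
print(working_dir)
```

The directory name produced here should match the working directory that LightRAG reports in its initialization log.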

In [4]:
# Fetch and save the file
response = requests.get(src_file_url, timeout=30)  # Avoid hanging indefinitely
response.raise_for_status()  # Raise an exception for HTTP errors
local_file_path = WORKING_DIR / f"{file_name}.txt"
local_file_path.write_text(response.text, encoding="utf-8")

# Define file paths
GRAPHML_FILE = WORKING_DIR / "graph_chunk_entity_relation.graphml"
PYVIS_HTML_FILE = PROCESSED_DIR / f"{file_name}.html"
PYVIS_HTML_FILE2 = PROCESSED_DIR / f"{file_name}2.html"

# Define Neo4j connection parameters (placeholder values; adjust them for your environment)
os.environ["NEO4J_URI"] = "neo4j://172.18.176.1:7687"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "password"
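
Hardcoding connection details in a notebook is convenient for a demo, but it makes the notebook harder to share. One alternative is to treat the values as fallbacks and let anything already exported in the environment win. The sketch below makes assumptions: `neo4j_settings` is a hypothetical helper, and the `neo4j://localhost:7687` default is a placeholder rather than the address used in this notebook.

```python
import os
from typing import Mapping


def neo4j_settings(env: Mapping[str, str]) -> dict[str, str]:
    """Collect Neo4j connection settings, falling back to local defaults."""
    # Placeholder defaults for a local development instance (assumed values).
    defaults = {
        "NEO4J_URI": "neo4j://localhost:7687",
        "NEO4J_USERNAME": "neo4j",
        "NEO4J_PASSWORD": "password",
    }
    return {key: env.get(key, default) for key, default in defaults.items()}


# Only set variables that are not already defined, so values exported in the
# shell take precedence over the placeholder defaults.
for key, value in neo4j_settings(os.environ).items():
    os.environ.setdefault(key, value)
```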

## Arize Phoenix

- UI endpoint: http://localhost:6006
- NOTE: the Docker container will be removed when you shut down the notebook.

In [5]:
# for more information refer to https://docs.arize.com/phoenix/tracing/integrations-tracing/autogen-support#docker
# !docker run -p 6006:6006 -p 4317:4317 arizephoenix/phoenix:latest

import subprocess

# Run the Docker container without interactive mode
subprocess.Popen([
    "docker", "run", "-p", "6006:6006", "-p", "4317:4317",
    "--rm", "arizephoenix/phoenix:latest"
])

<Popen: returncode: None args: ['docker', 'run', '-p', '6006:6006', '-p', '4...>
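
Because `subprocess.Popen` returns immediately, later cells can run before Phoenix is listening. A simple guard is to poll the published ports until a TCP connection succeeds. This is a sketch using only the standard library; `wait_for_port` is a hypothetical helper, and an open port only shows that something is accepting connections, not that the application has finished starting.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a listener is bound to the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False
```

For this notebook, a call such as `wait_for_port("localhost", 6006)` followed by `wait_for_port("localhost", 4317)` could be placed before the tracing setup.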

## Arize Phoenix: Setup and Configuration

In [6]:
%pip install -q arize-phoenix-otel

🏃‍♀️‍➡️ Running migrations on the database.
---------------------------
2025-01-20 04:12:32,405 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-01-20 04:12:32,405 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("alembic_version")
2025-01-20 04:12:32,405 INFO sqlalchemy.engine.Engine [raw sql] ()
2025-01-20 04:12:32,406 INFO sqlalchemy.engine.Engine PRAGMA temp.table_info("alembic_version")
2025-01-20 04:12:32,406 INFO sqlalchemy.engine.Engine [raw sql] ()
2025-01-20 04:12:32,406 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("alembic_version")
2025-01-20 04:12:32,406 INFO sqlalchemy.engine.Engine [raw sql] ()
2025-01-20 04:12:32,406 INFO sqlalchemy.engine.Engine PRAGMA temp.table_info("alembic_version")
2025-01-20 04:12:32,406 INFO sqlalchemy.engine.Engine [raw sql] ()
2025-01-20 04:12:32,407 INFO sqlalchemy.engine.Engine 
CREATE TABLE alembic_version (
	version_num VARCHAR(32) NOT NULL, 
	CONSTRAINT alembic_version_pkc PRIMARY KEY (version_num)
)


2025-01-20 04:12:32,4

INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:6006 (Press CTRL+C to quit)



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [7]:
from phoenix.otel import register

# defaults to endpoint="http://localhost:4317"
tracer_provider = register(
    project_name="lightrag-openai",  # Default is 'default'
    endpoint="http://localhost:4317",  # Sends traces using gRPC
)

🔭 OpenTelemetry Tracing Details 🔭
|  Phoenix Project: lightrag-openai
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: localhost:4317
|  Transport: gRPC
|  Transport Headers: {'authorization': '****', 'user-agent': '****'}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.



In [8]:
# Install the OpenInference instrumentation for OpenAI, the openai library, and httpx pinned below 0.28
%pip install -q openinference-instrumentation-openai openai 'httpx<0.28'


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [9]:
from openinference.instrumentation.openai import OpenAIInstrumentor

OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

## Example Use Case

Monitor LightRAG's real-time LLM and embedding model usage:
- Performance and response latencies
- Model behavior and accuracy

### Populate the Graph

- Initialize LightRAG and OpenAI connection

In [10]:
# Install the ollama Python package, which LightRAG requires, along with tiktoken and nano_vectordb
%pip install -q ollama tiktoken nano_vectordb


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [12]:
import os
from lightrag import LightRAG, QueryParam
from lightrag.llm import gpt_4o_mini_complete
import nano_vectordb

# nest_asyncio is needed when running in a Jupyter notebook,
# to handle the async nature of rag.insert()
import nest_asyncio
nest_asyncio.apply()

# Override the default graph storage (NetworkX)
# by specifying graph_storage="Neo4JStorage".

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=gpt_4o_mini_complete,  # Use gpt_4o_mini_complete LLM model
    graph_storage="Neo4JStorage", #<-----------override KG default
    log_level="DEBUG"  #<-----------override log_level default
)

INFO:lightrag:Logger initialized for working directory: ../data/interim/winston_churchill_we_shall_fight_speech_june_1940_txt
DEBUG:lightrag:LightRAG init with param:
  working_dir = ../data/interim/winston_churchill_we_shall_fight_speech_june_1940_txt,
  embedding_cache_config = {'enabled': False, 'similarity_threshold': 0.95, 'use_llm_check': False},
  kv_storage = JsonKVStorage,
  vector_storage = NanoVectorDBStorage,
  graph_storage = Neo4JStorage,
  log_level = DEBUG,
  chunk_token_size = 1200,
  chunk_overlap_token_size = 100,
  tiktoken_model_name = gpt-4o-mini,
  entity_extract_max_gleaning = 1,
  entity_summary_to_max_tokens = 500,
  node_embedding_algorithm = node2vec,
  node2vec_params = {'dimensions': 1536, 'num_walks': 10, 'walk_length': 40, 'window_size': 2, 'iterations': 3, 'random_seed': 3},
  embedding_func = {'embedding_dim': 1536, 'max_token_size': 8192, 'func': <function openai_embedding at 0x7f689d6751c0>, 'concurrent_limit': 16},
  embedding_batch_num = 32,
  embe
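
The `chunk_token_size = 1200` and `chunk_overlap_token_size = 100` values in the log describe overlapping windows of tokens. The sketch below illustrates how two such parameters typically interact, assuming each window starts `chunk_size - overlap` tokens after the previous one. It is an illustration of the general technique, not LightRAG's actual chunking code, and `chunk_token_ids` is a hypothetical name.

```python
def chunk_token_ids(
    token_ids: list[int], chunk_size: int = 1200, overlap: int = 100
) -> list[list[int]]:
    """Split token IDs into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    # Each window starts this many tokens after the previous one.
    stride = chunk_size - overlap
    return [
        token_ids[start:start + chunk_size]
        for start in range(0, len(token_ids), stride)
    ]


# Stand-in token IDs; a real run would obtain them from a tokenizer such as tiktoken.
tokens = list(range(2500))
chunks = chunk_token_ids(tokens)
print([len(chunk) for chunk in chunks])  # -> [1200, 1200, 300]
```

With these settings, the last 100 tokens of one chunk reappear as the first 100 tokens of the next, which helps entities that straddle a boundary appear intact in at least one chunk.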

In [13]:
with open(local_file_path, encoding="utf-8") as f:
    rag.insert(f.read())

INFO:lightrag:All documents have been processed or are duplicates
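
The log message indicates that nothing new was ingested, which is what you would expect when the same text was already inserted in an earlier run. A common way to detect duplicates like this is to key each document by a hash of its content. The sketch below shows that general idea under stated assumptions: the function names, the `doc-` prefix, and the use of MD5 are illustrative choices and are not taken from LightRAG's source.

```python
import hashlib


def doc_id(content: str, prefix: str = "doc-") -> str:
    """Build a stable ID from the MD5 digest of the stripped document text."""
    return prefix + hashlib.md5(content.strip().encode("utf-8")).hexdigest()


def new_documents(contents: list[str], seen_ids: set[str]) -> dict[str, str]:
    """Return {doc_id: content} for documents whose ID is not in seen_ids."""
    fresh = {}
    for content in contents:
        key = doc_id(content)
        if key not in seen_ids:
            fresh[key] = content.strip()
    return fresh


seen: set[str] = set()
first = new_documents(["We shall fight on the beaches."], seen)
seen.update(first)
# The same text with trailing whitespace hashes to the same ID and is skipped.
second = new_documents(["We shall fight on the beaches.\n"], seen)
print(len(first), len(second))  # -> 1 0
```

To force a fresh ingestion while experimenting, one option is to point `working_dir` at an empty directory so that no previously stored document IDs are found.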


### Query the Graph

In [14]:
# Perform hybrid search
print("\n## APPROACH 4\n")
print(rag.query("What role did King Leopold play during the evacuation of Dunkirk?", param=QueryParam(mode="hybrid")))


## APPROACH 4

### The Role of King Leopold During the Dunkirk Evacuation

King Leopold of Belgium played a significant and complex role during the evacuation of Dunkirk in World War II. His actions had far-reaching consequences on the situation for Allied forces, particularly the British and French armies.

Initially, Belgium sought to maintain its neutrality when the conflict erupted, but the German invasion challenged this stance. As the German forces advanced, King Leopold called upon the British and French militaries for assistance, emphasizing the need for support to defend Belgium against the encroaching enemy. This appeal was crucial in the early stages of the invasion as it highlighted Belgium's strategic importance and the necessity for a coordinated defense among the Allies.

However, the situation drastically changed when King Leopold made a unilateral decision to surrender the Belgian Army to the Germans. This action occurred unexpectedly and without prior consultation wi

### Display the Graph

- graph visualization

## Review Telemetry Data

- Access the Arize Phoenix UI at [http://localhost:6006](http://localhost:6006).
- Telemetry is captured for both LLM inference and embedding calls.