# 🔍 LightRAG Validation with Arize Phoenix

### Overview

This notebook shows one approach to validating and monitoring LightRAG's interactions with LLMs and embedding models. It uses [Arize Phoenix](https://docs.arize.com/phoenix/tracing/llm-traces-1) to provide insight into LightRAG's complex data ingestion pipeline.

It will also make the concepts covered in the LightRAG paper more tangible.

### Purpose
- **System Monitoring**: Validate LightRAG's integration with telemetry pipelines to ensure robust tracking of model inference and embedding use.
- **Performance Tuning**: Identify bottlenecks and optimize configurations using insights from telemetry data.
- **Proactive Debugging**: Quickly detect and resolve anomalies through real-time analysis.

### Key Features
- **Dockerized Deployment**: Simplifies setup with a preconfigured Docker container for Arize Phoenix.
- **Telemetry Integration**: Uses the OpenTelemetry standard to integrate with external systems and produce detailed system traces.
- **Customizable Dashboards**: Enables interactive exploration of model metrics and error logs.

### Usage Instructions
1. **Setup**: 
    - Install required dependencies:
      ```bash
      pip install arize-phoenix-otel
      ```
    - Run the Docker container for Arize Phoenix:
      ```bash
      docker run -p 6006:6006 -p 4317:4317 --rm arizephoenix/phoenix:latest
      ```

2. **Execute the Notebook**: Follow the provided steps in the notebook to validate your LightRAG setup against telemetry data.

3. **Explore Metrics**:
    - Access the Phoenix UI at [http://localhost:6006](http://localhost:6006).
    - Analyze detailed traces, latencies, and throughput metrics.
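Before executing the notebook, it can help to confirm that the container is actually listening. The sketch below is a stdlib-only TCP reachability check, assuming the ports from the `docker run` command above are unchanged (6006 for the UI, 4317 for OTLP over gRPC). `is_port_open` and `PHOENIX_PORTS` are hypothetical helper names, not part of Phoenix.

```python
import socket

# Ports published by the docker run command above (assumed unchanged):
# 6006 serves the Phoenix UI, 4317 receives OTLP traces over gRPC.
PHOENIX_PORTS = {"ui": 6006, "otlp_grpc": 4317}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unresolvable.
        return False


for name, port in PHOENIX_PORTS.items():
    status = "reachable" if is_port_open("localhost", port) else "not reachable"
    print(f"{name} (port {port}): {status}")
```

A successful TCP connection only shows that something is bound to the port; the Phoenix UI itself is the real confirmation.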

In [None]:
%pip install -q ipywidgets lightrag-hku openai aioboto3

In [2]:
import os
from pathlib import Path

# Define configuration constants
DATA_DIR = Path("../data")  # Base data directory
INTERIM_DIR = DATA_DIR / "interim"  # Interim data directory
PROCESSED_DIR = DATA_DIR / "processed"  # Processed data directory

In [3]:
import requests
import shutil

# Define URL and local file path
src_file_url = "https://raw.githubusercontent.com/donbr/kg_rememberall/refs/heads/main/references/winston_churchill_we_shall_fight_speech_june_1940.txt"
file_name = src_file_url.split("/")[-1].replace(".", "_").lower()

WORKING_DIR = INTERIM_DIR / file_name

# Reset WORKING_DIR: remove any output from a previous run, then recreate it.
# parents=True also creates ../data/interim if it does not exist yet.
if WORKING_DIR.exists():
    shutil.rmtree(WORKING_DIR)  # Remove the existing directory and its contents
WORKING_DIR.mkdir(parents=True)  # Create a new, empty directory
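The naming logic above is easy to misread, so here is a small sketch of what it produces for this URL. Because the dot is replaced with an underscore and `.txt` is appended again later, the downloaded file ends up as `winston_churchill_we_shall_fight_speech_june_1940_txt.txt`. `stem_style_name` is a hypothetical alternative based on `PurePosixPath.stem`, shown only for comparison; the notebook does not use it.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

url = "https://raw.githubusercontent.com/donbr/kg_rememberall/refs/heads/main/references/winston_churchill_we_shall_fight_speech_june_1940.txt"


def notebook_style_name(url: str) -> str:
    """Reproduce the cell above: last URL segment, dots -> underscores, lowercase."""
    return url.split("/")[-1].replace(".", "_").lower()


def stem_style_name(url: str) -> str:
    """Hypothetical alternative: drop the extension instead of rewriting the dot."""
    return PurePosixPath(urlparse(url).path).stem.lower()


print(notebook_style_name(url))  # directory name used by the notebook
print(stem_style_name(url))      # extension-free variant
```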

In [4]:
# Fetch and save the file
response = requests.get(src_file_url, timeout=30)
response.raise_for_status()  # Raise an exception for HTTP errors
local_file_path = WORKING_DIR / f"{file_name}.txt"
local_file_path.write_text(response.text, encoding="utf-8")

# Define file paths
GRAPHML_FILE = WORKING_DIR / f"{file_name}.graphml"
PYVIS_HTML_FILE = PROCESSED_DIR / f"{file_name}.html"

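# Neo4j connection parameters, read from the environment by LightRAG's Neo4JStorage.
# Adjust the URI and credentials to match your own Neo4j instance.
os.environ["NEO4J_URI"] = "neo4j://172.18.176.1:7687"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "password"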
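The cell above hard-codes a host-specific IP address and credentials. A minimal sketch, assuming you would rather export the `NEO4J_*` variables in your shell and keep the notebook values only as fallbacks, used in place of the three `os.environ[...] = ...` assignments. `apply_neo4j_defaults` and `NEO4J_DEFAULTS` are hypothetical names.

```python
import os

# Fallbacks copied from the cell above; they are placeholders for a local
# Neo4j instance. Values already exported in your shell take precedence.
NEO4J_DEFAULTS = {
    "NEO4J_URI": "neo4j://172.18.176.1:7687",
    "NEO4J_USERNAME": "neo4j",
    "NEO4J_PASSWORD": "password",
}


def apply_neo4j_defaults(defaults: dict[str, str] = NEO4J_DEFAULTS) -> dict[str, str]:
    """Set each variable only if it is not already defined; return the effective values."""
    return {key: os.environ.setdefault(key, value) for key, value in defaults.items()}


settings = apply_neo4j_defaults()
# Print only non-secret values.
print("Neo4j URI:", settings["NEO4J_URI"], "| user:", settings["NEO4J_USERNAME"])
```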

## Arize Phoenix

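- UI endpoint: http://localhost:6006
- NOTE: the container runs with `--rm`, so Docker removes it once it stops, e.g. when the notebook's `docker run` process is terminated or you run `docker stop`.
- Skip the next cell if you already started Phoenix from a terminal as described in the Usage Instructions; a second container cannot bind the same ports.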

In [None]:
# for more information refer to https://docs.arize.com/phoenix/tracing/integrations-tracing/autogen-support#docker
# !docker run -p 6006:6006 -p 4317:4317 arizephoenix/phoenix:latest

import subprocess

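# Start the Phoenix container in the background; Popen returns immediately.
# Keep the handle: phoenix_proc.terminate() should stop the container, which
# Docker then removes because of --rm.
phoenix_proc = subprocess.Popen([
    "docker", "run", "-p", "6006:6006", "-p", "4317:4317",
    "--rm", "arizephoenix/phoenix:latest"
])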
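`subprocess.Popen` returns before Phoenix has finished starting (the first run may also need to pull the image), so traces sent too early can be lost. A stdlib sketch that polls the UI until it answers over HTTP; `wait_for_http` is a hypothetical helper, and the 60-second budget is an arbitrary assumption.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll url until the server returns any HTTP response, or give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered with an error status, so it is up.
            return True
        except OSError:
            # Connection refused or timed out: the server is not ready yet.
            time.sleep(interval)
    return False


# Example (run after starting the container):
# print("Phoenix UI ready:", wait_for_http("http://localhost:6006", timeout=60))
```

Polling the UI port rather than 4317 is deliberate: 4317 speaks gRPC, not plain HTTP.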

## Arize Phoenix: setup and configuration

In [None]:
%pip install -q arize-phoenix-otel

In [None]:
from phoenix.otel import register

# Send traces to the Phoenix container's OTLP gRPC port (published as 4317 above)
tracer_provider = register(
    project_name="lightrag-openai",    # Traces are grouped under this project; default is 'default'
    endpoint="http://localhost:4317",  # Sends traces using gRPC
)

In [None]:
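# Install the OpenInference OpenAI instrumentation and the openai client
# (httpx is pinned below 0.28 to stay compatible with the openai client version used here)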
%pip install -q openinference-instrumentation-openai openai 'httpx<0.28'

In [9]:
from openinference.instrumentation.openai import OpenAIInstrumentor

OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
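Conceptually, an instrumentor such as `OpenAIInstrumentor` wraps each client call and emits a span carrying its name, duration, status, and attributes (OpenInference adds details such as the model name, prompts, and token counts). The toy, stdlib-only decorator below illustrates that wrapping idea only; it is not how OpenInference is implemented and it exports nothing to Phoenix. `ToySpan`, `traced`, and `fake_complete` are hypothetical names.

```python
import functools
import time
from dataclasses import dataclass, field


@dataclass
class ToySpan:
    """Minimal stand-in for a trace span: a name, timing, and attributes."""
    name: str
    start: float
    end: float = 0.0
    attributes: dict = field(default_factory=dict)

    @property
    def latency_ms(self) -> float:
        return (self.end - self.start) * 1000.0


RECORDED: list[ToySpan] = []


def traced(name: str):
    """Decorator that records one ToySpan per call, even if the call raises."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            span = ToySpan(name=name, start=time.perf_counter())
            try:
                result = fn(*args, **kwargs)
                span.attributes["status"] = "OK"
                return result
            except Exception as exc:
                span.attributes["status"] = "ERROR"
                span.attributes["error"] = repr(exc)
                raise
            finally:
                span.end = time.perf_counter()
                RECORDED.append(span)
        return wrapper
    return decorator


@traced("fake_llm.complete")
def fake_complete(prompt: str) -> str:
    # Hypothetical stand-in for a chat-completion request.
    return prompt.upper()


print(fake_complete("hello"), RECORDED[-1].name, RECORDED[-1].attributes)
```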

## Example Use Case

Monitor LightRAG's LLM and embedding model usage in real time:
- performance and response latencies
- model behavior, through the prompts, responses, and token usage recorded for each call

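Latency is easiest to reason about per span kind, since OpenInference labels spans with kinds such as `LLM` and `EMBEDDING`. A stdlib sketch that summarizes latencies grouped by kind, assuming you have already obtained span records in some form; the record layout and the numbers below are made up for illustration and are not measurements from this notebook.

```python
from collections import defaultdict
from statistics import median

# Hypothetical span records: only the span kind and latency (ms) are modeled.
# The values are invented for illustration, not measured.
spans = [
    {"kind": "LLM", "latency_ms": 1200.0},
    {"kind": "LLM", "latency_ms": 800.0},
    {"kind": "LLM", "latency_ms": 2000.0},
    {"kind": "EMBEDDING", "latency_ms": 150.0},
    {"kind": "EMBEDDING", "latency_ms": 250.0},
]


def summarize_latency(spans: list[dict]) -> dict[str, dict[str, float]]:
    """Group latencies by span kind and report count, median, max, and total."""
    by_kind: dict[str, list[float]] = defaultdict(list)
    for span in spans:
        by_kind[span["kind"]].append(span["latency_ms"])
    return {
        kind: {
            "count": len(values),
            "median_ms": median(values),
            "max_ms": max(values),
            "total_ms": sum(values),
        }
        for kind, values in by_kind.items()
    }


for kind, stats in summarize_latency(spans).items():
    print(kind, stats)
```

The median resists a single slow outlier better than the mean, while the maximum shows that outlier explicitly.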
### Populate the Graph

- Initialize LightRAG and OpenAI connection

In [None]:
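# Extra packages for this LightRAG setup: lightrag.llm imports ollama when it loads
# (in the version used here), tiktoken and nano_vectordb back chunking and the default
# vector store, neo4j is the driver for Neo4JStorage, and nest_asyncio is used below.
%pip install -q ollama tiktoken nano_vectordb neo4j nest_asyncio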

In [None]:
from lightrag import LightRAG, QueryParam
from lightrag.llm import gpt_4o_mini_complete
import nano_vectordb  # not used directly; fails fast if the package is missing

# The two lines below are required when running inside a Jupyter notebook,
# which already runs its own asyncio event loop.
import nest_asyncio
nest_asyncio.apply()

# LightRAG's default graph storage is NetworkX; graph_storage="Neo4JStorage"
# overrides it so the knowledge graph is written to the Neo4j instance configured above.
rag = LightRAG(
    working_dir=str(WORKING_DIR),
    llm_model_func=gpt_4o_mini_complete,
    graph_storage="Neo4JStorage",
    log_level="DEBUG",
)

In [None]:
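# Ingest the speech: chunking, entity/relationship extraction, and embedding all happen here
with open(local_file_path, encoding="utf-8") as f:
    rag.insert(f.read())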
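Before any entity extraction, `rag.insert()` splits the text into overlapping token windows; LightRAG counts tokens with tiktoken, which is one reason it is installed above. Each chunk is then sent to the LLM for entity and relationship extraction, so the chunk count is a rough guide to how many LLM spans to expect in Phoenix. The sketch below shows the windowing idea with whitespace-separated words standing in for tiktoken tokens; `chunk_tokens`, `size`, and `overlap` are illustrative names, not LightRAG's API, and the toy sizes are far smaller than anything you would use in practice.

```python
def chunk_tokens(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of `size` tokens; consecutive windows share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    return [tokens[start:start + size] for start in range(0, len(tokens), step)]


words = [f"w{i}" for i in range(10)]  # whitespace tokens stand in for tiktoken ids
for chunk in chunk_tokens(words, size=4, overlap=1):
    print(" ".join(chunk))
```

The overlap keeps an entity that straddles a chunk boundary visible to at least one extraction call, at the cost of a few repeated tokens per chunk.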

### Query the Graph

In [None]:
# Perform a hybrid (local + global) query
print("\n## Hybrid query\n")
print(rag.query(
    "What role did King Leopold play during the evacuation of Dunkirk?",
    param=QueryParam(mode="hybrid"),
))
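`QueryParam(mode=...)` accepts `"naive"`, `"local"`, `"global"`, and `"hybrid"`. Per the LightRAG paper, local retrieval targets specific entities (low-level keywords), global retrieval targets broader relationships and themes (high-level keywords), and hybrid combines both. The toy sketch below illustrates only the combining step, merging two ranked context lists while dropping duplicates; it is not LightRAG's actual code, and the context strings are hypothetical placeholders.

```python
def merge_contexts(local: list[str], global_: list[str]) -> list[str]:
    """Combine two ranked context lists, keeping first-seen order and dropping exact duplicates."""
    # dicts preserve insertion order (Python 3.7+), so fromkeys deduplicates in order.
    return list(dict.fromkeys(local + global_))


local_ctx = ["entity: A", "entity: B"]          # hypothetical low-level (entity) hits
global_ctx = ["relation: A -> B", "entity: B"]  # hypothetical high-level (relationship) hits
print(merge_contexts(local_ctx, global_ctx))
```

Running the same question with `mode="local"` and `mode="global"` and comparing the resulting spans in Phoenix is a practical way to see what each half contributes.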

### Display the Graph

- graph visualization
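With `graph_storage="Neo4JStorage"` the graph lives in Neo4j, so Neo4j's own tooling is one way to view it. With LightRAG's default NetworkX storage, the graph is instead written as a GraphML file in `working_dir`. As a first sanity check before reaching for a visualization library, the stdlib sketch below counts nodes and edges in a GraphML file such as the notebook's `GRAPHML_FILE`; whether a file exists at that path depends on your setup, and `summarize_graphml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Standard GraphML namespace (the one NetworkX uses when writing GraphML).
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def summarize_graphml(path: Path) -> dict[str, int]:
    """Count nodes and edges in a GraphML file using only the standard library."""
    root = ET.parse(path).getroot()
    return {
        "nodes": len(root.findall(".//g:node", GRAPHML_NS)),
        "edges": len(root.findall(".//g:edge", GRAPHML_NS)),
    }


# Example (GRAPHML_FILE is defined earlier in the notebook):
# if GRAPHML_FILE.exists():
#     print(summarize_graphml(GRAPHML_FILE))
```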

## Review Telemetry data

- Access the Arize Phoenix UI at [http://localhost:6006](http://localhost:6006)
- both LLM inference and embedding telemetry information is captured