# WikiPathways - Using LightRAG to understand disease pathways

In [1]:
RAG_QUERY="What are the core pathways for this disease?"

SRC_FILE_URL = "https://www.wikipathways.org/wikipathways-assets/pathways/WP4255/WP4255.json"

## Setup and configuration

In [None]:
%pip install -q ipywidgets lightrag-hku openai aioboto3 tiktoken nano_vectordb nest_asyncio
%pip install -q requests networkx pyvis
%pip install -q arize-phoenix-otel openinference-instrumentation-openai 'httpx<0.28'

In [3]:
import os
from pathlib import Path

# Define configuration constants
DATA_DIR = Path("../data")  # Base data directory
INTERIM_DIR = DATA_DIR / "interim"  # Interim data directory
PROCESSED_DIR = DATA_DIR / "processed"  # Processed data directory

In [4]:
import requests
import shutil

# Derive a filesystem-friendly name from SRC_FILE_URL (defined in the first cell), e.g. "wp4255_json"
file_name = SRC_FILE_URL.split("/")[-1].replace(".", "_").lower()

WORKING_DIR = INTERIM_DIR / file_name

# Start from a fresh, empty WORKING_DIR
if WORKING_DIR.exists():
    shutil.rmtree(WORKING_DIR)  # Remove the existing directory and its contents
WORKING_DIR.mkdir(parents=True)  # Also creates missing parents such as INTERIM_DIR

In [5]:
# Fetch and save the file
response = requests.get(SRC_FILE_URL, timeout=30)
response.raise_for_status()  # Raise an exception for HTTP errors
local_file_path = WORKING_DIR / f"{file_name}.txt"
local_file_path.write_text(response.text, encoding="utf-8")

# Define file paths
GRAPHML_FILE = WORKING_DIR / "graph_chunk_entity_relation.graphml"  # Written by LightRAG during insert
PROCESSED_DIR.mkdir(parents=True, exist_ok=True)  # Make sure the output directory exists
PYVIS_HTML_FILE = PROCESSED_DIR / f"{file_name}.html"
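
Before handing the downloaded file to LightRAG, it can help to check what was actually fetched. The following is a minimal sketch, not part of the original notebook: `summarize_json` is a hypothetical helper, and the sample string is made up, so the real WikiPathways document will show different keys.

```python
import json


def summarize_json(text: str) -> dict:
    """Map each top-level key of a JSON object to a short description of its value."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    summary = {}
    for key, value in data.items():
        if isinstance(value, (list, dict)):
            # Containers: report the type and how many items they hold
            summary[key] = f"{type(value).__name__}[{len(value)}]"
        else:
            summary[key] = type(value).__name__
    return summary


# Hypothetical sample; the real pathway file will have different keys
sample = '{"name": "demo", "nodes": [1, 2, 3], "meta": {"a": 1}}'
print(summarize_json(sample))
```

In the notebook, the same helper could be applied to the saved file with `summarize_json(local_file_path.read_text())`.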

### Arize Phoenix - telemetry

- UI endpoint: http://localhost:6006
- NOTE: the Docker container is started with `--rm`, so it is removed once it stops (for example, when you shut down the notebook).

In [None]:
# for more information refer to https://docs.arize.com/phoenix/tracing/integrations-tracing/autogen-support#docker
# !docker run -p 6006:6006 -p 4317:4317 arizephoenix/phoenix:latest

import subprocess

# Run the Docker container without interactive mode
subprocess.Popen([
    "docker", "run", "-p", "6006:6006", "-p", "4317:4317",
    "--rm", "arizephoenix/phoenix:latest"
])
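
Because `subprocess.Popen` returns immediately, the next cells may run before Phoenix is listening. One way to guard against that, sketched here under the assumption that an open TCP port is a good enough readiness signal, is to poll the port. `wait_for_port` is a hypothetical helper, not part of Phoenix or the original notebook.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port("localhost", 6006)` after starting the container would block until the UI port accepts connections or 30 seconds pass. An open port does not guarantee the service has finished initializing, so treat a `True` result as a hint rather than proof.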

In [None]:
from phoenix.otel import register

# defaults to endpoint="http://localhost:4317"
tracer_provider = register(
    project_name="lightrag-openai",  # Default is 'default'
    endpoint="http://localhost:4317",  # Sends traces using gRPC
)

In [8]:
from openinference.instrumentation.openai import OpenAIInstrumentor

OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

## Populate the Graph

- Initialize LightRAG and OpenAI connection

In [None]:
from lightrag import LightRAG, QueryParam
from lightrag.llm import gpt_4o_mini_complete

# The next two lines are required in a Jupyter notebook, where an event loop is
# already running, so that the synchronous rag.insert() and rag.query() calls work
import nest_asyncio
nest_asyncio.apply()

# gpt_4o_mini_complete calls the OpenAI API, so OPENAI_API_KEY must be set in the environment
rag = LightRAG(
    working_dir=str(WORKING_DIR),
    llm_model_func=gpt_4o_mini_complete,
)

In [None]:
# Insert the downloaded document; LightRAG extracts entities and relations to build the graph
rag.insert(local_file_path.read_text(encoding="utf-8"))

## Query the Graph

In [None]:
# Perform hybrid search (combines LightRAG's local and global retrieval)
print("\n## Hybrid search\n")
print(rag.query(RAG_QUERY, param=QueryParam(mode="hybrid")))
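
To see how the retrieval mode changes the answer, the same question can be run once per mode. This is a minimal sketch: `query_all_modes` is a hypothetical helper that takes any callable, and the mode names in `DEFAULT_MODES` are an assumption that should be checked against the installed LightRAG version.

```python
from typing import Callable, Dict, Iterable

# Assumed mode names; verify them against your LightRAG version
DEFAULT_MODES = ("naive", "local", "global", "hybrid")


def query_all_modes(
    query_fn: Callable[[str, str], str],
    question: str,
    modes: Iterable[str] = DEFAULT_MODES,
) -> Dict[str, str]:
    """Run the same question once per mode and collect the answers by mode name."""
    answers = {}
    for mode in modes:
        try:
            answers[mode] = query_fn(question, mode)
        except Exception as exc:
            # Keep going so one failing mode does not hide the others
            answers[mode] = f"<failed: {exc}>"
    return answers
```

In this notebook the callable could be `lambda q, m: rag.query(q, param=QueryParam(mode=m))`, with `RAG_QUERY` as the question. Each call also produces its own trace in Phoenix, which makes the modes easy to compare side by side.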

## Display the Graph

- graph visualization

In [12]:
import networkx as nx
from pyvis.network import Network
import json

def create_interactive_visualization(graphml_file, output_file):
    # Load the GraphML file
    G = nx.read_graphml(graphml_file)
    
    # Create Pyvis network
    net = Network(height='900px', width='100%', bgcolor='#ffffff', 
                 font_color='black', notebook=False)
    
    # Define color scheme for entity types
    entity_colors = {
        'PERSON': '#e41a1c',              # Bright red
        'ORGANIZATION': '#377eb8',        # Blue
        'GEO': '#4daf4a',                 # Green
        'EVENT': '#984ea3',               # Purple
        'CONCEPT': '#ff7f00',             # Orange
        'TECHNOLOGY': '#a65628',          # Brown
        'CATEGORY': '#f781bf',            # Pink
        'BIOLOGICAL PROCESS': '#ff1493', # Deep Pink
        'CELLULAR COMPONENT': '#8a2be2', # Blue Violet
        'DATA NODE': '#00ced1',           # Dark Turquoise
        'ENTITY': '#4682b4',              # Steel Blue
        'GENE': '#32cd32',                # Lime Green
        'GENEPRODUCT': '#adff2f',         # Green Yellow
        'IDENTIFIER': '#ff6347',          # Tomato
        'LEGEND': '#daa520',              # Goldenrod
        'METABOLITE': '#20b2aa',          # Light Sea Green
        'PATHWAY': '#ff4500',             # Orange Red
        'PROTEIN': '#6a5acd',             # Slate Blue
        'STATE': '#7fffd4',               # Aquamarine
        'TERM': '#bdb76b',                # Dark Khaki
        'UNKNOWN': '#808080'              # Gray
    }
    
    # Add nodes with colors before adding edges
    for node_id, node_data in G.nodes(data=True):
        # Get entity type (removing quotes if present)
        entity_type = node_data.get('entity_type', 'UNKNOWN').replace('"', '')
        color = entity_colors.get(entity_type, '#808080')
        
        # Create hover text
        hover_info = f"""
        Entity: {node_id}
        Type: {entity_type}
        Description: {node_data.get('description', 'N/A')}
        Source ID: {node_data.get('source_id', 'N/A')}
        """
        
        # Add node with properties
        net.add_node(node_id, 
                    title=hover_info,
                    color=color,
                    size=30)

    # Add edges
    for source, target, edge_data in G.edges(data=True):
        weight = edge_data.get('weight', 1)
        description = edge_data.get('description', '')
        
        hover_info = f"""
        Weight: {weight}
        Description: {description}
        Keywords: {edge_data.get('keywords', 'N/A')}
        """
        
        net.add_edge(source, target,
                    title=hover_info,
                    width=float(weight),
                    color={'color': '#666666', 'highlight': '#ff0000'})

    # Configure the force-directed physics layout
    physics_options = {
        "physics": {
            "forceAtlas2Based": {
                "gravitationalConstant": -100,
                "centralGravity": 0.01,
                "springLength": 200,
                "springConstant": 0.08,
                "damping": 0.4,
                "avoidOverlap": 1
            },
            "solver": "forceAtlas2Based",
            "stabilization": {
                "enabled": True,
                "iterations": 1000,
                "updateInterval": 25
            }
        }
    }
    
    net.set_options(json.dumps(physics_options))
    
    # Write the base HTML; the legend is injected into this file below
    net.write_html(output_file)
    
    # Add legend HTML
    legend_html = """
    <div style="position: absolute; top: 10px; left: 10px; background-color: rgba(255, 255, 255, 0.9); 
                padding: 10px; border-radius: 5px; border: 1px solid #ccc;">
        <h3>Entity Types</h3>
        <ul style="list-style-type: none; padding: 0;">
    """
    
    for entity_type, color in entity_colors.items():
        legend_html += f"""
            <li style="margin: 5px 0;">
                <span style="display: inline-block; width: 20px; height: 20px; 
                           background-color: {color}; border-radius: 50%; margin-right: 5px;"></span>
                {entity_type}
            </li>
        """
    
    legend_html += """
        </ul>
    </div>
    """
    
    with open(output_file, 'r', encoding='utf-8') as file:
        content = file.read()
    content = content.replace('</body>', f'{legend_html}</body>')
    with open(output_file, 'w', encoding='utf-8') as file:
        file.write(content)

In [None]:
# Generate Visualization
create_interactive_visualization(GRAPHML_FILE, str(PYVIS_HTML_FILE))
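
The color legend is easier to interpret alongside a count of how many nodes fall into each entity type. The sketch below assumes nodes arrive as `(node_id, attributes)` pairs, which is the shape `G.nodes(data=True)` yields in networkx. `count_entity_types` and the sample nodes are hypothetical and only illustrate the idea.

```python
from collections import Counter


def count_entity_types(nodes) -> Counter:
    """Count nodes per entity type from an iterable of (node_id, attributes) pairs."""
    counts = Counter()
    for _node_id, attrs in nodes:
        # Mirror the visualization: default to UNKNOWN and strip stray quotes
        entity_type = str(attrs.get("entity_type", "UNKNOWN")).replace('"', "")
        counts[entity_type] += 1
    return counts


# Hypothetical nodes standing in for G.nodes(data=True)
sample_nodes = [
    ("node_a", {"entity_type": '"GENE"'}),
    ("node_b", {"entity_type": '"PATHWAY"'}),
    ("node_c", {"entity_type": "GENE"}),
    ("node_d", {}),
]
for entity_type, n in count_entity_types(sample_nodes).most_common():
    print(f"{entity_type}: {n}")
```

With the real graph, `count_entity_types(nx.read_graphml(GRAPHML_FILE).nodes(data=True))` would report the distribution. Types that are missing from `entity_colors` are drawn in the fallback gray, so a large count for an unlisted type suggests the color map needs another entry.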

## Review Telemetry data

- Access the Arize Phoenix UI at [http://localhost:6006](http://localhost:6006)
- Both LLM inference and embedding telemetry are captured

# 🔍 LightRAG Validation with Arize Phoenix

### Overview

This notebook shows one approach for validating and monitoring LightRAG's interaction with LLMs and embedding models. By leveraging [Arize Phoenix](https://docs.arize.com/phoenix/tracing/llm-traces-1), it provides insight into what is a very complex data ingestion pipeline.

It will also make the concepts covered in the LightRAG paper more tangible.

### Purpose
- **System Monitoring**: Validate LightRAG's integration with telemetry pipelines to ensure robust tracking of model inference and embedding use.
- **Performance Tuning**: Identify bottlenecks and optimize configurations using insights from telemetry data.
- **Proactive Debugging**: Quickly detect and resolve anomalies through real-time analysis.

### Key Features
- **Dockerized Deployment**: Simplifies setup with preconfigured Docker containers for Arize Phoenix.
- **Telemetry Integration**: Integrates with external systems through the OpenTelemetry standard to provide detailed system traces.
- **Customizable Dashboards**: Enables interactive exploration of model metrics and error logs.

### Usage Instructions
1. **Setup**: 
    - Install required dependencies:
      ```bash
      pip install arize-phoenix-otel openinference-instrumentation-openai
      ```
    - Run the Docker container for Arize Phoenix:
      ```bash
      docker run -p 6006:6006 -p 4317:4317 --rm arizephoenix/phoenix:latest
      ```

2. **Execute the Notebook**: Follow the provided steps in the notebook to validate your LightRAG setup against telemetry data.

3. **Explore Metrics**:
    - Access the Phoenix UI at [http://localhost:6006](http://localhost:6006).
    - Analyze detailed traces, latencies, and throughput metrics.