# Zep Agent Conversation with Full Trace

**Goal**: See every function call, LLM prompt/response, and Neo4j query from user input to database.

**Approach**: Use `graphiti_core` directly to simulate Zep Cloud API behavior, with comprehensive tracing.

The markdown description is generated by Claude and edited by me.

## 1. Architecture Overview

### Zep Cloud Production Architecture

In production, Zep Cloud has a multi-layer architecture:

```
┌───────────────────────────────────────────────────────────┐
│   Agent Application                                       │
│  (Uses Zep Python SDK: zep_cloud.client)                  │
└─────────────────────┬─────────────────────────────────────┘
                      │ HTTPS REST API
                      ▼
┌───────────────────────────────────────────────────────────┐
│  Zep Server (Go)                                          │
│  - Manages users, sessions, threads                       │
│  - Handles authentication and rate limiting               │
│  - Orchestrates memory operations                         │
│  Source: zep/legacy/src/api/apihandlers/                  │
└─────────────────────┬─────────────────────────────────────┘
                      │ HTTP REST API
                      ▼
┌───────────────────────────────────────────────────────────┐
│  Graphiti Server (Python FastAPI)                         │
│  - Wraps graphiti_core library                            │
│  - Provides REST endpoints for graph operations           │
│  - Handles async processing queue                         │
│  Source: zep-graphiti/server/graph_service/               │
└─────────────────────┬─────────────────────────────────────┘
                      │ Python function calls
                      ▼
┌───────────────────────────────────────────────────────────┐
│  graphiti_core (Python Library)                           │
│  - Temporal knowledge graph engine                        │
│  - Entity extraction and resolution                       │
│  - Bi-temporal data model                                 │
│  Source: zep-graphiti/graphiti_core/                      │
├─────────────────────┬─────────────────────────────────────┤
│                     │                                     │
│    ┌────────────────┴────────────────┐                    │
│    ▼                                 ▼                    │
│  LLM API                        Neo4j Database            │
│  (Entity extraction,            (Graph storage,           │
│   summarization,                 Cypher queries)          │
│   reranking)                                              │
└───────────────────────────────────────────────────────────┘
```

This resembles a **layered microservices** pattern:

| Layer | Component | Role | Responsibility |
|-------|-----------|------|----------------|
| **Client / Interface** | Zep Python SDK | HTTP Client Wrapper | Converts function calls (`memory.add()`) to REST API requests, handles retries and authentication. **Contains no business logic.** |
| **Gateway / Orchestration** | Zep Server (Go) | API Gateway & User Management | Multi-tenant management (AuthN/AuthZ), rate limiting, request routing. Dispatches tasks to downstream services. |
| **Worker / Execution** | Graphiti Server (FastAPI) | Async Task Runner | Maintains async job queue. Since LLM processing is slow (seconds to tens of seconds), it converts memory operations into async jobs to prevent blocking the gateway. |
| **Business Logic / "The Brain"** | Graphiti Core (Library) | Core Logic Engine | Contains all prompt templates (`prompts/*.py`), LLM interaction flow control (extract → dedupe → validate), and dynamic Cypher query generation (`*_db_queries.py`). **Stateless logic code.** |

**Evidence from Source Code:**

1. **Zep Python SDK** - HTTP client with no logic:
   ```python
   # zep/examples/python/graph_example/graph_example.py
   client = AsyncZep(api_key=API_KEY)  # Just an HTTP client
   await client.graph.add(graph_id=graph_id, data="...", type="text")
   ```

2. **Zep Server (Go)** - Routes to Graphiti:
   ```go
   // zep/legacy/src/store/memory_ce.go
   graphiti.I().PutMemory(ctx, session.SessionID, memoryMessages.Messages, true)
   graphiti.I().GetMemory(ctx, graphiti.GetMemoryRequest{...})
   ```

3. **Graphiti Server** - Async queue for slow LLM operations:
   ```python
   # zep-graphiti/server/graph_service/routers/ingest.py
   class AsyncWorker:
       def __init__(self):
           self.queue = asyncio.Queue()  # Async job queue
   
   @router.post('/messages', status_code=status.HTTP_202_ACCEPTED)  # Returns immediately
   async def add_messages(...):
       await async_worker.queue.put(partial(add_messages_task, m))  # Queue the job
   ```

4. **Graphiti Core** - All business logic:
   - `prompts/extract_nodes.py` - Entity extraction prompts
   - `prompts/dedupe_nodes.py` - Entity deduplication prompts
   - `prompts/extract_edges.py` - Relationship extraction prompts
   - `models/nodes/node_db_queries.py` - Dynamic Cypher generation (Neo4j/FalkorDB/Kuzu)
   - `graphiti.py:add_episode()` - Orchestrates the full extraction pipeline
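
The "accept now, process later" behavior in item 3 can be sketched with the standard library alone. This is a minimal illustration and not the Graphiti Server implementation: `slow_ingest`, `handle_request`, and `demo` are hypothetical names, and the short sleep stands in for the slow LLM pipeline.

```python
import asyncio
from functools import partial


class AsyncWorker:
    """Toy background worker: queued jobs run one at a time."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()
        self.task = None

    async def _run(self) -> None:
        while True:
            job = await self.queue.get()
            try:
                await job()
            finally:
                self.queue.task_done()

    def start(self) -> None:
        self.task = asyncio.create_task(self._run())

    async def stop(self) -> None:
        if self.task is not None:
            self.task.cancel()
            # Swallow the CancelledError raised inside the worker loop
            await asyncio.gather(self.task, return_exceptions=True)


async def slow_ingest(message: str, processed: list) -> None:
    # Hypothetical stand-in for the slow extraction pipeline
    await asyncio.sleep(0.01)
    processed.append(message)


async def handle_request(worker: AsyncWorker, message: str, processed: list) -> int:
    # Queue the job and answer immediately, like a 202 Accepted response
    await worker.queue.put(partial(slow_ingest, message, processed))
    return 202


async def demo():
    worker = AsyncWorker()
    worker.start()
    processed: list = []
    status = await handle_request(worker, "I live in SF", processed)
    seen_at_response = list(processed)  # nothing has been processed yet
    await worker.queue.join()           # wait for the background job to finish
    await worker.stop()
    return status, seen_at_response, processed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The caller gets its status code before the job has run, which is why a client of the real stack cannot assume a message is searchable right after ingest returns.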

### My Setup

I bypass Zep Server and Graphiti Server and call `graphiti_core` directly.

This gives **complete visibility** into every operation.

Later I may run the full stack with Zep Server and Graphiti Server and compare the results.

```
┌────────────────────────────────────────────────────────────┐
│  - ZepSimulator: Mimics Zep SDK patterns                   │
│  - TraceLogger: Captures all operations                    │
│  - Agent: Simple conversation loop                         │
└─────────────────────┬──────────────────────────────────────┘
                      │ Direct Python calls (OTEL tracing)
                      ▼
┌────────────────────────────────────────────────────────────┐
│  graphiti_core (Python Library)                            │
│  - Full source code access                                 │
│  - All internal operations visible                         │
├─────────────────────┬──────────────────────────────────────┤
│    ┌────────────────┴────────────────┐                     │
│    ▼                                 ▼                     │
│  vLLM Server                    Local Neo4j                │
│  (Qwen2.5-32B on H100)          (bolt://localhost:7687)    │
│  - HTTP traffic logged          - All queries logged       │
└────────────────────────────────────────────────────────────┘
```

## 2. Zep Cloud API → Graphiti Complete Mapping

This mapping is verified from source code analysis.

### API Mapping Table

| Zep Python SDK | Zep Server (Go) | Graphiti Server | graphiti_core | Purpose |
|----------------|-----------------|-----------------|---------------|----------|
| `thread.add_messages()` | `memory_ce.go:_initializeProcessingMemory()` | `POST /messages` | `add_episode()` | Store conversation messages |
| `thread.get_user_context()` | `memory_ce.go:_get()` | `POST /get-memory` | `search()` | Retrieve relevant context |
| `session.get_memory()` | `memory_ce.go:_get()` | `POST /get-memory` | `search()` | Get session memory |
| `graph.search()` | `memory_ce.go:_searchSessions()` | `POST /search` | `search()` | Search knowledge graph |
| `user.add()` | `userstore_ce.go:_processCreatedUser()` | `POST /entity-node` | `save_entity_node()` | Create user entity |
| `graph.add()` | N/A | `POST /entity-node` | `save_entity_node()` | Add structured data |
| `user.delete()` | `user_handlers.go:DeleteUserHandler()` | `DELETE /group/{group_id}` | `delete_group()` | Delete user and all data |
| `fact.get()` | `fact_handlers_ce.go:getFact()` | `GET /entity-edge/{uuid}` | `EntityEdge.get_by_uuid()` | Get a specific fact |
| `fact.delete()` | `fact_handlers_ce.go:deleteSessionFact()` | `DELETE /entity-edge/{uuid}` | `delete_entity_edge()` | Delete a fact |
| `session.delete_memory()` | `memory_handlers_ce.go:deleteMemory()` | `DELETE /group/{group_id}` | `delete_group()` | Delete session memory |
| `episode.delete()` | N/A | `DELETE /episode/{uuid}` | `delete_episodic_node()` | Delete an episode |

### Source Code Evidence

> **Optional reading: these excerpts back up the table above and can be skipped.**

**1. thread.add_messages() → add_episode()**

From `zep/legacy/src/store/memory_ce.go`:
```go
func (dao *memoryDAO) _initializeProcessingMemory(...) error {
    err := graphiti.I().PutMemory(ctx, session.SessionID, memoryMessages.Messages, true)
}
```

From `zep-graphiti/server/graph_service/routers/ingest.py`:
```python
@router.post('/messages', status_code=status.HTTP_202_ACCEPTED)
async def add_messages(request: AddMessagesRequest, graphiti: ZepGraphitiDep):
    async def add_messages_task(m: Message):
        await graphiti.add_episode(
            uuid=m.uuid,
            group_id=request.group_id,
            name=m.name,
            episode_body=f'{m.role or ""}({m.role_type}): {m.content}',
            ...
        )
```
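
To make the `episode_body` format concrete, here is a small sketch that applies the same f-string as the router excerpt. The `Message` dataclass is a hypothetical stand-in for the server's request model, reduced to the three fields the f-string reads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hypothetical stand-in for the server's message model."""
    content: str
    role_type: str
    role: Optional[str] = None


def format_episode_body(m: Message) -> str:
    # Same f-string as in the router excerpt: "<role>(<role_type>): <content>"
    return f'{m.role or ""}({m.role_type}): {m.content}'


if __name__ == "__main__":
    print(format_episode_body(Message("Hi, I'm Alice Chen.", "user", "Alice Chen")))
```

When `role` is missing, the body starts directly with the parenthesized role type, so the speaker name is simply absent from the text the LLM sees.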

**2. thread.get_user_context() → search()**

From `zep/legacy/src/store/memory_ce.go`:
```go
func (dao *memoryDAO) _get(...) (*models.Memory, error) {
    memory, err := graphiti.I().GetMemory(ctx, graphiti.GetMemoryRequest{
        GroupID:  groupID,
        MaxFacts: 5,
        Messages: mForRetrieval,
    })
}
```

From `zep-graphiti/server/graph_service/routers/retrieve.py`:
```python
@router.post('/get-memory', status_code=status.HTTP_200_OK)
async def get_memory(request: GetMemoryRequest, graphiti: ZepGraphitiDep):
    combined_query = compose_query_from_messages(request.messages)
    result = await graphiti.search(
        group_ids=[request.group_id],
        query=combined_query,
        num_results=request.max_facts,
    )
```

**3. user.delete() → delete_group()**

From `zep/legacy/src/api/apihandlers/user_handlers.go`:
```go
func DeleteUserHandler(appState *models.AppState) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        err := userStore.DeleteUser(ctx, userID)
    }
}
```

From `zep-graphiti/server/graph_service/zep_graphiti.py`:
```python
async def delete_group(self, group_id: str):
    edges = await EntityEdge.get_by_group_ids(self.driver, [group_id])
    for edge in edges:
        await edge.delete(self.driver)
    # Also deletes nodes and episodes
```

**4. fact.get() / fact.delete() → get_entity_edge() / delete_entity_edge()**

From `zep/legacy/src/api/apihandlers/fact_handlers_ce.go`:
```go
func getFact(...) (*models.Fact, error) { ... }
func deleteSessionFact(...) error { ... }
```

From `zep-graphiti/server/graph_service/routers/ingest.py`:
```python
@router.delete('/entity-edge/{uuid}')
async def delete_entity_edge(uuid: str, graphiti: ZepGraphitiDep):
    await graphiti.delete_entity_edge(uuid)
```

From `zep-graphiti/server/graph_service/routers/retrieve.py`:
```python
@router.get('/entity-edge/{uuid}')
async def get_entity_edge(uuid: str, graphiti: ZepGraphitiDep):
    return await graphiti.get_entity_edge(uuid)
```

## 3. Internal Flow: What Happens Inside Each Operation

### 3.1 add_episode() Internal Flow

When you call `graphiti.add_episode()`, here's what happens internally.

```mermaid
sequenceDiagram
    participant Agent
    participant ZepAPI as Zep Server (Go)
    participant Worker as Graphiti Server (Py)
    participant LLM
    participant Neo4j

    Note over Agent, ZepAPI: Phase 1: Ingest (Fast)
    Agent->>ZepAPI: POST "I live in SF"
    ZepAPI->>DB: Save Raw Msg
    ZepAPI->>Worker: Enqueue Task
    ZepAPI-->>Agent: 202 Accepted

    Note over Worker, Neo4j: Phase 2: Async ETL (Slow)
    Worker->>LLM: Extract Entities ("SF")
    LLM-->>Worker: JSON
    Worker->>Neo4j: Search Existing ("San Francisco"?)
    Neo4j-->>Worker: Candidates
    Worker->>LLM: Deduplicate (Is "SF" == "San Francisco"?)
    LLM-->>Worker: Yes
    Worker->>LLM: Extract Relation (LIVES_IN)
    LLM-->>Worker: Edge Data
    Worker->>Neo4j: COMMIT Transaction (Nodes + Edges)
```


```
add_episode(episode_body="Alice Chen(user): Hi, I'm Alice Chen. I work at TechCorp...")
│
├─► Step 0: Query Previous Episodes
│   Neo4j: MATCH (e:Episodic) WHERE e.valid_at <= $reference_time AND e.group_id IN $group_ids
│          RETURN e ORDER BY e.valid_at DESC LIMIT $num_episodes
│   Purpose: Get conversation context for entity extraction
│
├─► Step 1: Extract Entities (LLM Call - extract_nodes.extract_message)
│   System: "You are an AI assistant that extracts entity nodes from conversational messages..."
│   User: "<ENTITY TYPES>...</ENTITY TYPES> <CURRENT MESSAGE>Alice Chen(user): Hi, I'm Alice Chen...</CURRENT MESSAGE>"
│   Response: {"extracted_entities": [{"name": "Alice Chen", "entity_type_id": 0}, {"name": "TechCorp", "entity_type_id": 0}]}
│
├─► Step 2: Search for Existing Entities (Neo4j - Parallel Queries)
│   For EACH extracted entity, run two parallel searches:
│   │
│   ├─► 2a: BM25 Fulltext Search
│   │   Neo4j: CALL db.index.fulltext.queryNodes("node_name_and_summary", $query, {limit: $limit})
│   │          YIELD node AS n, score WHERE n.group_id IN $group_ids
│   │
│   └─► 2b: Cosine Similarity Search
│       Neo4j: MATCH (n:Entity) WHERE n.group_id IN $group_ids
│              WITH n, vector.similarity.cosine(n.name_embedding, $search_vector) AS score
│              WHERE score > $min_score RETURN n ORDER BY score DESC
│
├─► Step 3: Deduplicate Entities (LLM Call - dedupe_nodes.nodes)
│   System: "You are a helpful assistant that determines whether or not ENTITIES extracted from a conversation are duplicates..."
│   User: "<ENTITIES>[extracted entities]</ENTITIES> <EXISTING ENTITIES>[candidates from Neo4j]</EXISTING ENTITIES>"
│   Response: {"entity_resolutions": [{"id": 0, "name": "Alice Chen", "duplicate_idx": -1, "duplicates": []}, ...]}
│   Purpose: Match new entities to existing ones or confirm they are new
│
├─► Step 4: Extract Relationships (LLM Call - extract_edges.edge)
│   System: "You are an expert fact extractor that extracts fact triples from text..."
│   User: "<ENTITIES>[resolved entities]</ENTITIES> <CURRENT_MESSAGE>...</CURRENT_MESSAGE> <REFERENCE_TIME>...</REFERENCE_TIME>"
│   Response: {"edges": [{"relation_type": "WORKS_AT", "source_entity_id": 0, "target_entity_id": 1,
│              "fact": "Alice Chen works at TechCorp as a senior software engineer.", "valid_at": "2026-01-31T13:35:08Z"}]}
│
├─► Step 5: Search for Existing Edges (Neo4j - Multiple Queries)
│   │
│   ├─► 5a: Direct Edge Lookup
│   │   Neo4j: MATCH (n:Entity {uuid: $source_node_uuid})-[e:RELATES_TO]->(m:Entity {uuid: $target_node_uuid}) RETURN e
│   │
│   ├─► 5b: BM25 Fulltext Search on Edges
│   │   Neo4j: CALL db.index.fulltext.queryRelationships("edge_name_and_fact", $query, {limit: $limit})
│   │          YIELD relationship AS rel, score MATCH (n:Entity)-[e:RELATES_TO {uuid: rel.uuid}]->(m:Entity)
│   │
│   └─► 5c: Cosine Similarity Search on Edges
│       Neo4j: MATCH (n:Entity)-[e:RELATES_TO]->(m:Entity) WHERE e.group_id IN $group_ids
│              WITH DISTINCT e, n, m, vector.similarity.cosine(e.fact_embedding, $search_vector) AS score
│
├─► Step 6: Deduplicate Edges (LLM Call - dedupe_edges.resolve_edge) [Only if existing edges found]
│   System: "You are a helpful assistant that de-duplicates facts from fact lists..."
│   User: "<EXISTING FACTS>[...]</EXISTING FACTS> <FACT INVALIDATION CANDIDATES>[...]</FACT INVALIDATION CANDIDATES> <NEW FACT>...</NEW FACT>"
│   Response: {"duplicate_facts": [], "contradicted_facts": [], "fact_type": "DEFAULT"}
│   Purpose: Detect duplicate or contradicting facts
│
├─► Step 7: Generate Entity Summaries (LLM Calls - PARALLEL - extract_nodes.extract_summary)
│   For EACH new or updated entity, run in parallel:
│   System: "You are a helpful assistant that extracts entity summaries from the provided text..."
│   User: "<MESSAGES>[conversation history]</MESSAGES> <ENTITY>{name, summary, entity_types, attributes}</ENTITY>"
│   Response: {"summary": "Alice Chen works at TechCorp as a senior software engineer."}
│
└─► Step 8: Write to Neo4j (Implicit - happens during steps above)
    Entities and edges are created/updated as they are processed.
    The EpisodicNode is also created to store the original message.
```
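
The cosine similarity searches in Steps 2b and 5c score every candidate against the search vector, keep those above `$min_score`, and sort by score. The sketch below reproduces that filter-and-sort in plain Python. It is an illustration only: the two-dimensional vectors are made up, `rank_candidates` is a hypothetical helper, and the `limit` argument is added for convenience.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(search_vector: list, candidates: dict, min_score: float, limit: int) -> list:
    """Keep candidates scoring above min_score, best first."""
    scored = [(name, cosine_similarity(vec, search_vector)) for name, vec in candidates.items()]
    kept = [(name, score) for name, score in scored if score > min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]


if __name__ == "__main__":
    # Toy embeddings, not real model output
    candidates = {
        "San Francisco": [1.0, 0.0],
        "Seattle": [0.0, 1.0],
        "SF Bay Area": [1.0, 1.0],
    }
    for name, score in rank_candidates([1.0, 0.0], candidates, min_score=0.5, limit=10):
        print(f"{name}: {score:.3f}")
```

The survivors of this filter are what the dedupe prompts in Steps 3 and 6 receive as existing entities or facts.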

### 3.2 search() Internal Flow

When you call `graphiti.search()`, the system uses **two search methods in parallel** for edge retrieval.

```
search(query="What does Alice work on?", group_ids=["demo_session_..."], num_results=10)
│
├─► Step 1: Generate Query Embedding (Local Embedder)
│   Embedder: sentence-transformers/all-MiniLM-L6-v2
│   encode("What does Alice work on?") → [0.12, -0.34, ...] (384 dimensions)
│
├─► Step 2: Execute Search Methods (Parallel)
│   │
│   ├─► 2a: BM25 Fulltext Search on Relationships
│   │   Neo4j: CALL db.index.fulltext.queryRelationships("edge_name_and_fact", $query, {limit: $limit})
│   │          YIELD relationship AS rel, score
│   │          MATCH (n:Entity)-[e:RELATES_TO {uuid: rel.uuid}]->(m:Entity)
│   │          WHERE e.group_id IN $group_ids
│   │          RETURN e, n, m ORDER BY score DESC
│   │
│   └─► 2b: Cosine Similarity Search on Relationships
│       Neo4j: MATCH (n:Entity)-[e:RELATES_TO]->(m:Entity)
│              WHERE e.group_id IN $group_ids
│              WITH DISTINCT e, n, m, vector.similarity.cosine(e.fact_embedding, $search_vector) AS score
│              WHERE score > $min_score
│              RETURN e, n, m ORDER BY score DESC LIMIT $limit
│
├─► Step 3: Combine and Deduplicate Results
│   Merge results from both search methods
│   Remove duplicates based on edge UUID
│   Sort by relevance score
│
└─► Step 4: Return Top Results
    Return: [EntityEdge(fact="Alice Chen is currently leading Project Phoenix.", ...),
             EntityEdge(fact="Project Phoenix has a deadline on February 15th.", ...),
             EntityEdge(fact="Alice Chen works at TechCorp as a senior software engineer.", ...)]
```

**Search Methods by Data Type** (from `graphiti_core/search/search.py`):
- **EntityEdge**: BM25 Fulltext + Cosine Similarity (+ optional BFS, LLM rerank)
- **EntityNode**: BM25 Fulltext + Cosine Similarity (+ optional BFS)
- **EpisodicNode**: BM25 Fulltext only
- **Community**: BM25 Fulltext + Cosine Similarity
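
Step 3 of the search flow merges two ranked lists whose scores are not comparable (BM25 scores versus cosine similarities). Reciprocal rank fusion (RRF), the reranker named in the default recipe, sidesteps this by using only each item's rank. The sketch below is the textbook formulation, not code from `graphiti_core`: `rrf_merge` is a hypothetical name, and the constant `k=60` is a conventional choice that the library may not share.

```python
from collections import defaultdict


def rrf_merge(ranked_lists: list, k: int = 60) -> list:
    """Fuse ranked lists of edge UUIDs; higher fused score comes first."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, uuid in enumerate(ranking, start=1):
            # An item gains more when it sits near the top of a list
            scores[uuid] += 1.0 / (k + rank)
    # Each UUID appears once in the output, so duplicates across lists collapse
    return sorted(scores, key=lambda u: scores[u], reverse=True)


if __name__ == "__main__":
    bm25_hits = ["e1", "e2", "e3"]    # toy UUIDs from fulltext search
    cosine_hits = ["e2", "e4", "e1"]  # toy UUIDs from vector search
    print(rrf_merge([bm25_hits, cosine_hits]))
```

Edges found by both methods accumulate two contributions, so they rise above edges found by only one.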

### 3.3 Why No BFS in This Demo?

You may notice that **BFS** is not used in this demo's search. This is **by configuration, not because BFS is unimplemented**.

**How Search Configuration Works:**

1. **BFS is fully implemented** in `graphiti_core/search/search_utils.py`:
   - `edge_bfs_search()` - BFS traversal to find edges from origin nodes
   - `node_bfs_search()` - BFS traversal to find nodes from origin nodes

2. **BFS is an optional search method** defined in `search_config.py`:
   ```python
   class EdgeSearchMethod(Enum):
       cosine_similarity = 'cosine_similarity'
       bm25 = 'bm25'
       bfs = 'breadth_first_search'  # Optional!
   ```

3. **`graphiti.search()` uses `EDGE_HYBRID_SEARCH_RRF` by default** (from `graphiti.py`):
   ```python
   # In graphiti.search():
   search_config = EDGE_HYBRID_SEARCH_RRF  # When center_node_uuid is None
   ```

4. **`EDGE_HYBRID_SEARCH_RRF` does NOT include BFS** (from `search_config_recipes.py`):
   ```python
   EDGE_HYBRID_SEARCH_RRF = SearchConfig(
       edge_config=EdgeSearchConfig(
           search_methods=[EdgeSearchMethod.bm25, EdgeSearchMethod.cosine_similarity],
           # Note: NO EdgeSearchMethod.bfs here!
           reranker=EdgeReranker.rrf,
       )
   )
   ```

**To Enable BFS**, use `search_()` with a config that includes BFS:

```python
from graphiti_core.search.search_config_recipes import EDGE_HYBRID_SEARCH_CROSS_ENCODER

# Option 1: Use pre-defined config with BFS
results = await graphiti.search_(
    query="What does Alice work on?",
    config=EDGE_HYBRID_SEARCH_CROSS_ENCODER,  # Includes BFS!
    group_ids=[group_id],
    bfs_origin_node_uuids=["alice_node_uuid"],  # Optional: specify start nodes
)

# Option 2: Custom config
from graphiti_core.search.search_config import EdgeSearchConfig, EdgeSearchMethod, SearchConfig, EdgeReranker

custom_config = SearchConfig(
    edge_config=EdgeSearchConfig(
        search_methods=[
            EdgeSearchMethod.bm25,
            EdgeSearchMethod.cosine_similarity,
            EdgeSearchMethod.bfs,  # Add BFS!
        ],
        reranker=EdgeReranker.rrf,
        bfs_max_depth=2,  # How deep to traverse
    )
)
```

**When to Use BFS:**
- When you have a known starting node (e.g., "Alice Chen") and want to explore related facts
- For graph traversal queries like "Find all facts connected to this entity"
- When semantic similarity alone may miss structurally related information
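
To show what a depth-limited traversal adds, here is a toy breadth-first search over an in-memory adjacency dict. It only illustrates the idea behind `bfs_max_depth`. The real implementation runs as Cypher inside Neo4j, and `bfs_facts` and the sample graph are hypothetical.

```python
from collections import deque


def bfs_facts(graph: dict, origins: list, max_depth: int) -> list:
    """Collect facts on edges reachable within max_depth hops of the origins.

    graph maps a node name to a list of (fact, neighbor) pairs.
    """
    visited = set(origins)
    queue = deque((node, 0) for node in origins)
    facts = []
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for fact, neighbor in graph.get(node, []):
            if fact not in facts:
                facts.append(fact)
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return facts


if __name__ == "__main__":
    # Toy graph loosely based on the demo conversation
    graph = {
        "Alice Chen": [
            ("Alice Chen works at TechCorp", "TechCorp"),
            ("Alice Chen leads Project Phoenix", "Project Phoenix"),
        ],
        "Project Phoenix": [
            ("Project Phoenix has a deadline on February 15th", "February 15th"),
        ],
    }
    print(bfs_facts(graph, ["Alice Chen"], max_depth=1))
    print(bfs_facts(graph, ["Alice Chen"], max_depth=2))
```

With a depth of 2 the deadline fact is returned even though it never mentions Alice, which is the kind of structurally related information that text and vector search can miss.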

**Available Search Config Recipes** (from `search_config_recipes.py`):

| Config Name | Search Methods | Reranker | BFS? |
|-------------|----------------|----------|------|
| `EDGE_HYBRID_SEARCH_RRF` | BM25 + Cosine | RRF | N |
| `EDGE_HYBRID_SEARCH_MMR` | BM25 + Cosine | MMR | N |
| `EDGE_HYBRID_SEARCH_CROSS_ENCODER` | BM25 + Cosine + **BFS** | Cross-Encoder | Y |
| `COMBINED_HYBRID_SEARCH_CROSS_ENCODER` | BM25 + Cosine + **BFS** | Cross-Encoder | Y |

## 4. Environment Setup

The following cells configure the same environment as `graphiti_neo4j_otel_demo.ipynb`.

In [None]:
# Cell 4.1: Imports and Basic Logging Setup
import os
import sys
import json
import logging
import asyncio
from datetime import datetime, timezone
from collections.abc import Iterable
from pathlib import Path

print(f"Python: {sys.version}")
print(f"Working dir: {os.getcwd()}")

Python: 3.12.12 (main, Jan 14 2026, 19:35:58) [Clang 21.1.4 ]
Working dir: /mnt/data-disk-1/home/cpii.local/ericlo/projects/zep-repos/zep-graphiti/examples/neo4j_otel


In [None]:
# Cell 4.2: Load Environment Variables
from dotenv import load_dotenv

load_dotenv()

# Neo4j Configuration
neo4j_uri = os.environ.get('NEO4J_URI', 'bolt://localhost:7687')
neo4j_user = os.environ.get('NEO4J_USER', 'neo4j')
neo4j_password = os.environ.get('NEO4J_PASSWORD', 'password')

# Local LLM Configuration
local_llm_enabled = os.environ.get('LOCAL_LLM_ENABLED', 'false').lower() == 'true'
local_llm_base_url = os.environ.get('LOCAL_LLM_BASE_URL', 'http://localhost:8000/v1')
local_llm_model = os.environ.get('LOCAL_LLM_MODEL', 'Qwen/Qwen2.5-32B-Instruct')
local_llm_api_key = os.environ.get('LOCAL_LLM_API_KEY', 'vllm')

# Embedding Configuration
embedding_provider = os.environ.get('EMBEDDING_PROVIDER', 'local')
local_embedding_model = os.environ.get('LOCAL_EMBEDDING_MODEL', 'all-MiniLM-L6-v2')

print(f'Neo4j URI: {neo4j_uri}')
print(f'Local LLM Enabled: {local_llm_enabled}')
if local_llm_enabled:
    print(f'  Base URL: {local_llm_base_url}')
    print(f'  Model: {local_llm_model}')
print(f'Embedding Provider: {embedding_provider}')

Neo4j URI: bolt://localhost:7687
Local LLM Enabled: True
  Base URL: http://localhost:8801/v1
  Model: Qwen/Qwen2.5-32B-Instruct
Embedding Provider: local


In [None]:
# Cell 4.3: Configure OpenTelemetry Tracing
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

def setup_otel_tracing():
    """Configure OpenTelemetry to output traces to console"""
    resource = Resource(attributes={
        'service.name': 'zep-agent-full-trace',
        'service.version': '1.0.0',
    })
    
    provider = TracerProvider(resource=resource)
    provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)
    
    return trace.get_tracer(__name__)

otel_tracer = setup_otel_tracing()
print("OpenTelemetry tracing configured")

OpenTelemetry tracing configured


In [None]:
# Cell 4.4: Define Local Embedder (sentence-transformers)
from graphiti_core.embedder.client import EmbedderClient
from sentence_transformers import SentenceTransformer

class SentenceTransformerEmbedder(EmbedderClient):
    """
    Local embedder using sentence-transformers.
    No API key required - runs entirely locally.
    """
    
    def __init__(self, model_name: str = 'all-MiniLM-L6-v2'):
        print(f"Loading sentence-transformers model: {model_name}")
        self.model = SentenceTransformer(model_name)
        self.embedding_dim = self.model.get_sentence_embedding_dimension()
        print(f"Model loaded. Embedding dimension: {self.embedding_dim}")
    
    async def create(
        self, input_data: str | list[str] | Iterable[int] | Iterable[Iterable[int]]
    ) -> list[float]:
        """Create embedding for input text."""
        if isinstance(input_data, str):
            text = input_data
        elif isinstance(input_data, list) and len(input_data) > 0:
            text = input_data[0] if isinstance(input_data[0], str) else str(input_data[0])
        else:
            text = str(input_data)
        
        loop = asyncio.get_running_loop()
        embedding = await loop.run_in_executor(
            None, 
            lambda: self.model.encode(text, convert_to_numpy=True).tolist()
        )
        return embedding
    
    async def create_batch(self, input_data_list: list[str]) -> list[list[float]]:
        """Create embeddings for a batch of texts."""
        loop = asyncio.get_running_loop()
        embeddings = await loop.run_in_executor(
            None,
            lambda: self.model.encode(input_data_list, convert_to_numpy=True).tolist()
        )
        return embeddings

print("SentenceTransformerEmbedder class defined")

SentenceTransformerEmbedder class defined


In [None]:
# Cell 4.5: Initialize Graphiti
#
# IMPORTANT: Make sure vLLM is running on port 8801:
# CUDA_VISIBLE_DEVICES=4,5 uv run vllm serve Qwen/Qwen2.5-32B-Instruct \
#     --port 8801 --api-key vllm --tensor-parallel-size 2 \
#     --max-model-len 16384 --enforce-eager --gpu-memory-utilization 0.85

from graphiti_core import Graphiti
from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.llm_client.openai_generic_client import OpenAIGenericClient
from graphiti_core.cross_encoder.openai_reranker_client import OpenAIRerankerClient
from openai import AsyncOpenAI

# Timeout Configuration
# NOTE: add_episode() makes multiple LLM calls (entity extraction, resolution, etc.)
# Each call can take 30-60+ seconds with local vLLM, so we need a generous timeout.
LLM_TIMEOUT_SECONDS = 600  # 10 minutes per LLM call

# Initialize Local Embedder
print(f'Initializing local embedder with model: {local_embedding_model}')
embedder = SentenceTransformerEmbedder(model_name=local_embedding_model)

# Initialize LLM Client
if not local_llm_enabled:
    raise ValueError('LOCAL_LLM_ENABLED must be true in .env')

print(f'Using Local LLM at {local_llm_base_url}')
print(f'Model: {local_llm_model}')
print(f'Timeout: {LLM_TIMEOUT_SECONDS}s')

llm_config = LLMConfig(
    api_key=local_llm_api_key,
    model=local_llm_model,
    small_model=local_llm_model,
    base_url=local_llm_base_url,
)

custom_openai_client = AsyncOpenAI(
    api_key=local_llm_api_key,
    base_url=local_llm_base_url,
    timeout=LLM_TIMEOUT_SECONDS,
)

llm_client = OpenAIGenericClient(
    config=llm_config,
    client=custom_openai_client,
    max_tokens=4096
)
cross_encoder = OpenAIRerankerClient(client=llm_client, config=llm_config)

# Initialize Graphiti
graphiti = Graphiti(
    uri=neo4j_uri,
    user=neo4j_user,
    password=neo4j_password,
    llm_client=llm_client,
    embedder=embedder,
    cross_encoder=cross_encoder,
    tracer=otel_tracer,
    trace_span_prefix='zep.graphiti',
)

print(f'Graphiti initialized, connected to {neo4j_uri}')

Initializing local embedder with model: all-MiniLM-L6-v2
Loading sentence-transformers model: all-MiniLM-L6-v2
Model loaded. Embedding dimension: 384
Using Local LLM at http://localhost:8801/v1
Model: Qwen/Qwen2.5-32B-Instruct
Timeout: 600s
Graphiti initialized, connected to bolt://localhost:7687


In [None]:
# Cell 4.6: Build Indices and Constraints
await graphiti.build_indices_and_constraints()
print("Indices and constraints built successfully")

Indices and constraints built successfully


## 5. Trace System Implementation

We implement a dual-output trace system:
- **Raw JSON** → `trace_raw.jsonl` (for debugging)
- **Pretty Print** → stdout (for reading)
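
Because the raw file holds one JSON object per line, it can be inspected after a run with a few lines of standard-library code. The sketch below counts records by the `level` field that the logger in the next cell writes on each entry. `summarize_trace` is a hypothetical helper and is not part of the notebook.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_trace(path: Path) -> Counter:
    """Count JSONL trace records by their 'level' field."""
    counts = Counter()
    with path.open(encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            counts[entry.get('level', 'UNKNOWN')] += 1
    return counts


if __name__ == "__main__":
    trace_path = Path('trace_raw.jsonl')
    if trace_path.exists():
        for level, count in summarize_trace(trace_path).most_common():
            print(f"{level}: {count}")
```

A per-level count is a quick sanity check that a run actually produced the LLM and Neo4j records you expect.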

In [None]:
# Cell 5.1: TraceLogger - Dual Output Trace System

class TraceLogger:
    """
    Captures and formats trace information at all levels.
    
    Output:
    - Raw JSON to file (for debugging)
    - Pretty-printed text to stdout (for reading)
    """
    
    def __init__(self, log_file: str = 'trace_raw.jsonl'):
        self.log_file = Path(log_file)
        self.raw_log = open(self.log_file, 'w')
        self.indent_level = 0
        print(f"TraceLogger initialized. Raw logs: {self.log_file.absolute()}")
    
    def _write_raw(self, entry: dict):
        """Write raw JSON to log file"""
        entry['timestamp'] = datetime.now(timezone.utc).isoformat()
        # default=str keeps non-JSON values (datetime, UUID, ...) from raising TypeError
        self.raw_log.write(json.dumps(entry, default=str) + '\n')
        self.raw_log.flush()
    
    def _indent(self) -> str:
        return '   ' * self.indent_level
    
    def log_section(self, title: str):
        """Print a section header"""
        print(f"\n{'='*70}")
        print(f"  {title}")
        print(f"{'='*70}")
        self._write_raw({'level': 'SECTION', 'title': title})
    
    def log_api_call(self, method: str, params: dict):
        """Log a Zep-equivalent API call"""
        self._write_raw({'level': 'API', 'method': method, 'params': params})
        print(f"\n{self._indent()} ZEP API: {method}()")
        for key, value in params.items():
            if isinstance(value, str) and len(value) > 100:
                value = value[:100] + '...'
            print(f"{self._indent()}   {key}: {value}")
    
    def log_graphiti_call(self, method: str, params: dict = None):
        """Log a graphiti_core function call"""
        self._write_raw({'level': 'GRAPHITI', 'method': method, 'params': params})
        print(f"{self._indent()} GRAPHITI: {method}()")
        self.indent_level += 1
    
    def log_graphiti_step(self, step_num: int, description: str):
        """Log a step within a graphiti operation"""
        self._write_raw({'level': 'GRAPHITI_STEP', 'step': step_num, 'description': description})
        print(f"{self._indent()}├─ Step {step_num}: {description}")
    
    def log_graphiti_end(self, duration_ms: float | None = None, result_summary: str | None = None):
        """End a graphiti operation"""
        self.indent_level = max(0, self.indent_level - 1)
        # Compare against None so a measured duration of 0.0 ms is still printed
        if duration_ms is not None:
            print(f"{self._indent()}└─ Duration: {duration_ms:.1f}ms")
        if result_summary:
            print(f"{self._indent()}   Result: {result_summary}")
        self._write_raw({'level': 'GRAPHITI_END', 'duration_ms': duration_ms, 'result': result_summary})
    
    def log_llm_call(self, purpose: str, prompt_preview: str = None):
        """Log an LLM call"""
        self._write_raw({'level': 'LLM_CALL', 'purpose': purpose, 'prompt_preview': prompt_preview})
        print(f"{self._indent()} LLM Call: {purpose}")
        if prompt_preview:
            preview = prompt_preview[:200] + '...' if len(prompt_preview) > 200 else prompt_preview
            print(f"{self._indent()}   Prompt: {preview}")
    
    def log_llm_response(self, response_preview: str, tokens: dict = None):
        """Log an LLM response"""
        self._write_raw({'level': 'LLM_RESPONSE', 'response_preview': response_preview, 'tokens': tokens})
        preview = response_preview[:300] + '...' if len(response_preview) > 300 else response_preview
        print(f"{self._indent()}   Response: {preview}")
        if tokens:
            print(f"{self._indent()}   Tokens: input={tokens.get('input', '?')}, output={tokens.get('output', '?')}")
    
    def log_neo4j_query(self, query: str, params: dict = None):
        """Log a Neo4j Cypher query"""
        self._write_raw({'level': 'NEO4J', 'query': query, 'params': params})
        # Clean up query for display
        query_oneline = ' '.join(query.split())
        if len(query_oneline) > 150:
            query_oneline = query_oneline[:150] + '...'
        print(f"{self._indent()} Neo4j: {query_oneline}")
    
    def log_result(self, description: str, data: object = None):
        """Log a result"""
        self._write_raw({'level': 'RESULT', 'description': description, 'data': str(data) if data else None})
        print(f"{self._indent()} {description}")
        if data:
            data_str = str(data)
            if len(data_str) > 200:
                data_str = data_str[:200] + '...'
            print(f"{self._indent()}   Data: {data_str}")
    
    def close(self):
        """Close the log file"""
        self.raw_log.close()
        print(f"\nTrace log saved to: {self.log_file.absolute()}")

# Initialize the trace logger
trace_logger = TraceLogger('trace_raw.jsonl')

TraceLogger initialized. Raw logs: /mnt/data-disk-1/home/cpii.local/ericlo/projects/zep-repos/zep-graphiti/examples/neo4j_otel/trace_raw.jsonl


In [None]:
# Cell 5.2: Configure Detailed Logging for Neo4j and HTTP
#
# This captures the actual queries and HTTP traffic

import logging

# Create a custom handler that also writes to our trace logger
class TraceLoggingHandler(logging.Handler):
    def __init__(self, trace_logger: TraceLogger):
        super().__init__()
        self.trace_logger = trace_logger
    
    def emit(self, record):
        msg = self.format(record)
        # Capture Neo4j queries
        if 'neo4j' in record.name.lower() and 'query' in msg.lower():
            self.trace_logger._write_raw({'level': 'NEO4J_LOG', 'message': msg})
        # Capture HTTP requests to vLLM
        elif 'httpx' in record.name.lower() or 'httpcore' in record.name.lower():
            self.trace_logger._write_raw({'level': 'HTTP_LOG', 'message': msg})

# Set up logging levels
logging.basicConfig(
    level=logging.WARNING,
    format='%(asctime)s | %(name)s | %(levelname)s | %(message)s',
    datefmt='%H:%M:%S',
)

# Enable DEBUG for graphiti and neo4j to see internal operations
logging.getLogger('graphiti_core').setLevel(logging.DEBUG)
logging.getLogger('neo4j').setLevel(logging.DEBUG)

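# Add our custom handler. A handler only sees records from the logger it is
# attached to (and that logger's children), so httpx/httpcore need it too for
# the HTTP branch in emit() to fire. Those loggers stay at the root WARNING
# level here; lower their level if request-level HTTP logs are wanted.
trace_handler = TraceLoggingHandler(trace_logger)
for logger_name in ('graphiti_core', 'neo4j', 'httpx', 'httpcore'):
    logging.getLogger(logger_name).addHandler(trace_handler)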

print("Detailed logging configured")

Detailed logging configured
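
A `logging.Handler` only receives records from the logger it is attached to and from that logger's descendants, so the name checks inside `emit` only matter for loggers that actually carry the handler. The sketch below shows this with a hypothetical `CollectingHandler` standing in for `TraceLoggingHandler`; the log messages are made up for illustration.

```python
import logging

class CollectingHandler(logging.Handler):
    """Hypothetical stand-in for TraceLoggingHandler: stores (logger name, message)."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def emit(self, record):
        self.seen.append((record.name, record.getMessage()))

handler = CollectingHandler()

# Attach the handler to the 'neo4j' parent logger only.
neo4j_logger = logging.getLogger('neo4j')
neo4j_logger.setLevel(logging.DEBUG)
neo4j_logger.addHandler(handler)

# A child logger propagates its records up to 'neo4j', so the handler sees them.
logging.getLogger('neo4j.io').debug('C: RUN query')

# 'httpx' is not a descendant of 'neo4j' and has no handler of its own,
# so this record never reaches `handler`.
httpx_logger = logging.getLogger('httpx')
httpx_logger.setLevel(logging.INFO)
httpx_logger.info('HTTP Request: POST /v1/chat/completions')

print(handler.seen)
```

Only the `neo4j.io` record is collected; the `httpx` record is dropped because nothing on its logger hierarchy forwards it to the handler.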


### 5.3 Deep Trace Hooks (LLM + Neo4j + OTEL)

This cell implements **non-invasive** hooks to capture:
1. **LLM calls**: Full prompt and response via httpx event hooks
2. **Neo4j queries**: Query text, parameters, and results via driver wrapper
3. **OTEL spans**: All internal operation spans via custom SpanProcessor

In [None]:
# Cell 5.3: Deep Trace Hooks
#
# These hooks capture complete information WITHOUT modifying graphiti_core source code.
# They work by:
# 1. Wrapping the httpx client to intercept LLM requests/responses
# 2. Wrapping the Neo4j driver's execute_query method
# 3. Using a custom OTEL SpanProcessor to capture all spans

import json
import time
from functools import wraps

import httpx
from opentelemetry.sdk.trace import ReadableSpan, SpanProcessor

# ============================================================================
# 1. LLM Request/Response Hook via httpx
# ============================================================================

class LLMTraceHook:
    """
    Captures complete LLM request/response via httpx event hooks.
    This is non-invasive - we just add hooks to the existing client.
    """
    
    def __init__(self, trace_logger: TraceLogger):
        self.trace = trace_logger
        self.call_count = 0
    
    async def log_request(self, request: httpx.Request):
        """Called before each HTTP request"""
        # Only log LLM API calls (to vLLM), and count only those so that
        # "LLM Call #N" is not inflated by other HTTP traffic.
        if '/v1/chat/completions' in str(request.url):
            self.call_count += 1
            try:
                body = json.loads(request.content.decode('utf-8'))
                messages = body.get('messages', [])
                
                # Log to file (full content)
                self.trace._write_raw({
                    'level': 'LLM_REQUEST',
                    'call_number': self.call_count,
                    'model': body.get('model'),
                    'messages': messages,
                    'temperature': body.get('temperature'),
                    'max_tokens': body.get('max_tokens'),
                })
                
                # Pretty print (truncated)
                print(f"\n{self.trace._indent()}LLM Call #{self.call_count}")
                print(f"{self.trace._indent()}   Model: {body.get('model')}")
                for msg in messages:
                    role = msg.get('role', 'unknown')
                    # content can be None or a non-string; coerce before slicing
                    content = str(msg.get('content') or '')
                    preview = content[:200] + ('...' if len(content) > 200 else '')
                    print(f"{self.trace._indent()}   [{role}]: {preview}")
            except Exception as e:
                print(f"Error parsing LLM request: {e}")
    
    async def log_response(self, response: httpx.Response):
        """Called after each HTTP response"""
        if '/v1/chat/completions' in str(response.url):
            try:
                # Need to read the response body
                await response.aread()
                body = json.loads(response.content.decode('utf-8'))
                
                choices = body.get('choices', [])
                usage = body.get('usage', {})
                
                content = ''
                if choices:
                    # message.content can be null in the JSON; fall back to ''
                    content = choices[0].get('message', {}).get('content') or ''
                
                # Log to file (full content)
                self.trace._write_raw({
                    'level': 'LLM_RESPONSE',
                    'call_number': self.call_count,
                    'content': content,
                    'usage': usage,
                    'finish_reason': choices[0].get('finish_reason') if choices else None,
                })
                
                # Pretty print (truncated)
                content_preview = content[:300] + '...' if len(content) > 300 else content
                print(f"{self.trace._indent()}   Response: {content_preview}")
                print(f"{self.trace._indent()}   Tokens: prompt={usage.get('prompt_tokens', '?')}, completion={usage.get('completion_tokens', '?')}")
            except Exception as e:
                print(f"Error parsing LLM response: {e}")

# ============================================================================
# 2. Neo4j Query Hook via method wrapper
# ============================================================================

class Neo4jTraceHook:
    """
    Wraps Neo4j driver's execute_query to capture queries and results.
    Non-invasive - we just wrap the existing method.
    """
    
    def __init__(self, trace_logger: TraceLogger):
        self.trace = trace_logger
        self.query_count = 0
    
    def wrap_driver(self, driver):
        """Wrap the driver's execute_query method"""
        original_execute = driver.execute_query
        
        @wraps(original_execute)
        async def traced_execute(cypher_query, **kwargs):
            self.query_count += 1
            params = kwargs.get('params', {})
            
            # Log query start
            self.trace._write_raw({
                'level': 'NEO4J_QUERY',
                'query_number': self.query_count,
                'query': cypher_query,
                'params': {k: str(v)[:200] for k, v in params.items()} if params else {},
            })
            
            # Pretty print query (collapse whitespace first, then truncate)
            query_oneline = ' '.join(cypher_query.split())
            query_preview = query_oneline[:150] + ('...' if len(query_oneline) > 150 else '')
            print(f"{self.trace._indent()} Neo4j Query #{self.query_count}: {query_preview}")
            
            # Execute original
            start_time = time.time()
            result = await original_execute(cypher_query, **kwargs)
            duration_ms = (time.time() - start_time) * 1000
            
            # Log result
            records = result.records if hasattr(result, 'records') else []
            record_count = len(records)
            
            # Serialize records for logging (first 5 only)
            records_preview = []
            for r in records[:5]:
                try:
                    records_preview.append(dict(r))
                except Exception:
                    records_preview.append(str(r))
            
            self.trace._write_raw({
                'level': 'NEO4J_RESULT',
                'query_number': self.query_count,
                'duration_ms': duration_ms,
                'record_count': record_count,
                'records_preview': str(records_preview)[:1000],
            })
            
            print(f"{self.trace._indent()}   → {record_count} records, {duration_ms:.1f}ms")
            
            return result
        
        driver.execute_query = traced_execute
        return driver

# ============================================================================
# 3. OTEL Span Processor for internal operations
# ============================================================================

class TraceSpanProcessor(SpanProcessor):
    """
    Custom SpanProcessor that logs all spans to our trace logger.
    This captures internal graphiti operations like entity extraction, etc.
    """
    
    def __init__(self, trace_logger: TraceLogger):
        self.trace = trace_logger
    
    def on_start(self, span: ReadableSpan, parent_context=None):
        """Called when a span starts"""
        span_name = span.name if hasattr(span, 'name') else str(span)
        self.trace._write_raw({
            'level': 'OTEL_SPAN_START',
            'span_name': span_name,
        })
        # Only print for graphiti spans (not too verbose)
        if 'graphiti' in span_name.lower() or 'zep' in span_name.lower():
            print(f"{self.trace._indent()}Span Start: {span_name}")
    
    def on_end(self, span: ReadableSpan):
        """Called when a span ends"""
        span_name = span.name
        duration_ns = span.end_time - span.start_time if span.end_time and span.start_time else 0
        duration_ms = duration_ns / 1_000_000
        
        # Get span attributes
        attributes = dict(span.attributes) if span.attributes else {}
        
        self.trace._write_raw({
            'level': 'OTEL_SPAN_END',
            'span_name': span_name,
            'duration_ms': duration_ms,
            'attributes': {k: str(v) for k, v in attributes.items()},
            'status': str(span.status) if hasattr(span, 'status') else None,
        })
        
        if 'graphiti' in span_name.lower() or 'zep' in span_name.lower():
            print(f"{self.trace._indent()}Span End: {span_name} ({duration_ms:.1f}ms)")
    
    def shutdown(self):
        pass
    
    def force_flush(self, timeout_millis: int = 30000) -> bool:
        # Nothing is buffered, so flushing always succeeds
        return True

print("Deep trace hooks defined")

Deep trace hooks defined
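
The wrapping technique used by `Neo4jTraceHook` can be tried in isolation, without a database. The sketch below is a minimal illustration under stated assumptions: `StubDriver` and `wrap_execute_query` are hypothetical names, and the stub returns canned rows instead of a real Neo4j result.

```python
import asyncio
from functools import wraps

class StubDriver:
    """Hypothetical stand-in for a graph driver; returns canned rows."""

    async def execute_query(self, cypher_query, **kwargs):
        return [{'name': 'Alice Chen'}]

def wrap_execute_query(driver, log):
    """Replace driver.execute_query with a wrapper that records each call."""
    original = driver.execute_query  # bound method, captured before patching

    @wraps(original)
    async def traced(cypher_query, **kwargs):
        entry = {
            'query': ' '.join(cypher_query.split()),  # collapse whitespace
            'kwargs': sorted(kwargs),                 # keyword names only
        }
        log.append(entry)
        result = await original(cypher_query, **kwargs)
        entry['rows'] = len(result)
        return result

    # The instance attribute shadows the class method; the class is untouched.
    driver.execute_query = traced
    return driver

calls = []
driver = wrap_execute_query(StubDriver(), calls)
# In a notebook cell, use `await driver.execute_query(...)` instead of asyncio.run.
rows = asyncio.run(driver.execute_query('MATCH (n)\n  RETURN n', params={'limit': 5}))
print(calls)
```

Because only the instance attribute is replaced, other driver instances keep the unwrapped method, and the caller still receives the original return value.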


In [None]:
# Cell 5.4: Apply Deep Trace Hooks
#
# This cell applies the hooks to the existing graphiti instance.
# Run this AFTER Cell 4.5 (Graphiti initialization), Cell 5.1 (TraceLogger), and Cell 5.3 (hook definitions).

# 1. Apply Neo4j trace hook
neo4j_hook = Neo4jTraceHook(trace_logger)
neo4j_hook.wrap_driver(graphiti.driver)
print(f"Neo4j trace hook applied to driver")

# 2. Apply LLM trace hook via httpx event hooks
llm_hook = LLMTraceHook(trace_logger)

# The OpenAI client uses httpx internally, but AsyncOpenAI doesn't expose that
# httpx client directly. Instead, we create a new httpx client with event hooks
# and hand it to a new AsyncOpenAI client.

import httpx

# Create httpx client with event hooks
traced_http_client = httpx.AsyncClient(
    timeout=httpx.Timeout(LLM_TIMEOUT_SECONDS),
    event_hooks={
        'request': [llm_hook.log_request],
        'response': [llm_hook.log_response],
    }
)

# Create new AsyncOpenAI client with traced http client
traced_openai_client = AsyncOpenAI(
    api_key=local_llm_api_key,
    base_url=local_llm_base_url,
    timeout=LLM_TIMEOUT_SECONDS,
    http_client=traced_http_client,
)

# Update the LLM client to use the traced OpenAI client
llm_client.client = traced_openai_client
print(f"LLM trace hook applied via httpx event hooks")

# 3. Add custom span processor to OTEL
from opentelemetry import trace as otel_trace
provider = otel_trace.get_tracer_provider()
if hasattr(provider, 'add_span_processor'):
    trace_span_processor = TraceSpanProcessor(trace_logger)
    provider.add_span_processor(trace_span_processor)
    print(f"OTEL span processor added")
else:
    print(f"WARNING: Could not add OTEL span processor (provider type: {type(provider)})")

print("\nAll deep trace hooks applied! Ready for full observability.")

Neo4j trace hook applied to driver
LLM trace hook applied via httpx event hooks
OTEL span processor added

All deep trace hooks applied! Ready for full observability.


## 6. ZepSimulator - Simulating Zep Cloud API

This class mimics the Zep Python SDK interface using graphiti_core directly.

In [None]:
# Cell 6.1: ZepSimulator Class

from graphiti_core.nodes import EpisodeType
import time

class ZepSimulator:
    """
    Simulates Zep Cloud API behavior using graphiti_core directly.
    
    This provides the same interface patterns as the Zep Python SDK,
    allowing us to see exactly what happens internally.
    
    Mapping:
    - add_message() → zep_client.thread.add_messages() → graphiti.add_episode()
    - get_user_context() → zep_client.thread.get_user_context() → graphiti.search()
    - graph_search() → zep_client.graph.search() → graphiti.search()
    """
    
    def __init__(self, graphiti_client: Graphiti, trace_logger: TraceLogger):
        self.graphiti = graphiti_client
        self.trace = trace_logger
    
    async def add_message(
        self,
        group_id: str,
        role: str,
        name: str,
        content: str,
    ) -> dict:
        """
        Equivalent to: zep_client.thread.add_messages()
        
        This stores a message in the knowledge graph by:
        1. Extracting entities from the message (LLM call)
        2. Resolving entity duplicates (LLM call)
        3. Extracting relationships (LLM call)
        4. Creating nodes and edges in Neo4j
        """
        self.trace.log_api_call('add_message (thread.add_messages)', {
            'group_id': group_id,
            'role': role,
            'name': name,
            'content': content,
        })
        
        # Format the message the way Zep does: "{role}({role_type}): {content}".
        # Here `name` fills the role slot and `role` fills the role_type slot.
        episode_body = f"{name}({role}): {content}"
        
        self.trace.log_graphiti_call('add_episode', {
            'group_id': group_id,
            'episode_body': episode_body,
        })
        
        start_time = time.time()
        
        # This is the actual graphiti call
        # NOTE: Do NOT pass uuid - let graphiti auto-generate it for new episodes.
        # If you pass a uuid, graphiti will try to fetch an existing episode with that uuid.
        result = await self.graphiti.add_episode(
            name=name,
            episode_body=episode_body,
            source_description='Agent conversation message',
            reference_time=datetime.now(timezone.utc),
            source=EpisodeType.message,
            group_id=group_id,
            # uuid is auto-generated by graphiti
        )
        
        duration_ms = (time.time() - start_time) * 1000
        
        # Get the uuid from the result
        episode_uuid = result.episode.uuid
        
        # Log the result
        self.trace.log_graphiti_end(
            duration_ms=duration_ms,
            result_summary=f"Episode {episode_uuid[:8]}... created"
        )
        
        return {
            'uuid': episode_uuid,
            'duration_ms': duration_ms,
            'result': result,
        }
    
    async def get_user_context(
        self,
        group_id: str,
        query: str,
        max_facts: int = 10,
    ) -> list[str]:
        """
        Equivalent to: zep_client.thread.get_user_context()
        
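        This retrieves relevant facts from the knowledge graph by:
        1. Generating an embedding for the query
        2. Running a full-text search and a vector similarity search over edges in Neo4j
        3. Merging and reranking the two result lists (no LLM call: the Turn 4
           trace shows 0 LLM calls and 2 Neo4j queries)
        4. Returning top facts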
        """
        self.trace.log_api_call('get_user_context (thread.get_user_context)', {
            'group_id': group_id,
            'query': query,
            'max_facts': max_facts,
        })
        
        self.trace.log_graphiti_call('search', {
            'group_ids': [group_id],
            'query': query,
            'num_results': max_facts,
        })
        
        start_time = time.time()
        
        # This is the actual graphiti call
        results = await self.graphiti.search(
            group_ids=[group_id],
            query=query,
            num_results=max_facts,
        )
        
        duration_ms = (time.time() - start_time) * 1000
        
        # Extract facts from results
        facts = [edge.fact for edge in results]
        
        self.trace.log_graphiti_end(
            duration_ms=duration_ms,
            result_summary=f"Found {len(facts)} facts"
        )
        
        # Log each fact
        for i, fact in enumerate(facts):
            self.trace.log_result(f"Fact {i+1}", fact)
        
        return facts
    
    async def graph_search(
        self,
        group_ids: list[str],
        query: str,
        max_facts: int = 10,
    ) -> list:
        """
        Equivalent to: zep_client.graph.search()
        
        Same as get_user_context, but it can search across multiple groups
        and returns the raw edge results instead of fact strings.
        """
        self.trace.log_api_call('graph_search (graph.search)', {
            'group_ids': group_ids,
            'query': query,
            'max_facts': max_facts,
        })
        
        self.trace.log_graphiti_call('search', {
            'group_ids': group_ids,
            'query': query,
            'num_results': max_facts,
        })
        
        start_time = time.time()
        
        results = await self.graphiti.search(
            group_ids=group_ids,
            query=query,
            num_results=max_facts,
        )
        
        duration_ms = (time.time() - start_time) * 1000
        
        self.trace.log_graphiti_end(
            duration_ms=duration_ms,
            result_summary=f"Found {len(results)} edges"
        )
        
        return results

# Initialize the simulator
zep = ZepSimulator(graphiti, trace_logger)
print("ZepSimulator initialized")

ZepSimulator initialized
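
The search path behind `get_user_context` produces more than one ranked list of edges (a full-text ranking and a vector-similarity ranking), and these have to be merged into a single ordering. One common way to do that is reciprocal rank fusion (RRF). The sketch below assumes RRF with the conventional constant `k=60`; whether graphiti uses this exact formula or constant is an assumption, and the edge ids are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of ids; each id scores sum(1 / (k + rank)) over the lists."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=lambda item: scores[item], reverse=True)

# Hypothetical edge ids, ordered best-first by each search method
fulltext_hits = ['leads_phoenix', 'works_at_techcorp', 'phoenix_deadline']
vector_hits = ['works_at_techcorp', 'phoenix_deadline']

merged = reciprocal_rank_fusion([fulltext_hits, vector_hits])
print(merged)
```

An edge that appears in both lists accumulates two contributions, so it can outrank an edge that is first in only one list.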


## 7. Agent Conversation Demo

This demonstrates a real-world conversation that shows:
- **Adding nodes**: User introduces themselves
- **Adding relationships**: User mentions their work
- **Updating**: User provides more information
- **Searching**: Agent retrieves context

The demo follows the `agent_memory_full_example` pattern:

- Turn 1: User introduces themselves → Creates Alice Chen, TechCorp entities
- Turn 2: User mentions project → Creates Project Phoenix, LEADS relationship
- Turn 3: User provides deadline → Updates with deadline info
- Turn 4: Search test → Retrieves context about Alice

In [None]:
# Cell 7.1: Define the Conversation

# Use a unique group_id for this demo session
GROUP_ID = f"demo_session_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
USER_NAME = "Alice Chen"

# The conversation turns
CONVERSATION = [
    {
        "turn": 1,
        "description": "User introduces themselves - creates new entities",
        "user_message": "Hi, I'm Alice Chen. I work at TechCorp as a senior software engineer.",
        "expected_entities": ["Alice Chen (Person)", "TechCorp (Organization)"],
        "expected_relationships": ["Alice Chen WORKS_AT TechCorp"],
    },
    {
        "turn": 2,
        "description": "User mentions a project - adds more entities",
        "user_message": "I'm currently leading Project Phoenix, which is a cloud migration initiative.",
        "expected_entities": ["Project Phoenix (Project)"],
        "expected_relationships": ["Alice Chen LEADS Project Phoenix"],
    },
    {
        "turn": 3,
        "description": "User provides deadline - updates existing entity",
        "user_message": "The project deadline is February 15th, and we have 3 team members.",
        "expected_entities": [],
        "expected_relationships": ["Project Phoenix HAS_DEADLINE February 15th"],
    },
    {
        "turn": 4,
        "description": "Search test - retrieve context about Alice",
        "search_query": "What does Alice work on?",
        "expected_facts": ["Alice works at TechCorp", "Alice leads Project Phoenix"],
    },
]

print(f"Demo Group ID: {GROUP_ID}")
print(f"User: {USER_NAME}")
print(f"Conversation turns: {len(CONVERSATION)}")

Demo Group ID: demo_session_20260131_213508
User: Alice Chen
Conversation turns: 4


In [None]:
# Cell 7.2: Run Turn 1 - User Introduction

turn = CONVERSATION[0]
trace_logger.log_section(f"TURN {turn['turn']}: {turn['description']}")

print(f"\nUser: {turn['user_message']}")
print(f"\nExpected entities: {turn['expected_entities']}")
print(f"Expected relationships: {turn['expected_relationships']}")
print("\n" + "-"*70)

# Add the user message
result = await zep.add_message(
    group_id=GROUP_ID,
    role="user",
    name=USER_NAME,
    content=turn['user_message'],
)

print(f"\nOK: Turn 1 complete. Duration: {result['duration_ms']:.1f}ms")

In [None]:
# Cell 7.3: Run Turn 2 - Project Information

turn = CONVERSATION[1]
trace_logger.log_section(f"TURN {turn['turn']}: {turn['description']}")

print(f"\nUser: {turn['user_message']}")
print(f"\nExpected entities: {turn['expected_entities']}")
print(f"Expected relationships: {turn['expected_relationships']}")
print("\n" + "-"*70)

result = await zep.add_message(
    group_id=GROUP_ID,
    role="user",
    name=USER_NAME,
    content=turn['user_message'],
)

print(f"\nOK: Turn 2 complete. Duration: {result['duration_ms']:.1f}ms")

In [None]:
# Cell 7.4: Run Turn 3 - Deadline Update

turn = CONVERSATION[2]
trace_logger.log_section(f"TURN {turn['turn']}: {turn['description']}")

print(f"\nUser: {turn['user_message']}")
print(f"\nExpected relationships: {turn['expected_relationships']}")
print("\n" + "-"*70)

result = await zep.add_message(
    group_id=GROUP_ID,
    role="user",
    name=USER_NAME,
    content=turn['user_message'],
)

print(f"\nOK: Turn 3 complete. Duration: {result['duration_ms']:.1f}ms")

In [None]:
# Cell 7.5: Run Turn 4 - Search Test

turn = CONVERSATION[3]
trace_logger.log_section(f"TURN {turn['turn']}: {turn['description']}")

print(f"\nSearch Query: {turn['search_query']}")
print(f"\nExpected facts: {turn['expected_facts']}")
print("\n" + "-"*70)

# Search for context
facts = await zep.get_user_context(
    group_id=GROUP_ID,
    query=turn['search_query'],
    max_facts=10,
)

print(f"\nOK: Turn 4 complete. Found {len(facts)} facts.")
print("\nRetrieved Facts:")
for i, fact in enumerate(facts):
    print(f"   {i+1}. {fact}")

# demo output
# ======================================================================
#   TURN 4: Search test - retrieve context about Alice
# ======================================================================

# Search Query: What does Alice work on?

# Expected facts: ['Alice works at TechCorp', 'Alice leads Project Phoenix']

# ----------------------------------------------------------------------

#  ZEP API: get_user_context (thread.get_user_context)()
#    group_id: demo_session_20260131_213508
#    query: What does Alice work on?
#    max_facts: 10
#  GRAPHITI: search()
#     Neo4j Query #36: CALL db.index.fulltext.queryRelationships("edge_name_and_fact", $query, {limit: $limit}) YIELD relationship AS rel, score MATCH (n:Entity)-[e:RELATES_...
#     Neo4j Query #37: MATCH (n:Entity)-[e:RELATES_TO]->(m:Entity) WHERE e.group_id IN $group_ids WITH DISTINCT e, n, m, vector.similarity.cosine(e.fact_embedding, $search_v...
#       → 3 records, 67.1ms
#       → 2 records, 60.2ms
# └─ Duration: 93.2ms
#    Result: Found 3 facts
#  Fact 1
#    Data: Alice Chen is currently leading Project Phoenix.
#  Fact 2
# ...
# Retrieved Facts:
#    1. Alice Chen is currently leading Project Phoenix.
#    2. Project Phoenix has a deadline on February 15th.
#    3. Alice Chen works at TechCorp as a senior software engineer.
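
The cell prints the expected and retrieved facts side by side but does not compare them. A rough automated check is sketched below. It assumes that token overlap is a good enough proxy for "the same fact"; `is_covered` and the `0.6` threshold are hypothetical choices, not part of graphiti or Zep.

```python
import re

def tokens(text):
    """Lowercase alphanumeric word tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def is_covered(expected, retrieved_facts, threshold=0.6):
    """True if some retrieved fact contains at least `threshold` of the expected tokens."""
    want = tokens(expected)
    if not want:
        return True  # nothing to look for
    return any(len(want & tokens(fact)) / len(want) >= threshold
               for fact in retrieved_facts)

expected_facts = ['Alice works at TechCorp', 'Alice leads Project Phoenix']
retrieved = [
    'Alice Chen is currently leading Project Phoenix.',
    'Project Phoenix has a deadline on February 15th.',
    'Alice Chen works at TechCorp as a senior software engineer.',
]

coverage = {fact: is_covered(fact, retrieved) for fact in expected_facts}
print(coverage)
```

The second expected fact matches on three of its four tokens ("leads" versus "leading" does not match), which is why the threshold sits below 1.0.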

## 8. Additional Search Tests

In [None]:
# Cell 8.1: Search for project deadline

trace_logger.log_section("ADDITIONAL SEARCH: Project Deadline")

query = "When is the project deadline?"
print(f"\nSearch Query: {query}")
print("\n" + "-"*70)

facts = await zep.get_user_context(
    group_id=GROUP_ID,
    query=query,
    max_facts=5,
)

print(f"\nRetrieved Facts:")
for i, fact in enumerate(facts):
    print(f"   {i+1}. {fact}")

# demo output
# ======================================================================
#   ADDITIONAL SEARCH: Project Deadline
# ======================================================================

# Search Query: When is the project deadline?

# ----------------------------------------------------------------------

#  ZEP API: get_user_context (thread.get_user_context)()
#    group_id: demo_session_20260131_213508
#    query: When is the project deadline?
#    max_facts: 5
#  GRAPHITI: search()
#     Neo4j Query #38: CALL db.index.fulltext.queryRelationships("edge_name_and_fact", $query, {limit: $limit}) YIELD relationship AS rel, score MATCH (n:Entity)-[e:RELATES_...
#     Neo4j Query #39: MATCH (n:Entity)-[e:RELATES_TO]->(m:Entity) WHERE e.group_id IN $group_ids WITH DISTINCT e, n, m, vector.similarity.cosine(e.fact_embedding, $search_v...
# 21:35:27 | neo4j.io | DEBUG | [#E860]  S: RECORD * 1
# 21:35:27 | neo4j.io | DEBUG | [#E860]  S: SUCCESS {'statuses': [{'gql_status': '00000', 'status_description': 'note: successful completion'}], 'type': 'r', 't_last': 1, 'db': 'neo4j'}
# 21:35:27 | neo4j.io | DEBUG | [#E860]  C: COMMIT
# 21:35:27 | neo4j.io | DEBUG | [#E860]  _: <CONNECTION> client state: TX_READY_OR_TX_STREAMING > READY
# 21:35:27 | neo4j.io | DEBUG | [#E86E]  S: SUCCESS {'bookmark': 'FB:kcwQLskJJyV+REC9lj1ew4ZBjkqQ'}
# 21:35:27 | neo4j.io | DEBUG | [#E86E]  _: <CONNECTION> server state: TX_READY_OR_TX_STREAMING > READY
# 21:35:27 | neo4j.pool | DEBUG | [#E86E]  _: <POOL> released bolt-104875
# 21:35:27 | neo4j.io | DEBUG | [#E860]  S: SUCCESS {'bookmark': 'FB:kcwQLskJJyV+REC9lj1ew4ZBjkqQ'}
# 21:35:27 | neo4j.io | DEBUG | [#E860]  _: <CONNECTION> server state: TX_READY_OR_TX_STREAMING > READY
# 21:35:27 | neo4j.pool | DEBUG | [#E860]  _: <POOL> released bolt-104890
# 21:35:27 | graphiti_core.search.search | DEBUG | search returned context for query When is the project deadline? in 89.6604061126709 ms
#       → 2 records, 64.8ms
#       → 2 records, 55.6ms
# └─ Duration: 91.1ms
#    Result: Found 2 facts
#  Fact 1
#    Data: Project Phoenix has a deadline on February 15th.
#  Fact 2
#    Data: Alice Chen is currently leading Project Phoenix.

# Retrieved Facts:
#    1. Project Phoenix has a deadline on February 15th.
#    2. Alice Chen is currently leading Project Phoenix.

In [None]:
# Cell 8.2: Search for company information

trace_logger.log_section("ADDITIONAL SEARCH: Company Information")

query = "What company does Alice work for?"
print(f"\nSearch Query: {query}")
print("\n" + "-"*70)

facts = await zep.get_user_context(
    group_id=GROUP_ID,
    query=query,
    max_facts=5,
)

print(f"\nRetrieved Facts:")
for i, fact in enumerate(facts):
    print(f"   {i+1}. {fact}")

# demo output
# ======================================================================
#   ADDITIONAL SEARCH: Company Information
# ======================================================================

# Search Query: What company does Alice work for?

# ----------------------------------------------------------------------

#  ZEP API: get_user_context (thread.get_user_context)()
#    group_id: demo_session_20260131_213508
#    query: What company does Alice work for?
#    max_facts: 5
#  GRAPHITI: search()
#     Neo4j Query #40: CALL db.index.fulltext.queryRelationships("edge_name_and_fact", $query, {limit: $limit}) YIELD relationship AS rel, score MATCH (n:Entity)-[e:RELATES_...
#     Neo4j Query #41: MATCH (n:Entity)-[e:RELATES_TO]->(m:Entity) WHERE e.group_id IN $group_ids WITH DISTINCT e, n, m, vector.similarity.cosine(e.fact_embedding, $search_v...
#       → 2 records, 70.1ms
#       → 3 records, 60.5ms
# 21:35:27 | graphiti_core.search.search | DEBUG | search returned context for query What company does Alice work for? in 95.98636627197266 ms
# └─ Duration: 97.9ms
#    Result: Found 3 facts
#  Fact 1
#    Data: Alice Chen is currently leading Project Phoenix.
#  Fact 2
#    Data: Alice Chen works at TechCorp as a senior software engineer.
#  Fact 3
#    Data: Project Phoenix has a deadline on February 15th.

# Retrieved Facts:
#    1. Alice Chen is currently leading Project Phoenix.
#    2. Alice Chen works at TechCorp as a senior software engineer.
#    3. Project Phoenix has a deadline on February 15th.

## 9. Cleanup and Summary

In [None]:
# Cell 9.1: Close trace logger and show summary

trace_logger.log_section("DEMO COMPLETE")
trace_logger.close()

print("\n" + "="*70)
print("  SUMMARY")
print("="*70)
print(f"\nRaw trace log saved to: trace_raw.jsonl")
print(f"   - Contains all API calls, LLM calls, and Neo4j queries in JSON format")
print(f"   - Use for debugging and detailed analysis")
print(f"\nDemo Group ID: {GROUP_ID}")
print(f"   - Use this to query the graph directly in Neo4j Browser")
print(f"\nNeo4j Browser Query:")
print(f"   MATCH (n) WHERE n.group_id = '{GROUP_ID}' RETURN n")


  DEMO COMPLETE

Trace log saved to: /mnt/data-disk-1/home/cpii.local/ericlo/projects/zep-repos/zep-graphiti/examples/neo4j_otel/trace_raw.jsonl

  SUMMARY

Raw trace log saved to: trace_raw.jsonl
   - Contains all API calls, LLM calls, and Neo4j queries in JSON format
   - Use for debugging and detailed analysis

Demo Group ID: demo_session_20260131_213508
   - Use this to query the graph directly in Neo4j Browser

Neo4j Browser Query:
   MATCH (n) WHERE n.group_id = 'demo_session_20260131_213508' RETURN n


In [None]:
# Cell 9.2: View the raw trace log

print("First 20 lines of trace_raw.jsonl:\n")
with open('trace_raw.jsonl', 'r') as f:
    for i, line in enumerate(f):
        if i >= 20:
            print("...")
            break
        entry = json.loads(line)
        print(f"{i+1:3d}. [{entry.get('level', 'UNKNOWN'):15s}] {json.dumps(entry)[:100]}...")

First 20 lines of trace_raw.jsonl:

  1. [SECTION        ] {"level": "SECTION", "title": "TURN 1: User introduces themselves - creates new entities", "timestam...
  2. [API            ] {"level": "API", "method": "add_message (thread.add_messages)", "params": {"group_id": "demo_session...
  3. [GRAPHITI       ] {"level": "GRAPHITI", "method": "add_episode", "params": {"group_id": "demo_session_20260131_213508"...
  4. [OTEL_SPAN_START] {"level": "OTEL_SPAN_START", "span_name": "zep.graphiti.add_episode", "timestamp": "2026-01-31T13:35...
  5. [NEO4J_QUERY    ] {"level": "NEO4J_QUERY", "query_number": 1, "query": "\n                                    MATCH (e...
  6. [NEO4J_RESULT   ] {"level": "NEO4J_RESULT", "query_number": 1, "duration_ms": 30.347824096679688, "record_count": 0, "...
  7. [OTEL_SPAN_START] {"level": "OTEL_SPAN_START", "span_name": "zep.graphiti.llm.generate", "timestamp": "2026-01-31T13:3...
  8. [LLM_REQUEST    ] {"level": "LLM_REQUEST", "call_number": 1, "model": 

In [None]:
# Cell 9.3: Close Graphiti connection

await graphiti.close()
print("Graphiti connection closed.")

## 10. Complete Execution Flow Analysis

This section documents the complete execution flow of the demo, based on actual trace logs captured during execution.

### 10.1 Execution Summary

| Turn | Description | Duration | LLM Calls | Neo4j Queries | Entities Created | Relationships Created |
|------|-------------|----------|-----------|---------------|------------------|----------------------|
| 1 | User Introduction | 6813ms | 5 | 10 | Alice Chen, TechCorp | WORKS_AT |
| 2 | Project Information | 6273ms | 6 | 10 | Project Phoenix | LEADS_PROJECT |
| 3 | Deadline Update | 5881ms | 6 | 15 | - | PROJECT_DEADLINE |
| 4 | Search Test | ~130ms | 0 | 2 | - | - |

**Total Log Entries:** 206  
**Log Level Distribution:**
- LLM_REQUEST/RESPONSE: 17 each
- NEO4J_QUERY/RESULT: 41 each  
- OTEL_SPAN_START/END: 20 each
- API: 6, GRAPHITI: 6, SECTION: 7
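
A distribution like the one above can be recomputed from the trace file. The sketch below assumes only that each line of `trace_raw.jsonl` is a JSON object with a `level` field, as in the Cell 9.2 output; the helper name `count_levels` is hypothetical and the sample lines are abbreviated stand-ins, not real trace entries.

```python
import json
from collections import Counter
from typing import Iterable


def count_levels(lines: Iterable[str]) -> Counter:
    """Count trace entries per `level`, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        counts[entry.get("level", "UNKNOWN")] += 1
    return counts


# Tiny in-memory sample shaped like the entries printed by Cell 9.2.
sample = [
    '{"level": "SECTION", "title": "TURN 1"}',
    '{"level": "NEO4J_QUERY", "query_number": 1}',
    '{"level": "NEO4J_RESULT", "query_number": 1}',
    '{"level": "LLM_REQUEST", "call_number": 1}',
]
level_counts = count_levels(sample)
print(level_counts.most_common())
```

To run it on the real log, pass the open file handle instead of the sample list, for example `with open('trace_raw.jsonl') as f: count_levels(f)`.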

---

### 10.2 Turn 1: User Introduction (6813ms)

**Input Message:**
```
Alice Chen(user): Hi, I'm Alice Chen. I work at TechCorp as a senior software engineer.
```

**Execution Flow:**

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 0: Query Previous Episodes                                             │
├─────────────────────────────────────────────────────────────────────────────┤
│ Neo4j Query #1 (30.3ms):                                                    │
│   MATCH (e:Episodic) WHERE e.valid_at <= $reference_time                    │
│   AND e.group_id IN $group_ids AND e.source = $source                       │
│   RETURN e ORDER BY e.valid_at DESC LIMIT $num_episodes                     │
│ Result: 0 records (first message, no history)                               │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 1: Extract Entities (LLM Call #1) - 1671ms                             │
├─────────────────────────────────────────────────────────────────────────────┤
│ Prompt: extract_nodes.extract_message                                       │
│ Tokens: prompt=491, completion=54, total=545                                │
│                                                                             │
│ System: "You are an AI assistant that extracts entity nodes from            │
│          conversational messages..."                                        │
│                                                                             │
│ User Input:                                                                 │
│   <ENTITY TYPES>                                                            │
│   [{'entity_type_id': 0, 'entity_type_name': 'Entity',                      │
│     'entity_type_description': 'Default entity classification...'}]         │
│   </ENTITY TYPES>                                                           │
│   <PREVIOUS MESSAGES>[]</PREVIOUS MESSAGES>                                 │
│   <CURRENT MESSAGE>                                                         │
│   Alice Chen(user): Hi, I'm Alice Chen. I work at TechCorp...               │
│   </CURRENT MESSAGE>                                                        │
│                                                                             │
│ Response:                                                                   │
│   {                                                                         │
│     "extracted_entities": [                                                 │
│       {"name": "Alice Chen", "entity_type_id": 0},                          │
│       {"name": "TechCorp", "entity_type_id": 0}                             │
│     ]                                                                       │
│   }                                                                         │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 2: Search Existing Entities (Neo4j Parallel Queries) - ~285ms          │
├─────────────────────────────────────────────────────────────────────────────┤
│ For each extracted entity (Alice Chen, TechCorp), run parallel searches:    │
│                                                                             │
│ Query #2 - BM25 Fulltext (TechCorp):                                        │
│   CALL db.index.fulltext.queryNodes("node_name_and_summary", $query, ...)   │
│   Result: 0 records                                                         │
│                                                                             │
│ Query #3 - Cosine Similarity (TechCorp):                                    │
│   MATCH (n:Entity) WHERE n.group_id IN $group_ids                           │
│   WITH n, vector.similarity.cosine(n.name_embedding,$search_vector) AS score│
│   Result: 0 records                                                         │
│                                                                             │
│ Query #4 - BM25 Fulltext (Alice Chen):                                      │
│   Result: 0 records                                                         │
│                                                                             │
│ Query #5 - Cosine Similarity (Alice Chen):                                  │
│   Result: 0 records                                                         │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 3: Deduplicate Entities (LLM Call #2) - 1278ms                         │
├─────────────────────────────────────────────────────────────────────────────┤
│ Prompt: dedupe_nodes.nodes                                                  │
│ Tokens: prompt=663, completion=78, total=741                                │
│                                                                             │
│ System: "You are a helpful assistant that determines whether or not         │
│          ENTITIES extracted from a conversation are duplicates..."          │
│                                                                             │
│ User Input:                                                                 │
│   <ENTITIES>                                                                │
│   [{"id": 0, "name": "Alice Chen", "entity_type": ["Entity"]},              │
│    {"id": 1, "name": "TechCorp", "entity_type": ["Entity"]}]                │
│   </ENTITIES>                                                               │
│   <EXISTING ENTITIES>[]</EXISTING ENTITIES>                                 │
│                                                                             │
│ Response:                                                                   │
│   {                                                                         │
│     "entity_resolutions": [                                                 │
│       {"id": 0, "name": "Alice Chen","duplicate_idx": -1, "duplicates": []},│
│       {"id": 1, "name": "TechCorp", "duplicate_idx": -1, "duplicates": []}  │
│     ]                                                                       │
│   }                                                                         │
│ → Both entities are NEW (no duplicates found)                               │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 4: Extract Relationships (LLM Call #3) - 2249ms                        │
├─────────────────────────────────────────────────────────────────────────────┤
│ Prompt: extract_edges.edge                                                  │
│ Tokens: prompt=774, completion=100, total=874                               │
│                                                                             │
│ System: "You are an expert fact extractor that extracts fact triples..."    │
│                                                                             │
│ User Input:                                                                 │
│   <ENTITIES>                                                                │
│   [{"id": 0, "name": "Alice Chen", "entity_types": ["Entity"]},             │
│    {"id": 1, "name": "TechCorp", "entity_types": ["Entity"]}]               │
│   </ENTITIES>                                                               │
│   <CURRENT_MESSAGE>Alice Chen(user): Hi, I'm Alice Chen...</CURRENT_MESSAGE>│
│   <REFERENCE_TIME>2026-01-31 13:35:08.650972+00:00</REFERENCE_TIME>         │
│                                                                             │
│ Response:                                                                   │
│   {                                                                         │
│     "edges": [{                                                             │
│       "relation_type": "WORKS_AT",                                          │
│       "source_entity_id": 0,                                                │
│       "target_entity_id": 1,                                                │
│       "fact": "Alice Chen works at TechCorp as a senior software engineer.",│
│       "valid_at": "2026-01-31T13:35:08Z",                                   │
│       "invalid_at": null                                                    │
│     }]                                                                      │
│   }                                                                         │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 5: Search Existing Edges (Neo4j Queries) - ~418ms                      │
├─────────────────────────────────────────────────────────────────────────────┤
│ Query #6 - Direct Edge Lookup (27ms):                                       │
│   MATCH (n:Entity {uuid: $source})-[e:RELATES_TO]->(m:Entity {uuid: $target})│
│   Result: 0 records                                                         │
│                                                                             │
│ Query #7-8 - BM25 + Cosine on existing edges:                               │
│   CALL db.index.fulltext.queryRelationships("edge_name_and_fact", ...)      │
│   Result: 0 records (no existing edges to compare)                          │
│                                                                             │
│ Query #9-10 - Additional edge searches:                                     │
│   Result: 0 records                                                         │
│                                                                             │
│ → No existing edges found, skip deduplication LLM call                      │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 6: Generate Entity Summaries (LLM Calls #4-5 PARALLEL) - ~566ms        │
├─────────────────────────────────────────────────────────────────────────────┤
│ Prompt: extract_nodes.extract_summary (run in parallel for each entity)     │
│                                                                             │
│ LLM Call #4 - TechCorp Summary (534ms):                                     │
│   Tokens: prompt=425, completion=17, total=442                              │
│   Response: {"summary": "TechCorp employs Alice Chen as a senior software   │
│              engineer."}                                                    │
│                                                                             │
│ LLM Call #5 - Alice Chen Summary (566ms):                                   │
│   Tokens: prompt=425, completion=18, total=443                              │
│   Response: {"summary": "Alice Chen works at TechCorp as a senior software  │
│              engineer."}                                                    │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ RESULT: Episode Created                                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│ Episode UUID: 03013a24-a05b-4ed8-b57a-28dad9073f35                          │
│ Total Duration: 6808.7ms                                                    │
│                                                                             │
│ Created:                                                                    │
│   - Entity: Alice Chen (uuid: 9fbefe48-8540-4878-ac2a-1f6d0e89c3e8)         │
│   - Entity: TechCorp (uuid: 10633107-3301-4b90-a8d3-96375555b786)           │
│   - Edge: Alice Chen --[WORKS_AT]--> TechCorp                               │
│   - Episodic: Original message stored                                       │
└─────────────────────────────────────────────────────────────────────────────┘
```
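
The Step 3 response can be read mechanically: a `duplicate_idx` of `-1` marks a new entity, and any other value points at an existing one. A minimal sketch, assuming the response shape shown in the trace; `split_resolutions` is a hypothetical helper, not a `graphiti_core` function.

```python
from typing import Any


def split_resolutions(response: dict[str, Any]) -> tuple[list[str], dict[str, int]]:
    """Return (new entity names, {duplicate name: index of existing entity})."""
    new_names: list[str] = []
    duplicates: dict[str, int] = {}
    for item in response.get("entity_resolutions", []):
        if item["duplicate_idx"] == -1:
            new_names.append(item["name"])
        else:
            duplicates[item["name"]] = item["duplicate_idx"]
    return new_names, duplicates


# Turn 1 response from the trace: both entities are new.
turn1 = {
    "entity_resolutions": [
        {"id": 0, "name": "Alice Chen", "duplicate_idx": -1, "duplicates": []},
        {"id": 1, "name": "TechCorp", "duplicate_idx": -1, "duplicates": []},
    ]
}
new_names, duplicates = split_resolutions(turn1)
print(new_names, duplicates)
```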

---

### 10.3 Turn 2: Project Information (6273ms)

**Input Message:**
```
Alice Chen(user): I'm currently leading Project Phoenix, which is a cloud migration initiative.
```

**Key Differences from Turn 1:**

1. **Entity Extraction (LLM #6):** Extracts `Alice Chen` and `Project Phoenix`

2. **Entity Deduplication (LLM #7):**
   - `Alice Chen` → Found existing entity (idx: 0), marked as duplicate
   - `Project Phoenix` → New entity (duplicate_idx: -1)

3. **Relationship Extraction (LLM #8):**
   ```json
   {
     "edges": [{
       "relation_type": "LEADS_PROJECT",
       "source_entity_id": 0,
       "target_entity_id": 1,
       "fact": "Alice Chen is currently leading Project Phoenix.",
       "valid_at": "2026-01-31T13:35:15Z"
     }]
   }
   ```

4. **Edge Deduplication (LLM #9):**
   - Checks against existing edge: `Alice Chen WORKS_AT TechCorp`
   - Result: No duplicates, no contradictions

5. **Summary Updates (LLM #10-11):**
   - Project Phoenix: `"Cloud migration initiative led by Alice Chen at TechCorp."`
   - Alice Chen: Updated to include project leadership

**Created:**
- Entity: Project Phoenix
- Edge: Alice Chen --[LEADS_PROJECT]--> Project Phoenix

---

### 10.4 Turn 3: Deadline Update (5881ms)

**Input Message:**
```
Alice Chen(user): The deadline for Project Phoenix is February 15th. We're almost ready to go live.
```

**Key Observations:**

1. **Entity Extraction (LLM #12):** Same entities (Alice Chen, Project Phoenix)

2. **No Entity Deduplication LLM Call:** Both entities already exist

3. **Relationship Extraction (LLM #13):** Extracts TWO relationships:
   ```json
   {
     "edges": [
       {
         "relation_type": "LEADS_PROJECT",
         "fact": "Alice Chen is leading Project Phoenix."
       },
       {
         "relation_type": "PROJECT_DEADLINE",
         "fact": "Project Phoenix has a deadline on February 15th."
       }
     ]
   }
   ```

4. **Edge Deduplication (LLM #14-15):**
   - LEADS_PROJECT: `duplicate_facts: [0]` → Duplicate of existing edge, SKIPPED
   - PROJECT_DEADLINE: New fact, CREATED

5. **More Neo4j Queries (15 vs 10):** Additional queries to find existing edges for deduplication

**Created:**
- Edge: Project Phoenix --[PROJECT_DEADLINE]--> (with fact about Feb 15th)
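
The create-or-skip decision in step 4 follows from the dedup response: a non-empty `duplicate_facts` list means the fact already exists. The sketch assumes only that field, as quoted above; the helper name and its string return values are illustrative.

```python
def edge_action(dedupe_response: dict) -> str:
    """Return "skip" when the LLM flagged the fact as a duplicate, else "create"."""
    return "skip" if dedupe_response.get("duplicate_facts") else "create"


# Turn 3: LEADS_PROJECT duplicates existing fact 0, PROJECT_DEADLINE is new.
actions = {
    "LEADS_PROJECT": edge_action({"duplicate_facts": [0]}),
    "PROJECT_DEADLINE": edge_action({"duplicate_facts": []}),
}
print(actions)
```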

---

### 10.5 Turn 4: Search Test (~130ms)

**Query:**
```
What does Alice work on?
```

**Execution Flow:**

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 1: Generate Query Embedding (Local)                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│ Embedder: sentence-transformers/all-MiniLM-L6-v2                            │
│ Input: "What does Alice work on?"                                           │
│ Output: 384-dimensional vector                                              │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 2: Execute Parallel Searches                                           │
├─────────────────────────────────────────────────────────────────────────────┤
│ Query #36 - BM25 Fulltext Search (67.1ms):                                  │
│   CALL db.index.fulltext.queryRelationships("edge_name_and_fact", $query)   │
│   YIELD relationship AS rel, score                                          │
│   MATCH (n:Entity)-[e:RELATES_TO {uuid: rel.uuid}]->(m:Entity)              │
│   WHERE e.group_id IN $group_ids                                            │
│   Result: 3 records                                                         │
│                                                                             │
│ Query #37 - Cosine Similarity Search (60.2ms):                              │
│   MATCH (n:Entity)-[e:RELATES_TO]->(m:Entity)                               │
│   WHERE e.group_id IN $group_ids                                            │
│   WITH DISTINCT e, n, m,                                                    │
│        vector.similarity.cosine(e.fact_embedding, $search_vector) AS score  │
│   WHERE score > $min_score                                                  │
│   Result: 2 records                                                         │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 3: Merge and Return Results                                            │
├─────────────────────────────────────────────────────────────────────────────┤
│ Total Duration: 93.2ms                                                      │
│                                                                             │
│ Retrieved Facts (3):                                                        │
│   1. "Alice Chen is currently leading Project Phoenix."                     │
│   2. "Project Phoenix has a deadline on February 15th."                     │
│   3. "Alice Chen works at TechCorp as a senior software engineer."          │
└─────────────────────────────────────────────────────────────────────────────┘
```

**Key Observation:** Search is extremely fast (~130ms) because:
- No LLM calls required
- Only 2 Neo4j queries (parallel)
- Edge facts were already embedded and full-text indexed at write time
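
Step 3 merges two ranked lists into one. The trace does not show which reranker ran, so the sketch below assumes reciprocal rank fusion (RRF), a common way to combine BM25 and vector rankings. The function name, the constant `k=60`, and the two input orderings are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each item scores sum(1 / (k + rank)), rank starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Sort by descending score; ties are broken by name so the result is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Illustrative orderings (not taken from the trace): BM25 found 3 edges, cosine found 2.
bm25 = ["LEADS_PROJECT", "PROJECT_DEADLINE", "WORKS_AT"]
cosine = ["LEADS_PROJECT", "PROJECT_DEADLINE"]
merged = rrf_merge([bm25, cosine])
print(merged)
```

An edge that appears in both lists accumulates two terms, which is why hits confirmed by both searches rise above hits found by only one.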

---

### 10.6 Final Knowledge Graph State

```
                    ┌─────────────────────┐
                    │      TechCorp       │
                    │     (Entity)        │
                    │                     │
                    │ Summary: "TechCorp  │
                    │ employs Alice Chen  │
                    │ as a senior software│
                    │ engineer."          │
                    └──────────▲──────────┘
                               │
                               │ WORKS_AT
                               │ "Alice Chen works at TechCorp
                               │  as a senior software engineer."
                               │
┌──────────────────────────────┴──────────────────────────────┐
│                        Alice Chen                           │
│                         (Entity)                            │
│                                                             │
│ Summary: "Alice Chen works at TechCorp as a senior software │
│          engineer, leading Project Phoenix, a cloud         │
│          migration initiative with a Feb 15th deadline."    │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               │ LEADS_PROJECT
                               │ "Alice Chen is currently
                               │  leading Project Phoenix."
                               │
                               ▼
                    ┌─────────────────────┐
                    │   Project Phoenix   │
                    │      (Entity)       │
                    │                     │
                    │ Summary: "Cloud     │
                    │ migration initiative│
                    │ led by Alice Chen.  │
                    │ Deadline: Feb 15th."│
                    └──────────┬──────────┘
                               │
                               │ PROJECT_DEADLINE
                               │ "Project Phoenix has a
                               │  deadline on February 15th."
                               │
                               ▼
                         (implicit target)
```

---

### 10.7 Performance Analysis

**Time Breakdown by Component:**

| Component | Turn 1 | Turn 2 | Turn 3 | Turn 4 |
|-----------|--------|--------|--------|--------|
| LLM Calls | ~6106ms (89.6%) | ~6009ms (95.8%) | ~4835ms (82.2%) | 0ms |
| Neo4j Queries | ~704ms (10.3%) | ~264ms (4.2%) | ~1047ms (17.8%) | ~127ms (100%) |
| Other | ~3ms (0.1%) | 0ms | 0ms | ~3ms |
| **Total** | **6813ms** | **6273ms** | **5881ms** | **~130ms** |
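
Each percentage in the table is that component's share of the turn total. A minimal sketch that reproduces them from the millisecond figures above (the `share` helper is hypothetical):

```python
def share(component_ms: float, total_ms: float) -> float:
    """Percentage of the turn's wall time spent in one component, to one decimal."""
    return round(100.0 * component_ms / total_ms, 1)


# (LLM ms, Neo4j ms, total ms) per write turn, copied from the table above.
turns = {1: (6106, 704, 6813), 2: (6009, 264, 6273), 3: (4835, 1047, 5881)}
for turn, (llm_ms, neo4j_ms, total_ms) in turns.items():
    print(f"Turn {turn}: LLM {share(llm_ms, total_ms)}%, Neo4j {share(neo4j_ms, total_ms)}%")
```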

Another angle: reads are bound by database performance, while writes are bound by the LLM.

| Metric | Write (add_episode) | Read (search) |
|--------|---------------------|---------------|
| **Primary Bottleneck** | LLM (82-96% of time) | DB + Embedding |
| **Neo4j Time** | ~264-1047ms (4-18%) | ~127ms (100%) |
| **LLM Time** | ~4835-6106ms | 0ms |
| **Limiting Factor** | LLM API concurrency | Neo4j indexing efficiency |

**LLM Call Breakdown (Turn 1):**

| Step | Prompt Name | Duration | Tokens |
|------|-------------|----------|--------|
| Extract Entities | extract_nodes.extract_message | 1671ms | 545 |
| Dedupe Entities | dedupe_nodes.nodes | 1278ms | 741 |
| Extract Edges | extract_edges.edge | 2249ms | 874 |
| Summary (TechCorp) | extract_nodes.extract_summary | 534ms | 442 |
| Summary (Alice) | extract_nodes.extract_summary | 566ms | 443 |

**Observations:**
1. **LLM is the bottleneck:** 82-96% of execution time is spent on LLM calls
2. **Edge extraction is slowest:** ~2.2s for relationship extraction
3. **Parallel execution helps:** Summary generation runs in parallel
4. **Search is fast:** No LLM needed, only BM25 and vector similarity queries
5. **Neo4j is efficient:** It accounts for only 4-18% of write time
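
Observation 3 is visible in the Step 6 timings: the two summary calls took 534ms and 566ms, yet the step finished in about 566ms, the slower of the two. The sketch below reproduces that effect with `asyncio.gather`. `fake_summary` is a stand-in that sleeps instead of calling an LLM, and the delays are arbitrary.

```python
import asyncio
import time


async def fake_summary(entity: str, delay_s: float) -> str:
    """Stand-in for one summary LLM call: sleeps, then returns a dummy summary."""
    await asyncio.sleep(delay_s)
    return f"summary of {entity}"


async def summarize_all() -> tuple[list[str], float]:
    """Run both summary calls concurrently and report the elapsed wall time."""
    start = time.perf_counter()
    summaries = await asyncio.gather(
        fake_summary("TechCorp", 0.20),
        fake_summary("Alice Chen", 0.25),
    )
    return list(summaries), time.perf_counter() - start


# In a notebook, use `summaries, elapsed = await summarize_all()` instead.
summaries, elapsed = asyncio.run(summarize_all())
print(summaries, f"{elapsed:.2f}s")  # elapsed tracks the slower call, not the sum
```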

---

### 10.8 Neo4j's Role

To clarify Neo4j's actual role in this architecture:

Arguably, **Neo4j is NOT a "Graph Operator with LLM"**: it is just a storage engine with vector retrieval capabilities. Zep is therefore much like the systems in other agent memory papers, if a bit better: it is still an **AI Agent Orchestration Framework Built on Top of a Database**.

| Aspect | Neo4j's Role | NOT Neo4j's Role |
|--------|--------------|------------------|
| Storage | ✅ Store nodes, edges, embeddings | |
| Query | ✅ Execute Cypher, BM25, vector similarity | |
| Reasoning | | ❌ Handled by LLM in Python layer |
| Entity Resolution | | ❌ "Alice" vs "Alice Chen" decided by LLM |
| Graph Integrity | | ❌ Deduplication logic in Graphiti Core |

**Scalability Implications:**

- Python/LLM Layer: Processing involves heavy prompt construction, JSON parsing, and concurrent LLM requests. Graphiti Server is the layer that needs horizontal scaling (more Python workers).
- DB Layer: Neo4j handles storage and retrieval efficiently. Unless the graph reaches hundreds of millions of nodes, the database is unlikely to be the primary bottleneck.
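
Since LLM API concurrency is the limiting factor on the write path, one common way for a Python worker to stay within a provider's limits is to cap the number of in-flight requests. A minimal sketch of such a cap using `asyncio.Semaphore`; `bounded_gather`, `fake_llm_call`, and the limit of 2 are illustrative and not taken from the Graphiti Server code.

```python
import asyncio

in_flight = 0
peak_in_flight = 0


async def fake_llm_call(i: int) -> int:
    """Stand-in for an LLM request; records how many calls overlap."""
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    return i


async def bounded_gather(coros, limit: int):
    """Like asyncio.gather, but at most `limit` coroutines run at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run(coro):
        async with semaphore:
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))


async def main():
    return await bounded_gather([fake_llm_call(i) for i in range(6)], limit=2)


# In a notebook, use `results = await main()` instead.
results = asyncio.run(main())
print(results, peak_in_flight)
```

Raising the limit trades provider rate-limit pressure for lower write latency, which is the knob that matters when scaling the Python layer horizontally.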

---

### 10.9 Comparison with VikingMem

Graphiti/Zep is NOT a DBMS for agent memory, whereas VikingMem appears to be one. The core of Zep/Graphiti is still Python client-side logic: a hard-coded pipeline that relies on data shipping.
- DB usage:
  - Zep does use a graph DB to store all the data, both the raw data and the higher-level 'views', but the DB itself has no awareness of what it stores.
  - VikingMem, as I speculate based on the paper, has likely customized the computation/access layer of VikingDB, or perhaps wraps VikingDB within a tightly integrated system shell.
  - Zep has no built-in native operators like the SUM/AVG/LLM_MERGE in VikingMem (code/query shipping) and no in-system processing. In VikingMem, for example, TTL, TIMECOMPRESS, and LLM_MERGE are system primitives: instead of going DB -> Python, it modifies the kernel/computation path.
- Data model:
  - Graphiti has a schema (entity node, episode node, community node, etc.), but it seems to be descriptive, and the evolutionary logic between nodes is loose (flexible, though, since it is fully managed by an LLM that may hallucinate).
  - VikingMem's data model, by contrast, is prescriptive and deterministic, because operators are defined inside the data model. It explicitly states that attribute Y of Entity X must, and can only, be calculated from attribute B of Event A through operator OP, attempting to make the evolution of memory a deterministic function.
- Lifecycle: VikingMem has lifecycle management, making consolidation (memory consolidation/forgetting) an automatic background process that is not explicitly called by the application layer.
- Query optimization: Graphiti's search is a hard-coded Python pipeline, whereas VikingMem mentions multi-granular indexing and dynamic search, which suggests the system can automatically decide the best way to search and use indexes effectively (a query optimizer?).

---

### 10.10 Comparison with LOTUS

Target:
- LOTUS's target is to save money, sacrificing a little accuracy compared to full LLM queries;
- Graphiti's main target is accuracy, not 'saving money'.

Methodology:
- Graphiti has 2 phases:
  - create/update: expensive ETL pipeline, LLM heavy, creating and maintaining the view on top of raw data
  - retrieval: only graph query, no LLM, fast, just retrieve the view
- LOTUS has 1 phase:
  - retrieval: no insertion and ETL, every operation is based on retrieval, LLM heavy, retrieve raw data, then process

