# Mem0 Implementation: Scalable Long-Term Memory for LLM Agents (Refined Output)

This notebook walks through a simplified implementation of the Mem0 architecture, focusing on its core mechanisms and comparing its token usage against a full-context approach. This version keeps console output concise during runs, stores detailed logs in variables for later inspection, and presents the final token comparison as a Pandas DataFrame.

**Scenario:** AI assistant helping a user plan a 'New Marketing Campaign'.

**Core Mem0 Concepts Implemented (Simplified):**
1.  Memory Extraction (`ϕ`)
2.  Memory Storage (Textual facts with embeddings)
3.  Memory Update Logic (ADD, UPDATE, NOOP via LLM)
4.  Memory Retrieval (Semantic search for query context)

**Improvements in this version:**
- Reduced default console output during runs.
- `VERBOSE_RAW_RUN` and `VERBOSE_MEM0_RUN` flags to enable detailed logging if needed.
- Detailed turn-by-turn logs stored in `raw_run_log` and `mem0_run_log` variables.
- Final token comparison presented as a Pandas DataFrame.

## 1. Setup: Libraries, API Configuration, and Helper Functions

### 1.1. Import Libraries

In [None]:
# Make sure to install the required packages:
# !pip install openai scikit-learn pandas numpy

In [1]:
# Import required standard and third-party libraries for memory, LLM, and data handling
import os      # OS interactions
import json    # For JSON parsing/generation (LLM outputs)
import time    # For delays between API calls
import uuid    # For unique memory item IDs
from datetime import datetime  # For timestamps

import numpy as np            # For embedding vectors
import pandas as pd           # For DataFrame/tabular analysis
from openai import OpenAI     # For OpenAI-compatible LLM/embedding API
from sklearn.metrics.pairwise import cosine_similarity  # For embedding similarity


### 1.2. API and Model Configuration

In [None]:
# --- API and Model Configuration ---

# Read the API key from the NEBIUS_API_KEY environment variable
# Fail fast if it is missing to prevent accidental unauthenticated requests
API_KEY = os.getenv("NEBIUS_API_KEY")
if not API_KEY:
    raise ValueError("API key not set. Please set the NEBIUS_API_KEY environment variable.")

# Set the base URL for the Nebius API
BASE_URL = "https://api.studio.nebius.com/v1/"

# Specify the LLM model to use for generating responses, extracting facts, and making update decisions
# Using a more capable model is recommended for better Mem0 performance
LLM_MODEL = "deepseek-ai/DeepSeek-V3"

# Specify the embedding model to use for generating text embeddings
EMBEDDING_MODEL = "BAAI/bge-multilingual-gemma2"

# Initialize the OpenAI-compatible client with the specified base URL and API key
client = OpenAI(
    base_url=BASE_URL,
    api_key=API_KEY
)

print(f"OpenAI client configured. Using LLM: {LLM_MODEL}, Embeddings: {EMBEDDING_MODEL}")

OpenAI client configured. Using LLM: deepseek-ai/DeepSeek-V3, Embeddings: BAAI/bge-multilingual-gemma2


### 1.3. Global Token Counters and Logging Variables

In [3]:
# --- Global Token Counters ---
# Track token usage for both approaches and each Mem0 sub-task
total_prompt_tokens_raw, total_completion_tokens_raw = 0, 0  # Raw/full-context approach
total_prompt_tokens_mem0_conversation, total_completion_tokens_mem0_conversation = 0, 0  # Mem0: conversational queries
total_prompt_tokens_mem0_extraction, total_completion_tokens_mem0_extraction = 0, 0      # Mem0: extraction sub-task
total_prompt_tokens_mem0_update, total_completion_tokens_mem0_update = 0, 0              # Mem0: update sub-task

# --- Logging Variables ---
# Store detailed logs for each run for later inspection or analysis
raw_run_log = []   # Logs for the raw/full-context approach
mem0_run_log = []  # Logs for the Mem0 approach

def reset_all_token_counters_and_logs():
    """
    Resets ALL global token counters and log lists to zero/empty.
    This ensures a clean slate before running a new experiment or comparison.
    """
    global total_prompt_tokens_raw, total_completion_tokens_raw, \
        total_prompt_tokens_mem0_conversation, total_completion_tokens_mem0_conversation, \
        total_prompt_tokens_mem0_extraction, total_completion_tokens_mem0_extraction, \
        total_prompt_tokens_mem0_update, total_completion_tokens_mem0_update, \
        raw_run_log, mem0_run_log

    # Reset all token counters to zero
    total_prompt_tokens_raw, total_completion_tokens_raw = 0, 0
    total_prompt_tokens_mem0_conversation, total_completion_tokens_mem0_conversation = 0, 0
    total_prompt_tokens_mem0_extraction, total_completion_tokens_mem0_extraction = 0, 0
    total_prompt_tokens_mem0_update, total_completion_tokens_mem0_update = 0, 0

    # Clear all run logs
    raw_run_log = []
    mem0_run_log = []
    print("[Counters & Logs] All token counters and run logs have been reset.")

### 1.4. Core LLM and Embedding Helper Functions

In [4]:
def get_embedding(text_to_embed, verbose=False):
    """
    Generates an embedding vector for the given text using the configured embedding model.
    Args:
        text_to_embed (str): The input text to embed.
        verbose (bool): If True, prints error messages.
    Returns:
        np.ndarray: Embedding vector as a numpy array. Returns a zero vector on failure.
    """
    try:
        # Call the embedding API to get the embedding vector
        response = client.embeddings.create(model=EMBEDDING_MODEL, input=text_to_embed)
        return np.array(response.data[0].embedding)
    except Exception as e:
        # On failure, print the error (if verbose) and return a zero vector.
        # The dimension must match EMBEDDING_MODEL's output size: a mismatched zero
        # vector used as a query would make cosine_similarity raise a ValueError.
        if verbose:
            print(f"[Error] Embedding failed for '{text_to_embed[:50]}...': {e}. Returning zero vector.")
        default_embedding_dim = 2560  # Adjust if your model's dimension is different
        return np.zeros(default_embedding_dim)

def get_llm_chat_completion(messages, temperature=0.1, max_tokens=150, verbose=False):
    """
    Calls the LLM chat completion API with the given messages and parameters.
    Args:
        messages (list): List of message dicts for the chat history.
        temperature (float): Sampling temperature for the LLM.
        max_tokens (int): Maximum tokens to generate in the completion.
        verbose (bool): If True, prints error messages.
    Returns:
        tuple: (response_content, prompt_tokens, completion_tokens)
    """
    try:
        # Call the LLM chat completion API
        response = client.chat.completions.create(
            model=LLM_MODEL,
            messages=messages,
            temperature=temperature,
            max_tokens=max_tokens
        )
        content = response.choices[0].message.content
        prompt_tokens = response.usage.prompt_tokens if response.usage else 0
        completion_tokens = response.usage.completion_tokens if response.usage else 0
        return content, prompt_tokens, completion_tokens
    except Exception as e:
        # On failure, print error (if verbose) and return error message with zero tokens
        if verbose: print(f"[Error] LLM chat completion failed: {e}. Returning error message.")
        return f"Error: LLM call failed. {e}", 0, 0

print("Helper functions for embeddings and LLM calls defined.")

Helper functions for embeddings and LLM calls defined.


## 2. The Conversational Scenario: Planning a Marketing Campaign

In [5]:
# Define the conversation script as a list of user turns (each turn is a dictionary)
conversation_script = [
    # User states the main goal for the campaign
    {"role": "user", "content": "Hi, let's start planning the 'New Marketing Campaign'. My primary goal is to increase brand awareness by 20%."},
    # User specifies the target audience
    {"role": "user", "content": "For this campaign, the target audience is young adults aged 18-25."},
    # User allocates an initial budget for social media ads
    {"role": "user", "content": "I want to allocate a budget of $5000 for social media ads for the New Marketing Campaign."},
    # User asks about the main goal for the campaign
    {"role": "user", "content": "What's the main goal for the New Marketing Campaign?"},
    # User asks about the target audience
    {"role": "user", "content": "Who are we targeting for this campaign?"},
    # User adds a new task related to influencer research
    {"role": "user", "content": "Let's also consider influencers. Add a task: 'Research potential influencers for the 18-25 demographic' for the New Marketing Campaign."},
    # User updates the budget for social media ads
    {"role": "user", "content": "Actually, let's increase the social media ad budget for the New Marketing Campaign to $7500."},
    # User asks about the current budget for social media ads
    {"role": "user", "content": "What's the current budget for social media ads for the New Marketing Campaign?"},
    # User asks about pending tasks for the campaign
    {"role": "user", "content": "What tasks do I have pending for this campaign?"},
    # User expresses a preference for visual content
    {"role": "user", "content": "Also, for the New Marketing Campaign, I prefer visual content for this demographic, like short videos and infographics."}
]

print(f"Conversational scenario defined with {len(conversation_script)} turns.")

Conversational scenario defined with 10 turns.


In [6]:
def classify_input(user_input):
    """
    Classifies user input as a query or statement using the LLM.

    Args:
        user_input (str): The user's input message.

    Returns:
        str: "query" if the input is a question, "statement" otherwise.
    """
    # Define the system prompt to instruct the LLM on classification rules
    system_prompt = (
        "You are a classifier. "
        "A 'query' is a question or request for information. "
        "A 'statement' is a declaration, instruction, or information that is not a question. "
        "Respond with only one word: either 'query' or 'statement'."
    )
    # Call the LLM to classify the input
    response = client.chat.completions.create(
        model="microsoft/phi-4",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Classify the following input as a query or statement: {user_input}"}
        ]
    )
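    # Extract and normalize the classification result
    raw_label = (response.choices[0].message.content or "").strip().strip("'\".").lower()
    if raw_label in ("query", "statement"):
        return raw_label
    # Fallback for empty or unexpected replies: treat inputs ending in '?' as queries
    return "query" if user_input.strip().endswith("?") else "statement"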

In [7]:
# Iterate through each turn in the conversation script
for turn in conversation_script:
    # Use the LLM to classify the user's input as either a 'query' or 'statement'
    classification = classify_input(turn["content"])
    # Add the classification result to the turn dictionary under the 'type' key
    turn["type"] = classification
    # Print the input and its classification for verification
    print(f"Input: {turn['content']}, Classification: {classification}")

Input: Hi, let's start planning the 'New Marketing Campaign'. My primary goal is to increase brand awareness by 20%., Classification: statement
Input: For this campaign, the target audience is young adults aged 18-25., Classification: statement
Input: I want to allocate a budget of $5000 for social media ads for the New Marketing Campaign., Classification: statement
Input: What's the main goal for the New Marketing Campaign?, Classification: query
Input: Who are we targeting for this campaign?, Classification: query
Input: Let's also consider influencers. Add a task: 'Research potential influencers for the 18-25 demographic' for the New Marketing Campaign., Classification: statement
Input: Actually, let's increase the social media ad budget for the New Marketing Campaign to $7500., Classification: statement
Input: What's the current budget for social media ads for the New Marketing Campaign?, Classification: query
Input: What tasks do I have pending for this campaign?, Classification: 

## 3. Approach 1: Raw / Full-Context Approach

In [8]:
VERBOSE_RAW_RUN = False  # Set to True for detailed turn-by-turn console output

def run_raw_full_context_approach(script):
    """
    Runs the conversation using the raw/full-context approach:
    - Each user turn is appended to the full conversation history.
    - The LLM receives the entire conversation history for every response.
    - Token usage and responses are logged for analysis.
    """
    global total_prompt_tokens_raw, total_completion_tokens_raw, raw_run_log

    print("--- Running Raw/Full-Context Approach ---")
    # Initialize conversation history with a system prompt
    current_conversation_history_for_llm = [
        {"role": "system", "content": "You are a helpful assistant. Respond based on the full conversation history."}
    ]

    for i, turn_details in enumerate(script):
        user_message_content = turn_details['content']
        turn_type = turn_details['type']

        if VERBOSE_RAW_RUN:
            print(f"\n--- Raw Turn {i+1}/{len(script)} (Type: {turn_type}) ---")
        print(f"Raw Turn {i+1} User: {user_message_content[:80]}...")

        # Add user message to conversation history
        current_conversation_history_for_llm.append({"role": "user", "content": user_message_content})

        # Call LLM with the full conversation history
        assistant_response_text, p_tokens, c_tokens = get_llm_chat_completion(
            current_conversation_history_for_llm, max_tokens=150, verbose=VERBOSE_RAW_RUN
        )

        # Update global token counters
        total_prompt_tokens_raw += p_tokens
        total_completion_tokens_raw += c_tokens

        print(f"Raw Turn {i+1} Assistant: {assistant_response_text[:80]}...")
        if VERBOSE_RAW_RUN:
            print(f"Tokens for this turn (Prompt: {p_tokens}, Completion: {c_tokens})")
            print(f"Cumulative Raw Tokens (Prompt: {total_prompt_tokens_raw}, Completion: {total_completion_tokens_raw})")

        # Add assistant response to conversation history
        current_conversation_history_for_llm.append({"role": "assistant", "content": assistant_response_text})

        # Log turn details for later analysis
        raw_run_log.append({
            "turn": i + 1,
            "type": turn_type,
            "user_content": user_message_content,
            "assistant_response": assistant_response_text,
            "prompt_tokens_turn": p_tokens,
            "completion_tokens_turn": c_tokens,
            "cumulative_prompt_tokens": total_prompt_tokens_raw,
            "cumulative_completion_tokens": total_completion_tokens_raw
        })

        time.sleep(0.2)  # Reduced sleep time, adjust if rate limits are an issue

    print("\n--- Raw/Full-Context Approach Summary ---")
    print(f"Total Prompt Tokens: {total_prompt_tokens_raw}")
    print(f"Total Completion Tokens: {total_completion_tokens_raw}")
    print(f"Overall Total Tokens for Raw Approach: {total_prompt_tokens_raw + total_completion_tokens_raw}")
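
Because the full-context approach resends the entire history on every call, the cumulative prompt-token count grows roughly quadratically with the number of turns. The sketch below illustrates that growth under a deliberately simple assumption: every message has a fixed size. The function name and the token sizes (20, 25, 60) are hypothetical and are not measurements from this notebook.

```python
# Illustrative only: the per-message token sizes passed in below are made-up
# assumptions, not measurements from this notebook.
def cumulative_prompt_tokens(system_tokens, user_tokens, assistant_tokens, num_turns):
    """Total prompt tokens when every call resends the entire history."""
    total = 0
    history = system_tokens
    for _ in range(num_turns):
        history += user_tokens       # the new user message joins the history
        total += history             # the whole history is sent as the prompt
        history += assistant_tokens  # the reply is appended for the next turn
    return total

for turns in (5, 10, 20):
    print(turns, cumulative_prompt_tokens(20, 25, 60, turns))
```

Under these assumptions, doubling the number of turns roughly quadruples the total prompt tokens, which is the cost a memory-based approach tries to avoid.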

## 4. Approach 2: Mem0 Implementation

### 4.1. Memory Item and Memory Store Classes

In [9]:
class MemoryItem:
    def __init__(self, text_content, source_turn_indices_list, verbose_embedding=False):
        """
        Represents a single memory item with text, embedding, and metadata.

        Args:
            text_content (str): The content to store in memory.
            source_turn_indices_list (list): List of conversation turn indices that contributed to this memory.
            verbose_embedding (bool): If True, prints embedding errors.
        """
        self.id = str(uuid.uuid4())  # Unique identifier for the memory item
        self.text = text_content  # The actual memory content
        self.embedding = get_embedding(text_content, verbose=verbose_embedding)  # Embedding vector
        self.creation_timestamp = datetime.now()  # When the memory was created
        self.last_accessed_timestamp = self.creation_timestamp  # Last time accessed
        self.access_count = 0  # Number of times accessed
        self.source_turn_indices = list(source_turn_indices_list)  # Source turns for provenance

    def __repr__(self):
        # String representation for debugging
        return (f"MemoryItem(id={self.id}, text='{self.text[:60]}...', "
                f"created={self.creation_timestamp.strftime('%H:%M:%S')}, accessed={self.access_count})")

    def mark_accessed(self):
        # Update access metadata when memory is accessed
        self.last_accessed_timestamp = datetime.now()
        self.access_count += 1

In [10]:
class MemoryStore:
    def __init__(self, verbose_ops=False):
        """
        Stores and manages multiple MemoryItem instances.

        Args:
            verbose_ops (bool): If True, prints debug information for operations.
        """
        self.memories = {}  # Dictionary to store memory_id -> MemoryItem
        self.verbose = verbose_ops
        if self.verbose:
            print("[MemoryStore] Initialized an empty memory store.")

    def add_memory_item(self, memory_item_instance):
        """
        Adds a new MemoryItem to the store.

        Args:
            memory_item_instance (MemoryItem): The memory item to add.
        """
        self.memories[memory_item_instance.id] = memory_item_instance
        if self.verbose:
            print(f"[MemoryStore] ADDED: '{memory_item_instance.text[:70]}' (ID: {memory_item_instance.id})")

    def get_memory_item_by_id(self, memory_id):
        """
        Retrieves a MemoryItem by its ID and marks it as accessed.

        Args:
            memory_id (str): The ID of the memory item.

        Returns:
            MemoryItem or None: The memory item if found, else None.
        """
        if memory_id in self.memories:
            self.memories[memory_id].mark_accessed()  # Mark as accessed
            return self.memories[memory_id]
        return None

    def update_existing_memory_item(self, memory_id, new_text_content, contributing_turn_indices):
        """
        Updates the text and embedding of an existing memory item.

        Args:
            memory_id (str): The ID of the memory to update.
            new_text_content (str): The new text content.
            contributing_turn_indices (list): Additional turn indices to add.

        Returns:
            bool: True if update succeeded, False otherwise.
        """
        if memory_id in self.memories:
            memory_to_update = self.memories[memory_id]
            original_text_preview = memory_to_update.text[:70]
            # Update text and embedding
            memory_to_update.text = new_text_content
            memory_to_update.embedding = get_embedding(new_text_content, verbose=self.verbose)
            memory_to_update.creation_timestamp = datetime.now()
            # Add new contributing turn indices if not already present
            for turn_idx in contributing_turn_indices:
                if turn_idx not in memory_to_update.source_turn_indices:
                    memory_to_update.source_turn_indices.append(turn_idx)
            memory_to_update.mark_accessed()
            if self.verbose:
                print(f"[MemoryStore] UPDATED ID {memory_id}: From '{original_text_preview}' TO '{new_text_content[:70]}'")
            return True
        if self.verbose:
            print(f"[MemoryStore] UPDATE FAILED: ID {memory_id} not found.")
        return False

    def find_semantically_similar_memories(self, query_text_embedding, top_s_results=3, similarity_threshold=0.5):
        """
        Finds memories most similar to a query embedding.

        Args:
            query_text_embedding (np.ndarray): Embedding of the query text.
            top_s_results (int): Max number of results to return.
            similarity_threshold (float): Minimum similarity score to consider.

        Returns:
            list: Tuples of (MemoryItem, similarity_score).
        """
        # Return empty if no memories or invalid query embedding
        if not self.memories or query_text_embedding is None or query_text_embedding.size == 0:
            return []
        # Filter out memories with invalid embeddings
        valid_memories_for_similarity = [
            (mid, self.memories[mid].embedding) for mid in self.memories
            if self.memories[mid].embedding is not None and 
               self.memories[mid].embedding.size > 0 and 
               np.any(self.memories[mid].embedding)  # Ensure not all zeros
        ]
        if not valid_memories_for_similarity:
            return []
        valid_memory_ids = [item[0] for item in valid_memories_for_similarity]
        valid_memory_embeddings = np.array([item[1] for item in valid_memories_for_similarity])
        # Handle case of single valid memory
        if valid_memory_embeddings.ndim == 1:
            valid_memory_embeddings = valid_memory_embeddings.reshape(1, -1)
        if query_text_embedding.ndim == 1:
            query_text_embedding = query_text_embedding.reshape(1, -1)

        # Compute cosine similarity between query and all valid memories
        similarities_vector = cosine_similarity(query_text_embedding, valid_memory_embeddings)[0]
        sorted_similarity_indices = np.argsort(similarities_vector)[::-1]
        retrieved_similar_memories = []
        # Collect top results above threshold
        for i in range(min(top_s_results, len(sorted_similarity_indices))):
            idx = sorted_similarity_indices[i]
            similarity_score = similarities_vector[idx]
            if similarity_score >= similarity_threshold:
                retrieved_similar_memories.append((self.memories[valid_memory_ids[idx]], similarity_score))
            else:
                break
        return retrieved_similar_memories

    def clear_store(self):
        """
        Clears all memories from the store.
        """
        self.memories = {}
        if self.verbose:
            print("[MemoryStore] Memory store has been cleared.")

# Instantiate the memory store (set verbose_ops=True for debug output)
mem0_memory_store = MemoryStore(verbose_ops=False)
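
The retrieval logic in `find_semantically_similar_memories` can be checked offline without any API calls. The following is a minimal, dependency-free sketch of the same idea (rank by cosine similarity, keep the top k, drop anything below a threshold) using hand-written toy vectors. The helper names `cosine` and `top_k_above_threshold` and the toy data are hypothetical; the notebook itself uses scikit-learn's `cosine_similarity` on real embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def top_k_above_threshold(query_vec, memory_vecs, k=3, threshold=0.5):
    """Return (index, score) pairs for the k best matches that meet the threshold."""
    scored = [(idx, cosine(query_vec, vec)) for idx, vec in enumerate(memory_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return [(idx, score) for idx, score in scored[:k] if score >= threshold]

# Toy "embeddings" (assumed values for illustration only)
toy_memories = [
    [1.0, 0.0, 0.0],  # same direction as the query
    [0.9, 0.1, 0.0],  # close to the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.0, 0.0, 0.0],  # zero vector, like a failed embedding
]
results = top_k_above_threshold([1.0, 0.0, 0.0], toy_memories, k=3, threshold=0.5)
print(results)
```

Only the first two toy vectors pass the threshold; the orthogonal and zero vectors score 0.0 and are dropped, which mirrors how the store ignores unusable embeddings.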

### 4.2. Memory Extraction Function (`ϕ`)

In [11]:
def mem0_extract_salient_facts_from_turn(current_user_statement_text, recent_turns_window_text, current_turn_index_in_script, verbose=False):
    """
    Uses the LLM to extract concise, self-contained, declarative facts from a user's statement,
    considering recent conversation context. Returns a list of fact strings.

    Args:
        current_user_statement_text (str): The user's current statement.
        recent_turns_window_text (str): Recent conversation context as text.
        current_turn_index_in_script (int): Index of the current turn in the script.
        verbose (bool): If True, prints debug information.

    Returns:
        list: List of extracted fact strings.
    """
    global total_prompt_tokens_mem0_extraction, total_completion_tokens_mem0_extraction

    # Build the extraction prompt for the LLM
    extraction_prompt_template = f"""
    You are an AI expert in extracting key information from dialogue.
    Analyze 'New User Statement' in context of 'Recent Conversation Context'.
    Extract concise, self-contained, declarative facts representing new, important user-provided information from 'New User Statement'.
    Each fact: complete sentence. Focus: user's goals, plans, preferences, decisions, key entities.
    Avoid: questions, acknowledgements, fluff. If statement updates info, extract updated fact. Do NOT infer.
    Recent Conversation Context:
    ---BEGIN CONTEXT---
    {recent_turns_window_text if recent_turns_window_text else "(No prior context)"}
    ---END CONTEXT---
    New User Statement to Process: "{current_user_statement_text}"
    Output ONLY a valid JSON list of strings (facts). E.g.: ["Fact 1.", "Fact 2."]. Empty list [] if no new salient facts.
    Extracted Facts (JSON list):
    """

    # Prepare messages for the LLM call
    extraction_messages = [
        {"role": "system", "content": "Expert extraction AI. Output ONLY valid JSON list of facts."},
        {"role": "user", "content": extraction_prompt_template}
    ]

    # Call the LLM to extract facts
    llm_extraction_response_text, p_tokens, c_tokens = get_llm_chat_completion(
        extraction_messages, temperature=0.0, max_tokens=250, verbose=verbose
    )

    # Update global token counters
    total_prompt_tokens_mem0_extraction += p_tokens
    total_completion_tokens_mem0_extraction += c_tokens

    if verbose:
        print(f"[Extractor LLM] Raw Output: {llm_extraction_response_text}")

    # Attempt to parse the LLM's output as a JSON list of strings
    try:
        json_start_index = llm_extraction_response_text.find('[')
        json_end_index = llm_extraction_response_text.rfind(']')
        if json_start_index != -1 and json_end_index != -1 and json_end_index > json_start_index:
            json_string_candidate = llm_extraction_response_text[json_start_index : json_end_index+1]
            parsed_facts_list = json.loads(json_string_candidate)
            if isinstance(parsed_facts_list, list) and all(isinstance(fact, str) for fact in parsed_facts_list):
                if verbose or len(parsed_facts_list) > 0:
                    print(f"[Extractor LLM] Parsed {len(parsed_facts_list)} fact(s).")
                return parsed_facts_list
            if verbose:
                print(f"[Extractor LLM] Warning: Parsed JSON not list of strings: {parsed_facts_list}. Returning [].")
        elif verbose:
            print(f"[Extractor LLM] Warning: No valid JSON list brackets in: '{llm_extraction_response_text}'. Returning [].")
    except Exception as e:
        if verbose:
            print(f"[Extractor LLM] Error parsing JSON: {e}. Response: '{llm_extraction_response_text}'. Returning [].")
    return []
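
The parsing step above tolerates extra text around the JSON by slicing from the first `[` to the last `]`. A standalone sketch of that technique, which can be tested without calling the LLM, might look like this. The function name `parse_json_list_of_strings` and the sample strings are hypothetical.

```python
import json

def parse_json_list_of_strings(raw_text):
    """Slice the outermost [...] span and parse it; return [] on any problem."""
    start, end = raw_text.find("["), raw_text.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        parsed = json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        return []
    # Accept only a flat list of strings, as the extraction prompt requests
    if isinstance(parsed, list) and all(isinstance(item, str) for item in parsed):
        return parsed
    return []

# Sample replies (made up for illustration)
chatty_reply = 'Here are the facts: ["The budget is $7500.", "The audience is aged 18-25."] Hope that helps.'
print(parse_json_list_of_strings(chatty_reply))
print(parse_json_list_of_strings("No new facts were found."))
```

One limitation of this approach: if the surrounding text itself contains square brackets, the slice is no longer valid JSON and the function returns an empty list.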

### 4.3. Memory Update Logic (ADD, UPDATE, NOOP)

In [12]:
# Number of similar memories to consider for update decision
S_SIMILAR_MEMORIES_FOR_UPDATE_DECISION = 3

def mem0_decide_memory_operation_with_llm(candidate_fact_text, similar_existing_memories_list, verbose=False):
    """
    Uses the LLM to decide whether to ADD, UPDATE, or NOOP a candidate fact in memory,
    based on its similarity to existing memories.

    Args:
        candidate_fact_text (str): The new fact extracted from user input.
        similar_existing_memories_list (list): List of (MemoryItem, similarity_score) tuples.
        verbose (bool): If True, prints debug information.

    Returns:
        dict: JSON object with operation decision, e.g.,
              {"operation": "ADD"} or
              {"operation": "UPDATE", "target_memory_id": "ID", "updated_memory_text": "Text"} or
              {"operation": "NOOP"}
    """
    global total_prompt_tokens_mem0_update, total_completion_tokens_mem0_update

    # Prepare prompt segment describing similar memories, if any
    similar_memories_prompt_segment = "No highly similar memories found."
    if similar_existing_memories_list:
        formatted_list = [
            f"  {i+1}. ID: {mem.id}, Sim: {sim_score:.4f}, Text: '{mem.text}'"
            for i, (mem, sim_score) in enumerate(similar_existing_memories_list)
        ]
        similar_memories_prompt_segment = "Existing Similar Memories:\n" + "\n".join(formatted_list)
    
    # Build the update decision prompt for the LLM
    formatted_update_decision_prompt = f"""
    AI Memory Consolidation Module.
    Task: Integrate 'New Candidate Fact' into memory store.
    New Candidate Fact: "{candidate_fact_text}"
    {similar_memories_prompt_segment}
    Decide ONE operation: ADD, UPDATE, or NOOP.
    Rules:
    1. ADD: If new info, not covered by similar memories (or no similar found).
    2. UPDATE: If fact corrects, makes current, or adds essential detail to ONE similar memory, superseding it. Specify 'target_memory_id' (from list) and 'updated_memory_text' (usually New Candidate Fact text).
    3. NOOP: If redundant or no new substantive value over existing similar memories.
    Output STRICTLY JSON: {{ "operation": "ADD" }} OR {{ "operation": "UPDATE", "target_memory_id": "ID", "updated_memory_text": "Text" }} OR {{ "operation": "NOOP" }}
    Decision (JSON object):
    """
    # Prepare messages for the LLM call
    update_decision_messages = [
        {"role": "system", "content": "Expert memory AI. Output ONLY valid JSON decision as instructed."},
        {"role": "user", "content": formatted_update_decision_prompt}
    ]

    # Call the LLM to get the update decision
    llm_decision_response_text, p_tokens, c_tokens = get_llm_chat_completion(
        update_decision_messages, temperature=0.0, max_tokens=200, verbose=verbose
    )

    # Update global token counters
    total_prompt_tokens_mem0_update += p_tokens
    total_completion_tokens_mem0_update += c_tokens

    if verbose:
        print(f"[Updater LLM] Raw Output: {llm_decision_response_text}")

    # Attempt to parse the LLM's output as a JSON object
    try:
        json_start_index = llm_decision_response_text.find('{')
        json_end_index = llm_decision_response_text.rfind('}')
        if json_start_index != -1 and json_end_index != -1 and json_end_index > json_start_index:
            json_string_candidate = llm_decision_response_text[json_start_index : json_end_index+1]
            parsed_decision_json = json.loads(json_string_candidate)
            op = parsed_decision_json.get("operation")
            # Check for valid operation types and required fields
            if op in ["ADD", "NOOP"]:
                if verbose or op == "NOOP":
                    print(f"[Updater LLM] Parsed decision: {op}")
                return parsed_decision_json
            elif op == "UPDATE" and "target_memory_id" in parsed_decision_json and "updated_memory_text" in parsed_decision_json:
                if verbose:
                    print(f"[Updater LLM] Parsed decision: {op}")
                return parsed_decision_json
            if verbose:
                print(f"[Updater LLM] Warning: Invalid decision structure: {parsed_decision_json}. Defaulting to ADD.")
        elif verbose:
            print(f"[Updater LLM] Warning: No valid JSON object brackets in: '{llm_decision_response_text}'. Defaulting to ADD.")
    except Exception as e:
        if verbose:
            print(f"[Updater LLM] Error parsing JSON: {e}. Response: '{llm_decision_response_text}'. Defaulting to ADD.")
    # Default fallback: ADD operation
    return {"operation": "ADD"}
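
The function above checks that an UPDATE decision carries the required keys, but it does not check that `target_memory_id` refers to one of the memories shown to the LLM. A possible extra validation step is sketched below; `validate_decision` and `VALID_OPERATIONS` are hypothetical names, and falling back to ADD simply follows the default used in this notebook.

```python
VALID_OPERATIONS = {"ADD", "UPDATE", "NOOP"}

def validate_decision(decision, candidate_ids):
    """Return a safe decision dict, falling back to ADD when the input is malformed."""
    fallback = {"operation": "ADD"}
    if not isinstance(decision, dict):
        return fallback
    operation = decision.get("operation")
    if operation not in VALID_OPERATIONS:
        return fallback
    if operation != "UPDATE":
        return {"operation": operation}
    target_id = decision.get("target_memory_id")
    new_text = decision.get("updated_memory_text")
    # An UPDATE is only usable if it names a known memory and carries non-empty text
    if (isinstance(target_id, str) and target_id in candidate_ids
            and isinstance(new_text, str) and new_text.strip()):
        return {
            "operation": "UPDATE",
            "target_memory_id": target_id,
            "updated_memory_text": new_text,
        }
    return fallback

# Example with made-up IDs
known_ids = ["mem-1", "mem-2"]
print(validate_decision({"operation": "UPDATE", "target_memory_id": "mem-9",
                         "updated_memory_text": "Budget is $7500."}, known_ids))
```

In the example, the LLM names an ID that was never offered, so the decision is downgraded to ADD rather than silently failing inside the store's update method.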

### 4.4. Orchestrating Memory Extraction and Update

In [13]:
def mem0_process_user_statement_for_memory(user_statement_text, recent_turns_context_text, memory_store_instance, current_turn_idx, turn_log_entry, verbose=False):
    """
    Orchestrates the process of extracting salient facts from a user's statement,
    determining memory operations (ADD, UPDATE, NOOP) for each fact, and updating the memory store accordingly.

    Args:
        user_statement_text (str): The user's statement to process.
        recent_turns_context_text (str): Recent conversation context for extraction.
        memory_store_instance (MemoryStore): The memory store to update.
        current_turn_idx (int): Index of the current turn in the script.
        turn_log_entry (dict): Log entry for this turn (for detailed logging).
        verbose (bool): If True, prints detailed debug information.
    """
    if verbose:
        print(f"[MemoryOrchestrator] Processing: '{user_statement_text[:60]}...' (Turn: {current_turn_idx})")
    turn_log_entry['extraction_details'] = []
    turn_log_entry['update_details'] = []

    # Step 1: Extract candidate facts from the user statement using the LLM extractor
    candidate_facts_list = mem0_extract_salient_facts_from_turn(
        user_statement_text, recent_turns_context_text, current_turn_idx, verbose=verbose
    )
    turn_log_entry['extracted_facts_raw'] = list(candidate_facts_list)
    if not candidate_facts_list:
        if verbose:
            print("[MemoryOrchestrator] No candidate facts extracted.")
        return
    # Always reported, even when not verbose: the empty case has already returned above
    print(f"[MemoryOrchestrator] Extracted {len(candidate_facts_list)} fact(s).")

    # Step 2: For each extracted fact, determine the appropriate memory operation
    for fact_idx, individual_fact_text in enumerate(candidate_facts_list):
        extraction_detail = {
            "candidate_fact": individual_fact_text,
            "similar_memories_checked": [],
            "llm_decision": None
        }
        if verbose:
            print(f"\n[MemoryOrchestrator] -> Processing Fact {fact_idx+1}: '{individual_fact_text[:60]}...'")
        # Generate embedding for the candidate fact
        candidate_fact_embedding = get_embedding(individual_fact_text, verbose=verbose)
        # Find similar memories in the store
        similar_memories_found = memory_store_instance.find_semantically_similar_memories(
            candidate_fact_embedding, top_s_results=S_SIMILAR_MEMORIES_FOR_UPDATE_DECISION
        )
        if similar_memories_found:
            if verbose:
                print(f"[MemoryOrchestrator]    Found {len(similar_memories_found)} similar memories.")
            for mem, sim_score in similar_memories_found:
                extraction_detail['similar_memories_checked'].append({
                    'id': mem.id,
                    'text': mem.text,
                    'similarity': sim_score
                })
                if verbose:
                    print(f"[MemoryOrchestrator]      ID: {mem.id}, Sim: {sim_score:.2f}, Text: '{mem.text[:50]}...'")
        elif verbose:
            print("[MemoryOrchestrator]    No highly similar memories found.")

        # Step 3: Use LLM to decide on ADD, UPDATE, or NOOP operation
        llm_decision_json = mem0_decide_memory_operation_with_llm(
            individual_fact_text, similar_memories_found, verbose=verbose
        )
        operation_to_perform = llm_decision_json.get("operation")
        extraction_detail['llm_decision'] = llm_decision_json
        print(f"[MemoryOrchestrator] Fact '{individual_fact_text[:30]}...': LLM Decision -> {operation_to_perform}")
        turn_log_entry['update_details'].append(extraction_detail)  # Log before executing

        # Step 4: Execute the decided memory operation
        if operation_to_perform == "ADD":
            # Add new memory item to the store
            new_memory_item = MemoryItem(
                text_content=individual_fact_text,
                source_turn_indices_list=[current_turn_idx],
                verbose_embedding=verbose
            )
            memory_store_instance.add_memory_item(new_memory_item)
        elif operation_to_perform == "UPDATE":
            # Update an existing memory item
            target_memory_id = llm_decision_json.get("target_memory_id")
            updated_fact_text = llm_decision_json.get("updated_memory_text")
            # Check if the target ID is plausible (in similar memories)
            is_plausible_target = any(mem.id == target_memory_id for mem, _ in similar_memories_found)
            if target_memory_id and updated_fact_text and is_plausible_target:
                memory_store_instance.update_existing_memory_item(
                    target_memory_id, updated_fact_text, [current_turn_idx]
                )
            elif target_memory_id and updated_fact_text:
                # Try to update even if not in similar list; fallback to add if fails
                if verbose:
                    print(f"[MemoryOrchestrator] Warning: UPDATE target_id '{target_memory_id}' not in similar list. Attempting update.")
                if not memory_store_instance.update_existing_memory_item(
                    target_memory_id, updated_fact_text, [current_turn_idx]
                ):
                    if verbose:
                        print("[MemoryOrchestrator] UPDATE failed. Adding as new.")
                    memory_store_instance.add_memory_item(MemoryItem(individual_fact_text, [current_turn_idx], verbose))
            else:
                # Malformed UPDATE, fallback to add as new
                if verbose:
                    print("[MemoryOrchestrator] Warning: UPDATE malformed/implausible. Adding as new.")
                memory_store_instance.add_memory_item(MemoryItem(individual_fact_text, [current_turn_idx], verbose))
        elif operation_to_perform == "NOOP":
            # Do nothing for redundant facts
            if verbose:
                print(f"[MemoryOrchestrator]    NOOP for fact: '{individual_fact_text[:70]}...'")
        else:
            # Fallback: treat as ADD if unknown operation
            if verbose:
                print(f"[MemoryOrchestrator] Warning: Unknown op '{operation_to_perform}'. Defaulting to ADD.")
            memory_store_instance.add_memory_item(MemoryItem(individual_fact_text, [current_turn_idx], verbose))
        time.sleep(0.1)  # Small delay to avoid API rate limits
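
The fallback rules in Step 4 above can be exercised without any API calls. The following is a minimal sketch under stated assumptions, not the notebook's implementation: `ToyStore` and `apply_decision` are hypothetical stand-ins for `MemoryStore` and the Step 4 branch, memories are held in a plain dict of ID to text, and the two UPDATE branches of the orchestrator are collapsed into a single "try the update, otherwise add" rule. The `$7500` figure is an invented example value.

```python
import uuid

class ToyStore:
    """Hypothetical stand-in for MemoryStore: maps memory ID -> text."""
    def __init__(self):
        self.memories = {}

    def add(self, text):
        mem_id = str(uuid.uuid4())
        self.memories[mem_id] = text
        return mem_id

    def update(self, mem_id, new_text):
        # Returns False when the ID is unknown, so the caller can fall back
        if mem_id not in self.memories:
            return False
        self.memories[mem_id] = new_text
        return True

def apply_decision(store, fact_text, decision):
    """Apply an ADD/UPDATE/NOOP decision; anything unusable degrades to ADD."""
    op = decision.get("operation")
    if op == "NOOP":
        return "NOOP"
    if op == "UPDATE":
        target_id = decision.get("target_memory_id")
        new_text = decision.get("updated_memory_text")
        if target_id and new_text and store.update(target_id, new_text):
            return "UPDATE"
    # ADD, unknown operations, and failed or malformed UPDATEs all land here
    store.add(fact_text)
    return "ADD"

store = ToyStore()
budget_id = store.add("Budget for social media ads is $5000.")

good_update = apply_decision(
    store,
    "Budget is now $7500.",  # invented example value
    {"operation": "UPDATE", "target_memory_id": budget_id,
     "updated_memory_text": "Budget for social media ads is $7500."},
)
bad_update = apply_decision(
    store,
    "Target audience is young adults aged 18-25.",
    {"operation": "UPDATE", "target_memory_id": "no-such-id",
     "updated_memory_text": "ignored"},
)
print(good_update, bad_update, len(store.memories))
```

Defaulting to ADD keeps a fact even when the LLM's decision is unusable, at the cost of occasionally storing a duplicate that a later UPDATE or NOOP decision has to reconcile.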

### 4.5. Memory Retrieval for Answering User Queries

In [14]:
K_MEMORIES_TO_RETRIEVE_FOR_QUERY = 3

def mem0_retrieve_and_format_memories_for_llm_query(
    user_query_text, memory_store_instance, turn_log_entry, 
    top_k_results=K_MEMORIES_TO_RETRIEVE_FOR_QUERY, verbose=False
):
    """
    Retrieves the top-k semantically relevant memories for a user query and formats them for LLM input.

    Args:
        user_query_text (str): The user's query.
        memory_store_instance (MemoryStore): The memory store to search.
        turn_log_entry (dict): Log entry for this turn (for detailed logging).
        top_k_results (int): Number of top relevant memories to retrieve.
        verbose (bool): If True, prints debug information.

    Returns:
        str: Formatted string of relevant memories for LLM, or a message if none found.
    """
    # Initialize log for retrieved memories
    turn_log_entry['retrieved_memories_for_query'] = []

    # Return early if query is empty or memory store is empty
    if not user_query_text.strip() or not memory_store_instance.memories:
        return "(No relevant memories in store or query empty.)"
        
    # Generate embedding for the user query
    query_embedding = get_embedding(user_query_text, verbose=verbose)

    # Retrieve top-k semantically similar memories
    retrieved_memories_with_scores = memory_store_instance.find_semantically_similar_memories(
        query_embedding, top_s_results=top_k_results
    )

    # If no relevant memories found, return message
    if not retrieved_memories_with_scores:
        return "(No relevant memories found for this query.)"
    
    # Format the retrieved memories for LLM input
    formatted_memories_string = "Based on my memory, here's relevant information:\n"
    for i, (mem_item, similarity_score) in enumerate(retrieved_memories_with_scores):
        memory_store_instance.get_memory_item_by_id(mem_item.id)  # Mark as accessed
        formatted_memories_string += f"  {i+1}. {mem_item.text} (Similarity: {similarity_score:.3f})\n"
        # Log retrieved memory details
        turn_log_entry['retrieved_memories_for_query'].append({
            'id': mem_item.id, 
            'text': mem_item.text, 
            'similarity': similarity_score
        })
    if verbose:
        print(f"[Retriever] Formatted memories for LLM: \n{formatted_memories_string}")
    return formatted_memories_string.strip()
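
The retrieval step reduces to ranking stored embeddings by cosine similarity to the query embedding and keeping the top k. The sketch below illustrates that idea with hand-written 2-D vectors and the standard library only. `cosine`, `top_k_by_cosine`, and `format_for_prompt` are hypothetical helpers, not the notebook's `find_semantically_similar_memories`, and the toy vectors are assumed values standing in for real embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_by_cosine(query_vec, memories, k=3):
    """Rank (text, vector) pairs by similarity to the query and keep the best k."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in memories]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def format_for_prompt(scored_memories):
    """Numbered list in the same shape the retriever above passes to the LLM."""
    lines = ["Based on my memory, here's relevant information:"]
    for i, (text, score) in enumerate(scored_memories, start=1):
        lines.append(f"  {i}. {text} (Similarity: {score:.3f})")
    return "\n".join(lines)

# Toy 2-D "embeddings" (assumed values, purely for illustration)
toy_memories = [
    ("Campaign name: New Marketing Campaign.", [1.0, 0.0]),
    ("Target audience: young adults aged 18-25.", [0.0, 1.0]),
    ("Budget: $5000 for social media ads.", [1.0, 1.0]),
]
top_two = top_k_by_cosine([1.0, 0.1], toy_memories, k=2)
print(format_for_prompt(top_two))
```

Because only k memories are formatted into the prompt, the size of the memory section depends on k and the length of each fact, not on how long the conversation has run.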

### 4.6. Main Function: Running the Mem0-Powered Conversation

In [15]:
# Controls verbosity for Mem0 run (set True for detailed output)
VERBOSE_MEM0_RUN = False

# Number of recent turns to use as context for memory extraction
M_RECENT_RAW_TURNS_FOR_EXTRACTION_CONTEXT = 3

# Number of recent turns to keep in short-term chat history for LLM context
SHORT_TERM_CHAT_HISTORY_WINDOW = 2

def run_mem0_approach_conversation(script, memory_store_instance):
    """
    Runs a conversation using the Mem0 memory-powered approach.
    Handles both user statements (for memory extraction/update) and queries (for memory retrieval/answering).
    Tracks token usage and logs details for each turn.

    Args:
        script (list): List of dicts, each representing a user turn with 'content' and 'type'.
        memory_store_instance (MemoryStore): The memory store to use for this run.
    """
    global total_prompt_tokens_mem0_conversation, total_completion_tokens_mem0_conversation, mem0_run_log

    # Reset memory store and set verbosity
    memory_store_instance.clear_store()
    memory_store_instance.verbose = VERBOSE_MEM0_RUN

    print("--- Running Mem0-Powered Conversational Approach ---")

    # Raw log for extraction context (for fact extraction)
    raw_conversation_log_for_extraction_context = []

    # Short-term chat history for LLM context (system prompt + last N turns)
    current_short_term_llm_chat_history = [
        {"role": "system", "content": "Helpful AI assistant. Use general knowledge and 'Relevant Information from Memory' to answer concisely."}
    ]

    for turn_index, turn_data in enumerate(script):
        user_message_content = turn_data['content']
        turn_type = turn_data['type']
        turn_log_entry = {
            "turn": turn_index + 1,
            "type": turn_type,
            "user_content": user_message_content
        }

        print(f"\n--- Mem0 Turn {turn_index + 1}/{len(script)} ({turn_type}) ---")
        print(f"User: {user_message_content[:80]}...")

        assistant_response_text = "(Ack/Internal Processing)"

        # Handle user statements (add/update memory)
        if turn_type in ('statement', 'statement_update'):
            # Get recent turns for extraction context
            start_idx = max(0, len(raw_conversation_log_for_extraction_context) - M_RECENT_RAW_TURNS_FOR_EXTRACTION_CONTEXT)
            recent_turns_text = "\n".join(raw_conversation_log_for_extraction_context[start_idx:])

            # Extract facts and update memory store
            mem0_process_user_statement_for_memory(
                user_message_content,
                recent_turns_text,
                memory_store_instance,
                turn_index,
                turn_log_entry,
                verbose=VERBOSE_MEM0_RUN
            )

            # Provide acknowledgement response
            assistant_response_text = "Okay, noted." if turn_type == 'statement' else "Okay, updated."
            print(f"Assistant (Ack): {assistant_response_text}")

            # Log response and token usage (no LLM call for ack)
            turn_log_entry['assistant_response_conversational'] = assistant_response_text
            turn_log_entry['prompt_tokens_conversational_turn'] = 0
            turn_log_entry['completion_tokens_conversational_turn'] = 0

        # Handle user queries (retrieve memory and answer)
        elif turn_type == 'query':
            if VERBOSE_MEM0_RUN:
                print(f"[Mem0 Run] Processing query: '{user_message_content}'")

            # Retrieve relevant memories for the query
            retrieved_memories_text = mem0_retrieve_and_format_memories_for_llm_query(
                user_message_content,
                memory_store_instance,
                turn_log_entry,
                verbose=VERBOSE_MEM0_RUN
            )

            # Prepare LLM input: short-term chat + user query + relevant memories
            messages_for_llm = list(current_short_term_llm_chat_history)
            messages_for_llm.append({
                "role": "user",
                # Label matches the 'Relevant Information from Memory' wording used in the system prompt
                "content": f"User Query: '{user_message_content}'\n\nRelevant Information from Memory:\n{retrieved_memories_text}"
            })

            # Get assistant response from LLM
            assistant_response_text, p_tokens, c_tokens = get_llm_chat_completion(
                messages_for_llm,
                max_tokens=120,
                verbose=VERBOSE_MEM0_RUN
            )

            # Update global token counters
            total_prompt_tokens_mem0_conversation += p_tokens
            total_completion_tokens_mem0_conversation += c_tokens

            print(f"Assistant: {assistant_response_text[:80]}...")
            if VERBOSE_MEM0_RUN:
                print(f"  Tokens for query response (P: {p_tokens}, C: {c_tokens})")

            # Log response and token usage
            turn_log_entry['assistant_response_conversational'] = assistant_response_text
            turn_log_entry['prompt_tokens_conversational_turn'] = p_tokens
            turn_log_entry['completion_tokens_conversational_turn'] = c_tokens

        # Update extraction context log (for fact extraction in future turns)
        raw_conversation_log_for_extraction_context.append(f"T{turn_index+1} U: {user_message_content}")
        raw_conversation_log_for_extraction_context.append(f"T{turn_index+1} A: {assistant_response_text}")

        # Update short-term chat history for LLM context
        current_short_term_llm_chat_history.extend([
            {"role": "user", "content": user_message_content},
            {"role": "assistant", "content": assistant_response_text}
        ])
        # Truncate chat history to keep only the most recent N turns (plus system prompt)
        if len(current_short_term_llm_chat_history) > (1 + SHORT_TERM_CHAT_HISTORY_WINDOW * 2):
            current_short_term_llm_chat_history = [current_short_term_llm_chat_history[0]] + \
                current_short_term_llm_chat_history[-(SHORT_TERM_CHAT_HISTORY_WINDOW*2):]

        # Log memory store size after this turn
        turn_log_entry['mem_store_size_after_turn'] = len(memory_store_instance.memories)
        mem0_run_log.append(turn_log_entry)

        if VERBOSE_MEM0_RUN:
            print(f"  Current Mem0 Store Size: {len(memory_store_instance.memories)}")

        # Small delay to avoid API rate limits
        time.sleep(0.2)

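    print("\n--- Mem0-Powered Conversational Approach Summary (Global Counters) ---")
    # Conversational totals only; extraction and update tokens are tracked in their own global counters
    print(f"Conversational prompt tokens: {total_prompt_tokens_mem0_conversation}, "
          f"completion tokens: {total_completion_tokens_mem0_conversation}")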

    if VERBOSE_MEM0_RUN:
        print("\n--- Final Content of Mem0 Memory Store ---")
        if memory_store_instance.memories:
            for mem_id, mem_item in memory_store_instance.memories.items():
                print(f"  ID: {mem_id}\n    Text: '{mem_item.text}'\n    Created: {mem_item.creation_timestamp.strftime('%H:%M:%S')}, Accessed: {mem_item.access_count}, Last: {mem_item.last_accessed_timestamp.strftime('%H:%M:%S')}, Sources: {mem_item.source_turn_indices}")
        else:
            print("  Mem0 memory store is empty.")
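
The history-truncation step inside the loop above keeps the system prompt plus the most recent `SHORT_TERM_CHAT_HISTORY_WINDOW` user/assistant exchanges. A minimal standalone sketch of that sliding window follows; `truncate_history` is a hypothetical helper, and the explicit guard for a window of zero is an addition of this sketch rather than something the notebook handles.

```python
def truncate_history(history, window):
    """Keep the system prompt plus the last `window` user/assistant exchanges."""
    system_prompt, rest = history[0], history[1:]
    # Guard window == 0 explicitly: a slice of [-0:] would return the whole list
    tail = rest[-(window * 2):] if window > 0 else []
    return [system_prompt] + tail

# Build a system prompt followed by four exchanges
history = [{"role": "system", "content": "Helpful AI assistant."}]
for turn in range(1, 5):
    history.append({"role": "user", "content": f"user message {turn}"})
    history.append({"role": "assistant", "content": f"assistant reply {turn}"})

trimmed = truncate_history(history, window=2)
print([message["content"] for message in trimmed])
```

With a window of 2, only exchanges 3 and 4 survive alongside the system prompt, so the conversational prompt carries at most five messages before the query and retrieved memories are appended.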

## 5. Execution and Comparative Analysis

In [16]:
# --- Main Execution Block for Comparison ---
print("Starting comparative analysis of Raw vs. Mem0 approaches.\n")

# --- Run Raw/Full-Context Approach ---
print("STEP 1: Running Raw/Full-Context Approach...")

# Reset all global token counters and logs to ensure a clean run
reset_all_token_counters_and_logs()

# Set verbosity for the raw approach (set to True for detailed logs)
VERBOSE_RAW_RUN = False

# Run the raw/full-context approach using the conversation script
run_raw_full_context_approach(conversation_script)

# Capture final token counts for the raw approach
final_raw_prompt_tokens = total_prompt_tokens_raw
final_raw_completion_tokens = total_completion_tokens_raw
final_raw_total_tokens = final_raw_prompt_tokens + final_raw_completion_tokens

print("Raw approach run complete.")

Starting comparative analysis of Raw vs. Mem0 approaches.

STEP 1: Running Raw/Full-Context Approach...
[Counters & Logs] All token counters and run logs have been reset.
--- Running Raw/Full-Context Approach ---
Raw Turn 1 User: Hi, let's start planning the 'New Marketing Campaign'. My primary goal is to inc...
Raw Turn 1 Assistant: Great! Let’s break this down into actionable steps to achieve your goal of incre...
Raw Turn 2 User: For this campaign, the target audience is young adults aged 18-25....
Raw Turn 2 Assistant: Got it! Targeting young adults aged 18-25 is a great demographic for increasing ...
Raw Turn 3 User: I want to allocate a budget of $5000 for social media ads for the New Marketing ...
Raw Turn 3 Assistant: With a $5000 budget for social media ads, we can create a focused and impactful ...
Raw Turn 4 User: What's the main goal for the New Marketing Campaign?...
Raw Turn 4 Assistant: The main goal for the **New Marketing Campaign** is to **increase brand awarenes...
R

In [17]:
# --- Run Mem0-Powered Approach ---
print("STEP 2: Running Mem0-Powered Conversational Approach...")

# NOTE: If running this cell independently, ensure Mem0 counters and logs are reset for a clean run.
# The counters are reset in reset_all_token_counters_and_logs() if called previously.
# If you want to run only the Mem0 part, uncomment the next line:
# reset_all_token_counters_and_logs()

# Clear previous Mem0 run log if any (raw_run_log is not reset here).
# No `global` statement is needed: this cell already runs at module level.
mem0_run_log = []

# Set verbosity for Mem0 run and memory store operations
VERBOSE_MEM0_RUN = False  # Set to True for detailed Mem0 run logs
mem0_memory_store.verbose = VERBOSE_MEM0_RUN

# Run the Mem0-powered conversational approach using the conversation script
run_mem0_approach_conversation(conversation_script, mem0_memory_store)

# Collect final Mem0 token counts for each sub-task and overall
final_mem0_conv_prompt_tokens = total_prompt_tokens_mem0_conversation
final_mem0_conv_completion_tokens = total_completion_tokens_mem0_conversation
final_mem0_extr_prompt_tokens = total_prompt_tokens_mem0_extraction
final_mem0_extr_completion_tokens = total_completion_tokens_mem0_extraction
final_mem0_upd_prompt_tokens = total_prompt_tokens_mem0_update
final_mem0_upd_completion_tokens = total_completion_tokens_mem0_update

# Calculate overall Mem0 prompt, completion, and total tokens
final_mem0_overall_prompt_tokens = (
    final_mem0_conv_prompt_tokens + final_mem0_extr_prompt_tokens + final_mem0_upd_prompt_tokens
)
final_mem0_overall_completion_tokens = (
    final_mem0_conv_completion_tokens + final_mem0_extr_completion_tokens + final_mem0_upd_completion_tokens
)
final_mem0_overall_total_tokens = (
    final_mem0_overall_prompt_tokens + final_mem0_overall_completion_tokens
)

print("Mem0 approach run complete.")

STEP 2: Running Mem0-Powered Conversational Approach...
--- Running Mem0-Powered Conversational Approach ---

--- Mem0 Turn 1/10 (statement) ---
User: Hi, let's start planning the 'New Marketing Campaign'. My primary goal is to inc...
[Extractor LLM] Parsed 2 fact(s).
[MemoryOrchestrator] Extracted 2 fact(s).
[MemoryOrchestrator] Fact 'The user wants to start planni...': LLM Decision -> ADD
[MemoryOrchestrator] Fact 'The user's primary goal for th...': LLM Decision -> ADD
Assistant (Ack): Okay, noted.

--- Mem0 Turn 2/10 (statement) ---
User: For this campaign, the target audience is young adults aged 18-25....
[Extractor LLM] Parsed 1 fact(s).
[MemoryOrchestrator] Extracted 1 fact(s).
[MemoryOrchestrator] Fact 'The target audience for the 'N...': LLM Decision -> ADD
Assistant (Ack): Okay, noted.

--- Mem0 Turn 3/10 (statement) ---
User: I want to allocate a budget of $5000 for social media ads for the New Marketing ...
[Extractor LLM] Parsed 1 fact(s).
[MemoryOrchestrator] Extracted 1

In [18]:
# Prepare the data dictionary for comparative analysis DataFrame
# Each key is a column: 'Metric', 'Raw Approach', 'Mem0 Approach'
data = {
    'Metric': [
        'Prompt Tokens',                  # Total prompt tokens used (Raw vs Mem0 overall)
        'Completion Tokens',              # Total completion tokens used (Raw vs Mem0 overall)
        'Total Tokens',                   # Total tokens (Prompt + Completion)
        '',                               # Separator row for readability
        'Mem0: Conversational Prompt',    # Mem0: Prompt tokens for conversational (query) turns
        'Mem0: Conversational Completion',# Mem0: Completion tokens for conversational (query) turns
        'Mem0: Extraction Prompt',        # Mem0: Prompt tokens for extraction sub-task
        'Mem0: Extraction Completion',    # Mem0: Completion tokens for extraction sub-task
        'Mem0: Update Logic Prompt',      # Mem0: Prompt tokens for update logic sub-task
        'Mem0: Update Logic Completion'   # Mem0: Completion tokens for update logic sub-task
    ],
    'Raw Approach': [
        final_raw_prompt_tokens,          # Raw approach: total prompt tokens
        final_raw_completion_tokens,      # Raw approach: total completion tokens
        final_raw_total_tokens,           # Raw approach: total tokens
        '',                              # Separator (blank)
        '-',                             # Not applicable for Raw approach
        '-',                             # Not applicable for Raw approach
        '-',                             # Not applicable for Raw approach
        '-',                             # Not applicable for Raw approach
        '-',                             # Not applicable for Raw approach
        '-'                              # Not applicable for Raw approach
    ],
    'Mem0 Approach': [
        final_mem0_overall_prompt_tokens,     # Mem0: overall prompt tokens (sum of all sub-tasks)
        final_mem0_overall_completion_tokens, # Mem0: overall completion tokens (sum of all sub-tasks)
        final_mem0_overall_total_tokens,      # Mem0: overall total tokens
        '',                                   # Separator (blank)
        final_mem0_conv_prompt_tokens,        # Mem0: conversational prompt tokens
        final_mem0_conv_completion_tokens,    # Mem0: conversational completion tokens
        final_mem0_extr_prompt_tokens,        # Mem0: extraction prompt tokens
        final_mem0_extr_completion_tokens,    # Mem0: extraction completion tokens
        final_mem0_upd_prompt_tokens,         # Mem0: update logic prompt tokens
        final_mem0_upd_completion_tokens      # Mem0: update logic completion tokens
    ]
}

In [19]:
# Create a DataFrame for comparative analysis using the prepared data dictionary
comparison_df = pd.DataFrame(data)

# Print analysis summary based on token usage
print("\n--- Analysis --- ")

# Case 1: Both approaches have zero tokens (likely an error or incomplete run)
if final_raw_total_tokens == 0 and final_mem0_overall_total_tokens == 0:
    print("Token counts are both zero. Ensure runs completed successfully and API calls were made.")

# Case 2: Mem0 approach used fewer tokens than Raw (token savings)
elif final_mem0_overall_total_tokens < final_raw_total_tokens:
    savings = final_raw_total_tokens - final_mem0_overall_total_tokens
    percentage_savings = (savings / final_raw_total_tokens) * 100 if final_raw_total_tokens > 0 else 0
    print(f"Mem0 was more token efficient, saving {savings} tokens ({percentage_savings:.2f}%) compared to Raw.")

# Case 3: Raw approach used fewer tokens than Mem0 (token overhead for Mem0)
elif final_raw_total_tokens < final_mem0_overall_total_tokens:
    overhead = final_mem0_overall_total_tokens - final_raw_total_tokens
    percentage_overhead = (overhead / final_raw_total_tokens) * 100 if final_raw_total_tokens > 0 else float('inf')
    print(f"Raw was more token efficient. Mem0 had an overhead of {overhead} tokens ({percentage_overhead:.2f}%).")
    print("(This can be expected for short conversations due to Mem0's extraction/update costs.)")

# Case 4: Both approaches used exactly the same number of tokens
else:
    print(f"Both approaches used the same number of tokens: {final_mem0_overall_total_tokens}.")

# General note about Mem0's advantage in longer conversations
print("\nKey benefit of Mem0: Token efficiency in *longer* conversations due to stable query prompt sizes.")


--- Analysis --- 
Mem0 was more token efficient, saving 3163 tokens (35.56%) compared to Raw.

Key benefit of Mem0: Token efficiency in *longer* conversations due to stable query prompt sizes.


In [20]:
# Add a 'Percentage Difference' column: the share of Raw's tokens that Mem0 saved,
#   (Raw Approach - Mem0 Approach) / Raw Approach * 100
# Positive values mean Mem0 used fewer tokens than Raw.
# Rows where either value is not an integer (the separator and Mem0-only rows),
# or where 'Raw Approach' is zero, are set to None.
comparison_df['Percentage Difference'] = comparison_df.apply(
    lambda row: (
        (row['Raw Approach'] - row['Mem0 Approach']) / row['Raw Approach'] * 100
        if isinstance(row['Raw Approach'], int) and isinstance(row['Mem0 Approach'], int) and row['Raw Approach'] != 0
        else None
    ),
    axis=1
)

In [21]:
comparison_df

|   | Metric | Raw Approach | Mem0 Approach | Percentage Difference |
|---|---|---|---|---|
| 0 | Prompt Tokens | 7492 | 5244.0 | 30.005339 |
| 1 | Completion Tokens | 1403 | 488.0 | 65.217391 |
| 2 | Total Tokens | 8895 | 5732.0 | 35.559303 |
| 3 |  |  |  |  |
| 4 | Mem0: Conversational Prompt | - | 814.0 |  |
| 5 | Mem0: Conversational Completion | - | 101.0 |  |
| 6 | Mem0: Extraction Prompt | - | 1522.0 |  |
| 7 | Mem0: Extraction Completion | - | 173.0 |  |
| 8 | Mem0: Update Logic Prompt | - | 2908.0 |  |
| 9 | Mem0: Update Logic Completion | - | 214.0 |  |
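
As a quick sanity check, the figures reported above can be re-derived from the per-sub-task rows. The snippet below hard-codes the numbers from this particular run (it does not read `comparison_df`), confirms that the three Mem0 sub-tasks sum to the overall totals, and computes how Mem0's tokens split across sub-tasks. The dictionary layout is just an illustrative choice.

```python
# Token counts copied from the comparison table for this run
raw = {"prompt": 7492, "completion": 1403}
mem0_parts = {
    "conversation": {"prompt": 814, "completion": 101},
    "extraction": {"prompt": 1522, "completion": 173},
    "update": {"prompt": 2908, "completion": 214},
}

# Overall Mem0 totals are the sums over the three sub-tasks
mem0_prompt = sum(part["prompt"] for part in mem0_parts.values())
mem0_completion = sum(part["completion"] for part in mem0_parts.values())
raw_total = raw["prompt"] + raw["completion"]
mem0_total = mem0_prompt + mem0_completion

# Percent of Raw's tokens that Mem0 saved
savings_pct = (raw_total - mem0_total) / raw_total * 100

# Share of Mem0's total spent on each sub-task
shares = {
    name: (part["prompt"] + part["completion"]) / mem0_total * 100
    for name, part in mem0_parts.items()
}

print(f"Raw total: {raw_total}, Mem0 total: {mem0_total}, savings: {savings_pct:.2f}%")
for name, share in shares.items():
    print(f"  {name}: {share:.1f}% of Mem0 tokens")
```

In this run the update-decision calls account for a little over half of Mem0's tokens, which makes the update prompt the most obvious target for optimization.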


## 6. Discussion of Expected Results and Further Improvements

After running the notebook with a capable LLM:

**Token Efficiency:**
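*   For short conversations, Mem0's extraction and update overhead can make its total token count comparable to or slightly higher than Raw's. In the 10-turn run above, Mem0 still came out ahead (5,732 vs. 8,895 total tokens, a 35.56% saving), even though extraction and update calls accounted for most of its usage. For longer conversations the gap should widen: Mem0's query prompt (query + K retrieved memories + a short chat window) stays roughly constant in size, while Raw's per-turn prompt (full history) grows linearly with the number of turns, so its cumulative prompt cost grows roughly quadratically.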
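
The scaling argument above can be made concrete with a toy cost model. Everything in this sketch is an assumption for illustration: the function names are hypothetical, every turn is treated as the same size, and the per-turn token figures (100, 300, 500) are invented rather than measured from this notebook.

```python
def raw_cumulative_prompt_tokens(num_turns, tokens_per_turn):
    """Full-context: the prompt at turn t resends all t turns so far."""
    return sum(t * tokens_per_turn for t in range(1, num_turns + 1))

def mem0_cumulative_prompt_tokens(num_turns, query_prompt_tokens, memory_overhead_tokens):
    """Mem0: each turn pays a fixed-size query prompt plus extraction/update overhead."""
    return num_turns * (query_prompt_tokens + memory_overhead_tokens)

# Invented per-turn sizes, purely illustrative
TOKENS_PER_TURN = 100         # average size of one exchange in the raw history
QUERY_PROMPT_TOKENS = 300     # query + retrieved memories + short chat window
MEMORY_OVERHEAD_TOKENS = 500  # extraction and update-decision prompts

for n in (5, 10, 20, 40):
    raw_cost = raw_cumulative_prompt_tokens(n, TOKENS_PER_TURN)
    mem0_cost = mem0_cumulative_prompt_tokens(n, QUERY_PROMPT_TOKENS, MEMORY_OVERHEAD_TOKENS)
    print(f"turns={n:>2}  raw={raw_cost:>6}  mem0={mem0_cost:>6}")

# First conversation length at which the full-context total exceeds the Mem0 total
crossover = next(
    n for n in range(1, 1000)
    if raw_cumulative_prompt_tokens(n, TOKENS_PER_TURN)
    > mem0_cumulative_prompt_tokens(n, QUERY_PROMPT_TOKENS, MEMORY_OVERHEAD_TOKENS)
)
print(f"crossover at turn {crossover}")
```

With these made-up numbers the full-context total first exceeds the Mem0 total at turn 16. The exact crossover depends entirely on the assumed sizes; the point is only that one curve is quadratic in the number of turns and the other is linear.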

**Response Quality & Context Handling:**
*   **Raw Approach:** Works well with capable LLMs, but very long contexts risk "lost in the middle" effects, recency bias, and confusion when earlier information has since been contradicted or updated.
*   **Mem0 Approach:** Aims for focused, accurate responses using relevant retrieved memories. The `UPDATE` mechanism is key for handling evolving information. Quality hinges on accurate extraction and update logic.

**Challenges & Improvements for this Notebook's Mem0:**
*   **LLM Dependency:** Mem0's internal operations (extraction, update decisions) are highly sensitive to the `LLM_MODEL`'s capabilities.
*   **Prompt Engineering:** Prompts for extraction and update are crucial and can always be refined.
*   **Error Handling:** LLM JSON outputs need more robust parsing, and failed operations need better recovery than the current default-to-`ADD` fallback.
*   **Full Mem0 Features:** Implementing `DELETE`, conversation summary `S`, and graph memory (`Mem0g`) would align closer with the paper.
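
For the error-handling point in the list above, one possible direction is to scan the LLM response for the first JSON object that validates as a decision, instead of relying on a single pair of brackets. The sketch below is an assumption-laden illustration, not the notebook's parser: `parse_decision` and `VALID_OPERATIONS` are hypothetical names, while the `operation`, `target_memory_id`, and `updated_memory_text` keys follow the fields the orchestrator reads. It uses the standard library's `json.JSONDecoder.raw_decode`, which parses one JSON value from the start of a string and ignores trailing text.

```python
import json

VALID_OPERATIONS = {"ADD", "UPDATE", "NOOP"}

def parse_decision(response_text):
    """Return the first valid decision object found in free-form LLM output, else ADD."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(response_text):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value and tolerates trailing text
            candidate, _ = decoder.raw_decode(response_text[start:])
        except json.JSONDecodeError:
            continue
        if not isinstance(candidate, dict):
            continue
        operation = str(candidate.get("operation", "")).upper()
        if operation not in VALID_OPERATIONS:
            continue
        # An UPDATE is only usable if it names a target and supplies new text
        if operation == "UPDATE" and not (
            candidate.get("target_memory_id") and candidate.get("updated_memory_text")
        ):
            continue
        candidate["operation"] = operation
        return candidate
    # Same conservative fallback the notebook uses
    return {"operation": "ADD"}

wrapped = parse_decision('Sure, here is my answer: {"operation": "noop"} Hope that helps!')
broken = parse_decision("I think you should add it.")
incomplete = parse_decision('{"operation": "UPDATE"}')
print(wrapped, broken, incomplete)
```

Validating the UPDATE fields at parse time means the orchestrator never receives a half-specified update, so its own malformed-UPDATE branch becomes a second line of defense rather than the first.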

This notebook serves as a foundational exploration. Production-ready systems require more engineering and potentially fine-tuned models for memory sub-tasks.