# The Ultimate AI Agent Memory Lab: An End-to-End Implementation

### A Comprehensive, Hands-On Workshop for Building Smarter Agents

Welcome to a detailed, practical exploration of memory in AI agents. This notebook is designed to be a definitive, end-to-end guide that moves beyond theory and into tangible, working code. We will implement **nine distinct memory strategies**, from the simplest to the most conceptually advanced, using a real large language model for generation, summarization, and embedding.

**Objective:** By the end of this lab, you will have a deep, practical understanding of:
- How each memory strategy works under the hood.
- The specific strengths, weaknesses, and tradeoffs of each approach.
- How to implement these strategies in Python using modern tools like `openai`, `faiss-cpu`, and `networkx`.
- How the choice of memory architecture fundamentally changes an agent's conversational abilities, cost, and complexity.

**Structure of the Lab:**
1.  **Part 1: The Core Framework.** We'll set up our environment, configure the LLM client, and build the foundational `AIAgent` and `BaseMemoryStrategy` classes.
2.  **Part 2: The Memory Implementations.** We will systematically implement and demonstrate all nine memory strategies. Each strategy will have its own dedicated section with:
    *   **Detailed Theory:** Explaining the *what*, *why*, and *how*.
    *   **Code Implementation:** A complete, commented Python class for the strategy.
    *   **Live Demonstration:** A practical chat session designed to showcase the strategy's unique behavior.

This notebook is intentionally lengthy and detailed to serve as a comprehensive reference. Let's begin by setting up our core components.

## Part 1: Core Framework and Setup

Before we can build memories, we need a brain (the LLM) and a body (the agent framework). This section handles all the preliminary setup.

### Step 1.1: Installing Dependencies

First, we need to install the necessary Python libraries. We'll need:
- `openai`: The client library for interacting with the LLM API.
- `numpy`: For numerical operations, especially with embeddings.
- `faiss-cpu`: A library from Facebook AI for efficient similarity search, which will power our retrieval memory. It's a perfect in-memory vector database.
- `networkx`: For creating and managing the knowledge graph in our Graph-Based Memory strategy.
- `tiktoken`: To accurately count tokens and manage context window limits.

In [1]:
# !pip install openai numpy faiss-cpu networkx tiktoken

### Step 1.2: Configuring the LLM and Embedding Client

Here, we'll set up the `OpenAI` client with the custom `base_url` and `api_key` you provided. This single client will be used for both text generation and creating embeddings.

In [None]:
# Import necessary libraries
import os
from openai import OpenAI

# --- IMPORTANT: API Key Configuration ---
# The API key is provided directly here for simplicity in this notebook.
# In a production environment, NEVER hardcode keys. Use environment variables
# or a secure secret management service.

# Define the API key for authentication.
API_KEY = "YOUR_API_KEY_HERE"
# Define the base URL for the API endpoint.
BASE_URL = "https://api.studio.nebius.com/v1/"

# Initialize the OpenAI client with the specified base URL and API key.
client = OpenAI(
    base_url=BASE_URL,
    api_key=API_KEY
)

# Print a confirmation message to indicate successful client setup.
print("OpenAI client configured successfully.")

OpenAI client configured successfully.


### Step 1.3: Helper Functions for LLM Interaction and Token Counting

To keep our main agent logic clean, we'll create wrapper functions for our API calls. We'll also set up a token counter, which is crucial for understanding the costs and limitations of our memory strategies.

In [None]:
# Import additional libraries for functionality.
import tiktoken
import time

# --- Model Configuration ---
# Define the specific models to be used for generation and embedding tasks.
# These are hardcoded for this lab but could be loaded from a config file.
GENERATION_MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"
EMBEDDING_MODEL = "BAAI/bge-multilingual-gemma2"

def generate_text(system_prompt: str, user_prompt: str) -> str:
    """
    Calls the LLM API to generate a text response.
    
    Args:
        system_prompt: The instruction that defines the AI's role and behavior.
        user_prompt: The user's input to which the AI should respond.
        
    Returns:
        The generated text content from the AI, or an error message.
    """
    try:
        # Create a chat completion request to the configured client.
        response = client.chat.completions.create(
            model=GENERATION_MODEL,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ]
        )
        # Extract and return the content of the AI's message.
        return response.choices[0].message.content
    except Exception as e:
        # Handle potential API errors gracefully.
        print(f"An error occurred during text generation: {e}")
        return "I'm sorry, I encountered an error and couldn't process your request."

def generate_embedding(text: str) -> list[float]:
    """
    Generates a numerical embedding for a given text string using the embedding model.
    
    Args:
        text: The input string to be converted into an embedding.
        
    Returns:
        A list of floats representing the embedding vector, or an empty list on error.
    """
    try:
        # Create an embedding request to the configured client.
        response = client.embeddings.create(
            model=EMBEDDING_MODEL,
            input=text
        )
        # Extract and return the embedding vector from the response data.
        return response.data[0].embedding
    except Exception as e:
        # Handle potential API errors gracefully.
        print(f"An error occurred during embedding generation: {e}")
        return []

# --- Token Counting Setup ---
# Initialize the tokenizer using tiktoken. 'cl100k_base' is a common encoding
# used by many modern models, including those from OpenAI and Llama.
# This allows us to accurately estimate the size of our prompts before sending them.
tokenizer = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    """
    Counts the number of tokens in a given string using the pre-loaded tokenizer.
    
    Args:
        text: The string to be tokenized.
        
    Returns:
        The integer count of tokens.
    """
    # The `encode` method converts the string into a list of token IDs.
    # The length of this list is the token count.
    return len(tokenizer.encode(text))

# Print a confirmation message to indicate that these core functions are ready for use.
print("Helper functions and token counter are ready.")

Helper functions and token counter are ready.


### Step 1.4: The Foundational Agent and Memory Classes

Now we define the core structure of our system using the Strategy Design Pattern.

- **`BaseMemoryStrategy`**: An abstract base class that defines the universal interface for any memory type. Every strategy we create will inherit from this, ensuring it can be seamlessly plugged into our agent.
- **`AIAgent`**: The agent class itself. It is initialized with a memory strategy. Its `chat` method orchestrates the process of getting context from memory, querying the LLM, and updating the memory.

In [None]:
import abc

# --- Abstract Base Class for Memory Strategies ---
# This class defines the 'contract' that all memory strategies must follow.
# By using an Abstract Base Class (ABC), we ensure that any memory implementation
# we create will have the same core methods (add_message, get_context, clear),
# allowing them to be interchangeably plugged into the AIAgent.
class BaseMemoryStrategy(abc.ABC):
    """Abstract Base Class for all memory strategies."""
    
    @abc.abstractmethod
    def add_message(self, user_input: str, ai_response: str):
        """
        An abstract method that must be implemented by subclasses.
        It's responsible for adding a new user-AI interaction to the memory store.
        """
        pass

    @abc.abstractmethod
    def get_context(self, query: str) -> str:
        """
        An abstract method that must be implemented by subclasses.
        It retrieves and formats the relevant context from memory to be sent to the LLM.
        The 'query' parameter allows some strategies (like retrieval) to fetch context
        that is specifically relevant to the user's latest input.
        """
        pass

    @abc.abstractmethod
    def clear(self):
        """
        An abstract method that must be implemented by subclasses.
        It provides a way to reset the memory, which is useful for starting new conversations.
        """
        pass

# --- The Core AI Agent ---
# This class orchestrates the entire conversation flow. It is initialized with a
# specific memory strategy and uses it to manage the conversation's context.
class AIAgent:
    """The main AI Agent class, designed to work with any memory strategy."""
    
    def __init__(self, memory_strategy: BaseMemoryStrategy, system_prompt: str = "You are a helpful AI assistant."):
        """
        Initializes the agent.
        
        Args:
            memory_strategy: An instance of a class that inherits from BaseMemoryStrategy.
                             This determines how the agent will remember the conversation.
            system_prompt: The initial instruction given to the LLM to define its persona and task.
        """
        self.memory = memory_strategy
        self.system_prompt = system_prompt
        print(f"Agent initialized with {type(memory_strategy).__name__}.")

    def chat(self, user_input: str):
        """
        Handles a single turn of the conversation.
        
        Args:
            user_input: The latest message from the user.
        """
        print(f"\n{'='*25} NEW INTERACTION {'='*25}")
        print(f"User > {user_input}")
        
        # Step 1: Retrieve context from the agent's memory strategy.
        # This is where the specific memory logic (e.g., sequential, retrieval) is executed.
        start_time = time.time()
        context = self.memory.get_context(query=user_input)
        retrieval_time = time.time() - start_time
        
        # Step 2: Construct the full prompt for the LLM.
        # This combines the retrieved historical context with the user's current request.
        full_user_prompt = f"### MEMORY CONTEXT\n{context}\n\n### CURRENT REQUEST\n{user_input}"
        
        # Step 3: Provide detailed debug information.
        # This is crucial for understanding how the memory strategy affects the prompt size and cost.
        prompt_tokens = count_tokens(self.system_prompt + full_user_prompt)
        print("\n--- Agent Debug Info ---")
        print(f"Memory Retrieval Time: {retrieval_time:.4f} seconds")
        print(f"Estimated Prompt Tokens: {prompt_tokens}")
        print(f"\n[Full Prompt Sent to LLM]:\n---\nSYSTEM: {self.system_prompt}\nUSER: {full_user_prompt}\n---")
        
        # Step 4: Call the LLM to get a response.
        # The LLM uses the system prompt and the combined user prompt (context + new query) to generate a reply.
        start_time = time.time()
        ai_response = generate_text(self.system_prompt, full_user_prompt)
        generation_time = time.time() - start_time
        
        # Step 5: Update the memory with the latest interaction.
        # This ensures the current turn is available for future context retrieval.
        self.memory.add_message(user_input, ai_response)
        
        # Step 6: Display the AI's response and performance metrics.
        print(f"\nAgent > {ai_response}")
        print(f"(LLM Generation Time: {generation_time:.4f} seconds)")
        print(f"{'='*70}")

## Part 2: Implementation and Demonstration of Memory Strategies

This is the core of our lab. We will now implement each of the nine memory strategies one by one, followed immediately by a live demonstration to see how they perform in practice.

### Strategy 1: Sequential (Keep-It-All) Memory

| **Best For**                  | **Tradeoff**                                           |
| ----------------------------- | ------------------------------------------------------ |
| Short interactions, full fidelity                    | Hits token limit fast, expensive                       |

**Theory:** This is the most straightforward memory type. It simply appends every user-AI interaction to a growing list. When generating a new response, the entire conversation history is formatted and sent to the LLM as context. This guarantees perfect, lossless recall within a single conversation session.

However, its simplicity is its downfall in long conversations. The context grows linearly with each turn, leading to rapidly increasing API costs and eventually exceeding the LLM's maximum context window, which would cause an error or a truncated prompt.

In [None]:
# --- Strategy 1: Sequential (Keep-It-All) Memory ---
# This is the most basic memory strategy. It stores the entire conversation
# history in a simple list. While it provides perfect recall, it is not scalable
# as the context sent to the LLM grows with every turn, quickly becoming expensive
# and hitting token limits.
class SequentialMemory(BaseMemoryStrategy):
    def __init__(self):
        """Initializes the memory with an empty list to store conversation history."""
        self.history = []

    def add_message(self, user_input: str, ai_response: str):
        """
        Adds a new user-AI interaction to the history.
        Each interaction is stored as two dictionary entries in the list.
        """
        self.history.append({"role": "user", "content": user_input})
        self.history.append({"role": "assistant", "content": ai_response})

    def get_context(self, query: str) -> str:
        """
        Retrieves the entire conversation history and formats it into a single
        string to be used as context for the LLM. The 'query' parameter is ignored
        as this strategy always returns the full history.
        """
        # Join all messages into a single newline-separated string.
        return "\n".join([f"{turn['role'].capitalize()}: {turn['content']}" for turn in self.history])

    def clear(self):
        """Resets the conversation history by clearing the list."""
        self.history = []
        print("Sequential memory cleared.")

#### Demonstration of Sequential Memory

**What to watch for:** Pay close attention to the `Estimated Prompt Tokens` in the debug output. You will see it increase significantly with each turn as the entire history is added to the prompt.

In [None]:
# Initialize and run the agent
# Create an instance of our SequentialMemory strategy.
sequential_memory = SequentialMemory()
# Create an AIAgent and inject the sequential memory strategy into it.
agent = AIAgent(memory_strategy=sequential_memory)

# --- Start the conversation ---
# First turn: The user introduces themselves.
agent.chat("Hi there! My name is Sam.")
# Second turn: The user states their interest.
agent.chat("I'm interested in learning about space exploration.")
# Third turn: The user tests the agent's memory.
agent.chat("What was the first thing I told you?")

# The agent has perfect recall because the entire history is in the context.
# Clean up the memory for the next demonstration.
sequential_memory.clear()

Agent initialized with SequentialMemory.

User > Hi there! My name is Sam.

--- Agent Debug Info ---
Memory Retrieval Time: 0.0000 seconds
Estimated Prompt Tokens: 23

[Full Prompt Sent to LLM]:
---
SYSTEM: You are a helpful AI assistant.
USER: ### MEMORY CONTEXT


### CURRENT REQUEST
Hi there! My name is Sam.
---

Agent > Hello Sam! It's nice to meet you. I'm happy to chat with you. What brings you here today? Do you have any questions, need assistance with something, or just want to have a friendly conversation? I'm all ears (or rather, all text)!
(LLM Generation Time: 2.2516 seconds)

User > I'm interested in learning about space exploration.

--- Agent Debug Info ---
Memory Retrieval Time: 0.0000 seconds
Estimated Prompt Tokens: 92

[Full Prompt Sent to LLM]:
---
SYSTEM: You are a helpful AI assistant.
USER: ### MEMORY CONTEXT
User: Hi there! My name is Sam.
Assistant: Hello Sam! It's nice to meet you. I'm happy to chat with you. What brings you here today? Do you have any question

### Strategy 2: Sliding Window Memory

| **Best For**                  | **Tradeoff**                                           |
| ----------------------------- | ------------------------------------------------------ |
| Mid-length chats, recent relevance matters           | Forgets early context                                  |

**Theory:** This strategy addresses the primary issue of Sequential Memory by only keeping the most recent `N` conversation turns. It uses a `deque` (double-ended queue) with a fixed maximum length. When a new interaction is added and the deque is full, the oldest interaction is automatically discarded.

This keeps the context size constant, making costs predictable and preventing context window overflow. The major tradeoff is amnesia: any information mentioned before the window's cutoff point is permanently forgotten.

In [None]:
# Import the deque class from the collections module. A deque is a double-ended
# queue that is highly efficient for adding and removing elements from either end.
from collections import deque

# --- Strategy 2: Sliding Window Memory ---
# This strategy keeps only the 'N' most recent turns of the conversation.
# It prevents the context from growing indefinitely, making it scalable and
# cost-effective, but at the cost of forgetting older information.
class SlidingWindowMemory(BaseMemoryStrategy):
    def __init__(self, window_size: int = 4): # window_size is number of turns (user + AI = 1 turn)
        """
        Initializes the memory with a deque of a fixed size.
        
        Args:
            window_size: The number of conversational turns to keep in memory.
                         A single turn consists of one user message and one AI response.
        """
        # A deque with 'maxlen' will automatically discard the oldest item
        # when a new item is added and the deque is full. This is the core
        # mechanism of the sliding window. We store turns, so maxlen is window_size.
        self.history = deque(maxlen=window_size)

    def add_message(self, user_input: str, ai_response: str):
        """
        Adds a new conversational turn to the history. If the deque is full,
        the oldest turn is automatically removed.
        """
        # Each turn (user input + AI response) is stored as a single element
        # in the deque. This makes it easy to manage the window size by turns.
        self.history.append([
            {"role": "user", "content": user_input},
            {"role": "assistant", "content": ai_response}
        ])

    def get_context(self, query: str) -> str:
        """
        Retrieves the conversation history currently within the window and
        formats it into a single string. The 'query' parameter is ignored.
        """
        # Create a temporary list to hold the formatted messages.
        context_list = []
        # Iterate through each turn stored in the deque.
        for turn in self.history:
            # Iterate through the user and assistant messages within that turn.
            for message in turn:
                # Format the message and add it to our list.
                context_list.append(f"{message['role'].capitalize()}: {message['content']}")
        # Join all the formatted messages into a single string, separated by newlines.
        return "\n".join(context_list)

    def clear(self):
        """Resets the conversation history by clearing the deque."""
        self.history.clear()
        print("Sliding window memory cleared.")

#### Demonstration of Sliding Window Memory

**What to watch for:** We'll set a window size of 2 turns. After the third turn, the very first piece of information (the user's name) will be pushed out of the context window. The agent will then fail to recall it.

In [None]:
# Initialize with a small window size of 2 turns.
# This means the agent will only remember the last two user-AI interactions.
sliding_memory = SlidingWindowMemory(window_size=2)
# Create an AIAgent and inject the sliding window memory strategy.
agent = AIAgent(memory_strategy=sliding_memory)

# --- Start the conversation ---
# First turn: The user introduces themselves. This is Turn 1.
agent.chat("My name is Priya and I'm a software developer.")
# Second turn: The user provides more details. The memory now holds Turn 1 and Turn 2.
agent.chat("I work primarily with Python and cloud technologies.")
# Third turn: The user mentions a hobby. Adding this turn pushes Turn 1 out of the
# fixed-size deque. The memory now only holds Turn 2 and Turn 3.
agent.chat("My favorite hobby is hiking.")

# Now, ask about the first thing mentioned.
# The context sent to the LLM will only contain the information about Python/cloud and hiking.
# The information about the user's name has been forgotten.
agent.chat("What is my name?")
# The agent will likely fail, as the first turn has been pushed out of the window.

# Clean up the memory for the next demonstration.
sliding_memory.clear()

Agent initialized with SlidingWindowMemory.

User > My name is Priya and I'm a software developer.

--- Agent Debug Info ---
Memory Retrieval Time: 0.0000 seconds
Estimated Prompt Tokens: 27

[Full Prompt Sent to LLM]:
---
SYSTEM: You are a helpful AI assistant.
USER: ### MEMORY CONTEXT


### CURRENT REQUEST
My name is Priya and I'm a software developer.
---

Agent > Nice to meet you, Priya! What can I assist you with today as a software developer? Do you have a project you're working on or something on your mind that you'd like help with?
(LLM Generation Time: 1.1071 seconds)

User > I work primarily with Python and cloud technologies.

--- Agent Debug Info ---
Memory Retrieval Time: 0.0000 seconds
Estimated Prompt Tokens: 81

[Full Prompt Sent to LLM]:
---
SYSTEM: You are a helpful AI assistant.
USER: ### MEMORY CONTEXT
User: My name is Priya and I'm a software developer.
Assistant: Nice to meet you, Priya! What can I assist you with today as a software developer? Do you have a proje

### Strategy 3: Summarization Memory

| **Best For**                  | **Tradeoff**                                           |
| ----------------------------- | ------------------------------------------------------ |
| Long conversations, general context needed           | May lose fine details                                  |

**Theory:** This strategy attempts to get the best of both worlds. Instead of just dropping old messages, it uses the LLM itself to periodically create a running summary of the conversation. It maintains a temporary buffer of recent messages. Once the buffer reaches a certain size, its content is summarized and merged with the previous summary.

The context sent to the LLM is a combination of the `running_summary` and the `current_buffer`. This keeps the context size manageable while retaining the gist of the entire conversation. The main risk is information loss: if the LLM's summary misses a crucial but subtle detail, that detail is lost forever.

In [None]:
# --- Strategy 3: Summarization Memory ---
# This strategy aims to manage long conversations by periodically summarizing them.
# It keeps a buffer of recent messages. When the buffer reaches a certain size,
# it uses an LLM call to consolidate the buffer's content with a running summary.
# This keeps the context size manageable while retaining the gist of the conversation.
# The main risk is information loss if the summary is not perfect.
class SummarizationMemory(BaseMemoryStrategy):
    def __init__(self, summary_threshold: int = 4): # Default: Summarize after 4 messages (2 turns)
        """
        Initializes the summarization memory.
        
        Args:
            summary_threshold: The number of messages (user + AI) to accumulate in the
                             buffer before triggering a summarization.
        """
        # Stores the continuously updated summary of the conversation so far.
        self.running_summary = ""
        # A temporary list to hold recent messages before they are summarized.
        self.buffer = []
        # The threshold that triggers the summarization process.
        self.summary_threshold = summary_threshold

    def add_message(self, user_input: str, ai_response: str):
        """
        Adds a new user-AI interaction to the buffer. If the buffer size
        reaches the threshold, it triggers the memory consolidation process.
        """
        # Append the latest user and AI messages to the temporary buffer.
        self.buffer.append({"role": "user", "content": user_input})
        self.buffer.append({"role": "assistant", "content": ai_response})

        # Check if the buffer has reached its capacity.
        if len(self.buffer) >= self.summary_threshold:
            # If so, call the method to summarize the buffer's contents.
            self._consolidate_memory()

    def _consolidate_memory(self):
        """
        Uses the LLM to summarize the contents of the buffer and merge it
        with the existing running summary.
        """
        print("\n--- [Memory Consolidation Triggered] ---")
        # Convert the list of buffered messages into a single formatted string.
        buffer_text = "\n".join([f"{msg['role'].capitalize()}: {msg['content']}" for msg in self.buffer])
        
        # Construct a specific prompt for the LLM to perform the summarization task.
        # It provides the existing summary and the new conversation text, asking for
        # a single, updated summary.
        summarization_prompt = (
            f"You are a summarization expert. Your task is to create a concise summary of a conversation. "
            f"Combine the 'Previous Summary' with the 'New Conversation' into a single, updated summary. "
            f"Capture all key facts, names, and decisions.\n\n"
            f"### Previous Summary:\n{self.running_summary}\n\n"
            f"### New Conversation:\n{buffer_text}\n\n"
            f"### Updated Summary:"
        )
        
        # Call the LLM with a specific system prompt to get the new summary.
        new_summary = generate_text("You are an expert summarization engine.", summarization_prompt)
        # Replace the old summary with the newly generated, consolidated one.
        self.running_summary = new_summary
        # Clear the buffer, as its contents have now been incorporated into the summary.
        self.buffer = [] 
        print(f"--- [New Summary: '{self.running_summary}'] ---")

    def get_context(self, query: str) -> str:
        """
        Constructs the context to be sent to the LLM. It combines the long-term
        running summary with the short-term buffer of recent messages.
        The 'query' parameter is ignored as this strategy provides a general context.
        """
        # Format the messages currently in the buffer.
        buffer_text = "\n".join([f"{msg['role'].capitalize()}: {msg['content']}" for msg in self.buffer])
        # Return a combined context of the historical summary and the most recent, not-yet-summarized messages.
        return f"### Summary of Past Conversation:\n{self.running_summary}\n\n### Recent Messages:\n{buffer_text}"

    def clear(self):
        """Resets the memory by clearing the summary and the buffer."""
        self.running_summary = ""
        self.buffer = []
        print("Summarization memory cleared.")

#### Demonstration of Summarization Memory

**What to watch for:** A `[Memory Consolidation Triggered]` message will appear after the second turn (since our threshold is 4 messages). The context for the subsequent turn will include the new, AI-generated summary. We'll see if the agent can recall details from the first turn, which now only exist in the summary.

In [None]:
# Initialize the SummarizationMemory with a threshold of 4 messages (2 turns).
# This means a summary will be generated after the second full interaction.
summarization_memory = SummarizationMemory(summary_threshold=4)
# Create an AIAgent and inject the summarization memory strategy.
agent = AIAgent(memory_strategy=summarization_memory)

# --- Start the conversation ---
# First turn: The user provides initial details.
agent.chat("I'm starting a new company called 'Innovatech'. Our focus is on sustainable energy.")
# Second turn: The user gives more specific information. After the AI responds to this,
# the buffer will contain 4 messages, triggering the memory consolidation process.
agent.chat("Our first product will be a smart solar panel, codenamed 'Project Helios'.")

# Third turn: The user adds another detail. The previous information now exists only in the running summary.
agent.chat("The marketing budget is set at $50,000.")
# Fourth turn: The user tests the agent's memory. The context sent to the LLM will consist of
# the AI-generated summary plus the most recent (post-summary) message about the budget.
agent.chat("What is the name of my company and its first product?")
# The agent's ability to answer correctly depends entirely on the quality of the LLM's summary.

# Clean up the memory for the next demonstration.
summarization_memory.clear()

Agent initialized with SummarizationMemory.

User > I'm starting a new company called 'Innovatech'. Our focus is on sustainable energy.

--- Agent Debug Info ---
Memory Retrieval Time: 0.0000 seconds
Estimated Prompt Tokens: 45

[Full Prompt Sent to LLM]:
---
SYSTEM: You are a helpful AI assistant.
USER: ### MEMORY CONTEXT
### Summary of Past Conversation:


### Recent Messages:


### CURRENT REQUEST
I'm starting a new company called 'Innovatech'. Our focus is on sustainable energy.
---

Agent > Congratulations on starting your new company, Innovatech! Focusing on sustainable energy is a fantastic direction to take. It's a rapidly growing field with immense potential to make a positive impact on the environment.

To get started, can you tell me a bit more about Innovatech? What specific areas of sustainable energy are you interested in exploring, such as:

1. Renewable energy sources (e.g., solar, wind, hydro)?
2. Energy storage and grid stability?
3. Green building and architecture?
4

### Strategy 4: Retrieval-Based Memory

| **Best For**                  | **Tradeoff**                                           |
| ----------------------------- | ------------------------------------------------------ |
| Long-term recall, precision needed                   | Harder to implement, needs vector DB + ranking         |

**Theory:** This is a powerful and widely-used strategy, forming the basis of Retrieval-Augmented Generation (RAG). Instead of storing conversation history linearly, each piece of information (e.g., a conversational turn) is treated as a document in a searchable database. When the user asks a new question, the system:
1.  Converts the user's query into a numerical vector (an embedding).
2.  Searches the database to find the `k` most semantically similar document embeddings.
3.  Retrieves the original text of these documents.
4.  Injects this retrieved text into the LLM's context.

This allows the agent to pull in relevant information from any point in the past, no matter how long ago. We use `faiss` to create an efficient, in-memory vector index.

In [None]:
# Import necessary libraries for numerical operations and similarity search.
import numpy as np
import faiss

# --- Strategy 4: Retrieval-Based Memory ---
# This strategy treats each piece of conversation as a document in a searchable
# database. It uses vector embeddings to find and retrieve the most semantically
# relevant pieces of information from the past to answer a new query. This is the
# core concept behind Retrieval-Augmented Generation (RAG).
class RetrievalMemory(BaseMemoryStrategy):
    def __init__(self, k: int = 2, embedding_dim: int = 3584):
        """
        Initializes the retrieval memory system.
        
        Args:
            k: The number of top relevant documents to retrieve for a given query.
            embedding_dim: The dimension of the vectors generated by the embedding model.
                           For BAAI/bge-multilingual-gemma2, this is 3584.
        """
        # The number of nearest neighbors to retrieve.
        self.k = k
        # The dimensionality of the embedding vectors. Must match the model's output.
        self.embedding_dim = embedding_dim
        # A list to store the original text content of each document.
        self.documents = []
        # Initialize a FAISS index. IndexFlatL2 performs an exhaustive search using
        # L2 (Euclidean) distance, which is effective for a moderate number of vectors.
        self.index = faiss.IndexFlatL2(self.embedding_dim)

    def add_message(self, user_input: str, ai_response: str):
        """
        Adds a new conversational turn to the memory. Each part of the turn (user
        input and AI response) is embedded and indexed separately for granular retrieval.
        """
        # We store each part of the turn as a separate document to allow for more
        # precise matching. For example, a query might be similar to a past user
        # statement but not the AI's response in that same turn.
        docs_to_add = [
            f"User said: {user_input}",
            f"AI responded: {ai_response}"
        ]
        for doc in docs_to_add:
            # Generate a numerical vector representation of the document.
            embedding = generate_embedding(doc)
            # Proceed only if the embedding was successfully created.
            if embedding:
                # Store the original text. The index of this document will correspond
                # to the index of its vector in the FAISS index.
                self.documents.append(doc)
                # FAISS requires the input vectors to be a 2D numpy array of float32.
                vector = np.array([embedding], dtype='float32')
                # Add the vector to the FAISS index, making it searchable.
                self.index.add(vector)

    def get_context(self, query: str) -> str:
        """
        Finds the k most relevant documents from memory based on semantic
        similarity to the user's query.
        """
        # If the index has no vectors, there's nothing to search.
        if self.index.ntotal == 0:
            return "No information in memory yet."
        
        # Convert the user's query into an embedding vector.
        query_embedding = generate_embedding(query)
        if not query_embedding:
            return "Could not process query for retrieval."
        
        # Convert the query embedding into the format required by FAISS.
        query_vector = np.array([query_embedding], dtype='float32')
        
        # Perform the search. 'search' returns the distances and the indices
        # of the k nearest neighbors to the query vector.
        distances, indices = self.index.search(query_vector, self.k)
        
        # Use the returned indices to retrieve the original text documents.
        # We check for `i != -1` as FAISS can return -1 for invalid indices.
        retrieved_docs = [self.documents[i] for i in indices[0] if i != -1]
        
        if not retrieved_docs:
            return "Could not find any relevant information in memory."
        
        # Format the retrieved documents into a string to be used as context.
        return "### Relevant Information Retrieved from Memory:\n" + "\n---\n".join(retrieved_docs)

    def clear(self):
        """Resets the memory completely by clearing the documents and the FAISS index."""
        self.documents = []
        self.index.reset()
        print("Retrieval memory (documents and FAISS index) cleared.")

#### Demonstration of Retrieval Memory

**What to watch for:** We will have a conversation about two completely different topics: a vacation plan and a software project. Then, we will ask a question about the vacation. The debug output will show that only the relevant vacation-related documents are retrieved and injected into the prompt, completely ignoring the irrelevant project talk.

In [None]:
# Initialize the RetrievalMemory with k=2, meaning it will retrieve the top 2 most relevant documents.
retrieval_memory = RetrievalMemory(k=2)
# Create an AIAgent and inject the retrieval memory strategy.
agent = AIAgent(memory_strategy=retrieval_memory)

# --- Start the conversation with mixed topics ---
# First turn: Discussing a vacation plan. This will be stored as a document.
agent.chat("I am planning a vacation to Japan for next spring.")
# Second turn: Discussing a software project. This will also be stored as a separate document.
agent.chat("For my software project, I'm using the React framework for the frontend.")
# Third turn: More details about the vacation.
agent.chat("I want to visit Tokyo and Kyoto while I'm on my trip.")
# Fourth turn: More details about the software project.
agent.chat("The backend of my project will be built with Django.")

# --- Test the retrieval mechanism ---
# Now, ask a question specifically about the vacation.
# The agent will convert this query into an embedding and search the memory.
# It should find that the documents about Japan, Tokyo, and Kyoto are semantically
# closer to the query than the documents about React and Django.
agent.chat("What cities am I planning to visit on my vacation?")
# The agent should retrieve the Japan/Tokyo/Kyoto info and ignore the software project info.

# Clean up the memory for the next demonstration.
retrieval_memory.clear()

Agent initialized with RetrievalMemory.

User > I am planning a vacation to Japan for next spring.

--- Agent Debug Info ---
Memory Retrieval Time: 0.0000 seconds
Estimated Prompt Tokens: 32

[Full Prompt Sent to LLM]:
---
SYSTEM: You are a helpful AI assistant.
USER: ### MEMORY CONTEXT
No information in memory yet.

### CURRENT REQUEST
I am planning a vacation to Japan for next spring.
---

Agent > Japan in the spring is a wonderful idea! You can expect mild temperatures, cherry blossoms in full bloom, and a plethora of exciting festivals and events. Here are some suggestions to consider for your trip:

**When to Go:**
Next spring would likely be late March to early April, a great time to see the cherry blossoms. However, keep in mind that the peak bloom time is around late March to early April, but it can vary depending on weather conditions.

**Must-Visit Places:**

1.  **Tokyo:** Explore the vibrant city's neon streets, try local cuisine, and visit famous landmarks like the Tokyo T

### Strategy 5: Memory-Augmented Transformers (Conceptual Simulation)

| **Best For**                  | **Tradeoff**                                           |
| ----------------------------- | ------------------------------------------------------ |
| Rich, evolving contexts over time                    | Advanced models, costlier                              |

**Theory:** This is a modification to the *model architecture itself* and cannot be fully implemented at the agent level. However, we can *simulate its behavior*. The core idea is that the model has access to a special, compressed memory space (like "sticky notes") in addition to its normal context window. It learns to write key information to these memory slots and read from them when needed.

**Our Simulation:** We will create a `MemoryAugmentedMemory` class. After each turn, it will use the LLM to decide if any information is important enough to be a "key memory." If so, it will create a concise summary of that fact and store it in a special list of `memory_tokens`. The final context will be a combination of a sliding window of recent chat and these important `memory_tokens`.

In [None]:
# --- Strategy 5: Memory-Augmented Memory (Simulation) ---
# This strategy simulates the behavior of a Memory-Augmented Transformer model.
# It maintains a short-term sliding window of recent conversation and a separate
# list of "memory tokens" which are important facts extracted from the conversation.
# An LLM call is used to decide if a piece of information is important enough
# to be converted into a persistent memory token.
class MemoryAugmentedMemory(BaseMemoryStrategy):
    def __init__(self, window_size: int = 2):
        """
        Initializes the memory-augmented system.
        
        Args:
            window_size: The number of recent turns to keep in the short-term memory.
        """
        # Use a SlidingWindowMemory instance to manage the recent conversation history.
        self.recent_memory = SlidingWindowMemory(window_size=window_size)
        # A list to store the special, persistent "sticky notes" or key facts.
        self.memory_tokens = []

    def add_message(self, user_input: str, ai_response: str):
        """
        Adds the latest turn to recent memory and then uses an LLM call to decide
        if a new, persistent memory token should be created from this interaction.
        """
        # First, add the new interaction to the short-term sliding window memory.
        self.recent_memory.add_message(user_input, ai_response)
        
        # Construct a prompt for the LLM to analyze the conversation turn and
        # determine if it contains a core fact worth remembering long-term.
        fact_extraction_prompt = (
            f"Analyze the following conversation turn. Does it contain a core fact, preference, or decision that should be remembered long-term? "
            f"Examples include user preferences ('I hate flying'), key decisions ('The budget is $1000'), or important facts ('My user ID is 12345').\n\n"
            f"Conversation Turn:\nUser: {user_input}\nAI: {ai_response}\n\n"
            f"If it contains such a fact, state the fact concisely in one sentence. Otherwise, respond with 'No important fact.'"
        )
        
        # Call the LLM to perform the fact extraction.
        extracted_fact = generate_text("You are a fact-extraction expert.", fact_extraction_prompt)
        
        # Check if the LLM's response indicates that an important fact was found.
        if "no important fact" not in extracted_fact.lower():
            # If a fact was found, print a debug message and add it to our list of memory tokens.
            print(f"--- [Memory Augmentation: New memory token created: '{extracted_fact}'] ---")
            self.memory_tokens.append(extracted_fact)

    def get_context(self, query: str) -> str:
        """
        Constructs the context by combining the short-term recent conversation
        with the list of all long-term, persistent memory tokens.
        """
        # Get the context from the short-term sliding window.
        recent_context = self.recent_memory.get_context(query)
        # Format the list of memory tokens into a readable string.
        memory_token_context = "\n".join([f"- {token}" for token in self.memory_tokens])
        
        # Return the combined context, clearly separating the long-term facts from the recent chat.
        return f"### Key Memory Tokens (Long-Term Facts):\n{memory_token_context}\n\n### Recent Conversation:\n{recent_context}"

    def clear(self):
        """Resets both the short-term memory and the list of memory tokens."""
        self.recent_memory.clear()
        self.memory_tokens = []
        print("Memory-Augmented memory cleared.")

#### Demonstration of Memory-Augmented Memory

**What to watch for:** We'll mention a critical, long-term preference in the first turn. The agent should identify this as a "key memory" and create a memory token. After several more turns push the original message out of the recent chat window, the agent should still be able to recall the preference by reading its memory token.

In [None]:
# Initialize the MemoryAugmentedMemory with a window size of 2.
# This means the short-term memory will only hold the last two turns.
mem_aug_memory = MemoryAugmentedMemory(window_size=2)
# Create an AIAgent and inject the memory-augmented strategy.
agent = AIAgent(memory_strategy=mem_aug_memory)

# --- Start the conversation ---
# First turn: The user provides a critical, long-term piece of information.
# The agent's fact-extraction mechanism should identify this as important and create a memory token.
agent.chat("Please remember this for all future interactions: I am severely allergic to peanuts.")

# Second turn: A standard conversational turn.
agent.chat("Okay, let's talk about recipes. What's a good idea for dinner tonight?")

# Third turn: Another conversational turn. This will push the first turn (the allergy warning)
# out of the short-term sliding window memory.
agent.chat("That sounds good. What about a dessert option?")

# --- Test the memory augmentation ---
# Now, the critical test. The original allergy warning is no longer in the recent chat context.
# The agent's only way to know about the allergy is by accessing its long-term "memory tokens".
agent.chat("Could you suggest a Thai green curry recipe? Please ensure it's safe for me.")
# A successful agent will use the persistent memory token to check for safety and likely warn
# about peanuts, which are common in Thai cuisine.

# Clean up the memory for the next demonstration.
mem_aug_memory.clear()

Agent initialized with MemoryAugmentedMemory.

User > Please remember this for all future interactions: I am severely allergic to peanuts.

--- Agent Debug Info ---
Memory Retrieval Time: 0.0000 seconds
Estimated Prompt Tokens: 45

[Full Prompt Sent to LLM]:
---
SYSTEM: You are a helpful AI assistant.
USER: ### MEMORY CONTEXT
### Key Memory Tokens (Long-Term Facts):


### Recent Conversation:


### CURRENT REQUEST
Please remember this for all future interactions: I am severely allergic to peanuts.
---
--- [Memory Augmentation: New memory token created: 'You are severely allergic to peanuts.'] ---

Agent > **MEMORY UPDATE**

I have taken note of your long-term fact: You are severely allergic to peanuts. I will keep this in mind for all future interactions to ensure your allergy is considered when providing information or suggestions.
(LLM Generation Time: 1.3206 seconds)

User > Okay, let's talk about recipes. What's a good idea for dinner tonight?

--- Agent Debug Info ---
Memory Retri

### Strategy 6: Hierarchical Memory

| **Best For**                  | **Tradeoff**                                           |
| ----------------------------- | ------------------------------------------------------ |
| Multi-task, complex agents with different info types | Sophisticated management logic                         |

**Theory:** This is a composite strategy that mimics how human memory works at different levels. It combines multiple, simpler memory strategies into a hierarchy. A common setup is:
- **Level 1 (Working Memory):** A `SlidingWindowMemory` for fast, immediate context.
- **Level 2 (Long-Term Memory):** A `RetrievalMemory` for storing important, durable facts.

The key is the logic that **promotes** information from L1 to L2. Our implementation will use a heuristic: if a conversation turn seems particularly important (e.g., contains a keyword like "preference" or "rule"), it gets added to both the working and long-term stores.

In [None]:
# --- Strategy 6: Hierarchical Memory ---
# This strategy combines multiple memory types to create a more sophisticated,
# layered system, mimicking human memory's division into short-term (working)
# and long-term storage.
class HierarchicalMemory(BaseMemoryStrategy):
    def __init__(self, window_size: int = 2, k: int = 2, embedding_dim: int = 3584):
        """
        Initializes the hierarchical memory system.
        
        Args:
            window_size: The size of the short-term working memory (in turns).
            k: The number of documents to retrieve from long-term memory.
            embedding_dim: The dimension of the embedding vectors for long-term memory.
        """
        print("Initializing Hierarchical Memory...")
        # Level 1: Fast, short-term working memory using a sliding window.
        self.working_memory = SlidingWindowMemory(window_size=window_size)
        # Level 2: Slower, durable long-term memory using a retrieval system.
        self.long_term_memory = RetrievalMemory(k=k, embedding_dim=embedding_dim)
        # A simple heuristic: keywords that trigger promotion from working to long-term memory.
        self.promotion_keywords = ["remember", "rule", "preference", "always", "never", "allergic"]

    def add_message(self, user_input: str, ai_response: str):
        """
        Adds a message to working memory and conditionally promotes it to long-term
        memory based on its content.
        """
        # All interactions are added to the fast, short-term working memory.
        self.working_memory.add_message(user_input, ai_response)
        
        # Promotion Logic: Check if the user's input contains a keyword that
        # suggests the information is important and should be stored long-term.
        if any(keyword in user_input.lower() for keyword in self.promotion_keywords):
            print(f"--- [Hierarchical Memory: Promoting message to long-term storage.] ---")
            # If a keyword is found, also add the interaction to the long-term retrieval memory.
            self.long_term_memory.add_message(user_input, ai_response)

    def get_context(self, query: str) -> str:
        """
        Constructs a rich context by combining relevant information from both
        the long-term and short-term memory layers.
        """
        # Retrieve the most recent conversation from the working memory.
        working_context = self.working_memory.get_context(query)
        # Retrieve semantically relevant facts from the long-term memory based on the current query.
        long_term_context = self.long_term_memory.get_context(query)
        
        # Combine both contexts, clearly labeling their sources for the LLM.
        return f"### Retrieved Long-Term Memories:\n{long_term_context}\n\n### Recent Conversation (Working Memory):\n{working_context}"

    def clear(self):
        """Resets both the working and long-term memory stores."""
        self.working_memory.clear()
        self.long_term_memory.clear()
        print("Hierarchical memory cleared.")

#### Demonstration of Hierarchical Memory

**What to watch for:** We will state a preference using the keyword "remember". This will trigger the message to be saved in the long-term `RetrievalMemory`. After a few turns push it out of the short-term `SlidingWindowMemory`, we'll ask a related question. The agent should successfully answer by retrieving from its long-term store.

In [None]:
# Initialize the HierarchicalMemory. It combines a short-term sliding window
# and a long-term retrieval system.
hierarchical_memory = HierarchicalMemory()
# Create an AIAgent and inject the hierarchical memory strategy.
agent = AIAgent(memory_strategy=hierarchical_memory)

# --- Start the conversation ---
# First turn: The user provides an important piece of information with a keyword ("remember").
# This should trigger the promotion logic, saving this message to both short-term and long-term memory.
agent.chat("Please remember my User ID is AX-7890.")
# Second turn: A casual conversation topic. This is added to short-term memory.
agent.chat("Let's chat about the weather. It's very sunny today.")
# Third turn: Another casual topic. This pushes the first message (User ID) out of the
# short-term sliding window memory.
agent.chat("I'm planning to go for a walk later.")

# --- Test the hierarchical retrieval ---
# The User ID is now out of the working memory's window.
# The agent must now rely on its long-term, retrieval-based memory.
agent.chat("I need to log into my account, can you remind me of my ID?")
# A successful agent will retrieve 'AX-7890' from its long-term memory because the initial
# message was promoted due to the keyword "remember".

# Clean up the memory for the next demonstration.
hierarchical_memory.clear()

Initializing Hierarchical Memory...
Agent initialized with HierarchicalMemory.

User > Please remember my User ID is AX-7890.

--- Agent Debug Info ---
Memory Retrieval Time: 0.0000 seconds
Estimated Prompt Tokens: 47

[Full Prompt Sent to LLM]:
---
SYSTEM: You are a helpful AI assistant.
USER: ### MEMORY CONTEXT
### Retrieved Long-Term Memories:
No information in memory yet.

### Recent Conversation (Working Memory):


### CURRENT REQUEST
Please remember my User ID is AX-7890.
---
--- [Hierarchical Memory: Promoting message to long-term storage.] ---

Agent > ### MEMORY CONTEXT
### Retrieved Long-Term Memories:
- User ID: AX-7890

### Recent Conversation (Working Memory):
- Stored request to remember the User ID AX-7890

### CURRENT REQUEST
You have provided your User ID as AX-7890, which has been stored in long-term memory for future reference. How can I assist you next?
(LLM Generation Time: 1.7426 seconds)

User > Let's chat about the weather. It's very sunny today.

--- Agent Debu

### Strategy 7: Graph-Based Memory

| **Best For**                  | **Tradeoff**                                           |
| ----------------------------- | ------------------------------------------------------ |
| Systems where relationships between facts matter     | Best for structured knowledge, more effort to maintain |

**Theory:** This strategy moves beyond storing unstructured text. It represents information as a **knowledge graph**, consisting of nodes (entities) and edges (relationships). For example, `(Sam) -[WorksFor]-> (Innovatech) -[FocusesOn]-> (Energy)`.

This is incredibly powerful for answering complex queries that require reasoning about connections. The main challenge is populating the graph. We will use a powerful technique: **using the LLM as a tool** to extract structured `(subject, relation, object)` triples from the unstructured conversation text.

In [None]:
# Import necessary libraries for graph data structures and regular expressions.
import networkx as nx
import re

# --- Strategy 7: Graph-Based Memory ---
# This strategy represents information as a structured knowledge graph, consisting
# of nodes (entities like 'Sam', 'Innovatech') and edges (relationships like
# 'works_for', 'focuses_on'). It uses the LLM itself to extract these structured
# triples (Subject, Relation, Object) from unstructured conversation text.
class GraphMemory(BaseMemoryStrategy):
    def __init__(self):
        """Initializes the memory with an empty NetworkX directed graph."""
        # A DiGraph is suitable for representing directed relationships (e.g., Sam -> works_for -> Innovatech).
        self.graph = nx.DiGraph()

    def _extract_triples(self, text: str) -> list[tuple[str, str, str]]:
        """
        Uses the LLM to extract knowledge triples (Subject, Relation, Object) from a given text.
        This is a form of "LLM as a Tool" where the model's language understanding is
        used to create structured data.
        """
        print("--- [Graph Memory: Attempting to extract triples from text.] ---")
        # Construct a detailed prompt that instructs the LLM on its role and the desired output format.
        # Providing a clear example is crucial for getting reliable, structured output.
        extraction_prompt = (
            f"You are a knowledge extraction engine. Your task is to extract Subject-Relation-Object triples from the given text. "
            f"Format your output strictly as a list of Python tuples. For example: [('Sam', 'works_for', 'Innovatech'), ('Innovatech', 'focuses_on', 'Energy')]. "
            f"If no triples are found, return an empty list [].\n\n"
            f"Text to analyze:\n\"""{text}\""""
        )
        
        # Call the LLM with the specialized prompt.
        response_text = generate_text("You are an expert knowledge graph extractor.", extraction_prompt)
        
        # Safely parse the string representation of a list of tuples from the LLM's response.
        try:
            # Using regular expressions is a much safer alternative to `eval()`, as it avoids
            # executing arbitrary code that might be maliciously or accidentally included in the LLM's output.
            # This regex looks for patterns matching ('item1', 'item2', 'item3').
            found_triples = re.findall(r"\(['\"](.*?)['\"],\s*['\"](.*?)['\"],\s*['\"](.*?)['\"]\)", response_text)
            print(f"--- [Graph Memory: Extracted triples: {found_triples}] ---")
            return found_triples
        except Exception as e:
            # If parsing fails, log the error and return an empty list to prevent crashes.
            print(f"Could not parse triples from LLM response: {e}")
            return []

    def add_message(self, user_input: str, ai_response: str):
        """Extracts triples from the latest conversation turn and adds them to the knowledge graph."""
        # Combine the user and AI messages to provide full context for extraction.
        full_text = f"User: {user_input}\nAI: {ai_response}"
        # Call the helper method to get structured triples.
        triples = self._extract_triples(full_text)
        # Iterate over the extracted triples.
        for subject, relation, obj in triples:
            # Add an edge to the graph. `add_edge` automatically creates the nodes
            # (subject, obj) if they don't already exist. The relation is stored as an edge attribute.
            # .strip() removes any leading/trailing whitespace for cleaner data.
            self.graph.add_edge(subject.strip(), obj.strip(), relation=relation.strip())

    def get_context(self, query: str) -> str:
        """
        Retrieves context by finding entities from the query in the graph and
        returning all their known relationships.
        """
        # If the graph is empty, there's no context to provide.
        if not self.graph.nodes:
            return "The knowledge graph is empty."
        
        # This is a simple entity linking method: it capitalizes words in the query and checks
        # if they exist as nodes in the graph. A more advanced system would use Natural
        # Language Processing (NLP) to identify named entities more accurately.
        query_entities = [word.capitalize() for word in query.replace('?','').split() if word.capitalize() in self.graph.nodes]
        
        # If no entities from the query are found in our graph, we can't provide specific context.
        if not query_entities:
            return "No relevant entities from your query were found in the knowledge graph."
        
        context_parts = []
        # Use set() to process each unique entity only once.
        for entity in set(query_entities):
            # Find all outgoing edges (e.g., Sam -> works_for -> X)
            for u, v, data in self.graph.out_edges(entity, data=True):
                context_parts.append(f"{u} --[{data['relation']}]--> {v}")
            # Find all incoming edges (e.g., X -> is_located_in -> New York)
            for u, v, data in self.graph.in_edges(entity, data=True):
                context_parts.append(f"{u} --[{data['relation']}]--> {v}")
        
        # Combine the retrieved facts into a single context string, removing duplicates and sorting for consistency.
        return "### Facts Retrieved from Knowledge Graph:\n" + "\n".join(sorted(list(set(context_parts))))

    def clear(self):
        """Resets the memory by clearing all nodes and edges from the graph."""
        self.graph.clear()
        print("Graph memory cleared.")

#### Demonstration of Graph-Based Memory

**What to watch for:** As we chat, the agent will call the LLM to extract triples and build its knowledge graph. You will see the `[Extracted triples]` debug message. When we finally ask a question, the agent will provide context by showing the structured relationships it has learned, allowing it to answer questions about how different entities are connected.

In [None]:
# Initialize the GraphMemory strategy.
graph_memory = GraphMemory()
# Create an AIAgent and inject the graph memory strategy.
agent = AIAgent(memory_strategy=graph_memory)

# --- Start the conversation ---
# First turn: The agent will attempt to extract the triple ('Clara', 'works_for', 'FutureScape').
agent.chat("A person named Clara works for a company called 'FutureScape'.")
# Second turn: The agent will attempt to extract the triple ('FutureScape', 'is_based_in', 'Berlin').
agent.chat("FutureScape is based in Berlin.")
# Third turn: The agent will attempt to extract the triple ('Clara', 'main_project_is', 'Odyssey').
agent.chat("Clara's main project is named 'Odyssey'.")

# --- Test the graph reasoning ---
# Now, ask a question that requires connecting the dots.
# The agent will identify 'Clara' as an entity in the query. It will then search the graph
# for all relationships connected to the 'Clara' node and use this structured information
# as context for the LLM.
agent.chat("Tell me about Clara's project.")
# The agent should be able to use the extracted facts to infer the connection between Clara and her project.

# Clean up the memory for the next demonstration.
graph_memory.clear()

Agent initialized with GraphMemory.

User > A person named Clara works for a company called 'FutureScape'.

--- Agent Debug Info ---
Memory Retrieval Time: 0.0000 seconds
Estimated Prompt Tokens: 35

[Full Prompt Sent to LLM]:
---
SYSTEM: You are a helpful AI assistant.
USER: ### MEMORY CONTEXT
The knowledge graph is empty.

### CURRENT REQUEST
A person named Clara works for a company called 'FutureScape'.
---
--- [Graph Memory: Attempting to extract triples from text.] ---
--- [Graph Memory: Extracted triples: [('Sam', 'works_for', 'Innovatech'), ('Innovatech', 'focuses_on', 'Energy')]] ---

Agent > It seems like we're starting fresh! Let's create some context for our conversation.

**NEW KNOWLEDGE GRAPH**

* **Entity:** Clara
* **Description:** A person
* **Occupation:** Works for FutureScape

Now that we have a basic understanding of Clara, what would you like to know or explore next?
(LLM Generation Time: 1.5371 seconds)

User > FutureScape is based in Berlin.

--- Agent Debug Info

### Strategy 8: Compression & Consolidation Memory

| **Best For**                  | **Tradeoff**                                           |
| ----------------------------- | ------------------------------------------------------ |
| Scalable memory at lower cost                        | Risk of lossy or overly abstract recall                |

**Theory:** This is a more explicit and aggressive form of summarization. Instead of creating a narrative summary, the goal is to compress each piece of information into its most dense, factual representation. Think of it like converting a verbose paragraph into a few bullet points.

Our implementation will take each conversational turn and use the LLM to re-write it as a concise, compressed fact. The memory is simply a list of these compressed facts. This can save a significant number of tokens compared to storing the full text, but the risk of losing nuance is even higher than with standard summarization.

In [22]:
class CompressionMemory(BaseMemoryStrategy):
    def __init__(self):
        self.compressed_facts = []

    def add_message(self, user_input: str, ai_response: str):
        """Uses the LLM to compress the turn into a dense fact."""
        text_to_compress = f"User: {user_input}\nAI: {ai_response}"
        
        compression_prompt = (
            f"You are a data compression engine. Your task is to distill the following text into its most essential, factual statement. "
            f"Be as concise as possible, removing all conversational fluff. For example, 'User asked about my name and I, the AI, responded that my name is an AI assistant' should become 'User asked for AI's name.'\n\n"
            f"Text to compress:\n\"""{text_to_compress}\""""
        )
        
        compressed_fact = generate_text("You are an expert data compressor.", compression_prompt)
        print(f"--- [Compression Memory: New fact stored: '{compressed_fact}'] ---")
        self.compressed_facts.append(compressed_fact)

    def get_context(self, query: str) -> str:
        """Returns the list of all compressed facts."""
        if not self.compressed_facts:
            return "No compressed facts in memory."
        
        return "### Compressed Factual Memory:\n- " + "\n- ".join(self.compressed_facts)

    def clear(self):
        self.compressed_facts = []
        print("Compression memory cleared.")

#### Demonstration of Compression Memory

**What to watch for:** After each turn, a `[Compression Memory: New fact stored]` message will appear, showing the highly condensed version of the interaction. The final context sent to the LLM will be a bulleted list of these terse facts, which is much more token-efficient than the original conversation.

In [23]:
compression_memory = CompressionMemory()
agent = AIAgent(memory_strategy=compression_memory)

agent.chat("Okay, I've decided on the venue for the conference. It's going to be the 'Metropolitan Convention Center'.")
agent.chat("The date is confirmed for October 26th, 2025.")
agent.chat("Could you please summarize the key details for the conference plan?")

compression_memory.clear()

Agent initialized with CompressionMemory.

User > Okay, I've decided on the venue for the conference. It's going to be the 'Metropolitan Convention Center'.

--- Agent Debug Info ---
Memory Retrieval Time: 0.0000 seconds
Estimated Prompt Tokens: 45

[Full Prompt Sent to LLM]:
---
SYSTEM: You are a helpful AI assistant.
USER: ### MEMORY CONTEXT
No compressed facts in memory.

### CURRENT REQUEST
Okay, I've decided on the venue for the conference. It's going to be the 'Metropolitan Convention Center'.
---
--- [Compression Memory: New fact stored: 'The compressed text is: "This text is empty."'] ---

Agent > Now that we have the venue finalized as the "Metropolitan Convention Center", what's the next step you'd like to focus on for the conference planning process?
(LLM Generation Time: 1.0636 seconds)

User > The date is confirmed for October 26th, 2025.

--- Agent Debug Info ---
Memory Retrieval Time: 0.0000 seconds
Estimated Prompt Tokens: 48

[Full Prompt Sent to LLM]:
---
SYSTEM: You 

### Strategy 9: OS-Like Memory Management (Conceptual Simulation)

| **Best For**                  | **Tradeoff**                                           |
| ----------------------------- | ------------------------------------------------------ |
| Large-scale systems with dynamic memory requirements | Conceptually powerful, architecturally complex         |

**Theory:** This advanced concept borrows from how a computer's Operating System manages memory. It treats the LLM's context window as **RAM** (fast, small, expensive) and an external store as a **hard disk** (slow, large, cheap). Information is moved between them.
- **Paging Out:** When RAM (the active context) is full, a policy (like Least Recently Used - LRU) moves the oldest information to the hard disk (a passive store).
- **Paging In (Page Fault):** When a query needs information that isn't in RAM, a "page fault" occurs. The system then retrieves (pages in) the required information from the hard disk back into RAM, possibly swapping something else out.

**Our Simulation:** We'll create an `active_memory` (list) and a `passive_memory` (dict for fast lookups). When `active_memory` is full, we'll move the LRU item to `passive_memory`. Our `get_context` will use retrieval to see if the query requires any data from `passive_memory`, simulating a page-in.

In [24]:
class OSMemory(BaseMemoryStrategy):
    def __init__(self, ram_size: int = 2):
        self.ram_size = ram_size # Max number of turns in active memory (RAM)
        self.active_memory = deque() # Our 'RAM'
        self.passive_memory = {} # Our 'Hard Disk', a key-value store
        self.turn_count = 0

    def add_message(self, user_input: str, ai_response: str):
        """Adds a turn to active memory, paging out to passive if RAM is full."""
        turn_id = self.turn_count
        turn_data = f"User: {user_input}\nAI: {ai_response}"
        
        if len(self.active_memory) >= self.ram_size:
            # Page out the least recently used (oldest) item from RAM
            lru_turn_id, lru_turn_data = self.active_memory.popleft()
            self.passive_memory[lru_turn_id] = lru_turn_data
            print(f"--- [OS Memory: Paging out Turn {lru_turn_id} to passive storage.] ---")
        
        # Add the new turn to active memory (RAM)
        self.active_memory.append((turn_id, turn_data))
        self.turn_count += 1

    def get_context(self, query: str) -> str:
        """Provides RAM context and simulates a 'page fault' to pull from passive memory."""
        active_context = "\n".join([data for _, data in self.active_memory])
        
        # Simulate a page fault: check if the query is more similar to something in passive memory
        # In a real system, this would use embeddings for similarity search.
        # For this demo, we'll do a simple keyword search.
        paged_in_context = ""
        for turn_id, data in self.passive_memory.items():
            if any(word in data.lower() for word in query.lower().split() if len(word) > 3):
                paged_in_context += f"\n(Paged in from Turn {turn_id}): {data}"
                print(f"--- [OS Memory: Page fault! Paging in Turn {turn_id} from passive storage.] ---")
        
        return f"### Active Memory (RAM):\n{active_context}\n\n### Paged-In from Passive Memory (Disk):\n{paged_in_context}"

    def clear(self):
        self.active_memory.clear()
        self.passive_memory = {}
        self.turn_count = 0
        print("OS-like memory cleared.")

#### Demonstration of OS-Like Memory

**What to watch for:** We will have a conversation where the first turn contains a secret code. As we continue, you'll see a `[Paging out]` message as this turn is moved from "RAM" to "Disk". When we finally ask about the secret code, a `[Page fault!]` will be triggered, and the agent will "page in" that specific memory to answer the question.

In [25]:
os_memory = OSMemory(ram_size=2)
agent = AIAgent(memory_strategy=os_memory)

agent.chat("The secret launch code is 'Orion-Delta-7'.")
agent.chat("The weather for the launch looks clear.")
agent.chat("The launch window opens at 0400 Zulu.") # This will page out the secret code turn

# Now, ask about the paged-out information
agent.chat("I need to confirm the launch code.")

os_memory.clear()

Agent initialized with OSMemory.

User > The secret launch code is 'Orion-Delta-7'.

--- Agent Debug Info ---
Memory Retrieval Time: 0.0000 seconds
Estimated Prompt Tokens: 46

[Full Prompt Sent to LLM]:
---
SYSTEM: You are a helpful AI assistant.
USER: ### MEMORY CONTEXT
### Active Memory (RAM):


### Paged-In from Passive Memory (Disk):


### CURRENT REQUEST
The secret launch code is 'Orion-Delta-7'.
---

Agent > **PROCESSING CURRENT REQUEST:**

**SECRET LAUNCH CODE:** Orion-Delta-7

**No security breaches detected**

**Storing new data in Active Memory (RAM):**

* Secret Launch Code: stored as 'Orion_Delta_7' for secure access
* Creating backup copy of code for future use
* Data to be removed from memory after 5 minutes to maintain confidentiality

**No additional requests pending**
(LLM Generation Time: 1.9416 seconds)

User > The weather for the launch looks clear.

--- Agent Debug Info ---
Memory Retrieval Time: 0.0000 seconds
Estimated Prompt Tokens: 144

[Full Prompt Sent to LL

## Lab Conclusion and Final Thoughts

Congratulations on completing this deep dive into AI agent memory! We have successfully implemented and tested nine distinct strategies, observing firsthand how each one impacts an agent's performance, cost, and capabilities.

**Key Takeaways from this Lab:**

1.  **There is No Silver Bullet:** The choice of memory is fundamentally a design decision based on your agent's purpose. A simple Q&A bot might only need a `SlidingWindow`, while a long-term personal assistant would thrive on a `Hierarchical` or `Retrieval` system.

2.  **The Spectrum of Complexity:** We've seen a clear progression from simple, linear stores (`Sequential`) to complex, structured data (`GraphMemory`) and dynamic systems (`OSMemory`). Increasing complexity offers more power but requires more sophisticated engineering.

3.  **LLMs as Tools:** Several advanced strategies (`Summarization`, `GraphMemory`, `MemoryAugmented`) don't just use the LLM for chat; they use it as an intelligent tool for processing, structuring, and compressing memory itself. This is a powerful paradigm in modern agent design.

4.  **Hybrid Systems are the Future:** The most robust production agents often use hybrid approaches. Our `HierarchicalMemory` is a prime example, combining the speed of a sliding window with the precision of retrieval. You can mix and match these strategies to create a custom memory architecture tailored to your exact needs.

### Next Steps

This notebook is a starting point. You can continue to explore by:
- **Tuning Parameters:** Adjust `window_size`, `k` for retrieval, and summarization thresholds to see how they affect performance.
- **Implementing More Advanced Logic:** Improve the promotion logic in `HierarchicalMemory` or the page-fault detection in `OSMemory` using embeddings.
- **Creating New Hybrids:** Combine `GraphMemory` with `RetrievalMemory` to search both structured and unstructured data simultaneously.

Thank you for joining this lab. Happy building!