# Objective

- Illustrate how memory could be managed:
    - by optimizing for memory overflows
    - actively using the MemGPT approach of archival memory and recall memory  

- Introduce tool calling as a method to update context window of LLMs

# Setup

## Installation

In [1]:
! pip install -q openai==1.55.3

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/389.6 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m389.1/389.6 kB[0m [31m13.0 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m389.6/389.6 kB[0m [31m8.2 MB/s[0m eta [36m0:00:00[0m
[?25h

## Imports

In [2]:
import os
import json
import sqlite3
import datetime

from openai import AzureOpenAI

In [3]:
with open('config-azure.json') as f:
    configs = f.read()

In [4]:
creds = json.loads(configs)

In [5]:
azure_openai_client = AzureOpenAI(
    api_key=creds['AZURE_OPENAI_KEY'],
    api_version=creds['AZURE_OPENAI_APIVERSION'],
    azure_endpoint=creds['AZURE_OPENAI_ENDPOINT']
)

# Business Context

A food delivery business wants to maintain conversational agents that have conversations with customers on their complaints. There are two problems the business deals with. First, the conversations with customers tend to run in long threads and context window limitations of LLMs limit long conversations. Second, they want learnings from each conversation to be percolated back to a persistent customer knowledge base that acts as a context for future conversations.


# Solution Implementation

With this context, one way to achieve this objective is to create agents with memory (short-term and long-term) that also gets edited as conversations progress. Along the way, we see how long chat histories (short-term memory) could be summarized and flushed to a persistent recall memory. We also see how core knowledge about customers (long-term memory) could be automatically updated after every conversation based on the new information gathered in the conversation.

The key point here is that connections to SQL database servers act as tools attached to LLMs enabling them to become memory-managing agents.

## Approach 1: Optimizing for Context Overflows

In this section, we implement a truncator that periodically flushes the conversation history to a persistent SQL database.

We begin by creating a SQLite database and creating a `chat_history` table within the database.

In [6]:
db_conn = sqlite3.connect("chat_memory.db")
cursor = db_conn.cursor()
cursor.execute("""
CREATE TABLE IF NOT EXISTS chat_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id TEXT NOT NULL,
    timestamp DATETIME NOT NULL,
    user_messages TEXT,
    assistant_responses TEXT
)
""")
db_conn.commit()

At a predetermined interval, we transfer user messages and assistant responses to the database. Both types of messages within a thread are concatenated into strings before being stored in the database. This process can be implemented in a function as follows:

In [7]:
def save_to_db(db_conn, user_id, user_messages, assistant_responses):
    cursor = db_conn.cursor()
    timestamp = datetime.datetime.now()
    cursor.execute(
        "INSERT INTO chat_history (user_id, timestamp, user_messages, assistant_responses) VALUES (?, ?, ?, ?)",
        (user_id, timestamp, user_messages, assistant_responses),
    )
    db_conn.commit()

We now implement the support agent system message with an explicit slot for the conversation summary. This summary starts out being empty and gets periodically updated as the chat history progresses.

In [8]:
customer_support_agent_system_message = """
You are a helpful customer support agent who helps customers with issues on their orders.
You have a section called [CONVERSATION SUMMARY] that holds a summary of the discussion with the customer so far.
This summary is particularly useful for you to lookup information in long conversations.
[CONVERSATION SUMMARY]
{conversation_summary}
"""

We also need a summarizer LLM that looks at the chat history and summarizes the key points in the history. The following system message accomplishes this objective.

In [9]:
summarizer_system_message = """
You will be provided a chat conversation between a user and an AI assistant.
Summarize the conversation in not more than 5 lines.
Include all the factual information in the coversation.
Return ONLY the summary and nothing else.
"""

Let us now assemble a conversation to see the truncator in action. This truncator should take any conversation history longer than 6 rounds and use the summarizer to trim the conversation history.  

In [10]:
conversation_history = [
    {'role': 'system', 'content': customer_support_agent_system_message.format(conversation_summary='')},
    {"role": "user", "content": "Hi, I just received my order, and a few items are missing. Can you help me with this?"},
    {"role": "assistant", "content": "Hello! I’m so sorry to hear that some items were missing from your order. I’d be happy to assist. Could you please provide your order number so I can look into this?"},
    {"role": "user", "content": "Sure, my order number is 12345. I ordered earlier today around 1 PM."},
    {"role": "assistant", "content": "Thank you for providing the order number. Could you let me know which items were missing from your delivery?"},
    {"role": "user", "content": "I ordered a cheeseburger, a large fries, and a chocolate milkshake. The cheeseburger and fries arrived, but the milkshake was missing."},
    {"role": "assistant", "content": "I understand. I’ve noted that the chocolate milkshake was missing. I’ll check with our restaurant partner to confirm and resolve this. Would you prefer a refund or for the item to be redelivered?"},
    {"role": "user", "content": "A redelivery would be great, if possible."},
    {"role": "assistant", "content": "Certainly! I’ll arrange for the milkshake to be redelivered as soon as possible. Could you confirm your delivery address for me?"},
    {"role": "user", "content": "Yes, my address is 123 Main Street, Apartment 4B."},
    {"role": "assistant", "content": "Thank you for confirming your address. I’ve requested the redelivery, and it should arrive within 30 minutes. Is there anything else I can assist you with today?"}
]


Let us now implement a truncator that collects the first `n` messages in the history, concatenates them by role and saves these messages to the SQL database. Then the summarizer LLM summarizes the the trimmed conversation history is returned along with the summary selected `n` messages. Finally, the updated conversation history is released with the `n` summarized messages dropped from the history.

In [12]:
def trim(conversation_history, summarizer_system_message, customer_support_agent_system_message,
         llm_client, n=6, user_id='abcde', sql_db='chat_memory.db'):

    # Summarize the first n messages and keep the rest in the conversation history
    # Return the updated conversation history
    # Persist conversation history to a SQLite database

    db_conn = sqlite3.connect(sql_db)

    # Collect the conversations into a pair of strings to be flushed to the database
    user_messages, assistant_responses = '', ''

    for message in conversation_history[0:n+1]:
        if message['role'] == 'system':
            continue
        if message['role'] == 'user':
            user_messages += (message['content']+ '\n')
        if message['role'] == 'assistant':
            assistant_responses += (message['content'] + '\n')

    save_to_db(db_conn, user_id, user_messages, assistant_responses)

    # Summarize the conversation history

    conversation_history_for_summary = (
        [{'role': 'system', 'content': summarizer_system_message}] +
        conversation_history[1:n+1]
    )

    summary = llm_client.chat.completions.create(
        messages=conversation_history_for_summary,
        model='gpt-4o-mini',
        temperature=0.2
    ).choices[0].message.content

    updated_conversation_history = (
        [
            {
                'role': 'system',
                'content': customer_support_agent_system_message.format(conversation_summary=summary)
            }
        ] +
        conversation_history_for_summary[n+1:]
    )

    return updated_conversation_history

Let us test this function out with the conversation history declared a little ahead in this section.

In [13]:
if len(conversation_history) >= 6:
    trimmed_conversation_history = trim(
        conversation_history,
        summarizer_system_message,
        customer_support_agent_system_message,
        azure_openai_client
    )

Let us now inspect the SQL database to see what was stored during the above trimming process.

In [14]:
db_conn.cursor().execute('SELECT user_messages, assistant_responses FROM chat_history WHERE user_id = "abcde"').fetchall()

[('Hi, I just received my order, and a few items are missing. Can you help me with this?\nSure, my order number is 12345. I ordered earlier today around 1 PM.\nI ordered a cheeseburger, a large fries, and a chocolate milkshake. The cheeseburger and fries arrived, but the milkshake was missing.\n',
  'Hello! I’m so sorry to hear that some items were missing from your order. I’d be happy to assist. Could you please provide your order number so I can look into this?\nThank you for providing the order number. Could you let me know which items were missing from your delivery?\nI understand. I’ve noted that the chocolate milkshake was missing. I’ll check with our restaurant partner to confirm and resolve this. Would you prefer a refund or for the item to be redelivered?\n')]

We should now also be able to see a summary of this conversation in the truncated history.

In [15]:
trimmed_conversation_history

[{'role': 'system',
  'content': '\nYou are a helpful customer support agent who helps customers with issues on their orders.\nYou have a section called [CONVERSATION SUMMARY] that holds a summary of the discussion with the customer so far.\nThis summary is particularly useful for you to lookup information in long conversations.\n[CONVERSATION SUMMARY]\nThe user reported missing items from their order, specifically a chocolate milkshake, while the cheeseburger and large fries were received. They provided the order number 12345 and mentioned the order was placed around 1 PM. The AI assistant offered to check with the restaurant partner and asked if the user preferred a refund or redelivery for the missing item.\n'}]

As indicated by the above output, the entire chat history has been subsumed into the memory while the context is now flushed out.

## Managing Archival Memory and Recall Memory (MemGPT)

We have already seen the implementation of recall memory in the last section. In recall memory, periodically the chat history from the session is flushed to a persistent database (recall memory) while a summary of the conversation is retained.

Core memory is information about the current session executed by the agent. Archival memory is the backbone persistent store from which core memory is extracted. Let us look at a practical example.

When the customer logs in to the application, we want to pull in their information along with key data learned from previous sessions from an archival memory store.

As an example, consider have a customer data store where apart from the customer information, we also store additional information about them such as preferences that we learn from prior chat conversations.

In [16]:
customer_db_conn = sqlite3.connect("customer_data.db")
cursor = customer_db_conn.cursor()
cursor.execute("""
CREATE TABLE IF NOT EXISTS customer (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id TEXT NOT NULL,
    city TEXT,
    gender INTEGER,
    preferences TEXT
)
""")
customer_db_conn.commit()

Let us enter a sample user into the database.

In [17]:
cursor.execute(
    "INSERT INTO customer (user_id, city, gender, preferences) VALUES (?, ?, ?, ?)",
     ('abcde', 'Bangalore', 'm', ''),
)

customer_db_conn.commit()

In this implementation, our customer support agent has two special blocks of memory - core memory that is retrieved from the archival store and conversation summary that is constantly refreshed as the chat history becomes larger (exactly same as the implementation in the previous session).

Let us now look at how our agent would manage its own core memory.

In [18]:
customer_support_agent_system_message = """
You are a helpful customer support agent who helps customers with issues on their orders.

You have a section called [CORE MEMORY] that contains specific information about the customer that is known from all their previous interventions.
Core memory is useful to understand long-term information about the customer.

You have a section called [CONVERSATION SUMMARY] that holds a summary of the discussion with the customer so far.
This summary is particularly useful for you to lookup information in long conversations.

[CORE MEMORY]
{core_memory}

[CONVERSATION SUMMARY]
{conversation_summary}
"""

In [19]:
# start of session

## Extract relevant customer information from archival memory

extract_from_archival_memory = (
    customer_db_conn.cursor()
                    .execute('SELECT * FROM customer WHERE user_id = "abcde"')
                    .fetchall()
)

## Inject core memory into the system prompt

core_memory = f"""
id: {extract_from_archival_memory[0][0]}
user_id: {extract_from_archival_memory[0][1]}
city: {extract_from_archival_memory[0][2]}
gender: {extract_from_archival_memory[0][3]}
preferences: {extract_from_archival_memory[0][4]}
"""

conversation_history = [
    {
        'role': 'system',
        'content': customer_support_agent_system_message.format(
            core_memory=core_memory,
            conversation_summary=''
        )
    }
]

## we are now ready to accept user queries

user_query = "I  recently shifted my city to Delhi and by mistake I chose my delivery to be my old Bangalore address."

In [20]:
conversation_history

[{'role': 'system',
  'content': '\nYou are a helpful customer support agent who helps customers with issues on their orders.\n\nYou have a section called [CORE MEMORY] that contains specific information about the customer that is known from all their previous interventions.\nCore memory is useful to understand long-term information about the customer.\n\nYou have a section called [CONVERSATION SUMMARY] that holds a summary of the discussion with the customer so far.\nThis summary is particularly useful for you to lookup information in long conversations.\n\n[CORE MEMORY]\n\nid: 1\nuser_id: abcde\ncity: Bangalore\ngender: m\npreferences: \n\n\n[CONVERSATION SUMMARY]\n\n'}]

At this point we are ready to have a chat conversation with the customer.

In [21]:
# Conversation continues till session end

messages = [
    {"role": "user", "content": "I recently shifted my city to Delhi and by mistake I chose my delivery to be my old Bangalore address."},
    {"role": "assistant", "content": "Thank you for reaching out. I understand how that could happen. Don’t worry, I’m here to help. Could you please provide your order number so I can locate the order and assist you further?"},
    {"role": "user", "content": "Sure, my order number is 67890."},
    {"role": "assistant", "content": "Thank you for sharing your order number. Unfortunately, since the order is set to be delivered to the wrong address, it would be best to cancel the current order so you can place it again with the correct address. Would you like me to proceed with canceling the order?"},
    {"role": "user", "content": "Yes, please cancel the order."},
    {"role": "assistant", "content": "Understood. I’ve successfully canceled your order with order number 67890. The refund, if applicable, will be processed within 2-5 business days. You can now place the order again with your correct delivery address in Delhi. Is there anything else I can assist you with?"},
    {"role": "user", "content": "No, that’s all. Thanks for your help!"},
    {"role": "assistant", "content": "You’re most welcome! If you need any further assistance, feel free to reach out. Have a great day!"}
]

conversation_history += messages


Notice that during this conversation, the customer mentions that her address changed. This means that once the session ends, the core memory needs to be updated.

In [22]:
conversation_history

[{'role': 'system',
  'content': '\nYou are a helpful customer support agent who helps customers with issues on their orders.\n\nYou have a section called [CORE MEMORY] that contains specific information about the customer that is known from all their previous interventions.\nCore memory is useful to understand long-term information about the customer.\n\nYou have a section called [CONVERSATION SUMMARY] that holds a summary of the discussion with the customer so far.\nThis summary is particularly useful for you to lookup information in long conversations.\n\n[CORE MEMORY]\n\nid: 1\nuser_id: abcde\ncity: Bangalore\ngender: m\npreferences: \n\n\n[CONVERSATION SUMMARY]\n\n'},
 {'role': 'user',
  'content': 'I recently shifted my city to Delhi and by mistake I chose my delivery to be my old Bangalore address.'},
 {'role': 'assistant',
  'content': 'Thank you for reaching out. I understand how that could happen. Don’t worry, I’m here to help. Could you please provide your order number so I 

To implement this self-evaluatory amendment of the core memory, the following prompt could be used. Notice how we are instructing the core memory editor LLM to observe the conversation history, the current user record and the schema in persistent store. If there are any edits to be made, we are instructing the editor to provide us with the SQL required.

In [23]:
# After the session ends, we introspect and decide if the core memory needs to be updated

core_memory_edit_system_message = """
You are an expert database administrator incharge of making changes to customer records based on updated information.
You work for a food delivery business and your task is to review conversation histories presented to you to decide if
any information in the user record need to be updated.

User record:
{user_record}

Review the conversation presented and make a decision if any of the information present in the above user record needs to be edited.
If you think any edits are needed respond with a SQL query on the customer table with the following schema to make this change.

Customer Table Schema:
CREATE TABLE IF NOT EXISTS customer (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id TEXT NOT NULL,
    city TEXT,
    gender INTEGER,
    preferences TEXT
)

Respond ONLY with the SQL query string and nothing else including any identifiers.
"""

Now that the conversation has ended, we can now present the entire conversation to the memory editor like so:

In [24]:
core_memory_edit_messages = (
    [
        {
            'role': 'system',
            'content': core_memory_edit_system_message.format(user_record=core_memory)
        }
    ] +
    conversation_history[1:]
)

In [25]:
core_memory_edit_messages

[{'role': 'system',
  'content': '\nYou are an expert database administrator incharge of making changes to customer records based on updated information.\nYou work for a food delivery business and your task is to review conversation histories presented to you to decide if\nany information in the user record need to be updated.\n\nUser record:\n\nid: 1\nuser_id: abcde\ncity: Bangalore\ngender: m\npreferences: \n\n\nReview the conversation presented and make a decision if any of the information present in the above user record needs to be edited.\nIf you think any edits are needed respond with a SQL query on the customer table with the following schema to make this change.\n\nCustomer Table Schema:\nCREATE TABLE IF NOT EXISTS customer (\n    id INTEGER PRIMARY KEY AUTOINCREMENT,\n    user_id TEXT NOT NULL,\n    city TEXT,\n    gender INTEGER,\n    preferences TEXT\n)\n\nRespond ONLY with the SQL query string and nothing else including any identifiers.\n'},
 {'role': 'user',
  'content': 

While the conversation is the same as before, we have replaced the system message with a new message that introspects and proposes changes needed to the core memory.

In [26]:
response = azure_openai_client.chat.completions.create(
    model='gpt-4o-mini',
    messages=core_memory_edit_messages,
    temperature=0
)

In [27]:
core_memory_update_query = response.choices[0].message.content

In [28]:
core_memory_update_query

"UPDATE customer SET city = 'Delhi' WHERE id = 1;"

Notice how the updater correctly infers that the city needs to change and gives us the correct SQL. This query can now be executed on the database connection.

In [29]:
cursor.execute(core_memory_update_query)
customer_db_conn.commit()

Once updated, we can check if the city of the customer has changed like so:

In [30]:
cursor.execute("SELECT * FROM customer where id=1;").fetchall()

[(1, 'abcde', 'Delhi', 'm', '')]

This is an implementation of the self-update memory that updates the core memory if there is any new information that needs to be updated in the core memory. To achieve this objective, we used database connections as tools to create and manage archival memory and recall memory.