# Conversational Memory for OpenAI - LangChain
Conversational memory allows our chatbots and agents to remember previous interactions within a conversation. Without conversational memory, our chatbots would only ever be able to respond to the last message they received, essentially forgetting all previous messages with each new message.

Naturally, conversations require our chatbots to be able to respond over multiple interactions and refer to previous messages to understand the context of the conversation.

# LangChain's Memory Types
LangChain versions 0.0.x consisted of various conversational memory types. Most of these are due for deprecation but still hold value in understanding the different approaches that we can take to building conversational memory.

Throughout the notebook we will be referring to these older memory types and then rewriting them using the recommended RunnableWithMessageHistory class. We will learn about:

- ConversationBufferMemory: the simplest and most intuitive form of conversational memory, keeping track of a conversation without any additional bells and whistles.
- ConversationBufferWindowMemory: similar to ConversationBufferMemory, but only keeps track of the last k messages.
- ConversationSummaryMemory: rather than keeping track of the entire conversation, this memory type keeps track of a summary of the conversation.
- ConversationSummaryBufferMemory: merges the ConversationSummaryMemory and ConversationTokenBufferMemory types.
  
We'll work through each of these memory types in turn, and rewrite each one using the RunnableWithMessageHistory class.

# 1. ConversationBufferMemory
ConversationBufferMemory is the simplest form of conversational memory, it is literally just a place that we store messages, and then use to feed messages into our LLM.

Let's start with LangChain's original ConversationBufferMemory object, we are setting return_messages=True to return the messages as a list of ChatMessage objects — unless using a non-chat model we would always set this to True as without it the messages are passed as a direct string which can lead to unexpected behavior from chat LLMs.

In [6]:
from langchain_ollama import ChatOllama

# For normal accurate responses
llm = ChatOllama(temperature=0.0, model="llama3.2")

In [1]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages=True)

  memory = ConversationBufferMemory(return_messages=True)


In [2]:
memory.save_context(
    {"input": "Hi, my name is James"},  # user message
    {"output": "Hey James, what's up? I'm an AI model called Zeta."}  # AI response
)
memory.save_context(
    {"input": "I'm researching the different types of conversational memory."},  # user message
    {"output": "That's interesting, what are some examples?"}  # AI response
)
memory.save_context(
    {"input": "I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory."},  # user message
    {"output": "That's interesting, what's the difference?"}  # AI response
)
memory.save_context(
    {"input": "Buffer memory just stores the entire conversation, right?"},  # user message
    {"output": "That makes sense, what about ConversationBufferWindowMemory?"}  # AI response
)
memory.save_context(
    {"input": "Buffer window memory stores the last k messages, dropping the rest."},  # user message
    {"output": "Very cool!"}  # AI response
)

In [3]:
memory.load_memory_variables({})

{'history': [HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}),
  AIMessage(content="Hey James, what's up? I'm an AI model called Zeta.", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}),
  HumanMess

In [4]:
memory = ConversationBufferMemory(return_messages=True)

memory.chat_memory.add_user_message("Hi, my name is James")
memory.chat_memory.add_ai_message("Hey James, what's up? I'm an AI model called Zeta.")
memory.chat_memory.add_user_message("I'm researching the different types of conversational memory.")
memory.chat_memory.add_ai_message("That's interesting, what are some examples?")
memory.chat_memory.add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
memory.chat_memory.add_ai_message("That's interesting, what's the difference?")
memory.chat_memory.add_user_message("Buffer memory just stores the entire conversation, right?")
memory.chat_memory.add_ai_message("That makes sense, what about ConversationBufferWindowMemory?")
memory.chat_memory.add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
memory.chat_memory.add_ai_message("Very cool!")

memory.load_memory_variables({})

{'history': [HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}),
  AIMessage(content="Hey James, what's up? I'm an AI model called Zeta.", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}),
  HumanMess

In [7]:
from langchain.chains import ConversationChain

chain = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

  chain = ConversationChain(


In [8]:
chain.invoke({"input": "what is my name again?"})



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
[HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}), AIMessage(content="Hey James, what's up? I'm an AI model called Zeta.", additional_kwargs={}, response_metadata={}), HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}), AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}), HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}), AIMessage(content="That's interesting, what's the differ

{'input': 'what is my name again?',
 'history': [HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}),
  AIMessage(content="Hey James, what's up? I'm an AI model called Zeta.", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}

# ConversationBufferMemory with RunnableWithMessageHistory
As mentioned, the ConversationBufferMemory type is due for deprecation. Instead, we can use the RunnableWithMessageHistory class to implement the same functionality.

When implementing RunnableWithMessageHistory we will use LangChain Expression Language (LCEL) and for this we need to define our prompt template and LLM components. Our llm has already been defined, so now we just define a ChatPromptTemplate object.

In [9]:
from langchain.prompts import (
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
    ChatPromptTemplate
)

system_prompt = "You are a helpful assistant called Zeta."

prompt_template = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(system_prompt),
    MessagesPlaceholder(variable_name="history"),
    HumanMessagePromptTemplate.from_template("{query}"),
])

In [10]:
pipeline = prompt_template | llm

In [12]:
from langchain_core.chat_history import InMemoryChatMessageHistory

chat_map = {}
def get_chat_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in chat_map:
        # if session ID doesn't exist, create a new chat history
        chat_map[session_id] = InMemoryChatMessageHistory()
    return chat_map[session_id]

In [13]:
from langchain_core.runnables.history import RunnableWithMessageHistory

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history"
)

In [14]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"session_id": "id_123"}
)

AIMessage(content="Hello James! It's nice to meet you. I'm Zeta, your friendly AI assistant. How can I help you today? Do you have any questions or topics you'd like to discuss?", additional_kwargs={}, response_metadata={'model': 'llama3.2', 'created_at': '2025-05-14T06:44:34.9550283Z', 'done': True, 'done_reason': 'stop', 'total_duration': 5272147000, 'load_duration': 3421235200, 'prompt_eval_count': 40, 'prompt_eval_duration': 475612200, 'eval_count': 41, 'eval_duration': 1372120400, 'model_name': 'llama3.2'}, id='run--e7ba01b9-0c73-43c8-b160-8b75100d5219-0', usage_metadata={'input_tokens': 40, 'output_tokens': 41, 'total_tokens': 81})

In [15]:
pipeline_with_history.invoke(
    {"query": "What is my name again?"},
    config={"session_id": "id_123"}
)

AIMessage(content="Your name is James. Don't worry, I've got it stored in our little chat session, so I'll remember it for next time!", additional_kwargs={}, response_metadata={'model': 'llama3.2', 'created_at': '2025-05-14T06:45:01.2578835Z', 'done': True, 'done_reason': 'stop', 'total_duration': 1341130600, 'load_duration': 19726000, 'prompt_eval_count': 96, 'prompt_eval_duration': 305238000, 'eval_count': 30, 'eval_duration': 1015061100, 'model_name': 'llama3.2'}, id='run--8cce83d6-7873-459b-8107-05e16a8d2309-0', usage_metadata={'input_tokens': 96, 'output_tokens': 30, 'total_tokens': 126})

# 2. ConversationBufferWindowMemory
The ConversationBufferWindowMemory type is similar to ConversationBufferMemory, but only keeps track of the last k messages. There are a few reasons why we would want to keep only the last k messages:

- More messages mean more tokens are sent with each request, more tokens increases latency and cost.

- LLMs tend to perform worse when given more tokens, making them more likely to deviate from instructions, hallucinate, or "forget" information provided to them. Conciseness is key to high performing LLMs.

- If we keep all messages we will eventually hit the LLM's context window limit, by adding a window size k we can ensure we never hit this limit.

The buffer window solves many problems that we encounter with the standard buffer memory, while still being a very simple and intuitive form of conversational memory.

In [16]:

from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=4, return_messages=True)

  memory = ConversationBufferWindowMemory(k=4, return_messages=True)


In [17]:
memory.chat_memory.add_user_message("Hi, my name is James")
memory.chat_memory.add_ai_message("Hey James, what's up? I'm an AI model called Zeta.")
memory.chat_memory.add_user_message("I'm researching the different types of conversational memory.")
memory.chat_memory.add_ai_message("That's interesting, what are some examples?")
memory.chat_memory.add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
memory.chat_memory.add_ai_message("That's interesting, what's the difference?")
memory.chat_memory.add_user_message("Buffer memory just stores the entire conversation, right?")
memory.chat_memory.add_ai_message("That makes sense, what about ConversationBufferWindowMemory?")
memory.chat_memory.add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
memory.chat_memory.add_ai_message("Very cool!")

memory.load_memory_variables({})

{'history': [HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer window memory stores the last k messages, dropping the rest.', additional_kwargs={}, response_metadata={}),
  AIMessage(content='Very cool!', additional_kwargs={}, response_metadata={})]}

In [18]:
chain = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

In [19]:
chain.invoke({"input": "what is my name again?"})



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
[HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}), AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}), HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}), AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}), HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}), AIMessage(content='That mak

{'input': 'what is my name again?',
 'history': [HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer window memory stores the last k messages, dropping the rest.', additional_kwargs={}, response_metadata={}),
  AIMessage(content='Very cool!', additional_kw

# ConversationBufferWindowMemory with RunnableWithMessageHistory
To implement this memory type using the RunnableWithMessageHistory class, we can use the same approach as before. We define our prompt_template and llm as before, and then wrap our pipeline in a RunnableWithMessageHistory object.

For the window feature, we need to define a custom version of the InMemoryChatMessageHistory class that removes any messages beyond the last k messages.

In [20]:

from pydantic import BaseModel, Field
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage

class BufferWindowMessageHistory(BaseChatMessageHistory, BaseModel):
    messages: list[BaseMessage] = Field(default_factory=list)
    k: int = Field(default_factory=int)

    def __init__(self, k: int):
        super().__init__(k=k)
        print(f"Initializing BufferWindowMessageHistory with k={k}")

    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Add messages to the history, removing any messages beyond
        the last `k` messages.
        """
        self.messages.extend(messages)
        self.messages = self.messages[-self.k:]

    def clear(self) -> None:
        """Clear the history."""
        self.messages = []

In [21]:
chat_map = {}
def get_chat_history(session_id: str, k: int = 4) -> BufferWindowMessageHistory:
    print(f"get_chat_history called with session_id={session_id} and k={k}")
    if session_id not in chat_map:
        # if session ID doesn't exist, create a new chat history
        chat_map[session_id] = BufferWindowMessageHistory(k=k)
    # remove anything beyond the last
    return chat_map[session_id]

In [22]:
from langchain_core.runnables import ConfigurableFieldSpec

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="The session ID to use for the chat history",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="k",
            description="The number of messages to keep in the history",
            default=4,
        )
    ]
)

In [23]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_k4", "k": 4}}
)

get_chat_history called with session_id=id_k4 and k=4
Initializing BufferWindowMessageHistory with k=4


AIMessage(content="Hello James! It's nice to meet you. I'm Zeta, your friendly AI assistant. How can I help you today? Do you have any questions or topics you'd like to discuss?", additional_kwargs={}, response_metadata={'model': 'llama3.2', 'created_at': '2025-05-14T06:59:43.1489054Z', 'done': True, 'done_reason': 'stop', 'total_duration': 3487673600, 'load_duration': 1943714100, 'prompt_eval_count': 40, 'prompt_eval_duration': 202329000, 'eval_count': 41, 'eval_duration': 1338860400, 'model_name': 'llama3.2'}, id='run--8d42cab5-3578-4f64-b259-68e9f95809a8-0', usage_metadata={'input_tokens': 40, 'output_tokens': 41, 'total_tokens': 81})

In [24]:
chat_map["id_k4"].clear()  # clear the history

# manually insert history
chat_map["id_k4"].add_user_message("Hi, my name is James")
chat_map["id_k4"].add_ai_message("I'm an AI model called Zeta.")
chat_map["id_k4"].add_user_message("I'm researching the different types of conversational memory.")
chat_map["id_k4"].add_ai_message("That's interesting, what are some examples?")
chat_map["id_k4"].add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
chat_map["id_k4"].add_ai_message("That's interesting, what's the difference?")
chat_map["id_k4"].add_user_message("Buffer memory just stores the entire conversation, right?")
chat_map["id_k4"].add_ai_message("That makes sense, what about ConversationBufferWindowMemory?")
chat_map["id_k4"].add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
chat_map["id_k4"].add_ai_message("Very cool!")

chat_map["id_k4"].messages

[HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='Buffer window memory stores the last k messages, dropping the rest.', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Very cool!', additional_kwargs={}, response_metadata={})]

In [25]:
pipeline_with_history.invoke(
    {"query": "what is my name again?"},
    config={"configurable": {"session_id": "id_k4", "k": 4}}
)

get_chat_history called with session_id=id_k4 and k=4


AIMessage(content="You didn't tell me your name, but I'm Zeta, your helpful assistant!", additional_kwargs={}, response_metadata={'model': 'llama3.2', 'created_at': '2025-05-14T07:00:56.6189014Z', 'done': True, 'done_reason': 'stop', 'total_duration': 913858900, 'load_duration': 19773800, 'prompt_eval_count': 97, 'prompt_eval_duration': 269164700, 'eval_count': 19, 'eval_duration': 622547400, 'model_name': 'llama3.2'}, id='run--b2139d01-8d6a-49a1-9b9d-ddb66b97cc54-0', usage_metadata={'input_tokens': 97, 'output_tokens': 19, 'total_tokens': 116})

In [26]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"session_id": "id_k14", "k": 14}
)

get_chat_history called with session_id=id_k14 and k=14
Initializing BufferWindowMessageHistory with k=14


AIMessage(content="Hello James! It's nice to meet you. I'm Zeta, your friendly AI assistant. How can I help you today? Do you have any questions or topics you'd like to discuss?", additional_kwargs={}, response_metadata={'model': 'llama3.2', 'created_at': '2025-05-14T07:01:05.7858787Z', 'done': True, 'done_reason': 'stop', 'total_duration': 1628066300, 'load_duration': 18528000, 'prompt_eval_count': 40, 'prompt_eval_duration': 195842800, 'eval_count': 41, 'eval_duration': 1412630200, 'model_name': 'llama3.2'}, id='run--df21b23c-263f-4f66-ac22-57a1dfa44f0c-0', usage_metadata={'input_tokens': 40, 'output_tokens': 41, 'total_tokens': 81})

In [27]:
chat_map["id_k14"].add_user_message("I'm researching the different types of conversational memory.")
chat_map["id_k14"].add_ai_message("That's interesting, what are some examples?")
chat_map["id_k14"].add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
chat_map["id_k14"].add_ai_message("That's interesting, what's the difference?")
chat_map["id_k14"].add_user_message("Buffer memory just stores the entire conversation, right?")
chat_map["id_k14"].add_ai_message("That makes sense, what about ConversationBufferWindowMemory?")
chat_map["id_k14"].add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
chat_map["id_k14"].add_ai_message("Very cool!")

chat_map["id_k14"].messages

[HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}),
 AIMessage(content="Hello James! It's nice to meet you. I'm Zeta, your friendly AI assistant. How can I help you today? Do you have any questions or topics you'd like to discuss?", additional_kwargs={}, response_metadata={'model': 'llama3.2', 'created_at': '2025-05-14T07:01:05.7858787Z', 'done': True, 'done_reason': 'stop', 'total_duration': 1628066300, 'load_duration': 18528000, 'prompt_eval_count': 40, 'prompt_eval_duration': 195842800, 'eval_count': 41, 'eval_duration': 1412630200, 'model_name': 'llama3.2'}, id='run--df21b23c-263f-4f66-ac22-57a1dfa44f0c-0', usage_metadata={'input_tokens': 40, 'output_tokens': 41, 'total_tokens': 81}),
 HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
 AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
 HumanMessage(conten

In [28]:
pipeline_with_history.invoke(
    {"query": "what is my name again?"},
    config={"session_id": "id_k14", "k": 14}
)

get_chat_history called with session_id=id_k14 and k=14


AIMessage(content='Your name is James.', additional_kwargs={}, response_metadata={'model': 'llama3.2', 'created_at': '2025-05-14T07:01:40.1992502Z', 'done': True, 'done_reason': 'stop', 'total_duration': 453657400, 'load_duration': 21278900, 'prompt_eval_count': 216, 'prompt_eval_duration': 258053900, 'eval_count': 6, 'eval_duration': 170578000, 'model_name': 'llama3.2'}, id='run--4d9186fe-cdb8-4b21-8333-4c1b7f9c94a6-0', usage_metadata={'input_tokens': 216, 'output_tokens': 6, 'total_tokens': 222})

In [29]:
chat_map["id_k14"].messages

[HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}),
 AIMessage(content="Hello James! It's nice to meet you. I'm Zeta, your friendly AI assistant. How can I help you today? Do you have any questions or topics you'd like to discuss?", additional_kwargs={}, response_metadata={'model': 'llama3.2', 'created_at': '2025-05-14T07:01:05.7858787Z', 'done': True, 'done_reason': 'stop', 'total_duration': 1628066300, 'load_duration': 18528000, 'prompt_eval_count': 40, 'prompt_eval_duration': 195842800, 'eval_count': 41, 'eval_duration': 1412630200, 'model_name': 'llama3.2'}, id='run--df21b23c-263f-4f66-ac22-57a1dfa44f0c-0', usage_metadata={'input_tokens': 40, 'output_tokens': 41, 'total_tokens': 81}),
 HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
 AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
 HumanMessage(conten

# 3. ConversationSummaryMemory
Next up we have ConversationSummaryMemory, this memory type keeps track of a summary of the conversation rather than the entire conversation. This is useful for long conversations where we don't need to keep track of the entire conversation, but we do want to keep some thread of the full conversation.

As before, we'll start with the original memory class before reimplementing it with the RunnableWithMessageHistory class.

In [30]:
from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(llm=llm)

  memory = ConversationSummaryMemory(llm=llm)


In [31]:
chain = ConversationChain(
    llm=llm,
    memory = memory,
    verbose=True
)

In [32]:
chain.invoke({"input": "hello there my name is James"})
chain.invoke({"input": "I am researching the different types of conversational memory."})
chain.invoke({"input": "I have been looking at ConversationBufferMemory and ConversationBufferWindowMemory."})
chain.invoke({"input": "Buffer memory just stores the entire conversation"})
chain.invoke({"input": "Buffer window memory stores the last k messages, dropping the rest."})



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: hello there my name is James
AI:[0m

[1m> Finished chain.[0m


[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Current summary:
The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good because it will help humans reach their full potential.

New lines of conversation:
Human: hello there 

{'input': 'Buffer window memory stores the last k messages, dropping the rest.',
 'history': 'This conversation appears to be a simulated discussion between a human and an artificial intelligence (AI) system called Ada. The conversation covers various topics related to conversational AI, including:\n\n1. **Conversational Memory**: Ada explains how she stores and retrieves information from previous conversations using techniques such as contextualized embeddings, attention mechanisms, and memory-augmented neural networks.\n2. **Architecture Differences**: The human asks about the differences between transformer-based models and recurrent neural networks in conversational AI, and Ada discusses their strengths and weaknesses.\n3. **Ambiguity and Uncertainty Handling**: The human asks how Ada handles ambiguity and uncertainty in conversations, and Ada explains that she uses probabilistic modeling, uncertainty estimation, and attention mechanisms to address these challenges.\n4. **Multi-Tur

In [33]:
chain.invoke({"input": "What is my name again?"})



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Current summary:
The human asks about buffer window memory in conversational AI. The AI explains that it uses a buffer window size of 5 to store the last 5 messages, allowing it to maintain context and coherence while preventing memory overload.

New lines of conversation:
Human: How does the buffer window affect the overall performance of the system?
AI: That's a great question! The buffer window can have both positive and negative effects on performance. On the one hand, it allows me to retain enough context to generate more accurate responses. On the other hand, if the buffer window is too large, it can lead to memory overload and decreased perf

{'input': 'What is my name again?',
 'history': "Current summary:\nThe human asks about buffer window memory in conversational AI. The AI explains that it uses a buffer window size of 5 to store the last 5 messages, allowing it to maintain context and coherence while preventing memory overload.\n\nNew lines of conversation:\nHuman: How does the buffer window affect the overall performance of the system?\nAI: That's a great question! The buffer window can have both positive and negative effects on performance. On the one hand, it allows me to retain enough context to generate more accurate responses. On the other hand, if the buffer window is too large, it can lead to memory overload and decreased performance.\n\nNew summary:\nThe human asks about the impact of the buffer window on conversational AI system performance. The AI explains that a buffer window size of 5 allows for a balance between retaining context and avoiding memory overload, but notes that the optimal size will depend on

As this information was stored in the summary the LLM successfully recalled our name. This may not always be the case, by summarizing the conversation we inevitably compress the full amount of information and so we may lose key details occasionally. Nonetheless, this is a great memory type for long conversations while retaining some key information.

# ConversationSummaryMemory with RunnableWithMessageHistory
Let's implement this memory type using the RunnableWithMessageHistory class. As with the window buffer memory, we need to define a custom implementation of the InMemoryChatMessageHistory class. We'll call this one ConversationSummaryMessageHistory.

In [34]:
from langchain_core.messages import SystemMessage

class ConversationSummaryMessageHistory(BaseChatMessageHistory, BaseModel):
    messages: list[BaseMessage] = Field(default_factory=list)
    llm: ChatOllama = Field(default_factory=lambda: ChatOllama(model="llama3.2", temperature=0.0))

    def __init__(self, llm: ChatOllama):
        super().__init__(llm=llm)
        self.llm = llm

    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Add messages to the history, removing any messages beyond
        the last `k` messages.
        """
        self.messages.extend(messages)

        summary_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(
                "Given the existing conversation summary and the new messages, "
                "generate a new summary of the conversation. Ensuring to maintain "
                "as much relevant information as possible."
            ),
            HumanMessagePromptTemplate.from_template(
                "Existing conversation summary:\n{existing_summary}\n\n"
                "New messages:\n{messages}"
            )
        ])

        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                existing_summary="\n".join([msg.content for msg in self.messages if isinstance(msg, SystemMessage)]),
                messages="\n".join([msg.content for msg in messages])
            )
        )

        self.messages = [SystemMessage(content=new_summary.content)]

    def clear(self) -> None:
        self.messages = []


In [35]:
chat_map = {}
def get_chat_history(session_id: str, llm: ChatOllama) -> ConversationSummaryMessageHistory:
    if session_id not in chat_map:
        # if session ID doesn't exist, create a new chat history
        chat_map[session_id] = ConversationSummaryMessageHistory(llm=llm)
    # return the chat history
    return chat_map[session_id]

In [37]:
pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="The session ID to use for the chat history",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="llm",
            annotation=ChatOllama,
            name="LLM",
            description="The LLM to use for the conversation summary",
            default=llm,
        )
    ]
)

In [38]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"session_id": "id_123", "llm": llm}
)

AIMessage(content="Hello James! It's nice to meet you. I'm Zeta, your friendly AI assistant. How can I help you today? Do you have any questions or topics you'd like to discuss?", additional_kwargs={}, response_metadata={'model': 'llama3.2', 'created_at': '2025-05-14T07:17:15.987468Z', 'done': True, 'done_reason': 'stop', 'total_duration': 4741794600, 'load_duration': 3167946600, 'prompt_eval_count': 40, 'prompt_eval_duration': 191801500, 'eval_count': 41, 'eval_duration': 1380681400, 'model_name': 'llama3.2'}, id='run--f648f49a-a17f-46b5-9a3e-0117e3c4a469-0', usage_metadata={'input_tokens': 40, 'output_tokens': 41, 'total_tokens': 81})

In [39]:
chat_map["id_123"].messages

[SystemMessage(content='Here is a new summary of the conversation:\n\nThe conversation started with a greeting from an unknown user who introduced themselves as James. The response was a warm welcome from Zeta, the friendly AI assistant, who offered assistance and asked if James had any questions or topics they would like to discuss.', additional_kwargs={}, response_metadata={})]

In [40]:
pipeline_with_history.invoke(
    {"query": "I'm researching the different types of conversational memory."},
    config={"session_id": "id_123", "llm": llm}
)

chat_map["id_123"].messages

[SystemMessage(content='Here is a new summary of the conversation:\n\nThe conversation started with a greeting from an unknown user who introduced themselves as James. Zeta, the friendly AI assistant, responded with a warm welcome and offered assistance. James then expressed interest in researching different types of conversational memory.\n\nZeta provided an overview of various types of conversational memory, including short-term, long-term, contextual, and stateful memory. The conversation also touched on techniques being explored to improve these types of memory in conversational AI, such as memory-augmented neural networks, reinforcement learning, and graph-based memory architectures.\n\nThe user asked which specific type of conversational memory they would like to explore further, but no particular type was specified at the time of the inquiry.', additional_kwargs={}, response_metadata={})]

In [42]:
for msg in [
    "I have been looking at ConversationBufferMemory and ConversationBufferWindowMemory.",
    "Buffer memory just stores the entire conversation",
    "Buffer window memory stores the last k messages, dropping the rest."
]:
    pipeline_with_history.invoke(
        {"query": msg},
        config={"session_id": "id_123", "llm": llm}
    )

In [43]:
chat_map["id_123"].messages

[SystemMessage(content='Here is a new summary of the conversation:\n\nThe conversation began with an introduction from James, who expressed interest in researching different types of conversational memory. Zeta provided an overview of various types of conversational memory, including short-term, long-term, contextual, and stateful memory.\n\nJames then mentioned their interest in exploring ConversationBufferMemory (CBM) and ConversationBufferWindowMemory (CBWM), two types of buffer-based memory architectures designed for conversational AI systems. Zeta provided a brief overview of both architectures, highlighting their strengths and weaknesses.\n\nHowever, it was clarified that buffer-based memory systems do not store the entire conversation at once, but rather use techniques such as windowing, buffering, and contextualization to manage and retrieve relevant information from the conversation history.\n\nKey aspects of how these systems work include:\n\n* Windowing: using a sliding wind

In [44]:
pipeline_with_history.invoke(
    {"query": "What is my name again?"},
    config={"session_id": "id_123", "llm": llm}
)

AIMessage(content="I don't have any information about your name from our previous conversation. You initially introduced yourself as James, but I don't recall any other details about you. If you'd like to share more about yourself, I'm here to help!", additional_kwargs={}, response_metadata={'model': 'llama3.2', 'created_at': '2025-05-14T07:21:48.5357751Z', 'done': True, 'done_reason': 'stop', 'total_duration': 2723158300, 'load_duration': 23707300, 'prompt_eval_count': 1050, 'prompt_eval_duration': 924535200, 'eval_count': 49, 'eval_duration': 1771396100, 'model_name': 'llama3.2'}, id='run--a508eab8-f4c7-4992-9336-edc4ecc0ac2e-0', usage_metadata={'input_tokens': 1050, 'output_tokens': 49, 'total_tokens': 1099})

# 4. ConversationSummaryBufferMemory
Our final memory type acts as a combination of ConversationSummaryMemory and ConversationBufferMemory. It keeps the buffer for the conversation up until the previous n tokens, anything beyond that limit is summarized then dropped from the buffer. Producing something like:

# ~~ a summary of previous interactions
The user named James introduced himself and the AI responded, introducing itself as an AI model called Zeta.
James then said he was researching the different types of conversational memory and Zeta asked for some
examples.
# ~~ the most recent messages
Human: I have been looking at ConversationBufferMemory and ConversationBufferWindowMemory.
AI: That's interesting, what's the difference?
Human: Buffer memory just stores the entire conversation
AI: That makes sense, what about ConversationBufferWindowMemory?
Human: Buffer window memory stores the last k messages, dropping the rest.
AI: Very cool!

In [45]:
from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=300,
    return_messages=True
)

  memory = ConversationSummaryBufferMemory(


In [46]:
chain = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

In [47]:
chain.invoke({"input": "Hi, my name is James"})



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
[]
Human: Hi, my name is James
AI:[0m


tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]


[1m> Finished chain.[0m


{'input': 'Hi, my name is James',
 'history': [HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}),
  AIMessage(content="Nice to meet you, James! I'm an artificial intelligence designed to assist and communicate with humans in a helpful and informative way. My name is Ada, by the way - named after Ada Lovelace, who's considered one of the pioneers in computer science. I've been trained on a vast amount of text data from various sources, including books, articles, and websites, which allows me to provide answers to a wide range of questions on topics like history, science, technology, literature, and more. What brings you here today, James?", additional_kwargs={}, response_metadata={})],
 'response': "Nice to meet you, James! I'm an artificial intelligence designed to assist and communicate with humans in a helpful and informative way. My name is Ada, by the way - named after Ada Lovelace, who's considered one of the pioneers in computer science. I'v

In [48]:
for msg in [
    "I'm researching the different types of conversational memory.",
    "I have been looking at ConversationBufferMemory and ConversationBufferWindowMemory.",
    "Buffer memory just stores the entire conversation",
    "Buffer window memory stores the last k messages, dropping the rest."
]:
    chain.invoke({"input": msg})



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
[HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}), AIMessage(content="Nice to meet you, James! I'm an artificial intelligence designed to assist and communicate with humans in a helpful and informative way. My name is Ada, by the way - named after Ada Lovelace, who's considered one of the pioneers in computer science. I've been trained on a vast amount of text data from various sources, including books, articles, and websites, which allows me to provide answers to a wide range of questions on topics like history, science, technology, literature, and more. What brings you here today, James?", additional_kwargs

# ConversationSummaryBufferMemory with RunnableWithMessageHistory
As with the previous memory types, we will implement this memory type again using the RunnableWithMessageHistory class. In our implementation we will modify the buffer window to be based on the number of messages rather than number of tokens. This tweak will make our implementation more closely aligned with original buffer window.

We will implement all of this via a new ConversationSummaryBufferMessageHistory class.

In [50]:
class ConversationSummaryBufferMessageHistory(BaseChatMessageHistory, BaseModel):
    messages: list[BaseMessage] = Field(default_factory=list)
    llm: ChatOllama = Field(default_factory=ChatOllama)
    k: int = Field(default_factory=int)

    def __init__(self, llm: ChatOllama, k: int):
        super().__init__(llm=llm, k=k)

    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Add messages to the history, removing any messages beyond
        the last `k` messages and summarizing the messages that we
        drop.
        """
        existing_summary: SystemMessage | None = None
        old_messages: list[BaseMessage] | None = None
        # see if we already have a summary message
        if len(self.messages) > 0 and isinstance(self.messages[0], SystemMessage):
            print(">> Found existing summary")
            existing_summary = self.messages.pop(0)
        # add the new messages to the history
        self.messages.extend(messages)
        # check if we have too many messages
        if len(self.messages) > self.k:
            print(
                f">> Found {len(self.messages)} messages, dropping "
                f"oldest {len(self.messages) - self.k} messages.")
            # pull out the oldest messages...
            old_messages = self.messages[:self.k]
            # ...and keep only the most recent messages
            self.messages = self.messages[-self.k:]
        if old_messages is None:
            print(">> No old messages to update summary with")
            # if we have no old_messages, we have nothing to update in summary
            return
        # construct the summary chat messages
        summary_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(
                "Given the existing conversation summary and the new messages, "
                "generate a new summary of the conversation. Ensuring to maintain "
                "as much relevant information as possible."
            ),
            HumanMessagePromptTemplate.from_template(
                "Existing conversation summary:\n{existing_summary}\n\n"
                "New messages:\n{old_messages}"
            )
        ])
        # format the messages and invoke the LLM
        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                existing_summary=existing_summary,
                old_messages=old_messages
            )
        )
        print(f">> New summary: {new_summary.content}")
        # prepend the new summary to the history
        self.messages = [SystemMessage(content=new_summary.content)] + self.messages

    def clear(self) -> None:
        """Clear the history."""
        self.messages = []

In [52]:
chat_map = {}
def get_chat_history(session_id: str, llm: ChatOllama, k: int) -> ConversationSummaryBufferMessageHistory:
    if session_id not in chat_map:
        # if session ID doesn't exist, create a new chat history
        chat_map[session_id] = ConversationSummaryBufferMessageHistory(llm=llm, k=k)
    # return the chat history
    return chat_map[session_id]

In [53]:
pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="The session ID to use for the chat history",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="llm",
            annotation=ChatOllama,
            name="LLM",
            description="The LLM to use for the conversation summary",
            default=llm,
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="k",
            description="The number of messages to keep in the history",
            default=4,
        )
    ]
)

In [54]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"session_id": "id_123", "llm": llm, "k": 4}
)
chat_map["id_123"].messages

>> No old messages to update summary with


[HumanMessage(content='Hi, my name is James', additional_kwargs={}, response_metadata={}),
 AIMessage(content="Hello James! It's nice to meet you. I'm Zeta, your friendly AI assistant. How can I help you today? Do you have any questions or topics you'd like to discuss?", additional_kwargs={}, response_metadata={'model': 'llama3.2', 'created_at': '2025-05-14T07:27:23.4661799Z', 'done': True, 'done_reason': 'stop', 'total_duration': 1767583100, 'load_duration': 27600300, 'prompt_eval_count': 40, 'prompt_eval_duration': 338259800, 'eval_count': 41, 'eval_duration': 1399897800, 'model_name': 'llama3.2'}, id='run--c0079ae5-319a-4fbe-9475-fb2610423623-0', usage_metadata={'input_tokens': 40, 'output_tokens': 41, 'total_tokens': 81})]

In [56]:
for i, msg in enumerate([
    "I'm researching the different types of conversational memory.",
    "I have been looking at ConversationBufferMemory and ConversationBufferWindowMemory.",
    "Buffer memory just stores the entire conversation",
    "Buffer window memory stores the last k messages, dropping the rest."
]):
    print(f"---\nMessage {i+1}\n---\n")
    pipeline_with_history.invoke(
        {"query": msg},
        config={"session_id": "id_123", "llm": llm, "k": 4}
    )

---
Message 1
---

>> No old messages to update summary with
---
Message 2
---

>> Found 6 messages, dropping oldest 2 messages.
>> New summary: Here is a new summary of the conversation:

The conversation started with a human message from James, introducing himself. Zeta, the AI assistant, responded with a friendly greeting and asked how she could help James.

James then expressed interest in researching conversational memory, which led to a detailed explanation by Zeta about different types of conversational memory. The main types mentioned were:

1. Short-term conversational memory (up to 10-15 seconds)
2. Working memory
3. Long-term conversational memory
4. Episodic memory

Zeta also discussed specialized types of conversational memory, including contextual memory and pragmatic memory.

The conversation is ongoing, with Zeta asking James which aspect of conversational memory he would like to explore further.
---
Message 3
---

>> Found existing summary
>> Found 6 messages, dropping