# Conversational Memory

Conversational memory allows our chatbots and agents to remember previous interactions within a conversation. Without conversational memory, our chatbots would only ever be able to respond to the last message they received, essentially forgetting all previous messages with each new message.

Naturally, conversations require our chatbots to be able to respond over multiple interactions and refer to previous messages to understand the context of the conversation.

In [2]:
import os
from getpass import getpass
from dotenv import load_dotenv

# Load variables from .env into environment
load_dotenv()

os.environ['LANGSMITH_TRACING'] = 'true'
os.environ['LANGSMITH_ENDPOINT'] = "https://eu.api.smith.langchain.com"
os.environ['LANGSMITH_API_KEY'] = os.getenv('LANGSMITH_API_KEY') or getpass('Enter your LangSmith API Key: ')
os.environ['LANGSMITH_PROJECT'] = 'LangChain-LangSmith-Demo'

In [4]:
os.environ["GOOGLE_API_KEY"] = os.getenv("GOOGLE_API_KEY") or getpass(
    "Enter GOOGLE API Key: "
)

In [5]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",      
    temperature=0.0)

## LangChain's Memory Types

LangChain versions `0.0.x` shipped with several conversational memory types. Most of these are due for deprecation, but they are still valuable for understanding the different approaches we can take to building conversational memory.

Throughout the notebook we will be referring to these _older_ memory types and then rewriting them using the recommended `RunnableWithMessageHistory` class. We will learn about:

* `ConversationBufferMemory`: the simplest and most intuitive form of conversational memory, keeping track of a conversation without any additional bells and whistles.
* `ConversationBufferWindowMemory`: similar to `ConversationBufferMemory`, but only keeps track of the last `k` messages.
* `ConversationSummaryMemory`: rather than keeping track of the entire conversation, this memory type keeps track of a summary of the conversation.
* `ConversationSummaryBufferMemory`: merges the `ConversationSummaryMemory` and `ConversationTokenBufferMemory` types.

We'll work through each of these memory types in turn, and rewrite each one using the `RunnableWithMessageHistory` class.

## 1. `ConversationBufferMemory`

`ConversationBufferMemory` is the simplest form of conversational memory. It is _literally_ just a place where we store messages, which we then feed back into our LLM.
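
To make the idea concrete, here is a minimal plain-Python sketch of a buffer memory. The `SimpleBufferMemory` class and its `add` method are hypothetical names used only for illustration; they are not part of LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleBufferMemory:
    """Illustrative stand-in: stores every (role, content) pair in arrival order."""

    messages: list = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        # Nothing is trimmed or summarized; the buffer only ever grows.
        self.messages.append((role, content))


memory = SimpleBufferMemory()
memory.add("human", "Hi, my name is Sujal.")
memory.add("ai", "Hi Sujal! How can I help?")
memory.add("human", "What is my name again?")

print(len(memory.messages))  # 3
print(memory.messages[0])    # ('human', 'Hi, my name is Sujal.')
```

Because nothing is ever dropped, the first message is still available when the third one arrives, which is exactly what lets the model answer a question about our name.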

### `ConversationBufferMemory` with `RunnableWithMessageHistory`

The `ConversationBufferMemory` type is deprecated. Instead, we can use the `RunnableWithMessageHistory` class to implement the same functionality.

When implementing `RunnableWithMessageHistory` we will use **L**ang**C**hain **E**xpression **L**anguage (LCEL) and for this we need to define our prompt template and LLM components. Our `llm` has already been defined, so now we just define a `ChatPromptTemplate` object.

In [6]:
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
    SystemMessagePromptTemplate,
)

system_prompt = " You are a helpful assistant called Jarvis."

prompt_template = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(system_prompt),
    MessagesPlaceholder(variable_name="history"),
    HumanMessagePromptTemplate.from_template("{query}")

])

In [7]:
prompt_template

ChatPromptTemplate(input_variables=['history', 'query'], input_types={'history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChunk')], typing.Annotated[langchain_core.messages.human.HumanMessageChunk, Tag(tag='HumanMessageChunk')], typing.Annotated[langchain_core.messages.chat.ChatMessageChunk, Tag(tag='ChatMessageChunk')], typing.Annotated[langchain_core.messages.system.SystemMessageChunk, Tag(tag='SystemMessageChunk')], typing.Annotated[langchain_c

In [8]:
prompt_template.input_variables

['history', 'query']

In [10]:
prompt_template.messages

[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={}, template=' You are a helpful assistant called Jarvis.'), additional_kwargs={}),
 MessagesPlaceholder(variable_name='history'),
 HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['query'], input_types={}, partial_variables={}, template='{query}'), additional_kwargs={})]
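
As a rough mental model, the template expands into a system message, then whatever is stored in `history`, then the new human message. The sketch below mimics that expansion with plain tuples; `build_messages` is a hypothetical helper written for this illustration, not a LangChain function.

```python
def build_messages(system_prompt: str, history: list, query: str) -> list:
    """Expand the three template parts into one flat list of (role, content) pairs."""
    return [("system", system_prompt), *history, ("human", query)]


system = "You are a helpful assistant called Jarvis."

# With an empty history the model only sees the system prompt and the query.
empty = build_messages(system, [], "Hi!")
print(len(empty))  # 2

# With history, the earlier turns are spliced in between the two.
history = [("human", "Hi, my name is Sujal."), ("ai", "Hi Sujal!")]
full = build_messages(system, history, "What is my name again?")
print([role for role, _ in full])  # ['system', 'human', 'ai', 'human']
```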

We can link our `prompt_template` and our `llm` together to create a pipeline via LCEL.

In [11]:
pipeline = prompt_template | llm
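
If the `|` syntax looks unfamiliar, the toy sketch below shows one way such chaining can be built in plain Python by overloading `__or__`. The `Step` class is hypothetical and much simpler than LangChain's runnables; it only illustrates the idea that the output of the left-hand step becomes the input of the right-hand step.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function so steps chain with `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # Run this step first, then feed its result into the next step.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


format_prompt = Step(lambda inputs: f"User asked: {inputs['query']}")
fake_llm = Step(lambda prompt: prompt.upper())  # stub, not a real model

toy_pipeline = format_prompt | fake_llm
print(toy_pipeline.invoke({"query": "hello"}))  # USER ASKED: HELLO
```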

To add memory, we wrap our `pipeline` in a `RunnableWithMessageHistory` object. This object takes a few input parameters. One of those is `get_session_history`, which expects a function that returns a chat message history object (here an `InMemoryChatMessageHistory`) for a given session ID. We define this function ourselves:

In [12]:
from langchain_core.chat_history import InMemoryChatMessageHistory

chat_map = {}
def get_chat_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = InMemoryChatMessageHistory()
    return chat_map[session_id]
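
The pattern here is a lazily populated dictionary keyed by session ID. The plain-Python sketch below uses lists instead of history objects to show the two properties we rely on: the same ID always returns the same object, and different IDs never share messages. The names `store`, `get_history`, `alice` and `bob` are made up for this example.

```python
store = {}


def get_history(session_id: str) -> list:
    # setdefault creates the list only on first access for this session ID.
    return store.setdefault(session_id, [])


get_history("alice").append("Hi, I'm Alice.")
get_history("bob").append("Hi, I'm Bob.")
get_history("alice").append("What's my name?")

print(get_history("alice"))  # ["Hi, I'm Alice.", "What's my name?"]
print(get_history("bob"))    # ["Hi, I'm Bob."]
print(get_history("alice") is store["alice"])  # True
```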

We also need to tell our runnable which variable name to use for the chat history (i.e. `history`) and which to use for the user's query (i.e. `query`).

In [13]:
from langchain_core.runnables import RunnableWithMessageHistory

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
)

Now we invoke our runnable:

In [14]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is Sujal."},
    config={"configurable": {"session_id": "id_123"}}
)

AIMessage(content="Hi Sujal! It's nice to meet you. How can I help you today?", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'grounding_metadata': {}, 'model_provider': 'google_genai'}, id='lc_run--a642589d-c04d-4774-982f-c038a0afbf36-0', usage_metadata={'input_tokens': 17, 'output_tokens': 44, 'total_tokens': 61, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 25}})

Our chat history will now be memorized and retrieved whenever we invoke our runnable with the same session ID.

In [15]:
pipeline_with_history.invoke(
    {"query": "What is my name again?"},
    config={"configurable": {"session_id": "id_123"}}
)

AIMessage(content='Your name is Sujal.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'grounding_metadata': {}, 'model_provider': 'google_genai'}, id='lc_run--d8df0388-e86f-429b-8d6c-00ebff727c2f-0', usage_metadata={'input_tokens': 44, 'output_tokens': 33, 'total_tokens': 77, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 27}})

In [16]:
chat_map

{'id_123': InMemoryChatMessageHistory(messages=[HumanMessage(content='Hi, my name is Sujal.', additional_kwargs={}, response_metadata={}), AIMessage(content="Hi Sujal! It's nice to meet you. How can I help you today?", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'grounding_metadata': {}, 'model_provider': 'google_genai'}, id='lc_run--a642589d-c04d-4774-982f-c038a0afbf36-0', usage_metadata={'input_tokens': 17, 'output_tokens': 44, 'total_tokens': 61, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 25}}), HumanMessage(content='What is my name again?', additional_kwargs={}, response_metadata={}), AIMessage(content='Your name is Sujal.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'g

We have now recreated the `ConversationBufferMemory` type using the `RunnableWithMessageHistory` class. The two components play different roles:

* `RunnableWithMessageHistory`: manages the workflow, loading the stored history before each call and appending the new messages afterwards.
* `InMemoryChatMessageHistory`: stores the messages it is given, but does not manage the workflow itself.
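
This division of labour can be sketched in a few lines of plain Python. The `with_history` wrapper below is a simplified, hypothetical stand-in for what `RunnableWithMessageHistory` does around each call; the real class handles much more (config, message types, async), and `echo_chain` is a stub rather than a model.

```python
def with_history(chain, get_history):
    """Wrap `chain` so that each call loads and then updates the session history."""

    def invoke(query: str, session_id: str) -> str:
        history = get_history(session_id)            # 1. load stored messages
        reply = chain([*history, ("human", query)])  # 2. run with history + new query
        history.append(("human", query))             # 3. persist the new turn
        history.append(("ai", reply))
        return reply

    return invoke


sessions = {}


def echo_chain(messages):
    # Stub model: reports how many messages it was shown.
    return f"I can see {len(messages)} message(s)."


chat = with_history(echo_chain, lambda sid: sessions.setdefault(sid, []))

first = chat("Hi", "s1")
second = chat("Hi again", "s1")
print(first)   # I can see 1 message(s).
print(second)  # I can see 3 message(s).
```

The second call sees three messages: the two stored from the first turn plus the new query.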

For comparison, here is a plain call to `llm` without any history, to confirm that the model does not know our name by default:

In [17]:
llm.invoke("What is my name?")

AIMessage(content="I don't know your name. As an AI, I don't have access to personal information about you.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'grounding_metadata': {}, 'model_provider': 'google_genai'}, id='lc_run--6bc3025e-06f8-4f60-86a7-772d427167f6-0', usage_metadata={'input_tokens': 6, 'output_tokens': 503, 'total_tokens': 509, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 479}})

## 2. `ConversationBufferWindowMemory`

The `ConversationBufferWindowMemory` type is similar to `ConversationBufferMemory`, but only keeps track of the last `k` messages. There are a few reasons why we would want to keep only the last `k` messages:

* More messages mean more tokens are sent with each request, and more tokens increase latency _and_ cost.

* LLMs tend to perform worse when given more tokens, making them more likely to deviate from instructions, hallucinate, or _"forget"_ information provided to them. Conciseness is key to high performing LLMs.

* If we keep _all_ messages we will eventually hit the LLM's context window limit. Capping the history at a window size `k` bounds the prompt to at most `k` stored messages, which makes hitting this limit far less likely.

The buffer window solves many problems that we encounter with the standard buffer memory, while still being a very simple and intuitive form of conversational memory.
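
A quick back-of-the-envelope calculation shows how large the difference can be. The sketch below assumes, purely for illustration, that every message costs 20 tokens and that each turn adds one human and one AI message to the history; real token counts vary by message and tokenizer.

```python
from typing import Optional


def total_history_tokens(num_requests: int, tokens_per_message: int,
                         k: Optional[int] = None) -> int:
    """Total history tokens sent across all requests.

    Before request i (0-based) the history holds 2 * i messages.
    With a window, only the last k of those are sent.
    """
    total = 0
    for i in range(num_requests):
        history_len = 2 * i
        if k is not None:
            history_len = min(history_len, k)
        total += history_len * tokens_per_message
    return total


full = total_history_tokens(50, 20)
windowed = total_history_tokens(50, 20, k=4)
print(full)      # 49000
print(windowed)  # 3880
```

With the full buffer the total grows quadratically with the number of turns, while with a window it grows linearly once the window is full.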

### `ConversationBufferWindowMemory` with `RunnableWithMessageHistory`

To implement this memory type using the `RunnableWithMessageHistory` class, we can use the same approach as before. We define our `prompt_template` and `llm` as before, and then wrap our pipeline in a `RunnableWithMessageHistory` object.

For the window feature, we need to define a custom version of the `InMemoryChatMessageHistory` class that removes any messages beyond the last `k` messages.

In [42]:
from pydantic import BaseModel, Field
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage

class BufferWindowMessageHistory(BaseChatMessageHistory, BaseModel):
    """Chat message history that stores a window of messages."""

    messages: list[BaseMessage] = Field(default_factory=list)
    k: int = Field(default_factory=int)

    def __init__(self, k: int):
        super().__init__(k=k)
        print(f"Initializing BufferWindowMessageHistory with k={k}")

    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Add messages to chat history."""
        self.messages.extend(messages)
        # Keep only the last k messages
        self.messages = self.messages[-self.k :]

    def clear(self) -> None:
        """Clear the chat history."""
        self.messages = []
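
The trimming step is just a "keep the last `k` items" operation. As an aside, the standard library's `collections.deque` offers the same behaviour via `maxlen`, and comparing the two highlights an edge case of the slicing approach. This is a standalone sketch, not a replacement for the class above.

```python
from collections import deque

# A deque with maxlen=k silently discards the oldest item once it is full.
window = deque(maxlen=4)
for i in range(1, 11):
    window.append(f"message {i}")
print(list(window))  # ['message 7', 'message 8', 'message 9', 'message 10']

# Edge case: slicing with k=0 keeps everything, because -0 == 0.
messages = ["a", "b", "c"]
print(messages[-0:])  # ['a', 'b', 'c']

# A deque with maxlen=0 keeps nothing, which matches the intent of "k=0".
empty_window = deque(maxlen=0)
empty_window.append("a")
print(len(empty_window))  # 0
```

In practice `k=0` is unlikely to be useful, but it is worth knowing that `self.messages[-self.k :]` would not trim anything in that case.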

In [52]:
chat_map = {}

def get_chat_history(session_id: str, k: int = 4) -> BufferWindowMessageHistory:
    print(f"get_chat_history called with session_id={session_id} and k={k}")
    if session_id not in chat_map:
        chat_map[session_id] = BufferWindowMessageHistory(k=k)
    return chat_map[session_id]
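
One consequence of this factory is worth spelling out: `k` is only used when a session is first created. Passing a different `k` for an existing session ID has no effect, because the stored object is returned as-is. The sketch below reproduces the same logic with a `collections.deque` standing in for the history class; `windows` and `get_window` are hypothetical names.

```python
from collections import deque

windows = {}


def get_window(session_id: str, k: int = 4) -> deque:
    if session_id not in windows:
        # k is applied here, and only here.
        windows[session_id] = deque(maxlen=k)
    return windows[session_id]


first = get_window("user_1", k=4)
second = get_window("user_1", k=14)  # k is ignored: the session already exists

print(second is first)  # True
print(second.maxlen)    # 4
```

To change the window size for an existing session we would need to create a new session ID or replace the stored object ourselves.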

In [53]:
from langchain_core.runnables import ConfigurableFieldSpec

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[  # declare the extra config parameter k (session_id must now be declared explicitly too)
        ConfigurableFieldSpec(
            id = "session_id",
            annotation = str,
            name = "Session ID",
            description = "The session ID to use for retrieving chat history.",
            default = "default_id"
        ),
        ConfigurableFieldSpec(
            id = "k",
            annotation = int,
            name = "Window Size",
            description = "The number of messages to retain in the chat history window.",
            default = 4
        )
    ]
)

In [54]:
pipeline_with_history.invoke(
    {"query": "Hello my name is Sujal."},
    config={"configurable": {"session_id": "user_1", "k": 4}}
)

get_chat_history called with session_id=user_1 and k=4
Initializing BufferWindowMessageHistory with k=4


AIMessage(content="Hello Sujal! It's nice to meet you.\n\nHow can I help you today?", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'grounding_metadata': {}, 'model_provider': 'google_genai'}, id='lc_run--a2628e98-e51c-4674-8668-dc8bc19c735c-0', usage_metadata={'input_tokens': 16, 'output_tokens': 37, 'total_tokens': 53, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 17}})

We can also modify the messages that are stored in memory by modifying the records inside the `chat_map` dictionary directly.

In [55]:
chat_map['user_1'].clear()

# manually insert history
chat_map['user_1'].add_user_message("Hello, my name is Sujal.")
chat_map['user_1'].add_ai_message("I'm an AI model called Zeta.")
chat_map["user_1"].add_user_message("I'm researching the different types of conversational memory.")
chat_map["user_1"].add_ai_message("That's interesting, what are some examples?")
chat_map["user_1"].add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
chat_map["user_1"].add_ai_message("That's interesting, what's the difference?")
chat_map["user_1"].add_user_message("Buffer memory just stores the entire conversation, right?")
chat_map["user_1"].add_ai_message("That makes sense, what about ConversationBufferWindowMemory?")
chat_map["user_1"].add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
chat_map["user_1"].add_ai_message("Very cool!")

chat_map["user_1"].messages

[HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='Buffer window memory stores the last k messages, dropping the rest.', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Very cool!', additional_kwargs={}, response_metadata={})]

Now let's see whether our LLM remembers our name. From the above we can already see that with `k=4` the message containing our name has been dropped, so when running with `k=4` we should expect the LLM to forget our name:

In [56]:
pipeline_with_history.invoke(
    {"query": "What is my name again?"},
    config={"configurable": {"session_id": "user_1", "k": 4}}
)

get_chat_history called with session_id=user_1 and k=4


AIMessage(content="I don't know your name. As an AI, I don't have memory of past conversations or personal information about you.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'grounding_metadata': {}, 'model_provider': 'google_genai'}, id='lc_run--ab8cb8e3-05b8-4f69-bd25-dd4cbf96e8c2-0', usage_metadata={'input_tokens': 57, 'output_tokens': 60, 'total_tokens': 117, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 33}})

Now let's initialize a new session with `k=14`.

In [57]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is Sujal."},
    config={"configurable": {"session_id": "user_2", "k": 14}}
    )

get_chat_history called with session_id=user_2 and k=14
Initializing BufferWindowMessageHistory with k=14


AIMessage(content="Hi Sujal! It's nice to meet you. How can I help you today?", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'grounding_metadata': {}, 'model_provider': 'google_genai'}, id='lc_run--205b94ab-fbf6-4fd7-8ff9-26d67aaca741-0', usage_metadata={'input_tokens': 17, 'output_tokens': 44, 'total_tokens': 61, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 25}})

We'll manually insert the remaining messages as before:

In [58]:
chat_map["user_2"].add_user_message("I'm researching the different types of conversational memory.")
chat_map["user_2"].add_ai_message("That's interesting, what are some examples?")
chat_map["user_2"].add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
chat_map["user_2"].add_ai_message("That's interesting, what's the difference?")
chat_map["user_2"].add_user_message("Buffer memory just stores the entire conversation, right?")
chat_map["user_2"].add_ai_message("That makes sense, what about ConversationBufferWindowMemory?")
chat_map["user_2"].add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
chat_map["user_2"].add_ai_message("Very cool!")

chat_map["user_2"].messages

[HumanMessage(content='Hi, my name is Sujal.', additional_kwargs={}, response_metadata={}),
 AIMessage(content="Hi Sujal! It's nice to meet you. How can I help you today?", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'grounding_metadata': {}, 'model_provider': 'google_genai'}, id='lc_run--205b94ab-fbf6-4fd7-8ff9-26d67aaca741-0', usage_metadata={'input_tokens': 17, 'output_tokens': 44, 'total_tokens': 61, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 25}}),
 HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
 AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
 HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwar

Now let's see if the LLM remembers our name:

In [59]:
pipeline_with_history.invoke(
    {"query": "What is my name again?"},
    config={"configurable": {"session_id": "user_2", "k": 14}}
    )

get_chat_history called with session_id=user_2 and k=14


AIMessage(content='Your name is Sujal.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'grounding_metadata': {}, 'model_provider': 'google_genai'}, id='lc_run--fe612b62-91af-4553-a25b-948504c02321-0', usage_metadata={'input_tokens': 136, 'output_tokens': 42, 'total_tokens': 178, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 36}})

In [60]:
chat_map['user_2'].messages

[HumanMessage(content='Hi, my name is Sujal.', additional_kwargs={}, response_metadata={}),
 AIMessage(content="Hi Sujal! It's nice to meet you. How can I help you today?", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'grounding_metadata': {}, 'model_provider': 'google_genai'}, id='lc_run--205b94ab-fbf6-4fd7-8ff9-26d67aaca741-0', usage_metadata={'input_tokens': 17, 'output_tokens': 44, 'total_tokens': 61, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 25}}),
 HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
 AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
 HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwar

That's it! We've rewritten our buffer window memory using the recommended `RunnableWithMessageHistory` class.

## 3. `ConversationSummaryMemory`

This memory type keeps a running summary of the conversation rather than the full message history. This is useful for long conversations where we don't need every message, but do want to keep the thread of what has been discussed.

### `ConversationSummaryMemory` with `RunnableWithMessageHistory`

Let's implement this memory type using the `RunnableWithMessageHistory` class. As with the window buffer memory, we need to define a custom implementation of the `InMemoryChatMessageHistory` class. We'll call this one `ConversationSummaryMessageHistory`.

In [118]:
from langchain_core.messages import SystemMessage

class ConversationSummaryMessageHistory(BaseChatMessageHistory, BaseModel):
    """Chat message history that keeps a summary of the conversation."""

    messages: list[BaseMessage] = Field(default_factory=list)
    llm: ChatGoogleGenerativeAI = Field(default_factory=ChatGoogleGenerativeAI)

    def __init__(self, llm: ChatGoogleGenerativeAI):
        super().__init__(llm=llm)

    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Fold the new messages into the running conversation summary."""
        # the history holds at most one message: the current summary
        existing_summary = self.messages[0].content if self.messages else ""

        summary_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(
                "Given the existing conversation summary and the new messages, "
                "generate a new summary of the conversation, ensuring that "
                "as much relevant information as possible is maintained."
            ),
            HumanMessagePromptTemplate.from_template(
                "Existing Conversation Summary: {existing_summary}\n"
                "New Messages: {messages}"
            )
        ])
        # format the messages and invoke this history's own LLM
        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                existing_summary=existing_summary,
                messages=[msg.content for msg in messages]
            )
        )
        # replace the existing history with a single system summary message
        self.messages = [SystemMessage(content=new_summary.content)]

    def clear(self) -> None:
        """Clear the chat history."""
        self.messages = []
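
The control flow of this class is easier to see with the LLM swapped out for a stub. In the sketch below, `RollingSummary` and `stub_summarize` are hypothetical names; the stub simply clips each message to five words so the example runs without a model. Note that every call to `add_messages` triggers one summarizer call, which in the real class means one extra LLM request per turn.

```python
class RollingSummary:
    """Keeps a single summary string that is rewritten on every update."""

    def __init__(self, summarize):
        self.summarize = summarize
        self.summary = ""
        self.calls = 0

    def add_messages(self, new_messages: list) -> None:
        # Old summary + new messages in, new summary out.
        self.summary = self.summarize(self.summary, new_messages)
        self.calls += 1


def stub_summarize(existing: str, new_messages: list) -> str:
    # Stand-in for an LLM call: keep only the first five words of each message.
    clipped = [" ".join(m.split()[:5]) for m in new_messages]
    return " | ".join(filter(None, [existing, *clipped]))


memory = RollingSummary(stub_summarize)
memory.add_messages(["Hi, my name is Sujal.", "Hi Sujal! How can I help you today?"])
memory.add_messages(["What is my name again?", "Your name is Sujal."])

print(memory.calls)    # 2
print(memory.summary)
```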



In [119]:
chat_map = {}

def get_chat_history(session_id: str, llm: ChatGoogleGenerativeAI) -> ConversationSummaryMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = ConversationSummaryMessageHistory(llm=llm)
    return chat_map[session_id]

In [120]:
pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key = "query",
    history_messages_key="history",
    history_factory_config=[              
        ConfigurableFieldSpec(
            id = "session_id",
            annotation = str,
            name = "Session ID",
            description = "The session ID to use for retrieving chat history.",
            default = "default_id"
        ),

        ConfigurableFieldSpec(
            id = "llm",
            annotation = ChatGoogleGenerativeAI,
            name = "LLM",
            description = "The LLM to use for generating conversation summaries.",
            default = llm
        )
    ]
)

Now we invoke our runnable, this time also passing an `llm` parameter via the `config` parameter.

In [121]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is Sujal"},
    config={"configurable": {"session_id": "id_123", "llm": llm}}
)

AIMessage(content="Hi Sujal, it's nice to meet you! How can I help you today?", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'grounding_metadata': {}, 'model_provider': 'google_genai'}, id='lc_run--38a7e6f8-6b43-496f-88ed-fa7274af407c-0', usage_metadata={'input_tokens': 16, 'output_tokens': 36, 'total_tokens': 52, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 17}})

Let's see what summary was generated:

In [122]:
chat_map['id_123'].messages

[SystemMessage(content='The conversation begins with Sujal introducing themselves. The AI responds by greeting Sujal and asking how it can help.', additional_kwargs={}, response_metadata={})]

Let's continue the conversation and see if the summary is updated:

In [123]:
pipeline_with_history.invoke(
    {"query": "I'm researching the different types of conversational memory."},
    config={"configurable": {"session_id": "id_123", "llm": llm}}
)

AIMessage(content='That\'s a fascinating area of study! Conversational memory is crucial for effective communication and understanding.\n\nWhen we talk about "conversational memory," we can approach it from a few different angles, often overlapping. Are you interested primarily in:\n\n1.  **Human cognitive memory during conversations?** (e.g., how people remember what was said, who said it, the context, etc.)\n2.  **Memory in AI/chatbot systems?** (e.g., how AI maintains context and recalls past turns)\n3.  **Or perhaps something else?**\n\nAssuming you\'re looking at the **human cognitive aspect**, here are some common ways researchers categorize or discuss types of conversational memory:\n\n*   **Content Memory:** Remembering *what* was said – the specific words, facts, or propositions exchanged.\n*   **Source Memory:** Remembering *who* said what – attributing specific statements or ideas to the correct speaker. This is vital for understanding perspectives and credibility.\n*   **Co

In [124]:
chat_map['id_123'].messages

[SystemMessage(content='The conversation begins with Sujal introducing themselves. The AI responds by greeting Sujal and asking how it can help. Sujal states they are researching different types of conversational memory. The AI acknowledges the topic as fascinating and crucial, then asks if Sujal is interested primarily in human cognitive memory during conversations, memory in AI/chatbot systems, or something else.\n\nAssuming the human cognitive aspect, the AI provides a detailed list of common types of conversational memory:\n*   **Content Memory:** Remembering what was said.\n*   **Source Memory:** Remembering who said what.\n*   **Contextual Memory:** Remembering the circumstances of the conversation.\n*   **Episodic Memory:** Remembering the conversation as a specific event.\n*   **Semantic Memory:** Integrating the general gist or new knowledge gained.\n*   **Working Memory:** Holding recent utterances and context during the conversation.\n*   **Relational Memory:** Remembering r

So far so good! Let's continue with a few more messages before returning to the name question.

In [125]:
for msg in [
    "I have been looking at ConversationBufferMemory and ConversationBufferWindowMemory.",
    "Buffer memory just stores the entire conversation",
    "Buffer window memory stores the last k messages, dropping the rest."
]:
    pipeline_with_history.invoke(
        {"query": msg},
        config={"configurable": {"session_id": "id_123", "llm": llm}}
    )

Let's see the latest summary:

In [126]:
chat_map["id_123"].messages

[SystemMessage(content='The conversation began with Sujal introducing themselves and stating their research interest in different types of conversational memory. The AI responded by outlining various types of human cognitive conversational memory (Content, Source, Contextual, Episodic, Semantic, Working, and Relational Memory) and then asked Sujal to specify their interest.\n\nSujal clarified their interest in `ConversationBufferMemory` and `ConversationBufferWindowMemory`, which the AI identified as AI/chatbot system implementations, particularly common in frameworks like LangChain. The AI provided a detailed explanation of both:\n\n1.  **`ConversationBufferMemory`**: Stores the entire conversation history from the beginning, appending every new interaction. Its pros include complete context and simplicity, while cons include quickly exceeding LLM token limits, higher costs/latency, and potential for irrelevant information. It\'s likened to a full transcript.\n2.  **`ConversationBuffe

The information about our name has been maintained, so let's see if this is enough for our LLM to correctly recall our name.

In [127]:
pipeline_with_history.invoke(
    {"query": "What is my name again?"},
    config={"configurable": {"session_id": "id_123", "llm": llm}}
)

AIMessage(content='Your name is Sujal.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'grounding_metadata': {}, 'model_provider': 'google_genai'}, id='lc_run--788e1a78-852d-48f0-8826-8047f3ae6d4c-0', usage_metadata={'input_tokens': 436, 'output_tokens': 41, 'total_tokens': 477, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 35}})

Perfect! We've successfully implemented the `ConversationSummaryMemory` type using the `RunnableWithMessageHistory` class.

## 4. `ConversationSummaryBufferMemory`

Our final memory type combines `ConversationSummaryMemory` and `ConversationBufferMemory`. It keeps a buffer of the most recent messages, up to the previous `n` tokens. Anything beyond that limit is summarized and then dropped from the buffer. The result looks something like this:


```
# ~~ a summary of previous interactions
The user named Sujal introduced himself and the AI responded, introducing itself as an AI model called Jarvis.
Sujal then said he was researching the different types of conversational memory and Jarvis asked for some
examples.
# ~~ the most recent messages
Human: I have been looking at ConversationBufferMemory and ConversationBufferWindowMemory.
AI: That's interesting, what's the difference?
Human: Buffer memory just stores the entire conversation
AI: That makes sense, what about ConversationBufferWindowMemory?
Human: Buffer window memory stores the last k messages, dropping the rest.
AI: Very cool!
```
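
To illustrate the token-based split described above, the sketch below walks the history from newest to oldest and keeps messages until a token budget is exhausted. Everything here is an assumption made for illustration: `count_tokens` is a crude whitespace word count rather than a real tokenizer, and `split_by_token_budget` is a hypothetical helper. The overflow it returns is what would be handed to the summarizer.

```python
def count_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words. Real tokenizers differ.
    return len(text.split())


def split_by_token_budget(messages: list, max_tokens: int) -> tuple:
    """Return (overflow, kept), where `kept` is the newest run of messages within budget."""
    kept = []
    used = 0
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break
        kept.insert(0, msg)
        used += cost
    overflow = messages[: len(messages) - len(kept)]
    return overflow, kept


history = [
    "Hi, my name is Sujal.",
    "I'm researching the different types of conversational memory.",
    "Buffer memory just stores the entire conversation",
    "Very cool!",
]
overflow, kept = split_by_token_budget(history, max_tokens=10)

print(kept)      # the last two messages (7 + 2 = 9 words)
print(overflow)  # the first two messages, to be summarized
```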


### `ConversationSummaryBufferMemory` with `RunnableWithMessageHistory`

As with the previous memory types, we will implement this one using the `RunnableWithMessageHistory` class. In our implementation we will base the buffer window on the number of messages rather than the number of tokens. This tweak aligns our implementation more closely with the original buffer window.

We will implement all of this via a new `ConversationSummaryBufferMessageHistory` class.

In [137]:
class ConversationSummaryBufferMessageHistory(BaseChatMessageHistory, BaseModel):
    """Chat message history that keeps a summary and a buffer of recent messages."""

    messages: list[BaseMessage] = Field(default_factory=list)
    llm: ChatGoogleGenerativeAI = Field(default_factory=ChatGoogleGenerativeAI)
    k: int = Field(default_factory=int)

    def __init__(self, llm: ChatGoogleGenerativeAI, k: int):
        super().__init__(llm=llm, k=k)

    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Add messages to the history, removing any messages beyond
        the last `k` messages and summarizing the messages that we
        drop."""

        existing_summary: SystemMessage | None = None
        old_messages: list[BaseMessage] | None = None

        # see if we already have a summary message
        if len(self.messages) > 0 and isinstance(self.messages[0], SystemMessage):
            print(">> found existing summary")
            existing_summary = self.messages.pop(0)

        # add the new messages to the history
        self.messages.extend(messages)

        # check if we have more than k messages
        if len(self.messages) > self.k:
            print(
                f">> Found {len(self.messages)} messages, dropping oldest {len(self.messages) - self.k} messages"
            )

            # pull out the old messages beyond the last k
            old_messages = self.messages[:-self.k]
            # and keep only the last k messages
            self.messages = self.messages[-self.k:]

        if old_messages is None:
            print(">> No old messages to summarize, skipping summary step.")
            return
        
        # construct the summary prompt
        summary_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(
                "Given the existing conversation summary and the new messages, "
                "generate a new summary of the conversation, making sure to maintain "
                "as much relevant information as possible."
            ),
            HumanMessagePromptTemplate.from_template(
                "Existing Conversation Summary: {existing_summary}\n New Messages: {old_messages}"
            )
        ])
        # format the messages and invoke the LLM stored on this instance
        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                # pass only the summary text (empty if there is no summary yet)
                existing_summary=existing_summary.content if existing_summary else "",
                old_messages=[msg.content for msg in old_messages]
            )
        )

        print(f">> New summary: {new_summary.content}")
        # prepend the new summary (as a system message) to the k most recent messages
        self.messages = [SystemMessage(content=new_summary.content)] + self.messages

    def clear(self) -> None:
        """Clear the chat history."""
        self.messages = []


Redefine the `get_chat_history` function to use our new `ConversationSummaryBufferMessageHistory` class.

In [138]:
chat_map = {}
def get_chat_history(session_id: str, llm: ChatGoogleGenerativeAI, k: int) -> ConversationSummaryBufferMessageHistory:
    if session_id not in chat_map:
        # if session ID doesn't exist, create a new chat history
        chat_map[session_id] = ConversationSummaryBufferMessageHistory(llm=llm, k=k)
    # return the chat history
    return chat_map[session_id]
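
One consequence of this caching pattern is worth noting: `llm` and `k` are only used when a session is first created, so passing different values for an existing `session_id` has no effect. The sketch below illustrates this with hypothetical stand-ins (`StubHistory`, `stub_map`, `get_stub_history`) so that it runs without an LLM.

```python
from dataclasses import dataclass


@dataclass
class StubHistory:
    """Hypothetical stand-in for ConversationSummaryBufferMessageHistory."""

    k: int


stub_map: dict[str, StubHistory] = {}


def get_stub_history(session_id: str, k: int) -> StubHistory:
    # same caching pattern as get_chat_history: create once, then reuse
    if session_id not in stub_map:
        stub_map[session_id] = StubHistory(k=k)
    return stub_map[session_id]


first = get_stub_history("id_123", k=4)
second = get_stub_history("id_123", k=10)  # k=10 is ignored: the session already exists
print(first is second, second.k)
```

If a session needs a different window size, one option is to remove its entry from the map (or use a new session ID) before invoking the pipeline again.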

Set up our pipeline with the new configurable fields.

In [139]:
pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="The session ID to use for retrieving chat history.",
            default="default_id",
        ),
        ConfigurableFieldSpec(
            id="llm",
            annotation=ChatGoogleGenerativeAI,
            name="LLM",
            description="The LLM to use for generating conversation summaries.",
            default=llm,
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="Window Size",
            description="The number of messages to retain in the chat history window.",
            default=4,
        ),
    ],
)

In [140]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is Sujal."},
    config={"session_id": "id_123", "llm": llm, "k": 4}
)
chat_map["id_123"].messages

>> No old messages to summarize, skipping summary step.


[HumanMessage(content='Hi, my name is Sujal.', additional_kwargs={}, response_metadata={}),
 AIMessage(content="Hi Sujal! It's nice to meet you. How can I help you today?", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'grounding_metadata': {}, 'model_provider': 'google_genai'}, id='lc_run--7999be07-08ca-4bd4-9938-8781b5289de5-0', usage_metadata={'input_tokens': 17, 'output_tokens': 44, 'total_tokens': 61, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 25}})]

In [141]:
for i, msg in enumerate([
    "I'm researching the different types of conversational memory.",
    "I have been looking at ConversationBufferMemory and ConversationBufferWindowMemory.",
    "Buffer memory just stores the entire conversation",
    "Buffer window memory stores the last k messages, dropping the rest."
]):
    print(f"---\nMessage {i+1}\n---\n")
    pipeline_with_history.invoke(
        {"query": msg},
        config={"session_id": "id_123", "llm": llm, "k": 4}
    )

---
Message 1
---

>> No old messages to summarize, skipping summary step.
---
Message 2
---

>> Found 6 messages, dropping oldest 2 messages
>> New summary: Sujal introduced himself, and the other participant greeted him, expressed pleasure in meeting him, and asked how they could help.
---
Message 3
---

>> found existing summary
>> Found 6 messages, dropping oldest 2 messages
>> New summary: Sujal introduced himself, and the other participant greeted him, expressed pleasure in meeting him, and asked how they could help. Sujal then stated he is researching the different types of conversational memory. The participant acknowledged this as a crucial area, explaining that conversational memory is a complex interplay of various memory systems, defined by the ability to process and retain information during a conversation, recall details from past conversations, and use that information to guide future interactions. They then detailed seven key types of memory that contribute to conversat

In [144]:
chat_map['id_123'].messages

[SystemMessage(content='Sujal introduced himself, stating he is researching different types of conversational memory. The other participant greeted him, defined conversational memory as a complex interplay of various memory systems (processing/retaining information, recalling past details, guiding future interactions), and detailed seven key human memory types that contribute to it: Working, Episodic, Semantic, Autobiographical, Prospective, Emotional, and Social/Relational Memory.\n\nTo provide more targeted information, the participant asked about Sujal\'s specific research angle. Sujal clarified that he is looking at "ConversationBufferMemory and ConversationBufferWindowMemory," indicating his focus is on **Large Language Models (LLMs)** and AI frameworks.\n\nThe participant then explained that in LLMs, "memory" refers to how the AI retains and utilizes past conversation turns, as LLMs are inherently stateless. They provided a detailed breakdown of the two memory types Sujal mention

There we go, we've successfully implemented the `ConversationSummaryBufferMemory` type using `RunnableWithMessageHistory`!