## Conversational Memory

In [1]:
import getpass
import os
from dotenv import load_dotenv

load_dotenv(override=True)

if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Google AI API key: ")



from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash-lite",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # other params...
)

In [2]:
messages = [
    ("system","You are a helpful assistant that translates English to Hindi. Translate the user sentence."),
    ("human", "I love programming.")
]

ai_msg = llm.invoke(messages)

ai_msg

AIMessage(content='मुझे प्रोग्रामिंग पसंद है।', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash-lite', 'safety_ratings': []}, id='run--6e316f4c-1280-48ea-a788-dc93d6cd307e-0', usage_metadata={'input_tokens': 20, 'output_tokens': 12, 'total_tokens': 32, 'input_token_details': {'cache_read': 0}})

In [None]:
llm.invoke("What was my last message?")

AIMessage(content='I am an AI and do not have memory of past conversations. Therefore, I cannot tell you what your last message was.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash-lite', 'safety_ratings': []}, id='run--8a4a00ad-c6ee-4686-a94e-53f83cfdadea-0', usage_metadata={'input_tokens': 6, 'output_tokens': 26, 'total_tokens': 32, 'input_token_details': {'cache_read': 0}})

In [None]:
messages = [
    ("system", "You are a helpful assistant. Provide short answers."),
    ("human", "Translate this sentence to hindi: I love programming."),
    ("ai", 'मुझे प्रोग्रामिंग पसंद है।'),
    ("human", "What was my last message?"),
]

In [7]:
llm.invoke(messages)

AIMessage(content='You asked me to translate "I love programming" to Hindi.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash-lite', 'safety_ratings': []}, id='run--251e4171-ac2f-4f40-ab13-1b3b5ae5f4fd-0', usage_metadata={'input_tokens': 37, 'output_tokens': 14, 'total_tokens': 51, 'input_token_details': {'cache_read': 0}})

---

Conversational memory allows our chatbots and agents to remember previous interactions within a conversation. Without conversational memory, our chatbots would only ever be able to respond to the last message they received, essentially forgetting all previous messages with each new message.
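
The idea can be sketched without any LLM at all. In the snippet below, `fake_llm` is a hypothetical stand-in for a stateless chat model (it is not part of LangChain). It only sees the messages passed in a single call, so "memory" amounts to resending the earlier turns ourselves.

```python
# Hypothetical stand-in for a stateless chat model: it only sees
# the messages passed in this one call.
def fake_llm(messages: list[tuple[str, str]]) -> str:
    human_turns = [text for role, text in messages if role == "human"]
    if len(human_turns) < 2:
        return "I have no record of earlier messages."
    # the second-to-last human turn is the "previous" message
    return f"Your previous message was: {human_turns[-2]!r}"

# Without memory: the call carries only the newest message.
no_memory_reply = fake_llm([("human", "What was my last message?")])

# With memory: we resend the earlier turns along with the new one.
history = [
    ("human", "I love programming."),
    ("ai", "मुझे प्रोग्रामिंग पसंद है।"),
]
with_memory_reply = fake_llm(history + [("human", "What was my last message?")])

print(no_memory_reply)
print(with_memory_reply)
```

Every memory type discussed in this notebook is a different policy for deciding which earlier turns get resent.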



## LangChain's Memory Types

LangChain versions `0.0.x` included several conversational memory types. Most of these are due for deprecation, but they are still useful for understanding the different approaches we can take to building conversational memory.

Throughout the notebook we will be referring to these _older_ memory types and then rewriting them using the recommended `RunnableWithMessageHistory` class. We will learn about:

- `ConversationBufferMemory`: the simplest and most intuitive form of conversational memory, keeping track of a conversation without any additional bells and whistles.
- `ConversationBufferWindowMemory`: similar to `ConversationBufferMemory`, but only keeps track of the last `k` messages.
- `ConversationSummaryMemory`: rather than keeping track of the entire conversation, this memory type keeps track of a summary of the conversation.
- `ConversationSummaryBufferMemory`: merges the `ConversationSummaryMemory` and `ConversationTokenBufferMemory` types.

In [8]:
## Initialize the LLM

llm

ChatGoogleGenerativeAI(model='models/gemini-2.0-flash-lite', google_api_key=SecretStr('**********'), temperature=0.0, max_retries=2, client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x000002289B22BB50>, default_metadata=(), model_kwargs={})

## 1. `ConversationBufferMemory`

`ConversationBufferMemory` is the simplest form of conversational memory. It is _literally_ just a place where we store messages, which we then feed into our LLM.

Let's start with LangChain's original `ConversationBufferMemory` object. We set `return_messages=True` so that the history is returned as a list of message objects (`HumanMessage` and `AIMessage`, as the outputs below show). Unless we are using a non-chat model we would always set this to `True`, because without it the messages are passed as a single string, which can lead to unexpected behavior from chat LLMs.

In [None]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages=True)



There are several ways to add messages to our memory. Using the `save_context` method, we can add a user query (via the `input` key) and the AI's response (via the `output` key). To create the following conversation, we call `save_context` once per exchange:

```
User: Hi, my name is Josh
AI: Hey Josh, what's up? I'm an AI model called Zeta.
User: I'm researching the different types of conversational memory.
AI: That's interesting, what are some examples?
User: I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.
AI: That's interesting, what's the difference?
User: Buffer memory just stores the entire conversation, right?
AI: That makes sense, what about ConversationBufferWindowMemory?
User: Buffer window memory stores the last k messages, dropping the rest.
AI: Very cool!
```




In [11]:
memory.save_context(
    {"input": "Hi, my name is Josh"},  # user message
    {"output": "Hey Josh, what's up? I'm an AI model called Zeta."}  # AI response
)
memory.save_context(
    {"input": "I'm researching the different types of conversational memory."},  # user message
    {"output": "That's interesting, what are some examples?"}  # AI response
)
memory.save_context(
    {"input": "I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory."},  # user message
    {"output": "That's interesting, what's the difference?"}  # AI response
)
memory.save_context(
    {"input": "Buffer memory just stores the entire conversation, right?"},  # user message
    {"output": "That makes sense, what about ConversationBufferWindowMemory?"}  # AI response
)
memory.save_context(
    {"input": "Buffer window memory stores the last k messages, dropping the rest."},  # user message
    {"output": "Very cool!"}  # AI response
)


Before using the memory, we need to load any variables for that memory type. In this case there are none, so we pass an empty dictionary to `load_memory_variables`:

In [27]:
memory

ConversationBufferMemory(chat_memory=InMemoryChatMessageHistory(messages=[HumanMessage(content='Hi, my name is Josh', additional_kwargs={}, response_metadata={}), AIMessage(content="Hey Josh, what's up? I'm an AI model called Zeta.", additional_kwargs={}, response_metadata={}), HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}), AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}), HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}), AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}), HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}), AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_

In [14]:
his = memory.load_memory_variables({})

In [15]:
type(his)

dict

In [19]:
convs = his['history']

In [None]:
type(convs)

list

In [22]:
for conv in convs:
    print(conv.content)

Hi, my name is Josh
Hey Josh, what's up? I'm an AI model called Zeta.
I'm researching the different types of conversational memory.
That's interesting, what are some examples?
I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.
That's interesting, what's the difference?
Buffer memory just stores the entire conversation, right?
That makes sense, what about ConversationBufferWindowMemory?
Buffer window memory stores the last k messages, dropping the rest.
Very cool!
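
The cells above show the shape of the data: each `save_context` call stores one human message and one AI message, and `load_memory_variables` returns a dictionary with a single `history` key. A minimal plain-Python sketch of that behaviour follows. `SimpleBufferMemory` is a hypothetical class written for illustration, not LangChain's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleBufferMemory:
    """Hypothetical, simplified stand-in for a buffer memory (not LangChain code)."""
    messages: list[tuple[str, str]] = field(default_factory=list)

    def save_context(self, inputs: dict, outputs: dict) -> None:
        # one call stores one exchange: a human message, then an AI message
        self.messages.append(("human", inputs["input"]))
        self.messages.append(("ai", outputs["output"]))

    def load_memory_variables(self, _: dict) -> dict:
        # everything is returned under a single "history" key
        return {"history": list(self.messages)}


sketch = SimpleBufferMemory()
sketch.save_context(
    {"input": "Hi, my name is Josh"},
    {"output": "Hey Josh, what's up? I'm an AI model called Zeta."},
)
sketch.save_context(
    {"input": "Buffer memory just stores the entire conversation, right?"},
    {"output": "That makes sense."},
)

loaded = sketch.load_memory_variables({})
print(type(loaded).__name__, len(loaded["history"]))
```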


With that, we've created our buffer memory. Before feeding it into our LLM let's quickly view the alternative method for adding messages to our memory. With this other method, we pass individual user and AI messages via the `add_user_message` and `add_ai_message` methods. To reproduce what we did above, we do:

In [35]:
memory = ConversationBufferMemory(return_messages=True)

memory.chat_memory.add_user_message("Hi, my name is Josh")
memory.chat_memory.add_ai_message("Hey Josh, what's up? I'm an AI model called Zeta.")
memory.chat_memory.add_user_message("I'm researching the different types of conversational memory.")
memory.chat_memory.add_ai_message("That's interesting, what are some examples?")
memory.chat_memory.add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
memory.chat_memory.add_ai_message("That's interesting, what's the difference?")
memory.chat_memory.add_user_message("Buffer memory just stores the entire conversation, right?")
memory.chat_memory.add_ai_message("That makes sense, what about ConversationBufferWindowMemory?")
memory.chat_memory.add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
memory.chat_memory.add_ai_message("Very cool!")

memory.load_memory_variables({})


{'history': [HumanMessage(content='Hi, my name is Josh', additional_kwargs={}, response_metadata={}),
  AIMessage(content="Hey Josh, what's up? I'm an AI model called Zeta.", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}),
  HumanMessag

In [40]:
from langchain.chains import ConversationChain

chain = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)




In [39]:
chain.invoke({"input": "what is my name again?"})

{'input': 'what is my name again?',
 'history': [HumanMessage(content='Hi, my name is Josh', additional_kwargs={}, response_metadata={}),
  AIMessage(content="Hey Josh, what's up? I'm an AI model called Zeta.", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, 

In [41]:
memory.load_memory_variables({})

{'history': [HumanMessage(content='Hi, my name is Josh', additional_kwargs={}, response_metadata={}),
  AIMessage(content="Hey Josh, what's up? I'm an AI model called Zeta.", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}),
  HumanMessag


### `ConversationBufferMemory` with `RunnableWithMessageHistory`

As mentioned, the `ConversationBufferMemory` type is due for deprecation. Instead, we can use the `RunnableWithMessageHistory` class to implement the same functionality.

When implementing `RunnableWithMessageHistory` we will use **L**ang**C**hain **E**xpression **L**anguage (LCEL) and for this we need to define our prompt template and LLM components. Our `llm` has already been defined, so now we just define a `ChatPromptTemplate` object.

In [None]:
from langchain.prompts import (
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
    ChatPromptTemplate
)

system_prompt = "You are a helpful assistant called Zeta."

prompt_template = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(system_prompt),
    MessagesPlaceholder(variable_name="history"),
    HumanMessagePromptTemplate.from_template("{query}"),
])
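
To make the role of `MessagesPlaceholder` concrete: at call time the stored history is spliced in between the system message and the new human message. The plain-Python sketch below mimics that ordering only. `build_messages` is a hypothetical helper, not a LangChain function.

```python
def build_messages(
    system_prompt: str,
    history: list[tuple[str, str]],
    query: str,
) -> list[tuple[str, str]]:
    # system message first, then the stored history, then the new query
    return [("system", system_prompt), *history, ("human", query)]


example_history = [
    ("human", "Hi, my name is Josh"),
    ("ai", "Hi Josh, how can I help?"),
]
final_messages = build_messages(
    "You are a helpful assistant called Zeta.",
    example_history,
    "what was my name again?",
)
for role, text in final_messages:
    print(f"{role}: {text}")
```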


We can link our `prompt_template` and our `llm` together to create a pipeline via LCEL.

In [43]:
pipeline = prompt_template | llm



To add memory, we wrap our `pipeline` in a `RunnableWithMessageHistory` object. This object requires a few input parameters. One of them is `get_session_history`, which expects a function that returns a chat message history object for a given session ID. We define this function ourselves:

In [46]:
from langchain_core.chat_history import InMemoryChatMessageHistory

chat_map = {}

def get_chat_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = InMemoryChatMessageHistory()
    return chat_map[session_id]



We also need to tell our runnable which variable name to use for the chat history (i.e. `history`) and which to use for the user's query (i.e. `query`).

In [47]:
from langchain_core.runnables.history import RunnableWithMessageHistory

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history"
)
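
Conceptually, the wrapper looks up the history for the session, passes it to the pipeline under the history key, and then records the new exchange. The sketch below imitates that flow in plain Python under those assumptions. `with_history`, `echo_pipeline`, and `session_store` are hypothetical names, and this is not how LangChain implements it internally.

```python
from typing import Callable

session_store: dict[str, list[tuple[str, str]]] = {}  # hypothetical store


def get_history(session_id: str) -> list[tuple[str, str]]:
    # create an empty history the first time a session ID is seen
    return session_store.setdefault(session_id, [])


def with_history(pipeline: Callable[[dict], str]) -> Callable[[dict, str], str]:
    def invoke(inputs: dict, session_id: str) -> str:
        history = get_history(session_id)
        # 1. inject a copy of the stored messages under the "history" key
        reply = pipeline({"query": inputs["query"], "history": list(history)})
        # 2. record the new exchange so the next call can see it
        history.append(("human", inputs["query"]))
        history.append(("ai", reply))
        return reply
    return invoke


def echo_pipeline(inputs: dict) -> str:
    # stand-in for `prompt_template | llm`: reports how much history it saw
    return f"seen {len(inputs['history'])} earlier messages"


chat = with_history(echo_pipeline)
first = chat({"query": "Hi, my name is Josh"}, "id_123")
second = chat({"query": "what was my name again?"}, "id_123")
other = chat({"query": "what was my name again?"}, "id_124")
print(first, second, other, sep="\n")
```

The third call uses a different session ID, so it starts from an empty history.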



In [48]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is Josh"},
    config={"session_id": "id_123"}
)

AIMessage(content="Hi Josh, it's nice to meet you! How can I help you today?", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash-lite', 'safety_ratings': []}, id='run--af42ad99-895e-4882-a0ad-476fe534c9a9-0', usage_metadata={'input_tokens': 14, 'output_tokens': 19, 'total_tokens': 33, 'input_token_details': {'cache_read': 0}})

In [49]:
pipeline_with_history.invoke(
    {"query": "what was my name again?"},
    config={"session_id": "id_123"}
)


AIMessage(content='Your name is Josh.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash-lite', 'safety_ratings': []}, id='run--6723cf50-d410-4c20-8860-e62c8b8a9ab1-0', usage_metadata={'input_tokens': 38, 'output_tokens': 6, 'total_tokens': 44, 'input_token_details': {'cache_read': 0}})

In [50]:
pipeline_with_history.invoke(
    {"query": "what was my name again?"},
    config={"session_id": "id_123"}
)

AIMessage(content='Your name is Josh.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash-lite', 'safety_ratings': []}, id='run--ceb4f2b7-9f31-4b7f-b67d-241a1f6f3583-0', usage_metadata={'input_tokens': 49, 'output_tokens': 6, 'total_tokens': 55, 'input_token_details': {'cache_read': 0}})

In [51]:
pipeline_with_history.invoke(
    {"query": "what was my name again?"},
    config={"session_id": "id_124"}
)

AIMessage(content='I do not have access to your personal information, so I do not know your name.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash-lite', 'safety_ratings': []}, id='run--baa8028f-a15d-4856-8a05-4513881ed10b-0', usage_metadata={'input_tokens': 14, 'output_tokens': 19, 'total_tokens': 33, 'input_token_details': {'cache_read': 0}})

In [52]:
pipeline_with_history.invoke(
    {"query": "My name is Nikhil"},
    config={"session_id": "id_124"}
)

AIMessage(content='Okay, Nikhil. Nice to meet you!', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash-lite', 'safety_ratings': []}, id='run--1255a2c9-2d9a-4540-b039-eb183a0f06d7-0', usage_metadata={'input_tokens': 37, 'output_tokens': 11, 'total_tokens': 48, 'input_token_details': {'cache_read': 0}})

In [53]:
pipeline_with_history.invoke(
    {"query": "what is my name again?"},
    config={"session_id": "id_124"}
)

AIMessage(content='Your name is Nikhil.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash-lite', 'safety_ratings': []}, id='run--2a7f7711-5623-42aa-b574-5e38a70cb0cf-0', usage_metadata={'input_tokens': 53, 'output_tokens': 7, 'total_tokens': 60, 'input_token_details': {'cache_read': 0}})

In [54]:
pipeline_with_history.invoke(
    {"query": "what is my name again?"},
    config={"session_id": "id_123"}
)

AIMessage(content='Your name is Josh.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash-lite', 'safety_ratings': []}, id='run--2e097581-546d-48a0-8a86-958ec1b73f92-0', usage_metadata={'input_tokens': 60, 'output_tokens': 6, 'total_tokens': 66, 'input_token_details': {'cache_read': 0}})

---

## 2. `ConversationBufferWindowMemory`

The `ConversationBufferWindowMemory` type is similar to `ConversationBufferMemory`, but only keeps the most recent part of the conversation, with the window size set by `k`. There are a few reasons why we would want to keep only the most recent messages:

- More messages mean more tokens are sent with each request, and more tokens increase latency _and_ cost.
- LLMs tend to perform worse when given more tokens, making them more likely to deviate from instructions, hallucinate, or _"forget"_ information provided to them. Conciseness is key to high-performing LLMs.
- If we keep _all_ messages, we will eventually hit the LLM's context window limit. Capping the history with a window size `k` keeps the number of stored messages bounded.

The buffer window solves many problems that we encounter with the standard buffer memory, while still being a very simple and intuitive form of conversational memory.
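
The cost argument above can be made concrete with a rough simulation. The sketch assumes one token per whitespace-separated word, which is only a crude proxy for a real tokenizer, and it counts the window in individual messages. `approx_tokens` is a hypothetical helper.

```python
def approx_tokens(text: str) -> int:
    # crude assumption: one token per whitespace-separated word
    return len(text.split())


turns = [f"message number {i} with a few extra words" for i in range(20)]
k = 4  # window size, counted in messages for this sketch

full_costs, window_costs = [], []
for i in range(1, len(turns) + 1):
    sent_so_far = turns[:i]
    # full buffer: every earlier message is resent with each request
    full_costs.append(sum(approx_tokens(m) for m in sent_so_far))
    # window: only the last k messages are resent
    window_costs.append(sum(approx_tokens(m) for m in sent_so_far[-k:]))

print("full buffer, last request:", full_costs[-1])
print("window of 4, last request:", window_costs[-1])
```

The full-buffer cost grows with every turn, while the windowed cost levels off once `k` messages have been sent.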

In [55]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=4, return_messages=True)





In [None]:
memory.chat_memory.add_user_message("Hi, my name is Josh")
memory.chat_memory.add_ai_message("Hey Josh, what's up? I'm an AI model called Zeta.")

memory.chat_memory.add_user_message("I'm researching the different types of conversational memory.")
memory.chat_memory.add_ai_message("That's interesting, what are some examples?")
memory.chat_memory.add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
memory.chat_memory.add_ai_message("That's interesting, what's the difference?")
memory.chat_memory.add_user_message("Buffer memory just stores the entire conversation, right?")
memory.chat_memory.add_ai_message("That makes sense, what about ConversationBufferWindowMemory?")
memory.chat_memory.add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
memory.chat_memory.add_ai_message("Very cool!")


In [57]:
memory.load_memory_variables({})

{'history': [HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer window memory stores the last k messages, dropping the rest.', additional_kwargs={}, response_metadata={}),
  AIMessage(content='Very cool!', additional_kwargs={}, response_metadata={})]}

In [58]:
chain = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

chain.invoke({"input": "what is my name again?"})



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
[HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}), AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}), HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}), AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}), HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}), AIMessage(content='That mak

{'input': 'what is my name again?',
 'history': [HumanMessage(content="I'm researching the different types of conversational memory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what are some examples?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content="I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.", additional_kwargs={}, response_metadata={}),
  AIMessage(content="That's interesting, what's the difference?", additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}),
  HumanMessage(content='Buffer window memory stores the last k messages, dropping the rest.', additional_kwargs={}, response_metadata={}),
  AIMessage(content='Very cool!', additional_kw


Our LLM can no longer remember our name because we set the `k` parameter to `4`, so only the most recent part of the conversation is kept (the last four exchanges, or eight messages, in the history shown above). That window does not include the first message, where we introduced ourselves.

Based on the agent forgetting our name, we might wonder _why_ we would ever use this memory type instead of the standard buffer memory. As with most things in AI, it is a trade-off. Here we can support much longer conversations, use fewer tokens, and improve latency, but at the cost of forgetting older messages.
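
The history loaded earlier holds eight messages for `k=4`, which is consistent with the window counting exchanges (one human message plus one AI reply) rather than single messages. A plain-Python sketch of that slicing, assuming a window of `2 * k` messages:

```python
conversation = [
    ("human", "Hi, my name is Josh"),
    ("ai", "Hey Josh, what's up? I'm an AI model called Zeta."),
    ("human", "I'm researching the different types of conversational memory."),
    ("ai", "That's interesting, what are some examples?"),
    ("human", "I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory."),
    ("ai", "That's interesting, what's the difference?"),
    ("human", "Buffer memory just stores the entire conversation, right?"),
    ("ai", "That makes sense, what about ConversationBufferWindowMemory?"),
    ("human", "Buffer window memory stores the last k messages, dropping the rest."),
    ("ai", "Very cool!"),
]

k = 4
# assumption: k counts exchanges, so the window spans 2 * k messages
window = conversation[-2 * k:]

# the only messages mentioning the name fall outside the window
name_still_visible = any("Josh" in text for _, text in window)
print(len(window), name_still_visible)
```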



### `ConversationBufferWindowMemory` with `RunnableWithMessageHistory`

To implement this memory type using the `RunnableWithMessageHistory` class, we can use the same approach as before. We define our `prompt_template` and `llm` as before, and then wrap our pipeline in a `RunnableWithMessageHistory` object.

For the window feature, we need to define a custom version of the `InMemoryChatMessageHistory` class that removes any messages beyond the last `k` messages.




In [59]:
from pydantic import BaseModel, Field
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage

class BufferWindowMessageHistory(BaseChatMessageHistory, BaseModel):
    messages: list[BaseMessage] = Field(default_factory=list)
    k: int = Field(default_factory=int)

    def __init__(self, k: int):
        super().__init__(k=k)
        print(f"Initializing BufferWindowMessageHistory with k={k}")

    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Add messages to the history, removing any messages beyond
        the last `k` messages.
        """
        self.messages.extend(messages)
        self.messages = self.messages[-self.k:]

    def clear(self) -> None:
        """Clear the history."""
        self.messages = []
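
The trimming step in `add_messages` is the core of the window. As a standalone illustration outside LangChain, similar behaviour can be obtained with `collections.deque` and its `maxlen` argument. `WindowBuffer` is a hypothetical name used only for this sketch.

```python
from collections import deque


class WindowBuffer:
    """Keeps only the last `k` items (illustrative, not a LangChain class)."""

    def __init__(self, k: int) -> None:
        # a deque with maxlen drops the oldest item automatically
        self._items: deque[str] = deque(maxlen=k)

    def add_messages(self, messages: list[str]) -> None:
        self._items.extend(messages)

    @property
    def messages(self) -> list[str]:
        return list(self._items)

    def clear(self) -> None:
        self._items.clear()


buffer = WindowBuffer(k=4)
buffer.add_messages([f"msg {i}" for i in range(10)])
print(buffer.messages)
```

One difference worth knowing: with `k=0`, the slice `messages[-0:]` returns the whole list, whereas `deque(maxlen=0)` keeps nothing.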




In [60]:
chat_map = {}
def get_chat_history(session_id: str, k: int = 4) -> BufferWindowMessageHistory:
    print(f"get_chat_history called with session_id={session_id} and k={k}")
    if session_id not in chat_map:
        # if session ID doesn't exist, create a new chat history
        chat_map[session_id] = BufferWindowMessageHistory(k=k)
    # return the stored history for this session (created above if it was missing)
    return chat_map[session_id]

In [61]:
from langchain_core.runnables import ConfigurableFieldSpec

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",

    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="The session ID to use for the chat history",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="k",
            description="The number of messages to keep in the history",
            default=4,
        ),
    ]
)
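
The `id` of each `ConfigurableFieldSpec` matches a parameter name of `get_chat_history`, and the values for those fields are supplied under the `configurable` key at call time. The sketch below illustrates that routing in plain Python, on the assumption that the declared fields are passed to the factory as keyword arguments. `get_chat_history_sketch` and `declared_ids` are hypothetical names, and this is not LangChain's implementation.

```python
import inspect


def get_chat_history_sketch(session_id: str, k: int = 4) -> dict:
    # stand-in factory: returns its arguments so we can inspect them
    return {"session_id": session_id, "k": k}


declared_ids = ["session_id", "k"]  # the `id` values of the field specs

# the declared ids should line up with the factory's parameter names
factory_params = list(inspect.signature(get_chat_history_sketch).parameters)
ids_match = set(declared_ids) == set(factory_params)

config = {"configurable": {"session_id": "id_k4", "k": 4, "unrelated": "ignored"}}
# pick out only the declared fields and pass them as keyword arguments
factory_kwargs = {name: config["configurable"][name] for name in declared_ids}
result = get_chat_history_sketch(**factory_kwargs)
print(ids_match, result)
```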



In [64]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is Josh"},
    config={"configurable": {"session_id": "id_k4", "k": 4}}
)

get_chat_history called with session_id=id_k4 and k=4


AIMessage(content="Hi Josh! It's nice to meet you too. How can I help you today?", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash-lite', 'safety_ratings': []}, id='run--2a4d13e2-09fc-4576-a8f0-ea6a5aaa788a-0', usage_metadata={'input_tokens': 38, 'output_tokens': 20, 'total_tokens': 58, 'input_token_details': {'cache_read': 0}})

In [65]:
chat_map["id_k4"].clear()
# manually insert history
chat_map["id_k4"].add_user_message("Hi, my name is Josh")
chat_map["id_k4"].add_ai_message("I'm an AI model called Zeta.")
chat_map["id_k4"].add_user_message("I'm researching the different types of conversational memory.")
chat_map["id_k4"].add_ai_message("That's interesting, what are some examples?")
chat_map["id_k4"].add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
chat_map["id_k4"].add_ai_message("That's interesting, what's the difference?")
chat_map["id_k4"].add_user_message("Buffer memory just stores the entire conversation, right?")
chat_map["id_k4"].add_ai_message("That makes sense, what about ConversationBufferWindowMemory?")
chat_map["id_k4"].add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
chat_map["id_k4"].add_ai_message("Very cool!")

# if we now view the messages, we'll see that ONLY the last 4 messages are stored
chat_map["id_k4"].messages


[HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='Buffer window memory stores the last k messages, dropping the rest.', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Very cool!', additional_kwargs={}, response_metadata={})]

In [66]:
pipeline_with_history.invoke(
    {"query": "what is my name again?"},
    config={"configurable": {"session_id": "id_k4", "k": 4}}
)

get_chat_history called with session_id=id_k4 and k=4


AIMessage(content='I do not have access to your name. I am a language model, and I do not retain information from previous conversations.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash-lite', 'safety_ratings': []}, id='run--02999509-e697-45c0-81df-f4b850da641c-0', usage_metadata={'input_tokens': 51, 'output_tokens': 26, 'total_tokens': 77, 'input_token_details': {'cache_read': 0}})