In [5]:
from langchain_ollama import ChatOllama

llm = ChatOllama(model="deepseek-r1:1.5b", temperature=0.0)

## ConversationBufferMemory with RunnableWithMessageHistory

`ConversationBufferMemory` is deprecated, so we will use the `RunnableWithMessageHistory` class to implement the same functionality. We will build the chain with LangChain Expression Language (LCEL), which requires us to define a prompt template and an LLM component.

In [6]:
from langchain.prompts import (
    SystemMessagePromptTemplate, 
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
    ChatPromptTemplate
)

system_prompt = "You are a helpful assistant called Zeta."

prompt_template = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(system_prompt),
    MessagesPlaceholder(variable_name="history"),
    HumanMessagePromptTemplate.from_template("{query}"),
])

In [7]:
pipeline = prompt_template | llm
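
The `|` operator chains components so that the output of one becomes the input of the next. The sketch below imitates that idea in plain Python. The `Step` class, `fill_prompt`, and `fake_llm` are hypothetical names used only for illustration; this is a conceptual stand-in, not how LangChain implements runnables.

```python
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for a runnable: wraps a single function."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs `a`, then feeds its output to `b`
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)


# toy "prompt template" and "LLM" (both are assumptions for illustration)
fill_prompt = Step(lambda inputs: f"You are Zeta. User says: {inputs['query']}")
fake_llm = Step(lambda prompt: prompt.upper())

toy_pipeline = fill_prompt | fake_llm
result = toy_pipeline.invoke({"query": "hello"})
print(result)
```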

To add message history, we wrap our pipeline in a `RunnableWithMessageHistory` object. This object takes several parameters, one of which is `get_session_history`: a function that returns a chat message history object (here an `InMemoryChatMessageHistory`) for a given session ID.

In [8]:
from langchain_core.chat_history import InMemoryChatMessageHistory

chat_map = {}
def get_chat_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in chat_map:
        # if session ID doesn't exist, create a new chat history
        chat_map[session_id] = InMemoryChatMessageHistory()
    return chat_map[session_id]
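
To see why a session-keyed factory matters, here is a minimal plain-Python sketch in which lists stand in for `InMemoryChatMessageHistory`. The names `store` and `get_history` are hypothetical and chosen so they do not overwrite anything defined in the notebook.

```python
store: dict[str, list[str]] = {}


def get_history(session_id: str) -> list[str]:
    # create an empty history the first time a session ID is seen
    if session_id not in store:
        store[session_id] = []
    return store[session_id]


get_history("id_123").append("Hi, my name is James")
get_history("id_456").append("Hi, my name is Josh")

# the same ID always returns the same object, so messages accumulate per session
same_object = get_history("id_123") is get_history("id_123")
print(same_object, get_history("id_123"), get_history("id_456"))
```

Each session ID maps to its own history, so two users (or two conversations) never see each other's messages.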

We also need to tell our runnable which variable name to use for the chat history (i.e. `history`) and which to use for the user's query (i.e. `query`).

In [9]:
from langchain_core.runnables.history import RunnableWithMessageHistory

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history"
)
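
Conceptually, the wrapper does three things on each call: it looks up the history for the session, injects it under the history key next to the user's input, and then records the new human message and the response. The sketch below mimics that flow in plain Python. `invoke_with_history` and `echo_pipeline` are hypothetical names, and this is a simplification of the idea rather than LangChain's actual implementation.

```python
from typing import Callable

histories: dict[str, list[tuple[str, str]]] = {}


def invoke_with_history(
    pipeline: Callable[[dict], str],
    inputs: dict,
    session_id: str,
    input_messages_key: str = "query",
    history_messages_key: str = "history",
) -> str:
    history = histories.setdefault(session_id, [])
    # 1. inject a copy of the stored history under the history key
    full_inputs = {**inputs, history_messages_key: list(history)}
    # 2. run the wrapped pipeline
    output = pipeline(full_inputs)
    # 3. record the new human message and the AI response
    history.append(("human", inputs[input_messages_key]))
    history.append(("ai", output))
    return output


def echo_pipeline(inputs: dict) -> str:
    # toy pipeline: reports how many past messages it was given
    return f"seen {len(inputs['history'])} earlier messages"


first = invoke_with_history(echo_pipeline, {"query": "Hi, my name is James"}, "id_123")
second = invoke_with_history(echo_pipeline, {"query": "What is my name again?"}, "id_123")
print(first, "|", second)
```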

In [10]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_123"}}
)

AIMessage(content="<think>\nOkay, so I'm trying to figure out how to solve this equation: 3x + 5 = 20. Hmm, let me think about it step by step.\n\nFirst, I need to isolate the variable x. That means getting x by itself on one side of the equation. Right now, there's a 3 multiplied by x and then we add 5. So maybe I should start by subtracting 5 from both sides to get rid of that constant term on the left.\n\nLet me write that down: 3x + 5 - 5 = 20 - 5. Simplifying both sides, that becomes 3x = 15. Okay, so now we have 3 times x equals 15. To solve for x, I need to get rid of the coefficient 3 in front of x.\n\nHow do I do that? Well, if something is multiplied by a number and you want to find out what it was before, you divide both sides by that number. So in this case, I should divide both sides by 3.\n\nLet me do that: (3x)/3 = 15/3. Simplifying both sides, the left side becomes x because 3 divided by 3 is 1, and on the right side, 15 divided by 3 is 5. So now we have x = 5.\n\nWait 

In [11]:
pipeline_with_history.invoke(
    {"query": "What is my name again?"},
    config={"configurable": {"session_id": "id_123"}}
)

AIMessage(content='<think>\nAlright, so I\'m trying to figure out what my name is. Hmm, where do I even start with this? Maybe it\'s something simple like "James"? That seems familiar from when I was younger. But wait, could there be more to it?\n\nLet me think about the context in which I\'m asking this question. It just says, "What is my name again?" So, maybe it\'s a play on words or something else. Could it be related to math? Like, if I solve an equation and get a number as my name... But that doesn\'t seem quite right.\n\nOh, maybe it\'s about the letters in my name. If I spell out my name with letters, could that form another word or phrase? For example, "James" has J-A-M-E-S. Is there any hidden meaning or connection to those letters?\n\nOr perhaps it\'s a riddle or a pun. Sometimes names are given in a way that\'s tricky or uses double meanings. Maybe "James" is being used metaphorically or literally in a different context.\n\nI could also think about famous people with simila

## ConversationBufferWindowMemory with RunnableWithMessageHistory

The `ConversationBufferWindowMemory` type is similar to `ConversationBufferMemory`, but only keeps track of the last k messages. There are a few reasons why we would want to keep only the last k messages:

- More messages mean more tokens are sent with each request, and more tokens increase latency and cost.
- LLMs tend to perform worse when given more tokens, making them more likely to deviate from instructions, hallucinate, or "forget" information provided to them. Conciseness is key to high-performing LLMs.
- If we keep all messages, we will eventually hit the LLM's context window limit. Adding a window size k caps the number of messages kept, which helps us stay under this limit.
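
As a rough illustration of the first and last points, the sketch below counts how many history messages accompany each request, with and without a window. It assumes every turn adds exactly two messages (one human, one AI) and uses message counts as a proxy for tokens; `messages_sent` is a hypothetical helper.

```python
from typing import Optional


def messages_sent(turn: int, k: Optional[int] = None) -> int:
    """History messages sent with the request on a given turn (1-indexed)."""
    stored = 2 * (turn - 1)  # one human + one AI message per earlier turn
    return stored if k is None else min(stored, k)


turns = [1, 2, 5, 50]
unbounded = [messages_sent(t) for t in turns]
windowed = [messages_sent(t, k=4) for t in turns]
print(unbounded)  # history grows linearly with the conversation
print(windowed)   # history is capped at k
```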

In [10]:
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage
from pydantic import BaseModel, Field

class BufferWindowMessageHistory(BaseChatMessageHistory, BaseModel):
    # Implements the methods from BaseChatMessageHistory
    messages: list[BaseMessage] = Field(default_factory=list)
    k: int = Field(default_factory=int)

    def __init__(self, k: int):
        super().__init__(k=k)
        print(f"Initializing BufferWindowMessageHistory with k={k}")
    
    def add_messages(self, messages: list[BaseMessage]) -> None:
        self.messages.extend(messages)
        self.messages = self.messages[-self.k:]
    
    def clear(self) -> None:
        self.messages = []

In [11]:
chat_map = {}
def get_chat_history(session_id: str, k: int = 4) -> BufferWindowMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = BufferWindowMessageHistory(k=k)
    return chat_map[session_id]

In [12]:
from langchain_core.runnables import ConfigurableFieldSpec

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="The session ID to use for the chat history",
            default="id_default"
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="k",
            description="The number of messages to keep in the history",
            default=4
        )
    ]
)

In [13]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is Josh"},
    config={"configurable": {"session_id": "id_k4", "k": 4}}
)

Initializing BufferWindowMessageHistory with k=4


AIMessage(content="<think>\nOkay, so I'm trying to figure out how to solve this equation: 3x + 5 = 20. Hmm, let me think about it step by step.\n\nFirst, I remember that when solving equations, the goal is to isolate the variable, which in this case is x. So, I need to get x by itself on one side of the equation.\n\nLooking at the equation, there are two operations being performed on x: multiplication and addition. Specifically, 3x means x is multiplied by 3, and then 5 is added to that product. To reverse these operations, I should do them in the opposite order. That usually means dealing with the addition first before handling the multiplication.\n\nSo, let me start by subtracting 5 from both sides of the equation. If I subtract 5 from the left side, it becomes 3x + 5 - 5, which simplifies to 3x. On the right side, subtracting 5 gives me 20 - 5, which is 15. So now my equation looks like this: 3x = 15.\n\nNext, I need to get rid of that 3 that's multiplied by x. To do that, I should 

In [14]:
chat_map["id_k4"].clear()  # clear the history

# manually insert history
chat_map["id_k4"].add_user_message("Hi, my name is Josh")
chat_map["id_k4"].add_ai_message("I'm an AI model called Zeta.")
chat_map["id_k4"].add_user_message("I'm researching the different types of conversational memory.")
chat_map["id_k4"].add_ai_message("That's interesting, what are some examples?")
chat_map["id_k4"].add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
chat_map["id_k4"].add_ai_message("That's interesting, what's the difference?")
chat_map["id_k4"].add_user_message("Buffer memory just stores the entire conversation, right?")
chat_map["id_k4"].add_ai_message("That makes sense, what about ConversationBufferWindowMemory?")
chat_map["id_k4"].add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
chat_map["id_k4"].add_ai_message("Very cool!")

chat_map["id_k4"].messages

[HumanMessage(content='Buffer memory just stores the entire conversation, right?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='That makes sense, what about ConversationBufferWindowMemory?', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='Buffer window memory stores the last k messages, dropping the rest.', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Very cool!', additional_kwargs={}, response_metadata={})]
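
The output above follows directly from the `messages[-k:]` slice. The sketch below replays the same ten messages as plain strings (no LangChain involved) to show which ones fall outside the window.

```python
conversation = [
    "Hi, my name is Josh",
    "I'm an AI model called Zeta.",
    "I'm researching the different types of conversational memory.",
    "That's interesting, what are some examples?",
    "I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.",
    "That's interesting, what's the difference?",
    "Buffer memory just stores the entire conversation, right?",
    "That makes sense, what about ConversationBufferWindowMemory?",
    "Buffer window memory stores the last k messages, dropping the rest.",
    "Very cool!",
]

k = 4
window = conversation[-k:]   # what the model will see as history
dropped = conversation[:-k]  # everything older than the window

name_in_window = any("Josh" in message for message in window)
print(len(window), len(dropped), name_in_window)
```

Because "Hi, my name is Josh" is among the dropped messages, the model has nothing in its history from which to recover the name.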

In [15]:
pipeline_with_history.invoke(
    {"query": "what is my name again?"},
    config={"configurable": {"session_id": "id_k4", "k": 4}}
)

AIMessage(content='<think>\n\n</think>\n\nI don’t have access to internal company information. For more details about DeepSeek, please visit the official website.', additional_kwargs={}, response_metadata={'model': 'deepseek-r1:1.5b', 'created_at': '2025-05-18T17:17:13.220213749Z', 'done': True, 'done_reason': 'stop', 'total_duration': 171396554, 'load_duration': 13198835, 'prompt_eval_count': 61, 'prompt_eval_duration': 12940668, 'eval_count': 28, 'eval_duration': 136102283, 'model_name': 'deepseek-r1:1.5b'}, id='run--186f19e1-c300-419d-ac62-e2ab9c5790b0-0', usage_metadata={'input_tokens': 61, 'output_tokens': 28, 'total_tokens': 89})

## ConversationSummaryMemory with RunnableWithMessageHistory

This memory type keeps track of a summary of the conversation rather than the entire conversation. This is useful for long conversations where we don't need to keep track of the entire conversation, but we do want to keep some thread of the full conversation.

In [17]:
from langchain_core.messages import SystemMessage

class ConversationSummaryMessageHistory(BaseChatMessageHistory, BaseModel):
    messages: list[BaseMessage] = Field(default_factory=list)
    llm: ChatOllama = Field(default_factory=ChatOllama)

    def __init__(self, llm: ChatOllama):
        super().__init__(llm=llm)
    
    def add_messages(self, messages: list[BaseMessage]) -> None:
        # capture the current summary (if any) before it is replaced, so the
        # new messages are not passed to the prompt twice
        existing_summary = self.messages[0].content if self.messages else ""
        summary_prompt = ChatPromptTemplate.from_messages(
            [
                SystemMessagePromptTemplate.from_template(
                    "Given the existing conversation summary and the new messages, "
                    "generate a new summary of the conversation. Ensuring to maintain "
                    "as much relevant information as possible BUT keep the summary "
                    "concise and no more than a short paragraph in length."
                ),
                HumanMessagePromptTemplate.from_template(
                    "Existing conversation summary:\n{existing_summary}\n\n"
                    "New messages:\n{messages}"
                )
            ]
        )
        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                existing_summary=existing_summary, messages=messages
            )
        )
        # the history now consists of the updated summary only
        self.messages = [SystemMessage(content=new_summary.content)]
    
    def clear(self) -> None:
        self.messages = []
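
To follow the control flow without calling a model, the sketch below swaps the LLM for a hypothetical `fake_summarize` function that simply keeps the first three words of each message. Only the mechanics are meant to carry over: after every update, the history collapses to a single summary entry. `RollingSummary` is likewise an illustrative name, not part of LangChain.

```python
def fake_summarize(existing_summary: str, new_messages: list[str]) -> str:
    # stand-in for the LLM call: keep the first three words of each new message
    snippets = [" ".join(m.split()[:3]) for m in new_messages]
    return " / ".join(filter(None, [existing_summary, *snippets]))


class RollingSummary:
    def __init__(self) -> None:
        self.messages: list[str] = []

    def add_messages(self, new_messages: list[str]) -> None:
        existing = self.messages[0] if self.messages else ""
        # the whole history is replaced by one updated summary
        self.messages = [fake_summarize(existing, new_messages)]


history = RollingSummary()
history.add_messages(["Hi, my name is Josh", "Hello Josh, I am Zeta"])
history.add_messages(["What is conversational memory?", "It stores past turns"])
print(history.messages)
```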

In [18]:
chat_map = {}
def get_chat_history(session_id: str, llm: ChatOllama) -> ConversationSummaryMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = ConversationSummaryMessageHistory(llm=llm)
    return chat_map[session_id]

In [19]:
pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="The session ID to use for the chat history",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="llm",
            annotation=ChatOllama,
            name="LLM",
            description="The LLM to use for the conversation summary",
            default=llm,
        )
    ]
)

In [20]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is Josh"},
    config={"configurable": {"session_id": "id_123", "llm": llm}}
)

AIMessage(content="<think>\nOkay, so I'm trying to figure out how to solve this equation: 3x + 5 = 20. Hmm, let me think about it step by step.\n\nFirst, I remember that when solving equations, the goal is to isolate the variable, which in this case is x. So, I need to get x by itself on one side of the equation.\n\nLooking at the equation, there are two operations being performed on x: multiplication and addition. Specifically, 3x means x is multiplied by 3, and then 5 is added to that product. To reverse these operations, I should do them in the opposite order. That usually means dealing with the addition first before handling the multiplication.\n\nSo, let me start by subtracting 5 from both sides of the equation. If I subtract 5 from the left side, it becomes 3x + 5 - 5, which simplifies to 3x. On the right side, subtracting 5 gives me 20 - 5, which is 15. So now my equation looks like this: 3x = 15.\n\nNext, I need to get rid of that 3 that's multiplied by x. To do that, I should 

In [21]:
chat_map["id_123"].messages

[SystemMessage(content='<think>\n\n**Final Answer:** \\\\(x = 5\\\\)', additional_kwargs={}, response_metadata={})]
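
The stored summary above still carries the model's `<think>` marker, because the model used here emits its reasoning inline with the answer. One option is to strip that text before storing the summary. The helper below is a minimal sketch: `strip_think` is a hypothetical name, and it assumes the reasoning is delimited by `<think>` and `</think>` tags as in the outputs shown in this notebook.

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_think(text: str) -> str:
    """Remove closed <think>...</think> blocks and any stray opening tag."""
    without_blocks = THINK_BLOCK.sub("", text)
    return without_blocks.replace("<think>", "").strip()


raw = "<think>\nThe user is called Josh.\n</think>\n\nJosh introduced himself."
cleaned = strip_think(raw)
unclosed = strip_think("<think>\n\n**Final Answer:** x = 5")
print(cleaned)
print(unclosed)
```

In the summary class, such a helper could be applied to `new_summary.content` before it is wrapped in a `SystemMessage`.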

## ConversationSummaryBufferMemory with RunnableWithMessageHistory

Our final memory type acts as a combination of `ConversationSummaryMemory` and `ConversationBufferMemory`. It keeps the most recent part of the conversation in a buffer (the original class used a token limit; the implementation below keeps the last k messages), and anything beyond that limit is summarized and then dropped from the buffer, producing something like:

```text
# ~~ a summary of previous interactions
The user named Josh introduced himself and the AI responded, introducing itself as an AI model called Zeta.
Josh then said he was researching the different types of conversational memory and Zeta asked for some
examples.
# ~~ the most recent messages
Human: I have been looking at ConversationBufferMemory and ConversationBufferWindowMemory.
AI: That's interesting, what's the difference?
Human: Buffer memory just stores the entire conversation
AI: That makes sense, what about ConversationBufferWindowMemory?
Human: Buffer window memory stores the last k messages, dropping the rest.
AI: Very cool!
```
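
The mechanics behind this layout can be sketched with two slices: everything except the last k messages goes to the summarizer, and the last k messages stay verbatim. The example below is plain Python with a hypothetical `fake_summarize` standing in for the LLM call, and it assumes `k` is at least 1.

```python
def fake_summarize(messages: list[str]) -> str:
    # stand-in for the LLM: just report how many messages were condensed
    return f"[summary of {len(messages)} older messages]"


def summary_buffer(messages: list[str], k: int) -> list[str]:
    if len(messages) <= k:
        return list(messages)  # nothing to summarize yet
    old, recent = messages[:-k], messages[-k:]
    return [fake_summarize(old)] + recent


conversation = [f"message {i}" for i in range(1, 11)]
buffer = summary_buffer(conversation, k=6)
print(buffer)
```

With ten messages and `k=6`, the four oldest messages are condensed into one summary entry and the six most recent are kept as they are, which mirrors the layout shown above.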

In [22]:
class ConversationSummaryBufferMessageHistory(BaseChatMessageHistory, BaseModel):
    messages: list[BaseMessage] = Field(default_factory=list)
    llm: ChatOllama = Field(default_factory=ChatOllama)
    k: int = Field(default_factory=int)

    def __init__(self, llm: ChatOllama, k: int):
        super().__init__(llm=llm, k=k)

    def add_messages(self, messages: list[BaseMessage]) -> None:
        existing_summary = None
        old_messages = None
        # see if we already have a summary message
        if len(self.messages) > 0 and isinstance(self.messages[0], SystemMessage):
            print(">> Found existing summary")
            # keep only the text of the summary, not the message object
            existing_summary = self.messages.pop(0).content
        # add the new messages to the history
        self.messages.extend(messages)
        # check if we have too many messages
        if len(self.messages) > self.k:
            print(
                f">> Found {len(self.messages)} messages, dropping "
                f"oldest {len(self.messages) - self.k} messages.")
            # pull out the oldest messages (everything before the last k)...
            old_messages = self.messages[:-self.k]
            # ...and keep only the most recent messages
            self.messages = self.messages[-self.k:]
        if old_messages is None:
            print(">> No old messages to update summary with")
            # nothing to summarize; restore any summary we popped above
            if existing_summary is not None:
                self.messages.insert(0, SystemMessage(content=existing_summary))
            return
        # construct the summary chat messages
        summary_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(
                "Given the existing conversation summary and the new messages, "
                "generate a new summary of the conversation. Ensuring to maintain "
                "as much relevant information as possible BUT keep the summary "
                "concise and no more than a short paragraph in length."
            ),
            HumanMessagePromptTemplate.from_template(
                "Existing conversation summary:\n{existing_summary}\n\n"
                "New messages:\n{old_messages}"
            )
        ])
        # format the messages and invoke the LLM
        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                existing_summary=existing_summary,
                old_messages=old_messages
            )
        )
        print(f">> New summary: {new_summary.content}")
        # prepend the new summary to the history
        self.messages = [SystemMessage(content=new_summary.content)] + self.messages

    def clear(self) -> None:
        """Clear the history."""
        self.messages = []

In [23]:
chat_map = {}
def get_chat_history(
    session_id: str,
    llm: ChatOllama,
    k: int
) -> ConversationSummaryBufferMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = ConversationSummaryBufferMessageHistory(llm=llm, k=k)
    return chat_map[session_id]

In [24]:
pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="The session ID to use for the chat history",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="llm",
            annotation=ChatOllama,
            name="LLM",
            description="The LLM to use for the conversation summary",
            default=llm,
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="k",
            description="The number of messages to keep in the history",
            default=4,
        )
    ]
)

In [25]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is Josh"},
    config={"configurable": {"session_id": "id_123", "llm": llm, "k": 4}}
)
chat_map["id_123"].messages

>> No old messages to update summary with


[HumanMessage(content='Hi, my name is Josh', additional_kwargs={}, response_metadata={}),
 AIMessage(content="<think>\nOkay, so I'm trying to figure out how to solve this equation: 3x + 5 = 20. Hmm, let me think about it step by step.\n\nFirst, I remember that when solving equations, the goal is to isolate the variable, which in this case is x. So, I need to get x by itself on one side of the equation.\n\nLooking at the equation, there are two operations being performed on x: multiplication and addition. Specifically, 3x means x is multiplied by 3, and then 5 is added to that product. To reverse these operations, I should do them in the opposite order. That usually means dealing with the addition first before handling the multiplication.\n\nSo, let me start by subtracting 5 from both sides of the equation. If I subtract 5 from the left side, it becomes 3x + 5 - 5, which simplifies to 3x. On the right side, subtracting 5 gives me 20 - 5, which is 15. So now my equation looks like this: 