In this notebook we are going explore how the different ways to add memory to our LLM model.

As first step, we create the chat model.

In [2]:
from langchain_community.chat_models import ChatOllama

chat_model = ChatOllama(model="qwen2.5:3b", temperature=0.0)

The objects that we are going to explore are the these:

-   **ConversationBufferMemory**: Keeps the entire conversation history in memory.
-   **ConversationBufferWindowMemory**: Keeps only the last few exchanges (a sliding window of recent messages).
-   **ConversationTokenBufferMemory**: Keeps messages up to a token limit,  emptying the whole memory when the limit is reached.
-   **ConversationSummaryMemory**: Keeps a summary of previous dialogue, not the full text, to preserve key context efficiently.

The first object we use to set the memory of the model is the class `ConversationBufferMemory` class. Using this object, we add a memory buffer to out chat pipeline, able to converse the knowldge of previous messages.

In additino, we are going to use the class `ConversationChain` class as auxiliar object to create a chat with memory context. It simplifies the use of chain.

In [6]:
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

memory = ConversationBufferMemory()
conversation = ConversationChain(llm=chat_model, memory=memory, verbose=True)
response = conversation.predict(input="Hi, my name is Alejandro.")
print(response)
response = conversation.predict(input="What is my name?")
print(response)



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hi, my name is Alejandro.
AI:[0m

[1m> Finished chain.[0m
Hello Alejandro! It's nice to meet you. How can I assist you today? Are there any specific topics or questions you'd like to discuss? Whether it's about technology, general knowledge, or just a casual chat, feel free to share what interests you.


[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversatio

We can access to the memory buffer, and observe the stored messages.

In [8]:
print(memory.buffer)
print(memory.load_memory_variables({}))

Human: Hi, my name is Alejandro.
AI: Hello Alejandro! It's nice to meet you. How can I assist you today? Are there any specific topics or questions you'd like to discuss? Whether it's about technology, general knowledge, or just a casual chat, feel free to share what interests you.
Human: What is my name?
AI: Your name is Alejandro.
{'history': "Human: Hi, my name is Alejandro.\nAI: Hello Alejandro! It's nice to meet you. How can I assist you today? Are there any specific topics or questions you'd like to discuss? Whether it's about technology, general knowledge, or just a casual chat, feel free to share what interests you.\nHuman: What is my name?\nAI: Your name is Alejandro."}


We can also add memory context to the memory buffer.

In [10]:
memory = ConversationBufferMemory()
memory.save_context({"input": "Hi"}, 
                    {"output": "What's up"})
print(memory.load_memory_variables({}))
memory.save_context({"input": "Not much, just hanging"}, 
                    {"output": "Cool"})
print(memory.load_memory_variables({}))

{'history': "Human: Hi\nAI: What's up"}
{'history': "Human: Hi\nAI: What's up\nHuman: Not much, just hanging\nAI: Cool"}


Another possibility to create a memory is to use the object `ConversationBufferMemory`. We have to set the number of exchanges to keep in memory through the input variable `k`.

In [11]:
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

memory = ConversationBufferMemory(k=1) # Keep only the last exchange
memory.save_context({"input": "Hi, my name is Alejandro."}, {"output": "Hello Alejandro! How can I assist you today?"})
memory.save_context({"input": "My surname is Torres"}, {"output": "Hi Mr.Torres!"})
conversation = ConversationChain(llm=chat_model, memory=memory, verbose=True)
response = conversation.predict(input="What is my name?")
print(response)
response = conversation.predict(input="What is my surname?")
print(response)



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Alejandro.
AI: Hello Alejandro! How can I assist you today?
Human: My surname is Torres
AI: Hi Mr.Torres!
Human: What is my name?
AI:[0m

[1m> Finished chain.[0m
I apologize, but there seems to be a mix-up in our conversation. You mentioned your surname as "Torres," and now you're asking what your name is. Could you please clarify? Are you looking for information about your given name or are you unsure of your full name?


[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specifi

We can observe that the model has forgotten the first exchange, so it will keep the last `k` exchanges.

The next option to be explored is the `ConversationTokenBufferMemory`. We define the limit of tokens to be saved in memory through the variable `max_token_limit`.

In [27]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.57.1-py3-none-any.whl (12.0 MB)
     --------------------------------------- 12.0/12.0 MB 18.2 MB/s eta 0:00:00
Collecting huggingface-hub<1.0,>=0.34.0
  Downloading huggingface_hub-0.35.3-py3-none-any.whl (564 kB)
     ------------------------------------- 564.3/564.3 kB 11.8 MB/s eta 0:00:00
Collecting tokenizers<=0.23.0,>=0.22.0
  Downloading tokenizers-0.22.1-cp39-abi3-win_amd64.whl (2.7 MB)
     ---------------------------------------- 2.7/2.7 MB 24.4 MB/s eta 0:00:00
Collecting safetensors>=0.4.3
  Downloading safetensors-0.6.2-cp38-abi3-win_amd64.whl (320 kB)
     ------------------------------------- 320.2/320.2 kB 19.4 MB/s eta 0:00:00
Installing collected packages: safetensors, huggingface-hub, tokenizers, transformers
Successfully installed huggingface-hub-0.35.3 safetensors-0.6.2 tokenizers-0.22.1 transformers-4.57.1



[notice] A new release of pip available: 22.3 -> 25.2
[notice] To update, run: python.exe -m pip install --upgrade pip


In [None]:
from langchain.memory import ConversationTokenBufferMemory
from langchain.chains import ConversationChain

input_prompt = "eggs, milk, bread, butter. "
chat_model = ChatOllama(model="qwen2.5:3b", temperature=0.0)
num_input_tokens = chat_model.get_num_tokens(input_prompt) + 1 # Add 1 for the END token
print(f"Number of memory input tokens: {num_input_tokens}")
memory = ConversationTokenBufferMemory(llm=chat_model, max_token_limit=num_input_tokens+1) # Keep messages up to a token limit
memory.save_context({"input": "Please, memorize the following shopping list"}, {"output": input_prompt})
conversation = ConversationChain(llm=chat_model, memory=memory, verbose=True)
response = conversation.predict(input="Could you tell me the shopping list?")
print(response)
print(memory.load_memory_variables({}))
extended_prompt = input_prompt + " cheese, tomatoes, lettuce, chicken, rice, pasta, olive oil, coffee, tea, sugar, flour, salt, pepper."
# num_input_tokens = chat_model.get_num_tokens(extended_prompt)
print(f"Number of memory input tokens: {num_input_tokens}")
memory = ConversationTokenBufferMemory(llm=chat_model, max_token_limit=num_input_tokens) # Keep messages up to a token limit
memory.save_context({"input": "Please, memorize the following shopping list"}, {"output": extended_prompt})
conversation = ConversationChain(llm=chat_model, memory=memory, verbose=True)
response = conversation.predict(input="Could you tell me the shopping list?")
print(response)
print(memory.load_memory_variables({}))

Number of memory input tokens: 11


[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
AI: eggs, milk, bread, butter. 
Human: Could you tell me the shopping list?
AI:[0m

[1m> Finished chain.[0m
Certainly! Based on the items you've mentioned—eggs, milk, bread, and butter—I can provide a basic shopping list for your needs:

1. Eggs
2. Milk (you might want to specify whether it's whole milk, skim milk, or another type)
3. Bread (considering you may need multiple types like white, whole wheat, or gluten-free)
4. Butter

Please note that this is a general list and the quantities can vary depending on your household size and dietary preferences. If you have any specific items in mind or if there are ot

As you can observe, it's important to add 1 token to the count to consider the END token. Also, we can see that if the memory buffer is overflowed with the number of tokens, it get's empty.

And the last object that we are going to implement is the class `CoversationSummaryBufferMemory`.

In [43]:
from langchain.memory import ConversationSummaryBufferMemory
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOllama

schedule = "There is a meeting at 8am with your product team. \
You will need your powerpoint presentation prepared. \
9am-12pm have time to work on your LangChain \
project which will go quickly because Langchain is such a powerful tool. \
At Noon, lunch at the italian resturant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."

chat_model = ChatOllama(model="qwen2.5:3b", temperature=0.0)
memory = ConversationSummaryBufferMemory(llm=chat_model, max_token_limit=100)
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"}, 
                    {"output": f"{schedule}"})
conversation = ConversationChain(llm=chat_model, memory=memory, verbose=True)
response = conversation.predict(input="What would be a good demo to show?")
print(response)

  memory = ConversationSummaryBufferMemory(llm=chat_model, max_token_limit=100)




[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
System: The human initiates a casual greeting and then mentions they are not doing anything specific. The AI responds with a casual acknowledgment and then asks about daily schedules.
AI: There is a meeting at 8am with your product team. You will need your powerpoint presentation prepared. 9am-12pm have time to work on your LangChain project which will go quickly because Langchain is such a powerful tool. At Noon, lunch at the italian resturant with a customer who is driving from over an hour away to meet you to understand the latest in AI. Be sure to bring your laptop to show the latest LLM demo.
Human: What would be a good demo to show?
AI:[0m

