# LangChain: Memory

## Outline
* ConversationBufferMemory
* ConversationBufferWindowMemory
* ConversationTokenBufferMemory
* ConversationSummaryMemory

## ConversationBufferMemory

In [2]:
# Importing my OpenAI API key
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

import warnings
warnings.filterwarnings('ignore')

Note: LLMs do not always produce the same results. When you execute the code in your notebook, you may get slightly different answers than those in the lecture.

In [3]:
# account for deprecation of LLM model
import datetime
# Get the current date
current_date = datetime.datetime.now().date()

# Define the date after which the model should be set to "gpt-3.5-turbo"
target_date = datetime.date(2025, 8, 18)

# Set the model variable based on the current date
if current_date > target_date:
    llm_model = "gpt-3.5-turbo"
else:
    llm_model = "gpt-3.5-turbo-0301"

In [4]:
# Importing a few tools that I'll need.
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

### We will build a chatbot application with LangChain to learn the Memory concept
As the motivating example for memory, let's use LangChain to manage a chatbot conversation.


In [5]:
# Use OpenAI's chat model as the LLM, with temperature set to 0.
llm = ChatOpenAI(temperature=0.0, model=llm_model)

# Set up one specific type of memory: ConversationBufferMemory.
# What this means will become clear in the cells below.
memory = ConversationBufferMemory()

# Don't worry about ConversationChain for now; it is covered later.
conversation = ConversationChain(
    llm=llm, 
    memory=memory,
    verbose=False  # set to True to print the prompt LangChain builds
)


In [6]:
# start the conversation with an input question or greetings or anything 
# and let's see what it says. 
conversation.predict(input="Hi, my name is Ankit")

"Hello Ankit! It's nice to meet you. How can I assist you today?"

In [7]:
# Another turn in the conversation, using the same conversation object.
# It should answer something like `1+1=2`.
conversation.predict(input="What is 1+1?")

'1+1 equals 2. Is there anything else you would like to know?'

In [8]:
# Now ask a question that depends on an earlier turn: what's my name?
conversation.predict(input="What is my name?")

'Your name is Ankit.'

If you want to see what LangChain is actually doing, set `verbose` to `True`:
```
conversation = ConversationChain(
    llm=llm, 
    memory=memory,
    verbose=True  # print the prompt sent to the LLM
)
```

Re-run the cells above with that setting.

When you run `predict` with "Hi, my name is Ankit", look at the prompt LangChain generates in the verbose output.

Do the same for the second and third turns of the conversation. Notice that by the time you ask "What is my name?", the prompt already contains the conversation so far: "Hi, my name is Ankit", "What is 1+1?", and the AI's replies.
This history of the conversation gets longer with every turn.

In [9]:
# The `memory` object holds the conversation.
# Printing memory.buffer shows the transcript stored so far.
print(memory.buffer)

Human: Hi, my name is Ankit
AI: Hello Ankit! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI: 1+1 equals 2. Is there anything else you would like to know?
Human: What is my name?
AI: Your name is Ankit.


In [10]:
# This is what LangChain has remembered in its memory so far. 
# Including what Human or AI has said. 
memory.load_memory_variables({})

{'history': "Human: Hi, my name is Ankit\nAI: Hello Ankit! It's nice to meet you. How can I assist you today?\nHuman: What is 1+1?\nAI: 1+1 equals 2. Is there anything else you would like to know?\nHuman: What is my name?\nAI: Your name is Ankit."}

### Task: run the code up to here.

### Re-initialising ConversationBufferMemory with a side example

In [11]:
# recreating the memory `ConversationBufferMemory`
memory = ConversationBufferMemory()

In [12]:
# this is how we can add more context to the memory 
memory.save_context({"input": "Hi"}, 
                    {"output": "What's up"})

In [13]:
print(memory.buffer)

Human: Hi
AI: What's up


In [14]:
# print and see the memory
memory.load_memory_variables({})

{'history': "Human: Hi\nAI: What's up"}

In [15]:
# adding more stuff in the memory 
memory.save_context({"input": "Not much, just hanging"}, 
                    {"output": "Cool"})

In [16]:
# print and see the memory again
memory.load_memory_variables({})

{'history': "Human: Hi\nAI: What's up\nHuman: Not much, just hanging\nAI: Cool"}
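
To make the `save_context` / `load_memory_variables` round trip concrete, here is a minimal pure-Python stand-in for a buffer memory. `BufferMemorySketch` is a hypothetical class written only for illustration, not LangChain's implementation; it simply mimics the `Human:` / `AI:` formatting seen in the outputs above.

```python
class BufferMemorySketch:
    """Hypothetical stand-in for a buffer memory (not LangChain's code)."""

    def __init__(self):
        self.turns = []  # list of (human_text, ai_text) pairs

    def save_context(self, inputs, outputs):
        # Store one exchange, using the same {"input": ...}, {"output": ...} shape as above.
        self.turns.append((inputs["input"], outputs["output"]))

    def load_memory_variables(self, _inputs):
        # Join every stored exchange into one newline-separated transcript.
        lines = []
        for human, ai in self.turns:
            lines.append(f"Human: {human}")
            lines.append(f"AI: {ai}")
        return {"history": "\n".join(lines)}


memory_sketch = BufferMemorySketch()
memory_sketch.save_context({"input": "Hi"}, {"output": "What's up"})
memory_sketch.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
print(memory_sketch.load_memory_variables({}))
```

Nothing is ever discarded in this sketch, which is why the transcript keeps growing with each exchange.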

### Using an LLM for a conversation (go to slide #10)
- The LLM is stateless: it does not remember the conversation you have had so far.
- Each transaction with the API endpoint is independent.
- Chatbots appear to have memory only because wrapper code provides the full conversation so far as `context`.
- Memory is passed as additional context to the LLM so that it can generate output that reads like the next message in the conversation.
- As the conversation gets longer, the amount of history to store and resend grows, so the cost of sending those tokens to the LLM grows too.
- So LangChain provides several kinds of memory to store and accumulate the conversation.
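
The points above can be sketched without calling any API. In this sketch `fake_llm` is a hypothetical stateless function standing in for a real model: it only sees the prompt string it is given. `chat_turn` plays the role of the wrapper code that resends the whole transcript on each turn, and the word count is only a crude proxy for tokens (an assumption, since real tokenizers count differently).

```python
def fake_llm(prompt):
    """Hypothetical stateless 'model': it can only use what is in this prompt."""
    marker = "my name is "
    if "What is my name?" in prompt and marker in prompt:
        # Recover the name from earlier text contained in the prompt itself.
        name = prompt.split(marker, 1)[1].split("\n", 1)[0].strip()
        return f"Your name is {name}."
    if "What is my name?" in prompt:
        return "I don't know your name."
    return "OK."


def chat_turn(history, user_message):
    """Wrapper: resend the full history plus the new message on every call."""
    prompt = "\n".join(history + [f"Human: {user_message}", "AI:"])
    reply = fake_llm(prompt)
    history.extend([f"Human: {user_message}", f"AI: {reply}"])
    return reply, len(prompt.split())  # word count as a crude token proxy


history = []
sizes = []
for message in ["Hi, my name is Ankit", "What is 1+1?", "What is my name?"]:
    reply, size = chat_turn(history, message)
    sizes.append(size)

print(reply)  # answer produced with the history included in the prompt
print(sizes)  # prompt size per turn
print(fake_llm("Human: What is my name?\nAI:"))  # same question, no history
```

The prompt size grows on every turn, which is the cost problem the other memory types in this notebook try to limit.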

## ConversationBufferWindowMemory
Prevents the memory from growing without limit by keeping only the most recent exchanges.

In [None]:
# new type of memory 
from langchain.memory import ConversationBufferWindowMemory

In [None]:
# k=1 means we want to remember just one conversational exchange.
# This prevents the memory from growing without limit.
# In practice, k would be set to a somewhat larger number.
memory = ConversationBufferWindowMemory(k=1)               

In [None]:
# add some context into the memory as before. 
memory.save_context({"input": "Hi"},
                    {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})

In [None]:
# remembers the most recent conversation window
memory.load_memory_variables({})
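
With `k=1`, the cell above should return only the last exchange. The windowing idea can be sketched in plain Python with a bounded `deque`. `WindowMemorySketch` is a hypothetical stand-in for illustration, not LangChain's implementation.

```python
from collections import deque


class WindowMemorySketch:
    """Hypothetical stand-in: keeps only the last k exchanges."""

    def __init__(self, k):
        # A deque with maxlen=k silently discards the oldest exchange when full.
        self.turns = deque(maxlen=k)

    def save_context(self, inputs, outputs):
        self.turns.append((inputs["input"], outputs["output"]))

    def load_memory_variables(self, _inputs):
        lines = [f"Human: {human}\nAI: {ai}" for human, ai in self.turns]
        return {"history": "\n".join(lines)}


window = WindowMemorySketch(k=1)
window.save_context({"input": "Hi"}, {"output": "What's up"})
window.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
print(window.load_memory_variables({}))
```

The first exchange ("Hi" / "What's up") has been dropped, so a chain using this kind of memory could no longer answer questions about it.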

In [None]:
llm = ChatOpenAI(temperature=0.0, model=llm_model)
memory = ConversationBufferWindowMemory(k=1)
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=False # Try changing this to True and re-run this snippet
)

In [None]:
# let's re-run the conversation that we had just now. 
conversation.predict(input="Hi, my name is Ankit")

In [None]:
# same question
conversation.predict(input="What is 1+1?")

In [None]:
# last question 
conversation.predict(input="What is my name?")

- Task: pause and run this section's code 
- hint: try to change `verbose` and `temperature` and `k` values and see the effect. 

## ConversationTokenBufferMemory

This memory limits the number of tokens it stores.

Most LLM providers charge by the number of input and output tokens, so a token limit maps directly to cost.

In [None]:
#!pip install tiktoken

In [None]:
from langchain.memory import ConversationTokenBufferMemory
# ChatOpenAI was imported earlier; the memory uses the LLM to count tokens.
llm = ChatOpenAI(temperature=0.0, model=llm_model)

In [None]:
# Experiment with the max_token_limit parameter to see how a small value
# such as 25 or 30 behaves differently from a larger one such as 100 or 200,
# and check how the output differs.

# Important: llm is passed as an argument because different LLMs use different tokenizers.
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=50)
memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"}, 
                    {"output": "Charming!"})

### observe the variations in output by setting 

max_token_limit = 30 

max_token_limit = 50 

max_token_limit = 100

In [None]:
# check what it outputs by printing the memory buffer like so! 
memory.load_memory_variables({})
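
To see why different limits keep different amounts of history, the pruning idea can be sketched in plain Python. This is a sketch under stated assumptions: `count_tokens` counts whitespace-separated words (a real tokenizer, chosen via the `llm` argument, would give different numbers), `prune_to_limit` is a hypothetical helper, and the limits 6, 12, and 100 are picked to suit this word-based count rather than taken from the cells above.

```python
def count_tokens(text):
    # Assumption: one whitespace-separated word counts as one token.
    return len(text.split())


def prune_to_limit(messages, max_token_limit):
    """Drop the oldest messages until the total fits within the limit."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_token_limit:
        kept.pop(0)  # discard the oldest message first
    return kept


messages = [
    "Human: AI is what?!", "AI: Amazing!",
    "Human: Backpropagation is what?", "AI: Beautiful!",
    "Human: Chatbots are what?", "AI: Charming!",
]

for limit in (6, 12, 100):
    print(limit, prune_to_limit(messages, limit))
```

A small limit keeps only the tail of the conversation, while a limit larger than the whole transcript keeps everything.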

- Task: pause and run this section's code 
- hint: try changing the prompts and `max_token_limit` and see the effect. 

## ConversationSummaryMemory
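Uses an LLM to write a summary of the conversation so far and keeps that summary as the memory.

Compared with keeping a fixed number of tokens, a fixed number of message exchanges, or the whole conversation, this is a smarter way of preserving the memory. The code below uses `ConversationSummaryBufferMemory`, which keeps recent messages verbatim and summarises older ones once `max_token_limit` is exceeded.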

In [None]:
from langchain.memory import ConversationSummaryBufferMemory

In [None]:
# create a long string with someone's schedule 
schedule = "There is a meeting at 8am with your product team. \
You will need your powerpoint presentation prepared. \
9am-12pm have time to work on your LangChain \
project which will go quickly because Langchain is such a powerful tool. \
At noon, lunch at the Italian restaurant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."

# Depending on max_token_limit, the memory either stores the entire
# conversation or summarises the older part with the given LLM and
# keeps the summary in its place.
# Try max_token_limit = 400, then 100 or 50, and compare
# the outputs of `memory.load_memory_variables({})`.
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=50)
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"}, 
                    {"output": f"{schedule}"})

### observe the variations in output by setting 

max_token_limit = 400  (it should store the full conversation)

max_token_limit = 100 (it should summarise)

In [None]:
# print what's in the memory like so! 
memory.load_memory_variables({})
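
The store-or-summarise decision described above can be sketched in plain Python. Everything here is an assumption made for illustration: `SummaryBufferSketch` is a hypothetical class, `count_tokens` counts whitespace-separated words instead of real tokens, and `summarize` is a placeholder for the LLM call that would write an actual summary.

```python
def count_tokens(text):
    # Assumption: whitespace-separated words as a crude token count.
    return len(text.split())


def summarize(previous_summary, dropped):
    # Placeholder for the LLM summarisation call: keep the first few words of each message.
    snippets = [" ".join(m.split()[:4]) + "..." for m in dropped]
    return " ".join([previous_summary] + snippets).strip()


class SummaryBufferSketch:
    """Hypothetical stand-in: recent messages verbatim, older ones summarised."""

    def __init__(self, max_token_limit):
        self.max_token_limit = max_token_limit
        self.summary = ""
        self.recent = []

    def save_context(self, inputs, outputs):
        self.recent += [f"Human: {inputs['input']}", f"AI: {outputs['output']}"]
        dropped = []
        # Move the oldest messages out of the verbatim buffer while it is over the limit.
        while self.recent and sum(count_tokens(m) for m in self.recent) > self.max_token_limit:
            dropped.append(self.recent.pop(0))
        if dropped:
            self.summary = summarize(self.summary, dropped)

    def load_memory_variables(self, _inputs):
        parts = ([f"System: {self.summary}"] if self.summary else []) + self.recent
        return {"history": "\n".join(parts)}


long_reply = ("There is a meeting at 8am with your product team. "
              "You will need your powerpoint presentation prepared.")

large, small = SummaryBufferSketch(400), SummaryBufferSketch(20)
for mem in (large, small):
    mem.save_context({"input": "Hello"}, {"output": "What's up"})
    mem.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
    mem.save_context({"input": "What is on the schedule today?"}, {"output": long_reply})

print(large.load_memory_variables({})["history"])
print(small.load_memory_variables({})["history"])
```

With the large limit the whole conversation is kept verbatim and the summary stays empty; with the small limit the older messages are folded into the summary and only the most recent message remains in full.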

In [None]:
# let's show the conversation on top of this, now. 
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True
)

In [None]:
conversation.predict(input="What would be a good demo to show?")

In [None]:
memory.load_memory_variables({})

- Task: pause and run this section's code 
- hint: try changing `max_token_limit` and see the effect. 

Go to slides #11 and #12.