# LangChain: Memory
- The Memory module enables the language model to have memory and context, allowing the LLM to make informed decisions.
- It allows the model to remember user inputs, system responses, and any other relevant information.
- The stored information can be accessed and utilized during subsequent interactions.


## Outline
* Importing the Necessary Libraries and other setups 
* ConversationBufferMemory
* ConversationBufferWindowMemory
* ConversationTokenBufferMemory
* ConversationSummaryMemory

*Note: The output produced by LLMs might vary*

## Importing the Necessary Libraries and other setups

In [21]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

import warnings
warnings.filterwarnings('ignore')

In [22]:
# account for deprecation of LLM model
import datetime
# Get the current date
current_date = datetime.datetime.now().date()

# Define the date after which the model should be set to "gpt-3.5-turbo"
target_date = datetime.date(2024, 6, 12)

# Set the model variable based on the current date
if current_date > target_date:
    llm_model = "gpt-3.5-turbo"
else:
    llm_model = "gpt-3.5-turbo-0301"

In [23]:
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationTokenBufferMemory
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.memory import ConversationBufferWindowMemory
from langchain.memory import ConversationSummaryBufferMemory

## ConversationBufferMemory

- Conversational memory is how a chatbot can respond to multiple queries in a chat-like manner. It enables a coherent conversation, and without it, every query would be treated as an entirely independent input without considering past interactions.
- The `ConversationBufferMemory` is the most straightforward conversational memory in LangChain. In this scenario the raw input of the past conversation between the user and AI is passed — in its raw form — to the {history} parameter.

- **Pros:**
    - Storing everything gives the LLM the maximum amount of information
    - Storing everything is simple and intuitive
- **Cons:**
    - More tokens mean slowing response times and higher costs
    - Long conversations cannot be remembered as we hit the LLM token limit (4096 tokens for text-davinci-003 and gpt-3.5-turbo)

In [24]:
llm = ChatOpenAI(temperature=0.0, model=llm_model)
memory = ConversationBufferMemory() # Initialized the langchain.memory.buffer.ConversationBufferMemory object
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True # This parameter will describe everything that is going on with LLM while interactions
)

In [25]:
conversation.predict(input="Hi, my name is Sonal")



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hi, my name is Sonal
AI:[0m

[1m> Finished chain.[0m


"Hello Sonal, it's nice to meet you. My name is AI. How can I assist you today?"

In [26]:
conversation.predict(input="What is 1+1?")



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Sonal
AI: Hello Sonal, it's nice to meet you. My name is AI. How can I assist you today?
Human: What is 1+1?
AI:[0m

[1m> Finished chain.[0m


'The answer to 1+1 is 2.'

In [27]:
conversation.predict(input="What is my name?")



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Sonal
AI: Hello Sonal, it's nice to meet you. My name is AI. How can I assist you today?
Human: What is 1+1?
AI: The answer to 1+1 is 2.
Human: What is my name?
AI:[0m

[1m> Finished chain.[0m


'Your name is Sonal, as you mentioned earlier.'

As per the above cells it is clearly seen that all the responses have been stored in the Memory and is used by the LLM to respond to the next question that was asked by the user.

In [29]:
print(memory.buffer) 
# This is a string object which stores the conversation that has happened between Human and AI

Human: Hi, my name is Sonal
AI: Hello Sonal, it's nice to meet you. My name is AI. How can I assist you today?
Human: What is 1+1?
AI: The answer to 1+1 is 2.
Human: What is my name?
AI: Your name is Sonal, as you mentioned earlier.


In [30]:
memory.load_memory_variables({}) 
# This contains a dictinory with 'history' as the key and the value is the string of interaction.

{'history': "Human: Hi, my name is Sonal\nAI: Hello Sonal, it's nice to meet you. My name is AI. How can I assist you today?\nHuman: What is 1+1?\nAI: The answer to 1+1 is 2.\nHuman: What is my name?\nAI: Your name is Sonal, as you mentioned earlier."}

### Initializing ConverstionalBufferMemory object to pass to any ConversationChain

In [31]:
memory = ConversationBufferMemory()

In [33]:
memory.save_context({"input": "Hi"}, 
                    {"output": "What's up"})

In [34]:
print(memory.buffer)

Human: Hi
AI: What's up


In [35]:
memory.load_memory_variables({})

{'history': "Human: Hi\nAI: What's up"}

In [36]:
memory.save_context({"input": "Not much, just hanging"}, 
                    {"output": "Cool"})

In [37]:
memory.load_memory_variables({})

{'history': "Human: Hi\nAI: What's up\nHuman: Not much, just hanging\nAI: Cool"}

The `ConversationBufferMemory` is an excellent option to get started with but is limited by the storage of every interaction. Let’s take a look at other options that help remedy this.

## ConversationBufferWindowMemory

- The `ConversationBufferWindowMemory` acts in the same way as our earlier “buffer memory” but adds a window to the memory. Meaning that we only keep a given number of past interactions before “forgetting” them.
- In this instance, we set k=1 — this means the window will remember the single latest interaction between the human and AI i.e., the latest human response and the latest AI response.
- That is the latest human response and the latest AI response.
- Although this method isn’t suitable for remembering distant interactions, it is good at limiting the number of tokens being used — a number that we can increase/decrease depending on our needs

In [39]:
memory = ConversationBufferWindowMemory(k=1)               

In [40]:
memory.save_context({"input": "Hi"},
                    {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})


In [41]:
memory.load_memory_variables({})

{'history': 'Human: Not much, just hanging\nAI: Cool'}

In [42]:
llm = ChatOpenAI(temperature=0.0, model=llm_model)
memory = ConversationBufferWindowMemory(k=1)
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=False
)

In [46]:
conversation.predict(input="Hi, my name is Sonal")

"Hello Sonal! It's nice to meet you. Is there anything specific you would like to talk about or ask me?"

In [47]:
conversation.predict(input="What is 1+1?")

'1+1 is equal to 2.'

In [48]:
conversation.predict(input="What is my name?")

"I'm sorry, I don't have access to that information. Could you please tell me your name?"

When the value of `ConversationBufferWindowMemory(k=1)`:
- By the end of the conversation, when we ask *"What is my Name?"*, the answer to this was contained in the human response two interactions ago. 
- As we only kept the most recent interaction *(k=1)*, the model had forgotten and could not give the correct answer.

## ConversationTokenBufferMemory

- `ConversationTokenBufferMemory` keeps a buffer of recent interactions in memory, and uses token length rather than number of interactions to determine when to flush interactions.
- In the further cells there are two `max_token_limit` are set `25` and `50` and it can be seen that they produced two different outputs.

In [49]:
#!pip install tiktoken

In [50]:
llm = ChatOpenAI(temperature=0.0, model=llm_model)

In [59]:
memory_25 = ConversationTokenBufferMemory(llm=llm, max_token_limit=25)
memory_25.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory_25.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory_25.save_context({"input": "Chatbots are what?"}, 
                    {"output": "Charming!"})

In [60]:
memory_25.load_memory_variables({})

{'history': 'Human: Chatbots are what?\nAI: Charming!'}

In [61]:
memory_50 = ConversationTokenBufferMemory(llm=llm, max_token_limit=50)
memory_50.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory_50.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory_50.save_context({"input": "Chatbots are what?"}, 
                    {"output": "Charming!"})

In [62]:
memory_50.load_memory_variables({})

{'history': 'AI: Amazing!\nHuman: Backpropagation is what?\nAI: Beautiful!\nHuman: Chatbots are what?\nAI: Charming!'}

## ConversationSummaryBufferMemory
- The `ConversationSummaryBufferMemory` is a mix of the `ConversationSummaryMemory` and the `ConversationBufferWindowMemory`.
- It summarizes the earliest interactions in a conversation while maintaining the max_token_limit most recent tokens in their conversation. 
- When applying this to our earlier conversation, we can set max_token_limit to a small number and yet the LLM the details of the conversation.
- This is because that information is captured by the “summarization” component of the memory, despite being missed by the “buffer window” component.
- **Pros:**
    - Summarizer means we can remember distant interactions.
    - Buffer prevents us from missing information from the most recent interactions.
- **Cons:**
    - Summarizer increases token count for shorter conversations
    - Storing the raw interactions — even if just the most recent interactions — increases token count

In [74]:
# create a long string
schedule = "There is a meeting at 8am with your product team. \
You will need your powerpoint presentation prepared. \
9am-12pm have time to work on your LangChain \
project which will go quickly because Langchain is such a powerful tool. \
At Noon, lunch at the italian resturant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."

memory_25 = ConversationSummaryBufferMemory(llm=llm, max_token_limit=25)
memory_25.save_context({"input": "Hello"}, {"output": "What's up"})
memory_25.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory_25.save_context({"input": "What is on the schedule today?"}, 
                    {"output": f"{schedule}"})

In [75]:
memory_100.load_memory_variables({})

{'history': "System: The human and AI engage in small talk before discussing the day's schedule. The AI informs the human of a morning meeting with the product team, time to work on the LangChain project, and a lunch meeting with a customer interested in the latest AI developments."}

In [76]:
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True
)

In [77]:
conversation.predict(input="What would be a good demo to show?")



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
System: The human and AI engage in small talk before discussing the day's schedule. The AI informs the human of a morning meeting with the product team, time to work on the LangChain project, and a lunch meeting with a customer interested in the latest AI developments.
Human: What would be a good demo to show?
AI:[0m

[1m> Finished chain.[0m


"Based on the customer's interest in AI developments, I would suggest showcasing our latest natural language processing capabilities. We could demonstrate how our AI can understand and respond to complex language queries, and even provide personalized recommendations based on the user's preferences. Additionally, we could highlight our machine learning algorithms and how they continuously improve the accuracy and efficiency of our AI. Would you like me to prepare a demo for the meeting?"

In [78]:
memory.load_memory_variables({})

{'history': "System: The human and AI engage in small talk before discussing the day's schedule. The AI informs the human of a morning meeting with the product team, time to work on the LangChain project, and a lunch meeting with a customer interested in the latest AI developments. The human asks what would be a good demo to show.\nAI: Based on the customer's interest in AI developments, I would suggest showcasing our latest natural language processing capabilities. We could demonstrate how our AI can understand and respond to complex language queries, and even provide personalized recommendations based on the user's preferences. Additionally, we could highlight our machine learning algorithms and how they continuously improve the accuracy and efficiency of our AI. Would you like me to prepare a demo for the meeting?"}