## LangChain: Memory ##

### Outline ###
- ConversationBufferMemory
- ConversationBufferWindowMemory
- ConversationTokenBufferMemory
- ConversationSummaryMemory

- When you interact with these models, **they naturally don't remember what you said before or any of the previous conversation**, which is a problem when you're building an application such as a chatbot and want to hold a conversation with it.
- This section covers memory: how to store previous parts of the conversation and feed them into the language model so that the interaction has a conversational flow.
- LangChain offers several options for managing this memory. Let's jump in and take a look.

### ConversationBufferMemory ###
- Use LangChain to manage a chat or chatbot conversation.
    - Set the LLM to the **OpenAI chat interface with temperature 0**, and use a ConversationBufferMemory as the memory.
    - Build a conversation chain.
    - Start the conversation by calling conversation.predict with the input.
    - Set the verbose argument to True to see what LangChain is actually doing.

- The memory holds what LangChain has remembered of the conversation so far: everything the AI and the human have said.
    - LangChain stores the conversation with this **ConversationBufferMemory.**
    - You can also add inputs and outputs to a ConversationBufferMemory explicitly with save_context.
    - To add more data to the memory, keep saving additional context.
        
- when you use a large language model for a chat conversation, **the large language model itself is actually stateless**. 
    - The language model itself does not remember the conversation you've had so far. And each transaction, each call to the API endpoint is independent. 
        - Chatbots appear to have memory only because there's usually wrapper code that provides the full conversation so far as context to the LLM. The memory explicitly stores the turns, or utterances, so far.
        - **This memory storage is used as input or additional context to the LLM** so that it can generate an output as if it were just having the next conversational turn, knowing what has been said before. As the conversation gets long, the amount of memory needed grows, and so does the cost of sending all those tokens to the LLM, which usually charges by the number of tokens it processes.
- So LangChain provides several convenient kinds of memory to **store and accumulate the conversation.** 
    - we've been looking at the ConversationBufferMemory. 
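
The replay idea above can be sketched without any library. This is a minimal illustration, not LangChain's implementation: `fake_llm` and `BufferChat` are hypothetical names, and `fake_llm` is a hard-coded stand-in for a real model so the example runs offline.

```python
from typing import Callable, List, Tuple


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a stateless model: it only sees `prompt`."""
    if "What is my name?" in prompt:
        if "my name is Jessica" in prompt:
            return "Your name is Jessica."
        return "I don't know your name."
    return "Nice to meet you."


class BufferChat:
    """Wrapper code that replays the full transcript on every call."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        self.llm = llm
        self.turns: List[Tuple[str, str]] = []  # (speaker, text) pairs

    def predict(self, user_input: str) -> str:
        self.turns.append(("Human", user_input))
        # The whole history is rebuilt into the prompt each time.
        prompt = "\n".join(f"{who}: {text}" for who, text in self.turns) + "\nAI:"
        reply = self.llm(prompt)
        self.turns.append(("AI", reply))
        return reply


chat = BufferChat(fake_llm)
chat.predict("Hi, my name is Jessica")
print(chat.predict("What is my name?"))          # answered from the replayed history
print(fake_llm("Human: What is my name?\nAI:"))  # same model, no history supplied
```

The model function is identical in both calls; only the prompt differs, which is why the prompt (and its token cost) grows with every turn.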

### ConversationBufferWindowMemory ###

- memory that **only keeps a window of memory**. 
    - Setting memory to ConversationBufferWindowMemory with k=1 means that I want to **remember just one conversational exchange**: one utterance from the user and one from the chatbot.
        - If you now save context and look at memory.load_memory_variables, it only remembers the most recent exchange.
        - Notice that it has dropped `Hi` and `What's up`. It only has the human saying `Not much, just hanging` and the AI saying `Cool`, because k=1.
        - It lets you keep track of just the most recent few conversational turns.
            - In practice, you probably won't use this with k=1.
            - You would use it with k set to a larger number.
            - Still, this prevents the memory from growing without limit as the conversation gets longer.
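
A window of the last k exchanges can be sketched with a bounded deque. This is an illustrative sketch only; `WindowMemory` is a hypothetical class, not the LangChain one, though the method name mirrors `save_context` from the notebook.

```python
from collections import deque
from typing import Deque, Tuple


class WindowMemory:
    """Keeps only the k most recent (human, AI) exchanges."""

    def __init__(self, k: int) -> None:
        # A deque with maxlen discards the oldest item automatically.
        self.exchanges: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save_context(self, user_text: str, ai_text: str) -> None:
        self.exchanges.append((user_text, ai_text))

    def history(self) -> str:
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.exchanges)


memory = WindowMemory(k=1)
memory.save_context("Hi", "What's up")
memory.save_context("Not much, just hanging", "Cool")
print(memory.history())  # only the second exchange remains
```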

### ConversationTokenBufferMemory ###
- With the ConversationTokenBufferMemory, the **memory limits the number of tokens saved.**
    - Because a lot of LLM pricing is based on tokens, this maps more directly to the cost of the LLM calls.
        - Say the max token limit is 50,
        - and the conversation is:
            - AI is what? Amazing.
            - Backpropagation is what? Beautiful.
            - Chatbots are what? Charming.
        - The turns start with A, B, and C in order, so we can keep track of what was said when.
        - With a token limit of 50, the memory holds almost the whole conversation.
        - If I increase the token limit to 100, it holds the whole conversation.
        - If I decrease it, it chops off the earlier parts of the conversation and **retains the most recent exchanges**, subject to not exceeding the token limit.
    - Why did we need to specify an LLM?
        - Because **different LLMs use different ways of counting tokens**, so this tells the memory to count tokens the way the ChatOpenAI LLM does.
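
The pruning rule can be sketched as follows. This is a simplified illustration under stated assumptions: `TokenBufferMemory` is a hypothetical class, and `count_words` (one token per whitespace-separated word) stands in for a real tokenizer. The counter is passed in as a parameter to reflect the point that each model counts tokens its own way.

```python
from typing import Callable, List


def count_words(text: str) -> int:
    """Crude token proxy (assumption): one token per whitespace-separated word."""
    return len(text.split())


class TokenBufferMemory:
    """Drops the oldest messages until the buffer fits the token limit."""

    def __init__(self, max_token_limit: int,
                 count_tokens: Callable[[str], int] = count_words) -> None:
        self.max_token_limit = max_token_limit
        self.count_tokens = count_tokens
        self.messages: List[str] = []

    def _total(self) -> int:
        return sum(self.count_tokens(m) for m in self.messages)

    def save_context(self, user_text: str, ai_text: str) -> None:
        self.messages.append(f"Human: {user_text}")
        self.messages.append(f"AI: {ai_text}")
        # Prune from the front (oldest first), one message at a time.
        while self.messages and self._total() > self.max_token_limit:
            self.messages.pop(0)

    def history(self) -> str:
        return "\n".join(self.messages)


memory = TokenBufferMemory(max_token_limit=8)
memory.save_context("AI is what?", "Amazing!")
memory.save_context("Backpropagation is what?", "Beautiful!")
memory.save_context("Chatbots are what?", "Charming!")
print(memory.history())
```

Because pruning works message by message rather than exchange by exchange, the surviving history can begin with an AI line whose human prompt was already dropped.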

### ConversationSummaryMemory ###
- Instead of limiting the memory to a fixed number of tokens or a fixed number of conversational exchanges, use an LLM to write a summary of the conversation so far, and let that be the memory.

Let's create a ConversationSummaryBufferMemory with a max token limit of 400, a pretty high token limit, and insert a few conversational turns: `Hello`, `What's up`, `Not much, just hanging`, `Cool`. Then comes `What is on the schedule today?`, and the response is that long schedule. So this memory now has quite a lot of text in it. If we look at the memory variables, it contains that entire piece of text, because 400 tokens was sufficient to store all of it.

Remember, with 400 tokens this stored the entire conversational history. If I reduce the max token limit to 100 tokens, the ConversationSummaryBufferMemory uses an LLM, the OpenAI endpoint in this case because that's what we set the LLM to, to generate a summary of the conversation so far. The summary is that the human and AI engage in small talk before discussing the day's schedule, and the AI informs the human of a morning meeting, time to work on the LangChain project, and a lunch meeting with a customer interested in the latest AI developments.

Now let's have a conversation using this LLM. I create a conversation chain, same as before, and ask `What would be a good demo to show?`. I set verbose=True, so here's the prompt. The LLM sees that the current conversation has had this discussion so far, because that's the summary of the conversation. One note: if you're familiar with the OpenAI chat API endpoint, there is a specific system message. This example is not using the official OpenAI system message; it's just including the summary as part of the prompt, but it nonetheless works pretty well. Given this prompt, the LLM answers that, based on the customer's interest in AI developments, it would suggest showcasing our latest NLP capabilities. It's making some suggestions for cool demos, and it makes you think: if I were meeting a customer, I would say, boy, if only there were an open-source framework available to help me build cool NLP applications using LLMs. Good thing there's LangChain.

The interesting thing is what has happened to the memory. It has kept the most recent AI output explicitly, whereas my utterance asking what would be a good demo to show has been incorporated into the system message, the overall summary of the conversation so far. The ConversationSummaryBufferMemory tries to keep explicit storage of the messages up to the number of tokens we specified as the limit, so the explicit storage is capped at 100 tokens because that's what we asked for. Anything beyond that, it uses an LLM to summarize, which is the summary seen at the start of the history.
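
The "recent messages verbatim, older messages summarized" behavior can be sketched like this. It is a rough illustration, not LangChain's code: `SummaryBufferMemory` is a hypothetical class, `count_words` is an assumed word-count proxy for tokens, and `join_summarizer` is a placeholder for the LLM call that merely joins the pruned lines instead of condensing them.

```python
from typing import Callable, List


def count_words(text: str) -> int:
    """Crude token proxy (assumption): one token per whitespace-separated word."""
    return len(text.split())


def join_summarizer(previous_summary: str, pruned: List[str]) -> str:
    """Placeholder for the LLM call; it joins lines rather than condensing them."""
    parts = ([previous_summary] if previous_summary else []) + pruned
    return " / ".join(parts)


class SummaryBufferMemory:
    """Recent messages stay verbatim; overflow is folded into a running summary."""

    def __init__(self, summarize: Callable[[str, List[str]], str],
                 max_token_limit: int,
                 count_tokens: Callable[[str], int] = count_words) -> None:
        self.summarize = summarize
        self.max_token_limit = max_token_limit
        self.count_tokens = count_tokens
        self.summary = ""
        self.messages: List[str] = []

    def _total(self) -> int:
        return sum(self.count_tokens(m) for m in self.messages)

    def save_context(self, user_text: str, ai_text: str) -> None:
        self.messages.append(f"Human: {user_text}")
        self.messages.append(f"AI: {ai_text}")
        pruned: List[str] = []
        # Move the oldest messages out of the verbatim buffer until it fits.
        while self.messages and self._total() > self.max_token_limit:
            pruned.append(self.messages.pop(0))
        if pruned:
            self.summary = self.summarize(self.summary, pruned)

    def history(self) -> str:
        head = [f"System: {self.summary}"] if self.summary else []
        return "\n".join(head + self.messages)


memory = SummaryBufferMemory(join_summarizer, max_token_limit=8)
memory.save_context("Hello", "What's up")
memory.save_context("Not much, just hanging", "Cool")
print(memory.history())
```

In a real setup the summarizer would be an LLM call that receives the previous summary plus the pruned lines and returns a shorter text, so the `System:` line stays compact as the conversation grows.
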

Even though I've illustrated these different memories using chat as a running example, they are useful for other applications too, where you keep getting new snippets of text or new information. For example, if your system repeatedly goes online to search for facts, you may want to keep the total memory used to store this growing list of facts capped rather than growing arbitrarily long.

I encourage you to pause the video and run the code. In this video, you saw a few types of memory, including buffer memories that limit based on the number of conversational exchanges or tokens, and a memory that summarizes the tokens above a certain limit.

LangChain supports additional memory types as well. One of the most powerful is vector data memory: if you're familiar with word embeddings and text embeddings, a vector database stores such embeddings. If you don't know what that means, don't worry about it; Harrison will explain it later. The memory can then retrieve the most relevant blocks of text using this type of vector database.

LangChain also supports entity memories, which apply when you want to remember details about specific people or other specific entities. For example, if you talk about a specific friend, you can have LangChain remember facts about that friend, who would be an entity, in an explicit way. When you're implementing applications with LangChain, you can also combine multiple types of memory, such as one of the conversation memories from this video plus an entity memory to recall individuals. That way you can remember a summary of the conversation plus an explicit store of important facts about important people in the conversation.

In addition to using these memory types, it's also not uncommon for developers to store the entire conversation in a conventional database, such as a key-value store or a SQL database, so you can refer back to the whole conversation for auditing or for improving the system further.
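
As one possible way to keep the full transcript for auditing, here is a small sketch using Python's built-in `sqlite3` module. The table layout and the helper names (`open_log`, `log_message`, `load_transcript`) are assumptions made for illustration and are not part of LangChain.

```python
import sqlite3
from typing import List, Tuple


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the transcript database; ':memory:' keeps it in RAM."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "session_id TEXT NOT NULL, "
        "speaker TEXT NOT NULL, "
        "text TEXT NOT NULL)"
    )
    return conn


def log_message(conn: sqlite3.Connection, session_id: str,
                speaker: str, text: str) -> None:
    # Using the connection as a context manager commits the insert.
    with conn:
        conn.execute(
            "INSERT INTO messages (session_id, speaker, text) VALUES (?, ?, ?)",
            (session_id, speaker, text),
        )


def load_transcript(conn: sqlite3.Connection,
                    session_id: str) -> List[Tuple[str, str]]:
    """Return every (speaker, text) pair for a session, oldest first."""
    rows = conn.execute(
        "SELECT speaker, text FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return rows.fetchall()


conn = open_log()
log_message(conn, "demo", "Human", "Hi")
log_message(conn, "demo", "AI", "What's up")
print(load_transcript(conn, "demo"))
```

The full log lives alongside whichever memory type feeds the prompt: the memory can be pruned or summarized while the database keeps every message.
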

And so, that's memory types. I hope you find this useful in building your own applications. Now let's go on to the next video to learn about the key building block of LangChain, namely the chain.


Note: 
- LLMs do not always produce the same results. When executing the code in your notebook, you may get slightly different answers than those in the video.

In [5]:
import os
os.environ['OPENAI_API_KEY'] = '<YOUR_OPENAI_API_KEY>'  # placeholder: never hard-code or commit a real key

In [1]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

import warnings
warnings.filterwarnings('ignore')

In [2]:
# account for deprecation of LLM model
import datetime
# Get the current date
current_date = datetime.datetime.now().date()

# Define the date after which the model should be set to "gpt-3.5-turbo"
target_date = datetime.date(2024, 6, 12)

# Set the model variable based on the current date
if current_date > target_date:
    llm_model = "gpt-3.5-turbo"
else:
    llm_model = "gpt-3.5-turbo-0301"

In [3]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain #build a conversation chain
from langchain.memory import ConversationBufferMemory #use the memory as a ConversationBufferMemory


In [6]:
llm = ChatOpenAI(temperature=0.0, model=llm_model)
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True  # set verbose=True to see what LangChain is actually doing
)

In [7]:
conversation.predict(input="Hi, my name is Jessica")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi, my name is Jessica
AI:

> Finished chain.


"Hello Jessica, it's nice to meet you. My name is AI. How can I assist you today?"

In [8]:
conversation.predict(input="What is 5/6?")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Jessica
AI: Hello Jessica, it's nice to meet you. My name is AI. How can I assist you today?
Human: What is 5/6?
AI:

> Finished chain.


'5/6 is equivalent to 0.83333 when expressed as a decimal. It is also a proper fraction, meaning the numerator (5) is less than the denominator (6).'

In [9]:
conversation.predict(input="What is my name?")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Jessica
AI: Hello Jessica, it's nice to meet you. My name is AI. How can I assist you today?
Human: What is 5/6?
AI: 5/6 is equivalent to 0.83333 when expressed as a decimal. It is also a proper fraction, meaning the numerator (5) is less than the denominator (6).
Human: What is my name?
AI:

> Finished chain.


'Your name is Jessica, as you mentioned earlier.'

In [10]:
print(memory.buffer)

Human: Hi, my name is Jessica
AI: Hello Jessica, it's nice to meet you. My name is AI. How can I assist you today?
Human: What is 5/6?
AI: 5/6 is equivalent to 0.83333 when expressed as a decimal. It is also a proper fraction, meaning the numerator (5) is less than the denominator (6).
Human: What is my name?
AI: Your name is Jessica, as you mentioned earlier.


In [11]:
#The empty dictionary is for more advanced features that can be used with a more sophisticated input 
memory.load_memory_variables({})

{'history': "Human: Hi, my name is Jessica\nAI: Hello Jessica, it's nice to meet you. My name is AI. How can I assist you today?\nHuman: What is 5/6?\nAI: 5/6 is equivalent to 0.83333 when expressed as a decimal. It is also a proper fraction, meaning the numerator (5) is less than the denominator (6).\nHuman: What is my name?\nAI: Your name is Jessica, as you mentioned earlier."}

- To add new things to the memory explicitly, use save_context.

In [13]:
memory = ConversationBufferMemory()

In [14]:
memory.save_context({"input": "Hi"}, 
                    {"output": "What's up"})

In [15]:
print(memory.buffer)

Human: Hi
AI: What's up


In [16]:
memory.load_memory_variables({})

{'history': "Human: Hi\nAI: What's up"}

In [17]:
memory.save_context({"input": "Not much, just hanging"}, 
                    {"output": "Cool"})

In [18]:
memory.load_memory_variables({})

{'history': "Human: Hi\nAI: What's up\nHuman: Not much, just hanging\nAI: Cool"}

### ConversationBufferWindowMemory ###

In [21]:
from langchain.memory import ConversationBufferWindowMemory

In [22]:
memory = ConversationBufferWindowMemory(k=1)               

In [23]:
memory.save_context({"input": "Hi"},
                    {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})


In [24]:
memory.load_memory_variables({})

{'history': 'Human: Not much, just hanging\nAI: Cool'}

In [27]:
llm = ChatOpenAI(temperature=0.0, model=llm_model)
memory = ConversationBufferWindowMemory(k=1)
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=False
)

In [28]:
conversation.predict(input="Hi, my name is Jessica")

"Hello Jessica, it's nice to meet you. My name is AI. How can I assist you today?"

In [29]:
conversation.predict(input="What is 4+1?")

'4+1 is equal to 5.'

In [30]:
conversation.predict(input="What is my name?")

"I'm sorry, I don't have access to that information. Could you please tell me your name?"

### ConversationTokenBufferMemory ###

In [32]:
#!pip install tiktoken

In [33]:
from langchain.memory import ConversationTokenBufferMemory
llm = ChatOpenAI(temperature=0.0, model=llm_model)

In [34]:
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=50)
memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"}, 
                    {"output": "Charming!"})

In [35]:
memory.load_memory_variables({})

{'history': 'AI: Amazing!\nHuman: Backpropagation is what?\nAI: Beautiful!\nHuman: Chatbots are what?\nAI: Charming!'}

In [36]:
# increasing the limit to 100 keeps the whole conversation in the buffer
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"}, 
                    {"output": "Charming!"})

In [37]:
memory.load_memory_variables({})

{'history': 'Human: AI is what?!\nAI: Amazing!\nHuman: Backpropagation is what?\nAI: Beautiful!\nHuman: Chatbots are what?\nAI: Charming!'}

### ConversationSummaryMemory ###

In [38]:
from langchain.memory import ConversationSummaryBufferMemory

In [39]:
# create a long string
schedule = "There is a meeting at 8am with your product team. \
You will need your powerpoint presentation prepared. \
9am-12pm have time to work on your LangChain \
project which will go quickly because Langchain is such a powerful tool. \
At Noon, lunch at the italian resturant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."

memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"}, 
                    {"output": f"{schedule}"})

In [40]:
memory.load_memory_variables({})

{'history': "System: The human and AI engage in small talk before discussing the day's schedule. The AI informs the human of a morning meeting with the product team, time to work on the LangChain project, and a lunch meeting with a customer interested in the latest AI developments."}

In [41]:
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True
)

In [42]:
conversation.predict(input="What would be a good demo to show?")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
System: The human and AI engage in small talk before discussing the day's schedule. The AI informs the human of a morning meeting with the product team, time to work on the LangChain project, and a lunch meeting with a customer interested in the latest AI developments.
Human: What would be a good demo to show?
AI:

> Finished chain.


"Based on the customer's interest in AI developments, I would suggest showcasing our latest natural language processing capabilities. We could demonstrate how our AI can accurately understand and respond to complex language queries, and even provide personalized recommendations based on the user's preferences. Additionally, we could highlight our AI's ability to learn and adapt over time, making it a valuable tool for businesses looking to improve their customer experience."

In [43]:
memory.load_memory_variables({})

{'history': "System: The human and AI engage in small talk before discussing the day's schedule. The AI informs the human of a morning meeting with the product team, time to work on the LangChain project, and a lunch meeting with a customer interested in the latest AI developments. The human asks what would be a good demo to show.\nAI: Based on the customer's interest in AI developments, I would suggest showcasing our latest natural language processing capabilities. We could demonstrate how our AI can accurately understand and respond to complex language queries, and even provide personalized recommendations based on the user's preferences. Additionally, we could highlight our AI's ability to learn and adapt over time, making it a valuable tool for businesses looking to improve their customer experience."}