# <center style="font-family: consolas; font-size: 32px; font-weight: bold;">  Hands-On LangChain for LLM Applications Development: Chatbots Memory </center>

***


When interacting with applications built on language models, such as chatbots, the absence of memory poses a significant hurdle in creating natural and seamless conversations. Users expect continuity and context retention, which a bare language model does not provide. This limitation becomes particularly evident in applications where ongoing dialogue is crucial for user engagement and satisfaction.

LangChain offers several solutions to address this challenge. Memory, in this context, refers to the ability of the application to remember previous parts of a conversation and use that information to inform subsequent interactions. By adding memory to the application around the model, LangChain enables chatbots and similar applications to maintain a conversational flow that resembles human dialogue.

LangChain’s memory capabilities extend beyond mere recall of past interactions. They include mechanisms for storing, organizing, and retrieving relevant information, so that the chatbot can respond appropriately based on the context of the conversation. This enhances the user experience and helps the chatbot give more accurate and relevant responses as the conversation progresses.

#### <a id="top"></a>
# <div style="box-shadow: rgb(60, 121, 245) 0px 0px 0px 3px inset, rgb(255, 255, 255) 10px -10px 0px -3px, rgb(31, 193, 27) 10px -10px, rgb(255, 255, 255) 20px -20px 0px -3px, rgb(255, 217, 19) 20px -20px, rgb(255, 255, 255) 30px -30px 0px -3px, rgb(255, 156, 85) 30px -30px, rgb(255, 255, 255) 40px -40px 0px -3px, rgb(255, 85, 85) 40px -40px; padding:20px; margin-right: 40px; font-size:30px; font-family: consolas; text-align:center; display:fill; border-radius:15px; color:rgb(60, 121, 245);"><b>Table of contents</b></div>

<div style="background-color: rgba(60, 121, 245, 0.03); padding:30px; font-size:15px; font-family: consolas;">
<ul>
    <li><a href="#1" target="_self" rel=" noreferrer nofollow">1. Setting Up Working Environment & Getting Started </a> </li>
    <li><a href="#2" target="_self" rel=" noreferrer nofollow">2. Conversation Buffer Memory </a></li>
    <li><a href="#3" target="_self" rel=" noreferrer nofollow">3. Conversation Buffer Window Memory </a></li>
    <li><a href="#4" target="_self" rel=" noreferrer nofollow">4. Conversation Token Buffer Memory </a></li>
    <li><a href="#5" target="_self" rel=" noreferrer nofollow">5. Conversation Summary Memory </a></li>
</ul>
</div>

***



<a id="1"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(51, 51, 51) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 1. Setting Up Working Environment & Getting Started </b></div>

To get started, we are going to import os and openai and load the OpenAI secret key. If you’re running this locally and don’t have the required packages installed yet, run the pip commands below first.

In [None]:
!pip install langchain
!pip install langchain_community
!pip install openai

In [None]:
import os
import openai
from getpass import getpass

# Prompt for the key so that it is never hard-coded in the notebook
OPENAI_API_KEY = getpass('Please enter your OpenAI API key:')

# Also expose the key as an environment variable for libraries that read it from there
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

# Chat model used throughout this notebook
llm_model = "gpt-3.5-turbo"

<a id="2"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(51, 51, 51) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 2. Conversation Buffer Memory </b></div>


Let’s start with a motivating example for memory, using LangChain to manage a chatbot conversation. We set the LLM to OpenAI's chat interface with a temperature of 0, use a ConversationBufferMemory as the memory, and then build a conversation chain from the two.

In [4]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(temperature=0.0, model=llm_model, openai_api_key=OPENAI_API_KEY)
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

Let's start a conversation using conversation.predict() with the input “Hi, my name is Akar” and see the response.

In [5]:
conversation.predict(input="Hi, my name is Akar")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi, my name is Akar
AI:

> Finished chain.


"Hello Akar! It's nice to meet you. How can I assist you today?"

Then, let’s ask it “What is 3*3?”. It answers that 3 multiplied by 3 is 9.

In [6]:
conversation.predict(input="What is 3*3?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Akar
AI: Hello Akar! It's nice to meet you. How can I assist you today?
Human: What is 3*3?
AI:

> Finished chain.


'3 multiplied by 3 is equal to 9. Is there anything else you would like to know?'

Now we will ask it “What is my name?”, and you can see that it responds “Your name is Akar.”

In [7]:
conversation.predict(input="What is my name?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Akar
AI: Hello Akar! It's nice to meet you. How can I assist you today?
Human: What is 3*3?
AI: 3 multiplied by 3 is equal to 9. Is there anything else you would like to know?
Human: What is my name?
AI:

> Finished chain.


'Your name is Akar.'

The text printed after "Prompt after formatting" is the prompt that LangChain generates so that the system has a helpful and friendly conversation with you. On the second and third turns of the conversation, LangChain keeps the same instructions and appends the conversation so far under "Current conversation".

The history is available because the chain stores it in the memory variable. If I print memory.buffer, it shows the conversation so far.

In [8]:
print(memory.buffer)


Human: Hi, my name is Akar
AI: Hello Akar! It's nice to meet you. How can I assist you today?
Human: What is 3*3?
AI: 3 multiplied by 3 is equal to 9. Is there anything else you would like to know?
Human: What is my name?
AI: Your name is Akar.


You can also print this out using memory.load_memory_variables({}). The curly braces here are an empty dictionary. There are more advanced features that take a more sophisticated input; they are not covered in this article, but you can find them in the LangChain documentation, so don’t worry about the empty curly braces here. The result is what LangChain has remembered of the conversation so far: everything that the AI or the human has said.

In [9]:
memory.load_memory_variables({})

{'history': "Human: Hi, my name is Akar\nAI: Hello Akar! It's nice to meet you. How can I assist you today?\nHuman: What is 3*3?\nAI: 3 multiplied by 3 is equal to 9. Is there anything else you would like to know?\nHuman: What is my name?\nAI: Your name is Akar."}

LangChain stores the conversation in this ConversationBufferMemory. If you wish to add new inputs and outputs to the memory explicitly, you can do so with save_context, as shown here.

In [10]:
memory = ConversationBufferMemory()
memory.save_context({"input": "Hi"},
                    {"output": "What's up"})


Here memory.save_context stores the exchange “Hi” / “What’s up”. I know this is not the most exciting conversation, but I wanted to give a short example. The cell below shows the state of the memory after this call. If you want to add additional data to the memory, you can keep saving more context in the same way, for example a further exchange such as “Not much, just hanging” / “Cool”.

In [11]:
print(memory.buffer)

Human: Hi
AI: What's up


When you use a large language model for a chat conversation, the large language model itself is stateless. It does not remember the conversation you’ve had so far, so each call to the API endpoint is independent. Chatbots appear to have memory only because there is usually wrapper code that provides the full conversation so far as context to the LLM. The memory can therefore explicitly store the turns or utterances so far: “Hi, my name is Akar”, “Hello Akar! It's nice to meet you”, and so on.

This memory storage is used as input or additional context to the LLM so that it can generate output as if it’s just having the next conversational turn, knowing what’s been said before.
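The idea that the model is stateless and the wrapper code carries the memory can be sketched in plain Python. This is a minimal illustration and not LangChain's implementation: `SimpleBufferMemory`, `chat`, and `fake_llm` are hypothetical names, and `fake_llm` is a stand-in that can only "know" what appears in the prompt it receives.

```python
from typing import Callable, List, Tuple


class SimpleBufferMemory:
    """Toy conversation buffer (hypothetical, not LangChain's class)."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []  # (speaker, text) pairs

    def save_context(self, human: str, ai: str) -> None:
        self.turns.append(("Human", human))
        self.turns.append(("AI", ai))

    @property
    def buffer(self) -> str:
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)


def chat(memory: SimpleBufferMemory, llm: Callable[[str], str], user_input: str) -> str:
    # The model keeps no state, so the whole history is pasted into every prompt.
    prompt = f"Current conversation:\n{memory.buffer}\nHuman: {user_input}\nAI:"
    reply = llm(prompt)
    memory.save_context(user_input, reply)
    return reply


def fake_llm(prompt: str) -> str:
    """Hypothetical stateless model: it answers only from the prompt text."""
    last_question = prompt.rsplit("Human: ", 1)[-1].split("\nAI:")[0]
    if last_question == "What is my name?":
        if "my name is Akar" in prompt:
            return "Your name is Akar."
        return "I don't know your name."
    return "Nice to meet you!"


memory = SimpleBufferMemory()
chat(memory, fake_llm, "Hi, my name is Akar")
print(chat(memory, fake_llm, "What is my name?"))
```

The second call succeeds only because the first exchange was replayed inside the prompt; calling `fake_llm` with a prompt that lacks the history would not recover the name.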

As the conversation grows, so does the amount of text that has to be stored and resent. Because LLM providers usually charge based on the number of tokens processed, sending the full history with every call becomes increasingly expensive.
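A small calculation makes the cost growth concrete. This is a rough sketch under two assumptions: tokens are approximated by whitespace-separated words (real tokenizers count differently), and the full history is resent with every call. The function names are hypothetical.

```python
from typing import List, Tuple


def count_tokens(text: str) -> int:
    """Crude token proxy (assumption): one token per whitespace-separated word."""
    return len(text.split())


def tokens_sent_per_turn(turns: List[Tuple[str, str]]) -> List[int]:
    """Prompt size of each call when the whole history is resent every time."""
    history: List[str] = []
    sizes: List[int] = []
    for human, ai in turns:
        history.append(f"Human: {human}")
        # The prompt for this call contains everything said so far.
        sizes.append(sum(count_tokens(line) for line in history))
        history.append(f"AI: {ai}")
    return sizes


turns = [("Hi", "What's up")] * 4  # four identical, made-up exchanges
sizes = tokens_sent_per_turn(turns)
print(sizes)       # [2, 7, 12, 17]
print(sum(sizes))  # 38
```

Each prompt grows by a fixed amount per turn, so the total number of tokens sent over the whole conversation grows roughly quadratically with the number of turns.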

So LangChain provides several convenient kinds of memory to store and accumulate the conversation. So far, we’ve been looking at the ConversationBufferMemory. Let’s look at a different type of memory.

<a id="3"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(51, 51, 51) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 3. Conversation Buffer Window Memory</b></div>

Let's start by importing ConversationBufferWindowMemory, which only keeps a window of the conversation. If I set memory to a ConversationBufferWindowMemory with k=1, the variable k specifies that I want to remember just one conversational exchange, that is, one utterance from me and one utterance from the chatbot.



In [12]:
from langchain.memory import ConversationBufferWindowMemory
memory = ConversationBufferWindowMemory(k=1)
memory.load_memory_variables({})

{'history': ''}

Now let's have it save two exchanges: “Hi” / “What’s up” and “Not much, just hanging” / “Cool”. If I then look at memory.load_memory_variables({}), it only remembers the most recent exchange, as you can see below.

In [13]:
memory.save_context({"input": "Hi"},
                    {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.load_memory_variables({})

{'history': 'Human: Not much, just hanging\nAI: Cool'}

It only returns the most recent exchange because k was equal to one. This is a nice feature because it lets you keep track of just the most recent few conversational turns. In practice, you probably won’t use this with k=1; you would set k to a larger number.
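The windowing behaviour can be sketched with a bounded deque. This is an illustrative stand-in rather than LangChain's implementation; `SimpleWindowMemory` is a hypothetical class whose method names merely mirror the ones used above.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class SimpleWindowMemory:
    """Toy window memory (hypothetical): keeps only the last k exchanges."""

    def __init__(self, k: int) -> None:
        # deque(maxlen=k) silently discards the oldest exchange when full.
        self.exchanges: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save_context(self, human: str, ai: str) -> None:
        self.exchanges.append((human, ai))

    def load_memory_variables(self) -> Dict[str, str]:
        lines = []
        for human, ai in self.exchanges:
            lines.append(f"Human: {human}")
            lines.append(f"AI: {ai}")
        return {"history": "\n".join(lines)}


window = SimpleWindowMemory(k=1)
window.save_context("Hi", "What's up")
window.save_context("Not much, just hanging", "Cool")
print(window.load_memory_variables())
# {'history': 'Human: Not much, just hanging\nAI: Cool'}
```

With `k=2` the same two calls would keep both exchanges, which is why a larger window is the more typical setting.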

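If we rerun the conversation from above, we first say “Hi, my name is Akar”, then ask “What is 3*3?”, and finally ask “What is my name?”. With a chain backed by the k=1 window memory, the model would only see the last exchange (the 3*3 question and its answer) and would no longer know the name. Note, however, that the cells below call the conversation chain built in Section 2, which is still attached to the original ConversationBufferMemory. That is why the printed prompts contain the full history and the model still answers “Your name is Akar.” To observe the forgetting behaviour, first rebuild the chain with the window memory, for example conversation = ConversationChain(llm=llm, memory=memory, verbose=True).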

In [14]:
conversation.predict(input="Hi, my name is Akar")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Akar
AI: Hello Akar! It's nice to meet you. How can I assist you today?
Human: What is 3*3?
AI: 3 multiplied by 3 is equal to 9. Is there anything else you would like to know?
Human: What is my name?
AI: Your name is Akar.
Human: Hi, my name is Akar
AI:

> Finished chain.


"Hello Akar! It's nice to meet you again. How can I assist you today?"

In [15]:
conversation.predict(input="What is 3*3?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Akar
AI: Hello Akar! It's nice to meet you. How can I assist you today?
Human: What is 3*3?
AI: 3 multiplied by 3 is equal to 9. Is there anything else you would like to know?
Human: What is my name?
AI: Your name is Akar.
Human: Hi, my name is Akar
AI: Hello Akar! It's nice to meet you again. How can I assist you today?
Human: What is 3*3?
AI:

> Finished chain.


'3 multiplied by 3 is equal to 9. Is there anything else you would like to know?'

In [16]:
conversation.predict(input="What is my name?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Akar
AI: Hello Akar! It's nice to meet you. How can I assist you today?
Human: What is 3*3?
AI: 3 multiplied by 3 is equal to 9. Is there anything else you would like to know?
Human: What is my name?
AI: Your name is Akar.
Human: Hi, my name is Akar
AI: Hello Akar! It's nice to meet you again. How can I assist you today?
Human: What is 3*3?
AI: 3 multiplied by 3 is equal to 9. Is there anything else you would like to know?
Human: What is my name?
AI:

> Finished chain.


'Your name is Akar.'

<a id="4"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(51, 51, 51) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 4. Conversation Token Buffer Memory</b></div>

With the ConversationTokenBufferMemory, the memory limits the number of tokens saved. Since a lot of LLM pricing is based on tokens, this maps more directly to the cost of the LLM calls. The memory takes the llm as an argument because different models count tokens differently, and for OpenAI models the counting relies on the tiktoken package. Let's set the max token limit to 50 and save the following conversation: “AI is what?! Amazing! Backpropagation is what? Beautiful! Chatbots are what? Charming!”

In [17]:
!pip install tiktoken
from langchain.memory import ConversationTokenBufferMemory

memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=50)
memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"},
                    {"output": "Charming!"})

memory.load_memory_variables({})



{'history': 'AI: Amazing!\nHuman: Backpropagation is what?\nAI: Beautiful!\nHuman: Chatbots are what?\nAI: Charming!'}

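With the limit of 50 tokens, the memory keeps almost the whole conversation: only the first human message (“AI is what?!”) has been trimmed. A higher limit, such as 100, should leave room for the entire conversation. The cell below repeats the run with max_token_limit=50, so it returns the same history.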

In [18]:
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=50)
memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"},
                    {"output": "Charming!"})
memory.load_memory_variables({})

{'history': 'AI: Amazing!\nHuman: Backpropagation is what?\nAI: Beautiful!\nHuman: Chatbots are what?\nAI: Charming!'}

If I decrease the limit to 20, the memory chops off the earlier parts of the conversation and retains only the most recent messages that fit within the token limit.

In [19]:
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=20)
memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"},
                    {"output": "Charming!"})
memory.load_memory_variables({})

{'history': 'AI: Charming!'}
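The trimming rule can be sketched as "drop the oldest message until the total fits". This is a simplified illustration, not LangChain's code: `count_tokens` is a hypothetical word-count proxy, so the limits used here are not comparable to the tiktoken-based limits in the cells above.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Crude token proxy (assumption): one token per whitespace-separated word."""
    return len(text.split())


def prune_to_token_limit(messages: List[str], max_token_limit: int) -> List[str]:
    """Drop the oldest messages one by one until the total fits the limit."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_token_limit:
        kept.pop(0)  # remove the oldest message first
    return kept


messages = [
    "Human: AI is what?!",
    "AI: Amazing!",
    "Human: Backpropagation is what?",
    "AI: Beautiful!",
    "Human: Chatbots are what?",
    "AI: Charming!",
]

print(prune_to_token_limit(messages, 14))  # first human message is dropped
print(prune_to_token_limit(messages, 2))   # ['AI: Charming!']
```

Because messages are removed individually rather than as human/AI pairs, the retained history can begin with an AI message, which matches the shape of the outputs shown above.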

<a id="5"></a>
# <div style="box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px inset, rgb(51, 51, 51) 0px 0px 0px 3px inset; padding:20px; font-size:32px; font-family: consolas; text-align:center; display:fill; border-radius:15px;  color:rgb(34, 34, 34);"> <b> 5. Conversation Summary Memory</b></div>

Finally, there’s one last type of memory we will explore, which is the conversation summary buffer memory. The idea is that instead of limiting the memory to a fixed number of tokens based on the most recent utterances or a fixed number of conversational exchanges, let’s use an LLM to write a summary of the conversation and let that be the memory.

So here’s an example where I’m going to create a long string describing someone’s schedule for the day, starting with a meeting with the product team:

In [20]:
# create a long string
schedule = "There is a meeting at 8am with your product team. \
You will need your powerpoint presentation prepared. \
9am-12pm have time to work on your LangChain \
project which will go quickly because Langchain is such a powerful tool. \
At Noon, lunch at the italian resturant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."


We will use a ConversationSummaryBufferMemory with a max token limit of 100 and insert a few conversational turns: “Hello” / “What’s up”, “Not much, just hanging” / “Cool”, and “What is on the schedule today?”, which is answered with the long schedule above.

So this memory now has quite a lot of text in it. Let’s take a look at the memory variables. Because 100 tokens are not sufficient to store all this text, the memory uses the LLM to condense the conversation into a summary, which is returned as a System message.

In [21]:
from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"},
                    {"output": f"{schedule}"})

memory.load_memory_variables({})

{'history': "System: The human and AI exchange greetings and discuss the schedule for the day. The AI informs the human of a morning meeting with the product team, work on the LangChain project, and a lunch meeting with a customer interested in AI. The AI emphasizes the importance of being prepared for the day's events."}

If we raise the max token limit to 400 tokens, the limit is high enough to hold the entire conversation, so the ConversationSummaryBufferMemory stores the messages verbatim instead of calling the LLM (the OpenAI endpoint in this case) to generate a summary:

In [22]:
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=400)
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"},
                    {"output": f"{schedule}"})

Now let's create a conversation chain with this memory, the same as before, with verbose set to True, and give it the input “What would be a good demo to show?”. Because this memory was created with the 400-token limit, the prompt sent to the LLM contains the full conversation so far rather than a summary.

In [23]:
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)
conversation.predict(input="What would be a good demo to show?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hello
AI: What's up
Human: Not much, just hanging
AI: Cool
Human: What is on the schedule today?
AI: There is a meeting at 8am with your product team. You will need your powerpoint presentation prepared. 9am-12pm have time to work on your LangChain project which will go quickly because Langchain is such a powerful tool. At Noon, lunch at the italian resturant with a customer who is driving from over an hour away to meet you to understand the latest in AI. Be sure to bring your laptop to show the latest LLM demo.
Human: What would be a good demo to show?
AI:

> Finished chain.


"I would recommend showcasing the Language Model for Language Chains (LLM) demo. It's a cutting-edge AI tool that can generate realistic and coherent text based on a given prompt. It's sure to impress your customer and demonstrate the power of AI technology."

With the conversation summary buffer memory, the aim is to store recent messages explicitly up to a specified token limit. Any content beyond this limit is summarized by the LLM, as the 100-token example above demonstrates. With the 400-token limit, the whole conversation fits and is kept verbatim.
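The combination of a verbatim buffer and a running summary can be sketched as follows. This is a simplified illustration under stated assumptions, not LangChain's implementation: tokens are approximated by words, and `stub_summarize` is a hypothetical placeholder for the LLM call that would write the actual summary.

```python
from typing import Callable, List, Tuple


def count_tokens(text: str) -> int:
    """Crude token proxy (assumption): one token per whitespace-separated word."""
    return len(text.split())


def prune_with_summary(
    messages: List[str],
    summary: str,
    max_token_limit: int,
    summarize: Callable[[str, List[str]], str],
) -> Tuple[List[str], str]:
    """Keep recent messages within the limit; fold older ones into the summary."""
    kept = list(messages)
    pruned: List[str] = []
    while kept and sum(count_tokens(m) for m in kept) > max_token_limit:
        pruned.append(kept.pop(0))  # oldest messages leave the buffer first
    if pruned:
        summary = summarize(summary, pruned)
    return kept, summary


def stub_summarize(previous_summary: str, pruned: List[str]) -> str:
    """Hypothetical stand-in for the LLM call that would write the summary."""
    note = f"[summary of {len(pruned)} earlier message(s)]"
    return f"{previous_summary} {note}".strip()


messages = [
    "Human: Hello",
    "AI: What's up",
    "Human: Not much, just hanging",
    "AI: Cool",
]

kept, summary = prune_with_summary(messages, "", 8, stub_summarize)
print(kept)     # ['Human: Not much, just hanging', 'AI: Cool']
print(summary)  # [summary of 2 earlier message(s)]
```

With a generous limit nothing is pruned and the summary stays empty, which corresponds to the 400-token case where the prompt contains the conversation verbatim.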

***

LangChain also supports additional memory types, one of the most notable being vector database memory. If you are familiar with word embeddings and text embeddings, the idea is that the vector database stores such embeddings, which lets LangChain retrieve the blocks of text most relevant to the current input and use them as memory.
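Retrieval by similarity can be illustrated with a toy example. This sketch makes strong simplifying assumptions: the "embedding" is just a bag-of-words count vector rather than a learned embedding, the "database" is a Python list, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding' (assumption): bag-of-words counts over lowercase words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_relevant(query: str, stored_texts: List[str]) -> str:
    """Return the stored text whose vector is closest to the query vector."""
    query_vector = embed(query)
    return max(stored_texts, key=lambda text: cosine(query_vector, embed(text)))


stored = [
    "There is a meeting at 8am with your product team.",
    "At noon, lunch at the Italian restaurant with a customer.",
    "Be sure to bring your laptop to show the latest LLM demo.",
]

print(most_relevant("Which demo should I show?", stored))
```

A real vector store would use dense embeddings from a model and an index built for nearest-neighbour search, but the retrieval step follows the same pattern: embed the query, then return the closest stored text.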

When developing applications with LangChain, you can combine several memory types. For example, you can use one of the conversation memories demonstrated in this article together with entity memory, which recalls details about specific people or other entities.

This approach retains both a summarized version of the conversation and explicit details about the key individuals involved. In addition, developers often store the entire conversation in a conventional database, such as a key-value store or an SQL database.

This makes it easy to refer back to the conversation for auditing or for improving the system later. Together, these memory types offer a solid framework for building your applications.
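Storing the full transcript in an SQL database can be sketched with Python's built-in sqlite3 module. This is one possible layout and is not tied to any LangChain class; the table name, column names, and helper functions are hypothetical.

```python
import sqlite3

# In-memory database for the example; a file path would make it persistent.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        session_id TEXT NOT NULL,
        speaker TEXT NOT NULL,
        content TEXT NOT NULL
    )
    """
)


def log_message(conn: sqlite3.Connection, session_id: str, speaker: str, content: str) -> None:
    """Append one utterance to the audit log."""
    conn.execute(
        "INSERT INTO messages (session_id, speaker, content) VALUES (?, ?, ?)",
        (session_id, speaker, content),
    )
    conn.commit()


def load_transcript(conn: sqlite3.Connection, session_id: str) -> str:
    """Return the stored conversation for one session in insertion order."""
    rows = conn.execute(
        "SELECT speaker, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    return "\n".join(f"{speaker}: {content}" for speaker, content in rows)


log_message(conn, "session-1", "Human", "Hi, my name is Akar")
log_message(conn, "session-1", "AI", "Hello Akar! It's nice to meet you.")
print(load_transcript(conn, "session-1"))
```

Keeping this log separate from the prompt memory means the chatbot can trim or summarize what it sends to the LLM while the complete record remains available for auditing.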
