# LLM Memory

Author: Pavel Agurov, pavel_agurov@epam.com

When you work with ChatGPT, you can see that it supports conversations: you can ask a question and then clarify details. This means that ChatGPT can remember the context of the conversation.
But an LLM takes only a text prompt as input and has no memory of its own. This means we have to put the whole conversation context into the prompt and find a way to stay within the prompt size limit.
Let's see how it works.

In [None]:
%pip install openai > /dev/null
%pip install tiktoken > /dev/null
%pip install langchain > /dev/null
%pip install langchain_openai > /dev/null
%pip install langchain_core > /dev/null
%pip install langchain_community > /dev/null
%pip install langchain_text_splitters > /dev/null
%pip install sentence-transformers > /dev/null

Let's create a connection to EPAM DIAL:

In [4]:
from langchain_openai import AzureChatOpenAI
from langchain_core.output_parsers import StrOutputParser

# The API key is read from the AZURE_OPENAI_API_KEY environment variable.
epam_dial = AzureChatOpenAI(
    api_version    = "2023-07-01-preview",
    azure_endpoint = "https://ai-proxy.lab.epam.com",
    model          = "gpt-4o-mini-2024-07-18",
    temperature    = 0.0
)

# Chain: send the prompt to the model and return the answer as a plain string.
query_chain = epam_dial | StrOutputParser()

Now let's send a simple query:

In [7]:
query_chain.invoke("Hello, my name is Pavel")

'Hello Pavel! How can I assist you today?'

OK, now let's ask the model about my name:

In [8]:
query_chain.invoke("What is my name?")

"I'm sorry, but I don't have access to personal information about individuals unless it has been shared with me in the course of our conversation."

As expected, the model has no information about my name: each call is independent, and the second prompt does not contain the first message.
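Since the model itself is stateless, the only way to make it "remember" is to send the previous turns along with every new question. The sketch below shows this idea in plain Python. `fake_llm` and `build_prompt` are hypothetical helpers: `fake_llm` stands in for a real model call (such as `query_chain.invoke`) so the example runs without network access.

```python
# A minimal sketch of manual conversation memory.
# fake_llm is a hypothetical stand-in for a real model call.

def fake_llm(prompt: str) -> str:
    # Pretend model: it can only "know" the name if it appears in the prompt.
    if "my name is Pavel" in prompt:
        return "Your name is Pavel."
    return "I don't know your name."

def build_prompt(turns: list[tuple[str, str]], question: str) -> str:
    # Render every previous turn, then append the new question.
    lines = [f"{role}: {text}" for role, text in turns]
    lines.append(f"Human: {question}")
    lines.append("AI:")
    return "\n".join(lines)

turns: list[tuple[str, str]] = []

# Without history, the model sees only the new question.
answer_without_memory = fake_llm(build_prompt([], "What is my name?"))

# With history, the earlier turn becomes part of the prompt.
turns.append(("Human", "Hello, my name is Pavel"))
turns.append(("AI", "Hello Pavel! How can I assist you today?"))
answer_with_memory = fake_llm(build_prompt(turns, "What is my name?"))

print(answer_without_memory)  # I don't know your name.
print(answer_with_memory)     # Your name is Pavel.
```

The price of this approach is that the prompt grows with every turn, which is exactly why the memory classes below limit or compress the history.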

## ConversationBufferWindowMemory

Let's add memory. We will use ConversationBufferWindowMemory (https://python.langchain.com/v0.1/docs/modules/memory/types/buffer_window/). The parameter k defines how many of the most recent interactions (pairs of human and AI messages) are kept and inserted into the prompt. Such a sliding window of recent interactions keeps the buffer from growing too large.
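The sliding-window idea behind k can be sketched in plain Python with a `collections.deque` that keeps only the last k exchanges. This illustrates the concept only and is not LangChain's implementation; the class name `WindowMemory` is hypothetical.

```python
from collections import deque

class WindowMemory:
    """Hypothetical sliding-window memory: keeps only the last k exchanges."""

    def __init__(self, k: int):
        # Each item is one (human_message, ai_message) pair; the oldest pair
        # is dropped automatically once k pairs are stored.
        self.exchanges = deque(maxlen=k)

    def save_context(self, human: str, ai: str) -> None:
        self.exchanges.append((human, ai))

    def history(self) -> str:
        # Render the window in the "Human: ... / AI: ..." format
        # that the notebook output shows.
        lines = []
        for human, ai in self.exchanges:
            lines.append(f"Human: {human}")
            lines.append(f"AI: {ai}")
        return "\n".join(lines)

window = WindowMemory(k=2)
window.save_context("Hello, my name is Pavel", "Hello Pavel!")
window.save_context("I like blue", "Blue is a nice color.")
window.save_context("What is my name?", "Your name is Pavel.")

# With k=2 the first exchange has already been dropped.
print(window.history())
```

Note the trade-off: with a small k the model forgets the introduction, so k should be large enough to cover the facts the conversation depends on.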

In [29]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k = 10)
conversation = ConversationChain(
    llm    = epam_dial,
    memory = memory,
    verbose= False
)

In [30]:
response_object = conversation.invoke("Hello, my name is Pavel")
print(response_object["response"])

Hello Pavel! It's nice to meet you. How can I assist you today?


In [31]:
response_object = conversation.invoke("What is my name?")
print(response_object["response"])

Your name is Pavel.


As you can see, the model remembers my first message and can answer the question about my name. The memory is stored inside ConversationBufferWindowMemory and inserted into the prompt. The response object also contains, in the field 'history', the history that was passed to the model for this call (it does not yet include the current exchange):
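To make the "inserted into the prompt" step concrete, here is a sketch of how a history string is substituted into a prompt template. The template text is a simplified, hypothetical one in the spirit of ConversationChain's default prompt, not a copy of it; what matters is that it has `{history}` and `{input}` placeholders.

```python
# Hypothetical, simplified conversation template with {history} and {input}.
TEMPLATE = (
    "The following is a conversation between a human and an AI.\n\n"
    "Current conversation:\n"
    "{history}\n"
    "Human: {input}\n"
    "AI:"
)

# History as stored by the memory before the second question was asked.
history_text = (
    "Human: Hello, my name is Pavel\n"
    "AI: Hello Pavel! It's nice to meet you. How can I assist you today?"
)

# The memory contents are pasted into the prompt together with the new input;
# this whole string is what the model actually receives.
prompt = TEMPLATE.format(history=history_text, input="What is my name?")
print(prompt)
```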

In [32]:
print(response_object["history"])

Human: Hello, my name is Pavel
AI: Hello Pavel! It's nice to meet you. How can I assist you today?


We can also see the stored memory in the conversation object; it now includes the latest exchange as well:

In [33]:
conversation.memory.load_memory_variables({})

{'history': "Human: Hello, my name is Pavel\nAI: Hello Pavel! It's nice to meet you. How can I assist you today?\nHuman: What is my name?\nAI: Your name is Pavel."}

## ConversationSummaryMemory

There is also a more complex memory type, ConversationSummaryMemory (https://python.langchain.com/v0.1/docs/modules/memory/types/summary/):
*"This type of memory creates a summary of the conversation over time. This can be useful for condensing information from the conversation over time. Conversation summary memory summarizes the conversation as it happens and stores the current summary in memory. This memory can then be used to inject the summary of the conversation so far into a prompt/chain. This memory is most useful for longer conversations, where keeping the past message history in the prompt verbatim would take up too many tokens."*
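The summarization loop can be sketched as follows: after every exchange, the previous summary and the new exchange go to a summarizer, and only the returned summary is kept. Here `summarize` is a hypothetical stand-in for the LLM call that ConversationSummaryMemory makes (it just records the most recently stated name), so the example runs offline; `SummaryMemory` is likewise a hypothetical class.

```python
import re

def summarize(summary: str, human: str, ai: str) -> str:
    # Hypothetical summarizer standing in for an LLM call: it keeps only the
    # most recently stated name. A real LLM summary is richer, but it likewise
    # replaces the verbatim dialog with condensed text.
    match = re.search(r"my name is (\w+)", human)
    if match:
        return f"The human's name is {match.group(1)}."
    return summary

class SummaryMemory:
    """Hypothetical summary memory: stores one summary string, not messages."""

    def __init__(self):
        self.summary = ""

    def save_context(self, human: str, ai: str) -> None:
        # The new exchange is folded into the running summary.
        self.summary = summarize(self.summary, human, ai)

summary_memory = SummaryMemory()
summary_memory.save_context("Hello, my name is Pavel", "Hello Pavel!")
summary_memory.save_context("Oh, sorry, I was wrong, my name is Bob", "Hello Bob!")
summary_memory.save_context("What is my name?", "Your name is Bob.")

print(summary_memory.summary)  # The human's name is Bob.
```

The prompt size now depends on the summary length rather than on the number of turns, at the cost of one extra summarization call per exchange.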

In [39]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory

# memory uses LLM to build summary
memory = ConversationSummaryMemory(llm = epam_dial)

conversation = ConversationChain(
    llm    = epam_dial,
    memory = memory,
    verbose= False
)

In [40]:
response_object = conversation.invoke("Hello, my name is Pavel")
print(response_object["response"])

Hello Pavel! It's nice to meet you. How can I assist you today?


In [41]:
response_object = conversation.invoke("Oh, sorry, I was wrong, my name is Bob")
print(response_object["response"])

Hello Bob! No problem at all. How can I assist you today?


In [42]:
response_object = conversation.invoke("What is my name?")
print(response_object["response"])

Your name is Bob.


Let's now look at what is stored in the memory: it is not our dialog verbatim, but a summary of the whole conversation. Be aware, however, that important details can be lost during summarization.

## Save and load memory

We can get the memory as a dictionary, serialize it to JSON, and save it to a file. In fact, this saves the whole context of our conversation, so we can easily restore it later.

In [43]:
conversation.memory.load_memory_variables({})

{'history': 'The human introduces themselves as Pavel, but corrects themselves and says their name is Bob. The AI greets Bob and asks how it can assist him. When Bob asks what his name is, the AI confirms that his name is indeed Bob.'}
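Since `load_memory_variables` returns a plain dictionary of strings, saving and restoring it is a JSON round trip. A minimal sketch using only the standard library; the file name `memory.json` is hypothetical, and the dictionary is the output shown above. How the restored text is fed back into a memory object depends on the memory class, so that step is left out here.

```python
import json
import tempfile
from pathlib import Path

# The dictionary returned by conversation.memory.load_memory_variables({}).
saved_memory = {
    "history": "The human introduces themselves as Pavel, but corrects themselves "
               "and says their name is Bob. The AI greets Bob and asks how it can "
               "assist him. When Bob asks what his name is, the AI confirms that "
               "his name is indeed Bob."
}

# Write the memory to a (hypothetical) JSON file...
path = Path(tempfile.gettempdir()) / "memory.json"
path.write_text(json.dumps(saved_memory, ensure_ascii=False), encoding="utf-8")

# ...and read it back later, e.g. in a new session.
restored_memory = json.loads(path.read_text(encoding="utf-8"))
print(restored_memory["history"])
```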

We can also preload information into memory in advance (for example, from a file or a database).

In [44]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory, ChatMessageHistory

history = ChatMessageHistory()
history.add_user_message("My name is Pavel")
history.add_user_message("My favorite color is blue")

memory = ConversationSummaryMemory.from_messages(
    llm         = epam_dial,
    chat_memory = history
)

conversation = ConversationChain(
    llm    = epam_dial,
    memory = memory,
    verbose= False
)

# we can also add memory directly
# conversation.memory.save_context({"input": "hi"}, {"output": "whats up"})

In [46]:
response_object = conversation.invoke("What is my name and what color is the best?")
print(response_object["response"])

Your name is Pavel, and you mentioned that your favorite color is blue.
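The preloaded messages do not have to be hard-coded. A minimal sketch, assuming the messages are stored as a JSON list of `{"role": ..., "content": ...}` objects (a hypothetical format). The hypothetical helper `replay` walks the records in their original order and dispatches each one to a callback chosen by its role.

```python
import json

# Hypothetical export from a file or database with messages to preload.
raw = """
[
  {"role": "user", "content": "My name is Pavel"},
  {"role": "user", "content": "My favorite color is blue"}
]
"""

def replay(records, add_user, add_ai):
    # Replay messages in their original order, dispatching by role.
    for record in records:
        if record["role"] == "user":
            add_user(record["content"])
        elif record["role"] == "assistant":
            add_ai(record["content"])
        else:
            raise ValueError(f"unknown role: {record['role']}")

records = json.loads(raw)

# Offline demo: collect the calls in a list instead of a ChatMessageHistory.
replayed = []
replay(records,
       add_user=lambda text: replayed.append(("user", text)),
       add_ai=lambda text: replayed.append(("ai", text)))
print(replayed)
```

In the notebook, the same function could fill the history object directly: `replay(records, history.add_user_message, history.add_ai_message)`. Passing the methods as callbacks keeps the loading logic independent of the memory class.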


## Other memory types

**ConversationSummaryBufferMemory** (https://python.langchain.com/v0.1/docs/modules/memory/types/summary_buffer/) combines the two ideas above. It keeps a buffer of recent interactions in memory, but rather than completely discarding old interactions, it compiles them into a summary and uses both. It uses token length rather than the number of interactions to determine when to flush interactions.
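The interplay of buffer and summary can be sketched as follows. Two hypothetical helpers stand in for the real components: `count_tokens` approximates tokens by whitespace-separated words (a real implementation uses the model's tokenizer), and `summarize_lines` just joins text instead of calling an LLM. The parameter name `max_token_limit` mirrors the one used by the LangChain class, but the class `SummaryBufferMemory` below is a simplified, hypothetical illustration.

```python
def count_tokens(text: str) -> int:
    # Hypothetical token counter: one token per whitespace-separated word.
    return len(text.split())

def summarize_lines(summary: str, lines: list[str]) -> str:
    # Hypothetical summarizer; a real implementation would ask an LLM to
    # merge the old summary with the lines being flushed.
    return " ".join([summary, *lines]).strip()

class SummaryBufferMemory:
    def __init__(self, max_token_limit: int):
        self.max_token_limit = max_token_limit
        self.summary = ""
        self.buffer: list[str] = []

    def save_context(self, human: str, ai: str) -> None:
        self.buffer += [f"Human: {human}", f"AI: {ai}"]
        # Flush the oldest messages into the summary until the buffer fits.
        flushed = []
        while sum(count_tokens(line) for line in self.buffer) > self.max_token_limit:
            flushed.append(self.buffer.pop(0))
        if flushed:
            self.summary = summarize_lines(self.summary, flushed)

    def history(self) -> str:
        # The prompt gets the summary of old turns plus recent turns verbatim.
        return "\n".join([f"Summary: {self.summary}", *self.buffer])

sb = SummaryBufferMemory(max_token_limit=8)
sb.save_context("My name is Pavel", "Nice to meet you")
sb.save_context("What is my name?", "Pavel")
print(sb.history())
```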

**ConversationTokenBufferMemory** (https://python.langchain.com/v0.1/docs/modules/memory/types/token_buffer/) keeps a buffer of recent interactions in memory and uses token length rather than the number of interactions to determine when to flush interactions.
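A token buffer keeps recent messages verbatim and simply discards the oldest ones once the total token count exceeds the limit. In this sketch `count_tokens` is a hypothetical word-based approximation; a real implementation would count tokens with the model's tokenizer (for example with `tiktoken`, installed at the top of this notebook). The class `TokenBufferMemory` is hypothetical.

```python
from collections import deque

def count_tokens(text: str) -> int:
    # Hypothetical token counter: one token per whitespace-separated word.
    return len(text.split())

class TokenBufferMemory:
    def __init__(self, max_token_limit: int):
        self.max_token_limit = max_token_limit
        self.messages: deque[str] = deque()

    def save_context(self, human: str, ai: str) -> None:
        self.messages.extend([f"Human: {human}", f"AI: {ai}"])
        # Drop the oldest messages until the total fits into the limit.
        while sum(count_tokens(m) for m in self.messages) > self.max_token_limit:
            self.messages.popleft()

    def history(self) -> str:
        return "\n".join(self.messages)

tb = TokenBufferMemory(max_token_limit=8)
tb.save_context("My name is Pavel", "Nice to meet you")
tb.save_context("What is my name?", "Pavel")
print(tb.history())
```

Unlike a window of k interactions, a token limit bounds the prompt size directly, so a few long messages and many short ones consume the same budget.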