# More on chat memory

First part of the chain: set up the API key (on Azure in this example). The cell also prints the API base URL, just to verify that the .env file was loaded correctly.

Remember that you need a .env file with the following content:

```
OPENAI_API_KEY=xxxxxxxxx
OPENAI_API_BASE=https://alkopenai2.openai.azure.com/
HUGGINGFACEHUB_API_TOKEN=xxxxxxx
PINECONE_KEY=xxxxxxx
PINECONE_ENV=us-west1-gcp-free
SERPAPI_API_KEY=xxxxx
```
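If a variable is missing, `os.getenv` silently returns `None` and the error shows up only later, on the first call to the service. Below is a minimal sketch of a check you could run right after `load_dotenv`. The helper `missing_env_vars` is hypothetical (it is not part of python-dotenv or LangChain), and the variable names are simply the ones listed in the .env example above.

```python
import os
from typing import Iterable, List, Mapping, Optional

# variable names copied from the .env example above
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "OPENAI_API_BASE",
    "HUGGINGFACEHUB_API_TOKEN",
    "PINECONE_KEY",
    "PINECONE_ENV",
    "SERPAPI_API_KEY",
]


def missing_env_vars(required: Iterable[str],
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Missing from .env: {missing}")
```

Passing `env` explicitly makes the check easy to test without touching the real environment.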

In [None]:
import os
import openai
from pprint import pprint

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_type = "azure"
openai.api_version = "2023-03-15-preview"

# Remember that you need to set the OPENAI_API_BASE to point openai to your specific deployment.

openai.api_base = os.getenv('OPENAI_API_BASE')
openai.api_key = os.getenv("OPENAI_API_KEY")

# print the base URL to verify you are pointing to the right deployment
print(openai.api_base)


As always, I create several LLM instances.

In [None]:
from langchain.chat_models import AzureChatOpenAI
from langchain.llms import AzureOpenAI

# Note that gpt35 and gpt4 are chat-based LLMs
gpt35 = AzureChatOpenAI(
    openai_api_version="2023-03-15-preview",
    deployment_name="Gpt35",
    model_name="gpt-35-turbo"
)

gpt4 = AzureChatOpenAI(
    openai_api_version="2023-03-15-preview",
    deployment_name="Gpt4",
    model_name="gpt-4"
)

# davinci is not a chat model: it is a text (completion) model
davinci = AzureOpenAI(
    temperature=0,
    openai_api_version="2023-03-15-preview",
    deployment_name="text-davinci-003",
    model_name="text-davinci-003"
)

In [None]:
# a simple helper: sends one question, then prints the token usage/cost and the answer
from langchain.callbacks import get_openai_callback

def conversate(conversation, question):
    with get_openai_callback() as cb:
        result = conversation.predict(input=question)
        print(cb)
        print(result)

This time we use a different memory, `ConversationBufferWindowMemory`, which takes a parameter `k` that limits how many of the most recent interactions (a human message plus the AI answer) are sent with each request.
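To make the idea concrete, here is a simplified pure-Python sketch of a window memory. `WindowMemorySketch` is a hypothetical class that only mimics the behavior described above; it takes plain strings instead of the dicts that LangChain's `save_context` accepts. Everything is stored, but only the last `2*k` messages (the last k exchanges) are returned when the history is read.

```python
from typing import Dict, List, Tuple


class WindowMemorySketch:
    """Hypothetical stand-in that mimics a k-exchange window memory."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.messages: List[Tuple[str, str]] = []  # full history, never trimmed

    def save_context(self, user_input: str, ai_output: str) -> None:
        # one exchange = one human message + one AI message
        self.messages.append(("Human", user_input))
        self.messages.append(("AI", ai_output))

    def load_memory_variables(self) -> Dict[str, str]:
        # the window is applied only here, at read time
        window = self.messages[-2 * self.k:] if self.k > 0 else []
        return {"history": "\n".join(f"{role}: {text}" for role, text in window)}

    def clear(self) -> None:
        self.messages = []


sketch_memory = WindowMemorySketch(k=1)
sketch_memory.save_context("Hi my name is Gian Maria", "Nice to meet you")
sketch_memory.save_context("What is 10+34?", "44")
print(len(sketch_memory.messages))            # 4: everything is stored
print(sketch_memory.load_memory_variables())  # {'history': 'Human: What is 10+34?\nAI: 44'}
```

The variable is named `sketch_memory` so that running the sketch inside the notebook does not overwrite the real `memory` object.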

In [None]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferWindowMemory
from pprint import pprint

memory = ConversationBufferWindowMemory(k = 3) # k is the size of the window

conversation = ConversationChain(
    llm=gpt35,
    memory=memory,
    verbose=True  # prints the full prompt sent to the LLM; set it to False for a quieter output
)

# the memory starts empty: no interaction has been saved yet
pprint(memory.load_memory_variables({}))

result = conversation.predict(input="Hi my name is Gian Maria, what is your name?")
pprint(result)
pprint("--------------------------")
result = conversation.predict(input="What is my name?")
pprint(result)

This kind of memory stores the whole conversation, but during the actual call it sends only the last k interactions to the LLM.
If you keep talking to the LLM, after a few interactions the original context falls out of the window and the model no longer knows my name.
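The arithmetic behind the lost name, as a small sketch (`visible_exchanges` is a hypothetical helper): with k = 3 only the three most recent exchanges are sent, so the two exchanges that mention the name fall outside the window.

```python
from typing import List


def visible_exchanges(total_exchanges: int, k: int) -> List[int]:
    """1-based indices of the saved exchanges that fit in a window of size k."""
    if k <= 0:
        return []
    first = max(1, total_exchanges - k + 1)
    return list(range(first, total_exchanges + 1))


# Exchanges saved before the final "What is my name?" in the next cell:
# 1 name introduction, 2 name question, 3 sum, 4 power, 5 logarithm
print(visible_exchanges(5, 3))  # [3, 4, 5]: exchanges 1 and 2 are outside the window
```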

In [None]:

result = conversation.predict(input = "Can you do some basic math like 10+34?")
pprint(result)
pprint("--------------------------")
result = conversation.predict(input = "Now 10 raised to the power of two?")
pprint(result)
pprint("--------------------------")
result = conversation.predict(input = "Natural Logarithm of 100?")
pprint(result)
pprint("--------------------------")
result = conversation.predict(input= "What is my name?")
pprint(result)


As you can verify, when you talk with the LLM only the last k interactions (here k = 3) are passed. If you dump the content of the memory, however, you can see that it still contains the whole conversation.

In [None]:
pprint(f"memory buffer length is: {len(memory.buffer)}")
pprint(f"memory.chat_memory.messages length is: {len(memory.chat_memory.messages)}")

pprint(f"memory.k is: {memory.k}")
# load_memory_variables is the method that builds the content sent to the LLM, applying the window of k interactions
pprint(memory.load_memory_variables({}))

Since the window is applied only when the history is read, we can change k whenever we want to widen or narrow the window over the stored conversation.
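Because the window is just a slice taken at read time, changing k takes effect immediately. Here is a small sketch with made-up messages. It also shows a Python pitfall: `msgs[-0:]` is the whole list, so a window of k = 0 needs an explicit check. A k larger than the history (such as 100) simply returns everything.

```python
from typing import List

# made-up history: two exchanges, four messages
sample_messages = [
    "Human: Hi my name is Gian Maria",
    "AI: Nice to meet you",
    "Human: What is 10+34?",
    "AI: 44",
]


def window(msgs: List[str], k: int) -> List[str]:
    """Last k exchanges (2*k messages); explicit guard because msgs[-0:] == msgs."""
    return msgs[-2 * k:] if k > 0 else []


print(window(sample_messages, 1))         # ['Human: What is 10+34?', 'AI: 44']
print(window(sample_messages, 100))       # larger than the history: all 4 messages
print(sample_messages[-0:] == sample_messages)  # True: the pitfall the k > 0 guard avoids
```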

In [None]:
memory.k = 1  # window of a single interaction
print(memory.load_memory_variables({}))
memory.k = 2  # window of two interactions
print(memory.load_memory_variables({}))


So I can set a large k to send the whole conversation:

In [None]:
memory.k = 100
conversate(conversation, "What is my name? Please find this information from the beginning of the chat")

In [None]:
memory.clear()  # removes every stored message, so the name is no longer available to the model
conversate(conversation, "what is my name?")

Let's recap how the `ConversationBufferWindowMemory` class works.

In [None]:
test_memory = ConversationBufferWindowMemory(k=2)
test_memory.save_context({"input" : "hello my name is Gian Maria"}, {"output" : "I'm an AI model and I'm here to help you"})
test_memory.save_context({"input" : "What is my name?"}, {"output" : "Your name is Gian Maria"})
test_memory.save_context({"input" : "Can you tell me the result of 1+1?"}, {"output" : "Sure the result is 2"})

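print(f"Buffer property has len: {len(test_memory.buffer)}")
# load_memory_variables returns a dict with a single "history" key, so its len() is always 1
print(f"load_memory_variables returns the keys: {list(test_memory.load_memory_variables({}).keys())}")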

print(f"load_memory_variables k = 2 => {test_memory.load_memory_variables({})}")
test_memory.k = 1
print(f"load_memory_variables with k = 1 => {test_memory.load_memory_variables({})}")


# Memory that keeps track of used tokens

Keeping the latest k messages is not always the smartest choice: messages can have very different lengths, and the real HARD limit of an LLM is the number of tokens it accepts. Sometimes it is better to limit the memory by tokens.
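Here is a minimal sketch of token-based trimming: drop the oldest messages until the total fits the limit. The word-count tokenizer is a crude stand-in assumption. Real counts come from the model's tokenizer (in the next cell LangChain counts them through the `llm` passed to the memory) and will differ. `trim_to_token_limit` and `word_count` are hypothetical helpers.

```python
from typing import Callable, List


def trim_to_token_limit(messages: List[str], max_tokens: int,
                        count_tokens: Callable[[str], int]) -> List[str]:
    """Drop the oldest messages until the total token count fits max_tokens."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # the oldest message goes first
    return kept


def word_count(text: str) -> int:
    # crude assumption: one token per whitespace-separated word
    return len(text.split())


sample_history = [
    "Human: hello my name is Gian Maria",            # 7 "tokens"
    "AI: I'm an AI model and I'm here to help you",  # 11 "tokens"
    "Human: lets play",                              # 3 "tokens"
]
print(trim_to_token_limit(sample_history, 15, word_count))  # the first message is dropped
```

Trimming whole messages rather than cutting text mid-message keeps every remaining message readable.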

In [117]:
from langchain.memory import ConversationTokenBufferMemory
memory = ConversationTokenBufferMemory(
    llm=davinci,  # the LLM is used to count tokens, so we need to specify the model
    max_token_limit=300)

memory.save_context({"input" : "hello my name is Gian Maria"}, {"output" : "I'm an AI model and I'm here to help you"})
print(f"We have {memory.llm.get_num_tokens_from_messages(memory.buffer)} tokens in memory")

memory.save_context({"input" : "What is my name?"}, {"output" : "Your name is Gian Maria"})
memory.save_context({"input" : "can you tell me some lorem ipsum?"}, {"output" : "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum"})
print(f"We have {memory.llm.get_num_tokens_from_messages(memory.buffer)} tokens in memory")

print("Print the memory that will be passed to conversation")
print(memory.load_memory_variables({}))



We have 22 tokens in memory
We have 203 tokens in memory
Print the memory that will be passed to conversation
{'history': "Human: hello my name is Gian Maria\nAI: I'm an AI model and I'm here to help you\nHuman: What is my name?\nAI: Your name is Gian Maria\nHuman: can you tell me some lorem ipsum?\nAI: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum"}
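The `history` value above is plain text: each message is prefixed with its role and the messages are joined by newlines, which is why it can be fed even to a text model like davinci. The sketch below reproduces that layout. `format_history` is a hypothetical helper, and the `Human`/`AI` prefixes are taken from the output above.

```python
from typing import List, Tuple


def format_history(messages: List[Tuple[str, str]],
                   human_prefix: str = "Human", ai_prefix: str = "AI") -> str:
    """Join (role, text) pairs into 'Human: ...' / 'AI: ...' lines."""
    prefixes = {"human": human_prefix, "ai": ai_prefix}
    return "\n".join(f"{prefixes[role]}: {text}" for role, text in messages)


print(format_history([("human", "What is my name?"),
                      ("ai", "Your name is Gian Maria")]))
```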


Now try to lower the token limit: you will find that the content passed to the LLM does not change.

In [118]:
memory.max_token_limit = 20
print(f"Limit is now {memory.max_token_limit} tokens - actually we have {memory.llm.get_num_tokens_from_messages(memory.buffer)} tokens in memory")
print("Print the memory that will be passed to conversation")
print(memory.load_memory_variables({}))


Limit is now 20 tokens - actually we have 203 tokens in memory
Print the memory that will be passed to conversation
{'history': "Human: hello my name is Gian Maria\nAI: I'm an AI model and I'm here to help you\nHuman: What is my name?\nAI: Your name is Gian Maria\nHuman: can you tell me some lorem ipsum?\nAI: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum"}


In [119]:
pprint(memory.chat_memory)

ChatMessageHistory(messages=[HumanMessage(content='hello my name is Gian Maria', additional_kwargs={}, example=False), AIMessage(content="I'm an AI model and I'm here to help you", additional_kwargs={}, example=False), HumanMessage(content='What is my name?', additional_kwargs={}, example=False), AIMessage(content='Your name is Gian Maria', additional_kwargs={}, example=False), HumanMessage(content='can you tell me some lorem ipsum?', additional_kwargs={}, example=False), AIMessage(content='Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum', additional_kwargs={}, example=False)])


This happens because `max_token_limit` is not enforced when you read the buffer. It is enforced only when new content is saved, and it is a hard limit on what is kept in memory. Now that the limit is 20 tokens, as soon as we add more content the limit is enforced and the older messages are discarded.
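The prune-on-save behavior can be sketched as follows. `TokenBufferSketch` is hypothetical and counts one token per word as a crude assumption, so its numbers are not comparable to those of the real tokenizer. Lowering `max_token_limit` changes nothing until the next `save_context`, and whatever is pruned at that point is gone for good.

```python
from typing import Dict, List


class TokenBufferSketch:
    """Hypothetical token-limited buffer that prunes only when saving."""

    def __init__(self, max_token_limit: int) -> None:
        self.max_token_limit = max_token_limit
        self.messages: List[str] = []

    @staticmethod
    def count_tokens(text: str) -> int:
        # crude assumption: one token per whitespace-separated word
        return len(text.split())

    def total_tokens(self) -> int:
        return sum(self.count_tokens(m) for m in self.messages)

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.messages += [f"Human: {user_input}", f"AI: {ai_output}"]
        # the limit is enforced here, and only here
        while self.messages and self.total_tokens() > self.max_token_limit:
            self.messages.pop(0)  # pruned messages are discarded permanently

    def load_memory_variables(self) -> Dict[str, str]:
        # reading never prunes, even if max_token_limit was lowered
        return {"history": "\n".join(self.messages)}


sketch = TokenBufferSketch(max_token_limit=300)
sketch.save_context("hello my name is Gian Maria", "I'm an AI model and I'm here to help you")
sketch.max_token_limit = 10       # lowering the limit changes nothing yet
print(sketch.total_tokens())      # 18
sketch.save_context("lets play", "Sure cool I like play")
print(sketch.load_memory_variables())  # only the newest exchange survives
```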

In [120]:
memory.save_context({"input" : "lets play"}, {"output" : "Sure cool I like play"})
print(memory.load_memory_variables({}))
print(memory.buffer)

{'history': 'Human: lets play\nAI: Sure cool I like play'}
[HumanMessage(content='lets play', additional_kwargs={}, example=False), AIMessage(content='Sure cool I like play', additional_kwargs={}, example=False)]


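This is a **strong difference from the `ConversationBufferWindowMemory` class**: pruned messages are permanently removed from the chat history. You cannot change the number of tokens on the fly and get the older content back, as we did by raising k.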

In [121]:
pprint(memory.chat_memory)

ChatMessageHistory(messages=[HumanMessage(content='lets play', additional_kwargs={}, example=False), AIMessage(content='Sure cool I like play', additional_kwargs={}, example=False)])
