# Understanding Memory in LLMs

In the previous Notebook 03, we successfully explored how OpenAI models can enhance the results from Azure Cognitive Search. [Bing Chat](http://chat.bing.com/) is a search engine with a GPT-4 model that utilizes the content of search results to provide context and deliver accurate responses to queries.

However, we have yet to discover how to engage in a conversation with the LLM. With Bing Chat, this is possible, as the LLM can understand and reference the previous responses.

There is a common misconception that GPT models have memory. This is not true. While they possess knowledge, they do not retain information from previous questions asked to them.

The aim of this Notebook is to demonstrate how we can "provide memory" to the LLM by utilizing prompts and context.

In [1]:
import os
import random
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import ConversationChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
from langchain.memory import ConversationBufferMemory
from openai.error import OpenAIError
from langchain.docstore.document import Document
from langchain.memory import CosmosDBChatMessageHistory

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from common.utils import (
    get_search_results,
    order_search_results,
    model_tokens_limit,
    num_tokens_from_docs,
    embed_docs,
    search_docs,
    get_answer,
)

from common.prompts import COMBINE_QUESTION_PROMPT, COMBINE_PROMPT, COMBINE_CHAT_PROMPT

from dotenv import load_dotenv
load_dotenv("credentials2.env")

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_BASE"] = os.environ["AZURE_OPENAI_ENDPOINT"]
os.environ["OPENAI_API_KEY"] = os.environ["AZURE_OPENAI_API_KEY"]
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]
os.environ["OPENAI_API_TYPE"] = "azure"
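
Because the cell above indexes `os.environ` directly, a variable missing from the `.env` file only shows up as a bare `KeyError`. The following is a minimal sketch of a pre-flight check, assuming the three variable names used above; the helper `missing_env_vars` is hypothetical and is not part of `common/utils.py`.

```python
import os

# Variable names taken from the cell above.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
]

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Report anything that is missing before trying to build the LLM client.
missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```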

### Let's start with the basics
Let's use a very simple example to see whether the Azure OpenAI GPT model has memory. We will again use LangChain to simplify our code.

In [3]:
QUESTION = "Tell me some use cases for reinforcement learning?"
FOLLOW_UP_QUESTION = "Can you summarize your last response?"

In [4]:
# Define model
MODEL = "chatgpt"
# Create an OpenAI instance
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5)

In [5]:
# We create a very simple prompt template, just the question as is:
prompt = PromptTemplate(
    input_variables=["question"],
    template="{question}",
)

chain = LLMChain(llm=llm, prompt=prompt)

In [6]:
# Let's see what the GPT model responds
response = chain.run(QUESTION)
printmd(response)

1. Game playing: Reinforcement learning has been successfully used to train agents to play games such as chess, Go, and poker.

2. Robotics: Reinforcement learning can be used to train robots to perform complex tasks such as grasping objects, walking, and navigating through an environment.

3. Autonomous vehicles: Reinforcement learning can be used to train autonomous vehicles to make decisions such as when to accelerate or brake, and how to avoid obstacles.

4. Personalized recommendations: Reinforcement learning can be used to personalize recommendations for products or services based on a user's behavior and preferences.

5. Healthcare: Reinforcement learning can be used to optimize treatment plans for patients with chronic diseases such as diabetes or cancer.

6. Finance: Reinforcement learning can be used to optimize investment strategies and portfolio management.

7. Advertising: Reinforcement learning can be used to optimize ad targeting and placement to maximize engagement and conversions.

8. Energy management: Reinforcement learning can be used to optimize energy consumption in buildings and reduce costs.

9. Supply chain management: Reinforcement learning can be used to optimize inventory management and logistics operations.

10. Agriculture: Reinforcement learning can be used to optimize crop yields and reduce waste.

In [7]:
#Now let's ask a follow up question
chain.run(FOLLOW_UP_QUESTION)

"I'm sorry, as an AI language model, I do not have access to your previous conversation history. Please provide me with more context or information so that I can assist you better."

As you can see, it doesn't remember what it just responded. This proves that the LLM does NOT have memory and that we need to supply it ourselves, by passing the conversation history as part of the prompt, like this:

In [8]:
# The prompt template is updated to include the history
hist_prompt = PromptTemplate(
    input_variables=["history", "question"],
    template="""
                {history}
                Human: {question}
                AI:
            """
    )
chain = LLMChain(llm=llm, prompt=hist_prompt)

In [9]:
# here we define history as being the question and response
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response)

In [10]:
# we now tell the chain to use the history to answer the follow up question
chain.run({"history":Conversation_history, "question": FOLLOW_UP_QUESTION})

'Reinforcement learning has various use cases, including game playing, robotics, autonomous vehicles, personalized recommendations, healthcare, finance, advertising, energy management, supply chain management, and agriculture. It can be used to optimize various tasks and processes, ranging from investment strategies to crop yields.'

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context on every call.
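
The pattern of keeping the history and passing it back on every call can be sketched as a small loop. This is an illustrative sketch only: `format_history`, `build_prompt`, `chat_turn` and `fake_llm` are hypothetical names, and `fake_llm` is an offline stand-in for the real model so the example runs without credentials.

```python
from typing import Callable, List, Tuple

def format_history(turns: List[Tuple[str, str]]) -> str:
    """Render past (question, answer) pairs in the Human/AI layout used above."""
    return "\n".join(f"Human: {q}\nAI: {a}" for q, a in turns)

def build_prompt(turns: List[Tuple[str, str]], question: str) -> str:
    """Prepend the conversation so far to the new question."""
    history = format_history(turns)
    prefix = history + "\n" if history else ""
    return f"{prefix}Human: {question}\nAI:"

def chat_turn(ask_llm: Callable[[str], str],
              turns: List[Tuple[str, str]],
              question: str) -> str:
    """Send history + question to the model, then record the new turn."""
    answer = ask_llm(build_prompt(turns, question))
    turns.append((question, answer))
    return answer

def fake_llm(prompt: str) -> str:
    """Stand-in model: reports how many human messages it can see in the prompt."""
    return f"(model saw {prompt.count('Human:')} human message(s))"

turns: List[Tuple[str, str]] = []
chat_turn(fake_llm, turns, "Tell me some use cases for reinforcement learning?")
print(chat_turn(fake_llm, turns, "Can you summarize your last response?"))
```

The state lives entirely in `turns`, outside the model. Swapping `fake_llm` for any callable that sends a string to the real model would leave the rest of the loop unchanged.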

## Now that we understand the concept of memory via adding history as a context, let's go back to our GPT Smart Search engine

To avoid duplicating code, we have moved much of the code used in Notebook 03 into functions. These functions live in the `common/utils.py` and `common/prompts.py` files, so we can reuse them in the app that we will build later.

In [11]:
index1_name = "0137752-index-files"
index2_name = "0137752-index-csv"
indexes = [index1_name, index2_name]

agg_search_results = get_search_results(QUESTION, indexes)
ordered_results = order_search_results(agg_search_results, reranker_threshold=1)

In [12]:
docs = []
for key,value in ordered_results.items():
    for page in value["chunks"]:
        docs.append(Document(page_content=page, metadata={"source": value["location"]}))

# Calculate number of tokens of our docs
tokens_limit = model_tokens_limit(MODEL)

if docs:
    num_tokens = num_tokens_from_docs(docs)
    print("Custom token limit for", MODEL, ":", tokens_limit)
    print("Combined docs tokens count:",num_tokens)
else:
    # Define num_tokens so the next cell does not fail with a NameError
    num_tokens = 0
    print("NO RESULTS FROM AZURE SEARCH")


Custom token limit for chatgpt : 3000
Combined docs tokens count: 237


In [13]:
%%time
if num_tokens > tokens_limit:
    index = embed_docs(docs)
    top_docs = search_docs(index,QUESTION)
    
    # Now we need to recalculate the tokens count of the top results from similarity vector search
    # in order to select the chain type: stuff or map_reduce
    
    num_tokens = num_tokens_from_docs(top_docs)   
    print("Token count after similarity search:", num_tokens)
    chain_type = "map_reduce" if num_tokens > tokens_limit else "stuff"
    
else:
    # if total tokens is less than our limit, we don't need to vectorize and do similarity search
    top_docs = docs
    chain_type = "stuff"
    
print("Chain Type selected:", chain_type)

Chain Type selected: stuff
CPU times: total: 0 ns
Wall time: 0 ns
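
The two-stage decision in the cell above (use everything if it fits, otherwise narrow the docs and check again) can be isolated into one function. This is a sketch under stated assumptions: `choose_docs_and_chain` is a hypothetical helper, `count_tokens` stands in for `num_tokens_from_docs`, and `narrow` stands in for the `embed_docs` + `search_docs` step. The word-count tokenizer and "keep the first doc" narrowing below are crude placeholders, not what `common/utils.py` does.

```python
from typing import Callable, List, Tuple

def choose_docs_and_chain(
    docs: List[str],
    tokens_limit: int,
    count_tokens: Callable[[List[str]], int],
    narrow: Callable[[List[str]], List[str]],
) -> Tuple[List[str], str]:
    """Return the docs to send to the LLM and the chain type to use."""
    # Everything fits in a single prompt: no similarity search needed.
    if count_tokens(docs) <= tokens_limit:
        return docs, "stuff"
    # Too large: keep only the most relevant docs, then re-check the size.
    top_docs = narrow(docs)
    chain_type = "map_reduce" if count_tokens(top_docs) > tokens_limit else "stuff"
    return top_docs, chain_type

# Placeholder implementations so the sketch runs on its own.
def word_count(docs: List[str]) -> int:
    return sum(len(d.split()) for d in docs)

def keep_first(docs: List[str]) -> List[str]:
    return docs[:1]

selected, chain_kind = choose_docs_and_chain(
    ["short doc", "another short doc"], 3000, word_count, keep_first
)
print("Chain Type selected:", chain_kind)
```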


In [14]:
# Get the answer
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type)
response['output_text']

'One use case for reinforcement learning is scheduling and rescheduling production tasks for a mask manufacturer during an emergency response, as demonstrated in a study using an end-to-end neural network trained by reinforcement learning. The study showed that the neural network scheduler can solve problem instances with hundreds of tasks within seconds and achieve significantly better results than existing constructive heuristics. \nSOURCES: https://demodatasetsp.blob.core.windows.net/litcovid/train.csv'

And if we ask the follow up question:

In [15]:
response = get_answer(llm=llm, docs=top_docs,  query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type)
response['output_text']

'The article proposes an end-to-end neural network for scheduling mask production tasks in real-time during the COVID-19 pandemic. The network is trained using reinforcement learning and can solve instances with hundreds of tasks within seconds, outperforming existing heuristics. \nSOURCES: https://demodatasetsp.blob.core.windows.net/litcovid/train.csv'

So far we have the same thing as in Notebook 03: results from Azure Search enhanced by an OpenAI model, with no memory. The follow-up question was answered from the documents alone, not from the previous response.

**Now let's add memory to it:**

Reference: https://python.langchain.com/en/latest/modules/memory/examples/adding_memory_chain_multiple_inputs.html

In [16]:
# Memory object, which is necessary to track the inputs/outputs and hold a conversation.
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question")

response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
response['output_text']

'One use case for reinforcement learning is scheduling and rescheduling production tasks in real-time. For example, a medical mask manufacturer during the peak of COVID-19 in China used an end-to-end neural network trained by reinforcement learning to efficiently schedule emergency production tasks. The network took a sequence of production tasks as inputs and produced a schedule of tasks in real-time, with negative total tardiness as the reward signal. The neural network scheduler was able to solve problem instances with hundreds of tasks within seconds and achieved significantly better results than existing constructive heuristics. Source: https://demodatasetsp.blob.core.windows.net/litcovid/train.csv'

In [17]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
response['output_text']

'In my last response, I provided an example of a use case for reinforcement learning, which is scheduling and rescheduling production tasks in real-time. Specifically, I mentioned a medical mask manufacturer in China during the peak of COVID-19 that used an end-to-end neural network trained by reinforcement learning to efficiently schedule emergency production tasks. The network took a sequence of production tasks as inputs and produced a schedule of tasks in real-time, with negative total tardiness as the reward signal. The neural network scheduler was able to solve problem instances with hundreds of tasks within seconds and achieved significantly better results than existing constructive heuristics. \nSOURCES: https://demodatasetsp.blob.core.windows.net/litcovid/train.csv'

In [18]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
response['output_text']

'In this document, the authors propose an end-to-end neural network trained by reinforcement learning to schedule and reschedule production tasks for a medical mask manufacturer during the peak of COVID-19 in China. The network takes a sequence of production tasks as inputs and produces a schedule of tasks in real-time, with negative total tardiness as the reward signal. Computational results show that the neural network scheduler can solve problem instances with hundreds of tasks within seconds and achieved significantly better results than existing constructive heuristics. Source: https://demodatasetsp.blob.core.windows.net/litcovid/train.csv'

You might get a different answer from the cell above, and that is OK: this bot is not yet configured to handle inputs that are unrelated to its knowledge base, including greetings and thanks.

Let's check our memory to confirm that it is keeping the conversation:

In [19]:
memory.buffer

'Human: Tell me some use cases for reinforcement learning?\nAI: One use case for reinforcement learning is scheduling and rescheduling production tasks in real-time. For example, a medical mask manufacturer during the peak of COVID-19 in China used an end-to-end neural network trained by reinforcement learning to efficiently schedule emergency production tasks. The network took a sequence of production tasks as inputs and produced a schedule of tasks in real-time, with negative total tardiness as the reward signal. The neural network scheduler was able to solve problem instances with hundreds of tasks within seconds and achieved significantly better results than existing constructive heuristics. Source: https://demodatasetsp.blob.core.windows.net/litcovid/train.csv\nHuman: Can you summarize your last response?\nAI: In my last response, I provided an example of a use case for reinforcement learning, which is scheduling and rescheduling production tasks in real-time. Specifically, I ment
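
To make the buffer above less of a black box, here is a minimal plain-Python sketch of what a conversation buffer does: it stores each turn and renders the turns as one `Human:`/`AI:` string. `SimpleBufferMemory` is a hypothetical toy class written for illustration; it is not LangChain's `ConversationBufferMemory`, and the answers used below are placeholders.

```python
class SimpleBufferMemory:
    """Toy stand-in for a conversation buffer: stores turns, renders them as text."""

    def __init__(self, human_prefix="Human", ai_prefix="AI"):
        self.human_prefix = human_prefix
        self.ai_prefix = ai_prefix
        self.messages = []  # list of (role, text) tuples, in arrival order

    def save_context(self, question, answer):
        """Record one question/answer exchange."""
        self.messages.append((self.human_prefix, question))
        self.messages.append((self.ai_prefix, answer))

    @property
    def buffer(self):
        """The whole conversation as a single newline-separated string."""
        return "\n".join(f"{role}: {text}" for role, text in self.messages)

toy_memory = SimpleBufferMemory()
toy_memory.save_context("Tell me some use cases for reinforcement learning?", "(first answer)")
toy_memory.save_context("Can you summarize your last response?", "(second answer)")
print(toy_memory.buffer)
```

The string returned by `buffer` is what ends up in the prompt as chat history, which is why the follow-up question can now be answered.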

## Using CosmosDB as persistent memory

In the previous cells we added local in-memory (RAM) storage to our chatbot. However, it is not persistent: it is deleted once the user's session ends. We therefore need a database to persistently store each user's conversations, not only for analytics and auditing, but also if we wish to provide recommendations.

Here we will store the conversation history in CosmosDB for future auditing purposes.
We will use LangChain's `CosmosDBChatMessageHistory` class; see [HERE](https://python.langchain.com/en/latest/_modules/langchain/memory/chat_message_histories/cosmos_db.html)
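
The idea of a persistent, per-session history can be sketched without any cloud resources. The example below deliberately swaps CosmosDB for SQLite from the Python standard library; `SqliteChatHistory` is a hypothetical toy class and its methods only loosely mirror the session/user keying used here, not the actual LangChain API.

```python
import os
import sqlite3
import tempfile

class SqliteChatHistory:
    """Toy persistent chat history keyed by (user_id, session_id)."""

    def __init__(self, db_path, session_id, user_id):
        self.session_id = session_id
        self.user_id = user_id
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "user_id TEXT, session_id TEXT, role TEXT, content TEXT)"
        )
        self.conn.commit()

    def add_message(self, role, content):
        """Append one message to this user's session and commit it to disk."""
        self.conn.execute(
            "INSERT INTO messages (user_id, session_id, role, content) "
            "VALUES (?, ?, ?, ?)",
            (self.user_id, self.session_id, role, content),
        )
        self.conn.commit()

    def load_messages(self):
        """Return this session's messages as (role, content) tuples, oldest first."""
        rows = self.conn.execute(
            "SELECT role, content FROM messages "
            "WHERE user_id = ? AND session_id = ? ORDER BY id",
            (self.user_id, self.session_id),
        )
        return rows.fetchall()

    def close(self):
        self.conn.close()

# Throwaway database file for the demo.
db_path = os.path.join(tempfile.mkdtemp(), "chat_history.db")

first = SqliteChatHistory(db_path, session_id="session-1", user_id="user-1")
first.add_message("human", "Tell me some use cases for reinforcement learning?")
first.add_message("ai", "(model answer would go here)")
first.close()

# A new object, as after an app restart, still sees the stored turns.
second = SqliteChatHistory(db_path, session_id="session-1", user_id="user-1")
print(second.load_messages())
second.close()
```

Unlike the in-RAM buffer, the conversation survives the first object being closed, which is the property we want from CosmosDB in the cells that follow.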

In [20]:
# Create CosmosDB instance from langchain cosmos class.
cosmos = CosmosDBChatMessageHistory(
    cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
    cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
    cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
    connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
    session_id="Agent-Test-Session" + str(random.randint(1, 1000)),
    user_id="Agent-Test-User" + str(random.randint(1, 1000))
    )

# prepare the cosmosdb instance
cosmos.prepare_cosmos()

In [21]:
# Create our Memory Object
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question",chat_memory=cosmos)

In [22]:
# Testing using our Question
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
response['output_text']

'One use case for reinforcement learning is in scheduling production tasks, such as in the case of a medical mask manufacturer during the COVID-19 outbreak. An end-to-end neural network can be trained using reinforcement learning, with the negative total tardiness as the reward signal, to produce a schedule of tasks in real-time. This approach was successfully applied to schedule emergency production tasks for a medical mask manufacturer in China, with the neural network scheduler able to solve problem instances with hundreds of tasks within seconds. Source: https://demodatasetsp.blob.core.windows.net/litcovid/train.csv'

In [23]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
response['output_text']

'In my last response, I mentioned that one use case for reinforcement learning is in scheduling production tasks, such as in the case of a medical mask manufacturer during the COVID-19 outbreak. The manufacturer can use an end-to-end neural network trained using reinforcement learning to produce a schedule of tasks in real-time, with the negative total tardiness as the reward signal. This approach was successfully applied to schedule emergency production tasks for a medical mask manufacturer in China, with the neural network scheduler able to solve problem instances with hundreds of tasks within seconds. Source: https://demodatasetsp.blob.core.windows.net/litcovid/train.csv'

In [24]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
response['output_text']

'The paper proposes an end-to-end neural network trained using reinforcement learning to schedule production tasks for a medical mask manufacturer during the COVID-19 outbreak. The network takes a sequence of tasks as inputs and produces a real-time schedule, with the negative total tardiness as the reward signal. The approach was successfully applied in China, with the neural network scheduler able to solve problem instances with hundreds of tasks within seconds. The objective function value obtained by the neural network scheduler is significantly better than those of existing constructive heuristics and close to those of the state-of-the-art metaheuristics. Source: https://demodatasetsp.blob.core.windows.net/litcovid/train.csv'

Let's check our Azure CosmosDB to see the whole conversation


In [25]:
#load message from cosmosdb
cosmos.load_messages()
cosmos.messages

[HumanMessage(content='Tell me some use cases for reinforcement learning?', additional_kwargs={}, example=False),
 AIMessage(content='One use case for reinforcement learning is in scheduling production tasks, such as in the case of a medical mask manufacturer during the COVID-19 outbreak. An end-to-end neural network can be trained using reinforcement learning, with the negative total tardiness as the reward signal, to produce a schedule of tasks in real-time. This approach was successfully applied to schedule emergency production tasks for a medical mask manufacturer in China, with the neural network scheduler able to solve problem instances with hundreds of tasks within seconds. Source: https://demodatasetsp.blob.core.windows.net/litcovid/train.csv', additional_kwargs={}, example=False),
 HumanMessage(content='Can you summarize your last response?', additional_kwargs={}, example=False),
 AIMessage(content='In my last response, I mentioned that one use case for reinforcement learning 

![CosmosDB Memory](./images/cosmos-chathistory.png)

# Summary
##### Adding memory to our application allows the user to have a conversation. However, this feature does not come with the LLM; memory is something that we must provide to the LLM as context alongside the question.

We added persistent memory using CosmosDB.

We can also see that the chain we are using is smart, but only up to a point. Although we have given it memory, it searches for similar docs every time, and it struggles to respond to prompts like: Hello, Thank you, Bye, What's your name, What's the weather, and any other task that is not a search in the knowledge base.



# NEXT
We now know how to build a Smart Search Engine that can power a chatbot. Great!

But does this cover all the scenarios that a virtual assistant will require? **What if the answer to a Smart Search Engine query is not in text, but instead requires looking into tabular data?** The next Notebook 05 explains and solves the tabular problem and introduces the concept of Agents.