# Understanding Memory in LLMs

In the previous Notebook, we successfully explored how OpenAI models can enhance the results from Azure Cognitive Search. 

However, we have yet to discover how to engage in a conversation with the LLM. With [Bing Chat](http://chat.bing.com/), for example, this is possible, as it can understand and reference the previous responses.

There is a common misconception that GPT models have memory. This is not true. While they possess knowledge, they do not retain information from previous questions asked to them.

In this Notebook, our goal is to illustrate how we can effectively "endow the LLM with memory" by employing prompts and context.

In [1]:
import os
import random
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import ConversationChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
from langchain.memory import ConversationBufferMemory, ConversationTokenBufferMemory
from openai.error import OpenAIError
from langchain.embeddings import OpenAIEmbeddings
from langchain.docstore.document import Document
from langchain.memory import CosmosDBChatMessageHistory

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from common.utils import (
    get_search_results,
    order_search_results,
    update_vector_indexes,
    model_tokens_limit,
    num_tokens_from_docs,
    get_answer,
)

from common.prompts import COMBINE_QUESTION_PROMPT, COMBINE_PROMPT, COMBINE_CHAT_PROMPT

from dotenv import load_dotenv
load_dotenv("credentials.env")

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_BASE"] = os.environ["AZURE_OPENAI_ENDPOINT"]
os.environ["OPENAI_API_KEY"] = os.environ["AZURE_OPENAI_API_KEY"]
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]
os.environ["OPENAI_API_TYPE"] = "azure"

### Let's start with the basics
Let's use a very simple example to see whether the Azure OpenAI GPT model has memory. We will again use LangChain to simplify our code.

In [3]:
QUESTION = "Tell me some use cases for reinforcement learning?"
FOLLOW_UP_QUESTION = "Give me the main points of our conversation"

In [4]:
# Define model
MODEL = "gpt-35-turbo"
# Create an OpenAI instance
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=500)

In [5]:
# We create a very simple prompt template, just the question as is:
prompt = PromptTemplate(
    input_variables=["question"],
    template="{question}",
)

chain = LLMChain(llm=llm, prompt=prompt)

In [6]:
# Let's see what the GPT model responds
response = chain.run(QUESTION)
printmd(response)

Reinforcement learning (RL) has various applications across different domains. Here are some use cases for reinforcement learning:

1. Game Playing: RL has excelled in game playing tasks, such as AlphaGo and AlphaZero, where it has achieved superhuman performance in complex board games like Go, chess, and shogi.

2. Robotics: RL can be used to train robots for tasks like object manipulation, grasping, locomotion, and navigation in dynamic and uncertain environments.

3. Autonomous Vehicles: RL can help in training self-driving cars to make decisions in real-time, such as lane changing, merging, and navigating complex traffic situations.

4. Resource Management: RL can optimize resource allocation in areas like energy management, traffic control, and supply chain management, ensuring efficient utilization and minimizing costs.

5. Healthcare: RL can be applied in personalized medicine, treatment optimization, and clinical decision-making, helping doctors determine the best treatment plans for patients.

6. Recommender Systems: RL can be used to build recommendation engines that learn user preferences and make personalized recommendations for movies, music, products, and more.

7. Finance: RL can assist in portfolio management, algorithmic trading, and risk management by learning optimal investment strategies and adapting to market dynamics.

8. Industrial Control: RL can optimize control systems in manufacturing, energy, and process industries, improving efficiency, reducing energy consumption, and minimizing waste.

9. Natural Language Processing: RL can be applied to dialogue systems and conversational agents, enabling them to learn how to respond to user queries and carry out meaningful conversations.

10. Education: RL can be used to create intelligent tutoring systems that adapt to individual student needs, providing personalized instruction and feedback.

These are just a few examples, and RL has the potential to be applied in various other domains to solve complex decision-making problems.

In [7]:
#Now let's ask a follow up question
chain.run(FOLLOW_UP_QUESTION)

"I'm sorry, but I cannot recall the details of our conversation as I am an AI language model and do not have the capability to remember past interactions. However, if you provide me with specific information or questions, I'll be happy to assist you."

As you can see, it doesn't remember what it just responded. Sometimes it responds based only on the system prompt, or just randomly. This proves that the LLM does NOT have memory and that we need to supply it ourselves, passing the conversation history as part of the prompt, like this:

In [8]:
hist_prompt = PromptTemplate(
    input_variables=["history", "question"],
    template="""
                {history}
                Human: {question}
                AI:
            """
    )
chain = LLMChain(llm=llm, prompt=hist_prompt)

In [9]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response)

In [10]:
printmd(chain.run({"history":Conversation_history, "question": FOLLOW_UP_QUESTION}))

- Reinforcement learning (RL) has various applications across different domains.
- Some use cases for RL include game playing, robotics, autonomous vehicles, resource management, healthcare, recommender systems, finance, industrial control, natural language processing, and education.
- RL has achieved superhuman performance in game playing tasks like Go, chess, and shogi.
- RL can be used to train robots for tasks like object manipulation, grasping, locomotion, and navigation.
- RL can help in training self-driving cars to make real-time decisions in complex traffic situations.
- RL can optimize resource allocation in areas like energy management, traffic control, and supply chain management.
- RL can be applied in personalized medicine, treatment optimization, and clinical decision-making in healthcare.
- RL can be used to build recommendation engines for personalized recommendations.
- RL can assist in portfolio management, algorithmic trading, and risk management in finance.
- RL can optimize control systems in manufacturing, energy, and process industries.
- RL can be applied to dialogue systems and conversational agents in natural language processing.
- RL can be used to create intelligent tutoring systems for personalized instruction and feedback in education.

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context on every call.
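
As a minimal sketch of that idea, the loop below keeps a list of past turns and renders all of them into every new prompt. The helper names (`build_prompt`, `chat_turn`) and the `echo_model` function are hypothetical; `echo_model` only stands in for the real model call (`chain.run` in this notebook) so the example runs without credentials.

```python
from typing import Callable, List, Tuple


def build_prompt(history: List[Tuple[str, str]], question: str) -> str:
    """Render past (question, answer) turns plus the new question as one prompt."""
    lines = []
    for past_question, past_answer in history:
        lines.append(f"Human: {past_question}")
        lines.append(f"AI: {past_answer}")
    lines.append(f"Human: {question}")
    lines.append("AI:")
    return "\n".join(lines)


def chat_turn(ask: Callable[[str], str], history: List[Tuple[str, str]], question: str) -> str:
    """Send the full history with the new question, then record the new turn."""
    answer = ask(build_prompt(history, question))
    history.append((question, answer))
    return answer


def echo_model(prompt: str) -> str:
    """Hypothetical stand-in for the LLM: reports how many human turns it was shown."""
    return f"(model saw {prompt.count('Human:')} human turn(s))"


history: List[Tuple[str, str]] = []
first = chat_turn(echo_model, history, "Tell me some use cases for reinforcement learning?")
second = chat_turn(echo_model, history, "Give me the main points of our conversation")
print(first)
print(second)
```

The second call sees both human turns because the state lives in `history`, outside the model. The model itself is called statelessly each time.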

## Now that we understand the concept of memory via adding history as a context, let's go back to our GPT Smart Search engine

In [11]:
# Since memory adds tokens to the prompt, we need a model with a larger context window
MODEL = "gpt-35-turbo-16k"
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=1000)
embedder = OpenAIEmbeddings(deployment="text-embedding-ada-002", chunk_size=1) 

In [12]:
index1_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
index3_name = "cogsrch-index-books-vector"
text_indexes = [index1_name, index2_name]
vector_indexes = [index+"-vector" for index in text_indexes] + [index3_name]

In [13]:
%%time

# Search in text-based indexes first and update vector indexes
top_k=10
agg_search_results = get_search_results(QUESTION, text_indexes, k=top_k, vector_search=False)
ordered_results = order_search_results(agg_search_results, k=top_k, reranker_threshold=1, vector_search=False)
update_vector_indexes(ordered_search_results=ordered_results, embedder=embedder)

# Search in all vector-based indexes available
agg_search_results = get_search_results(QUESTION, vector_indexes, k=top_k , vector_search=True, 
                                        query_vector = embedder.embed_query(QUESTION))
top_similarity_k = 5
ordered_results = order_search_results(agg_search_results, k=top_similarity_k,
                                       vector_search = True)


CPU times: user 517 ms, sys: 41.1 ms, total: 558 ms
Wall time: 2.87 s


In [19]:
# ordered_results

In [15]:
top_docs = []
for key,value in ordered_results.items():
    location = value["location"] if value["location"] is not None else ""
    top_docs.append(Document(page_content=value["chunk"], metadata={"source": location+os.environ['BLOB_SAS_TOKEN']}))
        
print("Number of chunks:",len(top_docs))

Number of chunks: 5


In [16]:
# Calculate number of tokens of our docs
tokens_limit = model_tokens_limit(MODEL) # this is a custom function we created in common/utils.py
# Default to 0 so chain_type can still be computed when the search returns nothing
num_tokens = 0
if len(top_docs) > 0:
    num_tokens = num_tokens_from_docs(top_docs) # this is a custom function we created in common/utils.py
    print("Custom token limit for", MODEL, ":", tokens_limit)
    print("Combined docs tokens count:", num_tokens)
else:
    print("NO RESULTS FROM AZURE SEARCH")

chain_type = "map_reduce" if num_tokens > tokens_limit else "stuff"
print("Chain Type selected:", chain_type)

Custom token limit for gpt-35-turbo-16k : 14000
Combined docs tokens count: 1019
Chain Type selected: stuff


In [17]:
%%time
# Get the answer
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

Reinforcement learning can be used in various use cases, including:
1. Learning prevention strategies for epidemics of infectious diseases, such as pandemic influenza, in order to automatically learn mitigation policies in complex epidemiological models with a large state space<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup>.
2. Personalized hybrid recommendation algorithm for music based on reinforcement learning, which recommends song sequences that match listeners' preferences better, by simulating the interaction process and continuously updating the model based on preferences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[2]</a></sup>.
3. Learning sparse reward tasks in reinforcement learning by combining self-imitation learning with exploration bonuses, which enhances both exploitation and exploration to reduce sample complexity<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[3]</a></sup>.
4. Automatic feature engineering in machine learning projects, where a framework called CAFEM (Cross-data Automatic Feature Engineering Machine) is used to optimize the feature transformation graph and learn fine-grained feature engineering strategies<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[4]</a></sup>.
5. Job scheduling in data centers using Advantage Actor-Critic (A2C) deep reinforcement learning, where the A2cScheduler agent learns the scheduling policy automatically and achieves competitive scheduling performance<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[5]</a></sup>.

These use cases demonstrate the versatility of reinforcement learning in solving complex problems and optimizing decision-making processes.

CPU times: user 39.6 ms, sys: 3.5 ms, total: 43.1 ms
Wall time: 14.1 s


And if we ask the follow up question:

In [19]:
response = get_answer(llm=llm, docs=top_docs,  query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

I'm sorry, but I cannot provide the main points of our conversation as it is not mentioned in the provided extracted parts.

You might get a different response from the one above, but whatever response you get will be based on the context given, not on previous answers.

So far we have the same as in the prior Notebook 03: results from Azure Search enhanced by an OpenAI model, with no memory.

**Now let's add memory to it:**

Reference: https://python.langchain.com/docs/modules/memory/how_to/adding_memory_chain_multiple_inputs
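
When a chain receives several inputs (here, the retrieved documents and the question), the memory has to know which input to record as the human turn. The sketch below illustrates that with a simplified stand-in class. `SimpleBufferMemory`, `TEMPLATE`, and `render_prompt` are hypothetical names for illustration, and the class is not LangChain's implementation, only loosely modeled on the `memory_key` and `input_key` arguments used in the next cell.

```python
from typing import Dict, List


class SimpleBufferMemory:
    """Simplified, hypothetical buffer memory; not LangChain's implementation."""

    def __init__(self, memory_key: str, input_key: str) -> None:
        self.memory_key = memory_key  # name of the prompt variable holding the history
        self.input_key = input_key    # which chain input counts as the human turn
        self.turns: List[str] = []

    def load_memory_variables(self) -> Dict[str, str]:
        """Return the stored turns under the configured prompt variable name."""
        return {self.memory_key: "\n".join(self.turns)}

    def save_context(self, inputs: Dict[str, str], output: str) -> None:
        """Record only the input selected by input_key, plus the model output."""
        self.turns.append(f"Human: {inputs[self.input_key]}")
        self.turns.append(f"AI: {output}")


# Assumed prompt layout: retrieved context, then history, then the new question
TEMPLATE = "Context:\n{context}\n\nChat history:\n{chat_history}\n\nHuman: {question}\nAI:"


def render_prompt(memory: SimpleBufferMemory, context: str, question: str) -> str:
    """Fill the template with the documents, the stored history, and the question."""
    return TEMPLATE.format(context=context, question=question, **memory.load_memory_variables())


memory_sketch = SimpleBufferMemory(memory_key="chat_history", input_key="question")
inputs = {"context": "RL is used for job scheduling.", "question": "Where is RL used?"}
memory_sketch.save_context(inputs, "In job scheduling.")
prompt_text = render_prompt(memory_sketch, inputs["context"], "Summarize our conversation")
print(prompt_text)
```

Only the question and the answer end up in the history. The retrieved context is supplied fresh on each call, so it is not duplicated in the stored conversation.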

In [20]:
# memory object, which is necessary to track the inputs/outputs and hold a conversation.
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question")

response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has various use cases across different domains. Here are some examples:

1. **Epidemic Prevention**: In the context of pandemic influenza, deep reinforcement learning can be used to automatically learn prevention strategies. By constructing epidemiological models and using reinforcement learning techniques, policies can be learned to control the spread of infectious diseases<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup>.

2. **Sparse Reward Tasks**: Reinforcement learning can be challenging when dealing with tasks that have sparse rewards. One approach is self-imitation learning, where the agent imitates past good trajectories to encourage exploitation. Another approach is exploration bonuses, which enhance exploration by providing intrinsic rewards for visiting novel states. A novel framework called Explore-then-Exploit (EE) combines these two approaches to strengthen their effects and achieve better performance in episodic reward settings<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[2]</a></sup>.

3. **Personalized Music Recommendation**: Reinforcement learning can be applied to personalized music recommendation systems. By using reinforcement learning algorithms, such as weighted matrix factorization and convolutional neural networks, the system can learn and update models based on users' preferences for songs and song transitions. This approach improves the accuracy of song sequence recommendations compared to traditional methods<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[3]</a></sup>.

4. **Automatic Feature Engineering**: Feature engineering is a crucial task in machine learning projects. A framework called Cross-data Automatic Feature Engineering Machine (CAFEM) uses reinforcement learning to optimize the feature transformation process. By learning fine-grained feature engineering strategies and utilizing meta-learning on a collection of datasets, CAFEM can generate effective features and improve learning performance<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[4]</a></sup>.

5. **Job Scheduling in Data Centers**: Efficient job scheduling in data centers is crucial for resource allocation. A deep reinforcement learning approach called A2cScheduler uses Advantage Actor-Critic (A2C) to learn scheduling policies automatically. By reducing gradient estimation variance and updating parameters efficiently, A2cScheduler achieves competitive scheduling performance in both simulated and real data center environments<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target

In [21]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points we discussed:

1. Reinforcement learning has various use cases across different domains.
2. In the context of pandemic influenza, deep reinforcement learning can be used to automatically learn prevention strategies for controlling the spread of infectious diseases<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup>.
3. Sparse reward tasks in reinforcement learning can be challenging. Approaches like self-imitation learning and exploration bonuses can be used to enhance exploration and exploitation<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[2]</a></sup>.
4. Reinforcement learning can be applied to personalized music recommendation systems to improve the accuracy of song sequence recommendations<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[3]</a></sup>.
5. Feature engineering is a crucial task in machine learning projects. A framework called CAFEM uses reinforcement learning to optimize the feature transformation process and generate effective features<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[4]</a></sup>.
6. Efficient job scheduling in data centers can be achieved using deep reinforcement learning approaches like A2cScheduler<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[5]</a></sup>.

Please let me know if there's anything else I can assist you with.

In [22]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points we discussed:

1. Reinforcement learning has various use cases across different domains.
2. In the context of pandemic influenza, deep reinforcement learning can be used to automatically learn prevention strategies for controlling the spread of infectious diseases<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup>.
3. Sparse reward tasks in reinforcement learning can be challenging. Approaches like self-imitation learning and exploration bonuses can be used to enhance exploration and exploitation<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[2]</a></sup>.
4. Reinforcement learning can be applied to personalized music recommendation systems to improve the accuracy of song sequence recommendations<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[3]</a></sup>.
5. Feature engineering is a crucial task in machine learning projects. A framework called CAFEM uses reinforcement learning to optimize the feature transformation process and generate effective features<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[4]</a></sup>.
6. Efficient job scheduling in data centers can be achieved using deep reinforcement learning approaches like A2cScheduler<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[5]</a></sup>.

Please let me know if there's anything else I can assist you with.

You might get a different answer in the cell above, and that is OK: this bot is not yet configured to handle inputs unrelated to its knowledge base, including salutations.

Let's check our memory to see that it's keeping the conversation

In [23]:
memory.buffer

'Human: Tell me some use cases for reinforcement learning?\nAI: Reinforcement learning has various use cases across different domains. Here are some examples:\n\n1. **Epidemic Prevention**: In the context of pandemic influenza, deep reinforcement learning can be used to automatically learn prevention strategies. By constructing epidemiological models and using reinforcement learning techniques, policies can be learned to control the spread of infectious diseases<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup>.\n\n2. **Sparse Reward Tasks**: Reinforcement learning can be challenging when dealing with tasks that have sparse rewards. One approach is self-imitation learning, where the agent imitates past good trajectories to encourage exploitation. Another approach is exploration bonuses, which enhance explora
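
Because the buffer grows with every turn, the prompt eventually approaches the model's token limit. One common mitigation is to keep only the most recent turns that fit a budget. The sketch below is a minimal illustration under stated assumptions: `approx_tokens` counts whitespace-separated words as a crude proxy for tokens (a real tokenizer would give different counts), and `trim_history` is a hypothetical helper, not part of LangChain. The `ConversationTokenBufferMemory` class imported at the top of this notebook targets the same problem.

```python
from typing import List


def approx_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (an assumption, not a real tokenizer)."""
    return len(text.split())


def trim_history(turns: List[str], max_tokens: int) -> List[str]:
    """Keep the most recent turns whose combined estimate fits within max_tokens."""
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first turn that does not fit
    for turn in reversed(turns):
        cost = approx_tokens(turn)
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


turns = [
    "Human: Tell me some use cases for reinforcement learning?",
    "AI: Game playing, robotics, and recommender systems.",
    "Human: Give me the main points of our conversation",
    "AI: We discussed three use cases.",
]
trimmed = trim_history(turns, max_tokens=16)
print(trimmed)
```

With a budget of 16 estimated tokens, only the last two turns survive. Dropping the oldest turns first is a design choice; summarizing them instead is another option when early context matters.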

## Using CosmosDB as persistent memory

In the previous cells we added local RAM memory to our chatbot. However, it is not persistent: it gets deleted once the app user's session is terminated. We therefore need a database for persistent storage of each user's conversations with the bot, not only for analytics and auditing, but also if we wish to provide recommendations.

Here we will store the conversation history in CosmosDB for future auditing purposes.
We will use a LangChain class called CosmosDBChatMessageHistory; see [HERE](https://python.langchain.com/en/latest/_modules/langchain/memory/chat_message_histories/cosmos_db.html)
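
To make the idea of a persistent, per-session history concrete without needing a CosmosDB account, here is a minimal sketch that uses SQLite from the Python standard library as a stand-in. `SqliteChatHistory` and its methods are hypothetical names for illustration; this is not how `CosmosDBChatMessageHistory` is implemented, only the same pattern of keying messages by `session_id` and `user_id`.

```python
import os
import sqlite3
import tempfile
from typing import List, Tuple


class SqliteChatHistory:
    """Hypothetical persistent chat history; SQLite stands in for CosmosDB here."""

    def __init__(self, db_path: str, session_id: str, user_id: str) -> None:
        self.session_id = session_id
        self.user_id = user_id
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT, user_id TEXT, role TEXT, content TEXT)"
        )
        self.conn.commit()

    def add_message(self, role: str, content: str) -> None:
        """Append one message to this session's conversation."""
        self.conn.execute(
            "INSERT INTO messages (session_id, user_id, role, content) VALUES (?, ?, ?, ?)",
            (self.session_id, self.user_id, role, content),
        )
        self.conn.commit()

    def load_messages(self) -> List[Tuple[str, str]]:
        """Return (role, content) pairs for this session in insertion order."""
        rows = self.conn.execute(
            "SELECT role, content FROM messages "
            "WHERE session_id = ? AND user_id = ? ORDER BY id",
            (self.session_id, self.user_id),
        )
        return rows.fetchall()


db_path = os.path.join(tempfile.mkdtemp(), "chat.db")
first_session = SqliteChatHistory(db_path, "session-1", "user-1")
first_session.add_message("human", "Tell me some use cases for reinforcement learning?")
first_session.add_message("ai", "Game playing and robotics, among others.")
first_session.conn.close()

# A new object (for example after an app restart) sees the same conversation
restored = SqliteChatHistory(db_path, "session-1", "user-1")
messages = restored.load_messages()
other_session = SqliteChatHistory(db_path, "session-2", "user-1").load_messages()
print(messages)
```

The conversation survives the first object being closed, and a different `session_id` starts from an empty history. That is the behavior the in-memory buffer cannot provide.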

In [24]:
# Create CosmosDB instance from langchain cosmos class.
cosmos = CosmosDBChatMessageHistory(
    cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
    cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
    cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
    connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
    session_id="Agent-Test-Session" + str(random.randint(1, 1000)),
    user_id="Agent-Test-User" + str(random.randint(1, 1000))
    )

# prepare the cosmosdb instance
cosmos.prepare_cosmos()

In [25]:
# Create our Memory Object
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question",chat_memory=cosmos)

In [26]:
# Testing using our Question
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has various use cases in different domains. Here are some examples:

1. **Epidemic Prevention**: Reinforcement learning can be used to automatically learn prevention strategies for infectious diseases. For example, a study used deep reinforcement learning to learn mitigation policies in complex epidemiological models with a large state space<sup>[1]</sup>.

2. **Sparse Reward Tasks**: Reinforcement learning can be applied to tasks with sparse rewards. One approach is self-imitation learning, which encourages the agent to exploit past good trajectories. Another approach is exploration bonuses, which enhance exploration by providing intrinsic rewards for visiting novel states. A novel framework called Explore-then-Exploit (EE) interleaves self-imitation learning with an exploration bonus to strengthen the effect of both algorithms<sup>[2]</sup>.

3. **Personalized Recommendation Systems**: Reinforcement learning can be used to improve personalized recommendation systems. For example, a personalized hybrid recommendation algorithm for music based on reinforcement learning was proposed. It recommends song sequences that match listeners' preferences better by simulating the interaction process and continuously updating the model based on preferences<sup>[3]</sup>.

4. **Feature Engineering**: Reinforcement learning can aid in automating feature engineering, which is a time-consuming task in machine learning projects. A framework called Cross-data Automatic Feature Engineering Machine (CAFEM) formalizes the feature engineering problem as an optimization problem over a Feature Transformation Graph (FTG). CAFEM consists of a FE learner (FeL) and a Cross-data Component (CdC) to speed up feature engineering learning on unseen datasets<sup>[4]</sup>.

5. **Job Scheduling**: Reinforcement learning can be used for efficient job scheduling in data centers. An innovative Advantage Actor-Critic (A2C) deep reinforcement learning approach called A2cScheduler was proposed for job scheduling. It consists of two agents, the actor and the critic, which learn the scheduling policy and reduce the estimation error, respectively<sup>[5]</sup>.

These are just a few examples of the use cases for reinforcement learning. The applications of reinforcement learning are diverse and continue to expand as researchers explore its potential in various domains.

Let me know if there's anything else I can help with!

References:
[1]<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup> - "Deep Reinforcement Learning for Autonomous Epidemic Prevention in Complex Epidemiological Models"
[2]<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[2]</a></sup> - "Explore-then-Exploit: A Framework for Efficient Exploration in Reinforcement Learning"
[3]<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[3]</a></sup> - "Personalized Hybrid Recommendation Algorithm for Music Based on Reinforcement Learning"
[4]<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJ

In [27]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning has various use cases in different domains.
2. Epidemic prevention: Reinforcement learning can be used to learn prevention strategies for infectious diseases<sup>[1]</sup>.
3. Sparse reward tasks: Reinforcement learning can be applied to tasks with sparse rewards, using self-imitation learning and exploration bonuses<sup>[2]</sup>.
4. Personalized recommendation systems: Reinforcement learning can improve personalized recommendation systems, such as music recommendations<sup>[3]</sup>.
5. Feature engineering: Reinforcement learning can automate feature engineering tasks, speeding up machine learning projects<sup>[4]</sup>.
6. Job scheduling: Reinforcement learning can be used for efficient job scheduling in data centers<sup>[5]</sup>.

These points provide an overview of the use cases for reinforcement learning. For more details, you can refer to the corresponding references.

Let me know if there's anything else I can help with!

References:
[1]<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup> - "Deep Reinforcement Learning for Autonomous Epidemic Prevention in Complex Epidemiological Models"
[2]<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[2]</a></sup> - "Explore-then-Exploit: A Framework for Efficient Exploration in Reinforcement Learning"
[3]<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[3]</a></sup> - "Personalized Hybrid Recommendation Algorithm for Music Based on Reinforcement Learning"
[4]<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[4]</a></sup> - "Cross-data Automatic Feature Engineering Machine: Learning Feature Engineering for Unseen Datasets"
[5]<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[5]</a></sup> - "A2cScheduler: Advantage Actor-Critic Deep Reinforcement Learning for Job Scheduling in Data Centers"

Is there anything else you would like to know?

In [28]:
# Another follow-up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English",
                      chain_type=chain_type, memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning has various use cases in different domains.
2. Epidemic prevention: Reinforcement learning can be used to learn prevention strategies for infectious diseases[1].
3. Sparse reward tasks: Reinforcement learning can be applied to tasks with sparse rewards, using self-imitation learning and exploration bonuses[2].
4. Personalized recommendation systems: Reinforcement learning can improve personalized recommendation systems, such as music recommendations[3].
5. Feature engineering: Reinforcement learning can automate feature engineering tasks, speeding up machine learning projects[4].
6. Job scheduling: Reinforcement learning can be used for efficient job scheduling in data centers[5].

These points provide an overview of the use cases for reinforcement learning. For more details, you can refer to the corresponding references.

Let me know if there's anything else I can help with!

References:
[1][1] - "Deep Reinforcement Learning for Autonomous Epidemic Prevention in Complex Epidemiological Models"
[2][2] - "Explore-then-Exploit: A Framework for Efficient Exploration in Reinforcement Learning"
[3][3] - "Personalized Hybrid Recommendation Algorithm for Music Based on Reinforcement Learning"
[4][4] - "Cross-data Automatic Feature Engineering Machine: Learning Feature Engineering for Unseen Datasets"
[5][5] - "A2cScheduler: Advantage Actor-Critic Deep Reinforcement Learning for Job Scheduling in Data Centers"

Is there anything else you would like to know?

Let's check Azure Cosmos DB to see the whole conversation.


In [29]:
# Load the stored messages from Cosmos DB
cosmos.load_messages()
cosmos.messages

[HumanMessage(content='Tell me some use cases for reinforcement learning?', additional_kwargs={}, example=False),
 AIMessage(content='Reinforcement learning has various use cases in different domains. Here are some examples:\n\n1. **Epidemic Prevention**: Reinforcement learning can be used to automatically learn prevention strategies for infectious diseases. For example, a study used deep reinforcement learning to learn mitigation policies in complex epidemiological models with a large state space<sup>[1]</sup>.\n\n2. **Sparse Reward Tasks**: Reinforcement learning can be applied to tasks with sparse rewards. One approach is self-imitation learning, which encourages the agent to exploit past good trajectories. Another approach is exploration bonuses, which enhance exploration by providing intrinsic rewards for visiting novel states. A novel framework called Explore-then-Exploit (EE) interleaves self-imitation learning with an exploration bonus to strengthen the effect of both algorithm

![CosmosDB Memory](./images/cosmos-chathistory.png)

# Summary
##### Adding memory to our application lets the user hold a conversation. This feature does not come with the LLM, however: memory is something we must supply to the LLM as context alongside each question.

We added persistent memory using CosmosDB.
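
To make the idea of "memory as context" concrete, here is a minimal sketch in plain Python. It is not the code behind `get_answer` or the LangChain memory classes; the `Turn` class and the `build_prompt` function are hypothetical names introduced only for illustration, and the AI reply in the sample history is abbreviated. It shows the core trick: earlier turns are pasted into the prompt ahead of the new question.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One stored chat message (hypothetical stand-in, not a LangChain class)."""
    role: str      # "Human" or "AI"
    content: str


def build_prompt(history: List[Turn], question: str) -> str:
    """Prepend the prior turns to the new question so the model can refer to them."""
    lines = [f"{turn.role}: {turn.content}" for turn in history]
    transcript = "\n".join(lines) if lines else "(no previous messages)"
    return (
        "Chat history:\n"
        f"{transcript}\n\n"
        f"Human: {question}\n"
        "AI:"
    )


# Illustrative history; the AI reply is shortened for readability.
history = [
    Turn("Human", "Tell me some use cases for reinforcement learning?"),
    Turn("AI", "Epidemic prevention, recommendation systems, job scheduling..."),
]
prompt = build_prompt(history, "Can you summarize that in one sentence?")
print(prompt)
```

The model itself stays stateless: every call receives the whole transcript again, which is why the history has to live somewhere outside the model, such as in CosmosDB.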

We can also see that the chain we are using is smart, but only up to a point. Although we have given it memory, it searches for similar documents every time, regardless of the input, and it struggles with prompts such as "Hello", "Thank you", "Bye", "What's your name?", "What's the weather?", or any other task that is not a search in the knowledge base.


## <u>Important Note</u>:<br>
As we proceed, while all the code will remain compatible with GPT-3.5 models, we highly recommend transitioning to GPT-4. Here's why:

**GPT-3.5-Turbo** can be likened to a 7-year-old child. You can provide it with concise instructions, but it frequently struggles to follow them accurately. Additionally, its limited memory can make sustained conversations challenging.

**GPT-3.5-Turbo-16k** resembles the same 7-year-old, but with an increased attention span for longer instructions. However, it still fails to execute them accurately about half the time.

**GPT-4** exhibits the capabilities of a 10-12-year-old child. It possesses enhanced reasoning skills and more consistently adheres to instructions. While its memory retention for instructions is moderate, it excels at following them.

**GPT-4-32k** is akin to the 10-12-year-old child with an extended memory. It comprehends lengthy sets of instructions and engages in meaningful conversations. Thanks to its robust memory, it offers detailed responses.

This analogy will become clearer as you complete the final notebook.


# NEXT
We now know how to build a Smart Search Engine that can power a chatbot. Great!

But does this cover every scenario that a virtual assistant will face? **What if the answer is not found in text, but instead requires looking into tabular data?** The next notebook explains and solves the tabular problem and introduces the concept of Agents.