# Understanding Memory in LLMs

In the previous Notebook, we successfully explored how OpenAI models can enhance the results from Azure Cognitive Search. 

However, we have yet to discover how to engage in a conversation with the LLM. With [Bing Chat](http://chat.bing.com/), for example, this is possible, as it can understand and reference the previous responses.

There is a common misconception that GPT models have memory. This is not true. While they possess knowledge, they do not retain information from previous questions asked to them.

In this Notebook, our goal is to illustrate how we can effectively "endow the LLM with memory" by employing prompts and context.

In [1]:
import os
import random
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from openai.error import OpenAIError
from langchain.embeddings import OpenAIEmbeddings
from langchain.docstore.document import Document
from langchain.memory import CosmosDBChatMessageHistory

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from common.utils import (
    get_search_results,
    update_vector_indexes,
    model_tokens_limit,
    num_tokens_from_docs,
    num_tokens_from_string,
    get_answer,
)

from common.prompts import COMBINE_CHAT_PROMPT_TEMPLATE

from dotenv import load_dotenv
load_dotenv("credentials.env")

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_BASE"] = os.environ["AZURE_OPENAI_ENDPOINT"]
os.environ["OPENAI_API_KEY"] = os.environ["AZURE_OPENAI_API_KEY"]
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]
os.environ["OPENAI_API_TYPE"] = "azure"

### Let's start with the basics
Let's use a very simple example to see whether the Azure OpenAI GPT model has memory. We will again use LangChain to simplify our code.

In [3]:
QUESTION = "Tell me some use cases for reinforcement learning"
FOLLOW_UP_QUESTION = "Give me the main points of our conversation"

In [4]:
# Define model
MODEL = "gpt-35-turbo"
COMPLETION_TOKENS = 500
# Create an OpenAI instance
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=COMPLETION_TOKENS)

In [5]:
# We create a very simple prompt template, just the question as is:
prompt = PromptTemplate(
    input_variables=["question"],
    template="{question}",
)

chain = LLMChain(llm=llm, prompt=prompt)

In [6]:
# Let's see what the GPT model responds
response = chain.run(QUESTION)
printmd(response)

1. Autonomous driving: Reinforcement learning can be used to train self-driving cars to make decisions in real-time, such as navigating through traffic, merging lanes, and avoiding obstacles.

2. Robotics: Reinforcement learning can be applied to train robots to perform complex tasks, such as grasping objects, assembling parts, or navigating through cluttered environments.

3. Game playing: Reinforcement learning has been successfully used to train agents to play various games, such as chess, Go, poker, and video games. These agents can learn strategies and tactics by playing against themselves or human players.

4. Resource management: Reinforcement learning can be used to optimize the allocation of resources in various domains, such as energy management, traffic control, or supply chain management. The agent can learn to make decisions that maximize efficiency and minimize costs.

5. Personalized recommendation systems: Reinforcement learning can be used to develop recommendation systems that learn and adapt to individual user preferences over time. The system can learn to recommend products, movies, or music based on user feedback and interactions.

6. Healthcare: Reinforcement learning can be applied to optimize treatment plans for patients with chronic diseases. The agent can learn to make decisions about medication dosage, scheduling appointments, or recommending lifestyle changes based on patient outcomes.

7. Finance and trading: Reinforcement learning can be used to develop trading strategies that maximize profits in financial markets. The agent can learn to make buy/sell decisions based on market conditions, historical data, and feedback from previous trades.

8. Natural language processing: Reinforcement learning can be applied to improve dialogue systems and chatbots. The agent can learn to generate responses that are contextually relevant and engaging based on user interactions and feedback.

9. Control systems: Reinforcement learning can be used to optimize control policies for complex systems, such as power grids, manufacturing processes, or chemical plants. The agent can learn to make control decisions that optimize performance and minimize energy consumption.

10. Advertising and marketing: Reinforcement learning can be used to optimize advertising campaigns and marketing strategies. The agent can learn to allocate budgets, target specific audiences, and adjust campaign parameters based on user responses and feedback.

In [7]:
#Now let's ask a follow up question
chain.run(FOLLOW_UP_QUESTION)

"I'm sorry, but as an AI language model, I don't have the capability to remember or recall past conversations. However, if you provide me with the main points or context of our current conversation, I'll be happy to help you with any questions or information you need."

As you can see, it doesn't remember what it just responded. Sometimes it responds based only on the system prompt, or just randomly. This proves that the LLM does NOT have memory and that we need to supply the memory as a conversation history within the prompt, like this:

In [8]:
hist_prompt = PromptTemplate(
    input_variables=["history", "question"],
    template="""
                {history}
                Human: {question}
                AI:
            """
    )
chain = LLMChain(llm=llm, prompt=hist_prompt)

In [9]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response)

In [10]:
printmd(chain.run({"history":Conversation_history, "question": FOLLOW_UP_QUESTION}))

- Reinforcement learning can be used in autonomous driving to train self-driving cars to navigate through traffic, merge lanes, and avoid obstacles.
- It can be applied in robotics to train robots to perform complex tasks such as grasping objects, assembling parts, or navigating through cluttered environments.
- Reinforcement learning has been successful in training game-playing agents for various games like chess, Go, poker, and video games.
- It can be used in resource management to optimize the allocation of resources in domains like energy management, traffic control, or supply chain management.
- Reinforcement learning can be applied in personalized recommendation systems to develop systems that learn and adapt to individual user preferences over time.
- In healthcare, it can optimize treatment plans for patients with chronic diseases by making decisions about medication dosage, scheduling appointments, or recommending lifestyle changes.
- It can be used in finance and trading to develop trading strategies that maximize profits in financial markets.
- Reinforcement learning can improve dialogue systems and chatbots in natural language processing by generating contextually relevant and engaging responses.
- It can optimize control policies for complex systems like power grids, manufacturing processes, or chemical plants.
- Reinforcement learning can be used to optimize advertising campaigns and marketing strategies by allocating budgets, targeting specific audiences, and adjusting campaign parameters based on user responses and feedback.

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context every time.
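To make the idea concrete, here is a minimal sketch of a chat loop that keeps the history itself and re-sends it on every turn. It does not call Azure OpenAI: `echo_llm` is a hypothetical stand-in for the model, and `build_prompt` and `chat_turn` are illustrative helper names, not part of LangChain or of this repo.

```python
from typing import Callable, List, Tuple


def build_prompt(history: List[Tuple[str, str]], question: str) -> str:
    """Render previous (question, answer) turns followed by the new question."""
    lines = []
    for human, ai in history:
        lines.append(f"Human: {human}")
        lines.append(f"AI: {ai}")
    lines.append(f"Human: {question}")
    lines.append("AI:")
    return "\n".join(lines)


def chat_turn(llm_fn: Callable[[str], str],
              history: List[Tuple[str, str]],
              question: str) -> str:
    """Send the full history plus the new question, then record the turn."""
    answer = llm_fn(build_prompt(history, question))
    history.append((question, answer))
    return answer


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: reports how many earlier answers it sees."""
    seen = prompt.count("AI: ")  # only completed turns contain "AI: <answer>"
    return f"I can see {seen} earlier answer(s) in the prompt."


history: List[Tuple[str, str]] = []
print(chat_turn(echo_llm, history, "Tell me some use cases for reinforcement learning"))
print(chat_turn(echo_llm, history, "Give me the main points of our conversation"))
```

The second call sees the first answer only because the caller put it back into the prompt; the "model" itself keeps no state between calls.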

## Now that we understand the concept of memory by adding history as context, let's go back to our GPT Smart Search engine

In [11]:
# Since memory adds tokens to the prompt, we need a model with a larger context window
MODEL = "gpt-35-turbo-16k"
COMPLETION_TOKENS = 1000
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=COMPLETION_TOKENS)
embedder = OpenAIEmbeddings(deployment="text-embedding-ada-002", chunk_size=1) 

In [12]:
index1_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
index3_name = "cogsrch-index-books-vector"
text_indexes = [index1_name, index2_name]
vector_indexes = [index+"-vector" for index in text_indexes] + [index3_name]

In [13]:
%%time

# Search in text-based indexes first and update vector indexes
k=10 # Top k results per each text-based index
ordered_results = get_search_results(QUESTION, text_indexes, k=k, reranker_threshold=1, vector_search=False)
update_vector_indexes(ordered_search_results=ordered_results, embedder=embedder)

# Search in all vector-based indexes available
similarity_k = 5 # top results from multi-vector-index similarity search
ordered_results = get_search_results(QUESTION, vector_indexes, k=k, vector_search=True,
                                        similarity_k=similarity_k,
                                        query_vector = embedder.embed_query(QUESTION))
print("Number of results:",len(ordered_results))

Number of results: 5
CPU times: user 6.18 s, sys: 140 ms, total: 6.32 s
Wall time: 26.2 s


In [14]:
# Uncomment the below line if you want to inspect the ordered results
# ordered_results

In [15]:
top_docs = []
for key,value in ordered_results.items():
    location = value["location"] if value["location"] is not None else ""
    top_docs.append(Document(page_content=value["chunk"], metadata={"source": location+os.environ['BLOB_SAS_TOKEN']}))
        
print("Number of chunks:",len(top_docs))

Number of chunks: 5


In [16]:
# Calculate number of tokens of our docs
if(len(top_docs)>0):
    tokens_limit = model_tokens_limit(MODEL) # this is a custom function we created in common/utils.py
    prompt_tokens = num_tokens_from_string(COMBINE_CHAT_PROMPT_TEMPLATE) # this is a custom function we created in common/utils.py
    context_tokens = num_tokens_from_docs(top_docs) # this is a custom function we created in common/utils.py
    
    requested_tokens = prompt_tokens + context_tokens + COMPLETION_TOKENS
    
    chain_type = "map_reduce" if requested_tokens > 0.9 * tokens_limit else "stuff"  
    
    print("System prompt token count:",prompt_tokens)
    print("Max Completion Token count:", COMPLETION_TOKENS)
    print("Combined docs (context) token count:",context_tokens)
    print("--------")
    print("Requested token count:",requested_tokens)
    print("Token limit for", MODEL, ":", tokens_limit)
    print("Chain Type selected:", chain_type)
        
else:
    print("NO RESULTS FROM AZURE SEARCH")

System prompt token count: 2464
Max Completion Token count: 1000
Combined docs (context) token count: 1985
--------
Requested token count: 5449
Token limit for gpt-35-turbo-16k : 16384
Chain Type selected: stuff
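The chain-type decision in the cell above is plain arithmetic, so it can be checked by hand. The sketch below repeats the same rule as a small function and feeds it the numbers printed above. `choose_chain_type` and `safety_ratio` are illustrative names, and the 4096-token limit in the second call is a hypothetical smaller limit used only for comparison.

```python
def choose_chain_type(prompt_tokens: int,
                      context_tokens: int,
                      completion_tokens: int,
                      tokens_limit: int,
                      safety_ratio: float = 0.9) -> str:
    """Return "stuff" if everything fits within 90% of the limit, else "map_reduce"."""
    requested = prompt_tokens + context_tokens + completion_tokens
    return "map_reduce" if requested > safety_ratio * tokens_limit else "stuff"


# Numbers from the output above: 2464 + 1985 + 1000 = 5449 requested tokens.
# 0.9 * 16384 = 14745.6, and 5449 is below that, so everything fits in one prompt.
print(choose_chain_type(2464, 1985, 1000, 16384))  # stuff

# With a hypothetical 4096-token limit: 0.9 * 4096 = 3686.4, which 5449 exceeds.
print(choose_chain_type(2464, 1985, 1000, 4096))   # map_reduce
```

Note that the conversation history is not part of this count; once memory is added, its tokens also compete for the same budget.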


In [17]:
%%time
# Get the answer
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

Reinforcement learning can be used in various use cases, such as:
1. Learning prevention strategies for pandemic influenza in a complex epidemiological model with a large state space<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup>.
2. Personalized hybrid recommendation algorithm for music based on reinforcement learning, which considers the simulation of interaction process to capture changes in listeners' preferences sensitively<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[2]</a></sup>.
3. Computing lockdown decisions for individual cities or regions, balancing health and economic considerations, in the context of the Covid-19 pandemic<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[3]</a></sup>.
4. Speeding up reinforcement learning by separating plannable parts of the world and focusing planning efforts in those areas<sup><a href="https://demodatasetsp.blob.core.windows.net/arxivcs/0212/0212025v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[4]</a></sup>.
5. Maintaining a long-term balance between accuracy and fairness in interactive recommender systems using reinforcement learning<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206277/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[5]</a></sup>.

References:
[1] Source: [1]
[2] Source: [2]
[3] Source: [3]
[4] Source: [4]
[5] Source: [5]

CPU times: user 10.4 ms, sys: 44 µs, total: 10.5 ms
Wall time: 20.9 s


And if we ask the follow up question:

In [18]:
response = get_answer(llm=llm, docs=top_docs,  query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

I'm sorry, but I couldn't find any extracted parts that contain information about our conversation.

You might get a different response from the one above, but whatever response you get, it will be based on the context given, not on previous answers.

So far we have the same as in the prior Notebook 03: results from Azure Search enhanced by an OpenAI model, with no memory.

**Now let's add memory to it:**

Reference: https://python.langchain.com/docs/modules/memory/how_to/adding_memory_chain_multiple_inputs

In [19]:
# Memory object, which is necessary to track the inputs/outputs and hold a conversation.
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question")

response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has various use cases in different domains. Here are some examples:

1. **Epidemiology**: Reinforcement learning can be used to learn prevention strategies in the context of pandemic influenza. For example, a study used deep reinforcement learning to automatically learn prevention strategies for pandemic influenza in a meta-population model of Great Britain<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup>. The study demonstrated that deep reinforcement learning can be used to learn mitigation policies in complex epidemiological models.

2. **Recommendation Systems**: Reinforcement learning can be used to improve personalized recommendation systems. For example, a study proposed a personalized hybrid recommendation algorithm for music based on reinforcement learning. The algorithm, called PHRR, recommends song sequences that match listeners' preferences better by simulating the interaction process and continuously updating the model based on listeners' preferences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[2]</a></sup>.

3. **Public Health Policy**: Reinforcement learning can be used to compute lockdown decisions for individual cities or regions while balancing health and economic considerations. This approach allows for quantitative decision-making by learning policies based on disease parameters and population characteristics<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[3]</a></sup>. This approach provides a viable quantitative approach towards lockdown decisions.

These are just a few examples of the use cases for reinforcement learning. It can be applied in various other domains such as robotics, finance, gaming, and more. Let me know if you have any other questions!

References:
1. [1]
2. [2]
3. [3]

In [20]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning can be used in epidemiology to learn prevention strategies for pandemic influenza in complex epidemiological models. It has been demonstrated that deep reinforcement learning can be used to learn mitigation policies in these models<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup>.

2. Reinforcement learning can be applied to improve personalized recommendation systems, such as music recommendation algorithms. These algorithms use reinforcement learning to simulate the interaction process and continuously update the model based on users' preferences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[2]</a></sup>.

3. Reinforcement learning can be used to compute lockdown decisions for individual cities or regions in the context of the COVID-19 pandemic. This approach balances health and economic considerations by learning policies based on disease parameters and population characteristics<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[3]</a></sup>.

These are the main points we discussed during our conversation. Let me know if there's anything else I can help you with!

References:
1. [1]
2. [2]
3. [3]

In [21]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning can be used in epidemiology to learn prevention strategies for pandemic influenza in complex epidemiological models. It has been demonstrated that deep reinforcement learning can be used to learn mitigation policies in these models<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup>.

2. Reinforcement learning can be applied to improve personalized recommendation systems, such as music recommendation algorithms. These algorithms use reinforcement learning to simulate the interaction process and continuously update the model based on users' preferences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[2]</a></sup>.

3. Reinforcement learning can be used to compute lockdown decisions for individual cities or regions in the context of the COVID-19 pandemic. This approach balances health and economic considerations by learning policies based on disease parameters and population characteristics<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[3]</a></sup>.

4. Fairness in recommendation systems has gained attention, and there are reinforcement learning-based frameworks that aim to dynamically maintain a balance between accuracy and fairness in interactive recommender systems<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206277/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[4]</a></sup>.

These are the main points we discussed during our conversation. Let me know if there's anything else I can help you with!

References:
1. [1]
2. [2]
3. [3]
4. [4]

You might get a different answer in the cell above, and that is OK: this bot is not yet configured to answer questions unrelated to its knowledge base, including salutations.

Let's check our memory to confirm that it is keeping the conversation:

In [22]:
memory.buffer

'Human: Tell me some use cases for reinforcement learning\nAI: Reinforcement learning has various use cases in different domains. Here are some examples:\n\n1. **Epidemiology**: Reinforcement learning can be used to learn prevention strategies in the context of pandemic influenza. For example, a study used deep reinforcement learning to automatically learn prevention strategies for pandemic influenza in a meta-population model of Great Britain<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup>. The study demonstrated that deep reinforcement learning can be used to learn mitigation policies in complex epidemiological models.\n\n2. **Recommendation Systems**: Reinforcement learning can be used to improve personalized recommendation systems. For example, a study proposed a personalized hybrid recommendation algo
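Because a buffer memory keeps every turn, the history grows with each exchange and, as noted earlier, memory adds tokens to the prompt. One possible way to keep it bounded is to retain only the most recent turns that fit a token budget. The sketch below is an assumption-laden illustration, not what `get_answer` does: `trim_history` is a hypothetical helper, and `approx_tokens` counts whitespace-separated words as a rough stand-in for a real tokenizer.

```python
from typing import Callable, List, Tuple


def approx_tokens(text: str) -> int:
    """Rough stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(turns: List[Tuple[str, str]],
                 max_tokens: int,
                 count_tokens: Callable[[str], int] = approx_tokens
                 ) -> List[Tuple[str, str]]:
    """Keep the most recent (human, ai) turns whose combined size fits max_tokens."""
    kept: List[Tuple[str, str]] = []
    used = 0
    # Walk backwards so the newest turns are kept first.
    for human, ai in reversed(turns):
        cost = count_tokens(human) + count_tokens(ai)
        if used + cost > max_tokens:
            break
        kept.append((human, ai))
        used += cost
    kept.reverse()  # restore chronological order
    return kept


turns = [
    ("first question here", "first answer here"),
    ("second question", "second answer"),
    ("thanks", "welcome"),
]
print(trim_history(turns, max_tokens=6))
```

In a real application the word counter would be swapped for the tokenizer that matches the deployed model.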

## Using CosmosDB as persistent memory

In the previous cells we added in-memory (RAM) storage to our chatbot. However, it is not persistent: it is deleted once the user's session ends. We therefore need a database to persist each user's conversations, not only for analytics and auditing, but also if we wish to provide recommendations.

Here we will store the conversation history in CosmosDB for future auditing purposes.
We will use the LangChain class CosmosDBChatMessageHistory; see [HERE](https://python.langchain.com/en/latest/_modules/langchain/memory/chat_message_histories/cosmos_db.html).
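Before connecting to CosmosDB, it may help to see what "persistent chat history" means in the smallest possible form. The sketch below uses Python's built-in `sqlite3` as a local stand-in for the database. `SQLiteChatHistory` is a hypothetical class written for illustration only; it is not the LangChain class and does not mirror its API, apart from keying messages by a session id and a user id.

```python
import sqlite3
from typing import List, Tuple


class SQLiteChatHistory:
    """Hypothetical stand-in for a database-backed chat history (not the LangChain class)."""

    def __init__(self, db_path: str, session_id: str, user_id: str) -> None:
        self.session_id = session_id
        self.user_id = user_id
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT, user_id TEXT, role TEXT, content TEXT)"
        )
        self.conn.commit()

    def add_message(self, role: str, content: str) -> None:
        """Append one message and commit so it survives the process."""
        self.conn.execute(
            "INSERT INTO messages (session_id, user_id, role, content) "
            "VALUES (?, ?, ?, ?)",
            (self.session_id, self.user_id, role, content),
        )
        self.conn.commit()

    def load_messages(self) -> List[Tuple[str, str]]:
        """Return this session's (role, content) pairs in insertion order."""
        rows = self.conn.execute(
            "SELECT role, content FROM messages "
            "WHERE session_id = ? AND user_id = ? ORDER BY id",
            (self.session_id, self.user_id),
        )
        return [(role, content) for role, content in rows]
```

A second `SQLiteChatHistory` opened later on the same file with the same session and user ids would load the earlier messages, which is the property the in-memory buffer lacks.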

In [23]:
# Create CosmosDB instance from langchain cosmos class.
cosmos = CosmosDBChatMessageHistory(
    cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
    cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
    cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
    connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
    session_id="Agent-Test-Session" + str(random.randint(1, 1000)),
    user_id="Agent-Test-User" + str(random.randint(1, 1000))
    )

# prepare the cosmosdb instance
cosmos.prepare_cosmos()

In [24]:
# Create our memory object
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question",chat_memory=cosmos)

In [25]:
# Testing using our Question
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning (RL) has various use cases in different domains. Here are a few examples:

1. **Epidemiology**: RL can be used to learn prevention strategies in the context of pandemic influenza. By constructing epidemiological models and using RL algorithms, prevention strategies can be automatically learned to control the spread of infectious diseases<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup>.

2. **Music Recommendation**: RL can enhance personalized music recommendation systems by capturing the changes in listeners' preferences sensitively. By using RL algorithms, song sequences that better match listeners' preferences can be recommended<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[2]</a></sup>.

3. **Public Health and Economic Policy**: RL can be used to compute lockdown decisions for individual cities or regions while balancing health and economic considerations. By learning policies automatically based on disease parameters and population characteristics, RL algorithms can help in making viable quantitative approaches towards lockdowns<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[3]</a></sup>.

4. **Planning**: RL can be integrated with planning to speed up learning in uncertain environments. By separating plannable parts of the world and focusing planning efforts in those areas, RL algorithms can find optimal policies and near-optimal macro actions, enabling faster learning<sup><a href="https://demodatasetsp.blob.core.windows.net/arxivcs/0212/0212025v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[4]</a></sup>.

5. **Interactive Recommender Systems**: RL can be used to dynamically maintain a balance between accuracy and fairness in recommendation systems. By compressing user preferences and the system's fairness status into a state representation, RL algorithms can generate recommendations that maximize a designed cumulative reward combining accuracy and fairness<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206277/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[5]</a></sup>.

These are just a few examples, and RL can be applied to various other domains such as robotics, finance, and gaming to name a few. Let me know if there's anything else I can help with!

References:
[1]<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco

In [26]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning (RL) has various use cases in different domains.
2. In epidemiology, RL can be used to learn prevention strategies for controlling the spread of infectious diseases, such as pandemic influenza<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup>.
3. RL can enhance personalized music recommendation systems by capturing changes in listeners' preferences sensitively<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[2]</a></sup>.
4. RL can be used to compute lockdown decisions for individual cities or regions, balancing health and economic considerations<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[3]</a></sup>.
5. RL can be integrated with planning to speed up learning in uncertain environments<sup><a href="https://demodatasetsp.blob.core.windows.net/arxivcs/0212/0212025v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[4]</a></sup>.
6. RL can be used in interactive recommender systems to dynamically maintain a balance between accuracy and fairness in recommendations<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206277/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[5]</a></sup>.

These are just a few examples of the use cases of reinforcement learning. RL can also be applied in domains such as robotics, finance, and gaming, among others. Let me know if there's anything else I can help with!

References:
[1]<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup> [2]<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rlt

In [27]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning (RL) has various use cases in different domains.
2. In epidemiology, RL can be used to learn prevention strategies for controlling the spread of infectious diseases, such as pandemic influenza[1].
3. RL can enhance personalized music recommendation systems by capturing changes in listeners' preferences sensitively[2].
4. RL can be used to compute lockdown decisions for individual cities or regions, balancing health and economic considerations[3].
5. RL can be integrated with planning to speed up learning in uncertain environments[4].
6. RL can be used in interactive recommender systems to dynamically maintain a balance between accuracy and fairness in recommendations[5].

These are just a few examples of the use cases of reinforcement learning. RL can also be applied in domains such as robotics, finance, and gaming, among others. Let me know if there's anything else I can help with!

References:
[1]<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup> [2]<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[2]</a></sup> [3]<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[3]</a></sup> [4]<sup><a href="https://demodatasetsp.blob.core.windows.net/arxivcs/0212/0212025v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[4]</a></sup> [5]<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206277/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[5]</a></sup>

Let's check our Azure CosmosDB to see the whole conversation


In [28]:
#load message from cosmosdb
cosmos.load_messages()
cosmos.messages

[HumanMessage(content='Tell me some use cases for reinforcement learning'),
 AIMessage(content='Reinforcement learning (RL) has various use cases in different domains. Here are a few examples:\n\n1. **Epidemiology**: RL can be used to learn prevention strategies in the context of pandemic influenza. By constructing epidemiological models and using RL algorithms, prevention strategies can be automatically learned to control the spread of infectious diseases<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup>.\n\n2. **Music Recommendation**: RL can enhance personalized music recommendation systems by capturing the changes in listeners\' preferences sensitively. By using RL algorithms, song sequences that better match listeners\' preferences can be recommended<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/article

![CosmosDB Memory](./images/cosmos-chathistory.png)

# Summary
##### Adding memory to our application allows the user to have a conversation. However, this feature does not come with the LLM; instead, memory is something we must provide to the LLM as context alongside the question.

We added persistent memory using CosmosDB.

We can also see that the chain we are using is smart, but only up to a point. Although we have given it memory, it searches for similar docs every time, regardless of the input, and it struggles to respond to prompts like: Hello, Thank you, Bye, What's your name, What's the weather, and any other task that is not a search in the knowledge base.
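To illustrate the limitation, here is a deliberately naive sketch of a check that could run before the search step and skip it for small talk. It is only an illustration of the problem, not the approach this series takes: `needs_search` and `SMALL_TALK_PATTERNS` are hypothetical names, and the patterns cover just a few of the examples listed above.

```python
import re

# Hypothetical, hand-written patterns; a real assistant would need far more than this.
SMALL_TALK_PATTERNS = [
    r"^\s*(hi|hello|hey)\b",
    r"^\s*(thanks|thank you)\b",
    r"^\s*(bye|goodbye)\b",
    r"what('s| is) your name",
]


def needs_search(user_input: str) -> bool:
    """Return False for obvious small talk, True for everything else."""
    text = user_input.lower()
    return not any(re.search(pattern, text) for pattern in SMALL_TALK_PATTERNS)


for message in ["Thank you", "What's your name",
                "Tell me some use cases for reinforcement learning"]:
    print(message, "->", "search" if needs_search(message) else "no search")
```

Keyword rules like these break quickly on real input, which is why a more flexible mechanism is needed for deciding what to do with each message.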


## <u>Important Note</u>:<br>
As we proceed, while all the code will remain compatible with GPT-3.5 models, we highly recommend transitioning to GPT-4. Here's why:

**GPT-3.5-Turbo** can be likened to a 7-year-old child. You can provide it with concise instructions, but it frequently struggles to follow them accurately. Additionally, its limited memory can make sustained conversations challenging.

**GPT-3.5-Turbo-16k** resembles the same 7-year-old, but with an increased attention span for longer instructions. However, it still faces difficulties accurately executing them about half the time.

**GPT-4** exhibits the capabilities of a 10-12-year-old child. It possesses enhanced reasoning skills and more consistently adheres to instructions. While its memory retention for instructions is moderate, it excels at following them.

**GPT-4-32k** is akin to the 10-12-year-old child with an extended memory. It comprehends lengthy sets of instructions and engages in meaningful conversations. Thanks to its robust memory, it offers detailed responses.

This analogy will become clearer as you complete the final notebook.


# NEXT
We now know how to build a Smart Search Engine that can power a chatbot! Great!

But does this cover all the scenarios that a virtual assistant will require? **What if the answer to the Smart Search Engine question is not in text, but instead requires looking into tabular data?** The next notebook explains and solves the tabular problem and introduces the concept of Agents.