# Understanding Memory in LLMs

In the previous Notebook, we successfully explored how OpenAI models can enhance the results from Azure Cognitive Search. 

However, we have yet to discover how to engage in a conversation with the LLM. With [Bing Chat](http://chat.bing.com/), for example, this is possible, as it can understand and reference the previous responses.

There is a common misconception that GPT models have memory. This is not true. While they possess knowledge, they do not retain information from previous questions asked to them.

In this Notebook, our goal is to illustrate how we can effectively "endow the LLM with memory" by employing prompts and context.

In [2]:
import os
import random
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from langchain.embeddings import AzureOpenAIEmbeddings
from langchain.docstore.document import Document
from langchain.memory import CosmosDBChatMessageHistory

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from common.utils import (
    get_search_results,
    update_vector_indexes,
    model_tokens_limit,
    num_tokens_from_docs,
    num_tokens_from_string,
    get_answer,
)

from common.prompts import COMBINE_CHAT_PROMPT_TEMPLATE

from dotenv import load_dotenv
load_dotenv("credentials_NONSMC.env")

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

In [3]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]

### Let's start with the basics
Let's use a very simple example to see whether the Azure OpenAI GPT model has memory. We will again use LangChain to simplify our code.

In [4]:
QUESTION = "Tell me some use cases for reinforcement learning"
FOLLOW_UP_QUESTION = "Give me the main points of our conversation"

In [5]:
# Define model
MODEL = "gpt-35-turbo"
COMPLETION_TOKENS = 500
# Create an OpenAI instance
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=COMPLETION_TOKENS)

In [6]:
# We create a very simple prompt template, just the question as is:
prompt = PromptTemplate(
    input_variables=["question"],
    template="{question}",
)

chain = LLMChain(llm=llm, prompt=prompt)

In [7]:
# Let's see what the GPT model responds
response = chain.run(QUESTION)
printmd(response)

1. Autonomous Vehicles: Reinforcement learning can be used to train self-driving cars to make decisions on the road, such as navigating traffic, following traffic rules, and avoiding accidents.

2. Robotics: Reinforcement learning can be applied to train robots to perform complex tasks, such as grasping objects, manipulating tools, or even playing sports.

3. Game Playing: Reinforcement learning has been successfully used to train agents to play various games, including board games like chess and Go, as well as video games like Dota 2 and StarCraft II.

4. Finance: Reinforcement learning can be used in the field of finance to optimize trading strategies, portfolio management, and risk management, by learning from historical data and making decisions based on market conditions.

5. Healthcare: In healthcare, reinforcement learning can be used to optimize treatment plans, such as personalized medication dosage, disease diagnosis, and patient monitoring.

6. Energy Management: Reinforcement learning can be employed to optimize energy consumption and management in smart grids, by learning and adapting to changing energy demands and supply conditions.

7. Advertising: Reinforcement learning can be used in online advertising to optimize the selection and display of ads to users, by learning from user feedback and maximizing the click-through rates or conversions.

8. Recommendation Systems: Reinforcement learning can be applied to build personalized recommendation systems, by learning user preferences and optimizing the selection of recommended items or content.

9. Industrial Process Optimization: Reinforcement learning can be used to optimize complex industrial processes, such as chemical production, supply chain management, and resource allocation, by learning from data and maximizing efficiency.

10. Natural Language Processing: Reinforcement learning can be utilized to train conversational agents or chatbots to interact with users and improve their responses over time, by learning from user feedback and maximizing user satisfaction.

In [8]:
#Now let's ask a follow up question
chain.run(FOLLOW_UP_QUESTION)

"I'm sorry, but as an AI language model, I don't have the capability to recall previous conversations. However, if you provide some context or specific details about our conversation, I can try to help you summarize the main points."

As you can see, the model doesn't remember what it just answered. Sometimes it responds based only on the system prompt, and sometimes it answers seemingly at random. This shows that the LLM does NOT have memory, and that we need to supply the memory ourselves by including the conversation history in the prompt, like this:

In [9]:
hist_prompt = PromptTemplate(
    input_variables=["history", "question"],
    template="""
                {history}
                Human: {question}
                AI:
            """
    )
chain = LLMChain(llm=llm, prompt=hist_prompt)

In [10]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response)

In [11]:
printmd(chain.run({"history":Conversation_history, "question": FOLLOW_UP_QUESTION}))

- Reinforcement learning can be used in various fields such as autonomous vehicles, robotics, game playing, finance, healthcare, energy management, advertising, recommendation systems, industrial process optimization, and natural language processing.
- It can be applied to train self-driving cars, robots, and game-playing agents.
- In finance, it can optimize trading strategies, portfolio management, and risk management.
- In healthcare, it can optimize treatment plans and patient monitoring.
- It can optimize energy consumption and management in smart grids.
- It can optimize online advertising and personalized recommendation systems.
- It can optimize industrial processes and resource allocation.
- It can train conversational agents to interact with users and improve responses over time.

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context every time.
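The same idea can be sketched without any LLM library. The snippet below is only a minimal illustration, not the implementation used in this notebook: `ChatSession` and `fake_llm` are hypothetical names, and `fake_llm` is a stand-in for a real model call such as `chain.run`.

```python
from typing import Callable, List, Tuple


class ChatSession:
    """Keeps the conversation turns and replays them in every prompt."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        self.llm = llm
        self.turns: List[Tuple[str, str]] = []  # (human, ai) pairs

    def build_prompt(self, question: str) -> str:
        # Prepend every previous turn so the model "sees" the conversation.
        history = "".join(f"Human: {q}\nAI: {a}\n" for q, a in self.turns)
        return f"{history}Human: {question}\nAI:"

    def ask(self, question: str) -> str:
        answer = self.llm(self.build_prompt(question))
        self.turns.append((question, answer))  # remember this turn
        return answer


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model: it only reports how many
    # human turns are visible in the prompt it received.
    return f"I can see {prompt.count('Human:')} human turn(s)."


session = ChatSession(fake_llm)
print(session.ask("Tell me some use cases for reinforcement learning"))
print(session.ask("Give me the main points of our conversation"))
```

The second call receives a prompt that contains the first question and its answer, which is all that "memory" means here: the state lives in the application, not in the model.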

## Now that we understand the concept of memory via adding history as a context, let's go back to our GPT Smart Search engine

In [29]:
# Since memory adds tokens to the prompt, we need a model with a larger context window
MODEL = "gpt-35-turbo-16k"
COMPLETION_TOKENS = 2000
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=COMPLETION_TOKENS)
embedder = AzureOpenAIEmbeddings(deployment="text-embedding-ada-002", chunk_size=1) 

In [30]:
index1_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
index3_name = "cogsrch-index-books-vector"
text_indexes = [index1_name, index2_name]
vector_indexes = [index+"-vector" for index in text_indexes] + [index3_name]

In [31]:
%%time

# Search in text-based indexes first and update vector indexes
k=10 # Top k results per each text-based index
ordered_results = get_search_results(QUESTION, text_indexes, k=k, reranker_threshold=1, vector_search=False)
update_vector_indexes(ordered_search_results=ordered_results, embedder=embedder)

# Search in all vector-based indexes available
similarity_k = 5 # top results from multi-vector-index similarity search
ordered_results = get_search_results(QUESTION, vector_indexes, k=k, vector_search=True,
                                        similarity_k=similarity_k,
                                        query_vector = embedder.embed_query(QUESTION))
print("Number of results:",len(ordered_results))

Number of results: 5
CPU times: user 827 ms, sys: 27.9 ms, total: 855 ms
Wall time: 4.58 s


In [32]:
# Uncomment the below line if you want to inspect the ordered results
# ordered_results

In [33]:
top_docs = []
for key,value in ordered_results.items():
    location = value["location"] if value["location"] is not None else ""
    top_docs.append(Document(page_content=value["chunk"], metadata={"source": location+os.environ['BLOB_SAS_TOKEN']}))
        
print("Number of chunks:",len(top_docs))

Number of chunks: 5


In [34]:
# Calculate number of tokens of our docs
if(len(top_docs)>0):
    tokens_limit = model_tokens_limit(MODEL) # this is a custom function we created in common/utils.py
    prompt_tokens = num_tokens_from_string(COMBINE_CHAT_PROMPT_TEMPLATE) # this is a custom function we created in common/utils.py
    context_tokens = num_tokens_from_docs(top_docs) # this is a custom function we created in common/utils.py
    
    requested_tokens = prompt_tokens + context_tokens + COMPLETION_TOKENS
    
    chain_type = "map_reduce" if requested_tokens > 0.9 * tokens_limit else "stuff"  
    
    print("System prompt token count:",prompt_tokens)
    print("Max Completion Token count:", COMPLETION_TOKENS)
    print("Combined docs (context) token count:",context_tokens)
    print("--------")
    print("Requested token count:",requested_tokens)
    print("Token limit for", MODEL, ":", tokens_limit)
    print("Chain Type selected:", chain_type)
        
else:
    print("NO RESULTS FROM AZURE SEARCH")

System prompt token count: 2464
Max Completion Token count: 2000
Combined docs (context) token count: 2123
--------
Requested token count: 6587
Token limit for gpt-35-turbo-16k : 16384
Chain Type selected: stuff
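The chain-type decision in the cell above is plain arithmetic, so it can be reproduced in isolation. The sketch below reuses the numbers printed above; `choose_chain_type` is a hypothetical helper written for illustration (it is not part of `common/utils.py`), and the 4096-token limit in the last call is an assumed value for a smaller model.

```python
def choose_chain_type(prompt_tokens: int, context_tokens: int,
                      completion_tokens: int, tokens_limit: int,
                      safety_ratio: float = 0.9) -> str:
    """Return "stuff" when everything fits within safety_ratio of the limit."""
    requested = prompt_tokens + context_tokens + completion_tokens
    return "map_reduce" if requested > safety_ratio * tokens_limit else "stuff"


# Numbers taken from the output above.
requested_tokens = 2464 + 2123 + 2000
print("Requested token count:", requested_tokens)
print("16k model:", choose_chain_type(2464, 2123, 2000, tokens_limit=16384))

# Assumed smaller limit, for illustration only.
print("4k model:", choose_chain_type(2464, 2123, 2000, tokens_limit=4096))
```

With the 16k model the request stays well under 90% of the limit, so all documents can be "stuffed" into one prompt. With the assumed 4096-token limit the same request would not fit, and the documents would have to be processed with `map_reduce` instead.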


In [35]:
%%time
# Get the answer
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

Reinforcement learning can be used in various use cases such as:

1. Learning prevention strategies in the context of pandemic influenza, specifically in constructing epidemiological models and controlling districts in a community<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[1]</a></sup>.
2. Personalized music recommendation based on reinforcement learning, capturing changes in listeners' preferences and recommending song sequences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[2]</a></sup>.
3. Job scheduling in data centers using deep reinforcement learning, optimizing the allocation of resources over time and space<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[3]</a></sup>.

These are just a few examples of how reinforcement learning can be applied in different domains.

CPU times: user 21 ms, sys: 931 µs, total: 21.9 ms
Wall time: 8.05 s


And if we ask the follow up question:

In [36]:
response = get_answer(llm=llm, docs=top_docs,  query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

I'm sorry, but I cannot provide the main points of our conversation as it is not mentioned in the provided extracted parts.

You might get a different response from the one above, but whatever response you get will be based on the context given, not on previous answers.

So far we have the same as in the prior Notebook 03: results from Azure Search enhanced by an OpenAI model, with no memory.

**Now let's add memory to it:**

Reference: https://python.langchain.com/docs/modules/memory/how_to/adding_memory_chain_multiple_inputs

In [37]:
# Memory object, which is necessary to track the inputs/outputs and hold a conversation.
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question")

response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has various use cases in different domains. Here are a few examples:

1. Prevention strategies for epidemics: In the context of pandemic influenza, deep reinforcement learning can be used to automatically learn prevention strategies. By constructing epidemiological models and using reinforcement learning techniques, policies can be learned to control and mitigate the spread of infectious diseases within a community of districts<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[1]</a></sup>.

2. Personalized music recommendation: Reinforcement learning can be used to improve music recommendation systems by considering the simulation of the interaction process between listeners and songs. By continuously updating the model based on listeners' preferences, reinforcement learning algorithms can recommend song sequences that better match the listeners' preferences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[2]</a></sup>.

3. Job scheduling in data centers: Reinforcement learning can be applied to job scheduling in data centers to efficiently allocate multi-dimensional resources over time and space. By using Advantage Actor-Critic (A2C) deep reinforcement learning, scheduling policies can be learned to optimize job allocation and improve performance<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[3]</a></sup>.

These are just a few examples of how reinforcement learning can be applied in different domains. The key idea is to use an agent that learns from interactions with an environment to make decisions and optimize certain objectives. By using feedback in the form of rewards, the agent can learn to take actions that maximize its long-term cumulative reward.

In [38]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning has various use cases in different domains.
2. In the context of pandemic influenza, deep reinforcement learning can be used to automatically learn prevention strategies.
3. Reinforcement learning can be applied to improve music recommendation systems by considering the simulation of the interaction process between listeners and songs.
4. Reinforcement learning can be used for job scheduling in data centers to efficiently allocate multi-dimensional resources over time and space.
5. The key idea of reinforcement learning is to use an agent that learns from interactions with an environment to make decisions and optimize certain objectives.
6. By using feedback in the form of rewards, the agent can learn to take actions that maximize its long-term cumulative reward.

These points summarize the various use cases and the fundamental concept of reinforcement learning that we discussed.

References:
[1]<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[1]</a></sup> 
[2]<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[2]</a></sup> 
[3]<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[3]</a></sup>

Is there anything else I can assist you with?

In [39]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
printmd(response['output_text'])

I'm sorry, but I couldn't find any relevant information in the extracted parts that directly answers your question about the main points of our conversation. Is there anything else I can assist you with?

You might get a different answer in the cell above, and that is OK: this bot is not yet configured to answer questions unrelated to its knowledge base, including salutations.

Let's check our memory to see that it is keeping the conversation:

In [40]:
memory.buffer

'Human: Tell me some use cases for reinforcement learning\nAI: Reinforcement learning has various use cases in different domains. Here are a few examples:\n\n1. Prevention strategies for epidemics: In the context of pandemic influenza, deep reinforcement learning can be used to automatically learn prevention strategies. By constructing epidemiological models and using reinforcement learning techniques, policies can be learned to control and mitigate the spread of infectious diseases within a community of districts<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[1]</a></sup>.\n\n2. Personalized music recommendation: Reinforcement learning can be used to improve music recommendation systems by considering the simulation of the interaction process between listeners and songs. By continuously updating the model based on liste
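Because the buffer grows with every turn, and memory adds tokens to the prompt, a long conversation can eventually exceed the model's limit. One possible mitigation, sketched below under loud assumptions, is to keep only the most recent turns that fit a budget. `rough_token_count` and `trim_history` are hypothetical helpers, and counting whitespace-separated words is only a crude proxy for a real tokenizer such as the one behind `num_tokens_from_string`.

```python
from typing import List


def rough_token_count(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def trim_history(turns: List[str], max_tokens: int) -> List[str]:
    """Keep the most recent turns whose combined rough size fits max_tokens."""
    kept: List[str] = []
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = rough_token_count(turn)
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))  # restore chronological order


turns = [
    "Human: Tell me some use cases for reinforcement learning",
    "AI: Reinforcement learning has various use cases",
    "Human: Give me the main points of our conversation",
    "AI: Here are the main points",
]
print(trim_history(turns, max_tokens=16))
```

Dropping the oldest turns is the simplest policy; it trades recall of early context for a bounded prompt size.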

## Using CosmosDB as persistent memory

In the previous cells we added in-memory (RAM) storage to our chatbot. However, it is not persistent: it is deleted once the user's session ends. We therefore need a database to persistently store each user's conversations with the bot, not only for analytics and auditing, but also if we wish to provide recommendations.

Here we will store the conversation history in CosmosDB for future auditing purposes.
We will use the LangChain class CosmosDBChatMessageHistory; see [HERE](https://python.langchain.com/en/latest/_modules/langchain/memory/chat_message_histories/cosmos_db.html)
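To make the idea of persistence concrete before connecting to CosmosDB, here is a minimal sketch that swaps in SQLite from the Python standard library as the store. It is not how `CosmosDBChatMessageHistory` is implemented; `SQLiteChatHistory`, its methods, and the table layout are hypothetical and exist only to show that a conversation keyed by a session ID survives after the object that wrote it is gone.

```python
import os
import sqlite3
import tempfile
from contextlib import closing
from typing import List, Tuple


class SQLiteChatHistory:
    """Hypothetical persistent message store, keyed by session_id."""

    def __init__(self, db_path: str, session_id: str) -> None:
        self.db_path = db_path
        self.session_id = session_id
        with closing(sqlite3.connect(self.db_path)) as conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS messages ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "session_id TEXT, role TEXT, content TEXT)"
            )
            conn.commit()

    def add_message(self, role: str, content: str) -> None:
        with closing(sqlite3.connect(self.db_path)) as conn:
            conn.execute(
                "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
                (self.session_id, role, content),
            )
            conn.commit()

    def messages(self) -> List[Tuple[str, str]]:
        with closing(sqlite3.connect(self.db_path)) as conn:
            rows = conn.execute(
                "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
                (self.session_id,),
            ).fetchall()
        return rows


db_path = os.path.join(tempfile.mkdtemp(), "chat_history.db")

first = SQLiteChatHistory(db_path, session_id="session-1")
first.add_message("human", "Tell me some use cases for reinforcement learning")
first.add_message("ai", "Reinforcement learning has various use cases.")

# A new object (think: a new app session) reads the same conversation back.
second = SQLiteChatHistory(db_path, session_id="session-1")
print(second.messages())
```

Keying every message by session ID is what lets many users share one store while each conversation is loaded back independently.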

In [41]:
# Create CosmosDB instance from langchain cosmos class.
cosmos = CosmosDBChatMessageHistory(
    cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
    cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
    cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
    connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
    session_id="Agent-Test-Session" + str(random.randint(1, 1000)),
    user_id="Agent-Test-User" + str(random.randint(1, 1000))
    )

# prepare the cosmosdb instance
cosmos.prepare_cosmos()

In [42]:
# Create our Memory Object
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question",chat_memory=cosmos)

In [43]:
# Testing using our Question
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has various use cases across different domains. Here are a few examples:

1. **Prevention strategies for epidemics**: Reinforcement learning can be used to automatically learn prevention strategies in the context of pandemics, such as pandemic influenza. By constructing epidemiological models and using reinforcement learning algorithms like Proximal Policy Optimization, deep reinforcement learning can learn mitigation policies in complex epidemiological models with a large state space<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[1]</a></sup>.

2. **Personalized music recommendation**: Reinforcement learning can improve personalized music recommendation systems by considering the simulation of the interaction process. By using techniques like weighted matrix factorization and convolutional neural networks, reinforcement learning algorithms can recommend song sequences that better match listeners' preferences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[2]</a></sup>.

3. **Efficient job scheduling in data centers**: Reinforcement learning can be used for job scheduling in data centers. Algorithms like Advantage Actor-Critic (A2C) can automatically learn scheduling policies and reduce estimation errors. A2C-based approaches have shown competitive scheduling performance using both simulated workloads and real data collected from academic data centers<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[3]</a></sup>.

These are just a few examples of the use cases for reinforcement learning. It is a versatile approach that can be applied to various domains and problems. Let me know if there's anything else I can help with!

References:
1. [1]
2. [2]
3. [3]

In [44]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning has various use cases across different domains.
2. One use case is the application of reinforcement learning in prevention strategies for epidemics, such as pandemic influenza. Deep reinforcement learning algorithms like Proximal Policy Optimization can learn mitigation policies in complex epidemiological models with a large state space<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[1]</a></sup>.
3. Another use case is personalized music recommendation. Reinforcement learning algorithms can improve personalized music recommendation systems by considering the simulation of the interaction process and recommending song sequences that better match listeners' preferences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[2]</a></sup>.
4. Reinforcement learning can also be applied to efficient job scheduling in data centers. Algorithms like Advantage Actor-Critic (A2C) can automatically learn scheduling policies and reduce estimation errors<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[3]</a></sup>.

These are just a few examples of the use cases for reinforcement learning. It is a versatile approach that can be applied to various domains and problems.

References:
1. [1]
2. [2]
3. [3]

Let me know if there's anything else I can help with!

In [45]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
printmd(response['output_text'])

I'm sorry, but I couldn't find any relevant information in the extracted parts that directly answers your question about the main points of our conversation. The extracted parts mainly discuss reinforcement learning and its applications in different domains, such as prevention strategies for epidemics, personalized music recommendation, and efficient job scheduling in data centers. If you have any specific questions or need further clarification on any of the topics we discussed, please let me know and I'll be happy to help!

Let's check our Azure CosmosDB to see the whole conversation


In [46]:
# Load the messages from CosmosDB
cosmos.load_messages()
cosmos.messages

[HumanMessage(content='Tell me some use cases for reinforcement learning'),
 AIMessage(content='Reinforcement learning has various use cases across different domains. Here are a few examples:\n\n1. **Prevention strategies for epidemics**: Reinforcement learning can be used to automatically learn prevention strategies in the context of pandemics, such as pandemic influenza. By constructing epidemiological models and using reinforcement learning algorithms like Proximal Policy Optimization, deep reinforcement learning can learn mitigation policies in complex epidemiological models with a large state space<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[1]</a></sup>.\n\n2. **Personalized music recommendation**: Reinforcement learning can improve personalized music recommendation systems by considering the simulation of the i

![CosmosDB Memory](./images/cosmos-chathistory.png)

# Summary
##### Adding memory to our application allows the user to have a conversation. However, this feature does not come with the LLM; instead, memory is something that we must provide to the LLM as context for the question.

We added persistent memory using CosmosDB.

We can also see that the current chain is smart, but not that smart. Although we have given it memory, it searches for similar docs every time, regardless of the input, and it struggles to respond to prompts like: Hello, Thank you, Bye, What's your name, What's the weather, and any other task that is not a search in the knowledge base.


## <u>Important Note</u>:<br>
As we proceed, while all the code will remain compatible with GPT-3.5 models, we highly recommend transitioning to GPT-4. Here's why:

**GPT-3.5-Turbo** can be likened to a 7-year-old child. You can provide it with concise instructions, but it frequently struggles to follow them accurately. Additionally, its limited memory can make sustained conversations challenging.

**GPT-3.5-Turbo-16k** resembles the same 7-year-old, but with an increased attention span for longer instructions. However, it still faces difficulties accurately executing them about half the time.

**GPT-4** exhibits the capabilities of a 10-12-year-old child. It possesses enhanced reasoning skills and more consistently adheres to instructions. While its memory retention for instructions is moderate, it excels at following them.

**GPT-4-32k** is akin to the 10-12-year-old child with an extended memory. It comprehends lengthy sets of instructions and engages in meaningful conversations. Thanks to its robust memory, it offers detailed responses.

The analogy above will become clearer as you complete the final notebook.


# NEXT
We now know how to build a Smart Search Engine that can power a chatbot! Great!

But does this solve all the scenarios that a virtual assistant will require? **What if the answer is not found in text, but instead requires looking into tabular data?** The next notebook explains and solves the tabular problem and introduces the concept of Agents.