# Understanding Memory in LLMs

In the previous Notebooks, we successfully explored how OpenAI models can enhance the results from Azure AI Search queries. 

However, we have yet to discover how to engage in a conversation with the LLM. With [Bing Chat](http://chat.bing.com/), for example, this is possible, as it can understand and reference the previous responses.

There is a common misconception that LLMs (Large Language Models) have memory. This is not true. While they possess knowledge, they do not retain information from previous questions asked to them.

In this Notebook, our goal is to illustrate how we can effectively "endow the LLM with memory" by employing prompts and context.

In [1]:
import os
import random
from langchain_community.chat_message_histories import ChatMessageHistory, CosmosDBChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables import ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from operator import itemgetter
from typing import List

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from common.utils import CustomAzureSearchRetriever, get_answer
from common.prompts import DOCSEARCH_PROMPT

from dotenv import load_dotenv
load_dotenv("credentials.env")

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]

### Let's start with the basics
Let's use a very simple example to see whether the Azure OpenAI GPT model has memory. We will again use LangChain to simplify our code.

In [3]:
QUESTION = "Tell me some use cases for reinforcement learning"
FOLLOW_UP_QUESTION = "What was my prior question?"

In [4]:
COMPLETION_TOKENS = 1000
# Create an OpenAI instance
llm = AzureChatOpenAI(deployment_name=os.environ["GPT4oMINI_DEPLOYMENT_NAME"], 
                      temperature=0.5, max_tokens=COMPLETION_TOKENS)

In [5]:
# We create a very simple prompt template, just the question as is:
output_parser = StrOutputParser()
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an assistant that gives thorough responses to users."),
    ("user", "{input}")
])

In [6]:
# Let's see what the GPT model responds
chain = prompt | llm | output_parser
response_to_initial_question = chain.invoke({"input": QUESTION})
display(Markdown(response_to_initial_question))

Reinforcement Learning (RL) is a powerful area of machine learning where an agent learns to make decisions by taking actions in an environment to maximize cumulative rewards. Here are some notable use cases across various domains:

### 1. **Robotics**
   - **Autonomous Navigation**: Robots can learn to navigate through complex environments, avoiding obstacles and optimizing paths.
   - **Manipulation Tasks**: RL can be used to train robots to perform tasks like grasping objects or assembling parts, adapting to different shapes and weights.

### 2. **Gaming**
   - **Game AI**: RL has been used to develop AI that can play games at superhuman levels, such as AlphaGo for Go and OpenAI Five for Dota 2.
   - **Procedural Content Generation**: RL can be used to adapt game environments dynamically based on player behavior, enhancing engagement.

### 3. **Healthcare**
   - **Personalized Treatment Plans**: RL can optimize treatment strategies for chronic diseases by learning from patient responses to different interventions.
   - **Drug Discovery**: It can help in identifying promising drug candidates by optimizing molecular structures through simulation.

### 4. **Finance**
   - **Algorithmic Trading**: RL can be employed to develop trading strategies that adapt to market conditions by learning from historical data.
   - **Portfolio Management**: It can optimize asset allocation by learning the best strategies based on market movements and risk factors.

### 5. **Natural Language Processing**
   - **Dialogue Systems**: RL can enhance chatbots and virtual assistants by optimizing responses based on user satisfaction and engagement metrics.
   - **Text Summarization**: RL can help in generating concise summaries by maximizing relevance and coherence from the original text.

### 6. **Autonomous Vehicles**
   - **Driving Policies**: RL can be used to develop driving algorithms that learn from real-world driving scenarios, improving safety and efficiency.
   - **Traffic Management**: It can optimize traffic signal timings and routing to reduce congestion and improve flow.

### 7. **Recommendation Systems**
   - **Dynamic Recommendations**: RL can optimize recommendations in real-time based on user interactions, improving user engagement and satisfaction.
   - **Content Personalization**: It can be used to personalize content delivery on platforms like streaming services, adapting to user preferences over time.

### 8. **Energy Management**
   - **Smart Grids**: RL can optimize energy distribution in smart grids, balancing supply and demand efficiently.
   - **Building Energy Management**: It can learn to manage heating, ventilation, and air conditioning (HVAC) systems to minimize energy consumption while maintaining comfort.

### 9. **Manufacturing**
   - **Supply Chain Optimization**: RL can help in managing inventory levels and production schedules, reducing costs and improving efficiency.
   - **Quality Control**: It can be used to optimize inspection processes, learning to identify defects in products through trial and error.

### 10. **Education**
   - **Adaptive Learning Systems**: RL can tailor educational content to individual learning styles and paces, enhancing student engagement and effectiveness.
   - **Game-based Learning**: It can optimize learning pathways in educational games, adapting to the player’s progress and difficulties.

### Conclusion
These use cases demonstrate the versatility and potential of reinforcement learning across various sectors. By leveraging the trial-and-error learning process, RL can optimize complex decision-making tasks, leading to improved efficiency, effectiveness, and user satisfaction in numerous applications.

In [7]:
# Now let's ask a follow-up question
printmd(chain.invoke({"input": FOLLOW_UP_QUESTION}))

I'm unable to access previous interactions or questions. However, I'm here to help you with any new questions or topics you'd like to discuss! What can I assist you with today?

As you can see, the model doesn't remember what it just answered. Sometimes it responds based only on the system prompt, or seemingly at random. This shows that the LLM does NOT have memory, and that we need to supply it ourselves by including the conversation history in the prompt, like this:

In [8]:
hist_prompt = ChatPromptTemplate.from_template(
"""
    {history}
    Human: {question}
    AI:
"""
)
chain = hist_prompt | llm | output_parser

In [9]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response_to_initial_question)

In [10]:
printmd(chain.invoke({"history":Conversation_history, "question": FOLLOW_UP_QUESTION}))

Your prior question was asking for some use cases for reinforcement learning.

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context on every call.
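
The same idea can be sketched as a small loop that keeps a running transcript and prepends it to every new question. This is only an illustration under stated assumptions: `build_prompt`, `chat`, and `fake_llm` are hypothetical names, and `fake_llm` is a stand-in for the real chain that can only use what appears in the prompt, so the snippet runs without Azure credentials.

```python
from typing import Callable, List, Tuple

def build_prompt(history: List[Tuple[str, str]], question: str) -> str:
    """Render prior turns plus the new question in the Human/AI format used above."""
    lines = []
    for human, ai in history:
        lines.append(f"Human: {human}")
        lines.append(f"AI: {ai}")
    lines.append(f"Human: {question}")
    lines.append("AI:")
    return "\n".join(lines)

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the model: it only sees the text of the prompt."""
    humans = [line[len("Human: "):] for line in prompt.splitlines()
              if line.startswith("Human: ")]
    if len(humans) < 2:
        return "I cannot see any earlier question."
    return f"Your prior question was: {humans[-2]}"

def chat(question: str, history: List[Tuple[str, str]],
         llm: Callable[[str], str]) -> str:
    """Answer one turn, then append it to the history so the next turn can see it."""
    answer = llm(build_prompt(history, question))
    history.append((question, answer))
    return answer

history: List[Tuple[str, str]] = []
chat("Tell me some use cases for reinforcement learning", history, fake_llm)
print(chat("What was my prior question?", history, fake_llm))
```

Swapping `fake_llm` for a function that calls the real chain would give the same behavior as the cells above, with the history growing by one turn per call.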

## Now that we understand the concept of memory via adding history as a context, let's go back to our GPT Smart Search engine

From the LangChain website:

A memory system needs to support two basic actions: reading and writing. Recall that every chain defines some core execution logic that expects certain inputs. Some of these inputs come directly from the user, but some of these inputs can come from memory. A chain will interact with its memory system twice in a given run.

1. AFTER receiving the initial user inputs but BEFORE executing the core logic, a chain will READ from its memory system and augment the user inputs.
2. AFTER executing the core logic but BEFORE returning the answer, a chain will WRITE the inputs and outputs of the current run to memory, so that they can be referred to in future runs.

This process adds some latency to each response, but it is a necessary delay :)

![image](./images/memory_diagram.png)
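
The READ-then-WRITE cycle can be sketched in plain Python. The names `with_memory` and `core_logic` are hypothetical and are not LangChain APIs; the sketch only shows where the two memory interactions sit relative to the core logic.

```python
from typing import Callable, Dict, List

def with_memory(core_logic: Callable[[Dict], str],
                memory: List[Dict[str, str]]) -> Callable[[str], str]:
    """Wrap core_logic so every run READs memory first and WRITEs to it afterwards."""
    def run(user_input: str) -> str:
        # READ: augment the user input with what memory currently holds
        inputs = {"question": user_input, "history": list(memory)}
        # Core execution logic (in this notebook, roughly prompt | llm)
        output = core_logic(inputs)
        # WRITE: record this run's input and output for future runs
        memory.append({"question": user_input, "answer": output})
        return output
    return run

def core_logic(inputs: Dict) -> str:
    """Hypothetical core logic: reports how many earlier turns it was given."""
    return f"{len(inputs['history'])} earlier turn(s) visible for: {inputs['question']}"

memory: List[Dict[str, str]] = []
chain_sketch = with_memory(core_logic, memory)
print(chain_sketch("first question"))
print(chain_sketch("second question"))
```

The second call sees one earlier turn because the first call wrote its input and output to `memory` before returning.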

In [22]:
index1_name = "srch-index-files"
index2_name = "srch-index-csv"
index3_name = "srch-index-books"
indexes = [index1_name, index2_name, index3_name]

In [23]:
# Initialize our custom retriever 
retriever = CustomAzureSearchRetriever(indexes=indexes, topK=10, reranker_threshold=1)

If you look closely at `prompts.py`, you will see an optional variable in `DOCSEARCH_PROMPT` called `history`. Now is the time to use it. It is a placeholder where we inject the conversation into the prompt so the LLM is aware of it before it answers.

**Now let's add memory to it:**

In [24]:
store = {} # Our first memory will be a dictionary in memory

# We have to define a custom function that takes a session_id and looks somewhere
# (in this case in a dictionary in memory) for the conversation
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]
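
A minimal sketch of what this factory guarantees, using a hypothetical `SimpleHistory` class in place of `ChatMessageHistory` so that it runs without LangChain: the same `session_id` always returns the same history object, and different ids never share turns. The names `sketch_store` and `get_history_sketch` are also illustrative only.

```python
from typing import Dict, List

class SimpleHistory:
    """Hypothetical stand-in for ChatMessageHistory: just a list of messages."""
    def __init__(self) -> None:
        self.messages: List[str] = []

    def add_message(self, message: str) -> None:
        self.messages.append(message)

sketch_store: Dict[str, SimpleHistory] = {}

def get_history_sketch(session_id: str) -> SimpleHistory:
    # Create the history lazily the first time a session id is seen
    if session_id not in sketch_store:
        sketch_store[session_id] = SimpleHistory()
    return sketch_store[session_id]

get_history_sketch("abc123").add_message("Tell me some use cases for reinforcement learning")
print(get_history_sketch("abc123").messages)  # the turn stored above
print(get_history_sketch("xyz789").messages)  # a different session starts empty
```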


In [25]:
# We use our original chain with the retriever but removing the StrOutputParser
chain = (
    {
        "context": itemgetter("question") | retriever, 
        "question": itemgetter("question"),
        "history": itemgetter("history")
    }
    | DOCSEARCH_PROMPT
    | llm
)

## Then we pass the above chain to another chain that adds memory to it

output_parser = StrOutputParser()

chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
) | output_parser

In [26]:
# This is where we configure the session id
config={"configurable": {"session_id": "abc123"}}

Notice below that the chain now carries a `history` variable. `RunnableWithMessageHistory` fills it with the chat history for the configured session, so the history ends up inside the prompt.

In [27]:
printmd(chain_with_history.invoke({"question": QUESTION}, config=config))

Reinforcement learning (RL) has a variety of applications across different domains. Here are some notable use cases:

1. **Epidemic Modeling and Control:** RL can be utilized to model the spread of infectious diseases and to formulate optimal intervention strategies. For instance, a multi-agent epidemic model allows individual agents to make decisions that affect disease transmission, which can be optimized using game theory and reinforcement learning techniques [[1]](https://arxiv.org/pdf/2004.12959v1.pdf).

2. **Lockdown Decision Making During Pandemics:** In the context of the COVID-19 pandemic, RL algorithms can automatically compute lockdown decisions for specific cities or regions. These policies are based on various disease parameters and population characteristics, balancing health and economic considerations [[2]](https://arxiv.org/pdf/2003.14093v2.pdf).

3. **Preventive Strategies for Influenza:** A deep reinforcement learning approach has been developed to learn prevention strategies for pandemic influenza. This involves a meta-population model that captures the infection process and uses RL to learn effective mitigation policies across interconnected districts [[3]](https://arxiv.org/pdf/2003.13676v1.pdf).

4. **Personalized Recommendation Systems:** RL can enhance recommendation systems by predicting user preferences and adapting recommendations based on user interactions. For example, a hybrid recommendation algorithm uses reinforcement learning to recommend song sequences that better match listeners' evolving preferences [[4]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/).

5. **Fairness in Interactive Recommender Systems:** To address bias and discrimination in recommendations, an RL-based framework has been proposed to maintain a balance between accuracy and fairness dynamically. This approach allows the system to adapt to changing user preferences and fairness considerations over time [[5]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206277/).

6. **Job Scheduling in Data Centers:** RL methods can be applied to optimize job scheduling in data centers, where multi-dimensional resources need to be allocated efficiently. A specific approach called A2cScheduler employs deep reinforcement learning to improve scheduling performance in complex computing environments [[6]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/).

7. **Automatic Feature Engineering in Machine Learning:** Reinforcement learning can also be used to automate the feature engineering process, which is often time-consuming and requires expert knowledge. A framework called CAFEM employs RL to optimize feature transformation strategies across different datasets [[7]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/).

These use cases illustrate the versatility and effectiveness of reinforcement learning in solving complex problems across various fields, from healthcare to technology and beyond.

In [28]:
# Remembers
printmd(chain_with_history.invoke({"question": FOLLOW_UP_QUESTION},config=config))

Your prior question was, "Tell me some use cases for reinforcement learning."

In [29]:
# Remembers
printmd(chain_with_history.invoke({"question": "Thank you! Good bye"},config=config))

Empty Search Response


You're welcome! If you have more questions in the future, feel free to ask. Goodbye and take care!

## Using CosmosDB as persistent memory

In the previous cells we added local in-memory (RAM) history to our chatbot. However, it is not persistent: it is lost once the app user's session ends. We therefore need a database to persistently store each user's conversations with the bot, not only for analytics and auditing, but also if we wish to provide recommendations in the future.

Here we will store the conversation history in CosmosDB for future auditing purposes.
We will use the LangChain class `CosmosDBChatMessageHistory`.

In [37]:
# Create the function to retrieve the conversation

def get_session_history(session_id: str, user_id: str) -> CosmosDBChatMessageHistory:
    cosmos = CosmosDBChatMessageHistory(
        cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
        cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
        cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
        connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
        session_id=session_id,
        user_id=user_id
        )

    # prepare the cosmosdb instance
    cosmos.prepare_cosmos()
    return cosmos
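
To illustrate why the history is keyed by both `user_id` and `session_id`, here is a rough sketch that uses Python's built-in `sqlite3` as a local stand-in for CosmosDB. This is not how `CosmosDBChatMessageHistory` is implemented; the `SqliteChatHistory` class and its table layout are assumptions made purely for illustration.

```python
import sqlite3
from typing import List, Tuple

class SqliteChatHistory:
    """Hypothetical persistent history, keyed by (user_id, session_id)."""

    def __init__(self, conn: sqlite3.Connection, session_id: str, user_id: str):
        self.conn, self.session_id, self.user_id = conn, session_id, user_id
        conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "user_id TEXT, session_id TEXT, role TEXT, content TEXT)"
        )

    def add_message(self, role: str, content: str) -> None:
        with self.conn:  # commits the insert on success
            self.conn.execute(
                "INSERT INTO messages (user_id, session_id, role, content) "
                "VALUES (?, ?, ?, ?)",
                (self.user_id, self.session_id, role, content),
            )

    @property
    def messages(self) -> List[Tuple[str, str]]:
        # Only the turns of this user and this session, in insertion order
        rows = self.conn.execute(
            "SELECT role, content FROM messages "
            "WHERE user_id = ? AND session_id = ? ORDER BY id",
            (self.user_id, self.session_id),
        )
        return [(role, content) for role, content in rows]

conn = sqlite3.connect(":memory:")  # a file path here would survive restarts
first = SqliteChatHistory(conn, session_id="session1", user_id="user1")
first.add_message("human", "Tell me some use cases for reinforcement learning")
first.add_message("ai", "Robotics, gaming, healthcare...")

# A new object for the same user and session sees the stored turns
same = SqliteChatHistory(conn, session_id="session1", user_id="user1")
# A different session for the same user starts empty
other = SqliteChatHistory(conn, session_id="session2", user_id="user1")
print(len(same.messages), len(other.messages))
```

Because the history lives in the database rather than in a Python dictionary, a fresh object (or a fresh process, when a file or a hosted database is used) can pick up the conversation where it left off.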


In [38]:
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="user_id",
            annotation=str,
            name="User ID",
            description="Unique identifier for the user.",
            default="",
            is_shared=True,
        ),
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="Unique identifier for the conversation.",
            default="",
            is_shared=True,
        ),
    ],
) | output_parser

In [39]:
# This is where we configure the session id and user id
random_session_id = "session"+ str(random.randint(1, 1000))
random_user_id = "user"+ str(random.randint(1, 1000))

config={"configurable": {"session_id": random_session_id, "user_id": random_user_id}}

In [40]:
config

{'configurable': {'session_id': 'session988', 'user_id': 'user220'}}

In [41]:
printmd(chain_with_history.invoke({"question": QUESTION}, config=config))

Reinforcement learning (RL) has a wide range of applications across various fields. Here are some notable use cases:

1. **Epidemic Modeling**: RL can be utilized to model and predict the spread of infectious diseases. For instance, a multi-agent epidemic model allows agents to make decisions that affect disease transmission. By applying game theory and reinforcement learning, optimal decisions can be derived to predict disease spread and necessitate external interventions for better regulation of agent behaviors [[1]](https://arxiv.org/pdf/2004.12959v1.pdf).

2. **Lockdown Policy Optimization**: In the context of pandemics like COVID-19, RL algorithms can compute lockdown decisions for cities or regions. These policies are learned automatically based on disease parameters and population characteristics, balancing health and economic considerations while accounting for the realities of imperfect lockdowns [[2]](https://arxiv.org/pdf/2003.14093v2.pdf).

3. **Prevention Strategies for Infectious Diseases**: RL techniques can be applied to learn prevention strategies in complex epidemiological models, such as pandemic influenza. By using deep reinforcement learning, effective mitigation policies can be developed to control the spread of diseases across multiple districts [[3]](https://arxiv.org/pdf/2003.13676v1.pdf).

4. **Music Recommendation Systems**: A personalized hybrid recommendation algorithm based on RL can enhance music recommendations by simulating the interaction process of listeners. This approach captures subtle changes in listener preferences, improving the recommendation of song sequences [[4]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/).

5. **Fairness in Interactive Recommender Systems**: RL frameworks can maintain a balance between accuracy and fairness in recommendation systems by dynamically adapting to changes in user preferences and fairness status. This ensures that recommendations are both fair and of high quality [[5]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206277/).

6. **Job Scheduling in Data Centers**: An RL-based approach called A2cScheduler can be used for efficient job scheduling in data centers. This method employs deep reinforcement learning to manage resource allocation effectively, adapting to complex computing environments [[6]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/).

7. **Automatic Feature Engineering**: RL can also play a role in feature engineering for machine learning projects. A framework called Cross-data Automatic Feature Engineering Machine (CAFEM) utilizes RL to optimize feature generation across different datasets, improving the efficiency and performance of machine learning models [[7]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/).

These examples illustrate the versatility of reinforcement learning in addressing complex decision-making problems across various domains.

In [42]:
# Remembers
printmd(chain_with_history.invoke({"question": FOLLOW_UP_QUESTION},config=config))

Your prior question was about the use cases for reinforcement learning.

In [43]:
# Remembers
printmd(chain_with_history.invoke(
    {"question": "Can you tell me a one line summary of our conversation?"},
    config=config))

We discussed various use cases for reinforcement learning, including applications in epidemic modeling, lockdown policy optimization, music recommendation systems, and job scheduling in data centers.

In [47]:
try:
    printmd(chain_with_history.invoke(
    {"question": "Thank you very much!"},
    config=config))
except Exception as e:
    print(e)

Empty Search Response


You're very welcome! If you have any more questions or need assistance, feel free to ask. Enjoy your day!

In [48]:
printmd(chain_with_history.invoke(
    {"question": "I do have one more question, why did you give me a one line summary?"},
    config=config))

Empty Search Response


I provided a one-line summary in response to your request for a concise recap of our conversation about the use cases for reinforcement learning. If you have any further questions or need clarification, feel free to ask!

In [49]:
printmd(chain_with_history.invoke(
    {"question": "why not 2?"},
    config=config))

Empty Search Response


I aimed to keep it brief and focused, as you specifically requested a one-line summary. However, if you prefer a two-line summary or more detail, I can certainly provide that! Would you like me to expand on it?

#### Let's check our Azure CosmosDB to see the whole conversation


![CosmosDB Memory](./images/cosmos-chathistory.png)

# Summary
##### Adding memory to our application allows the user to have a conversation. However, this feature does not come with the LLM; memory is something that we must provide to the LLM as context alongside the question.

We added persistent memory using CosmosDB.

We can also see that the chain we are using is smart, but not that smart. Although we have given it memory, it searches for similar documents on every turn, regardless of the input (note the `Empty Search Response` output for simple pleasantries above). This is inefficient, but we are nonetheless very close to finishing our first RAG "talk to your data" bot.

# NEXT
We now know how to build a Smart Search Engine that can power a chatbot. Great!

In the next notebook (Notebook 6), we are going to build our first RAG bot. To do this, we will introduce the concept of Agents.