# LangChain: Memory

- LLMs are *stateless*: each call (transaction) is independent of the previous ones
- Chatbots only appear to have memory because the full conversation (history) is sent to the model as context on every call

- **memory** = storing the previous parts of a conversation and feeding them back into the language model, so that each interaction continues the conversational flow
- the memory is passed in as additional context, so the LLM can generate its output as if it were producing the next conversational turn, knowing what has been said before
- **HOWEVER**, as the conversation grows longer, the memory grows too, and so does the **cost** of sending all those tokens to the LLM
- LangChain provides several ways to store memory and accumulate the conversation
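The statelessness and cost points can be made concrete with a plain-Python sketch (no LangChain and no API calls). `fake_llm` is a hypothetical stand-in for a model call, and counting words is only a rough proxy for counting tokens. Because every turn resends the whole history, the prompt grows with each turn, and the total text sent over the conversation grows roughly quadratically.

```python
# Plain-Python sketch: a stateless "model" only sees what is in the prompt,
# so the client must resend the full history on every turn.


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; it just echoes the last line."""
    last_line = prompt.splitlines()[-1]
    return f"(reply to: {last_line})"


def chat(turns: list[str]) -> list[int]:
    """Run a conversation and return the prompt size (in words) of each call."""
    history: list[str] = []
    prompt_sizes: list[int] = []
    for user_msg in turns:
        history.append(f"Human: {user_msg}")
        prompt = "\n".join(history)                # full history is sent every time
        prompt_sizes.append(len(prompt.split()))   # word count as a rough token proxy
        reply = fake_llm(prompt)
        history.append(f"AI: {reply}")
    return prompt_sizes


sizes = chat(["Hi, my name is Max.", "What is 1+1?", "What is my name?"])
print(sizes)       # prompt size per call keeps growing
print(sum(sizes))  # total amount of text sent across all calls
```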

## Outline
* **ConversationBufferMemory** *(entire conversation from start to end)*
* **ConversationBufferWindowMemory** *(specify a window for memory, e.g. `k=1` conversational exchange (= 1 user utterance + 1 LLM response))*
* **ConversationTokenBufferMemory** *(limit the number of tokens in memory `max_token_limit=x`)*
* **ConversationSummaryMemory** *(let the LLM write a summary of the current conversation and use this as memory; the `ConversationSummaryBufferMemory` variant used below also takes `max_token_limit=x`)*

**Additional memory types:**

* **Vector data memory** --> *for storing embeddings*
    - stores text (from conversation or elsewhere) in a vector database and retrieves the most relevant block of text
* **Entity memories**
    - using an LLM, remembers details about specific entities
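The vector-memory idea can be sketched without a vector database. In this toy version the "embedding" is a bag-of-words count vector (an assumption for illustration; real setups use a learned embedding model and a vector store), and retrieval returns the stored snippet with the highest cosine similarity to the query.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return Counter(w for w in words if w)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorMemory:
    """Stores text snippets with their vectors and retrieves the most similar one."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def most_relevant(self, query: str) -> str:
        q = embed(query)
        return max(self.items, key=lambda item: cosine(q, item[1]))[0]


store = ToyVectorMemory()
store.add("The user's name is Max.")
store.add("The meeting with the product team is at 8am.")
store.add("Lunch is at the Italian restaurant at noon.")
print(store.most_relevant("When is the product team meeting?"))
```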

- you can use multiple memories at one time, e.g., a conversation memory + an entity memory to recall details about individuals
- you can also store the conversation in a conventional database (e.g., key-value store or SQL)
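As a sketch of storing the conversation in a conventional database, the snippet below uses Python's built-in `sqlite3` module with an in-memory database. The table name, schema, and helper functions are illustrative assumptions, not something LangChain prescribes.

```python
import sqlite3

# In-memory SQLite database; pass a file path instead to persist across runs.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE messages (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           session_id TEXT NOT NULL,
           role TEXT NOT NULL,      -- 'Human' or 'AI'
           content TEXT NOT NULL
       )"""
)


def save_turn(session_id: str, user_msg: str, ai_msg: str) -> None:
    """Append one exchange (user message + AI reply) to the table."""
    conn.executemany(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        [(session_id, "Human", user_msg), (session_id, "AI", ai_msg)],
    )
    conn.commit()


def load_history(session_id: str) -> str:
    """Rebuild the transcript in insertion order, one 'Role: text' line per message."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    return "\n".join(f"{role}: {content}" for role, content in rows)


save_turn("demo", "Hi", "What's up")
save_turn("demo", "Not much, just hanging", "Cool")
print(load_history("demo"))
```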

## ConversationBufferMemory 
### *(entire conversation from start to end)*

In [None]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

import warnings
warnings.filterwarnings('ignore')

In [None]:
# choose the model name, accounting for the deprecation of gpt-3.5-turbo-0301
import datetime
# Get the current date
current_date = datetime.datetime.now().date()

# Define the date after which the model should be set to "gpt-3.5-turbo"
target_date = datetime.date(2024, 6, 12)

# Set the model variable based on the current date
if current_date > target_date:
    llm_model = "gpt-3.5-turbo"
else:
    llm_model = "gpt-3.5-turbo-0301"

In [4]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory


In [None]:
# temperature: degree of randomness (default 0.7)
llm = ChatOpenAI(temperature=0.0, model=llm_model)
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

# set verbose=False to stop printing the full prompt when running conversation.predict()
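With `verbose=True`, the chain prints the prompt it actually sends. Conceptually, the memory's `history` string and the new input are substituted into one prompt template. The plain-Python sketch below mimics that step; the template wording is an abbreviated approximation of `ConversationChain`'s default prompt (an assumption, so check the verbose output for the exact text), and the earlier exchange is hypothetical.

```python
# Abbreviated approximation of a ConversationChain-style prompt template (assumed wording).
TEMPLATE = (
    "The following is a friendly conversation between a human and an AI.\n\n"
    "Current conversation:\n"
    "{history}\n"
    "Human: {input}\n"
    "AI:"
)


def build_prompt(history: str, user_input: str) -> str:
    """Fill the template with the stored history and the new user message."""
    return TEMPLATE.format(history=history, input=user_input)


# Hypothetical earlier exchange, as a buffer memory would store it.
example_history = "Human: Hi, my name is Max.\nAI: Hello Max! How can I help you today?"
print(build_prompt(example_history, "What is my name?"))
```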

In [None]:
conversation.predict(input="Hi, my name is Max.")

**The model may get this one wrong. Even with `temperature=0.0`, running it repeatedly can give different responses, although the answer may stay wrong.**

In [None]:
conversation.predict(input="What is the sum of digits of 111111911111?")
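For reference, the correct answer can be checked directly in Python: the number has eleven 1s and one 9.

```python
number = 111111911111
# Sum the decimal digits by iterating over the string form of the number.
digit_sum = sum(int(d) for d in str(number))
print(digit_sum)  # 20
```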

**It will remember your name**

In [None]:
conversation.predict(input="What is my name?")

**View the chat history (from the `memory` variable declared above)**

In [None]:
print(memory.buffer)

**Load memory variables**
 - *the empty dict `{}` is the `inputs` argument; a plain buffer memory ignores it, and the method returns a new dict, e.g. `{'history': ...}`*

In [None]:
memory.load_memory_variables({})
# memory.load_memory_variables(dict())

**You can explicitly add things to the memory, if you wish to do so**

In [5]:
memory = ConversationBufferMemory()

In [6]:
memory.save_context({"input": "Hi"}, 
                    {"output": "What's up"})

In [7]:
print(memory.buffer)

Human: Hi
AI: What's up


In [8]:
memory.load_memory_variables({})

{'history': "Human: Hi\nAI: What's up"}

In [9]:
memory.save_context({"input": "Not much, just hanging"}, 
                    {"output": "Cool"})

In [10]:
memory.load_memory_variables({})

{'history': "Human: Hi\nAI: What's up\nHuman: Not much, just hanging\nAI: Cool"}
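To see what `save_context` and `load_memory_variables` do with a buffer memory, here is a simplified plain-Python imitation (a sketch, not LangChain's implementation). It keeps every exchange and formats it with the same `Human:` / `AI:` prefixes and `history` key that appear in the outputs above.

```python
class SimpleBufferMemory:
    """Simplified imitation of a conversation buffer: keeps every exchange."""

    def __init__(self) -> None:
        self.messages: list[tuple[str, str]] = []  # (prefix, text) pairs

    def save_context(self, inputs: dict, outputs: dict) -> None:
        # Store the user turn, then the AI turn, in order.
        self.messages.append(("Human", inputs["input"]))
        self.messages.append(("AI", outputs["output"]))

    @property
    def buffer(self) -> str:
        return "\n".join(f"{prefix}: {text}" for prefix, text in self.messages)

    def load_memory_variables(self, inputs: dict) -> dict:
        # 'inputs' is unused here; a plain buffer returns the whole history.
        return {"history": self.buffer}


sketch_memory = SimpleBufferMemory()
sketch_memory.save_context({"input": "Hi"}, {"output": "What's up"})
sketch_memory.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
print(sketch_memory.load_memory_variables({}))
```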

## ConversationBufferWindowMemory

### *(specify a window for memory, e.g. `k=1` conversational exchange (= 1 user utterance + 1 LLM response))*

In [11]:
from langchain.memory import ConversationBufferWindowMemory

In [12]:
memory = ConversationBufferWindowMemory(k=1)               

**Save two conversational exchanges to the memory**

In [14]:
memory.save_context({"input": "Hi"},
                    {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})


**With `k=1`, only the most recent exchange is kept**

In [15]:
memory.load_memory_variables({})

{'history': 'Human: Not much, just hanging\nAI: Cool'}
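The windowing behaviour can be imitated with `collections.deque(maxlen=k)`, which silently drops the oldest exchange once `k` exchanges are stored. This is a plain-Python sketch, not LangChain's implementation.

```python
from collections import deque


class SimpleWindowMemory:
    """Keeps only the last k exchanges (one exchange = user message + AI reply)."""

    def __init__(self, k: int) -> None:
        self.exchanges: deque = deque(maxlen=k)  # oldest entry is dropped automatically

    def save_context(self, inputs: dict, outputs: dict) -> None:
        self.exchanges.append((inputs["input"], outputs["output"]))

    def load_memory_variables(self, inputs: dict) -> dict:
        lines = []
        for user_msg, ai_msg in self.exchanges:
            lines.append(f"Human: {user_msg}")
            lines.append(f"AI: {ai_msg}")
        return {"history": "\n".join(lines)}


window = SimpleWindowMemory(k=1)
window.save_context({"input": "Hi"}, {"output": "What's up"})
window.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
print(window.load_memory_variables({}))
```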

**Try the same conversation from before again, but this time with `k=1` window**

In [None]:
llm = ChatOpenAI(temperature=0.0, model=llm_model)
memory = ConversationBufferWindowMemory(k=1)
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

# verbose=True prints the full prompt, so you can see that only the last exchange is included as history

In [None]:
conversation.predict(input="Hi, my name is Max")

In [None]:
conversation.predict(input="What is 1+1?")

**This time, it won't remember your name anymore**

In [None]:
conversation.predict(input="What is my name?")

## ConversationTokenBufferMemory

### *(limit the number of tokens in memory `max_token_limit=x`)*

In [None]:
# !pip install tiktoken  # used to count tokens for OpenAI models

In [None]:
from langchain.memory import ConversationTokenBufferMemory
llm = ChatOpenAI(temperature=0.0, model=llm_model)

In [None]:
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=25)
memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"}, 
                    {"output": "Charming!"})

In [None]:
memory.load_memory_variables({})
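`ConversationTokenBufferMemory` counts tokens with the LLM and drops the oldest messages until the buffer fits within `max_token_limit`. The sketch below imitates that pruning rule in plain Python. Counting whitespace-separated words instead of real tokens is a simplifying assumption, so the cut-off point will differ from what the real memory keeps. Because this sketch prunes individual messages rather than whole exchanges, the remaining history can begin with an `AI:` line.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


class SimpleTokenBufferMemory:
    """Drops the oldest messages until the total 'token' count fits the limit."""

    def __init__(self, max_token_limit: int) -> None:
        self.max_token_limit = max_token_limit
        self.messages: list[tuple[str, str]] = []  # (prefix, text)

    def _total_tokens(self) -> int:
        return sum(count_tokens(f"{prefix}: {text}") for prefix, text in self.messages)

    def save_context(self, inputs: dict, outputs: dict) -> None:
        self.messages.append(("Human", inputs["input"]))
        self.messages.append(("AI", outputs["output"]))
        # Prune from the front (oldest first) while over the limit.
        while self.messages and self._total_tokens() > self.max_token_limit:
            self.messages.pop(0)

    def load_memory_variables(self, inputs: dict) -> dict:
        return {"history": "\n".join(f"{p}: {t}" for p, t in self.messages)}


token_memory = SimpleTokenBufferMemory(max_token_limit=10)
token_memory.save_context({"input": "AI is what?!"}, {"output": "Amazing!"})
token_memory.save_context({"input": "Backpropagation is what?"}, {"output": "Beautiful!"})
token_memory.save_context({"input": "Chatbots are what?"}, {"output": "Charming!"})
print(token_memory.load_memory_variables({}))
```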

## ConversationSummaryMemory
### *(let the LLM write a summary of the current conversation and use this as memory)*

In [18]:
from langchain.memory import ConversationSummaryBufferMemory

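**Example with a long string (standing in for a long conversation)**
- **Specify `llm=llm`: the LLM writes the summary, and it is also used to count tokens, since different LLMs tokenize text differently**
- **Specify `max_token_limit=x`: the most recent messages are kept verbatim up to this limit, and older ones are folded into the summary**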

In [None]:
schedule = "There is a meeting at 8am with your product team. \
You will need your PowerPoint presentation prepared. \
9am-12pm have time to work on your LangChain \
project which will go quickly because LangChain is such a powerful tool. \
At noon, lunch at the Italian restaurant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."

memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"},
                    {"output": schedule})

In [None]:
memory.load_memory_variables({})
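`ConversationSummaryBufferMemory` combines both ideas: recent messages stay verbatim, and once the buffer exceeds `max_token_limit` the oldest messages are moved out and folded into a running summary written by the LLM. The sketch below imitates that bookkeeping in plain Python. `summarize` is a hypothetical placeholder that only concatenates text (a real setup calls the LLM), words stand in for tokens, and the summary is prepended as a `System:` line to mirror the system message the real class uses.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def summarize(previous_summary: str, messages: list[str]) -> str:
    """Hypothetical placeholder for an LLM summarization call: just concatenates."""
    parts = [previous_summary] if previous_summary else []
    parts.extend(messages)
    return " | ".join(parts)


class SimpleSummaryBufferMemory:
    def __init__(self, max_token_limit: int) -> None:
        self.max_token_limit = max_token_limit
        self.messages: list[str] = []  # recent messages, kept verbatim
        self.summary = ""              # running summary of pruned messages

    def save_context(self, inputs: dict, outputs: dict) -> None:
        self.messages.append(f"Human: {inputs['input']}")
        self.messages.append(f"AI: {outputs['output']}")
        pruned = []
        while self.messages and sum(map(count_tokens, self.messages)) > self.max_token_limit:
            pruned.append(self.messages.pop(0))
        if pruned:
            # Fold the removed messages into the summary instead of discarding them.
            self.summary = summarize(self.summary, pruned)

    def load_memory_variables(self, inputs: dict) -> dict:
        lines = ([f"System: {self.summary}"] if self.summary else []) + self.messages
        return {"history": "\n".join(lines)}


summary_memory = SimpleSummaryBufferMemory(max_token_limit=8)
summary_memory.save_context({"input": "Hello"}, {"output": "What's up"})
summary_memory.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
print(summary_memory.load_memory_variables({}))
```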

In [None]:
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

In [None]:
conversation.predict(input="What would be a good demo to show?")

In [None]:
memory.load_memory_variables({})

### Source: https://learn.deeplearning.ai/langchain/lesson/2/models,-prompts-and-parsers