# Lesson 5.3: Advanced Memory Types

---

In Lesson 5.2, we learned about **ConversationBufferMemory** and **ConversationBufferWindowMemory**, two basic Memory types for maintaining context. However, for very long conversations, storing the entire history or a fixed window of messages might be inefficient in terms of cost or exceed token limits. This lesson will introduce more advanced Memory types designed to address these challenges.

## 1. ConversationSummaryMemory

### 1.1. Concept and How it Works

**ConversationSummaryMemory** addresses the problem of long conversations by not passing the raw messages to the model. Instead, it uses a **Large Language Model (LLM)** to maintain a **running summary of the conversation**, which it updates after each turn. Only this summary is passed into the LLM's prompt in subsequent turns.

* **Mechanism:**
    1.  After each conversation turn, an LLM is called with the previous summary and the newest exchange to produce an updated summary of the conversation up to that point.
    2.  This summary (instead of the raw messages) is what the memory exposes to the chain.
    3.  In subsequent turns, only the summary is passed into the prompt.
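
The mechanism above can be sketched in plain Python without any API calls. This is a minimal illustration, not LangChain's implementation: `RunningSummaryMemory` and `toy_summarizer` are hypothetical names, and the "summarizer" simply keeps the last 30 words where a real setup would call an LLM.

```python
from typing import Callable


def toy_summarizer(previous_summary: str, new_lines: str) -> str:
    """Hypothetical stand-in for the LLM call: merge and keep the last 30 words."""
    words = f"{previous_summary} {new_lines}".split()
    return " ".join(words[-30:])


class RunningSummaryMemory:
    """Keeps a single summary string that is refreshed after every turn."""

    def __init__(self, summarize: Callable[[str, str], str]) -> None:
        self.summarize = summarize
        self.summary = ""

    def save_turn(self, user_text: str, ai_text: str) -> None:
        new_lines = f"Human: {user_text} AI: {ai_text}"
        # Fold the newest exchange into the existing summary.
        self.summary = self.summarize(self.summary, new_lines)

    def load_context(self) -> str:
        # Only the summary is exposed to the prompt, never the raw messages.
        return self.summary


memory = RunningSummaryMemory(toy_summarizer)
memory.save_turn("Hello, I'm An.", "Hi An! How can I help?")
memory.save_turn("Explain machine learning.", "It is learning patterns from data.")
print(memory.load_context())
```

Swapping `toy_summarizer` for a function that prompts an LLM with the previous summary and the new lines gives the behavior described above, at the price of one extra model call per turn.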



### 1.2. Pros and Cons

* **Pros:**
    * **Long-term Context Retention:** Capable of maintaining context for very long conversations while staying within token limits, as only the summary is passed to the LLM.
    * **Reduced Token Cost:** In long conversations, the number of tokens passed to the LLM in each turn is often significantly lower than when passing the entire history.
    * **Effective for Complex Conversations:** The LLM can focus on the main points of the conversation without being distracted by minor details.

* **Cons:**
    * **Additional Cost and Latency:** Every turn triggers an extra LLM call to update the summary, which adds API cost and latency.
    * **Summary Quality Depends on LLM:** If the LLM's summarization is poor, important context might be lost or misinterpreted.
    * **Can Lose Specific Details:** The nature of summarization is to abstract away details, so if you need to query very specific details from older conversation turns, `ConversationSummaryMemory` might not be the best choice.

### 1.3. Practical Example Using ConversationSummaryMemory

We will practice with `ConversationSummaryMemory` to see how it maintains context through summarization.

In [None]:
# Install libraries if not already installed
# pip install langchain langchain-openai openai

import os
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Set environment variable for OpenAI API key
# os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

# 1. Initialize LLM (used for both ConversationChain and summarization)
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7)

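# 2. Initialize ConversationSummaryMemory
# Pass the LLM so it can perform summarization.
# return_messages=True makes the memory return the summary as a list of messages,
# which MessagesPlaceholder requires (the default is a plain string).
summary_memory = ConversationSummaryMemory(llm=llm, return_messages=True)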

# 3. Define Prompt for ConversationChain
prompt_summary = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer user questions. Maintain conversation context."),
    MessagesPlaceholder(variable_name="history"), # Placeholder for conversation history (will contain summary)
    ("human", "{input}"),
])

# 4. Initialize ConversationChain with Summary Memory
conversation_summary = ConversationChain(
    llm=llm,
    memory=summary_memory,
    prompt=prompt_summary,
    verbose=True # Print the formatted prompt each turn, including the current summary
)

print("--- Practical Example with ConversationSummaryMemory ---")

# Conversation turn 1
user_input_s1 = "Hello, I'm An. I'm learning about artificial intelligence."
print(f"\nYou: {user_input_s1}")
response_s1 = conversation_summary.invoke({"input": user_input_s1})
print(f"AI: {response_s1['response']}")

# Conversation turn 2
user_input_s2 = "Can you explain machine learning?"
print(f"\nYou: {user_input_s2}")
response_s2 = conversation_summary.invoke({"input": user_input_s2})
print(f"AI: {response_s2['response']}")

# Conversation turn 3
user_input_s3 = "So, how is deep learning different from machine learning?"
print(f"\nYou: {user_input_s3}")
response_s3 = conversation_summary.invoke({"input": user_input_s3})
print(f"AI: {response_s3['response']}")

# Conversation turn 4 (the memory has updated its summary after each earlier turn)
user_input_s4 = "Summarize our conversation so far."
print(f"\nYou: {user_input_s4}")
response_s4 = conversation_summary.invoke({"input": user_input_s4})
print(f"AI: {response_s4['response']}")


print("\n--- Summary Content in Memory ---")
# You can check the stored summary content
print(summary_memory.buffer)

**Explanation:**
* With `verbose=True`, the prompt printed for each turn contains the current summary in place of the raw messages. The memory calls the LLM after every turn to fold the latest exchange into that summary.
* When you ask to "Summarize our conversation so far," the LLM answers from the summary held in memory, since the raw messages are not passed to it.

## 2. ConversationSummaryBufferMemory (Revisiting and Clarifying)

Although briefly introduced in Lesson 5.1, **ConversationSummaryBufferMemory** is a very important advanced Memory type that combines the advantages of both **ConversationBufferWindowMemory** and **ConversationSummaryMemory**.

### 2.1. Concept and How it Works

**ConversationSummaryBufferMemory** keeps the most recent messages in raw form in a buffer (like `ConversationBufferWindowMemory`, but bounded by token count rather than by number of messages) and folds older messages into a running summary when the buffer's token count exceeds a threshold (like `ConversationSummaryMemory`).

* **`max_token_limit`**: The primary parameter, defining the maximum number of tokens of raw messages the memory keeps in its buffer. When this threshold is exceeded, the oldest messages are removed from the buffer and merged into the summary.
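
The pruning behavior can be illustrated with a small standalone sketch. It rests on stated assumptions: `SummaryBufferSketch`, `count_tokens`, and `toy_summarizer` are hypothetical names, tokens are approximated as whitespace-separated words, and the "summary" is a plain concatenation where a real memory would ask an LLM to condense the dropped messages.

```python
def count_tokens(text: str) -> int:
    # Rough assumption: one token per whitespace-separated word.
    return len(text.split())


def toy_summarizer(previous_summary: str, dropped: list[str]) -> str:
    # Hypothetical stand-in for the LLM summarization call.
    return (previous_summary + " " + " | ".join(dropped)).strip()


class SummaryBufferSketch:
    def __init__(self, max_token_limit: int) -> None:
        self.max_token_limit = max_token_limit
        self.summary = ""            # condensed older context
        self.buffer: list[str] = []  # recent raw messages

    def _buffer_tokens(self) -> int:
        return sum(count_tokens(m) for m in self.buffer)

    def save_message(self, message: str) -> None:
        self.buffer.append(message)
        dropped: list[str] = []
        # Pop the oldest messages until the raw buffer fits the limit again.
        while self.buffer and self._buffer_tokens() > self.max_token_limit:
            dropped.append(self.buffer.pop(0))
        if dropped:
            self.summary = toy_summarizer(self.summary, dropped)


memory = SummaryBufferSketch(max_token_limit=15)
for msg in [
    "Human: Hello, I'm Minh.",
    "AI: Nice to meet you, Minh.",
    "Human: Name some wild animals in Vietnam.",
    "AI: Indochinese tiger, saola, Asian elephant.",
]:
    memory.save_message(msg)

print("summary:", memory.summary)
print("buffer:", memory.buffer)
```

With this limit, the two opening messages end up in the summary while the two latest messages stay in the buffer verbatim, which is the split between long-term and short-term context that this memory type aims for.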



### 2.2. Pros and Cons

* **Pros:**
    * **Optimal Balance:** Provides detailed context for recent conversation turns and general context for the rest of the conversation.
    * **Efficient Token Control:** Very good for managing LLM token limits while still retaining long-term context.
    * **Good User Experience:** Maintains conversation fluidity without losing too much information.

* **Cons:**
    * **More Complex:** Requires an LLM to perform summarization, which can be more costly and slightly slower than simple buffer memory types.
    * **Summary Quality:** Depends on the quality of the LLM used for summarization.

### 2.3. Practical Example Using ConversationSummaryBufferMemory

In [None]:
# Install libraries if not already installed
# pip install langchain langchain-openai openai

import os
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryBufferMemory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Set environment variable for OpenAI API key
# os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

llm_buffer_summary = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7)

# Initialize ConversationSummaryBufferMemory with max_token_limit
# It will summarize the oldest messages when the buffer's token count exceeds the limit
summary_buffer_memory = ConversationSummaryBufferMemory(
    llm=llm_buffer_summary,
    max_token_limit=150, # Token limit for the buffer
    return_messages=True # Return a list of messages, as MessagesPlaceholder requires
)

prompt_buffer_summary = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer user questions. Maintain conversation context."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

conversation_buffer_summary = ConversationChain(
    llm=llm_buffer_summary,
    memory=summary_buffer_memory,
    prompt=prompt_buffer_summary,
    verbose=True
)

print("\n--- Practical Example with ConversationSummaryBufferMemory ---")

# Conversation turn 1
user_input_bs1 = "Hello, I'm Minh. I'm learning about wild animals."
print(f"\nYou: {user_input_bs1}")
response_bs1 = conversation_buffer_summary.invoke({"input": user_input_bs1})
print(f"AI: {response_bs1['response']}")

# Conversation turn 2
user_input_bs2 = "Can you name some wild animals in Vietnam?"
print(f"\nYou: {user_input_bs2}")
response_bs2 = conversation_buffer_summary.invoke({"input": user_input_bs2})
print(f"AI: {response_bs2['response']}")

# Conversation turn 3 (might trigger summarization if max_token_limit is exceeded)
user_input_bs3 = "What are the prominent characteristics of the Indochinese tiger?"
print(f"\nYou: {user_input_bs3}")
response_bs3 = conversation_buffer_summary.invoke({"input": user_input_bs3})
print(f"AI: {response_bs3['response']}")

# Conversation turn 4 (check if memory remembers the name)
user_input_bs4 = "What's my name?"
print(f"\nYou: {user_input_bs4}")
response_bs4 = conversation_buffer_summary.invoke({"input": user_input_bs4})
print(f"AI: {response_bs4['response']}")

print("\n--- Content in ConversationSummaryBufferMemory ---")
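print("Summary of older messages:", summary_buffer_memory.moving_summary_buffer)
print("Recent raw messages:", summary_buffer_memory.chat_memory.messages)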

**Explanation:**
* You will see that `ConversationSummaryBufferMemory` will try to keep recent messages in raw form and summarize older messages when the total token count exceeds `max_token_limit`.
* This helps maintain both short-term details and long-term context effectively.

## 3. ConversationKGMemory (Optional)

### 3.1. Concept and How it Works

**ConversationKGMemory** (Knowledge Graph Memory) is a more advanced Memory type that not only stores messages but also builds a **Knowledge Graph (KG)** from the conversation. This KG contains entities and relationships extracted from conversation turns.

* **Mechanism:**
    1.  An LLM is used to extract entities and relationships from each new conversation turn.
    2.  These entities and relationships are added to an internal knowledge graph.
    3.  On each new turn, the memory identifies the entities mentioned in the current input, looks up what the graph knows about them, and passes those facts into the prompt.
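
To make the store-and-look-up idea concrete, here is a minimal sketch that uses a plain dictionary as the graph. It is an illustration under stated assumptions, not how `ConversationKGMemory` is implemented: `TripleStoreSketch` is a hypothetical class, the triples are written by hand where an LLM would extract them, and entity detection is a simple word match.

```python
import re
from collections import defaultdict


class TripleStoreSketch:
    """Stores (subject, relation, object) facts and looks them up by subject."""

    def __init__(self) -> None:
        # subject -> list of (relation, object) pairs
        self.facts: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_triple(self, subject: str, relation: str, obj: str) -> None:
        self.facts[subject].append((relation, obj))

    def knowledge_about(self, text: str) -> list[str]:
        # Naive entity detection: a known subject that appears as a whole word.
        words = set(re.findall(r"\w+", text.lower()))
        found: list[str] = []
        for subject, pairs in self.facts.items():
            if subject.lower() in words:
                found.extend(f"{subject} {relation} {obj}" for relation, obj in pairs)
        return found


kg = TripleStoreSketch()
# In a real setup, an LLM would extract these triples from each turn.
kg.add_triple("An", "lives in", "Hanoi")
kg.add_triple("Binh", "is a friend of", "An")
kg.add_triple("Binh", "works at", "Google")

print(kg.knowledge_about("Where does Binh work?"))
# ['Binh is a friend of An', 'Binh works at Google']
```

Only facts about entities named in the current question are returned, so the prompt receives a handful of targeted statements instead of the whole conversation.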



### 3.2. Pros and Cons

* **Pros:**
    * **Specific Information Query:** Very effective when you need to query specific information or relationships from the conversation history (e.g., "Who is An's friend?", "When was Company X founded?").
    * **Effective for Complex Context:** Can capture complex relationships between concepts in the conversation.
    * **Reduces Hallucinations:** By relying on extracted facts, it can reduce the LLM's tendency to fabricate information.

* **Cons:**
    * **More Complex:** Requires an LLM for KG extraction, incurring additional cost and latency.
    * **Requires Powerful LLM:** The quality of the KG depends on the LLM's extraction capabilities.
    * **Not Always Necessary:** For simple conversations, the benefits might not outweigh the cost and complexity.

### 3.3. Practical Example Using ConversationKGMemory (Conceptual)

Setting up `ConversationKGMemory` is more complex and might require additional dependencies (like `networkx` for graph representation). Here's a conceptual example:

In [None]:
# Install libraries if not already installed (might need networkx)
# pip install langchain langchain-community langchain-openai openai networkx

# import os
# from langchain_openai import ChatOpenAI
# from langchain.chains import ConversationChain
# from langchain.memory import ConversationKGMemory
# from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# # Set environment variable for OpenAI API key
# # os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

# llm_kg = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0) # Often use low temperature for extraction

# # Initialize ConversationKGMemory
# # It will use the LLM to extract entities and relationships
# kg_memory = ConversationKGMemory(llm=llm_kg)

# # Define Prompt for ConversationChain
# # The memory supplies facts about entities in the current input under its default key, "history".
# # ConversationChain only accepts prompt variables that the memory provides, plus "input".
# prompt_kg = ChatPromptTemplate.from_messages([
#     ("system", "You are a helpful assistant. Answer user questions. Based on known facts: {history}"),
#     ("human", "{input}"),
# ])

# conversation_kg = ConversationChain(
#     llm=llm_kg,
#     memory=kg_memory,
#     prompt=prompt_kg,
#     verbose=True
# )

# print("\n--- Practical Example with ConversationKGMemory (Conceptual) ---")

# # Conversation turn 1
# # user_input_kg1 = "My name is An. I live in Hanoi."
# # print(f"\nUser: {user_input_kg1}")
# # response_kg1 = conversation_kg.invoke({"input": user_input_kg1})
# # print(f"AI: {response_kg1['response']}")

# # Conversation turn 2
# # user_input_kg2 = "My friend is Binh. He works at Google."
# # print(f"\nUser: {user_input_kg2}")
# # response_kg2 = conversation_kg.invoke({"input": user_input_kg2})
# # print(f"AI: {response_kg2['response']}")

# # Conversation turn 3 (Ask about a relationship)
# # user_input_kg3 = "Where does Binh work?"
# # print(f"\nUser: {user_input_kg3}")
# # response_kg3 = conversation_kg.invoke({"input": user_input_kg3})
# # print(f"AI: {response_kg3['response']}")

# print("\n--- Content in ConversationKGMemory (Triplets) ---")
# # print(kg_memory.kg.get_triples()) # View the triples stored in the knowledge graph

## 4. VectorStoreRetrieverMemory (Optional)

### 4.1. Concept and How it Works

**VectorStoreRetrieverMemory** is well suited to extremely long conversations. Instead of keeping the entire history in the prompt or summarizing it, it stores **each conversation turn (or chunks of it)** in a **Vector Store**. When context is needed, it performs a similarity search within the Vector Store to retrieve the conversation turns most relevant to the current query.

* **Mechanism:**
    1.  Each (input, output) pair of the conversation is converted into an embedding and stored in a Vector Store.
    2.  When a new query arrives, it is also converted into an embedding.
    3.  The Vector Store is queried to find (input, output) pairs in the history that are semantically similar to the current query.
    4.  These relevant conversation turns are then passed into the LLM's prompt.
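
The four steps above can be sketched with the standard library alone. This is a toy under stated assumptions: `VectorMemorySketch`, `embed`, and `cosine` are hypothetical names, and the "embedding" is a bag-of-words count, so it only matches shared words where a real embedding model would capture semantic similarity.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy embedding (assumption): word counts instead of a learned model.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorMemorySketch:
    def __init__(self, k: int = 1) -> None:
        self.k = k
        self.entries: list[tuple[Counter, str]] = []

    def save_turn(self, user_text: str, ai_text: str) -> None:
        # Step 1: embed each (input, output) pair and store it.
        turn = f"input: {user_text}\noutput: {ai_text}"
        self.entries.append((embed(turn), turn))

    def retrieve(self, query: str) -> list[str]:
        # Steps 2-3: embed the query and rank stored turns by similarity.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        # Step 4: the top-k turns are what would be placed in the prompt.
        return [turn for _, turn in ranked[: self.k]]


memory = VectorMemorySketch(k=1)
memory.save_turn("I like science fiction movies.", "Noted: science fiction movies.")
memory.save_turn("I also like history books.", "Noted: history books.")

print(memory.retrieve("Which science fiction movies did I mention?"))
```

The value of `k` plays the same role as the retriever's `search_kwargs={"k": ...}` setting: it caps how many past turns are injected into the prompt, regardless of how long the stored history grows.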



### 4.2. Pros and Cons

* **Pros:**
    * **Handles Extremely Long Conversations:** Nearly unlimited scalability, as it leverages the capabilities of a Vector Store.
    * **Precise Context Retrieval:** Can semantically retrieve the most relevant historical segments, even if they occurred a very long time ago.
    * **Efficient Token Cost Reduction:** Only passes truly relevant historical segments to the LLM.

* **Cons:**
    * **More Complex to Set Up:** Requires a Vector Store (Chroma, FAISS, Pinecone, etc.) and an embedding model.
    * **More Costly:** Incurs Vector Store storage costs and embedding model call costs for each conversation turn.
    * **Latency:** Searching in the Vector Store can add latency to the response process.

### 4.3. Practical Example Using VectorStoreRetrieverMemory (Conceptual)

Setting up `VectorStoreRetrieverMemory` requires a Vector Store and an embedding model. Here's a conceptual example:

In [None]:
# Install libraries if not already installed
# pip install langchain langchain-community langchain-openai openai chromadb

# import os
# import shutil # Needed for the shutil.rmtree cleanup calls below
# from langchain_openai import ChatOpenAI, OpenAIEmbeddings
# from langchain.chains import ConversationChain
# from langchain.memory import VectorStoreRetrieverMemory
# from langchain_community.vectorstores import Chroma
# from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# # Set environment variable for OpenAI API key
# # os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

# llm_vs = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7)
# embeddings_vs = OpenAIEmbeddings(model="text-embedding-ada-002")

# # Initialize a Vector Store (e.g., Chroma)
# # persist_directory_vs = "./chroma_vs_memory_db"
# # if os.path.exists(persist_directory_vs):
# #     shutil.rmtree(persist_directory_vs)
# # vector_store_vs = Chroma(embedding_function=embeddings_vs, persist_directory=persist_directory_vs)

# # # Create Retriever from Vector Store
# # retriever_vs = vector_store_vs.as_retriever(search_kwargs={"k": 2}) # Retrieve 2 most relevant segments

# # Initialize VectorStoreRetrieverMemory
# # vs_memory = VectorStoreRetrieverMemory(retriever=retriever_vs)

# # Define Prompt for ConversationChain
# # prompt_vs = ChatPromptTemplate.from_messages([
# #     ("system", "You are a helpful assistant. Answer user questions. Maintain conversation context. Relevant historical segments: {history}"),
# #     ("human", "{input}"),
# # ])

# # conversation_vs = ConversationChain(
# #     llm=llm_vs,
# #     memory=vs_memory,
# #     prompt=prompt_vs,
# #     verbose=True
# # )

# print("\n--- Practical Example with VectorStoreRetrieverMemory (Conceptual) ---")

# # Conversation turn 1
# # user_input_vs1 = "I like science fiction movies. Can you recommend any?"
# # print(f"\nUser: {user_input_vs1}")
# # response_vs1 = conversation_vs.invoke({"input": user_input_vs1})
# # print(f"AI: {response_vs1['response']}")

# # Conversation turn 2
# # user_input_vs2 = "I also like history books. Any good ones?"
# # print(f"\nUser: {user_input_vs2}")
# # response_vs2 = conversation_vs.invoke({"input": user_input_vs2})
# # print(f"AI: {response_vs2['response']}")

# # Conversation turn 3 (Ask about the first preference again)
# # user_input_vs3 = "Do you remember what movie genre I like?"
# # print(f"\nUser: {user_input_vs3}")
# # response_vs3 = conversation_vs.invoke({"input": user_input_vs3})
# # print(f"AI: {response_vs3['response']}")

# # if os.path.exists(persist_directory_vs):
# #     shutil.rmtree(persist_directory_vs)


---

## Lesson Summary

This lesson expanded your knowledge of **Memory** in LangChain by introducing advanced types. You learned about **ConversationSummaryMemory**, which helps maintain long-term context by summarizing the conversation, and **ConversationSummaryBufferMemory**, an effective combination of window and summarization. Additionally, we introduced the concepts of **ConversationKGMemory** (using a knowledge graph) and **VectorStoreRetrieverMemory** (storing history in a Vector Store), two powerful Memory types for complex and extremely long use cases. Choosing the right Memory type is crucial for balancing context retention, cost control, and performance in your LLM applications.