# Memory

## LangChain Memory Types

Memory in LangChain allows your applications to maintain context across multiple interactions, making conversations more coherent and natural. Let's explore each memory type in detail:

### **ConversationBufferMemory**

This is the simplest memory type. It stores the complete conversation history in a buffer, keeping every message exchanged between the user and the AI.

**How it works:** Every message (both user inputs and AI responses) is stored sequentially in memory. When the AI needs to respond, it has access to the entire conversation from the beginning.
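
The behaviour described above can be sketched in plain Python, without LangChain. The `BufferMemory` class, its method names, and the order number are hypothetical and only illustrate the idea that nothing is ever dropped:

```python
from dataclasses import dataclass, field


@dataclass
class BufferMemory:
    """Hypothetical full-history buffer: nothing is ever dropped."""
    messages: list[tuple[str, str]] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        # Append in arrival order; the buffer only grows.
        self.messages.append((role, text))

    def as_prompt(self) -> str:
        # The whole transcript is replayed to the model on every turn.
        return "\n".join(f"{role}: {text}" for role, text in self.messages)


memory = BufferMemory()
memory.add("user", "My order number is 1234.")
memory.add("ai", "Thanks, I found order 1234.")
memory.add("user", "When will it ship?")
print(memory.as_prompt())
```

Because `as_prompt()` always returns the full transcript, the prompt grows with every turn, which is the limitation discussed next.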

**Best use cases:** Short conversations where you need complete context, such as customer support chats or brief consultation sessions.

**Limitations:** As conversations grow longer, this memory type consumes more tokens and can eventually hit context window limits. It's like keeping every page of notes from a meeting—eventually, you'll run out of space.

**Example scenario:** A customer asking about their order status, then asking follow-up questions about shipping. The AI needs to remember the order number mentioned at the start.

---

### **ConversationBufferWindowMemory**

This memory type maintains only the most recent K interactions (where K is a number you specify), creating a "sliding window" of conversation history.

**How it works:** Instead of keeping everything, it only remembers the last K exchanges. For example, if you set K to 5, it keeps the 5 most recent exchanges (each one a user message plus the AI reply) and forgets anything older.
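
As a rough illustration of the sliding window, here is a plain-Python sketch built on `collections.deque`. The `WindowMemory` class is hypothetical and is not part of LangChain:

```python
from collections import deque


class WindowMemory:
    """Hypothetical sliding window over the last k exchanges."""

    def __init__(self, k: int) -> None:
        # One exchange = one user message + one AI reply, hence 2 * k slots.
        self.messages: deque[tuple[str, str]] = deque(maxlen=2 * k)

    def add_exchange(self, user_text: str, ai_text: str) -> None:
        # deque silently evicts the oldest entries once maxlen is reached.
        self.messages.append(("user", user_text))
        self.messages.append(("ai", ai_text))


memory = WindowMemory(k=2)
for i in range(1, 5):
    memory.add_exchange(f"question {i}", f"answer {i}")

print([text for _, text in memory.messages])
# ['question 3', 'answer 3', 'question 4', 'answer 4']
```

Using `maxlen` keeps eviction automatic, so no separate trimming step is needed in this sketch.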

**Best use cases:** Longer conversations where only recent context matters, like technical troubleshooting or step-by-step tutorials where earlier steps become irrelevant.

**Advantages:** Prevents token count from growing indefinitely while maintaining relevant recent context. It's like taking notes but only keeping the last few pages.

**Trade-off:** You lose access to earlier conversation details, which might be important in some scenarios.

**Example scenario:** A coding assistant helping debug an issue across multiple files. You care about the current file being discussed, not the one from 20 messages ago.

---

### **ConversationTokenBufferMemory**

Rather than tracking a fixed number of messages, this memory type limits conversation history based on token count—the actual "size" of the text.

**How it works:** You set a maximum token limit (e.g., 2000 tokens). The memory keeps as many recent messages as possible without exceeding this limit. Since different messages have different lengths, this provides more flexible memory management.
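
A minimal sketch of budget-based trimming follows. It assumes a stand-in tokenizer that counts whitespace-separated words; a real application would count tokens with the model's own tokenizer. The function names `count_tokens` and `trim_to_budget` are hypothetical:

```python
def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): one token per whitespace-separated word.
    # A real application would use the model's tokenizer instead.
    return len(text.split())


def trim_to_budget(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the newest messages whose combined token count fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that does not fit.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "I run a small boutique that sells cotton kurtis",  # 9 words
    "yes",                                               # 1 word
    "What discount should I offer this weekend",         # 7 words
]
print(trim_to_budget(history, max_tokens=8))
# ['yes', 'What discount should I offer this weekend']
```

With a budget of 8, the two newest messages fit exactly (7 + 1), while the 9-word opening message is dropped.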

**Best use cases:** Applications where you need to precisely control API costs or stay within specific model context limits.

**Advantages:** More efficient token usage compared to fixed message windows. A short "yes" uses fewer tokens than a paragraph-long explanation, so you can fit more short messages or fewer long ones.

**Why it matters:** Different AI models have different context windows and pricing based on tokens. This memory type gives you fine-grained control over resource usage.

**Example scenario:** A chatbot that needs to work within a strict budget constraint or support models with smaller context windows.

---

### **VectorStore Memory**

This memory type retrieves relevant past conversations by semantic similarity rather than recency, which makes it the most involved of the built-in options to set up.

**How it works:** Every conversation is converted into embeddings (numerical representations of meaning) and stored in a vector database. When a new query comes in, the system searches for semantically similar past conversations and retrieves the most relevant ones, regardless of when they occurred.
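
The retrieval step can be sketched without any external services. The example below is a toy: `embed` is a bag-of-words counter that only captures word overlap, not meaning, and `VectorMemory` is a hypothetical in-process stand-in for a vector database. Real systems would use a learned embedding model and a proper vector store:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy embedding (assumption): a bag-of-words count vector.
    # Real systems use a learned embedding model instead.
    return Counter(text.lower().replace(".", "").replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorMemory:
    """Hypothetical in-process stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def save(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def recall(self, query: str, k: int = 1) -> list[str]:
        # Rank stored texts by similarity to the query, ignoring insertion order.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


memory = VectorMemory()
memory.save("My daughter's birthday is next month.")
memory.save("I prefer window seats on flights.")
memory.save("My car needs a service soon.")
print(memory.recall("What gift should I buy for my daughter's birthday?"))
# ["My daughter's birthday is next month."]
```

The birthday note is returned even though it was saved first, because ranking depends on similarity to the query and not on when the text was stored.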

**Best use cases:** Long-term memory scenarios where you might need to recall information from days or weeks ago, such as personal assistants, medical history tracking, or customer relationship management.

**Advantages:** Can handle virtually unlimited conversation history because it doesn't load everything into context—only what's relevant. It's like having a searchable archive rather than trying to remember everything at once.

**Technical note:** Requires a vector store (such as Pinecone, Weaviate, Chroma, or FAISS) and an embedding model to function.

**Example scenario:** A personal AI assistant remembering that you mentioned your daughter's birthday three weeks ago when you now ask for gift recommendations.

---

### **Custom Memory**

LangChain allows you to build your own memory implementations tailored to specific needs that aren't covered by the built-in types.

**When to use:** When you have unique requirements like:
- Combining multiple memory types (e.g., recent buffer + long-term vector store)
- Implementing business logic for what should be remembered or forgotten
- Integrating with existing databases or storage systems
- Creating memory that respects data privacy rules or retention policies

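**Implementation approach:** You inherit from LangChain's base memory classes and override the methods that define how data is saved, loaded, and retrieved. For simpler needs, plain functions that collect the data you care about and inject it into the prompt can be enough.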

**Example scenarios:** 
- A healthcare chatbot that must forget certain sensitive information after 24 hours for compliance
- An e-commerce assistant that prioritizes remembering purchase history over casual conversation
- A multi-user system where each user has isolated memory
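
The first scenario above (forgetting data after 24 hours) can be sketched as a small retention policy. This is plain Python rather than a subclass of LangChain's base classes, and the `ExpiringMemory` class, its `save`/`load` methods, and the sample notes are all hypothetical. The clock is injected so the example runs instantly:

```python
import time
from typing import Callable


class ExpiringMemory:
    """Hypothetical custom memory that forgets entries older than ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.time) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the policy can be tested without waiting
        self._entries: list[tuple[float, str]] = []

    def save(self, text: str) -> None:
        self._entries.append((self.clock(), text))

    def load(self) -> list[str]:
        # Drop expired entries on every read, then return what is left.
        cutoff = self.clock() - self.ttl_seconds
        self._entries = [(ts, text) for ts, text in self._entries if ts >= cutoff]
        return [text for _, text in self._entries]


# Simulated clock (seconds) so the example does not have to wait a day.
now = [0.0]
memory = ExpiringMemory(ttl_seconds=24 * 3600, clock=lambda: now[0])

memory.save("Patient reported a mild headache.")
now[0] = 23 * 3600
memory.save("Patient asked about clinic hours.")
now[0] = 25 * 3600
print(memory.load())
# ['Patient asked about clinic hours.']
```

Purging on read keeps the sketch short; a production system would likely also delete expired data from the underlying storage on a schedule.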

---

## **Where Memory is Required**

### **Chatbots**

Chatbots absolutely need memory to create natural, human-like conversations. Without memory, every message would be treated as the first message, creating a frustrating and disjointed experience.

**Why it's critical:** Users expect chatbots to remember context within a conversation. If a user says "My name is Sarah" and then asks a question, they expect the bot to still know their name without being told again.

**Real-world impact:** Memory transforms a chatbot from a simple Q&A system into a conversational partner. It enables the bot to:
- Maintain topic continuity
- Reference earlier points in the discussion
- Build upon previous answers
- Understand pronouns and references ("it," "that one," "like I mentioned")

**Example:** A banking chatbot where the user asks about their checking account balance, then asks "What about savings?"—the bot needs to remember we're discussing the user's accounts.

---

### **Assistants**

Personal and professional assistants (like virtual secretaries, scheduling assistants, or research helpers) require memory to provide personalized, context-aware assistance over time.

**Why it's essential:** Assistants need to learn user preferences, remember past requests, and maintain awareness of ongoing projects or tasks.

**Memory requirements:**
- **Short-term:** Remember the current task or conversation goal
- **Long-term:** Recall user preferences, past decisions, and historical interactions

**Value proposition:** Memory enables assistants to become more helpful over time, reducing the need for users to repeatedly provide the same information.

**Example:** A research assistant that remembers you're writing a paper on climate change, so when you ask "Find more sources," it knows what topic you mean and can even reference sources it already found for you.

---

### **Agents with Long Workflows**

AI agents that perform multi-step tasks or workflows require memory to track progress, maintain state, and coordinate complex operations.

**Why it's necessary:** Long workflows often involve multiple steps where the output of one step becomes the input for the next. Without memory, the agent would lose track of what it's doing.

**Challenges addressed:**
- **State management:** Keeping track of which steps are completed
- **Data passing:** Carrying information from one step to another
- **Error recovery:** Remembering context if something fails and needs to be retried
- **Decision making:** Using results from earlier steps to inform later choices
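
A minimal sketch of how a shared state dictionary can cover state management, data passing, and error recovery is shown below. The `run_workflow` helper, the step functions, and the simulated failure are hypothetical and only illustrate the pattern:

```python
from typing import Callable

Step = Callable[[dict], dict]


def run_workflow(steps: list[tuple[str, Step]], state: dict) -> dict:
    """Run steps in order, skipping any that state["completed"] already lists."""
    state.setdefault("completed", [])
    for name, step in steps:
        if name in state["completed"]:
            continue  # already done in an earlier attempt
        state.update(step(state))        # data passing: merge the step's output
        state["completed"].append(name)  # state management: record progress
    return state


attempts = {"clean": 0}


def load(state: dict) -> dict:
    return {"rows": [3, None, 5]}


def clean(state: dict) -> dict:
    attempts["clean"] += 1
    if attempts["clean"] == 1:
        raise RuntimeError("transient failure")  # simulated error on the first try
    return {"rows": [r for r in state["rows"] if r is not None]}


def analyze(state: dict) -> dict:
    return {"mean": sum(state["rows"]) / len(state["rows"])}


steps = [("load", load), ("clean", clean), ("analyze", analyze)]
state: dict = {}
try:
    run_workflow(steps, state)
except RuntimeError:
    pass  # the state dict still remembers that "load" finished

run_workflow(steps, state)  # the retry resumes at "clean"
print(state["completed"], state["mean"])
# ['load', 'clean', 'analyze'] 4.0
```

Because progress lives in `state` rather than in the function call, the retry does not repeat the `load` step.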

**Example scenarios:**
- A data analysis agent that: (1) loads data, (2) cleans it, (3) performs analysis, (4) generates visualizations, (5) writes a report
- A customer onboarding agent that guides users through multiple forms and verification steps over several sessions
- A code generation agent that needs to remember the project structure, dependencies, and previous code it generated

**Technical consideration:** For very long workflows, you might combine memory types—using buffer memory for immediate task context and vector memory for accessing relevant information from earlier in the workflow.
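
One way to picture this combination is a small window of recent messages backed by a searchable archive. In the sketch below, `HybridMemory` is hypothetical, and simple word overlap stands in for vector similarity:

```python
from collections import deque


class HybridMemory:
    """Hypothetical mix of a recent-message window and a searchable archive."""

    def __init__(self, window: int) -> None:
        # Assumes window >= 1.
        self.recent: deque[str] = deque(maxlen=window)
        self.archive: list[str] = []

    def add(self, text: str) -> None:
        # Anything about to fall out of the window moves to the archive first.
        if len(self.recent) == self.recent.maxlen:
            self.archive.append(self.recent[0])
        self.recent.append(text)

    def context(self, query: str) -> list[str]:
        # Word overlap stands in for vector similarity (assumption).
        words = set(query.lower().split())
        relevant = [t for t in self.archive if words & set(t.lower().split())]
        return relevant + list(self.recent)


memory = HybridMemory(window=2)
for note in ["dataset has 10 columns", "cleaned missing values", "ran regression", "plotted residuals"]:
    memory.add(note)

print(memory.context("which dataset did we load"))
# ['dataset has 10 columns', 'ran regression', 'plotted residuals']
```

The two most recent notes are always included, and the older dataset note is pulled back in only because the query mentions it.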

---

### **Key Takeaway**

Memory isn't just a nice-to-have feature—it's fundamental to creating AI applications that feel intelligent and contextually aware. The choice of memory type depends on your specific use case: conversation length, token budget, need for long-term recall, and the importance of semantic versus chronological relevance. Understanding these trade-offs helps you build more efficient and effective LLM applications.

## ConversationBufferMemory

In [3]:
import os
from dotenv import load_dotenv

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory

load_dotenv()
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY not found in env!"

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful tutor. Answer clearly."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

chain = prompt | llm

# In-memory store of sessions
_store = {}

def get_session_history(session_id: str):
    if session_id not in _store:
        _store[session_id] = ChatMessageHistory()
    return _store[session_id]

chat = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

# Helper to ask in one line
def ask(text, session_id="s1"):
    return chat.invoke({"input": text}, config={"configurable": {"session_id": session_id}}).content


## ConversationBufferMemory (Full history)

In [5]:
print(ask("My name is Keerthi."))
print(ask("What is my name?"))

Nice to meet you, Keerthi! How can I assist you today?
Your name is Keerthi. How can I help you further?


## ConversationBufferWindowMemory (Last K messages)

Meaning: keep only the last K turns (sliding window).

In [7]:
from langchain_openai import ChatOpenAI
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# 1) LLM
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# 2) Prompt that expects "history"
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

chain = prompt | llm

# 3) Store histories per session
_store_window = {}

# 4) Helper: trim last k turns (k human+ai pairs = 2k messages)
def trim_history(history: InMemoryChatMessageHistory, k_turns: int):
    max_msgs = 2 * k_turns
    if len(history.messages) > max_msgs:
        history.messages = history.messages[-max_msgs:]

def get_window_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in _store_window:
        _store_window[session_id] = InMemoryChatMessageHistory()
    return _store_window[session_id]

# 5) Wrap chain with message history
chat_window = RunnableWithMessageHistory(
    chain,
    get_session_history=get_window_history,
    input_messages_key="input",
    history_messages_key="history"
)

# 6) Ask function with window trimming
def ask_window(text, session_id="w1", k_turns=2):
    result = chat_window.invoke(
        {"input": text},
        config={"configurable": {"session_id": session_id}}
    )
    # trim AFTER adding new messages
    trim_history(get_window_history(session_id), k_turns=k_turns)
    return result.content


# Demo
print(ask_window("Remember: I like cotton kurtis.", "w1", k_turns=2))
print(ask_window("Remember: My city is Bengaluru.", "w1", k_turns=2))
print(ask_window("Remember: My favorite color is maroon.", "w1", k_turns=2))

# Because k_turns=2, only the last 2 turns (city and favorite color) are still in the history when this question is sent
print(ask_window("What do you remember about me?", "w1", k_turns=2))

# Inspect what's stored
print("\nStored messages count:", len(get_window_history("w1").messages))
for m in get_window_history("w1").messages:
    print(type(m).__name__, ":", m.content)


Got it! Cotton kurtis are a great choice for comfort and style. They come in various designs, colors, and patterns, making them versatile for different occasions. Do you have any specific styles or colors in mind that you like?
Got it! You're in Bengaluru. If you're looking for cotton kurtis, there are plenty of great local shops and markets in the city, as well as online options. Would you like recommendations for places to shop or tips on styling your kurtis?
Got it! Your favorite color is maroon. Maroon cotton kurtis can look stunning and are perfect for various occasions. If you need suggestions for styles, patterns, or where to find maroon kurtis in Bengaluru, just let me know!
I remember that you are in Bengaluru and that your favorite color is maroon. If there's anything else you'd like me to remember or if you have any specific questions or topics you'd like to discuss, feel free to let me know!

Stored messages count: 4
HumanMessage : Remember: My favorite color is maroon.
AIM

## ConversationTokenBufferMemory (Keep history until token budget)

Meaning: keep messages until a token limit is hit; then drop older ones.

Here’s a simple token-based truncation using tiktoken:

In [9]:
from langchain_openai import ChatOpenAI
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

import tiktoken

# 1) LLM
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# 2) Prompt expects "history"
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

chain = prompt | llm

# 3) Store per session
_store_token = {}

def get_token_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in _store_token:
        _store_token[session_id] = InMemoryChatMessageHistory()
    return _store_token[session_id]

chat_token = RunnableWithMessageHistory(
    chain,
    get_session_history=get_token_history,
    input_messages_key="input",
    history_messages_key="history"
)

# 4) Token counting + trimming
enc = tiktoken.get_encoding("o200k_base")  # encoding used by the gpt-4o model family (cl100k_base is for GPT-4 / GPT-3.5)

def count_tokens_in_history(history: InMemoryChatMessageHistory) -> int:
    # Very simple token estimate: sum tokens of message content
    return sum(len(enc.encode(m.content)) for m in history.messages)

def trim_to_token_budget(history: InMemoryChatMessageHistory, max_tokens: int):
    # Keep removing oldest messages until within budget
    while history.messages and count_tokens_in_history(history) > max_tokens:
        history.messages.pop(0)

# 5) Ask function
def ask_token(text, session_id="t1", max_tokens=120):
    result = chat_token.invoke(
        {"input": text},
        config={"configurable": {"session_id": session_id}}
    )
    # trim AFTER messages were added
    hist = get_token_history(session_id)
    trim_to_token_budget(hist, max_tokens=max_tokens)
    return result.content


# Demo (small budget so students can see truncation happening)
print(ask_token("My boutique name is Myntra and I sell kurtis.", "t1", max_tokens=120))
print(ask_token("We sell cotton, muslin, and party wear designs.", "t1", max_tokens=120))
print(ask_token("Price range is 1000 to 2000 and weekend discounts.", "t1", max_tokens=120))

print(ask_token("Summarize what you know about my boutique.", "t1", max_tokens=120))

# Inspect stored history
hist = get_token_history("t1")
print("\nStored messages:", len(hist.messages))
print("Token count:", count_tokens_in_history(hist))
for m in hist.messages:
    print(type(m).__name__, ":", m.content)


That's great! "Myntra" is a catchy name for a boutique, especially for selling kurtis. Here are a few ideas to help you promote your boutique and attract customers:

1. **Brand Identity**: Create a unique logo and color scheme that reflects the essence of your boutique. Consider using traditional motifs or modern designs that resonate with your target audience.

2. **Social Media Presence**: Utilize platforms like Instagram and Facebook to showcase your kurtis. Post high-quality images, styling tips, and customer testimonials. Engage with your audience through stories and live sessions.

3. **Website**: If you don’t have one already, consider creating a website where customers can browse your collection, learn about your brand, and make purchases online.

4. **Local Events**: Participate in local fashion shows, fairs, or markets to showcase your kurtis. This can help you reach a wider audience and build a community around your brand.

5. **Collaborations**: Partner with local influence

## VectorStore Memory (Semantic long-term memory)

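Instead of sending the full chat history, this approach stores messages as embeddings and retrieves only the most relevant past information. LangChain's `VectorStoreRetrieverMemory` class is one common way to do this; the example below builds the same save-and-recall pattern by hand with a FAISS retriever.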

In [19]:
import os
from dotenv import load_dotenv

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.documents import Document

load_dotenv()
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY not found in env!"

# 1) LLM + Embeddings
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
embeddings = OpenAIEmbeddings()

# 2) Create a Vector DB (FAISS) to store "memories"
vectorstore = FAISS.from_documents(
    [Document(page_content="(init)")],
    embedding=embeddings
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# 3) Functions: save memory + recall memory
def save_memory(text: str):
    vectorstore.add_documents([Document(page_content=text)])

def recall_memory(query: str) -> str:
    docs = retriever.invoke(query)
    # Skip the "(init)" placeholder, which exists only to create the index
    return "\n".join(d.page_content for d in docs if d.page_content != "(init)")

# 4) Prompt that uses recalled memory
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use the MEMORY section to answer."),
    ("human", "MEMORY:\n{memory}\n\nQUESTION:\n{question}")
])

# 5) Store some "memories"
save_memory("Boutique name is Myntra.")
save_memory("Boutique is in Nagarabhavi, Bengaluru.")
save_memory("Main products: cotton kurtis, sarees, and muslin dress sets.")
save_memory("Customer budget range is 1000 to 2000 INR.")

# 6) Ask question: retrieve relevant memory, then answer with LLM
question = "What do you remember about my boutique and customer preferences?"
memory_text = recall_memory(question)

response = (prompt | llm).invoke({"memory": memory_text, "question": question})
print(response.content)


Your boutique, Myntra, is located in Nagarabhavi, Bengaluru. It specializes in cotton kurtis, sarees, and muslin dress sets. If you have any specific customer preferences or additional details you'd like to share, feel free to let me know!


In [2]:
import sys, numpy as np
print(sys.version)
print(np.__version__)

# pip uninstall -y faiss faiss-cpu
# conda install -c conda-forge faiss-cpu

3.12.7 | packaged by Anaconda, Inc. | (main, Oct  4 2024, 13:17:27) [MSC v.1929 64 bit (AMD64)]
1.26.4


## Custom Memory (you decide what to store + how to inject it)

Example: store only user profile fields (name, city, preference) and inject them into the prompt every time.

In [14]:
import os
from dotenv import load_dotenv

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory

load_dotenv()
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY not found in env!"

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# ---- Custom memory store (structured) ----
profile_store = {}

import re  # used to extract profile fields from free text

def update_profile(session_id: str, user_text: str):
    p = profile_store.get(session_id, {"name": None, "city": None, "preference": None})

    # Match on the full trigger phrase (case-insensitive) so that a stray
    # "is" or "in" earlier in the sentence cannot split the text in the wrong place.
    patterns = {
        "name": r"my name is\s+(.+)",
        "city": r"i live in\s+(.+)",
        "preference": r"i like\s+(.+)",
    }
    for field, pattern in patterns.items():
        match = re.search(pattern, user_text, re.IGNORECASE)
        if match:
            p[field] = match.group(1).strip().strip(".")

    profile_store[session_id] = p

# Chat prompt that injects profile + chat history
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use PROFILE + chat HISTORY."),
    ("system", "PROFILE: {profile}"),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

base_chain = prompt | llm

# store chat history (buffer memory)
history_store = {}
def get_history(session_id: str):
    if session_id not in history_store:
        history_store[session_id] = ChatMessageHistory()
    return history_store[session_id]

chat = RunnableWithMessageHistory(
    base_chain,
    get_history,
    input_messages_key="input",
    history_messages_key="history",
)

def ask(text: str, session_id="s1"):
    update_profile(session_id, text)  # <-- custom memory update
    profile = profile_store.get(session_id, {})
    resp = chat.invoke(
        {"input": text, "profile": profile},
        config={"configurable": {"session_id": session_id}}
    )
    return resp.content

print(ask("My name is Keerthi.", "s1"))
print(ask("I live in Bengaluru.", "s1"))
print(ask("I like cotton kurtis.", "s1"))
print(ask("Suggest 3 items for me based on my profile.", "s1"))

print("\nPROFILE STORE:", profile_store["s1"])


Nice to meet you, Keerthi! How can I assist you today?
Bengaluru is a vibrant city with a lot to offer! Do you have any specific interests or topics you'd like to discuss related to Bengaluru?
Cotton kurtis are a great choice, especially for the warm climate in Bengaluru! Do you have a favorite style or color when it comes to cotton kurtis? Or are you looking for recommendations on where to shop for them?
Based on your preference for cotton kurtis, here are three suggestions you might like:

1. **Floral Printed Cotton Kurti**: A knee-length kurti with vibrant floral prints can be perfect for casual outings or even office wear. Look for one with a comfortable fit and breathable fabric.

2. **Anarkali Style Cotton Kurti**: An Anarkali kurti with a flared bottom can be a stylish choice for festive occasions. Opt for one with intricate embroidery or mirror work for a more traditional look.

3. **Straight Cut Cotton Kurti with Side Slits**: A straight-cut kurti with side slits can be both c