# 🧠 LangChain Memory Types

## 1. Buffer Memory
**Definition:**  
Stores the entire conversation history in memory and passes all previous messages to the model each time a new response is generated.

**Key Points:**
- Keeps **all messages** (inputs and outputs).
- Can consume **a lot of tokens** if the conversation is long.
- Good for **short, context-rich** conversations.

**Example Use Case:**  
Chatbots that must remember everything said during the current session.
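A minimal, framework-free sketch of the buffer idea, assuming messages are plain `(role, text)` pairs. `BufferMemory` is a hypothetical class for illustration, not LangChain's `ConversationBufferMemory`:

```python
class BufferMemory:
    """Hypothetical buffer memory: keeps every message and replays all of them."""

    def __init__(self) -> None:
        self.messages: list[tuple[str, str]] = []  # (role, text) pairs, oldest first

    def save_turn(self, user_text: str, ai_text: str) -> None:
        # Store both sides of the exchange; nothing is ever dropped.
        self.messages.append(("Human", user_text))
        self.messages.append(("AI", ai_text))

    def build_prompt(self, new_input: str) -> str:
        # The entire history is replayed, so the prompt grows with every turn.
        lines = [f"{role}: {text}" for role, text in self.messages]
        lines += [f"Human: {new_input}", "AI:"]
        return "\n".join(lines)


memory = BufferMemory()
# The AI reply below is a made-up placeholder, not real model output.
memory.save_turn("Can you list 3 places in Paris to visit?", "(reply 1)")
print(memory.build_prompt("Which place has a lake view?"))
```

Because `build_prompt` replays every stored message, the prompt (and its token cost) grows linearly with the number of turns.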

---

## 2. Buffer Window Memory
**Definition:**  
Stores **only the last N interactions (or turns)** instead of the entire conversation.

**Key Points:**
- Keeps a **sliding window** of recent messages.
- Reduces token usage compared to Buffer Memory.
- Good for **long conversations** where only recent context matters.

**Example Use Case:**  
Support bots or assistants where only the last few user requests are relevant.
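The sliding window can be sketched the same way, assuming a window of `k` exchanges (`2 * k` messages). `WindowMemory` is hypothetical; a `collections.deque` with `maxlen` performs the eviction:

```python
from collections import deque


class WindowMemory:
    """Hypothetical window memory: keeps only the last k exchanges (2 * k messages)."""

    def __init__(self, k: int) -> None:
        # A deque with maxlen silently discards the oldest item when it is full.
        self.messages: deque = deque(maxlen=2 * k)

    def save_turn(self, user_text: str, ai_text: str) -> None:
        self.messages.append(("Human", user_text))
        self.messages.append(("AI", ai_text))

    def build_prompt(self, new_input: str) -> str:
        lines = [f"{role}: {text}" for role, text in self.messages]
        lines += [f"Human: {new_input}", "AI:"]
        return "\n".join(lines)


memory = WindowMemory(k=1)
memory.save_turn("Can you list 3 places in Paris to visit?", "(reply 1)")
memory.save_turn("Which place has a lake view?", "(reply 2)")
# With k=1 only the second exchange survives; the first question is gone.
print(memory.build_prompt("Can I visit in winter?"))
```

A bounded deque keeps eviction cheap and caps the prompt size no matter how long the conversation runs.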

---

## 3. Summary Memory
**Definition:**  
Instead of storing all messages, it creates and maintains a **summary** of the conversation so far.

**Key Points:**
- Uses a **language model to summarize** previous messages.
- Keeps memory **compact** and **token-efficient**.
- May **lose details**, since summaries can generalize.

**Example Use Case:**  
When maintaining long-term context with a limited token budget.
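A rough sketch of the summary approach, assuming a pluggable summarizer. `SummaryMemory` and `toy_summarize` are hypothetical; `toy_summarize` stands in for the LLM call and deliberately keeps only the user's questions, which also shows how details (here, the AI replies) can be lost:

```python
from typing import Callable

# Summarizer signature: (existing summary, newest exchange) -> updated summary.
Summarizer = Callable[[str, str], str]


def toy_summarize(existing: str, new_lines: str) -> str:
    """Stand-in for an LLM summarization call: keeps only the user's question."""
    question = new_lines.splitlines()[0].removeprefix("Human: ")
    note = f"user asked: {question}"
    return f"{existing} | {note}" if existing else note


class SummaryMemory:
    """Hypothetical summary memory: stores one running summary instead of messages."""

    def __init__(self, summarize: Summarizer) -> None:
        self.summarize = summarize
        self.summary = ""

    def save_turn(self, user_text: str, ai_text: str) -> None:
        # Fold the newest exchange into the summary (an extra model call in practice).
        self.summary = self.summarize(self.summary, f"Human: {user_text}\nAI: {ai_text}")

    def build_prompt(self, new_input: str) -> str:
        # Only the compact summary is sent, never the full message list.
        return f"Summary: {self.summary}\nHuman: {new_input}\nAI:"


memory = SummaryMemory(toy_summarize)
memory.save_turn("Can you list 3 places in Paris to visit?", "(reply 1)")
memory.save_turn("Which place has a lake view?", "(reply 2)")
print(memory.build_prompt("Can I visit in winter?"))
```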

---

## 4. Entity Memory
**Definition:**  
Tracks and remembers **specific entities** (people, places, things) mentioned in the conversation.

**Key Points:**
- Extracts entities and stores **facts or attributes** about them.
- Helps models remember **who is who** and **what was said** about each entity.
- Useful for **personalized or knowledge-based** chats.

**Example Use Case:**  
Virtual assistants that need to recall user preferences, names, or prior facts.
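A toy sketch of an entity store, assuming a fixed list of known names stands in for the LLM-based entity extraction that LangChain performs. `EntityMemory` and the example sentences are hypothetical:

```python
import re


class EntityMemory:
    """Hypothetical entity memory: maps each entity name to the facts said about it."""

    def __init__(self, known_entities: set[str]) -> None:
        # A fixed name list stands in for LLM-based entity extraction.
        self.known_entities = known_entities
        self.store: dict[str, list[str]] = {}

    def extract(self, text: str) -> list[str]:
        # Whole-word, case-sensitive match against the known names.
        return [name for name in sorted(self.known_entities)
                if re.search(rf"\b{re.escape(name)}\b", text)]

    def remember(self, text: str) -> None:
        # Attach the sentence to every entity it mentions.
        for name in self.extract(text):
            self.store.setdefault(name, []).append(text)

    def context_for(self, new_input: str) -> dict[str, list[str]]:
        # Only facts about entities in the new input go back into the prompt.
        return {name: self.store.get(name, []) for name in self.extract(new_input)}


memory = EntityMemory({"Sara", "Paris"})
memory.remember("Sara is planning a trip to Paris in December.")
memory.remember("Sara prefers museums over shopping.")
print(memory.context_for("What should Sara do on her first day?"))
```

Retrieving facts per entity, rather than replaying the whole transcript, is what keeps this approach focused on "who is who".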


In [None]:
!pip install -U langchain-community
! pip install langchain

In [None]:
!pip install -q --upgrade transformers accelerate
! pip install sentencepiece

In [None]:
! pip install bitsandbytes


In [None]:
import transformers
print(transformers.__version__)


In [None]:
!pip install -U flash-attn --no-build-isolation


In [None]:
# ==============================
# Load environment variables
# ==============================
from dotenv import dotenv_values

# Load variables from app.env
env_vars = dotenv_values("app.env")
print("Loaded ENV variable names:", list(env_vars))  # print names only, never secret values such as the API key

# Extract required variables
openai_api_key = env_vars["OPENAI_API_KEY"]
openai_api_base = env_vars.get("OPENAI_API_BASE")
openai_api_name = env_vars["OPENAI_API_Name"]

In [None]:
# ==============================
# Create LangChain Chat Model
# ==============================
from langchain_openai import ChatOpenAI

chat = ChatOpenAI(
    model=openai_api_name,
    api_key=openai_api_key,
    base_url=openai_api_base,
    temperature=0.7,
    max_tokens=100
)

# Conversation Buffer

In [None]:
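# Without memory: each call is independent, so the follow-up question
# in the next cell ("Which one is the most popular?") cannot see this answer.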
# ==============================

message_1 = "Can you list 3 places to visit in France?"
response_1 = chat.invoke(message_1)
print("AI Response:", response_1.content)
# ==============================

In [None]:
message_2 = "Which one is the most popular?"
response_2 = chat.invoke(message_2)
print("AI Response:", response_2.content)

In [None]:
!pip install -U langchain-core langchain-openai langchain-community python-dotenv

In [None]:
# Assumes `history` (ChatMessageHistory), `conversation` (RunnableWithMessageHistory)
# and `session_id` (e.g. "session_1") are defined by the memory-setup cells below; run those first.

def send_message(message, session_id="session_1"):
    # RunnableWithMessageHistory loads this session's history, calls the model,
    # and appends both the user message and the AI reply to `history` automatically.
    # Adding them again by hand would duplicate every turn.
    response = conversation.invoke(
        {"input": message},
        config={"configurable": {"session_id": session_id}},
    )
    return response.content  # the prompt | chat chain returns an AIMessage

# Messages
message_1 = "Can you list 3 places in Paris to visit?"
message_2 = "Which place has a lake view?"

# Get responses
output_1 = send_message(message_1, session_id)
print("Response 1:", output_1)

output_2 = send_message(message_2, session_id)
print("Response 2:", output_2)

# Print conversation history
print("\nConversation History:")
for msg in history.messages:
    print(f"{msg.type}: {msg.content}")
    print('---------------')

In [None]:
from langchain.memory import ConversationBufferMemory

# Initialize memory
memory = ConversationBufferMemory(return_messages=True)

# Send a message with the full buffered history prepended, then store the exchange
def send_message(message):
    # Combine previous messages from memory into a plain-text transcript
    previous_messages = "\n".join(
        [f"{m.type}: {m.content}" for m in memory.chat_memory.messages]
    )
    prompt = f"{previous_messages}\nHuman: {message}\nAI:"

    # Get the reply text from the ChatOpenAI model created above
    response = chat.invoke(prompt).content

    # Add message and response to memory
    memory.chat_memory.add_user_message(message)
    memory.chat_memory.add_ai_message(response)

    return response

# Messages
message_1 = "Can you list 3 places in Paris to visit?"
message_2 = "Which place has a lake view?"

# Get responses
output_1 = send_message(message_1)
print("Response 1:", output_1)

output_2 = send_message(message_2)
print("Response 2:", output_2)

# Print conversation history
print("\nConversation History:")
for msg in memory.chat_memory.messages:
    print(f"{msg.type}: {msg.content}")
    print('---------------')


In [None]:
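# ------------------------------
# Conversation Memory
# ------------------------------
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory

history = ChatMessageHistory()

def get_session_history(session_id):
    # Every session_id shares this single history object
    return history

# A bare chat model cannot take a dict input, so put a prompt in front of it:
# past messages fill the "history" placeholder, the new message fills "{input}".
prompt = ChatPromptTemplate.from_messages([
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

conversation = RunnableWithMessageHistory(
    runnable=prompt | chat,
    get_session_history=get_session_history,
    input_messages_key="input",
    history_messages_key="history"
)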

In [None]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI  # make sure langchain_openai is installed

chat = ChatOpenAI(
    model=openai_api_name,
    api_key=openai_api_key,
    base_url=openai_api_base,
    temperature=0.7,
    max_tokens=100
)

# ------------------------------
# Memory setup
# ------------------------------
history = ChatMessageHistory()

# Past messages fill the "history" placeholder; the new user message fills "{input}"
prompt = ChatPromptTemplate.from_messages([
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

conversation = RunnableWithMessageHistory(
    runnable=prompt | chat,                          # prompt + LLM
    get_session_history=lambda session_id: history,  # every session maps directly to the same history
    input_messages_key="input",
    history_messages_key="history"
)

# ------------------------------
# Conversation
# ------------------------------
session_id = "session_1"

message_1 = "Can you list 3 places to visit in France?"
message_2 = "Which one is the most popular?"

with get_openai_callback() as cb:
    # Pass a dict whose 'input' key holds the user message as a plain string
    response_1 = conversation.invoke(
        {"input": message_1},
        config={"configurable": {"session_id": session_id}}
    )
    print("AI Response 1:", response_1.content)

    response_2 = conversation.invoke(
        {"input": message_2},
        config={"configurable": {"session_id": session_id}}
    )
    print("AI Response 2:", response_2.content)

# Token usage recorded by the OpenAI callback across both calls
print("Total tokens:", cb.total_tokens)

# Conversation Buffer memory

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-3b"
save_path = "/kaggle/working/h2ogpt_model"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto"
)

# Move model to CPU for safe saving
model.to("cpu")

# Save with safetensors + shards
model.save_pretrained(save_path, safe_serialization=True, max_shard_size="2GB")
tokenizer.save_pretrained(save_path)


You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message

`torch_dtype` is deprecated! Use `dtype` instead!

('/kaggle/working/h2ogpt_model/tokenizer_config.json',
 '/kaggle/working/h2ogpt_model/special_tokens_map.json',
 '/kaggle/working/h2ogpt_model/tokenizer.model',
 '/kaggle/working/h2ogpt_model/added_tokens.json')

In [None]:
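from transformers import pipeline
from langchain_community.llms import HuggingFacePipeline

# Wrap the model and tokenizer loaded above in a text-generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    do_sample=True,      # temperature is only applied when sampling is enabled
    temperature=0.5,
    max_new_tokens=128,  # cap the length of each reply
)

llms = HuggingFacePipeline(pipeline=pipe)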


# Buffer Memory

## Without Buffer Memory

In [None]:
message_1 = "Can you list 3 places in Paris to visit?"

message_2 = "Which place has a lake view?"

# Two independent calls: the second question has no access to the first answer
print(llms.invoke(message_1))
print()
print("===========================================")
print()
print(llms.invoke(message_2))

## Using Buffer Memory

In [None]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

conversation = ConversationChain(
    llm=llms,
    memory=ConversationBufferMemory()
)

message_1 = "Can you list 3 places in Paris to visit?"

message_2 = "Which place has a lake view?"

In [None]:
output_1 = conversation.predict(input=message_1)
print(output_1)

In [None]:
output_2 = conversation.predict(input=message_2)
print(output_2)

In [None]:
for msg in conversation.memory.chat_memory.messages:
    print(msg)
    print('---------------')

# Buffer Window Memory


In [None]:
from langchain.memory import ConversationBufferWindowMemory

conversation = ConversationChain(
    llm=llms,
    memory=ConversationBufferWindowMemory(k=1),  # keep only the most recent exchange
    verbose=True
)

In [None]:

message_1 = "Can you list 3 places in Paris to visit?"

message_2 = "Which place has a lake view?"

message_3 = "Can I visit in winter?"


In [None]:
print( conversation.predict(input=message_1) )

In [None]:
print( conversation.predict(input=message_2) )

In [None]:
print( conversation.predict(input=message_3) )

# Summary Memory

In [None]:
from langchain.memory import ConversationSummaryBufferMemory
conversation = ConversationChain(
    llm=llms,
    memory=ConversationSummaryBufferMemory(llm=llms),  # a different (e.g. cheaper) model can be used for summarization
    verbose=True
)

message_1 = "Can you list 3 places in Paris to visit?"

message_2 = "Which place has a lake view?"

message_3 = "Can I visit in winter?"
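`ConversationSummaryBufferMemory` is a hybrid of buffer and summary memory: as we understand it, recent messages stay verbatim and the oldest ones are folded into a running summary only once the buffer exceeds `max_token_limit`. The sketch below mimics that policy without LangChain, assuming a word count as a stand-in for a real tokenizer; `SummaryBufferMemory` and `toy_fold` are hypothetical:

```python
from typing import Callable


def toy_fold(existing: str, pruned: list) -> str:
    """Stand-in for the LLM summarizer: just records how many messages were folded in."""
    note = f"[{len(pruned)} older message(s) summarized]"
    return f"{existing} {note}".strip()


class SummaryBufferMemory:
    """Hypothetical summary-buffer memory using a word-count 'token' limit."""

    def __init__(self, summarize: Callable[[str, list], str], max_tokens: int) -> None:
        self.summarize = summarize
        self.max_tokens = max_tokens
        self.buffer: list = []  # recent messages, kept verbatim
        self.summary = ""       # running summary of everything pruned so far

    def token_count(self) -> int:
        # Crude proxy for a tokenizer: one token per whitespace-separated word.
        return sum(len(m.split()) for m in self.buffer)

    def save_turn(self, user_text: str, ai_text: str) -> None:
        self.buffer += [f"Human: {user_text}", f"AI: {ai_text}"]
        # Move the oldest messages out of the buffer until it fits the limit again.
        pruned = []
        while self.buffer and self.token_count() > self.max_tokens:
            pruned.append(self.buffer.pop(0))
        if pruned:
            # Only now is the (normally LLM-backed) summarizer called.
            self.summary = self.summarize(self.summary, pruned)


memory = SummaryBufferMemory(toy_fold, max_tokens=15)
memory.save_turn("Can you list 3 places in Paris to visit?", "(reply 1)")  # 13 words: fits
memory.save_turn("Which place has a lake view?", "(reply 2)")               # 23 words: prune
print(memory.summary)
print(memory.buffer)
```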

In [None]:
print( conversation.predict(input=message_1) )

In [None]:
print( conversation.predict(input=message_2) )

In [None]:
print( conversation.predict(input=message_3) )

### Notes: Why `ConversationSummaryMemory` Uses Many Tokens

- `ConversationSummaryMemory` updates its summary with the LLM after every turn.
- This causes **two LLM calls per message** → one to generate the reply and another to update the summary.
- The summarization call consumes tokens because the model must read the current summary plus the newest exchange.
- The generated summary is also added to each new prompt, increasing the token count.
- LangChain's default summarization prompt adds its own instructions and a worked example, which also use tokens on every update.
- `ConversationSummaryBufferMemory`, used in the cells above, only calls the summarizer once the buffered messages exceed its `max_token_limit`; until then it behaves like a plain buffer.
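The two-calls-per-turn effect can be checked with a fake model that only counts calls. `CountingLLM` and `run` are hypothetical, and prompt words are a rough proxy for tokens, not real token counts:

```python
class CountingLLM:
    """Hypothetical fake model: returns a fixed string and counts calls and prompt words."""

    def __init__(self) -> None:
        self.calls = 0
        self.prompt_words = 0

    def invoke(self, prompt: str) -> str:
        self.calls += 1
        self.prompt_words += len(prompt.split())  # rough proxy for input tokens
        return "ok"


def run(turns: list, update_summary: bool) -> CountingLLM:
    llm = CountingLLM()
    summary = ""
    for user_text in turns:
        # Call 1: generate the reply; the current summary rides along in the prompt.
        reply = llm.invoke(f"Summary: {summary}\nHuman: {user_text}\nAI:")
        if update_summary:
            # Call 2: fold the new exchange into the summary, as summary memory does.
            summary = llm.invoke(
                f"Current summary: {summary}\nNew lines:\nHuman: {user_text}\nAI: {reply}\nNew summary:"
            )
    return llm


questions = ["Can you list 3 places in Paris to visit?", "Which place has a lake view?", "Can I visit in winter?"]
plain, with_summary = run(questions, update_summary=False), run(questions, update_summary=True)
print(plain.calls, with_summary.calls)
```

With three questions, the reply-only loop makes three calls while the summary-updating loop makes six, and its prompts also carry the summary text.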
  


# Entity Memory

In [None]:
from langchain.memory import ConversationEntityMemory
from langchain.memory.prompt import ENTITY_MEMORY_CONVERSATION_TEMPLATE

conversation = ConversationChain(
    llm=llms,
    memory=ConversationEntityMemory(llm=llms),
    prompt=ENTITY_MEMORY_CONVERSATION_TEMPLATE,
    verbose=True
)

In [None]:
print(conversation.predict(input="Can you list 3 places in Paris to visit?"))

In [None]:
from pprint import pprint
pprint(conversation.memory.entity_store.store)

# Save to File

In [None]:
from langchain.memory import ConversationSummaryMemory
from langchain.schema import messages_from_dict, messages_to_dict
import json

conversation = ConversationChain(
    llm=llms,
    memory=ConversationSummaryMemory(llm=llms),
    verbose=True
)

In [None]:
message_1 = "Can you list 3 places in Paris to visit?"
message_2 = "Which place has a lake view?"

output_1 = conversation.predict(input=message_1)
output_2 = conversation.predict(input=message_2)

In [None]:
from langchain_core.messages import messages_to_dict
import json

# Serialize the raw chat messages (not the running summary) to JSON
dicts = messages_to_dict(conversation.memory.chat_memory.messages)

with open("conversation-memory.json", "w", encoding="utf-8") as dest:
    json.dump(dicts, dest, ensure_ascii=False, indent=2)


In [None]:
from langchain.memory import ChatMessageHistory
from langchain_core.messages import messages_from_dict

# Restore the saved messages into a fresh message history
with open("conversation-memory.json", encoding="utf-8") as src:
    saved_history = json.load(src)
history = ChatMessageHistory()
history.messages = messages_from_dict(saved_history)
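The save/load round trip can also be sketched with only the standard library. The `{"type": ..., "content": ...}` layout is a simplified, hypothetical format (LangChain's `messages_to_dict` stores more fields under a `"data"` key), and the temporary file path is illustrative:

```python
import json
import tempfile
from pathlib import Path


def save_messages(messages: list, path: Path) -> None:
    # Serialize (type, content) pairs as a JSON list of small dicts.
    data = [{"type": t, "content": c} for t, c in messages]
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")


def load_messages(path: Path) -> list:
    # Rebuild the (type, content) pairs in their original order.
    data = json.loads(path.read_text(encoding="utf-8"))
    return [(d["type"], d["content"]) for d in data]


messages = [("human", "Can you list 3 places in Paris to visit?"), ("ai", "(reply 1)")]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "conversation-memory.json"
    save_messages(messages, path)
    restored = load_messages(path)

print(restored == messages)
```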

In [None]:
# from_messages is a classmethod: it builds a new summary memory and
# summarizes the restored history with the given LLM
memory = ConversationSummaryMemory.from_messages(llm=llms, chat_memory=history)

conversation = ConversationChain(
    llm=llms,
    memory=memory,
    verbose=True
)

In [None]:
message_3 = "Can I visit in winter?"

print( conversation.predict(input=message_3) )