# Persistent Memory in Agents: Summarization & Context

# ✅ Objective
Create an agent that:

  Summarizes a user’s ongoing conversation

  Stores the summary in a Vector Store (FAISS/Chroma) for long-term memory

  Retrieves relevant summaries to provide context in future interactions

  Uses Google Gemini API for LLM inference

# 📦 Prerequisites

In [5]:
!pip install langchain faiss-cpu google-generativeai tiktoken langchain-google-genai langchain-community


# 🪄 Step-by-Step Code Walkthrough

In [7]:
# Step 1: Set up Gemini API and LangChain imports
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain.chains.summarize import load_summarize_chain
from langchain_community.vectorstores import FAISS  # vector stores are provided by langchain-community
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
import os

# Set your Google Gemini API key (placeholder shown; never commit or share a real key)
os.environ["GOOGLE_API_KEY"] = "YOUR_GOOGLE_API_KEY"
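
Writing the key directly into a cell makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the key is either already exported in the shell or typed at a hidden prompt; `load_api_key` is a hypothetical helper name, not part of LangChain or the Gemini SDK:

```python
import os
from getpass import getpass


def load_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment, prompting only if it is missing.

    Hypothetical helper for illustration; not part of LangChain or the Gemini SDK.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # getpass hides the typed value so it does not end up in notebook output
        key = getpass(f"Enter {var_name}: ").strip()
        if not key:
            raise ValueError(f"{var_name} is empty")
    # Store the cleaned value so libraries that read the variable can find it
    os.environ[var_name] = key
    return key
```

Calling `load_api_key()` once before creating the LLM would leave `GOOGLE_API_KEY` set for the rest of the session without the value ever appearing in the notebook.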

In [14]:
# Step 2: Initialize Gemini LLM
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0.3)
embedding_model = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

In [15]:
# Step 3: Create sample chat logs (simulating a persistent conversation)
user_conversation_logs = """
User: Hi, I want to learn about solar power systems for my home.
Agent: Sure! Are you looking for off-grid or grid-connected systems?
User: Grid-connected. I want to reduce my electricity bill.
Agent: Great, I’ll explain how net metering works and how many panels you might need.
User: Please do. Also, what would be the cost estimate?
Agent: Based on a 3kW system, the cost can be approx ₹1.5 lakhs after subsidy.
"""


In [16]:
# Step 4: Split the chat into smaller chunks for summarization
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.create_documents([user_conversation_logs])
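
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping character windows. It is an illustration only: `split_with_overlap` is a hypothetical function, and `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and newlines rather than cutting at fixed offsets.

```python
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Cut text into fixed-size character windows that share chunk_overlap characters.

    Simplified illustration; not the algorithm used by RecursiveCharacterTextSplitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# 1200 characters of dummy text, split with the same settings as Step 4
sample = "".join(str(i % 10) for i in range(1200))
chunks = split_with_overlap(sample, chunk_size=500, chunk_overlap=50)
print([len(c) for c in chunks])  # → [500, 500, 300]
```

The overlap means the last 50 characters of one chunk reappear at the start of the next, so a sentence cut at a boundary is still seen whole by at least one summarization call.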


In [17]:
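# Step 5: Summarize the chat using a map-reduce chain
summary_chain = load_summarize_chain(llm, chain_type="map_reduce")
# invoke() replaces the deprecated run(); the summary is returned under "output_text"
summary = summary_chain.invoke({"input_documents": docs})["output_text"]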
print("Summary:\n", summary)


Summary:
 Home solar system inquiry resulted in a ₹1.5 lakh (post-subsidy) 3kW system cost estimate, including net metering explanation.
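
The map-reduce idea behind Step 5 can be sketched without an LLM: summarize each chunk on its own (map), then summarize the joined partial summaries (reduce). This is a rough sketch under stated assumptions; `map_reduce_summarize` and `keep_first_words` are hypothetical names, and the real chain uses LLM prompts for both steps.

```python
from typing import Callable


def map_reduce_summarize(chunks: list[str], summarize: Callable[[str], str]) -> str:
    """Summarize each chunk (map), then summarize the combined partial summaries (reduce)."""
    if not chunks:
        return ""
    partial = [summarize(chunk) for chunk in chunks]  # map step: one call per chunk
    return summarize("\n".join(partial))              # reduce step: one combining call


def keep_first_words(text: str, limit: int = 8) -> str:
    """Stand-in summarizer that keeps the first `limit` words; a real one would call the LLM."""
    return " ".join(text.split()[:limit])


parts = [
    "User wants a grid-connected solar system to reduce the electricity bill.",
    "Agent estimates a 3kW system at about 1.5 lakh after subsidy.",
]
print(map_reduce_summarize(parts, keep_first_words))
```

With N chunks this pattern makes N + 1 summarizer calls, which is worth keeping in mind when each call is a paid LLM request.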


In [18]:
# Step 6: Create Document object and embed + store in FAISS
doc = Document(page_content=summary)

# Create vector store for memory
vector_store = FAISS.from_documents([doc], embedding_model)

# Save it for persistent use
vector_store.save_local("memory_store")
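
Persisting memory means writing both the text and its vector to disk so that a later session can reload them without re-embedding. The sketch below shows that round trip with plain JSON. The `MemoryRecord` type, the file name, and the vector values are assumptions for illustration; FAISS's `save_local` writes its own index files rather than JSON.

```python
import json
import tempfile
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class MemoryRecord:
    """One stored summary plus its embedding (hypothetical type for illustration)."""
    text: str
    vector: list[float]


def save_memory(records: list[MemoryRecord], path: Path) -> None:
    """Write all records to a JSON file."""
    path.write_text(json.dumps([asdict(r) for r in records]), encoding="utf-8")


def load_memory(path: Path) -> list[MemoryRecord]:
    """Read records back; returns an empty list if nothing was saved yet."""
    if not path.exists():
        return []
    return [MemoryRecord(**item) for item in json.loads(path.read_text(encoding="utf-8"))]


# Round trip in a temporary directory; the vector values are made up
with tempfile.TemporaryDirectory() as tmp:
    store_path = Path(tmp) / "memory_store.json"
    save_memory([MemoryRecord("3kW system, about 1.5 lakh after subsidy", [0.1, 0.7, 0.2])], store_path)
    restored = load_memory(store_path)

print(restored[0].text)
```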


In [20]:
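# Step 7: Later in a new session, load memory and retrieve context
# load_local unpickles stored data, so enable allow_dangerous_deserialization only for files you created yourself
vector_store = FAISS.load_local("memory_store", embedding_model, allow_dangerous_deserialization=True)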

# Simulate a new user query
new_query = "Can you remind me what we discussed about the solar panel cost?"

# Get similar summaries from memory
relevant_docs = vector_store.similarity_search(new_query, k=2)

print("\nRelevant Memory Context:")
for i, d in enumerate(relevant_docs):
    print(f"\nMemory {i+1}:\n{d.page_content}")


Relevant Memory Context:

Memory 1:
Home solar system inquiry resulted in a ₹1.5 lakh (post-subsidy) 3kW system cost estimate, including net metering explanation.
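
Only one memory is printed even though `k=2` was requested, because the store holds a single summary. The ranking idea itself can be sketched with the standard library. This toy version uses cosine similarity over word counts as a stand-in for model embeddings; the function names and sample memories are assumptions for illustration, not how FAISS computes its results.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. Real embeddings are dense vectors from a model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_search(memories: list[str], query: str, k: int = 2) -> list[str]:
    """Return up to k memories ranked by similarity to the query, best first."""
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(embed(m), q), reverse=True)
    return ranked[:k]


# Made-up memories; only the first one is about solar costs
memories = [
    "Home solar inquiry: 3kW system cost estimate after subsidy, net metering explained.",
    "User asked about booking a train ticket to Pune.",
    "User prefers replies in short bullet points.",
]
top = similarity_search(memories, "what did we discuss about the solar panel cost", k=2)
print(top[0])
```

Word-count matching misses paraphrases ("price" versus "cost"), which is the gap that learned embeddings are meant to close.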


In [21]:
# Step 8: Use the retrieved memory as context in a new LLM query
context = "\n\n".join(d.page_content for d in relevant_docs)
final_prompt = f"Based on previous memory:\n{context}\n\nAnswer this: {new_query}"

response = llm.invoke(final_prompt)
print("\nAgent Response:\n", response.content)



Agent Response:
 We discussed a ₹1.5 lakh (post-subsidy) cost estimate for a 3kW home solar system, which included net metering.  This was the final price after any applicable subsidies.
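
The prompt assembly in Step 8 can be wrapped in a small function that also handles the case where nothing was retrieved. A minimal sketch, assuming the memories arrive as plain strings; `build_prompt` and its `max_chars` cap are hypothetical and not part of LangChain:

```python
def build_prompt(memories: list[str], query: str, max_chars: int = 2000) -> str:
    """Combine retrieved memories and the new query into one prompt string.

    Hypothetical helper; max_chars is an arbitrary cap that keeps the context short.
    """
    # Drop empty entries, join the rest, and truncate to the cap
    context = "\n\n".join(m.strip() for m in memories if m.strip())[:max_chars]
    if not context:
        # No memory retrieved: ask the question without a context header
        return f"Answer this: {query}"
    return f"Based on previous memory:\n{context}\n\nAnswer this: {query}"


print(build_prompt(["3kW system, about 1.5 lakh after subsidy"], "What was the cost estimate?"))
```

In the notebook this would be called as `build_prompt([d.page_content for d in relevant_docs], new_query)` and the result passed to `llm.invoke`.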


# 🧩 What You Learned
How to use LangChain’s summarization chain

How to store conversation summaries in a vector store

How to retrieve context using similarity search

How to integrate the Gemini API with LangChain agents

