# Session 2 — Memory & Retrieval Basics in LangChain

## Focus: Add long-term context and ground answers with documents

In this session, we'll extend our FAQ chatbot to use more advanced memory techniques and implement a basic RAG (Retrieval Augmented Generation) system.

In [None]:
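# Install required packages (langchain-community provides the FAISS and HuggingFace integrations; langchain-openai provides ChatOpenAI)
!pip install langchain langchain-community langchain-openai faiss-cpu sentence-transformers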

### 1. SummaryMemory: The Executive Assistant Analogy

Think of SummaryMemory like an executive assistant who takes detailed meeting notes but provides you with concise summaries. Instead of remembering every single word from a 2-hour meeting, your assistant gives you the key points, decisions made, and action items.

**How it works:**
- Stores a running summary of the conversation rather than the verbatim transcript
- Uses the LLM itself to fold each new exchange into that summary
- Scales better for long conversations: the summary grows much more slowly than the full transcript, at the cost of one extra LLM call per turn
- Keeps the gist of earlier turns without resending the whole history on every request

**When to use it:**
- Long conversations where full history would be too expensive
- When you need to maintain context over many interactions
- Situations where the gist of past conversations matters more than exact wording
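To make the mechanism concrete, here is a library-free sketch, not LangChain's implementation. `RunningSummaryMemory` and `toy_summarizer` are hypothetical names: the toy summarizer stands in for the LLM call that `ConversationSummaryMemory` makes on each `save_context`, so every new exchange is folded into one running summary instead of being appended to a transcript.

```python
from typing import Callable


class RunningSummaryMemory:
    """Hypothetical sketch: keep a single summary string and let a summarizer
    (normally an LLM prompt) merge each new exchange into it."""

    def __init__(self, summarize: Callable[[str, str], str]):
        # summarize(current_summary, new_lines) -> new_summary
        self.summarize = summarize
        self.summary = ""

    def save_context(self, user_input: str, ai_output: str) -> None:
        new_lines = f"Human: {user_input}\nAI: {ai_output}"
        self.summary = self.summarize(self.summary, new_lines)

    def load(self) -> str:
        return self.summary


def toy_summarizer(current: str, new_lines: str) -> str:
    """Stand-in for an LLM: keep only the AI's answer from each exchange."""
    answer = new_lines.split("AI: ", 1)[1]
    return f"{current} {answer}".strip()


memory = RunningSummaryMemory(toy_summarizer)
memory.save_context("What subscription plans do you offer?", "Basic, Pro, and Enterprise.")
memory.save_context("Which includes phone support?", "Pro and Enterprise.")
print(memory.load())  # Basic, Pro, and Enterprise. Pro and Enterprise.
```

A real summarizer rewrites and compresses rather than concatenating, which is what keeps the stored text short.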

In [2]:
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI
import os

# Initialize the OpenRouter LLM (using the free model from the previous session).
# The API key is read from an environment variable; never hard-code secrets in a notebook.
llm = ChatOpenAI(
    model="meta-llama/llama-3.1-8b-instruct",
    openai_api_base="https://openrouter.ai/api/v1",
    openai_api_key=os.environ["OPENROUTER_API_KEY"],
    temperature=0.7,
    max_tokens=256,
)

# Initialize summary memory
summary_memory = ConversationSummaryMemory(
    llm=llm,
    memory_key="chat_history",
    return_messages=True
)

# Example usage
summary_memory.save_context(
    {"input": "What subscription plans do you offer?"},
    {"output": "We offer Basic ($10), Pro ($25), and Enterprise ($50) plans."}
)

summary_memory.save_context(
    {"input": "Which includes phone support?"},
    {"output": "Pro and Enterprise plans include phone support."}
)

# View the summarized memory
print("Summary Memory Content:")
print(summary_memory.load_memory_variables({})['chat_history'])



Summary Memory Content:
[SystemMessage(content='Current summary:\nThe human asks about subscription plans. The AI offers Basic ($10), Pro ($25), and Enterprise ($50) plans.\n\nNew lines of conversation:\nHuman: Which includes phone support?\nAI: Pro and Enterprise plans include phone support.\n\nNew summary:\nThe human asks about subscription plans. The AI offers Basic ($10), Pro ($25), and Enterprise ($50) plans. The Pro and Enterprise plans include phone support.', additional_kwargs={}, response_metadata={})]


### 2. VectorStoreRetrieverMemory: The Library Analogy

Imagine you're a researcher in a vast library. Instead of memorizing every book, you have a skilled librarian who can quickly find the most relevant books based on your current question. VectorStoreRetrieverMemory works exactly like this librarian.

**How it works:**
- Stores conversation snippets in a vector database
- Uses semantic similarity to find relevant past conversations
- Retrieves only the most contextually relevant memories
- Stores every snippet, but injects only the potentially useful ones into the prompt

**Key components:**
- Embeddings: Convert text to numerical representations
- Vector store: Database optimized for similarity search
- Retriever: Algorithm that finds the most relevant memories
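A minimal, library-free sketch of the retriever-memory idea. `ToyRetrieverMemory` is hypothetical, and word-overlap (Jaccard) scoring stands in for embeddings plus a vector store: every exchange is stored, but `load` returns only the `k` snippets most similar to the current query.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-cased word set; a crude stand-in for an embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two texts."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


class ToyRetrieverMemory:
    """Stores every exchange, but returns only the k most similar ones."""

    def __init__(self, k: int = 2):
        self.k = k
        self.snippets: list[str] = []

    def save_context(self, user_input: str, ai_output: str) -> None:
        # Each exchange becomes one searchable text snippet
        self.snippets.append(f"input: {user_input}\noutput: {ai_output}")

    def load(self, query: str) -> list[str]:
        ranked = sorted(self.snippets, key=lambda s: overlap_score(query, s), reverse=True)
        return ranked[: self.k]


memory = ToyRetrieverMemory(k=1)
memory.save_context("Do you offer phone support?", "Yes, on Pro and Enterprise plans.")
memory.save_context("What are the system requirements?", "Minimum 4GB RAM.")
print(memory.load("Which plans have phone support?"))
```

Swapping `overlap_score` for embedding similarity is what lets the real memory match paraphrases that share no words with the query.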

In [5]:
from langchain.memory import VectorStoreRetrieverMemory
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

# Initialize embeddings
embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2"
)

# Create sample conversation memories
memory_documents = [
    Document(page_content="Customer asked about subscription plans. We offer Basic, Pro, Enterprise."),
    Document(page_content="Customer inquired about phone support. Available in Pro and Enterprise plans."),
    Document(page_content="Customer asked about system requirements. Minimum 4GB RAM, works on Windows, Mac, Linux."),
    Document(page_content="Customer requested refund information. 30-day money-back guarantee for all plans.")
]

# Create vector store
vectorstore = FAISS.from_documents(memory_documents, embeddings)

# Create retriever
retriever = vectorstore.as_retriever(search_kwargs=dict(k=2))

# Initialize vector memory
vector_memory = VectorStoreRetrieverMemory(
    retriever=retriever,
    memory_key="recent_context",
    return_docs=True
)

# Test the memory retrieval
relevant_memories = vector_memory.load_memory_variables(
    {"prompt": "What support options are available?"}
)
print("Relevant memories for support question:")
for doc in relevant_memories["recent_context"]:
    print(f"- {doc.page_content}")

Relevant memories for support question:
- Customer inquired about phone support. Available in Pro and Enterprise plans.
- Customer asked about subscription plans. We offer Basic, Pro, Enterprise.




### 3. Embeddings + FAISS: The GPS for Text Analogy

Think of embeddings as a sophisticated GPS system for text. Just as a GPS converts physical addresses into coordinates that can be mathematically compared, embeddings convert text into numerical vectors that capture semantic meaning.

**Embeddings explained:**
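- Transform text into high-dimensional vectors (from a few hundred to a few thousand dimensions; all-MiniLM-L6-v2 produces 384)
- Semantically similar texts have similar vector representations
- Enable mathematical operations on meaning; classic word embeddings such as word2vec famously showed that "king" - "man" + "woman" lands near "queen"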

**FAISS (Facebook AI Similarity Search):**
- Specialized library for efficient similarity search in high-dimensional spaces
- Can quickly find the most similar vectors from millions of possibilities
- Optimized for speed and memory efficiency

**Why this combination matters:**
- Allows finding semantically related content without exact keyword matches
- Enables efficient retrieval from large knowledge bases
- Forms the foundation for modern retrieval systems
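The core computation can be sketched in plain Python. The 3-dimensional vectors below are made up for illustration (real embeddings are far longer), and `cosine` and `top_k` are hypothetical helpers. Scoring every stored vector like this is exact brute-force search, which is what FAISS's flat indexes such as `IndexFlatL2` and `IndexFlatIP` perform (using L2 distance or inner product; inner product equals cosine on normalized vectors). FAISS's approximate indexes trade a little accuracy for speed on large collections.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query: list[float], vectors: dict[str, list[float]], k: int) -> list[str]:
    """Exact nearest-neighbour search: score every stored vector, keep the best k."""
    scored = sorted(vectors.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in scored[:k]]


# Made-up "embeddings" for three documents
vectors = {
    "support": [0.9, 0.1, 0.0],
    "pricing": [0.2, 0.9, 0.1],
    "refunds": [0.1, 0.3, 0.9],
}
query = [0.8, 0.3, 0.0]  # pretend embedding of "What support comes with Pro?"
print(top_k(query, vectors, k=2))  # ['support', 'pricing']
```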

In [6]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

# Initialize embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Sample documents for our knowledge base
documents = [
    "Our company offers three subscription plans: Basic ($10/mo), Pro ($25/mo), and Enterprise ($50/mo)",
    "All plans include 24/7 email support. Pro and Enterprise include phone support.",
    "Enterprise plan includes a dedicated account manager and custom integrations.",
    "Our software works on Windows, Mac, and Linux systems with minimum 4GB RAM.",
    "Return policy: 30-day money-back guarantee for all plans.",
    "Setup process takes approximately 15 minutes with our guided installation wizard."
]

# Convert to Document objects
doc_objects = [Document(page_content=doc) for doc in documents]

# Create FAISS vector store
vectorstore = FAISS.from_documents(doc_objects, embeddings)

# Test similarity search
query = "What support options come with the Pro plan?"
similar_docs = vectorstore.similarity_search(query, k=2)

print(f"Query: {query}")
print("Most relevant documents:")
for i, doc in enumerate(similar_docs):
    print(f"{i+1}. {doc.page_content}")

Query: What support options come with the Pro plan?
Most relevant documents:
1. All plans include 24/7 email support. Pro and Enterprise include phone support.
2. Our company offers three subscription plans: Basic ($10/mo), Pro ($25/mo), and Enterprise ($50/mo)


### 4. RAG Workflow Fundamentals: The Research Assistant Analogy

Imagine you're writing a research paper. Instead of relying solely on your existing knowledge (like a standard LLM), you have a research assistant who:
1. Listens to your question
2. Goes to the library to find relevant sources
3. Brings back the most pertinent information
4. Helps you craft a well-informed answer

This is exactly what RAG (Retrieval Augmented Generation) does.

**RAG Components:**
1. **Retriever**: The "library goer" that finds relevant information
2. **Generator**: The "writer" that crafts the response
3. **Knowledge Base**: The "library" of information

**Why RAG matters:**
- Grounds responses in retrieved source documents rather than only the model's training data
- Allows incorporating up-to-date knowledge
- Reduces hallucination by providing source material
- Enables domain-specific expertise without retraining models
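The three components can be wired together in a few lines. This is a hedged, library-free sketch: `keyword_retrieve` and `fake_llm` are hypothetical stand-ins for a vector-store retriever and a real LLM, and the prompt wording mirrors the default system prompt printed in the verbose log of the `ConversationalRetrievalChain` example.

```python
import re
from typing import Callable


def build_prompt(question: str, docs: list[str]) -> str:
    """Augmentation step: stuff the retrieved documents into the prompt."""
    context = "\n\n".join(docs)
    return (
        "Use the following pieces of context to answer the user's question.\n"
        "If you don't know the answer, just say that you don't know.\n"
        f"----------------\n{context}\n"
        f"Human: {question}"
    )


def rag_answer(question: str,
               retrieve: Callable[[str], list[str]],
               generate: Callable[[str], str]) -> str:
    docs = retrieve(question)                      # retriever searches the knowledge base
    return generate(build_prompt(question, docs))  # generator answers from the stuffed prompt


KNOWLEDGE_BASE = [
    "All plans include 24/7 email support. Pro and Enterprise include phone support.",
    "Return policy: 30-day money-back guarantee for all plans.",
]


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_retrieve(question: str) -> list[str]:
    """Toy retriever: documents sharing at least one word with the question."""
    return [doc for doc in KNOWLEDGE_BASE if words(question) & words(doc)]


def fake_llm(prompt: str) -> str:
    """Stand-in generator: echoes the first context line, or admits ignorance."""
    context = prompt.split("----------------\n", 1)[1].split("\nHuman:", 1)[0]
    return context.splitlines()[0] if context.strip() else "I don't know."


print(rag_answer("What is the return policy?", keyword_retrieve, fake_llm))
```

Because the generator only sees what the retriever returned, a retrieval miss produces "I don't know." instead of an invented answer.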

In [8]:
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationalRetrievalChain

# Windowed buffer memory; output_key='answer' tells it which chain output to store,
# since the chain also returns source_documents
memory = ConversationBufferWindowMemory(
    memory_key="chat_history",
    k=5,  # Keep last 5 exchanges
    return_messages=True,
    output_key='answer'
)

# Create a RAG chain that works with conversation memory
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs=dict(k=3)),
    memory=memory,
    return_source_documents=True,
    verbose=True
)

# Test the RAG system (invoke() replaces the deprecated chain(...) call style)
response = qa_chain.invoke({"question": "What subscription plans do you offer and what support is included in each?"})
print("Answer:", response['answer'])
print("\nSource documents used:")
for i, doc in enumerate(response['source_documents']):
    print(f"{i+1}. {doc.page_content}")





> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
System: Use the following pieces of context to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
Our company offers three subscription plans: Basic ($10/mo), Pro ($25/mo), and Enterprise ($50/mo)

All plans include 24/7 email support. Pro and Enterprise include phone support.

Enterprise plan includes a dedicated account manager and custom integrations.
Human: What subscription plans do you offer and what support is included in each?

> Finished chain.

> Finished chain.
Answer: We offer three subscription plans: Basic ($10/mo), Pro ($25/mo), and Enterprise ($50/mo).

Here's a breakdown of the support included in each plan:

* All plans include 24/7 email support.
* Pro and Enterprise plans include phone support, in addition to email support.
* The

# Hands-On: Extend the FAQ bot to a mini-RAG over docs

In [11]:
# Let's create a simplified version that works with OpenRouter
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferWindowMemory
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from langchain_core.documents import Document

class AdvancedFAQBot:
    def __init__(self, llm, knowledge_docs):
        self.llm = llm

        # Initialize embeddings and vector store
        self.embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
        self.vectorstore = FAISS.from_documents(knowledge_docs, self.embeddings)

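        # Windowed buffer memory; return_messages=False returns the history as a plain
        # "Human: ... / AI: ..." string that a string PromptTemplate can embed cleanly
        self.memory = ConversationBufferWindowMemory(
            k=3,  # Keep last 3 exchanges
            memory_key="chat_history",
            return_messages=False
        )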

        # Create comprehensive prompt
        self.prompt = PromptTemplate(
            template="""As a customer support agent, use the following information to answer the question:

Knowledge Base Context:
{relevant_context}

Conversation History:
{chat_history}

Question: {question}

Please provide a helpful, accurate answer based on the context provided. If the context doesn't contain the answer, politely say you don't know.

Answer:""",
            input_variables=["relevant_context", "chat_history", "question"]
        )

        # Create the chain
        self.chain = LLMChain(
            llm=llm,
            prompt=self.prompt,
            verbose=False
        )

    def ask(self, question):
        # Get relevant context from knowledge base using similarity search
        similar_docs = self.vectorstore.similarity_search(question, k=2)
        context_info = "\n".join([doc.page_content for doc in similar_docs])

        # Get conversation history
        history = self.memory.load_memory_variables({})["chat_history"]

        # Get response
        response = self.chain.run(
            relevant_context=context_info,
            chat_history=history,
            question=question
        )

        # Save to memory
        self.memory.save_context({"input": question}, {"output": response})

        return response

# Prepare knowledge documents
knowledge_documents = [
    Document(page_content="Subscription Plans: Basic ($10/month), Pro ($25/month), Enterprise ($50/month)"),
    Document(page_content="Support: All plans include email support. Phone support for Pro and Enterprise. Dedicated manager for Enterprise."),
    Document(page_content="System Requirements: Windows 10+, macOS 10.14+, Linux Ubuntu 16.04+. Minimum 4GB RAM."),
    Document(page_content="Refund Policy: 30-day money-back guarantee for all subscription plans."),
    Document(page_content="Setup: Average setup time 15 minutes. Guided installation wizard available."),
    Document(page_content="Integration: API access available for Pro and Enterprise plans. Webhooks supported."),
]

# Initialize the advanced bot
advanced_bot = AdvancedFAQBot(llm, knowledge_documents)

# Test conversation
questions = [
    "What are your subscription options?",
    "Which plan would you recommend for a small business?",
    "What if I need to get a refund?",
    "How difficult is the setup process?",
    "Do you offer API access?"
]

print("Advanced FAQ Bot Demonstration:")
print("=" * 50)

for question in questions:
    print(f"\nQ: {question}")
    answer = advanced_bot.ask(question)
    print(f"A: {answer}")
    print("-" * 50)

Advanced FAQ Bot Demonstration:

Q: What are your subscription options?
A: We offer three subscription plans to choose from: Basic, Pro, and Enterprise. Our Basic plan is $10/month, our Pro plan is $25/month, and our Enterprise plan is $50/month. If you're unsure which plan is right for you, I'd be happy to help you compare the features and benefits of each.
--------------------------------------------------

Q: Which plan would you recommend for a small business?
A: Based on our conversation, I would recommend the Pro plan for a small business. This plan offers phone support, which may be beneficial for small businesses that need immediate assistance, in addition to email support. The Pro plan also seems to offer a good balance between cost and features, as it's priced at $25/month, which is lower than the Enterprise plan but still offers additional support beyond the Basic plan. However, if you'd like a more detailed comparison of the features and benefits of each plan, I'd be happy 

# Case Study: Internal knowledge-base assistant for an SME

In [13]:
# Let's create a simplified version of SMEKnowledgeAssistant that works
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferWindowMemory
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from langchain_core.documents import Document

class SMEKnowledgeAssistant:
    def __init__(self, llm, company_docs, product_docs, policy_docs):
        self.llm = llm
        self.embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

        # Create separate vector stores for different knowledge types
        self.company_store = FAISS.from_documents(company_docs, self.embeddings)
        self.product_store = FAISS.from_documents(product_docs, self.embeddings)
        self.policy_store = FAISS.from_documents(policy_docs, self.embeddings)

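        # Windowed buffer memory; return_messages=False yields a plain-text transcript
        # instead of a list of message objects inside the string prompt
        self.memory = ConversationBufferWindowMemory(
            k=3,
            memory_key="history",
            return_messages=False
        )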

        # Create specialized prompt
        self.prompt = PromptTemplate(
            template="""As an internal knowledge assistant for {company_name}, use the following information to answer the question.

Company Information:
{company_info}

Product Details:
{product_info}

Policies & Procedures:
{policy_info}

Conversation History:
{history}

Question: {question}

Please provide a comprehensive answer based on the available information. If information is missing, suggest where to find it or who to contact.

Answer:""",
            input_variables=["company_name", "company_info", "product_info", "policy_info", "history", "question"]
        )

        # Create chain without memory (we'll handle memory separately)
        self.chain = LLMChain(
            llm=llm,
            prompt=self.prompt,
            verbose=True
        )

    def query(self, question, company_name="our company"):
        # Retrieve relevant information from all knowledge bases
        company_info = self.company_store.similarity_search(question, k=2)
        product_info = self.product_store.similarity_search(question, k=2)
        policy_info = self.policy_store.similarity_search(question, k=2)

        # Format the information
        company_text = "\n".join([doc.page_content for doc in company_info])
        product_text = "\n".join([doc.page_content for doc in product_info])
        policy_text = "\n".join([doc.page_content for doc in policy_info])

        # Get conversation history
        history = self.memory.load_memory_variables({})["history"]

        # Get response
        response = self.chain.run(
            company_name=company_name,
            company_info=company_text,
            product_info=product_text,
            policy_info=policy_text,
            history=history,
            question=question
        )

        # Save to memory
        self.memory.save_context({"input": question}, {"output": response})

        return response

# Sample knowledge base documents for an SME
company_docs = [
    Document(page_content="Company founded in 2018. Headquarters in San Francisco. 50 employees globally."),
    Document(page_content="Mission: To simplify business operations through innovative software solutions."),
    Document(page_content="Departments: Engineering, Sales, Customer Support, Marketing, Operations."),
    Document(page_content="Key executives: CEO - Jane Smith, CTO - John Doe, CFO - Alice Johnson."),
]

product_docs = [
    Document(page_content="Flagship product: OperationsHub - workflow automation platform."),
    Document(page_content="DataSync tool: Real-time data integration across platforms."),
    Document(page_content="ReportBuilder: Custom analytics and reporting dashboard."),
    Document(page_content="Mobile app: iOS and Android apps available for all products."),
]

policy_docs = [
    Document(page_content="Vacation policy: 15 days PTO for new employees, increasing with tenure."),
    Document(page_content="Expense policy: Submit expenses within 30 days through Expensify system."),
    Document(page_content="Remote work: Hybrid model - 3 days in office, 2 days remote."),
    Document(page_content="IT policy: Company devices only for work. Regular security training required."),
]

# Initialize the SME assistant
sme_assistant = SMEKnowledgeAssistant(llm, company_docs, product_docs, policy_docs)

# Test various internal queries
internal_questions = [
    "What's our company's mission?",
    "What products do we offer?",
    "What's the vacation policy for employees?",
    "Who are our key executives?",
    "What's the remote work policy?"
]

print("SME Knowledge Assistant Demonstration:")
print("=" * 60)

for question in internal_questions:
    print(f"\nQ: {question}")
    answer = sme_assistant.query(question, "TechSolutions Inc")
    print(f"A: {answer}")
    print("-" * 60)

SME Knowledge Assistant Demonstration:

Q: What's our company's mission?


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mAs an internal knowledge assistant for TechSolutions Inc, use the following information to answer the question.

Company Information:
Mission: To simplify business operations through innovative software solutions.
Key executives: CEO - Jane Smith, CTO - John Doe, CFO - Alice Johnson.

Product Details:
Flagship product: OperationsHub - workflow automation platform.
ReportBuilder: Custom analytics and reporting dashboard.

Policies & Procedures:
Remote work: Hybrid model - 3 days in office, 2 days remote.
IT policy: Company devices only for work. Regular security training required.

Conversation History:
[]

Question: What's our company's mission?

Please provide a comprehensive answer based on the available information. If information is missing, suggest where to find it or who to contact.

Answer:[0m

[1m> Finished chain.[0m
A: Our

## Implementation Notes and Best Practices

### Memory Management Strategies
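1. **Hybrid Approach**: Combine memory types, e.g. a buffer for recent turns plus retrieval memory for facts mentioned much earlier
2. **Memory Window**: Keep the last few exchanges verbatim in buffer window memory and summarize older ones with summary memory for long-term context
3. **Cost Consideration**: Summary memory spends an extra LLM call per turn but keeps the prompt size roughly bounded, which saves tokens over long conversations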
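A sketch of the buffer-plus-summary strategy, assuming a `summarize(current_summary, new_lines)` function that would normally wrap an LLM call. `WindowPlusSummaryMemory` and `keep_question` are hypothetical names; LangChain's `ConversationSummaryBufferMemory` applies the same idea but evicts by token count (`max_token_limit`) rather than by number of exchanges.

```python
from collections import deque
from typing import Callable


class WindowPlusSummaryMemory:
    """Keep the last k exchanges verbatim; fold evicted ones into a running summary."""

    def __init__(self, k: int, summarize: Callable[[str, str], str]):
        self.k = k
        self.summarize = summarize  # normally an LLM call
        self.window: deque = deque()
        self.summary = ""

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.window.append((user_input, ai_output))
        if len(self.window) > self.k:
            old_in, old_out = self.window.popleft()  # evict the oldest exchange
            self.summary = self.summarize(self.summary, f"Human: {old_in}\nAI: {old_out}")

    def load(self) -> str:
        recent = "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.window)
        return f"Summary: {self.summary}\n{recent}"


def keep_question(summary: str, new_lines: str) -> str:
    """Toy summarizer (stand-in for an LLM): keep only the human's line."""
    question = new_lines.splitlines()[0]
    return f"{summary} | {question}" if summary else question


memory = WindowPlusSummaryMemory(k=2, summarize=keep_question)
for q, a in [("Plans?", "Basic, Pro, Enterprise."),
             ("Phone support?", "Pro and Enterprise."),
             ("Refunds?", "30 days.")]:
    memory.save_context(q, a)
print(memory.load())
```

Only one summarizer call happens here (for the evicted first exchange), which illustrates why this hybrid costs fewer extra LLM calls than summarizing every turn.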

### RAG Optimization Tips
1. **Chunk Size**: Experiment with different document chunk sizes (200-500 words often works well)
2. **Retrieval Count**: Retrieve 3-5 documents for balance between context and token usage
3. **Relevance Threshold**: Implement score thresholds to filter out irrelevant documents
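The chunking and threshold tips can be sketched without any library. `chunk_words` and `filter_by_score` are hypothetical helpers; chunk size is counted in words to match the tip above, and the overlap reduces the chance that a fact split across a boundary is lost to both neighbouring chunks. In LangChain, scores like these can come from a vector store's `similarity_search_with_relevance_scores`, and `as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": ...})` applies a cutoff inside the retriever.

```python
def chunk_words(text: str, size: int, overlap: int) -> list[str]:
    """Split text into chunks of `size` words; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous chunk's overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def filter_by_score(results: list[tuple[str, float]], threshold: float) -> list[str]:
    """Keep documents whose relevance score (higher = more relevant) meets the threshold."""
    return [doc for doc, score in results if score >= threshold]


print(chunk_words("a b c d e f g h i j", size=4, overlap=1))
# ['a b c d', 'd e f g', 'g h i j']

# Illustrative, made-up scores
print(filter_by_score([("Refund policy", 0.82), ("Office hours", 0.31)], threshold=0.5))
# ['Refund policy']
```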

### Performance Considerations
1. **Embedding Model Choice**: Balance between quality and speed (all-MiniLM-L6-v2 is good for development)
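2. **Vector Database**: FAISS is an in-process library, well suited to development and single-machine use; consider a vector database such as Pinecone or Weaviate for production when you need a managed service, metadata filtering, or scaling across machines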
3. **Caching**: Implement caching for frequent queries to reduce API costs
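A minimal exact-match cache sketch; `QueryCache` is hypothetical. It only suits questions whose answers do not depend on conversation history, because two users asking the same question in different contexts would share one cached answer. LangChain also offers a model-level cache configured through `set_llm_cache`.

```python
from typing import Callable


class QueryCache:
    """Memoize answers keyed on a normalized question string."""

    def __init__(self, answer_fn: Callable[[str], str]):
        self.answer_fn = answer_fn  # e.g. a RAG pipeline call that costs API tokens
        self.store: dict[str, str] = {}
        self.hits = 0

    @staticmethod
    def normalize(question: str) -> str:
        # Case- and whitespace-insensitive key with a trailing "?" dropped
        return " ".join(question.lower().split()).rstrip("?")

    def ask(self, question: str) -> str:
        key = self.normalize(question)
        if key in self.store:
            self.hits += 1
        else:
            self.store[key] = self.answer_fn(question)
        return self.store[key]


calls = []

def expensive_answer(q: str) -> str:
    calls.append(q)
    return f"answer to: {q}"

cache = QueryCache(expensive_answer)
cache.ask("What plans do you offer?")
cache.ask("  what plans do you OFFER ")
print(len(calls), cache.hits)  # 1 1
```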

### Error Handling
1. **Fallback Mechanisms**: Implement fallbacks when retrieval returns empty results
2. **Timeout Handling**: Set timeouts on LLM API calls and on any remote vector-store requests
3. **Rate Limiting**: Respect API rate limits, especially with free-tier models
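A sketch of two of these safeguards, assuming hypothetical `retrieve(question)` and `generate(question, docs)` callables; `with_retries`, `answer_or_fallback` and `FALLBACK` are illustrative names. For the LLM calls themselves, `ChatOpenAI` already accepts `timeout` and `max_retries` arguments, so a wrapper like this is mainly useful for other network calls. Narrow `retry_on` to the exceptions that really signal a transient failure, such as rate-limit errors.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

FALLBACK = "I couldn't find that in our documentation. Please contact support."


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 retry_on: tuple = (Exception,)) -> T:
    """Call fn, retrying with exponential backoff (base_delay, 2x, 4x, ...)."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts - 1):
        try:
            return fn()
        except retry_on:
            time.sleep(base_delay * 2 ** attempt)
    return fn()  # final attempt: let any exception propagate


def answer_or_fallback(question: str, retrieve, generate) -> str:
    docs = retrieve(question)
    if not docs:
        return FALLBACK  # skip the LLM call when nothing relevant was retrieved
    return with_retries(lambda: generate(question, docs))


print(answer_or_fallback("Do you ship hardware?", lambda q: [], lambda q, d: "unused"))
```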

### Evaluation Metrics
1. **Retrieval Accuracy**: How often retrieved documents are relevant to the query
2. **Answer Quality**: Human evaluation of response usefulness and accuracy
3. **Latency**: Response time from query to answer
4. **Cost Efficiency**: Token usage per conversation
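Retrieval accuracy and latency are straightforward to measure once you have a small labelled set of queries. The helpers below are hypothetical sketches: `hit_rate_at_k` counts the share of queries for which at least one known-relevant document appears in the top `k` results, and `timed` wraps any call with a monotonic timer. The document IDs are made up.

```python
import time


def hit_rate_at_k(retrieved: list[list[str]], relevant: list[set[str]], k: int) -> float:
    """Fraction of queries with at least one relevant doc among the top k results."""
    if not retrieved:
        return 0.0
    hits = sum(1 for docs, rel in zip(retrieved, relevant) if rel & set(docs[:k]))
    return hits / len(retrieved)


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


retrieved = [["support", "pricing"], ["pricing", "setup"]]  # top results per query
relevant = [{"support"}, {"refunds"}]                        # labelled relevant docs per query
print(hit_rate_at_k(retrieved, relevant, k=2))  # 0.5

result, seconds = timed(sum, [1, 2, 3])
print(result)  # 6
```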

Together, these practices give you a foundation for knowledge-aware conversational agents that can grow from simple FAQ bots into internal knowledge assistants.

In [None]:
# Additional configuration for OpenRouter models
# If you encounter issues with the default model, try these alternatives:

# Option 1: Google Gemini Pro (if available)
gemini_llm = ChatOpenAI(
    model="google/gemini-pro",
    openai_api_base="https://openrouter.ai/api/v1",
    openai_api_key=os.environ["OPENROUTER_API_KEY"],
    temperature=0.7,
    max_tokens=256,
)

# Option 2: Mistral 7B
mistral_llm = ChatOpenAI(
    model="mistralai/mistral-7b-instruct",
    openai_api_base="https://openrouter.ai/api/v1",
    openai_api_key=os.environ["OPENROUTER_API_KEY"],
    temperature=0.7,
    max_tokens=256,
)

# Option 3: Check available models (if none work)
import requests

def check_available_models():
    url = "https://openrouter.ai/api/v1/models"
    response = requests.get(url, timeout=10)  # avoid hanging indefinitely on network issues
    if response.status_code == 200:
        models = response.json().get('data', [])
        free_models = [model for model in models if model.get('pricing', {}).get('prompt') == '0']
        print("Available free models:")
        for model in free_models:
            print(f"- {model['id']}")
    else:
        print("Error fetching models")

# Uncomment to check available models if needed
# check_available_models()