# 📖 Chapter 05: Conversation Memory & Source Citations
## 🎯 Objectives
Enhance the basic RAG system from Chapter 4 with conversation memory and source citations.
**What we'll accomplish:**
- Add conversation memory for multi-turn dialogues
- Implement source citations for transparency
- Test and compare improvements


## 💬 Part 1: Conversation Memory
### 💬 Step 01: Setup & Load RAG Chain

Import the libraries for this chapter, including the project helpers that build the basic RAG chain from Chapter 4.

In [15]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough, RunnableWithMessageHistory, RunnableParallel
from langchain_core.output_parsers import StrOutputParser

from src.rag.embeddings import create_embedding_model
from src.rag.vector_store import create_vector_store
from src.rag.llm import create_llm
from src.rag.prompts import get_default_rag_prompt
from src.rag.rag_chain import create_rag_chain
from src.utils.emoji_log import done, info, success, task, data

info("All libraries imported successfully!")

💬 All libraries imported successfully!


### 📦 Step 02: Load Basic RAG Components

Load the embedding model, vector store, and LLM, then create a basic RAG chain (without memory first).

In [2]:
task("Loading RAG components...")

# 1. Load embedding model
info("Loading embedding model...")
embeddings = create_embedding_model()
done("Embedding model loaded")

# 2. Load vector store
info("Loading vector store...")
vector_store = create_vector_store(
    collection_name="travel_attractions", embeddings=embeddings
)
done("Vector store loaded")

# 3. Create LLM
info("Creating LLM...")
llm = create_llm()
done("LLM created")

# 4. Get prompt template
prompt = get_default_rag_prompt()

# 5. Create basic RAG chain (without memory)
info("Creating basic RAG chain...")
basic_rag_chain = create_rag_chain(
    llm=llm, vector_store=vector_store, prompt=prompt, k=5
)
done("Basic RAG chain created")
success("All components loaded!")

🚀 Loading RAG components...
💬 Loading embedding model...
🏁 Embedding model loaded
💬 Loading vector store...
🏁 Vector store loaded
💬 Creating LLM...
🏁 LLM created
💬 Creating basic RAG chain...
🏁 Basic RAG chain created
✅ All components loaded!


### 🧪 Step 03: Test Basic RAG (Without Memory)

Test the basic RAG chain with a multi-turn conversation to see what happens without memory.

**Expected behavior:** The chain will NOT remember previous questions.

In [3]:
task("Testing basic RAG without conversation memory...")

# Question 1: Ask about Space Needle
print("=" * 70)
print("Question 1: What is the Space Needle?")
print("=" * 70)
answer1 = basic_rag_chain.invoke("What is the Space Needle?")
print(f"Answer: {answer1}")

# Question 2: Follow-up question using "it"
print("=" * 70)
print("Question 2: How tall is it?")
print("=" * 70)
print("(Without memory, the chain doesn't know what 'it' refers to)")
answer2 = basic_rag_chain.invoke("How tall is it?")
print(f"Answer: {answer2}")

# Question 3: Another follow-up
print("=" * 70)
print("Question 3: What else is nearby?")
print("=" * 70)
print("(Without memory, the chain doesn't know we were talking about the Space Needle)")
answer3 = basic_rag_chain.invoke("What else is nearby?")
print(f"Answer: {answer3}")
info(
    "Notice: Without memory, the chain cannot understand context from previous questions!"
)

🚀 Testing basic RAG without conversation memory...
Question 1: What is the Space Needle?


ChatGoogleGenerativeAIError: Error calling model 'gemini-2.5-flash' (RESOURCE_EXHAUSTED): 429 RESOURCE_EXHAUSTED. You exceeded your current quota ... Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash. Please retry in 9.539848153s.

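### 🧠 Step 04: Add Conversation Memory

Now let's add conversation memory to enable multi-turn dialogues.

**Key Components:**
- `ChatMessageHistory`: stores the messages exchanged in one conversation
- `RunnableWithMessageHistory`: wraps a chain so the stored history is injected before each call and the new question and answer are saved after it
- Session IDs: the `session_id` passed in the config selects which conversation's history to use, so separate users do not share memory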
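Conceptually, the wrapper repeats three steps on every call: look up the history for the session, pass it into the chain under the history key, then append the new question and answer. The sketch below models that loop in plain Python so the mechanics are visible. It is not LangChain's implementation; `with_message_history`, `fake_chain`, and the `(role, content)` tuple messages are hypothetical stand-ins.

```python
from typing import Callable

# Hypothetical stand-ins: a message is a (role, content) tuple, and a "chain"
# is any function that maps {"question": ..., "chat_history": [...]} to a string.
Message = tuple[str, str]


def with_message_history(
    chain: Callable[[dict], str],
    get_history: Callable[[str], list[Message]],
) -> Callable[[dict, str], str]:
    """Wrap `chain` so every call reads, then extends, a per-session history."""

    def invoke(inputs: dict, session_id: str) -> str:
        history = get_history(session_id)
        # 1. Inject a copy of the stored messages under the history key.
        answer = chain({**inputs, "chat_history": list(history)})
        # 2. Persist this turn: the user's question, then the model's answer.
        history.append(("human", inputs["question"]))
        history.append(("ai", answer))
        return answer

    return invoke


# In-memory store keyed by session ID (mirrors the `store` dict in this chapter).
sessions: dict[str, list[Message]] = {}


def get_history(session_id: str) -> list[Message]:
    return sessions.setdefault(session_id, [])


def fake_chain(inputs: dict) -> str:
    # Stands in for prompt | llm | parser; just reports how much history it saw.
    return f"saw {len(inputs['chat_history'])} earlier messages"


chat = with_message_history(fake_chain, get_history)
print(chat({"question": "What is the Space Needle?"}, "user_1"))  # saw 0 earlier messages
print(chat({"question": "How tall is it?"}, "user_1"))            # saw 2 earlier messages
print(chat({"question": "Hello"}, "user_2"))                      # saw 0 earlier messages
```

The third call uses a different session ID and therefore starts from an empty history, which is the isolation that keying the store by `session_id` provides.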

In [5]:
# 1. Create an in-memory store for chat histories, keyed by session ID
store = {}


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    """Return the history for `session_id`, creating an empty one on first use."""
    # Each value is a ChatMessageHistory whose `.messages` list grows each turn, e.g.
    # store["user_1"].messages = [
    #     HumanMessage(content="What is the Space Needle?"),
    #     AIMessage(content="The Space Needle is..."),
    # ]
    if session_id not in store:
        store[session_id] = ChatMessageHistory()

    return store[session_id]

In [6]:
# 2. Create a new prompt template that includes chat history
conversational_prompt = ChatPromptTemplate.from_messages(
    [
        # System message: role instructions plus the retrieved {context}
        (
            "system",
            """You are a helpful travel assistant specializing in tourist attractions.
Use the following context to answer the question. The context contains information about various tourist attractions.
Context:
{context}
Instructions:
- Answer based ONLY on the information provided in the context above
- If the context doesn't contain relevant information, say "I don't have information about that in my database"
- Be concise and helpful
- Use the conversation history to understand context and pronouns""",
        ),
        # Placeholder: the session's earlier messages are inserted here
        MessagesPlaceholder(variable_name="chat_history"),
        # Human message: the current question
        ("human", "{question}"),
    ]
)
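To see what the model actually receives, here is a plain-Python model of how this template expands: one system message with `{context}` filled in, then whatever messages the placeholder holds (none on the first turn), then the current question. `build_messages` and the shortened `SYSTEM_TEMPLATE` are hypothetical stand-ins, not LangChain APIs.

```python
# Hypothetical stand-in for ChatPromptTemplate + MessagesPlaceholder:
# messages are (role, content) tuples and the system text is shortened.
SYSTEM_TEMPLATE = "You are a helpful travel assistant.\nContext:\n{context}"


def build_messages(
    context: str, chat_history: list[tuple[str, str]], question: str
) -> list[tuple[str, str]]:
    return [
        ("system", SYSTEM_TEMPLATE.format(context=context)),  # {context} filled in
        *chat_history,  # the placeholder expands to zero or more prior messages
        ("human", question),  # the current question always comes last
    ]


msgs = build_messages(
    context="Name: Space Needle ...",
    chat_history=[
        ("human", "What is the Space Needle?"),
        ("ai", "An observation tower in Seattle."),
    ],
    question="How tall is it?",
)
for role, content in msgs:
    print(f"{role}: {content.splitlines()[0]}")
```

Because the earlier turns sit between the system message and the new question, the model can resolve "it" from the previous exchange.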

In [7]:
# 3. Create retriever
retriever = vector_store.as_retriever(search_kwargs={"k": 5})

In [8]:
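# 4. Build the conversational RAG chain
def format_docs(docs):
    """Join retrieved documents into a single context string."""
    return "\n\n".join(doc.page_content for doc in docs)


conversational_rag_chain = (
    # Keep the input keys (question, and chat_history once the history wrapper
    # injects it) and add a "context" key built from documents retrieved
    # for the current question
    RunnablePassthrough.assign(
        context=lambda x: format_docs(retriever.invoke(x["question"]))
    )
    | conversational_prompt
    | llm
    | StrOutputParser()
)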
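`RunnablePassthrough.assign` keeps every key of the input dict and adds new keys computed from that same input. A minimal plain-Python model of the behavior, where `assign` and the fake context function are hypothetical stand-ins for LangChain's class and for `retriever` plus `format_docs`:

```python
from typing import Any, Callable


def assign(**fns: Callable[[dict], Any]) -> Callable[[dict], dict]:
    """Hypothetical model of RunnablePassthrough.assign: keep inputs, add keys."""

    def step(x: dict) -> dict:
        # Every new value is computed from the original input x.
        return {**x, **{key: fn(x) for key, fn in fns.items()}}

    return step


# Stand-in for: lambda x: format_docs(retriever.invoke(x["question"]))
add_context = assign(context=lambda x: f"<docs retrieved for {x['question']!r}>")

out = add_context({"question": "How tall is it?", "chat_history": ["...earlier turns..."]})
print(sorted(out))  # ['chat_history', 'context', 'question']
```

This is why the prompt receives `question`, `chat_history`, and `context` together even though the chain itself only computes `context`.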

In [9]:
# 5. Wrap with message history
conversational_chain_with_history = RunnableWithMessageHistory(
    conversational_rag_chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="chat_history",
)

In [None]:
task("Testing conversational RAG with memory...")

session_id = "user_123"

# Question 1: Ask about Space Needle
print("=" * 70)
print("Question 1: What is the Space Needle?")
print("=" * 70)

answer1 = conversational_chain_with_history.invoke(
    input={"question": "What is the Space Needle?"},
    config={"configurable": {"session_id": session_id}},
)
print(f"Answer: {answer1}")

# Question 2: Follow-up question using "it"
print("=" * 70)
print("Question 2: How tall is it?")
print("=" * 70)
answer2 = conversational_chain_with_history.invoke(
    {"question": "How tall is it?"}, config={"configurable": {"session_id": session_id}}
)
print(f"Answer: {answer2}")

# Question 3: Another follow-up
print("=" * 70)
print("Question 3: What else is nearby?")
print("=" * 70)
print("(With memory, the chain should remember we're talking about the Space Needle)")
answer3 = conversational_chain_with_history.invoke(
    {"question": "What else is nearby?"},
    config={"configurable": {"session_id": session_id}},
)
print(f"Answer: {answer3}")

success("Conversation memory is working! The chain remembers context from previous questions.")
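Memory reaches the prompt, but in `conversational_rag_chain` the retriever is still called with `x["question"]` alone, so a follow-up like "How tall is it?" is searched without the words "Space Needle". Whether that hurts depends on what the vector store happens to return. The toy below illustrates the effect and one naive variation, prepending earlier user questions to the search query. The corpus, the keyword-overlap `retrieve`, and `history_aware_query` are made-up stand-ins (real similarity search is embedding-based), and this variation is not part of this chapter's chain.

```python
import re

# Hypothetical toy corpus: attraction name -> indexed text.
CORPUS = {
    "Space Needle": "Space Needle observation tower Seattle 605 ft high",
    "Example Museum": "Example Museum art exhibits downtown",
}


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str) -> list[str]:
    # Rank documents by how many query words they share; drop zero-overlap ones.
    q = words(query)
    scored = [(len(q & words(body)), name) for name, body in CORPUS.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


def history_aware_query(question: str, chat_history: list[tuple[str, str]]) -> str:
    # Naive variation: prepend earlier user questions so the topic reaches the retriever.
    earlier = [content for role, content in chat_history if role == "human"]
    return " ".join(earlier + [question])


history = [("human", "What is the Space Needle?"), ("ai", "An observation tower in Seattle.")]
print(retrieve("How tall is it?"))                                # []
print(retrieve(history_aware_query("How tall is it?", history)))  # ['Space Needle']
```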

In [None]:
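# Check the stored history (truncate long messages for display)
history = get_session_history(session_id)
for msg in history.messages:
    preview = msg.content if len(msg.content) <= 100 else msg.content[:100] + "..."
    print(f"{msg.__class__.__name__}: {preview}")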

## 📚 Part 2: Source Citations
### 📄 Step 05: Return Source Documents

Modify the RAG chain to return the retrieved source documents alongside each answer.

**Why do we need this?**
- Transparency: Users can see where the answer comes from
- Verification: Users can check the original sources
- Trust: Increases confidence in the system's responses

**What we'll do:**
1. Create a function to retrieve and prepare data
2. Use `RunnableParallel` to return both answer and sources
3. Wrap with message history
4. Test the enhanced chain

**Key concept:** `RunnableParallel` runs several runnables on the same input (concurrently where possible) and collects their outputs into a dictionary keyed by the names you give them.
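A plain-Python model of that idea: every branch receives the same input, runs in a thread pool, and its result lands under its key. `run_parallel` is a hypothetical helper for illustration only; LangChain's `RunnableParallel` additionally handles configs, callbacks, and async execution.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_parallel(branches: dict[str, Callable[[Any], Any]], x: Any) -> dict[str, Any]:
    """Call every branch on the same input concurrently; collect results by key."""
    with ThreadPoolExecutor() as pool:
        futures = {key: pool.submit(fn, x) for key, fn in branches.items()}
        return {key: fut.result() for key, fut in futures.items()}


# Shape of what a retrieve-and-prepare step might hand over (made-up values).
prepared = {"question": "What is the Space Needle?", "docs": ["doc A", "doc B"]}

result = run_parallel(
    {
        "answer": lambda x: f"(LLM answer to {x['question']!r})",  # stands in for the answer chain
        "source_documents": lambda x: x["docs"],                  # passes the docs through
    },
    prepared,
)
print(result["source_documents"])  # ['doc A', 'doc B']
```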

In [3]:
info("Building RAG chain with source documents...")


# Create a function to retrieve documents and prepare data
def retrieve_and_prepare(x):
    docs = retriever.invoke(input=x["question"])
    return {
        "question": x["question"],
        "chat_history": x.get("chat_history", []),
        "context": format_docs(docs),
        "docs": docs,
    }

done("Retrieve and prepare function created")

💬 Building RAG chain with source documents...
🏁 Retrieve and prepare function created


In [10]:
# Create answer-only chain (without sources)
answer_chain = conversational_prompt | llm | StrOutputParser()

done("Answer chain created")

🏁 Answer chain created


In [11]:
# Combine using RunnableParallel to return both answer and sources
conversational_rag_chain_with_sources = retrieve_and_prepare | RunnableParallel(
    {"answer": answer_chain, "source_documents": lambda x: x["docs"]}
)

done("RAG chain with sources created")

🏁 RAG chain with sources created


In [12]:
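# Wrap with message history.
# The chain now returns a dict ({"answer", "source_documents"}), so
# output_messages_key tells the wrapper which value to save as the AI message.
conversational_chain_with_history_and_sources = RunnableWithMessageHistory(
    runnable=conversational_rag_chain_with_sources,
    get_session_history=get_session_history,
    input_messages_key="question",
    history_messages_key="chat_history",
    output_messages_key="answer",
)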
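With a dict output, the history wrapper has to decide which value becomes the assistant's message. The sketch below is a simplified model of that decision, based on our reading of `RunnableWithMessageHistory` rather than its actual code (the real class also accepts message objects, and its errors differ); `pick_output_text` is a hypothetical name. The takeaway: when the output has more than one key, name the key to store, e.g. `output_messages_key="answer"`.

```python
from typing import Optional, Union


def pick_output_text(output: Union[str, dict], output_messages_key: Optional[str] = None) -> str:
    """Simplified model: choose which part of a chain's output to save as the AI turn."""
    if isinstance(output, str):
        return output  # a plain string is saved as-is
    if isinstance(output, dict):
        if output_messages_key is not None:
            key = output_messages_key  # an explicit choice wins
        elif len(output) == 1:
            key = next(iter(output))  # a single-key dict is unambiguous
        else:
            raise ValueError(
                f"Cannot tell which of {sorted(output)} to store; set output_messages_key"
            )
        return output[key]
    raise TypeError(f"Unsupported output type: {type(output).__name__}")


result = {
    "answer": "The Space Needle is an observation tower in Seattle.",
    "source_documents": ["<Document ...>"],
}
print(pick_output_text(result, output_messages_key="answer"))
```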

In [None]:
task("Testing RAG chain with sources...")

session_id = "test_with_sources"

result = conversational_chain_with_history_and_sources.invoke(
    {"question": "What is the Space Needle?"},
    config={"configurable": {"session_id": session_id}},
)

print("=" * 70)
print("ANSWER:")
print("=" * 70)
print(result["answer"])
print()

print("=" * 70)
print("SOURCE DOCUMENTS:")
print("=" * 70)
for i, doc in enumerate(result["source_documents"], 1):
    print(f"\n--- Source {i} ---")
    print(f"Content: {doc.page_content[:200]}...")
    if doc.metadata:
        print(f"Metadata: {doc.metadata}")

success("Chain successfully returns both answer and sources!")

### 🔗 Step 06: Format Citations

Create a citation formatter that displays sources clearly.

**Goal:**
- Format source documents into readable citations
- Make it easy for users to verify information sources
- Display metadata like location, coordinates, etc.

**What we'll do:**
1. Get sample documents using retriever (temporary, until API quota is restored)
2. Understand the structure of Document objects
3. Create a detailed citation formatter function
4. Test the formatter

**Note:** Once API quota is restored, we'll use `result["source_documents"]` instead of calling retriever directly.

In [16]:
sample_question = "What is the Space Needle?"
sample_docs = retriever.invoke(sample_question)

data(f"Retrieved {len(sample_docs)} sample documents")
print(f"📝 Question: {sample_question}\n")

📊 Retrieved 5 sample documents
📝 Question: What is the Space Needle?



In [20]:
# Show structure of a document

print("📄 Structure of a Document object:")
if sample_docs:
    doc = sample_docs[0]
    print(f"Type: {type(doc)}")

    print("\nAttributes:")
    print(f"- page_content: {type(doc.page_content)} (the actual text)")
    print(f"- metadata: {type(doc.metadata)} (additional information)")

    print("\nExample:")
    print(f"page_content:\n{doc.page_content}...")
    print(f"metadata: {doc.metadata}")

📄 Structure of a Document object:
Type: <class 'langchain_core.documents.base.Document'>

Attributes:
- page_content: <class 'str'> (the actual text)
- metadata: <class 'dict'> (additional information)

Example:
page_content:
Name: Space Needle
Location: Space Needle, 400 Broad Street, Seattle, WA 98109, United States of America
Coordinates: 47.62051310002636, -122.34930359883188
Description: The Space Needle is an observation tower in Seattle, Washington, United States. Considered to be an icon of the city, it has been designated a Seattle landmark. Located in the Lower Queen Anne neighborhood, it was built in the Seattle Center for the 1962 World's Fair, which drew more than 2.3 million visitors.
At 605 ft (184 m) high, the Space Needle was once the tallest structure west of the Mississippi River in the United States. The tower is 138 ft (42 m) wide, weighs 9,550 short tons (8,660 metric tons), and is built to withstand winds of up to 200 mph (320 km/h) and earthquakes of up to 9.

In [29]:
def format_citation(source_data):
    """Format retrieved Documents as numbered, human-readable citations."""
    citations = []

    for i, doc in enumerate(source_data, 1):
        # Header: source number and attraction name
        name = doc.metadata.get("name", "Unknown")
        citation_parts = [
            f"\n{'=' * 70}",
            f"📌 Source {i}: {name}",
            f"{'=' * 70}",
        ]

        # Location info (falls back to "Unknown" for missing fields)
        city = doc.metadata.get("city", "Unknown")
        state = doc.metadata.get("state", "Unknown")
        country = doc.metadata.get("country", "Unknown")
        citation_parts.append(f"\n📍 Location: {city}, {state}, {country}")

        # Coordinates, only when both are present (0.0 is a valid value)
        lat = doc.metadata.get("lat")
        lon = doc.metadata.get("lon")
        if lat is not None and lon is not None:
            citation_parts.append(f"📐 Coordinates: ({lat}, {lon})")

        # Full chunk text so users can verify the answer themselves
        citation_parts.append(f"\n📄 Content:\n{doc.page_content}")

        citations.append("\n".join(citation_parts))

    return "\n".join(citations)
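The formatter above prints every retrieved chunk in full. When several chunks come from the same attraction, a compact list with one numbered line per distinct source can be easier to scan. The sketch below shows one way to do that; `Doc` is a hypothetical stand-in for LangChain's `Document` (same `page_content` and `metadata` fields), `format_compact_citations` is an illustrative name, and the sample metadata is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_compact_citations(docs: list[Doc]) -> str:
    """One line per distinct source name, numbered in retrieval order."""
    lines, seen = [], set()
    for doc in docs:
        name = doc.metadata.get("name", "Unknown")
        if name in seen:
            continue  # several chunks of the same attraction -> one citation
        seen.add(name)
        # Join whichever location fields are present, skipping missing ones.
        place = ", ".join(
            doc.metadata[k] for k in ("city", "state", "country") if doc.metadata.get(k)
        )
        lines.append(f"[{len(lines) + 1}] {name}" + (f" ({place})" if place else ""))
    return "\n".join(lines)


docs = [
    Doc("...", {"name": "Space Needle", "city": "Seattle", "state": "Washington"}),
    Doc("...", {"name": "Space Needle", "city": "Seattle", "state": "Washington"}),
    Doc("...", {"name": "Example Museum", "city": "Seattle"}),
]
print(format_compact_citations(docs))
# [1] Space Needle (Seattle, Washington)
# [2] Example Museum (Seattle)
```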

In [31]:
task("Testing detailed citation formatter...")

citation = format_citation(sample_docs)

print(citation)

🚀 Testing detailed citation formatter...

📌 Source 1: Space Needle

📍 Location: Seattle, Washington, United States of America
📐 Coordinates: (47.62051310002636, -122.34930359883188)

📄 Content:
Name: Space Needle
Location: Space Needle, 400 Broad Street, Seattle, WA 98109, United States of America
Coordinates: 47.62051310002636, -122.34930359883188
Description: The Space Needle is an observation tower in Seattle, Washington, United States. Considered to be an icon of the city, it has been designated a Seattle landmark. Located in the Lower Queen Anne neighborhood, it was built in the Seattle Center for the 1962 World's Fair, which drew more than 2.3 million visitors.
At 605 ft (184 m) high, the Space Needle was once the tallest structure west of the Mississippi River in the United States. The tower is 138 ft (42 m) wide, weighs 9,550 short tons (8,660 metric tons), and is built to withstand winds of up to 200 mph (320 km/h) and earthquakes of up to 9.0 magnitude, as stro

### ✅ Step 07: Test Complete System

Test the enhanced RAG system with all features combined:
- 🧠 Conversation memory (remembers context)
- 📄 Source documents (shows where answers come from)
- 🔗 Formatted citations (easy to read)

**Test scenario:** 3-turn conversation
1. Initial question about a tourist attraction
2. Follow-up using pronoun (tests memory)
3. Context-dependent question (tests understanding)

In [None]:
# ============================================================
# Complete System Test: 3-turn conversation
# ============================================================

task("Testing complete RAG system with memory and citations...")

# Create a session
session_id = "complete_system_demo"

# ============================================================
# Turn 1: Initial question
# ============================================================
print("\n" + "🔵" * 35)
print("TURN 1: Initial Question")
print("🔵" * 35)

question1 = "What is the Space Needle?"
print(f"\n👤 User: {question1}")

result1 = conversational_chain_with_history_and_sources.invoke(
    {"question": question1}, config={"configurable": {"session_id": session_id}}
)

print(f"\n🤖 Assistant:\n{result1['answer']}")
print(f"\n{format_citation(result1['source_documents'][:2])}")

# ============================================================
# Turn 2: Follow-up with pronoun (tests memory)
# ============================================================
print("\n\n" + "🟢" * 35)
print("TURN 2: Follow-up Question (Testing Memory)")
print("🟢" * 35)

question2 = "How tall is it?"
print(f"\n👤 User: {question2}")

print("💡 Note: 'it' refers to the Space Needle from the previous question")
result2 = conversational_chain_with_history_and_sources.invoke(
    {"question": question2}, config={"configurable": {"session_id": session_id}}
)

print(f"\n🤖 Assistant:\n{result2['answer']}")
print(f"\n{format_citation(result2['source_documents'][:2])}")

# ============================================================
# Turn 3: Context-dependent question
# ============================================================
print("\n\n" + "🟡" * 35)
print("TURN 3: Context-dependent Question")
print("🟡" * 35)

question3 = "What else is nearby?"
print(f"\n👤 User: {question3}")
print("💡 Note: The system should understand we're asking about the Space Needle area")

result3 = conversational_chain_with_history_and_sources.invoke(
    {"question": question3}, config={"configurable": {"session_id": session_id}}
)

print(f"\n🤖 Assistant:\n{result3['answer']}")
print(f"\n{format_citation(result3['source_documents'][:2])}")
success("Complete system test finished!")

#### 🎯 What We Verified

✅ **Conversation Memory Works**
- Turn 2: System understood "it" refers to "Space Needle"
- Turn 3: System understood "nearby" means near the Space Needle
- Context is maintained across the whole conversation

✅ **Source Documents Provided**
- Every answer includes verifiable sources
- Sources show the exact chunk content stored in the vector DB
- Metadata includes location and coordinates

✅ **Citations Formatted Clearly**
- Easy to read and verify
- Includes name, location, coordinates, and the full chunk text
- Professional presentation

#### 📊 System Performance
- **LLM calls:** 3 (one per question); each turn also runs one retrieval query against the vector store
- **Total messages stored:** 6 (3 user + 3 assistant)
- **Sources per answer:** 5 retrieved (`k=5`); the first 2 are shown as formatted citations
- **Memory:** Kept for the whole session in the in-process `store` dict (lost when the kernel restarts)

## 📋 Step 08: Chapter Summary

Review what we built in this chapter and how the pieces fit together.

### 🎯 What We Accomplished

In this chapter, we enhanced our basic RAG system with two critical features:

#### 1️⃣ **Conversation Memory** (Steps 01-04)
- ✅ Added `ChatMessageHistory` to store conversation context
- ✅ Implemented `RunnableWithMessageHistory` to manage sessions
- ✅ Created conversational prompts with `MessagesPlaceholder`
- ✅ Enabled multi-turn dialogues with context awareness

**Key Benefit:** The system can now understand follow-up questions and pronouns like "it", "there", and "that".

#### 2️⃣ **Source Citations** (Steps 05-07)
- ✅ Modified the RAG chain to return source documents
- ✅ Used `RunnableParallel` to output both answers and sources
- ✅ Created a citation formatter for clear source display
- ✅ Tested the complete system with memory + citations

**Key Benefit:** Users can verify where answers come from, increasing trust and transparency.