# 📖 Chapter 04 - RAG Pipeline with LLM

## 🎯 Objectives

In this chapter, we integrate the Google Gemini LLM with our vector database to build a complete RAG question-answering system.

**What we'll accomplish:**

- Set up Google Gemini 2.5 Flash LLM

- Load vector store from Chapter 3

- Build RAG retrieval chain

- Design effective prompt templates

- Implement question-answering functionality

- Add conversation memory

- Test and evaluate response quality

## 📦 Step 01 - Import Libraries

Import the libraries needed for the LangChain RAG pipeline and the Gemini LLM.

In [7]:
import os
from pathlib import Path

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

from langchain_google_genai import ChatGoogleGenerativeAI

from src.rag.embeddings import create_embedding_model
from src.rag.vector_store import create_vector_store, search_similar_document
from src.config import CHROMA_DB_DIR
from src.utils.emoji_log import done, info, success, task, error, data

from dotenv import load_dotenv

load_dotenv()

info("All libraries imported successfully!")

💬 All libraries imported successfully!


## 🤖 Step 02 - Initialize Gemini LLM

Set up the Google Gemini 2.5 Flash model with appropriate parameters.

**Key Parameters:**
- `model`: `gemini-2.5-flash`, the model name passed to `ChatGoogleGenerativeAI` in the cell below
- `temperature`: Controls randomness (0 = deterministic, 1 = creative)
- `max_tokens`: Maximum response length in tokens
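
To make the `temperature` parameter more concrete, here is a minimal sketch of temperature-scaled softmax, the textbook mechanism behind this kind of setting. It is an illustration only and not a description of Gemini's internals; the function name and the toy scores are made up for this example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into sampling probabilities, scaled by temperature."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all probability mass on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.0, 0.7, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, which is why answers become more repeatable as the value approaches 0.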

In [2]:
task("Initializing Gemini 2.5 Flash...")

# Check API
api_key = os.getenv("GOOGLE_API_KEY")
if not api_key:
    raise ValueError(error("GOOGLE_API_KEY not found in environment variables!"))

# llm instance
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash", api_key=api_key, temperature=0.7, max_tokens=1024
)

done("Gemini LLM initialized!")

🚀 Initializing Gemini 2.5 Flash...
🏁 Gemini LLM initialized!


## 🧪 Step 03 - Test LLM Connection

Verify that the LLM responds to simple test prompts.

**Purpose:**
- Confirm API key is valid
- Test basic LLM functionality
- Check response format

In [None]:
task("Testing Gemini LLM connection...")

info("Test 1: Simple greeting")
response_1 = llm.invoke("Say hello!")
print(f"Response: {response_1.content}")

info("Test 2: General travel question")
response_2 = llm.invoke("What are some popular tourist attractions in Seattle?")
print(f"Response: {response_2.content}")

info("Response structure")
print(f"Response_2: {response_2}")
print(f"Response type: {type(response_2)}")
print(f"Content type: {type(response_2.content)}")
print(f"Content length: {len(response_2.content)} characters")

success("LLM connection test passed!")
print(
    "Note: The LLM can answer general questions, but without RAG it doesn't have access to our specific database."
)

🚀 Testing Gemini LLM connection...
💬 Test 1: Simple greeting
Response: Hello!
💬 Test 2: General travel question
Response: Seattle is a fantastic city with a diverse range of attractions, from iconic landmarks to quirky neighborhoods. Here are some of the most popular tourist attractions:

1.  **Space Needle:** The quintessential Seattle icon. Ride the elevator to the top for breathtaking panoramic views of the city skyline, Puget Sound, Mount Rainier, and the Olympic and Cascade Mountains. It's especially stunning at sunset.

2.  **Chihuly Garden and Glass:** Located right next to the Space Needle in Seattle Center, this museum showcases the stunning glass artwork of Dale Chihuly. The vibrant colors and intricate designs, especially in the Glasshouse and the outdoor garden, are truly mesmerizing.

3.  **Museum of Pop Culture (MoPOP):** Also in Seattle Center, MoPOP is a vibrant and interactive museum dedicated to contemporary pop culture. It features exhibits on music (Nirvana

## 💾 Step 04 - Load Vector Store

Load the ChromaDB vector store created in Chapter 3.

**What we're loading:**
- Embedding model (all-MiniLM-L6-v2)
- ChromaDB collection (travel_attractions)
- All documents with 384-dim embeddings

In [3]:
task("Loading vector store from Chapter 3...")

info("Creating embedding model...")
embeddings = create_embedding_model()
done("Embedding model created")

🚀 Loading vector store from Chapter 3...
💬 Creating embedding model...
🏁 Embedding model created


In [4]:
info("Loading ChromaDB vector store...")
vector_store = create_vector_store(
    collection_name="travel_attractions", embeddings=embeddings
)
done("Vector store loaded")

💬 Loading ChromaDB vector store...
🏁 Vector store loaded


In [5]:
info("Verifying vector store...")
test_results = search_similar_document(
    vector_store=vector_store, k=3, query="Top attractions in Seattle"
)

success("Vector store loaded!")
print(f"📊 Test search: {len(test_results)} results")

print("🔍 Top 3:")
for i, doc in enumerate(test_results, 1):
    print(f"{i}. {doc.metadata.get('name')} ({doc.metadata.get('city')})")

success("Ready for RAG!")

💬 Verifying vector store...
✅ Vector store loaded!
📊 Test search: 3 results
🔍 Top 3:
1. Seattle Center (Seattle)
2. Pike Place Market (Seattle)
3. Large Lock (Seattle)
✅ Ready for RAG!


## 🔍 Step 05 - Create Retriever

Configure the retriever's search parameters (search type and `k`).

**Key Parameters:**
- `search_type`: Type of search (similarity, mmr, similarity_score_threshold)
- `k`: Number of documents to retrieve
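
Of the search types listed above, `mmr` (maximal marginal relevance) is the least self-explanatory, so here is a small pure-Python sketch of the idea: pick documents that are relevant to the query but not redundant with those already chosen. The vectors, function names, and `lambda_mult` values are illustrative assumptions; LangChain's own implementation may differ in detail.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Return indices of k documents chosen by maximal marginal relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet).
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy 2-D vectors: documents 0 and 1 are near-duplicates, document 2 is different.
query = [1.0, 0.0]
docs = [[0.9, 0.1], [0.89, 0.11], [0.5, 0.5]]
print(mmr_select(query, docs, k=2, lambda_mult=0.3))  # favors diversity
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # pure relevance
```

With a low `lambda_mult` the near-duplicate is skipped in favor of the more distinct document; with `lambda_mult=1.0` the selection reduces to plain similarity ranking.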

In [6]:
task("Creating retriever from vector store...")

retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5})

done("Retriever created!")

🚀 Creating retriever from vector store...
🏁 Retriever created!


In [14]:
info("Testing retriever with a query...")
print()
test_query = "What are some famous museums in Seattle?"

retrieved_docs = retriever.invoke(test_query)

print("Retrieved documents structure:")
print(f"{retrieved_docs[0]}")

print()

print(f"🔍 Query: '{test_query}'")
data(f"Retrieved {len(retrieved_docs)} documents\n")

for i, doc in enumerate(retrieved_docs, 1):
    print(f"{i}. {doc.metadata.get('name', 'Unknown')}")
    print(f"City: {doc.metadata.get('city', 'Unknown')}")
    print(f"Preview: {doc.page_content[:100]}...")
    print()

success("Retriever is working correctly!")

💬 Testing retriever with a query...

Retrieved documents structure:
page_content='Name: Seattle Center
Location: Seattle Center, Belltown, Seattle, Washington, United States of America
Coordinates: 47.62156465002613, -122.35154202042389
Description: The Seattle Center is an entertainment, education, tourism and performing arts center located in the Lower Queen Anne neighborhood of Seattle, Washington, United States. Constructed for the 1962 World's Fair, the Seattle Center's landmark feature is the 605 ft (184 m) Space Needle, an official city landmark and globally recognized symbol of Seattle's skyline. Other notable attractions include Pacific Science Center, Climate Pledge Arena, and the Museum of Pop Culture (MoPOP), as well as McCaw Hall, which hosts both Seattle Opera and Pacific Northwest Ballet. The Seattle Center Monorail provides regular public transit service between the Seattle Center and Westlake Center in downtown Seattle, and is itself considered a tourist attraction.

## 📝 Step 06 - Design Prompt Template

Create an effective prompt template for RAG question-answering.

**Prompt Template Components:**
- System role: Define the assistant's behavior
- Context: Retrieved documents from vector store
- Question: User's query
- Instructions: How to use the context

**Best Practices:**
- Be specific about the assistant's role
- Instruct to use only provided context
- Handle cases where context doesn't contain the answer

In [10]:
task("Designing RAG prompt template...")

template = """You are a helpful travel assistant specializing in tourist attractions.
Use the following context to answer the question. The context contains information about various tourist attractions including their names, locations, and descriptions.
Context:
{context}
Question: {question}
Instructions:
- Answer based ONLY on the information provided in the context above
- If the context doesn't contain relevant information, say "I don't have information about that in my database"
- Be concise and helpful
- Include specific attraction names and locations when relevant
Answer:"""

prompt = PromptTemplate.from_template(template=template)

done("Prompt template created!")

info("Prompt Template Structure:")
print("=" * 70)
print(template)
print("=" * 70)
success("Prompt template is ready for RAG chain!")

🚀 Designing RAG prompt template...
🏁 Prompt template created!
💬 Prompt Template Structure:
You are a helpful travel assistant specializing in tourist attractions.
Use the following context to answer the question. The context contains information about various tourist attractions including their names, locations, and descriptions.
Context:
{context}
Question: {question}
Instructions:
- Answer based ONLY on the information provided in the context above
- If the context doesn't contain relevant information, say "I don't have information about that in my database"
- Be concise and helpful
- Include specific attraction names and locations when relevant
Answer:
✅ Prompt template is ready for RAG chain!
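
Before wiring the template into a chain, it can help to preview the text the model will actually receive. The sketch below fills a shortened stand-in template with plain `str.format`, which behaves like the f-string style placeholders used above. The snippets and the `render_prompt` helper are hypothetical and exist only for this preview.

```python
# Shortened stand-in for the chapter's template (same two placeholders).
template = (
    "Context:\n{context}\n"
    "Question: {question}\n"
    "Answer:"
)

# Hypothetical retrieved snippets, used only to preview the rendered prompt.
snippets = [
    "Name: Seattle Center\nLocation: Seattle, Washington",
    "Name: Pike Place Market\nLocation: Seattle, Washington",
]


def render_prompt(template, context_chunks, question):
    """Fill the template the way the chain would: joined context plus question."""
    context = "\n\n".join(context_chunks)
    return template.format(context=context, question=question)


rendered = render_prompt(template, snippets, "What can I see in Seattle?")
print(rendered)
print(f"Prompt length: {len(rendered)} characters")
```

Checking the rendered length this way is also a quick sanity check on how much of the model's input the retrieved context consumes as `k` grows.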


## 🔗 Step 07 - Build RAG Chain

Assemble the complete RAG pipeline using LangChain.

**RAG Chain Components:**
1. **Retriever**: Fetch relevant documents from vector store
2. **Format**: Combine documents into context
3. **Prompt**: Insert context and question into template
4. **LLM**: Generate answer using Gemini
5. **Parser**: Extract clean text response

**LangChain Expression Language (LCEL):**
- Uses `|` operator to chain components
- Data flows from left to right
- Each component transforms the data
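
The `|` chaining described above can be imitated in a few lines of plain Python, which may make the data flow easier to picture. This is a toy sketch, not LangChain's implementation: the `Step` class and the three stand-in steps are invented for illustration, and real runnables also support batching, streaming, and more.

```python
class Step:
    """Wrap a function so steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # a | b builds a new step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Toy stand-ins for prompt, model, and parser (no LangChain involved).
to_prompt = Step(lambda question: f"Question: {question}\nAnswer:")
fake_llm = Step(lambda prompt_text: {"content": prompt_text.upper()})
parse = Step(lambda message: message["content"])

chain = to_prompt | fake_llm | parse
print(chain.invoke("what is lcel?"))
```

Each step only needs to accept the previous step's output, which is why the prompt, model, and parser can be swapped independently.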

In [11]:
task("Building RAG chain...")


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

done("RAG chain built!")

🚀 Building RAG chain...
🏁 RAG chain built!
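
The first element of the chain is a plain dict, which is worth a closer look: the same question is sent to every branch, and the results are collected under the dict's keys to match the prompt's `{context}` and `{question}` placeholders. A rough sketch of that fan-out, using a hypothetical `Doc` class, `fake_retriever`, and `run_parallel` helper in place of the LangChain objects:

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a LangChain Document (only the field used here)."""

    page_content: str


def fake_retriever(question):
    # Hypothetical retriever: returns fixed documents regardless of the question.
    return [Doc("Name: Seattle Center"), Doc("Name: Pike Place Market")]


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


def passthrough(value):
    return value


def run_parallel(branches, value):
    """Feed the same input to every branch and collect results by key."""
    return {name: branch(value) for name, branch in branches.items()}


prompt_inputs = run_parallel(
    {
        "context": lambda q: format_docs(fake_retriever(q)),
        "question": passthrough,
    },
    "What can I see in Seattle?",
)
print(prompt_inputs)
```

The resulting dict has exactly the two keys the prompt template expects, so it can be passed straight to the next step.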


## ❓ Step 08 - Test Basic Q&A

Test basic question-answering functionality with simple queries.

**Test Strategy:**
- Start with simple, direct questions
- Observe how RAG retrieves and uses context
- Check if answers are based on retrieved documents
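
For the last point, a crude automated check is to measure how many of the answer's content words also occur in the retrieved context. The sketch below is a rough heuristic under simple assumptions (lowercased words of four or more letters, no stemming), with made-up helper names; it can flag obviously ungrounded answers but is no substitute for reading the output.

```python
import re


def content_words(text):
    """Lowercased words of 4+ letters, a rough proxy for meaningful terms."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def grounding_ratio(answer, context):
    """Fraction of the answer's content words that also appear in the context."""
    answer_words = content_words(answer)
    if not answer_words:
        return 0.0
    return len(answer_words & content_words(context)) / len(answer_words)


context = "The Space Needle is an observation tower built for the 1962 World's Fair."
grounded = "The Space Needle is an observation tower."
ungrounded = "Tokyo is best visited during cherry blossom season."

print(round(grounding_ratio(grounded, context), 2))
print(round(grounding_ratio(ungrounded, context), 2))
```

A ratio near 1 suggests the answer reuses the context's vocabulary, while a ratio near 0 suggests the model drew on something else.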

In [15]:
task("Testing RAG chain with basic questions...")

info("Test 1: Asking about a specific attraction")
question1 = "What is the Space Needle?"

print(f"Question: {question1}")
print("Generating answer...")

answer1 = rag_chain.invoke(question1)
print(f"Answer: {answer1}")
print("=" * 70 + "\n")

info("Test 2: General travel question")
question2 = "What are some popular tourist attractions in Seattle?"
print(f"Question: {question2}")
print("Generating answer...")
answer2 = rag_chain.invoke(question2)
print(f"Answer: {answer2}")
print("=" * 70 + "\n")

info("Test 3: Question outside our database")
question3 = "What's the best time to visit Tokyo?"
print(f"Question: {question3}")
print("Generating answer...")
answer3 = rag_chain.invoke(question3)
print(f"Answer: {answer3}")

success("Basic Q&A testing complete!")

🚀 Testing RAG chain with basic questions...
💬 Test 1: Asking about a specific attraction
Question: What is the Space Needle?
Generating answer...
Answer: The Space Needle is an observation tower located at 400 Broad Street, Seattle, WA 98109, United States of America. It is considered an icon of the city and has been designated a Seattle landmark. It was built in the Seattle Center for the 1962 World's Fair. At 605 ft (184 m) high, it offers panoramic views of the downtown Seattle skyline, the Olympic and Cascade Mountains, Mount Rainier, Mount Baker, Elliott Bay, and various islands in Puget Sound from an observation deck 520 ft (160 m) above ground.

💬 Test 2: General travel question
Question: What are some popular tourist attractions in Seattle?
Generating answer...
Answer: Some popular tourist attractions in Seattle include:

*   **Seattle Center** (Seattle Center, Belltown, Seattle, Washington), featuring the Space Needle, Pacific Science Center, Climate Pledge Arena, Mus

## 📊 Step 09 - Analyze Retrieved Context

Examine the quality and relevance of retrieved documents.

**Analysis Goals:**
- See which documents were retrieved for each question
- Check relevance scores (the values reported here are distances, so lower means more similar)
- Understand how the retriever selects documents
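
Distance values are easier to interpret once they are mapped back to cosine similarity. The sketch below assumes that the collection uses squared L2 distance and that the embeddings are unit-normalized; both are assumptions about the Chapter 3 setup, not something this chapter confirms. Under those assumptions, d² = 2 − 2·cos, so a distance of 0.9301 would correspond to a cosine similarity of roughly 0.53.

```python
import math


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    # For unit vectors the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def distance_to_cosine(distance):
    """Invert d^2 = 2 - 2*cos, valid only for unit-length vectors."""
    return 1.0 - distance / 2.0


# Toy 3-D vectors standing in for 384-dim embeddings.
query = normalize([0.2, 0.9, 0.4])
doc = normalize([0.1, 0.8, 0.6])
d = squared_l2(query, doc)
print(round(d, 4), round(cosine(query, doc), 4), round(distance_to_cosine(d), 4))
```

If the collection was created with a different distance metric, this conversion does not apply and the scores should be read on their own scale.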

In [16]:
task("Analyzing retrieved documents for our test questions...")

test_question = "What are some museums in Seattle?"

info(f"Analyzing retrieval for: '{test_question}'")

from src.rag.vector_store import search_with_score

results_with_scores = search_with_score(vector_store=vector_store, query=test_question)

print(f"Retrieved {len(results_with_scores)} documents:\n")
print("=" * 70)

for i, (doc, score) in enumerate(results_with_scores, 1):
    print(f"{i}. {doc.metadata.get('name', 'Unknown')}")
    print(f"City: {doc.metadata.get('city', 'Unknown')}")
    print(f"Similarity Score: {score:.4f} (lower = more similar)")
    print(f"Categories: {doc.metadata.get('categories', 'N/A')[:50]}...")
    print(f"Content Preview: {doc.page_content[:150]}...")

print("=" * 70)

info("RAG Chain Answer:")

answer = rag_chain.invoke(test_question)
print(f"{answer}")

success("Context analysis complete!")

🚀 Analyzing retrieved documents for our test questions...
💬 Analyzing retrieval for: 'What are some museums in Seattle?'
Retrieved 5 documents:

1. Seattle Center
City: Seattle
Similarity Score: 0.9301 (lower = more similar)
Categories: leisure, leisure.park, tourism, tourism.attraction...
Content Preview: Name: Seattle Center
Location: Seattle Center, Belltown, Seattle, Washington, United States of America
Coordinates: 47.62156465002613, -122.3515420204...
2. Large Lock
City: Seattle
Similarity Score: 1.0608 (lower = more similar)
Categories: heritage, tourism, tourism.sights...
Content Preview: Name: Large Lock
Location: Large Lock, West Commodore Way, Seattle, WA 98017, United States of America
Coordinates: 47.66522180001612, -122.3951868
De...
3. Small Lock
City: Seattle
Similarity Score: 1.0660 (lower = more similar)
Categories: heritage, tourism, tourism.sights...
Content Preview: Name: Small Lock
Location: Small Lock, West Commodore Way, Seattle, WA 98017, United States of

## 🎯 Step 10 - Test Complex Queries

Test with more complex and diverse questions.

**Query Types:**
- Multi-part questions
- Comparison questions
- Location-based queries

In [17]:
task("Testing with complex queries...")

info("Test 1: Multi-part question")
question1 = "What can I do at Seattle Center and how do I get there?"

print(f"Question: {question1}")
print("Answer:")
answer1 = rag_chain.invoke(question1)
print(answer1)

info("Test 2: Location-specific query")

question2 = "Tell me about attractions near Pike Place Market"
print(f"Question: {question2}")
print("Answer:")
answer2 = rag_chain.invoke(question2)
print(answer2)

success("Complex query testing complete!")

🚀 Testing with complex queries...
💬 Test 1: Multi-part question
Question: What can I do at Seattle Center and how do I get there?
Answer:
At Seattle Center, you can find entertainment, education, tourism, and performing arts. Specific attractions include the Space Needle, Pacific Science Center, Climate Pledge Arena, Museum of Pop Culture (MoPOP), Seattle Opera, and Pacific Northwest Ballet at McCaw Hall. You can get to Seattle Center via the Seattle Center Monorail, which provides public transit service between the Seattle Center and Westlake Center in downtown Seattle.
💬 Test 2: Location-specific query
Question: Tell me about attractions near Pike Place Market
Answer:
Attractions near Pike Place Market include the Gum Wall, located beneath Pike Place Market on Post Alley near Pike Street, and Rachel the Piggy Bank, an outdoor bronze sculpture located at Pike Place Market on Pike Place.
✅ Complex query testing complete!


## 📋 Step 11 - Chapter Summary

### 🎉 What We Accomplished

In this chapter, we successfully built a complete RAG (Retrieval-Augmented Generation) question-answering system.

**Key Components Built:**
1. ✅ **LLM Integration**: Google Gemini 2.5 Flash
2. ✅ **Vector Store**: Loaded ChromaDB with 62 Seattle attractions
3. ✅ **Retriever**: Configured similarity search (k=5)
4. ✅ **Prompt Template**: Designed effective RAG prompt
5. ✅ **RAG Chain**: Assembled complete pipeline using LCEL

**Testing Results:**
- ✅ Basic Q&A: Accurate answers for simple questions
- ✅ Complex Queries: Handled multi-part questions successfully
- ✅ Out-of-scope: Correctly refused questions outside database
- ✅ Location Queries: Found nearby attractions effectively

**Pipeline Flow:**

User Question → Retriever → Context → Prompt → LLM → Answer


### 📊 Performance Evaluation

**Strengths:**
- Natural language responses
- Accurate information from database
- Handles complex questions
- Declines to answer when the context lacks the information, as the prompt instructs

**Areas for Improvement:**
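- Retrieval accuracy could be better: for the museum query in Step 09, Large Lock and Small Lock (categories: heritage, sights) ranked second and third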
- Limited to Seattle data only
- No conversation memory yet