# Week 2 Video 2: Structured Outputs with Grounding Context

## Overview

This notebook demonstrates **grounding** in RAG systems: the practice of linking generated answers back to their source documents. This is crucial for:

- **Transparency**: Users can see which products influenced the answer
- **Verification**: Users can validate the AI's reasoning
- **Trust**: Providing sources increases confidence in recommendations
- **Debugging**: Developers can trace why certain products were chosen

## Key Concepts

### What is Grounding?

Grounding means returning not just the answer, but also **references** to the source documents used. In our case, each reference contains:
- The product ID (ASIN) of a product that was used
- A short description of that product

This allows users to "click through" to see the original products.
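
As a rough illustration of the shape of a grounded response, the sketch below uses plain dataclasses as stand-ins for the Pydantic models defined later in this notebook. The class names `Reference` and `GroundedAnswer`, the ASIN, and the product text are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    """Hypothetical stand-in for one source product."""
    id: str           # product ASIN
    description: str  # short summary shown to the user


@dataclass
class GroundedAnswer:
    """Hypothetical stand-in for an answer plus its sources."""
    answer: str
    references: list[Reference] = field(default_factory=list)


# Made-up data, only to show the shape of a grounded response
response = GroundedAnswer(
    answer="The 6ft USB-C cable is a good fit.",
    references=[Reference(id="B000000001", description="6ft USB-C charging cable")],
)

# Each reference carries the ID a UI could use for a click-through link
cited_ids = [ref.id for ref in response.references]
print(cited_ids)
```

The answer text and its sources travel together in one object, so a caller never has to guess which products the answer was based on.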

### Building on Video 1

Video 1 showed basic structured outputs with `RAGGenerationResponse(answer: str)`. Now we extend this to include a `references` field containing a list of `RAGUsedContext` objects.

### The Challenge

The LLM must:
1. Answer the question
2. Identify which products it actually used
3. Provide concise descriptions of those products

This requires careful prompt engineering to guide the model.

## Import

In [None]:
import openai
import instructor
from qdrant_client import QdrantClient
from pydantic import BaseModel, Field


# Part 1: Baseline RAG Pipeline (From Video 1)

This section demonstrates the basic RAG pipeline from Video 1 using structured outputs.

**Key Features:**
- Simple `RAGGenerationResponse` with just an `answer` field
- Basic prompt without grounding requirements
- Returns the answer as a validated Pydantic model

**What's Missing:**
- No references to source products
- Can't verify which products influenced the answer
- Users must trust the AI without being able to check sources

This baseline helps us understand what we're improving in Part 2.
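
To make the baseline concrete, here is a small sketch of the text the LLM receives after retrieval. It mirrors the formatting done by `process_context` in the next cell, but runs on invented sample data (the IDs, ratings, and descriptions below are made up), so it needs neither Qdrant nor OpenAI. The name `format_products` is hypothetical.

```python
def format_products(context):
    """Format retrieved products as one bullet line per product."""
    lines = []
    for item_id, chunk, rating in zip(
        context["retrieved_context_ids"],
        context["retrieved_context"],
        context["retrieved_context_ratings"],
    ):
        lines.append(f"- ID: {item_id}, rating: {rating}, description: {chunk}\n")
    return "".join(lines)


# Invented sample data in the same dict layout that retrieve_data() returns
sample_context = {
    "retrieved_context_ids": ["B000000001", "B000000002"],
    "retrieved_context": ["6ft USB-C charging cable", "Wireless earbuds"],
    "retrieved_context_ratings": [4.5, 4.1],
}

formatted = format_products(sample_context)
print(formatted)
```

The baseline answer is generated from this text, but nothing in the response records which of these lines the model relied on. That is the gap Part 2 closes.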

In [None]:
client = instructor.from_openai(openai.OpenAI())

class RAGGenerationResponse(BaseModel):
    answer: str = Field(description="The answer to the question")


def get_embedding(text, model="text-embedding-3-small"):
    """
    Convert text to 1536-dimensional embedding vector using OpenAI.

    Args:
        text: Input text to embed (query or document)
        model: OpenAI embedding model name

    Returns:
        List of floats representing the embedding vector
    """
    response = openai.embeddings.create(
        input=text,
        model=model,
    )

    return response.data[0].embedding


def retrieve_data(query, qdrant_client, k=5):
    """
    Retrieve top-k most relevant products from Qdrant vector database.

    Args:
        query: User's question text
        qdrant_client: Connected Qdrant client instance
        k: Number of similar products to retrieve

    Returns:
        Dict with retrieved_context_ids, retrieved_context,
        retrieved_context_ratings, and similarity_scores
    """
    # Convert query to embedding for semantic search
    query_embedding = get_embedding(query)

    # Search Qdrant for nearest neighbors using cosine similarity
    results = qdrant_client.query_points(
        collection_name="Amazon-items-collection-00",
        query=query_embedding,
        limit=k,
    )

    # Extract metadata from matching results
    retrieved_context_ids = []
    retrieved_context = []
    similarity_scores = []
    retrieved_context_ratings = []

    for result in results.points:
        retrieved_context_ids.append(result.payload["parent_asin"])
        retrieved_context.append(result.payload["description"])
        retrieved_context_ratings.append(result.payload["average_rating"])
        similarity_scores.append(result.score)

    return {
        "retrieved_context_ids": retrieved_context_ids,
        "retrieved_context": retrieved_context,
        "retrieved_context_ratings": retrieved_context_ratings,
        "similarity_scores": similarity_scores,
    }


def process_context(context):
    """
    Format retrieved product data into readable text for LLM.

    Args:
        context: Dict with retrieved_context_ids, retrieved_context,
                and retrieved_context_ratings

    Returns:
        Formatted string with product information
    """
    formatted_context = ""

    # Use item_id rather than id to avoid shadowing the built-in id()
    for item_id, chunk, rating in zip(
        context["retrieved_context_ids"],
        context["retrieved_context"],
        context["retrieved_context_ratings"],
    ):
        formatted_context += f"- ID: {item_id}, rating: {rating}, description: {chunk}\n"

    return formatted_context


def build_prompt(preprocessed_context, question):
    """
    Construct the final prompt sent to the language model.

    Args:
        preprocessed_context: Formatted string of retrieved products
        question: User's original question

    Returns:
        Complete prompt with system instructions, context, and question
    """
    prompt = f"""
You are a shopping assistant that can answer questions about the products in stock.

You will be given a question and a list of context.

Instructions:
- You need to answer the question based on the provided context only.
- Never use the word 'context'; refer to it as the available products.

Context:
{preprocessed_context}

Question:
{question}
"""

    return prompt


def generate_answer(prompt):
    """
    Generate structured answer using instructor library for type-safe responses.

    **KEY CHANGE FROM WEEK 1:** Now returns structured Pydantic model instead of raw string.

    Args:
        prompt: Complete prompt with instructions, context, and question

    Returns:
        RAGGenerationResponse Pydantic model with validated answer field

    Note:
        - Must use instructor client, NOT raw openai client
        - create_with_completion() returns (model, raw_response) tuple
        - Response IS the Pydantic model, not nested in .choices array
    """
    # CRITICAL: Create instructor client for structured outputs
    # This patches the OpenAI client to support response_model parameter
    instructor_client = instructor.from_openai(openai.OpenAI())

    # Use instructor's create_with_completion for structured output
    # Returns both the validated Pydantic model and raw API response
    response, raw_response = instructor_client.chat.completions.create_with_completion(
        model="gpt-4.1-mini",
        messages=[{"role": "system", "content": prompt}],
        temperature=0,
        response_model=RAGGenerationResponse,  # Pydantic model enforces structure
    )

    # response is already a RAGGenerationResponse object - no need to extract from .choices
    return response


def rag_pipeline(question, top_k=5):
    """
    Complete RAG pipeline with structured outputs.

    **ENHANCED IN WEEK 2:** Now returns structured Pydantic models instead of raw text.

    Args:
        question: User's question about products
        top_k: Number of products to retrieve (default: 5)

    Returns:
        Dict with:
            - datamodel: RAGGenerationResponse Pydantic model
            - answer: Extracted answer string (for convenience)
            - question: Original user query
            - retrieved_context_ids: ASINs of all retrieved products
            - retrieved_context: Descriptions of all retrieved products
            - similarity_scores: Relevance scores for each product
    """
    # Initialize Qdrant client
    qdrant_client = QdrantClient(url="http://localhost:6333")

    # Step 1: Retrieve relevant products from vector database
    retrieved_context = retrieve_data(question, qdrant_client, top_k)

    # Step 2: Format products into readable text
    preprocessed_context = process_context(retrieved_context)

    # Step 3: Build complete prompt with instructions and context
    prompt = build_prompt(preprocessed_context, question)

    # Step 4: Generate structured answer using instructor (NEW IN WEEK 2)
    answer = generate_answer(prompt)

    # Step 5: Package result with both structured model and metadata
    final_result = {
        "datamodel": answer,  # Full Pydantic model
        "answer": answer.answer,  # Extracted answer string for convenience
        "question": question,
        "retrieved_context_ids": retrieved_context["retrieved_context_ids"],
        "retrieved_context": retrieved_context["retrieved_context"],
        "similarity_scores": retrieved_context["similarity_scores"],
    }

    return final_result

In [None]:
qdrant_client = QdrantClient(url="http://localhost:6333")


In [None]:
# WRONG: Don't pass qdrant_client. The function creates its own client,
# and a second positional argument would be interpreted as top_k.
# output = rag_pipeline(
#     "Can I get a charging cable? Please suggest me a good one.", qdrant_client
# )

# CORRECT: Only pass the question (and optionally top_k)
output = rag_pipeline(
    "Can I get a charging cable? Please suggest me a good one."
)

In [None]:
output


In [None]:
print(output["answer"])


# Part 2: RAG Pipeline with Grounding Context

## What's New in This Section

This section extends the RAG pipeline to include **grounding** - returning references to the source products used in the answer.

### Nested Pydantic Models

We now have **two** Pydantic models working together:

1. **`RAGUsedContext`**: Represents a single product reference
   - `id`: The product ASIN (Amazon Standard Identification Number)
   - `description`: A concise 1-2 sentence summary of the product

2. **`RAGGenerationResponse`**: The complete structured output
   - `answer`: The full answer to the user's question
   - `references`: A list of `RAGUsedContext` objects

### Why Nested Models?

Nested models allow us to represent **complex, structured data**:
- The outer model (`RAGGenerationResponse`) represents the complete response
- The inner model (`RAGUsedContext`) represents each individual reference
- instructor validates the entire nested structure automatically
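
A minimal sketch of what that validation means in practice, using simplified copies of the two models (without `Field` descriptions) and an invented payload shaped like the JSON an LLM might return. No API call is made; this only exercises Pydantic.

```python
from pydantic import BaseModel, ValidationError


class RAGUsedContext(BaseModel):
    id: str
    description: str


class RAGGenerationResponse(BaseModel):
    answer: str
    references: list[RAGUsedContext]


# Invented payload, only to show the shape
good_payload = {
    "answer": "The 6ft USB-C cable is a good fit.",
    "references": [{"id": "B000000001", "description": "6ft USB-C charging cable"}],
}
parsed = RAGGenerationResponse(**good_payload)

# Nested dicts are converted into RAGUsedContext objects
print(type(parsed.references[0]).__name__)

# A reference missing its description fails validation of the whole response
bad_payload = {"answer": "...", "references": [{"id": "B000000001"}]}
try:
    RAGGenerationResponse(**bad_payload)
    validation_failed = False
except ValidationError:
    validation_failed = True
print(validation_failed)
```

One malformed inner object is enough to reject the whole outer response, which is why a single `response_model` covers both levels.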

### Enhanced Prompt Engineering

The prompt in `build_prompt()` now explicitly instructs the LLM to:
- Identify which products were actually used in the answer
- Only include products that contributed to the response (not all retrieved products)
- Provide concise descriptions with product names
- Format the answer with detailed specifications in bullet points

This is a key insight: **Structured outputs require structured prompts**. The more specific your prompt, the better the structured output.
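
The `Field` descriptions matter here too. They become part of the JSON schema that Pydantic generates, and structured-output libraries such as instructor generally send that schema to the model alongside the prompt. The sketch below only inspects the schema locally; the description wording is illustrative, not the notebook's exact text.

```python
from pydantic import BaseModel, Field


class RAGUsedContext(BaseModel):
    id: str = Field(description="ID (ASIN) of a product used in the answer")
    description: str = Field(description="Short description that names the product")


class RAGGenerationResponse(BaseModel):
    answer: str = Field(description="The answer to the question")
    references: list[RAGUsedContext] = Field(
        description="Only the products actually used in the answer"
    )


# Pydantic v2 exposes model_json_schema(); v1 uses schema(). Support both.
schema_fn = (
    getattr(RAGGenerationResponse, "model_json_schema", None)
    or RAGGenerationResponse.schema
)
schema = schema_fn()

# Print the description attached to each top-level field
for name, spec in schema["properties"].items():
    print(f"{name}: {spec['description']}")
```

A vague or garbled description therefore weakens the guidance the model gets, even when the prompt itself is precise.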

### Key Differences from Part 1

| Aspect | Part 1 (Baseline) | Part 2 (With Grounding) |
|--------|-------------------|-------------------------|
| Pydantic Model | Simple (1 field) | Nested (2 models) |
| Prompt | Basic instructions | Explicit grounding requirements |
| Return Value | Just answer string | Answer + references list |
| Transparency | None | Full source attribution |
| Use Case | Quick answers | Production systems requiring verification |

## Define Pydantic Models

In [None]:
# Define nested Pydantic models for grounded structured outputs

class RAGUsedContext(BaseModel):
    """
    Represents a single product reference (grounding context).
    
    This model captures information about one product that was used
    to generate the answer. Multiple instances of this model will be
    returned in the references list.
    
    Fields:
        id: The product ASIN (Amazon Standard Identification Number)
            - Unique identifier for the product
            - Can be used to link back to full product details
            - Example: "B09X12ABC"
        
        description: Short human-readable summary (1-2 sentences)
            - Should include the product name
            - Brief description of what the product is
            - Example: "GREPHONE USB-C to Lightning Cable - 6ft MFi certified charging cable"
    
    Why this structure?
        - Balances detail vs. brevity
        - ID enables programmatic lookup
        - Description enables human understanding
    """
    id: str = Field(description="The ID (ASIN) of the item used to answer the question")
    description: str = Field(description="Short description of the item used to answer the question")


class RAGGenerationResponse(BaseModel):
    """
    Complete structured response with grounding context.
    
    This is the TOP-LEVEL model that instructor will validate and return.
    It contains both the answer and the list of source references.
    
    Fields:
        answer: The full natural language answer to the user's question
            - Should be detailed and informative
            - Should reference products by name
            - Should include specifications in bullet points
        
        references: List of RAGUsedContext objects (grounding)
            - CRITICAL: Only include products actually used in the answer
            - Not all retrieved products will be used
            - The LLM must decide which products are relevant
            - Each reference links the answer back to its source
    
    Why nested structure?
        - Instructor automatically validates nested Pydantic models
        - Type-safe: references must be list[RAGUsedContext]
        - Self-documenting: structure makes relationship clear
        - Enables rich tooling: IDEs can autocomplete fields
    
    Common Pitfall:
        Don't confuse retrieved_context (all products fetched from DB)
        with references (only products used in answer). The LLM performs
        this filtering based on relevance to the specific question.
    """
    answer: str = Field(description="The answer to the question")
    references: list[RAGUsedContext] = Field(description="List of items used to generate the answer")

In [None]:
# ===== RETRIEVAL FUNCTIONS =====
# These functions are the same as Part 1 - grounding happens at the prompt/model level

def retrieve_data(query, qdrant_client, k=5):
    """
    Retrieve top-k most relevant products from vector database.
    
    This function is UNCHANGED from Part 1. Grounding doesn't change retrieval,
    it changes what we do with the retrieved data.
    
    Args:
        query: User's question text
        qdrant_client: Connected Qdrant instance
        k: Number of products to retrieve from vector DB
    
    Returns:
        Dict with lists of IDs, descriptions, ratings, and similarity scores
    """
    # Convert text query to 1536-dim embedding vector (same semantic space as stored products)
    query_embedding = get_embedding(query)
    
    # Search Qdrant using cosine similarity to find k nearest neighbors
    results = qdrant_client.query_points(
        collection_name="Amazon-items-collection-00",  # Collection from Week 1 preprocessing
        query=query_embedding,
        limit=k,  # Retrieve top-k most similar products
    )
    
    # Initialize lists to store extracted product metadata
    retrieved_context_ids = []
    retrieved_context = []
    similarity_scores = []
    retrieved_context_ratings = []
    
    # Extract metadata from each result point
    for result in results.points:
        retrieved_context_ids.append(result.payload["parent_asin"])  # Product ID
        retrieved_context.append(result.payload["description"])  # Full description
        retrieved_context_ratings.append(result.payload["average_rating"])  # Customer rating
        similarity_scores.append(result.score)  # Cosine similarity score (higher = more similar)
    
    # Return structured dict (NOT a Pydantic model - this is internal data structure)
    return {
        "retrieved_context_ids": retrieved_context_ids,
        "retrieved_context": retrieved_context,
        "retrieved_context_ratings": retrieved_context_ratings,
        "similarity_scores": similarity_scores,
    }


def process_context(context):
    """
    Format retrieved products into text for the LLM prompt.
    
    This function is UNCHANGED from Part 1. It converts structured data
    into a formatted string that the LLM can parse.
    
    Args:
        context: Dict from retrieve_data() with product information
    
    Returns:
        Multi-line string with one product per line
        Example: "- ID: B09X12ABC, rating: 4.5, description: USB cable..."
    """
    formatted_context = ""
    
    # Build formatted string with bullet points for each product
    # zip() iterates through three lists in parallel
    # Use item_id rather than id to avoid shadowing the built-in id()
    for item_id, chunk, rating in zip(
        context["retrieved_context_ids"],
        context["retrieved_context"],
        context["retrieved_context_ratings"],
    ):
        formatted_context += f"- ID: {item_id}, rating: {rating}, description: {chunk}\n"
    
    return formatted_context


# ===== PROMPT ENGINEERING FOR GROUNDING =====
# This is where the magic happens - the prompt tells the LLM to return grounded references

def build_prompt(preprocessed_context, question):
    """
    Build enhanced prompt that explicitly requests grounding context.
    
    **KEY CHANGE FROM PART 1**: This prompt now instructs the LLM to:
    1. Identify which products were ACTUALLY used (not all retrieved products)
    2. Return product IDs for those products
    3. Provide concise descriptions
    
    This is a critical insight: Structured outputs require structured prompts.
    The Pydantic model defines the STRUCTURE, but the prompt guides the CONTENT.
    
    Args:
        preprocessed_context: Formatted string with all retrieved products
        question: User's original question
    
    Returns:
        Complete prompt with instructions for grounded structured output
    """
    prompt = f"""
You are a shopping assistant that can answer questions about the products in stock.

You will be given a question and a list of context.

Instructions:
- You need to answer the question based on the provided context only.
- Never use the word 'context'; refer to it as the available products.
- As an output you need to provide:

* The answer to the question based on the provided context.
* The list of the IDs of the chunks that were used to answer the question. Only return the ones that are used in the answer.
* Short description (1-2 sentences) of the item based on the description provided in the context.

- The short description should have the name of the item.
- The answer to the question should contain detailed information about the product and returned with detailed specification in bullet points.

Context:
{preprocessed_context}

Question:
{question}
"""
    return prompt


# ===== GENERATION WITH INSTRUCTOR =====
# Instructor enforces the nested Pydantic structure

def generate_answer(prompt):
    """
    Generate STRUCTURED answer with grounding using instructor library.
    
    **UNCHANGED FROM PART 1** - instructor handles nested models automatically!
    The only change is that RAGGenerationResponse now has a references field.
    
    Args:
        prompt: Complete prompt with grounding instructions
    
    Returns:
        RAGGenerationResponse with answer AND references fields populated
    
    How instructor works:
        1. Sends prompt to OpenAI with response_model=RAGGenerationResponse
        2. OpenAI generates JSON matching the Pydantic schema
        3. instructor validates the JSON against the schema
        4. Returns a fully validated Pydantic object
        5. Raises an error if the JSON still doesn't match the schema
           (after any retries instructor is configured to make)
    """
    # Create instructor-patched client (enables response_model parameter)
    instructor_client = instructor.from_openai(openai.OpenAI())
    
    # Generate structured output with automatic validation
    # create_with_completion returns (validated_model, raw_api_response)
    response, raw_response = instructor_client.chat.completions.create_with_completion(
        model="gpt-4.1-mini",
        messages=[{"role": "system", "content": prompt}],
        temperature=0,  # Minimizes sampling randomness (outputs may still vary slightly)
        response_model=RAGGenerationResponse,  # Nested Pydantic model (with references!)
    )
    
    # response is already a RAGGenerationResponse object with validated fields:
    # - response.answer (str)
    # - response.references (list[RAGUsedContext])
    return response


# ===== ORCHESTRATION =====
# The main pipeline function that ties everything together

def rag_pipeline(question, top_k=5):
    """
    Complete RAG pipeline with GROUNDING CONTEXT.
    
    **ENHANCED FROM PART 1**: Now returns references field with source attribution.
    
    Pipeline Flow:
        1. Retrieve top-k products from Qdrant (semantic search)
        2. Format products into text prompt
        3. Build enhanced prompt with grounding instructions
        4. Generate structured answer with references using instructor
        5. Package result with both answer and source attribution
    
    Args:
        question: User's question about products
        top_k: Number of products to retrieve (default: 5)
    
    Returns:
        Dict with:
            - original_output: Full RAGGenerationResponse Pydantic model
            - answer: Extracted answer string (convenience field)
            - references: Extracted references list (NEW IN VIDEO 2!)
            - question: Echo back user query
            - retrieved_context_ids: ALL products retrieved from DB
            - retrieved_context: ALL product descriptions
            - similarity_scores: Relevance scores for retrieved products
    
    Key Insight:
        retrieved_context_ids (all top_k retrieved products) != references
        (only the products cited in the answer, e.g. 2-3).
        The LLM decides which retrieved products to cite, based on their
        relevance to the question.
    
    COMMON ERROR:
        DO NOT pass qdrant_client as an argument!
        Function signature: rag_pipeline(question, top_k=5)
        The function creates its own QdrantClient internally.
        
        WRONG: rag_pipeline("question", qdrant_client, top_k=10)
        RIGHT: rag_pipeline("question", top_k=10)
    """
    # Initialize Qdrant client (connects to local instance on port 6333)
    qdrant_client = QdrantClient(url="http://localhost:6333")
    
    # Step 1: Retrieve top-k products from vector database
    retrieved_context = retrieve_data(question, qdrant_client, top_k)
    
    # Step 2: Format products into readable text for LLM
    preprocessed_context = process_context(retrieved_context)
    
    # Step 3: Build prompt with grounding instructions
    prompt = build_prompt(preprocessed_context, question)
    
    # Step 4: Generate structured answer WITH REFERENCES using instructor
    answer = generate_answer(prompt)  # Returns RAGGenerationResponse with answer + references
    
    # Step 5: Package complete result
    final_result = {
        "original_output": answer,  # Full Pydantic model (access with .answer and .references)
        "answer": answer.answer,  # Extracted answer string for convenience
        "references": answer.references,  # NEW! List of RAGUsedContext objects
        "question": question,  # Echo back for logging
        "retrieved_context_ids": retrieved_context["retrieved_context_ids"],  # All retrieved (for comparison)
        "retrieved_context": retrieved_context["retrieved_context"],  # All descriptions
        "similarity_scores": retrieved_context["similarity_scores"],  # Relevance scores
    }
    
    return final_result

In [None]:
# WRONG: Missing closing parenthesis and passing qdrant_client
# result = rag_pipeline("Can I get some earphones?", qdrant_client, top_k=10

# CORRECT: Only pass question and top_k parameter
result = rag_pipeline("Can I get some earphones?", top_k=10)

In [None]:
result


In [None]:
print(result["answer"])

## Accessing the Grounding Context

The key benefit of grounding is **transparency**. Let's explore the structured output and see how references differ from retrieved context.

In [None]:
# Access the references field (NEW IN VIDEO 2!)
# This contains ONLY the products that were actually used in the answer
print(f"Number of products USED in answer: {len(result['references'])}")
print(f"Number of products RETRIEVED from DB: {len(result['retrieved_context_ids'])}")
print("\n" + "="*80)
print("GROUNDING CONTEXT (Products actually used):")
print("="*80 + "\n")

for i, ref in enumerate(result['references'], 1):
    print(f"{i}. Product ID: {ref.id}")
    print(f"   Description: {ref.description}")
    print()

### Key Insight: Retrieval vs. Usage

Notice that the number of **retrieved** products (10 in this run, because `top_k=10`) is usually larger than the number **used** in the answer (e.g., 2-3).

**Why this matters:**
- Not all retrieved products are relevant to every question
- The LLM intelligently filters based on the specific question
- Grounding shows which products actually influenced the answer
- This enables verification: users can check if the answer is justified
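
The Pydantic schema guarantees that each reference has an `id` and a `description`, but not that the ID belongs to a product that was actually retrieved. A minimal sketch of such a check follows, assuming a hypothetical helper `check_references` that is not part of the notebook; the IDs are invented.

```python
def check_references(reference_ids, retrieved_ids):
    """Compare cited IDs against retrieved IDs.

    Returns which citations are grounded in the retrieved set, which are
    unknown (never retrieved), and which retrieved products went unused.
    """
    retrieved = set(retrieved_ids)
    cited = set(reference_ids)
    return {
        "grounded": [rid for rid in reference_ids if rid in retrieved],
        "unknown": [rid for rid in reference_ids if rid not in retrieved],
        "unused": [rid for rid in retrieved_ids if rid not in cited],
    }


# Invented IDs: three retrieved products, two citations, one never retrieved
retrieved_ids = ["B000000001", "B000000002", "B000000003"]
reference_ids = ["B000000002", "B000000009"]

report = check_references(reference_ids, retrieved_ids)
print(report)
```

With the pipeline output from this notebook, the call would look like `check_references([ref.id for ref in result["references"]], result["retrieved_context_ids"])`. A non-empty `unknown` list would suggest the model cited an ID it was never shown.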

**Production Use Cases:**
- E-commerce: "Show me the products you based this recommendation on"
- Legal: "Show me the case law citations"
- Medical: "Show me the research papers supporting this advice"
- Customer support: "Show me the KB articles referenced"