# Assignment 2: RAG-Enhanced Pricing Agent with Historical Knowledge

## Objective
Add factual grounding to our pricing agent using a **pricing knowledge base** with historical data, category benchmarks, and proven pricing strategies.

## Requirements
**RAG Knowledge Base Includes:**
- Category-level elasticity benchmarks
- Historical margins
- 20-50 sample past pricing decisions
- Competitor trends
- Price recommendation guidelines

**Agent Behavior:**
- Pulls past examples
- Justifies price using retrieved data
- Reduces hallucination compared to Iteration 1

## Setup & Dependencies

Install the required packages for RAG implementation.

In [1]:
# Install required packages for RAG
!pip install -q langchain langchain-groq langchain-community langchain-text-splitters
!pip install -q chromadb sentence-transformers
!pip install -q pandas numpy

In [2]:
# Import required libraries
from langchain_groq import ChatGroq
from langchain_core.prompts import PromptTemplate
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_core.documents import Document
import os
import getpass
import pandas as pd
import json
from typing import List, Dict, Any, Sequence
from dataclasses import dataclass



In [3]:
# Set up your Groq API key
print("Please enter your Groq API key:")
print("(You can get one free at: https://console.groq.com/)")
groq_api_key = getpass.getpass("Groq API Key: ")
os.environ["GROQ_API_KEY"] = groq_api_key
print("API key set successfully!")

Please enter your Groq API key:
(You can get one free at: https://console.groq.com/)
Groq API Key: ··········
API key set successfully!


## Create Pricing Knowledge Base

Create a comprehensive pricing knowledge base with historical data and benchmarks.

In [4]:
# TODO: Create pricing knowledge base data structures
# Create the following data structures:

# 1. Category-level elasticity benchmarks
# HINT: elasticity_benchmarks = [ {"category": "Electronics", "elasticity": "High", ...} ]
elasticity_benchmarks = [
    # TODO: Add at least 4 categories with elasticity data
    {"category": "Electronics", "elasticity": "High"},
    {"category": "Home & Garden", "elasticity": "Medium"},
    {"category": "Fashion", "elasticity": "High"},
    {"category": "Automotive", "elasticity": "Low"},
    {"category": "Groceries", "elasticity": "Low"}
]

# 2. Historical margin data by category
# HINT: historical_margins = [ {"category": "Electronics", "avg_margin": "12-18%", ...} ]
historical_margins = [
    # TODO: Add margin data for each category
    {"category": "Electronics", "avg_margin": "15-20%"},
    {"category": "Home & Garden", "avg_margin": "25-35%"},
    {"category": "Fashion", "avg_margin": "40-60%"},
    {"category": "Automotive", "avg_margin": "30-45%"},
    {"category": "Groceries", "avg_margin": "5-10%"}
]

# 3. Sample past pricing decisions (20-50 examples)
# HINT: Include product, category, cost, recommended_price, margin, competitor_price, outcome, reasoning
pricing_decisions = [
    # TODO: Add at least 7 pricing decision examples (expand toward the 20-50 target in the requirements)
    {
        "product": "Wireless Noise-Canceling Headphones", "category": "Electronics", "cost": 50.00, "recommended_price": 79.99, "margin": "37.5%", "competitor_price": 85.00, "outcome": "High Sales Volume", "reasoning": "Priced slightly below market leader to capture share; high elasticity category."
    },
    {
        "product": "Winter Parka Jacket", "category": "Fashion", "cost": 45.00, "recommended_price": 129.99, "margin": "65.4%", "competitor_price": 140.00, "outcome": "Successful Season", "reasoning": "Seasonal demand allows for higher margins; competitive pricing still maintained."
    },
    {
        "product": "Ceramic Brake Pads", "category": "Automotive", "cost": 18.00, "recommended_price": 49.99, "margin": "64%", "competitor_price": 45.00, "outcome": "Stable Sales", "reasoning": "Low elasticity; customers prioritize quality/safety over small price differences."
    },
    {
        "product": "Organic Coffee Beans (1lb)", "category": "Groceries", "cost": 8.50, "recommended_price": 14.99, "margin": "43.3%", "competitor_price": 15.50, "outcome": "High Turnover", "reasoning": "Competitive staple product; slight undercut drives volume."
    },
    {
        "product": "Smart LED Bulb", "category": "Home & Garden", "cost": 6.00, "recommended_price": 12.99, "margin": "53.8%", "competitor_price": 14.00, "outcome": "Moderate Growth", "reasoning": "Entry-level smart home product; pricing for adoption."
    },
    {
        "product": "4K Gaming Monitor", "category": "Electronics", "cost": 200.00, "recommended_price": 299.99, "margin": "33.3%", "competitor_price": 320.00, "outcome": "Best Seller", "reasoning": "Aggressive pricing in competitive segment to clear inventory."
    },
    {
        "product": "Leather Handbag", "category": "Fashion", "cost": 80.00, "recommended_price": 250.00, "margin": "68%", "competitor_price": 280.00, "outcome": "High Profitability", "reasoning": "Premium positioning; margin prioritized over volume."
    }
]

# 4. Price recommendation guidelines
# HINT: Include rules for high elasticity, competitive positioning, etc.
pricing_guidelines = [
    # TODO: Add pricing guidelines and rules
    "For High Elasticity products (e.g., Electronics, Fashion), price within 5% of key competitors to maintain market share.",
    "For Low Elasticity products (e.g., Automotive parts, Essentials), prioritize margin; pricing 10-15% above competitors is acceptable if quality is differentiated.",
    "Ensure a minimum gross margin of 20% on all hardware products unless in clearance.",
    "Fashion items should target 50%+ margin to account for end-of-season markdowns.",
    "If competitor stock is low (stockout), increase price by 5-10% to capture demand surplus.",
    "Avoid price wars on commoditized goods; focus on value-add or bundle pricing instead."
]

print("Created knowledge base with:")
print(f"- {len(elasticity_benchmarks)} category elasticity benchmarks")
print(f"- {len(historical_margins)} historical margin references")
print(f"- {len(pricing_decisions)} past pricing decisions")
print(f"- {len(pricing_guidelines)} pricing guidelines")

Created knowledge base with:
- 5 category elasticity benchmarks
- 5 historical margin references
- 7 past pricing decisions
- 6 pricing guidelines
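
The `margin` strings in `pricing_decisions` appear to be gross margin on the selling price, i.e. (price - cost) / price. The sketch below is one way to check that assumption; `gross_margin_pct` and `margin_mismatches` are hypothetical helpers, not part of the assignment, and the three sample rows are copied from the list above.

```python
def gross_margin_pct(cost: float, price: float) -> float:
    """Gross margin as a percentage of the selling price."""
    if price <= 0:
        raise ValueError("price must be positive")
    return (price - cost) / price * 100


# Three entries copied from pricing_decisions (only the fields needed here)
samples = [
    {"product": "Wireless Noise-Canceling Headphones", "cost": 50.00,
     "recommended_price": 79.99, "margin": "37.5%"},
    {"product": "Ceramic Brake Pads", "cost": 18.00,
     "recommended_price": 49.99, "margin": "64%"},
    {"product": "Leather Handbag", "cost": 80.00,
     "recommended_price": 250.00, "margin": "68%"},
]


def margin_mismatches(decisions, tolerance=0.1):
    """Return entries whose stored margin is more than `tolerance` points off the computed one."""
    bad = []
    for d in decisions:
        stored = float(d["margin"].rstrip("%"))
        computed = gross_margin_pct(d["cost"], d["recommended_price"])
        if abs(stored - computed) > tolerance:
            bad.append((d["product"], stored, round(computed, 1)))
    return bad


print(margin_mismatches(samples))  # an empty list means the stored margins are consistent
```

Running the same check over the full `pricing_decisions` list before indexing helps keep inconsistent numbers out of the retrieved context.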


## Create Vector Store for RAG

Convert our knowledge base into searchable documents and create embeddings.

In [5]:
def create_pricing_documents() -> List[Document]:
    """Convert pricing knowledge into searchable documents"""
    documents = []

    # TODO: Add elasticity benchmarks to documents
    # HINT: Create Document objects with page_content and metadata
    for benchmark in elasticity_benchmarks:
        # TODO: Format content and create Document
        content = f"Category: {benchmark['category']}\nElasticity: {benchmark['elasticity']}"
        metadata = {"type": "elasticity_benchmark", "category": benchmark['category']}
        documents.append(Document(page_content=content, metadata=metadata))

    # TODO: Add historical margins to documents
    for margin in historical_margins:
        # TODO: Format content and create Document
        content = f"Category: {margin['category']}\nAverage Margin: {margin['avg_margin']}"
        metadata = {"type": "historical_margin", "category": margin['category']}
        documents.append(Document(page_content=content, metadata=metadata))

    # TODO: Add pricing decisions to documents
    for decision in pricing_decisions:
        # TODO: Format content and create Document with all fields
        content = (
            f"Product: {decision['product']}\n"
            f"Category: {decision['category']}\n"
            f"Cost: {decision['cost']}\n"
            f"Recommended Price: {decision['recommended_price']}\n"
            f"Margin: {decision['margin']}\n"
            f"Competitor Price: {decision['competitor_price']}\n"
            f"Outcome: {decision['outcome']}\n"
            f"Reasoning: {decision['reasoning']}"
        )
        metadata = {"type": "pricing_decision", "category": decision['category'], "product": decision['product']}
        documents.append(Document(page_content=content, metadata=metadata))

    # TODO: Add pricing guidelines to documents
    for guideline in pricing_guidelines:
        # TODO: Format content and create Document
        content = guideline
        metadata = {"type": "pricing_guideline"}
        documents.append(Document(page_content=content, metadata=metadata))

    return documents

# Create documents
pricing_docs = create_pricing_documents()
print(f"Created {len(pricing_docs)} searchable documents")

Created 23 searchable documents
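
The total of 23 is the sum of the four source lists (5 + 5 + 7 + 6). A small sanity check on the `type` metadata can confirm that no list was skipped. The sketch below uses plain dictionaries as stand-ins for document metadata; `count_by_type` is a hypothetical helper. In the notebook, the input would presumably be `[doc.metadata for doc in pricing_docs]`.

```python
from collections import Counter

# Sizes reported by the knowledge-base cell output
source_sizes = {
    "elasticity_benchmark": 5,
    "historical_margin": 5,
    "pricing_decision": 7,
    "pricing_guideline": 6,
}


def count_by_type(metadatas):
    """Count documents per metadata 'type', tolerating missing keys."""
    return Counter(m.get("type", "unknown") for m in metadatas)


# Stand-in metadata: one dict per document, mirroring the metadata built above
metadatas = [{"type": t} for t, n in source_sizes.items() for _ in range(n)]

counts = count_by_type(metadatas)
print(dict(counts))
print("total:", sum(counts.values()))
```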


In [14]:
# TODO: Initialize embeddings and vector store
# HINT: Use SentenceTransformerEmbeddings with "all-MiniLM-L6-v2" model
import shutil  # os is already imported in the imports cell

print("Setting up embeddings...")
embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

# Clear existing Chroma DB to prevent duplicate entries if the cell is run multiple times
CHROMA_DB_DIR = "/tmp/chroma_db" # Changed to /tmp directory for better write permissions
if os.path.exists(CHROMA_DB_DIR):
    print(f"Deleting existing Chroma DB at {CHROMA_DB_DIR}...")
    shutil.rmtree(CHROMA_DB_DIR)

# Ensure the directory exists and has write permissions before Chroma tries to use it
os.makedirs(CHROMA_DB_DIR, exist_ok=True)
os.chmod(CHROMA_DB_DIR, 0o777) # Set full read/write/execute permissions

print("Creating vector store...")
# TODO: Create Chroma vector store from documents
# HINT: Use Chroma.from_documents() with persist_directory
vectorstore = Chroma.from_documents(documents=pricing_docs, embedding=embeddings, persist_directory=CHROMA_DB_DIR)

print("Vector store created successfully!")
print(f"Indexed {len(pricing_docs)} documents for retrieval")

Setting up embeddings...
Creating vector store...
Vector store created successfully!
Indexed 23 documents for retrieval
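
Conceptually, the vector store embeds each document once, embeds the query at search time, and returns the documents whose vectors are closest to the query. The sketch below illustrates that idea with a toy bag-of-words "embedding" and cosine similarity. It is only an illustration: the real store uses the `all-MiniLM-L6-v2` sentence embeddings, and `embed`, `cosine`, and `top_k` are hypothetical names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a sentence embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, texts, k: int = 2):
    """Return the k texts most similar to the query."""
    q = embed(query)
    return sorted(texts, key=lambda t: cosine(q, embed(t)), reverse=True)[:k]


# Document contents in the same format as create_pricing_documents()
texts = [
    "Category: Fashion\nAverage Margin: 40-60%",
    "Category: Automotive\nAverage Margin: 30-45%",
    "Category: Electronics\nElasticity: High",
]

print(top_k("Category: Fashion", texts, k=1))
```

Because similarity is computed on the whole text, a query such as `Category: Fashion` can also pull in documents that merely share the word "Category", which is worth keeping in mind when reading retrieval results.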


## RAG-Enhanced Pricing Agent

Create our enhanced pricing agent that uses retrieval to ground its recommendations.

In [19]:
class RAGPricingAgent:
    def __init__(self, vectorstore, embeddings):
        # TODO: Initialize the LLM
        self.llm = ChatGroq(
            model="llama-3.1-8b-instant",
            temperature=0,
            max_tokens=1000,
            api_key=groq_api_key
        )
        # Store the vector store and embeddings
        self.vectorstore = vectorstore
        self.embeddings = embeddings

        # TODO: Create system message for RAG-enhanced agent
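        self.system_message = SystemMessage(content="""
        You are a retail pricing agent. Explain your reasoning briefly and in an easy-to-follow manner.
        Ground every recommendation in the prior knowledge supplied in the prompt: elasticity benchmarks,
        historical margins, pricing guidelines, and past pricing decisions. If the prior knowledge does not
        cover a factor, say so instead of guessing.
        """)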

    def retrieve_relevant_knowledge(self, query: str, k: int = 5):
        """Retrieve relevant pricing knowledge for the query"""

        retriever = self.vectorstore.as_retriever(search_kwargs={"k": k})
        relevant_docs = retriever.invoke(query)
        return relevant_docs

    def format_retrieved_context(self, docs):
        """Format retrieved documents into context string"""
        # TODO: Format documents into a readable context string
        # HINT: Include document content and metadata type
        context_parts = []
        for i, doc in enumerate(docs, 1):
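            # Use .get() so a document without a 'type' key does not raise KeyError
            doc_type = doc.metadata.get("type", "unknown")
            context_parts.append(f"Document {i}:\nContent: {doc.page_content}\nMetadata Type: {doc_type}")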

        return "\n\n".join(context_parts)

    def get_rag_price_recommendation(self, product_name, category, cost_price,
                                   current_price=None, target_margin=None,
                                   competitor_price=None, price_elasticity=None):
        """Get price recommendation using RAG"""

        # Retrieval is keyed on the category; the product name is passed to the LLM in the prompt below
        search_query = f"Category: {category}"
        relevant_docs = self.retrieve_relevant_knowledge(search_query)
        context = self.format_retrieved_context(relevant_docs)

        prompt = f"""
        Prior knowledge:
        {context}

        Recommend an optimal selling price based on the following factors:
          product = {product_name}
          category = {category}
          price elasticity = {price_elasticity}
          target margin = {target_margin}
          competitor price = {competitor_price}
          cost = {cost_price}
          current price = {current_price}
        """

        # Get response from LLM
        messages = [self.system_message, HumanMessage(content=prompt)]
        response = self.llm.invoke(messages)

        return response.content

# Initialize the RAG pricing agent
print("Initializing RAG Pricing Agent...")
rag_agent = RAGPricingAgent(vectorstore, embeddings)
print("RAG Pricing Agent Ready!")

Initializing RAG Pricing Agent...
RAG Pricing Agent Ready!


## Testing the RAG-Enhanced Agent

Test our RAG agent with the assignment example.

In [20]:
# Test with the assignment example: Puma jacket
print("Testing Assignment Example: Puma Jacket")
print("=" * 60)

result = rag_agent.get_rag_price_recommendation(
    product_name="Puma Jacket",
    category="Fashion",
    cost_price=1800,  # Note: High cost from assignment
    target_margin=30,
    price_elasticity="High"
)

print(result)

Testing Assignment Example: Puma Jacket
Based on the provided documents, we can analyze the factors to determine an optimal selling price.

1. **Price Elasticity**: High elasticity indicates that small price changes can lead to significant changes in demand. To maintain sales volume, we should consider a price that balances profitability with demand sensitivity.

2. **Margin**: The target margin is 30%, which is lower than the recommended margin in Document 3 (50%+). This suggests that we prioritize volume over premium positioning.

3. **Competitor Price**: Since there is no competitor price provided, we will focus on internal factors to determine the optimal price.

4. **Cost**: The cost of the product is $1800.

Considering the high elasticity and the target margin, we can use the pricing guideline from Document 3 as a reference. To account for end-of-season markdowns, the recommended margin should be 50%+. However, since our target margin is 30%, we will aim for a price that balan
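
The captured output is cut off before a price appears, so it helps to have the arithmetic on hand for comparison. The sketch below assumes the target margin is a gross margin on the selling price (the same convention the knowledge base entries appear to use) and also shows the markup-on-cost reading, since the two are easy to confuse. `price_for_margin` and `price_for_markup` are hypothetical helpers.

```python
def price_for_margin(cost: float, target_margin_pct: float) -> float:
    """Selling price at which (price - cost) / price equals the target margin."""
    if not 0 <= target_margin_pct < 100:
        raise ValueError("target margin must be in [0, 100)")
    return cost / (1 - target_margin_pct / 100)


def price_for_markup(cost: float, markup_pct: float) -> float:
    """Selling price when the percentage is applied as a markup on cost."""
    return cost * (1 + markup_pct / 100)


cost = 1800
print(round(price_for_margin(cost, 30), 2))  # 30% margin on price
print(round(price_for_markup(cost, 30), 2))  # 30% markup on cost
print(round(price_for_margin(cost, 50), 2))  # 50% margin from the Fashion guideline
```

Under the margin-on-price reading, a 30% target gives about 2571.43 and the 50% Fashion guideline gives 3600.00; a 30% markup on cost gives 2340.00. An LLM answer far outside this range deserves a closer look.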

In [21]:
# TODO: Compare RAG vs Baseline approaches
# Create a function that gets both RAG and prompt-only recommendations
# and shows the difference

def compare_rag_vs_baseline(product_name, category, cost_price, target_margin=None,
                           competitor_price=None, price_elasticity=None):
    """Compare RAG recommendation with baseline prompt-only approach"""

    # Get RAG recommendation
    rag_result = rag_agent.get_rag_price_recommendation(
        product_name=product_name,
        category=category,
        cost_price=cost_price,
        target_margin=target_margin,
        competitor_price=competitor_price,
        price_elasticity=price_elasticity
    )

    # Get baseline recommendation (prompt-only)
    baseline_llm = ChatGroq(
        model="llama-3.1-8b-instant",
        temperature=0,
        max_tokens=1000,
        api_key=groq_api_key
    )

    baseline_system_message = SystemMessage(content="""
    You are a retail pricing agent. Recommend an optimal selling price based on the given factors.
    Provide reasoning for your recommendation.
    """)

    baseline_prompt = f"""
    Recommend an optimal selling price following the factors:
      Product: {product_name}
      Category: {category}
      Cost: {cost_price}
      Price Elasticity: {price_elasticity}
      Target Margin: {target_margin}
      Competitor Price: {competitor_price}
    """
    baseline_response = baseline_llm.invoke([baseline_system_message, HumanMessage(content=baseline_prompt)])
    baseline_result = baseline_response.content

    # Format and return comparison
    return f"""
--- RAG-Enhanced Recommendation for {product_name} ({category}) ---
{rag_result}

--- Baseline (Prompt-Only) Recommendation for {product_name} ({category}) ---
{baseline_result}
"""

# Test comparison
print(compare_rag_vs_baseline(
    product_name="Puma Jacket",
    category="Fashion",
    cost_price=1800,
    target_margin=30,
    price_elasticity="High"
))


--- RAG-Enhanced Recommendation for Puma Jacket (Fashion) ---
Based on the provided documents, we can analyze the factors to determine an optimal selling price.

1. **Price Elasticity**: High elasticity indicates that small price changes can lead to significant changes in demand. To maintain sales volume, we should consider a price that balances profitability with demand sensitivity.

2. **Margin**: The target margin is 30%, which is lower than the recommended margin in Document 3 (50%+). This suggests that we prioritize volume over premium positioning.

3. **Competitor Price**: Since there is no competitor price provided, we will focus on internal factors to determine the optimal price.

4. **Cost**: The cost of the product is $1800.

Considering the high elasticity and the target margin, we can use the pricing guideline from Document 3 as a reference. To account for end-of-season markdowns, the recommended margin should be 50%+. However, since our target margin is 30%, we will aim f

## Additional Test Cases

Test with various scenarios to see how RAG improves recommendations.

In [22]:
# TODO: Test Case 1: Electronics (High Elasticity)
print("Test Case 1: Electronics Category")
print("=" * 40)

# TODO: Test electronics pricing with RAG agent
electronics_result = rag_agent.get_rag_price_recommendation(
    product_name="Smartwatch X1",
    category="Electronics",
    cost_price=150,
    target_margin=25,
    price_elasticity="High",
    competitor_price=220
)

print(electronics_result)

Test Case 1: Electronics Category
To determine the optimal selling price, we'll consider the given factors and prior knowledge.

1. **Price Elasticity**: High (Document 1)
   High elasticity indicates that small price changes can lead to significant changes in demand. To maximize sales, we should price competitively.

2. **Margin**: 25% (Given)
   A margin of 25% is relatively high, indicating that we want to make a significant profit on each sale.

3. **Competitor Price**: 220 (Document 5)
   The competitor's price is 220, which we'll use as a reference point.

4. **Cost**: 150 (Document 5)
   The cost of the product is 150, which we'll use to calculate the optimal price.

Considering the high elasticity and the competitor's price, we should aim to price competitively. However, we also want to maintain a 25% margin.

Let's calculate the optimal price using the following formula:

Optimal Price = (Cost + (Margin * Cost)) / (1 + Elasticity)

Since elasticity is high, we'll assume an ela
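
The formula in the model's answer does not come from the knowledge base, so a rule-based cross-check is useful here. The sketch below applies two rules taken from the data above: the "within 5% of key competitors" guideline for high-elasticity products, and the target margin treated as a floor on the price. This is an illustrative check under those assumptions, not the agent's method; `competitor_band` and `margin_floor_price` are hypothetical helpers.

```python
def competitor_band(competitor_price: float, tolerance_pct: float = 5.0):
    """Price range within +/- tolerance_pct of the competitor price."""
    delta = competitor_price * tolerance_pct / 100
    return competitor_price - delta, competitor_price + delta


def margin_floor_price(cost: float, min_margin_pct: float) -> float:
    """Lowest price that still achieves the given gross margin on price."""
    return cost / (1 - min_margin_pct / 100)


# Inputs from Test Case 1: cost 150, target margin 25%, competitor price 220
low, high = competitor_band(220)
floor = margin_floor_price(150, 25)

# A price satisfies both rules if it lies in [max(low, floor), high]
feasible_low = max(low, floor)
print(f"competitor band: {low:.2f} to {high:.2f}")
print(f"margin floor:    {floor:.2f}")
print(f"feasible range:  {feasible_low:.2f} to {high:.2f}")
```

For these inputs the band is 209.00 to 231.00 and the margin floor is 200.00, so any price in the band also meets the 25% target.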

In [23]:
# TODO: Test Case 2: Groceries (Low Elasticity)
print("Test Case 2: Groceries Category")
print("=" * 40)

# TODO: Test groceries pricing with RAG agent
result = rag_agent.get_rag_price_recommendation(
    product_name="Organic Whole Milk (1 Gallon)",
    category="Groceries",
    cost_price=3.50,
    target_margin=15,
    price_elasticity="Low",
    competitor_price=4.99
)

print(result)

Test Case 2: Groceries Category
Based on the provided documents and factors, I will make a recommendation for the optimal selling price.

1. **Elasticity**: Low elasticity in the Groceries category (Document 1) indicates that customers are less sensitive to price changes. This means we can consider a higher price without significantly affecting sales volume.

2. **Margin**: The desired margin is 15% (Document 2). This is lower than the historical margin range of 5-10% (Document 2), but we can aim for this target.

3. **Competitor Price**: The competitor price is $4.99 (Document 5). We should consider undercutting this price to drive volume, as seen in the Organic Coffee Beans example (Document 3).

4. **Cost**: The cost of the product is $3.5.

Considering these factors, I recommend the following optimal selling price:

**Optimal Selling Price**: $4.49

**Rationale**:

- We aim for a 15% margin, which is higher than the historical range but lower than the desired margin for Fashi
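
A recommended price can be checked mechanically against the inputs. The sketch below pulls the price out of a response shaped like the one above and computes the gross margin it implies. The regular expression is an assumption about the response format (the model is free to phrase its answer differently), and `extract_price` and `implied_margin_pct` are hypothetical helpers.

```python
import re


def extract_price(text: str):
    """Return the dollar amount following 'Optimal Selling Price', or None if absent."""
    match = re.search(r"Optimal Selling Price\W*\$\s*(\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else None


def implied_margin_pct(cost: float, price: float) -> float:
    """Gross margin on price implied by a cost and a selling price."""
    return (price - cost) / price * 100


# Line copied from the Test Case 2 output
response_line = "**Optimal Selling Price**: $4.49"

price = extract_price(response_line)
margin = implied_margin_pct(3.50, price)
print(price, round(margin, 1))
```

At a cost of 3.50, the recommended 4.49 implies a margin of about 22.0%, which is above the 15% target and well above the 5-10% historical range stored for Groceries.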

## Exploring the Knowledge Base

See what the RAG system retrieves for different queries.

In [24]:
def explore_knowledge_retrieval(query, k=3):
    """Show what gets retrieved for a given query"""
    print(f"Query: '{query}'")
    print("=" * 50)

    # TODO: Implement retrieval exploration
    # Get documents for the query and display them
    docs = rag_agent.retrieve_relevant_knowledge(query, k=k)

    for i, doc in enumerate(docs, 1):
        # TODO: Display each retrieved document
        print(f"--- Retrieved Document {i} ---")
        print(f"Content: {doc.page_content}")
        print(f"Metadata: {doc.metadata}")
        print("\n")

    return docs

# Test different retrieval queries
explore_knowledge_retrieval("high elasticity pricing strategy")
print("\n" + "=" * 70 + "\n")
explore_knowledge_retrieval("fashion items margin strategy")
print("\n" + "=" * 70 + "\n")
explore_knowledge_retrieval("Automotive parts pricing")

Query: 'high elasticity pricing strategy'
--- Retrieved Document 1 ---
Content: For High Elasticity products (e.g., Electronics, Fashion), price within 5% of key competitors to maintain market share.
Metadata: {'type': 'pricing_guideline'}


--- Retrieved Document 2 ---
Content: For Low Elasticity products (e.g., Automotive parts, Essentials), prioritize margin; pricing 10-15% above competitors is acceptable if quality is differentiated.
Metadata: {'type': 'pricing_guideline'}


--- Retrieved Document 3 ---
Content: Category: Electronics
Elasticity: High
Metadata: {'type': 'elasticity_benchmark', 'category': 'Electronics'}




Query: 'fashion items margin strategy'
--- Retrieved Document 1 ---
Content: Fashion items should target 50%+ margin to account for end-of-season markdowns.
Metadata: {'type': 'pricing_guideline'}


--- Retrieved Document 2 ---
Content: Category: Fashion
Average Margin: 40-60%
Metadata: {'category': 'Fashion', 'type': 'historical_margin'}


--- Retrieved Document

[Document(metadata={'type': 'pricing_guideline'}, page_content='For Low Elasticity products (e.g., Automotive parts, Essentials), prioritize margin; pricing 10-15% above competitors is acceptable if quality is differentiated.'),
 Document(metadata={'category': 'Automotive', 'product': 'Ceramic Brake Pads', 'type': 'pricing_decision'}, page_content='Product: Ceramic Brake Pads\nCategory: Automotive\nCost: 18.0\nRecommended Price: 49.99\nMargin: 64%\nCompetitor Price: 45.0\nOutcome: Stable Sales\nReasoning: Low elasticity; customers prioritize quality/safety over small price differences.'),
 Document(metadata={'type': 'historical_margin', 'category': 'Automotive'}, page_content='Category: Automotive\nAverage Margin: 30-45%')]

## Your Turn: Expand the Knowledge Base

- **Exercise 1:** Add 5 more pricing decisions to the knowledge base
- **Exercise 2:** Add a new category with appropriate benchmarks
- **Exercise 3:** Test how the new data affects recommendations

In [None]:
# Exercise 1: Add your pricing decisions here
additional_pricing_decisions = [
    # TODO: Add 5 more pricing decision examples
    # Follow the format from the original pricing_decisions list
]

# Exercise 2: Add a new category
new_category_data = {
    "elasticity_benchmark": {
        # TODO: Add elasticity data for your new category
    },
    "historical_margins": {
        # TODO: Add margin data for your new category
    }
}

# Exercise 3: Test with new category
# TODO: Test your new category with the agent

print("Exercises completed! Test your enhancements.")
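
For Exercise 1, one way to keep new entries consistent with the existing ones is to compute the `margin` string from cost and price rather than typing it by hand. The sketch below is a hypothetical helper, `make_decision`, that builds a dictionary with the same keys as the original `pricing_decisions` entries. The sample product and its values are placeholders for illustration only, not real data.

```python
def make_decision(product, category, cost, recommended_price,
                  competitor_price, outcome, reasoning):
    """Build a pricing-decision dict with the margin derived from cost and price."""
    margin = (recommended_price - cost) / recommended_price * 100
    return {
        "product": product,
        "category": category,
        "cost": cost,
        "recommended_price": recommended_price,
        "margin": f"{margin:.1f}%",
        "competitor_price": competitor_price,
        "outcome": outcome,
        "reasoning": reasoning,
    }


# Placeholder entry (hypothetical values); replace with your own examples
example = make_decision(
    product="Example Desk Lamp",
    category="Home & Garden",
    cost=10.00,
    recommended_price=25.00,
    competitor_price=27.00,
    outcome="Placeholder outcome",
    reasoning="Placeholder reasoning",
)
print(example["margin"])
```

After extending `pricing_decisions` with new entries, the documents and the vector store need to be rebuilt for the agent to retrieve them.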