### **Introduction to the Price Estimation Notebook**

This Jupyter Notebook demonstrates an AI-powered approach to estimating product prices using retrieval-augmented generation (RAG). It combines **sentence embeddings, a Chroma vector database, OpenAI's GPT-4o-mini model, and DeepSeek's `deepseek-chat` model**. The goal is to predict the price of an item by retrieving similar products from the vector database and passing them to the model as context.

### **Notebook Workflow**
1. **Setup and Dependencies**  
2. **Loading and Processing Data**
3. **Vector Search for Similar Products**
4. **GPT-4o-mini Model for Price Prediction**
5. **DeepSeek Model for Price Prediction**
6. **Wrapping the Pipeline in Agent Classes**

In [None]:
# Imports 

import os
import re
import math
import json
from tqdm import tqdm
import random
from dotenv import load_dotenv
from huggingface_hub import login
import matplotlib.pyplot as plt
import numpy as np
import pickle
from openai import OpenAI
from sentence_transformers import SentenceTransformer
from datasets import load_dataset
import chromadb
from utils.items import Item
from utils.testing import Tester

### **Environment setup**

In [None]:
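load_dotenv(override=True)
# os.environ only accepts strings, so fall back to a placeholder instead of None
# (assigning None would raise a TypeError when a key is missing from .env).
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')
os.environ['HF_TOKEN'] = os.getenv('HF_TOKEN', 'your-key-if-not-using-env')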

In [None]:
openai = OpenAI()


### **Load in the test pickle file for evaluation**


In [None]:
with open('test.pkl', 'rb') as file:
    test = pickle.load(file)

### **Create the `make_context` function**
- This function builds the context message that is given to the frontier model for price estimation.


In [None]:
def make_context(similars, prices):
    message = "To provide some context, here are some other items that might be similar to the item you need to estimate.\n\n"
    for similar, price in zip(similars, prices):
        message += f"Potentially related product:\n{similar}\nPrice is ${price:.2f}\n\n"
    return message

**`make_context` opens with a short introductory sentence, then lists each similar item followed by its price formatted to two decimal places. The result becomes the first part of the user prompt sent to the GPT model.**
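As a quick illustration, the sketch below calls a copy of `make_context` on two made-up products. The descriptions and prices are invented for the example and do not come from the dataset.

```python
def make_context(similars, prices):
    message = "To provide some context, here are some other items that might be similar to the item you need to estimate.\n\n"
    for similar, price in zip(similars, prices):
        message += f"Potentially related product:\n{similar}\nPrice is ${price:.2f}\n\n"
    return message


# Illustrative inputs only (not taken from the real vector store).
sample_similars = ["USB condenser microphone with stand", "Pop filter for studio mics"]
sample_prices = [89.0, 12.5]

context = make_context(sample_similars, sample_prices)
print(context)
```

Each item contributes one `Potentially related product:` paragraph, and prices such as `12.5` are rendered as `$12.50`.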

In [None]:
def messages_for(item, similars, prices):
    system_message = "You estimate prices of items. Reply only with the price, no explanation"
    user_prompt = make_context(similars, prices)
    user_prompt += "And now the question for you:\n\n"
    user_prompt += item.test_prompt().replace(" to the nearest dollar","").replace("\n\nPrice is $","")
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": "Price is $"}
    ]

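**`messages_for` builds the message list sent to the frontier model: a system message, a user prompt containing the similar-item context followed by the question, and a trailing assistant message (`Price is $`) intended to cue the model to answer with just a number.**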
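To see what the two `replace` calls do to the question, here is a minimal sketch using a hypothetical `FakeItem` stand-in. It assumes `Item.test_prompt()` returns the question header, the description, and a trailing `Price is $`, which matches the strings the notebook's own code strips; the real class may differ.

```python
class FakeItem:
    """Hypothetical stand-in for utils.items.Item (assumed prompt layout)."""

    def test_prompt(self):
        return (
            "How much does this cost to the nearest dollar?\n\n"
            "HyperX QuadCast USB condenser microphone\n\n"
            "Price is $"
        )


raw = FakeItem().test_prompt()
# Same clean-up as messages_for: drop the rounding hint and the trailing price cue.
question = raw.replace(" to the nearest dollar", "").replace("\n\nPrice is $", "")

messages = [
    {"role": "system", "content": "You estimate prices of items. Reply only with the price, no explanation"},
    {"role": "user", "content": "And now the question for you:\n\n" + question},
    {"role": "assistant", "content": "Price is $"},
]
print(question)
```

In the real function the user prompt also starts with the output of `make_context`; it is left out here to keep the sketch short.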

In [None]:
DB = "products_vectorstore"


In [None]:
# Initialize and create a persistent client for the Chroma DB.

client = chromadb.PersistentClient(path=DB)
collection = client.get_or_create_collection('products')

In [None]:
def description(item):
    text = item.prompt.replace("How much does this cost to the nearest dollar?\n\n", "")
    return text.split("\n\nPrice is $")[0]

**The `description` function extracts the plain product description from an item's prompt by removing the question header and everything from `Price is $` onward, so that only the product details are embedded.**
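The following sketch runs a copy of `description` on a hypothetical item. The `prompt` text is made up, and its layout (question, description, then the price line) is an assumption based on the strings the function removes.

```python
from types import SimpleNamespace


def description(item):
    text = item.prompt.replace("How much does this cost to the nearest dollar?\n\n", "")
    return text.split("\n\nPrice is $")[0]


# Hypothetical item: only the `prompt` attribute is needed here.
fake_item = SimpleNamespace(
    prompt="How much does this cost to the nearest dollar?\n\n"
           "Stainless steel water bottle, 750 ml\n\n"
           "Price is $25.00"
)

cleaned = description(fake_item)
print(cleaned)
```

Only the middle part of the prompt survives, which keeps the price itself out of the text that gets embedded.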

In [None]:
description(test[0])

### **Get vector representation of items**

In [None]:
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

In [None]:
def vector(item):
    return model.encode([description(item)])

**The `vector` function returns the embedding of an item's description using a sentence transformer model. Because `encode` receives a one-element list, the result is a 2-D array with a single row, which is then used to query the vector store for similar items.**

In [None]:
def find_similars(item):
    results = collection.query(query_embeddings=vector(item).astype(float).tolist(), n_results=5)
    documents = results['documents'][0][:]
    prices = [m['price'] for m in results['metadatas'][0][:]]
    return documents, prices

**The `find_similars` function retrieves the 5 most similar items from the vector store using the item's embedding. It returns both the descriptions and the corresponding prices of those items.**
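The indexing with `[0]` is easier to follow with a toy version of the lookup. The sketch below is a brute-force nearest-neighbour search over three made-up 2-D vectors, returning a dictionary in the same nested layout that the notebook code reads (one inner list per query). It is not how Chroma works internally, and the squared Euclidean distance, the `store` contents, and the `toy_query` helper are all assumptions made for illustration.

```python
# Toy store: (document, price, embedding). Real embeddings have hundreds of dimensions.
store = [
    ("Mic A", 99.0, [1.0, 0.0]),
    ("Cable", 9.99, [0.0, 1.0]),
    ("Mic B", 129.99, [0.9, 0.1]),
]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def toy_query(query_embedding, n_results):
    """Brute-force nearest neighbours, returned in the nested layout of a one-query result."""
    ranked = sorted(store, key=lambda row: squared_l2(row[2], query_embedding))[:n_results]
    return {
        "documents": [[doc for doc, _, _ in ranked]],
        "metadatas": [[{"price": price} for _, price, _ in ranked]],
    }


results = toy_query([1.0, 0.05], n_results=2)
documents = results["documents"][0]  # [0] selects the results for the first (only) query
prices = [m["price"] for m in results["metadatas"][0]]
print(documents, prices)
```

Since `find_similars` sends a single query embedding, the outer lists contain exactly one entry, and `[0]` selects it.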

In [None]:
print(test[1].prompt)

In [None]:
documents, prices = find_similars(test[1])

In [None]:
print(make_context(documents, prices))

In [None]:
print(messages_for(test[1], documents, prices))

In [None]:
def get_price(s):
    # Strip dollar signs and thousands separators, then take the first number found.
    s = s.replace('$','').replace(',','')
    match = re.search(r"[-+]?\d*\.\d+|\d+", s)
    return float(match.group()) if match else 0.0

**The `get_price` function extracts a price from a reply string (e.g., "$99.99"). It removes dollar signs and commas, returns the first number it finds as a float, and returns 0 when the string contains no number.**
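A few sample strings make the parsing behaviour concrete. The inputs below are invented for illustration; the function body is a copy of the one in the notebook.

```python
import re


def get_price(s):
    s = s.replace('$', '').replace(',', '')
    match = re.search(r"[-+]?\d*\.\d+|\d+", s)
    return float(match.group()) if match else 0


examples = ["$99.99", "Price is $1,299", "about 45 dollars", "no idea", "2 for $30"]
parsed = [get_price(text) for text in examples]
print(parsed)
```

The last example is worth noting: the first number in the string wins, so a reply such as "2 for $30" is parsed as 2 rather than 30. Keeping the model's reply short reduces the chance of this happening.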

In [None]:
def gpt_4o_mini_rag(item):
    documents, prices = find_similars(item)
    response = openai.chat.completions.create(
        model="gpt-4o-mini", 
        messages=messages_for(item, documents, prices),
        seed=42,
        max_tokens=5
    )
    reply = response.choices[0].message.content
    return get_price(reply)

**The `gpt_4o_mini_rag` function retrieves similar items, sends the resulting messages to GPT-4o-mini, and parses the reply with `get_price`. `max_tokens=5` keeps the reply short, and `seed=42` requests repeatable output on a best-effort basis.**

In [None]:
# Test the function with a sample item.

gpt_4o_mini_rag(test[1])

In [None]:
# Actual price for comparison.

test[1].price

### **Test the model performance on a set of test data using the Tester class.**

In [None]:
Tester.test(gpt_4o_mini_rag, test)

### **DeepSeek's API call implementation**

In [None]:
# Connect to DeepSeek using the OpenAI client python library

deepseek_api_key = os.getenv("DEEPSEEK_API_KEY")
deepseek_via_openai_client = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com")

In [None]:
# Retry logic: DeepSeek is heavily oversubscribed and requests sometimes fail.

def deepseek_api_rag(item):
    documents, prices = find_similars(item)
    reply = ""  # Fallback so get_price returns 0 if every attempt fails
    retries = 8
    done = False
    while not done and retries > 0:
        try:
            response = deepseek_via_openai_client.chat.completions.create(
                model="deepseek-chat", 
                messages=messages_for(item, documents, prices),
                seed=42,
                max_tokens=8
            )
            reply = response.choices[0].message.content
            done = True
        except Exception as e:
            print(f"Error: {e}")
            retries -= 1
    return get_price(reply)

In [None]:
deepseek_api_rag(test[1])

In [None]:
Tester.test(deepseek_api_rag, test)

### **Let's wrap it into the agent class**

In [None]:
from Ensemble_Agent.frontier_agent import FrontierAgent

In [None]:
# Let's print the logs so we can see what's going on

import logging
root = logging.getLogger()
root.setLevel(logging.INFO)

In [None]:
agent = FrontierAgent(collection)

In [None]:
agent.price("Quadcast HyperX condenser mic for high quality podcasting")

In [None]:
from Ensemble_Agent.specialist_agent import SpecialistAgent

In [None]:
agent2 = SpecialistAgent()

In [None]:
agent2.price("Quadcast HyperX condenser mic for high quality podcasting")