# Custom Price Estimator

This notebook mirrors the week 6 day 5 fine-tuning workflow and pushes it a little further with the goal of beating the $76 average error target on the shared product dataset.

## Plan
- Load the curated `Item` objects that we prepared earlier in week 6.
- Create train/validation splits sized for a stronger fine-tune than the baseline.
- Package the conversations in JSONL format and launch an OpenAI fine-tuning job.
- Retrieve the tuned model, score it with the shared tester, and aim for an average error below $76.

## Environment Setup
Pull in the packages, load API keys from `.env`, and make sure we can talk to both the OpenAI and Hugging Face services used elsewhere in the course.

In [None]:
import os
import json
import pickle
import random
import re
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt
from dotenv import load_dotenv
from huggingface_hub import login

from items import Item
from testing import Tester
from openai import OpenAI


In [None]:
# Load secrets from the .env file so the OpenAI client picks them up.
load_dotenv(override=True)
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'set-your-openai-key')
os.environ['HF_TOKEN'] = os.getenv('HF_TOKEN', 'set-your-hf-token')
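
Because the cell above falls back to placeholder strings, a missing key only surfaces later as an authentication error. A minimal sketch of a fail-fast check follows; `missing_secrets` and `PLACEHOLDERS` are hypothetical names introduced here for illustration, not part of the course code.

```python
import os

# Placeholder values used as fallbacks in the cell above; treat them as "not configured".
PLACEHOLDERS = {"set-your-openai-key", "set-your-hf-token"}

def missing_secrets(names, env=None):
    """Return the names whose value is absent, empty, or still a placeholder."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name) or env.get(name) in PLACEHOLDERS]

# Lists whichever keys still need to be set in .env (empty list means both look configured).
print(missing_secrets(["OPENAI_API_KEY", "HF_TOKEN"]))
```

Passing a plain dict as `env` makes the helper easy to try out without touching the real environment.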


In [None]:
# Log in to Hugging Face once per session (needed for the tokenizer used in Item).
hf_token = os.environ['HF_TOKEN']
if hf_token and hf_token != 'set-your-hf-token':
    login(hf_token, add_to_git_credential=True)
else:
    print('⚠️  Provide a valid HF_TOKEN in your .env if you need to download tokenizer weights.')


In [None]:
openai = OpenAI()
%matplotlib inline


## Load the Week 6 Dataset
We reuse the curated pickled `Item` objects. If the pickle files are missing, circle back to the earlier data curation notebook to regenerate them.

In [None]:
# Let's avoid curating all our data again! Load in the pickle files:
with open('train_lite.pkl', 'rb') as file:
    train = pickle.load(file)

with open('test_lite.pkl', 'rb') as file:
    test = pickle.load(file)

len(train), len(test)


We widen the split beyond the day 5 baseline, to 400 training and 100 validation examples drawn from a seeded shuffle, to squeeze out better accuracy.

In [None]:
TRAIN_SIZE = 400
VAL_SIZE = 100
RANDOM_SEED = 42

rng = random.Random(RANDOM_SEED)
shuffled = train[:]
rng.shuffle(shuffled)
fine_tune_train = shuffled[:TRAIN_SIZE]
fine_tune_validation = shuffled[TRAIN_SIZE:TRAIN_SIZE+VAL_SIZE]

len(fine_tune_train), len(fine_tune_validation)
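
One caveat with plain slicing: if `train` holds fewer than `TRAIN_SIZE + VAL_SIZE` items, the slices silently come back short. The sketch below wraps the same seeded shuffle in a helper that checks the size first. `split_items` is a hypothetical name, and integers stand in for `Item` objects so the block runs on its own.

```python
import random

def split_items(items, train_size, val_size, seed=42):
    """Shuffle a copy of items with a seeded RNG and cut non-overlapping train/validation slices."""
    if train_size + val_size > len(items):
        raise ValueError(f"need {train_size + val_size} items, have {len(items)}")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:train_size + val_size]

# Stand-in data: integers instead of Item objects.
train_part, val_part = split_items(range(1000), 400, 100)
print(len(train_part), len(val_part), set(train_part).isdisjoint(val_part))
# prints: 400 100 True
```

Reusing the same seed reproduces the same split, which keeps later fine-tuning runs comparable.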


## Step 1 — Build Training Conversations
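Frontier models handled the unaltered prompt, but for the fine-tune we keep the instruction tight and limit the assistant answer to the `Price is $` prefix followed by the price to two decimal places. At inference time the conversation ends with the bare prefix so the model only has to supply the number.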

In [None]:
SYSTEM_MESSAGE = 'You are an ecommerce pricing assistant. Respond with the price only, no text before or after.'
ASSISTANT_PREFIX = 'Price is $'

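def clean_user_prompt(item):
    # Drop the rounding hint, since the training targets keep cents.
    prompt = item.test_prompt().replace(' to the nearest dollar', '')
    # Remove the answer prefix so it appears only in the assistant turn.
    return prompt.replace(ASSISTANT_PREFIX, '')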

def messages_for_training(item):
    return [
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": clean_user_prompt(item)},
        {"role": "assistant", "content": f'{ASSISTANT_PREFIX}{item.price:.2f}'}
    ]

def messages_for_inference(item):
    return [
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": clean_user_prompt(item)},
        {"role": "assistant", "content": ASSISTANT_PREFIX}
    ]

messages_for_training(fine_tune_train[0])


In [None]:
def make_jsonl(items):
    lines = []
    for item in items:
        lines.append(json.dumps({"messages": messages_for_training(item)}))
    return '\n'.join(lines)

def write_jsonl(items, filename):
    payload = make_jsonl(items)
    with open(filename, 'w') as f:
        f.write(payload)

write_jsonl(fine_tune_train, 'fine_tune_train.jsonl')
write_jsonl(fine_tune_validation, 'fine_tune_validation.jsonl')

Path('fine_tune_train.jsonl').stat().st_size, Path('fine_tune_validation.jsonl').stat().st_size
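
Before uploading, it can help to confirm that every line parses as JSON and carries the system, user, assistant sequence built above. The following is a rough local check, not the validation OpenAI runs server-side; `validate_jsonl` and `EXPECTED_ROLES` are hypothetical names, and the sample records are made up for the demonstration.

```python
import json
import tempfile
from pathlib import Path

EXPECTED_ROLES = ["system", "user", "assistant"]

def validate_jsonl(path):
    """Return a list of (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if not line.strip():
                problems.append((number, "blank line"))
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((number, f"invalid JSON: {exc.msg}"))
                continue
            messages = record.get("messages") if isinstance(record, dict) else None
            if not isinstance(messages, list):
                problems.append((number, "missing 'messages' list"))
                continue
            roles = [m.get("role") for m in messages if isinstance(m, dict)]
            if roles != EXPECTED_ROLES:
                problems.append((number, f"unexpected roles: {roles}"))
    return problems

# Made-up sample: one well-formed record and one that lacks system/assistant turns.
good = {"messages": [
    {"role": "system", "content": "s"},
    {"role": "user", "content": "u"},
    {"role": "assistant", "content": "Price is $12.00"},
]}
bad = {"messages": [{"role": "user", "content": "u"}]}

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.jsonl"
    sample.write_text(json.dumps(good) + "\n" + json.dumps(bad), encoding="utf-8")
    sample_problems = validate_jsonl(sample)

print(sample_problems)
# prints: [(2, "unexpected roles: ['user']")]
```

In the notebook, the same function could be pointed at `fine_tune_train.jsonl` and `fine_tune_validation.jsonl`; an empty list would indicate both files pass this check.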


Upload the datasets so the fine-tuning job can consume them.

In [None]:
with open('fine_tune_train.jsonl', 'rb') as file:
    train_file = openai.files.create(file=file, purpose='fine-tune')
train_file


In [None]:
with open('fine_tune_validation.jsonl', 'rb') as file:
    validation_file = openai.files.create(file=file, purpose='fine-tune')
validation_file


## Step 2 — Launch the Fine-Tune
Weights & Biases logging is optional but handy for tracking metrics over time.

In [None]:
wandb_integration = {"type": "wandb", "wandb": {"project": "gpt-pricer"}}
train_file.id, validation_file.id


In [None]:
fine_tune_job = openai.fine_tuning.jobs.create(
    training_file=train_file.id,
    validation_file=validation_file.id,
    model='gpt-4o-mini-2024-07-18',
    seed=RANDOM_SEED,
    hyperparameters={"n_epochs": 2, "learning_rate_multiplier": 1.5},
    # Uncomment to stream metrics to W&B (requires the integration to be connected):
    # integrations=[wandb_integration],
    suffix='emmy-pricer'
)
fine_tune_job


In [None]:
job_id = fine_tune_job.id
job_id


In [None]:
openai.fine_tuning.jobs.retrieve(job_id)


In [None]:
openai.fine_tuning.jobs.list_events(fine_tuning_job_id=job_id, limit=10).data
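
Rather than re-running the retrieve cell by hand, the status can be polled until the job reaches a terminal state. This is a sketch under assumptions: `wait_for_job` is a hypothetical helper, the polling interval and limit are arbitrary, and a fake `retrieve` function stands in for the API so the block runs without network access.

```python
import time
from types import SimpleNamespace

# Terminal values of a fine-tuning job's `status` field.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_job(retrieve, job_id, poll_seconds=30.0, max_polls=240, sleep=time.sleep):
    """Poll retrieve(job_id) until job.status is terminal, then return the final job object."""
    for _ in range(max_polls):
        job = retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")

# Offline demonstration with a fake retrieve function (hypothetical job id and statuses).
_fake_statuses = iter(["validating_files", "running", "succeeded"])

def fake_retrieve(job_id):
    return SimpleNamespace(id=job_id, status=next(_fake_statuses))

final_job = wait_for_job(fake_retrieve, "ftjob-example", sleep=lambda seconds: None)
print(final_job.status)
# prints: succeeded
```

In the notebook, the call would look like `wait_for_job(openai.fine_tuning.jobs.retrieve, job_id)`; checking that the returned status is `succeeded` before evaluating avoids calling the API with a missing model name.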


If you connected Weights & Biases under Settings → Integrations in the OpenAI dashboard, sync the run for richer charts.

In [None]:
import wandb
from wandb.integration.openai.fine_tuning import WandbLogger

wandb.login()
WandbLogger.sync(fine_tune_job_id=job_id, project='gpt-pricer')


## Step 3 — Evaluate the Tuned Model
Once the job is complete, grab the resulting model name (`fine_tuned_model` stays `None` until the job has succeeded) and use the shared tester harness to check whether we cleared the $76 average error goal.

In [None]:
fine_tuned_model_name = openai.fine_tuning.jobs.retrieve(job_id).fine_tuned_model
fine_tuned_model_name


In [None]:
def get_price(text):
    cleaned = text.replace('$', '').replace(',', '').strip()
    match = re.search(r'[-+]?\d*\.?\d+', cleaned)
    return float(match.group()) if match else 0.0

def gpt_pricer(item):
    response = openai.chat.completions.create(
        model=fine_tuned_model_name,
        messages=messages_for_inference(item),
        seed=RANDOM_SEED,
        max_tokens=8
    )
    reply = response.choices[0].message.content
    return get_price(reply)
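
To see how `get_price` behaves on different reply shapes, the block below repeats the same function so it runs standalone and applies it to a few made-up replies.

```python
import re

def get_price(text):
    # Same logic as the cell above, repeated so this block runs on its own.
    cleaned = text.replace('$', '').replace(',', '').strip()
    match = re.search(r'[-+]?\d*\.?\d+', cleaned)
    return float(match.group()) if match else 0.0

# Made-up replies covering a prefix, a thousands separator, free text, and no number at all.
samples = ["Price is $129.99", "$1,299.00", "about 45 dollars", "no idea"]
parsed = [get_price(s) for s in samples]
print(parsed)
# prints: [129.99, 1299.0, 45.0, 0.0]
```

Note that the parser takes the first number it finds, so a reply such as `2 items for $30` would yield `2.0`, and an unparseable reply is scored as a guess of `0.0`.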


In [None]:
Tester.test(gpt_pricer, test)
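
For intuition about the target, the sketch below assumes that "average error" means the mean absolute difference between guess and true price; the shared `Tester` may compute or report additional metrics. `average_error` is a hypothetical helper and the numbers are invented.

```python
def average_error(guesses, truths):
    """Mean absolute difference between predicted and true prices."""
    if not guesses or len(guesses) != len(truths):
        raise ValueError("guesses and truths must be non-empty and the same length")
    return sum(abs(g - t) for g, t in zip(guesses, truths)) / len(guesses)

# Invented example values, not results from the fine-tuned model.
guesses = [100.0, 50.0, 20.0]
truths = [120.0, 45.0, 95.0]

error = average_error(guesses, truths)
print(round(error, 2), error < 76)
# prints: 33.33 True
```

A single large miss (the 75-dollar error here) dominates the average, which is why a few badly parsed or wildly off replies can decide whether the run clears the goal.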
