## **Fine-Tuning an OpenAI Model for Product Pricing**  

This guide walks through the process of **fine-tuning an OpenAI model** (GPT-4o-mini) to estimate product prices based on their descriptions. It covers data preparation, fine-tuning, monitoring training with Weights & Biases, and testing the fine-tuned model.


### **1. Introduction**  

Fine-tuning is the process of **training a pre-trained model on a custom dataset** to adapt it to a specific task. In this case, we want the model to predict item prices based on their descriptions.

#### **Why Fine-Tune?**  
- Improves performance on domain-specific tasks  
- Reduces the need for prompt engineering  
- Allows adaptation to specific data structures  

---

### **2. Setting Up the Environment**  

Before we begin, we need to **import necessary libraries** and **configure API keys**. 

In [None]:
# General utilities
import os
import re
import json
import pickle
import random
import numpy as np
import matplotlib.pyplot as plt

# Custom modules for testing
from utils.testing import Tester
from utils.items import Item

In [None]:
# OpenAI client
from openai import OpenAI

# Load environment variables (os is already imported above)
from dotenv import load_dotenv

# Load variables from .env file
load_dotenv()

# Retrieve the API key from environment variables
openai_api_key = os.getenv('OPENAI_API_KEY')

# Check that the API key was loaded
if not openai_api_key:
    raise ValueError("Missing API key. Ensure OPENAI_API_KEY is set in the .env file.")

# Set environment variables explicitly (optional)
os.environ['OPENAI_API_KEY'] = openai_api_key

In [None]:
# Initialize OpenAI client

openai = OpenAI()

### **Loading and Preparing Data**  


In [None]:
# Load datasets
with open('train.pkl', 'rb') as file:
    train = pickle.load(file)

with open('test.pkl', 'rb') as file:
    test = pickle.load(file)

### **Select Fine-Tuning and Validation Sets**  


OpenAI's fine-tuning guidance suggests that **50-100 examples** are often enough to see a clear improvement. Because each of our examples is short, we use a larger set: **200 training examples and 50 validation examples**.

In [None]:
# First 200 samples for training
fine_tune_train = train[:200]

# Next 50 samples for validation
fine_tune_validation = train[200:250]  
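
Slicing the first 250 items assumes `train` is already in random order. If that is uncertain, one option is to shuffle a copy with a fixed seed before slicing. The sketch below is an illustration rather than part of the original workflow; `split_for_fine_tuning` is a hypothetical helper name, and integers stand in for `Item` objects.

```python
import random

def split_for_fine_tuning(items, n_train=200, n_validation=50, seed=42):
    """Shuffle a copy of items and return disjoint train/validation slices."""
    if len(items) < n_train + n_validation:
        raise ValueError("Not enough items for the requested split sizes")
    shuffled = list(items)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    return shuffled[:n_train], shuffled[n_train:n_train + n_validation]

# Stand-in data: integers in place of Item objects
demo_train, demo_validation = split_for_fine_tuning(list(range(1000)))
print(len(demo_train), len(demo_validation))
```

Because both slices come from the same shuffled copy, no item can appear in both sets.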

## **Step 1: Formatting Data for Fine-Tuning**  

OpenAI requires fine-tuning data in **JSONL (JSON Lines) format**, where each entry contains structured messages.

#### **Define the Prompt Format**  

Each example consists of a **system message, user prompt, and expected assistant response**.

In [None]:
def messages_for(item):
    system_message = "You estimate prices of items. Reply only with the price, no explanation"
    user_prompt = item.test_prompt().replace(" to the nearest dollar", "").replace("\n\nPrice is $", "")
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": f"Price is ${item.price:.2f}"}
    ]

In [None]:
messages_for(train[0])

**Note:** Every training target has the form `Price is $<amount>` with no explanation, so the model learns to reply with the price alone.
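To make the effect of the two `replace` calls concrete, here is a small sketch with a stand-in item. `DemoItem` is hypothetical, and the wording returned by its `test_prompt()` is an assumption inferred from the strings being removed; the real `Item` class lives in `utils.items` and may differ.

```python
class DemoItem:
    """Hypothetical stand-in for utils.items.Item (prompt wording is assumed)."""

    def __init__(self, description, price):
        self.description = description
        self.price = price

    def test_prompt(self):
        # Assumed layout: question, description, then an open-ended "Price is $"
        return (
            "How much does this cost to the nearest dollar?"
            f"\n\n{self.description}\n\nPrice is $"
        )

def demo_messages_for(item):
    # Same transformation as messages_for: drop the rounding hint and the
    # trailing "Price is $", then supply the full answer as the assistant turn
    system_message = "You estimate prices of items. Reply only with the price, no explanation"
    user_prompt = item.test_prompt().replace(" to the nearest dollar", "").replace("\n\nPrice is $", "")
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": f"Price is ${item.price:.2f}"},
    ]

demo_messages = demo_messages_for(DemoItem("USB-C cable, 2 m", 12.99))
for message in demo_messages:
    print(message["role"], "->", repr(message["content"]))
```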

---

#### **Convert Data to JSONL Format**  


In [None]:
def make_jsonl(items):
    # One JSON object per line, each wrapping the chat messages for one item
    lines = [json.dumps({"messages": messages_for(item)}) for item in items]
    return "\n".join(lines)

In [None]:
print(make_jsonl(train[:3]))

#### **Save JSONL Files**  


In [None]:
def write_jsonl(items, filename):
    with open(filename, "w") as f:
        jsonl = make_jsonl(items)
        f.write(jsonl)


In [None]:
# Save training and validation data

write_jsonl(fine_tune_train, "fine_tune_train.jsonl")
write_jsonl(fine_tune_validation, "fine_tune_validation.jsonl")

**Note:** Each line of a JSONL file is a standalone JSON object, so every training example is parsed independently of the others.
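Before uploading, it can be worth checking locally that every line parses and carries the expected three roles. The following is a minimal sketch; `validate_jsonl` is a hypothetical helper, and it checks only the structure used in this guide, not everything OpenAI validates on its side.

```python
import json

EXPECTED_ROLES = ["system", "user", "assistant"]

def validate_jsonl(text):
    """Return a list of problems found; an empty list means every line looks well-formed."""
    problems = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append(f"line {line_number}: invalid JSON ({error.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list):
            problems.append(f"line {line_number}: missing 'messages' list")
            continue
        roles = [m.get("role") for m in messages if isinstance(m, dict)]
        if roles != EXPECTED_ROLES:
            problems.append(f"line {line_number}: unexpected roles {roles}")
    return problems

# Two hand-written lines: the first is well-formed, the second lacks an assistant turn
good_line = json.dumps({"messages": [
    {"role": "system", "content": "You estimate prices of items."},
    {"role": "user", "content": "How much does this cost?\n\nUSB-C cable"},
    {"role": "assistant", "content": "Price is $12.99"},
]})
bad_line = json.dumps({"messages": [
    {"role": "system", "content": "You estimate prices of items."},
    {"role": "user", "content": "How much does this cost?\n\nUSB-C cable"},
]})

print(validate_jsonl(good_line))
print(validate_jsonl(good_line + "\n" + bad_line))
```

In the notebook, the same check could be run on the contents of `fine_tune_train.jsonl` after it is written.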

---

### **Uploading Data to OpenAI for Fine-Tuning**  

In [None]:
# Upload training data
with open("fine_tune_train.jsonl", "rb") as f:
    train_file = openai.files.create(file=f, purpose="fine-tune")

# Upload validation data
with open("fine_tune_validation.jsonl", "rb") as f:
    validation_file = openai.files.create(file=f, purpose="fine-tune")

**Note:** Each upload returns a file object; its `id` is what the fine-tuning job references.

---

## **Step 2: Integrating Weights & Biases with OpenAI for Fine-Tuning**  

[Weights & Biases (W&B)](https://wandb.ai) is a platform for tracking machine learning experiments, including OpenAI fine-tuning runs, and it offers free accounts. By integrating W&B, you can **monitor training progress, visualize performance metrics, and debug issues more effectively**.  

### **1. Set Up Your Weights & Biases Account**  

1. Go to **[Weights & Biases](https://wandb.ai)** and **sign up for a free account**.  
2. Click on your **avatar (top right corner) → "Settings"**.  
3. Scroll down to the **API Keys** section and **generate a new API key**.  

### **2. Connect Weights & Biases to OpenAI**  

1. Visit your **OpenAI account settings**:  
    [OpenAI Dashboard](https://platform.openai.com/account/organization)  
2. Navigate to the **"Integrations"** section.  
3. Find **Weights & Biases** and **add your W&B API key**.  


---

### **Fine-Tuning the Model**  


The fine-tuning job reports to W&B through an integration entry that names the target project.

In [None]:
wandb_integration = {"type": "wandb", "wandb": {"project": "gpt-pricer"}}
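
Besides `project`, the `wandb` object in the OpenAI API accepts a few optional fields, as far as the API reference describes them: `name` (run display name), `entity` (W&B team or user), and `tags`. The sketch below shows the shape only; every value is a placeholder, and `wandb_integration_detailed` is a hypothetical variable name so the original dictionary stays untouched.

```python
# Illustrative only: all values here are placeholders, not settings from this guide
wandb_integration_detailed = {
    "type": "wandb",
    "wandb": {
        "project": "gpt-pricer",         # required: W&B project that receives the run
        "name": "pricer-200-examples",   # optional: display name for the run
        "tags": ["pricing", "1-epoch"],  # optional: labels for filtering runs
        # "entity": "my-team",           # optional: W&B team or user (placeholder)
    },
}

print(sorted(wandb_integration_detailed["wandb"]))
```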

In [None]:
train_file 

In [None]:
train_file.id

#### **Start Fine-Tuning**  


In [None]:
openai.fine_tuning.jobs.create(
    training_file=train_file.id,
    validation_file=validation_file.id,
    model="gpt-4o-mini-2024-07-18",
    seed=42,
    hyperparameters={"n_epochs": 1},
    integrations=[wandb_integration],
    suffix="pricer"
)

**Note:** The **suffix** is embedded in the fine-tuned model's name, which makes the model easier to identify later.

---

#### **Monitoring and Retrieving the Fine-Tuned Model** 

In [None]:
openai.fine_tuning.jobs.list(limit=1)


#### **Retrieve the Job ID and Check Its Status**  


In [None]:
job_id = openai.fine_tuning.jobs.list(limit=1).data[0].id

In [None]:
job_id

In [None]:
openai.fine_tuning.jobs.retrieve(job_id)

In [None]:
openai.fine_tuning.jobs.list_events(fine_tuning_job_id=job_id, limit=10).data
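
Rather than re-running the cells above by hand, the job can be polled until it finishes. This is a sketch under a few assumptions: `wait_for_job` is a hypothetical helper, the terminal status names (`succeeded`, `failed`, `cancelled`) are taken from the job statuses the API reports, and the demo uses a fake client so that it runs without network access.

```python
import time
from types import SimpleNamespace

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_job(client, job_id, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll a fine-tuning job until it reaches a terminal status or max_polls is hit."""
    job = client.fine_tuning.jobs.retrieve(job_id)
    for _ in range(max_polls):
        if job.status in TERMINAL_STATUSES:
            break
        sleep(poll_seconds)
        job = client.fine_tuning.jobs.retrieve(job_id)
    return job

# Fake client for demonstration: reports queued, then running, then succeeded
_statuses = iter(["queued", "running", "succeeded"])
fake_client = SimpleNamespace(
    fine_tuning=SimpleNamespace(
        jobs=SimpleNamespace(
            retrieve=lambda job_id: SimpleNamespace(id=job_id, status=next(_statuses))
        )
    )
)

finished = wait_for_job(fake_client, "ftjob-demo", sleep=lambda seconds: None)
print(finished.status)
```

With the real client from this notebook, the call would be `wait_for_job(openai, job_id)`; `fine_tuned_model` is only populated once the job has succeeded.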

## **Step 3: Test Our Fine-Tuned Model**

In [None]:
fine_tuned_model_name = openai.fine_tuning.jobs.retrieve(job_id).fine_tuned_model

In [None]:
fine_tuned_model_name

In [None]:
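# Test-time prompt: same as training, except the assistant turn stops at "Price is $"
# so the model supplies the number itself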

def messages_for(item):
    system_message = "You estimate prices of items. Reply only with the price, no explanation"
    user_prompt = item.test_prompt().replace(" to the nearest dollar","").replace("\n\nPrice is $","")
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": "Price is $"}
    ]

In [None]:
# Try this out

messages_for(test[0])

#### **Extract Price from Text Output**  


In [None]:
def get_price(s):
    s = s.replace('$', '').replace(',', '')
    match = re.search(r"[-+]?\d*\.\d+|\d+", s)
    return float(match.group()) if match else 0

In [None]:
get_price("The price is roughly $99.99 because blah blah")
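
A few more inputs show how `get_price` behaves at the edges. The function is repeated here only so the sketch runs on its own; the sample strings are made up for illustration.

```python
import re

def get_price(s):
    # Strip currency symbols and thousands separators, then take the first number
    s = s.replace('$', '').replace(',', '')
    match = re.search(r"[-+]?\d*\.\d+|\d+", s)
    return float(match.group()) if match else 0

samples = [
    "Price is $1,299.00",     # thousands separator is removed before matching
    "Price is $45",           # integers are matched by the second alternative
    "2 items for $30",        # the FIRST number wins, which may not be the price
    "I cannot estimate this"  # no digits at all falls back to 0
]

for text in samples:
    print(repr(text), "->", get_price(text))
```

The third case is the one to watch: because the function returns the first number in the text, a reply that mentions a quantity before the price is parsed incorrectly. Keeping the model's replies in the strict `Price is $...` form avoids this.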

### **Testing the Fine-Tuned Model**  

In [None]:
def gpt_fine_tuned(item):
    response = openai.chat.completions.create(
        model=fine_tuned_model_name, 
        messages=messages_for(item),
        seed=42,
        max_tokens=7
    )
    reply = response.choices[0].message.content
    return get_price(reply)

#### **Run the Model on Test Data**  


In [None]:
print(test[0].price)  # Actual price
print(gpt_fine_tuned(test[0]))  # Predicted price

#### **Evaluate Model Performance**  


In [None]:
Tester.test(gpt_fine_tuned, test)


**Note:** None of the test items were used for fine-tuning, so this run measures how well the price estimates generalize to unseen products.
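`Tester` comes from the custom `utils.testing` module, so its internals are not shown here. As a rough, independent check, the mean absolute error can be computed directly from any predictor function. The sketch below is not a reimplementation of `Tester`; `mean_absolute_error` is a hypothetical helper, and the items and predictor are made-up stand-ins.

```python
from types import SimpleNamespace

def mean_absolute_error(predictor, items):
    """Average of |predicted - actual| in dollars over the given items."""
    if not items:
        raise ValueError("items must not be empty")
    errors = [abs(predictor(item) - item.price) for item in items]
    return sum(errors) / len(errors)

# Stand-in items with known prices, and a naive predictor that always answers 20
demo_items = [SimpleNamespace(price=p) for p in (10.0, 20.0, 40.0)]
always_twenty = lambda item: 20.0

print(mean_absolute_error(always_twenty, demo_items))
```

In the notebook, `mean_absolute_error(gpt_fine_tuned, test[:50])` would give a quick figure on a subset of the test data, at the cost of one API call per item.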
