[![Roboflow Notebooks](https://media.roboflow.com/notebooks/template/bannertest2-2.png?ik-sdk-version=javascript-1.4.3&updatedAt=1672932710194)](https://github.com/roboflow/notebooks)

# OpenAI GPT-4o fine-tuning
---

## Setup

### Configure your API keys

To fine-tune GPT-4o, you need to provide your OpenAI API key and Roboflow API key. Follow these steps:

- Open your [`OpenAI Settings`](https://platform.openai.com/settings) page. Click `User API keys`, then `Create new secret key` to generate a new key.
- Go to your [`Roboflow Settings`](https://app.roboflow.com/settings/api) page. Click `Copy`. This places your private key on the clipboard.
- In Colab, go to the left pane and click on `Secrets` (🔑).
    - Store your OpenAI API key under the name `OPENAI_API_KEY`.
    - Store your Roboflow API key under the name `ROBOFLOW_API_KEY`.
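
If you also run this notebook outside Colab, a small helper can read the same secret names from environment variables instead. The sketch below is an assumption, not part of the original notebook: `get_secret` is a hypothetical helper, and the fallback to `os.environ` is only one possible convention.

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from Colab's Secrets pane, or from the environment otherwise."""
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        userdata = None

    value = None
    if userdata is not None:
        try:
            value = userdata.get(name)
        except Exception:
            # Secret missing or notebook access not granted; fall back below.
            value = None

    value = value or os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set in Colab Secrets or the environment")
    return value
```

With this helper, `get_secret("OPENAI_API_KEY")` could stand in for `userdata.get("OPENAI_API_KEY")` in the cells that follow.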

## Install dependencies

In [18]:
!pip install -q openai roboflow maestro==0.2.0rc5


## Download dataset

In [2]:
from roboflow import Roboflow
from google.colab import userdata

ROBOFLOW_API_KEY = userdata.get('ROBOFLOW_API_KEY')
rf = Roboflow(api_key=ROBOFLOW_API_KEY)

workspace = rf.workspace("april-public-yibrz")
project = workspace.project("focal-length")
version = project.version(1)
dataset = version.download("openai")

loading Roboflow workspace...
loading Roboflow project...
Downloading Dataset Version Zip in Focal-Length-1 to openai:: 100%|██████████| 4/4 [00:00<00:00, 2926.94it/s]
Extracting Dataset Version Zip to Focal-Length-1 in openai:: 100%|██████████| 5/5 [00:00<00:00, 1204.71it/s]


In [3]:
!head -n 5 {dataset.location}/_annotations.train.jsonl

{"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What focal length is this photo?"},{"role":"user","content":[{"type":"image_url","image_url":{"url":"https://transform.roboflow.com/SFgRaqEsIPfd7Vj37buG/018867065caf451d0098b749fae0f310/transformed.jpg"}}]},{"role":"assistant","content":"55.0mm"}]}
{"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What focal length is this photo?"},{"role":"user","content":[{"type":"image_url","image_url":{"url":"https://transform.roboflow.com/SFgRaqEsIPfd7Vj37buG/44561655d9c73836724db71e9639dc63/transformed.jpg"}}]},{"role":"assistant","content":"50.0mm"}]}
{"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What focal length is this photo?"},{"role":"user","content":[{"type":"image_url","image_url":{"url":"https://transform.roboflow.com/SFgRaqEsIPfd7Vj37buG/4b3887ae2840343d284e47a582a47f9d/transformed.jpg"}}]},{
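
Each line of the annotations file is one training example: a `messages` list that ends with the assistant's expected answer and contains an `image_url` part. Before uploading, it can be worth checking that every line follows this shape. The following is a minimal sketch under that assumption; `check_example` is a hypothetical helper and does not reproduce OpenAI's own server-side validation.

```python
import json


def check_example(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (an empty list means none found)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    if not all(isinstance(m, dict) for m in messages):
        return ["every message must be a JSON object"]

    problems = []
    # The last message holds the answer the model should learn to produce.
    if messages[-1].get("role") != "assistant":
        problems.append("last message is not the assistant answer")

    # Vision examples carry the image as a content part of type 'image_url'.
    has_image = any(
        isinstance(m.get("content"), list)
        and any(isinstance(p, dict) and p.get("type") == "image_url" for p in m["content"])
        for m in messages
    )
    if not has_image:
        problems.append("no image_url part found")
    return problems
```

Running it over every line of `_annotations.train.jsonl` and printing the line numbers with a non-empty result gives a quick sanity check before starting a job.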

## Run GPT-4o fine-tuning

**NOTE:** At the time of publishing this notebook, only the `gpt-4o-2024-08-06` model can be fine-tuned with vision datasets.

In [9]:
# @title Initiate OpenAI client

from openai import OpenAI
from google.colab import userdata

OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

client = OpenAI(api_key=OPENAI_API_KEY)

In [14]:
# @title Upload a training and validation file

training_file_upload_response = client.files.create(
  file=open(f"{dataset.location}/_annotations.train.jsonl", "rb"),
  purpose="fine-tune"
)

validation_file_upload_response = client.files.create(
  file=open(f"{dataset.location}/_annotations.valid.jsonl", "rb"),
  purpose="fine-tune"
)

print("training file response:", training_file_upload_response)
print("validation file response:", validation_file_upload_response)

training file response: FileObject(id='file-ByuHoRS2fs7TM1TQEsQJft05', bytes=12146, created_at=1727882579, filename='_annotations.train.jsonl', object='file', purpose='fine-tune', status='processed', status_details=None)
validation file response: FileObject(id='file-n2qoSkM5FA1AvxYINDJnKXmx', bytes=3471, created_at=1727882579, filename='_annotations.valid.jsonl', object='file', purpose='fine-tune', status='processed', status_details=None)
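
The upload cell passes `open(...)` directly to `client.files.create`, which leaves the file handles open until they are garbage-collected. One possible variant, sketched here with a hypothetical `upload_for_fine_tuning` helper, wraps the call in a context manager so each handle is closed as soon as the upload returns. It uses the same `client.files.create(file=..., purpose="fine-tune")` call as the cell above.

```python
from pathlib import Path


def upload_for_fine_tuning(client, path: str):
    """Upload one JSONL file with purpose="fine-tune" and close the handle afterwards."""
    with Path(path).open("rb") as handle:
        # Same call as in the notebook cell; only the file handling differs.
        return client.files.create(file=handle, purpose="fine-tune")
```

Usage would look like `upload_for_fine_tuning(client, f"{dataset.location}/_annotations.train.jsonl")`, and likewise for the validation file.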


In [16]:
# @title Create a fine-tuned model

import re

def process_suffix(text: str) -> str:
    """
    Converts a string into kebab-case, where spaces are replaced with hyphens
    and all letters are lowercase.

    Args:
        text (str): The input string to be converted. Typically, words are
          separated by spaces.

    Returns:
        str: The kebab-case version of the input string, where spaces are
          replaced by hyphens and the text is lowercase.

    Example:
        >>> process_suffix("Focal Length")
        'focal-length'
    """
    return re.sub(r'\s+', '-', text.strip()).lower()


fine_tuning_response = client.fine_tuning.jobs.create(
    training_file=training_file_upload_response.id,
    validation_file=validation_file_upload_response.id,
    suffix=process_suffix(dataset.name),
    model="gpt-4o-2024-08-06"
)

fine_tuning_response

FineTuningJob(id='ftjob-JCGY6KZqPitJwXD5rPwsu7hb', created_at=1727882809, error=Error(code=None, message=None, param=None), fine_tuned_model=None, finished_at=None, hyperparameters=Hyperparameters(n_epochs='auto', batch_size='auto', learning_rate_multiplier='auto'), model='gpt-4o-2024-08-06', object='fine_tuning.job', organization_id='org-sLGE3gXNesVjtWzgho17NkRy', result_files=[], seed=1660100124, status='validating_files', trained_tokens=None, training_file='file-ByuHoRS2fs7TM1TQEsQJft05', validation_file='file-n2qoSkM5FA1AvxYINDJnKXmx', estimated_finish=None, integrations=[], user_provided_suffix='focal-length')

⚠️ A fine-tuning job may take some time to complete. It may be queued behind other jobs in OpenAI's system, and training can take minutes or hours depending on the model and dataset size. Once training completes, the user who created the fine-tuning job receives an email confirmation.

In addition to creating a fine-tuning job, you can also list existing jobs, retrieve the status of a job, or cancel a job.
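
Rather than re-running the status cell by hand, you could poll the job until it reaches a terminal status. The sketch below is illustrative only: `wait_for_job`, the 60-second interval, and the `max_polls` cap are assumptions, while `client.fine_tuning.jobs.retrieve` is the same call used in this notebook.

```python
import time

# Statuses after which a fine-tuning job no longer changes.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(client, job_id: str, poll_seconds: float = 60.0, max_polls: int = 240):
    """Poll a fine-tuning job until it reaches a terminal status or max_polls is hit."""
    for _ in range(max_polls):
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

For the other operations mentioned above, the OpenAI Python client exposes `client.fine_tuning.jobs.list(limit=10)` to list recent jobs and `client.fine_tuning.jobs.cancel(job_id)` to cancel one.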

In [25]:
# @title Check training job status

status_response = client.fine_tuning.jobs.retrieve(fine_tuning_response.id)

status_response

FineTuningJob(id='ftjob-JCGY6KZqPitJwXD5rPwsu7hb', created_at=1727882809, error=Error(code=None, message=None, param=None), fine_tuned_model='ft:gpt-4o-2024-08-06:personal:focal-length:ADvvXOAF', finished_at=1727884296, hyperparameters=Hyperparameters(n_epochs=4, batch_size=1, learning_rate_multiplier=2), model='gpt-4o-2024-08-06', object='fine_tuning.job', organization_id='org-sLGE3gXNesVjtWzgho17NkRy', result_files=['file-gKmIB1gmlxjQsDSHtGoeSkig'], seed=1660100124, status='succeeded', trained_tokens=101972, training_file='file-ByuHoRS2fs7TM1TQEsQJft05', validation_file='file-n2qoSkM5FA1AvxYINDJnKXmx', estimated_finish=None, integrations=[], user_provided_suffix='focal-length')

**NOTE:** When the training status changes to `succeeded`, the model is ready to use.

In [26]:
# @title Use a fine-tuned model

import random
from torch.utils.data import Dataset
from maestro.trainer.common.utils.file_system import read_jsonl

class JSONLDataset(Dataset):
    @classmethod
    def from_jsonl_file(cls, path: str):
        file_content = read_jsonl(path=path)
        random.shuffle(file_content)
        return cls(jsons=file_content)

    def __init__(self, jsons: list[dict]) -> None:
        self.jsons = jsons

    def __getitem__(self, index):
        return self.jsons[index]

    def __len__(self) -> int:
        return len(self.jsons)

    def shuffle(self) -> None:
        random.shuffle(self.jsons)


test_dataset = JSONLDataset.from_jsonl_file(f"{dataset.location}/_annotations.test.jsonl")

In [30]:
test_dataset[0]['messages']

[{'role': 'system', 'content': 'You are a helpful assistant.'},
 {'role': 'user', 'content': 'What focal length is this photo?'},
 {'role': 'user',
  'content': [{'type': 'image_url',
    'image_url': {'url': 'https://transform.roboflow.com/SFgRaqEsIPfd7Vj37buG/cb5a2dc8fe341a5360aea91ea00bdd15/transformed.jpg'}}]},
 {'role': 'assistant', 'content': '135.0mm'}]

**NOTE:** When querying the model, we need to remove the last element of the messages list, which contains the expected model response.

In [31]:
completion = client.chat.completions.create(
  model=status_response.fine_tuned_model,
  messages=test_dataset[0]['messages'][:-1]
)
print(completion.choices[0].message)

ChatCompletionMessage(content='35.0mm', refusal=None, role='assistant', function_call=None, tool_calls=None)


In [36]:
# @title Evaluate fine-tuned model

from maestro.trainer.common.utils.metrics import WordErrorRateMetric, CharacterErrorRateMetric

targets = []
predictions = []

for i in range(len(test_dataset)):
    messages = test_dataset[i]['messages'][:-1]
    target = test_dataset[i]['messages'][-1]['content']

    completion = client.chat.completions.create(
        model=status_response.fine_tuned_model,
        messages=messages
    )
    prediction = completion.choices[0].message.content

    targets.append(target)
    predictions.append(prediction)

wer = WordErrorRateMetric().compute(targets=targets, predictions=predictions)
cer = CharacterErrorRateMetric().compute(targets=targets, predictions=predictions)

print(f"WER: {wer}")
print(f"CER: {cer}")

WER: {'wer': 1.0}
CER: {'cer': 0.319047619047619}


In [37]:
for target, prediction in zip(targets, predictions):
    print(f"Target: {target}")
    print(f"Prediction: {prediction}")
    print()

Target: 135.0mm
Prediction: 87.0mm

Target: 45.0mm
Prediction: 50.0mm

Target: 56.0mm
Prediction: 50.0mm

Target: 85.0mm
Prediction: 66.0mm

Target: 50.0mm
Prediction: 35.0mm
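
Because every target is a single token such as `135.0mm`, the word error rate is 1.0 whenever no prediction matches exactly, which is the case for all five examples here. For a numeric quantity like focal length, the distance in millimetres may be a more informative measure. The sketch below is an addition, not part of the original evaluation: `parse_mm` and `mean_absolute_error_mm` are hypothetical helpers, and the demo values are copied from the output above.

```python
import re
from typing import Optional


def parse_mm(text: str) -> Optional[float]:
    """Extract the first number from strings such as '135.0mm'; None if absent."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def mean_absolute_error_mm(targets: list[str], predictions: list[str]) -> float:
    """Average |target - prediction| in millimetres over pairs where both values parse."""
    errors = []
    for target, prediction in zip(targets, predictions):
        t, p = parse_mm(target), parse_mm(prediction)
        if t is not None and p is not None:
            errors.append(abs(t - p))
    if not errors:
        raise ValueError("no parsable target/prediction pairs")
    return sum(errors) / len(errors)


# Values copied from the target/prediction listing above.
demo_targets = ["135.0mm", "45.0mm", "56.0mm", "85.0mm", "50.0mm"]
demo_predictions = ["87.0mm", "50.0mm", "50.0mm", "66.0mm", "35.0mm"]
print(f"MAE: {mean_absolute_error_mm(demo_targets, demo_predictions):.1f} mm")  # MAE: 18.6 mm
```

Pairs where either string contains no number are skipped, so a malformed model response does not raise an error; you may prefer to count such cases separately.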

