## 1.0 Introduction

In this notebook we will perform all steps required to finetune a GPT-3.5 Turbo model on our own custom datasets.

The training and validation CSV files are the same ones that were used in the notebook 'Transformer_Model_Training_And_Validation.ipynb', where we trained multi-lingual DistilBert, Bert and DeBERTa V3 models.

After the GPT-3.5 Turbo model is fine-tuned and validated we can compare the performance on the validation set across the various models.

### 1.1 Updated in latest version

In this latest version (December 5th, 2023) of this notebook I have made the following updates:
* Updated to the latest version of OpenAI (1.3.7) and modified the API calls accordingly.
* Changed the model to the latest version: "gpt-3.5-turbo-1106".

In [1]:
# Import Modules
import json
import os
from io import StringIO
import numpy as np
import pandas as pd

# OpenAI
from openai import OpenAI
import tiktoken # for token counting
from collections import defaultdict

## 2.0 Load Datasets

We will load the training and validation CSV files that were generated earlier with the notebook 'Prepare_Train_and_Validation_Datasets.ipynb'.

In [2]:
# Load Datasets
train_df = pd.read_csv('./data/train_df.csv')
val_df = pd.read_csv('./data/val_df.csv')

# Summary
print(train_df.shape)
print(val_df.shape)

(3069, 11)
(1559, 11)


Let's review a small subset of the training data...

In [3]:
# Summary
train_df.head()

Unnamed: 0,id,title,text,mainSection,published_at,publisher,partisan,url,text_wordcount,max_words_text,labels
0,10706318,Ogen als schoteltjes bij de Tachtigjarige Oorlog,Ogen als schoteltjes bij de Tachtigjarige Oorl...,/home,2018-10-07,trouw,True,www.trouw.nl/home/ogen-als-schoteltjes-bij-de-...,539,Ogen als schoteltjes bij de Tachtigjarige Oorl...,1
1,12633805,"Geen beeld, maar een monument voor Mandela in ...","Geen beeld, maar een monument voor Mandela in ...",/amsterdam,2019-05-10,parool,True,www.parool.nl/amsterdam/geen-beeld-maar-een-mo...,662,"Geen beeld, maar een monument voor Mandela in ...",1
2,7140125,Hoe ga je een onveilige arbeidscultuur zoals i...,Hoe ga je een onveilige arbeidscultuur zoals i...,/,2017-04-18,trouw,True,,494,Hoe ga je een onveilige arbeidscultuur zoals i...,1
3,4490774,Wetenschappers ontdekken lichtgevende discokikker,Wetenschappers ontdekken lichtgevende discokik...,/,2017-03-14,trouw,True,,291,Wetenschappers ontdekken lichtgevende discokik...,1
4,10592180,Meer fouten kabinet bij steun aan strijdgroepe...,Meer fouten kabinet bij steun aan strijdgroepe...,/home,2018-09-11,trouw,True,www.trouw.nl/home/meer-fouten-kabinet-bij-steu...,471,Meer fouten kabinet bij steun aan strijdgroepe...,1


...and also the validation data...

In [4]:
# Summary
val_df.head()

Unnamed: 0,id,title,text,mainSection,published_at,publisher,partisan,url,text_wordcount,max_words_text,labels
0,9266995,Verdachte dodelijke steekpartijen Maastricht l...,Verdachte dodelijke steekpartijen Maastricht l...,/nieuws,2017-12-18,ad,False,www.ad.nl/binnenland/verdachte-dodelijke-steek...,188,Verdachte dodelijke steekpartijen Maastricht l...,0
1,4130077,Honderden arrestaties bij acties tegen mensen ...,Honderden arrestaties bij acties tegen mensen ...,/nieuws,2017-02-11,ad,False,www.ad.nl/buitenland/honderden-arrestaties-bij...,122,Honderden arrestaties bij acties tegen mensen ...,0
2,11147268,Waarom de 'oudejaarsbonus' voor de jongeren va...,Waarom de 'oudejaarsbonus' voor de jongeren va...,/home,2019-01-20,trouw,True,www.trouw.nl/home/waarom-de-oudejaarsbonus-voo...,262,Waarom de 'oudejaarsbonus' voor de jongeren va...,1
3,10749100,Klaar voor de verdediging,Klaar voor de verdedigingOver ruim een week be...,/nieuws,2018-10-16,ad,False,www.ad.nl/binnenland/klaar-voor-de-verdediging...,411,Klaar voor de verdedigingOver ruim een week be...,0
4,10700707,Windvlaag grijpt springmatras en doodt 2-jarig...,Windvlaag grijpt springmatras en doodt 2-jarig...,/nieuws,2018-10-05,ad,False,www.ad.nl/buitenland/windvlaag-grijpt-springma...,286,Windvlaag grijpt springmatras en doodt 2-jarig...,0


## 3.0 Finetuning OpenAI GPT-3.5 Model

In this section we will process and upload the files for training and validation to OpenAI.

After the files are uploaded we can create a fine-tuning job on OpenAI.

### 3.1 Settings

In [5]:
# Constants
MAX_WORDS = 192

# OpenAI API Key
client = OpenAI(api_key = os.environ["OPENAI_API_KEY"])

### 3.2 Create and Validate OpenAI Files

Part of creating the required files is engineering a prompt that matches the task we want the fine-tuned model to perform.

In the earlier notebook we trained the classical Transformer models to classify the input text as either partisan or neutral.

With our prompt we want to achieve the same. As part of the prompt's system message we tell GPT-3.5 that it is a newspaper editor and that it needs to classify each newspaper article as being partisan or neutral. The Dutch system message translates roughly to: "You are an editor at a newspaper. You assess whether a newspaper article is political or neutral. Below is the text of the newspaper article."

The news article is then added as the user message of the prompt.

In [6]:
def create_prompt(item_text, item_label = None, inference = False):
    if inference:
        # Base Prompt
        prompt_text = [{"role": "system", "content": "Je bent redacteur bij een krant. Je beoordeeld een krantenartikel of het politiek of neutraal is. Hieronder staat de tekst van het krantenartikel."}, {"role": "user", "content": ""}]

        # Set Text and Label
        prompt_text[1]['content'] = '### Tekst:\n' + item_text
    else:   
        # Base Prompt
        prompt_text = {"messages": [{"role": "system", "content": "Je bent redacteur bij een krant. Je beoordeeld een krantenartikel of het politiek of neutraal is. Hieronder staat de tekst van het krantenartikel."}, {"role": "user", "content": ""}, {"role": "assistant", "content": ""}]}

        # Set Text and Label
        prompt_text['messages'][1]['content'] = '### Tekst:\n' + item_text
        prompt_text['messages'][2]['content'] = item_label
    
    return prompt_text
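
To make the two output shapes concrete, here is a minimal sketch that restates 'create_prompt' in condensed form and calls it in both modes. The constant `SYSTEM_PROMPT` and the example article text are introduced only for this illustration; the behaviour is intended to match the function above, not to replace it.

```python
# Condensed restatement of create_prompt; SYSTEM_PROMPT is a name introduced for this sketch
SYSTEM_PROMPT = ("Je bent redacteur bij een krant. Je beoordeeld een krantenartikel "
                 "of het politiek of neutraal is. Hieronder staat de tekst van het krantenartikel.")

def create_prompt(item_text, item_label=None, inference=False):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "### Tekst:\n" + item_text},
    ]
    if inference:
        # Inference: a plain list of messages, without an assistant answer
        return messages
    # Training: wrap in {"messages": [...]} and append the label as the assistant message
    messages.append({"role": "assistant", "content": item_label})
    return {"messages": messages}

# Made-up article text, for illustration only
train_record = create_prompt("Voorbeeldtekst van een artikel.", "Politiek")
inference_messages = create_prompt("Voorbeeldtekst van een artikel.", inference=True)

print([m["role"] for m in train_record["messages"]])
print([m["role"] for m in inference_messages])
```

The training form carries three messages (system, user, assistant), while the inference form stops after the user message so that the model supplies the label itself.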

The function 'create_openai_file' creates the JSONL files for OpenAI based on the input Pandas DataFrame. For each row a prompt is generated and the 'label' for fine-tuning is set as the assistant message.

As the dataset is in the Dutch language the labels are specified as either 'Politiek' (roughly equals partisan) or 'Neutraal' (neutral).

Note that we use the 'max_words_text' column to make sure that each prompt uses the same text as was used when training the Multi-lingual DistilBert and Bert models.

In [7]:
def create_openai_file(df, file_name):
    # Create Train JSON File
    jsonl_file = []

    # Loop through rows in Pandas DataFrame
    for index, row in df.iterrows():
        text = row['max_words_text']
        partisan = row['partisan']

        if partisan == True:
            label = 'Politiek'
        else:
            label = 'Neutraal'

        jsonl_file.append(create_prompt(text, label))

    # Save to File
    with open(file_name, 'w', encoding='utf-8') as out_file:
        for item in jsonl_file:        
            out_file.write(json.dumps(item) + '\n')

    # Summary
    print(f'\n======== {file_name}')
    print(f'Length Messages: {len(jsonl_file)}')
    print(jsonl_file[:1])
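
A quick way to sanity-check a generated file is to read it back and count the labels. The sketch below is one possible check, not part of the original notebook: the helper name `count_labels` and the two tiny records are made up for illustration, and it assumes the label is always the last message of each record.

```python
import json
import os
import tempfile
from collections import Counter

def count_labels(jsonl_path):
    """Count the assistant labels found in a chat fine-tuning JSONL file."""
    counts = Counter()
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            # Assumption: the label is the content of the last (assistant) message
            counts[record["messages"][-1]["content"]] += 1
    return counts

# Two made-up records in the same shape as the generated training data
sample = [
    {"messages": [{"role": "system", "content": "..."},
                  {"role": "user", "content": "### Tekst:\nA"},
                  {"role": "assistant", "content": "Politiek"}]},
    {"messages": [{"role": "system", "content": "..."},
                  {"role": "user", "content": "### Tekst:\nB"},
                  {"role": "assistant", "content": "Neutraal"}]},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        for item in sample:
            f.write(json.dumps(item) + "\n")
    label_counts = count_labels(path)

print(label_counts)
```

Run against the real training file, the counts should match the distribution of the 'partisan' column in the DataFrame.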

The code to validate the OpenAI files is based on and combined from the OpenAI Cookbook: https://cookbook.openai.com/examples/chat_finetuning_data_prep

Only minor updates have been made for this specific example.

In [8]:
# Validate OpenAI file
def validate_openai_file(data_path):
    print('\n======= OpenAI Validation')
    
    # Load the dataset
    with open(data_path, 'r', encoding='utf-8') as f:
        dataset = [json.loads(line) for line in f]

    # Initial dataset stats
    print("Num examples:", len(dataset))
    print("First example:")
    for message in dataset[0]["messages"]:
        print(message)

    # Format error checks
    format_errors = defaultdict(int)

    for ex in dataset:
        if not isinstance(ex, dict):
            format_errors["data_type"] += 1
            continue
            
        messages = ex.get("messages", None)
        if not messages:
            format_errors["missing_messages_list"] += 1
            continue
            
        for message in messages:
            if "role" not in message or "content" not in message:
                format_errors["message_missing_key"] += 1
            
            if any(k not in ("role", "content", "name", "function_call") for k in message):
                format_errors["message_unrecognized_key"] += 1
            
            if message.get("role", None) not in ("system", "user", "assistant", "function"):
                format_errors["unrecognized_role"] += 1
                
            content = message.get("content", None)
            function_call = message.get("function_call", None)
            
            if (not content and not function_call) or not isinstance(content, str):
                format_errors["missing_content"] += 1
        
        if not any(message.get("role", None) == "assistant" for message in messages):
            format_errors["example_missing_assistant_message"] += 1

    if format_errors:
        print("Found errors:")
        for k, v in format_errors.items():
            print(f"{k}: {v}")
    else:
        print("No errors found")
        
    encoding = tiktoken.get_encoding("cl100k_base")

    # not exact!
    # simplified from https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
    def num_tokens_from_messages(messages, tokens_per_message=3, tokens_per_name=1):
        num_tokens = 0
        for message in messages:
            num_tokens += tokens_per_message
            for key, value in message.items():
                num_tokens += len(encoding.encode(value))
                if key == "name":
                    num_tokens += tokens_per_name
        num_tokens += 3
        return num_tokens

    def num_assistant_tokens_from_messages(messages):
        num_tokens = 0
        for message in messages:
            if message["role"] == "assistant":
                num_tokens += len(encoding.encode(message["content"]))
        return num_tokens

    def print_distribution(values, name):
        print(f"\n#### Distribution of {name}:")
        print(f"min / max: {min(values)}, {max(values)}")
        print(f"mean / median: {np.mean(values)}, {np.median(values)}")
        print(f"p10 / p90: {np.quantile(values, 0.1)}, {np.quantile(values, 0.9)}")
        
    # Warnings and tokens counts
    n_missing_system = 0
    n_missing_user = 0
    n_messages = []
    convo_lens = []
    assistant_message_lens = []

    for ex in dataset:
        messages = ex["messages"]
        if not any(message["role"] == "system" for message in messages):
            n_missing_system += 1
        if not any(message["role"] == "user" for message in messages):
            n_missing_user += 1
        n_messages.append(len(messages))
        convo_lens.append(num_tokens_from_messages(messages))
        assistant_message_lens.append(num_assistant_tokens_from_messages(messages))
        
    print("Num examples missing system message:", n_missing_system)
    print("Num examples missing user message:", n_missing_user)
    print_distribution(n_messages, "num_messages_per_example")
    print_distribution(convo_lens, "num_total_tokens_per_example")
    print_distribution(assistant_message_lens, "num_assistant_tokens_per_example")
    n_too_long = sum(l > 4096 for l in convo_lens)
    print(f"\n{n_too_long} examples may be over the 4096 token limit, they will be truncated during fine-tuning")

    # Pricing and default n_epochs estimate
    MAX_TOKENS_PER_EXAMPLE = 4096

    TARGET_EPOCHS = 1
    MIN_TARGET_EXAMPLES = 100
    MAX_TARGET_EXAMPLES = 25000
    MIN_DEFAULT_EPOCHS = 1
    MAX_DEFAULT_EPOCHS = 2

    n_epochs = TARGET_EPOCHS
    n_train_examples = len(dataset)
    if n_train_examples * TARGET_EPOCHS < MIN_TARGET_EXAMPLES:
        n_epochs = min(MAX_DEFAULT_EPOCHS, MIN_TARGET_EXAMPLES // n_train_examples)
    elif n_train_examples * TARGET_EPOCHS > MAX_TARGET_EXAMPLES:
        n_epochs = max(MIN_DEFAULT_EPOCHS, MAX_TARGET_EXAMPLES // n_train_examples)

    n_billing_tokens_in_dataset = sum(min(MAX_TOKENS_PER_EXAMPLE, length) for length in convo_lens)
    print(f"Dataset has ~{n_billing_tokens_in_dataset} tokens that will be charged for during training")
    print(f"By default, you'll train for {n_epochs} epochs on this dataset")
    print(f"By default, you'll be charged for ~{n_epochs * n_billing_tokens_in_dataset} tokens")
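
The epoch estimate at the end of the validation function is easier to follow in isolation. The sketch below lifts that logic into a standalone helper, using the same constants as the notebook as defaults. The helper name `estimate_default_epochs` is introduced here for illustration, and the result is only the notebook's own estimate, not necessarily what OpenAI chooses for 'auto'.

```python
def estimate_default_epochs(n_train_examples,
                            target_epochs=1,
                            min_target_examples=100,
                            max_target_examples=25000,
                            min_default_epochs=1,
                            max_default_epochs=2):
    """Mirror the epoch estimate from validate_openai_file (assumes n_train_examples > 0)."""
    n_epochs = target_epochs
    if n_train_examples * target_epochs < min_target_examples:
        # Very small dataset: repeat it, capped at max_default_epochs
        n_epochs = min(max_default_epochs, min_target_examples // n_train_examples)
    elif n_train_examples * target_epochs > max_target_examples:
        # Very large dataset: reduce the epochs, but never below min_default_epochs
        n_epochs = max(min_default_epochs, max_target_examples // n_train_examples)
    return n_epochs

for n in (40, 3069, 30000):
    print(n, estimate_default_epochs(n))
```

With 3069 training examples the dataset falls between the two thresholds, so the estimate stays at the target of 1 epoch.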

First the training file for OpenAI fine-tuning is created locally and validated with the code from the OpenAI Cookbook.

In [9]:
# Constants
OPENAI_TRAIN_FILE_NAME = f'train_{MAX_WORDS}_v1'

# Create and Validate OpenAI Files
create_openai_file(train_df, f'./data/{OPENAI_TRAIN_FILE_NAME}.jsonl')

# Validate Training file
validate_openai_file(f'./data/{OPENAI_TRAIN_FILE_NAME}.jsonl')


Length Messages: 3069
[{'messages': [{'role': 'system', 'content': 'Je bent redacteur bij een krant. Je beoordeeld een krantenartikel of het politiek of neutraal is. Hieronder staat de tekst van het krantenartikel.'}, {'role': 'user', 'content': "### Tekst:\nOgen als schoteltjes bij de Tachtigjarige Oorlog Het was mijn favoriete oorlog op de basisschool. Tachtig jaar vechten? Onvoorstelbaar. Dat we met Willem van Oranje wonnen van die in de waterlinies verzuipende Spanjaarden, en dat die wrede Alva het nakijken had, was helemaal mooi. Zielig dat Van Oranje dood moest. Maar ja, ons land was nu wel ontstaan.Zo ongeveer herinner ik me het. En nu zat ik zondagmiddag met mijn eigen kinderen te kijken naar de NTR-jeugdserie 'Welkom in de 80-jarige Oorlog', 450 jaar na de eerste gevechten in 1568. Ik was benieuwd welke geschiedversie zij te zien zouden krijgen.In de volwassenenversie '80 Jaar Oorlog' legt Hans Goedkoop een bom onder het beeld van goede protestanten tegen slechte katholieken 

Next the training file should be uploaded to OpenAI.

In [10]:
# Upload Training file to OpenAI
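# Open the file in a context manager so the handle is closed after the upload
with open(f'./data/{OPENAI_TRAIN_FILE_NAME}.jsonl', 'rb') as train_file:
    ft_train_file = client.files.create(file = train_file, purpose = 'fine-tune')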

# Summary
print(ft_train_file.model_dump_json(indent = 2))

{
  "id": "file-X6xEFm1yqZq9uBi8SBJwiMQj",
  "bytes": 4462087,
  "created_at": 1701804524,
  "filename": "train_192_v1.jsonl",
  "object": "file",
  "purpose": "fine-tune",
  "status": "processed",
  "status_details": null
}


Next the validation file for OpenAI fine-tuning is created locally and validated with the code from the OpenAI Cookbook.

In [11]:
# Constants
OPENAI_VALIDATION_FILE_NAME = f'validation_{MAX_WORDS}_v1'

# Create and Validate OpenAI Files
create_openai_file(val_df, f'./data/{OPENAI_VALIDATION_FILE_NAME}.jsonl')

# Validate Validation file
validate_openai_file(f'./data/{OPENAI_VALIDATION_FILE_NAME}.jsonl')


Length Messages: 1559
[{'messages': [{'role': 'system', 'content': 'Je bent redacteur bij een krant. Je beoordeeld een krantenartikel of het politiek of neutraal is. Hieronder staat de tekst van het krantenartikel.'}, {'role': 'user', 'content': '### Tekst:\nVerdachte dodelijke steekpartijen Maastricht langer vastDe 37-jarige man die ervan wordt verdacht afgelopen donderdag twee mensen in Maastricht te hebben doodgestoken, blijft nog twee weken langer vastzitten. Dat heeft de rechter-commissaris vandaag beslist. De man zit in beperkingen en mag dus alleen met zijn advocaat contact hebben.Bij de steekpartijen werden een 46-jarige man bij zijn woning in Botsaartstraat gedood. Ook werd een 56-jarige vrouw in de Joseph Postmesstraat omgebracht. Daar troffen de hulpverleners nog twee gewonden aan: haar 21-jarige dochter en een 50-jarige buurtbewoner die te hulp was geschoten.Korte tijd later bleek dat geen van de aanwezigen in het Maastrichtse wijkcentrum door had dat de dader, met een beb

The validation file should also be uploaded to OpenAI.

In [12]:
# Upload Validation file to OpenAI
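# Open the file in a context manager so the handle is closed after the upload
with open(f'./data/{OPENAI_VALIDATION_FILE_NAME}.jsonl', 'rb') as validation_file:
    ft_validation_file = client.files.create(file = validation_file, purpose = 'fine-tune')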

# Summary
print(ft_validation_file.model_dump_json(indent = 2))

{
  "id": "file-IQPfzK3lqSpNAuy4w8cVFjHA",
  "bytes": 2278942,
  "created_at": 1701804554,
  "filename": "validation_192_v1.jsonl",
  "object": "file",
  "purpose": "fine-tune",
  "status": "processed",
  "status_details": null
}


We do a final verification to look at all the files present in OpenAI.

In [13]:
# List Files
openai_files = client.files.list()
print(openai_files.model_dump_json(indent = 2))

{
  "data": [
    {
      "id": "file-IQPfzK3lqSpNAuy4w8cVFjHA",
      "bytes": 2278942,
      "created_at": 1701804554,
      "filename": "validation_192_v1.jsonl",
      "object": "file",
      "purpose": "fine-tune",
      "status": "processed",
      "status_details": null
    },
    {
      "id": "file-X6xEFm1yqZq9uBi8SBJwiMQj",
      "bytes": 4462087,
      "created_at": 1701804524,
      "filename": "train_192_v1.jsonl",
      "object": "file",
      "purpose": "fine-tune",
      "status": "processed",
      "status_details": null
    }
  ],
  "object": "list",
  "has_more": false
}


The 2 files are uploaded to OpenAI and can be used in the next section to create a fine-tuning job on OpenAI.

### 3.3 Create OpenAI fine-tuning job

In this section we create a fine-tuning job by specifying the training and validation file IDs, the specific OpenAI model we want to fine-tune and the number of epochs.

For the number of epochs you can leave it at the default setting of 'auto' or specify it by setting an integer value. With 'auto' OpenAI will determine the number of epochs to use.

I personally always use 1 or occasionally 2 epochs. With a good dataset that is usually more than enough to get good quality out of a fine-tuned GPT-3.5 model. More epochs might lead to a small increase in model quality, but the largest impact is likely to be on your credit card bill for OpenAI ;-)

I will leave the batch size and learning rate multiplier at their default setting of 'auto'.

In [14]:
# Create finetuned model
fine_tune_job = client.fine_tuning.jobs.create(training_file = ft_train_file.id,
                                               validation_file = ft_validation_file.id,
                                               model = "gpt-3.5-turbo-1106",
                                               hyperparameters = {"n_epochs": 1,
                                                                  "batch_size": 'auto', 
                                                                  "learning_rate_multiplier": 'auto'})

# Summary
print(fine_tune_job.model_dump_json(indent = 2))

# Get FineTuningJob Id
ft_job_id = fine_tune_job.id

# Summary
print(f'\nFineTuneJob ID: {ft_job_id}')

{
  "id": "ftjob-fHJTmhzvk4XR1V0SvDtz41bC",
  "created_at": 1701804760,
  "error": null,
  "fine_tuned_model": null,
  "finished_at": null,
  "hyperparameters": {
    "n_epochs": 1,
    "batch_size": "auto",
    "learning_rate_multiplier": "auto"
  },
  "model": "gpt-3.5-turbo-1106",
  "object": "fine_tuning.job",
  "organization_id": "org-L65zldBJfoBsfAyAAE4pGgEt",
  "result_files": [],
  "status": "validating_files",
  "trained_tokens": null,
  "training_file": "file-X6xEFm1yqZq9uBi8SBJwiMQj",
  "validation_file": "file-IQPfzK3lqSpNAuy4w8cVFjHA"
}

FineTuneJob ID: ftjob-fHJTmhzvk4XR1V0SvDtz41bC


After creating the fine-tuning job we can use its ID to retrieve the status of the fine-tuning job.

In [16]:
# Retrieve the state of a fine-tune
ft_job = client.fine_tuning.jobs.retrieve(ft_job_id)

# Summary
print(ft_job.model_dump_json(indent = 2))

{
  "id": "ftjob-fHJTmhzvk4XR1V0SvDtz41bC",
  "created_at": 1701804760,
  "error": null,
  "fine_tuned_model": null,
  "finished_at": null,
  "hyperparameters": {
    "n_epochs": 1,
    "batch_size": 2,
    "learning_rate_multiplier": 2
  },
  "model": "gpt-3.5-turbo-1106",
  "object": "fine_tuning.job",
  "organization_id": "org-L65zldBJfoBsfAyAAE4pGgEt",
  "result_files": [],
  "status": "running",
  "trained_tokens": null,
  "training_file": "file-X6xEFm1yqZq9uBi8SBJwiMQj",
  "validation_file": "file-IQPfzK3lqSpNAuy4w8cVFjHA"
}
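
Instead of re-running the cell by hand, the status can be polled in a loop. The following is a minimal sketch under a few assumptions: the helper name `wait_for_job`, the polling interval and the poll limit are all made up for illustration, and "succeeded", "failed" and "cancelled" are treated as the terminal statuses. The retrieve function is passed in as an argument so the example runs without an API key.

```python
import time
from types import SimpleNamespace

# Assumption: these are the statuses after which a job no longer changes
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_job(retrieve_job, job_id, poll_seconds=60, max_polls=120, sleep=time.sleep):
    """Poll retrieve_job(job_id) until a terminal status or until max_polls is used up."""
    job = retrieve_job(job_id)
    for _ in range(max_polls):
        if job.status in TERMINAL_STATUSES:
            break
        sleep(poll_seconds)
        job = retrieve_job(job_id)
    return job

# Stand-in for the API call, for illustration only
_fake_statuses = iter(["validating_files", "running", "succeeded"])

def fake_retrieve(job_id):
    return SimpleNamespace(id=job_id, status=next(_fake_statuses))

final_job = wait_for_job(fake_retrieve, "ftjob-example", poll_seconds=0, sleep=lambda s: None)
print(final_job.status)
```

In the notebook this could be called as `wait_for_job(client.fine_tuning.jobs.retrieve, ft_job_id)`.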


When using the 'list_events' method we can get even more detailed information from the fine-tuning job while it is running.

It will provide basic information about the progress, training and validation loss and training and validation mean token accuracy.
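
Once the events have been fetched, the entries of type "metrics" can be flattened into simple rows for inspection. The sketch below works on plain dictionaries and is only one possible approach. The helper name `metrics_from_events` is made up, and it assumes the events are available as dicts, for example via `list_events.model_dump()["data"]`. The two sample events are copied from the output of this fine-tuning job.

```python
def metrics_from_events(events):
    """Keep only 'metrics' events and flatten their data into rows, ordered by step."""
    rows = []
    for event in events:
        if event.get("type") != "metrics":
            continue  # skip plain status messages
        data = event.get("data") or {}
        rows.append({
            "step": data.get("step"),
            "train_loss": data.get("train_loss"),
            "valid_loss": data.get("valid_loss"),
            "train_mean_token_accuracy": data.get("train_mean_token_accuracy"),
            "valid_mean_token_accuracy": data.get("valid_mean_token_accuracy"),
        })
    # Assumption: every metrics event carries a step number
    return sorted(rows, key=lambda row: row["step"])

sample_events = [
    {"level": "info", "message": "The job has successfully completed",
     "data": {}, "type": "message"},
    {"level": "info", "message": "Step 1501/1535: training loss=0.00",
     "data": {"step": 1501, "train_loss": 0.0, "valid_loss": 0.0,
              "train_mean_token_accuracy": 1.0, "valid_mean_token_accuracy": 0.0},
     "type": "metrics"},
]

metric_rows = metrics_from_events(sample_events)
print(metric_rows)
```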

In [19]:
# List up to N events from a fine-tuning job
list_events = client.fine_tuning.jobs.list_events(fine_tuning_job_id = ft_job_id, 
                                                  limit = 3)
print(list_events.model_dump_json(indent = 2))

{
  "data": [
    {
      "id": "ftevent-yJStnywH0oB9EaVDNWdjqNeP",
      "created_at": 1701808756,
      "level": "info",
      "message": "The job has successfully completed",
      "object": "fine_tuning.job.event",
      "data": {},
      "type": "message"
    },
    {
      "id": "ftevent-blnWcP6jj7oNX7rdgODUTdpH",
      "created_at": 1701808753,
      "level": "info",
      "message": "New fine-tuned model created: ft:gpt-3.5-turbo-1106:lumi-ml-consulting::8SWUDZg1",
      "object": "fine_tuning.job.event",
      "data": {},
      "type": "message"
    },
    {
      "id": "ftevent-nRsvsRpKoNPpG7bxTyz47egf",
      "created_at": 1701808677,
      "level": "info",
      "message": "Step 1501/1535: training loss=0.00",
      "object": "fine_tuning.job.event",
      "data": {
        "step": 1501,
        "train_loss": 0.0,
        "valid_loss": 0.0,
        "train_mean_token_accuracy": 1.0,
        "valid_mean_token_accuracy": 0.0
      },
      "type": "metrics"
    }
  ],
  "objec

Fine-tuning took a little over an hour. After fine-tuning is finished you will receive a confirmation email.

The cost of fine-tuning this model was 9.99 US dollars according to the Usage section of the OpenAI portal. The number of tokens trained on is 1249172.
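
As a rough cross-check of that bill, the cost can be recomputed from the number of trained tokens. The rate of 0.008 US dollars per 1,000 training tokens used below is an assumption on my part and is not stated anywhere in this notebook, so check the current pricing page before relying on it.

```python
TRAINED_TOKENS = 1_249_172          # trained tokens reported for this job
ASSUMED_USD_PER_1K_TOKENS = 0.008   # assumption: training rate, not taken from the notebook

estimated_cost = TRAINED_TOKENS / 1000 * ASSUMED_USD_PER_1K_TOKENS
print(f"Estimated training cost: ${estimated_cost:.2f}")
```

Under that assumed rate the estimate lands on the same 9.99 US dollars that the usage page reported.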

If we retrieve the FineTuningJob status again we can get the identifiers for the finetuned model and the finetune metrics results file.

In [20]:
# Retrieve the state of a fine-tune
ft_job = client.fine_tuning.jobs.retrieve(ft_job_id)

# Summary
print(ft_job.model_dump_json(indent = 2))

{
  "id": "ftjob-fHJTmhzvk4XR1V0SvDtz41bC",
  "created_at": 1701804760,
  "error": null,
  "fine_tuned_model": "ft:gpt-3.5-turbo-1106:lumi-ml-consulting::8SWUDZg1",
  "finished_at": 1701808752,
  "hyperparameters": {
    "n_epochs": 1,
    "batch_size": 2,
    "learning_rate_multiplier": 2
  },
  "model": "gpt-3.5-turbo-1106",
  "object": "fine_tuning.job",
  "organization_id": "org-L65zldBJfoBsfAyAAE4pGgEt",
  "result_files": [
    "file-QnwsTK2WBeXLdJfepRRRU7vA"
  ],
  "status": "succeeded",
  "trained_tokens": 1249172,
  "training_file": "file-X6xEFm1yqZq9uBi8SBJwiMQj",
  "validation_file": "file-IQPfzK3lqSpNAuy4w8cVFjHA"
}


In [21]:
# Get Results File ID from FinetuningJob
ft_file_results_id = ft_job.result_files
print(f'Metrics Result File ID: {ft_file_results_id}')

# Get FineTuned Model Identifier
ft_model_id = ft_job.fine_tuned_model
print(f'FineTuned Model Identifier: {ft_model_id}')

Metrics Result File ID: ['file-QnwsTK2WBeXLdJfepRRRU7vA']
FineTuned Model Identifier: ft:gpt-3.5-turbo-1106:lumi-ml-consulting::8SWUDZg1


We will use the identifier of the fine-tuned model in the next notebook, where we will further explore the validation process.
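
To give an idea of how that identifier is used, here is a minimal sketch of a single classification call through the chat completions endpoint. The helper name `classify_article` and the settings `temperature=0` and `max_tokens=5` are my own assumptions, and the next notebook may do this differently. A stand-in client is used so the example runs without an API key.

```python
from types import SimpleNamespace

def classify_article(client, model_id, messages):
    """Send an inference prompt to the fine-tuned model and return the predicted label."""
    response = client.chat.completions.create(
        model=model_id,
        messages=messages,
        temperature=0,   # assumption: deterministic output is preferred for classification
        max_tokens=5,    # assumption: the label is a single short word
    )
    return response.choices[0].message.content.strip()

# Stand-in client that always answers 'Politiek', for illustration only
def _fake_create(**kwargs):
    message = SimpleNamespace(content="Politiek")
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])

fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create)))

predicted = classify_article(
    fake_client,
    "ft:example-model",  # placeholder, not a real model identifier
    [{"role": "user", "content": "### Tekst:\nVoorbeeldtekst van een artikel."}],
)
print(predicted)
```

With the real client, the messages would come from `create_prompt(text, inference = True)` and the model from `ft_model_id`.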

The metrics results file, however, is very interesting as it shows detailed information about the model training and validation process.

In [22]:
# Get Metric Results File
finetune_metrics = client.files.content(ft_file_results_id[0])

# Show Finetune Metrics
metrics_df = pd.read_csv(StringIO(finetune_metrics.content.decode()))
metrics_df.head(11)

Unnamed: 0,step,train_loss,train_accuracy,valid_loss,valid_mean_token_accuracy
0,1,4.5667,0.55556,4.84392,0.2
1,2,3.60616,0.6,,
2,3,6.16454,0.5,,
3,4,4.067,0.55556,,
4,5,5.32834,0.55556,,
5,6,4.44888,0.55556,,
6,7,4.40362,0.55556,,
7,8,3.91862,0.6,,
8,9,4.42872,0.6,,
9,10,3.80956,0.6,,


Another way to get a quick impression of the model fine-tuning is the overview page for Fine-Tuning on the OpenAI website.

![OpenAI Fine-Tuning Overview](assets/finetuning.png)