# Fine-Tuning an OpenAI Model for Metaphor Identification

This notebook walks through the steps of fine-tuning an OpenAI model for metaphor identification.

It covers fine-tuning the model and running inference with the fine-tuned model.

Before you start fine-tuning, you need the following packages: openai, pandas

In [None]:
!pip install pandas openai

Import the packages.

In [2]:
import pandas as pd
from openai import OpenAI
import json

Connect to the OpenAI server using your API key.

In [None]:
my_api_key="INSERT YOUR KEY HERE"
client=OpenAI(api_key=my_api_key)

Before moving into fine-tuning, you'll need to load the dataset.

In [3]:
data_fp="data/metaphor_dataset.csv"
data_df=pd.read_csv(data_fp)

Then perform the train-test split.

This splits the dataset into two disjoint parts: a train set and a test set.

The train set is exposed to the model during fine-tuning, so that the model can "learn" the traits of metaphor.

The test set remains unseen by the model during fine-tuning. It is reserved for evaluating the model's performance.

Here we use a typical train-test split ratio of 8:2. To maximize replicability, you may also wish to set a random seed (here, seed = 1).

In [4]:
seed=1
train_ratio=0.8

In [5]:
train_df=data_df.sample(frac=train_ratio,random_state=seed)
test_df=data_df.drop(index=train_df.index)
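As a quick sanity check on the split logic, the sketch below repeats the same `sample` / `drop` pattern on a small toy frame and confirms that the two parts are disjoint and together cover every row. The toy data and the `toy_*` names are made up for illustration. Note that `drop(index=...)` removes rows by index label, so the pattern assumes the index is unique, which is the default after `pd.read_csv`.

```python
import pandas as pd

# Toy stand-in for data_df; the "plain" values are invented for illustration.
toy_df = pd.DataFrame({"plain": [f"sentence {i}" for i in range(10)]})

# Same pattern as above: sample a fraction, then drop the sampled rows by index.
toy_train = toy_df.sample(frac=0.8, random_state=1)
toy_test = toy_df.drop(index=toy_train.index)

# The two parts should share no row and should together cover every row.
overlap = set(toy_train.index) & set(toy_test.index)
covered = set(toy_train.index) | set(toy_test.index)
print(len(toy_train), len(toy_test), len(overlap))
```

The same two set expressions can be applied to `train_df` and `test_df` directly if you want to verify the real split.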

With the train-test split completed, we need to compile the training chats and upload them to the OpenAI server so they can be used for fine-tuning in the next step.

Please note: the file ID printed at the end of this cell will be used as input when creating the fine-tuning task in the next step.

In [None]:
jsonl_path=f"data/ft_tr{train_ratio}_s{seed}.jsonl"

user_msg_0="Can you please identify and tag the metaphors in the following text? "

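json_lines=[]
for idx in range(0,train_df.shape[0]):
    
    raw_text=train_df.iloc[idx]["plain"].replace("\n"," ")
    # ASSUMPTION: the metaphor-tagged version of each text lives in a column
    # named "tagged"; replace this with the column name your dataset actually uses.
    text=train_df.iloc[idx]["tagged"].replace("\n"," ")
    this_chat={
        "messages":[
            {"role":"user","content":user_msg_0+"\n"+raw_text},
            {"role":"assistant","content":text},
        ]
    }
    json_str=json.dumps(this_chat)
    json_lines.append(json_str)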

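with open(jsonl_path,"w",encoding="utf-8") as f:
    f.write("\n".join(json_lines))

# Upload inside a context manager so the file handle is closed afterwards.
with open(jsonl_path,"rb") as f:
    fileinfo=client.files.create(file=f,purpose="fine-tune")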
print(fileinfo.id)
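Before uploading, it can help to check locally that every line of the JSONL file has the chat structure this notebook writes. The following is a minimal sketch of such a check. The function name `check_chat_jsonl` is hypothetical, and the rules it applies (only `user` and `assistant` roles, string contents, assistant message last) mirror what the cell above produces; they are not OpenAI's full validation.

```python
import json

EXPECTED_ROLES = {"user", "assistant"}  # the roles this notebook writes

def check_chat_jsonl(lines):
    """Return (line_number, problem) pairs; an empty list means nothing was flagged."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((lineno, f"invalid JSON: {err.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((lineno, "missing or empty 'messages' list"))
            continue
        for message in messages:
            if not isinstance(message, dict):
                problems.append((lineno, "message is not an object"))
            elif message.get("role") not in EXPECTED_ROLES:
                problems.append((lineno, f"unexpected role: {message.get('role')!r}"))
            elif not isinstance(message.get("content"), str):
                problems.append((lineno, "message content is not a string"))
        # Each training example should end with the reply the model is meant to learn.
        if isinstance(messages[-1], dict) and messages[-1].get("role") != "assistant":
            problems.append((lineno, "last message is not from the assistant"))
    return problems

# Made-up sample lines: one well-formed, one without a reply, one that is not JSON.
sample_lines = [
    json.dumps({"messages": [
        {"role": "user", "content": "Tag the metaphors.\nTime is money."},
        {"role": "assistant", "content": "placeholder tagged output"},
    ]}),
    '{"messages": [{"role": "user", "content": "no reply"}]}',
    "not json at all",
]
for lineno, problem in check_chat_jsonl(sample_lines):
    print(f"line {lineno}: {problem}")
```

To run it on the file written above, open the file and pass the file object itself, for example `check_chat_jsonl(open(jsonl_path, encoding="utf-8"))`, since iterating a text file yields one line at a time.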

Select a model and create a fine-tuning task using the dataset you just uploaded.

Here is a list of the models that we tested for fine-tuning in our study:

gpt-4.1-2025-04-14

gpt-4.1-mini-2025-04-14

gpt-4.1-nano-2025-04-14

In [None]:
model="gpt-4.1-mini-2025-04-14"
client.fine_tuning.jobs.create(
    training_file=fileinfo.id,
    suffix=model.replace("-","_")+f"__tr{train_ratio}__s{seed}",
    model=model,
    )

After fine-tuning is completed, you will get a new model ID that corresponds to the fine-tuned model. We will use this new model ID for inference.

To find this new model ID and to check the fine-tuning progress, run:

In [None]:
client.fine_tuning.jobs.list().data

This returns a list of fine-tuning tasks.
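Instead of re-running the listing by hand, one could poll a single task until it finishes. The sketch below assumes you have kept the task's ID (the `id` attribute of the object returned by `client.fine_tuning.jobs.create`, which the cell above does not store). The function name `wait_for_fine_tuned_model`, the polling interval, and the poll limit are all hypothetical choices.

```python
import time

# Statuses after which a fine-tuning task no longer changes.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_fine_tuned_model(client, job_id, poll_seconds=60, max_polls=240):
    """Poll a fine-tuning task until it ends.

    Returns the fine-tuned model ID on success, or None if the task
    failed or was cancelled. Raises TimeoutError after max_polls checks.
    """
    for _ in range(max_polls):
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            # fine_tuned_model is only populated when the task succeeded.
            return job.fine_tuned_model if job.status == "succeeded" else None
        time.sleep(poll_seconds)
    raise TimeoutError(f"task {job_id} did not finish after {max_polls} polls")
```

A call would look like `modelid = wait_for_fine_tuned_model(client, job_id)`, where `job_id` is the ID of your own task.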

In the following steps, we will use this fine-tuned model ID as an example:

In [None]:
modelid="ft:gpt-4.1-nano-2025-04-14:university-of-birmingham:gpt-4-1-nano-2025-04-14-tr0-8-s11111:Bx2M81Bq"

Please don't forget to replace the model ID above with yours before proceeding to the next block.

Use the fine-tuned model for inference.

Enter your own test text here, or alternatively take one text from the test set.

In [None]:
test_text=test_df.iloc[0]["plain"]

Run inference.

In [None]:
user_msg_0="Can you please identify and tag the metaphors in the following text? "

ct=[{"role":"user","content":user_msg_0+"\n"+test_text}]

cr=client.chat.completions.create(model=modelid,messages=ct)

rs=cr.choices[0].message.content

View the result.

In [None]:
print(rs)
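Since the test set is reserved for evaluation, a natural next step is to compare the model's output with the gold annotation. The sketch below is one possible way to do that. It assumes, purely for illustration, that metaphors are marked with `<m>...</m>` tags; the tag format, the function names, and the example sentences are hypothetical, so adjust the pattern to whatever tagging scheme your dataset uses.

```python
import re

# HYPOTHETICAL tag format: metaphors wrapped as <m>...</m>.
TAG_PATTERN = re.compile(r"<m>(.*?)</m>")

def tagged_spans(text):
    """Return the set of tagged spans, lower-cased and stripped."""
    return {span.strip().lower() for span in TAG_PATTERN.findall(text)}

def precision_recall_f1(predicted, gold):
    """Compare tagged spans as sets of strings (positions and repeats are ignored)."""
    pred, ref = tagged_spans(predicted), tagged_spans(gold)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Made-up example pair: one span matches, one is missed, one is spurious.
gold_text = "He <m>attacked</m> my argument and <m>shot down</m> my points."
predicted_text = "He <m>attacked</m> my argument and shot down my <m>points</m>."
print(precision_recall_f1(predicted_text, gold_text))
```

Because spans are compared as a set of strings, two occurrences of the same word count once and their position in the text is not checked; a position-aware comparison would be stricter.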