<a href="https://colab.research.google.com/github/Ashish-Soni08/Courses/blob/main/Playground/OPENAI/Finetuning_for_Financial_Sentiment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-tuning GPT-4o mini for sentiment prediction

Fine-tuning improves the model by training on many more examples than can fit in a prompt, letting you achieve better results on a wide range of tasks. This notebook provides a step-by-step guide for our new GPT-4o mini fine-tuning. We'll perform financial sentiment classification using the `FingGPT-sentiment-train-set` and `FingGPT-sentiment-valid-set` datasets hosted on Activeloop Deep Lake, which pair short financial texts (tweets and news snippets) with a sentiment label for each.

Note: **GPT-4o mini fine-tuning is available to developers in our [Tier 4 and 5 usage tiers](https://platform.openai.com/docs/guides/rate-limits/usage-tiers).** You can start fine-tuning GPT-4o mini by visiting your fine-tuning dashboard, clicking "create", and selecting `gpt-4o-mini-2024-07-18` from the base model drop-down.

We will go through the following steps:

1. **Setup:** Installing dependencies and loading the training and validation datasets.
2. **Data preparation:** Preparing your data for fine-tuning by creating training and validation examples, and uploading them to the `Files` endpoint.
3. **Fine-tuning:** Creating your fine-tuned model.
4. **Inference:** Using your fine-tuned model for inference on new inputs.

By the end of this you should be able to train, evaluate and deploy a fine-tuned `gpt-4o-mini-2024-07-18` model.

For more information on fine-tuning, you can refer to our [documentation guide](https://platform.openai.com/docs/guides/fine-tuning) or [API reference](https://platform.openai.com/docs/api-reference/fine-tuning).


## Setup


In [1]:
!pip install --upgrade --quiet deeplake openai

In [2]:
import deeplake as dl
from google.colab import userdata
import json
import openai
import os
import pandas as pd
from pprint import pprint

In [3]:
# Connect to OpenAI
client = openai.OpenAI(
    api_key=userdata.get('OPENAI_API_KEY'),
    organization=userdata.get('ORG_ID'),
    project=userdata.get('PROJECT_ID'),
)

Fine-tuning works best when focused on a particular domain. It's important to make sure your dataset is both focused enough for the model to learn and general enough that unseen examples won't be missed. With this in mind, we use a dataset restricted to a single domain: short financial texts (tweets and news snippets), each labeled with its sentiment.


## Dataset

In [4]:
DEEPLAKE_API_KEY = userdata.get('deeplake_api')

## Training Data

In [5]:
training_data = dl.load('hub://genai360/FingGPT-sentiment-train-set', token=DEEPLAKE_API_KEY)

Opening dataset in read-only mode as you don't have write permissions.

This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/genai360/FingGPT-sentiment-train-set

hub://genai360/FingGPT-sentiment-train-set loaded successfully.

In [7]:
pprint(training_data)

Dataset(path='hub://genai360/FingGPT-sentiment-train-set', read_only=True, tensors=['input', 'instruction', 'output'])


In [10]:
training_data.summary()

Dataset(path='hub://genai360/FingGPT-sentiment-train-set', read_only=True, tensors=['input', 'instruction', 'output'])

   tensor      htype     shape      dtype  compression
   -------    -------   -------    -------  ------- 
    input      text    (20000, 1)    str     None   
 instruction   text    (20000, 1)    str     None   
   output      text    (20000, 1)    str     None   


In [30]:
row = [training_data.input[0].numpy(), training_data.instruction[0].numpy(), training_data.output[0].numpy()]
print("A sample of data from the training set:")
print("-" * 100)
pprint(row)
print("-" * 100)

A sample of data from the training set:
----------------------------------------------------------------------------------------------------
[array(['Tesla Motors recalls 2,700 Model X SUVs https://t.co/BFWS3DbM0U $TSLA'],
      dtype='<U69'),
 array(['What is the sentiment of this tweet? Please choose an answer from {negative/neutral/positive}'],
      dtype='<U93'),
 array(['negative'], dtype='<U8')]
----------------------------------------------------------------------------------------------------


In [63]:
training_dataset = [
    {
        "input": training_data.input[i].data()['value'],
        "output": training_data.output[i].data()['value']
    }
    for i in range(len(training_data))
]



In [64]:
training_df = pd.DataFrame(training_dataset)
training_df.head()

|   | input | output |
|---|---|---|
| 0 | Tesla Motors recalls 2,700 Model X SUVs https:... | negative |
| 1 | Bank stocks have been big-time laggards in the... | mildly negative |
| 2 | $NSM ascending base breakout starting to gain ... | positive |
| 3 | How Much is ResMed Inc.'s (NYSE:RMD) CEO Getti... | neutral |
| 4 | OUTOTEC OYJ PRESS RELEASE DECEMBER 4 , 2009 10... | positive |


In [65]:
training_df.to_csv('training_data.csv', index=False)

In [66]:
training_df.shape

(20000, 2)
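
The `head()` output above already contains a label (`mildly negative`) that is not one of the three classes named in the dataset's instruction text. A quick tally makes such labels visible before training. The following is a minimal sketch on a hand-typed sample; `ALLOWED_LABELS` and `label_report` are hypothetical names introduced here, and in the notebook you would pass `training_df["output"]` instead of `sample_labels`.

```python
from collections import Counter

# The three classes named in the dataset's instruction text.
ALLOWED_LABELS = {"negative", "neutral", "positive"}


def label_report(labels):
    """Count each label and list the ones outside ALLOWED_LABELS."""
    counts = Counter(labels)
    unexpected = sorted(label for label in counts if label not in ALLOWED_LABELS)
    return counts, unexpected


# Hand-typed sample mirroring the five rows shown by training_df.head().
sample_labels = ["negative", "mildly negative", "positive", "neutral", "positive"]
counts, unexpected = label_report(sample_labels)

for label, n in counts.most_common():
    print(f"{label:>16}: {n}")
print("Labels outside the prompt's classes:", unexpected)
```

Whether to keep the finer-grained labels or map them onto the three classes is a modeling decision; the sketch only surfaces the mismatch.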

## Validation Data

In [31]:
validation_data = dl.load('hub://genai360/FingGPT-sentiment-valid-set')



Opening dataset in read-only mode as you don't have write permissions.

This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/genai360/FingGPT-sentiment-valid-set

hub://genai360/FingGPT-sentiment-valid-set loaded successfully.

In [32]:
pprint(validation_data)

Dataset(path='hub://genai360/FingGPT-sentiment-valid-set', read_only=True, tensors=['input', 'instruction', 'output'])


In [34]:
validation_data.summary()

Dataset(path='hub://genai360/FingGPT-sentiment-valid-set', read_only=True, tensors=['input', 'instruction', 'output'])

   tensor      htype     shape     dtype  compression
   -------    -------   -------   -------  ------- 
    input      text    (2000, 1)    str     None   
 instruction   text    (2000, 1)    str     None   
   output      text    (2000, 1)    str     None   


In [35]:
row = [validation_data.input[10].numpy(), validation_data.instruction[10].numpy(), validation_data.output[10].numpy()]
print("A sample of data from the Validation set:")
print("-" * 100)
pprint(row)
print("-" * 100)

A sample of data from the Validation set:
----------------------------------------------------------------------------------------------------
[array(['Why not give your bedroom a cool makeover for summer .'],
      dtype='<U54'),
 array(['What is the sentiment of this news? Please choose an answer from {negative/neutral/positive}'],
      dtype='<U92'),
 array(['neutral'], dtype='<U7')]
----------------------------------------------------------------------------------------------------


In [46]:
validation_data.input[10].data()

{'value': 'Why not give your bedroom a cool makeover for summer .'}

In [47]:
validation_data.input[10].data()['value']

'Why not give your bedroom a cool makeover for summer .'

In [56]:
validation_dataset = [
    {
        "input": validation_data.input[i].data()['value'],
        "output": validation_data.output[i].data()['value']
    }
    for i in range(len(validation_data))
]



In [61]:
validation_df = pd.DataFrame(validation_dataset)
validation_df.head()

|   | input | output |
|---|---|---|
| 0 | Diageo Shares Surge on Report of Possible Take... | positive |
| 1 | HELSINKI , Finland , Sept. 18 , 2009 ( GLOBE N... | positive |
| 2 | In Finland , 71 % of paper and paperboard is r... | neutral |
| 3 | This assignment strengthens Poyry 's position ... | positive |
| 4 | ADP has been experiencing slow, steady growth ... | moderately positive |


In [62]:
validation_df.to_csv('validation_data.csv', index=False)

In [67]:
validation_df.shape

(2000, 2)

## Data preparation

We'll begin by preparing our data. When fine-tuning with the `ChatCompletion` format, each training example is a simple list of `messages`. For example, an entry could look like:

```
[{'role': 'system',
  'content': 'You are an expert in financial sentiment prediction. You are to classify each text provided into {negative/neutral/positive}.'},

 {'role': 'user',
  'content': 'Tesla Motors recalls 2,700 Model X SUVs https://t.co/BFWS3DbM0U $TSLA'},

 {'role': 'assistant',
  'content': 'negative'}]
```

During the training process this conversation will be split, with the final entry being the `completion` that the model will produce, and the remainder of the `messages` acting as the prompt. Consider this when building your training examples - if your model will act on multi-turn conversations, then please provide representative examples so it doesn't perform poorly when the conversation starts to expand.

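Please note that each training example is subject to a token limit that depends on the base model, and anything longer is truncated at that limit. The 4096-token figure that applied to earlier models may differ for `gpt-4o-mini-2024-07-18`, so check the fine-tuning guide for the current value.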
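
To get a rough sense of whether any example comes close to a token limit, you can estimate token counts before uploading. The sketch below is only a heuristic: it assumes about 4 characters per token for English text, which is a rule of thumb rather than a real tokenizer, and `rough_token_count` and `flag_long_examples` are hypothetical helper names. The `limit` value is a parameter; 4096 is used here only because it is the figure quoted in the note.

```python
def rough_token_count(messages, chars_per_token=4):
    """Very rough estimate: total content characters divided by chars_per_token."""
    total_chars = sum(len(m["content"]) for m in messages)
    return total_chars // chars_per_token + 1


def flag_long_examples(examples, limit):
    """Return the indices of examples whose rough estimate exceeds `limit`."""
    return [
        i
        for i, example in enumerate(examples)
        if rough_token_count(example["messages"]) > limit
    ]


# Two toy examples: a short tweet and an artificially long text.
examples = [
    {"messages": [
        {"role": "user", "content": "Tesla Motors recalls 2,700 Model X SUVs"},
        {"role": "assistant", "content": "negative"},
    ]},
    {"messages": [
        {"role": "user", "content": "x" * 20000},
        {"role": "assistant", "content": "neutral"},
    ]},
]

too_long = flag_long_examples(examples, limit=4096)
print("Examples that may exceed the limit:", too_long)
```

For exact counts, a tokenizer library that matches the model's encoding is the better choice; the heuristic is only meant as a cheap first pass.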


In [68]:
# System prompt shared by every training, validation, and inference example.
system_message = "You are an expert in financial sentiment prediction. You are to classify each text provided into {negative/neutral/positive}."


def prepare_example_conversation(row):
    """Convert one DataFrame row into a chat-format fine-tuning example."""
    return {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": row["input"]},
            {"role": "assistant", "content": row["output"]},
        ]
    }

In [69]:
pprint(prepare_example_conversation(training_df.iloc[0]))

{'messages': [{'content': 'You are an expert in financial sentiment '
                          'prediction. You are to classify each text provided '
                          'into {negative/neutral/positive}.',
               'role': 'system'},
              {'content': 'Tesla Motors recalls 2,700 Model X SUVs https://t.co/BFWS3DbM0U $TSLA',
               'role': 'user'},
              {'content': 'negative', 'role': 'assistant'}]}


Let's now do this for the full training set. You can begin with as few as 30-50 well-pruned examples. Performance typically continues to improve as you increase the size of the training set, but your jobs will also take longer.


In [70]:
# apply the prepare_example_conversation function to each row of the training_df
training_data = training_df.apply(prepare_example_conversation, axis=1).tolist()

for example in training_data[:5]:
    print(example)

{'messages': [{'role': 'system', 'content': 'You are an expert in financial sentiment prediction. You are to classify each text provided into {negative/neutral/positive}.'}, {'role': 'user', 'content': 'Tesla Motors recalls 2,700 Model X SUVs https://t.co/BFWS3DbM0U $TSLA'}, {'role': 'assistant', 'content': 'negative'}]}
{'messages': [{'role': 'system', 'content': 'You are an expert in financial sentiment prediction. You are to classify each text provided into {negative/neutral/positive}.'}, {'role': 'user', 'content': 'Bank stocks have been big-time laggards in the ongoing market pullback. This is despite the fact that core features of the economy have been stable and rising interest rates are generally expected to benefit banks since they help expand their margins.'}, {'role': 'assistant', 'content': 'mildly negative'}]}
{'messages': [{'role': 'system', 'content': 'You are an expert in financial sentiment prediction. You are to classify each text provided into {negative/neutral/posit

In addition to training data, we can also **optionally** provide validation data, which is used to monitor whether the model is overfitting your training set.


In [71]:
validation_data = validation_df.apply(
    prepare_example_conversation, axis=1).tolist()

We then need to save our data as `.jsonl` files, with each line being one training example conversation.


In [72]:
def write_jsonl(data_list: list, filename: str) -> None:
    with open(filename, "w") as out:
        for ddict in data_list:
            jout = json.dumps(ddict) + "\n"
            out.write(jout)

In [73]:
training_file_name = "tmp_financial_sentiment_finetune_training.jsonl"
write_jsonl(training_data, training_file_name)

validation_file_name = "tmp_financial_sentiment_finetune_validation.jsonl"
write_jsonl(validation_data, validation_file_name)
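
Before uploading, it can be worth reading the `.jsonl` file back and checking that every line has the expected shape. The following is a minimal sketch, not an official validator: `validate_jsonl` and `VALID_ROLES` are hypothetical names, and the role set covers only the roles used in this notebook. The demo writes a small temporary file instead of touching the real training file.

```python
import json
import os
import tempfile

# Roles used by the examples in this notebook (an assumption of this sketch).
VALID_ROLES = {"system", "user", "assistant"}


def validate_jsonl(path):
    """Return (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                problems.append((line_no, "not valid JSON"))
                continue
            messages = record.get("messages") if isinstance(record, dict) else None
            if not isinstance(messages, list) or not messages:
                problems.append((line_no, "missing or empty 'messages' list"))
                continue
            if any(
                not isinstance(m, dict) or m.get("role") not in VALID_ROLES
                for m in messages
            ):
                problems.append((line_no, "unexpected role"))
            elif messages[-1]["role"] != "assistant":
                problems.append((line_no, "last message is not from the assistant"))
    return problems


# Demo file: one well-formed example and one without an assistant reply.
good = {"messages": [
    {"role": "system", "content": "Classify the sentiment."},
    {"role": "user", "content": "Shares surge on takeover report"},
    {"role": "assistant", "content": "positive"},
]}
bad = {"messages": [{"role": "user", "content": "No label here"}]}

with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(json.dumps(good) + "\n")
    tmp.write(json.dumps(bad) + "\n")
    demo_path = tmp.name

problems = validate_jsonl(demo_path)
os.remove(demo_path)
print(problems)
```

In the notebook, calling `validate_jsonl(training_file_name)` and `validate_jsonl(validation_file_name)` would apply the same checks to the files written above.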

This is what the first 5 lines of our training `.jsonl` file look like:


In [74]:
# print the first 5 lines of the training file
!head -n 5 tmp_financial_sentiment_finetune_training.jsonl

{"messages": [{"role": "system", "content": "You are an expert in financial sentiment prediction. You are to classify each text provided into {negative/neutral/positive}."}, {"role": "user", "content": "Tesla Motors recalls 2,700 Model X SUVs https://t.co/BFWS3DbM0U $TSLA"}, {"role": "assistant", "content": "negative"}]}
{"messages": [{"role": "system", "content": "You are an expert in financial sentiment prediction. You are to classify each text provided into {negative/neutral/positive}."}, {"role": "user", "content": "Bank stocks have been big-time laggards in the ongoing market pullback. This is despite the fact that core features of the economy have been stable and rising interest rates are generally expected to benefit banks since they help expand their margins."}, {"role": "assistant", "content": "mildly negative"}]}
{"messages": [{"role": "system", "content": "You are an expert in financial sentiment prediction. You are to classify each text provided into {negative/neutral/posit

### Upload files

You can now upload the files to our `Files` endpoint so they can be used by the fine-tuning job.


In [None]:
def upload_file(file_name: str, purpose: str) -> str:
    with open(file_name, "rb") as file_fd:
        response = client.files.create(file=file_fd, purpose=purpose)
    return response.id


training_file_id = upload_file(training_file_name, "fine-tune")
validation_file_id = upload_file(validation_file_name, "fine-tune")

print("Training file ID:", training_file_id)
print("Validation file ID:", validation_file_id)

Training file ID: file-3wfAfDoYcGrSpaE17qK0vXT0
Validation file ID: file-HhFhnyGJhazYdPcd3wrtvIoX


## Fine-tuning

Now we can create our fine-tuning job with the generated files and an optional suffix to identify the model. The response will contain an `id` which you can use to retrieve updates on the job.

Note: The files have to first be processed by our system, so you might get a `File not ready` error. In that case, simply retry a few minutes later.
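
The retry advice can be automated with a small helper. The sketch below is generic and assumes nothing about the OpenAI client: `retry` is a hypothetical function that calls any zero-argument callable until it succeeds, and the demo uses a stand-in function that fails twice with a `File not ready` message. Catching every `Exception` is a simplification made for this sketch.

```python
import time


def retry(fn, attempts=5, delay_seconds=60, sleep=time.sleep):
    """Call fn() until it succeeds or `attempts` calls have failed."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as err:  # assumption: any exception is worth retrying
            last_error = err
            print(f"Attempt {attempt} failed: {err}")
            if attempt < attempts:
                sleep(delay_seconds)
    raise last_error


# Stand-in for the real job-creation call: fails twice, then succeeds.
calls = {"n": 0}


def flaky_create():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("File not ready")
    return "ftjob-example"


# delay_seconds=0 and a no-op sleep keep the demo instant.
result = retry(flaky_create, attempts=5, delay_seconds=0, sleep=lambda s: None)
print(result)
```

In the notebook, `fn` could be a lambda wrapping the `client.fine_tuning.jobs.create(...)` call. Catching a narrower exception type than `Exception` would avoid retrying on errors that will never resolve, such as an invalid API key.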


In [None]:
# Uncomment this cell to launch a new fine-tuning job; each run creates a
# separate, billed job.
# MODEL = "gpt-4o-mini-2024-07-18"

# response = client.fine_tuning.jobs.create(
#     training_file=training_file_id,
#     validation_file=validation_file_id,
#     model=MODEL,
#     suffix="fin-sentiment",
# )

# job_id = response.id

# print("Job ID:", response.id)
# print("Status:", response.status)

Job ID: ftjob-UiaiLwGdGBfdLQDBAoQheufN
Status: validating_files


#### Check job status

You can make a `GET` request to the `https://api.openai.com/v1/fine_tuning/jobs` endpoint to list your fine-tuning jobs. In this instance you'll want to check that the job whose ID you got from the previous step ends up with `status: succeeded`.

Once it is completed, you can use the `result_files` to sample the results from the validation set (if you uploaded one), and use the ID from the `fine_tuned_model` field to invoke your trained model.


In [77]:
job_id = "ftjob-sIlSttUJ3D7HS7R1Qzm8dlJA"

In [78]:
response = client.fine_tuning.jobs.retrieve(job_id)

print("Job ID:", response.id)
print("Status:", response.status)
print("Trained Tokens:", response.trained_tokens)

Job ID: ftjob-sIlSttUJ3D7HS7R1Qzm8dlJA
Status: validating_files
Trained Tokens: None
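
Instead of re-running the status cell by hand, you can poll until the job reaches a final state. The following is a minimal sketch: `wait_for_job` is a hypothetical helper, and the set of terminal status names (`succeeded`, `failed`, `cancelled`) is an assumption to verify against the API reference. The demo uses a stand-in for `client.fine_tuning.jobs.retrieve` so that it runs without network access.

```python
import time
from types import SimpleNamespace

# Assumed final states of a fine-tuning job; check the API reference.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(retrieve, job_id, poll_seconds=60, max_polls=120, sleep=time.sleep):
    """Poll retrieve(job_id) until the job reports a terminal status."""
    for _ in range(max_polls):
        job = retrieve(job_id)
        print("Status:", job.status)
        if job.status in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")


# Stand-in for client.fine_tuning.jobs.retrieve: walks through three statuses.
_statuses = iter(["validating_files", "running", "succeeded"])


def fake_retrieve(job_id):
    return SimpleNamespace(id=job_id, status=next(_statuses))


job = wait_for_job(
    fake_retrieve, "ftjob-example", poll_seconds=0, sleep=lambda s: None
)
print("Final status:", job.status)
```

In the notebook, passing `client.fine_tuning.jobs.retrieve` as `retrieve` and the real `job_id` would poll the actual job; a `poll_seconds` of a minute or more keeps the number of requests low.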


We can track the progress of the fine-tune with the events endpoint. You can rerun the cell below a few times until the fine-tune is ready.


In [None]:
# response = client.fine_tuning.jobs.list_events(job_id)

# events = response.data
# events.reverse()

# for event in events:
#     print(event.message)

Step 288/303: training loss=0.00
Step 289/303: training loss=0.01
Step 290/303: training loss=0.00, validation loss=0.31
Step 291/303: training loss=0.00
Step 292/303: training loss=0.00
Step 293/303: training loss=0.00
Step 294/303: training loss=0.00
Step 295/303: training loss=0.00
Step 296/303: training loss=0.00
Step 297/303: training loss=0.00
Step 298/303: training loss=0.01
Step 299/303: training loss=0.00
Step 300/303: training loss=0.00, validation loss=0.04
Step 301/303: training loss=0.16
Step 302/303: training loss=0.00
Step 303/303: training loss=0.00, full validation loss=0.33
Checkpoint created at step 101 with Snapshot ID: ft:gpt-4o-mini-2024-07-18:openai-gtm:recipe-ner:9o1eNlSa:ckpt-step-101
Checkpoint created at step 202 with Snapshot ID: ft:gpt-4o-mini-2024-07-18:openai-gtm:recipe-ner:9o1eNFnj:ckpt-step-202
New fine-tuned model created: ft:gpt-4o-mini-2024-07-18:openai-gtm:recipe-ner:9o1eNNKO
The job has successfully completed
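
The event messages are plain strings, so turning them into numbers makes it easier to plot or compare training and validation loss. The sketch below parses them with a regular expression. The pattern is inferred only from the log lines shown above and is not a documented format, so it may need adjusting; `STEP_PATTERN` and `parse_event` are hypothetical names.

```python
import re

# Pattern inferred from the sample log lines; not a documented message format.
STEP_PATTERN = re.compile(
    r"Step (?P<step>\d+)/(?P<total>\d+): training loss=(?P<train>[\d.]+)"
    r"(?:, (?P<full>full )?validation loss=(?P<valid>[\d.]+))?"
)


def parse_event(message):
    """Return a dict of metrics for a step message, or None for other messages."""
    match = STEP_PATTERN.match(message)
    if match is None:
        return None
    return {
        "step": int(match["step"]),
        "total": int(match["total"]),
        "training_loss": float(match["train"]),
        "validation_loss": float(match["valid"]) if match["valid"] else None,
        "full_validation": match["full"] is not None,
    }


messages = [
    "Step 300/303: training loss=0.00, validation loss=0.04",
    "Step 301/303: training loss=0.16",
    "Step 303/303: training loss=0.00, full validation loss=0.33",
    "The job has successfully completed",
]
parsed = [parse_event(m) for m in messages]

for message, metrics in zip(messages, parsed):
    print(metrics if metrics is not None else f"(not a step message) {message}")
```

In the notebook, the same function could be applied to `event.message` for each event returned by the events endpoint.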


Once the job has completed, we can get the fine-tuned model ID from the job:


In [79]:
response = client.fine_tuning.jobs.retrieve(job_id)
fine_tuned_model_id = response.fine_tuned_model

if fine_tuned_model_id is None:
    raise RuntimeError(
        "Fine-tuned model ID not found. Your job has likely not been completed yet."
    )

print("Fine-tuned model ID:", fine_tuned_model_id)

RuntimeError: Fine-tuned model ID not found. Your job has likely not been completed yet.

## Inference


The last step is to use your fine-tuned model for inference. As with the classic `FineTuning`, you simply call `ChatCompletions`, passing your new fine-tuned model ID as the `model` parameter.


In [76]:
# Build the prompt for a single example. Note that this row comes from the
# training set; a row the model has not seen (for example from validation_df)
# gives a fairer picture of its quality.
test_row = training_df.iloc[10]
test_messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": test_row["input"]},
]

pprint(test_messages)

[{'content': 'You are an expert in financial sentiment prediction. You are to '
             'classify each text provided into {negative/neutral/positive}.',
  'role': 'system'},
 {'content': 'Horizon Media Study Finds Instagram’s Move to Hide Likes is Somewhat of a Collective Shrug, but with a Glimmer of H… https://t.co/gxnZp3H5lh',
  'role': 'user'}]


In [None]:
response = client.chat.completions.create(
    model=fine_tuned_model_id, messages=test_messages, temperature=0, max_tokens=500
)
print(response.choices[0].message.content)

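
A single prediction says little about overall quality, so a natural next step is to measure accuracy over held-out rows. The following is a minimal sketch under stated assumptions: `accuracy` is a hypothetical helper, `predict` is any function mapping a text to a label, and the demo uses a keyword-based stub in place of a real model call. The sample rows are copied from the validation outputs shown earlier.

```python
def accuracy(rows, predict):
    """Fraction of rows where predict(text) matches the expected label."""
    if not rows:
        raise ValueError("rows must not be empty")
    correct = 0
    for row in rows:
        predicted = predict(row["input"]).strip().lower()
        if predicted == row["output"].strip().lower():
            correct += 1
    return correct / len(rows)


# Stub predictor standing in for a call to the fine-tuned model.
def stub_predict(text):
    return "positive" if "Surge" in text else "neutral"


rows = [
    {"input": "Diageo Shares Surge on Report of Possible Take...", "output": "positive"},
    {"input": "In Finland , 71 % of paper and paperboard is r...", "output": "neutral"},
    {"input": "This assignment strengthens Poyry 's position ...", "output": "positive"},
    {"input": "Why not give your bedroom a cool makeover for summer .", "output": "neutral"},
]

score = accuracy(rows, stub_predict)
print(f"Accuracy: {score:.2f}")
```

In the notebook, `predict` could wrap `client.chat.completions.create` with `model=fine_tuned_model_id` and return `response.choices[0].message.content`, while `rows` could come from `validation_df.to_dict("records")`.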


## Conclusion

Congratulations, you are now ready to fine-tune your own models using the `ChatCompletion` format! We look forward to seeing what you build.
