<a href="https://colab.research.google.com/github/datafyresearcher/datafy-finetuning-university/blob/main/notebooks/Advanced/01_FineTuning_GPT3_5_Turbo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-Tuning with the OpenAI API

OpenAI announced on August 22, 2023, that fine-tuning for GPT-3.5 Turbo is now available. This update allows developers to customize models that perform better for their use cases and run these custom models at scale.

This notebook walks through the steps to fine-tune GPT-3.5 Turbo on a custom dataset.

[1]: Docs: https://platform.openai.com/docs/guides/fine-tuning

[2]: Release Notes: https://openai.com/blog/gpt-3-5-turbo-fine-tuning-and-api-updates

[3]: Examples: https://platform.openai.com/docs/guides/fine-tuning/fine-tuning-examples

In [None]:
# Run this block only when using Google Colab; otherwise skip it.

if 'google.colab' in str(get_ipython()):
  print('Running on CoLab')
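  # Install the packages. The calls used below (openai.File, openai.FineTuningJob,
  # openai.ChatCompletion) belong to the pre-1.0 openai library, so pin the version.
  ! pip install "openai<1" tenacity -q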
else:
  print('Not running on CoLab')

In [None]:
# Import python packages
import os, json
import openai
import pandas as pd
from pprint import pprint

# Generating Data for Fine-Tuning with the OpenAI API

As a first step, we have to prepare the fine-tuning dataset. In this demo, we take a QA dataset generated by the Datafy Associates team and convert it into a format that is compatible with the OpenAI fine-tuning API. The dataset contains questions and answers on Islamic banking and the Pakistan banking system, generated from the State Bank of Pakistan website. Users are encouraged to generate their own dataset to try this out.


In [None]:
# Download the Dataset from Google Drive (Allied bank)
!gdown 1FXHHKceNzRDDtC53rqKpckMfB1nLptdv

Downloading...
From: https://drive.google.com/uc?id=1FXHHKceNzRDDtC53rqKpckMfB1nLptdv
To: /content/allied_bank.json
  0% 0.00/13.1k [00:00<?, ?B/s]100% 13.1k/13.1k [00:00<00:00, 24.7MB/s]


In [None]:
# Load the JSON data file.
with open('allied_bank.json', 'r') as file:
  data_allied_bank = json.load(file)

In [None]:
# Show the first QA.
pprint(data_allied_bank[0])

{'answer': 'Islamic banking is defined as banking system which is in '
           'consonance with the spirit, ethos and value system of Islam and '
           'governed by the principles laid down by Islamic Shariah. Interest '
           'free banking is a narrow concept denoting a number of banking '
           'instruments or operations which avoid interest. Islamic banking, '
           'the more general term, is based not only to avoid interest-based '
           'transactions prohibited in Islamic Shariah but also to avoid '
           'unethical and un-social practices. In practical sense, Islamic '
           'Banking is the transformation of conventional money lending into '
           'transactions based on tangible assets and real services. The model '
           'of Islamic banking system leads towards the achievement of a '
           'system which helps achieve economic prosperity.',
 'question': 'What is Islamic Banking?'}


## Convert to a Compatible Format

In [None]:
user_input_system = "You are a distinguished banking expert, extensively trained to adeptly handle a wide array of banking and financial matters, with a distinct focus on the intricacies of the Pakistan Banking System"
# Initialize list to store training examples
training_examples = []

# Create training examples in the format required for GPT-3.5 fine-tuning
for data in data_allied_bank:
    training_example = {
        "messages": [
            {"role": "system", "content": user_input_system},
            {"role": "user", "content": data['question']},
            {"role": "assistant", "content": data['answer']}
        ]
    }
    training_examples.append(training_example)


In [None]:
training_examples[1]

{'messages': [{'role': 'system',
   'content': 'You are a distinguished banking expert, extensively trained to adeptly handle a wide array of banking and financial matters, with a distinct focus on the intricacies of the Pakistan Banking System'},
  {'role': 'user', 'content': 'What is Meant By Riba?'},
  {'role': 'assistant',
   'content': 'The word “Riba” means excess, increase or addition, which correctly interpreted according to Shariah terminology, implies any excess compensation without due consideration (consideration does not include time value of money). This definition of Riba is derived from the Quran and is unanimously accepted by all Islamic scholars. Learn about more key terms by downloading Islamic Banking Glossary.'}]}

In [None]:
# Save training examples to a .jsonl file
with open('training_examples_abl.jsonl', 'w') as f:
    for example in training_examples:
        f.write(json.dumps(example) + '\n')
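
Before uploading, it can help to check that every line of the `.jsonl` file has the expected chat structure. The following is a minimal sketch, not an official validator: the function name `validate_chat_examples` and the set of allowed roles are assumptions based on the roles used in this notebook.

```python
import json

# Assumption: only the three roles used in this notebook are expected.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_examples(lines):
    """Return a list of (line_number, problem) tuples for malformed examples."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            example = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        messages = example.get("messages") if isinstance(example, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing or empty 'messages' list"))
            continue
        for message in messages:
            if not isinstance(message, dict):
                problems.append((number, "message is not an object"))
            elif message.get("role") not in ALLOWED_ROLES:
                problems.append((number, f"unexpected role: {message.get('role')!r}"))
            elif not isinstance(message.get("content"), str):
                problems.append((number, "content is not a string"))
        # Each example needs an assistant reply for the model to learn from.
        if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
            problems.append((number, "no assistant message to learn from"))
    return problems


# Usage with two in-memory lines instead of the real file.
sample_lines = [
    json.dumps({"messages": [
        {"role": "system", "content": "You are a banking expert."},
        {"role": "user", "content": "What is Islamic Banking?"},
        {"role": "assistant", "content": "A Shariah-compliant banking system."},
    ]}),
    json.dumps({"messages": [{"role": "user", "content": "No answer here"}]}),
]
print(validate_chat_examples(sample_lines))
```

To check the real file, pass the open file handle instead of `sample_lines`, since iterating over a text file yields one line at a time. An empty list means no problems were found by these particular checks.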

In [None]:
# messages = [{"role": "system", "content" : "You are ChatGPT, a large language model trained by OpenAI. Answer as concisely as possible.\nKnowledge cutoff: 2021-09-01\nCurrent date: 2023-03-02"},
#             {"role": "user", "content" : "How are you?"},
#             {"role": "assistant", "content" : "I am doing well"},
#             {"role": "user", "content" : "What is the mission of the company OpenAI?"}]

In [None]:
!ls

allied_bank.json  sample_data  training_examples_abl.jsonl


# Fine-Tuning Using OpenAI GPT-3.5

## Steps / Pre-requisites

1. Create an OpenAI API key.
2. Prepare the training dataset in the format shown above.
3. Upload the training dataset to the OpenAI server.
4. Submit the training job. It may take a while to finish; you will receive an email with the model ID to use for inference when the job completes.

> Fine-tuning through the OpenAI API costs money, so be mindful when experimenting.
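
To get a rough sense of the training bill before submitting a job, the sketch below estimates token volume from character counts. It rests on loud assumptions: the four-characters-per-token ratio is only a rule of thumb for English text, and the price and epoch count in the usage lines are hypothetical placeholders, not OpenAI's actual rates. Take real values from the OpenAI pricing page and from the job details.

```python
def estimate_training_cost(examples, price_per_1k_tokens, epochs, chars_per_token=4.0):
    """Return (approx_tokens, approx_cost) for a list of chat-format examples.

    chars_per_token is a rough heuristic, not a real tokenizer count.
    """
    total_chars = sum(
        len(message["content"])
        for example in examples
        for message in example["messages"]
    )
    approx_tokens = total_chars / chars_per_token
    # Training is billed per token processed, so multiply by the number of epochs.
    approx_cost = approx_tokens * epochs / 1000 * price_per_1k_tokens
    return approx_tokens, approx_cost


# Hypothetical numbers for illustration only.
demo_examples = [{"messages": [
    {"role": "user", "content": "a" * 120},
    {"role": "assistant", "content": "b" * 280},
]}]
tokens, cost = estimate_training_cost(demo_examples, price_per_1k_tokens=0.01, epochs=3)
print(f"~{tokens:.0f} tokens per epoch, ~${cost:.4f} for 3 epochs")
```

In this notebook the function could be called on `training_examples` directly. The result is an order-of-magnitude guide, not a quote.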



In [None]:
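# Read the OpenAI API key from the OPENAI_API_KEY environment variable,
# or replace the empty default with your key (avoid committing it to version control).
openai.api_key = os.getenv("OPENAI_API_KEY", "")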

## Upload the file to OpenAI

The OpenAI fine-tuning API requires the training data to be uploaded first, using the file endpoint below.

In [None]:
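# Upload the training file; the context manager closes the file handle afterwards.
with open("training_examples_abl.jsonl", "rb") as training_file:
    file_id = openai.File.create(file=training_file, purpose="fine-tune").id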

In [None]:
file_id

'file-uG0yfhtofvVKqUPpbhePzlz8'

## Submit the training job

Once the fine-tuning dataset is uploaded, start the fine-tuning job with the command below. The job takes some time to finish, so be patient and check its status periodically.


In [None]:
job = openai.FineTuningJob.create(training_file=file_id, model="gpt-3.5-turbo")

job_id = job.id

In [None]:
# Check the status of Finetuning Job
openai.FineTuningJob.list_events(id=job_id, limit=100)
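
Rather than re-running the status cell by hand, the job can be polled until it reaches a terminal state. The following is a minimal sketch: `wait_for_job` is a hypothetical helper, not part of the `openai` library, and it assumes the job object exposes a `status` attribute that ends as `succeeded`, `failed`, or `cancelled`. The retrieval function is passed in so the loop can be exercised without network access.

```python
import time
from types import SimpleNamespace

# Assumption: these are the statuses after which a job no longer changes.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(retrieve, job_id, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call retrieve(job_id) repeatedly until the job reaches a terminal status."""
    for _ in range(max_polls):
        job = retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


# Offline demonstration with a fake retrieve function and a made-up job id.
_statuses = iter(["queued", "running", "succeeded"])


def fake_retrieve(job_id):
    return SimpleNamespace(id=job_id, status=next(_statuses))


finished = wait_for_job(fake_retrieve, "ftjob-example", poll_seconds=0)
print(finished.status)
```

In this notebook the call would look like `wait_for_job(openai.FineTuningJob.retrieve, job_id)`. Check the returned `status` before using the model, since a failed or cancelled job has no fine-tuned model.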

# Fine-Tuned Model

Once training is finished, a model with a unique model ID is created, and you will receive an email notification. Use the commands below to retrieve the name of the model for inference and other uses.

In [None]:
# Retrieve the finetuned model identifier.
model_name_pre_object = openai.FineTuningJob.retrieve(job_id)
model_name = model_name_pre_object.fine_tuned_model
print(model_name)

ft:gpt-3.5-turbo-0613:personal::7rLEWyrE


To run inference, use the code snippet below. The fine-tuned model can also be used with LangChain or any other system by replacing the OpenAI model name with the model identifier generated above.

> Inference with a fine-tuned model is usually more expensive than with the publicly available base model, often by a factor of two or more. Check the OpenAI pricing page for current rates.

In [None]:
# Create the response from the finetuned Model.
response = openai.ChatCompletion.create(
    model=model_name,
    messages=[
      {
        "role": "system",
        "content": user_input_system,
      },
      {
          "role": "user",
          "content": data_allied_bank[0]["question"],
      }
    ],
)

response.choices[0].message['content']

'Unlike conventional banking, Islamic banking is based on Shariah principles derived from the Holy Quran, Hadiths and Sunnah of the Holy Prophet (PBUH). Shariah principles govern Islamic banking in all aspects of banking, after selection of profitable businesses by Islamic moral principles, it also restricts to invest in religiously prohibited industries especially related to Haram like Alcohol, interest based financial sector, gambling etc. There are many forms of Islamic banking based on different modes of investments/products being offered by Islamic Banks. The major modes of investments are: Murabaha, Ijarah, Mudarabah, Musharakah, Bai Salam, and Istasna. The illustration of each mode of investment is available under FAQ.'

In [None]:
data_allied_bank[0]["question"]

'What is Islamic Banking?'

In [None]:
data_allied_bank[0]["answer"]

'Islamic banking is defined as banking system which is in consonance with the spirit, ethos and value system of Islam and governed by the principles laid down by Islamic Shariah. Interest free banking is a narrow concept denoting a number of banking instruments or operations which avoid interest. Islamic banking, the more general term, is based not only to avoid interest-based transactions prohibited in Islamic Shariah but also to avoid unethical and un-social practices. In practical sense, Islamic Banking is the transformation of conventional money lending into transactions based on tangible assets and real services. The model of Islamic banking system leads towards the achievement of a system which helps achieve economic prosperity.'
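
API calls such as the chat completion above can fail transiently, for example on rate limits. The install cell pulls in `tenacity` for retries, but the notebook does not use it; the sketch below shows the same idea with the standard library only. `call_with_backoff` is a hypothetical helper, and retrying on every `Exception` by default is an assumption made for brevity. In practice, narrow `retry_on` to the specific errors worth retrying.

```python
import time


def call_with_backoff(fn, max_attempts=4, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call fn(), retrying with exponentially growing delays on failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2 ** attempt)


# Offline demonstration: a function that fails twice, then succeeds.
calls = {"count": 0}
delays = []


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


result = call_with_backoff(flaky, sleep=delays.append)
print(result, delays)
```

To use it here, wrap the request in a zero-argument function, for example `call_with_backoff(lambda: openai.ChatCompletion.create(model=model_name, messages=messages))`, where `messages` stands for the list built in the inference cell.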

# Conclusion

In this demo, we learned how to generate a fine-tuning dataset that is compatible with the OpenAI fine-tuning API and how to submit a fine-tuning job through that API. The fine-tuned model can then be used for inference and to build applications on top of fine-tuned LLMs.