In [1]:
from openai import OpenAI
from dotenv import load_dotenv
import os 

In [2]:
# Load the environment variables from the .env file
load_dotenv()
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

### Prepare the dataset

Working with Python lists, as shown above, is convenient for small datasets. Saving the data in JSONL (JSON Lines) format, however, has several benefits: scalability, interoperability, simplicity, and compatibility with the OpenAI API, which requires JSONL files when creating fine-tuning jobs.
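
As a minimal sketch of what JSONL means in practice, the snippet below serializes a list of records to one JSON object per line and reads them back. The `examples` content and the helper names `to_jsonl` and `from_jsonl` are hypothetical; the `messages` layout assumes the chat-style format used for chat model fine-tuning and is not taken from the dataset in this notebook.

```python
import json

# Hypothetical chat-style training examples (illustrative content only).
examples = [
    {
        "messages": [
            {"role": "system", "content": "You write marketing copy."},
            {"role": "user", "content": "Write a tagline for a savings account."},
            {"role": "assistant", "content": "Save today, thrive tomorrow."},
        ]
    },
    {
        "messages": [
            {"role": "user", "content": "Write a tagline for a student loan."},
            {"role": "assistant", "content": "Funding that keeps up with your studies."},
        ]
    },
]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


def from_jsonl(text):
    """Parse JSON Lines text back into a list of records, ignoring blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


jsonl_text = to_jsonl(examples)
restored = from_jsonl(jsonl_text)
print(f"{len(jsonl_text.splitlines())} lines, round trip ok: {restored == examples}")
```

Because every line is an independent JSON object, a file in this format can be processed line by line without loading the whole dataset into memory.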

The following code loads the original JSONL dataset, skips malformed lines, and uses scikit-learn's train_test_split to create the training and validation (test) files in JSONL format:

In [3]:
import json
from sklearn.model_selection import train_test_split

# Define the path to the original dataset
file_path = '../data/final_data/final_finetuning.jsonl'

# Load the data, handling possible malformed JSON
data = []
with open(file_path, 'r') as file:
    for line in file:
        try:
            json_obj = json.loads(line)
            data.append(json_obj)
        except json.JSONDecodeError:
            # Handle or log the malformed line if needed
            pass

# Split the data into training (80%) and test (20%) sets
train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)

# Define file paths for the training and test data
train_file_path = '../data/final_data/final_finetuning_train.jsonl'
test_file_path = '../data/final_data/final_finetuning_test.jsonl'

# Save the training set
with open(train_file_path, 'w') as train_file:
    for item in train_data:
        train_file.write(json.dumps(item) + '\n')

# Save the test set
with open(test_file_path, 'w') as test_file:
    for item in test_data:
        test_file.write(json.dumps(item) + '\n')
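
Before uploading, it can help to check that every record looks like a chat-style training example. The following is a rough sketch, assuming each record is a dictionary with a `messages` list of `role`/`content` pairs; the function name `find_format_errors` and the `sample` records are hypothetical, and the role set is a simplification that ignores tool or function messages.

```python
def find_format_errors(records):
    """Return (index, problem) pairs for records that do not look like chat examples."""
    allowed_roles = {"system", "user", "assistant"}  # simplified assumption
    errors = []
    for index, record in enumerate(records):
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            errors.append((index, "missing or empty 'messages' list"))
            continue
        for message in messages:
            if not isinstance(message, dict):
                errors.append((index, "message is not an object"))
                break
            if message.get("role") not in allowed_roles:
                errors.append((index, f"unexpected role: {message.get('role')!r}"))
                break
            if not isinstance(message.get("content"), str):
                errors.append((index, "content is not a string"))
                break
        else:
            # Only reached when every message passed the checks above.
            if not any(m["role"] == "assistant" for m in messages):
                errors.append((index, "no assistant message to learn from"))
    return errors


# Hypothetical records: one valid, one without an assistant reply, one in another format.
sample = [
    {"messages": [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello"}]},
    {"messages": [{"role": "user", "content": "Hi"}]},
    {"prompt": "legacy format"},
]
print(find_format_errors(sample))
```

In the notebook, the same function could be called on `train_data` and `test_data` before the files are written.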


In [15]:
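# Upload both files; the context managers close the file handles after the upload.
# Despite their names, these variables hold FileObject instances; the IDs are in `.id`.
with open(train_file_path, "rb") as train_file:
    training_file_id = client.files.create(file=train_file, purpose="fine-tune")

with open(test_file_path, "rb") as test_file:
    test_file_id = client.files.create(file=test_file, purpose="fine-tune")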

print(f"Training File ID: {training_file_id}")
print(f"Test File ID: {test_file_id}")

Training File ID: FileObject(id='file-2Hu9XBBlZrL7gYm8tTyhrjFy', bytes=79696, created_at=1711984006, filename='final_finetuning_train.jsonl', object='file', purpose='fine-tune', status='processed', status_details=None)
Test File ID: FileObject(id='file-hJJahvMXeOKVNfqFOpHx76RN', bytes=19223, created_at=1711984007, filename='final_finetuning_test.jsonl', object='file', purpose='fine-tune', status='processed', status_details=None)


### Create a fine-tuning job

This fine-tuning process is largely inspired by the openai-cookbook example that performs fine-tuning on Microsoft Azure.

To perform the fine-tuning we will use the following two steps: (1) define hyperparameters, and (2) trigger the fine-tuning.

We will fine-tune the gpt-3.5-turbo model for 15 epochs with a batch size of 3 and a learning rate multiplier of 0.3, using the training and validation datasets.
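
To get a feel for what these hyperparameters imply, the number of training steps can be estimated from the dataset size. This is a back-of-the-envelope sketch, assuming one step per batch and that a final partial batch still counts as a step; the value `n_examples=100` is a hypothetical placeholder, not the size of the dataset used here.

```python
import math


def estimate_training_steps(n_examples, n_epochs, batch_size):
    """Estimate total steps as ceil(n_examples / batch_size) * n_epochs."""
    steps_per_epoch = math.ceil(n_examples / batch_size)
    return steps_per_epoch * n_epochs


# Hypothetical dataset of 100 examples with the hyperparameters from the text.
print(estimate_training_steps(n_examples=100, n_epochs=15, batch_size=3))  # prints 510
```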

Successful execution of the upload code above displays the unique identifiers of the training and validation files.

In [16]:
response = client.fine_tuning.jobs.create(
  training_file=training_file_id.id, 
  validation_file=test_file_id.id,
  model="gpt-3.5-turbo", 
  hyperparameters={
    "n_epochs": 15,
    "batch_size": 3,
    "learning_rate_multiplier": 0.3
  }
)
job_id = response.id
status = response.status

print(f'Fine-tuning model with jobID: {job_id}.')
print(f"Training Response: {response}")
print(f"Training Status: {status}")

Fine-tuning model with jobID: ftjob-eLHKJZBbs4V1VMPgHUlfUISj.
Training Response: FineTuningJob(id='ftjob-eLHKJZBbs4V1VMPgHUlfUISj', created_at=1711984008, error=Error(code=None, message=None, param=None, error=None), fine_tuned_model=None, finished_at=None, hyperparameters=Hyperparameters(n_epochs=15, batch_size=3, learning_rate_multiplier=0.3), model='gpt-3.5-turbo-0125', object='fine_tuning.job', organization_id='org-wkUFLlJRyOXDuAkBFUtPtrii', result_files=[], status='validating_files', trained_tokens=None, training_file='file-2Hu9XBBlZrL7gYm8tTyhrjFy', validation_file='file-hJJahvMXeOKVNfqFOpHx76RN', user_provided_suffix=None)
Training Status: validating_files


The code above prints the job ID (`ftjob-eLHKJZBbs4V1VMPgHUlfUISj`), the training response, and the training status (`validating_files`).

This initial status says little about the progress of the job. However, we can get more insight into the training process by listing the job's events with the following code:

In [17]:
import signal
import datetime


def signal_handler(sig, frame):
    status = client.fine_tuning.jobs.retrieve(job_id).status
    print(f"Stream interrupted. Job is still {status}.")
    return


print(f"Streaming events for the fine-tuning job: {job_id}")

signal.signal(signal.SIGINT, signal_handler)

events = client.fine_tuning.jobs.list_events(fine_tuning_job_id=job_id)
try:
    for event in events:
        print(
            f'{datetime.datetime.fromtimestamp(event.created_at)} {event.message}'
        )
except Exception:
    print("Stream interrupted (client disconnected).")

Streaming events for the fine-tuning job: ftjob-eLHKJZBbs4V1VMPgHUlfUISj
2024-04-01 11:06:48 Validating training file: file-2Hu9XBBlZrL7gYm8tTyhrjFy and validation file: file-hJJahvMXeOKVNfqFOpHx76RN
2024-04-01 11:06:48 Created fine-tuning job: ftjob-eLHKJZBbs4V1VMPgHUlfUISj


### Check the fine-tuning job status

Let's verify that the job finished successfully. We can also examine all the fine-tuning jobs by using a list operation.

In [18]:
import time

# Statuses after which the job will not change any further
terminal_statuses = ["succeeded", "failed", "cancelled"]

status = client.fine_tuning.jobs.retrieve(job_id).status
if status not in terminal_statuses:
    print(f"Job not in terminal status: {status}. Waiting.")
    while status not in terminal_statuses:
        time.sleep(2)
        status = client.fine_tuning.jobs.retrieve(job_id).status
        print(f"Status: {status}")
else:
    print(f"Finetune job {job_id} finished with status: {status}")
print("Checking other finetune jobs in the subscription.")
result = client.fine_tuning.jobs.list()
print(f"Found {len(result.data)} finetune jobs.")

Job not in terminal status: validating_files. Waiting.
Status: validating_files
Status: validating_files
Status: validating_files
Status: validating_files
Status: validating_files
Status: validating_files
Status: validating_files
Status: validating_files
Status: validating_files
Status: validating_files
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status: running
Status:
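
The polling loop above can also be written as a reusable helper with a longer interval and a timeout, which avoids waiting forever if a job stalls. This is a sketch rather than the notebook's method: the name `wait_for_terminal_status`, its parameters, and the default interval are assumptions, and the status source is passed in as a function so the helper can be tried without calling the API.

```python
import time

# Statuses after which a fine-tuning job no longer changes.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_terminal_status(get_status, poll_seconds=10.0,
                             timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status or the timeout elapses."""
    waited = 0.0
    status = get_status()
    while status not in TERMINAL_STATUSES:
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()
    return status


# With the notebook's objects, the call would look like this (not executed here):
# final_status = wait_for_terminal_status(
#     lambda: client.fine_tuning.jobs.retrieve(job_id).status
# )

# Offline demonstration with a scripted sequence of statuses.
scripted = iter(["validating_files", "running", "running", "succeeded"])
final_status = wait_for_terminal_status(lambda: next(scripted), poll_seconds=0.0)
print(final_status)  # prints succeeded
```

Passing the status source as a callable keeps the waiting logic independent of the client, so the same helper works for any job ID.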

### Validation of the model

Finally, the name of the fine-tuned model can be retrieved from the `fine_tuned_model` attribute. The following print statement shows that the name of the final model is `ft:gpt-3.5-turbo-0125:personal::99Du0H0D`:

In [19]:
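# Retrieve the fine-tuned model name from this specific job
# (result.data[0] is only this job if it happens to be first in the listing)
fine_tuned_model = client.fine_tuning.jobs.retrieve(job_id).fine_tuned_model
print(fine_tuned_model)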


ft:gpt-3.5-turbo-0125:personal::99Du0H0D


With this model, we can run queries to validate its results by passing the model name and a list of messages (a system prompt and a user prompt) to the client.chat.completions.create() function. The result is retrieved from the first element of the answer's choices list as follows:

In [20]:
answer = client.chat.completions.create(
  model=fine_tuned_model,
  messages=[
    {"role": "system", "content": "This is a parameter-based prompt for creating marketing materials"},
    {"role": "user", "content": "Develop comprehensive website content for our New Beginnings Savings Account, specifically designed for permanent residents."}
  ]
)
print(answer.choices[0].message)

# new_prompt = "Design an email for the TD Student Line of Credit, aimed at students seeking flexible funding solutions for their academic journey"
# answer = client.completions.create(
#   model=fine_tuned_model,
#   prompt=new_prompt
# )

# print(answer.choices[0].text)

ChatCompletionMessage(content='Welcome to a life of new beginnings and limitless opportunities! Elevate your savings journey with our exclusive New Beginnings Savings Account, tailored for permanent residents seeking a financial partner that understands their unique aspirations. Designed to be more than just a bank account, it’s your gateway to a wealth of benefits and a trusted ally in realizing your dreams. Say hello to a future where your money grows with purpose and your goals are nurtured with care.', role='assistant', function_call=None, tool_calls=None)
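
The validation cells in this section repeat the same call with a different user prompt, so a small wrapper could reduce the duplication. The sketch below is one possible shape; `build_messages` and `generate` are hypothetical helper names, and the system prompt is copied from the cell above.

```python
# System prompt used by the validation cells in this notebook.
SYSTEM_PROMPT = "This is a parameter-based prompt for creating marketing materials"


def build_messages(user_prompt, system_prompt=SYSTEM_PROMPT):
    """Build the two-message conversation sent to the fine-tuned model."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate(client, model, user_prompt, **kwargs):
    """Send one prompt to the chat completions endpoint and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(user_prompt),
        **kwargs,
    )
    return response.choices[0].message.content


# With the notebook's objects, the call would look like this (not executed here):
# print(generate(client, fine_tuned_model, "Develop website content for our Savings Account."))
```

Returning `message.content` instead of the whole message object gives plain text that is easier to inspect or post-process.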


In [21]:
answer = client.chat.completions.create(
  model=fine_tuned_model,
  messages=[
    {"role": "system", "content": "This is a parameter-based prompt for creating marketing materials"},
    {"role": "user", "content": "Develop comprehensive website content for our Savings Account in 100 words, specifically designed for permanent residents."}
  ]
)
print(answer.choices[0].message)

ChatCompletionMessage(content='Discover financial stability with our Savings Account, exclusively tailored for permanent residents. Enjoy competitive interest rates, zero monthly fees, and bonus interest every month for three years. Dive into a world of convenience with unlimited free transactions, mobile banking, and quick interbank transfers. Grow your wealth effortlessly, with the ability to easily access your money whenever you need it. Start your journey towards financial success with a bank that understands the importance of long-term planning for permanent residents in Canada. Welcome to a future where your savings thrive with every deposit.', role='assistant', function_call=None, tool_calls=None)


In [22]:
answer = client.chat.completions.create(
  model=fine_tuned_model,
  messages=[
    {"role": "system", "content": "This is a parameter-based prompt for creating marketing materials"},
    {"role": "user", "content": "create detailed social media content for mortgage seekers in 300 words, specifically designed for permanent residents."}
  ]
)
print(answer.choices[0].message)

ChatCompletionMessage(content="Calling all permanent residents! 🇨🇦 Dreaming of a place to call your very own? Say 'hello' to your dream home with our mortgage solutions tailored just for you. Our team doesn't just open doors; we're here to hand you the keys to your future. Why wait any longer when you're this close to making memories in your forever home? True for today, true for life. Partner with us and turn your dream house into your reality. We offer more than mortgages; we provide a pathway to financial confidence and independence. With our expert guidance, purchasing your home isn't just a process – it's an exciting journey. By choosing us, you're choosing more than just a financial institution; you're choosing a partner with a deep understanding of what home means to you. Our commitment doesn't end with great rates; we're here for you at every step, offering advice, answering questions, and making your journey to homeownership as smooth as possible. Let us handle the numbers, so

In [23]:
answer = client.chat.completions.create(
  model=fine_tuned_model,
  messages=[
    {"role": "system", "content": "This is a parameter-based prompt for creating marketing materials"},
    {"role": "user", "content": "create detailed social media content for mortgage seekers, specifically designed for permanent residents."}
  ]
)
print(answer.choices[0].message)

ChatCompletionMessage(content="Are you seeking your dream home as a Permanent Resident? 🏡 In Canada, your dreams deserve a solid foundation, and we're here to help. With tailored mortgage solutions, we're dedicated to turning your goals into front-door keys. Our team understands the aspirations of Permanent Residents and collaborates with you to secure your place among the pines and maples. Let's build a future you’re proud to call home. 🌟 #MortgageMagic #PermanentlyYours", role='assistant', function_call=None, tool_calls=None)


In [24]:
answer = client.chat.completions.create(
  model=fine_tuned_model,
  messages=[
    {"role": "system", "content": "This is a parameter-based prompt for creating marketing materials"},
    {"role": "user", "content": "create detailed social media content for mortgage seekers in 200 words, specifically designed for permanent residents."}
  ]
)
print(answer.choices[0].message)

ChatCompletionMessage(content="Unlock home ownership with tailored mortgage solutions. Secure a place to call your own, build equity, and invest in your future while laying down roots in Canada. Our expert advisors, well-versed in the financial nuances that come with permanent residency, craft personalized strategies to turn your property dreams into tangible assets. We understand the unique challenges and opportunities permanent residents face on their path to homeownership and offer guidance every step of the way. Benefit from competitive rates, flexible terms, and transparent processes built on trust. Your home-buying journey begins here, with a partner that prioritizes your goals and values. Let's navigate the mortgage landscape together, ensuring that each decision aligns with your long-term aspirations. It's not just about buying a house; it's about creating a foundation for your future. Time to lay down roots and watch your investments grow alongside your family. The key to a th

In [25]:
answer = client.chat.completions.create(
  model=fine_tuned_model,
  messages=[
    {"role": "system", "content": "This is a parameter-based prompt for creating marketing materials"},
    {"role": "user", "content": "Create a website marketing campaign for a Checking Account aimed at International Students in 5 lines"}
  ]
)
print(answer.choices[0].message)

ChatCompletionMessage(content='Discover the perfect companion for your studies abroad with our International Student Checking Account - where global convenience meets essential financial tools! Get a head start with no monthly fees, unlimited interbank transfers, and waived ATM withdrawal fees worldwide. Seamlessly manage your money through our user-friendly mobile app and track expenses with real-time alerts. Open your account in minutes online and enjoy exclusive benefits designed to grow with you on your academic journey. Stay connected globally with dedicated customer support and valuable resources tailored to your needs. Ready to make the most of your international experience?', role='assistant', function_call=None, tool_calls=None)
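
Several of the prompts above ask for a specific length (100, 200, or 300 words), so one simple validation step is to compare the length of each reply with the requested target. The following is a rough sketch: `count_words`, `within_word_budget`, the 20% tolerance, and `sample_reply` are all hypothetical, and whitespace splitting is only an approximation of a word count.

```python
def count_words(text):
    """Count whitespace-separated tokens as a rough proxy for the word count."""
    return len(text.split())


def within_word_budget(text, target_words, tolerance=0.2):
    """Return True if the word count is within +/- tolerance (a fraction) of the target."""
    lower = target_words * (1 - tolerance)
    upper = target_words * (1 + tolerance)
    return lower <= count_words(text) <= upper


# Hypothetical reply used only to demonstrate the helpers.
sample_reply = "Open your account online in minutes and enjoy no monthly fees."
print(count_words(sample_reply))                          # prints 11
print(within_word_budget(sample_reply, target_words=10))  # prints True
```

In the notebook, the text to check would come from `answer.choices[0].message.content`.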
