# OpenAI Fine-tuning API - YouTube Comment Responder

Code authored by: Shaw Talebi <br>
Video link: https://youtu.be/4RAvJt3fWoI <br>
Article link: https://medium.com/towards-data-science/how-to-build-an-ai-assistant-with-openai-python-8b3b5a636f69?sk=af65504536ca6b977a4a993ecd2e80e8 <br>

### import modules

In [1]:
from openai import OpenAI
from sk import my_sk  # local sk.py that defines my_sk, the OpenAI API key

import csv
import json
import random

### create client

In [2]:
client = OpenAI(api_key=my_sk)
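Importing the key from a local `sk.py` works, but a common alternative is to keep it in an environment variable. The sketch below assumes the key is stored in `OPENAI_API_KEY`, which is also the variable the `OpenAI()` client reads by default when `api_key` is omitted. `get_api_key` is a hypothetical helper, not part of the original code.

```python
import os


def get_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in env_var (hypothetical helper).

    Raises RuntimeError instead of silently passing None to the client.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# usage (requires the openai package and a real key):
# client = OpenAI(api_key=get_api_key())
```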

### prepare training data

In [3]:
# load csv of YouTube comments

comment_list = []
response_list = []

with open('data/YT-comments.csv', mode='r', newline='') as file:
    reader = csv.reader(file)

    # read file line by line
    for line in reader:
        # skip the header row
        if line[0] == 'Comment':
            continue

        # append comments and responses to respective lists
        comment_list.append(line[0])
        response_list.append(line[1] + " -ShawGPT")
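Checking the first cell against the string `'Comment'` relies on the header's label; skipping the header with `next()` does not. A minimal sketch, assuming the CSV has a single header row with the comment in the first column and the reply in the second. `load_comment_pairs` is a hypothetical helper that accepts any iterable of text lines, so it can be exercised on an in-memory string; the `Response` header in the sample is an assumption.

```python
import csv
import io
from typing import Iterable, List, Tuple


def load_comment_pairs(lines: Iterable[str], signature: str = " -ShawGPT") -> Tuple[List[str], List[str]]:
    """Return (comments, responses) from CSV text, skipping the header row."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row, whatever its labels are
    comments, responses = [], []
    for row in reader:
        if len(row) < 2:  # ignore blank or malformed rows
            continue
        comments.append(row[0])
        responses.append(row[1] + signature)
    return comments, responses


sample = io.StringIO('Comment,Response\n"Great video, thanks!",Glad it helped\n')
comments, responses = load_comment_pairs(sample)
print(comments, responses)  # → ['Great video, thanks!'] ['Glad it helped -ShawGPT']
```

With the real file, open it with `newline=''` and pass the file object: `comment_list, response_list = load_comment_pairs(f)`.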

In [4]:
len(comment_list)

59

In [5]:
# construct training examples
example_list = []

intstructions_string_few_shot = """ShawGPT, functioning as a virtual data science consultant on YouTube, communicates in clear, accessible language, escalating to technical depth upon request. \
It reacts to feedback aptly and concludes with its signature '-ShawGPT'. \
ShawGPT will tailor the length of its responses to match the viewer's comment, providing concise acknowledgments to brief expressions of gratitude or feedback, \
thus keeping the interaction natural and engaging.

Here are examples of ShawGPT responding to viewer comments.

Viewer comment: This was a very thorough introduction to LLMs and answered many questions I had. Thank you.
ShawGPT: Great to hear, glad it was helpful :) -ShawGPT

Viewer comment: Epic, very useful for my BCI class
ShawGPT: Thanks, glad to hear! -ShawGPT

Viewer comment: Honestly the most straightforward explanation I've ever watched. Super excellent work Shaw. Thank you. It's so rare to find good communicators like you!
ShawGPT: Thanks, glad it was clear -ShawGPT"""

for comment, reply in zip(comment_list, response_list):
    system_dict = {"role": "system", "content": intstructions_string_few_shot}
    user_dict = {"role": "user", "content": comment}
    assistant_dict = {"role": "assistant", "content": reply}

    messages_list = [system_dict, user_dict, assistant_dict]

    example_list.append({"messages": messages_list})
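The system prompt above hard-codes three example exchanges. If you want to swap examples in and out, the few-shot block and the chat-format training records can be generated instead. A minimal sketch under that assumption: `build_few_shot_prompt` and `build_example` are hypothetical helpers, and the record layout (`{"messages": [system, user, assistant]}`) mirrors the loop above.

```python
from typing import Dict, List, Tuple


def build_few_shot_prompt(instructions: str, shots: List[Tuple[str, str]]) -> str:
    """Append 'Viewer comment / ShawGPT' pairs to the instruction text."""
    blocks = [f"Viewer comment: {comment}\nShawGPT: {reply}" for comment, reply in shots]
    header = "\n\nHere are examples of ShawGPT responding to viewer comments.\n\n"
    return instructions + header + "\n\n".join(blocks)


def build_example(system_prompt: str, comment: str, response: str) -> Dict[str, list]:
    """Return one chat-format training record."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": comment},
            {"role": "assistant", "content": response},
        ]
    }


prompt = build_few_shot_prompt(
    "ShawGPT concludes with its signature '-ShawGPT'.",
    [("Epic, very useful for my BCI class", "Thanks, glad to hear! -ShawGPT")],
)
record = build_example(prompt, "Great content, thank you!", "Thanks! -ShawGPT")
print(prompt.splitlines()[-2])  # → Viewer comment: Epic, very useful for my BCI class
```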

In [6]:
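# create train/validation split (hold out 9 examples for validation)
# range(len(...)) includes the last index; range(0, len(...)-1) would never sample it
validation_index_list = random.sample(range(len(example_list)), 9)

validation_data_list = [example_list[index] for index in validation_index_list]

# keep every example whose index was not sampled for validation
example_list = [example for index, example in enumerate(example_list) if index not in validation_index_list]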
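A variant of the split as a reusable function, using a local seeded RNG so the same examples are held out on every run. Selecting by index rather than by value also means two identical records cannot be confused. This is a sketch assuming a fixed seed is acceptable; `train_val_split` and the seed value are hypothetical.

```python
import random
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def train_val_split(items: List[T], n_val: int, seed: int = 42) -> Tuple[List[T], List[T]]:
    """Split items into (train, val), holding out n_val items chosen by index."""
    rng = random.Random(seed)  # local RNG so the split is reproducible
    val_idx = set(rng.sample(range(len(items)), n_val))
    train = [x for i, x in enumerate(items) if i not in val_idx]
    val = [x for i, x in enumerate(items) if i in val_idx]
    return train, val


# e.g. the 59 examples loaded above, with 9 held out
train, val = train_val_split(list(range(59)), n_val=9)
print(len(train), len(val))  # → 50 9
```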

In [9]:
# write examples to file
with open('data/training-data.jsonl', 'w') as training_file:
    for example in example_list:
        json.dump(example, training_file)
        training_file.write('\n')

with open('data/validation-data.jsonl', 'w') as validation_file:
    for example in validation_data_list:
        json.dump(example, validation_file)
        validation_file.write('\n')
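Before uploading, it can help to check that every line parses and follows the chat format, in the spirit of the data-prep notebook linked under More resources. A minimal sketch: `validate_chat_jsonl` is a hypothetical helper, and its checks (valid JSON, a non-empty `messages` list, known roles, string content, at least one assistant turn) are a subset chosen for illustration, not OpenAI's full validation.

```python
import json
from typing import Iterable, List

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_jsonl(lines: Iterable[str]) -> List[str]:
    """Return a list of problems found; an empty list means every line passed."""
    problems = []
    for n, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {n}: not valid JSON")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {n}: missing 'messages' list")
            continue
        for m in messages:
            if not isinstance(m, dict) or m.get("role") not in ALLOWED_ROLES or not isinstance(m.get("content"), str):
                problems.append(f"line {n}: bad message {m!r}")
        if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
            problems.append(f"line {n}: no assistant message")
    return problems


good = json.dumps({"messages": [{"role": "user", "content": "hi"},
                                {"role": "assistant", "content": "hey -ShawGPT"}]})
print(validate_chat_jsonl([good, "{oops"]))  # → ['line 2: not valid JSON']
```

To check a written file, iterate over it directly: `with open('data/training-data.jsonl') as f: print(validate_chat_jsonl(f))`.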

### upload training examples to the OpenAI API

In [10]:
# upload the JSONL files; each with-block closes its file after the upload
with open("data/training-data.jsonl", "rb") as f:
    training_file = client.files.create(file=f, purpose="fine-tune")

with open("data/validation-data.jsonl", "rb") as f:
    validation_file = client.files.create(file=f, purpose="fine-tune")

### create a fine-tuned model

In [15]:
client.fine_tuning.jobs.create(
    training_file=training_file.id,
    validation_file=validation_file.id,
    suffix="ShawGPT",
    model="gpt-3.5-turbo"
)

FineTuningJob(id='ftjob-UCKxDVwgNGOkqDZ7pz3OKz2U', created_at=1706638863, error=None, fine_tuned_model=None, finished_at=None, hyperparameters=Hyperparameters(n_epochs='auto', batch_size='auto', learning_rate_multiplier='auto'), model='gpt-3.5-turbo-0613', object='fine_tuning.job', organization_id='org-KjWERyZ9WLUqIdrdMeJh4zC0', result_files=[], status='validating_files', trained_tokens=None, training_file='file-Yh3vnQ81phIymqUvcKmgA229', validation_file='file-sRA6SvniYSgeWMttAQ6Vu7mU')
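The job starts in the `validating_files` state and trains asynchronously, so `fine_tuned_model` is only filled in once it finishes. A sketch of a polling loop, assuming you pass in `client.fine_tuning.jobs.retrieve` (or any callable that returns an object with `status` and `fine_tuned_model` attributes). `wait_for_job`, the poll interval, and the timeout are hypothetical choices, not part of the original notebook.

```python
import time
from typing import Any, Callable

# job states after which the status no longer changes
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(retrieve: Callable[[str], Any], job_id: str,
                 poll_seconds: float = 30.0, timeout_seconds: float = 3600.0) -> Any:
    """Poll retrieve(job_id) until the job reaches a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still '{job.status}' after {timeout_seconds}s")
        time.sleep(poll_seconds)


# usage with the real client (job id taken from the create() output above):
# job = wait_for_job(client.fine_tuning.jobs.retrieve, "ftjob-UCKxDVwgNGOkqDZ7pz3OKz2U")
# print(job.status, job.fine_tuned_model)
```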

### use fine-tuned model

In [29]:
# alternative test comments: uncomment one to try it instead
# test_comment = "Great content, thank you!"
# test_comment = "I am typing this after watching half of the video as I am already amazed with the clarity of explanation. exceptional."
test_comment = "What is fat-tailedness?"

In [30]:
response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo-0613:personal:shawgpt:8mUeVreo",
    messages=[
        # reuse the same system prompt the model was fine-tuned with
        {"role": "system", "content": intstructions_string_few_shot},
        {"role": "user", "content": test_comment}
    ]
)

In [31]:
print(response.choices[0].message.content)

In probability theory, a distribution is said to have "fat tails" if the probability of extreme values occurring is higher than what would be expected in a normal distribution. This stems from the fact that “fat-tailed” probability distributions have fewer but much larger extreme values than their thin-tailed counterparts. -ShawGPT
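The responses in the training data all end with " -ShawGPT", so the fine-tuned model usually reproduces the signature, but nothing guarantees it. If the sign-off matters, a small post-processing step can enforce it. A hypothetical sketch: `ensure_signature` is not part of the original code.

```python
def ensure_signature(reply: str, signature: str = "-ShawGPT") -> str:
    """Append the signature unless the reply already ends with it."""
    text = reply.rstrip()  # ignore trailing whitespace/newlines from the model
    if text.endswith(signature):
        return text
    return f"{text} {signature}"


print(ensure_signature("Thanks, glad to hear!"))       # → Thanks, glad to hear! -ShawGPT
print(ensure_signature("Glad it was clear -ShawGPT"))  # → Glad it was clear -ShawGPT
```

Applied to the model output, this would be `print(ensure_signature(response.choices[0].message.content))`.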


In [32]:
# delete the uploaded training and validation files
client.files.delete(training_file.id)
client.files.delete(validation_file.id)

FileDeleted(id='file-sRA6SvniYSgeWMttAQ6Vu7mU', deleted=True, object='file')

### More resources

OpenAI Guide: https://platform.openai.com/docs/guides/fine-tuning <br>
Fine-tuning doc: https://platform.openai.com/docs/api-reference/fine-tuning <br>
Fine-tuning data prep: https://cookbook.openai.com/examples/chat_finetuning_data_prep