# Upstage Fine-tuning API - Jeju Island AI-powered travel planner

Code authored by: Jonathan Siew Zunxian

Template by: Shawhin Talebi

1. https://github.com/ShawhinT/YouTube-Blog/blob/main/LLMs/fine-tuning/ft-example.ipynb
2. https://github.com/ShawhinT/YouTube-Blog/blob/main/LLMs/ai-assistant-openai/finetuning-api.ipynb

### Initial Set Up: (done)

In [9]:
pip install openai


In [11]:
from openai import OpenAI

In [16]:
import csv
import json
import random

### Create client: (done)

In [14]:
from google.colab import userdata
api_key_value=userdata.get('Upstage')

client = OpenAI(
    api_key=api_key_value,
    base_url="https://api.upstage.ai/v1/solar"
)

### Prepare training data

In [None]:
comment_list = []
response_list = []

with open('data/YT-comments.csv', mode='r', newline='') as file:
    reader = csv.reader(file)

    for row in reader:
        # skip the header row
        if row[0] == 'Comment':
            continue
        comment_list.append(row[0])
        # append the signature so the model learns to sign off with it
        response_list.append(row[1] + " -ShawGPT")

len(comment_list)

In [None]:
example_list = []

intstructions_string_few_shot = """ShawGPT, functioning as a virtual data science consultant on YouTube, communicates in clear, accessible language, escalating to technical depth upon request. \
It reacts to feedback aptly and concludes with its signature '-ShawGPT'. \
ShawGPT will tailor the length of its responses to match the viewer's comment, providing concise acknowledgments to brief expressions of gratitude or feedback, \
thus keeping the interaction natural and engaging.

Here are examples of ShawGPT responding to viewer comments.

Viewer comment: This was a very thorough introduction to LLMs and answered many questions I had. Thank you.
ShawGPT: Great to hear, glad it was helpful :) -ShawGPT

Viewer comment: Epic, very useful for my BCI class
ShawGPT: Thanks, glad to hear! -ShawGPT

Viewer comment: Honestly the most straightforward explanation I've ever watched. Super excellent work Shaw. Thank you. It's so rare to find good communicators like you!
ShawGPT: Thanks, glad it was clear -ShawGPT"""

for i in range(len(comment_list)):
    system_dict = {"role": "system", "content": intstructions_string_few_shot}
    user_dict = {"role": "user", "content": comment_list[i]}
    assistant_dict = {"role": "assistant", "content": response_list[i]}

    messages_list = [system_dict, user_dict, assistant_dict]

    example_list.append({"messages": messages_list})
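
Each element of `example_list` is a chat-format record: a `messages` list holding a system, a user, and an assistant turn. The sketch below builds one such record and shows the single JSON line it becomes in a `.jsonl` file. The helper name `make_example` and the shortened system prompt are hypothetical, and the assumption that Upstage accepts this OpenAI-style layout is carried over from the template rather than confirmed here.

```python
import json


def make_example(system_prompt, comment, response):
    """Build one chat-format training record (hypothetical helper)."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": comment},
            {"role": "assistant", "content": response},
        ]
    }


example = make_example(
    "You are ShawGPT.",  # placeholder, not the full few-shot prompt
    "Epic, very useful for my BCI class",
    "Thanks, glad to hear! -ShawGPT",
)

# one record per line is what the .jsonl training file contains
jsonl_line = json.dumps(example)
print(jsonl_line)
```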

In [None]:
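# hold out 9 examples for validation; range(len(...)) makes every index eligible
validation_index_list = random.sample(range(len(example_list)), 9)

validation_data_list = [example_list[index] for index in validation_index_list]

# keep the remaining examples for training; filtering by index stays correct
# even when two examples have identical content
validation_index_set = set(validation_index_list)
example_list = [example for index, example in enumerate(example_list) if index not in validation_index_set]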
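
The hold-out split can also be made reproducible, so that re-running the notebook yields the same training and validation sets. A minimal sketch, assuming a seeded `random.Random` instance is acceptable here; the function name `split_examples`, the seed value, and the demo data are hypothetical:

```python
import random


def split_examples(examples, n_validation, seed=0):
    """Return (train, validation) lists using a seeded shuffle of the indices."""
    if n_validation > len(examples):
        raise ValueError("n_validation exceeds the number of examples")
    rng = random.Random(seed)  # local generator, leaves the global state alone
    indices = list(range(len(examples)))
    rng.shuffle(indices)
    validation_idx = set(indices[:n_validation])
    train = [ex for i, ex in enumerate(examples) if i not in validation_idx]
    validation = [ex for i, ex in enumerate(examples) if i in validation_idx]
    return train, validation


# demo data standing in for example_list
demo = [{"id": i} for i in range(20)]
train, validation = split_examples(demo, n_validation=9)
print(len(train), len(validation))
```

Partitioning by index keeps both lists in their original order and does not depend on element equality.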

In [None]:
with open('data/training-data.jsonl', 'w') as training_file:
    for example in example_list:
        json.dump(example, training_file)
        training_file.write('\n')

with open('data/validation-data.jsonl', 'w') as validation_file:
    for example in validation_data_list:
        json.dump(example, validation_file)
        validation_file.write('\n')
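
Before uploading, it can help to check that every line of the written files parses as JSON and has the expected message roles. The sketch below is one way to do that; `find_invalid_lines` and the sample lines are hypothetical, and the strict system/user/assistant order reflects how this notebook builds its examples, not a documented Upstage requirement.

```python
import json

EXPECTED_ROLES = ["system", "user", "assistant"]


def find_invalid_lines(lines):
    """Return (line_number, reason) for each line that is not a valid record."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list):
            problems.append((number, "missing 'messages' list"))
            continue
        roles = [m.get("role") if isinstance(m, dict) else None for m in messages]
        if roles != EXPECTED_ROLES:
            problems.append((number, f"unexpected roles: {roles}"))
    return problems


# sample lines standing in for the contents of a .jsonl file
sample_lines = [
    json.dumps({"messages": [
        {"role": "system", "content": "s"},
        {"role": "user", "content": "u"},
        {"role": "assistant", "content": "a"},
    ]}),
    "this is not json",
]
print(find_invalid_lines(sample_lines))
```

Because the function accepts any iterable of strings, an open file handle for `data/training-data.jsonl` can be passed to it directly.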

### Upload training examples to Upstage API

In [None]:
# open each file in a with-block so the handle is closed after the upload
with open("data/training-data.jsonl", "rb") as f:
    training_file = client.files.create(file=f, purpose="fine-tune")

with open("data/validation-data.jsonl", "rb") as f:
    validation_file = client.files.create(file=f, purpose="fine-tune")

### Create a fine-tuned model

In [None]:
client.fine_tuning.jobs.create(
    training_file = training_file.id,
    validation_file = validation_file.id,
    suffix = "ShawGPT",
    model = "upstage-3.5-turbo"
)
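
A fine-tuning job runs asynchronously, so the model is not usable until the job finishes. The sketch below polls a job until it reaches a terminal state. It relies on `client.fine_tuning.jobs.retrieve` and the status strings from the OpenAI Python SDK; that the Upstage endpoint supports this call and reports the same statuses is an assumption. The function name `wait_for_job` and its default intervals are hypothetical.

```python
import time

# terminal statuses as defined by the OpenAI fine-tuning API (assumed for Upstage)
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(client, job_id, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll a fine-tuning job until it finishes or max_polls is exhausted."""
    for _ in range(max_polls):
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

To use it, keep the object returned by `client.fine_tuning.jobs.create(...)`, pass its `id` to `wait_for_job`, and read `fine_tuned_model` from the finished job to get the model name for inference.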

### Use fine-tuned model

In [None]:
# example comments to try; only the uncommented one is sent
# test_comment = "Great content, thank you!"
# test_comment = "I am typing this after watching half of the video as I am already amazed with the clarity of explanation. exceptional."
test_comment = "What is fat-tailedness?"

response = client.chat.completions.create(
    model="ft:upstage-3.5-turbo-0613:personal:shawgpt:8mUeVreo",
    messages=[
    {"role": "system", "content": intstructions_string_few_shot},
    {"role": "user", "content": test_comment}
    ]
)

# the v1 SDK returns typed objects, so use attribute access instead of dict keys
print(response.choices[0].message.content)

# delete the uploaded files once they are no longer needed
client.files.delete(training_file.id)
client.files.delete(validation_file.id)

### More resources

1. Upstage Guide: [Insert Upstage Guide URL]
2. Fine-tuning doc: [Insert Fine-tuning Documentation URL]
3. Fine-tuning data prep: [Insert Fine-tuning Data Preparation URL]