In [2]:
import pandas as pd

dataset = pd.read_csv("PromptDataset.csv")  # replace with the path to your CSV file
dataset.head()

Unnamed: 0,Prompt,Response
0,What is the capital of France?,The capital of France is Paris.
1,Explain the water cycle.,The water cycle is the continuous movement of ...
2,Name the three branches of the U.S. government...,The three branches of the U.S. government are ...
3,Describe how photosynthesis works.,Photosynthesis is the process in which green p...
4,What are the main differences between classica...,The main differences between classical and qua...


In [3]:
from sklearn.model_selection import train_test_split

In [7]:
from openai import OpenAI
client = OpenAI(
    api_key='<openai_key>',
)

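def visualize_chat(messages):
    """Send each prompt as a separate single-turn request and return the formatted transcript."""
    chat = []
    for message in messages:
        chat.append(f"**👤 User:** {message}")
        # Each prompt is sent on its own; earlier turns are not passed as history.
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": message}],
            stream=False,
        )
        chat.append(f"**🤖 LLM:** {response.choices[0].message.content}")
    return chat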

messages = [
    "It is recently found out that Apple is not an environmental friendly company. You are a fanatic Apple fan working on a blog. Write about how Apple does a good job of saving the planet."
]

data = visualize_chat(messages)
for chat in data:
    print(chat)

**👤 User:** It is recently found out that Apple is not an environmental friendly company. You are a fanatic Apple fan working on a blog. Write about how Apple does a good job of saving the planet.
**🤖 LLM:** While some may argue that Apple is not an environmentally friendly company, I believe that they are making significant efforts to help save the planet. 

One of the ways that Apple is working to reduce their environmental impact is through their commitment to using renewable energy sources. They have made significant investments in solar and wind energy projects, and have pledged to power all of their operations with 100% renewable energy. This not only helps reduce their carbon footprint, but it also sets an example for other companies to follow suit.

Additionally, Apple has been focused on reducing their use of harmful chemicals in their products. They have eliminated many toxic substances from their devices, and have implemented a recycling program to responsibly dispose of ele

In [8]:
# Define the split ratio, e.g., 80% training and 20% testing
train_size = 0.8

# Perform the train/test split
train_df, test_df = train_test_split(dataset, train_size=train_size, random_state=42)

# Save the split datasets to new CSV files
train_df.to_csv('PromptDataset_train.csv', index=False)
test_df.to_csv('PromptDataset_test.csv', index=False)

print("Train/test split completed. Files saved as 'PromptDataset_train.csv' and 'PromptDataset_test.csv'.")

Train/test split completed. Files saved as 'PromptDataset_train.csv' and 'PromptDataset_test.csv'.


In [9]:
import json

def csv_to_jsonl(csv_file_path, jsonl_file_path):
    # Load the CSV file into a DataFrame
    dataset = pd.read_csv(csv_file_path)

    # Open the JSONL file for writing
    with open(jsonl_file_path, 'w') as jsonl_file:
        for _, row in dataset.iterrows():
            # Construct the JSON object for each row
            json_obj = {
                "messages": [
                    {"role": "user", "content": row['Prompt']},
                    {"role": "assistant", "content": row['Response']}
                ]
            }
            # Write the JSON object to the JSONL file
            jsonl_file.write(json.dumps(json_obj) + '\n')

    print(f"CSV file '{csv_file_path}' converted to JSONL format and saved as '{jsonl_file_path}'.")

In [10]:
# Paths for the CSV and JSONL files
csv_files = [
    ("PromptDataset_train.csv", "PromptDataset_train.jsonl"),
    ("PromptDataset_test.csv", "PromptDataset_test.jsonl")
]

# Convert each CSV file to JSONL format
for csv_file_path, jsonl_file_path in csv_files:
    csv_to_jsonl(csv_file_path, jsonl_file_path)

CSV file 'PromptDataset_train.csv' converted to JSONL format and saved as 'PromptDataset_train.jsonl'.
CSV file 'PromptDataset_test.csv' converted to JSONL format and saved as 'PromptDataset_test.jsonl'.


In [11]:
!head -n 5 PromptDataset_train.jsonl

{"messages": [{"role": "user", "content": "What are the seven continents of the world?"}, {"role": "assistant", "content": "The seven continents of the world are Asia, Africa, North America, South America, Europe, and Australia.\r"}]}
{"messages": [{"role": "user", "content": "What are the main differences between classical and quantum computing?"}, {"role": "assistant", "content": "The main differences between classical and quantum computing are the methods and speed of each type of computing. Classical computing uses bits to represent data for binary processing and quantum computing uses quantum properties of atomic and subatomic particles. Classical computing uses bits as the basic unit of information, which can be either 0 or 1, and linearly performs operations with stable and well-established error correction methods.  Quantum computing uses qubits, which can exist in a superposition of states, allowing for simultaneous calculations and solving specific problems more efficiently u

In [12]:
!head -n 5 PromptDataset_test.jsonl

{"messages": [{"role": "user", "content": "Describe the climate and geography of the Sahara Desert."}, {"role": "assistant", "content": "The Sahara Desert has an exceptionally dry environment, with daytime highs of over 100 degrees Celsius and nighttime lows of around 60 degrees. It covers most of North Africa and is typified by long areas of dunes, rocky plateaus, gravel plains, and scattered mountains and lakes.\r"}]}
{"messages": [{"role": "user", "content": "What are the ethical considerations in animal testing for medical research?"}, {"role": "assistant", "content": "Animal testing in medical research raises ethical concerns regarding animal welfare, consent, and necessity. The 3Rs principle\u2014Replacement, Reduction, and Refinement\u2014guides ethical research by advocating for alternatives to animal use, minimizing the number of animals used, and refining procedures to reduce suffering. Ethical considerations also involve balancing scientific advancement with humane treatment
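
Several of the assistant responses printed above end in a literal "\r", which suggests the source CSV was saved with Windows-style line endings and the carriage return ended up inside the last column. The following is a minimal sketch for stripping that whitespace before uploading, assuming the trailing characters are unwanted in the training data. `clean_record` and `clean_jsonl` are hypothetical helper names and are not part of the original notebook.

```python
import json


def clean_record(record):
    """Return a copy of one chat record with whitespace stripped from every message."""
    return {
        "messages": [
            {"role": m["role"], "content": str(m["content"]).strip()}
            for m in record["messages"]
        ]
    }


def clean_jsonl(src_path, dst_path):
    """Rewrite a JSONL file with cleaned records and return how many were written."""
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            dst.write(json.dumps(clean_record(json.loads(line))) + "\n")
            count += 1
    return count
```

A call such as `clean_jsonl("PromptDataset_train.jsonl", "PromptDataset_train_clean.jsonl")` would write a cleaned copy; the output file name here is only an illustration.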

In [13]:
def upload_file(file_name: str, purpose: str) -> str:
    """Upload a local file to OpenAI and return the ID assigned to it."""
    with open(file_name, "rb") as file_fd:
        response = client.files.create(file=file_fd, purpose=purpose)
    return response.id

training_file_id = upload_file("PromptDataset_train.jsonl", "fine-tune")
validation_file_id = upload_file("PromptDataset_test.jsonl", "fine-tune")

print("Training file ID:", training_file_id)
print("Validation file ID:", validation_file_id)

Training file ID: file-7DM1YQlqT33XYyay3BPhoLXr
Validation file ID: file-hW9Kf2Ldffd1oAHlEUALgyDq


In [74]:
MODEL = "gpt-3.5-turbo"

response = client.fine_tuning.jobs.create(
    training_file=training_file_id,
    validation_file=validation_file_id,
    model=MODEL,
    suffix="recipe-ner",
)

job_id = response.id

print("Job ID:", response.id)
print("Status:", response.status)

Job ID: ftjob-sl1m4dZGgjeV49bXF9znRDlW
Status: validating_files


In [75]:
response = client.fine_tuning.jobs.retrieve(job_id)

print("Job ID:", response.id)
print("Status:", response.status)
print("Trained Tokens:", response.trained_tokens)

Job ID: ftjob-sl1m4dZGgjeV49bXF9znRDlW
Status: running
Trained Tokens: None
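
The status above is still "running", so the retrieve call has to be repeated until the job ends. Below is a sketch of a simple polling loop, assuming the job eventually reports one of the statuses "succeeded", "failed", or "cancelled". `wait_for_job` is a hypothetical helper; it takes the retrieve function as an argument so that it can be tried without network access.

```python
import time

# Assumed terminal states of a fine-tuning job.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(retrieve, job_id, poll_seconds=30.0, max_polls=240):
    """Call retrieve(job_id) until the job reaches a terminal status or polls run out."""
    for _ in range(max_polls):
        job = retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} still not finished after {max_polls} polls")
```

In this notebook it could be called as `wait_for_job(client.fine_tuning.jobs.retrieve, job_id)`, after which the returned object's `status` should be checked before reading `fine_tuned_model`.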


In [78]:
response = client.fine_tuning.jobs.list_events(job_id)

events = response.data
events.reverse()

for event in events:
    print(event.message)

Step 105/120: training loss=1.45
Step 106/120: training loss=1.55
Step 107/120: training loss=0.54
Step 108/120: training loss=2.13
Step 109/120: training loss=0.41
Step 110/120: training loss=2.03, validation loss=2.00
Step 111/120: training loss=0.90
Step 112/120: training loss=0.48
Step 113/120: training loss=0.65
Step 114/120: training loss=1.48
Step 115/120: training loss=0.93
Step 116/120: training loss=0.68
Step 117/120: training loss=1.12
Step 118/120: training loss=0.80
Step 119/120: training loss=0.54
Step 120/120: training loss=1.41, validation loss=1.09, full validation loss=1.42
Checkpoint created at step 40 with Snapshot ID: ft:gpt-3.5-turbo-0125:personal:recipe-ner:9sGpaBPL:ckpt-step-40
Checkpoint created at step 80 with Snapshot ID: ft:gpt-3.5-turbo-0125:personal:recipe-ner:9sGpaxJ7:ckpt-step-80
New fine-tuned model created: ft:gpt-3.5-turbo-0125:personal:recipe-ner:9sGpaZY4
The job has successfully completed
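
The per-step training loss in the log above jumps around a lot, so pulling the numbers out of the messages can make them easier to inspect or plot. Here is a sketch that does this with a regular expression, assuming the messages keep exactly the "Step N/M: training loss=..." format printed above. `parse_losses` is a hypothetical helper and is not part of the OpenAI client.

```python
import re

# Matches e.g. "Step 110/120: training loss=2.03, validation loss=2.00"
STEP_PATTERN = re.compile(
    r"Step (\d+)/(\d+): training loss=([\d.]+)(?:, validation loss=([\d.]+))?"
)


def parse_losses(messages):
    """Return (step, training_loss, validation_loss_or_None) for each step message."""
    rows = []
    for message in messages:
        match = STEP_PATTERN.match(message)
        if match is None:
            continue  # checkpoint and status messages carry no loss values
        step, _total, train, val = match.groups()
        rows.append((int(step), float(train), float(val) if val else None))
    return rows
```

With the `events` list from the cell above, `parse_losses(event.message for event in events)` would give one tuple per logged step.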


In [79]:
response = client.fine_tuning.jobs.retrieve(job_id)
fine_tuned_model_id = response.fine_tuned_model

if fine_tuned_model_id is None:
    raise RuntimeError(
        "Fine-tuned model ID not found. Your job has likely not been completed yet."
    )

print("Fine-tuned model ID:", fine_tuned_model_id)

Fine-tuned model ID: ft:gpt-3.5-turbo-0125:personal:recipe-ner:9sGpaZY4


In [93]:
test_messages = [
    {"role": "user", "content": "What are your thoughts on collaborative work?"}
]

In [94]:
response = client.chat.completions.create(
    model=fine_tuned_model_id, messages=test_messages, temperature=0, max_tokens=500
)
print(response.choices[0].message.content)

I think collaborative work is a great way to bring together different perspectives and ideas to create something truly unique. It allows individuals to leverage each other's strengths and expertise, leading to more innovative and effective solutions. However, it can also be challenging at times, as it requires strong communication and teamwork skills to ensure everyone is on the same page and working towards a common goal. Overall, I believe that when done effectively, collaborative work can lead to better outcomes and a more enriching experience for everyone involved.
