## Fine-tuned GPT-3.5

This project creates a custom language model that simulates a European travel advisor. Given a user's preferences and interests, the model generates personalized travel recommendations, including suggested cities, activities, and travel tips.


# Data generation step

Write the prompt. It should be as descriptive as possible!

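Then, choose the temperature to use when generating data (this notebook uses a value between 0 and 1; the OpenAI API accepts values from 0 to 2). Lower values make the output more focused and predictable, which suits precise tasks such as writing code; higher values make it more varied, which suits creative tasks such as writing stories.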

Finally, choose how many examples you want to generate.
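To make the temperature setting less abstract: temperature is commonly applied by dividing the model's raw scores (logits) by T before the softmax. The sketch below illustrates that general idea only; it is not OpenAI's internal implementation, and the logits are made-up numbers.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened (T < 1) or flattened (T > 1)."""
    if temperature <= 0:
        # As T approaches 0, sampling approaches always picking the top score
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next words
logits = [2.0, 1.0, 0.5]
for t in (0.4, 1.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

At T = 0.4 (the value used in this notebook) the top candidate takes a larger share of the probability than at T = 1.0, which is why lower temperatures give more predictable, less varied text.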


In [None]:
prompt = "You are a travel advisor specializing in European tours. Users will share preferences, such as destination type and interests within Europe. Your task is to generate personalized travel recommendations specific to European destinations, including suggested cities, activities, and travel tips for an unforgettable European experience."
temperature = .4
number_of_examples = 50

Run this to generate the dataset.

In [None]:
!pip install openai tenacity



In [None]:
import os
import openai
import random
from tenacity import retry, stop_after_attempt, wait_exponential

# Read the API key from the environment rather than hard-coding it in the notebook
client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])

N_RETRIES = 3

@retry(stop=stop_after_attempt(N_RETRIES), wait=wait_exponential(multiplier=1, min=4, max=70))
def generate_example(prompt, prev_examples, temperature=.5):
  messages = [
      {
          "role": "system",
          "content": f"You are generating data which will be used to train a machine learning model.\n\nYou will be given a high-level description of the model we want to train, and from that, you will generate data samples, each with a prompt/response pair.\n\nYou will do so in this format:\n```\nprompt\n-----------\n$prompt_goes_here\n-----------\n\nresponse\n-----------\n$response_goes_here\n-----------\n```\n\nOnly one prompt/response pair should be generated per turn.\n\nFor each turn, make the example slightly more complex than the last, while ensuring diversity.\n\nMake sure your samples are unique and diverse, yet high-quality and complex enough to train a well-performing model.\n\nHere is the type of model we want to train:\n`{prompt}`"

      },
  ]

  if len(prev_examples) > 0:
    if len(prev_examples) > 8:
      prev_examples = random.sample(prev_examples, 8)
    for example in prev_examples:
      messages.append({
            "role": "assistant",
            "content": example
      })

  response = client.chat.completions.create(
      model="gpt-4",  # Specify the desired model
      messages=messages,
      temperature=temperature,
      max_tokens=1000,
  )

  return response.choices[0].message.content

# Generate examples
prev_examples = []
for i in range(number_of_examples):
    print(f'Generating example {i}')
    example = generate_example(prompt, prev_examples, temperature)
    prev_examples.append(example)

print(prev_examples)

Generating example 0
Generating example 1
...
Generating example 44
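The `@retry` decorator on `generate_example` re-runs a failed API call up to `N_RETRIES` times, waiting longer after each failure. The sketch below computes such a capped exponential-backoff schedule with the notebook's `min=4, max=70` bounds. It assumes the wait after failed attempt n is `multiplier * 2 ** (n - 1)`, clamped to those bounds; as far as we know this mirrors tenacity's `wait_exponential`, but treat the exact formula as an assumption and check the tenacity docs for your installed version. `backoff_waits` is a hypothetical helper, not part of tenacity.

```python
def backoff_waits(attempts, multiplier=1.0, base=2.0, min_wait=4.0, max_wait=70.0):
    """Seconds to wait before each retry under capped exponential backoff.

    The wait after failed attempt n is multiplier * base ** (n - 1), clamped to
    [min_wait, max_wait]. Nothing follows the last attempt, so there are attempts - 1 waits.
    """
    waits = []
    for n in range(1, attempts):
        raw = multiplier * base ** (n - 1)
        waits.append(max(min_wait, min(raw, max_wait)))
    return waits

print(backoff_waits(3))   # N_RETRIES = 3 in this notebook
print(backoff_waits(10))  # a longer schedule, showing the growth and the 70-second cap
```

Under this assumption, with `N_RETRIES = 3` both waits sit at the 4-second floor; only a larger `N_RETRIES` lets the wait grow toward the cap.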

We also need to generate a system message.

In [None]:
def generate_system_message(prompt):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": "You will be given a high-level description of the model we are training, and from that, you will generate a simple system prompt for that model to use. Remember, you are not generating the system message for data generation -- you are generating the system message to use for inference. A good format to follow is `Given $INPUT_DATA, you will $WHAT_THE_MODEL_SHOULD_DO.`.\n\nMake it as concise as possible. Include nothing but the system prompt in your response.\n\nFor example, never write: `\"$SYSTEM_PROMPT_HERE\"`.\n\nIt should be like: `$SYSTEM_PROMPT_HERE`."
            },
            {
                "role": "user",
                "content": prompt.strip(),
            },
        ],
        temperature=temperature,  # reuses the global `temperature` set at the top of the notebook
        max_tokens=500,
    )
    return response.choices[0].message.content

system_message = generate_system_message(prompt)
print(f'The system message is: `{system_message}`. Feel free to re-run this cell if you want a better result.')

The system message is: `Given your travel preferences and interests within Europe, generate a personalized travel itinerary including suggested cities, activities, and travel tips for a memorable European tour.`. Feel free to re-run this cell if you want a better result.


Now let's put our examples into a DataFrame, parse out the prompt/response pairs, and write them to a JSONL file for fine-tuning.
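Each generated example is a single string in the format requested by the data-generation system prompt, so the parsing cell below splits it on the `-----------` separator and keeps pieces 1 and 3. The sketch below walks through that split on a hypothetical example string written for illustration, not one produced by the model.

```python
FENCE = "`" * 3  # three backticks, as in the requested format
SEPARATOR = "-----------"

# A hypothetical generated example in the format requested by the data-generation prompt
sample = (
    f"{FENCE}\nprompt\n{SEPARATOR}\nI love hiking. Where should I go in Europe?\n{SEPARATOR}\n\n"
    f"response\n{SEPARATOR}\nConsider the Dolomites in northern Italy.\n{SEPARATOR}\n{FENCE}"
)

parts = sample.split(SEPARATOR)
# parts[0] holds the opening fence and "prompt" label, parts[2] the "response" label,
# parts[4] the closing fence; the prompt and response text sit at indices 1 and 3.
prompt_text = parts[1].strip()
response_text = parts[3].strip()
print(len(parts), prompt_text, response_text, sep="\n")
```

If the model repeats the separator inside its own text, the indices shift and the example is split in the wrong place without any error; the `try` block in the cell below only skips examples with too few sections.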

In [None]:
import json
import pandas as pd

# Initialize lists to store prompts and responses
prompts = []
responses = []

# Parse out prompts and responses from examples
for example in prev_examples:
  try:
    split_example = example.split('-----------')
    prompts.append(split_example[1].strip())
    responses.append(split_example[3].strip())
  except IndexError:
    # Skip examples that don't follow the expected prompt/response format
    pass

# Create a DataFrame
df = pd.DataFrame({
    'prompt': prompts,
    'response': responses
})

# Remove duplicates
df = df.drop_duplicates()

print('There are ' + str(len(df)) + ' successfully-generated examples.')

# Initialize list to store training examples
training_examples = []

# Create training examples in the format required for GPT-3.5 fine-tuning
for index, row in df.iterrows():
    training_example = {
        "messages": [
            {"role": "system", "content": system_message.strip()},
            {"role": "user", "content": row['prompt']},
            {"role": "assistant", "content": row['response']}
        ]
    }
    training_examples.append(training_example)

# Save training examples to a .jsonl file
with open('training_examples.jsonl', 'w') as f:
    for example in training_examples:
        f.write(json.dumps(example) + '\n')

There are 50 successfully-generated examples.
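Before uploading, it can help to check that every line of `training_examples.jsonl` has the structure the loop above writes. The hypothetical helper below checks for exactly one system, user and assistant message with non-empty string content. That is stricter than OpenAI's chat fine-tuning format, which also allows multi-turn conversations, so treat it as a sanity check for this notebook rather than the API's own validation.

```python
import json

# The exact role sequence this notebook writes for every training example
EXPECTED_ROLES = ["system", "user", "assistant"]

def validate_chat_jsonl(lines):
    """Return (line_number, problem) pairs for lines that don't match the expected shape."""
    problems = []
    for i, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((i, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list):
            problems.append((i, "missing 'messages' list"))
            continue
        roles = [m.get("role") if isinstance(m, dict) else None for m in messages]
        if roles != EXPECTED_ROLES:
            problems.append((i, f"unexpected roles {roles}"))
        for m in messages:
            content = m.get("content") if isinstance(m, dict) else None
            if not isinstance(content, str) or not content.strip():
                problems.append((i, "empty or non-string content"))
                break
    return problems

good = json.dumps({"messages": [
    {"role": "system", "content": "You are a travel advisor."},
    {"role": "user", "content": "I love museums."},
    {"role": "assistant", "content": "Try Florence and Paris."},
]})
bad = json.dumps({"messages": [{"role": "user", "content": ""}]})
print(validate_chat_jsonl([good, bad, "not json"]))

# On the real file:
# with open("training_examples.jsonl") as f:
#     print(validate_chat_jsonl(f))
```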


In [None]:
df

|   | prompt | response |
|---|--------|----------|
| 0 | I am planning a trip to Europe and I am a big ... | Absolutely, your interest in medieval history ... |
| 1 | I'm planning a trip to Europe and I'm interest... | Certainly, Europe is renowned for its excellen... |
| 2 | I'm a nature lover and I'm planning a trip to ... | Absolutely, Europe is full of stunning natural... |
| 3 | I am a foodie and I am planning a trip to Euro... | Absolutely! Europe is a food lover's paradise.... |
| 4 | I'm planning a trip to Europe and I'm interest... | Absolutely, Europe is a treasure trove for art... |
| 5 | I'm a huge fan of classical music and I'm plan... | Absolutely, Europe is the birthplace of classi... |
| 6 | I'm planning a trip to Europe and I'm interest... | Certainly, Europe is famous for its enchanting... |
| 7 | I'm planning a trip to Europe and I'm interest... | Absolutely, Europe is rich in architectural hi... |
| 8 | I'm planning a solo trip to Europe and I'm int... | Certainly, Europe has many serene and less cro... |
| 9 | I'm planning a trip to Europe and I'm interest... | Absolutely, Europe has some stunning beach des... |


# Upload the file to OpenAI

In [None]:
# Open the file in a `with` block so it is closed after the upload
with open("training_examples.jsonl", "rb") as f:
    file_id = client.files.create(file=f, purpose="fine-tune").id

In [None]:
file_id

'file-VNMnyZa6wEIKcCt32V8057ZF'

# Train the model

In [None]:
job = client.fine_tuning.jobs.create(
  training_file=file_id,  # the file uploaded in the previous step
  model="gpt-3.5-turbo"
)

job_id = job.id

In [None]:
job.id

'ftjob-GOB2k8hJYuVLS7fq7lQbgke9'

In [None]:
job_status = client.fine_tuning.jobs.retrieve(job_id).status
print(job_status)


succeeded
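The cell above checks the status only once; a job typically passes through several states first (the event log below shows it validating the file and moving to the queue). A small polling loop avoids re-running the cell by hand. The sketch assumes the terminal fine-tuning statuses are `succeeded`, `failed` and `cancelled`; `wait_until_done` is a hypothetical helper, demonstrated with a fake status sequence so it runs without an API key.

```python
import time

# Fine-tuning job statuses assumed to be final (check the OpenAI job object docs)
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_until_done(get_status, poll_seconds=30.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status or the timeout elapses."""
    waited = 0.0  # counts sleep time only, so it slightly underestimates wall-clock time
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still {status!r} after {waited:.0f} seconds")
        sleep(poll_seconds)
        waited += poll_seconds

# Offline demo: a fake status sequence and a no-op sleep
fake_statuses = iter(["validating_files", "queued", "running", "succeeded"])
print(wait_until_done(lambda: next(fake_statuses), poll_seconds=1, sleep=lambda s: None))

# With the notebook's client:
# final_status = wait_until_done(lambda: client.fine_tuning.jobs.retrieve(job_id).status)
```

Passing `get_status` and `sleep` in as functions keeps the loop independent of the SDK and easy to test.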


# The fine-tuning run is done, and the model is ready to use!


In [None]:
client.fine_tuning.jobs.list_events(fine_tuning_job_id=job_id, limit=10)

SyncCursorPage[FineTuningJobEvent](data=[FineTuningJobEvent(id='ftevent-FXiO3NfLIf8RRU6hTNeAR9rW', created_at=1705767836, level='info', message='Fine-tuning job started', object='fine_tuning.job.event', data=None, type='message'), FineTuningJobEvent(id='ftevent-ZISFdPup2u41EbQadPTL063Z', created_at=1705767835, level='info', message='Files validated, moving job to queued state', object='fine_tuning.job.event', data={}, type='message'), FineTuningJobEvent(id='ftevent-f3Xt9g2BoduxRXvKcwgar8iY', created_at=1705767803, level='info', message='Validating training file: file-tO9LUCxGfJJEjt7JSEN9zz6w', object='fine_tuning.job.event', data={}, type='message'), FineTuningJobEvent(id='ftevent-08U6zreuuel6s7VZNBeNEQ5W', created_at=1705767803, level='info', message='Created fine-tuning job: ftjob-GOB2k8hJYuVLS7fq7lQbgke9', object='fine_tuning.job.event', data={}, type='message')], object='list', has_more=False)

# Once the model is trained, run the next cell to grab the fine-tuned model name.

In [None]:
model_name_pre_object = client.fine_tuning.jobs.retrieve(job_id)
model_name = model_name_pre_object.fine_tuned_model
print(model_name)

ft:gpt-3.5-turbo-0613:personal::8j8YHR80
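The printed name appears to follow the `ft:BASE_MODEL:ORGANIZATION:SUFFIX:ID` pattern seen in OpenAI's fine-tuning examples; treat that layout as an assumption rather than a documented guarantee. Read that way, it shows that `model="gpt-3.5-turbo"` resolved to the `gpt-3.5-turbo-0613` snapshot, and the empty field between `personal` and `8j8YHR80` is the optional suffix, left empty because the job was created without the `suffix` argument. The hypothetical helper below splits the name into those parts.

```python
def parse_fine_tuned_model_name(name):
    """Split a fine-tuned model id of the form ft:BASE:ORG:SUFFIX:ID into its parts."""
    parts = name.split(":")
    if len(parts) != 5 or parts[0] != "ft":
        raise ValueError(f"unexpected fine-tuned model name: {name!r}")
    _, base_model, organization, suffix, model_id = parts
    return {"base_model": base_model, "organization": organization,
            "suffix": suffix, "id": model_id}

print(parse_fine_tuned_model_name("ft:gpt-3.5-turbo-0613:personal::8j8YHR80"))
```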


# Let's try it out!

In [None]:
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {
            "role": "system",
            "content": system_message,
        },
        {
            "role": "user",
            "content": df['prompt'].sample().values[0],
        },
    ],
)

print(response.choices[0].message)

ChatCompletionMessage(content="Absolutely! Europe offers some of the most beautiful and renowned wine regions in the world. Here are four destinations that I recommend visiting:\n\n1. Bordeaux, France: Bordeaux is situated in southwestern France and is famous for its red wines. Take a stroll through the charming city, visit the Cité du Vin museum to learn about wine production, and enjoy wine tastings in prestigious chateaux like Château Margaux and Château Lafite Rothschild.\n\n2. Tuscany, Italy: Tuscany is home to some of Italy's finest wines, particularly Chianti and Brunello di Montalcino. Explore the picturesque vineyards and wineries, try regional delicacies, and visit medieval towns like Montepulciano and San Gimignano for a true Tuscan experience.\n\n3. Douro Valley, Portugal: The Douro Valley is one of the oldest wine regions in the world, renowned for its Port wines. Cruise along the Douro River, visit vineyards perched on terraced hillsides, and taste Port in the charming to

In [None]:
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {
            "role": "system",
            "content": system_message,
        },
        {
            "role": "user",
            "content": df['prompt'].sample().values[0],
        },
    ],
)

print(response.choices[0].message)

ChatCompletionMessage(content='Absolutely! As a fan of classical music, Europe offers a wealth of destinations for you to enjoy concerts and visit places of historical significance. Here\'s a personalized travel itinerary for your European tour:\n\n1. Vienna, Austria: Start your trip in Vienna, known as the "City of Music." Visit the Vienna State Opera for world-class performances and explore the Mozarthaus, where Mozart lived and composed many of his famous works.\n\n2. Salzburg, Austria: From Vienna, make your way to Salzburg, the birthplace of Mozart. Take a guided tour of his birthplace, visit the Salzburg Cathedral where he was baptized, and enjoy concerts at the Salzburg Festival, held during the summer.\n\n3. Leipzig, Germany: Next, head to Leipzig, where Johann Sebastian Bach worked as the choirmaster at St. Thomas Church. Attend a concert at St. Thomas Church and visit the Bach Museum to learn more about this influential composer.\n\n4. Prague, Czech Republic: Prague has a ric

In [None]:
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {
            "role": "system",
            "content": system_message,
        },
        {
            "role": "user",
            "content": df['prompt'].sample().values[0],
        },
    ],
)

print(response.choices[0].message)

ChatCompletionMessage(content="Of course! Exploring wine regions in Europe is a fantastic idea. Here's a personalized itinerary that includes some of the best wine regions in Europe:\n\n1. Porto, Portugal: Start your journey in Porto, the birthplace of port wine. Take a tour of the historic wine cellars in Vila Nova de Gaia, where you can learn about and sample different types of port wine.\n\n2. Bordeaux, France: From Porto, head to Bordeaux, one of the most famous wine regions in the world. Take a wine tasting tour in the Medoc or Saint-Emilion areas, where you can savor the prestigious red wines.\n\n3. Tuscany, Italy: Your next destination is Tuscany, known for its stunning landscapes and renowned wines. Explore the vineyards of Chianti, Montepulciano, and Montalcino, and indulge in delicious Italian cuisine along the way.\n\n4. Rioja, Spain: From Tuscany, make your way to the Rioja region in Spain, famous for its rich and full-bodied red wines. Visit traditional wineries, known as 

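Finally, the same travel-advisor instructions can be wrapped in an assistant using the beta Assistants API. Note that this cell uses the base `gpt-3.5-turbo-0613` model, not the fine-tuned `model_name`.

In [None]: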
assistant = client.beta.assistants.create(
    name="Europe Travellers",
    instructions="You are a travel advisor specializing in European tours. Users will share preferences, such as destination type and interests within Europe. Your task is to generate personalized travel recommendations specific to European destinations, including suggested cities, activities, and travel tips for an unforgettable European experience.",
    tools=[{"type": "code_interpreter"}],
    model="gpt-3.5-turbo-0613"
)

In [None]:
thread = client.beta.threads.create()

In [None]:
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Given your travel preferences and interests within Europe, generate a personalized travel itinerary including suggested cities, activities, and travel tips for a memorable European tour."
)

In [None]:
run = client.beta.threads.runs.create(
  thread_id=thread.id,
  assistant_id=assistant.id,
  instructions="You are a travel advisor specializing in European tours. Users will share preferences, such as destination type and interests within Europe. Your task is to generate personalized travel recommendations specific to European destinations, including suggested cities, activities, and travel tips for an unforgettable European experience."
)

In [None]:
run = client.beta.threads.runs.retrieve(
  thread_id=thread.id,
  run_id=run.id
)

In [None]:
messages = client.beta.threads.messages.list(
  thread_id=thread.id
)

In [None]:
messages

SyncCursorPage[ThreadMessage](data=[ThreadMessage(id='msg_9Gy0uL1lNYCUomd2CT29KW9o', assistant_id='asst_RNTCftbii6zA5KQCN0mcCUHN', content=[MessageContentText(text=Text(annotations=[], value='Certainly! Please provide me with some information about your travel preferences and interests within Europe. Specifically, let me know the type of destination you prefer (e.g., historical, culinary, natural beauty), the duration of your trip, and any specific interests you have.'), type='text')], created_at=1705776058, file_ids=[], metadata={}, object='thread.message', role='assistant', run_id='run_CQiusFG2bt1kGQIa1gKN0GFt', thread_id='thread_M6MgcFODcu1SyP3Vz3MBj00a'), ThreadMessage(id='msg_AJmOs9kacFv97Lt2vbeAEN0T', assistant_id=None, content=[MessageContentText(text=Text(annotations=[], value='Given your travel preferences and interests within Europe, generate a personalized travel itinerary including suggested cities, activities, and travel tips for a memorable European tour.'), type='text')]
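The run above is retrieved only once, so if its status is still `queued` or `in_progress`, the listing will not yet contain the assistant's reply; retrieving the run in a loop until its status is `completed` avoids that. The listing also comes back newest first, as the output shows (the assistant's reply precedes the user's message). The hypothetical helper `format_thread` below prints a thread in reading order. It relies only on the attributes visible in that output (`role`, and `content` parts with `type` and `text.value`) and is demonstrated on stand-in objects so it runs without an API key.

```python
from types import SimpleNamespace

def format_thread(messages):
    """Render thread messages oldest-first as 'role: text' lines.

    Assumes each message has `.role` and a `.content` list whose text parts have
    `.type == "text"` and `.text.value`, matching the ThreadMessage output above.
    """
    lines = []
    for message in reversed(list(messages)):  # the listing arrives newest first
        texts = [part.text.value for part in message.content if part.type == "text"]
        lines.append(f"{message.role}: {' '.join(texts)}")
    return "\n".join(lines)

def _text_message(role, value):
    """Build a stand-in message object for the offline demo."""
    part = SimpleNamespace(type="text", text=SimpleNamespace(value=value))
    return SimpleNamespace(role=role, content=[part])

demo = [
    _text_message("assistant", "Which kinds of destinations do you prefer?"),
    _text_message("user", "Plan a European tour for me."),
]
print(format_thread(demo))

# With the notebook's objects: print(format_thread(messages.data))
```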