# Create and use a fine-tuned model

## Introduction

This notebook demonstrates an end-to-end fine-tuning example. It uploads a training file to the OpenAI API, starts a fine-tuning job, and then uses the fine-tuned model to enforce a certain response style.

The JSONL training file contains a series of examples in which a user provides different vegan ingredients and the model responds with tasty vegan recipes. The recipes are inspired by Asian fusion cooking and represent a tasty mix of styles, so that the user will enjoy the resulting meal.
The responses are also written in HTML markup to keep the output consistent and reusable across different formats, such as email messages.
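
For orientation, a single line of such a training file might look like the sketch below. It assumes the chat format for fine-tuning data (one JSON object per line with a `messages` list). The system prompt, ingredients, and recipe are invented for illustration and are not taken from the actual file.

```python
import json

# Hypothetical training example; the real Chef-training.jsonl content may differ.
example = {
    "messages": [
        {"role": "system", "content": "You are a cooking assistant specialising in vegan recipes."},
        {"role": "user", "content": "Tofu, ginger, lime"},
        {"role": "assistant", "content": "<h1>Ginger-Lime Tofu</h1><ol><li>Press the tofu.</li></ol>"},
    ]
}

# JSONL means one compact JSON object per line, with no line breaks inside it.
jsonl_line = json.dumps(example)
print(jsonl_line)
```

The assistant message carries the HTML that the fine-tuned model is meant to imitate.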

This notebook covers these topics:

1. Download a training file from a public repo
2. Upload the JSONL fine-tuning data to the API
3. Run a fine-tuning job on the uploaded data
4. Use the fine-tuned model in a chat completion

## Prerequisites

- [Add a lakehouse](https://aka.ms/fabric/addlakehouse) to this notebook. You will download data from a public blob, then store the data in the lakehouse resource.

- To work with the latest chat completion functions of the OpenAI library in Fabric, we need to upgrade the openai package. The default package might be stuck at version 0.27, and we need at least version 1. To solve this, we install version 1.12 directly into the running session. Careful! Doing this restarts the current session, and any variables you set previously are wiped.

In [None]:
%pip install openai==1.12.0
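
After the session restarts, it can be worth confirming which version is actually loaded. The following is a small sketch using only the standard library; `major_version` is a helper name made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading major number of a version string such as '1.12.0'."""
    return int(version_string.split(".")[0])


try:
    installed = version("openai")
    if major_version(installed) < 1:
        print(f"openai {installed} is too old for this notebook; version 1 or later is needed.")
    else:
        print(f"openai {installed} is installed.")
except PackageNotFoundError:
    print("openai is not installed in this session.")
```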

## Parameter block

After installing the right version of openai, we set the API key and any other parameters we need.

In [None]:
APIKEY = "<your-openai-api-key>"  # placeholder: paste your own key here and never commit a real key

In [None]:
IS_CUSTOM_DATA = False  # if True, dataset has to be uploaded manually by user
DATA_FOLDER = "Files/openai"
DATA_FILE = "Chef-training.jsonl"

## Get the data
Now it's time to grab the JSONL file that contains the training prompts for our GPT Masterchef.

In [None]:
if not IS_CUSTOM_DATA:
    # Download demo data files into lakehouse if not exist
    import os, requests

    remote_url = "https://raw.githubusercontent.com/AllgeierSchweiz/openai-lab/main/data/Chef-training.jsonl?token=GHSAT0AAAAAACNZ4YTYYTLSNK5FEO3IWIOCZO6J5HQ"    
    download_path = f"/lakehouse/default/{DATA_FOLDER}"

    if not os.path.exists("/lakehouse/default"):
        raise FileNotFoundError("Default lakehouse not found, please add a lakehouse and restart the session.")
    os.makedirs(download_path, exist_ok=True)
    if not os.path.exists(f"{download_path}/{DATA_FILE}"):
        r = requests.get(remote_url, timeout=30)
        r.raise_for_status()  # stop here rather than saving an error page as training data
        with open(f"{download_path}/{DATA_FILE}", "wb") as f:
            f.write(r.content)
    print("Downloaded demo data files into lakehouse.")
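
Before uploading, it may help to sanity-check the downloaded file. The sketch below assumes the chat fine-tuning format (each line is a JSON object with a non-empty `messages` list that includes at least one assistant message). `find_invalid_lines` and the sample lines are made up for this example, and the checks are deliberately minimal.

```python
import json


def find_invalid_lines(lines):
    """Return (line_number, reason) pairs for lines that are not valid chat examples."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing or empty 'messages' list"))
        elif not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
            problems.append((number, "no assistant message to learn from"))
    return problems


# Usage with two in-memory sample lines (made up for this sketch):
sample = [
    '{"messages": [{"role": "user", "content": "Tofu"}, {"role": "assistant", "content": "<p>Recipe</p>"}]}',
    '{"prompt": "Tofu"}',
]
print(find_invalid_lines(sample))
```

Because the function accepts any iterable of lines, an open file handle for the downloaded file could be passed in the same way.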

In [None]:
from openai import OpenAI

# Create the client with the API key from the parameter block
client = OpenAI(api_key=APIKEY)

## Upload the training file to OpenAI
Upload the file and keep a handle on the returned file object. We'll need the unique file ID that this creates in the next step.

In [None]:
# Upload the JSONL training file to the OpenAI service
# (the with block closes the file handle once the upload has finished)
with open(f"/lakehouse/default/{DATA_FOLDER}/{DATA_FILE}", "rb") as f:
    fo = client.files.create(file=f, purpose="fine-tune")

## Fine tune the model
Now it's time to start the fine-tuning. This takes about 5-10 minutes for the JSONL file we provided.
We keep tabs on the job by storing it in ftjob. We'll use that later to check when it has finished.

In [None]:
#create the fine tuning with the file that was just uploaded
ftjob = client.fine_tuning.jobs.create(
  training_file=fo.id, 
  model="gpt-3.5-turbo"
)

print(ftjob)
client.fine_tuning.jobs.list_events(fine_tuning_job_id=ftjob.id, limit=10)

In [None]:
#get the job fresh from the api so we can check status

ftjob = client.fine_tuning.jobs.retrieve(ftjob.id)
print(ftjob.status)
client.fine_tuning.jobs.list_events(fine_tuning_job_id=ftjob.id, limit=10)
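
The raw event list can be hard to scan. As a rough sketch, the helper below turns events into short, time-ordered lines. It assumes each event exposes `created_at` (a Unix timestamp) and `message` attributes; `format_events` and the stand-in events are invented for this example.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def format_events(events):
    """Return one 'HH:MM:SS message' line per event, oldest first (UTC)."""
    lines = []
    for event in sorted(events, key=lambda e: e.created_at):
        stamp = datetime.fromtimestamp(event.created_at, tz=timezone.utc).strftime("%H:%M:%S")
        lines.append(f"{stamp} {event.message}")
    return lines


# Stand-in events for illustration; in the notebook you might instead pass
# client.fine_tuning.jobs.list_events(fine_tuning_job_id=ftjob.id, limit=10).data
fake_events = [
    SimpleNamespace(created_at=60, message="Fine-tuning job started"),
    SimpleNamespace(created_at=0, message="Created fine-tuning job"),
]
for line in format_events(fake_events):
    print(line)
```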

## Wait until the fine tuning job has completed
This can take 5-6 minutes, but because of resource constraints the job can also sit in a queued state for several hours. Consider recording the job ID manually so that you can come back to it after training has completed.

In [None]:
# If you lost ftjob for any reason, uncomment the lines below to fetch the first job returned by the API.
# jobs = client.fine_tuning.jobs.list(limit=1)
# ftjob = jobs.data[0]

# Check the job status and wait until the job has finished
import time

sec = 60
while True:
    # Wait for 60 seconds
    time.sleep(sec)
    # Retrieve the current job status
    ftjob = client.fine_tuning.jobs.retrieve(ftjob.id)

    run_status = ftjob.status
    print(f"{run_status} - {sec} seconds later...")
    # Stop polling once the job reaches a terminal state; including 'failed'
    # prevents the loop from running forever when the job fails
    if run_status in ("succeeded", "failed", "cancelled"):
        break
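
Since a job can stay queued for hours, a polling loop with an upper time limit may be more comfortable than an open-ended one. The following is a sketch under that assumption; `wait_for_job`, its parameters, and the canned statuses are hypothetical and not part of the openai library.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(get_status, interval=60, timeout=3600, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status or the timeout elapses.

    Returns the terminal status; raises TimeoutError if the job is still running.
    """
    waited = 0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job still '{status}' after {waited} seconds")
        sleep(interval)
        waited += interval


# Demonstration with canned statuses and a no-op sleep, so nothing actually waits.
statuses = iter(["queued", "running", "succeeded"])
result = wait_for_job(lambda: next(statuses), sleep=lambda seconds: None)
print(result)
```

In the notebook, `get_status` could be a small function that retrieves the job with `client.fine_tuning.jobs.retrieve(ftjob.id)` and returns its `status`.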


## Use the newly trained model
Now it's time to use your newly created Masterchef. This one is geared towards vegan recipes in an Asian fusion style, sometimes with a Mexican twist.

In [None]:
#get the fine tuned model
ftjobid = ftjob.id 

#or come back later and put the job id here instead to carry on
#ftjobid = "ftjob-ABC123"

ftjob = client.fine_tuning.jobs.retrieve(ftjobid)
ftmodel = ftjob.fine_tuned_model
print(f'using model {ftmodel} ...')
completion = client.chat.completions.create(
  model=ftmodel,
  messages=[    
    {"role": "system", "content": "You are an Cooking Assistant specialising in vegan recipes. your cooking style is mediterranean asian fusion, similar to a mix between Jamie Oliver and Joanne Molinaro. You  will be given a set of ingredients and respond with a great tasting recipe involving those ingredients."},
    {"role": "user", "content": "Cucumber, Capsicum, flour, Soy Sauce"}
  ]
)
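
A job that failed, was cancelled, or is still running has no fine-tuned model name yet, so a guard before the chat completion call can give a clearer error. This is a minimal sketch assuming the job object exposes `id`, `status`, and `fine_tuned_model`; the helper name and the stand-in job are invented for illustration.

```python
from types import SimpleNamespace


def fine_tuned_model_or_raise(job):
    """Return job.fine_tuned_model, or raise if the job has not produced one."""
    if job.status != "succeeded" or not job.fine_tuned_model:
        raise RuntimeError(
            f"job {job.id} has status '{job.status}' and no fine-tuned model yet"
        )
    return job.fine_tuned_model


# Stand-in job object for illustration (ID and model name are made up).
done = SimpleNamespace(id="ftjob-example", status="succeeded", fine_tuned_model="ft:example-model")
print(fine_tuned_model_or_raise(done))
```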


The model returns its results as HTML, which can be rendered nicely in a Jupyter notebook using the HTML function.

In [20]:
from IPython.display import HTML
HTML(completion.choices[0].message.content)
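
Because the output is HTML, it can also be reused outside the notebook, for example in an email message. The sketch below uses only the standard library `email` package to wrap an HTML fragment in a message with a plain-text fallback. The function name, addresses, subject, and HTML fragment are made up, and actually sending the message is left out.

```python
from email.message import EmailMessage


def build_recipe_email(html_body, sender, recipient, subject="Your vegan recipe"):
    """Wrap an HTML fragment in a multipart email with a plain-text fallback."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content("This recipe is best viewed in an HTML-capable mail client.")
    message.add_alternative(html_body, subtype="html")
    return message


# Example with a made-up HTML fragment; in the notebook this could be
# completion.choices[0].message.content instead.
email = build_recipe_email(
    "<h1>Cucumber Stir-Fry</h1><p>Slice and fry.</p>",
    sender="chef@example.com",
    recipient="guest@example.com",
)
print(email.get_content_type())
```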
