<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Fine-tune OpenAI GPT-3.5 Turbo Model in Teradata Vantage
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial;color:#00233c'><b>Introduction</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>A large language model (LLM) is a general-purpose model designed for a broad range of NLP tasks, including providing information on various topics, answering questions, offering suggestions, and assisting with everyday work.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>However, to use an LLM in a highly specialized business scenario such as finance or healthcare, we need to train the model further on a domain-specific dataset to refine its capabilities and improve its performance. Fine-tuning achieves this.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Fine-tuning</b> is the process of training a pretrained model on a small, targeted dataset for a specific task. This improves the model's performance on that task while preserving its general language knowledge.</p>

<center><img src="images/fine-tuning.png" alt="Fine_tuning_process"  width=600 height=480/></center>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In this demo notebook, we fine-tune the GPT-3.5 Turbo model on a "mental health" dataset using the OpenAI API. As a result, the model becomes specialized in answering mental health-related questions and can suggest ways to deal with mental health and related issues.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>OpenAI allows users to create their own custom GPT-3.5 model tuned to a particular dataset. Depending on the use case, we can teach GPT-3.5 the language and terminology of a niche domain, such as medicine or finance. The ChatGPT models are available via API, and in this notebook we use GPT-3.5 Turbo.</p>

<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Business Value</b></p>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>Enhanced model performance for specific tasks.</li>
    <li>Saves cost and time in model development.</li>
    <li>Addresses real-world business problems efficiently.</li>
</ul>

<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Why Vantage? </b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Vantage provides a platform for storing the data required for fine-tuning. This data is then cleaned, formatted, and validated to meet the standards necessary for fine-tuning the GPT-3.5 Turbo model. Once it is ready, the data is stored as training and validation JSONL files.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The new data files saved in Vantage are then used to fine-tune the GPT-3.5 Turbo model with the OpenAI API. Once the model is fine-tuned, it can be tested on Vantage.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233c'><b>Steps in the analysis:</b></p>
<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>Configuring the environment</li>
    <li>Connection to Vantage and OpenAI</li>
    <li>Data Exploration</li>
    <li>Data errors and cost estimation</li>
    <li>Fine-tuning the model</li>
    <li>Testing the fine-tuned model</li>
    <li>Cleanup</li>
</ol>

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>1. Configuring the environment</b>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>1.1 Install the required libraries</b></p>

In [None]:
%%capture
# '%%capture' suppresses the display of installation steps of the following packages

!pip install -r requirements.txt --quiet

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <i>*The above statements will install the required libraries to run this demo. To gain access to installed libraries after running this, restart the kernel.</i></p>

<div class="alert alert-block alert-info">
    <p style = 'font-size:16px;font-family:Arial;color:#00233C'><i><b>Note:</b> The above statements may need to be uncommented if you run the notebooks on a platform other than ClearScape Analytics Experience that does not have the libraries installed. If you uncomment those installs, be sure to restart the kernel after executing those lines to bring the installed libraries into memory. The simplest way to restart the Kernel is by typing zero zero: <b>0 0</b></i></p></div>

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>1.2 Import the required libraries</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [None]:
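import json
import os
import warnings
from collections import defaultdict  # used for format validation
from time import sleep

import numpy as np  # used for token count statistics
import openai
import tiktoken # for token counting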

# teradata lib
from teradataml import *

# Suppress warnings
warnings.filterwarnings("ignore")
display.print_sqlmr_query = False
display.suppress_vantage_runtime_warnings = True

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>2. Connection to Vantage</b>

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>2.1 Connect to Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>When prompted for the password, enter a valid password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)
execute_sql('''SET query_band='DEMO=Fine_Tuning_OpenAI_Model_Python.ipynb;' UPDATE FOR SESSION;''')

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>2.2 Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>For this demo, the data has been provided on cloud storage. We can run the demo using foreign tables to access the data without any storage in our environment.</p>

In [None]:
%run -i ../run_procedure.py "call get_data('DEMO_LLM_FineTuning_cloud');"

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Next is an optional step. If you want, you can see the status of databases/tables created and space used.</p>

In [None]:
%run -i ../run_procedure.py "call space_report();"

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>3. Data Exploration</b>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The data used in the demonstration is a mental health chatbot dataset containing questions and answers about mental health issues. The complete information on the data can be found <a href="https://huggingface.co/datasets/heliosbrahma/mental_health_chatbot_dataset">here</a>.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In our database, we have one table: the <b>Mental_Health</b> table. This table contains text conversations between a human and an assistant about mental health issues. The human asks a question, and the assistant helps the human with an answer.</p>

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>3.1 Examine the Mental Health table</b></p>    
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Let's look at the data in the table.</p>

In [None]:
tdf_mental_health = DataFrame(in_schema("DEMO_LLM_FineTuning", "Mental_Health"))
print("Data information: \n", tdf_mental_health.shape)

pd_mental_health = tdf_mental_health.to_pandas()
pd_mental_health.head()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The table has 1 column: <b>text</b> and 172 rows. Each row contains a pair of a question and an answer related to mental health.</p>

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>3.2 Converting to correct format</b></p>    
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The data in the table needs to be converted into a format suitable for fine-tuning the OpenAI GPT-3.5 Turbo model.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The format required is:</p>

```json
{"messages": [{"role": "system", "content": "text"}, {"role": "user", "content": "text"}, {"role": "assistant", "content": "text"}]}
{"messages": [{"role": "system", "content": "text"}, {"role": "user", "content": "text"}, {"role": "assistant", "content": "text"}]}
{"messages": [{"role": "system", "content": "text"}, {"role": "user", "content": "text"}, {"role": "assistant", "content": "text"}]}
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Each line is a JSON object whose <b>messages</b> key holds a list of messages. Each message is a dictionary with two keys:</p>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li><b>role</b>: <i>system</i>, <i>user</i>, or <i>assistant</i>, indicating who produced the content</li>
    <li><b>content</b>: the text content of the message</li>
</ul>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'></p>
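<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As a minimal sketch of the format described above, the following builds one training example and serializes it to a single JSON line. The message texts are hypothetical placeholders; only the <b>messages</b>, <b>role</b>, and <b>content</b> layout comes from the required format.</p>

```python
import json

# Hypothetical example texts; only the messages/role/content layout
# comes from the required format
record = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is a panic attack?"},
        {"role": "assistant", "content": "A panic attack is a sudden episode of intense fear."},
    ]
}

# One training example is one JSON object on a single line
json_line = json.dumps(record)

# Reading the line back recovers the same structure
parsed = json.loads(json_line)
roles = [message["role"] for message in parsed["messages"]]
print(roles)
```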

In [None]:
def conversation_converter(conversation_data, system_message=None):
    
    #Splitting the conversation string into individual lines
    lines = conversation_data.split('\n<')
    
    messages = []
    
    # Including the system message if provided
    if system_message:
        messages.append({
            "role": "system",
            "content": system_message
        })
    
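    for l in lines:
        parts = l.split(': ', 1)
        if len(parts) < 2:
            # Skip fragments that do not follow the "<SPEAKER>: text" pattern
            continue

        # Splitting on '\n<' removes the leading '<' from every line after the first,
        # so normalize the speaker tag before comparing
        speaker = parts[0].strip('<>').strip()
        if speaker == "HUMAN":
            role = "user"
        else:
            role = "assistant"

        message = {
            "role":role,
            "content":parts[1]
        }
        messages.append(message)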
    
    output = {
        "messages":messages
    }
    return output

In [None]:
system_message = """You are a helpful and understanding assistant who can help with mental health issues. You are friendly and polite"""

In [None]:
data_mental_health = []

for i in pd_mental_health.index:
    data = pd_mental_health["text"][i]
    data_mental_health.append(data)

dataset = []

for data in data_mental_health:
    data = conversation_converter(data, system_message=system_message)
    dataset.append(data)

print("The length of data:", len(dataset))
print("An example from dataset:")
for message in dataset[0]["messages"]:
    print(message)

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>4. Data errors and cost estimation</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Once the dataset is compiled, and before creating a fine-tuning job, it is important to perform the following tasks:</p>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>Format validation</li>
    <li>Token counting</li>
    <li>Cost estimation</li>
</ul>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The code in this section is taken from the <a href="https://github.com/openai/openai-cookbook">openai_cookbook</a>.</p>

<div class="alert alert-block alert-warning">
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><i><b>Note: Using OpenAI for fine-tuning is going to cost some money.</b></i></p>
</div>

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>4.1 Format validation</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>It is important to validate that each conversation in the dataset adheres to the format expected by the fine-tuning API. The code cell below counts the formatting errors found in the dataset, grouped by error type.</p>

In [None]:
#Format validation
format_errors = defaultdict(int)

for ex in dataset:
    if not isinstance(ex, dict):
        format_errors["data_type"] += 1
        continue

    messages = ex.get("messages", None)
    if not messages:
        format_errors["missing_messages_list"] += 1
        continue

    for message in messages:
        if "role" not in message or "content" not in message:
            format_errors["message_missing_key"] += 1

        if any(k not in ("role", "content", "name") for k in message):
            format_errors["message_unrecognized_key"] += 1

        if message.get("role", None) not in ("system", "user", "assistant"):
            format_errors["unrecognized_role"] += 1

        content = message.get("content", None)
        if not content or not isinstance(content, str):
            format_errors["missing_content"] += 1

    if not any(message.get("role", None) == "assistant" for message in messages):
        format_errors["example_missing_assistant_message"] += 1

if format_errors:
    print("Found errors:")
    for k, v in format_errors.items():
        print(f"{k}: {v}")
else:
    print("No errors found")


<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>4.2 Token Counting Utilities</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Here we define helper utilities that are used in the rest of the notebook.</p>

In [None]:
# Token counting functions
encoding = tiktoken.get_encoding("cl100k_base")

# not exact!
# simplified from https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
def num_tokens_from_messages(messages, tokens_per_message=3, tokens_per_name=1):
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3
    return num_tokens

def num_assistant_tokens_from_messages(messages):
    num_tokens = 0
    for message in messages:
        if message["role"] == "assistant":
            num_tokens += len(encoding.encode(message["content"]))
    return num_tokens

def print_distribution(values, name):
    print(f"\n#### Distribution of {name}:")
    print(f"min / max: {min(values)}, {max(values)}")
    print(f"mean / median: {np.mean(values)}, {np.median(values)}")
    print(f"p5 / p95: {np.quantile(values, 0.05)}, {np.quantile(values, 0.95)}")
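<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To make the accounting in the token counting utility easier to follow, here is a minimal sketch that applies the same arithmetic with a pluggable tokenizer. The whitespace tokenizer is an assumption used only to keep the example self-contained; real counts come from the <b>cl100k_base</b> encoding and will differ.</p>

```python
def count_tokens_sketch(messages, tokenize, tokens_per_message=3, conversation_overhead=3):
    """Apply the per-message and per-conversation overheads with a pluggable tokenizer."""
    total = 0
    for message in messages:
        total += tokens_per_message  # fixed overhead added for every message
        for value in message.values():
            total += len(tokenize(value))  # tokens in the role and in the content
    return total + conversation_overhead  # fixed overhead added once per conversation

# Stand-in tokenizer (assumption): whitespace splitting instead of cl100k_base
whitespace_tokenize = str.split

example = [
    {"role": "user", "content": "I feel anxious today"},
    {"role": "assistant", "content": "Thank you for sharing that"},
]
token_count = count_tokens_sketch(example, whitespace_tokenize)
print(token_count)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>With this stand-in tokenizer the total is 2 x 3 (message overhead) + 5 + 6 (role and content words) + 3 (conversation overhead) = 20.</p>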

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>4.3 Data Warnings and Token Counts</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Potential issues, such as missing messages in the dataset, can be identified with some lightweight analysis. We can also compute summary statistics for the message and token counts.</p>


In [None]:
# Warnings and tokens counts
n_missing_system = 0
n_missing_user = 0
n_messages = []
convo_lens = []
assistant_message_lens = []

for ex in dataset:
    messages = ex["messages"]
    if not any(message["role"] == "system" for message in messages):
        n_missing_system += 1
    if not any(message["role"] == "user" for message in messages):
        n_missing_user += 1
    n_messages.append(len(messages))
    convo_lens.append(num_tokens_from_messages(messages))
    assistant_message_lens.append(num_assistant_tokens_from_messages(messages))

print("Num examples missing system message:", n_missing_system)
print("Num examples missing user message:", n_missing_user)
print_distribution(n_messages, "num_messages_per_example")
print_distribution(convo_lens, "num_total_tokens_per_example")
print_distribution(assistant_message_lens, "num_assistant_tokens_per_example")
n_too_long = sum(l > 4096 for l in convo_lens)
print(f"\n{n_too_long} examples may be over the 4096 token limit, they will be truncated during fine-tuning")

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>4.4 Cost Estimation</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In this section, we estimate the total number of tokens that will be used for fine-tuning, which lets us approximate the cost. Note that the duration of a fine-tuning job also increases with the token count.</p>

In [None]:
# Pricing and default n_epochs estimate
MAX_TOKENS_PER_EXAMPLE = 4096

TARGET_EPOCHS = 3
MIN_TARGET_EXAMPLES = 100
MAX_TARGET_EXAMPLES = 25000
MIN_DEFAULT_EPOCHS = 1
MAX_DEFAULT_EPOCHS = 25

n_epochs = TARGET_EPOCHS
n_train_examples = len(dataset)
if n_train_examples * TARGET_EPOCHS < MIN_TARGET_EXAMPLES:
    n_epochs = min(MAX_DEFAULT_EPOCHS, MIN_TARGET_EXAMPLES // n_train_examples)
elif n_train_examples * TARGET_EPOCHS > MAX_TARGET_EXAMPLES:
    n_epochs = max(MIN_DEFAULT_EPOCHS, MAX_TARGET_EXAMPLES // n_train_examples)

n_billing_tokens_in_dataset = sum(min(MAX_TOKENS_PER_EXAMPLE, length) for length in convo_lens)
print(f"Dataset has ~{n_billing_tokens_in_dataset} tokens that will be charged for during training")
print(f"By default, you'll train for {n_epochs} epochs on this dataset")
print(f"By default, you'll be charged for ~{n_epochs * n_billing_tokens_in_dataset} tokens")
print("See pricing page to estimate total costs")

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><i><b>Note:</b> See the pricing page to check the cost as per the analysis result of the above step.</i></p></div>
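<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As a rough sketch of how the token estimate translates into a cost, the function below multiplies the billed tokens by the number of epochs and a per-1,000-token training price. The function name and all numbers are hypothetical placeholders, not actual OpenAI prices; substitute the values printed above and the current rate from the pricing page.</p>

```python
def estimate_training_cost(billing_tokens, n_epochs, price_per_1k_tokens):
    """Return the approximate training cost: tokens x epochs x unit price."""
    total_tokens = billing_tokens * n_epochs
    return total_tokens / 1000 * price_per_1k_tokens

# Hypothetical inputs for illustration only; take the real values from the
# cost estimation cell and from the OpenAI pricing page
example_cost = estimate_training_cost(
    billing_tokens=50_000, n_epochs=3, price_per_1k_tokens=0.008
)
print(f"Estimated training cost: ${example_cost:.2f}")
```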

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>5. Fine-tuning the model</b>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>After validating the data format, fine-tuning the model comes next.</p>

<div class="alert alert-block alert-warning">
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><i><b>Note: Using OpenAI for fine-tuning is going to cost some money.</b></i></p>
</div>

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>5.1 Get the OpenAI API key</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To fine-tune the model, we need an OpenAI API key. If you do not have one, please refer to the instructions in this guide to obtain an OpenAI API key:  </p>

[Openai_setup_api_key_guide](..//Openai_setup_api_key/Openai_setup_api_key.md)

In [None]:
import getpass

# enter your openai api key
api_key = getpass.getpass(prompt="\n Please Enter Openai api key: ")

os.environ["OPENAI_API_KEY"] = api_key

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>5.2 JSONL file creation for training</b></p>    
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>OpenAI accepts training and validation data formatted as JSON Lines (JSONL) documents. The fine-tuning dataset must use the conversational format.</p>

In [None]:
def write_jsonl(conversations, file_name):
    with open(file_name, 'w') as out:
        for conversation in conversations:
            json_out = json.dumps(conversation) + "\n"
            out.write(json_out)
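
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Before uploading, it can be useful to read a JSONL file back and confirm that every line parses. The sketch below is an optional check, not part of the original workflow; <b>read_jsonl</b> and the sample data are hypothetical names introduced for illustration.</p>

```python
import json
import os
import tempfile

def read_jsonl(file_name):
    """Read a JSONL file back into a list of Python objects, one per non-empty line."""
    with open(file_name, "r") as handle:
        return [json.loads(line) for line in handle if line.strip()]

# Hypothetical two-example dataset written to a temporary file
sample = [
    {"messages": [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello"}]},
    {"messages": [{"role": "user", "content": "Bye"}, {"role": "assistant", "content": "Take care"}]},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.jsonl")
    with open(path, "w") as out:
        for conversation in sample:
            out.write(json.dumps(conversation) + "\n")
    round_trip = read_jsonl(path)

# The data read back should match what was written
print(len(round_trip), round_trip == sample)
```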

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The first half of the dataset is used for training, and the remaining will be used for validation. The validation data is optional and is used to ensure that the model does not overfit the training set.</p>

In [None]:
training_file_name = 'training_data.jsonl'
validation_file_name = 'validation_data.jsonl'

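# Split the dataset in half so that no example is skipped
split_index = len(dataset) // 2

# Training dataset
write_jsonl(dataset[:split_index], training_file_name)

# Validation dataset
write_jsonl(dataset[split_index:], validation_file_name)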

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The above code will generate two files:</p>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'><li> <b>training_data.jsonl</b></li></ul>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'><li> <b>validation_data.jsonl</b></li></ul>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Different model types require different data formats; here, the format matches what the GPT-3.5 Turbo model expects. The training and validation datasets consist of input and output examples that show how we would like the model to respond.</p>
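
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The notebook splits the dataset sequentially. If the rows happened to be ordered (for example, by topic), a shuffled split might give a more representative validation set. The sketch below shows one possible way to do this; <b>split_dataset</b>, the seed, and the stand-in data are assumptions made for illustration.</p>

```python
import random

def split_dataset(examples, train_fraction=0.5, seed=42):
    """Shuffle a copy of the examples and split it into training and validation parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    split_index = int(len(shuffled) * train_fraction)
    return shuffled[:split_index], shuffled[split_index:]

# Hypothetical stand-in for the 172 converted conversations
examples = list(range(172))
train_part, validation_part = split_dataset(examples)
print(len(train_part), len(validation_part))
```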

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>5.3 Upload files to OpenAI</b></p>    
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Next, we upload the data files to OpenAI's <b>Files</b> endpoint. The uploaded files are used for fine-tuning.</p>

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><i><b>Note:</b> If you see the following error:</i></p>
<p style = 'font-size:14px;font-family:Arial;color:#00233C'><b>"AttributeError: module 'openai' has no attribute 'OpenAI'"</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><i>Make sure the latest version of the openai package is installed.</i></p></div>

In [None]:
client = openai.OpenAI(api_key=api_key)

with open(training_file_name, "rb") as training_fd:
    training_response = client.files.create(
        file=training_fd, purpose="fine-tune"
    )

training_file_id = training_response.id

with open(validation_file_name, "rb") as validation_fd:
    validation_response = client.files.create(
        file=validation_fd, purpose="fine-tune"
    )
validation_file_id = validation_response.id

print("Training file ID:", training_file_id)
print("Validation file ID:", validation_file_id)

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>5.4 Initiate Fine-tuning</b></p>    
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>With the files uploaded, we are ready to create the fine-tuning job. The job is created from the generated file IDs, a model identifier, and an optional suffix that helps identify the resulting model. The <b>id</b> returned in the response lets us retrieve updates on the job.</p>


<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><i><b>Note:</b> We might get a "files not ready" error because the uploaded files are still being processed on OpenAI's systems. In that case, retry after a few minutes.</i></p></div>

In [None]:
response = client.fine_tuning.jobs.create(
    training_file=training_file_id,
    validation_file=validation_file_id,
    model="gpt-3.5-turbo",
    suffix="mental-health",
)

job_id = response.id

print("Job ID:", response.id)
print("Status:", response.status)

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><i><b>Note:</b> Run the above cell only once; running it again starts a new fine-tuning job.</i></p></div>

 <div class="alert alert-block alert-warning">
 <p style = 'font-size:16px;font-family:Arial;color:#00233C'><i><b>Note: The fine-tuning can take 10-15 minutes depending on the OpenAI server and number of tokens.</b></i></p>
 </div>
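
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The advice in this section to retry after a "files not ready" error can be automated with a small retry helper. The sketch below is generic and hypothetical: <b>retry</b> and <b>flaky_create_job</b> are illustrative names, the failure is simulated, and the helper catches any exception because the exact error class raised by the OpenAI client is not assumed here.</p>

```python
import time

def retry(operation, attempts=3, delay_seconds=60, retry_on=(Exception,)):
    """Call operation() until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on as error:
            if attempt == attempts:
                raise  # give up and surface the last error
            print(f"Attempt {attempt} failed ({error}); retrying in {delay_seconds}s")
            time.sleep(delay_seconds)

# Hypothetical stand-in for the job-creation call: fails twice, then succeeds
calls = {"count": 0}

def flaky_create_job():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("files not ready")
    return "job-created"

# delay_seconds=0 keeps the demonstration fast; use a longer delay in practice
result = retry(flaky_create_job, attempts=5, delay_seconds=0)
print(result)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Because the helper stops at the first successful call, it does not start a second fine-tuning job once one has been created.</p>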

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>5.5 Check job status</b></p>    
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Initially, the fine-tuning status reads <i>validating_files</i>, then <i>running</i>, and finally <i>succeeded</i> after completion of the job.</p>

In [None]:
response = client.fine_tuning.jobs.retrieve(job_id)

print("Job ID:", response.id)
print("Status:", response.status)
print("Trained Tokens:", response.trained_tokens)

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Once it is completed, we can use the <b>result_files</b> to sample the results from the validation set (if we uploaded one), and use the ID from the <b>fine_tuned_model</b> parameter to invoke our trained model.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We can track the progress of the fine-tune with the events endpoint.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The below cell will show the status of fine-tuning process.</p>

In [None]:
print("Fine-tuning in progress..")

# Poll the job until it reaches a terminal state, showing the most recent event.
# Checking the job status avoids waiting forever if the job fails or is cancelled.
terminal_statuses = ("succeeded", "failed", "cancelled")

while True:
    job = client.fine_tuning.jobs.retrieve(job_id)
    events = client.fine_tuning.jobs.list_events(job_id, limit=1).data
    if events:
        print(events[0].message, end="\r")
    if job.status in terminal_statuses:
        break
    sleep(5)

print(f"\nFine-tuning finished with status: {job.status}")

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><i><b>Note:</b> If the above process is taking a long time, go to <a href="https://platform.openai.com/finetune/job_ID">platform.openai.com/finetune/job_ID</a>, replacing job_ID with a valid job ID, to check the details of the fine-tuning status.</i></p>
</div>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Once the above cell completes, the job provides a fine-tuned model ID. Run the code below only after that point.</p>

In [None]:
response = client.fine_tuning.jobs.retrieve(job_id)
fine_tuned_model_id = response.fine_tuned_model

if fine_tuned_model_id is None: 
    raise RuntimeError("Fine-tuned model ID not found. The job has likely not been completed yet.")

print("Fine-tuned model ID:", fine_tuned_model_id)

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>6. Testing the fine-tuned model</b>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Congratulations! The GPT-3.5 Turbo model has been fine-tuned on the mental health dataset. To test the model, we can ask a question related to mental health issues.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>An example question to test the model: "Who are you, and how can you help me?"</p>

In [None]:
test_messages = []
test_messages.append({"role": "system", "content": system_message})
user_message = input("Please ask your question ")  # positional argument also works outside Jupyter
test_messages.append({"role": "user", "content": user_message})

response = client.chat.completions.create(model=fine_tuned_model_id, messages=test_messages, temperature=0, max_tokens=500)
print(response.choices[0].message.content)

<div id='section8'></div>
<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>7. Cleanup</b>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>7.1 Work Tables</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Clean up work tables to prevent errors next time.</p>

In [None]:
for table in ["Mental_Health"]:
    try:
        db_drop_table(table)
    except Exception:
        # The table may not exist in the default database; ignore and continue
        pass

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'> <b>7.2 Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following code will clean up tables and databases created above.</p>

In [None]:
%run -i ../run_procedure.py "call remove_data('DEMO_LLM_FineTuning');"        # Takes 5 seconds

In [None]:
remove_context()

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>Links</b>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>Teradataml Python reference: <a href = 'https://docs.teradata.com/search/all?query=Python+Package+User+Guide&content-lang=en-US'>here</a></li>
    <li>OpenAI fine-tuning reference: <a href='https://platform.openai.com/docs/guides/fine-tuning'>here</a></li>
</ul>

<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2024. All Rights Reserved
        </div>
    </div>
</footer>