# Welcome To The Notebook

### Set up the project environment

In [1]:
!pip install openai==1.7.2 python-dotenv






Import modules

In [2]:
import pandas as pd
import os, time
from openai import OpenAI
from dotenv import load_dotenv
import json
import matplotlib.pyplot as plt

print("Modules are imported.")

Modules are imported.


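Set up the OpenAI API:

* store the API key in a `.env` file as `OPENAI_API_KEY` and load it into the `apikey` variable (never paste a real key into the notebook)
* create the OpenAI client using `apikey`
* if you require an org ID, create an `org_id` variable and add it to the client

In [3]:
load_dotenv()  # read variables from a local .env file
apikey = os.getenv("OPENAI_API_KEY")  # the key comes from the environment, not from the notebook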

Create the OpenAI client

In [4]:
client = OpenAI(
    api_key = apikey
)
client

<openai.OpenAI at 0x1b19c26dd30>
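
The org ID bullet above can be sketched as follows. This is a minimal sketch, not the notebook's own method: `build_client_kwargs` is a hypothetical helper, and the environment variable names `OPENAI_API_KEY` and `OPENAI_ORG_ID` are assumptions about how the credentials are stored. It relies on the `OpenAI` constructor accepting an `organization` keyword.

```python
import os

def build_client_kwargs(env=os.environ):
    """Collect OpenAI client arguments from an environment-style mapping (hypothetical helper)."""
    api_key = env.get("OPENAI_API_KEY")  # assumed variable name
    if not api_key:
        raise ValueError("OPENAI_API_KEY is not set")
    kwargs = {"api_key": api_key}
    org_id = env.get("OPENAI_ORG_ID")  # optional, assumed variable name
    if org_id:
        kwargs["organization"] = org_id
    return kwargs

# Usage in the notebook (needs the openai package and a real key):
# client = OpenAI(**build_client_kwargs())
```

Keeping the argument collection separate from the client construction makes it possible to check the configuration without contacting the API.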

### Prepare the training data

Load `Customer Complaints.csv`



In [5]:
training_data = pd.read_csv('Customer Complaints.csv')
training_data.head(10)

| | Complaints | Details |
|---|---|---|
| 0 | Unreliable internet! Sick of constant outages... | {"Topic": "Internet", "Problem": "Unreliable s... |
| 1 | TV signal keeps dropping during crucial momen... | {"Topic": "TV", "Problem": "Signal dropout", "... |
| 2 | Phone line always crackling! Can't hear a thi... | {"Topic": "Phone", "Problem": "Crackling line"... |
| 3 | Ridiculous prices for such terrible service! ... | {"Topic": "Billing", "Problem": "Overcharged",... |
| 4 | Internet speed slower than a snail! Can't str... | {"Topic": "Internet", "Problem": "Slow speed",... |
| 5 | TV channels constantly freezing! Can't enjoy ... | {"Topic": "TV", "Problem": "Channels freezing"... |
| 6 | Phone calls dropping mid-conversation! Unbeli... | {"Topic": "Phone", "Problem": "Calls dropping"... |
| 7 | Billing errors every month! Overcharged again... | {"Topic": "Billing", "Problem": "Errors", "Cus... |
| 8 | Customer service is a joke! No help whatsoeve... | {"Topic": "Customer Service", "Problem": "Inco... |
| 9 | This is the worst! | {"Topic": "PulseNet General", "Problem": "Gene... |


In [6]:
training_data.shape

(67, 2)

**Convert the Complaints records to json**

To use the data for fine-tuning, we first need to convert each row of the dataframe into the following format:

<pre>
<code>
{
  <span style="color: blue;">"messages"</span>: [
    {
      <span style="color: blue;">"role"</span>: <span style="color: red;">"system"</span>,
      <span style="color: blue;">"content"</span>: "<span style="color: green;">Providing context about the user's prompt.
                  It may include information about the task,
                  instructions, or background details relevant
                  to the conversation.</span>"
    },
    {
      <span style="color: blue;">"role"</span>: <span style="color: red;">"user"</span>,
      <span style="color: blue;">"content"</span>: "<span style="color: green;">the prompt or input provided by the user,
                  which typically initiates the conversation with the assistant.</span>"
    },
    {
      <span style="color: blue;">"role"</span>: <span style="color: red;">"assistant"</span>,
      <span style="color: blue;">"content"</span>: "<span style="color: green;">The desired response or output generated by
                  the assistant in response to the user's prompt.</span>"
    }
  ]
}
</code>
</pre>
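
A quick way to confirm that a line follows this layout is sketched below. `is_valid_record` is a hypothetical helper, and it checks only the three-message system/user/assistant layout used in this notebook, which may be stricter than what the fine-tuning API itself accepts.

```python
import json

EXPECTED_ROLES = ["system", "user", "assistant"]

def is_valid_record(line):
    """Return True if a JSON line matches the three-message chat layout shown above."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(record, dict):
        return False
    messages = record.get("messages")
    if not isinstance(messages, list):
        return False
    # Every message must be a dict whose content is a string
    if not all(isinstance(m, dict) and isinstance(m.get("content"), str) for m in messages):
        return False
    # Roles must appear exactly in the order system, user, assistant
    return [m.get("role") for m in messages] == EXPECTED_ROLES
```

Running such a check over every line of the training file before uploading catches malformed rows early.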


Define a function that takes a row of the dataframe, converts it into this JSON format, and appends it as one line to `training_data.json`

In [7]:
def save_as_json(row):

  system_content = """
      Given a customer complaint text, extract and return the following information in json (dict) format:
      - Topic: The product/department that the customer has a complaint about.
      - Problem: A two or three-word description of what exactly the problem is.
      - Customer_Dissatisfaction_Index: is a number between 0 and 100 showing
             how angry the customer is about the problem.
  """

  formatted_data = {
        "messages": [
            {"role": "system", "content": system_content},
            {"role": "user", "content": row.Complaints},
            {"role": "assistant", "content": row.Details}
        ]
      }

  with open("training_data.json", "a") as json_file:
        json.dump(formatted_data, json_file)
        json_file.write("\n")

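Use the function to generate `training_data.json`. Run this cell only once: `save_as_json` opens the file in append mode, so re-running the loop duplicates every record.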

In [8]:
for index, row in training_data.iterrows():
    save_as_json(row)   
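
Because the file is opened in append mode, an alternative is to write all records in a single pass that overwrites the file. The sketch below assumes the complaints and labels are available as pairs of strings; `write_jsonl` is a hypothetical helper, not part of the original notebook.

```python
import json

def write_jsonl(pairs, system_content, path):
    """Write one chat-format record per line, overwriting any existing file."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:  # "w" avoids duplicates on re-runs
        for complaint, details in pairs:
            record = {"messages": [
                {"role": "system", "content": system_content},
                {"role": "user", "content": complaint},
                {"role": "assistant", "content": details},
            ]}
            f.write(json.dumps(record) + "\n")
            count += 1
    return count

# Usage with the dataframe from this notebook, where system_content is the
# prompt string defined inside save_as_json:
# write_jsonl(zip(training_data.Complaints, training_data.Details),
#             system_content, "training_data.json")
```

Returning the record count gives a cheap sanity check against `training_data.shape`.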

### Fine-tune GPT-3.5 on the training data

Upload the JSON file prepared as the training data

In [9]:
data = client.files.create(
    file = open('training_data.json', 'rb'),
    purpose = 'fine-tune'
)
data

FileObject(id='file-vZulyOE56Od054sIrmX5NZJu', bytes=46789, created_at=1711999707, filename='training_data.json', object='file', purpose='fine-tune', status='processed', status_details=None)

Create the Fine Tuning Job

In [12]:
fine_tuning_job = client.fine_tuning.jobs.create(
    training_file = data.id,
    model = 'gpt-3.5-turbo',
    hyperparameters = {
        'n_epochs': 'auto'
    }
)
fine_tuning_job

BadRequestError: Error code: 400 - {'error': {'message': 'Fine-tuning jobs cannot be created on an Explore plan. You can upgrade to a paid plan on your billing page: https://platform.openai.com/account/billing/overview', 'type': 'invalid_request_error', 'param': None, 'code': 'exceeded_quota'}}

Retrieve the status progress of the fine-tune

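*The loop checks the job status every 2 seconds and stops once the job succeeds, fails, or is cancelled*

In [None]:
while True:
    time.sleep(2)
    retrieved_job = client.fine_tuning.jobs.retrieve(fine_tuning_job.id)
    status = retrieved_job.status
    print(status)

    if status == 'succeeded':
        print("The job has successfully completed")
        break
    if status in ('failed', 'cancelled'):
        # Stop polling on terminal failure states instead of looping forever
        print("The job did not complete")
        break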
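
A polling loop like this can also be bounded by a timeout so that a stuck job does not block the notebook. The following is a minimal sketch under assumptions: `wait_for_job` is a hypothetical helper, the status is supplied by a callable so the logic can be tried without the API, and the default one-hour timeout is an arbitrary choice.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_job(get_status, interval=2.0, timeout=3600.0, sleep=time.sleep):
    """Poll get_status() until a terminal status is seen; raise TimeoutError otherwise."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job still '{status}' after {waited:.0f}s")
        sleep(interval)
        waited += interval

# Usage in the notebook:
# wait_for_job(lambda: client.fine_tuning.jobs.retrieve(fine_tuning_job.id).status)
```

Passing `sleep` as a parameter keeps the function easy to exercise with a fake status sequence.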

### Evaluate the model

Retrieve the event messages to check out the learning process of the fine-tuning job.

In [None]:
events = list(client.fine_tuning.jobs.list_events(fine_tuning_job_id = retrieved_job.id, limit = 100))

for event in events:
    print(event.message)

Extract the training loss in each learning step

In [None]:
steps = []
train_loss = []

for event in events:
    if(event.data):
        steps.append(event.data['step'])
        train_loss.append(event.data['train_loss'])
        
print(steps) 
print(train_loss)       
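
The extraction above assumes every event with data carries both `step` and `train_loss`, and that events arrive in step order. A more defensive sketch is shown below; `extract_losses` is a hypothetical helper, and the demo events use made-up values purely to illustrate the shape of the input.

```python
from types import SimpleNamespace

def extract_losses(events):
    """Return (steps, losses) sorted by step, skipping events without metric data."""
    points = []
    for event in events:
        data = getattr(event, "data", None) or {}
        if "step" in data and "train_loss" in data:
            points.append((data["step"], data["train_loss"]))
    points.sort(key=lambda p: p[0])  # chronological order for plotting
    steps = [step for step, _ in points]
    losses = [loss for _, loss in points]
    return steps, losses

# Illustrative stand-ins for API event objects (values are invented)
demo_events = [
    SimpleNamespace(data={"step": 2, "train_loss": 0.5}),
    SimpleNamespace(data=None),
    SimpleNamespace(data={"step": 1, "train_loss": 0.9}),
]
demo_steps, demo_losses = extract_losses(demo_events)
```

Sorting by step means the line chart reads left to right regardless of the order in which the events were returned.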

Use a line chart to visualize the train_loss in each step

In [None]:
plt.plot(steps, train_loss, marker = 'o', linestyle = '-')

### Deploy the model

Check `retrieved_job` again and store the name of the fine-tuned model, `retrieved_job.fine_tuned_model`, in a variable

In [None]:
myLLM = retrieved_job.fine_tuned_model
print(myLLM)

Define a method to extract information from a given user complaint using a specific LLM and return the results.

In [10]:
def extract_details(user_complaint, model_name):
    """
    This function extracts information from a given user complaint using a specific LLM (Large Language Model).

    Parameters:
    user_complaint (str): The text of the user's complaint.
    model_name (str): The name of the specific LLM model to use for extraction.
    """

    system_content = """
        Given a customer complaint text, extract and return the following information in JSON (dict) format:
        - Topic
        - Problem
        - Customer_Dissatisfaction_Index
    """

    # Generate a response using the specified model and the user's complaint
    response = client.chat.completions.create(
        model = model_name,
        messages=[
            {"role": "system", "content": system_content},  # System content explaining the expected output
            {"role": "user", "content": user_complaint}  # User's complaint passed as content
        ]
    )

    # Return the content of the generated response
    return response.choices[0].message.content
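
`extract_details` returns the reply as a plain string, and a model is not guaranteed to reply with valid JSON. A minimal sketch for turning the reply into a dict is given below; `parse_details` is a hypothetical helper, and the required keys are taken from the system prompt used in this notebook.

```python
import json

EXPECTED_KEYS = ("Topic", "Problem", "Customer_Dissatisfaction_Index")

def parse_details(raw_text):
    """Parse the model's reply into a dict, or return None if it is not the expected JSON."""
    try:
        parsed = json.loads(raw_text)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(parsed, dict):
        return None
    if not all(key in parsed for key in EXPECTED_KEYS):
        return None
    return parsed

# Usage in the notebook:
# details = parse_details(extract_details(complaint, myLLM))
```

Returning `None` rather than raising lets the caller decide whether to retry, log, or skip a malformed reply.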


Use the fine-tuned model to extract the details for the following user complaint:

*TV channels keep disappearing from my subscription! What's going on? Extremely annoyed with this service!*


In [None]:
complaint = "TV channels keep disappearing from my subscription! What's going on? Extremely annoyed with this service!"
extract_details(complaint, myLLM)

Test the `GPT-4` model with the same user complaint

In [None]:
extract_details(complaint, 'gpt-4')

Use a new complaint:

*Line is down! It is really annoying!*

In [None]:
complaint = 'Line is down! It is really annoying!'
extract_details(complaint, myLLM)

Compare the results from GPT-4

In [None]:
extract_details(complaint, 'gpt-4')

Comparing the outputs, the model fine-tuned on the complaints dataset should give better answers than GPT-4 for these complaints. It has seen the dataset's topics, edge cases, and expected output format during training, whereas GPT-4 relies on the system prompt alone.
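
The comparison above is made by eye. One way to make it measurable, assuming a few held-out complaints with known labels are available (the notebook does not include any), is to score each model's parsed output against the label. `field_agreement` is a hypothetical helper, and exact matching on `Topic` and `Problem` is only one possible scoring choice.

```python
def field_agreement(predicted, expected, keys=("Topic", "Problem")):
    """Fraction of the given keys whose values match (case-insensitive for strings)."""
    def norm(value):
        return value.strip().lower() if isinstance(value, str) else value

    if not keys:
        return 0.0
    hits = sum(
        1 for key in keys
        if key in predicted and key in expected
        and norm(predicted[key]) == norm(expected[key])
    )
    return hits / len(keys)
```

Averaging this score over the held-out complaints for each model would give a simple number to compare, in place of a visual impression.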

In [11]:
customer_complaint = "I am very Angry! I want my money back!"
extract_details(customer_complaint, myLLM)

In [None]:
extract_details(customer_complaint, 'gpt-4')