### Set up the project environment

In [None]:
!pip install openai==1.7.2 python-dotenv

Importing modules

In [None]:
import pandas as pd
import os, time
from openai import OpenAI
from dotenv import load_dotenv
import json
import matplotlib.pyplot as plt

print("Modules are imported.")

Setting up the OpenAI API:

* Prepare a .env file to store the OpenAI API key.
* Upload the .env file to the Colab environment.
* Load the API key and set up the API.

In [None]:
load_dotenv('apikey.env.txt')

APIKEY = os.getenv('APIKEY')
ORGID = os.getenv('ORGID')
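
`os.getenv` returns `None` when a variable is missing, so a typo in the .env file only surfaces later, at the first API call. The following is a minimal sketch of an early check, assuming the same `APIKEY` and `ORGID` variable names used above. The helper `require_env` is a hypothetical name introduced for illustration; it is not part of python-dotenv or the OpenAI library.

```python
import os

def require_env(names, environ=None):
    """Return the requested variables as a dict; raise if any is missing or empty."""
    # `environ` is injectable so the check can be tried without touching the real environment
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

Calling `require_env(["APIKEY", "ORGID"])` right after `load_dotenv(...)` would fail immediately with the names of the missing variables.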

Creating OpenAI Client

In [None]:
client = OpenAI(
    api_key=APIKEY,
    organization = ORGID
)

client

### Prepare the training data

Loading the data `Customer Complaints.csv`



In [None]:
training_data = pd.read_csv("Customer Complaints.csv")
training_data

Defining a method that takes a row of the dataframe, converts it into the chat JSON format, and appends it to `training_data.json`

In [None]:
def save_as_json(row):

  system_content = """
      Given a customer complaint text, extract and return the following information in json (dict) format:
      - Topic: The product/department that the customer has a complaint about.
      - Problem: A two or three-word description of what exactly the problem is.
      - Customer_Dissatisfaction_Index: is a number between 0 and 100 showing
             how angry the customer is about the problem.
  """

  formatted_data = {
        "messages": [
            {"role": "system", "content": system_content},
            {"role": "user", "content": row.Complaints},
            {"role": "assistant", "content": row.Details}
        ]
      }

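  # Write one JSON object per line (JSON Lines). The file is opened in append
  # mode, so delete training_data.json before re-running to avoid duplicates.
  with open("training_data.json", "a") as json_file:
        json.dump(formatted_data, json_file)
        json_file.write("\n")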
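
Before uploading, it can help to check that every line of the generated file is a well-formed training example. The sketch below assumes each record has exactly a system, a user, and an assistant message in that order, which mirrors what `save_as_json` writes. The function name `validate_chat_jsonl` is hypothetical, and the check is a local sanity test, not the validation the OpenAI API itself performs.

```python
import json

# Assumed message order, matching the records written by save_as_json
REQUIRED_ROLES = ["system", "user", "assistant"]

def validate_chat_jsonl(path):
    """Return a list of (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_number, f"invalid JSON: {exc.msg}"))
                continue
            messages = record.get("messages") if isinstance(record, dict) else None
            if not isinstance(messages, list) or not all(isinstance(m, dict) for m in messages):
                problems.append((line_number, "'messages' must be a list of objects"))
                continue
            roles = [m.get("role") for m in messages]
            if roles != REQUIRED_ROLES:
                problems.append((line_number, f"unexpected roles: {roles}"))
            elif not all(isinstance(m.get("content"), str) and m["content"].strip() for m in messages):
                problems.append((line_number, "empty or non-string content"))
    return problems
```

Running `validate_chat_jsonl("training_data.json")` after the file is generated should return an empty list. A non-string `Details` value, for example a missing cell read by pandas as `NaN`, would be reported with its line number.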

Using this method to generate `training_data.json`

In [None]:
for index, row in training_data.iterrows():
    save_as_json(row)

### Fine-tune GPT-3.5 based on the training data

Uploading the JSON file that was prepared as the training data

In [None]:
data_file = client.files.create(
            file = open('training_data.json', 'rb'),
            purpose='fine-tune',
)
data_file

Creating the Fine Tuning Job

In [None]:
fine_tuning_job = client.fine_tuning.jobs.create(
    training_file = data_file.id,
    model = 'gpt-3.5-turbo',
    hyperparameters={
        "n_epochs": 1
    }
)
fine_tuning_job

Polling the status of the fine-tuning job until it finishes

In [None]:
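while True:
    time.sleep(2)
    retrieved_job = client.fine_tuning.jobs.retrieve(fine_tuning_job.id)
    status = retrieved_job.status
    print(status)

    if status == "succeeded":
        print("The job is done!")
        break
    # Stop on the other terminal states instead of looping forever
    if status in ("failed", "cancelled"):
        print("The job did not complete.")
        break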
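
The polling logic can also be written as a reusable helper with a timeout. This is a sketch under stated assumptions: `wait_for_job` and its parameters are hypothetical names, the status source is passed in as a callable so the helper does not depend on the OpenAI client, and the set of terminal statuses is assumed to be `succeeded`, `failed`, and `cancelled`.

```python
import time

# Assumed terminal states of a fine-tuning job
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_job(get_status, poll_seconds=10.0, timeout_seconds=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns a terminal status, then return that status.

    Raises TimeoutError if no terminal status is seen within timeout_seconds.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_seconds} seconds")
        sleep(poll_seconds)
```

With the client from this notebook, the call would look like `wait_for_job(lambda: client.fine_tuning.jobs.retrieve(fine_tuning_job.id).status)`. A longer polling interval than 2 seconds is usually enough, since fine-tuning jobs tend to run for minutes.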

### Evaluate model

Retrieving the event messages to check out the learning process of the fine-tuning job.

In [None]:
events = list(client.fine_tuning.jobs.list_events(fine_tuning_job_id = retrieved_job.id, limit = 100).data)

for e in events:
    print(e.message)

Extracting the training loss in each learning step

In [None]:
steps = []
train_loss = []

# Only events that carry metrics have a non-empty `data` field
for e in events:
    if e.data:
        steps.append(e.data['step'])
        train_loss.append(e.data['train_loss'])

print(steps)
print(train_loss)
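
The events are not guaranteed to arrive in step order, and some events may carry data without a loss value. The sketch below shows a more defensive version of the same extraction that skips such events and sorts the points by step. The function `loss_curve` is a hypothetical name, and `fake_events` are stand-in objects built only for illustration; they are not real API output.

```python
from types import SimpleNamespace

def loss_curve(events):
    """Return (steps, losses) sorted by step, skipping events without loss metrics."""
    points = []
    for e in events:
        data = getattr(e, "data", None) or {}
        if "step" in data and "train_loss" in data:
            points.append((data["step"], data["train_loss"]))
    points.sort()  # order by step so the line chart reads left to right
    steps = [step for step, _ in points]
    losses = [loss for _, loss in points]
    return steps, losses

# Hypothetical stand-in events, listed out of order on purpose
fake_events = [
    SimpleNamespace(message="Step 2/2", data={"step": 2, "train_loss": 0.8}),
    SimpleNamespace(message="Job started", data={}),
    SimpleNamespace(message="Step 1/2", data={"step": 1, "train_loss": 1.5}),
]
print(loss_curve(fake_events))
```

The returned lists can be passed directly to `plt.plot`.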

Using a line chart to visualize the train_loss in each step

In [None]:
plt.plot(steps, train_loss, marker = 'o', linestyle='-')

### Deploy our model

Reading the name of the fine-tuned model from `retrieved_job`

In [None]:
myLLM = retrieved_job.fine_tuned_model
print(myLLM)

Defining a method to extract information from a given user complaint using a specific LLM and return the results.

In [None]:
def extract_details(user_complaint, model_name):
    """
    This function extracts information from a given user complaint using a specific LLM (Large Language Model).

    Parameters:
    user_complaint (str): The text of the user's complaint.
    model_name (str): The name of the specific LLM model to use for extraction.
    """

    system_content = """
        Given a customer complaint text, extract and return the following information in JSON (dict) format:
        - Topic
        - Problem
        - Customer_Dissatisfaction_Index
    """

    # Generate a response using the specified model and the user's complaint
    response = client.chat.completions.create(
        model = model_name,
        messages=[
            {"role": "system", "content": system_content},  # System content explaining the expected output
            {"role": "user", "content": user_complaint}  # User's complaint passed as content
        ]
    )

    # Return the content of the generated response
    return response.choices[0].message.content
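
`extract_details` returns the reply as a plain string, so downstream code still has to turn it into a dictionary. A minimal sketch of that step follows, assuming the model replies with strict JSON containing the three keys named in the system prompt. The helper `parse_details` is a hypothetical name. If the `Details` column of the training data used Python-style single quotes, the fine-tuned model would likely imitate that, and `json.loads` would reject the reply.

```python
import json

# Keys requested in the system prompt
EXPECTED_KEYS = {"Topic", "Problem", "Customer_Dissatisfaction_Index"}

def parse_details(raw_text):
    """Parse a model reply into a dict, or return None if it is not the expected JSON object."""
    try:
        parsed = json.loads(raw_text)
    except (json.JSONDecodeError, TypeError):
        return None  # not valid JSON, or not a string at all
    if not isinstance(parsed, dict) or not EXPECTED_KEYS.issubset(parsed):
        return None  # valid JSON, but missing one of the requested keys
    return parsed
```

Returning `None` instead of raising keeps the comparison cells short: a reply that fails to parse is itself a useful signal when comparing models.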


Using the fine-tuned model to extract the details for the following user complaint:

*TV channels keep disappearing from my subscription! What's going on? Extremely annoyed with this service!*

In [None]:
complaint = "TV channels keep disappearing from my subscription! What's going on? Extremely annoyed with this service!"
extract_details(complaint, myLLM)

Testing the `GPT-4` model with the same user complaint

In [None]:
extract_details(complaint, 'gpt-4')

Trying for the following complaint:

*Line is down! It is really annoying!*

In [None]:
complaint = "Line is down! It is really annoying!"
extract_details(complaint, myLLM)

Comparing with the results from GPT-4

In [None]:
extract_details(complaint, 'gpt-4')

For these complaints, the fine-tuned model provides better answers than GPT-4. It was trained on the provided dataset, so it is familiar with the dataset's context and edge cases.

In [None]:
customer_complaint = "I am very Angry! I want my money back!"
extract_details(customer_complaint,myLLM)