## **Context**




We are going to use a large language model (LLM) to automate the classification and processing of help desk support tickets. The goal is to predict ticket categories, assign priority, suggest an estimated resolution time, generate a response based on sentiment analysis from the LLM, and store the output in a DataFrame. The input file is Support_ticket_text_data.csv.

The DataFrame should have 7 columns:

Support ticket ID (from the input file), support ticket text (from the input file), category, tags, priority, estimated resolution time, and a generated reply from the LLM.

## **Project Objective**

Develop a Generative AI application using a Large Language Model to **automate the classification and processing of support tickets.** The application will aim to predict ticket categories, assign priority, suggest estimated resolution times, generate responses based on sentiment analysis, and store the results in a structured DataFrame.


In [1]:
!pip install openai

Collecting openai
  Downloading openai-1.16.1-py3-none-any.whl (266 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)

In [2]:
import openai, json, pandas as pd, os

In [3]:
# Mount Google drive to access the dataset
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
# Read the support ticket CSV file into a DataFrame and store it in the 'df' variable
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/ARTI 330 Help Desk LLM/Support_ticket_text_data.csv')

In [5]:
df.shape

(27, 2)

In [6]:
df.head()

Unnamed: 0,support_tick_id,support_ticket_text
0,ST2024-001,How do you find the mac address on my computer?
1,ST2024-002,I dropped my laptop in the parking lot and it ...
2,ST2024-003,The screen resolution on my computer monitor d...
3,ST2024-004,How do you get a virus off of my computer? I c...
4,ST2024-005,I can't get my computer to print to my printer...


In [7]:
!openai --version

openai 1.16.1


In [8]:
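# Loading my API key from a Google Drive text file, not the most secure I know...
# strip() removes a trailing newline that would otherwise end up in the key
with open('/content/drive/MyDrive/Colab Notebooks/openai-api-key.txt', 'r') as key_file:
    client = openai.OpenAI(api_key=key_file.read().strip())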
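Since the comment above admits the text file isn't ideal, here is a minimal sketch of one alternative: look for the key in an environment variable first and only fall back to a file. The helper name `load_api_key` and its parameters are hypothetical, not part of the original notebook; `OPENAI_API_KEY` is the variable name the openai library itself looks for by default.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY", fallback_path=None):
    """Return an API key from the environment, or from a text file as a fallback.

    Hypothetical helper: the name and parameters are assumptions for illustration.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    if fallback_path and os.path.exists(fallback_path):
        with open(fallback_path, "r") as key_file:
            return key_file.read().strip()
    raise RuntimeError(f"No API key found in {env_var} or {fallback_path!r}")
```

The result could then be passed along as `openai.OpenAI(api_key=load_api_key(...))`, with the Drive path as the fallback.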

In [9]:
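# Calls the OpenAI chat completions API and returns the model's JSON reply as a string
def generate_response(ticket_text):
    # The system message names the five keys the JSON output should contain
    sys_msg = '''You are a helpful assistant designed to output JSON.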

    "classification": A classification of the ticket as either a Technical Issue, a Hardware Issue, or a Data Recovery Issue
    "tags": Space-separated generalized tags that indicate the area of the problem more specifically
    "priority": A priority level for the support ticket
    "resolution time": An estimation of an approximate resolution time for the ticket
    "reply": A reply to the support ticket offering potential steps to solve the problem. Steps are formatted as one sentence'''

    # Generate response using GPT-3.5 Turbo
    response = client.chat.completions.create(
      model="gpt-3.5-turbo",
      response_format={ "type": "json_object" },
      messages=[
        {"role": "system", "content": sys_msg},
        {"role": "user", "content": f'Respond exactly with the specified JSON format for the ticket: {ticket_text}'}
      ]
    )

    return response.choices[0].message.content.strip()

In [10]:
# Generating a response for each row (ticket) in the dataframe
df['gpt_response'] = df['support_ticket_text'].apply(generate_response)

In [11]:
df.head()

Unnamed: 0,support_tick_id,support_ticket_text,gpt_response
0,ST2024-001,How do you find the mac address on my computer?,"{\n ""classification"": ""Technical Issue"",\n ..."
1,ST2024-002,I dropped my laptop in the parking lot and it ...,"{\n ""classification"": ""Data Recovery Issue""..."
2,ST2024-003,The screen resolution on my computer monitor d...,"{\n ""classification"": ""Technical Issue"",\n ..."
3,ST2024-004,How do you get a virus off of my computer? I c...,"{\n ""classification"": ""Technical Issue"",\n ..."
4,ST2024-005,I can't get my computer to print to my printer...,"{\n ""classification"": ""Hardware Issue"",\n ..."


In [12]:
# Looking at one response in full
df['gpt_response'][0]

'{\n    "classification": "Technical Issue",\n    "tags": "Network Hardware",\n    "priority": "Medium",\n    "resolution time": "1-2 business days",\n    "reply": "To find the MAC address on your computer, you can open Command Prompt and type \'ipconfig /all\'. Look for the \'Physical Address\' under the network adapter you are interested in."\n}'

In [13]:
# Function that attempts to parse a JSON response
def extract_json_data(response):
  try:
    # Slice from the first '{' to the last '}' so nested objects stay intact
    json_str = response[response.index('{'):response.rindex('}')+1]
  except ValueError:
    # str.index / str.rindex raise ValueError when no bracket is found
    print('Brackets not detected in response.')
    return {}
  try:
    return json.loads(json_str)
  except json.JSONDecodeError as e:
    print(f"Error parsing JSON: {e}")
    return {}
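An alternative to slicing between brackets is to let the standard library work out where the JSON object ends. The sketch below uses `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and ignores whatever text follows it. The function name `first_json_object` is hypothetical, and this is just one possible approach, not what the notebook actually runs.

```python
import json

def first_json_object(text):
    """Return the first JSON object embedded in text, or {} if none parses."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode returns (parsed_value, index_where_parsing_stopped)
            obj, _end = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not valid JSON from this bracket, try the next one
        start = text.find("{", start + 1)
    return {}

# Example: the model wraps its JSON in extra chatter
wrapped = 'Sure! {"priority": "High", "meta": {"source": "demo"}} Hope that helps.'
parsed = first_json_object(wrapped)
```

This handles nested objects and trailing text, at the cost of being a little harder to read than the bracket slice.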

In [14]:
# Parsing the JSON response for every row
df['parsed_response'] = df['gpt_response'].apply(extract_json_data)

In [15]:
df.head()

Unnamed: 0,support_tick_id,support_ticket_text,gpt_response,parsed_response
0,ST2024-001,How do you find the mac address on my computer?,"{\n ""classification"": ""Technical Issue"",\n ...","{'classification': 'Technical Issue', 'tags': ..."
1,ST2024-002,I dropped my laptop in the parking lot and it ...,"{\n ""classification"": ""Data Recovery Issue""...","{'classification': 'Data Recovery Issue', 'tag..."
2,ST2024-003,The screen resolution on my computer monitor d...,"{\n ""classification"": ""Technical Issue"",\n ...","{'classification': 'Technical Issue', 'tags': ..."
3,ST2024-004,How do you get a virus off of my computer? I c...,"{\n ""classification"": ""Technical Issue"",\n ...","{'classification': 'Technical Issue', 'tags': ..."
4,ST2024-005,I can't get my computer to print to my printer...,"{\n ""classification"": ""Hardware Issue"",\n ...","{'classification': 'Hardware Issue', 'tags': '..."


In [16]:
# A cleaned up version of the JSON response
df['parsed_response'][0]

{'classification': 'Technical Issue',
 'tags': 'Network Hardware',
 'priority': 'Medium',
 'resolution time': '1-2 business days',
 'reply': "To find the MAC address on your computer, you can open Command Prompt and type 'ipconfig /all'. Look for the 'Physical Address' under the network adapter you are interested in."}
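Because the prompt names five keys and three allowed classifications, a parsed response can be checked against them before it goes into the report. This is a minimal sketch; `check_ticket_fields`, `EXPECTED_KEYS`, and `ALLOWED_CLASSES` are hypothetical names, and the key and class lists are copied from the system message above.

```python
# Keys and classes taken from the system message in generate_response
EXPECTED_KEYS = ["classification", "tags", "priority", "resolution time", "reply"]
ALLOWED_CLASSES = {"Technical Issue", "Hardware Issue", "Data Recovery Issue"}

def check_ticket_fields(data):
    """Return a list of problems found in one parsed response (empty if none)."""
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in data]
    label = data.get("classification")
    if label is not None and label not in ALLOWED_CLASSES:
        problems.append(f"unexpected classification: {label}")
    return problems

# The parsed response shown above passes the check
sample = {
    "classification": "Technical Issue",
    "tags": "Network Hardware",
    "priority": "Medium",
    "resolution time": "1-2 business days",
    "reply": "To find the MAC address on your computer, open Command Prompt.",
}
sample_problems = check_ticket_fields(sample)
```

Something like `df['parsed_response'].apply(check_ticket_fields)` would then flag any rows worth a second look.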

### **Creating a report**

In [17]:
# Extracting the five features and adding them to the dataframe
df = pd.concat([df, pd.json_normalize(df['parsed_response'])], axis=1)
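To see what the line above does, here is a small sketch on made-up data. The toy tickets and their IDs are invented for illustration, and the dicts are passed as a plain list via `.tolist()` (the notebook passes the Series directly). The empty dict stands in for what extract_json_data returns when parsing fails.

```python
import pandas as pd

# Toy data: one good parse and one failed parse (empty dict)
toy = pd.DataFrame({
    "support_tick_id": ["T-1", "T-2"],
    "parsed_response": [
        {"classification": "Hardware Issue", "priority": "High"},
        {},
    ],
})

# Each dict key becomes a column; keys missing from a row become NaN
expanded = pd.json_normalize(toy["parsed_response"].tolist())

# Side-by-side concat lines rows up by index (both are 0..n-1 here)
report = pd.concat([toy.drop(columns="parsed_response"), expanded], axis=1)
```

A failed parse therefore shows up as a row of missing values rather than breaking the report.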

In [18]:
df.head()

Unnamed: 0,support_tick_id,support_ticket_text,gpt_response,parsed_response,classification,tags,priority,resolution time,reply
0,ST2024-001,How do you find the mac address on my computer?,"{\n ""classification"": ""Technical Issue"",\n ...","{'classification': 'Technical Issue', 'tags': ...",Technical Issue,Network Hardware,Medium,1-2 business days,"To find the MAC address on your computer, you ..."
1,ST2024-002,I dropped my laptop in the parking lot and it ...,"{\n ""classification"": ""Data Recovery Issue""...","{'classification': 'Data Recovery Issue', 'tag...",Data Recovery Issue,Hardware Data-Recovery,High,2-3 days,It seems like your laptop may have suffered ha...
2,ST2024-003,The screen resolution on my computer monitor d...,"{\n ""classification"": ""Technical Issue"",\n ...","{'classification': 'Technical Issue', 'tags': ...",Technical Issue,Display Resolution,Medium,1-2 business days,To increase the screen resolution on your comp...
3,ST2024-004,How do you get a virus off of my computer? I c...,"{\n ""classification"": ""Technical Issue"",\n ...","{'classification': 'Technical Issue', 'tags': ...",Technical Issue,Virus Removal,High,Within 24 hours,"To remove the virus from your computer, please..."
4,ST2024-005,I can't get my computer to print to my printer...,"{\n ""classification"": ""Hardware Issue"",\n ...","{'classification': 'Hardware Issue', 'tags': '...",Hardware Issue,Printing Power,Medium,1-2 days,Please double-check the connection between you...


In [19]:
# Dropping unnecessary columns
df.drop(['gpt_response', 'parsed_response'], inplace=True, axis=1)

In [20]:
df.sample(df.shape[0])

Unnamed: 0,support_tick_id,support_ticket_text,classification,tags,priority,resolution time,reply
20,ST2024-021,"My internet connection is frequently dropping,...",Technical Issue,Internet Connection,High,Within 4 hours,Please try restarting your router and modem to...
6,ST2024-007,Urgent help required! My laptop refuses to sta...,Hardware Issue,laptop start crucial-presentation,Urgent,Within 24 hours,Please try performing a hard reset on your lap...
24,ST2024-025,I am experiencing a critical problem with my i...,Technical Issue,Internet Connectivity,High,Within 24 hours,To address the slow internet speed and frequen...
7,ST2024-008,I've accidentally deleted essential work docum...,Data Recovery Issue,data recovery,High,Within 24 hours,We understand the urgency of your situation. T...
15,ST2024-016,I accidentally formatted my USB drive with cri...,Data Recovery Issue,Data Recovery,High,1-2 days,"To recover files from a formatted USB drive, y..."
0,ST2024-001,How do you find the mac address on my computer?,Technical Issue,Network Hardware,Medium,1-2 business days,"To find the MAC address on your computer, you ..."
4,ST2024-005,I can't get my computer to print to my printer...,Hardware Issue,Printing Power,Medium,1-2 days,Please double-check the connection between you...
21,ST2024-022,Wi-Fi is inconsistent despite proximity to the...,Technical Issue,Wi-Fi Router Connectivity,High,1-2 hours,Please try restarting your router and reconnec...
25,ST2024-026,I hope this message finds you well. I am writi...,Data Recovery Issue,Software Behavior Data Loss,High,2-3 business days,"Based on the description provided, it appears ..."
12,ST2023-013,I'm experiencing a recurring blue screen error...,Hardware Issue,PC Hardware,High,1-2 days,"To diagnose the blue screen error, start by ch..."


It feels a little like cheating to use a GPT model, but it serves the same purpose. Moreover, even using the oldest and cheapest model available, GPT achieves more consistency than the Llama model did when I got it to work previously. The Llama model routinely failed to generate responses for 4 or 5 of the 27 support tickets, and the responses it did generate were of lower quality.

The model aside, there isn't much more to this notebook. The extract_json_data function was tweaked to add additional error handling, but that doesn't appear to be necessary with GPT-3.5-Turbo.

With some small changes, this code could be used to target certain tickets. My vision is that the system could tag tickets once they're created, so this script could be run automatically and generate responses only for tickets that still need them.
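A rough sketch of that idea, assuming a ticket that still needs a response is simply one whose reply column is empty. The function name `fill_missing_replies` and the stub reply function are hypothetical; in the notebook, the callable would be something built on generate_response.

```python
import pandas as pd

def fill_missing_replies(tickets, make_reply):
    """Call make_reply only for rows whose 'reply' is missing or blank.

    Returns an updated copy of the frame and the number of rows filled.
    """
    reply = tickets["reply"]
    needs_reply = reply.isna() | (reply.astype(str).str.strip() == "")
    updated = tickets.copy()
    updated.loc[needs_reply, "reply"] = (
        updated.loc[needs_reply, "support_ticket_text"].apply(make_reply)
    )
    return updated, int(needs_reply.sum())

# Made-up tickets: the first already has a reply, the second does not
demo = pd.DataFrame({
    "support_ticket_text": ["Printer offline", "Forgot password"],
    "reply": ["Check the cable.", None],
})
updated, n_filled = fill_missing_replies(demo, lambda text: f"(stub reply for: {text})")
```

Only the second ticket triggers a call here, so an automated run would only pay for the tickets that actually need a response.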



---

