# LLM: Support Ticket Categorization

#### **Demo Project:** The aim of this project is to develop a support ticket categorization system that classifies incoming tickets, assigns relevant tags based on their content, assigns a priority and ETA so that tickets can be resolved promptly, and generates a first draft response based on the ticket's sentiment.

#### Author: **Gabriel Egbenya**

## Installing and Importing Necessary Libraries and Dependencies

In [1]:
# Installation for GPU llama-cpp-python
# run the following code in case GPU is being used (comment it out otherwise)

!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

In [None]:
# Installation for CPU llama-cpp-python
# uncomment and run the following code in case GPU is not being used

# !CMAKE_ARGS="-DLLAMA_CUBLAS=off" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

**Note** : There may be an error related to a dependency issue thrown by the pip package. This can be ignored as it will not impact the execution of the code.

In [2]:
# For downloading the models from HF Hub
!pip install huggingface_hub==0.20.3 pandas==1.5.3 -q

In [3]:
# Function to download the model from the Hugging Face model hub
from huggingface_hub import hf_hub_download

# Importing the Llama class from the llama_cpp module
from llama_cpp import Llama

# Importing the json module
import json

# for loading and manipulating data
import pandas as pd

# for time computations
import time

## **Loading the Data**

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
# loading the dataset
tickets = pd.read_csv('/content/drive/My Drive/great_learning/NLP/Lab_NLP/Support_ticket_text_data_mid_term.csv')

In [6]:
# creating a copy of the data so that the original stays unchanged
data = tickets.copy()

## **Data Overview**

### Checking the first 5 rows of the data

In [7]:
# checking the first 5 rows of the data
data.head()

Unnamed: 0,support_tick_id,support_ticket_text
0,ST2023-006,My internet connection has significantly slowe...
1,ST2023-007,Urgent help required! My laptop refuses to sta...
2,ST2023-008,I've accidentally deleted essential work docum...
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...
4,ST2023-010,"My smartphone battery is draining rapidly, eve..."


### Checking the shape of the data

In [8]:
# checking the shape of the data
data.shape

(21, 2)

### Checking the missing values in the data

In [9]:
# checking for missing values in the data
data.isnull().sum()


Unnamed: 0,0
support_tick_id,0
support_ticket_text,0


## **Model Building**

### Loading the model

In [10]:
model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.2-GGUF"
model_basename = "mistral-7b-instruct-v0.2.Q6_K.gguf"

In [11]:
model_path = hf_hub_download(
    repo_id=model_name_or_path,
    filename=model_basename
)


In [12]:
# run the following code in case GPU is being used (comment it out otherwise)

llm = Llama(
    model_path=model_path,
    n_ctx=1024, # Context window
)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 


In [None]:
# uncomment and run the following code in case GPU is not being used

# llm = Llama(
#     model_path=model_path,
#     n_ctx=1024, # Context window
#     n_threads=None # Number of CPU threads to use (None lets llama-cpp-python choose)
# )

### Utility functions

In [13]:
# defining a function to parse the JSON output from the model
def extract_json_data(json_str):
    try:
        # Find the indices of the opening and closing curly braces
        json_start = json_str.find('{')
        json_end = json_str.rfind('}')

        if json_start != -1 and json_end != -1:
            extracted_category = json_str[json_start:json_end + 1]  # Extract the JSON object
            data_dict = json.loads(extracted_category)
            return data_dict
        else:
            print(f"Warning: JSON object not found in response: {json_str}")
            return {}
    except json.JSONDecodeError as e:
        print(f"Error parsing JSON: {e}")
        return {}
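
To see how `extract_json_data` copes with imperfect model output, the sketch below runs the same brace-slicing logic on three made-up responses: a clean object, an object wrapped in extra text, and a reply cut off before the closing brace. The sample strings are hypothetical, not actual model output.

```python
import json

def extract_json_data(json_str):
    """Same logic as the helper above: slice from the first '{' to the last '}' and parse."""
    try:
        json_start = json_str.find('{')
        json_end = json_str.rfind('}')
        if json_start != -1 and json_end != -1:
            return json.loads(json_str[json_start:json_end + 1])
        print(f"Warning: JSON object not found in response: {json_str}")
        return {}
    except json.JSONDecodeError as e:
        print(f"Error parsing JSON: {e}")
        return {}

# Hypothetical responses, made up for illustration
clean = '{"Category": "Data Recovery"}'
wrapped = 'Sure! {"Category": "Hardware Issues"} Hope that helps.'
truncated = '{"Category": "Technical Iss'   # e.g. generation stopped at max_tokens

parsed_clean = extract_json_data(clean)
parsed_wrapped = extract_json_data(wrapped)
parsed_truncated = extract_json_data(truncated)   # no closing brace, so the warning branch runs
print(parsed_clean, parsed_wrapped, parsed_truncated)
```

A truncated reply comes back as an empty dictionary, so downstream columns built from it will be missing values rather than raising an error.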

## **Task 1: Ticket Categorization and Returning Structured Output**

In [14]:
# creating a copy of the data
data_1 = data.copy()

In [15]:
#Defining the response function for Task 1.
def response_1(prompt,ticket):
    model_output = llm(
      f"""
      Q: {prompt}
      Support ticket: {ticket}
      A:
      """,
      max_tokens=32,          #sets the maximum number of tokens the model should generate for this task.
      stop=["Q:", "\n"],
      temperature=0.01,       #sets the value for temperature.
      echo=False,
    )

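    temp_output = model_output["choices"][0]["text"]
    # keep everything from the first '{'; if no '{' was generated, return the raw text
    # (str.index would raise ValueError here) and let extract_json_data report it
    json_start = temp_output.find('{')
    final_output = temp_output[json_start:] if json_start != -1 else temp_output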

    return final_output
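
The text passed to `llm` follows a fixed layout: a `Q:` line with the instructions, a `Support ticket:` line, and a trailing `A:` for the model to complete. A minimal sketch of that layout as a standalone helper is shown below. `build_prompt` is a hypothetical name that is not part of the notebook, and it omits the leading indentation that the f-string above carries.

```python
def build_prompt(prompt, ticket, **fields):
    """Assemble a Q/A-style prompt; extra fields go on their own 'Name: value' lines."""
    lines = [f"Q: {prompt}", f"Support ticket: {ticket}"]
    lines += [f"{name}: {value}" for name, value in fields.items()]
    lines.append("A:")
    return "\n".join(lines)

# Made-up inputs for illustration only
example = build_prompt(
    "Classify the ticket.",           # stand-in for prompt_1
    "My laptop refuses to start.",    # made-up ticket text
    Category="Hardware Issues",       # optional extra context
)
print(example)
```

Keeping the layout in one place would make it easier to pass extra context, such as a category, without duplicating the template.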

In [16]:
data_1.head()

Unnamed: 0,support_tick_id,support_ticket_text
0,ST2023-006,My internet connection has significantly slowe...
1,ST2023-007,Urgent help required! My laptop refuses to sta...
2,ST2023-008,I've accidentally deleted essential work docum...
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...
4,ST2023-010,"My smartphone battery is draining rapidly, eve..."


In [17]:
prompt_1 = """
    You are an AI analyzing support text. Your goal is to classify the tickets of the provided text into one of the following categories:
    - Hardware Issues
    - Data Recovery
    - Technical Issues

    If the ticket belongs to two or more categories, return only one category with the highest score.

    Format the output as a JSON object with a single key-value pair as shown below:
    {"Category": "your_category_prediction"}
"""

**Note**: The output of the model should be in a structured format (JSON format).
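
Because the prompt restricts the answer to three labels, a response can be checked against that list before it is trusted. The sketch below is one possible check, not part of the notebook; `validate_category` and `ALLOWED_CATEGORIES` are hypothetical names, and the sample responses are made up.

```python
import json

# The three labels listed in prompt_1
ALLOWED_CATEGORIES = {"Hardware Issues", "Data Recovery", "Technical Issues"}

def validate_category(raw_response):
    """Return the category if the response is valid JSON with an allowed label, else None."""
    try:
        parsed = json.loads(raw_response)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    category = parsed.get("Category")
    if isinstance(category, str) and category in ALLOWED_CATEGORIES:
        return category
    return None

ok = validate_category('{"Category": "Data Recovery"}')
unknown_label = validate_category('{"Category": "Billing"}')   # label not offered in the prompt
not_json = validate_category('Data Recovery')                  # plain text instead of JSON
print(ok, unknown_label, not_json)
```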

In [18]:
start = time.time()
data_1['model_response'] = data_1['support_ticket_text'].apply(lambda x: response_1(prompt_1, x))
end = time.time()

Llama.generate: prefix-match hit


In [19]:
print("Time taken ",(end-start))

Time taken  86.41630721092224


In [20]:
data_1['model_response'].head()

Unnamed: 0,model_response
0,"{""Category"": ""Technical Issues""}"
1,"{""Category"": ""Hardware Issues""}"
2,"{""Category"": ""Data Recovery""}"
3,"{""Category"": ""Technical Issues""}"
4,"{""Category"": ""Hardware Issues""}"


In [21]:
i = 2
print(data_1.loc[i, 'support_ticket_text'])

I've accidentally deleted essential work documents, causing substantial data loss. I understand the need to avoid further actions on my device. Can you please prioritize the data recovery process and guide me through it?


In [22]:
print(data_1.loc[i, 'model_response'])

{"Category": "Data Recovery"}


In [23]:
# applying the function to the model response
data_1['model_response_parsed'] = data_1['model_response'].apply(extract_json_data)
data_1['model_response_parsed'].head()

Unnamed: 0,model_response_parsed
0,{'Category': 'Technical Issues'}
1,{'Category': 'Hardware Issues'}
2,{'Category': 'Data Recovery'}
3,{'Category': 'Technical Issues'}
4,{'Category': 'Hardware Issues'}


In [24]:
data_1['model_response_parsed'].value_counts()

Unnamed: 0,model_response_parsed
{'Category': 'Technical Issues'},7
{'Category': 'Hardware Issues'},7
{'Category': 'Data Recovery'},7


In [25]:
# Normalizing the model_response_parsed column
model_response_parsed_df_1 = pd.json_normalize(data_1['model_response_parsed'])
model_response_parsed_df_1.head()

Unnamed: 0,Category
0,Technical Issues
1,Hardware Issues
2,Data Recovery
3,Technical Issues
4,Hardware Issues


In [26]:
# Concatenating two dataframes
data_with_parsed_model_output_1 = pd.concat([data_1, model_response_parsed_df_1], axis=1)
data_with_parsed_model_output_1.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response,model_response_parsed,Category
0,ST2023-006,My internet connection has significantly slowe...,"{""Category"": ""Technical Issues""}",{'Category': 'Technical Issues'},Technical Issues
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{""Category"": ""Hardware Issues""}",{'Category': 'Hardware Issues'},Hardware Issues
2,ST2023-008,I've accidentally deleted essential work docum...,"{""Category"": ""Data Recovery""}",{'Category': 'Data Recovery'},Data Recovery
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{""Category"": ""Technical Issues""}",{'Category': 'Technical Issues'},Technical Issues
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{""Category"": ""Hardware Issues""}",{'Category': 'Hardware Issues'},Hardware Issues


In [27]:
# Dropping model_response and model_response_parsed columns
final_data_1 = data_with_parsed_model_output_1.drop(['model_response','model_response_parsed'], axis=1)
final_data_1.head()

Unnamed: 0,support_tick_id,support_ticket_text,Category
0,ST2023-006,My internet connection has significantly slowe...,Technical Issues
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware Issues
2,ST2023-008,I've accidentally deleted essential work docum...,Data Recovery
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Technical Issues
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Hardware Issues


## **Task 2: Creating Tags**

In [28]:
# creating a copy of the data
data_2 = data.copy()

In [30]:
def response_2(prompt,ticket,category):
    model_output = llm(
      f"""
      Q: {prompt}
      Support ticket: {ticket}
      Category: {category}
      A:
      """,
      max_tokens=64,
      stop=["Q:", "\n"],
      temperature=0.01,
      echo=False,
    )

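    temp_output = model_output["choices"][0]["text"]
    # keep everything from the first '{'; if no '{' was generated, return the raw text
    # (str.index would raise ValueError here) and let extract_json_data report it
    json_start = temp_output.find('{')
    final_output = temp_output[json_start:] if json_start != -1 else temp_output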

    return final_output

In [31]:
prompt_2 = """
    You are a highly intelligent AI tasked with categorizing support tickets. You are provided a ticket and its category.

    Instructions:
    Your goal is to identify key tags in the ticket that aid the classification of the ticket into one of the following categories:
      - Hardware Issues
      - Data Recovery
      - Technical Issues

    Format the output as a JSON object. Ensure that all values in the JSON are formatted as strings, and each element within the lists should be enclosed in double quotes:
    {"Tags": "your_tag_predictions"}

    If your_tag_predictions is empty, return None

"""

**Note**: The output of the model should be in a structured format (JSON format).
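
The prompt allows several shapes for the `Tags` value: a list of strings, a single string, or `None` when nothing is found. A small normalizer can bring these into one form before further analysis. The sketch below is an assumption about how that could be done; `normalize_tags` is a hypothetical helper, not part of the notebook, and the inputs are made up.

```python
def normalize_tags(parsed):
    """Coerce a parsed {"Tags": ...} value into a list of lowercase, de-duplicated strings."""
    tags = parsed.get("Tags") if isinstance(parsed, dict) else None
    if tags is None or tags == "None":
        return []
    if isinstance(tags, str):
        # a single comma-separated string instead of a list
        tags = tags.split(",")
    seen, result = set(), []
    for tag in tags:
        cleaned = str(tag).strip().lower()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result

from_list = normalize_tags({"Tags": ["Battery", "Draining", "battery "]})
from_string = normalize_tags({"Tags": "wifi, signal"})
from_none = normalize_tags({"Tags": "None"})
from_empty = normalize_tags({})
print(from_list, from_string, from_none, from_empty)
```

Lowercasing makes tags that differ only in capitalization count as the same tag when they are aggregated.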

In [32]:
start = time.time()
data_2["model_response"] = final_data_1[['support_ticket_text','Category']].apply(lambda x: response_2(prompt_2, x['support_ticket_text'], x['Category']), axis=1)
end = time.time()

Llama.generate: prefix-match hit


In [33]:
print("Time taken ",end-start)

Time taken  101.04957914352417


In [34]:
data_2["model_response"].head()

Unnamed: 0,model_response
0,"{""Tags"": [""internet connection"", ""slow down"", ..."
1,"{""Tags"": [""laptop"", ""refuses to start"", ""cruci..."
2,"{""Tags"": [""data loss"", ""document deletion"", ""f..."
3,"{""Tags"": [""Wi-Fi Signal"", ""Technical Issues""]}"
4,"{""Tags"": [""Battery"", ""Draining"", ""Rapidly""]}"


In [35]:
i = 2
print(data_2.loc[i, 'support_ticket_text'])

I've accidentally deleted essential work documents, causing substantial data loss. I understand the need to avoid further actions on my device. Can you please prioritize the data recovery process and guide me through it?


In [36]:
print(data_2.loc[i, 'model_response'])

{"Tags": ["data loss", "document deletion", "file recovery"]}


In [37]:
# Applying the function to the model response
data_2['model_response_parsed'] = data_2['model_response'].apply(extract_json_data)
data_2['model_response_parsed'].head()

Unnamed: 0,model_response_parsed
0,"{'Tags': ['internet connection', 'slow down', ..."
1,"{'Tags': ['laptop', 'refuses to start', 'cruci..."
2,"{'Tags': ['data loss', 'document deletion', 'f..."
3,"{'Tags': ['Wi-Fi Signal', 'Technical Issues']}"
4,"{'Tags': ['Battery', 'Draining', 'Rapidly']}"


In [38]:
data_2["model_response_parsed"]

Unnamed: 0,model_response_parsed
0,"{'Tags': ['internet connection', 'slow down', ..."
1,"{'Tags': ['laptop', 'refuses to start', 'cruci..."
2,"{'Tags': ['data loss', 'document deletion', 'f..."
3,"{'Tags': ['Wi-Fi Signal', 'Technical Issues']}"
4,"{'Tags': ['Battery', 'Draining', 'Rapidly']}"
5,"{'Tags': ['Account Access', 'Password Reset', ..."
6,"{'Tags': ['Performance', 'Productivity']}"
7,"{'Tags': ['blue screen', 'crashes', 'recurring..."
8,"{'Tags': ['External Hard Drive', 'Data Recover..."
9,"{'Tags': ['graphics card', 'malfunctioning', '..."


In [39]:
# Normalizing the model_response_parsed column
model_response_parsed_df_2 = pd.json_normalize(data_2['model_response_parsed'])
model_response_parsed_df_2.head()

Unnamed: 0,Tags
0,"[internet connection, slow down, disconnections]"
1,"[laptop, refuses to start, crucial presentatio..."
2,"[data loss, document deletion, file recovery]"
3,"[Wi-Fi Signal, Technical Issues]"
4,"[Battery, Draining, Rapidly]"


In [40]:
# Concatenating two dataframes
data_with_parsed_model_output_2 = pd.concat([data_2, model_response_parsed_df_2], axis=1)
data_with_parsed_model_output_2.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response,model_response_parsed,Tags
0,ST2023-006,My internet connection has significantly slowe...,"{""Tags"": [""internet connection"", ""slow down"", ...","{'Tags': ['internet connection', 'slow down', ...","[internet connection, slow down, disconnections]"
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{""Tags"": [""laptop"", ""refuses to start"", ""cruci...","{'Tags': ['laptop', 'refuses to start', 'cruci...","[laptop, refuses to start, crucial presentatio..."
2,ST2023-008,I've accidentally deleted essential work docum...,"{""Tags"": [""data loss"", ""document deletion"", ""f...","{'Tags': ['data loss', 'document deletion', 'f...","[data loss, document deletion, file recovery]"
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{""Tags"": [""Wi-Fi Signal"", ""Technical Issues""]}","{'Tags': ['Wi-Fi Signal', 'Technical Issues']}","[Wi-Fi Signal, Technical Issues]"
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{""Tags"": [""Battery"", ""Draining"", ""Rapidly""]}","{'Tags': ['Battery', 'Draining', 'Rapidly']}","[Battery, Draining, Rapidly]"


In [41]:
# Dropping model_response and model_response_parsed columns
final_data_2 = data_with_parsed_model_output_2.drop(['model_response','model_response_parsed'], axis=1)
final_data_2.head()

Unnamed: 0,support_tick_id,support_ticket_text,Tags
0,ST2023-006,My internet connection has significantly slowe...,"[internet connection, slow down, disconnections]"
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"[laptop, refuses to start, crucial presentatio..."
2,ST2023-008,I've accidentally deleted essential work docum...,"[data loss, document deletion, file recovery]"
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"[Wi-Fi Signal, Technical Issues]"
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","[Battery, Draining, Rapidly]"


In [42]:
# Checking the value counts of the Tags column
final_data_2['Tags'].value_counts()

Unnamed: 0,Tags
"[accidentally formatted, data recovery]",2
"[External Hard Drive, Data Recovery]",2
"[internet connection, slow down, disconnections]",1
"[data loss, document deletion, file recovery]",1
"[laptop, refuses to start, crucial presentation, immediate assistance, hardware issue]",1
"[Wi-Fi Signal, Technical Issues]",1
"[Battery, Draining, Rapidly]",1
"[Performance, Productivity]",1
"[Account Access, Password Reset, Data Recovery]",1
"[blue screen, crashes, recurring, hardware issue]",1


In [43]:
final_data_2 = pd.concat([final_data_2,final_data_1["Category"]],axis=1)

In [44]:
final_data_2 = final_data_2[["support_tick_id","support_ticket_text","Category","Tags"]]
final_data_2.head()

Unnamed: 0,support_tick_id,support_ticket_text,Category,Tags
0,ST2023-006,My internet connection has significantly slowe...,Technical Issues,"[internet connection, slow down, disconnections]"
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware Issues,"[laptop, refuses to start, crucial presentatio..."
2,ST2023-008,I've accidentally deleted essential work docum...,Data Recovery,"[data loss, document deletion, file recovery]"
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Technical Issues,"[Wi-Fi Signal, Technical Issues]"
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Hardware Issues,"[Battery, Draining, Rapidly]"


In [45]:
final_data_2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 4 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   support_tick_id      21 non-null     object
 1   support_ticket_text  21 non-null     object
 2   Category             21 non-null     object
 3   Tags                 21 non-null     object
dtypes: object(4)
memory usage: 800.0+ bytes


## **Task 3: Assigning Priority and ETA**

In [46]:
# creating a copy of the data
data_3 = data.copy()

In [47]:
def response_3(prompt,ticket,category,tags):
    model_output = llm(
      f"""
      Q: {prompt}
      Support ticket: {ticket}
      Category: {category}
      Tags: {tags}
      A:
      """,
      max_tokens=32,
      stop=["Q:", "\n"],
      temperature=0.01,
      echo=False,
    )

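    temp_output = model_output["choices"][0]["text"]
    # keep everything from the first '{'; if no '{' was generated, return the raw text
    # (str.index would raise ValueError here) and let extract_json_data report it
    json_start = temp_output.find('{')
    final_output = temp_output[json_start:] if json_start != -1 else temp_output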

    return final_output

In [48]:
prompt_3 = """
    You are a highly intelligent AI tasked with categorizing support ticket text. Your goal is to analyze the provided support ticket text and determine the appropriate priority category based on its content.

    Specify Low if the priority is low or if the priority is not specified in the ticket, Medium if the priority is medium, and High if the priority is high.

    Once the above analysis is complete, your next goal is to analyze the provided support ticket text to infer the Estimated Time of Arrival (ETA) specified within the tickets.
    For example, if the ticket contains words synonymous with urgent, immediate, and so on, then the ETA should be Emergency.
    If the ticket contains words like within, day, or hour, then the ETA should be your prediction. Else, the ETA should be set to None.


    Format the output as a JSON object with two key-value pairs as shown below:
    {"Priority": "your_priority_prediction", "ETA": "your_eta_prediction"}

"""

**Note**: The output of the model should be in a structured format (JSON format).
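
The ETA rules in the prompt are keyword-driven, so they can also be written as a plain rule for cross-checking the model's answers. The sketch below is only an approximation of those rules: `eta_branch` is a hypothetical helper, `URGENT_WORDS` is an assumed synonym list, and the ticket texts are shortened or made up.

```python
import re

# Assumed synonym list; the prompt only names "urgent" and "immediate"
URGENT_WORDS = {"urgent", "urgently", "immediate", "immediately", "asap", "emergency"}
# Time-related words named in the prompt, plus their plurals
TIME_WORDS = {"within", "day", "days", "hour", "hours"}

def eta_branch(ticket_text):
    """Return which ETA rule from the prompt the ticket text would trigger."""
    words = set(re.findall(r"[a-z]+", ticket_text.lower()))
    if words & URGENT_WORDS:
        return "Emergency"
    if words & TIME_WORDS:
        return "model prediction"
    return "None"

urgent = eta_branch("Urgent help required! My laptop refuses to start")
timed = eta_branch("Please fix this within two days.")
neither = eta_branch("My smartphone battery is draining rapidly.")
print(urgent, timed, neither)
```

Tickets where the rule and the model disagree are reasonable candidates for manual review.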

In [49]:
# Applying the response_3 function to the ticket text, category and tags
start = time.time()
data_3['model_response'] = final_data_2[['support_ticket_text','Category','Tags']].apply(lambda x: response_3(prompt_3, x['support_ticket_text'], x['Category'], x['Tags']), axis=1)
end = time.time()

Llama.generate: prefix-match hit


In [50]:
print("Time taken ",(end-start))

Time taken  95.89315462112427


In [51]:
data_3['model_response'].head()


Unnamed: 0,model_response
0,"{""Priority"": ""High"", ""ETA"": ""Within 24 hours""}"
1,"{""Priority"": ""High"", ""ETA"": ""Tomorrow""}"
2,"{""Priority"": ""High"", ""ETA"": ""Within 2 hours""}"
3,"{""Priority"": ""Medium"", ""ETA"": ""Within 1-2 busi..."
4,"{""Priority"": ""Medium"", ""ETA"": ""None""}"


In [52]:
i = 2
print(data_3.loc[i, 'support_ticket_text'])

I've accidentally deleted essential work documents, causing substantial data loss. I understand the need to avoid further actions on my device. Can you please prioritize the data recovery process and guide me through it?


In [53]:
print(data_3.loc[i, 'model_response'])

{"Priority": "High", "ETA": "Within 2 hours"}


In [54]:
# Applying the function to the model response
data_3['model_response_parsed'] = data_3['model_response'].apply(extract_json_data)
data_3['model_response_parsed'].head()

Unnamed: 0,model_response_parsed
0,"{'Priority': 'High', 'ETA': 'Within 24 hours'}"
1,"{'Priority': 'High', 'ETA': 'Tomorrow'}"
2,"{'Priority': 'High', 'ETA': 'Within 2 hours'}"
3,"{'Priority': 'Medium', 'ETA': 'Within 1-2 busi..."
4,"{'Priority': 'Medium', 'ETA': 'None'}"


In [55]:
# Normalizing the model_response_parsed column
model_response_parsed_df_3 = pd.json_normalize(data_3['model_response_parsed'])
model_response_parsed_df_3.head(21)

Unnamed: 0,Priority,ETA
0,High,Within 24 hours
1,High,Tomorrow
2,High,Within 2 hours
3,Medium,Within 1-2 business days
4,Medium,
5,High,Emergency
6,High,Within 2 business days
7,High,Within 2 hours
8,High,Within 24 hours
9,Medium,Within 1-2 business days


In [56]:
data_2['support_ticket_text'][0]

'My internet connection has significantly slowed down over the past two days, making it challenging to work efficiently from home. Frequent disconnections are causing major disruptions. Please assist in resolving this connectivity issue promptly.'

In [57]:
# Concatenating two dataframes
data_with_parsed_model_output_3 = pd.concat([data_3, model_response_parsed_df_3], axis=1)
data_with_parsed_model_output_3.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response,model_response_parsed,Priority,ETA
0,ST2023-006,My internet connection has significantly slowe...,"{""Priority"": ""High"", ""ETA"": ""Within 24 hours""}","{'Priority': 'High', 'ETA': 'Within 24 hours'}",High,Within 24 hours
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{""Priority"": ""High"", ""ETA"": ""Tomorrow""}","{'Priority': 'High', 'ETA': 'Tomorrow'}",High,Tomorrow
2,ST2023-008,I've accidentally deleted essential work docum...,"{""Priority"": ""High"", ""ETA"": ""Within 2 hours""}","{'Priority': 'High', 'ETA': 'Within 2 hours'}",High,Within 2 hours
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{""Priority"": ""Medium"", ""ETA"": ""Within 1-2 busi...","{'Priority': 'Medium', 'ETA': 'Within 1-2 busi...",Medium,Within 1-2 business days
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{""Priority"": ""Medium"", ""ETA"": ""None""}","{'Priority': 'Medium', 'ETA': 'None'}",Medium,


In [58]:
# Dropping model_response and model_response_parsed columns
final_data_3 = data_with_parsed_model_output_3.drop(['model_response','model_response_parsed'], axis=1)
final_data_3.head()

Unnamed: 0,support_tick_id,support_ticket_text,Priority,ETA
0,ST2023-006,My internet connection has significantly slowe...,High,Within 24 hours
1,ST2023-007,Urgent help required! My laptop refuses to sta...,High,Tomorrow
2,ST2023-008,I've accidentally deleted essential work docum...,High,Within 2 hours
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Medium,Within 1-2 business days
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Medium,


In [59]:
final_data_3 = pd.concat([final_data_3,final_data_2[["Category","Tags"]]],axis=1)

In [60]:
final_data_3 = final_data_3[["support_tick_id","support_ticket_text","Category","Tags","Priority","ETA"]]
final_data_3

Unnamed: 0,support_tick_id,support_ticket_text,Category,Tags,Priority,ETA
0,ST2023-006,My internet connection has significantly slowe...,Technical Issues,"[internet connection, slow down, disconnections]",High,Within 24 hours
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware Issues,"[laptop, refuses to start, crucial presentatio...",High,Tomorrow
2,ST2023-008,I've accidentally deleted essential work docum...,Data Recovery,"[data loss, document deletion, file recovery]",High,Within 2 hours
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Technical Issues,"[Wi-Fi Signal, Technical Issues]",Medium,Within 1-2 business days
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Hardware Issues,"[Battery, Draining, Rapidly]",Medium,
5,ST2023-011,I'm locked out of my online banking account an...,Data Recovery,"[Account Access, Password Reset, Data Recovery]",High,Emergency
6,ST2023-012,"My computer's performance is sluggish, severel...",Technical Issues,"[Performance, Productivity]",High,Within 2 business days
7,ST2023-013,I'm experiencing a recurring blue screen error...,Hardware Issues,"[blue screen, crashes, recurring, hardware issue]",High,Within 2 hours
8,ST2023-014,My external hard drive isn't being recognized ...,Data Recovery,"[External Hard Drive, Data Recovery]",High,Within 24 hours
9,ST2023-015,The graphics card in my gaming laptop seems to...,Hardware Issues,"[graphics card, malfunctioning, gaming laptop]",Medium,Within 1-2 business days


## **Task 4 - Creating a Draft Response**

In [61]:
# creating a copy of the data
data_4 = data.copy()

In [62]:
def response_4(prompt,ticket,category,tags,priority,eta):
    model_output = llm(
      f"""
      Q: {prompt}
      Support ticket: {ticket}
      Category : {category}
      Tags : {tags}
      Priority: {priority}
      ETA: {eta}
      A:
      """,
      max_tokens=128,
      stop=["Q:", "\n"],
      temperature=0.01,
      echo=False,
    )

    temp_output = model_output["choices"][0]["text"]


    return temp_output

In [63]:
# first draft of the prompt (overwritten by the definition below, kept for reference)
prompt_4 = """
   You are an AI analyzing customer support ticket and generating appropriate draft response. Draft a response for the customer based on the ticket content.

    Rephrase the response to the shortest way possible. Only return the response.
"""

# final prompt used for Task 4
prompt_4 = """
   You are a highly capable AI support agent tasked with reviewing support tickets and generating appropriate draft responses. Please analyze the content of the provided support ticket text and create a helpful draft response addressing the issue(s) mentioned in the ticket.

   Rephrase the above response to the shortest way possible.
   Only return the response.
"""

**Note** : For this task, we will not be using the *`extract_json_data`* function. Hence, the output from the model should be a plain string and not a JSON object.
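
Since the draft response is kept as plain text, light cleanup is the only post-processing it needs. The sketch below shows one way to do that; `clean_response` and its `fallback` text are hypothetical and not part of the notebook.

```python
def clean_response(text, fallback="(no response generated)"):
    """Collapse runs of whitespace and strip leading/trailing spaces from a draft response."""
    cleaned = " ".join(text.split())
    return cleaned if cleaned else fallback

# Made-up raw outputs for illustration
tidy = clean_response("  We're sorry for the   inconvenience. ")
empty = clean_response("   ")
print(repr(tidy), repr(empty))
```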

In [64]:
# Applying the response_4 function to the ticket text, category, tags, priority and ETA
start = time.time()
data_4['model_response'] = final_data_3[['support_ticket_text','Category','Tags','Priority','ETA']].apply(lambda x: response_4(prompt_4, x['support_ticket_text'], x['Category'], x['Tags'], x['Priority'], x['ETA']), axis=1)
end = time.time()

Llama.generate: prefix-match hit


In [65]:
print("Time taken",(end-start))

Time taken 203.18322896957397
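
Dividing the four printed timings by the 21 tickets reported by `data.shape` gives a rough per-ticket latency for each task. The sketch below only does that arithmetic on the numbers printed in this notebook; the task labels are shorthand added here.

```python
n_tickets = 21  # rows reported by data.shape

# Wall-clock seconds printed after each task
timings = {
    "Task 1 (category)": 86.41630721092224,
    "Task 2 (tags)": 101.04957914352417,
    "Task 3 (priority, ETA)": 95.89315462112427,
    "Task 4 (draft response)": 203.18322896957397,
}

per_ticket = {task: seconds / n_tickets for task, seconds in timings.items()}
total_seconds = sum(timings.values())

for task, seconds in per_ticket.items():
    print(f"{task}: {seconds:.2f} s per ticket")
print(f"Total: {total_seconds:.1f} s for {n_tickets} tickets")
```

Task 4 takes roughly twice as long per ticket as the other tasks, which is likely related to its larger `max_tokens` setting (128 versus 32 or 64).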


In [66]:
data_4['model_response'].head()

Unnamed: 0,model_response
0,We apologize for the inconvenience you're exp...
1,We're sorry for the inconvenience. Please bri...
2,We're deeply sorry for your data loss. Our te...
3,We apologize for your inconvenience with the ...
4,We're sorry for the inconvenience with your s...


In [67]:
i = 2
print(data_4.loc[i, 'support_ticket_text'])

I've accidentally deleted essential work documents, causing substantial data loss. I understand the need to avoid further actions on my device. Can you please prioritize the data recovery process and guide me through it?


In [68]:
print(data_4.loc[i, 'model_response'])

 We're deeply sorry for your data loss. Our team will prioritize the document recovery process and guide you through it as soon as possible, within 2 hours.


In [69]:
final_data_4 = pd.concat([final_data_3,data_4["model_response"]],axis=1)

In [70]:
final_data_4.rename(columns={"model_response":"Response"},inplace=True)

In [71]:
# The column was already renamed in place in the previous cell, so we only display the result here
final_data_4.head(10)

Unnamed: 0,support_tick_id,support_ticket_text,Category,Tags,Priority,ETA,Response
0,ST2023-006,My internet connection has significantly slowe...,Technical Issues,"[internet connection, slow down, disconnections]",High,Within 24 hours,We apologize for the inconvenience you're exp...
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware Issues,"[laptop, refuses to start, crucial presentatio...",High,Tomorrow,We're sorry for the inconvenience. Please bri...
2,ST2023-008,I've accidentally deleted essential work docum...,Data Recovery,"[data loss, document deletion, file recovery]",High,Within 2 hours,We're deeply sorry for your data loss. Our te...
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Technical Issues,"[Wi-Fi Signal, Technical Issues]",Medium,Within 1-2 business days,We apologize for your inconvenience with the ...
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Hardware Issues,"[Battery, Draining, Rapidly]",Medium,,We're sorry for the inconvenience with your s...
5,ST2023-011,I'm locked out of my online banking account an...,Data Recovery,"[Account Access, Password Reset, Data Recovery]",High,Emergency,"We're sorry for the inconvenience, please cal..."
6,ST2023-012,"My computer's performance is sluggish, severel...",Technical Issues,"[Performance, Productivity]",High,Within 2 business days,1. Check for and remove unnecessary startup pr...
7,ST2023-013,I'm experiencing a recurring blue screen error...,Hardware Issues,"[blue screen, crashes, recurring, hardware issue]",High,Within 2 hours,1. Check for recent system updates and install...
8,ST2023-014,My external hard drive isn't being recognized ...,Data Recovery,"[External Hard Drive, Data Recovery]",High,Within 24 hours,We apologize for the inconvenience with your ...
9,ST2023-015,The graphics card in my gaming laptop seems to...,Hardware Issues,"[graphics card, malfunctioning, gaming laptop]",Medium,Within 1-2 business days,We'll examine your graphics card issue on you...


In [72]:
final_data_4.loc[4, 'Response']

" We're sorry for the inconvenience with your smartphone battery. Here are some steps you can take to troubleshoot:"
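
The response above begins with a space and ends with a colon, which suggests the generation stopped before the troubleshooting steps were produced. A minimal sketch of how such responses could be cleaned and flagged for review is shown below. It assumes responses are plain strings; `looks_truncated` is a hypothetical helper and the punctuation rule is only a heuristic.

```python
import pandas as pd

def looks_truncated(text: str) -> bool:
    """Heuristic (an assumption, not part of the original notebook):
    a reply that is empty or does not end with '.', '!' or '?'
    (for example, one ending with ':') was probably cut off."""
    stripped = text.strip()
    return not stripped or stripped[-1] not in ".!?"

# Small stand-in for final_data_4 using two responses shown in this notebook.
demo = pd.DataFrame({
    "Response": [
        " We're deeply sorry for your data loss. Our team will prioritize the "
        "document recovery process and guide you through it as soon as possible, "
        "within 2 hours.",
        " We're sorry for the inconvenience with your smartphone battery. "
        "Here are some steps you can take to troubleshoot:",
    ]
})

demo["Response"] = demo["Response"].str.strip()             # drop the leading space
demo["needs_review"] = demo["Response"].apply(looks_truncated)
print(demo["needs_review"].tolist())
```

Flagged rows could then be regenerated with a larger token limit or routed to a human agent before being sent.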

In [73]:
df = pd.DataFrame(final_data_4)
df.head()

Unnamed: 0,support_tick_id,support_ticket_text,Category,Tags,Priority,ETA,Response
0,ST2023-006,My internet connection has significantly slowe...,Technical Issues,"[internet connection, slow down, disconnections]",High,Within 24 hours,We apologize for the inconvenience you're exp...
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware Issues,"[laptop, refuses to start, crucial presentatio...",High,Tomorrow,We're sorry for the inconvenience. Please bri...
2,ST2023-008,I've accidentally deleted essential work docum...,Data Recovery,"[data loss, document deletion, file recovery]",High,Within 2 hours,We're deeply sorry for your data loss. Our te...
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Technical Issues,"[Wi-Fi Signal, Technical Issues]",Medium,Within 1-2 business days,We apologize for your inconvenience with the ...
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Hardware Issues,"[Battery, Draining, Rapidly]",Medium,,We're sorry for the inconvenience with your s...


In [74]:
# Creating a copy of the dataframe of task-4
final_data = final_data_4.copy()

In [75]:
final_data['Category'].value_counts()

Unnamed: 0,Category
Technical Issues,7
Hardware Issues,7
Data Recovery,7


In [76]:
final_data["Priority"].value_counts()

Unnamed: 0,Priority
High,17
Medium,4


In [77]:
final_data["ETA"].value_counts()

Unnamed: 0,ETA
Within 2 hours,8
Emergency,3
Within 2 business days,3
,2
Within 24 hours,2
Within 1-2 business days,2
Tomorrow,1
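
Two tickets have a blank ETA, which `value_counts()` reports as an empty label rather than as a missing value. The sketch below shows one way to make those gaps explicit. It rebuilds a stand-in Series from the counts above, and it assumes the blanks are empty or whitespace-only strings.

```python
import pandas as pd

# Stand-in for final_data["ETA"], rebuilt from the counts shown above.
eta = pd.Series(
    ["Within 2 hours"] * 8
    + ["Emergency"] * 3
    + ["Within 2 business days"] * 3
    + [""] * 2
    + ["Within 24 hours"] * 2
    + ["Within 1-2 business days"] * 2
    + ["Tomorrow"]
)

# Treat blank or whitespace-only strings as missing so they are easy to find.
stripped = eta.str.strip()
eta_clean = stripped.mask(stripped == "")

print(eta_clean.isna().sum())                # number of tickets without an ETA
print(eta_clean.value_counts(dropna=False))  # missing values listed explicitly
```

The same pass is a natural place to merge near-duplicate labels such as "Within 24 hours" and "Tomorrow", if a fixed ETA vocabulary is preferred.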


Let's dig a bit deeper by breaking down the ETA counts by category.

In [78]:
final_data.groupby(['Category', 'ETA']).support_tick_id.count()

Category,ETA,support_tick_id
Data Recovery,Emergency,1
Data Recovery,Within 2 business days,1
Data Recovery,Within 2 hours,4
Data Recovery,Within 24 hours,1
Hardware Issues,Emergency,1
Hardware Issues,,2
Hardware Issues,Tomorrow,1
Hardware Issues,Within 1-2 business days,1
Hardware Issues,Within 2 hours,2
Technical Issues,Emergency,1
