**To develop an advanced support ticket categorization system that accurately classifies incoming tickets, assigns relevant tags based on their content, prioritizes tickets for prompt resolution, and generates a first response that reflects each ticket's sentiment.**


## **Installing and Importing Necessary Libraries and Dependencies**

In [1]:
# Ignore warnings
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
# For time computations.
import time

In [2]:
# Installation for GPU llama-cpp-python.
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q


In [4]:
# Importing the Llama class from the llama_cpp module.
from llama_cpp import Llama

In [3]:
# For downloading the models from HF Hub.
# !pip install huggingface_hub==0.20.3 pandas==1.5.3 -q

In [5]:
# Function to download the model from the Hugging Face model hub.
from huggingface_hub import hf_hub_download

# Importing the json module.
import json

## **Loading the Data**

In [8]:
# Loading the data into df
df = pd.read_csv("Support_ticket_text_data_mid_term.csv")

# Creating copy of 'df' in the variable data
data = df.copy()

## **Data Overview**

### Checking the first 5 rows of the data

In [10]:
# first 5 rows of the data
data.head(5)

Unnamed: 0,support_tick_id,support_ticket_text
0,ST2023-006,My internet connection has significantly slowe...
1,ST2023-007,Urgent help required! My laptop refuses to sta...
2,ST2023-008,I've accidentally deleted essential work docum...
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...
4,ST2023-010,"My smartphone battery is draining rapidly, eve..."


### Checking the shape of the data

In [11]:
# shape of data
data.shape

(21, 2)

In [1]:
# There are 21 rows and 2 columns present in this data.

### Checking the missing values in the data

In [12]:
# Missing values in data
data.isna().sum().sum()

0

In [2]:
# The output above shows that there are no missing values in the dataset.

## **Model Building**

### Loading the model

In [13]:
# model name and model base name
model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.2-GGUF"
model_basename = "mistral-7b-instruct-v0.2.Q6_K.gguf"

In [14]:
# Downloading the model file from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id=model_name_or_path,  # repository that hosts the GGUF files
    filename=model_basename      # quantized model file to download
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


mistral-7b-instruct-v0.2.Q6_K.gguf:   0%|          | 0.00/5.94G [00:00<?, ?B/s]

In [15]:
# Defining the LLM (Llama). n_gpu_layers is not set here, so the library default applies; pass it explicitly to offload layers to the GPU.

llm = Llama(
    model_path=model_path,
    n_ctx=1024, # Context window
)

llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from /root/.cache/huggingface/hub/models--TheBloke--Mistral-7B-Instruct-v0.2-GGUF/snapshots/3a6fbf4a41a1d52e415a4958cde6856d34b2db93/mistral-7b-instruct-v0.2.Q6_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.2
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loade

### Utility functions

In [16]:
# defining a function to parse the JSON output from the model
def extract_json_data(json_str):
    try:
        # Find the indices of the opening and closing curly braces
        json_start = json_str.find('{')
        json_end = json_str.rfind('}')

        if json_start != -1 and json_end != -1:
            extracted_category = json_str[json_start:json_end + 1]  # Extract the JSON object
            data_dict = json.loads(extracted_category)
            return data_dict
        else:
            print(f"Warning: JSON object not found in response: {json_str}")
            return {}
    except json.JSONDecodeError as e:
        print(f"Error parsing JSON: {e}")
        return {}
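
As a quick check of the parsing approach above, the sketch below applies the same brace-slicing idea to a few hand-written strings. The helper name `extract_json_sketch` and the sample strings are invented for illustration; they are not model output.

```python
import json

def extract_json_sketch(text):
    """Slice from the first '{' to the last '}' and parse it; return {} on failure."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1 or end < start:
        return {}
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return {}

# Invented samples: clean JSON, JSON wrapped in extra text, no JSON, unclosed JSON.
samples = [
    ' {"category": "Data Recovery"}',
    'Sure! {"category": "Hardware Issues"} Hope that helps.',
    'no json here',
    '{"category": "Technical Issues"',
]
parsed = [extract_json_sketch(s) for s in samples]
print(parsed)
```

Returning an empty dictionary for unparseable replies keeps a later `pd.json_normalize` call from failing, at the cost of producing missing values that should be checked afterwards.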

## **Ticket Categorization and Returning Structured Output**

In [17]:
# creating a copy of the data
data_1 = data.copy()

In [18]:
# Defining the response function for Task 1.
def response_1(prompt, ticket):
    model_output = llm(
      f"""
      Q: {prompt}
      Support ticket: {ticket}
      A:
      """,
      max_tokens=10, # defining the maximum number of tokens the model should generate for this task.
      stop=["Q:", "\n"],
      temperature=0.01, # temperature set to 0.01(low) for deterministic output.
      echo=False,
    )

    temp_output = model_output["choices"][0]["text"]
    final_output = temp_output[temp_output.index('{'):]

    return final_output
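
One caveat with the slicing step above: `str.index` raises `ValueError` when the reply contains no `{`, which would abort the whole `.apply` call on a single unexpected reply. A minimal sketch of a more forgiving variant, using a hypothetical helper named `slice_from_first_brace`, could look like this:

```python
def slice_from_first_brace(text):
    """Return text from the first '{' onward, or '' if there is no '{'."""
    start = text.find("{")  # find() returns -1 instead of raising ValueError
    return text[start:] if start != -1 else ""

print(slice_from_first_brace(' {"category": "Data Recovery"}'))
print(repr(slice_from_first_brace("I am not sure.")))
```

An empty string then flows into `extract_json_data`, which already handles the "no JSON found" case by returning an empty dictionary.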

In [19]:
# Prompt creation for task 1
prompt_1 = """
   As an AI, your job is to categorize IT support tickets. 
   Please label each ticket as either Hardware Issues, Data Recovery, or Technical Issues.
   Your response should be in the format: {"category": "Hardware Issues"}, {"category": "Data Recovery"}, or {"category": "Technical Issues"}. 
   Keep your output simple and accurate. Ensure that all curly braces are closed and there are no additional characters in the output.
"""

**Note**: The output of the model should be in a structured format (JSON format).
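
Because the prompt restricts the answer to three labels, the parsed dictionaries can be checked against that set once they are available. The sketch below is one way to do this; the helper `validate_category` and the example dictionaries are assumptions for illustration, not part of the original notebook.

```python
# The three labels requested in the prompt.
ALLOWED_CATEGORIES = {"Hardware Issues", "Data Recovery", "Technical Issues"}

def validate_category(parsed):
    """Return the category if it is one of the expected labels, else None."""
    category = parsed.get("category") if isinstance(parsed, dict) else None
    return category if category in ALLOWED_CATEGORIES else None

# Invented examples: a valid label, an unexpected label, and an empty parse result.
examples = [{"category": "Data Recovery"}, {"category": "Networking"}, {}]
checked = [validate_category(e) for e in examples]
print(checked)
```

Rows that come back as `None` can then be reviewed manually or sent through the model again.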

In [20]:
# Applying response_1 to each value of the support_ticket_text column
start = time.time()
data_1['model_response'] = data_1['support_ticket_text'].apply(lambda x: response_1(prompt_1, x))
end = time.time()


llama_print_timings:        load time =    2789.61 ms
llama_print_timings:      sample time =       7.33 ms /    10 runs   (    0.73 ms per token,  1364.26 tokens per second)
llama_print_timings: prompt eval time =    2789.27 ms /   175 tokens (   15.94 ms per token,    62.74 tokens per second)
llama_print_timings:        eval time =    8077.37 ms /     9 runs   (  897.49 ms per token,     1.11 tokens per second)
llama_print_timings:       total time =   10917.82 ms /   184 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =    2789.61 ms
llama_print_timings:      sample time =       4.93 ms /    10 runs   (    0.49 ms per token,  2029.63 tokens per second)
llama_print_timings: prompt eval time =    1693.32 ms /    51 tokens (   33.20 ms per token,    30.12 tokens per second)
llama_print_timings:        eval time =    6579.20 ms /     9 runs   (  731.02 ms per token,     1.37 tokens per second)
llama_print_timings:       total time =    8305.30 ms /    60 

In [21]:
# Time taken for the model to return output (in minutes)
print("Time taken ", round((end - start) / 60, 1), "minutes.")

Time taken  4.4 minutes.


In [22]:
# Initial model output
data_1['model_response'].head(10)

Unnamed: 0,model_response
0,"{""category"": ""Technical Issues""}"
1,"{""category"": ""Hardware Issues""}"
2,"{""category"": ""Data Recovery""}"
3,"{""category"": ""Technical Issues""}"
4,"{""category"": ""Hardware Issues""}"
5,"{""category"": ""Technical Issues""}"
6,"{""category"": ""Technical Issues""}"
7,"{""category"": ""Hardware Issues""}"
8,"{""category"": ""Data Recovery""}"
9,"{""category"": ""Hardware Issues""}"


In [23]:
# Displaying the support ticket text
i = 6
print(data_1.loc[i,'support_ticket_text'])

My computer's performance is sluggish, severely impacting my work. I need help optimizing it to regain productivity.


In [24]:
# Model output
print(data_1.loc[i, 'model_response'])

{"category": "Technical Issues"}


In [25]:
# Applying the function to the model response
data_1['model_response_parsed'] = data_1['model_response'].apply(extract_json_data)
data_1['model_response_parsed'].head()

Unnamed: 0,model_response_parsed
0,{'category': 'Technical Issues'}
1,{'category': 'Hardware Issues'}
2,{'category': 'Data Recovery'}
3,{'category': 'Technical Issues'}
4,{'category': 'Hardware Issues'}


In [26]:
# Model output after extracting JSON data
data_1['model_response_parsed'].value_counts()

Unnamed: 0_level_0,count
model_response_parsed,Unnamed: 1_level_1
{'category': 'Technical Issues'},8
{'category': 'Hardware Issues'},7
{'category': 'Data Recovery'},6


In [27]:
# Normalizing the model_response_parsed column
model_response_parsed_df_1 = pd.json_normalize(data_1['model_response_parsed'])
model_response_parsed_df_1.head()

Unnamed: 0,category
0,Technical Issues
1,Hardware Issues
2,Data Recovery
3,Technical Issues
4,Hardware Issues


In [28]:
# Concatenating two dataframes
data_with_parsed_model_output_1 = pd.concat([data_1, model_response_parsed_df_1], axis=1)
data_with_parsed_model_output_1.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response,model_response_parsed,category
0,ST2023-006,My internet connection has significantly slowe...,"{""category"": ""Technical Issues""}",{'category': 'Technical Issues'},Technical Issues
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{""category"": ""Hardware Issues""}",{'category': 'Hardware Issues'},Hardware Issues
2,ST2023-008,I've accidentally deleted essential work docum...,"{""category"": ""Data Recovery""}",{'category': 'Data Recovery'},Data Recovery
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{""category"": ""Technical Issues""}",{'category': 'Technical Issues'},Technical Issues
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{""category"": ""Hardware Issues""}",{'category': 'Hardware Issues'},Hardware Issues
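
The column-wise `pd.concat` above lines up correctly because `data_1` still has its default 0..n-1 index, which matches the fresh index produced by `pd.json_normalize`. The sketch below, on invented toy data, shows what happens when the indexes differ and how resetting the index avoids it.

```python
import pandas as pd

# Toy frames (invented data): the left one has a non-default index,
# as it would after filtering rows.
left = pd.DataFrame({"ticket": ["a", "b", "c"]}, index=[0, 2, 5])
right = pd.json_normalize([{"category": "X"}, {"category": "Y"}, {"category": "Z"}])

# concat with axis=1 aligns on index labels, so mismatched labels create extra rows with NaN.
misaligned = pd.concat([left, right], axis=1)

# Resetting the left index restores a positional, row-by-row match.
aligned = pd.concat([left.reset_index(drop=True), right], axis=1)

print(misaligned.shape)
print(aligned.shape)
```

If rows are ever dropped or filtered before the model is applied, calling `reset_index(drop=True)` first keeps the parsed columns attached to the right tickets.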


In [29]:
# Dropping model_response and model_response_parsed columns
final_data_1 = data_with_parsed_model_output_1.drop(['model_response','model_response_parsed'], axis=1)
final_data_1.head()

Unnamed: 0,support_tick_id,support_ticket_text,category
0,ST2023-006,My internet connection has significantly slowe...,Technical Issues
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware Issues
2,ST2023-008,I've accidentally deleted essential work docum...,Data Recovery
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Technical Issues
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Hardware Issues


## **Creating Tags**

In [30]:
# creating a copy of the data
data_2 = data.copy()

In [31]:
# Defining the response function for Task 2.
def response_2(prompt, ticket, category):
    model_output = llm(
      f"""
      Q: {prompt}
      Support ticket: {ticket}
      Category: {category}
      A:
      """,
      max_tokens=1024,  # defining the maximum number of tokens the model should generate for this task.
      stop=["Q:", "\n"],
      temperature=0.01,  # temperature set to 0.01(low) for deterministic output.
      echo=False,
    )

    temp_output = model_output["choices"][0]["text"]
    final_output = temp_output[temp_output.index('{'):]

    return final_output

In [32]:
# Prompt creation for task 2
prompt_2 = """
   As an AI, your task is to label IT support tickets with relevant tags. 
   Please identify the most appropriate keywords and include them in your response. 
   Your output should be formatted as follows: {"tags": ["Wifi", "Data Loss", "Connection Issues", "Battery"]}.
   Keep your output simple and accurate. Ensure that all curly braces are closed and there are no additional characters in the output.
"""

**Note**: The output of the model should be in a structured format (JSON format).

In [33]:
# Applying response_2 row-wise to the support_ticket_text and category columns
start = time.time()
data_2["model_response"] = final_data_1[['support_ticket_text', 'category']].apply(
    lambda x: response_2(prompt_2, x['support_ticket_text'], x['category']), axis=1
)
end = time.time()

Llama.generate: prefix-match hit

llama_print_timings:        load time =    2789.61 ms
llama_print_timings:      sample time =      10.66 ms /    19 runs   (    0.56 ms per token,  1783.03 tokens per second)
llama_print_timings: prompt eval time =    2207.78 ms /   166 tokens (   13.30 ms per token,    75.19 tokens per second)
llama_print_timings:        eval time =   13993.29 ms /    18 runs   (  777.40 ms per token,     1.29 tokens per second)
llama_print_timings:       total time =   16276.15 ms /   184 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =    2789.61 ms
llama_print_timings:      sample time =       4.96 ms /     9 runs   (    0.55 ms per token,  1812.69 tokens per second)
llama_print_timings: prompt eval time =    1728.46 ms /    59 tokens (   29.30 ms per token,    34.13 tokens per second)
llama_print_timin

In [34]:
# Time taken for the model to generate output (in minutes)
print("Time taken ", round((end - start) / 60, 1), "minutes.")

Time taken  4.0 minutes.


In [35]:
# Initial model output
data_2['model_response'].head(10)

Unnamed: 0,model_response
0,"{""tags"": [""Connection Issues"", ""Internet"", ""Sl..."
1,"{""tags"": [""Hardware""]}"
2,"{""tags"": [""Data Loss""]}"
3,"{""tags"": [""Wifi"", ""Connection Issues""]}"
4,"{""tags"": [""Battery""]}"
5,"{""tags"": [""Account Access"", ""Password Reset""]}"
6,"{""tags"": [""Performance Issues""]}"
7,"{""tags"": [""Hardware Issues""]}"
8,"{""tags"": [""Data Loss""]}"
9,"{""tags"": [""Graphics Card""]}"


In [36]:
# Support ticket text
i = 0
print(data_2.loc[i,'support_ticket_text'])

My internet connection has significantly slowed down over the past two days, making it challenging to work efficiently from home. Frequent disconnections are causing major disruptions. Please assist in resolving this connectivity issue promptly.


In [37]:
# Model output
print(data_2.loc[i,'model_response'])

{"tags": ["Connection Issues", "Internet", "Slow Connection"]}


In [38]:
# Applying the function to the model response
data_2['model_response_parsed'] = data_2['model_response'].apply(extract_json_data)

In [39]:
# Model output after extracting JSON data
data_2["model_response_parsed"]

Unnamed: 0,model_response_parsed
0,"{'tags': ['Connection Issues', 'Internet', 'Sl..."
1,{'tags': ['Hardware']}
2,{'tags': ['Data Loss']}
3,"{'tags': ['Wifi', 'Connection Issues']}"
4,{'tags': ['Battery']}
5,"{'tags': ['Account Access', 'Password Reset']}"
6,{'tags': ['Performance Issues']}
7,{'tags': ['Hardware Issues']}
8,{'tags': ['Data Loss']}
9,{'tags': ['Graphics Card']}


In [40]:
# Normalizing the model_response_parsed column
model_response_parsed_df_2 = pd.json_normalize(data_2['model_response_parsed'])
model_response_parsed_df_2.head()

Unnamed: 0,tags
0,"[Connection Issues, Internet, Slow Connection]"
1,[Hardware]
2,[Data Loss]
3,"[Wifi, Connection Issues]"
4,[Battery]


In [41]:
# Concatenating two dataframes
data_with_parsed_model_output_2 = pd.concat([data_2, model_response_parsed_df_2], axis=1)
data_with_parsed_model_output_2.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response,model_response_parsed,tags
0,ST2023-006,My internet connection has significantly slowe...,"{""tags"": [""Connection Issues"", ""Internet"", ""Sl...","{'tags': ['Connection Issues', 'Internet', 'Sl...","[Connection Issues, Internet, Slow Connection]"
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{""tags"": [""Hardware""]}",{'tags': ['Hardware']},[Hardware]
2,ST2023-008,I've accidentally deleted essential work docum...,"{""tags"": [""Data Loss""]}",{'tags': ['Data Loss']},[Data Loss]
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{""tags"": [""Wifi"", ""Connection Issues""]}","{'tags': ['Wifi', 'Connection Issues']}","[Wifi, Connection Issues]"
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{""tags"": [""Battery""]}",{'tags': ['Battery']},[Battery]


In [42]:
# Dropping model_response and model_response_parsed columns
final_data_2 = data_with_parsed_model_output_2.drop(['model_response','model_response_parsed'], axis=1)
final_data_2.head()

Unnamed: 0,support_tick_id,support_ticket_text,tags
0,ST2023-006,My internet connection has significantly slowe...,"[Connection Issues, Internet, Slow Connection]"
1,ST2023-007,Urgent help required! My laptop refuses to sta...,[Hardware]
2,ST2023-008,I've accidentally deleted essential work docum...,[Data Loss]
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"[Wifi, Connection Issues]"
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",[Battery]


In [43]:
# Checking the value counts of the tags column
final_data_2['tags'].value_counts()

Unnamed: 0_level_0,count
tags,Unnamed: 1_level_1
[Data Loss],6
[Hardware],2
"[Connection Issues, Internet, Slow Connection]",1
"[Wifi, Connection Issues]",1
[Battery],1
"[Account Access, Password Reset]",1
[Performance Issues],1
[Hardware Issues],1
[Graphics Card],1
"[Screen, Hardware]",1


In [44]:
# Concatenating two dataframes
final_data_2 = pd.concat([final_data_2,final_data_1["category"]],axis=1)

In [45]:
# viewing newly updated dataframe
final_data_2 = final_data_2[["support_tick_id","support_ticket_text","category","tags"]]
final_data_2

Unnamed: 0,support_tick_id,support_ticket_text,category,tags
0,ST2023-006,My internet connection has significantly slowe...,Technical Issues,"[Connection Issues, Internet, Slow Connection]"
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware Issues,[Hardware]
2,ST2023-008,I've accidentally deleted essential work docum...,Data Recovery,[Data Loss]
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Technical Issues,"[Wifi, Connection Issues]"
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Hardware Issues,[Battery]
5,ST2023-011,I'm locked out of my online banking account an...,Technical Issues,"[Account Access, Password Reset]"
6,ST2023-012,"My computer's performance is sluggish, severel...",Technical Issues,[Performance Issues]
7,ST2023-013,I'm experiencing a recurring blue screen error...,Hardware Issues,[Hardware Issues]
8,ST2023-014,My external hard drive isn't being recognized ...,Data Recovery,[Data Loss]
9,ST2023-015,The graphics card in my gaming laptop seems to...,Hardware Issues,[Graphics Card]
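
The `value_counts()` call earlier in this section counts whole tag lists, so `[Wifi, Connection Issues]` and `[Connection Issues, Internet, Slow Connection]` are treated as unrelated values. To see how often each individual tag occurs, the lists can be flattened first. The sketch below does this with the standard library on an invented sample shaped like the `tags` column.

```python
from collections import Counter

# Invented sample in the same shape as the `tags` column (one list per ticket).
tag_lists = [
    ["Wifi", "Connection Issues"],
    ["Data Loss"],
    ["Connection Issues", "Internet"],
    ["Data Loss"],
]

# Flatten the lists and count each tag once per occurrence.
tag_counts = Counter(tag for tags in tag_lists for tag in tags)
print(tag_counts.most_common())
```

On the actual dataframe, `final_data_2['tags'].explode().value_counts()` should give the equivalent per-tag counts.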


## **Assigning Priority and ETA**

In [46]:
# creating a copy of the data
data_3 = data.copy()

In [47]:
# Function created to generate an output from the model
def response_3(prompt,ticket,category,tags):
    model_output = llm(
      f"""
      Q: {prompt}
      Support ticket: {ticket}
      Category: {category}
      Tags: {tags}
      A:
      """,
      max_tokens=20,   # defining the maximum number of tokens the model should generate for this task.
      stop=["Q:", "\n"],
      temperature=0.01,  # temperature set to 0.01(low) for deterministic output.
      echo=False,
    )

    temp_output = model_output["choices"][0]["text"]
    final_output = temp_output[temp_output.index('{'):]

    return final_output

In [48]:
# Prompt creation for task 3
prompt_3 = """
    As an AI, your task is to determine the priority and estimated time to resolve (ETA) for IT support tickets. 
    Consider the severity of the issue, the time needed for resolution, and customer satisfaction. 
    Your response should be in the format: {"priority": "High", "eta": "2 Days"}.
    Keep your output simple and accurate. Ensure that all curly braces are closed and there are no additional characters in the output.
"""

**Note**: The output of the model should be in a structured format (JSON format).
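
Once priority and ETA are available as strings, they can be turned into sortable values to build a work queue. The sketch below is a minimal illustration under two assumptions that the source does not confirm: that priorities come from the set High, Medium, Low, and that every ETA is expressed in days. The helper `eta_to_days` and the sample tickets are invented.

```python
import re

PRIORITY_RANK = {"High": 0, "Medium": 1, "Low": 2}  # assumed label set

def eta_to_days(eta):
    """Extract the first integer from strings such as '3 Days'; None if absent."""
    match = re.search(r"\d+", str(eta))
    return int(match.group()) if match else None

# Invented tickets in the same shape as the parsed model output.
tickets = [
    {"id": "A", "priority": "Medium", "eta": "3 Days"},
    {"id": "B", "priority": "High", "eta": "1 Day"},
    {"id": "C", "priority": "High", "eta": "3 Days"},
]

# Sort by priority rank first, then by ETA in days; unknown priorities go last.
ordered = sorted(
    tickets,
    key=lambda t: (
        PRIORITY_RANK.get(t["priority"], len(PRIORITY_RANK)),
        eta_to_days(t["eta"]) or 0,
    ),
)
print([t["id"] for t in ordered])
```

If the model ever returns ETAs in other units, such as hours, the parser would need a unit-aware conversion before sorting.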

In [49]:
# Applying response_3 row-wise to the support_ticket_text, category, and tags columns
start = time.time()
data_3['model_response'] = final_data_2[['support_ticket_text', 'category', 'tags']].apply(
    lambda x: response_3(prompt_3, x['support_ticket_text'], x['category'], x['tags']), axis=1
)
end = time.time()

Llama.generate: prefix-match hit

llama_print_timings:        load time =    2789.61 ms
llama_print_timings:      sample time =       7.75 ms /    14 runs   (    0.55 ms per token,  1807.38 tokens per second)
llama_print_timings: prompt eval time =    2284.66 ms /   180 tokens (   12.69 ms per token,    78.79 tokens per second)
llama_print_timings:        eval time =   10534.19 ms /    13 runs   (  810.32 ms per token,     1.23 tokens per second)
llama_print_timings:       total time =   12878.55 ms /   193 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =    2789.61 ms
llama_print_timings:      sample time =       7.71 ms /    14 runs   (    0.55 ms per token,  1815.12 tokens per second)
llama_print_timings: prompt eval time =    1774.97 ms /    68 tokens (   26.10 ms per token,    38.31 tokens per second)
llam

In [50]:
# Time taken for the model to generate output (in minutes)
print("Time taken ", round((end - start) / 60, 1), "minutes.")

Time taken  4.3 minutes.


In [51]:
# Initial model output
data_3['model_response'].head(10)

Unnamed: 0,model_response
0,"{""priority"": ""High"", ""eta"": ""1 Day""}"
1,"{""priority"": ""High"", ""eta"": ""1 Day""}"
2,"{""priority"": ""High"", ""eta"": ""1 Day""}"
3,"{""priority"": ""Medium"", ""eta"": ""3 Days""}"
4,"{""priority"": ""Medium"", ""eta"": ""3 Days""}"
5,"{""priority"": ""High"", ""eta"": ""1 Day""}"
6,"{""priority"": ""High"", ""eta"": ""1 Day""}"
7,"{""priority"": ""High"", ""eta"": ""3 Days""}"
8,"{""priority"": ""High"", ""eta"": ""3 Days""}"
9,"{""priority"": ""High"", ""eta"": ""3 Days""}"


In [52]:
# Support ticket text
i = 3
print(data_3.loc[i,'support_ticket_text'])

Despite being in close proximity to my Wi-Fi router, the signal remains persistently weak in my home. This issue has been ongoing, and I need assistance troubleshooting it. Please help me resolve the weak Wi-Fi signal problem.


In [53]:
# Model output
print(data_3.loc[i,'model_response'])

{"priority": "Medium", "eta": "3 Days"}


In [54]:
# Applying the function to the model response
data_3['model_response_parsed'] = data_3['model_response'].apply(extract_json_data)
data_3['model_response_parsed'].head()

Unnamed: 0,model_response_parsed
0,"{'priority': 'High', 'eta': '1 Day'}"
1,"{'priority': 'High', 'eta': '1 Day'}"
2,"{'priority': 'High', 'eta': '1 Day'}"
3,"{'priority': 'Medium', 'eta': '3 Days'}"
4,"{'priority': 'Medium', 'eta': '3 Days'}"


In [55]:
# Normalizing the model_response_parsed column
model_response_parsed_df_3 = pd.json_normalize(data_3['model_response_parsed'])
model_response_parsed_df_3.head(21)

Unnamed: 0,priority,eta
0,High,1 Day
1,High,1 Day
2,High,1 Day
3,Medium,3 Days
4,Medium,3 Days
5,High,1 Day
6,High,1 Day
7,High,3 Days
8,High,3 Days
9,High,3 Days


In [56]:
# Concatenating two dataframes
data_with_parsed_model_output_3 = pd.concat([data_3, model_response_parsed_df_3], axis=1)
data_with_parsed_model_output_3.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response,model_response_parsed,priority,eta
0,ST2023-006,My internet connection has significantly slowe...,"{""priority"": ""High"", ""eta"": ""1 Day""}","{'priority': 'High', 'eta': '1 Day'}",High,1 Day
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{""priority"": ""High"", ""eta"": ""1 Day""}","{'priority': 'High', 'eta': '1 Day'}",High,1 Day
2,ST2023-008,I've accidentally deleted essential work docum...,"{""priority"": ""High"", ""eta"": ""1 Day""}","{'priority': 'High', 'eta': '1 Day'}",High,1 Day
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{""priority"": ""Medium"", ""eta"": ""3 Days""}","{'priority': 'Medium', 'eta': '3 Days'}",Medium,3 Days
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{""priority"": ""Medium"", ""eta"": ""3 Days""}","{'priority': 'Medium', 'eta': '3 Days'}",Medium,3 Days


In [57]:
# Dropping model_response and model_response_parsed columns
final_data_3 = data_with_parsed_model_output_3.drop(['model_response','model_response_parsed'], axis=1)
final_data_3.head()

Unnamed: 0,support_tick_id,support_ticket_text,priority,eta
0,ST2023-006,My internet connection has significantly slowe...,High,1 Day
1,ST2023-007,Urgent help required! My laptop refuses to sta...,High,1 Day
2,ST2023-008,I've accidentally deleted essential work docum...,High,1 Day
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Medium,3 Days
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Medium,3 Days


In [58]:
# Concatenating two dataframes
final_data_3 = pd.concat([final_data_3,final_data_2[["category","tags"]]],axis=1)

In [59]:
# Selecting and reordering the columns
final_data_3 = final_data_3[["support_tick_id","support_ticket_text","category","tags","priority","eta"]]

In [60]:
# viewing newly updated dataframe
final_data_3

Unnamed: 0,support_tick_id,support_ticket_text,category,tags,priority,eta
0,ST2023-006,My internet connection has significantly slowe...,Technical Issues,"[Connection Issues, Internet, Slow Connection]",High,1 Day
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware Issues,[Hardware],High,1 Day
2,ST2023-008,I've accidentally deleted essential work docum...,Data Recovery,[Data Loss],High,1 Day
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Technical Issues,"[Wifi, Connection Issues]",Medium,3 Days
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Hardware Issues,[Battery],Medium,3 Days
5,ST2023-011,I'm locked out of my online banking account an...,Technical Issues,"[Account Access, Password Reset]",High,1 Day
6,ST2023-012,"My computer's performance is sluggish, severel...",Technical Issues,[Performance Issues],High,1 Day
7,ST2023-013,I'm experiencing a recurring blue screen error...,Hardware Issues,[Hardware Issues],High,3 Days
8,ST2023-014,My external hard drive isn't being recognized ...,Data Recovery,[Data Loss],High,3 Days
9,ST2023-015,The graphics card in my gaming laptop seems to...,Hardware Issues,[Graphics Card],High,3 Days


## **Creating a Draft Response**

In [61]:
# creating a copy of the data
data_4 = data.copy()

In [62]:
# Function to generate output from the model
def response_4(prompt,ticket,category,tags,priority,eta):
    model_output = llm(
      f"""
      Q: {prompt}
      Support ticket: {ticket}
      Category : {category}
      Tags : {tags}
      Priority: {priority}
      ETA: {eta}
      A:
      """,
      max_tokens=1024,  # defining the maximum number of tokens the model should generate for this task.
      stop=["Q:", "\n"],
      temperature=0.01,  # temperature set to 0.01(low) for deterministic output.
      echo=False,
    )

    temp_output = model_output["choices"][0]["text"]

    return temp_output

In [63]:
# Prompt creation for task 4
prompt_4 = """
    As an AI, your task is to draft a response for IT support tickets. 
    Consider customer satisfaction, the severity of the issue, and the company's responsibility. 
    Your response should be in the format: {"response": "This is a draft response"}. 
    Ensure your response is empathetic, professional, helpful, and concise.
    Please ensure that all curly braces are closed and there are no additional characters in the output.
"""

**Note**: For this task, `response_4` returns the raw model text without slicing it from the first `{`. The prompt still requests a JSON object, so the reply is parsed afterwards with *`extract_json_data`*.

In [64]:
# Applying response_4 row-wise to the ticket text, category, tags, priority, and ETA columns
start = time.time()
data_4['model_response'] = final_data_3[['support_ticket_text', 'category', 'tags', 'priority', 'eta']].apply(
    lambda x: response_4(
        prompt_4, x['support_ticket_text'], x['category'], x['tags'], x['priority'], x['eta']
    ),
    axis=1,
)
end = time.time()

Llama.generate: prefix-match hit

llama_print_timings:        load time =    2789.61 ms
llama_print_timings:      sample time =     115.80 ms /   200 runs   (    0.58 ms per token,  1727.10 tokens per second)
llama_print_timings: prompt eval time =    2403.00 ms /   197 tokens (   12.20 ms per token,    81.98 tokens per second)
llama_print_timings:        eval time =  159970.56 ms /   199 runs   (  803.87 ms per token,     1.24 tokens per second)
llama_print_timings:       total time =  163234.98 ms /   396 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =    2789.61 ms
llama_print_timings:      sample time =     123.59 ms /   218 runs   (    0.57 ms per token,  1763.97 tokens per second)
llama_print_timings: prompt eval time =    1841.78 ms /    82 tokens (   22.46 ms per token,    44

In [65]:
# Time taken for the model to generate output (in minutes)
print("Time taken ", round((end - start) / 60, 1), "minutes.")

Time taken  62.6 minutes.


In [66]:
# Initial model output
data_4['model_response'].head(21)

Unnamed: 0,model_response
0,"{""response"": ""Dear Valued Customer,\n\nWe apo..."
1,"{""response"": ""Dear Valued Customer,\n\nWe und..."
2,"{""response"": ""Dear Valued Customer,\n\nWe und..."
3,"{""response"": ""Dear Valued Customer,\n\nWe apo..."
4,"{""response"": ""Dear Valued Customer,\n\nWe're ..."
5,"{""response"": ""Dear Valued Customer,\n\nWe're ..."
6,"{""response"": ""Dear Valued Customer,\n\nWe apo..."
7,"{""response"": ""Dear Valued Customer,\n\nWe apo..."
8,"{""response"": ""Dear Valued Customer,\n\nWe und..."
9,"{""response"": ""Dear Valued Customer,\n\nWe apo..."


In [67]:
# Support ticket text
i = 2
print(data_4.loc[i,'support_ticket_text'])

I've accidentally deleted essential work documents, causing substantial data loss. I understand the need to avoid further actions on my device. Can you please prioritize the data recovery process and guide me through it?


In [68]:
# Model output
print(data_4.loc[i,'model_response'])

 {"response": "Dear Valued Customer,\n\nWe understand the urgency and importance of your situation and apologize for any inconvenience caused by the data loss.\n\nOur team of experts will prioritize your data recovery request and work diligently to help you recover your essential documents.\n\nIn the meantime, we recommend that you avoid using your device to prevent any further data loss.\n\nWe will keep you updated throughout the process and will provide you with clear instructions on how to proceed once your data is recovered.\n\nPlease rest assured that our team will do everything in their power to help you recover your data as quickly and efficiently as possible.\n\nThank you for bringing this matter to our attention and please do not hesitate to contact us if you have any further questions or concerns.\n\nBest regards,\n[Your Company] Data Recovery Team"}
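
The raw reply above keeps its paragraph breaks as literal `\n` escape sequences inside the JSON string, which is also why the `"\n"` stop sequence does not cut the reply short. Parsing the JSON converts them into real line breaks. The sketch below shows this on a shortened, hand-written reply in the same shape; the text is not the model's actual output.

```python
import json

# Shortened, hand-written reply: "\\n" in Python source is a backslash followed by "n",
# which is how the escape appears in the raw model text.
raw = ' {"response": "Dear Valued Customer,\\n\\nWe understand the urgency.\\n\\nBest regards,\\nSupport Team"}'

parsed = json.loads(raw.strip())
body = parsed["response"]  # escapes are now real newline characters
print(body)
```

Printing `body` therefore shows the draft as a properly formatted message rather than a single line with visible escape sequences.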


In [69]:
# Applying the function to the model response
data_4['model_response_parsed'] = data_4['model_response'].apply(extract_json_data)
data_4['model_response_parsed'].head()

Unnamed: 0,model_response_parsed
0,"{'response': 'Dear Valued Customer, We apolog..."
1,"{'response': 'Dear Valued Customer, We unders..."
2,"{'response': 'Dear Valued Customer, We unders..."
3,"{'response': 'Dear Valued Customer, We apolog..."
4,"{'response': 'Dear Valued Customer, We're sor..."
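
`extract_json_data` is defined earlier in the notebook and is not shown in this section. As a rough illustration of the kind of parsing involved, the sketch below pulls the outermost `{...}` span out of a raw completion and decodes it. The name `extract_json_sketch` and the empty-dict fallback are assumptions for illustration, not the notebook's implementation.

```python
import json


def extract_json_sketch(raw_text):
    """Hypothetical helper: decode the outermost {...} span in a model completion."""
    # Non-string cells (e.g. NaN) cannot contain JSON.
    if not isinstance(raw_text, str):
        return {}
    start = raw_text.find("{")
    end = raw_text.rfind("}")
    # No braces, or a closing brace before the opening one: nothing to parse.
    if start == -1 or end <= start:
        return {}
    try:
        return json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        # Malformed JSON falls back to an empty dict instead of raising.
        return {}


# The leading space mirrors the raw model output shown above.
sample = ' {"response": "Dear Valued Customer,\\n\\nWe understand the urgency."}'
parsed = extract_json_sketch(sample)
print(parsed["response"].splitlines()[0])
```

With a fallback like this, counting the rows that come back empty is a quick check of how often the model broke the requested JSON format.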


In [70]:
# Normalizing the model_response_parsed column
model_response_parsed_df_4 = pd.json_normalize(data_4['model_response_parsed'])
model_response_parsed_df_4.head(21)

Unnamed: 0,response
0,"Dear Valued Customer,\n\nWe apologize for the ..."
1,"Dear Valued Customer,\n\nWe understand that yo..."
2,"Dear Valued Customer,\n\nWe understand the urg..."
3,"Dear Valued Customer,\n\nWe apologize for any ..."
4,"Dear Valued Customer,\n\nWe're sorry to hear t..."
5,"Dear Valued Customer,\n\nWe're sorry to hear t..."
6,"Dear Valued Customer,\n\nWe apologize for any ..."
7,"Dear Valued Customer,\n\nWe apologize for the ..."
8,"Dear Valued Customer,\n\nWe understand that lo..."
9,"Dear Valued Customer,\n\nWe apologize for any ..."


In [71]:
# Concatenating the two dataframes column-wise
final_data_4 = pd.concat([final_data_3,model_response_parsed_df_4],axis=1)
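
`pd.concat(..., axis=1)` aligns rows by index label rather than by position, so this step assumes `final_data_3` and `model_response_parsed_df_4` share the same index. A minimal sketch with toy frames (the data and the names `left` and `right` are illustrative only) shows what happens when they do not, and one way to guard against it:

```python
import pandas as pd

# Two toy frames with the same number of rows but different index labels.
left = pd.DataFrame({"category": ["A", "B", "C"]}, index=[0, 1, 2])
right = pd.DataFrame({"response": ["r0", "r1", "r2"]}, index=[5, 6, 7])

# Column-wise concat performs an outer join on the index, so no rows line up.
misaligned = pd.concat([left, right], axis=1)
print(misaligned.shape)

# Resetting both indexes first pairs the rows by position.
aligned = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)], axis=1
)
print(aligned.shape)
```

Resetting the index is only safe when both frames are already in the same row order, as they are when one is derived row by row from the other.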

In [72]:
# Renaming the parsed-response column to "response" (a no-op if the column already has that name)
final_data_4.rename(columns={"model_response_parsed":"response"},inplace=True)

In [73]:
# Viewing newly updated dataframe
final_data_4

Unnamed: 0,support_tick_id,support_ticket_text,category,tags,priority,eta,response
0,ST2023-006,My internet connection has significantly slowe...,Technical Issues,"[Connection Issues, Internet, Slow Connection]",High,1 Day,"Dear Valued Customer,\n\nWe apologize for the ..."
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware Issues,[Hardware],High,1 Day,"Dear Valued Customer,\n\nWe understand that yo..."
2,ST2023-008,I've accidentally deleted essential work docum...,Data Recovery,[Data Loss],High,1 Day,"Dear Valued Customer,\n\nWe understand the urg..."
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Technical Issues,"[Wifi, Connection Issues]",Medium,3 Days,"Dear Valued Customer,\n\nWe apologize for any ..."
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Hardware Issues,[Battery],Medium,3 Days,"Dear Valued Customer,\n\nWe're sorry to hear t..."
5,ST2023-011,I'm locked out of my online banking account an...,Technical Issues,"[Account Access, Password Reset]",High,1 Day,"Dear Valued Customer,\n\nWe're sorry to hear t..."
6,ST2023-012,"My computer's performance is sluggish, severel...",Technical Issues,[Performance Issues],High,1 Day,"Dear Valued Customer,\n\nWe apologize for any ..."
7,ST2023-013,I'm experiencing a recurring blue screen error...,Hardware Issues,[Hardware Issues],High,3 Days,"Dear Valued Customer,\n\nWe apologize for the ..."
8,ST2023-014,My external hard drive isn't being recognized ...,Data Recovery,[Data Loss],High,3 Days,"Dear Valued Customer,\n\nWe understand that lo..."
9,ST2023-015,The graphics card in my gaming laptop seems to...,Hardware Issues,[Graphics Card],High,3 Days,"Dear Valued Customer,\n\nWe apologize for any ..."


## **Model Output Analysis**

In [74]:
# Creating a copy of the dataframe of task 4
final_data = final_data_4.copy()

In [75]:
# Value counts of category
final_data['category'].value_counts()

Unnamed: 0_level_0,count
category,Unnamed: 1_level_1
Technical Issues,8
Hardware Issues,7
Data Recovery,6


The model assigned a **category** of:
> "Technical Issues" to 8 tickets

> "Hardware Issues" to 7 tickets

> "Data Recovery" to 6 tickets

In [76]:
# Value counts of priority
final_data["priority"].value_counts()

Unnamed: 0_level_0,count
priority,Unnamed: 1_level_1
High,19
Medium,2


The model assigned a **priority** of:

> "High" to 19 tickets

> "Medium" to 2 tickets
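
The skew toward "High" is easier to judge as a share of all tickets. The sketch below rebuilds the priority column from the counts printed above (the Series is reconstructed for illustration; in the notebook the same call would run on `final_data["priority"]`):

```python
import pandas as pd

# Rebuild the priority column from the printed counts: 19 High, 2 Medium.
priority = pd.Series(["High"] * 19 + ["Medium"] * 2, name="priority")

# normalize=True returns fractions instead of raw counts.
share = priority.value_counts(normalize=True).mul(100).round(1)
print(share)
```

Roughly nine in ten tickets land in the top bucket, which leaves the priority field with little power to rank tickets against each other.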

In [77]:
# Value counts of ETA
final_data["eta"].value_counts()

Unnamed: 0_level_0,count
eta,Unnamed: 1_level_1
3 Days,12
1 Day,9


The model assigned an **ETA** of:
> "3 Days" to 12 tickets

> "1 Day" to 9 tickets

Let's break the ETA down by category.

In [78]:
# Group the data by category and ETA.
final_data.groupby(['category', 'eta']).support_tick_id.count()

Unnamed: 0_level_0,Unnamed: 1_level_0,support_tick_id
category,eta,Unnamed: 2_level_1
Data Recovery,1 Day,1
Data Recovery,3 Days,5
Hardware Issues,1 Day,2
Hardware Issues,3 Days,5
Technical Issues,1 Day,6
Technical Issues,3 Days,2


> Most "Data Recovery" tickets (5 of 6) are estimated by the model to be resolved in "3 Days".

> Most "Hardware Issues" tickets (5 of 7) are estimated by the model to be resolved in "3 Days".

> Most "Technical Issues" tickets (6 of 8) are estimated by the model to be resolved in "1 Day".
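
The same breakdown can be laid out as a two-way table, which puts both ETA buckets for a category on one row. The sketch below rebuilds a ticket-level frame from the grouped counts above (the frame is reconstructed for illustration; in the notebook `pd.crosstab` could be called directly on `final_data`):

```python
import pandas as pd

# Grouped counts copied from the groupby output above.
counts = {
    ("Data Recovery", "1 Day"): 1,
    ("Data Recovery", "3 Days"): 5,
    ("Hardware Issues", "1 Day"): 2,
    ("Hardware Issues", "3 Days"): 5,
    ("Technical Issues", "1 Day"): 6,
    ("Technical Issues", "3 Days"): 2,
}

# Expand the counts back into one row per ticket.
rows = [
    {"category": category, "eta": eta}
    for (category, eta), n in counts.items()
    for _ in range(n)
]
tickets = pd.DataFrame(rows)

# margins=True adds an "All" row and column with the totals.
table = pd.crosstab(tickets["category"], tickets["eta"], margins=True)
print(table)
```

The "All" margins should reproduce the category and ETA totals from the earlier `value_counts()` cells, which makes the table a convenient consistency check.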

In [79]:
# Final output generated by the model.
final_data.head()

Unnamed: 0,support_tick_id,support_ticket_text,category,tags,priority,eta,response
0,ST2023-006,My internet connection has significantly slowe...,Technical Issues,"[Connection Issues, Internet, Slow Connection]",High,1 Day,"Dear Valued Customer,\n\nWe apologize for the ..."
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware Issues,[Hardware],High,1 Day,"Dear Valued Customer,\n\nWe understand that yo..."
2,ST2023-008,I've accidentally deleted essential work docum...,Data Recovery,[Data Loss],High,1 Day,"Dear Valued Customer,\n\nWe understand the urg..."
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Technical Issues,"[Wifi, Connection Issues]",Medium,3 Days,"Dear Valued Customer,\n\nWe apologize for any ..."
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Hardware Issues,[Battery],Medium,3 Days,"Dear Valued Customer,\n\nWe're sorry to hear t..."


## **Actionable Insights and Recommendations**

**Insights:**
> Detailed company information in the prompts produces better model output.

> Priority levels should be adjusted to align with your business's actual capabilities; the model marked 19 of 21 tickets "High".

> Responses can be tailored to a specific business by adjusting the prompts or post-processing the outputs.

> Categories can be adjusted or expanded to match your business's support needs.

> Overall, the model's estimated resolution times appear to align with real-world scenarios.

**Recommendations:**
> Fine-tune the model on your company's data or profile for improved performance.

> Adjust the "priority" of support tickets to reflect what the business can actually handle.

> Evaluate the format of the responses against the delivery method (e.g., email) they will be sent through.

> Test the model thoroughly with actual data before implementation.
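
One possible way to act on the last recommendation is to score the model's fields against a small hand-labeled sample. In the sketch below, the `_true` columns, the helper `field_accuracy`, and every row of data are hypothetical placeholders, not results from this notebook:

```python
import pandas as pd

# Hypothetical sample: model output next to human labels for the same tickets.
labeled = pd.DataFrame({
    "category": ["Technical Issues", "Hardware Issues", "Data Recovery", "Technical Issues"],
    "category_true": ["Technical Issues", "Hardware Issues", "Data Recovery", "Hardware Issues"],
    "priority": ["High", "High", "High", "Medium"],
    "priority_true": ["High", "Medium", "High", "Medium"],
})


def field_accuracy(df, field):
    """Share of rows where the model's value matches the human label."""
    return (df[field] == df[f"{field}_true"]).mean()


for field in ["category", "priority"]:
    print(field, field_accuracy(labeled, field))
```

Exact-match accuracy suits single-valued fields such as category, priority, and ETA; list-valued tags and free-text responses would need a different measure.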