## **Problem Statement**

### **Business Context**

In today's business landscape, organizations increasingly recognize that customer feedback shapes the direction of their products and services. The ability to **respond to customer input quickly and effectively** improves customer experience and drives growth, longer customer engagement, and stronger lifetime-value relationships.

For a Product Manager or Product Analyst, staying attuned to the voice of the customer is not just a best practice; it is a strategic imperative.

Your organization may receive a large volume of customer feedback and support tickets, but your role involves more than processing these inputs. To manage customer experience and expectations effectively, you need a structured approach: a method for identifying the most pressing issues, setting priorities, and allocating resources judiciously.

One effective strategy available to an organization is automated support ticket categorization, **which today can be done with Large Language Models and Generative AI.**


### **Objective**

Develop a Generative AI application using a Large Language Model to **automate the classification and processing of support tickets.** The application will aim to predict ticket categories, assign priority, suggest estimated resolution times, generate responses based on sentiment analysis, and store the results in a structured DataFrame.


## **Installing and Importing Necessary Libraries and Dependencies**

In [1]:
# Installing a CPU-only build of llama-cpp-python (cuBLAS GPU support turned off)
!CMAKE_ARGS="-DLLAMA_CUBLAS=off" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q

In [3]:
# Installing the Hugging Face Hub client and pandas
!pip install huggingface_hub==0.20.3 pandas==1.5.3 -q

In [5]:
# Importing library for data manipulation
import pandas as pd

# Function to download the model from the Hugging Face model hub
from huggingface_hub import hf_hub_download

# Importing the Llama class from the llama_cpp module
from llama_cpp import Llama

# Importing the json module
import json

## **Loading the Dataset**

In [6]:
# Mounting the Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [7]:
# Reading the CSV file
data = pd.read_csv("/content/drive/MyDrive/AL ML program/Introduction to Natural Language Processing/project6/Support_ticket_text_data_mid_term.csv")

## **Data Overview**

In [8]:
# Checking the first 5 rows of the data
data.head()

,support_tick_id,support_ticket_text
0,ST2023-006,My internet connection has significantly slowe...
1,ST2023-007,Urgent help required! My laptop refuses to sta...
2,ST2023-008,I've accidentally deleted essential work docum...
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...
4,ST2023-010,"My smartphone battery is draining rapidly, eve..."


In [9]:
# Checking the shape of the data
data.shape

(21, 2)

In [10]:
# Checking for missing values in the data
data.isnull().sum()

support_tick_id        0
support_ticket_text    0
dtype: int64

## **Model Building**

### Loading the model

In [11]:
# Specifying the Hugging Face repository and file of the quantized TheBloke/Llama-2-13B-chat-GGUF model
model_name_or_path = "TheBloke/Llama-2-13B-chat-GGUF"
model_basename = "llama-2-13b-chat.Q5_K_M.gguf" # the model is in gguf format

In [12]:
# Using hf_hub_download to download the model from the Hugging Face model hub
# The repo_id parameter specifies the model name or path in the Hugging Face repository
# The filename parameter specifies the name of the file to download
model_path = hf_hub_download(
    repo_id = model_name_or_path,
    filename = model_basename
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


llama-2-13b-chat.Q5_K_M.gguf:   0%|          | 0.00/9.23G [00:00<?, ?B/s]

In [13]:
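# Creating an instance of the 'Llama' class with specified parameters
lcpp_llm = Llama(
    model_path=model_path,
    n_threads=2,      # number of CPU threads used for generation
    n_batch=512,      # prompt tokens processed per batch
    n_gpu_layers=43,  # layers to offload to a GPU; no effect with the CPU-only build installed above
    n_ctx=4096,       # context window (prompt + generated tokens)
)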

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 


### Utility functions

In [1]:
def generate_llama_response(instruction, review):

    # Wrapping the instruction as the system message in Llama-2 chat tags
    system_message = """
        [INST]<<SYS>>
        {}
        <</SYS>>[/INST]
    """.format(instruction)

    # Building the prompt: the ticket text ('review') followed by the system message
    prompt = f"{review}\n{system_message}"

    # Generating a response from the Llama model
    response = lcpp_llm(
        prompt=prompt,
        max_tokens=1024,      # upper limit on generated tokens
        temperature=0.01,     # near-greedy decoding for consistent labels
        top_p=0.95,
        repeat_penalty=1.2,
        top_k=50,
        stop=['INST'],        # stop if the model emits 'INST' (e.g. starts a new [INST] block)
        echo=False,           # do not repeat the prompt in the output
    )

    # Extracting the generated text from the response
    response_text = response["choices"][0]["text"]
    return response_text
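
In `generate_llama_response`, the ticket text is placed before an `[INST]` block that contains only the instruction, so the ticket sits outside the instruction markers. Llama-2-chat models were trained on a template in which the system prompt is wrapped in `<<SYS>>` tags inside the first `[INST]` block and is followed by the user message. The sketch below builds a prompt in that layout. `build_llama2_prompt` is a hypothetical helper; the BOS token `<s>` is left out on the assumption that llama-cpp-python adds it during tokenization; and whether this layout reduces chatty preambles such as "Sure! Here's..." on this dataset is untested.

```python
def build_llama2_prompt(system_message: str, user_message: str) -> str:
    """Build a single-turn Llama-2-chat prompt (hypothetical helper).

    The BOS token (<s>) is omitted, assuming the tokenizer prepends it.
    """
    # The system prompt goes inside <<SYS>> tags within the first [INST] block,
    # followed by the user's message (here, the support ticket text).
    return (
        f"[INST] <<SYS>>\n{system_message.strip()}\n<</SYS>>\n\n"
        f"{user_message.strip()} [/INST]"
    )


# Usage sketch with placeholder text
prompt = build_llama2_prompt(
    "Categorize the support ticket.",
    "My laptop refuses to start.",
)
print(prompt)
```

To try it, `prompt = build_llama2_prompt(instruction, review)` could replace the f-string inside `generate_llama_response`; the outputs shown in this notebook were produced with the original layout.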

## **Task 1: Ticket Categorization and Returning Structured Output**

In [2]:
# Creating a copy of the data
data_1 = data.copy()

In [16]:
# Writing a prompt to get the desired output for ticket categorization
instruction_1 = """
    Read the following support ticket text and categorize the issue. Provide the category for a response or solution. Use the following format:

    Category: [Choose from Hardware Issues, Data Recovery, Technical Issues]
"""

In [17]:
# Applying the prompt to the model and getting the model response for ticket categorization
data_1['llama_response'] = data_1['support_ticket_text'].apply(lambda x: generate_llama_response(instruction_1, x))

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit

In [None]:
# Checking the first five rows of the data to confirm whether the new column has been added
data_1.head()

,support_tick_id,support_ticket_text,llama_response
0,ST2023-006,My internet connection has significantly slowe...,Sure! Here's the categorization of your suppo...
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Category: Technical Issues
2,ST2023-008,I've accidentally deleted essential work docum...,Category: Data Recovery\n\nBased on your desc...
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Sure! Here's the categorization of your suppo...
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Category: Technical Issues\n\nBased on your d...


In [None]:
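# Defining a function to extract the category from the model response.
# The checks run in a fixed order, so a response that mentions several
# categories is assigned the first one checked ('Technical issues').
def extract_category(model_response):
    response_lower = model_response.lower()
    if 'technical issues' in response_lower:
        return 'Technical issues'
    elif 'hardware issues' in response_lower:
        return 'Hardware issues'
    elif 'data recovery' in response_lower:
        return 'Data recovery'
    # No known category found in the response
    return None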
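
Because `extract_category` checks the labels in a fixed order, a response is labelled 'Technical issues' whenever that phrase appears anywhere in it (for example, when the model restates the option list), even if its actual answer is a different category. Below is a sketch of a stricter extractor, assuming the model usually follows the requested `Category: ...` format: it first reads the label after `Category:` and only falls back to the earliest label mentioned anywhere. `extract_category_labeled` and `CATEGORIES` are hypothetical names.

```python
import re
from typing import Optional

# Canonical labels, matching the options given in instruction_1
CATEGORIES = ("Hardware Issues", "Data Recovery", "Technical Issues")


def extract_category_labeled(model_response: str) -> Optional[str]:
    """Return the label after 'Category:', else the earliest label
    mentioned anywhere in the response, else None."""
    text = model_response.lower()

    # 1) Prefer an explicit "Category: <label>" (optionally "[<label>").
    match = re.search(r"category\s*:\s*\[?\s*([a-z ]+)", text)
    if match:
        candidate = match.group(1).strip()
        for label in CATEGORIES:
            if candidate.startswith(label.lower()):
                return label

    # 2) Fall back to whichever label appears first in the text.
    positions = {label: text.find(label.lower()) for label in CATEGORIES}
    found = {label: pos for label, pos in positions.items() if pos != -1}
    if found:
        return min(found, key=found.get)
    return None
```

The title-cased labels match those produced in Task 2, which makes the two tasks' results easier to compare, e.g. `data_1['llama_response'].apply(extract_category_labeled)`.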

In [None]:
# Applying the extract category function to draw the category type from the model response
data_1['Category'] = data_1['llama_response'].apply(extract_category)
data_1['Category'].head()

0    Technical issues
1    Technical issues
2       Data recovery
3    Technical issues
4    Technical issues
Name: Category, dtype: object

In [None]:
# Dropping the raw model response column from the dataset
final_data_1 = data_1.drop(['llama_response'], axis=1)
final_data_1.head(21)

,support_tick_id,support_ticket_text,Category
0,ST2023-006,My internet connection has significantly slowe...,Technical issues
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Technical issues
2,ST2023-008,I've accidentally deleted essential work docum...,Data recovery
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Technical issues
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Technical issues
5,ST2023-011,I'm locked out of my online banking account an...,Technical issues
6,ST2023-012,"My computer's performance is sluggish, severel...",Technical issues
7,ST2023-013,I'm experiencing a recurring blue screen error...,Hardware issues
8,ST2023-014,My external hard drive isn't being recognized ...,Data recovery
9,ST2023-015,The graphics card in my gaming laptop seems to...,Hardware issues


### Observation

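The model returned a response for each support ticket, and `extract_category` mapped the responses to one of the three categories, which were added to the dataset as a new column. Because the free-text responses vary in format (some begin with "Sure! Here's the categorization..."), the category is recovered by keyword matching rather than from a fixed output structure.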

## **Task 2: Ticket Categorization and Returning Structured Output**

In [None]:
# Creating a copy of the data
data_2 = data.copy()

In [None]:
## Writing a prompt to get the desired output
instruction_2 = """
    Analyze the support ticket text provided and determine the category of the issue. Output the result in JSON format where the 'Category' field should be one of "Hardware Issues", "Data Recovery", or "Technical Issues".

The output should be in the form of a JSON with
"Category": "<Category>"

    Replace the placeholders (<...>) with the appropriate information extracted from the support ticket text.
"""


In [None]:
# Applying the prompt to the model and getting the model response for ticket categorization
data_2['llama_response'] = data_2['support_ticket_text'].apply(lambda x: generate_llama_response(instruction_2,x))

Llama.generate: prefix-match hit

llama_print_timings:        load time =     766.75 ms
llama_print_timings:      sample time =       9.04 ms /    15 runs   (    0.60 ms per token,  1659.11 tokens per second)
llama_print_timings: prompt eval time =     564.58 ms /   171 tokens (    3.30 ms per token,   302.88 tokens per second)
llama_print_timings:        eval time =     856.37 ms /    14 runs   (   61.17 ms per token,    16.35 tokens per second)
llama_print_timings:       total time =    1474.81 ms /   185 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     766.75 ms
llama_print_timings:      sample time =       8.79 ms /    15 runs   (    0.59 ms per token,  1707.46 tokens per second)
llama_print_timings: prompt eval time =     596.63 ms /   170 tokens (    3.51 ms per token,   284.93 tokens per second)
llama_print_timings:        eval time =     834.16 ms /    14 runs   (   59.58 ms per token,    16.78 tokens per second)
llama_print_timings:       to

In [None]:
# Checking the first five rows of the data to confirm whether the new column has been added
data_2.head()

,support_tick_id,support_ticket_text,llama_response
0,ST2023-006,My internet connection has significantly slowe...,"{\n""Category"": ""Technical Issues""\n}"
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{\n""Category"": ""Hardware Issues""\n}"
2,ST2023-008,I've accidentally deleted essential work docum...,"{\n""Category"": ""Data Recovery""\n}"
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{\n""Category"": ""Technical Issues""\n}"
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{\n""Category"": ""Technical Issues""\n}"


In [None]:
# Defining a function to parse the JSON object embedded in the model output
def extract_json_data(json_str):
    try:
        # Finding the indices of the first opening and last closing curly braces
        json_start = json_str.find('{')
        json_end = json_str.rfind('}')

        if json_start != -1 and json_end != -1:
            json_candidate = json_str[json_start:json_end + 1]  # Slice out the JSON object
            data_dict = json.loads(json_candidate)
            return data_dict
        else:
            print(f"Warning: JSON object not found in response: {json_str}")
            return {}
    except json.JSONDecodeError as e:
        print(f"Error parsing JSON: {e}")
        return {}
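
`extract_json_data` slices from the first `{` to the last `}`. The Task 4 and Task 5 responses append explanatory text after the JSON object; if that text ever contains a closing brace, the slice is no longer valid JSON and the function returns `{}`. The sketch below is an alternative built on the standard library's `json.JSONDecoder.raw_decode`, which decodes one JSON value from the start of a string and reports where it ended, so any trailing text is ignored. `extract_first_json_object` is a hypothetical name.

```python
import json
from typing import Any, Dict


def extract_first_json_object(text: str) -> Dict[str, Any]:
    """Decode the first JSON object in `text`, ignoring any text after it.

    Returns {} if no decodable JSON object is found.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses a single JSON value at the start of the string
            # and returns (value, end_index); trailing prose is not parsed.
            obj, _ = decoder.raw_decode(text[start:])
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        # This '{' did not begin a valid object; try the next one.
        start = text.find("{", start + 1)
    return {}
```

It could be used as a drop-in replacement, e.g. `data_4['llama_response'].apply(extract_first_json_object)`.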

In [None]:
# Applying the extract_json_data function on the llama_response column to create a new column called llama_response_parsed
data_2['llama_response_parsed'] = data_2['llama_response'].apply(extract_json_data)
data_2['llama_response_parsed'].head()

0    {'Category': 'Technical Issues'}
1     {'Category': 'Hardware Issues'}
2       {'Category': 'Data Recovery'}
3    {'Category': 'Technical Issues'}
4    {'Category': 'Technical Issues'}
Name: llama_response_parsed, dtype: object

In [None]:
# Expanding the parsed dictionaries in llama_response_parsed into columns with pd.json_normalize
llama_response_parsed_df_2 = pd.json_normalize(data_2['llama_response_parsed'])
llama_response_parsed_df_2.head()

,Category
0,Technical Issues
1,Hardware Issues
2,Data Recovery
3,Technical Issues
4,Technical Issues


In [None]:
# Concatenating data_2 and llama_response_parsed_df_2
data_with_parsed_model_output_2 = pd.concat([data_2, llama_response_parsed_df_2], axis=1)
data_with_parsed_model_output_2.head()

,support_tick_id,support_ticket_text,llama_response,llama_response_parsed,Category
0,ST2023-006,My internet connection has significantly slowe...,"{\n""Category"": ""Technical Issues""\n}",{'Category': 'Technical Issues'},Technical Issues
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{\n""Category"": ""Hardware Issues""\n}",{'Category': 'Hardware Issues'},Hardware Issues
2,ST2023-008,I've accidentally deleted essential work docum...,"{\n""Category"": ""Data Recovery""\n}",{'Category': 'Data Recovery'},Data Recovery
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{\n""Category"": ""Technical Issues""\n}",{'Category': 'Technical Issues'},Technical Issues
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{\n""Category"": ""Technical Issues""\n}",{'Category': 'Technical Issues'},Technical Issues


In [None]:
# Dropping the raw and parsed model-response columns ('llama_response', 'llama_response_parsed') from the dataset
final_data_2 = data_with_parsed_model_output_2.drop(['llama_response','llama_response_parsed'], axis=1)
final_data_2.head(21)

,support_tick_id,support_ticket_text,Category
0,ST2023-006,My internet connection has significantly slowe...,Technical Issues
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware Issues
2,ST2023-008,I've accidentally deleted essential work docum...,Data Recovery
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Technical Issues
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Technical Issues
5,ST2023-011,I'm locked out of my online banking account an...,Technical Issues
6,ST2023-012,"My computer's performance is sluggish, severel...",Technical Issues
7,ST2023-013,I'm experiencing a recurring blue screen error...,Hardware Issues
8,ST2023-014,My external hard drive isn't being recognized ...,Data Recovery
9,ST2023-015,The graphics card in my gaming laptop seems to...,Hardware Issues


### Observation

The model has identified and extracted the category for each support ticket from its JSON response, incorporating this information into the dataset.

Technical Issues is the most frequent category (9 of 21 tickets), followed by Data Recovery (7) and Hardware Issues (5). In this sample, users report general technical difficulties more often than hardware faults or data loss, although 21 tickets are too few to generalize from.
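
Counts like these can be recomputed from the final DataFrame instead of being read off the table; in pandas, `final_data_2['Category'].value_counts()` does this, although it drops missing values by default. The stdlib sketch below also reports each label's share and counts unparsed rows (`None`, or `NaN` from pandas) under a separate "Unparsed" key so parsing failures stay visible. `label_distribution` is a hypothetical helper, and the sample labels in the usage line are made up.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def label_distribution(labels: Iterable[object]) -> Dict[str, Tuple[int, float]]:
    """Return {label: (count, share)} for an iterable of labels.

    Anything that is not a string (None, or NaN from pandas) is counted
    under "Unparsed" so that parsing failures remain visible.
    """
    counts = Counter(
        label if isinstance(label, str) else "Unparsed" for label in labels
    )
    total = sum(counts.values())
    # most_common() orders labels from most to least frequent
    return {label: (n, round(n / total, 3)) for label, n in counts.most_common()}


# Usage sketch with made-up labels (not the notebook's results)
print(label_distribution(["Technical Issues", "Data Recovery", "Technical Issues", None]))
```

Passing `final_data_2['Category']` directly works because a pandas Series is iterable.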

## **Task 3: Ticket Categorization, Creating Tags, and Returning Structured Output**

In [None]:
# Creating a copy of the data
data_3 = data.copy()

In [None]:
# Writing a prompt to get the desired output
instruction_3 = """
    Analyze the support ticket text provided and determine the tag of the issue. Output the result in JSON format where the 'Tags' field should be the list of key terms related to the issue.

The output should be in the form of a JSON with
"Tags": "<Tags>"

    Replace the placeholders (<...>) by a list of key terms related to the issue extracted from the support ticket text.
"""

In [None]:
# Applying the prompt to the model and getting the model response to create tags
data_3['llama_response'] = data_3['support_ticket_text'].apply(lambda x: generate_llama_response(instruction_3,x))

Llama.generate: prefix-match hit

llama_print_timings:        load time =     766.75 ms
llama_print_timings:      sample time =      28.93 ms /    44 runs   (    0.66 ms per token,  1521.07 tokens per second)
llama_print_timings: prompt eval time =     546.29 ms /   165 tokens (    3.31 ms per token,   302.03 tokens per second)
llama_print_timings:        eval time =    2547.66 ms /    43 runs   (   59.25 ms per token,    16.88 tokens per second)
llama_print_timings:       total time =    3276.53 ms /   208 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     766.75 ms
llama_print_timings:      sample time =      29.83 ms /    42 runs   (    0.71 ms per token,  1407.98 tokens per second)
llama_print_timings: prompt eval time =     560.49 ms /   164 tokens (    3.42 ms per token,   292.60 tokens per second)
llama_print_timings:        eval time =    2375.89 ms /    41 runs   (   57.95 ms per token,    17.26 tokens per second)
llama_print_timings:       to

In [None]:
# Checking the first five rows of the data to confirm whether the new column has been added
data_3.head()

,support_tick_id,support_ticket_text,llama_response
0,ST2023-006,My internet connection has significantly slowe...,Sure! Here is the analysis of the support tic...
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{\n""Tags"": [""urgent"", ""laptop"", ""start"", ""har..."
2,ST2023-008,I've accidentally deleted essential work docum...,Sure! Here is the analysis of the support tic...
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Sure! Here is the analysis of the support tic...
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Sure! Here's the analysis of the support tick...


In [None]:
# Applying the extract_json_data function on the llama_response column to create a new column called llama_response_parsed
data_3['llama_response_parsed'] = data_3['llama_response'].apply(extract_json_data)
data_3['llama_response_parsed'].head()

0    {'Tags': ['Internet Connection', 'Speed', 'Dis...
1    {'Tags': ['urgent', 'laptop', 'start', 'hardwa...
2    {'Tags': ['Data Loss', 'Essential Documents', ...
3    {'Tags': ['Wi-Fi', 'signal strength', 'weak', ...
4    {'Tags': ['battery', 'draining', 'rapidly', 'm...
Name: llama_response_parsed, dtype: object

In [None]:
# Expanding the parsed dictionaries in llama_response_parsed into columns with pd.json_normalize
llama_response_parsed_df_3 = pd.json_normalize(data_3['llama_response_parsed'])
llama_response_parsed_df_3.head()

,Tags
0,"[Internet Connection, Speed, Disconnections, W..."
1,"[urgent, laptop, start, hardware, issue, prese..."
2,"[Data Loss, Essential Documents, Accidental De..."
3,"[Wi-Fi, signal strength, weak, persistent, hom..."
4,"[battery, draining, rapidly, minimal, use, issue]"


In [None]:
# Concatenating data_3 and llama_response_parsed_df_3
data_with_parsed_model_output_3 = pd.concat([data_3, llama_response_parsed_df_3], axis=1)
data_with_parsed_model_output_3.head()

,support_tick_id,support_ticket_text,llama_response,llama_response_parsed,Tags
0,ST2023-006,My internet connection has significantly slowe...,Sure! Here is the analysis of the support tic...,"{'Tags': ['Internet Connection', 'Speed', 'Dis...","[Internet Connection, Speed, Disconnections, W..."
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{\n""Tags"": [""urgent"", ""laptop"", ""start"", ""har...","{'Tags': ['urgent', 'laptop', 'start', 'hardwa...","[urgent, laptop, start, hardware, issue, prese..."
2,ST2023-008,I've accidentally deleted essential work docum...,Sure! Here is the analysis of the support tic...,"{'Tags': ['Data Loss', 'Essential Documents', ...","[Data Loss, Essential Documents, Accidental De..."
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Sure! Here is the analysis of the support tic...,"{'Tags': ['Wi-Fi', 'signal strength', 'weak', ...","[Wi-Fi, signal strength, weak, persistent, hom..."
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Sure! Here's the analysis of the support tick...,"{'Tags': ['battery', 'draining', 'rapidly', 'm...","[battery, draining, rapidly, minimal, use, issue]"


In [None]:
# Dropping llama_response and llama_response_parsed variables
final_data_3 = data_with_parsed_model_output_3.drop(['llama_response','llama_response_parsed'], axis=1)
final_data_3.head(21)

,support_tick_id,support_ticket_text,Tags
0,ST2023-006,My internet connection has significantly slowe...,"[Internet Connection, Speed, Disconnections, W..."
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"[urgent, laptop, start, hardware, issue, prese..."
2,ST2023-008,I've accidentally deleted essential work docum...,"[Data Loss, Essential Documents, Accidental De..."
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"[Wi-Fi, signal strength, weak, persistent, hom..."
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","[battery, draining, rapidly, minimal, use, issue]"
5,ST2023-011,I'm locked out of my online banking account an...,"[urgent, transaction, password, reset, online,..."
6,ST2023-012,"My computer's performance is sluggish, severel...","[Computer Performance, Sluggish, Productivity,..."
7,ST2023-013,I'm experiencing a recurring blue screen error...,"[Blue Screen Error, Frequent Crashes, Hardware..."
8,ST2023-014,My external hard drive isn't being recognized ...,"[Data Recovery, External Hard Drive, Not Recog..."
9,ST2023-015,The graphics card in my gaming laptop seems to...,"[hardware issue, graphics card, poor gaming pe..."


### Observation

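The model generated a list of tags for each support ticket. Even when the JSON object was preceded by conversational text (e.g., "Sure! Here is the analysis of the support ticket..."), `extract_json_data` recovered it, and the tags were added to the dataset as structured entries. The tags are not consistently formatted: some are in Title Case ("Internet Connection", "Data Loss") while others are lowercase ("urgent", "laptop").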
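
Because the tag lists mix Title Case and lowercase, the same concept can appear under several spellings across tickets. The prompt's template also shows `"Tags": "<Tags>"` as a string, so a response could return a comma-separated string instead of a list. The following minimal normalization sketch assumes lowercase tags are acceptable downstream: it accepts either form, trims and lowercases each tag, and drops duplicates while keeping their order. `normalize_tags` is a hypothetical helper.

```python
from typing import Any, List


def normalize_tags(tags: Any) -> List[str]:
    """Lowercase, trim, and de-duplicate tags while preserving order."""
    if isinstance(tags, str):
        # The prompt's template shows "Tags" as a string, so also accept
        # a comma-separated string of tags.
        tags = tags.split(",")
    elif not isinstance(tags, (list, tuple)):
        # e.g. NaN from json_normalize when a response could not be parsed
        return []

    seen = set()
    normalized = []
    for tag in tags:
        # Collapse internal whitespace and ignore case differences
        key = " ".join(str(tag).split()).lower()
        if key and key not in seen:
            seen.add(key)
            normalized.append(key)
    return normalized
```

Applied to the final table, this would be `final_data_3['Tags'] = final_data_3['Tags'].apply(normalize_tags)`.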

## **Task 4: Ticket Categorization, Creating Tags, Assigning Priority, and Returning Structured Output**

In [None]:
# Creating a copy of the data
data_4 = data.copy()

In [None]:
## Writing a prompt to get the desired output
instruction_4 = """
    Analyze the support ticket text provided and determine the priority of the issue. Output the result in JSON format where the 'Priority' field should be one of "High", "Medium", or "Low".

The output should be in the form of a JSON with
"Priority": "<Priority>"

    Replace the placeholders (<...>) with the appropriate information extracted from the support ticket text.
"""

In [None]:
# Applying generate_llama_response with instruction_4 to each ticket in the 'support_ticket_text' column of data_4 to assign a priority
data_4['llama_response'] = data_4['support_ticket_text'].apply(lambda x: generate_llama_response(instruction_4,x))

Llama.generate: prefix-match hit

llama_print_timings:        load time =     766.75 ms
llama_print_timings:      sample time =      47.08 ms /    78 runs   (    0.60 ms per token,  1656.61 tokens per second)
llama_print_timings: prompt eval time =     549.32 ms /   166 tokens (    3.31 ms per token,   302.19 tokens per second)
llama_print_timings:        eval time =    4592.22 ms /    77 runs   (   59.64 ms per token,    16.77 tokens per second)
llama_print_timings:       total time =    5412.39 ms /   243 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     766.75 ms
llama_print_timings:      sample time =       6.76 ms /    12 runs   (    0.56 ms per token,  1776.20 tokens per second)
llama_print_timings: prompt eval time =     579.16 ms /   165 tokens (    3.51 ms per token,   284.89 tokens per second)
llama_print_timings:        eval time =     657.26 ms /    11 runs   (   59.75 ms per token,    16.74 tokens per second)
llama_print_timings:       to

In [None]:
# Checking the first five rows of the data to confirm whether the new column has been added
data_4.head()

,support_tick_id,support_ticket_text,llama_response
0,ST2023-006,My internet connection has significantly slowe...,"{\n""Priority"": ""High""\n}\n\nThe support ticke..."
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{\n""Priority"": ""High""\n}"
2,ST2023-008,I've accidentally deleted essential work docum...,"{\n""Priority"": ""High""\n}"
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{\n""Priority"": ""Medium""\n}\n\nThe support tic..."
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{\n""Priority"": ""Medium""\n}\n\nBased on the in..."


In [None]:
## Applying the extract_json_data function on the llama_response column to create a new column called llama_response_parsed
data_4['llama_response_parsed'] = data_4['llama_response'].apply(extract_json_data)
data_4['llama_response_parsed'].head()

0      {'Priority': 'High'}
1      {'Priority': 'High'}
2      {'Priority': 'High'}
3    {'Priority': 'Medium'}
4    {'Priority': 'Medium'}
Name: llama_response_parsed, dtype: object

In [None]:
## Expanding the parsed dictionaries in llama_response_parsed into columns with pd.json_normalize
llama_response_parsed_df_4 = pd.json_normalize(data_4['llama_response_parsed'])
llama_response_parsed_df_4.head()

,Priority
0,High
1,High
2,High
3,Medium
4,Medium


In [None]:
## Concatenating data_4 and llama_response_parsed_df_4
data_with_parsed_model_output_4 = pd.concat([data_4, llama_response_parsed_df_4], axis=1)
data_with_parsed_model_output_4.head()

,support_tick_id,support_ticket_text,llama_response,llama_response_parsed,Priority
0,ST2023-006,My internet connection has significantly slowe...,"{\n""Priority"": ""High""\n}\n\nThe support ticke...",{'Priority': 'High'},High
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{\n""Priority"": ""High""\n}",{'Priority': 'High'},High
2,ST2023-008,I've accidentally deleted essential work docum...,"{\n""Priority"": ""High""\n}",{'Priority': 'High'},High
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{\n""Priority"": ""Medium""\n}\n\nThe support tic...",{'Priority': 'Medium'},Medium
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{\n""Priority"": ""Medium""\n}\n\nBased on the in...",{'Priority': 'Medium'},Medium


In [None]:
## Dropping llama_response and llama_response_parsed variables
final_data_4 = data_with_parsed_model_output_4.drop(['llama_response','llama_response_parsed'], axis=1)
final_data_4.head(21)

,support_tick_id,support_ticket_text,Priority
0,ST2023-006,My internet connection has significantly slowe...,High
1,ST2023-007,Urgent help required! My laptop refuses to sta...,High
2,ST2023-008,I've accidentally deleted essential work docum...,High
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Medium
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Medium
5,ST2023-011,I'm locked out of my online banking account an...,High
6,ST2023-012,"My computer's performance is sluggish, severel...",High
7,ST2023-013,I'm experiencing a recurring blue screen error...,High
8,ST2023-014,My external hard drive isn't being recognized ...,High
9,ST2023-015,The graphics card in my gaming laptop seems to...,High


### Observation

The model assigned a priority to each support ticket; the priority was extracted from the JSON response and added to the dataset.

Most tickets (17 out of 21) are marked 'High' priority and the remaining 4 'Medium'; none are marked 'Low'. This may mean the model over-assigns 'High', or that users mostly report urgent issues. Because the prompt does not define what each priority level means, adding explicit criteria for each level to the prompt is a natural next step for refining the assignment.
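
Nothing in the pipeline checks that a parsed value is one of the labels the prompt allows; `pd.json_normalize` would accept 'Critical' or 'high' just as readily. Before interpreting the priority distribution, a check like the sketch below can map each parsed value to its canonical spelling (case-insensitively) and return `None` for anything outside the allowed set, so unexpected labels surface as missing values. `validated_field` is a hypothetical helper; the same function applies to the 'Category' and 'ETA' fields.

```python
from typing import Any, Iterable, Optional


def validated_field(parsed: Any, key: str, allowed: Iterable[str]) -> Optional[str]:
    """Return parsed[key] in its canonical spelling if it matches one of
    `allowed` (case-insensitive, surrounding spaces ignored), else None."""
    if not isinstance(parsed, dict):
        return None
    value = parsed.get(key)
    if not isinstance(value, str):
        return None
    # Map lowercase spellings back to the canonical labels
    canonical = {label.lower(): label for label in allowed}
    return canonical.get(value.strip().lower())
```

For example: `data_4['llama_response_parsed'].apply(lambda d: validated_field(d, 'Priority', ['High', 'Medium', 'Low']))`.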

## **Task 5: Ticket Categorization, Creating Tags, Assigning Priority, Assigning ETA, and Returning Structured Output**

In [None]:
# Creating a copy of the data
data_5 = data.copy()

In [None]:
# Writing a prompt to get the desired output
instruction_5 = """
    Analyze the support ticket text provided and determine the ETA of the issue. Output the result in JSON format where the 'ETA' field is an estimated time of arrival for a response or solution and should be one of '24 hours', 'Immediate' or '2-3 business days'.

    The output should be in the form of a JSON with
    "ETA": "<ETA>"

    Replace the placeholders (<...>) with the appropriate information extracted from the support ticket text.
"""

In [None]:
# Creating a new column 'llama_response' by applying the generate_llama_response function to each ticket in the 'support_ticket_text' column of the DataFrame 'data_5'
data_5['llama_response'] = data_5['support_ticket_text'].apply(lambda x: generate_llama_response(instruction_5,x))

Llama.generate: prefix-match hit

llama_print_timings:        load time =     766.75 ms
llama_print_timings:      sample time =      69.20 ms /   108 runs   (    0.64 ms per token,  1560.58 tokens per second)
llama_print_timings: prompt eval time =     564.04 ms /   186 tokens (    3.03 ms per token,   329.76 tokens per second)
llama_print_timings:        eval time =    6408.12 ms /   107 runs   (   59.89 ms per token,    16.70 tokens per second)
llama_print_timings:       total time =    7425.77 ms /   293 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     766.75 ms
llama_print_timings:      sample time =      40.46 ms /    67 runs   (    0.60 ms per token,  1655.87 tokens per second)
llama_print_timings: prompt eval time =     587.38 ms /   185 tokens (    3.18 ms per token,   314.96 tokens per second)
llama_print_timings:        eval time =    3943.83 ms /    66 runs   (   59.75 ms per token,    16.74 tokens per second)
llama_print_timings:       to

In [None]:
# Checking the first five rows of the data to confirm whether the new column has been added
data_5.head()

,support_tick_id,support_ticket_text,llama_response
0,ST2023-006,My internet connection has significantly slowe...,"{\n""ETA"": ""24 hours""\n}\n\nBased on the infor..."
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{\n""ETA"": ""Immediate""\n}\n\nBased on the info..."
2,ST2023-008,I've accidentally deleted essential work docum...,"{\n""ETA"": ""Immediate""\n}\n\nBased on the info..."
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{\n""ETA"": ""2-3 business days""\n}\n\nBased on ..."
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","\n {\n ""ETA"": ""24 hours""\n }\n \..."


In [None]:
## Applying the extract_json_data function on the llama_response column to create a new column called llama_response_parsed
data_5['llama_response_parsed'] = data_5['llama_response'].apply(extract_json_data)
data_5['llama_response_parsed'].head()

0             {'ETA': '24 hours'}
1            {'ETA': 'Immediate'}
2            {'ETA': 'Immediate'}
3    {'ETA': '2-3 business days'}
4             {'ETA': '24 hours'}
Name: llama_response_parsed, dtype: object
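`extract_json_data` is defined earlier in the notebook and is not shown here. The sketch below shows what such a helper has to handle. It is a hypothetical stand-in named `extract_first_json`, not the notebook's function. It scans the raw model text for the first substring that decodes as a JSON object, so leading whitespace, conversational preambles, and trailing commentary such as "Based on the information..." are all tolerated. It relies on the standard library's `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and ignores whatever follows it.

```python
import json
from typing import Optional

_decoder = json.JSONDecoder()


def extract_first_json(text: str) -> Optional[dict]:
    """Return the first JSON object embedded in `text`, or None if there is none.

    Hypothetical helper for illustration; not the notebook's extract_json_data.
    """
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses a single JSON value beginning at `start`
            # and ignores any text that comes after it
            obj, _end = _decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # no valid JSON starts at this brace; try the next one
        start = text.find("{", start + 1)
    return None


# Shapes resembling the raw llama_response values shown above
print(extract_first_json('{\n"ETA": "24 hours"\n}\n\nBased on the information...'))
print(extract_first_json('\n {\n "ETA": "24 hours"\n }\n Some trailing note'))
print(extract_first_json("No JSON here"))
```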

In [None]:
## Applying json_normalize to the llama_response_parsed column
llama_response_parsed_df_5 = pd.json_normalize(data_5['llama_response_parsed'])
llama_response_parsed_df_5.head()

Unnamed: 0,ETA
0,24 hours
1,Immediate
2,Immediate
3,2-3 business days
4,24 hours


In [None]:
## Concatenating data_5 and llama_response_parsed_df_5
data_with_parsed_model_output_5 = pd.concat([data_5, llama_response_parsed_df_5], axis=1)
data_with_parsed_model_output_5.head()

Unnamed: 0,support_tick_id,support_ticket_text,llama_response,llama_response_parsed,ETA
0,ST2023-006,My internet connection has significantly slowe...,"{\n""ETA"": ""24 hours""\n}\n\nBased on the infor...",{'ETA': '24 hours'},24 hours
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{\n""ETA"": ""Immediate""\n}\n\nBased on the info...",{'ETA': 'Immediate'},Immediate
2,ST2023-008,I've accidentally deleted essential work docum...,"{\n""ETA"": ""Immediate""\n}\n\nBased on the info...",{'ETA': 'Immediate'},Immediate
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{\n""ETA"": ""2-3 business days""\n}\n\nBased on ...",{'ETA': '2-3 business days'},2-3 business days
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","\n {\n ""ETA"": ""24 hours""\n }\n \...",{'ETA': '24 hours'},24 hours
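The concatenation above depends on `data_5` and `llama_response_parsed_df_5` sharing the same 0-based index. `pd.concat(..., axis=1)` aligns rows by index label, not by position. The outputs above show a default 0-based index, so this holds here. The minimal sketch below uses toy data (the index values 10 to 12 are made up) to show what happens when the index is not the default, and one way to guard against it.

```python
import pandas as pd

# Toy frame whose index is NOT 0..n-1, e.g. after filtering some rows out
tickets = pd.DataFrame(
    {"support_tick_id": ["ST2023-006", "ST2023-007", "ST2023-008"]},
    index=[10, 11, 12],
)
parsed = pd.Series(
    [{"ETA": "24 hours"}, {"ETA": "Immediate"}, {"ETA": "Immediate"}],
    index=tickets.index,
)

# json_normalize builds a fresh 0..n-1 index, so a plain concat misaligns rows
naive = pd.concat([tickets, pd.json_normalize(parsed.tolist())], axis=1)
print(len(naive))  # 6 rows: the two index sets do not overlap

# Reattaching the original index before concatenating keeps rows aligned
normalized = pd.json_normalize(parsed.tolist())
normalized.index = tickets.index
aligned = pd.concat([tickets, normalized], axis=1)
print(aligned)
```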


In [None]:
## Dropping llama_response and llama_response_parsed variables
final_data_5 = data_with_parsed_model_output_5.drop(['llama_response','llama_response_parsed'], axis=1)
final_data_5.head(21)

Unnamed: 0,support_tick_id,support_ticket_text,ETA
0,ST2023-006,My internet connection has significantly slowe...,24 hours
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Immediate
2,ST2023-008,I've accidentally deleted essential work docum...,Immediate
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,2-3 business days
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",24 hours
5,ST2023-011,I'm locked out of my online banking account an...,Immediate
6,ST2023-012,"My computer's performance is sluggish, severel...",2-3 business days
7,ST2023-013,I'm experiencing a recurring blue screen error...,2-3 business days
8,ST2023-014,My external hard drive isn't being recognized ...,2-3 business days
9,ST2023-015,The graphics card in my gaming laptop seems to...,2-3 business days


# Observation

* The model assigned an ETA to each support ticket. The value was extracted from the JSON response and added to the dataset as the ETA column.

* ETAs span all three levels: '24 hours' is the most common (9 tickets), followed by '2-3 business days' (7 tickets) and 'Immediate' (5 tickets).
  * Because 14 of the 21 tickets received an ETA of 24 hours or less, the model leans toward quick resolution targets. These are predicted estimates, not measured resolution times.
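The ETA labels are plain strings, so sorting or comparing them alphabetically does not reflect urgency. The minimal sketch below encodes them as an ordered categorical and reproduces the counts quoted above (9, 7 and 5). It assumes the three labels seen in this notebook are the only ones the model emits.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Urgency order for the ETA labels observed in this notebook (an assumption:
# any other label the model might emit would become NaN under this dtype)
eta_order = CategoricalDtype(["Immediate", "24 hours", "2-3 business days"], ordered=True)

# Rebuilt from the counts quoted above: 9 x "24 hours", 7 x "2-3 business days", 5 x "Immediate"
eta = pd.Series(["24 hours"] * 9 + ["2-3 business days"] * 7 + ["Immediate"] * 5).astype(eta_order)

# With an ordered dtype, comparisons follow urgency rather than alphabetical order
within_24h = (eta <= "24 hours").mean()
print(eta.value_counts(sort=False))  # counts listed in urgency order
print(f"Share with an ETA of 24 hours or less: {within_24h:.0%}")
```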

## **Task 6 - Ticket Categorization, Creating Tags, Assigning Priority, Assigning ETA, Creating a Draft Response, and Returning Structured Output**

In [None]:
# Creating a copy of the data
data_6 = data.copy()

In [None]:
instruction_6 = """
    Your task is to read the support ticket text and generate a brief customer service response addressing the issue. The response must be written in valid JSON format. It is crucial that your output strictly adheres to the JSON structure, with no exceptions. Here is the template you must use:

    {"Response": "Type your customer service response here."}

    For example, if the support ticket says, "My laptop screen is flickering," your output should look like:

    {"Response": "We're sorry to hear that your laptop screen is flickering. It may be an issue with the display driver or the hardware. We recommend restarting your laptop and updating the display drivers. If the problem continues, please contact our support team for a detailed troubleshooting guide."}

    Now, apply this template to generate a response for the new support ticket text:

    {"Response": "<Your Customer Service Response Here>"}

    Remember, only the placeholder text in quotes should be replaced, and the response must be enclosed in quotation marks to maintain valid JSON format.
"""


In [None]:
# Creating a new column 'llama_response' by applying the generate_llama_response function to each ticket in the 'support_ticket_text' column of the DataFrame 'data_6'
data_6['llama_response'] = data_6['support_ticket_text'].apply(lambda x: generate_llama_response(instruction_6, x))


In [None]:
# Checking the first five rows of the data to confirm whether the new column has been added
data_6.head()

Unnamed: 0,support_tick_id,support_ticket_text,llama_response
0,ST2023-006,My internet connection has significantly slowe...,Sure! Here's a brief customer service respons...
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{""Response"": ""We apologize for the inconvenie..."
2,ST2023-008,I've accidentally deleted essential work docum...,Sure! Here's your customer service response i...
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Sure! Here is a brief customer service respon...
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","Sure! Here's my response: \n\n{""Response"": ""T..."


In [None]:
## Applying the extract_json_data function on the llama_response column to create a new column called llama_response_parsed
data_6['llama_response_parsed'] = data_6['llama_response'].apply(extract_json_data)
data_6['llama_response_parsed'].head()

0    {'Response': 'We apologize for any inconvenien...
1    {'Response': 'We apologize for the inconvenien...
2    {'Response': 'We apologize for any inconvenien...
3    {'Response': 'Thank you for reaching out about...
4    {'Response': 'Thank you for reaching out! We a...
Name: llama_response_parsed, dtype: object
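Several raw outputs above begin with a conversational preamble ("Sure! Here's ...") even though the prompt demands pure JSON. The lenient parser recovers these, but a reply containing no JSON at all could not be recovered. One common mitigation is to re-query the model when parsing fails. The sketch below is hypothetical. `generate` stands in for a call such as `generate_llama_response(instruction_6, ticket)`, and `parse` for a JSON extractor that returns `None` on failure (an assumption about its contract). Both are passed in as plain callables, so the wrapper can be exercised without a model.

```python
import json
from typing import Callable, Optional


def generate_with_retry(
    generate: Callable[[], str],
    parse: Callable[[str], Optional[dict]],
    max_attempts: int = 3,
) -> Optional[dict]:
    """Call `generate` until `parse` returns a dict, trying at most `max_attempts` times."""
    for _ in range(max_attempts):
        parsed = parse(generate())
        if parsed is not None:
            return parsed
    return None  # the caller decides how to handle a ticket that never parsed


def strict_parse(text: str) -> Optional[dict]:
    """Stand-in parser: accept only text that is a complete JSON object."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


# Fake model that ignores the format on its first call and complies on the second
replies = iter(["Sure! I can help with that.", '{"Response": "Please restart the router."}'])
result = generate_with_retry(lambda: next(replies), strict_parse)
print(result)
```

Retrying helps only when generation is stochastic (for example, temperature above zero) or the prompt is adjusted between attempts. A fully deterministic setup would return the same malformed reply each time.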

In [None]:
## Applying json_normalize to the llama_response_parsed column
llama_response_parsed_df_6 = pd.json_normalize(data_6['llama_response_parsed'])
llama_response_parsed_df_6.head()

Unnamed: 0,Response
0,We apologize for any inconvenience caused by y...
1,We apologize for the inconvenience you are exp...
2,We apologize for any inconvenience caused by t...
3,Thank you for reaching out about your persiste...
4,Thank you for reaching out! We apologize for a...


In [None]:
## Concatenating data_6 and llama_response_parsed_df_6
data_with_parsed_model_output_6 = pd.concat([data_6, llama_response_parsed_df_6], axis=1)
data_with_parsed_model_output_6.head()

Unnamed: 0,support_tick_id,support_ticket_text,llama_response,llama_response_parsed,Response
0,ST2023-006,My internet connection has significantly slowe...,Sure! Here's a brief customer service respons...,{'Response': 'We apologize for any inconvenien...,We apologize for any inconvenience caused by y...
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{""Response"": ""We apologize for the inconvenie...",{'Response': 'We apologize for the inconvenien...,We apologize for the inconvenience you are exp...
2,ST2023-008,I've accidentally deleted essential work docum...,Sure! Here's your customer service response i...,{'Response': 'We apologize for any inconvenien...,We apologize for any inconvenience caused by t...
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Sure! Here is a brief customer service respon...,{'Response': 'Thank you for reaching out about...,Thank you for reaching out about your persiste...
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","Sure! Here's my response: \n\n{""Response"": ""T...",{'Response': 'Thank you for reaching out! We a...,Thank you for reaching out! We apologize for a...


In [None]:
## Dropping llama_response and llama_response_parsed variables
final_data_6 = data_with_parsed_model_output_6.drop(['llama_response','llama_response_parsed'], axis=1)
final_data_6.head(21)

Unnamed: 0,support_tick_id,support_ticket_text,Response
0,ST2023-006,My internet connection has significantly slowe...,We apologize for any inconvenience caused by y...
1,ST2023-007,Urgent help required! My laptop refuses to sta...,We apologize for the inconvenience you are exp...
2,ST2023-008,I've accidentally deleted essential work docum...,We apologize for any inconvenience caused by t...
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Thank you for reaching out about your persiste...
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Thank you for reaching out! We apologize for a...
5,ST2023-011,I'm locked out of my online banking account an...,We apologize for any inconvenience you may hav...
6,ST2023-012,"My computer's performance is sluggish, severel...",We apologize for any inconvenience caused by y...
7,ST2023-013,I'm experiencing a recurring blue screen error...,We apologize for the inconvenience caused by t...
8,ST2023-014,My external hard drive isn't being recognized ...,We apologize for any inconvenience caused by y...
9,ST2023-015,The graphics card in my gaming laptop seems to...,We're sorry to hear that your graphics card ap...


# Observation

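The model generated a concise customer service response for each support ticket. Several raw outputs wrapped the JSON in a conversational preamble (e.g. "Sure! Here's a brief customer service response..."), despite the prompt's instruction to return JSON only. Even so, extract_json_data recovered the Response field in every row shown.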
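The raw outputs shown earlier suggest the model often wraps its JSON in a preamble, which the parser then has to strip. One quick way to quantify prompt adherence is to count how many raw responses begin with `{` once leading whitespace is removed. The strings below are the truncated prefixes visible in the `data_6.head()` output and are used purely for illustration.

```python
import pandas as pd

# Truncated prefixes of the five raw llama_response values shown for data_6
raw = pd.Series([
    "Sure! Here's a brief customer service respons...",
    '{"Response": "We apologize for the inconvenie...',
    "Sure! Here's your customer service response i...",
    "Sure! Here is a brief customer service respon...",
    'Sure! Here\'s my response: \n\n{"Response": "T...',
])

# A reply follows the "JSON only" instruction if it starts with "{" after stripping whitespace
starts_with_json = raw.str.lstrip().str.startswith("{")
print(f"{starts_with_json.sum()} of {len(raw)} raw responses start with JSON")
```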

## **Model Output Analysis**

In [None]:
# Combining the last (newly added) column of final_data_3 to final_data_6 with final_data_2 into a single dataframe
columns_to_add = [df.iloc[:, -1] for df in [final_data_3, final_data_4, final_data_5, final_data_6]]
final_data = pd.concat([final_data_2] + columns_to_add, axis=1)
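Taking `df.iloc[:, -1]` from each task's frame works because every frame presumably keeps the same rows in the same order, with the new column last. The approach relies on both assumptions without checking either. A slightly more defensive sketch joins on `support_tick_id` instead and lets pandas verify that the join is one-to-one. It uses toy two-row frames whose values are copied from the final table below. In the notebook, the shared `support_ticket_text` column would need to be dropped (or added to `on`) first.

```python
from functools import reduce

import pandas as pd

# Toy stand-ins for the per-task frames (two tickets, values copied from the final table)
ids = ["ST2023-006", "ST2023-007"]
task_frames = [
    pd.DataFrame({"support_tick_id": ids, "Category": ["Technical Issues", "Hardware Issues"]}),
    pd.DataFrame({"support_tick_id": ids, "Priority": ["High", "High"]}),
    pd.DataFrame({"support_tick_id": ids, "ETA": ["24 hours", "Immediate"]}),
]

# Join on the ticket id rather than on row position; validate="one_to_one"
# raises pandas.errors.MergeError if an id is duplicated on either side
combined = reduce(
    lambda left, right: left.merge(right, on="support_tick_id", how="left", validate="one_to_one"),
    task_frames,
)
print(combined)
```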

In [None]:
# Printing the final dataframe
final_data

Unnamed: 0,support_tick_id,support_ticket_text,Category,Tags,Priority,ETA,Response
0,ST2023-006,My internet connection has significantly slowe...,Technical Issues,"[Internet Connection, Speed, Disconnections, W...",High,24 hours,We apologize for any inconvenience caused by y...
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware Issues,"[urgent, laptop, start, hardware, issue, prese...",High,Immediate,We apologize for the inconvenience you are exp...
2,ST2023-008,I've accidentally deleted essential work docum...,Data Recovery,"[Data Loss, Essential Documents, Accidental De...",High,Immediate,We apologize for any inconvenience caused by t...
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Technical Issues,"[Wi-Fi, signal strength, weak, persistent, hom...",Medium,2-3 business days,Thank you for reaching out about your persiste...
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Technical Issues,"[battery, draining, rapidly, minimal, use, issue]",Medium,24 hours,Thank you for reaching out! We apologize for a...
5,ST2023-011,I'm locked out of my online banking account an...,Technical Issues,"[urgent, transaction, password, reset, online,...",High,Immediate,We apologize for any inconvenience you may hav...
6,ST2023-012,"My computer's performance is sluggish, severel...",Technical Issues,"[Computer Performance, Sluggish, Productivity,...",High,2-3 business days,We apologize for any inconvenience caused by y...
7,ST2023-013,I'm experiencing a recurring blue screen error...,Hardware Issues,"[Blue Screen Error, Frequent Crashes, Hardware...",High,2-3 business days,We apologize for the inconvenience caused by t...
8,ST2023-014,My external hard drive isn't being recognized ...,Data Recovery,"[Data Recovery, External Hard Drive, Not Recog...",High,2-3 business days,We apologize for any inconvenience caused by y...
9,ST2023-015,The graphics card in my gaming laptop seems to...,Hardware Issues,"[hardware issue, graphics card, poor gaming pe...",High,2-3 business days,We're sorry to hear that your graphics card ap...


In [None]:
# Checking the distribution of categories
final_data['Category'].value_counts()

Technical Issues    9
Data Recovery       7
Hardware Issues     5
Name: Category, dtype: int64

In [None]:
# Checking the distribution of priority
final_data['Priority'].value_counts()

High      17
Medium     4
Name: Priority, dtype: int64

In [None]:
# Checking the distribution of ETA
final_data['ETA'].value_counts()

24 hours             9
2-3 business days    7
Immediate            5
Name: ETA, dtype: int64

In [None]:
# Checking the distribution of priority by categories
final_data.groupby(['Category', 'Priority']).support_tick_id.count()

Category          Priority
Data Recovery     High        7
Hardware Issues   High        4
                  Medium      1
Technical Issues  High        6
                  Medium      3
Name: support_tick_id, dtype: int64
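The grouped counts above omit combinations that never occur; for example, Data Recovery has no Medium tickets. `pd.crosstab` presents the same counts as a grid, with explicit zeros and row and column totals. The toy Series below are rebuilt from the counts printed above.

```python
import pandas as pd

# Rebuilt from the grouped counts above (21 tickets in total)
category = pd.Series(
    ["Data Recovery"] * 7 + ["Hardware Issues"] * 5 + ["Technical Issues"] * 9, name="Category"
)
priority = pd.Series(
    ["High"] * 7 + ["High"] * 4 + ["Medium"] * 1 + ["High"] * 6 + ["Medium"] * 3, name="Priority"
)

# margins=True appends an "All" row and column holding the totals
table = pd.crosstab(category, priority, margins=True)
print(table)
```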

In [None]:
# Checking the distribution of ETA by categories
final_data.groupby(['Category', 'ETA']).support_tick_id.count()

Category          ETA              
Data Recovery     2-3 business days    2
                  24 hours             4
                  Immediate            1
Hardware Issues   2-3 business days    2
                  24 hours             1
                  Immediate            2
Technical Issues  2-3 business days    3
                  24 hours             4
                  Immediate            2
Name: support_tick_id, dtype: int64
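Raw counts are hard to compare across categories of different sizes (7, 5 and 9 tickets). The sketch below rebuilds the ticket-level data from the counts above. It then uses `pd.crosstab(..., normalize="index")` to express each category's ETAs as proportions, and computes the share of each category's tickets with an ETA of 24 hours or less.

```python
import pandas as pd

# (Category, ETA, count) triples copied from the grouped counts above
rows = [
    ("Data Recovery", "2-3 business days", 2), ("Data Recovery", "24 hours", 4),
    ("Data Recovery", "Immediate", 1), ("Hardware Issues", "2-3 business days", 2),
    ("Hardware Issues", "24 hours", 1), ("Hardware Issues", "Immediate", 2),
    ("Technical Issues", "2-3 business days", 3), ("Technical Issues", "24 hours", 4),
    ("Technical Issues", "Immediate", 2),
]
df = pd.DataFrame(
    [(cat, eta) for cat, eta, n in rows for _ in range(n)], columns=["Category", "ETA"]
)

# normalize="index" turns each category's row into proportions that sum to 1
shares = pd.crosstab(df["Category"], df["ETA"], normalize="index").round(2)
print(shares)

# Share of each category's tickets with an ETA of 24 hours or less
fast = df["ETA"].isin(["Immediate", "24 hours"]).groupby(df["Category"]).mean().round(2)
print(fast)
```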

**Observations**

Support Ticket Analysis

**Category Distribution:**

* Technical Issues: Most common (9 tickets).

* Data Recovery: 7 tickets.

* Hardware Issues: Least frequent (5 tickets).

* Insight: Technical Issues is the largest single category (9 of 21 tickets), although Data Recovery and Hardware Issues together account for the majority (12 of 21).

**Priority Levels:**

* High Priority: 17 tickets.
* Medium Priority: 4 tickets.
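* Insight: 17 of 21 tickets (about 81%) were rated High. This may reflect the model over-weighting urgency cues in the ticket text, or the genuinely critical nature of these issues. Either way, the priority criteria may need refinement so that the levels discriminate between tickets.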

**Estimated Resolution Times (ETA):**

* 24 Hours: Most common (9 tickets).
* 2-3 Business Days: 7 tickets.
* Immediate: 5 tickets.
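* Insight: 14 of 21 tickets (about 67%) received an ETA of 24 hours or less, so the model leans toward quick resolution targets. These are predicted estimates, not measured resolution times.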

**Category vs. Priority:**

* Data Recovery: All 7 tickets rated High.
* Hardware Issues: Mostly High (4 of 5), one Medium.
* Technical Issues: Mixed, with 6 High and 3 Medium.

**Category vs. ETA:**

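* Data Recovery: Mostly 24 hours (4 of 7), plus 1 Immediate and 2 at 2-3 business days.
* Hardware Issues: Spread across all three ETAs (2 Immediate, 1 at 24 hours, 2 at 2-3 business days), suggesting case-by-case estimates.
* Technical Issues: Mostly 24 hours (4) or 2-3 business days (3), with 2 Immediate. Combined with its mix of High and Medium priorities, this suggests somewhat less urgency than Data Recovery.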
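Priority and ETA are predicted by separate prompts, so they can disagree: a ticket rated High priority may still receive the slowest ETA. The minimal sketch below flags such tickets for human review, using five rows copied from the final table above. The rule itself (a High-priority ticket should not wait 2-3 business days) is an illustrative assumption, not a policy stated in this notebook.

```python
import pandas as pd

# Five rows copied from the final table above
sample = pd.DataFrame(
    {
        "support_tick_id": ["ST2023-006", "ST2023-007", "ST2023-009", "ST2023-012", "ST2023-013"],
        "Priority": ["High", "High", "Medium", "High", "High"],
        "ETA": ["24 hours", "Immediate", "2-3 business days", "2-3 business days", "2-3 business days"],
    }
)

# Illustrative rule: High-priority tickets should not land in the slowest ETA bucket
mismatch = (sample["Priority"] == "High") & (sample["ETA"] == "2-3 business days")
flagged = sample.loc[mismatch, "support_tick_id"].tolist()
print(flagged)  # ['ST2023-012', 'ST2023-013']
```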


**Actionable Insights and Recommendations**

1- We can implement a feedback mechanism that lets support agents adjust or confirm the AI's categorization and priority assignments, and use that feedback (for example, via reinforcement learning) for continuous improvement. This iterative process could enable the model to refine its accuracy over time.

2- To improve output quality, we can refine the prompt design and tune generation parameters such as:
* temperature, to regulate response randomness,
* top_p, to restrict sampling to the smallest set of likely tokens whose cumulative probability reaches p, or
* top_k, to restrict sampling at each step to the k most likely next tokens.

3- Many responses open with a near-identical apology ("We apologize for any inconvenience..."), so they can read as generic. Making sentiment analysis explicit in the prompt, to detect varying levels of frustration or urgency in the ticket text, would enable more tailored responses.

4- To address recurring issues such as blue screen errors or hardware failures, we can integrate predictive maintenance tips into responses, reducing future tickets on the same issue.

5- Periodically reviewing the assigned ETAs would help ensure they align with issue urgency and complexity. For example, all 7 Data Recovery tickets were rated High priority, yet only 1 received an Immediate ETA. A dynamic ETA predictor trained on past ticket data could provide more accurate estimates.

6- To keep pace with evolving technology and common issues, the model should be periodically fine-tuned on new ticket data (or its prompts refreshed with recent examples) so that it stays current with the latest tech support challenges and solutions.
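To make the three generation parameters in recommendation 2 concrete, the sketch below applies temperature scaling, top-k filtering, and top-p (nucleus) filtering to a toy vector of next-token logits. It illustrates the standard definitions of these techniques in plain Python. It is not llama-cpp-python's internal implementation, whose filtering order and edge-case handling may differ, and the logit values are made up.

```python
import math


def _normalise(pairs):
    """Rescale (token, probability) pairs so the probabilities sum to 1."""
    z = sum(p for _, p in pairs)
    return [(i, p / z) for i, p in pairs]


def next_token_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    # Temperature: dividing logits by T < 1 sharpens the distribution; T > 1 flattens it
    scaled = [x / temperature for x in logits]
    # Softmax, subtracting the max for numerical stability
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    probs = _normalise(sorted(enumerate(exps), key=lambda pair: pair[1], reverse=True))

    # top_k: keep only the k most likely tokens
    if top_k is not None:
        probs = _normalise(probs[:top_k])

    # top_p: keep the smallest prefix whose cumulative probability reaches p
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i, p in probs:
            kept.append((i, p))
            cumulative += p
            if cumulative >= top_p:
                break
        probs = _normalise(kept)

    return dict(probs)


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
print(next_token_distribution(toy_logits, temperature=0.7, top_k=3, top_p=0.9))
```

Lower temperature and smaller top_k or top_p values all concentrate probability on fewer tokens. This usually makes structured output such as JSON more consistent, at the cost of less varied wording.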