**Assignment - NLP**

**Objective -
Develop a Generative AI application using a Large Language Model to automate the classification and processing of support tickets. The application will aim to predict ticket categories, assign priority, suggest estimated resolution times, and store the results in a structured DataFrame.**

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
# Installation for GPU llama-cpp-python
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir -q

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m36.9/36.9 MB[0m [31m151.1 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Installing backend dependencies ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m18.2/18.2 MB[0m [31m243.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m45.5/45.5 kB[0m [31m212.5 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m133.2/133.2 kB[0m [31m270.8 MB/s[0m eta [36m0:00:00[0m
[?25h  Building wheel for llama-cpp-python (pyproject.toml) ... [?25l[?25hdone
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torch 2.2.1+cu12

In [3]:
# Install the hugging face hub
!pip install huggingface_hub -q

In [4]:
import pandas as pd

# Function to download the model from the Hugging Face model hub
from huggingface_hub import hf_hub_download

# Importing the Llama class from the llama_cpp module
from llama_cpp import Llama

# Importing the json module
import json

**Loading the Dataset**

In [5]:
data1 = pd.read_csv('/content/drive/MyDrive/Support_ticket_text_data_mid_term.csv')

**Data Overview**

In [6]:
data1.head()

Unnamed: 0,support_tick_id,support_ticket_text
0,ST2023-006,My internet connection has significantly slowe...
1,ST2023-007,Urgent help required! My laptop refuses to sta...
2,ST2023-008,I've accidentally deleted essential work docum...
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...
4,ST2023-010,"My smartphone battery is draining rapidly, eve..."


In [7]:
# code to check the shape of the data
data1.shape

(21, 2)

**Observations**

Data has 21 rows and 2 columns

In [8]:
#check for missing values in the data
data1.isnull().sum()

support_tick_id        0
support_ticket_text    0
dtype: int64

**Observations**

Data has no missing values

**Model Building**

**Loading the model**

In [9]:
model_name_or_path = "TheBloke/Llama-2-13B-chat-GGUF"
model_basename = "llama-2-13b-chat.Q5_K_M.gguf" # the model is in gguf format

In [10]:
# Using hf_hub_download to download a model from the Hugging Face model hub
# The repo_id parameter specifies the model name or path in the Hugging Face repository
# The filename parameter specifies the name of the file to download
model_path = hf_hub_download(
    repo_id=model_name_or_path,
    filename=model_basename
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


llama-2-13b-chat.Q5_K_M.gguf:   0%|          | 0.00/9.23G [00:00<?, ?B/s]

In [11]:
lcpp_llm = Llama(
    model_path=model_path,
    n_threads=2,  # CPU cores
    n_batch=512,  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
    n_gpu_layers=43,  # Change this value based on your model and your GPU VRAM pool.
    n_ctx=4096,  # Context window
)

llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from /root/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGUF/snapshots/4458acc949de0a9914c3eab623904d4fe999050a/llama-2-13b-chat.Q5_K_M.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 5120
llama_model_loader: - kv   4:                          llama.block_count u32              = 40
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 13824
llama_model_loader: - kv   6:                 llama.rope.dimension_

**Defining Model Response Parameters**

In [12]:
def generate_llama_response(instruction, review):

    # System message explicitly instructing not to include the review text
    system_message = """
        [INST]<<SYS>>
        {}
        <</SYS>>[/INST]
    """.format(instruction)

    # Combine user_prompt and system_message to create the prompt
    prompt = f"{review}\n{system_message}"

    # Generate a response from the LLaMA model
    response = lcpp_llm(
        prompt=prompt,
        max_tokens=1024,
        temperature=0.01,
        top_p=0.95,
        repeat_penalty=1.2,
        top_k=50,
        stop=['INST'],
        echo=False,
        seed=42,
    )

    # Extract the sentiment from the response
    response_text = response["choices"][0]["text"]
    return response_text

**Task 1: Ticket Categorization**

In [13]:
# create a copy of the data
data_1 = data1.copy()

In [14]:
# defining the instructions for the model
instruction_1 = """
    You are an AI analyzing restaurant reviews. Classify the sentiment of the provided review into the following categories:
    - Hardware Issues
    - Data Recovery
    - Technical Issues
    - General Inquiry
"""

In [15]:
data_1['llama_response'] = data_1['support_ticket_text'].apply(lambda x: generate_llama_response(instruction_1,x))


llama_print_timings:        load time =     583.51 ms
llama_print_timings:      sample time =      30.24 ms /    48 runs   (    0.63 ms per token,  1587.20 tokens per second)
llama_print_timings: prompt eval time =     583.16 ms /   127 tokens (    4.59 ms per token,   217.78 tokens per second)
llama_print_timings:        eval time =    2469.91 ms /    47 runs   (   52.55 ms per token,    19.03 tokens per second)
llama_print_timings:       total time =    3227.83 ms /   174 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     583.51 ms
llama_print_timings:      sample time =      52.47 ms /    72 runs   (    0.73 ms per token,  1372.29 tokens per second)
llama_print_timings: prompt eval time =     415.26 ms /   125 tokens (    3.32 ms per token,   301.01 tokens per second)
llama_print_timings:        eval time =    3683.22 ms /    71 runs   (   51.88 ms per token,    19.28 tokens per second)
llama_print_timings:       total time =    4483.23 ms /   196 

In [16]:
data_1.head()

Unnamed: 0,support_tick_id,support_ticket_text,llama_response
0,ST2023-006,My internet connection has significantly slowe...,"Based on the review provided, I would classif..."
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Based on the information provided in the revi...
2,ST2023-008,I've accidentally deleted essential work docum...,"Based on the content of your message, I would..."
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"Sure! Based on the review provided, here is t..."
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Sure! I'd be happy to help classify the senti...


In [17]:
def extract_category(model_response):
    if 'technical issues' in model_response.lower():
        return 'Technical issues'
    elif 'hardware issues' in model_response.lower():
        return 'Hardware issues'
    elif 'data recovery' in model_response.lower():
        return 'Data recovery'
    else:
      return 'General Inquiry'

In [18]:
data_1['Category'] = data_1['llama_response'].apply(extract_category)
data_1['Category'].head()

0    Technical issues
1    Technical issues
2       Data recovery
3     General Inquiry
4     Hardware issues
Name: Category, dtype: object

In [19]:
final_data_1 = data_1.drop(['llama_response'], axis=1)
final_data_1.head()

Unnamed: 0,support_tick_id,support_ticket_text,Category
0,ST2023-006,My internet connection has significantly slowe...,Technical issues
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Technical issues
2,ST2023-008,I've accidentally deleted essential work docum...,Data recovery
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,General Inquiry
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Hardware issues


We are able to generate categories for each ticket such as "Technical Issues, Hardware Issues, General inquiry and Data Recovery"

**Task 2: Ticket Categorization and Returning Structured Output**

In [20]:
# create a copy of the data
data_2 = data1.copy()

In [21]:
# defining the instructions for the model
instruction_2 = """
    You are an AI analyzing support tickets. Classify the category of the provided ticket issue into the following categories:
    - Hardware Issues
    - Data Recovery
    - Technical Issues
    - General Inquiry

    Format the output as a JSON object with a single key-value pair as shown below:
    {"Category": "your_category_prediction"}

    Do not include any other text in the output except the JSON.
"""

In [22]:
data_2['llama_response'] = data_2['support_ticket_text'].apply(lambda x: generate_llama_response(instruction_2,x))

Llama.generate: prefix-match hit

llama_print_timings:        load time =     583.51 ms
llama_print_timings:      sample time =       7.07 ms /    11 runs   (    0.64 ms per token,  1556.97 tokens per second)
llama_print_timings: prompt eval time =     585.25 ms /   178 tokens (    3.29 ms per token,   304.14 tokens per second)
llama_print_timings:        eval time =     644.21 ms /    10 runs   (   64.42 ms per token,    15.52 tokens per second)
llama_print_timings:       total time =    1275.95 ms /   188 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     583.51 ms
llama_print_timings:      sample time =       9.27 ms /    13 runs   (    0.71 ms per token,  1401.92 tokens per second)
llama_print_timings: prompt eval time =     549.25 ms /   177 tokens (    3.10 ms per token,   322.25 tokens per second)
llama_print_timings:        eval time =     758.76 ms /    12 runs   (   63.23 ms per token,    15.82 tokens per second)
llama_print_timings:       to

In [23]:
data_2.head()

Unnamed: 0,support_tick_id,support_ticket_text,llama_response
0,ST2023-006,My internet connection has significantly slowe...,"{""Category"": ""Technical Issues""}"
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{ ""Category"": ""Hardware Issues"" }"
2,ST2023-008,I've accidentally deleted essential work docum...,"{""Category"": ""Data Recovery""}"
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{""Category"": ""Technical Issues""}"
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Sure! Based on the information provided in th...


In [24]:
# defining a function to parse the JSON output from the model
def extract_json_data(json_str):
    try:
        # Find the indices of the opening and closing curly braces
        json_start = json_str.find('{')
        json_end = json_str.rfind('}')

        if json_start != -1 and json_end != -1:
            extracted_category = json_str[json_start:json_end + 1]  # Extract the JSON object
            data_dict = json.loads(extracted_category)
            return data_dict
        else:
            print(f"Warning: JSON object not found in response: {json_str}")
            return {}
    except json.JSONDecodeError as e:
        print(f"Error parsing JSON: {e}")
        return {}

In [25]:
data_2['llama_response_parsed'] = data_2['llama_response'].apply(extract_json_data)
data_2['llama_response_parsed'].head()

0    {'Category': 'Technical Issues'}
1     {'Category': 'Hardware Issues'}
2       {'Category': 'Data Recovery'}
3    {'Category': 'Technical Issues'}
4    {'Category': 'Technical Issues'}
Name: llama_response_parsed, dtype: object

In [26]:
llama_response_parsed_df_2 = pd.json_normalize(data_2['llama_response_parsed'])
llama_response_parsed_df_2.head()

Unnamed: 0,Category
0,Technical Issues
1,Hardware Issues
2,Data Recovery
3,Technical Issues
4,Technical Issues


In [27]:
data_with_parsed_model_output_2 = pd.concat([data_2, llama_response_parsed_df_2], axis=1)
data_with_parsed_model_output_2.head()

Unnamed: 0,support_tick_id,support_ticket_text,llama_response,llama_response_parsed,Category
0,ST2023-006,My internet connection has significantly slowe...,"{""Category"": ""Technical Issues""}",{'Category': 'Technical Issues'},Technical Issues
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{ ""Category"": ""Hardware Issues"" }",{'Category': 'Hardware Issues'},Hardware Issues
2,ST2023-008,I've accidentally deleted essential work docum...,"{""Category"": ""Data Recovery""}",{'Category': 'Data Recovery'},Data Recovery
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{""Category"": ""Technical Issues""}",{'Category': 'Technical Issues'},Technical Issues
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Sure! Based on the information provided in th...,{'Category': 'Technical Issues'},Technical Issues


In [28]:
final_data_2 = data_with_parsed_model_output_2.drop(['llama_response','llama_response_parsed'], axis=1)
final_data_2.head()

Unnamed: 0,support_tick_id,support_ticket_text,Category
0,ST2023-006,My internet connection has significantly slowe...,Technical Issues
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware Issues
2,ST2023-008,I've accidentally deleted essential work docum...,Data Recovery
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Technical Issues
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Technical Issues


**Task 3: Ticket Categorization, Creating Tags, and Returning Structured Output**

In [29]:
# create a copy of the data
data_3 = data1.copy()

In [30]:
# defining the instructions for the model
instruction_3 = """
    You are an AI analyzing support tickets. Classify the category of the provided ticket issue into the following categories:
    - Hardware Issues
    - Data Recovery
    - Technical Issues
    - General Inquiry

    Also, Tag the given support tickets using two or more of the below mentioned categories only depending upon the content of the article:
    - Laptop
    - Hardware Issue
    - Network Issue
    - Data Recovery
    - Performance Issue
    - Restart
    - Password Reset
    - Data Loss
    - Wi-fi Signal Strength
    - Battery Issue

    Provide the output in a JSON format separated with comma in the following keys:
    {
        "Category" : "your_Category_prediction",
        "Tags": "your_tag_prediction"
    }

    Only return the JSON, do not return any other information and remove the extra spaces.
"""

In [31]:
data_3['llama_response'] = data_3['support_ticket_text'].apply(lambda x: generate_llama_response(instruction_3,x))

Llama.generate: prefix-match hit

llama_print_timings:        load time =     583.51 ms
llama_print_timings:      sample time =      20.90 ms /    31 runs   (    0.67 ms per token,  1483.04 tokens per second)
llama_print_timings: prompt eval time =     773.93 ms /   290 tokens (    2.67 ms per token,   374.71 tokens per second)
llama_print_timings:        eval time =    1856.71 ms /    30 runs   (   61.89 ms per token,    16.16 tokens per second)
llama_print_timings:       total time =    2774.22 ms /   320 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     583.51 ms
llama_print_timings:      sample time =      18.49 ms /    31 runs   (    0.60 ms per token,  1676.85 tokens per second)
llama_print_timings: prompt eval time =     813.59 ms /   289 tokens (    2.82 ms per token,   355.22 tokens per second)
llama_print_timings:        eval time =    1956.65 ms /    30 runs   (   65.22 ms per token,    15.33 tokens per second)
llama_print_timings:       to

In [32]:
data_3.head()

Unnamed: 0,support_tick_id,support_ticket_text,llama_response
0,ST2023-006,My internet connection has significantly slowe...,"{\n ""Category"": ""Technical Issues"",\n ..."
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{\n ""Category"": ""Hardware Issues"",\n ..."
2,ST2023-008,I've accidentally deleted essential work docum...,"{\n ""Category"": ""Data Recovery"",\n ..."
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{\n ""Category"": ""Technical Issues"",\n ..."
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{\n ""Category"": ""Battery Issue"",\n ..."


In [33]:
data_3['llama_response_parsed'] = data_3['llama_response'].apply(extract_json_data)
data_3['llama_response_parsed'].head()

0    {'Category': 'Technical Issues', 'Tags': 'Hard...
1    {'Category': 'Hardware Issues', 'Tags': 'Lapto...
2    {'Category': 'Data Recovery', 'Tags': 'Data Re...
3    {'Category': 'Technical Issues', 'Tags': 'Wi-f...
4    {'Category': 'Battery Issue', 'Tags': 'Battery...
Name: llama_response_parsed, dtype: object

In [34]:
llama_response_parsed_df_3 = pd.json_normalize(data_3['llama_response_parsed'])
llama_response_parsed_df_3.head()

Unnamed: 0,Category,Tags
0,Technical Issues,"Hardware Issue, Network Issue"
1,Hardware Issues,"Laptop, Hardware Issue"
2,Data Recovery,"Data Recovery, Laptop"
3,Technical Issues,Wi-fi Signal Strength
4,Battery Issue,"Battery Issue, Hardware Issue"


In [35]:
data_with_parsed_model_output_3 = pd.concat([data_3, llama_response_parsed_df_3], axis=1)
data_with_parsed_model_output_3.head()

Unnamed: 0,support_tick_id,support_ticket_text,llama_response,llama_response_parsed,Category,Tags
0,ST2023-006,My internet connection has significantly slowe...,"{\n ""Category"": ""Technical Issues"",\n ...","{'Category': 'Technical Issues', 'Tags': 'Hard...",Technical Issues,"Hardware Issue, Network Issue"
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{\n ""Category"": ""Hardware Issues"",\n ...","{'Category': 'Hardware Issues', 'Tags': 'Lapto...",Hardware Issues,"Laptop, Hardware Issue"
2,ST2023-008,I've accidentally deleted essential work docum...,"{\n ""Category"": ""Data Recovery"",\n ...","{'Category': 'Data Recovery', 'Tags': 'Data Re...",Data Recovery,"Data Recovery, Laptop"
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{\n ""Category"": ""Technical Issues"",\n ...","{'Category': 'Technical Issues', 'Tags': 'Wi-f...",Technical Issues,Wi-fi Signal Strength
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{\n ""Category"": ""Battery Issue"",\n ...","{'Category': 'Battery Issue', 'Tags': 'Battery...",Battery Issue,"Battery Issue, Hardware Issue"


In [36]:
## Complete the code to drop llama_response and llama_response_parsed variables
final_data_3 = data_with_parsed_model_output_3.drop(['llama_response','llama_response_parsed'], axis=1)
final_data_3.head()

Unnamed: 0,support_tick_id,support_ticket_text,Category,Tags
0,ST2023-006,My internet connection has significantly slowe...,Technical Issues,"Hardware Issue, Network Issue"
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware Issues,"Laptop, Hardware Issue"
2,ST2023-008,I've accidentally deleted essential work docum...,Data Recovery,"Data Recovery, Laptop"
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Technical Issues,Wi-fi Signal Strength
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Battery Issue,"Battery Issue, Hardware Issue"


We were able to assign tags to easily identify the tickets and take necessary actions

**Task 4 - Ticket Categorization, Creating Tags, Assigning Priority, and Returning Structured Output**

In [37]:
# create a copy of the data
data_4 = data1.copy()

In [38]:
# defining the instructions for the model
instruction_4 = """
    You are an AI analyzing support tickets. Classify the category of the provided ticket issue into the following categories:
    - Hardware Issues
    - Data Recovery
    - Technical Issues
    - General Inquiry

    Also, Tag the given support tickets using three or more of the below mentioned categories only depending upon the content of the article:
    - Laptop
    - Hardware Issue
    - Network Issue
    - Data Recovery
    - Performance Issue
    - Restart
    - Password Reset
    - Data Loss
    - Wi-fi Signal Strength
    - Battery Issue

    Once tagging is done, assign priority to support tickets using the below mentioned priorities only depending upon the urgency of the ticket resolution:
    - High
    - Medium
    - Low

    Provide the output in a JSON format separated with comma in the following keys:
    {
        "Category" : "your_Category_prediction",
        "Tags": "your_tag_prediction",
        "Priority": "your_priority_prediction"
    }

    Only return the JSON, do not return any other information and remove the extra spaces.
"""


In [39]:
# complete the code to create a new column llama_response'
# by applying the generate_llama_response function to each ticket in the 'support_ticket_text' column of the DataFrame 'data_4'
data_4['model_response_4'] = data_4['support_ticket_text'].apply(lambda x: generate_llama_response(instruction_4,x))

Llama.generate: prefix-match hit

llama_print_timings:        load time =     583.51 ms
llama_print_timings:      sample time =      25.56 ms /    41 runs   (    0.62 ms per token,  1603.94 tokens per second)
llama_print_timings: prompt eval time =     837.17 ms /   350 tokens (    2.39 ms per token,   418.07 tokens per second)
llama_print_timings:        eval time =    2555.53 ms /    40 runs   (   63.89 ms per token,    15.65 tokens per second)
llama_print_timings:       total time =    3551.69 ms /   390 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     583.51 ms
llama_print_timings:      sample time =      23.72 ms /    40 runs   (    0.59 ms per token,  1686.48 tokens per second)
llama_print_timings: prompt eval time =     855.77 ms /   349 tokens (    2.45 ms per token,   407.82 tokens per second)
llama_print_timings:        eval time =    2531.17 ms /    39 runs   (   64.90 ms per token,    15.41 tokens per second)
llama_print_timings:       to

In [40]:
data_4.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response_4
0,ST2023-006,My internet connection has significantly slowe...,"{\n ""Category"": ""Technical Issues"",\n ..."
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{\n ""Category"": ""Hardware Issues"",\n ..."
2,ST2023-008,I've accidentally deleted essential work docum...,"{\n ""Category"": ""Data Recovery"",\n ..."
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{\n ""Category"": ""Technical Issues"",\n ..."
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{\n ""Category"": ""Battery Issue"",\n ..."


In [41]:
## Complete the code to apply the extract_json_data function on the llama_response column to create a new column called llama_response_parsed
data_4['llama_response_parsed'] = data_4['model_response_4'].apply(extract_json_data)
data_4['llama_response_parsed'].head()

0    {'Category': 'Technical Issues', 'Tags': 'Hard...
1    {'Category': 'Hardware Issues', 'Tags': 'Lapto...
2    {'Category': 'Data Recovery', 'Tags': 'Data Re...
3    {'Category': 'Technical Issues', 'Tags': 'Hard...
4    {'Category': 'Battery Issue', 'Tags': 'Battery...
Name: llama_response_parsed, dtype: object

In [42]:
## Complete the code to apply the json_normalize on llama_response_parsed variable
llama_response_parsed_df_4 = pd.json_normalize(data_4['llama_response_parsed'])
llama_response_parsed_df_4.head()

Unnamed: 0,Category,Tags,Priority
0,Technical Issues,"Hardware Issue, Network Issue",Medium
1,Hardware Issues,"Laptop, Hardware Issue",High
2,Data Recovery,"Data Recovery, Laptop",High
3,Technical Issues,"Hardware Issue, Network Issue",Medium
4,Battery Issue,"Battery Issue, Hardware Issue",Medium


In [43]:
## Complete the code to concat data_4 and llama_response_parsed_df_4
data_with_parsed_model_output_4 = pd.concat([data_4, llama_response_parsed_df_4], axis=1)
data_with_parsed_model_output_4.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response_4,llama_response_parsed,Category,Tags,Priority
0,ST2023-006,My internet connection has significantly slowe...,"{\n ""Category"": ""Technical Issues"",\n ...","{'Category': 'Technical Issues', 'Tags': 'Hard...",Technical Issues,"Hardware Issue, Network Issue",Medium
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{\n ""Category"": ""Hardware Issues"",\n ...","{'Category': 'Hardware Issues', 'Tags': 'Lapto...",Hardware Issues,"Laptop, Hardware Issue",High
2,ST2023-008,I've accidentally deleted essential work docum...,"{\n ""Category"": ""Data Recovery"",\n ...","{'Category': 'Data Recovery', 'Tags': 'Data Re...",Data Recovery,"Data Recovery, Laptop",High
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{\n ""Category"": ""Technical Issues"",\n ...","{'Category': 'Technical Issues', 'Tags': 'Hard...",Technical Issues,"Hardware Issue, Network Issue",Medium
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{\n ""Category"": ""Battery Issue"",\n ...","{'Category': 'Battery Issue', 'Tags': 'Battery...",Battery Issue,"Battery Issue, Hardware Issue",Medium


In [44]:
## Complete the code to drop llama_response and llama_response_parsed variables
final_data_4 = data_with_parsed_model_output_4.drop(['model_response_4','llama_response_parsed'], axis=1)
final_data_4.head()

Unnamed: 0,support_tick_id,support_ticket_text,Category,Tags,Priority
0,ST2023-006,My internet connection has significantly slowe...,Technical Issues,"Hardware Issue, Network Issue",Medium
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware Issues,"Laptop, Hardware Issue",High
2,ST2023-008,I've accidentally deleted essential work docum...,Data Recovery,"Data Recovery, Laptop",High
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Technical Issues,"Hardware Issue, Network Issue",Medium
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Battery Issue,"Battery Issue, Hardware Issue",Medium


Assigned priorities such as High, medium and low to each ticket to take immediate actions. Let's assign ETA to resolve the tickets and increase customer satisfaction.

**Task 5 - Ticket Categorization, Creating Tags, Assigning Priority, Assigning ETA, and Returning Structured Output**

In [45]:
# create a copy of the data
data_5 = data1.copy()

In [46]:
# defining the instructions for the model
instruction_5 = """
    You are an AI analyzing support tickets. Classify the category of the provided ticket issue into the following categories:
    - Hardware Issues
    - Data Recovery
    - Technical Issues
    - General Inquiry

    Also, Tag the given support tickets using three or more of the below mentioned categories only depending upon the content of the article:
    - Laptop
    - Hardware Issue
    - Network Issue
    - Data Recovery
    - Performance Issue
    - Restart
    - Password Reset
    - Data Loss
    - Wi-fi Signal Strength
    - Battery Issue

    Once tagging is done, assign priority to support tickets using the below mentioned priorities only depending upon the urgency of the ticket resolution:
    - High
    - Medium
    - Low

    Once priority is assigned, generate ETA for support tickets using the below mentioned timelines only depending upon the urgency of the ticket resolution:
    - 24 Hours
    - 2-3 Business Days
    - 3-5 Business days
    - Immediate Attention

    Provide the output in a JSON format separated in the following keys:
    {
        "Category" : "your_Category_prediction",
        "Tags": "your_tag_prediction",
        "Priority": "your_priority_prediction",
        "ETA": "your_eta_prediction"
    }

    Only return the JSON, do not return any other information and remove the extra spaces.
    Do not include any other text in the output except the JSON.
"""

In [47]:
# complete the code to create a new column llama_response'
# by applying the generate_llama_response function to each ticket in the 'support_ticket_text' column of the DataFrame 'data_5'
data_5['model_response_5'] = data_5['support_ticket_text'].apply(lambda x: generate_llama_response(instruction_5,x))

Llama.generate: prefix-match hit

llama_print_timings:        load time =     583.51 ms
llama_print_timings:      sample time =      33.55 ms /    55 runs   (    0.61 ms per token,  1639.20 tokens per second)
llama_print_timings: prompt eval time =     988.85 ms /   442 tokens (    2.24 ms per token,   446.98 tokens per second)
llama_print_timings:        eval time =    3602.76 ms /    54 runs   (   66.72 ms per token,    14.99 tokens per second)
llama_print_timings:       total time =    4780.77 ms /   496 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     583.51 ms
llama_print_timings:      sample time =      35.11 ms /    52 runs   (    0.68 ms per token,  1481.19 tokens per second)
llama_print_timings: prompt eval time =    1018.89 ms /   441 tokens (    2.31 ms per token,   432.82 tokens per second)
llama_print_timings:        eval time =    3431.59 ms /    51 runs   (   67.29 ms per token,    14.86 tokens per second)
llama_print_timings:       to

In [48]:
data_5.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response_5
0,ST2023-006,My internet connection has significantly slowe...,"{\n ""Category"": ""Technical Issues"",\n ..."
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{\n ""Category"": ""Hardware Issues"",\n ..."
2,ST2023-008,I've accidentally deleted essential work docum...,"{\n ""Category"": ""Data Recovery"",\n ..."
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{\n ""Category"": ""Technical Issues"",\n ..."
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{\n ""Category"": ""Battery Issue"",\n ..."


In [50]:
## Complete the code to apply the extract_json_data function on the llama_response column to create a new column called llama_response_parsed
data_5['llama_response_parsed'] = data_5['model_response_5'].apply(extract_json_data)
data_5['llama_response_parsed'].head()

Error parsing JSON: Extra data: line 9 column 1 (char 117)


0    {'Category': 'Technical Issues', 'Tags': 'Lapt...
1    {'Category': 'Hardware Issues', 'Tags': 'Lapto...
2    {'Category': 'Data Recovery', 'Tags': 'Laptop,...
3    {'Category': 'Technical Issues', 'Tags': 'Hard...
4    {'Category': 'Battery Issue', 'Tags': 'Battery...
Name: llama_response_parsed, dtype: object

In [51]:
## Complete the code to apply the json_normalize on llama_response_parsed variable
llama_response_parsed_df_5 = pd.json_normalize(data_5['llama_response_parsed'])
llama_response_parsed_df_5.head()

Unnamed: 0,Category,Tags,Priority,ETA
0,Technical Issues,"Laptop, Hardware Issue, Network Issue",High,24 Hours
1,Hardware Issues,"Laptop, Hardware Issue",High,24 Hours
2,Data Recovery,"Laptop, Hardware Issue, Data Recovery",High,24 Hours
3,Technical Issues,"Hardware Issue, Network Issue",Medium,2-3 Business Days
4,Battery Issue,"Battery Issue, Hardware Issue",Medium,2-3 Business Days


In [53]:
## Complete the code to concat data_5 and llama_response_parsed_df_5
data_with_parsed_model_output_5 = pd.concat([data_5, llama_response_parsed_df_5], axis=1)
data_with_parsed_model_output_5.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response_5,llama_response_parsed,Category,Tags,Priority,ETA
0,ST2023-006,My internet connection has significantly slowe...,"{\n ""Category"": ""Technical Issues"",\n ...","{'Category': 'Technical Issues', 'Tags': 'Lapt...",Technical Issues,"Laptop, Hardware Issue, Network Issue",High,24 Hours
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{\n ""Category"": ""Hardware Issues"",\n ...","{'Category': 'Hardware Issues', 'Tags': 'Lapto...",Hardware Issues,"Laptop, Hardware Issue",High,24 Hours
2,ST2023-008,I've accidentally deleted essential work docum...,"{\n ""Category"": ""Data Recovery"",\n ...","{'Category': 'Data Recovery', 'Tags': 'Laptop,...",Data Recovery,"Laptop, Hardware Issue, Data Recovery",High,24 Hours
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{\n ""Category"": ""Technical Issues"",\n ...","{'Category': 'Technical Issues', 'Tags': 'Hard...",Technical Issues,"Hardware Issue, Network Issue",Medium,2-3 Business Days
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{\n ""Category"": ""Battery Issue"",\n ...","{'Category': 'Battery Issue', 'Tags': 'Battery...",Battery Issue,"Battery Issue, Hardware Issue",Medium,2-3 Business Days


In [54]:
## Complete the code to drop llama_response and llama_response_parsed variables
final_data_5 = data_with_parsed_model_output_5.drop(['model_response_5','llama_response_parsed'], axis=1)
final_data_5.head()

Unnamed: 0,support_tick_id,support_ticket_text,Category,Tags,Priority,ETA
0,ST2023-006,My internet connection has significantly slowe...,Technical Issues,"Laptop, Hardware Issue, Network Issue",High,24 Hours
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware Issues,"Laptop, Hardware Issue",High,24 Hours
2,ST2023-008,I've accidentally deleted essential work docum...,Data Recovery,"Laptop, Hardware Issue, Data Recovery",High,24 Hours
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Technical Issues,"Hardware Issue, Network Issue",Medium,2-3 Business Days
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Battery Issue,"Battery Issue, Hardware Issue",Medium,2-3 Business Days


**Task 6 - Ticket Categorization, Creating Tags, Assigning Priority, Assigning ETA, Creating a Draft Response, and Returning Structured Output**

In [55]:
# create a copy of the data
data_6 = data1.copy()

In [56]:
# defining the instructions for the model
instruction_6 = """
    You are an AI analyzing support tickets. Classify the category of the provided ticket issue into the following categories:
    - Hardware Issues
    - Data Recovery
    - Technical Issues
    - General Inquiry

    Also, Tag the given support tickets using three or more of the below mentioned categories only depending upon the content of the article:
    - Laptop
    - Hardware Issue
    - Network Issue
    - Data Recovery
    - Performance Issue
    - Restart
    - Password Reset
    - Data Loss
    - Wi-fi Signal Strength
    - Battery Issue

    Once tagging is done, assign priority to support tickets using the below mentioned priorities only depending upon the urgency of the ticket resolution:
    - High
    - Medium
    - Low

    Once priority is assigned, generate ETA for support tickets using the below mentioned timelines only depending upon the urgency of the ticket resolution:
    - 24 Hours
    - 2-3 Business Days
    - 3-5 Business days
    - Immediate Attention

    Create a Draft Response to respond to customers as soon as the ticket is raised by them using the below mentioned sentences :
    - I apologise for the incovenience.
    - We have received your ticket.
    - One of our support assistants will get in touch with you.
    - I understand that you are experiencing weak network.

    Provide the output in a JSON format separated in the following keys:
    {
        "Category" : "your_Category_prediction",
        "Tags": "your_tag_prediction",
        "Priority": "your_priority_prediction",
        "ETA": "your_eta_prediction",
        "Response" : "your_response_prediction"
    }

    Only return the JSON, do not return any other information and remove the extra spaces.
    Do not include any other text in the output except the JSON.
"""

In [57]:
# complete the code to create a new column llama_response'
# by applying the generate_llama_response function to each ticket in the 'support_ticket_text' column of the DataFrame 'data_6'
data_6['model_response_6'] = data_6['support_ticket_text'].apply(lambda x: generate_llama_response(instruction_6,x))

Llama.generate: prefix-match hit

llama_print_timings:        load time =     583.51 ms
llama_print_timings:      sample time =      66.28 ms /    90 runs   (    0.74 ms per token,  1357.83 tokens per second)
llama_print_timings: prompt eval time =    1749.95 ms /   533 tokens (    3.28 ms per token,   304.58 tokens per second)
llama_print_timings:        eval time =    6204.48 ms /    89 runs   (   69.71 ms per token,    14.34 tokens per second)
llama_print_timings:       total time =    8378.82 ms /   622 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =     583.51 ms
llama_print_timings:      sample time =      48.97 ms /    80 runs   (    0.61 ms per token,  1633.52 tokens per second)
llama_print_timings: prompt eval time =    1684.34 ms /   532 tokens (    3.17 ms per token,   315.85 tokens per second)
llama_print_timings:        eval time =    5687.45 ms /    79 runs   (   71.99 ms per token,    13.89 tokens per second)
llama_print_timings:       to

In [60]:
data_6.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response_6,llama_response_parsed
0,ST2023-006,My internet connection has significantly slowe...,"{\n""Category"": ""Technical Issues"",\n""Tags"": ""...","{'Category': 'Technical Issues', 'Tags': 'Lapt..."
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{\n""Category"": ""Hardware Issues"",\n""Tags"": ""L...","{'Category': 'Hardware Issues', 'Tags': 'Lapto..."
2,ST2023-008,I've accidentally deleted essential work docum...,"{\n""Category"": ""Data Recovery"",\n""Tags"": ""Lap...","{'Category': 'Data Recovery', 'Tags': 'Laptop,..."
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{\n""Category"": ""Technical Issues"",\n""Tags"": ""...","{'Category': 'Technical Issues', 'Tags': 'Lapt..."
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{\n ""Category"": ""Battery Issue"",\n ...","{'Category': 'Battery Issue', 'Tags': 'Battery..."


In [61]:
## Complete the code to apply the extract_json_data function on the llama_response column to create a new column called llama_response_parsed
data_6['llama_response_parsed'] = data_6['model_response_6'].apply(extract_json_data)
data_6['llama_response_parsed'].head()

0    {'Category': 'Technical Issues', 'Tags': 'Lapt...
1    {'Category': 'Hardware Issues', 'Tags': 'Lapto...
2    {'Category': 'Data Recovery', 'Tags': 'Laptop,...
3    {'Category': 'Technical Issues', 'Tags': 'Lapt...
4    {'Category': 'Battery Issue', 'Tags': 'Battery...
Name: llama_response_parsed, dtype: object

In [62]:
## Complete the code to apply the normalize on llama_response_parsed variable
llama_response_parsed_df_6= pd.json_normalize(data_6['llama_response_parsed'])
llama_response_parsed_df_6.head()

Unnamed: 0,Category,Tags,Priority,ETA,Response
0,Technical Issues,"Laptop, Network Issue, Performance Issue",High,24 Hours,I apologise for the inconvenience. We have rec...
1,Hardware Issues,"Laptop, Hardware Issue",High,24 Hours,I apologise for the inconvenience. We have rec...
2,Data Recovery,"Laptop, Hardware Issue, Data Recovery",High,24 Hours,I apologise for the inconvenience. We have rec...
3,Technical Issues,"Laptop, Network Issue, Wi-fi Signal Strength",Medium,2-3 Business Days,I apologise for the inconvenience. We have rec...
4,Battery Issue,"Battery Issue, Hardware Issue",Medium,2-3 Business Days,We apologize for the inconvenience. We have re...


In [63]:
## Complete the code to concat data_6 and llama_response_parsed_df_6
data_with_parsed_model_output_6 = pd.concat([data_6, llama_response_parsed_df_6], axis=1)
data_with_parsed_model_output_6.head()

Unnamed: 0,support_tick_id,support_ticket_text,model_response_6,llama_response_parsed,Category,Tags,Priority,ETA,Response
0,ST2023-006,My internet connection has significantly slowe...,"{\n""Category"": ""Technical Issues"",\n""Tags"": ""...","{'Category': 'Technical Issues', 'Tags': 'Lapt...",Technical Issues,"Laptop, Network Issue, Performance Issue",High,24 Hours,I apologise for the inconvenience. We have rec...
1,ST2023-007,Urgent help required! My laptop refuses to sta...,"{\n""Category"": ""Hardware Issues"",\n""Tags"": ""L...","{'Category': 'Hardware Issues', 'Tags': 'Lapto...",Hardware Issues,"Laptop, Hardware Issue",High,24 Hours,I apologise for the inconvenience. We have rec...
2,ST2023-008,I've accidentally deleted essential work docum...,"{\n""Category"": ""Data Recovery"",\n""Tags"": ""Lap...","{'Category': 'Data Recovery', 'Tags': 'Laptop,...",Data Recovery,"Laptop, Hardware Issue, Data Recovery",High,24 Hours,I apologise for the inconvenience. We have rec...
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,"{\n""Category"": ""Technical Issues"",\n""Tags"": ""...","{'Category': 'Technical Issues', 'Tags': 'Lapt...",Technical Issues,"Laptop, Network Issue, Wi-fi Signal Strength",Medium,2-3 Business Days,I apologise for the inconvenience. We have rec...
4,ST2023-010,"My smartphone battery is draining rapidly, eve...","{\n ""Category"": ""Battery Issue"",\n ...","{'Category': 'Battery Issue', 'Tags': 'Battery...",Battery Issue,"Battery Issue, Hardware Issue",Medium,2-3 Business Days,We apologize for the inconvenience. We have re...


In [72]:
## Complete the code to drop llama_response and llama_response_parsed variables
final_data_6 = data_with_parsed_model_output_6.drop(['model_response_6','llama_response_parsed'], axis=1)
final_data_6.head(25)

Unnamed: 0,support_tick_id,support_ticket_text,Category,Tags,Priority,ETA,Response
0,ST2023-006,My internet connection has significantly slowe...,Technical Issues,"Laptop, Network Issue, Performance Issue",High,24 Hours,I apologise for the inconvenience. We have rec...
1,ST2023-007,Urgent help required! My laptop refuses to sta...,Hardware Issues,"Laptop, Hardware Issue",High,24 Hours,I apologise for the inconvenience. We have rec...
2,ST2023-008,I've accidentally deleted essential work docum...,Data Recovery,"Laptop, Hardware Issue, Data Recovery",High,24 Hours,I apologise for the inconvenience. We have rec...
3,ST2023-009,Despite being in close proximity to my Wi-Fi r...,Technical Issues,"Laptop, Network Issue, Wi-fi Signal Strength",Medium,2-3 Business Days,I apologise for the inconvenience. We have rec...
4,ST2023-010,"My smartphone battery is draining rapidly, eve...",Battery Issue,"Battery Issue, Hardware Issue",Medium,2-3 Business Days,We apologize for the inconvenience. We have re...
5,ST2023-011,I'm locked out of my online banking account an...,General Inquiry,"Password Reset, Data Recovery",Medium,2-3 Business Days,"We have received your ticket, one of our suppo..."
6,ST2023-012,"My computer's performance is sluggish, severel...",Technical Issues,"Laptop, Hardware Issue, Network Issue",High,24 Hours,I apologize for the inconvenience. We have rec...
7,ST2023-013,I'm experiencing a recurring blue screen error...,Hardware Issues,"Laptop, Hardware Issue, Network Issue",High,24 Hours,I apologise for the inconvenience. We have rec...
8,ST2023-014,My external hard drive isn't being recognized ...,Data Recovery,"Laptop, Hardware Issue, Data Recovery",High,24 Hours,I apologize for the inconvenience. We have rec...
9,ST2023-015,The graphics card in my gaming laptop seems to...,Hardware Issues,"Laptop, Hardware Issue",High,24 Hours,I apologise for the inconvenience. We have rec...


Finally, we are able to give a prompt response to the customer once the ticket is raised.

**Model Output Analysis**

In [66]:
# creating a copy of the dataframe
final_data = final_data_6.copy()

In [67]:
final_data['Category'].value_counts()

Technical Issues    7
Data Recovery       7
Hardware Issues     5
Battery Issue       1
General Inquiry     1
Name: Category, dtype: int64

In [68]:
final_data['Priority'].value_counts()

High      17
Medium     4
Name: Priority, dtype: int64

We observe that 17 tickets out of 21 are of HIGH priority

In [69]:
final_data['ETA'].value_counts()

24 Hours             17
2-3 Business Days     4
Name: ETA, dtype: int64

In [70]:
final_data.groupby(['Priority', 'Category']).support_tick_id.count()   # complete the code to check the distribution of priority by categories

Priority  Category        
High      Data Recovery       7
          Hardware Issues     5
          Technical Issues    5
Medium    Battery Issue       1
          General Inquiry     1
          Technical Issues    2
Name: support_tick_id, dtype: int64

In [71]:
final_data.groupby(['ETA', 'Category']).support_tick_id.count()   # complete the code to check the distribution of ETA by categories

ETA                Category        
2-3 Business Days  Battery Issue       1
                   General Inquiry     1
                   Technical Issues    2
24 Hours           Data Recovery       7
                   Hardware Issues     5
                   Technical Issues    5
Name: support_tick_id, dtype: int64

**Observations**

We used an LLM to do multiple tasks, one stage at a time

We first identified the category of the support tickets.
Next, in addition to identifying the category, we generated tags for it.
Once tags are assigned, we tried to add priority and ETA based on the ticket description.
Finally, in addition to identifying the category, tags, ETA and priority, we also generated a response for the support ticket.

From the above summary of output, we observe that
1. Tickets needs to be immediately addressed as there are no low priority tickets.
2. Out of 21 tickets, 17 tickets are of high priority and needs to be addressed within 24 hours.

**Recommendations**

To try and improve the model output, one can try the following:

1. Update the prompt with few more categories/options
2. Update the model parameters (temparature, top_p, ...)