# Task 6

# Auto-Tagging Support Tickets Using an LLM

# Objective

The objective of this task was to build an intelligent system that can automatically classify customer support tickets into predefined categories using a Large Language Model (LLM). This reduces the need for manual tagging, improves efficiency in handling support requests, and demonstrates how LLMs can be applied to practical customer service tasks.

# Introduction

This task involved working with a small free-text support ticket dataset covering billing, login problems, technical errors, account management, and general inquiries. The challenge was to design an approach that uses the natural language understanding capabilities of LLMs to assign meaningful tags without manual feature engineering. The implementation used OpenRouter’s API to call a GPT-based model (`openai/gpt-3.5-turbo`) and classify tickets directly from prompts. Both zero-shot and few-shot approaches were tested: the zero-shot prompt contained only the instructions and the ticket, while the few-shot prompt also included a handful of labeled example tickets to guide the model toward more accurate classification.

# Overview

The process began by defining the five target categories for classification: Billing Issue, Login Problem, Technical Error, Account Management, and General Inquiry. For zero-shot classification, each support ticket was passed directly to the LLM with instructions to choose the top three categories. For few-shot classification, five example tickets paired with their correct categories (one per category) were added to the prompt, giving the model in-context guidance before it classified new tickets. Both approaches returned ranked categories with confidence-like scores, and the top-ranked category was taken as the predicted label. These predictions were compared against the true labels, and accuracy and weighted F1 scores were calculated for each approach.

The zero-shot model achieved an accuracy of 0.80 with a weighted F1 score of 0.73, while the few-shot model achieved an accuracy of 1.00 with a weighted F1 score of 1.00. Because the dataset was small and custom (five tickets, one per category), the gap between the two approaches comes down to a single ticket, so these figures are indicative rather than conclusive. With a larger and more diverse dataset, the benefit of few-shot prompting could be measured more reliably.
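The reported zero-shot figures can be checked by hand. The sketch below recomputes accuracy and weighted F1 in plain Python from the top predictions printed in the zero-shot output of this notebook. The helper names `accuracy` and `weighted_f1` are illustrative and not part of the original code, which relies on scikit-learn's `accuracy_score` and `f1_score(average="weighted")` instead.

```python
from collections import Counter

# True labels and top-ranked zero-shot predictions, copied from the printed output.
true_labels = ["Billing Issue", "Login Problem", "Technical Error",
               "Account Management", "General Inquiry"]
zero_shot_preds = ["Billing Issue", "Login Problem", "Technical Error",
                   "Account Management", "Account Management"]

def accuracy(y_true, y_pred):
    """Fraction of tickets whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's true count."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)

print(round(accuracy(true_labels, zero_shot_preds), 2))     # 0.8
print(round(weighted_f1(true_labels, zero_shot_preds), 2))  # 0.73
```

The only miss is the trial-expiry ticket, tagged Account Management instead of General Inquiry. That gives Account Management a precision of 0.5 (F1 of about 0.67) and General Inquiry an F1 of 0, which pulls the weighted average down to 0.73.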

# Evaluation

The evaluation suggests that LLMs can be effective for text classification tasks such as ticket tagging, even with little to no task-specific training. Zero-shot prompting classified four of the five tickets correctly without any examples, and the few-shot prompt classified all five by adding contextual guidance. The results also show the flexibility of LLMs for multi-class prediction and ranking: the model returned not only the top category but also the next most relevant categories with associated scores. Those scores were not on a consistent scale, however (some responses used 0 to 1, others 0 to 100), so they are best read as a rough ranking signal rather than calibrated probabilities.
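Because the ranked responses arrive as free text, with score scales that vary between responses, a small parser helps before the scores are used for anything beyond picking the top label. The following is a minimal sketch, not the notebook's own method: `parse_ranked` is a hypothetical helper, and the rule "any score above 1 means the response used a 0 to 100 scale" is a heuristic assumption.

```python
import re

# Matches "Category Name (score)" pairs; tolerates extra spaces and a trailing period.
PAIR = re.compile(r"([A-Za-z ]+?)\s*\(\s*([0-9]*\.?[0-9]+)\s*\)")

def parse_ranked(raw):
    """Turn 'A (0.9), B (0.7), C (0.3)' into [(name, score), ...] with scores in [0, 1]."""
    pairs = [(name.strip(), float(score)) for name, score in PAIR.findall(raw)]
    # Heuristic (assumption): a score above 1 means the model answered on a 0-100 scale.
    if any(score > 1 for _, score in pairs):
        pairs = [(name, score / 100) for name, score in pairs]
    return pairs

print(parse_ranked("Login Problem (90), Technical Error (60), Account Management (30)"))
print(parse_ranked("Account Management (0.8), General Inquiry (0.5), Login Problem (0.3)."))
```

Both calls return three `(category, score)` pairs on the same 0 to 1 scale, so responses that used different scales become comparable.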

# Summary

This task demonstrated how Large Language Models can be applied to auto-tagging support tickets, offering a scalable and efficient approach for customer support systems. Comparing zero-shot and few-shot prompting showed that LLMs can adapt to classification tasks and provide accurate predictions with minimal setup. The system is a proof of concept rather than a finished product: it integrates with an LLM through an API and returns ranked predictions in a `Category (score)` text format that downstream code can parse, but it has so far been evaluated on only five tickets.
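To make the predictions reusable by other systems, the ranked pairs can be converted into a JSON-serializable record. This is a sketch under assumptions: the function `to_record` and the field names `ticket`, `label`, and `candidates` are hypothetical, and the input is assumed to be a list of `(category, score)` pairs already parsed from the model's response, best first.

```python
import json

CATEGORIES = ["Billing Issue", "Login Problem", "Technical Error",
              "Account Management", "General Inquiry"]

def to_record(ticket, ranked):
    """Build a dict for one ticket, dropping any category outside the allowed set."""
    valid = [(cat, score) for cat, score in ranked if cat in CATEGORIES]
    return {
        "ticket": ticket,
        # Top-ranked valid category, or None if the model returned nothing usable.
        "label": valid[0][0] if valid else None,
        "candidates": [{"category": cat, "score": score} for cat, score in valid],
    }

record = to_record(
    "I was charged twice for my subscription this month.",
    [("Billing Issue", 0.9), ("Account Management", 0.7), ("General Inquiry", 0.3)],
)
print(json.dumps(record, indent=2))
```

Filtering against `CATEGORIES` guards against the model inventing a label that is not in the predefined list, which a plain string split would silently accept.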

In [9]:
import os

from openai import OpenAI
from sklearn.metrics import accuracy_score, f1_score

# Set up the OpenRouter client. The API key is read from the environment
# (OPENROUTER_API_KEY is an assumed variable name) rather than hard-coded.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Categories & Dataset (tuples: (ticket, label))
categories = ["Billing Issue", "Login Problem", "Technical Error", "Account Management", "General Inquiry"]

dataset = [
    ("I was charged twice for my subscription this month.", "Billing Issue"),
    ("I can't log into my account even after resetting my password.", "Login Problem"),
    ("The app crashes every time I upload a file.", "Technical Error"),
    ("How do I update my email address on the account?", "Account Management"),
    ("I just want to know when my trial will expire.", "General Inquiry"),
]

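# Helper: ask the LLM to rank categories for one ticket
def classify_ticket(ticket, examples=None):
    """Return the model's raw ranked-category string for `ticket`.

    `examples` is an optional list of (ticket_text, label) pairs. When given,
    they are added to the prompt (few-shot); otherwise the call is zero-shot.
    """
    prompt = "You are an AI assistant that classifies support tickets.\n"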
    if examples:
        prompt += "Here are some examples:\n"
        for ex_text, ex_label in examples:
            prompt += f"Ticket: {ex_text}\nCategory: {ex_label}\n"
    prompt += f"\nNow classify this ticket:\nTicket: {ticket}\n"
    prompt += f"Choose the best 3 categories from: {', '.join(categories)}.\n"
    prompt += "Return the answer as: Category1 (score), Category2 (score), Category3 (score)."

    response = client.chat.completions.create(
        model="openai/gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

# Zero-Shot Classification
print("\nZero-Shot Classification")
zero_shot_preds = []
true_labels = []

for ticket, label in dataset:
    result = classify_ticket(ticket)
    print(f"\nTicket: {ticket}\n  Predictions: {result}")
    top_prediction = result.split(",")[0].split("(")[0].strip()
    zero_shot_preds.append(top_prediction)
    true_labels.append(label)

# Few-Shot Classification
few_shot_examples = [
    ("Payment failed but money was deducted", "Billing Issue"),
    ("Can't log in even after resetting password", "Login Problem"),
    ("App crashes when saving files", "Technical Error"),
    ("Need to update profile information", "Account Management"),
    ("What are your support hours", "General Inquiry")
]

print("\nFew-Shot Classification")
few_shot_preds = []

for ticket, label in dataset:
    result = classify_ticket(ticket, examples=few_shot_examples)
    print(f"\nTicket: {ticket}\n  Predictions: {result}")
    top_prediction = result.split(",")[0].split("(")[0].strip()
    few_shot_preds.append(top_prediction)

# Performance Comparison
print("\nPerformance Comparison")
print("Zero-Shot Accuracy:", accuracy_score(true_labels, zero_shot_preds))
print("Zero-Shot F1 Score:", f1_score(true_labels, zero_shot_preds, average="weighted"))
print("Few-Shot Accuracy:", accuracy_score(true_labels, few_shot_preds))
print("Few-Shot F1 Score:", f1_score(true_labels, few_shot_preds, average="weighted"))
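One caveat when reading the few-shot score: a few-shot example that closely resembles an evaluation ticket can inflate the result. The sketch below checks word overlap between the few-shot examples and the evaluation tickets used in this notebook. The helpers `tokens` and `jaccard` and the 0.5 threshold are illustrative assumptions, not part of the original evaluation.

```python
import re

few_shot_texts = [
    "Payment failed but money was deducted",
    "Can't log in even after resetting password",
    "App crashes when saving files",
    "Need to update profile information",
    "What are your support hours",
]
eval_tickets = [
    "I was charged twice for my subscription this month.",
    "I can't log into my account even after resetting my password.",
    "The app crashes every time I upload a file.",
    "How do I update my email address on the account?",
    "I just want to know when my trial will expire.",
]

def tokens(text):
    """Lowercased set of words (apostrophes kept so "can't" stays one token)."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a, b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Flag example/ticket pairs that share at least half of their combined vocabulary.
flagged = [
    (ex, tk, round(jaccard(ex, tk), 2))
    for ex in few_shot_texts
    for tk in eval_tickets
    if jaccard(ex, tk) >= 0.5
]
for ex, tk, score in flagged:
    print(f"{score}: {ex!r} vs {tk!r}")
```

With these inputs, only the login example is flagged against the login ticket, which suggests that ticket is an easy case for the few-shot prompt. Drawing few-shot examples and evaluation tickets from separate pools would make the comparison fairer.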




Zero-Shot Classification

Ticket: I was charged twice for my subscription this month.
  Predictions: Billing Issue (0.9), Account Management (0.7), General Inquiry (0.3)

Ticket: I can't log into my account even after resetting my password.
  Predictions: Login Problem (90), Technical Error (60), Account Management (30)

Ticket: The app crashes every time I upload a file.
  Predictions: Technical Error (95), Account Management (5), General Inquiry (0)

Ticket: How do I update my email address on the account?
  Predictions: Account Management (0.8), General Inquiry (0.5), Login Problem (0.3).

Ticket: I just want to know when my trial will expire.
  Predictions: Account Management (0.9), General Inquiry (0.8), Billing Issue (0.2)

Few-Shot Classification

Ticket: I was charged twice for my subscription this month.
  Predictions: Billing Issue (1), Account Management (0), General Inquiry (0)

Ticket: I can't log into my account even after resetting my password.
  Predictions: Login Prob