# **Auto-Tagging Support Tickets Using LLMs**
### **Problem Statement**
Automatically classify support tickets into predefined categories using a large language model (LLM).
The goal is to reduce the manual effort of tagging support tickets and to compare how three approaches (zero-shot, few-shot, and a fine-tuned model) perform on the same dataset.

### **Objective**


* Automatically tag free-text support tickets into categories.
* Compare zero-shot, few-shot, and fine-tuned LLM approaches.
* Output the top 3 most probable categories per ticket.

# **Step 1: Install & Import**

```python
!pip install -q transformers datasets evaluate scikit-learn torch accelerate
```
```python
import torch
import numpy as np
import pandas as pd

from datasets import load_dataset
from transformers import (
    pipeline,
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments
)
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
```



# **Step 2: Load Support Ticket Dataset**

```python
# BANKING77: online-banking customer queries, each labeled with one of 77 intents
dataset = load_dataset("banking77")
```


# **Step 3: Select Labels and Map IDs**

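```python
# Label names provided by the dataset (list index = BANKING77 label id)
label_names = dataset["test"].features["label"].names

# Choose 8 target categories.
# Note: BANKING77 has no intents named "chargeback" or "transfer_failed"
# (the dataset's name is "failed_transfer"), so only 6 of these 8 labels
# match real tickets in the Step 4 filter.
SELECTED_LABELS = [
    "card_arrival",
    "card_payment_wrong_exchange_rate",
    "cash_withdrawal_charge",
    "chargeback",
    "card_payment_fee_charged",
    "transfer_failed",
    "passcode_forgotten",
    "request_refund"
]

# Map label names to contiguous ids 0..7 used by all three classifiers
label2id = {label: idx for idx, label in enumerate(SELECTED_LABELS)}
id2label = {idx: label for label, idx in label2id.items()}
```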
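A label name that does not exist in the dataset is silently dropped by the `isin` filter in Step 4, so it can help to check the selected names against the dataset first. A minimal sketch; `find_missing_labels` is a hypothetical helper, and the demo uses a short hand-written list of BANKING77 names rather than the real `label_names`:

```python
def find_missing_labels(selected, available):
    """Return the selected labels that do not appear in the dataset's label names."""
    available_set = set(available)
    return [label for label in selected if label not in available_set]


# Demo with a short, hand-written subset of BANKING77 label names
demo_names = ["card_arrival", "failed_transfer", "passcode_forgotten", "request_refund"]
demo_selected = ["card_arrival", "transfer_failed", "chargeback", "request_refund"]

missing = find_missing_labels(demo_selected, demo_names)
print("Labels not found in the dataset:", missing)
```

In the notebook the call would be `find_missing_labels(SELECTED_LABELS, label_names)`. For the eight labels chosen above it is expected to flag `chargeback` and `transfer_failed`, which is consistent with only 240 (6 x 40) test tickets surviving the filter.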

# **Step 4: Convert Dataset to DataFrame and Filter Labels**

```python
# Convert the test split to a DataFrame
df = pd.DataFrame(dataset["test"])

# Convert numeric BANKING77 label ids to label names
df["label_name"] = df["label"].apply(lambda x: label_names[x])

# Keep only tickets whose label is in SELECTED_LABELS
df = df[df["label_name"].isin(SELECTED_LABELS)]

# Shuffle and cap at 300 tickets; min() avoids an error when fewer remain
df = df.sample(n=min(300, len(df)), random_state=42)

# Map label names to the contiguous ids defined in Step 3
df["label_id"] = df["label_name"].map(label2id)

print("Dataset size after filtering:", len(df))
```

Dataset size after filtering: 240
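Step 4 replaces BANKING77's global label ids (0 to 76) with contiguous ids starting at 0, which is what a classification head with `num_labels=len(SELECTED_LABELS)` expects. A toy sketch of the same pandas operations on a hand-made frame; the rows, names, and ids below are illustrative and not taken from the dataset:

```python
import pandas as pd

# Illustrative stand-ins for the dataset's label names and the chosen subset
label_names = ["activate_my_card", "card_arrival", "request_refund", "top_up_failed"]
selected = ["card_arrival", "request_refund"]
label2id = {label: idx for idx, label in enumerate(selected)}

toy = pd.DataFrame({
    "text": ["where is my card", "refund please", "top-up failed", "card still missing"],
    "label": [1, 2, 3, 1],  # global ids that index into label_names
})
toy["label_name"] = toy["label"].apply(lambda x: label_names[x])
toy = toy[toy["label_name"].isin(selected)].copy()  # drops the top_up_failed row
toy["label_id"] = toy["label_name"].map(label2id)   # contiguous ids 0..len(selected)-1

print(toy[["label", "label_name", "label_id"]])
```

The `.copy()` after filtering makes the later column assignment operate on an independent frame rather than a slice of the original.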


# **Step 5: Zero-Shot Classification**

```python
zero_shot = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
    device=0 if torch.cuda.is_available() else -1
)

def zero_shot_predict(text, top_k=3):
    # multi_label=False: scores are normalized across the candidate labels
    out = zero_shot(
        text,
        candidate_labels=SELECTED_LABELS,
        multi_label=False
    )
    # out["labels"] is sorted by descending score
    top_labels = out["labels"][:top_k]
    return label2id[top_labels[0]], top_labels

# Each call returns (predicted id, top-k label list); split them into two columns
df["zs_pred_id"], df["zs_top3"] = zip(*df["text"].apply(zero_shot_predict))

print("✅ Zero-shot accuracy:",
      accuracy_score(df["label_id"], df["zs_pred_id"]))
```
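With `multi_label=False`, the zero-shot pipeline pairs the ticket with one hypothesis per candidate label (by default built from the template `"This example is {}."`), takes the entailment logit of each pair, and applies a softmax across the labels; `out["labels"]` is that ranking. A numpy sketch of only the ranking step, with made-up logits standing in for BART-MNLI's outputs:

```python
import numpy as np

def rank_labels(entail_logits, labels, top_k=3):
    """Softmax entailment logits across labels and return the top_k (label, prob) pairs."""
    logits = np.asarray(entail_logits, dtype=float)
    probs = np.exp(logits - logits.max())  # subtract the max for numerical stability
    probs /= probs.sum()
    order = np.argsort(-probs)[:top_k]     # indices of the highest probabilities
    return [(labels[i], float(probs[i])) for i in order]


# Made-up entailment logits, one per candidate label
labels = ["card_arrival", "request_refund", "chargeback", "passcode_forgotten"]
ranking = rank_labels([2.0, 0.5, 1.0, -1.0], labels)
for label, prob in ranking:
    print(f"{label}: {prob:.3f}")
```

Because the candidate labels are snake_case identifiers, the pipeline's `hypothesis_template` argument together with more readable label phrasing could be worth trying; whether it helps on this dataset is untested here.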

# **Step 6: Few-Shot Classification Using FLAN-T5**

```python
few_shot_llm = pipeline(
    "text2text-generation",
    model="google/flan-t5-small",
    device=0 if torch.cuda.is_available() else -1
)

# List every category in SELECTED_LABELS so the model can choose any of them
FEW_SHOT_PROMPT = """
Classify the support ticket into one of these categories:
{labels}

Examples:
Ticket: My card has not arrived
Category: card_arrival

Ticket: I was charged twice
Category: chargeback

Ticket: Transfer did not complete
Category: transfer_failed

Ticket: Forgot my login code
Category: passcode_forgotten

Ticket: I want my money back
Category: request_refund

Ticket: {text}
Category:
"""

# Label used when the model's output is not a known category
FALLBACK_LABEL = "request_refund"

def few_shot_predict(text):
    prompt = FEW_SHOT_PROMPT.format(labels=", ".join(SELECTED_LABELS), text=text)
    # max_new_tokens caps the length of the generated answer
    out = few_shot_llm(prompt, max_new_tokens=20)[0]["generated_text"]
    words = out.strip().split()
    # Guard against empty output, then fall back on unknown labels
    pred = words[0] if words else FALLBACK_LABEL
    if pred not in label2id:
        pred = FALLBACK_LABEL
    return label2id[pred]

df["fs_pred_id"] = df["text"].apply(few_shot_predict)

print("✅ Few-shot accuracy:",
      accuracy_score(df["label_id"], df["fs_pred_id"]))
```
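The exact-match check in `few_shot_predict` sends any slightly-off answer (different case, trailing punctuation, a truncated label) to the fallback category. A more tolerant parser is sketched below with the standard library's `difflib.get_close_matches`; `parse_label` and the 0.8 similarity cutoff are hypothetical choices, not part of the original notebook:

```python
import difflib

def parse_label(generated, labels, fallback, cutoff=0.8):
    """Map free-form model output to one of `labels`, or `fallback` if nothing is close."""
    text = generated.strip().lower()
    if not text:
        return fallback
    # Take the first whitespace-separated token and drop surrounding punctuation
    token = text.split()[0].strip(".,;:!?\"'")
    if token in labels:
        return token
    # Otherwise accept the most similar label if it clears the cutoff
    close = difflib.get_close_matches(token, labels, n=1, cutoff=cutoff)
    return close[0] if close else fallback


labels = ["card_arrival", "chargeback", "transfer_failed", "request_refund"]
print(parse_label("Card_Arrival.", labels, "request_refund"))  # → card_arrival
print(parse_label("transfer_fail", labels, "request_refund"))  # → transfer_failed
print(parse_label("no idea", labels, "request_refund"))        # → request_refund
```

In the notebook it could be called as `parse_label(out, SELECTED_LABELS, "request_refund")`. A fallback label still biases errors toward one class, so counting how often it fires is a useful diagnostic.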

# **Step 7: Fine-Tuning DistilBERT for Ticket Classification**

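```python
from datasets import Dataset

# Hold out 20% of the tickets for evaluation
train_df, test_df = train_test_split(
    df,
    test_size=0.2,
    random_state=42
)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # padding=True pads to the longest ticket in each map() batch
    return tokenizer(batch["text"], truncation=True, padding=True)

train_ds = train_df[["text", "label_id"]]
test_ds = test_df[["text", "label_id"]]

# preserve_index=False keeps the shuffled pandas index from becoming an extra column
train_ds = Dataset.from_pandas(train_ds, preserve_index=False).map(tokenize, batched=True)
test_ds = Dataset.from_pandas(test_ds, preserve_index=False).map(tokenize, batched=True)

# Trainer expects the target column to be named "labels"
train_ds = train_ds.rename_column("label_id", "labels")
test_ds = test_ds.rename_column("label_id", "labels")

train_ds.set_format("torch")
test_ds.set_format("torch")

# Passing the label maps stores readable label names in model.config
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=len(SELECTED_LABELS),
    id2label=id2label,
    label2id=label2id
)
```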
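With only a few hundred tickets, a plain random 80/20 split can leave some categories under-represented in the test set. `train_test_split` accepts a `stratify` argument that preserves class proportions; passing `stratify=df["label_id"]` in the cell above is one option, although it would produce a different split and therefore different results from those reported here. A toy check on synthetic labels (240 items over 6 balanced classes, chosen for illustration):

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Synthetic stand-in: 6 classes x 40 examples
labels = [c for c in range(6) for _ in range(40)]
texts = [f"ticket {i}" for i in range(len(labels))]

train_x, test_x, train_y, test_y = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

print(len(train_y), len(test_y))  # → 192 48
print(Counter(test_y))            # every class appears 8 times
```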

# **Step 8: Train the Model**

```python
training_args = TrainingArguments(
    output_dir="./ticket_model",
    eval_strategy="epoch",          # evaluate on eval_dataset after each epoch (formerly evaluation_strategy)
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    logging_steps=10,
    report_to="none"                # disable W&B/TensorBoard logging integrations
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=test_ds
)

trainer.train()
```



| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 1.955 | 1.645375 |
| 2 | 1.5799 | 1.190712 |
| 3 | 1.2453 | 1.014433 |




TrainOutput(global_step=36, training_loss=1.5038589239120483, metrics={'train_runtime': 261.239, 'train_samples_per_second': 2.205, 'train_steps_per_second': 0.138, 'total_flos': 10581965632512.0, 'train_loss': 1.5038589239120483, 'epoch': 3.0})
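The `global_step=36` above follows from the split and batch size, assuming a single device and no gradient accumulation (the `TrainingArguments` above set neither). scikit-learn rounds the 20% test share up, which leaves 192 training tickets, i.e. 12 batches of 16 per epoch over 3 epochs. As a quick arithmetic check:

```python
import math

n_total = 240      # tickets after filtering (Step 4 output)
test_size = 0.2
batch_size = 16    # per_device_train_batch_size, one device assumed
epochs = 3

n_test = math.ceil(test_size * n_total)  # train_test_split rounds the test share up
n_train = n_total - n_test
steps_per_epoch = math.ceil(n_train / batch_size)
global_steps = steps_per_epoch * epochs

print(n_train, n_test, steps_per_epoch, global_steps)  # → 192 48 12 36
```

The same numbers explain `train_samples_per_second`: 192 x 3 samples in about 261 s is roughly 2.2 samples per second.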

# **Step 9: Evaluate Fine-Tuned Model**

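```python
# Evaluate on the held-out 20% split
pred_out = trainer.predict(test_ds)
preds = pred_out.predictions.argmax(axis=1)
print("✅ Fine-tuned accuracy:",
      accuracy_score(pred_out.label_ids, preds))

# Zero-shot top-3 for a few tickets; `label` is the original BANKING77 id
# (11 = card_arrival, 15 = card_payment_fee_charged)
df[["text", "label", "zs_top3"]].head(5)
```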

✅ Fine-tuned accuracy: 0.9375


| index | text | label | zs_top3 |
|---|---|---|---|
| 24 | My card has not arrived yet, where is it? | 11 | [card_arrival, transfer_failed, chargeback] |
| 6 | Do you know if there is a tracking number for ... | 11 | [request_refund, card_arrival, card_payment_fe... |
| 813 | What is the fee charged with this card payment? | 15 | [card_payment_fee_charged, chargeback, request... |
| 829 | Someone needs to make me aware when there are ... | 15 | [chargeback, card_payment_fee_charged, card_pa... |
| 824 | So what items actually come with extra fees | 15 | [card_payment_fee_charged, chargeback, card_pa... |
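The objective asks for the top 3 categories per ticket, but only the zero-shot step produces them; the fine-tuned model's logits can be ranked the same way. A numpy sketch; `top_k_labels`, `top_k_accuracy`, and the toy logits are illustrative. In the notebook the inputs would be `trainer.predict(test_ds).predictions` and the `id2label` map from Step 3:

```python
import numpy as np

def top_k_labels(logits, id2label, k=3):
    """Return the k highest-scoring label names for each row of a logits matrix."""
    top_ids = np.argsort(-logits, axis=1)[:, :k]
    return [[id2label[int(i)] for i in row] for row in top_ids]

def top_k_accuracy(logits, true_ids, k=3):
    """Fraction of rows whose true id is among the k highest logits."""
    top_ids = np.argsort(-logits, axis=1)[:, :k]
    return float(np.mean([t in row for t, row in zip(true_ids, top_ids)]))


# Toy logits for 2 tickets over 4 labels
id2label = {0: "card_arrival", 1: "chargeback", 2: "request_refund", 3: "passcode_forgotten"}
logits = np.array([[2.0, 0.1, 1.5, -1.0],
                   [0.2, 0.3, 0.1, 3.0]])

print(top_k_labels(logits, id2label))
print(top_k_accuracy(logits, true_ids=[0, 1], k=1))  # → 0.5
print(top_k_accuracy(logits, true_ids=[0, 1], k=3))  # → 1.0
```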


# **Key Results / Observations**
**Zero-Shot Classification**

* Provides reasonable predictions without any training.
* Accuracy is lower than the fine-tuned model's, reflecting how fine-grained and domain-specific the banking categories are.

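**Few-Shot Classification**

* Can improve on zero-shot when the examples in the prompt are informative; with the small `google/flan-t5-small` model and examples for only five categories, results depend heavily on prompt design.
* Useful when the dataset is small or fine-tuning is not feasible.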

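**Fine-Tuned Model (DistilBERT)**

* Achieves the highest accuracy of the three approaches: 0.9375 (45 of 48 tickets) on the held-out 20% split after three epochs. The zero-shot and few-shot accuracies are computed over all 240 filtered tickets, so the evaluation sets are not identical.
* Learns domain-specific patterns and terminology.
* Requires a labeled dataset and training time (about 261 s for three epochs in this run).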
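To put all three approaches on the same evaluation rows, the zero- and few-shot predictions can be rescored on just the held-out tickets. A pandas sketch on a toy frame; `accuracy_on_rows` and the data are hypothetical:

```python
import pandas as pd
from sklearn.metrics import accuracy_score

def accuracy_on_rows(frame, row_index, pred_cols, true_col="label_id"):
    """Accuracy of each prediction column, restricted to the given row labels."""
    subset = frame.loc[row_index]
    return {col: accuracy_score(subset[true_col], subset[col]) for col in pred_cols}


# Toy frame: 4 tickets with predictions from two methods
toy = pd.DataFrame(
    {"label_id": [0, 1, 2, 1], "zs_pred_id": [0, 2, 2, 1], "fs_pred_id": [0, 1, 0, 1]},
    index=[10, 11, 12, 13],
)
held_out = [11, 13]  # stand-in for test_df.index

scores = accuracy_on_rows(toy, held_out, ["zs_pred_id", "fs_pred_id"])
print(scores)  # → {'zs_pred_id': 0.5, 'fs_pred_id': 1.0}
```

In the notebook the call would be `accuracy_on_rows(df, test_df.index, ["zs_pred_id", "fs_pred_id"])`, which works because `train_test_split` keeps the original DataFrame index on `test_df`.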

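**Top-3 Predictions**

* The zero-shot model outputs the top 3 labels per ticket, which helps with ambiguous tickets: in the five sample rows above, the correct category is ranked first for three tickets and appears in the top 3 for all five.
* The ranked list supports either automated tagging or human-in-the-loop review.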
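One way to turn the ranked list into a human-in-the-loop workflow is to apply the top label automatically only when the model is confident and otherwise queue the ticket for review with its top 3 suggestions. A sketch; `route_ticket`, the 0.6 threshold, and the example scores are hypothetical, and a real threshold would need tuning on held-out data:

```python
def route_ticket(labels, scores, threshold=0.6):
    """Auto-tag when the top score clears `threshold`; otherwise send the top 3 for review.

    `labels` and `scores` are assumed to be sorted by descending score, as in the
    zero-shot pipeline's `out["labels"]` and `out["scores"]`.
    """
    if scores and scores[0] >= threshold:
        return {"action": "auto_tag", "label": labels[0]}
    return {"action": "review", "suggestions": labels[:3]}


confident = route_ticket(["card_arrival", "transfer_failed", "chargeback"], [0.82, 0.10, 0.05])
uncertain = route_ticket(["request_refund", "card_arrival", "chargeback"], [0.41, 0.38, 0.12])
print(confident)  # → {'action': 'auto_tag', 'label': 'card_arrival'}
print(uncertain)  # → {'action': 'review', 'suggestions': ['request_refund', 'card_arrival', 'chargeback']}
```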

# **Insights & Conclusion**
* Fine-tuning transformer models is the most reliable approach for domain-specific ticket classification.
* Zero-shot and few-shot learning are useful when labeled data is scarce or for rapid prototyping.
* Combining LLM outputs (top-3 predictions) with human review can reduce manual effort while maintaining accuracy.
* The task demonstrates practical use of LLMs in support automation, prompt engineering, and multi-class text classification.