# 🎯 TicketCluster: Unsupervised Customer Support Ticket Intelligence

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/RansiluRanasinghe/TicketCluster-Unsupervised-ML/blob/main/notebook.ipynb)

---

## 📌 Project Overview

This notebook implements **TicketCluster**, an unsupervised NLP system that analyzes and groups customer support tickets using **K-Means clustering**.

**Goal:** Simulate how real organizations explore unlabeled support tickets to identify recurring issues, emerging patterns, and operational insights, without predefined categories.

---

## 🎯 Workflow

1. **Minimal, production-oriented text preprocessing**
2. **Vectorization** of ticket content (Subject + Body)
3. **Unsupervised clustering** to discover natural groupings
4. **Cluster interpretation** from a business perspective

This notebook emphasizes **model reasoning and reliability** over aggressive tuning, reflecting real customer-support analytics pipelines.

---

**Author:** Ransilu Ranasinghe | [GitHub](https://github.com/RansiluRanasinghe) | [LinkedIn](https://www.linkedin.com/in/ransilu-ranasinghe-a596792ba)

---

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

### Loading the Dataset

In [2]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("tobiasbueck/multilingual-customer-support-tickets")

print("Path to dataset files:", path)

Downloading from https://www.kaggle.com/api/v1/datasets/download/tobiasbueck/multilingual-customer-support-tickets?dataset_version_number=14...
100%|██████████| 16.1M/16.1M [00:00<00:00, 217MB/s]
Extracting files...
Path to dataset files: /root/.cache/kagglehub/datasets/tobiasbueck/multilingual-customer-support-tickets/versions/14


In [3]:
import os

files = os.listdir(path)

print("Available files: ")
for file in files:
    print(" - ", file)

Available files: 
 -  dataset-tickets-german_normalized.csv
 -  dataset-tickets-german_normalized_50_5_2.csv
 -  aa_dataset-tickets-multi-lang-5-2-50-version.csv
 -  dataset-tickets-multi-lang-4-20k.csv
 -  dataset-tickets-multi-lang3-4k.csv


In [4]:
dataset_path = os.path.join(path, "dataset-tickets-multi-lang-4-20k.csv")

df = pd.read_csv(dataset_path)
print("Shape:", df.shape)
print("Columns:", len(df.columns))

Shape: (20000, 15)
Columns: 15
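
The file name above is hard-coded, so a changed dataset version could make `pd.read_csv` fail with a bare path error. A minimal sketch of a guard follows; `find_dataset_file` is a hypothetical helper introduced here, and the demo runs against a temporary directory rather than the real download path.

```python
import tempfile
from pathlib import Path


def find_dataset_file(directory, filename):
    """Return the path to `filename` inside `directory`, or raise a
    FileNotFoundError that lists the CSV files actually present."""
    candidate = Path(directory) / filename
    if not candidate.is_file():
        available = sorted(p.name for p in Path(directory).glob("*.csv"))
        raise FileNotFoundError(
            f"{filename!r} not found; available CSV files: {available}"
        )
    return candidate


# Demo against a throwaway directory instead of the real download path.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "dataset-tickets-multi-lang-4-20k.csv").write_text("subject,body\n")
    found = find_dataset_file(tmp, "dataset-tickets-multi-lang-4-20k.csv")
    found_name = found.name

print(found_name)
```

In the notebook, the call would take the `path` returned by `kagglehub.dataset_download` as its first argument.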


In [5]:
display(df.head())

| | subject | body | answer | type | queue | priority | language | tag_1 | tag_2 | tag_3 | tag_4 | tag_5 | tag_6 | tag_7 | tag_8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Unvorhergesehener Absturz der Datenanalyse-Pla... | Die Datenanalyse-Plattform brach unerwartet ab... | Ich werde Ihnen bei der Lösung des Problems he... | Incident | General Inquiry | low | de | Crash | Technical | Bug | Hardware | Resolution | Outage | Documentation | |
| 1 | Customer Support Inquiry | Seeking information on digital strategies that... | We offer a variety of digital strategies and s... | Request | Customer Service | medium | en | Feedback | Sales | IT | Tech Support | | | | |
| 2 | Data Analytics for Investment | I am contacting you to request information on ... | I am here to assist you with data analytics to... | Request | Customer Service | medium | en | Technical | Product | Guidance | Documentation | Performance | Feature | | |
| 3 | Krankenhaus-Dienstleistung-Problem | Ein Medien-Daten-Sperrverhalten trat aufgrund ... | Zurück zur E-Mail-Beschwerde über den Sperrver... | Incident | Customer Service | high | de | Security | Breach | Login | Maintenance | Incident | Resolution | Feedback | |
| 4 | Security | Dear Customer Support, I am reaching out to in... | Dear [name], we take the security of medical d... | Request | Customer Service | medium | en | Security | Customer | Compliance | Breach | Documentation | Guidance | | |
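
The preview mixes German and English tickets and leaves the later tag columns empty for some rows. The sketch below shows one way to quantify both; the small frame copies only the `language` and `tag_5` to `tag_8` values from the five rows above, on the assumption that the empty cells are missing values, so the numbers describe this preview and not the full 20,000-row dataset.

```python
import pandas as pd

# Values copied from the five preview rows; empty cells become None.
preview = pd.DataFrame({
    "language": ["de", "en", "en", "de", "en"],
    "tag_5": ["Resolution", None, "Performance", "Incident", "Documentation"],
    "tag_6": ["Outage", None, "Feature", "Resolution", "Guidance"],
    "tag_7": ["Documentation", None, None, "Feedback", None],
    "tag_8": [None, None, None, None, None],
})

# Number of tickets per language code.
language_counts = preview["language"].value_counts()

# Share of missing values in each column (0.0 = complete, 1.0 = entirely empty).
missing_share = preview.isna().mean()

print(language_counts)
print(missing_share)
```

Running the same two calls on `df` would show whether a single-language subset is large enough to cluster on its own and which tag columns are too sparse to rely on.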


In [6]:
print(df.dtypes)

subject     object
body        object
answer      object
type        object
queue       object
priority    object
language    object
tag_1       object
tag_2       object
tag_3       object
tag_4       object
tag_5       object
tag_6       object
tag_7       object
tag_8       object
dtype: object
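
Every column is stored as `object`, so the text fields may contain missing values alongside strings. Since the workflow vectorizes Subject + Body together, one possible way to build that combined field is sketched below; `build_ticket_text` and the `text` column name are hypothetical, and the sample rows are invented for illustration.

```python
import pandas as pd


def build_ticket_text(frame):
    """Join subject and body into one string per ticket, treating
    missing values as empty strings."""
    subject = frame["subject"].fillna("")
    body = frame["body"].fillna("")
    return (subject + " " + body).str.strip()


# Invented sample rows; the second ticket has no subject.
sample = pd.DataFrame({
    "subject": ["Security", None],
    "body": [
        "Dear Customer Support, I am reaching out",
        "Seeking information on digital strategies",
    ],
})
sample["text"] = build_ticket_text(sample)

print(sample["text"].tolist())
```

Filling missing values before concatenation matters because a missing subject would otherwise propagate and blank out the whole combined string.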
