# Support Ticket Classification & Prioritization
### Machine Learning Task 2 (2026)

In this notebook, we build a machine learning system that automatically classifies customer support tickets by category and priority. This helps businesses:
- **Respond Faster**: Categorized tickets are routed to the right team without manual triage.
- **Reduce Backlog**: Automated sorting cuts the manual effort spent on each ticket.
- **Improve Satisfaction**: High-priority issues are addressed first.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
import string
import re
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.preprocessing import LabelEncoder
import joblib

# Pre-download NLP assets
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('punkt_tab')

## 1. Data Loading & Exploratory Data Analysis (EDA)
We start by loading the dataset and examining its structure.

In [None]:
df = pd.read_csv('customer_support_tickets.csv')
print(f"Total Tickets: {df.shape[0]}")
df[['Ticket Subject', 'Ticket Description', 'Ticket Type', 'Ticket Priority']].head()
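
Part of examining the structure is checking how the labels are distributed, since heavily skewed classes make overall accuracy misleading. The following is a minimal sketch of that check; `sample_labels` and `class_shares` are hypothetical names, and the label values are made up for illustration rather than taken from the dataset.

```python
from collections import Counter

# Hypothetical labels for illustration only; not taken from the dataset.
sample_labels = ["Billing", "Technical", "Technical", "Refund", "Technical", "Billing"]

def class_shares(labels):
    """Return each label's share of the total, most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}

shares = class_shares(sample_labels)
for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```

On the real data, `df['Ticket Type'].value_counts(normalize=True)` (and the same call on `df['Ticket Priority']`) should report the equivalent shares.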

## 2. Text Preprocessing
Raw text is messy. We need to clean it by:
- Lowercasing
- Removing punctuation and numbers
- Removing 'stopwords' (common words like 'the', 'is' that don't add meaning)
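- **Lemmatization**: Reducing words to their base form (e.g., 'tickets' -> 'ticket'). Note that `WordNetLemmatizer.lemmatize` treats words as nouns by default, so a verb form such as 'charging' is only reduced to 'charge' when `pos='v'` is passed.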
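
One subtlety in the order of these steps: once punctuation is stripped, a contraction such as "don't" becomes "dont", which no longer matches a stop-word list that stores it with the apostrophe. The sketch below illustrates the effect and one possible way around it. It assumes a tiny hypothetical stop-word list and plain whitespace splitting instead of NLTK's list and `word_tokenize`, so it only approximates the notebook's pipeline.

```python
import string

# Tiny hypothetical stop-word list for illustration; NLTK's English list is much longer.
raw_stop_words = {"the", "is", "i", "don't"}
punct_table = str.maketrans("", "", string.punctuation)

def strip_and_filter(text, stop_words):
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    tokens = text.lower().translate(punct_table).split()
    return [tok for tok in tokens if tok not in stop_words]

sentence = "I don't see the charge."

# Unnormalized list: "dont" no longer matches "don't", so it survives filtering.
naive = strip_and_filter(sentence, raw_stop_words)

# Stripping punctuation from the stop words as well closes the gap.
normalized_stop_words = {w.translate(punct_table) for w in raw_stop_words}
fixed = strip_and_filter(sentence, normalized_stop_words)

print(naive)
print(fixed)
```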

In [None]:
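# Strip punctuation from the stop words too, so contractions such as "don't"
# still match after clean_text() has removed the apostrophe ("dont").
punct_table = str.maketrans('', '', string.punctuation)
stop_words = {word.translate(punct_table) for word in stopwords.words('english')}
lemmatizer = WordNetLemmatizer()

def clean_text(text):
    """Lowercase, strip punctuation and digits, drop stop words, lemmatize."""
    if pd.isna(text):
        return ""
    text = str(text).lower()
    text = text.translate(punct_table)
    text = re.sub(r'\d+', '', text)
    tokens = word_tokenize(text)
    # lemmatize() defaults to noun POS, e.g. 'tickets' -> 'ticket'
    tokens = [lemmatizer.lemmatize(word) for word in tokens if word not in stop_words]
    return " ".join(tokens)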

# Combine Subject and Description for better context
df['combined_text'] = df['Ticket Subject'].fillna('') + " " + df['Ticket Description'].fillna('')
df['cleaned_text'] = df['combined_text'].apply(clean_text)

print("Sample Cleaned Text:")
print(df['cleaned_text'].iloc[0])

## 3. Feature Extraction & Model Training
We use **TF-IDF** to convert the cleaned text into numeric feature vectors and train one **Random Forest** classifier per target: ticket category and ticket priority.
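
To make the weighting concrete, here is a plain-Python sketch of how a TF-IDF vector can be computed. It assumes the smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, and L2 row normalization that scikit-learn documents as the `TfidfVectorizer` defaults. The toy corpus and the `tfidf_vectors` helper are hypothetical, and only unigrams are used, whereas the notebook also includes bigrams.

```python
import math
from collections import Counter

# Toy corpus for illustration only; not taken from the dataset.
docs = [
    "login error password reset",
    "refund request billing error",
    "password reset link broken",
]

def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using smoothed IDF and L2-normalized rows."""
    tokenized = [doc.split() for doc in documents]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row))
        vectors.append([v / norm for v in row])
    return vocab, vectors

vocab, vectors = tfidf_vectors(docs)
weights = dict(zip(vocab, vectors[0]))
for term in docs[0].split():
    print(f"{term}: {weights[term]:.3f}")
```

In the first toy document, "login" appears in no other document and therefore receives a higher weight than "error", which is shared with the second document.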

In [None]:
# Encode both targets as integers
le_type = LabelEncoder()
y_type = le_type.fit_transform(df['Ticket Type'])
le_prio = LabelEncoder()
y_prio = le_prio.fit_transform(df['Ticket Priority'])

# Split once so both models share the same train/test rows
text_train, text_test, y_train_t, y_test_t, y_train_p, y_test_p = train_test_split(
    df['cleaned_text'], y_type, y_prio, test_size=0.2, random_state=42)

# Vectorization: fit TF-IDF on the training text only, so the test set
# does not influence the vocabulary or the IDF weights
tfidf = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
X_train_t = X_train_p = tfidf.fit_transform(text_train)
X_test_t = X_test_p = tfidf.transform(text_test)

# Train the category model
model_type = RandomForestClassifier(n_estimators=100, random_state=42)
model_type.fit(X_train_t, y_train_t)

# Train the priority model
model_prio = RandomForestClassifier(n_estimators=100, random_state=42)
model_prio.fit(X_train_p, y_train_p)

## 4. Evaluation & Visualizations
We visualize each model's performance on the held-out test set with a **confusion matrix** and print a per-class classification report.
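
As a reading aid, the sketch below builds a confusion matrix by hand and derives precision and recall for one class from it. The labels and predictions are hypothetical, and `build_confusion_matrix` and `precision_recall` are illustrative helpers rather than part of the notebook. Rows are actual classes and columns are predicted classes, matching the axis labels used in the plots.

```python
# Hypothetical true and predicted labels for illustration only.
labels = ["Low", "Medium", "High"]
y_true = ["High", "High", "Low", "Medium", "Low", "High"]
y_pred = ["High", "Low", "Low", "Medium", "Medium", "High"]

def build_confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix

def precision_recall(matrix, i):
    """Precision and recall for class i, read off the matrix."""
    true_pos = matrix[i][i]
    predicted_as_i = sum(row[i] for row in matrix)  # column sum
    actually_i = sum(matrix[i])                     # row sum
    precision = true_pos / predicted_as_i if predicted_as_i else 0.0
    recall = true_pos / actually_i if actually_i else 0.0
    return precision, recall

cm = build_confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(label, row)

high_precision, high_recall = precision_recall(cm, labels.index("High"))
print(f"High: precision={high_precision:.2f}, recall={high_recall:.2f}")
```

Off-diagonal cells are the errors: in this toy example one "High" ticket was predicted as "Low", which lowers recall for "High" without affecting its precision.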

In [None]:
def plot_evaluation(model, X_test, y_test, classes, title):
    y_pred = model.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    
    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True, fmt='d', xticklabels=classes, yticklabels=classes, cmap='Blues')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.title(f'Confusion Matrix: {title}')
    plt.show()
    
    print(f"\nClassification Report for {title}:")
    print(classification_report(y_test, y_pred, target_names=classes))

plot_evaluation(model_type, X_test_t, y_test_t, le_type.classes_, "Ticket Category")
plot_evaluation(model_prio, X_test_p, y_test_p, le_prio.classes_, "Ticket Priority")

## 5. Inference: Testing the System
Let's see how the model handles a new, unseen ticket.

In [None]:
def predict_ticket(subject, description):
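    # Treat a missing (None) subject or description as empty, mirroring fillna('') in training
    text = clean_text(f"{subject or ''} {description or ''}")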
    vec = tfidf.transform([text])
    
    cat = le_type.inverse_transform(model_type.predict(vec))[0]
    prio = le_prio.inverse_transform(model_prio.predict(vec))[0]
    
    return f"Category: {cat} | Priority: {prio}"

new_subject = "Login error"
new_desc = "I cannot access my account because the password reset link is not working."
print(f"Ticket: {new_subject}")
print(predict_ticket(new_subject, new_desc))