# Service‑Ticket Sentiment & Topic Tagging
This notebook walks through generating a synthetic dataset, training two classical ML models (Logistic Regression for **sentiment**, k‑NN for multi‑label **topic** tagging), evaluating them, saving artefacts, and exposing a simple Flask REST endpoint.

In [64]:
import random, pickle, os, json
from pathlib import Path

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

DATA_DIR = Path.cwd()


## 1. Generate a synthetic dataset

In [67]:

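def generate_mock_dataset(n: int = 5000, seed: int = 42) -> pd.DataFrame:
    """Build n synthetic tickets, each with one topic and one sentiment label."""
    random.seed(seed)  # seed the global RNG so the dataset is reproducible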
    topics_examples = {
        "billing": [
            "My invoice is incorrect",
            "Need refund for over‑charge",
            "Please update my billing address",
        ],
        "technical": [
            "App crashes on start‑up",
            "Cannot connect to server",
            "Error 500 while uploading a file",
        ],
        "account": [
            "Password‑reset link not working",
            "I can't update my profile details",
            "Account locked after failed attempts",
        ],
        "service": [
            "Internet has been down since morning",
            "Pages load very slowly",
            "Service unavailable in my area",
        ],
        "feature": [
            "Please add a dark‑mode option",
            "Requesting integration with Slack",
            "Need an export‑to‑CSV feature",
        ],
    }
    pos_templates = [
        "Thanks for the quick resolution of {}.",
        "Great job handling {}.",
        "Appreciate the support on {}."
    ]
    neg_templates = [
        "I'm frustrated because {}!",
        "Still waiting for a fix on {}.",
        "This is unacceptable: {}."
    ]
    records = []
    for _ in range(n):
        topic = random.choice(list(topics_examples))
        base = random.choice(topics_examples[topic])
        if random.random() < 0.5:
            sentiment = "negative"
            text = random.choice(neg_templates).format(base.lower())
        else:
            sentiment = "positive"
            text = random.choice(pos_templates).format(base.lower())
        records.append({"ticket_text": text, "sentiment": sentiment, "topic": topic})
    return pd.DataFrame.from_records(records)

df = generate_mock_dataset()
df.head()


|   | ticket_text | sentiment | topic |
|---|---|---|---|
| 0 | Thanks for the quick resolution of my invoice ... | positive | billing |
| 1 | Appreciate the support on app crashes on start... | positive | technical |
| 2 | Thanks for the quick resolution of please add ... | positive | feature |
| 3 | This is unacceptable: my invoice is incorrect. | negative | billing |
| 4 | Appreciate the support on please add a dark‑mo... | positive | feature |


## 2. Vectorise text with TF‑IDF

In [70]:

tfidf = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
X = tfidf.fit_transform(df["ticket_text"])


## 3. Train a Logistic Regression classifier for sentiment

In [73]:

y_sent = df["sentiment"]
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y_sent, test_size=0.2, stratify=y_sent, random_state=42
)
sent_clf = LogisticRegression(max_iter=1000)
sent_clf.fit(X_train_s, y_train_s)
sent_acc = accuracy_score(y_test_s, sent_clf.predict(X_test_s))
print(f"Sentiment accuracy: {sent_acc:.3f}")


Sentiment accuracy: 1.000
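
A perfect score is expected on this data rather than evidence of a strong model. The generator can only produce 15 base phrases × 6 templates = 90 distinct texts, so with 5000 rows virtually every test ticket also appears verbatim in the training split, and the template wording alone determines the sentiment. One way to remove the exact-duplicate leakage between splits is to deduplicate before splitting. The sketch below shows this on a small hypothetical frame (`toy_df` and its rows are made up for illustration).

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical miniature dataset with repeated texts, for illustration only.
toy_df = pd.DataFrame({
    "ticket_text": ["great job handling a.", "this is unacceptable: b."] * 10
                   + ["great job handling c.", "this is unacceptable: d.",
                      "great job handling e.", "this is unacceptable: f."],
    "sentiment": ["positive", "negative"] * 10
                 + ["positive", "negative", "positive", "negative"],
})

# Keep one row per distinct text so no ticket can land in both splits.
unique_df = toy_df.drop_duplicates(subset="ticket_text").reset_index(drop=True)

train_df, test_df = train_test_split(unique_df, test_size=0.5, random_state=42)

# Texts shared by both splits; empty once duplicates are removed.
overlap = set(train_df["ticket_text"]) & set(test_df["ticket_text"])
print(len(toy_df), "rows ->", len(unique_df), "unique texts; overlap:", len(overlap))
```

Deduplication addresses only the repeated rows. Because the sentiment templates themselves are shared across all tickets, scores on this synthetic data would likely remain very high either way.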


## 4. Train a k‑NN classifier for multi‑label topic tagging

In [76]:

mlb = MultiLabelBinarizer()
y_topic = mlb.fit_transform([[t] for t in df["topic"]])
X_train_t, X_test_t, y_train_t, y_test_t = train_test_split(
    X, y_topic, test_size=0.2, random_state=42
)
topic_clf = KNeighborsClassifier(n_neighbors=5)
topic_clf.fit(X_train_t, y_train_t)
topic_f1 = f1_score(y_test_t, topic_clf.predict(X_test_t), average="micro")
print(f"Topic micro‑F1: {topic_f1:.3f}")


Topic micro‑F1: 1.000
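
Because every synthetic ticket carries exactly one topic, the indicator matrix built above has a single 1 per row. The sketch below shows how `MultiLabelBinarizer` encodes and decodes label lists, including a hypothetical two-topic ticket that the generator above never produces. The `labels` list and `toy_mlb` name are assumptions for illustration.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Label lists per ticket; the last one is a hypothetical multi-topic case.
labels = [["billing"], ["technical"], ["account", "billing"]]

toy_mlb = MultiLabelBinarizer()
indicator = toy_mlb.fit_transform(labels)

print(toy_mlb.classes_)  # columns are the sorted label names
print(indicator)         # one row per ticket, one column per label

# inverse_transform maps indicator rows back to tuples of label names.
decoded = toy_mlb.inverse_transform(indicator)
print(decoded)
```

This is the same round trip the notebook relies on at inference time, where the classifier's predicted indicator row is decoded back into topic names.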


## 5. Save trained artefacts

In [79]:

os.makedirs(DATA_DIR, exist_ok=True)
df.to_csv(DATA_DIR / "mock_tickets.csv", index=False)

# Write each artefact inside a context manager so the file is flushed and closed.
artefacts = {
    "tfidf_vectorizer.pkl": tfidf,
    "sentiment_model.pkl": sent_clf,
    "topic_model.pkl": topic_clf,
    "topic_mlb.pkl": mlb,
}
for filename, obj in artefacts.items():
    with open(DATA_DIR / filename, "wb") as fh:
        pickle.dump(obj, fh)
print("Artefacts saved to", DATA_DIR)


Artefacts saved to /Users/jay/Desktop/Service_Ticket Sentiment_and_Topic_Tagging
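
For completeness, here is a sketch of reading a pickled artefact back, as a serving process would do at start-up. It uses a throwaway vectoriser and a temporary directory so that it runs on its own; the helper name `load_artefact` is an assumption, not part of the notebook. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer


def load_artefact(path: Path):
    """Hypothetical helper: unpickle one artefact and close the file."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in for the fitted vectoriser saved above.
original = TfidfVectorizer().fit(["my invoice is incorrect", "app crashes on start"])

with tempfile.TemporaryDirectory() as tmp:
    artefact_path = Path(tmp) / "tfidf_vectorizer.pkl"
    with open(artefact_path, "wb") as fh:
        pickle.dump(original, fh)
    restored = load_artefact(artefact_path)

# The restored object should reproduce the original's features exactly.
sample = ["invoice crashes"]
same = bool((original.transform(sample).toarray()
             == restored.transform(sample).toarray()).all())
print("Round trip preserved the vectoriser:", same)
```

Pickled scikit-learn objects are generally tied to the library version that wrote them, so the serving environment should pin the same scikit-learn version used for training.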


## 6. Inference helper

In [90]:

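def predict(text: str) -> dict:
    """Return the predicted sentiment and topic list for a single ticket."""
    # Use a local name so the global training matrix X is not shadowed.
    features = tfidf.transform([text])
    sentiment = str(sent_clf.predict(features)[0])
    topics = mlb.inverse_transform(topic_clf.predict(features))[0]
    return {"sentiment": sentiment, "topics": list(topics)}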

predict("Still waiting for a fix on my billing address")


{'sentiment': 'negative', 'topics': ['billing']}
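
The introduction mentions exposing the models through a Flask REST endpoint. A minimal sketch of what that could look like is shown below. The route name `/predict`, the JSON field `text`, and the stand-in `predict` function are assumptions made so the block runs on its own; in the notebook, the `predict` helper defined above would be used instead.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(text: str) -> dict:
    """Stand-in for the notebook's predict(); returns fixed labels."""
    return {"sentiment": "negative", "topics": ["billing"]}


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # silent=True yields None instead of raising on a missing or invalid JSON body.
    payload = request.get_json(silent=True)
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text.strip():
        return jsonify({"error": "JSON body must contain a non-empty 'text' string"}), 400
    return jsonify(predict(text))


# Exercise the route in-process with Flask's test client (no server needed).
with app.test_client() as client:
    ok = client.post("/predict", json={"text": "Still waiting for a fix on my billing address"})
    bad = client.post("/predict", json={})
    print(ok.status_code, ok.get_json())
    print(bad.status_code, bad.get_json())
```

Loading the pickled artefacts once at import time, rather than inside the request handler, would keep per-request latency low. For anything beyond local experimentation, the app should run behind a production WSGI server rather than Flask's development server.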