# 02 - Labeling Intent

This notebook is where we create our **Golden Dataset** — a high-quality, manually labeled sample of customer tweets, annotated by intent.

---

### Intent Categories

| Intent Label        | Description                                      |
|---------------------|--------------------------------------------------|
| `cancel_service`    | Customer wants to cancel account or switch plan  |
| `billing_issue`     | Complaints or questions about billing            |
| `technical_issue`   | Problems with device, network, or service        |
| `account_help`      | Issues with account management or login          |
| `upgrade_request`   | Requests to upgrade device or plan               |
| `general_question`  | General questions or inquiries                   |
| `positive_feedback` | Praise or compliments                            |
| `complaint`         | Negative feedback not related to billing or tech |
| `other`             | Anything else or unclear                         |

# Labeling and Logic of Categorization

In [1]:
import pandas as pd
import sys
import os


# Add the project root to sys.path so the `scripts` package can be imported from this notebook
project_root = os.path.abspath('..')
sys.path.append(project_root)

from scripts.preprocess import assign_intent

# Load the cleaned tweets from CSV
df = pd.read_csv("../data/processed/cleaned_tweets.csv")

# Sample 250 rows to label
golden_df = df.sample(250, random_state=42).copy()

# Intent categories assigned below (a coarser set than the table above; "Other" is the fallback)
intent_categories = [
    "Billing",
    "Technical Support",
    "Account Management",
    "Complaint",
    "Praise/Thank You",
    "Other"
]

keywords = {
    "Billing": ["bill", "charge", "payment", "refund"],
    "Technical Support": ["error", "issue", "problem", "disconnect", "slow"],
    "Account Management": ["password", "login", "account", "reset"],
    "Complaint": ["bad", "terrible", "worst", "disappointed", "angry"],
    "Praise/Thank You": ["thank", "great", "love", "awesome", "appreciate"]
}
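
`assign_intent` is imported from `scripts/preprocess.py` and its body is not shown in this notebook. As a rough guide to how a keyword labeler of this shape could work, here is a minimal sketch. The name `assign_intent_sketch`, the first-match rule, and the substring matching are all assumptions, not the project's actual implementation.

```python
from typing import Dict, List


def assign_intent_sketch(text: object, keywords: Dict[str, List[str]]) -> str:
    """Return the first category whose keyword occurs in `text`, else "Other".

    Hypothetical stand-in for scripts.preprocess.assign_intent (assumed behavior).
    """
    # Missing or non-string values (e.g. NaN from pandas) fall through to "Other".
    if not isinstance(text, str):
        return "Other"
    lowered = text.lower()
    # Dicts preserve insertion order, so earlier categories win ties.
    for category, words in keywords.items():
        if any(word in lowered for word in words):
            return category
    return "Other"


# Trimmed copy of the keyword map, just for the demo calls below.
demo_keywords = {
    "Billing": ["bill", "charge", "payment", "refund"],
    "Praise/Thank You": ["thank", "great", "love", "awesome", "appreciate"],
}

print(assign_intent_sketch("oh thats actually a very good help thank you", demo_keywords))
print(assign_intent_sketch("delta suck", demo_keywords))
```

Substring matching is crude: `"bill"` matches `"billing"` but also unrelated words such as `"billion"`, and a tweet that mentions two categories only ever receives the first one in dictionary order.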

# Apply Categorization Function to DataFrame and Write to CSV File

In [2]:
df['intent'] = df['cleaned_text'].apply(lambda text: assign_intent(text, keywords))
print(df[['cleaned_text', 'intent']].sample(10))

df['tweet_length'] = df['cleaned_text'].str.len()

print(df[['cleaned_text', 'tweet_length']].sample(10))

# Write the labeled frame to CSV (note: this is the full `df`, not the 250-row `golden_df` sample)
df.to_csv("../data/processed/golden_intent_labeled.csv", index=False)
print("Golden dataset saved")

                                           cleaned_text             intent
1766       oh thats actually a very good help thank you   Praise/Thank You
1980   sorry to hear that where did you acquire the ...              Other
1389   i get to the end of an avios booking then get...              Other
4639                                         delta suck              Other
5451   can i use yesterdays tickets from euston to m...              Other
14     sorry for the trouble kindly send us a note v...              Other
8875   sowhy bring up the hour timeframe if the cust...              Other
6674   hi ciaran im sorry you havent received a conf...  Technical Support
9567   we do not have a completion time set we are b...              Other
1144   im sorry about the hassle please reach out to...              Other
                                           cleaned_text  tweet_length
8253   the worst card ever i got a bill for for a ca...           114
313    im sorry you didnt dig it we
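
The cell above labels every row of `df`, while the 250-row `golden_df` sample drawn earlier is left unused. If the goal is a small, manually reviewed set, one option is to pre-label only the sample and add an empty column for the human label. The sketch below shows that idea on toy data. The `manual_intent` column and the `prelabel` helper are hypothetical, and the inline keyword rule only stands in for `assign_intent`.

```python
import pandas as pd

# Toy stand-in for cleaned_tweets.csv (illustrative rows, not real data).
toy_df = pd.DataFrame({
    "cleaned_text": [
        "i was charged twice on my bill",
        "cannot login to my account",
        "thank you for the quick help",
        "what time do you open",
    ]
})

toy_keywords = {
    "Billing": ["bill", "charge"],
    "Account Management": ["login", "account"],
    "Praise/Thank You": ["thank"],
}


def prelabel(text: str) -> str:
    # Hypothetical first-match keyword rule, standing in for assign_intent.
    for category, words in toy_keywords.items():
        if any(word in text for word in words):
            return category
    return "Other"


# Sample first, then label only the sample; random_state keeps it reproducible.
sample_df = toy_df.sample(n=3, random_state=42).copy()
sample_df["intent"] = sample_df["cleaned_text"].apply(prelabel)
sample_df["tweet_length"] = sample_df["cleaned_text"].str.len()
# Empty column for a reviewer to confirm or correct the keyword guess.
sample_df["manual_intent"] = pd.NA

print(sample_df)
```

Keeping the keyword guess and the manual label in separate columns makes it possible to measure later how often the two agree.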

# Quick Evaluation

In [3]:
# Preview up to three examples per intent
for label in df['intent'].unique():
    print(f"Intent: {label}")
    subset = df.loc[df['intent'] == label, ['cleaned_text']].dropna()
    # Cap the sample size so intents with fewer than 3 rows do not raise
    sample = subset.sample(min(3, len(subset)), random_state=1)
    for i, row in enumerate(sample.itertuples(index=False), 1):
        print(f"\n  {i}. {row.cleaned_text}")

Intent: Technical Support

  1.  what is the issue with your airtime lawrence has it been resolved for you now please dm us 

  2.  it keeps saying error code ive been pre ordering cod since black ops never had this bad of a launch please fix major cod fan here

  3.  hi ani sorry to hear youre having issues with your service if you could please dm your account phone number an 
Intent: Other

  1.  think its possible to get bumped to even more itd be a wonderful experience

  2.  your package will be delivered tomorrow im sorry about the inconvenience i dont have any more current updates on this since your local center in canada is dealing with you directly mc 

  3.  is it possible to set up an azure vps instance in china
Intent: Billing

  1.  i just tried to update my payment but got this message the csrf token is invalid please try to resubmit the form i redid the form and it still gives me the same error please help

  2. wow amazon charged me for this prime thingy like i knew i h
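
Reading three examples per intent shows what each label looks like, but not how often it fires. In the 10-row sample printed earlier, 8 rows came out as `Other`, so a per-class count is a natural companion check. A minimal sketch on toy labels follows; the `labels` series is made up for illustration.

```python
import pandas as pd

# Made-up labels, only to show the shape of the summary.
labels = pd.Series(
    ["Other", "Other", "Billing", "Other", "Technical Support", "Praise/Thank You"],
    name="intent",
)

# Absolute counts and shares per intent, aligned on the intent label.
summary = pd.DataFrame({
    "count": labels.value_counts(),
    "share": labels.value_counts(normalize=True).round(2),
})
print(summary)
```

On the real frame the same summary would start from `df['intent']` instead of the toy `labels` series.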

# Additional Features