# 02 - Labeling Intent

This notebook is where we create our **Golden Dataset** — a high-quality, manually labeled sample of customer tweets, annotated by intent.

---

### Intent Categories

| Intent Label        | Description                                      |
|---------------------|--------------------------------------------------|
| `cancel_service`    | Customer wants to cancel account or switch plan  |
| `billing_issue`     | Complaints or questions about billing            |
| `technical_issue`   | Problems with device, network, or service        |
| `account_help`      | Issues with account management or login          |
| `upgrade_request`   | Requests to upgrade device or plan               |
| `general_question`  | General questions or inquiries                   |
| `positive_feedback` | Praise or compliments                            |
| `complaint`         | Negative feedback not related to billing or tech |
| `other`             | Anything else or unclear                         |
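Keeping the table's labels in one place in code makes it possible to check any label written to the dataset, whether by a rule or by a human annotator, against the taxonomy. A minimal sketch, assuming the label strings exactly as they appear in the table; `INTENT_LABELS` and `invalid_labels` are hypothetical names, not part of this notebook.

```python
# Hypothetical: the taxonomy from the Intent Categories table as a frozen set
INTENT_LABELS = frozenset({
    "cancel_service",
    "billing_issue",
    "technical_issue",
    "account_help",
    "upgrade_request",
    "general_question",
    "positive_feedback",
    "complaint",
    "other",
})


def invalid_labels(labels):
    """Return the distinct labels that are not in the taxonomy, sorted."""
    return sorted({label for label in labels if label not in INTENT_LABELS})


print(invalid_labels(["billing_issue", "Billing", "other", "Praise/Thank You"]))
# ['Billing', 'Praise/Thank You']
```

Run against the keyword rules in the next cell, a check like this would flag names such as `Billing` and `Praise/Thank You`, which do not appear in the table.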

# Keyword-Based Labeling Logic

```python
import pandas as pd

# Load the cleaned tweets
df = pd.read_csv("../data/processed/cleaned_tweets.csv")

# Draw a reproducible sample of 250 tweets for the golden dataset
golden_df = df.sample(250, random_state=42).copy()

# Intent categories used by the keyword rules below ("Other" is the fallback)
intent_categories = [
    "Billing",
    "Technical Support",
    "Account Management",
    "Complaint",
    "Praise/Thank You",
    "Other"
]

# Keywords that signal each intent; intents are checked in this order,
# so the first intent with a matching keyword wins
keywords = {
    "Billing": ["bill", "charge", "payment", "refund"],
    "Technical Support": ["error", "issue", "problem", "disconnect", "slow"],
    "Account Management": ["password", "login", "account", "reset"],
    "Complaint": ["bad", "terrible", "worst", "disappointed", "angry"],
    "Praise/Thank You": ["thank", "great", "love", "awesome", "appreciate"]
}

def assign_intent(text):
    """Return the first intent whose keyword appears in `text`, else "Other"."""
    # Missing or non-string values (e.g. NaN) cannot be matched
    if not isinstance(text, str):
        return "Other"

    # Plain substring match: the first keyword found decides the intent
    for intent, kws in keywords.items():
        for kw in kws:
            if kw in text:
                return intent
    return "Other"
```
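`assign_intent` uses substring checks (`kw in text`), so a keyword also fires inside longer words: `"issue"` matches `"tissue"` and `"love"` matches `"glove"`. Because the loop returns on the first hit, dictionary order also acts as a priority (Billing is checked before Technical Support). A minimal sketch of a stricter variant, assuming the same `keywords` dict; `build_patterns` and `assign_intent_wordwise` are hypothetical names, and regex word-prefix matching is a swapped-in technique, not the notebook's method.

```python
import re

# Same keyword lists as the notebook's `keywords` dict
keywords = {
    "Billing": ["bill", "charge", "payment", "refund"],
    "Technical Support": ["error", "issue", "problem", "disconnect", "slow"],
    "Account Management": ["password", "login", "account", "reset"],
    "Complaint": ["bad", "terrible", "worst", "disappointed", "angry"],
    "Praise/Thank You": ["thank", "great", "love", "awesome", "appreciate"],
}


def build_patterns(keyword_map):
    """Compile one regex per intent matching any keyword at the start of a word.

    The leading word boundary blocks hits inside words ("issue" in "tissue")
    but still allows suffixes ("thank" in "thanks", "bill" in "billing").
    """
    return {
        intent: re.compile(r"\b(?:" + "|".join(map(re.escape, kws)) + ")")
        for intent, kws in keyword_map.items()
    }


PATTERNS = build_patterns(keywords)


def assign_intent_wordwise(text):
    """Return the first intent (in dict order) whose pattern matches, else "Other"."""
    if not isinstance(text, str):
        return "Other"
    lowered = text.lower()  # assumption: keywords are lowercase
    for intent, pattern in PATTERNS.items():
        if pattern.search(lowered):
            return intent
    return "Other"


print(assign_intent_wordwise("please pass me a tissue"))         # Other
print(assign_intent_wordwise("thanks for the quick fix"))        # Praise/Thank You
print(assign_intent_wordwise("my billing page shows an error"))  # Billing
```

A prefix-only boundary still lets `"charge"` fire on `"charger"`; adding a trailing `\b` would stop that but would also drop `"thanks"` and `"billing"`, so the keyword lists would then need explicit inflections.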

# Apply the Categorization Function to the DataFrame and Inspect 10 Random Rows

```python
# Label every cleaned tweet in the full df, then print 10 random rows
df['intent'] = df['cleaned_text'].apply(assign_intent)
print(df[['cleaned_text', 'intent']].sample(10))
```

```
                                           cleaned_text             intent
6845   sorry to hear that send us a dm with your ema...              Other
875    hi dr standford happy to have you with us tod...              Other
3136      fam i need contact details for your sa office              Other
278                 sorry ill never cheat on you again               Other
1201   ive spent hours today trying to get this stra...              Other
2867                                           agent no              Other
330   just now i checked uber w promo applied was ce...              Other
7489   from waterloo packed even guard frustrated ti...   Praise/Thank You
9650   not impressed with your app i deferred having...              Other
4777   heres what you can do to work around the issu...  Technical Support
```
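In this sample, 8 of the 10 tweets fell through to `Other`, which suggests checking the label distribution before hand-correcting the golden set. A minimal sketch, assuming a DataFrame with an `intent` column such as `df` above (or `golden_df`, once the same `apply` is run on it); `intent_share` and `review_sample` are hypothetical helper names, and the toy rows are made up for illustration.

```python
import pandas as pd


def intent_share(frame: pd.DataFrame, column: str = "intent") -> pd.Series:
    """Fraction of rows assigned to each intent, largest share first."""
    return frame[column].value_counts(normalize=True)


def review_sample(frame: pd.DataFrame, column: str = "intent",
                  per_intent: int = 5, seed: int = 42) -> pd.DataFrame:
    """Up to `per_intent` randomly chosen rows from each intent, for manual review."""
    # Shuffle once (reproducibly), then keep the first rows of each intent group
    shuffled = frame.sample(frac=1, random_state=seed)
    return shuffled.groupby(column).head(per_intent)


# Hypothetical toy frame standing in for labeled tweets
toy = pd.DataFrame({
    "cleaned_text": ["text a", "text b", "text c", "refund please"],
    "intent": ["Other", "Other", "Other", "Billing"],
})
print(intent_share(toy))
print(review_sample(toy, per_intent=2))
```

Sampling a fixed number of rows per intent keeps rare labels visible during review instead of letting `Other` dominate the rows an annotator sees.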
