# 02 - Labeling Intent

This notebook creates our **Golden Dataset**: a high-quality, manually labeled sample of customer tweets, annotated by intent.

---

### Intent Categories

| Intent Label        | Description                                      |
|---------------------|------------------------------------------------|
| `cancel_service`    | Customer wants to cancel account or switch plan|
| `billing_issue`     | Complaints or questions about billing           |
| `technical_issue`   | Problems with device, network, or service       |
| `account_help`      | Issues with account management or login         |
| `upgrade_request`   | Requests to upgrade device or plan               |
| `general_question`  | General questions or inquiries                    |
| `positive_feedback` | Praise or compliments                             |
| `complaint`         | Negative feedback not related to billing or tech|
| `other`             | Anything else or unclear                          |
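
To keep annotations consistent with the table above, the label set can be pinned down as a constant and used to flag anything outside it. This is a minimal sketch: `INTENT_LABELS` and `find_invalid_labels` are hypothetical names introduced here for illustration and are not part of the project code. Note that the keyword-labeling code later in this notebook uses a different, coarser set of category names (for example `Billing`), which this check would flag.

```python
import pandas as pd

# Label set copied from the intent table above (the constant name is hypothetical).
INTENT_LABELS = [
    "cancel_service",
    "billing_issue",
    "technical_issue",
    "account_help",
    "upgrade_request",
    "general_question",
    "positive_feedback",
    "complaint",
    "other",
]


def find_invalid_labels(labels: pd.Series) -> pd.Series:
    """Return the entries that are missing or not in INTENT_LABELS."""
    return labels[~labels.isin(INTENT_LABELS)]


# Toy annotations: two valid labels, one from a different naming scheme, one missing.
annotations = pd.Series(["billing_issue", "complaint", "Billing", None])
invalid = find_invalid_labels(annotations)
print(invalid)
```

Running a check like this before saving a labeled file catches typos and mixed naming schemes early.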

# Labeling and Logic of Categorization

In [11]:
import pandas as pd
import sys
import os


# Add the project root to sys.path so the `scripts` package can be imported
project_root = os.path.abspath('..')
sys.path.append(project_root)

from scripts.preprocess import assign_intent

# Loads Data from Cleaned CSV
df = pd.read_csv("../data/processed/cleaned_tweets.csv")

# Sample 250 rows to label
golden_df = df.sample(250, random_state=42).copy()

# Candidate category names for the intent column
intent_categories = [
    "Billing",
    "Technical Support",
    "Account Management",
    "Complaint",
    "Praise/Thank You",
    "Other"
]

keywords = {
    "Billing": ["bill", "charge", "payment", "refund"],
    "Technical Support": ["error", "issue", "problem", "disconnect", "slow"],
    "Account Management": ["password", "login", "account", "reset"],
    "Complaint": ["bad", "terrible", "worst", "disappointed", "angry"],
    "Praise/Thank You": ["thank", "great", "love", "awesome", "appreciate"]
}
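
The `assign_intent` function is imported from `scripts.preprocess` and its body is not shown in this notebook. The sketch below is one plausible way such a keyword matcher could work, assuming case-insensitive substring matching, first matching category wins, and a fallback of `Other`. The name `assign_intent_sketch` and all of these behaviors are assumptions, not the project's actual implementation.

```python
import pandas as pd


def assign_intent_sketch(text, keywords, default="Other"):
    """Hypothetical stand-in for scripts.preprocess.assign_intent."""
    # Missing or non-string text (e.g. NaN for an empty tweet) gets the default label.
    if not isinstance(text, str):
        return default
    lowered = text.lower()
    # Dicts preserve insertion order, so categories listed first win ties.
    for intent, words in keywords.items():
        if any(word in lowered for word in words):
            return intent
    return default


# A reduced keyword map, reusing two entries from the cell above.
demo_keywords = {
    "Billing": ["bill", "charge", "payment", "refund"],
    "Praise/Thank You": ["thank", "great", "love", "awesome", "appreciate"],
}

demo = pd.Series([
    "i cant update my payment info",
    "thats awesome thanks",
    "when redzone isnt working on my xbox",
    None,
])
demo_labels = demo.apply(lambda t: assign_intent_sketch(t, demo_keywords)).tolist()
print(demo_labels)
```

Substring matching is simple but coarse: `bill` also matches unrelated words that contain it, and tweets with no keyword hit all fall into `Other`, so labels produced this way are best treated as a starting point for manual review.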

# Apply Categorization Function to DataFrame and Write to CSV File

In [14]:
df['intent'] = df['cleaned_text'].apply(lambda text: assign_intent(text, keywords))
print(df[['cleaned_text', 'intent']].sample(10))

df['tweet_length'] = df['cleaned_text'].str.len()

print(df[['cleaned_text', 'tweet_length']].sample(10))

# Write the labeled DataFrame to a CSV file
# Note: this saves the full `df`, not the 250-row `golden_df` sample created above
df.to_csv("../data/processed/golden_intent_labeled.csv", index=False)
print("Golden dataset saved")

                                           cleaned_text              intent
1941   im sorry your package is arriving later than ...               Other
9211   specifically the two young ladies that worked...               Other
704   dear uber team i have uber account with the no...  Account Management
3718   but train left on time despite boards stating...               Other
8208   hi there please send us a dm in order for us ...               Other
9597   apologies for trouble caused to you as checke...             Billing
9330  in love with my new iphone x thanks iphonex ap...    Praise/Thank You
4819   still strong after five years thats awesome t...    Praise/Thank You
9028         when redzone isnt working on my xboxltltlt               Other
4634   i cant update my payment info because your we...             Billing
                                           cleaned_text  tweet_length
8736   thanks for your kind words what time service ...          87.0
6292   i dont even know 

# Quick Evaluation

In [13]:
# Print three example tweets for each intent
for label in df['intent'].unique():
    print(f"Intent: {label}")
    sample = df[df['intent'] == label][['cleaned_text']].dropna().sample(3, random_state=1)
    for i, row in enumerate(sample.itertuples(index=False), 1):
        print(f"\n  {i}. {row.cleaned_text}")

Intent: Other

  1.  you for choosing airtel martha 

  2. hey am trying so hard to track down hallowed but have been to stores and nothing staff never heard of it can you help

  3.  hey andy wed like to know more about your comment please dm us in order to assist you regarding this matter ac
Intent: Account Management

  1.  hey allison could you send us a dm with your accounts email address well take a look backstage fr 

  2.  hey there can you please dm us the phone number associated with your account so we can look into your background check 

  3.  hey there aaron sorry for the delay can you please provide me with your gt console and cod email and also can you please enter this web page and link your account to the one on your console jr
Intent: Billing

  1.  my cable bill along is add fubo to that too 

  2.  wasnt satisfied with my order it came fully incorrect and i only got a partial refund 

  3.  could you dm me the email address you sent it from or the reference number o
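
Alongside reading examples, it can help to see how many tweets land in each intent, since a keyword matcher tends to push many rows into a fallback label. The sketch below uses a small toy DataFrame (the rows are made up for illustration) with the same `cleaned_text` and `intent` column names as above. It also caps the sample size per label, because `sample(3)` raises a `ValueError` for any intent with fewer than three rows.

```python
import pandas as pd

# Toy data standing in for the labeled DataFrame; the text values are placeholders.
toy = pd.DataFrame({
    "cleaned_text": ["a", "b", "c", "d", "e", "f"],
    "intent": ["Other", "Other", "Other", "Other", "Billing", "Praise/Thank You"],
})

# Count and share of rows per intent.
counts = toy["intent"].value_counts()
summary = pd.DataFrame({"count": counts, "share": (counts / len(toy)).round(2)})
print(summary)

# Sample up to three examples per intent without failing on small groups.
examples = {}
for label, group in toy.groupby("intent"):
    n = min(3, len(group))
    examples[label] = group["cleaned_text"].sample(n, random_state=1).tolist()

for label, texts in examples.items():
    print(f"Intent: {label} ({len(texts)} example(s))")
```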

# Additional Features

                                           cleaned_text  tweet_length
4979                     i also reported it to and couk          31.0
2157   dattendre que vous receviez les documents pui...          89.0
4393   ever since i updated my phone it sucks batter...         102.0
5114                          we hope youve enjoyed it           26.0
7087  this update keep pissing me off get it togethe...          54.0
571    if you go to settings gt general gt about gt ...          87.0
6785   hey there can you send us a dm with the phone...         131.0
3249   permisi saya kmrn ingin berlangganan premium ...         126.0
1254   actually they didnt im not at all happy with ...         105.0
7510   gotcha do you happen to notice this issue whe...          88.0
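
The lengths above print as floats (for example `87.0`), which is what pandas typically produces when `str.len()` runs over a text column containing missing values: the missing entries become `NaN`, and that forces a float column. The sketch below reproduces this on toy data and shows one way to keep integer lengths using the nullable `Int64` dtype. The toy rows and the `tweet_length_int` column name are illustrative assumptions.

```python
import pandas as pd

# Toy data with one missing tweet; object dtype is forced to mirror text read from CSV.
toy = pd.DataFrame(
    {
        "cleaned_text": ["we hope youve enjoyed it", None, "thanks"],
        "intent": ["Other", "Other", "Praise/Thank You"],
    },
    dtype=object,
)

# Missing text yields NaN, so the resulting length column is typically float64.
toy["tweet_length"] = toy["cleaned_text"].str.len()
print(toy["tweet_length"].dtype)

# The nullable integer dtype keeps whole-number lengths and marks gaps as <NA>.
toy["tweet_length_int"] = toy["tweet_length"].astype("Int64")
print(toy[["cleaned_text", "tweet_length_int"]])

# Non-missing count and mean length per intent.
length_by_intent = toy.groupby("intent")["tweet_length"].agg(["count", "mean"])
print(length_by_intent)
```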
