# Bitext Customer Support Dataset — Quick Stats (TechGadgets Fine-Tuning)

This notebook loads the Bitext customer support dataset from Hugging Face and prints the **required** dataset statistics:
- Total number of examples
- Column names
- Number of categories and their distribution
- Number of intents and the ten most frequent intents
- One sample instruction per category
- Missing/null values per column
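
The statistics listed above can also be gathered in a single helper, which is convenient when the same report has to be rerun on a filtered or augmented copy of the data. The following is a minimal sketch, not part of the original notebook: the function name `summarize`, its `top_n` parameter, and the toy rows are hypothetical, and it assumes the frame has the `instruction`, `category`, and `intent` columns shown in the outputs below.

```python
import pandas as pd


def summarize(df: pd.DataFrame, top_n: int = 10) -> dict:
    """Collect the dataset statistics into one dictionary (hypothetical helper)."""
    return {
        "total_examples": len(df),
        "columns": list(df.columns),
        "category_counts": df["category"].value_counts().to_dict(),
        "top_intents": df["intent"].value_counts().head(top_n).to_dict(),
        # First instruction encountered in each category
        "sample_per_category": df.groupby("category")["instruction"].first().to_dict(),
        "null_counts": df.isnull().sum().to_dict(),
    }


# Tiny hand-made frame for illustration only (rows are not taken from the dataset)
toy = pd.DataFrame(
    {
        "instruction": ["cancel my order", "where is my refund", "track my refund"],
        "category": ["ORDER", "REFUND", "REFUND"],
        "intent": ["cancel_order", "get_refund", "track_refund"],
        "response": ["Sure.", None, "Here you go."],
    }
)

stats = summarize(toy)
print(stats["total_examples"])
print(stats["category_counts"])
print(stats["null_counts"])
```

Calling `summarize(df)` on the full DataFrame loaded in the next cell should give the same figures as the individual cells that follow.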


In [2]:
from datasets import load_dataset
import pandas as pd

# Load the dataset from the Hugging Face Hub and convert the train split to a DataFrame
dataset = load_dataset("bitext/Bitext-customer-support-llm-chatbot-training-dataset")
df = dataset["train"].to_pandas()

df.head()



| | flags | instruction | category | intent | response |
|---|---|---|---|---|---|
| 0 | B | question about cancelling order {{Order Number}} | ORDER | cancel_order | I've understood you have a question regarding ... |
| 1 | BQZ | i have a question about cancelling oorder {{Or... | ORDER | cancel_order | I've been informed that you have a question ab... |
| 2 | BLQZ | i need help cancelling puchase {{Order Number}} | ORDER | cancel_order | I can sense that you're seeking assistance wit... |
| 3 | BL | I need to cancel purchase {{Order Number}} | ORDER | cancel_order | I understood that you need assistance with can... |
| 4 | BCELN | I cannot afford this order, cancel purchase {{... | ORDER | cancel_order | I'm sensitive to the fact that you're facing f... |


In [3]:
print("Total examples:", len(df))
print("Columns:", list(df.columns))

Total examples: 26872
Columns: ['flags', 'instruction', 'category', 'intent', 'response']


In [4]:
# Category stats
category_counts = df["category"].value_counts()
print("Total categories:", df["category"].nunique())
print("\nCategory distribution:")
category_counts

Total categories: 11

Category distribution:


category
ACCOUNT         5986
ORDER           3988
REFUND          2992
INVOICE         1999
CONTACT         1999
PAYMENT         1998
FEEDBACK        1997
DELIVERY        1994
SHIPPING        1970
SUBSCRIPTION     999
CANCEL           950
Name: count, dtype: int64
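
Raw counts make it hard to judge how skewed the categories are, so it can help to convert them to percentage shares and a single imbalance ratio. The sketch below re-enters the counts printed above by hand so that it runs on its own; in the notebook the existing `category_counts` Series could be used directly. The names `share` and `imbalance_ratio` are illustrative.

```python
import pandas as pd

# Counts copied from the category distribution output above
category_counts = pd.Series(
    {
        "ACCOUNT": 5986,
        "ORDER": 3988,
        "REFUND": 2992,
        "INVOICE": 1999,
        "CONTACT": 1999,
        "PAYMENT": 1998,
        "FEEDBACK": 1997,
        "DELIVERY": 1994,
        "SHIPPING": 1970,
        "SUBSCRIPTION": 999,
        "CANCEL": 950,
    },
    name="count",
)

# Share of each category as a percentage of all examples
share = (category_counts / category_counts.sum() * 100).round(2)

# Ratio between the largest and the smallest category
imbalance_ratio = category_counts.max() / category_counts.min()

print(share)
print(f"Largest/smallest category ratio: {imbalance_ratio:.2f}")
```

With these counts, ACCOUNT makes up a little over 22% of the examples and is roughly 6.3 times the size of CANCEL, which may be worth keeping in mind if a stratified train/validation split is planned.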

In [5]:
# Intent stats
intent_counts = df["intent"].value_counts()
print("Total intents:", df["intent"].nunique())
print("\nTop 10 intents:")
intent_counts.head(10)

Total intents: 27

Top 10 intents:


intent
check_invoice               1000
complaint                   1000
contact_customer_service    1000
edit_account                1000
switch_account              1000
check_payment_methods        999
contact_human_agent          999
delivery_period              999
get_invoice                  999
newsletter_subscription      999
Name: count, dtype: int64
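
Because every intent belongs to a category, a cross-tabulation shows how the intents are spread across the categories and helps explain why some categories are larger than others. This is a sketch on a small hand-made frame (the rows are illustrative, not taken from the dataset); on the real data the same two calls could be applied to `df`.

```python
import pandas as pd

# Illustrative rows only; the real notebook would use `df` instead of `toy`
toy = pd.DataFrame(
    {
        "category": ["ORDER", "ORDER", "ORDER", "REFUND", "REFUND"],
        "intent": ["cancel_order", "cancel_order", "place_order", "get_refund", "get_refund"],
    }
)

# Number of distinct intents inside each category
intents_per_category = toy.groupby("category")["intent"].nunique()

# Category x intent table of example counts (0 where the pair never occurs)
intent_table = pd.crosstab(toy["category"], toy["intent"])

print(intents_per_category)
print(intent_table)
```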

In [6]:
# One sample instruction per category
sample_per_category = (
    df.groupby("category")["instruction"]
      .first()
      .reset_index()
)
sample_per_category

| | category | instruction |
|---|---|---|
| 0 | ACCOUNT | new {{Account Type}} acount for wife |
| 1 | CANCEL | I can't ifnd the bloody termination charge, I ... |
| 2 | CONTACT | I want help to speak to customer support |
| 3 | DELIVERY | could you help me check what delivery methods ... |
| 4 | FEEDBACK | help me to file a claim |
| 5 | INVOICE | show me invoice{{Invoice Number}} |
| 6 | ORDER | question about cancelling order {{Order Number}} |
| 7 | PAYMENT | I try to list your available payment methoids |
| 8 | REFUND | i do not know hoow i could check ur reimbursem... |
| 9 | SHIPPING | give me information about a delivery address m... |
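
`.first()` always returns the first row that appears in each category, so the sample never changes and may not be typical. A seeded random draw per category is one alternative. The sketch below uses `DataFrameGroupBy.sample` (available in pandas 1.1 and later) on a hand-made frame; the rows and the `random_state` value are illustrative assumptions.

```python
import pandas as pd

# Illustrative rows only; the real notebook would use `df` instead of `toy`
toy = pd.DataFrame(
    {
        "category": ["ORDER", "ORDER", "REFUND", "REFUND", "REFUND"],
        "instruction": [
            "cancel my order",
            "place a new order",
            "where is my refund",
            "track my refund",
            "what is the refund policy",
        ],
    }
)

# Draw one random row per category; random_state makes the draw reproducible
random_sample = (
    toy.groupby("category")
    .sample(n=1, random_state=42)
    .reset_index(drop=True)[["category", "instruction"]]
)

print(random_sample)
```

Changing `random_state` gives a different but still reproducible sample, which makes it easy to skim several examples per category.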


In [7]:
# Missing/null check
df.isnull().sum()

flags          0
instruction    0
category       0
intent         0
response       0
dtype: int64
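
`isnull()` only detects true missing values, so empty or whitespace-only strings and exact duplicate rows would pass the check above unnoticed. The following sketch shows one way to look for both; it runs on a small hand-made frame (illustrative rows, not taken from the dataset), and whether the real data contains any such rows is not known from the outputs above.

```python
import pandas as pd

# Illustrative rows only; the real notebook would use `df` instead of `toy`
toy = pd.DataFrame(
    {
        "instruction": ["cancel my order", "   ", "cancel my order", "track my refund"],
        "response": ["Sure.", "Okay.", "Sure.", ""],
    }
)

text_columns = ["instruction", "response"]

# Count strings that are empty once surrounding whitespace is stripped
blank_counts = toy[text_columns].apply(lambda col: col.str.strip().eq("").sum())

# Count rows that are exact copies of an earlier row
duplicate_rows = int(toy.duplicated().sum())

print(blank_counts)
print("Duplicate rows:", duplicate_rows)
```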