## PROJECT OVERVIEW

When a user asks a question, we want to:

- Understand the user’s intent (e.g., reset a password or check an order)
- Respond with a predefined answer based on the intent

We do this by training an ML classifier to map user inputs to intents.
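
The two steps above can be sketched end to end on a toy example. This is a minimal sketch, not the notebook's actual training run: the four utterances, the `toy_responses` dictionary, and the use of `make_pipeline` are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: a few utterances labeled with their intent.
toy_utterances = [
    "I forgot my password",
    "reset my password please",
    "where is my order",
    "track my order status",
]
toy_intents = ["recover_password", "recover_password", "track_order", "track_order"]

# Hypothetical predefined answers, one per intent.
toy_responses = {
    "recover_password": "Click 'Forgot Password?' on the login page.",
    "track_order": "You can track your order under 'My Orders'.",
}

# Step 1: learn a mapping from text to intent.
classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(toy_utterances, toy_intents)

# Step 2: predict the intent of a new message and look up its answer.
predicted_intent = classifier.predict(["password reset"])[0]
print(predicted_intent, "->", toy_responses[predicted_intent])
```

Bundling the vectorizer and the classifier in one pipeline object is a design choice of this sketch; the cells below keep them as two separate objects.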

In [72]:
# Import libraries
import pandas as pd
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


In [73]:
# Loading Data
train_data = pd.read_csv('customer-intent-dataset/Bitext_Sample_Customer_Service_Training_Dataset.csv')
test_data = pd.read_csv('customer-intent-dataset/Bitext_Sample_Customer_Service_Testing_Dataset.csv')
val_data = pd.read_csv('customer-intent-dataset/Bitext_Sample_Customer_Service_Validation_Dataset.csv')


In [74]:
train_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6539 entries, 0 to 6538
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   utterance  6539 non-null   object
 1   intent     6539 non-null   object
 2   category   6539 non-null   object
 3   tags       6539 non-null   object
dtypes: object(4)
memory usage: 204.5+ KB


In [75]:
val_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 818 entries, 0 to 817
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   utterance  818 non-null    object
 1   intent     818 non-null    object
 2   category   818 non-null    object
 3   tags       818 non-null    object
dtypes: object(4)
memory usage: 25.7+ KB


In [76]:
test_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 818 entries, 0 to 817
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   utterance  818 non-null    object
 1   intent     818 non-null    object
 2   category   818 non-null    object
 3   tags       818 non-null    object
dtypes: object(4)
memory usage: 25.7+ KB


In [77]:
# Get the unique intents
train_data['intent'].unique()

array(['cancel_order', 'change_order', 'change_shipping_address',
       'check_cancellation_fee', 'check_invoice', 'check_payment_methods',
       'check_refund_policy', 'complaint', 'contact_customer_service',
       'contact_human_agent', 'create_account', 'delete_account',
       'delivery_options', 'delivery_period', 'edit_account',
       'get_invoice', 'get_refund', 'newsletter_subscription',
       'payment_issue', 'place_order', 'recover_password',
       'registration_problems', 'review', 'set_up_shipping_address',
       'switch_account', 'track_order', 'track_refund'], dtype=object)

In [78]:
# Predefined response for each intent (plus an extra "greet" intent for greetings)

intent_response_pairs = [
    ("cancel_order", "Your order can be canceled within 24 hours of placement. Please visit your order page or contact support to proceed."),
    ("change_order", "To make changes to an existing order, please go to your order summary and click 'Edit Order' or reach out to support."),
    ("change_shipping_address", "If your order hasn't shipped yet, you can change the address under 'My Orders'. Otherwise, contact support."),
    ("check_cancellation_fee", "Orders canceled after processing may incur a cancellation fee. Please check your order terms for details."),
    ("check_invoice", "Your invoice can be found in your account under the 'Orders' section. Click on the specific order to download it."),
    ("check_payment_methods", "We accept credit/debit cards, PayPal, and bank transfers. You can view all options at checkout."),
    ("check_refund_policy", "We offer refunds within 30 days of delivery. Items must be unused and in original packaging."),
    ("complaint", "We're sorry for the inconvenience. Please describe your issue, and our team will investigate it immediately."),
    ("contact_customer_service", "Our customer service team is available 24/7 via live chat or email at support@example.com."),
    ("contact_human_agent", "Connecting you to a human agent... please wait a moment while we transfer your chat."),
    ("create_account", "To create an account, click 'Sign Up' on the top right corner and fill in your details."),
    ("delete_account", "To delete your account, go to Account Settings > Delete Account. Note that this action is irreversible."),
    ("delivery_options", "We offer standard, express, and same-day delivery options in select areas. You can choose one during checkout."),
    ("delivery_period", "Standard delivery takes 3–5 business days. Express delivery options are also available at checkout."),
    ("edit_account", "You can edit your profile, contact info, and preferences in your account settings."),
    ("get_invoice", "You can get your invoice by logging into your account and navigating to the 'Invoices' section."),
    ("get_refund", "To request a refund, go to your order history, select the item, and click 'Request Refund'."),
    ("newsletter_subscription", "You can subscribe to our newsletter in your profile settings or at the bottom of our homepage."),
    ("payment_issue", "If you're experiencing issues with payment, please verify your payment details or try a different method."),
    ("place_order", "You can place an order by browsing our products, adding them to your cart, and completing the checkout process."),
    ("recover_password", "Click 'Forgot Password?' on the login page to reset your password. Follow the instructions in the email."),
    ("registration_problems", "If you're facing issues registering, ensure all required fields are filled and try again. Contact support if the problem persists."),
    ("review", "We'd love your feedback! You can leave a review on the product page or through the link we sent via email after your purchase."),
    ("set_up_shipping_address", "You can set up your shipping address under your profile in 'Shipping Information'."),
    ("switch_account", "Please log out of your current session and log in with the credentials of the other account you'd like to use."),
    ("track_order", "You can track your order using the tracking link sent to your email or from your account dashboard under 'My Orders'."),
    ("track_refund", "Refunds are usually processed within 5–7 business days. You can track your refund status under 'My Orders'."),
    ("greet", "Hello! How can I help you today?")
]

In [79]:
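# Create a DataFrame of intent/response pairs and save it
# index=False keeps the row index out of the CSV file
df = pd.DataFrame(intent_response_pairs, columns=["Intent", "Response"])
df.to_csv('customer-intent-dataset/intent-responses.csv', index=False)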

In [80]:
# Sample greeting utterances, all labeled with the "greet" intent
greeting_data = [
    ("hello", "greet"),
    ("hi", "greet"),
    ("hey", "greet"),
    ("good morning", "greet"),
    ("good afternoon", "greet"),
    ("good evening", "greet"),
    ("howdy", "greet"),
    ("heya", "greet"),
    ("hi there", "greet"),
    ("hello there", "greet"),
    ("hey there", "greet"),
    ("yo", "greet"),
    ("hiya", "greet"),
    ("morning", "greet"),
    ("afternoon", "greet"),
    ("evening", "greet"),
    ("what’s up", "greet"),
    ("sup", "greet"),
    ("hey, how are you", "greet"),
    ("how are you doing", "greet"),
    ("how’s it going", "greet"),
    ("hope you’re doing well", "greet"),
    ("long time no see", "greet"),
    ("nice to meet you", "greet"),
    ("pleased to meet you", "greet"),
    ("good to see you", "greet"),
    ("yo, what’s good", "greet"),
    ("hey buddy", "greet"),
    ("hey friend", "greet"),
    ("hello my friend", "greet"),
]


In [81]:
# Build a DataFrame from the greetings and hold out 30% of them for testing
greet_df = pd.DataFrame(greeting_data, columns=["utterance", "intent"])
greet_df_x, greet_df_y = greet_df['utterance'], greet_df['intent']
greet_df_x_train, greet_df_x_test, greet_df_y_train, greet_df_y_test = train_test_split(
    greet_df_x, greet_df_y, test_size=0.3, random_state=42
)

In [82]:
print(greet_df_x_train.isnull().sum())
print(greet_df_y_train.isnull().sum())
print(greet_df_x_test.isnull().sum())
print(greet_df_y_test.isnull().sum())

0
0
0
0


In [83]:
greet_df_x_train

0                      hello
4             good afternoon
16                 what’s up
5               good evening
13                   morning
11                        yo
22          long time no see
1                         hi
2                        hey
25           good to see you
3               good morning
21    hope you’re doing well
26           yo, what’s good
18          hey, how are you
29           hello my friend
20            how’s it going
7                       heya
10                 hey there
14                 afternoon
19         how are you doing
6                      howdy
Name: utterance, dtype: object

### Prepare Training Data
Training data is a set of sample customer questions (texts) paired with labels representing their meaning or intent.

In [84]:
train_data.head()

   utterance                                          intent        category  tags
0  would it be possible to cancel the order I made?   cancel_order  ORDER     BIP
1  cancelling order                                   cancel_order  ORDER     BK
2  I need assistance canceling the last order I h...  cancel_order  ORDER     B
3  problem with canceling the order I made            cancel_order  ORDER     B
4  I don't know how to cancel the order I made        cancel_order  ORDER     B


#### Splitting data into X and y

In [85]:
X_train = train_data.utterance
y_train = train_data.intent
X_test = test_data.utterance
y_test = test_data.intent
X_val = val_data.utterance
y_val = val_data.intent

In [86]:
# Convert greeting utterances and intents to lists
greet_X = greet_df_x_train.tolist()
greet_y = greet_df_y_train.tolist()

# If X_train and y_train are Pandas Series, convert them to lists too
X_train = X_train.tolist() if hasattr(X_train, "tolist") else list(X_train)
y_train = y_train.tolist() if hasattr(y_train, "tolist") else list(y_train)

# Extend training data with greeting samples
X_train.extend(greet_X)
y_train.extend(greet_y)

# Do the same for test set
greet_X_test = greet_df_x_test.tolist()
greet_y_test = greet_df_y_test.tolist()

X_test = X_test.tolist() if hasattr(X_test, "tolist") else list(X_test)
y_test = y_test.tolist() if hasattr(y_test, "tolist") else list(y_test)

X_test.extend(greet_X_test)
y_test.extend(greet_y_test)
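
After this merge the classes are not balanced: the 27 original intents share 6,539 training utterances (about 240 each on average), while `greet` contributes only 21. A quick count makes such gaps visible. The sketch below uses a small hypothetical label list in place of `y_train`; the names `labels`, `counts`, `rarest_intent`, and `rarest_count` are illustrative.

```python
from collections import Counter

# Hypothetical stand-in for y_train after the greeting samples were appended.
labels = ["cancel_order"] * 5 + ["track_order"] * 4 + ["greet"] * 1

counts = Counter(labels)

# List intents from rarest to most common, with their share of the data.
for intent, n in sorted(counts.items(), key=lambda item: item[1]):
    print(f"{intent:<15} {n:>3}  ({n / len(labels):.0%})")

# The rarest intent is the one most likely to be under-served by the model.
rarest_intent, rarest_count = min(counts.items(), key=lambda item: item[1])
```

In the notebook, `Counter(y_train)` would give the same kind of summary, since `y_train` is a plain list at this point.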


In [95]:
X_train

['would it be possible to cancel the order I made?',
 'cancelling order',
 'I need assistance canceling the last order I have made',
 'problem with canceling the order I made',
 "I don't know how to cancel the order I made",
 'can you help me cancel the order I made?',
 'I would like to know about order cancellations',
 'could you help me cancelling an order?',
 "I don't know how to cancel an order I made",
 'help me cancelling my last order',
 'I do not know how to cancel the last order I made',
 'I need assistance with canceling an order I made',
 'information about canceling an order',
 'I would like to cancel an order',
 'I need assistance with cancelling my orders',
 'I need assistance with canceling the order I have made',
 'assistance canceling the order I have made',
 'question about cancelling the last order',
 'help me to cancel my last order',
 'problem with cancelling orders',
 'problems with cancelling an order',
 'assistance with cancelling the order I made',
 'I need hel

In [88]:
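# Convert text into numerical vectors (vectorization)
# TfidfVectorizer() converts words into weights based on how important a word is
# to one utterance relative to the whole training set.
# It is fitted on the training data only, then reused for the test and validation sets.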

vectorizer = TfidfVectorizer()
X_train_vect = vectorizer.fit_transform(X_train)
X_test_vect = vectorizer.transform(X_test)
X_val_vect = vectorizer.transform(X_val)
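
To make the weighting concrete, here is a minimal sketch on a hypothetical three-sentence corpus (the sentences and the `toy_` names are assumptions, not part of the dataset). It shows that each distinct token becomes one column, that a rarer word gets a larger weight than a common one within the same sentence, and that words unseen during fitting are ignored by `transform`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus, used only to illustrate the weighting.
docs = ["cancel my order", "track my order", "hello there"]

toy_vectorizer = TfidfVectorizer()
toy_matrix = toy_vectorizer.fit_transform(docs)  # learn vocabulary and IDF, then transform

# vocabulary_ maps each token to its column index.
vocab = toy_vectorizer.vocabulary_
print(sorted(vocab))     # one column per distinct token
print(toy_matrix.shape)  # (number of documents, vocabulary size)

# "cancel" occurs in one document and "my" in two, so within the first
# document the rarer word "cancel" receives the larger weight.
first_row = toy_matrix[0].toarray().ravel()
cancel_weight = first_row[vocab["cancel"]]
my_weight = first_row[vocab["my"]]
print(cancel_weight > my_weight)

# transform() reuses the fitted vocabulary; words it has never seen are dropped.
unseen = toy_vectorizer.transform(["refund"])
print(unseen.nnz)  # number of non-zero entries in the resulting row
```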

In [89]:
X_train_vect

<6560x655 sparse matrix of type '<class 'numpy.float64'>'
	with 47079 stored elements in Compressed Sparse Row format>

In [90]:
X_train_vect.toarray()

array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.28411939, 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ]])

In [91]:
# Create and train the model
# Multinomial Naive Bayes is a fast, widely used baseline for text classification.
# It learns how strongly each word is associated with each intent.

model = MultinomialNB()
model.fit(X_train_vect, y_train)

In [92]:
# Predict intents for the test set and measure accuracy
predicted_label = model.predict(X_test_vect)
accuracy = accuracy_score(y_test, predicted_label)
print(f"Accuracy: {accuracy:.2f}")

Accuracy: 0.98
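
A single accuracy figure can hide weak classes, especially a small one such as `greet`, and the validation split vectorized earlier is not scored in the cells shown. The sketch below is one possible way to get per-intent precision and recall with `classification_report`. The `evaluate` helper and the toy data are assumptions for illustration; in the notebook it could be called as `evaluate(model, X_val_vect, y_val)`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.naive_bayes import MultinomialNB


def evaluate(fitted_model, X_vect, y_true):
    """Return (accuracy, per-class report text) for an already vectorized split."""
    y_pred = fitted_model.predict(X_vect)
    accuracy = accuracy_score(y_true, y_pred)
    # zero_division=0 avoids warnings for intents that are never predicted.
    report = classification_report(y_true, y_pred, zero_division=0)
    return accuracy, report


# Hypothetical toy data standing in for the real train/validation splits.
train_texts = [
    "I forgot my password",
    "reset my password please",
    "where is my order",
    "track my order status",
]
train_labels = ["recover_password", "recover_password", "track_order", "track_order"]
val_texts = ["reset password", "track order"]
val_labels = ["recover_password", "track_order"]

toy_vectorizer = TfidfVectorizer()
toy_model = MultinomialNB().fit(toy_vectorizer.fit_transform(train_texts), train_labels)

val_accuracy, val_report = evaluate(
    toy_model, toy_vectorizer.transform(val_texts), val_labels
)
print(f"Validation accuracy: {val_accuracy:.2f}")
print(val_report)
```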


In [93]:
# Saving model and vectorizer
import joblib

joblib.dump(model, 'model.pkl')
joblib.dump(vectorizer, 'vectorizer.pkl')

['vectorizer.pkl']
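
The saved files are only useful if both are loaded back together, because the model expects vectors produced by the same fitted vectorizer. The following is a minimal round-trip sketch under assumed conditions: it trains a hypothetical toy model, writes to a temporary directory instead of the notebook's working directory, and checks that the reloaded pair predicts the same intent.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy model standing in for the trained notebook objects.
texts = ["reset my password please", "track my order status"]
labels = ["recover_password", "track_order"]
toy_vectorizer = TfidfVectorizer()
toy_model = MultinomialNB().fit(toy_vectorizer.fit_transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.pkl")
    vectorizer_path = os.path.join(tmp_dir, "vectorizer.pkl")

    # Save both objects, then load them back as a separate process would.
    joblib.dump(toy_model, model_path)
    joblib.dump(toy_vectorizer, vectorizer_path)
    loaded_model = joblib.load(model_path)
    loaded_vectorizer = joblib.load(vectorizer_path)

# The reloaded pair should behave exactly like the original pair.
query = ["where is my order"]
original_prediction = toy_model.predict(toy_vectorizer.transform(query))
restored_prediction = loaded_model.predict(loaded_vectorizer.transform(query))
print(original_prediction, restored_prediction)
```

Pickle-based files like these should only be loaded from trusted sources, and ideally with the same scikit-learn version that wrote them.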

In [94]:
# Classify a sample message and look up its predefined response
user_input = 'how can i get my order'
vect_input = vectorizer.transform([user_input])
predicted_label = model.predict(vect_input)
print(predicted_label)
# Map each intent to its response text
response = df.set_index('Intent')['Response'].to_dict()
user_response = response.get(predicted_label[0])
user_response

['track_order']


"You can track your order using the tracking link sent to your email or from your account dashboard under 'My Orders'."