# Chatbot Development Notebook

This notebook documents the development of an intent classifier for a customer service chatbot. It covers data preprocessing, model training, and evaluation, and ends by testing the model on sample inputs.

In [1]:
import pandas as pd
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
import pickle

In [2]:
# Load the dataset
df = pd.read_excel('../data/Chat_Team.xlsx')

# Display basic information about the dataset
df.head()
df.info()

# Check for missing values
df.isnull().sum()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31952 entries, 0 to 31951
Data columns (total 18 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   Transaction Start Date    31952 non-null  datetime64[ns]
 1   Agent                     31952 non-null  object        
 2   Chat Duration             31952 non-null  object        
 3   Teams                     31952 non-null  object        
 4   Session Name              31952 non-null  object        
 5   Chat Closed By            31741 non-null  object        
 6   Interactive Chat          31952 non-null  bool          
 7   Browser                   31952 non-null  object        
 8   Operating System          31952 non-null  object        
 9   Geo                       17665 non-null  object        
 10  Response Time of Agent    31952 non-null  object        
 11  Response time of Visitor  31952 non-null  object        
 12  Transaction End Da

Transaction Start Date          0
Agent                           0
Chat Duration                   0
Teams                           0
Session Name                    0
Chat Closed By                211
Interactive Chat                0
Browser                         0
Operating System                0
Geo                         14287
Response Time of Agent          0
Response time of Visitor        0
Transaction End Date            0
Customer Rating                 0
Customer Comment                0
Transferred Chat                0
Customer Wait Time              0
Text                          819
dtype: int64

In [6]:
def clean_text(text):
    # Remove unwanted characters and lowercase the text
    text = re.sub(r'\W', ' ', str(text))
    text = re.sub(r'\s+', ' ', text)
    return text.lower()

# Clean the text column; fill missing values first so NaN does not become the string 'nan'
df['cleaned_text'] = df['Text'].fillna('').apply(clean_text)

# Fill missing values in the 'Geo' and 'Customer Comment' columns
df['Geo'] = df['Geo'].fillna('Unknown')
df['Customer Comment'] = df['Customer Comment'].fillna('')

# Save cleaned data
df.to_csv('../data/clean_chat.csv', index=False)
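
To make the cleaning step concrete, the snippet below applies the same two regular expressions to a made-up chat line and to a missing value. The inputs are illustrative and are not rows from the dataset. It shows that `str()` turns a missing value into the literal token `nan`, so missing `Text` entries are best replaced with an empty string before the function is applied.

```python
import re

def clean_text(text):
    # Non-word characters become spaces, whitespace runs collapse to a
    # single space, and the result is lowercased.
    text = re.sub(r'\W', ' ', str(text))
    text = re.sub(r'\s+', ' ', text)
    return text.lower()

# Illustrative inputs: one noisy chat line and one missing value (NaN).
examples = ["Hello!!  My ORDER #123 hasn't arrived.", float('nan')]
cleaned = [clean_text(e) for e in examples]
print(cleaned)
```

Note that punctuation at the end of a line leaves a trailing space, and that the apostrophe splits `hasn't` into two tokens.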

In [4]:
X = df['cleaned_text']
y = df['intent_label']  # Requires an intent label column; the loaded dataset has none (see the KeyError below)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Vectorize the text data
vectorizer = TfidfVectorizer(max_features=1000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

KeyError: 'intent_label'
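
The `KeyError` means the loaded spreadsheet has no `intent_label` column, and none of the 18 columns in the missing-value summary is an intent label, so the split cannot run until labels exist. The sketch below shows one possible stopgap: check for the column and, if it is absent, derive rough labels from keywords. The keyword lists, the intent names, and the helper `label_by_keyword` are all hypothetical and not part of the original notebook. Hand-labelled intents would be preferable for training.

```python
import pandas as pd

# Hypothetical keyword rules; replace with intents that match the real data.
INTENT_KEYWORDS = {
    "complaint": ["complaint", "unhappy", "refund"],
    "hours": ["hours", "open", "close"],
}

def label_by_keyword(text, default="other"):
    """Return the first intent whose keyword appears as a token in text."""
    tokens = set(str(text).lower().split())
    for intent, keywords in INTENT_KEYWORDS.items():
        if tokens & set(keywords):
            return intent
    return default

# Toy frame standing in for the cleaned chat data.
toy = pd.DataFrame({"cleaned_text": [
    "i have a complaint about my service",
    "what are your business hours",
    "thanks bye",
]})

if "intent_label" not in toy.columns:
    # Stopgap weak labels so the downstream split has a target column.
    toy["intent_label"] = toy["cleaned_text"].apply(label_by_keyword)

print(toy["intent_label"].tolist())
```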

In [None]:
# Train a Naive Bayes model
model = MultinomialNB()
model.fit(X_train_tfidf, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test_tfidf)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
print(classification_report(y_test, y_pred))

# Save the model and vectorizer
with open('../models/intent_classifier.pkl', 'wb') as f:
    pickle.dump((model, vectorizer), f)
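
Because the model and vectorizer are pickled together as one tuple, they have to be unpacked in the same order when loaded. The sketch below shows the round trip on a tiny invented corpus. It writes to a temporary file rather than `../models/intent_classifier.pkl`, and the training sentences and labels are made up for illustration.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented toy data; the notebook itself would use X_train / y_train.
texts = ["i want a refund", "refund my order please",
         "what are your opening hours", "are you open on sunday"]
labels = ["complaint", "complaint", "hours", "hours"]

vectorizer = TfidfVectorizer(max_features=1000)
model = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "intent_classifier.pkl")
    with open(path, "wb") as f:
        pickle.dump((model, vectorizer), f)
    # Unpack in the same (model, vectorizer) order used when saving.
    with open(path, "rb") as f:
        loaded_model, loaded_vectorizer = pickle.load(f)

# The restored pair should behave exactly like the in-memory pair.
restored_pred = loaded_model.predict(loaded_vectorizer.transform(["refund please"]))
original_pred = model.predict(vectorizer.transform(["refund please"]))
print(restored_pred, original_pred)
```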

In [None]:
# Sample inputs to test the model
sample_text = ["I have a complaint about my service", "What are your business hours?"]
sample_tfidf = vectorizer.transform([clean_text(text) for text in sample_text])

# Predict intents for the sample inputs
sample_pred = model.predict(sample_tfidf)
print(sample_pred)
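
`predict` returns only the most likely label. For a chatbot it can also help to see how confident the classifier is, which `MultinomialNB.predict_proba` exposes. The sketch below trains on an invented toy corpus in place of the notebook's model and falls back to an `unknown` intent below a cut-off. The `THRESHOLD` value and the `unknown` label are assumptions for illustration, not settings from the notebook.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented toy data standing in for the trained notebook model.
texts = ["i have a complaint", "this service is terrible complaint",
         "what are your business hours", "opening hours on weekends"]
labels = ["complaint", "complaint", "hours", "hours"]

vectorizer = TfidfVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

samples = ["i have a complaint about my service", "what are your business hours"]
proba = model.predict_proba(vectorizer.transform(samples))

THRESHOLD = 0.6  # hypothetical cut-off; tune on held-out data
results = []
for text, row in zip(samples, proba):
    best = int(np.argmax(row))  # index into model.classes_
    intent = model.classes_[best] if row[best] >= THRESHOLD else "unknown"
    results.append((intent, float(row[best])))
    print(f"{text!r}: {intent} ({row[best]:.2f})")
```

Each row of `proba` follows the order of `model.classes_`, so the index of the largest value maps directly to the predicted label.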

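This notebook sets up a Naive Bayes model that classifies customer intents from chat text, using TF-IDF features built on the cleaned text. As the output above shows, the run stopped at a `KeyError` because the dataset has no `intent_label` column, so the training, evaluation, and sample-prediction cells were not executed.

Next steps include adding intent labels to the dataset, training and evaluating the model, testing with more data, and integrating the model into the chatbot's interface.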