<a href="https://colab.research.google.com/github/Jan2309jr/deepCSAT/blob/main/deepcsat.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - DeepCSAT : Ecommerce Customer Satisfaction Score Prediction




##### **Project Type**    - Supervised learning using artificial neural networks
##### **Contribution**    - Individual
##### **By - Janani Ravi**

# **Project Summary -**

This project aims to predict Customer Satisfaction (CSAT) scores with a deep learning model, an artificial neural network (ANN), trained on customer-support interaction data such as issue category, response time, agent shift, and agent tenure. The predicted scores are intended to help e-commerce businesses spot service gaps early, improve service quality and customer loyalty, and support data-driven decisions.

# **GitHub Link -**

[Click here](https://github.com/Jan2309jr/deepCSAT.git)

# **Problem Statement**


Customer satisfaction (CSAT) is a key factor that drives customer retention, loyalty, and overall business growth in the e-commerce industry. Traditional survey-based methods for measuring satisfaction are often time-consuming and fail to capture real-time customer sentiments. The challenge is to accurately predict CSAT scores using interaction-based data to identify service gaps and improve performance.

# ***Implementation***

In [None]:
# Import Libraries

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

## ***1. Getting Familiar with the Dataset***

In [None]:
# Load Dataset
url='https://drive.google.com/uc?export=download&id=10pFYAEZqnZ9mQHUrxly7xwe9qRKL7uM5'
df=pd.read_csv(url)

In [None]:
# Dataset First Look
df.head()

In [None]:
# Dataset Rows & Columns count
print("rows:",df.shape[0])
print("columns:",df.shape[1])

In [None]:
# Dataset Info
df.info()

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
plt.figure(figsize=(12,6))
plt.bar(df.columns,df.isnull().sum())
plt.tick_params(axis='x',rotation=90)
plt.show()

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns.tolist()

In [None]:
# Dataset Describe
df.describe()

In [None]:
# Check Unique Values for each variable.
df.nunique()

## ***3. Data Wrangling***

In [None]:
# Removing irrelevant columns
df.drop(
    ["Unique id", "channel_name", "Customer_City", "Order_id", "Customer Remarks",
     "order_date_time", "Product_category", "Item_price", "connected_handling_time"],
    axis=1, inplace=True)

In [None]:
# Standardizing the columns
df.rename(columns={
    "Satisfaction Score": "satisfaction_score",
    "category": "Category",
    "Sub-category": "Sub Category",
    "Issue_reported at": "Issue Reported At",
    "issue_responded": "Issue Responded At",
    "Survey_response_Date": "Survey Response Date",
    "Agent_name": "Agent Name"
}, inplace=True)

In [None]:
# Formatting Date and Time Columns
df["Issue Reported At"] = pd.to_datetime(df["Issue Reported At"], errors='coerce')
df["Issue Responded At"] = pd.to_datetime(df["Issue Responded At"], errors='coerce')
df["Survey Response Date"] = pd.to_datetime(df["Survey Response Date"], errors='coerce')
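
Because `errors='coerce'` silently turns unparseable strings into `NaT`, it can help to check how the timestamps are actually interpreted. The sketch below uses made-up sample values and assumes a day-first `DD/MM/YYYY HH:MM` layout; the real format of these columns is not confirmed here, so adjust `format` after inspecting the raw values.

```python
import pandas as pd

# Hypothetical sample values; the real column format is an assumption.
raw = pd.Series(["01/08/2023 11:13", "not a timestamp"])

# An explicit day-first format removes the ambiguity; bad values become NaT.
parsed = pd.to_datetime(raw, format="%d/%m/%Y %H:%M", errors="coerce")

# Default parsing reads the same string month-first (8 January, not 1 August).
default_parse = pd.to_datetime("01/08/2023 11:13")

print(parsed)
print("share of unparseable values:", parsed.isna().mean())
```

A high share of `NaT` after conversion usually means the format assumption is wrong rather than that the data is missing.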

In [None]:
df.head()

In [None]:
df.shape

## ***4. Exploratory Data Analysis***

In [None]:
#Csat score distribution
sns.countplot(x='CSAT Score', data=df)
plt.title('Distribution of CSAT Scores')

In [None]:
# Ticket count per category
sns.countplot(y='Category', data=df, order=df['Category'].value_counts().index)

In [None]:
# Top 10 sub-categories by ticket count
df['Sub Category'].value_counts().head(10).plot(kind='barh')

In [None]:
#tenure bucket distribution
sns.countplot(x='Tenure Bucket', data=df)

In [None]:
#Response Time Calculation(adding new feature)
df['Response_Time_Mins'] = (df['Issue Responded At'] - df['Issue Reported At']).dt.total_seconds() / 60
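
A derived duration is only as reliable as the two timestamps behind it. The following sketch, on a small hypothetical frame that reuses the notebook's column names, shows one way to flag rows where the response is recorded before the report and blank them out instead of letting negative minutes distort averages. Whether such rows exist in the real data is an assumption to verify.

```python
import pandas as pd

# Hypothetical rows; the third one has a response logged before the report.
toy = pd.DataFrame({
    "Issue Reported At": pd.to_datetime(
        ["2023-08-01 11:13", "2023-08-01 12:52", "2023-08-01 20:16"]),
    "Issue Responded At": pd.to_datetime(
        ["2023-08-01 11:47", "2023-08-01 12:54", "2023-08-01 20:10"]),
})

toy["Response_Time_Mins"] = (
    toy["Issue Responded At"] - toy["Issue Reported At"]
).dt.total_seconds() / 60

# Negative durations are treated as data errors and replaced with NaN.
negative = toy["Response_Time_Mins"] < 0
toy["Response_Time_Mins"] = toy["Response_Time_Mins"].mask(negative)

print("rows with negative response time:", int(negative.sum()))
print(toy)
```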

In [None]:
# Average Response Time by Shift
sns.barplot(x='Agent Shift', y='Response_Time_Mins', data=df)

In [None]:
# Response Time Trend Over Time
df.groupby(df['Issue Reported At'].dt.date)['Response_Time_Mins'].mean().plot()

In [None]:
#CSAT Over Time
df.groupby(df['Survey Response Date'].dt.to_period('M'))['CSAT Score'].mean().plot(kind='line')

In [None]:
#Average CSAT by Agent
top_agents = df.groupby('Agent Name')['CSAT Score'].mean().sort_values(ascending=False).head(10)
top_agents.plot(kind='barh', title='Top 10 Agents by Avg CSAT')

In [None]:
# Average Response Time by Agent (10 slowest agents)
slowest_agents = df.groupby('Agent Name')['Response_Time_Mins'].mean().nlargest(10).reset_index()
sns.barplot(x='Agent Name', y='Response_Time_Mins', data=slowest_agents)
plt.xticks(rotation=90)

In [None]:
# CSAT distribution by Supervisor
sns.boxplot(x='Supervisor', y='CSAT Score', data=df)
plt.xticks(rotation=90)

In [None]:
# Correlation Heatmap (numeric features only)
sns.heatmap(df.select_dtypes(include='number').corr(), annot=True, cmap='coolwarm')

In [None]:
#Response Time vs. CSAT
sns.scatterplot(x='Response_Time_Mins', y='CSAT Score', data=df)

In [None]:
#Category vs. CSAT
sns.boxplot(x='Category', y='CSAT Score', data=df)
plt.xticks(rotation=90)

In [None]:
#Shift vs. CSAT
sns.barplot(x='Agent Shift', y='CSAT Score', data=df)

In [None]:
#Tenure vs. CSAT
sns.barplot(x='Tenure Bucket', y='CSAT Score', data=df)

In [None]:
# Tenure vs. Response Time
sns.boxplot(x='Tenure Bucket', y='Response_Time_Mins', data=df)

In [None]:
#Manager-wise Avg CSAT
manager_csat = df.groupby('Manager')['CSAT Score'].mean().sort_values(ascending=False)
manager_csat.plot(kind='bar', title='Average CSAT by Manager')

In [None]:
#Manager vs. Average Response Time
sns.barplot(x='Manager', y='Response_Time_Mins', data=df)
plt.xticks(rotation=90)

In [None]:
#Pairplot for numerical relations
sns.pairplot(df[['Response_Time_Mins', 'CSAT Score']])

In [None]:
#Category vs. Response Time
sns.boxplot(x='Category', y='Response_Time_Mins', data=df)

In [None]:
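# Heatmap for Encoded Categorical Variables (after label encoding)
from sklearn.preprocessing import LabelEncoder

encoded_df = df.copy()
# Label-encode text columns; astype(str) gives missing values their own label.
# The integer codes are arbitrary, so these correlations are only a rough guide.
for col in df.select_dtypes(include='object'):
    encoded_df[col] = LabelEncoder().fit_transform(df[col].astype(str))
# numeric_only=True keeps the datetime columns out of the correlation matrix
sns.heatmap(encoded_df.corr(numeric_only=True), cmap='coolwarm')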

## ***5. Feature Engineering & Data Pre-processing***

### 1. Handling Missing Values

In [None]:
# Handling Missing Values & Missing Value Imputation
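
As a starting point for this step, one common pattern is median imputation for numeric columns and an explicit "Unknown" label for categorical ones. The sketch below runs on a small hypothetical frame that borrows two of the notebook's column names; it is an illustration under those assumptions, not the imputation actually chosen for this project.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column.
toy = pd.DataFrame({
    "Response_Time_Mins": [34.0, 2.0, np.nan, 10.0],
    "Agent Shift": ["Morning", None, "Evening", "Morning"],
})

imputed = toy.copy()
num_cols = imputed.select_dtypes(include="number").columns
cat_cols = imputed.columns.difference(num_cols)

# The median is robust to the long right tail typical of response times.
imputed[num_cols] = imputed[num_cols].fillna(imputed[num_cols].median())
# A separate label keeps "missing" visible to the model instead of hiding it.
imputed[cat_cols] = imputed[cat_cols].fillna("Unknown")

print(imputed)
```

In a real pipeline the median should be computed on the training split only and then reused for the test split.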

#### Which missing value imputation techniques have you used, and why?

Answer Here.

### 2. Handling Outliers

In [None]:
# Handling Outliers & Outlier treatments
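
One option for this step is to cap extreme values with the interquartile range (IQR) rule instead of dropping rows. The sketch below applies it to hypothetical response times; the 1.5 multiplier is the conventional default and an assumption here, not a value tuned on this dataset.

```python
import pandas as pd

# Hypothetical response times in minutes with one extreme value.
response = pd.Series([2.0, 5.0, 8.0, 10.0, 12.0, 15.0, 900.0],
                     name="Response_Time_Mins")

q1, q3 = response.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Clipping keeps every row but limits the influence of extreme values.
clipped = response.clip(lower=lower, upper=upper)

print(f"bounds: [{lower}, {upper}]")
print(clipped.tolist())
```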

##### Which outlier treatment techniques have you used, and why?

Answer Here.

### 3. Categorical Encoding

In [None]:
# Encode your categorical columns

#### Which categorical encoding techniques have you used, and why?

Answer Here.

### 4. Textual Data Preprocessing
(It's mandatory for textual dataset i.e., NLP, Sentiment Analysis, Text Clustering etc.)

#### 1. Expand Contraction

In [None]:
# Expand Contraction

#### 2. Lower Casing

In [None]:
# Lower Casing

#### 3. Removing Punctuations

In [None]:
# Remove Punctuations

#### 4. Removing URLs & Removing words that contain digits

In [None]:
# Remove URLs & Remove words that contain digits

#### 5. Removing Stopwords & Removing White spaces

In [None]:
# Remove Stopwords

In [None]:
# Remove White spaces

#### 6. Rephrase Text

In [None]:
# Rephrase Text

#### 7. Tokenization

In [None]:
# Tokenization

#### 8. Text Normalization

In [None]:
# Normalizing Text (i.e., Stemming, Lemmatization etc.)

##### Which text normalization technique have you used and why?

Answer Here.

#### 9. Part of speech tagging

In [None]:
# POS Tagging

#### 10. Text Vectorization

In [None]:
# Vectorizing Text

##### Which text vectorization technique have you used and why?

Answer Here.

### 4. Feature Manipulation & Selection

#### 1. Feature Manipulation

In [None]:
# Manipulate Features to minimize feature correlation and create new features

#### 2. Feature Selection

In [None]:
# Select your features wisely to avoid overfitting

##### Which feature selection methods have you used, and why?

Answer Here.

##### Which features did you find important, and why?

Answer Here.

### 5. Data Transformation

#### Do you think that your data needs to be transformed? If yes, which transformation have you used, and why?

In [None]:
# Transform Your data

### 6. Data Scaling

In [None]:
# Scaling your data
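
Neural networks generally train better on standardized inputs. The sketch below, on hypothetical numbers, fits `StandardScaler` on the training split only and reuses its statistics for the test split, so no information from the test data leaks into preprocessing. The choice of standardization over min-max scaling is an assumption made for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical response times (minutes), already split into train and test.
X_train = np.array([[2.0], [10.0], [34.0], [60.0]])
X_test = np.array([[5.0], [120.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean and std from train
X_test_scaled = scaler.transform(X_test)        # reuses the train statistics

print("train mean:", scaler.mean_, "train std:", scaler.scale_)
print(X_test_scaled.ravel())
```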

##### Which method have you used to scale your data and why?

### 7. Dimensionality Reduction

##### Do you think that dimensionality reduction is needed? Explain Why?

Answer Here.

In [None]:
# Dimensionality Reduction (If needed)

##### Which dimensionality reduction technique have you used and why? (If dimensionality reduction done on dataset.)

Answer Here.

### 8. Data Splitting

In [None]:
# Split your data to train and test. Choose Splitting ratio wisely.
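
When the target classes are unevenly distributed, a stratified split keeps the class proportions the same in both partitions. The sketch below uses a hypothetical 20-row frame and a 75/25 split; the ratio and the `random_state` value are illustrative assumptions, not settings taken from this project.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 12 rows with score 5, 4 rows with score 1, 4 with score 4.
toy = pd.DataFrame({
    "Response_Time_Mins": range(20),
    "CSAT Score": [5] * 12 + [1] * 4 + [4] * 4,
})
X = toy[["Response_Time_Mins"]]
y = toy["CSAT Score"]

# stratify=y preserves the 60/20/20 class mix in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

print(y_train.value_counts(normalize=True))
print(y_test.value_counts(normalize=True))
```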

##### What data splitting ratio have you used and why?

Answer Here.

### 9. Handling Imbalanced Dataset

##### Do you think the dataset is imbalanced? Explain Why.

Answer Here.

In [None]:
# Handling Imbalanced Dataset (If needed)
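
If the score distribution turns out to be skewed, class weights are a lightweight alternative to resampling. The sketch below computes "balanced" weights with scikit-learn for hypothetical labels; whether the real CSAT distribution needs this treatment should be judged from the count plot in the EDA section.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical training labels dominated by score 5.
y_train = np.array([5] * 12 + [1] * 4 + [4] * 4)

classes = np.unique(y_train)
# "balanced" weight = n_samples / (n_classes * count of that class)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weight = dict(zip(classes.tolist(), weights.tolist()))

print(class_weight)
```

A dictionary of this shape can typically be passed to a model's class-weight option so that errors on rare scores count more during training.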

##### What technique did you use to handle the imbalanced dataset and why? (If it needed to be balanced)

Answer Here.

## ***6. ML Model Implementation***

### ML Model - 1
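
The notebook does not state which framework the ANN is built with, so the sketch below uses scikit-learn's `MLPClassifier` as a stand-in and trains it on synthetic data. The layer sizes, iteration limit, and the two-class synthetic target are all illustrative assumptions; the real model would be trained on the encoded and scaled CSAT features.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 samples, 4 features.
# The label depends only on the sign of the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.where(X[:, 0] > 0, 5, 1)

# Scaling and the network live in one pipeline, so scaling is fit on train only.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500, random_state=0),
)
model.fit(X[:150], y[:150])

accuracy = model.score(X[150:], y[150:])
print(f"hold-out accuracy on synthetic data: {accuracy:.2f}")
```

For an imbalanced multi-class target, accuracy alone can be misleading, so per-class precision and recall are worth reporting alongside it.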

### ML Model - 2

# **Conclusion**

Write the conclusion here.