# eCommerce Customer Service Satisfaction

## About Dataset
The dataset captures customer satisfaction (CSAT) scores over a one-month period at an e-commerce platform called Shopzilla (a pseudonym). Its features include the category and sub-category of each interaction, customer remarks, survey response date, product category, item price, agent details (name, supervisor, manager), and the CSAT score.

Note: The real information has been obfuscated, and the dataset was generated with the Faker library to conceal genuine details.

**Data Source:** https://www.kaggle.com/datasets/ddosad/ecommerce-customer-service-satisfaction?resource=download 

### Problem Statement
Shopzilla faces high customer acquisition costs, which hurt its profit. Studies suggest that customer lifetime value grows with the retention rate: a 5% increase in customer retention can produce an increase in profit of more than 25%. The company is currently working to improve customer service quality, enhance overall customer satisfaction, and increase customer retention.

### Objectives
1. Identify key drivers of customer satisfaction using a decision tree.
2. Conduct a time-series analysis to observe trends and patterns in customer satisfaction over the one-month period.
3. Improve model accuracy so that the factors behind both bad and good experiences can be identified, then rectified or reinforced.
4. Classify the CSAT score.
5. Complete the capstone by the deadline.

## Import Libraries

In [1]:
# Import libraries.
import pandas as pd
import numpy as np

## Read in Data

In [2]:
df = pd.read_csv('Customer_support_data.csv')
df.columns = df.columns.str.replace(' ', '_')  # Replace spaces in column names, e.g. 'Unique id' -> 'Unique_id'

In [3]:
df.head(3)

Unnamed: 0,Unique_id,channel_name,category,Sub-category,Customer_Remarks,Order_id,order_date_time,Issue_reported_at,issue_responded,Survey_response_Date,Customer_City,Product_category,Item_price,connected_handling_time,Agent_name,Supervisor,Manager,Tenure_Bucket,Agent_Shift,CSAT_Score
0,7e9ae164-6a8b-4521-a2d4-58f7c9fff13f,Outcall,Product Queries,Life Insurance,,c27c9bb4-fa36-4140-9f1f-21009254ffdb,,01/08/2023 11:13,01/08/2023 11:47,01-Aug-23,,,,,Richard Buchanan,Mason Gupta,Jennifer Nguyen,On Job Training,Morning,5
1,b07ec1b0-f376-43b6-86df-ec03da3b2e16,Outcall,Product Queries,Product Specific Information,,d406b0c7-ce17-4654-b9de-f08d421254bd,,01/08/2023 12:52,01/08/2023 12:54,01-Aug-23,,,,,Vicki Collins,Dylan Kim,Michael Lee,>90,Morning,5
2,200814dd-27c7-4149-ba2b-bd3af3092880,Inbound,Order Related,Installation/demo,,c273368d-b961-44cb-beaf-62d6fd6c00d5,,01/08/2023 20:16,01/08/2023 20:38,01-Aug-23,,,,,Duane Norman,Jackson Park,William Kim,On Job Training,Evening,5
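
The timestamp columns above are read in as plain strings. For the time-series objective they would need to be parsed first. Below is a minimal sketch using the three rows shown, assuming the format is day-first (`01/08/2023` meaning 1 August 2023, which matches the `01-Aug-23` survey date). The column name `response_minutes` is a hypothetical addition, not part of the dataset.

```python
import pandas as pd

# The three rows printed by df.head(3), timestamp columns only.
sample = pd.DataFrame({
    "Issue_reported_at": ["01/08/2023 11:13", "01/08/2023 12:52", "01/08/2023 20:16"],
    "issue_responded": ["01/08/2023 11:47", "01/08/2023 12:54", "01/08/2023 20:38"],
})

# Assumption: dates are day/month/year, consistent with Survey_response_Date.
fmt = "%d/%m/%Y %H:%M"
for col in ["Issue_reported_at", "issue_responded"]:
    sample[col] = pd.to_datetime(sample[col], format=fmt)

# Hypothetical derived feature: minutes between report and first response.
delta = sample["issue_responded"] - sample["Issue_reported_at"]
sample["response_minutes"] = delta.dt.total_seconds() / 60

print(sample["response_minutes"].tolist())
```

Parsing these columns also keeps them out of any later one-hot encoding step, where each distinct timestamp string would otherwise become its own column.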


In [4]:
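def summary(df):
    """Return per-column dtype, null count, and unique count for df."""
    summy = pd.DataFrame(df.dtypes, columns=['data type'])
    summy['isNull'] = df.isnull().sum().values
    # Count of fully duplicated rows in the whole frame, repeated on every row.
    summy['Duplicate'] = df.duplicated().sum()
    summy['#unique'] = df.nunique().values
    return summy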


print(df.shape)
summary(df)

(85907, 20)


Unnamed: 0,data type,isNull,Duplicate,#unique
Unique_id,object,0,0,85907
channel_name,object,0,0,3
category,object,0,0,12
Sub-category,object,0,0,57
Customer_Remarks,object,57165,0,18231
Order_id,object,18232,0,67675
order_date_time,object,68693,0,13766
Issue_reported_at,object,0,0,30923
issue_responded,object,0,0,30262
Survey_response_Date,object,0,0,31


In [5]:
df1 = df['CSAT_Score'].value_counts()

In [6]:
df1

CSAT_Score
5    59617
1    11230
4    11219
3     2558
2     1283
Name: count, dtype: int64
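
The raw counts are easier to compare as shares of the total. The sketch below rebuilds the counts from the output above and normalizes them. On the full frame, `df['CSAT_Score'].value_counts(normalize=True)` should give the same shares.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
csat_counts = pd.Series(
    {5: 59617, 1: 11230, 4: 11219, 3: 2558, 2: 1283}, name="count"
)

# Fraction of all survey responses that fall on each score.
csat_share = (csat_counts / csat_counts.sum()).round(3)
print(csat_share)
```

About 69% of responses are a score of 5, so the classes are strongly imbalanced. Any classifier for the CSAT score would need to account for that, for example by reporting per-class metrics rather than accuracy alone.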

There are a large number of null values in the `Customer_Remarks`, `Order_id`, `order_date_time`, `Customer_City`, `Product_category`, `Item_price`, and `connected_handling_time` columns, so we will check for correlation to see whether we can drop them.
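
To put the null counts in proportion, the sketch below divides them by the row count. It uses only the three affected columns whose counts appear in the summary table. On the full frame, `df.isnull().mean()` would give the share for every column.

```python
import pandas as pd

n_rows = 85907  # from df.shape

# Null counts copied from the summary table (only the columns listed there).
null_counts = pd.Series(
    {"Customer_Remarks": 57165, "Order_id": 18232, "order_date_time": 68693}
)

# Fraction of rows that are null in each column, largest first.
null_share = (null_counts / n_rows).round(3).sort_values(ascending=False)
print(null_share)
```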

Before that, we first drop the `Unique_id` column: it is not necessary for modeling, and it would consume a lot of computing power when we dummify the categorical variables (`Unique_id` has 85907 unique values).

In [7]:
df.drop(columns = ['Unique_id', 'Customer_Remarks', 'Order_id'], inplace = True)
df.head(1)

Unnamed: 0,channel_name,category,Sub-category,Customer_Remarks,Order_id,order_date_time,Issue_reported_at,issue_responded,Survey_response_Date,Customer_City,Product_category,Item_price,connected_handling_time,Agent_name,Supervisor,Manager,Tenure_Bucket,Agent_Shift,CSAT_Score
0,Outcall,Product Queries,Life Insurance,,c27c9bb4-fa36-4140-9f1f-21009254ffdb,,01/08/2023 11:13,01/08/2023 11:47,01-Aug-23,,,,,Richard Buchanan,Mason Gupta,Jennifer Nguyen,On Job Training,Morning,5


In [12]:
# Identify categorical columns and one-hot encode (dummify) them
categorical_cols = df.select_dtypes(include=['object']).columns.tolist()

df = pd.get_dummies(df, columns=categorical_cols, drop_first=True)

In [14]:
df.head(3)

Unnamed: 0,Item_price,connected_handling_time,CSAT_Score,channel_name_Inbound,channel_name_Outcall,category_Cancellation,category_Feedback,category_Offers & Cashback,category_Onboarding related,category_Order Related,...,Manager_Olivia Tan,Manager_William Kim,Tenure_Bucket_31-60,Tenure_Bucket_61-90,Tenure_Bucket_>90,Tenure_Bucket_On Job Training,Agent_Shift_Evening,Agent_Shift_Morning,Agent_Shift_Night,Agent_Shift_Split
0,,,5,False,True,False,False,False,False,False,...,False,False,False,False,False,True,False,True,False,False
1,,,5,False,True,False,False,False,False,False,...,False,False,False,False,True,False,False,True,False,False
2,,,5,True,False,False,False,False,False,True,...,False,True,False,False,False,True,True,False,False,False


In [15]:
df.shape

(85907, 164165)
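
A width of 164165 columns suggests that high-cardinality string columns (IDs, timestamps) were one-hot encoded along with the true categoricals. One way to guard against this is to check the number of distinct values per column before encoding. The sketch below does that on a toy frame with made-up values. The cutoff `max_levels` is a hypothetical threshold, not a value taken from this analysis.

```python
import pandas as pd

# Toy frame standing in for the real data; the values are made up.
toy = pd.DataFrame({
    "Order_id": ["a1", "b2", "c3", "d4"],  # unique per row, like an ID
    "channel_name": ["Inbound", "Outcall", "Inbound", "Email"],
    "CSAT_Score": [5, 1, 4, 5],
})

max_levels = 3  # hypothetical cutoff for "safe to one-hot encode"

# Non-numeric columns and their number of distinct values.
cat_cols = [c for c in toy.columns if not pd.api.types.is_numeric_dtype(toy[c])]
levels = toy[cat_cols].nunique()

low_card = levels[levels <= max_levels].index.tolist()
high_card = levels[levels > max_levels].index.tolist()

# Encode only the low-cardinality columns; set the others aside.
encoded = pd.get_dummies(
    toy.drop(columns=high_card), columns=low_card, drop_first=True
)
print(high_card)
print(encoded.shape)
```

Columns that end up in `high_card` are candidates for dropping or for a different treatment, such as parsing timestamps into numeric features, rather than dummification.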

In [11]:
df.corr()

ValueError: could not convert string to float: 'Outcall'
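
The error arises because `df.corr()` was called while the frame still contained string columns such as `channel_name`. Recent pandas versions do not silently skip them. A minimal sketch of one way around this, on a toy frame with made-up values, is to restrict the correlation to numeric columns with `numeric_only=True`:

```python
import pandas as pd

# Toy frame with one string column and two numeric columns; values are made up.
toy = pd.DataFrame({
    "channel_name": ["Outcall", "Inbound", "Outcall", "Inbound"],
    "Item_price": [100.0, 250.0, 80.0, 300.0],
    "CSAT_Score": [5, 3, 5, 1],
})

# numeric_only=True skips non-numeric columns instead of raising ValueError.
corr = toy.corr(numeric_only=True)
print(corr)
```

Note that pairwise correlation ignores rows where either value is null, so for mostly-null columns such as `Item_price` the result is based on only a fraction of the data.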