# Overview

This dataset captures customer satisfaction scores over a one-month period at Shopzilla, a pseudonymous e-commerce platform. With 85,907 rows and 20 columns, it is suited to Exploratory Data Analysis (EDA), visualization, and machine learning classification tasks.

# Purpose

This dataset can be used to evaluate customer service performance, forecast satisfaction levels, and analyze customer behavior in the e-commerce sector. Its features include channel name, order details, customer feedback, agent information, and, most importantly, the Customer Satisfaction (CSAT) score.

In [1]:
import pandas as pd

data = pd.read_csv('Customer_support_data.csv')

In [2]:
data.head()

Unnamed: 0,Unique id,channel_name,category,Sub-category,Customer Remarks,Order_id,order_date_time,Issue_reported at,issue_responded,Survey_response_Date,Customer_City,Product_category,Item_price,connected_handling_time,Agent_name,Supervisor,Manager,Tenure Bucket,Agent Shift,CSAT Score
0,7e9ae164-6a8b-4521-a2d4-58f7c9fff13f,Outcall,Product Queries,Life Insurance,,c27c9bb4-fa36-4140-9f1f-21009254ffdb,,01/08/2023 11:13,01/08/2023 11:47,01-Aug-23,,,,,Richard Buchanan,Mason Gupta,Jennifer Nguyen,On Job Training,Morning,5
1,b07ec1b0-f376-43b6-86df-ec03da3b2e16,Outcall,Product Queries,Product Specific Information,,d406b0c7-ce17-4654-b9de-f08d421254bd,,01/08/2023 12:52,01/08/2023 12:54,01-Aug-23,,,,,Vicki Collins,Dylan Kim,Michael Lee,>90,Morning,5
2,200814dd-27c7-4149-ba2b-bd3af3092880,Inbound,Order Related,Installation/demo,,c273368d-b961-44cb-beaf-62d6fd6c00d5,,01/08/2023 20:16,01/08/2023 20:38,01-Aug-23,,,,,Duane Norman,Jackson Park,William Kim,On Job Training,Evening,5
3,eb0d3e53-c1ca-42d3-8486-e42c8d622135,Inbound,Returns,Reverse Pickup Enquiry,,5aed0059-55a4-4ec6-bb54-97942092020a,,01/08/2023 20:56,01/08/2023 21:16,01-Aug-23,,,,,Patrick Flores,Olivia Wang,John Smith,>90,Evening,5
4,ba903143-1e54-406c-b969-46c52f92e5df,Inbound,Cancellation,Not Needed,,e8bed5a9-6933-4aff-9dc6-ccefd7dcde59,,01/08/2023 10:30,01/08/2023 10:32,01-Aug-23,,,,,Christopher Sanchez,Austin Johnson,Michael Lee,0-30,Morning,5


In [3]:
data.tail()

Unnamed: 0,Unique id,channel_name,category,Sub-category,Customer Remarks,Order_id,order_date_time,Issue_reported at,issue_responded,Survey_response_Date,Customer_City,Product_category,Item_price,connected_handling_time,Agent_name,Supervisor,Manager,Tenure Bucket,Agent Shift,CSAT Score
85902,505ea5e7-c475-4fac-ac36-1d19a4cb610f,Inbound,Refund Related,Refund Enquiry,,1b5a2b9c-a95f-405f-a42e-5b1b693f3dc9,,30/08/2023 23:20,31/08/2023 07:22,31-Aug-23,,,,,Brandon Leon,Ethan Tan,William Kim,On Job Training,Morning,4
85903,44b38d3f-1523-4182-aba2-72917586647c,Inbound,Order Related,Seller Cancelled Order,Supported team customer executive good,d0e8a817-96d5-4ace-bb82-adec50398e22,,31/08/2023 08:15,31/08/2023 08:17,31-Aug-23,,,,,Linda Foster,Noah Patel,Emily Chen,>90,Morning,5
85904,723bce2c-496c-4aa8-a64b-ca17004528f0,Inbound,Order Related,Order status enquiry,need to improve with proper details.,bdefe788-ccec-4eda-8ca4-51045e68db8a,,31/08/2023 18:57,31/08/2023 19:02,31-Aug-23,,,,,Kimberly Martinez,Aiden Patel,Olivia Tan,On Job Training,Evening,5
85905,707528ee-6873-4192-bfa9-a491f1c08ab5,Inbound,Feedback,UnProfessional Behaviour,,a031ec28-0c5e-450e-95b2-592342c40bc4,,31/08/2023 19:59,31/08/2023 20:00,31-Aug-23,,,,,Daniel Martin,Olivia Suzuki,Olivia Tan,>90,Morning,4
85906,07c7a878-0d5a-42e0-97ef-de59abec0238,Inbound,Returns,Reverse Pickup Enquiry,,3230db30-f8da-4c44-8636-ec76d1d3d4f3,,31/08/2023 23:36,31/08/2023 23:37,31-Aug-23,,,,,Elizabeth Guerra,Nathan Patel,Jennifer Nguyen,On Job Training,Evening,5


In [4]:
data.isna().sum()

Unique id                      0
channel_name                   0
category                       0
Sub-category                   0
Customer Remarks           57165
Order_id                   18232
order_date_time            68693
Issue_reported at              0
issue_responded                0
Survey_response_Date           0
Customer_City              68828
Product_category           68711
Item_price                 68701
connected_handling_time    85665
Agent_name                     0
Supervisor                     0
Manager                        0
Tenure Bucket                  0
Agent Shift                    0
CSAT Score                     0
dtype: int64
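
The raw counts above are easier to compare as a share of all rows. The following is a minimal sketch that re-enters the non-zero counts from the output above by hand (the names `missing_counts` and `missing_pct` are illustrative, not part of the notebook) and converts them to percentages of the 85,907 rows.

```python
import pandas as pd

# Non-zero missing-value counts copied from the isna().sum() output above
missing_counts = pd.Series({
    "Customer Remarks": 57165,
    "Order_id": 18232,
    "order_date_time": 68693,
    "Customer_City": 68828,
    "Product_category": 68711,
    "Item_price": 68701,
    "connected_handling_time": 85665,
})
n_rows = 85907  # row count reported by data.shape

# Percentage of rows missing per column, largest first
missing_pct = (missing_counts / n_rows * 100).round(1).sort_values(ascending=False)
print(missing_pct)
```

On the loaded DataFrame itself, `data.isna().mean() * 100` should give the same percentages without re-entering the counts.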

In [5]:
data.shape

(85907, 20)

In [6]:
data['CSAT Score'].value_counts()

CSAT Score
5    59617
1    11230
4    11219
3     2558
2     1283
Name: count, dtype: int64
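
The counts above suggest a strongly imbalanced target, which matters for any classification task. As a rough sketch, with the counts re-entered by hand and the variable names chosen only for illustration, the class shares and the accuracy of always predicting the most frequent score can be computed like this:

```python
import pandas as pd

# CSAT score counts copied from the value_counts() output above
csat_counts = pd.Series({5: 59617, 1: 11230, 4: 11219, 3: 2558, 2: 1283}, name="count")

# Share of each score as a percentage of all rows
csat_share = (csat_counts / csat_counts.sum() * 100).round(1)

# Accuracy of a trivial model that always predicts the most frequent score
majority_baseline = csat_counts.max() / csat_counts.sum()

print(csat_share)
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
```

A classifier trained on this data would need to beat that baseline to be useful, so metrics beyond plain accuracy (for example per-class recall) are probably worth reporting.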

In [14]:
df=data.copy()

In [16]:
# Drop unnecessary columns
df_cleaned = df.drop(["Customer Remarks", "Order_id", "order_date_time"], axis=1)

# Impute missing values for numerical features with the column median.
# Assign the result back instead of calling fillna(..., inplace=True) on a
# column selection, which pandas warns will stop working in pandas 3.0.
for column in ["Item_price", "connected_handling_time"]:
    df_cleaned[column] = df_cleaned[column].fillna(df_cleaned[column].median())

# Impute missing values for categorical features with a placeholder label
for column in ["Customer_City", "Product_category"]:
    df_cleaned[column] = df_cleaned[column].fillna("Unknown")

# Convert timestamp columns to datetime format.
# The formats are inferred from the rows shown above (day-first values such
# as 30/08/2023 23:20 and 31-Aug-23); passing them explicitly avoids
# month-first misparsing that errors='coerce' would silently turn into NaT.
timestamp_formats = {
    "Issue_reported at": "%d/%m/%Y %H:%M",
    "issue_responded": "%d/%m/%Y %H:%M",
    "Survey_response_Date": "%d-%b-%y",
}
for column, fmt in timestamp_formats.items():
    df_cleaned[column] = pd.to_datetime(df_cleaned[column], format=fmt, errors='coerce')

# Impute any timestamps that failed to parse (NaT) with the column median
for column in ["Issue_reported at", "issue_responded"]:
    df_cleaned[column] = df_cleaned[column].fillna(df_cleaned[column].median())


In [20]:
df_cleaned.isna().sum()

Unique id                  0
channel_name               0
category                   0
Sub-category               0
Issue_reported at          0
issue_responded            0
Survey_response_Date       0
Customer_City              0
Product_category           0
Item_price                 0
connected_handling_time    0
Agent_name                 0
Supervisor                 0
Manager                    0
Tenure Bucket              0
Agent Shift                0
CSAT Score                 0
dtype: int64
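
A zero missing count after imputation does not by itself show that the timestamps were parsed correctly: with `errors='coerce'`, any value that fails to parse becomes `NaT` and is then filled with the median. The sketch below checks this on three sample values copied from the tables above. The day-first format string is an assumption based on those samples, not on any dataset documentation.

```python
import pandas as pd

# Sample values copied from the head() and tail() output above
samples = pd.Series(["01/08/2023 11:13", "30/08/2023 23:20", "31/08/2023 07:22"])

# Explicit day-first format (assumed from the samples)
parsed = pd.to_datetime(samples, format="%d/%m/%Y %H:%M", errors="coerce")

# A month-first format cannot read 30 or 31 as a month, so those rows become NaT,
# and the first row is silently read as 8 January instead of 1 August
misparsed = pd.to_datetime(samples, format="%m/%d/%Y %H:%M", errors="coerce")

print(parsed.dt.month.tolist())
print(int(misparsed.isna().sum()))
```

Counting `NaT` values right after the conversion, before any imputation, is a cheap way to catch this kind of silent data loss on the full column.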

In [26]:
df_cleaned.dtypes

Unique id                          object
channel_name                       object
category                           object
Sub-category                       object
Issue_reported at          datetime64[ns]
issue_responded            datetime64[ns]
Survey_response_Date       datetime64[ns]
Customer_City                      object
Product_category                   object
Item_price                        float64
connected_handling_time           float64
Agent_name                         object
Supervisor                         object
Manager                            object
Tenure Bucket                      object
Agent Shift                        object
CSAT Score                          int64
dtype: object
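
With both timestamp columns stored as `datetime64[ns]`, they can be subtracted directly. The following sketch derives a response time in minutes for two rows copied from the tables above. The column name `response_minutes` is hypothetical and the day-first format is assumed from the sample values.

```python
import pandas as pd

# Two rows copied from the head() and tail() output above
sample = pd.DataFrame({
    "Issue_reported at": ["01/08/2023 11:13", "30/08/2023 23:20"],
    "issue_responded": ["01/08/2023 11:47", "31/08/2023 07:22"],
})
for column in ["Issue_reported at", "issue_responded"]:
    sample[column] = pd.to_datetime(sample[column], format="%d/%m/%Y %H:%M")

# Subtracting two datetime columns yields a timedelta column;
# response_minutes is a hypothetical feature name used for illustration
sample["response_minutes"] = (
    (sample["issue_responded"] - sample["Issue_reported at"]).dt.total_seconds() / 60
)
print(sample)
```

The same subtraction applied to `df_cleaned` would include rows whose timestamps were imputed, so the resulting values there should be treated with some caution.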

array([   nan,   434.,  1299., ..., 27995.,  4579.,  1629.])