# Customer Analytics

### Customer analytics is a process by which data from customer behavior is used to help make key business decisions via market segmentation and predictive analytics. This information is used by businesses for direct marketing, site selection, and customer relationship management.

## Case Study Background

#### An international e-commerce company (electronic goods) wants to use advanced machine learning techniques to analyse its customers with respect to its services and some important customer success metrics.

#### They also have future expansion plans to India.

#### They want some specific key insights extracted from their existing customer database.

#### • The company operates in various states of the USA, but the customer data covers only one state.
#### • The warehouse is located in the eastern part of the USA, whereas the state to which shipments are delivered is in the western part.

## Problem Statement 1

#### As a Data Scientist, you are asked to build a model that predicts whether a shipment will reach on time.

#### For this, they want you to try various logit/probabilistic techniques and identify the most accurate model. The main models to build and compare on accuracy are:

#### 1. Logistic Regression
#### 2. Support Vector Machines
#### 3. Random Forest
#### 4. XGBoost or any other boosting technique
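
A minimal sketch of how the four model families could be compared on a common train/test split. It assumes scikit-learn is available, uses synthetic data from `make_classification` as a stand-in for the shipment data, and swaps in `GradientBoostingClassifier` for XGBoost (the brief allows any boosting technique). The sample size and hyperparameters are illustrative, not taken from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 600 rows, 8 numeric features, binary target.
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale features for the two models that are sensitive to feature magnitude.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Fit every model on the same split and record its held-out accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from most to least accurate.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Accuracy is easier to interpret next to a baseline: on an imbalanced target, always predicting the majority class already scores the majority-class share.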

### Data Information (Important)

##### ID - ID number of the customer
##### Warehouse_block - The company has a large warehouse that is divided into blocks such as A, B, C, D and so on
##### Mode_of_Shipment - The company ships products by different modes of transport such as ship, air, and road
##### Customer_care_calls - The number of calls made to enquire about the shipment. (Some customers make many calls, so the company wants to know whether these customers are being unknowingly favoured.)
##### Customer_rating - The company has rated every customer on various parameters, 1 being the lowest (worst) and 5 being the highest (best)
##### Cost_of_the_Product - The cost of the product in USD
##### Prior_purchases - The number of prior purchases
##### Product_importance - The company has categorised products as high, medium, or low importance based on various parameters
##### Gender - Male or female
##### Discount_offered - The percentage discount offered on that specific product
##### Weight_in_gms - The weight in grams
##### Reached.on.Time_Y.N - The target (Y) variable: 1 indicates that the product did NOT reach on time, and 0 indicates that it reached on time
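
The data dictionary can double as a sanity check on a loaded file. The sketch below is one possible way to do that with pandas. `check_schema` is a hypothetical helper, not part of the project, and the sample row is only illustrative.

```python
import pandas as pd

# Column names exactly as listed in the data dictionary.
EXPECTED_COLUMNS = [
    "ID", "Warehouse_block", "Mode_of_Shipment", "Customer_care_calls",
    "Customer_rating", "Cost_of_the_Product", "Prior_purchases",
    "Product_importance", "Gender", "Discount_offered", "Weight_in_gms",
    "Reached.on.Time_Y.N",
]

def check_schema(df: pd.DataFrame) -> list:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Ratings are documented as 1 (worst) to 5 (best).
    if "Customer_rating" in df.columns and not df["Customer_rating"].between(1, 5).all():
        problems.append("Customer_rating outside 1-5")
    # The target is documented as 1 = NOT on time, 0 = on time.
    target = "Reached.on.Time_Y.N"
    if target in df.columns and not df[target].isin([0, 1]).all():
        problems.append("target contains values other than 0/1")
    return problems

# One illustrative row in the documented layout.
sample = pd.DataFrame(
    [[1, "D", "Flight", 4, 2, 177, 3, "low", "F", 44, 1233, 1]],
    columns=EXPECTED_COLUMNS,
)
print(check_schema(sample))
```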

In [2]:
#Find the path of the current folder
pwd

'C:\\Users\\BLAZIN\\Python Projects\\Customer Analytics'

In [4]:
# Change the path
import os
os.chdir('E:\\IMS Course Content\\Course Content\\Data Science Term 3\\Project')
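
Changing the working directory makes every later relative path depend on hidden state. A possible alternative, sketched below, builds the file paths from a base folder with `pathlib`. `DATA_DIR` is a hypothetical name and should point to wherever the CSV files actually live.

```python
from pathlib import Path

# Hypothetical base folder; replace with the folder that holds the CSV files.
DATA_DIR = Path("data")

# Build the full paths without changing the working directory.
train_path = DATA_DIR / "Customer_Train.csv"
test_path = DATA_DIR / "Customer_Test.csv"

print(train_path.name)
print(test_path.name)
```

Path objects like these can be passed directly to `pd.read_csv`.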

In [5]:
#import numpy and pandas
import numpy as np
import pandas as pd

In [22]:
# Load the train and test CSV files
df_train = pd.read_csv('Customer_Train.csv', index_col=None)
df_test = pd.read_csv('Customer_Test.csv', index_col=None)

In [10]:
combine = [df_train, df_test]

#### Exploratory Data Analysis

In [7]:
df_train.head()

Unnamed: 0,ID,Warehouse_block,Mode_of_Shipment,Customer_care_calls,Customer_rating,Cost_of_the_Product,Prior_purchases,Product_importance,Gender,Discount_offered,Weight_in_gms,Reached.on.Time_Y.N
0,1,D,Flight,4,2,177,3,low,F,44,1233,1
1,2,F,Flight,4,5,216,2,low,M,59,3088,1
2,3,A,Flight,2,2,183,4,low,M,48,3374,1
3,4,B,Flight,3,3,176,4,medium,M,10,1177,1
4,5,C,Flight,2,2,184,3,medium,F,46,2484,1


In [9]:
df_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10999 entries, 0 to 10998
Data columns (total 12 columns):
ID                     10999 non-null int64
Warehouse_block        10999 non-null object
Mode_of_Shipment       10999 non-null object
Customer_care_calls    10999 non-null int64
Customer_rating        10999 non-null int64
Cost_of_the_Product    10999 non-null int64
Prior_purchases        10999 non-null int64
Product_importance     10999 non-null object
Gender                 10999 non-null object
Discount_offered       10999 non-null int64
Weight_in_gms          10999 non-null int64
Reached.on.Time_Y.N    10999 non-null int64
dtypes: int64(8), object(4)
memory usage: 1.0+ MB
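
The `info()` summary shows no missing values and a mix of integer and text columns. The same facts can be checked programmatically, which is handy before encoding the text columns for modelling. The sketch below uses a small toy frame (an assumption, not the real data) with a few of the same column names.

```python
import pandas as pd

# Toy frame (assumption) with the same mix of text and numeric columns.
toy = pd.DataFrame({
    "Warehouse_block": ["D", "F", "A"],
    "Mode_of_Shipment": ["Flight", "Flight", "Ship"],
    "Customer_care_calls": [4, 4, 2],
    "Weight_in_gms": [1233, 3088, 3374],
})

# Number of missing values per column.
missing_per_column = toy.isna().sum()

# Split the columns into numeric and non-numeric (categorical) groups.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(missing_per_column)
print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```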


In [14]:
df_train.describe()

Unnamed: 0,ID,Customer_care_calls,Customer_rating,Cost_of_the_Product,Prior_purchases,Discount_offered,Weight_in_gms,Reached.on.Time_Y.N
count,10999.0,10999.0,10999.0,10999.0,10999.0,10999.0,10999.0,10999.0
mean,5500.0,4.054459,2.990545,210.196836,3.567597,13.373216,3634.016729,0.596691
std,3175.28214,1.14149,1.413603,48.063272,1.52286,16.205527,1635.377251,0.490584
min,1.0,2.0,1.0,96.0,2.0,1.0,1001.0,0.0
25%,2750.5,3.0,2.0,169.0,3.0,4.0,1839.5,0.0
50%,5500.0,4.0,3.0,214.0,3.0,7.0,4149.0,1.0
75%,8249.5,5.0,4.0,251.0,4.0,10.0,5050.0,1.0
max,10999.0,7.0,5.0,310.0,10.0,65.0,7846.0,1.0
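
Because the target is coded 0/1, its mean is the share of shipments that did not reach on time. A short worked calculation, using only the figures from the `describe()` output above, recovers the class counts:

```python
# Figures taken from the describe() output.
n_rows = 10999
target_mean = 0.596691  # mean of Reached.on.Time_Y.N (1 = NOT on time)

# The mean of a 0/1 column times the row count gives the number of 1s.
late_count = round(n_rows * target_mean)
on_time_count = n_rows - late_count

print(late_count, on_time_count)  # 6563 4436
```

Roughly 60% of shipments in the training data were late, so the classes are moderately imbalanced.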


In [12]:
df_train.shape

(10999, 12)

In [13]:
df_test.shape

(3993, 12)

##### Cleaning the Data

In [23]:
# Drop the ID column (columns= already selects the axis, so axis=1 is redundant)
df_train = df_train.drop(columns='ID')

df_train.head()

Unnamed: 0,Warehouse_block,Mode_of_Shipment,Customer_care_calls,Customer_rating,Cost_of_the_Product,Prior_purchases,Product_importance,Gender,Discount_offered,Weight_in_gms,Reached.on.Time_Y.N
0,D,Flight,4,2,177,3,low,F,44,1233,1
1,F,Flight,4,5,216,2,low,M,59,3088,1
2,A,Flight,2,2,183,4,low,M,48,3374,1
3,B,Flight,3,3,176,4,medium,M,10,1177,1
4,C,Flight,2,2,184,3,medium,F,46,2484,1


In [24]:
# Rename the columns to shorter names.
# Note: index=str is omitted because it would convert the row labels to strings.
df_train = df_train.rename(columns={
    "Warehouse_block": "Warehouse", "Mode_of_Shipment": "Mode", "Customer_care_calls": "Cust_Calls",
    "Customer_rating": "Cust_Rating", "Cost_of_the_Product": "Prod_Cost", "Prior_purchases": "Prior_Purchase",
    "Product_importance": "Prod_Imp", "Discount_offered": "Discount", "Weight_in_gms": "Weight",
    "Reached.on.Time_Y.N": "Reached", "Gender": "Sex"})

df_train.head()

Unnamed: 0,Warehouse,Mode,Cust_Calls,Cust_Rating,Prod_Cost,Prior_Purchase,Prod_Imp,Sex,Discount,Weight,Reached
0,D,Flight,4,2,177,3,low,F,44,1233,1
1,F,Flight,4,5,216,2,low,M,59,3088,1
2,A,Flight,2,2,183,4,low,M,48,3374,1
3,B,Flight,3,3,176,4,medium,M,10,1177,1
4,C,Flight,2,2,184,3,medium,F,46,2484,1


In [26]:
# Capitalize the first character of the values in the Product Importance column
df_train['Prod_Imp'] = df_train['Prod_Imp'].str.capitalize()

df_train.head()

Unnamed: 0,Warehouse,Mode,Cust_Calls,Cust_Rating,Prod_Cost,Prior_Purchase,Prod_Imp,Sex,Discount,Weight,Reached
0,D,Flight,4,2,177,3,Low,F,44,1233,1
1,F,Flight,4,5,216,2,Low,M,59,3088,1
2,A,Flight,2,2,183,4,Low,M,48,3374,1
3,B,Flight,3,3,176,4,Medium,M,10,1177,1
4,C,Flight,2,2,184,3,Medium,F,46,2484,1


In [31]:
# Recode the Reached column: 1 (not on time) -> 'N', 0 (on time) -> 'Y'

df_train['Reached'] = df_train['Reached'].replace({1: 'N', 0: 'Y'})

df_train.head()

Unnamed: 0,Warehouse,Mode,Cust_Calls,Cust_Rating,Prod_Cost,Prior_Purchase,Prod_Imp,Sex,Discount,Weight,Reached
0,D,Flight,4,2,177,3,Low,F,44,1233,N
1,F,Flight,4,5,216,2,Low,M,59,3088,N
2,A,Flight,2,2,183,4,Low,M,48,3374,N
3,B,Flight,3,3,176,4,Medium,M,10,1177,N
4,C,Flight,2,2,184,3,Medium,F,46,2484,N
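
The cleaning steps above are applied to `df_train` only. One way to keep the train and test frames consistent is to collect the steps in a single function and call it on both. The sketch below assumes the test file has the same raw columns as the training file. `clean` and `RENAME_MAP` are hypothetical names, and the two toy rows are illustrative.

```python
import pandas as pd

# Same renaming as in the cell above, kept in one place for reuse.
RENAME_MAP = {
    "Warehouse_block": "Warehouse", "Mode_of_Shipment": "Mode",
    "Customer_care_calls": "Cust_Calls", "Customer_rating": "Cust_Rating",
    "Cost_of_the_Product": "Prod_Cost", "Prior_purchases": "Prior_Purchase",
    "Product_importance": "Prod_Imp", "Discount_offered": "Discount",
    "Weight_in_gms": "Weight", "Reached.on.Time_Y.N": "Reached", "Gender": "Sex",
}

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a cleaned copy, leaving the input unchanged."""
    out = df.drop(columns="ID").rename(columns=RENAME_MAP)
    out["Prod_Imp"] = out["Prod_Imp"].str.capitalize()
    # In the raw data 1 means NOT on time, so 1 -> 'N' and 0 -> 'Y'.
    out["Reached"] = out["Reached"].map({1: "N", 0: "Y"})
    return out

# Two illustrative rows in the raw layout.
raw = pd.DataFrame(
    [
        [1, "D", "Flight", 4, 2, 177, 3, "low", "F", 44, 1233, 1],
        [2, "B", "Ship", 3, 4, 150, 2, "medium", "M", 5, 4500, 0],
    ],
    columns=["ID"] + list(RENAME_MAP)[:7] + ["Gender", "Discount_offered",
                                             "Weight_in_gms", "Reached.on.Time_Y.N"],
)

cleaned = clean(raw)
print(cleaned)
```

With the real data this could be used as `df_train, df_test = clean(df_train), clean(df_test)`.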


##### Data Insight and Visualisation

In [32]:
# Count the products that reached on time (Y) and those that did not (N)
df_train['Reached'].value_counts()

N    6563
Y    4436
Name: Reached, dtype: int64

In [34]:
# Count the products dispatched from each warehouse block
df_train['Warehouse'].value_counts()

F    3666
D    1834
B    1833
A    1833
C    1833
Name: Warehouse, dtype: int64

In [35]:
# Count the products delivered by each mode of transport
df_train['Mode'].value_counts()

Ship      7462
Flight    1777
Road      1760
Name: Mode, dtype: int64

In [36]:
# Count the records for each number of customer care calls
df_train['Cust_Calls'].value_counts()

4    3557
3    3217
5    2328
6    1013
2     638
7     246
Name: Cust_Calls, dtype: int64

In [37]:
# Count the records for each customer rating
df_train['Cust_Rating'].value_counts()

3    2239
1    2235
4    2189
5    2171
2    2165
Name: Cust_Rating, dtype: int64

In [38]:
# Count the records for each number of prior purchases
df_train['Prior_Purchase'].value_counts()

3     3955
2     2599
4     2155
5     1287
6      561
10     178
7      136
8      128
Name: Prior_Purchase, dtype: int64

In [39]:
# Count the records for each product importance level
df_train['Prod_Imp'].value_counts()

Low       5297
Medium    4754
High       948
Name: Prod_Imp, dtype: int64

In [40]:
# Count the records for each customer gender
df_train['Sex'].value_counts()

F    5545
M    5454
Name: Sex, dtype: int64
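
Raw counts show how large each category is, but not how the outcome differs between categories. A possible next step, sketched below, is to look at shares instead: `value_counts(normalize=True)` for one column and a row-normalised `pd.crosstab` for the outcome within each category. The frame is toy data (an assumption) that reuses the renamed columns and is not drawn from the shipment records.

```python
import pandas as pd

# Toy data (assumption) using the renamed columns; not the real records.
toy = pd.DataFrame({
    "Mode": ["Ship", "Ship", "Flight", "Flight", "Road", "Road"],
    "Reached": ["N", "Y", "N", "N", "Y", "Y"],
})

# Share of each outcome overall.
overall_share = toy["Reached"].value_counts(normalize=True)

# Row-normalised cross-tabulation: each row (mode) sums to 1.
share_by_mode = pd.crosstab(toy["Mode"], toy["Reached"], normalize="index")

print(overall_share)
print(share_by_mode)
```

The same pattern could be applied to `Warehouse`, `Prod_Imp`, or `Cust_Calls` to see whether the late share varies across their levels.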