### HackerEarth Machine Learning Challenge: How to not Lose Customer in 10 Days

Link: https://www.hackerearth.com/challenges/competitive/hackerearth-machine-learning-challenge-predict-customer-churn/

**Problem Statement**: No business can thrive without its customers. On the flip side, customers leaving the business is a nightmare that every business owner dreads!

In fact, one of the key metrics of a business's success is its customer churn rate: the lower the churn, the more loved the company is.

Typically, every user of a product or a service is assigned a prediction value that estimates their state of churn at any given time. This value may be based on multiple factors such as the user’s demographic, their browsing behavior and historical purchase data, among other details.

This value factors in unique and proprietary predictions of how long a user will remain a customer and is updated every day for all users who have purchased at least one of the products/services. The values assigned are between 1 and 5.

**Task**: An up-and-coming startup is keen on reducing its customer churn and has hired you as a Machine Learning engineer for this task. As an expert, you are required to build a sophisticated Machine Learning model that predicts the churn score for a website based on multiple features.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [None]:
dataset = pd.read_csv("/content/drive/MyDrive/dataset/train.csv")

In [None]:
dataset.head()

Unnamed: 0,customer_id,Name,age,gender,security_no,region_category,membership_category,joining_date,joined_through_referral,referral_id,preferred_offer_types,medium_of_operation,internet_option,last_visit_time,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
0,fffe4300490044003600300030003800,Pattie Morrisey,18,F,XW0DQ7H,Village,Platinum Membership,17-08-2017,No,xxxxxxxx,Gift Vouchers/Coupons,?,Wi-Fi,16.08.02,17,300.63,53005.25,17,781.75,Yes,Yes,No,Not Applicable,Products always in Stock,2
1,fffe43004900440032003100300035003700,Traci Peery,32,F,5K0N3X1,City,Premium Membership,28-08-2017,?,CID21329,Gift Vouchers/Coupons,Desktop,Mobile_Data,12.38.13,16,306.34,12838.38,10,,Yes,No,Yes,Solved,Quality Customer Care,1
2,fffe4300490044003100390032003600,Merideth Mcmeen,44,F,1F2TCL3,Town,No Membership,11-11-2016,Yes,CID12313,Gift Vouchers/Coupons,Desktop,Wi-Fi,22.53.21,14,516.16,21027.0,22,500.69,No,Yes,Yes,Solved in Follow-up,Poor Website,5
3,fffe43004900440036003000330031003600,Eufemia Cardwell,37,M,VJGJ33N,City,No Membership,29-10-2016,Yes,CID3793,Gift Vouchers/Coupons,Desktop,Mobile_Data,15.57.50,11,53.27,25239.56,6,567.66,No,Yes,Yes,Unsolved,Poor Website,5
4,fffe43004900440031003900350030003600,Meghan Kosak,31,F,SVZXCWB,City,No Membership,12-09-2017,No,xxxxxxxx,Credit/Debit Card Offers,Smartphone,Mobile_Data,15.46.44,20,113.13,24483.66,16,663.06,No,Yes,Yes,Solved,Poor Website,5


In [None]:
len(dataset)

36992

In [None]:
dataset.shape

(36992, 25)

In [None]:
dataset.describe()

Unnamed: 0,age,days_since_last_login,avg_time_spent,avg_transaction_value,points_in_wallet,churn_risk_score
count,36992.0,36992.0,36992.0,36992.0,33549.0,36992.0
mean,37.118161,-41.915576,243.472334,29271.194003,686.882199,3.463397
std,15.867412,228.8199,398.289149,19444.806226,194.063624,1.409661
min,10.0,-999.0,-2814.10911,800.46,-760.661236,-1.0
25%,23.0,8.0,60.1025,14177.54,616.15,3.0
50%,37.0,12.0,161.765,27554.485,697.62,4.0
75%,51.0,16.0,356.515,40855.11,763.95,5.0
max,64.0,26.0,3235.578521,99914.05,2069.069761,5.0


In [None]:
dataset.churn_risk_score.value_counts()

 3    10424
 4    10185
 5     9827
 2     2741
 1     2652
-1     1163
Name: churn_risk_score, dtype: int64
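
The problem statement says churn scores lie between 1 and 5, yet the counts above include 1,163 rows labelled `-1`. The sketch below shows one way to flag such out-of-range labels before modelling. It runs on a toy series standing in for `dataset["churn_risk_score"]`; whether `-1` rows should be dropped or handled differently is not stated in the source, so treat the filter as an assumption.

```python
import pandas as pd

# Toy labels standing in for dataset["churn_risk_score"] (illustrative values only)
scores = pd.Series([3, 4, 5, 2, 1, -1, 3, -1], name="churn_risk_score")

# True where the label falls inside the range given in the problem statement
valid = scores.between(1, 5)

print("out-of-range labels:", int((~valid).sum()))
print("remaining labels:", scores[valid].tolist())
```

On the real table the same mask could be built with `dataset["churn_risk_score"].between(1, 5)`.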

**Data Cleaning**

In [None]:
# Find the Null Values
dataset.isnull().sum()

customer_id                        0
Name                               0
age                                0
gender                             0
security_no                        0
region_category                 5428
membership_category                0
joining_date                       0
joined_through_referral            0
referral_id                        0
preferred_offer_types            288
medium_of_operation                0
internet_option                    0
last_visit_time                    0
days_since_last_login              0
avg_time_spent                     0
avg_transaction_value              0
avg_frequency_login_days           0
points_in_wallet                3443
used_special_discount              0
offer_application_preference       0
past_complaint                     0
complaint_status                   0
feedback                           0
churn_risk_score                   0
dtype: int64

In [None]:
dataset.shape

(36992, 25)

In [None]:
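# Fill missing values rather than dropping the rows:
# float columns get the column mean, all other columns are forward-filled
for col in dataset.columns:
    if dataset[col].dtype == 'float64':
        dataset[col] = dataset[col].fillna(dataset[col].mean())
    else:
        # fillna(method='ffill') is deprecated in recent pandas; ffill() is equivalent
        dataset[col] = dataset[col].ffill()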
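
To make the imputation rule above concrete, here is a minimal sketch on a toy frame (the values are made up for illustration). It also shows a caveat of forward fill: a missing value in the first row has nothing to copy from and stays missing. The null counts printed later in this notebook are all zero, so this case did not occur here, but the mode-based fallback at the end is one possible safeguard, added as an assumption rather than something the notebook does.

```python
import numpy as np
import pandas as pd

# Toy frame: one float column and one categorical column with gaps
toy = pd.DataFrame({
    "points_in_wallet": [781.75, np.nan, 500.69],
    "region_category": [np.nan, "City", np.nan],
})

for col in toy.columns:
    if toy[col].dtype == "float64":
        # Numeric gaps get the column mean
        toy[col] = toy[col].fillna(toy[col].mean())
    else:
        # Other gaps copy the previous row's value
        toy[col] = toy[col].ffill()

# The first region_category value is still missing after the forward fill
remaining = int(toy["region_category"].isnull().sum())
print("still missing after ffill:", remaining)

# Possible fallback (assumption): fill whatever is left with the most frequent value
toy["region_category"] = toy["region_category"].fillna(toy["region_category"].mode().iloc[0])
print(toy)
```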

In [None]:
dataset.shape

(36992, 25)

In [None]:
dataset.isnull().sum()

customer_id                     0
Name                            0
age                             0
gender                          0
security_no                     0
region_category                 0
membership_category             0
joining_date                    0
joined_through_referral         0
referral_id                     0
preferred_offer_types           0
medium_of_operation             0
internet_option                 0
last_visit_time                 0
days_since_last_login           0
avg_time_spent                  0
avg_transaction_value           0
avg_frequency_login_days        0
points_in_wallet                0
used_special_discount           0
offer_application_preference    0
past_complaint                  0
complaint_status                0
feedback                        0
churn_risk_score                0
dtype: int64

In [None]:
# Inspect the table before dropping the unimportant features
dataset.head()

Unnamed: 0,customer_id,Name,age,gender,security_no,region_category,membership_category,joining_date,joined_through_referral,referral_id,preferred_offer_types,medium_of_operation,internet_option,last_visit_time,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
0,fffe4300490044003600300030003800,Pattie Morrisey,18,F,XW0DQ7H,Village,Platinum Membership,17-08-2017,No,xxxxxxxx,Gift Vouchers/Coupons,?,Wi-Fi,16.08.02,17,300.63,53005.25,17,781.75,Yes,Yes,No,Not Applicable,Products always in Stock,2
1,fffe43004900440032003100300035003700,Traci Peery,32,F,5K0N3X1,City,Premium Membership,28-08-2017,?,CID21329,Gift Vouchers/Coupons,Desktop,Mobile_Data,12.38.13,16,306.34,12838.38,10,686.882199,Yes,No,Yes,Solved,Quality Customer Care,1
2,fffe4300490044003100390032003600,Merideth Mcmeen,44,F,1F2TCL3,Town,No Membership,11-11-2016,Yes,CID12313,Gift Vouchers/Coupons,Desktop,Wi-Fi,22.53.21,14,516.16,21027.0,22,500.69,No,Yes,Yes,Solved in Follow-up,Poor Website,5
3,fffe43004900440036003000330031003600,Eufemia Cardwell,37,M,VJGJ33N,City,No Membership,29-10-2016,Yes,CID3793,Gift Vouchers/Coupons,Desktop,Mobile_Data,15.57.50,11,53.27,25239.56,6,567.66,No,Yes,Yes,Unsolved,Poor Website,5
4,fffe43004900440031003900350030003600,Meghan Kosak,31,F,SVZXCWB,City,No Membership,12-09-2017,No,xxxxxxxx,Credit/Debit Card Offers,Smartphone,Mobile_Data,15.46.44,20,113.13,24483.66,16,663.06,No,Yes,Yes,Solved,Poor Website,5


In [None]:
def columnGetter():
  # Return the column names of the dataset as a plain list
  return list(dataset.columns)

In [None]:
cols_name = columnGetter()

In [None]:
for name in cols_name:
  print(name)

customer_id
Name
age
gender
security_no
region_category
membership_category
joining_date
joined_through_referral
referral_id
preferred_offer_types
medium_of_operation
internet_option
last_visit_time
days_since_last_login
avg_time_spent
avg_transaction_value
avg_frequency_login_days
points_in_wallet
used_special_discount
offer_application_preference
past_complaint
complaint_status
feedback
churn_risk_score


### Feature Selection and Dropping Other Features

In [None]:
dataset.head()

Unnamed: 0,customer_id,Name,age,gender,security_no,region_category,membership_category,joining_date,joined_through_referral,referral_id,preferred_offer_types,medium_of_operation,internet_option,last_visit_time,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
0,fffe4300490044003600300030003800,Pattie Morrisey,18,F,XW0DQ7H,Village,Platinum Membership,17-08-2017,No,xxxxxxxx,Gift Vouchers/Coupons,?,Wi-Fi,16.08.02,17,300.63,53005.25,17,781.75,Yes,Yes,No,Not Applicable,Products always in Stock,2
1,fffe43004900440032003100300035003700,Traci Peery,32,F,5K0N3X1,City,Premium Membership,28-08-2017,?,CID21329,Gift Vouchers/Coupons,Desktop,Mobile_Data,12.38.13,16,306.34,12838.38,10,686.882199,Yes,No,Yes,Solved,Quality Customer Care,1
2,fffe4300490044003100390032003600,Merideth Mcmeen,44,F,1F2TCL3,Town,No Membership,11-11-2016,Yes,CID12313,Gift Vouchers/Coupons,Desktop,Wi-Fi,22.53.21,14,516.16,21027.0,22,500.69,No,Yes,Yes,Solved in Follow-up,Poor Website,5
3,fffe43004900440036003000330031003600,Eufemia Cardwell,37,M,VJGJ33N,City,No Membership,29-10-2016,Yes,CID3793,Gift Vouchers/Coupons,Desktop,Mobile_Data,15.57.50,11,53.27,25239.56,6,567.66,No,Yes,Yes,Unsolved,Poor Website,5
4,fffe43004900440031003900350030003600,Meghan Kosak,31,F,SVZXCWB,City,No Membership,12-09-2017,No,xxxxxxxx,Credit/Debit Card Offers,Smartphone,Mobile_Data,15.46.44,20,113.13,24483.66,16,663.06,No,Yes,Yes,Solved,Poor Website,5


In [None]:
dataset = dataset.drop(["customer_id","Name","security_no","referral_id","last_visit_time"],axis=1)

In [None]:
dataset.head()

Unnamed: 0,age,gender,region_category,membership_category,joining_date,joined_through_referral,preferred_offer_types,medium_of_operation,internet_option,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
0,18,F,Village,Platinum Membership,17-08-2017,No,Gift Vouchers/Coupons,?,Wi-Fi,17,300.63,53005.25,17,781.75,Yes,Yes,No,Not Applicable,Products always in Stock,2
1,32,F,City,Premium Membership,28-08-2017,?,Gift Vouchers/Coupons,Desktop,Mobile_Data,16,306.34,12838.38,10,686.882199,Yes,No,Yes,Solved,Quality Customer Care,1
2,44,F,Town,No Membership,11-11-2016,Yes,Gift Vouchers/Coupons,Desktop,Wi-Fi,14,516.16,21027.0,22,500.69,No,Yes,Yes,Solved in Follow-up,Poor Website,5
3,37,M,City,No Membership,29-10-2016,Yes,Gift Vouchers/Coupons,Desktop,Mobile_Data,11,53.27,25239.56,6,567.66,No,Yes,Yes,Unsolved,Poor Website,5
4,31,F,City,No Membership,12-09-2017,No,Credit/Debit Card Offers,Smartphone,Mobile_Data,20,113.13,24483.66,16,663.06,No,Yes,Yes,Solved,Poor Website,5


In [None]:
dataset.isnull().sum()

age                             0
gender                          0
region_category                 0
membership_category             0
joining_date                    0
joined_through_referral         0
preferred_offer_types           0
medium_of_operation             0
internet_option                 0
days_since_last_login           0
avg_time_spent                  0
avg_transaction_value           0
avg_frequency_login_days        0
points_in_wallet                0
used_special_discount           0
offer_application_preference    0
past_complaint                  0
complaint_status                0
feedback                        0
churn_risk_score                0
dtype: int64

In [None]:
dataset.shape

(36992, 20)

#### Gender

In [None]:
dataset.gender.unique()

array(['F', 'M', 'Unknown'], dtype=object)

In [None]:
dataset.gender.value_counts()

F          18490
M          18443
Unknown       59
Name: gender, dtype: int64

In [None]:
from sklearn.preprocessing import LabelEncoder

In [None]:
le = LabelEncoder()

In [None]:
dataset.head()

Unnamed: 0,age,gender,region_category,membership_category,joining_date,joined_through_referral,preferred_offer_types,medium_of_operation,internet_option,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
0,18,F,Village,Platinum Membership,17-08-2017,No,Gift Vouchers/Coupons,?,Wi-Fi,17,300.63,53005.25,17,781.75,Yes,Yes,No,Not Applicable,Products always in Stock,2
1,32,F,City,Premium Membership,28-08-2017,?,Gift Vouchers/Coupons,Desktop,Mobile_Data,16,306.34,12838.38,10,686.882199,Yes,No,Yes,Solved,Quality Customer Care,1
2,44,F,Town,No Membership,11-11-2016,Yes,Gift Vouchers/Coupons,Desktop,Wi-Fi,14,516.16,21027.0,22,500.69,No,Yes,Yes,Solved in Follow-up,Poor Website,5
3,37,M,City,No Membership,29-10-2016,Yes,Gift Vouchers/Coupons,Desktop,Mobile_Data,11,53.27,25239.56,6,567.66,No,Yes,Yes,Unsolved,Poor Website,5
4,31,F,City,No Membership,12-09-2017,No,Credit/Debit Card Offers,Smartphone,Mobile_Data,20,113.13,24483.66,16,663.06,No,Yes,Yes,Solved,Poor Website,5


In [None]:
dataset["gender"] = le.fit_transform(dataset.gender)

In [None]:
dataset["gender"]

0        0
1        0
2        0
3        1
4        0
        ..
36987    0
36988    0
36989    0
36990    1
36991    1
Name: gender, Length: 36992, dtype: int64
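
`LabelEncoder` assigns integer codes in sorted order of the labels, which is why `F` became 0 and `M` became 1 above. The sketch below, on a toy list, shows how the mapping can be read back from `classes_`. The name `le_gender` is hypothetical: the notebook reuses a single `le` for every column, so after the next `fit_transform` its `classes_` only describe the most recent column. Keeping one encoder per column is a possible alternative if the mapping is needed later.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical per-column encoder (the notebook reuses one shared `le`)
le_gender = LabelEncoder()
codes = le_gender.fit_transform(["F", "M", "Unknown", "F"])

# classes_ is sorted, so the position of each label is its integer code
mapping = {label: code for code, label in enumerate(le_gender.classes_)}
print(mapping)

# inverse_transform recovers the original labels from the codes
restored = list(le_gender.inverse_transform(codes))
print(restored)
```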

In [None]:
dataset.head()

Unnamed: 0,age,gender,region_category,membership_category,joining_date,joined_through_referral,preferred_offer_types,medium_of_operation,internet_option,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
0,18,0,Village,Platinum Membership,17-08-2017,No,Gift Vouchers/Coupons,?,Wi-Fi,17,300.63,53005.25,17,781.75,Yes,Yes,No,Not Applicable,Products always in Stock,2
1,32,0,City,Premium Membership,28-08-2017,?,Gift Vouchers/Coupons,Desktop,Mobile_Data,16,306.34,12838.38,10,686.882199,Yes,No,Yes,Solved,Quality Customer Care,1
2,44,0,Town,No Membership,11-11-2016,Yes,Gift Vouchers/Coupons,Desktop,Wi-Fi,14,516.16,21027.0,22,500.69,No,Yes,Yes,Solved in Follow-up,Poor Website,5
3,37,1,City,No Membership,29-10-2016,Yes,Gift Vouchers/Coupons,Desktop,Mobile_Data,11,53.27,25239.56,6,567.66,No,Yes,Yes,Unsolved,Poor Website,5
4,31,0,City,No Membership,12-09-2017,No,Credit/Debit Card Offers,Smartphone,Mobile_Data,20,113.13,24483.66,16,663.06,No,Yes,Yes,Solved,Poor Website,5


#### Region Category

In [None]:
dataset.region_category.unique()

array(['Village', 'City', 'Town'], dtype=object)

In [None]:
dataset["region_category"] = le.fit_transform(dataset.region_category)

In [None]:
dataset.head(20)

Unnamed: 0,age,gender,region_category,membership_category,joining_date,joined_through_referral,preferred_offer_types,medium_of_operation,internet_option,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
0,18,0,2,Platinum Membership,17-08-2017,No,Gift Vouchers/Coupons,?,Wi-Fi,17,300.63,53005.25,17.0,781.75,Yes,Yes,No,Not Applicable,Products always in Stock,2
1,32,0,0,Premium Membership,28-08-2017,?,Gift Vouchers/Coupons,Desktop,Mobile_Data,16,306.34,12838.38,10.0,686.882199,Yes,No,Yes,Solved,Quality Customer Care,1
2,44,0,1,No Membership,11-11-2016,Yes,Gift Vouchers/Coupons,Desktop,Wi-Fi,14,516.16,21027.0,22.0,500.69,No,Yes,Yes,Solved in Follow-up,Poor Website,5
3,37,1,0,No Membership,29-10-2016,Yes,Gift Vouchers/Coupons,Desktop,Mobile_Data,11,53.27,25239.56,6.0,567.66,No,Yes,Yes,Unsolved,Poor Website,5
4,31,0,0,No Membership,12-09-2017,No,Credit/Debit Card Offers,Smartphone,Mobile_Data,20,113.13,24483.66,16.0,663.06,No,Yes,Yes,Solved,Poor Website,5
5,13,1,0,Gold Membership,08-01-2016,No,Gift Vouchers/Coupons,?,Wi-Fi,23,433.62,13884.77,24.0,722.27,Yes,No,Yes,Unsolved,No reason specified,3
6,21,1,1,Gold Membership,19-03-2015,Yes,Gift Vouchers/Coupons,Desktop,Mobile_Data,10,55.38,8982.5,28.0,756.21,Yes,No,Yes,Solved in Follow-up,No reason specified,3
7,42,1,1,No Membership,12-07-2016,?,Credit/Debit Card Offers,Both,Fiber_Optic,19,429.11,44554.82,24.0,568.08,No,Yes,Yes,Unsolved,Poor Product Quality,5
8,44,1,2,Silver Membership,14-12-2016,No,Without Offers,Smartphone,Fiber_Optic,15,191.07,18362.31,20.0,686.882199,Yes,No,Yes,Solved in Follow-up,Poor Customer Service,3
9,45,0,1,No Membership,30-11-2016,No,Gift Vouchers/Coupons,?,Wi-Fi,10,97.31,19244.16,28.0,706.23,No,Yes,Yes,No Information Available,Poor Customer Service,4


#### Membership Category

In [None]:
dataset["membership_category"].value_counts()

Basic Membership       7724
No Membership          7692
Gold Membership        6795
Silver Membership      5988
Premium Membership     4455
Platinum Membership    4338
Name: membership_category, dtype: int64

In [None]:
dataset.membership_category.unique()

array(['Platinum Membership', 'Premium Membership', 'No Membership',
       'Gold Membership', 'Silver Membership', 'Basic Membership'],
      dtype=object)

In [None]:
dataset["membership_category"] = le.fit_transform(dataset.membership_category)

In [None]:
dataset.head()

Unnamed: 0,age,gender,region_category,membership_category,joining_date,joined_through_referral,preferred_offer_types,medium_of_operation,internet_option,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
0,18,0,2,3,17-08-2017,No,Gift Vouchers/Coupons,?,Wi-Fi,17,300.63,53005.25,17,781.75,Yes,Yes,No,Not Applicable,Products always in Stock,2
1,32,0,0,4,28-08-2017,?,Gift Vouchers/Coupons,Desktop,Mobile_Data,16,306.34,12838.38,10,686.882199,Yes,No,Yes,Solved,Quality Customer Care,1
2,44,0,1,2,11-11-2016,Yes,Gift Vouchers/Coupons,Desktop,Wi-Fi,14,516.16,21027.0,22,500.69,No,Yes,Yes,Solved in Follow-up,Poor Website,5
3,37,1,0,2,29-10-2016,Yes,Gift Vouchers/Coupons,Desktop,Mobile_Data,11,53.27,25239.56,6,567.66,No,Yes,Yes,Unsolved,Poor Website,5
4,31,0,0,2,12-09-2017,No,Credit/Debit Card Offers,Smartphone,Mobile_Data,20,113.13,24483.66,16,663.06,No,Yes,Yes,Solved,Poor Website,5


In [None]:
dataset["membership_category"].value_counts()

0    7724
2    7692
1    6795
5    5988
4    4455
3    4338
Name: membership_category, dtype: int64
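
Because `LabelEncoder` sorts labels alphabetically, the codes above are Basic = 0, Gold = 1, No = 2, Platinum = 3, Premium = 4, Silver = 5, which does not follow any membership ranking. If the tiers are meant to be ordered, an explicit mapping is one option. The sketch below is illustrative only: the tier order in `tier_order` is an assumption, not something the dataset or the challenge page confirms.

```python
import pandas as pd

# ASSUMPTION: this ranking of tiers is a guess made for illustration
tier_order = [
    "No Membership",
    "Basic Membership",
    "Silver Membership",
    "Gold Membership",
    "Platinum Membership",
    "Premium Membership",
]
tier_rank = {name: rank for rank, name in enumerate(tier_order)}

# Toy column standing in for dataset["membership_category"]
toy = pd.Series(["Platinum Membership", "No Membership", "Gold Membership"])
encoded = toy.map(tier_rank)
print(encoded.tolist())
```

Tree-based models are usually less sensitive to the code order than linear models, so whether this matters depends on the model chosen later.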

#### Joining Date

In [None]:
dataset["joining_date"].value_counts()

02-06-2015    55
04-07-2015    51
21-06-2015    50
03-08-2016    49
26-06-2015    49
              ..
16-03-2016    19
03-06-2016    18
12-09-2015    18
03-07-2017    18
04-03-2015    16
Name: joining_date, Length: 1096, dtype: int64

In [None]:
dataset["joining_date"].unique()

array(['17-08-2017', '28-08-2017', '11-11-2016', ..., '11-12-2017',
       '25-09-2016', '15-04-2017'], dtype=object)

In [None]:
# Strip the dashes and cast to int, e.g. "17-08-2017" -> 17082017
dataset["joining_date"] = dataset["joining_date"].replace("-",'',regex=True).astype(int)

In [None]:
dataset.head()

Unnamed: 0,age,gender,region_category,membership_category,joining_date,joined_through_referral,preferred_offer_types,medium_of_operation,internet_option,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
0,18,0,2,3,17082017,No,Gift Vouchers/Coupons,?,Wi-Fi,17,300.63,53005.25,17,781.75,Yes,Yes,No,Not Applicable,Products always in Stock,2
1,32,0,0,4,28082017,?,Gift Vouchers/Coupons,Desktop,Mobile_Data,16,306.34,12838.38,10,686.882199,Yes,No,Yes,Solved,Quality Customer Care,1
2,44,0,1,2,11112016,Yes,Gift Vouchers/Coupons,Desktop,Wi-Fi,14,516.16,21027.0,22,500.69,No,Yes,Yes,Solved in Follow-up,Poor Website,5
3,37,1,0,2,29102016,Yes,Gift Vouchers/Coupons,Desktop,Mobile_Data,11,53.27,25239.56,6,567.66,No,Yes,Yes,Unsolved,Poor Website,5
4,31,0,0,2,12092017,No,Credit/Debit Card Offers,Smartphone,Mobile_Data,20,113.13,24483.66,16,663.06,No,Yes,Yes,Solved,Poor Website,5
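
The integers produced above keep the day-month-year digit order, so they do not sort chronologically, and a date such as `08-01-2016` loses its leading zero and becomes `8012016`. A possible alternative, sketched here on toy strings, is to parse the column as real dates and derive numeric features from it. The feature names and the reference date `2018-01-01` are assumptions chosen for illustration.

```python
import pandas as pd

# Toy strings in the same DD-MM-YYYY layout as joining_date
dates = pd.Series(["17-08-2017", "08-01-2016", "11-11-2016"])
parsed = pd.to_datetime(dates, format="%d-%m-%Y")

features = pd.DataFrame({
    "join_year": parsed.dt.year,
    "join_month": parsed.dt.month,
    # ASSUMPTION: the reference date is arbitrary; any fixed date after the last join works
    "days_since_joining": (pd.Timestamp("2018-01-01") - parsed).dt.days,
})
print(features)
```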


#### Joined Through Referral

In [None]:
dataset["joined_through_referral"].value_counts()

No     15839
Yes    15715
?       5438
Name: joined_through_referral, dtype: int64

In [None]:
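# Drop rows with the "?" placeholder; .copy() avoids SettingWithCopyWarning on later column assignments
dataset = dataset[dataset["joined_through_referral"]!="?"].copy()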

In [None]:
dataset["joined_through_referral"].value_counts()

No     15839
Yes    15715
Name: joined_through_referral, dtype: int64

In [None]:
dataset["joined_through_referral"] = le.fit_transform(dataset["joined_through_referral"])

In [None]:
dataset.shape

(31554, 20)

In [None]:
dataset["joined_through_referral"].value_counts()

0    15839
1    15715
Name: joined_through_referral, dtype: int64
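
Filtering out the `?` rows shrinks the table from 36,992 to 31,554 rows, roughly 15% of the data. A possible alternative, sketched below on a toy frame, is to treat `?` as a missing value and keep the rows under an explicit category. The label `"Unknown"` is an assumption picked for illustration. `pd.read_csv` also accepts `na_values=["?"]` if the placeholder should be converted at load time.

```python
import numpy as np
import pandas as pd

# Toy frame with the same "?" placeholder seen in the real columns
toy = pd.DataFrame({
    "joined_through_referral": ["No", "?", "Yes", "Yes", "?"],
    "medium_of_operation": ["?", "Desktop", "Both", "Smartphone", "Desktop"],
})

# Turn the placeholder into a real missing value so isnull() can see it
cleaned = toy.replace("?", np.nan)
print(cleaned.isnull().sum())

# Option 1: drop incomplete rows (what the notebook does, column by column)
dropped = cleaned.dropna()

# Option 2 (assumption): keep every row and mark the gap as its own category
kept = cleaned.fillna("Unknown")

print(len(dropped), "rows after dropping,", len(kept), "rows when kept")
```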

#### Preferred Offer Types

In [None]:
dataset.head()

Unnamed: 0,age,gender,region_category,membership_category,joining_date,joined_through_referral,preferred_offer_types,medium_of_operation,internet_option,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
0,18,0,2,3,17082017,0,Gift Vouchers/Coupons,?,Wi-Fi,17,300.63,53005.25,17,781.75,Yes,Yes,No,Not Applicable,Products always in Stock,2
2,44,0,1,2,11112016,1,Gift Vouchers/Coupons,Desktop,Wi-Fi,14,516.16,21027.0,22,500.69,No,Yes,Yes,Solved in Follow-up,Poor Website,5
3,37,1,0,2,29102016,1,Gift Vouchers/Coupons,Desktop,Mobile_Data,11,53.27,25239.56,6,567.66,No,Yes,Yes,Unsolved,Poor Website,5
4,31,0,0,2,12092017,0,Credit/Debit Card Offers,Smartphone,Mobile_Data,20,113.13,24483.66,16,663.06,No,Yes,Yes,Solved,Poor Website,5
5,13,1,0,1,8012016,0,Gift Vouchers/Coupons,?,Wi-Fi,23,433.62,13884.77,24,722.27,Yes,No,Yes,Unsolved,No reason specified,3


In [None]:
dataset["preferred_offer_types"].unique()

array(['Gift Vouchers/Coupons', 'Credit/Debit Card Offers',
       'Without Offers'], dtype=object)

In [None]:
dataset["preferred_offer_types"].value_counts()

Gift Vouchers/Coupons       10617
Credit/Debit Card Offers    10554
Without Offers              10383
Name: preferred_offer_types, dtype: int64

In [None]:
dataset["preferred_offer_types"] = le.fit_transform(dataset["preferred_offer_types"])

In [None]:
dataset.head()

Unnamed: 0,age,gender,region_category,membership_category,joining_date,joined_through_referral,preferred_offer_types,medium_of_operation,internet_option,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
0,18,0,2,3,17082017,0,1,?,Wi-Fi,17,300.63,53005.25,17,781.75,Yes,Yes,No,Not Applicable,Products always in Stock,2
2,44,0,1,2,11112016,1,1,Desktop,Wi-Fi,14,516.16,21027.0,22,500.69,No,Yes,Yes,Solved in Follow-up,Poor Website,5
3,37,1,0,2,29102016,1,1,Desktop,Mobile_Data,11,53.27,25239.56,6,567.66,No,Yes,Yes,Unsolved,Poor Website,5
4,31,0,0,2,12092017,0,0,Smartphone,Mobile_Data,20,113.13,24483.66,16,663.06,No,Yes,Yes,Solved,Poor Website,5
5,13,1,0,1,8012016,0,1,?,Wi-Fi,23,433.62,13884.77,24,722.27,Yes,No,Yes,Unsolved,No reason specified,3


#### Medium of Operation

In [None]:
dataset["medium_of_operation"].unique()

array(['?', 'Desktop', 'Smartphone', 'Both'], dtype=object)

In [None]:
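# Drop rows with the "?" placeholder; .copy() avoids SettingWithCopyWarning on later column assignments
dataset = dataset[dataset["medium_of_operation"]!="?"].copy()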

In [None]:
dataset["medium_of_operation"].unique()

array(['Desktop', 'Smartphone', 'Both'], dtype=object)

In [None]:
dataset["medium_of_operation"].value_counts()

Desktop       11903
Smartphone    11802
Both           3258
Name: medium_of_operation, dtype: int64

In [None]:
dataset["medium_of_operation"] = le.fit_transform(dataset["medium_of_operation"])

In [None]:
dataset["medium_of_operation"].unique()

array([1, 2, 0])

In [None]:
dataset["medium_of_operation"].value_counts()

1    11903
2    11802
0     3258
Name: medium_of_operation, dtype: int64

#### Internet Option

In [None]:
dataset.head()

Unnamed: 0,age,gender,region_category,membership_category,joining_date,joined_through_referral,preferred_offer_types,medium_of_operation,internet_option,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
2,44,0,1,2,11112016,1,1,1,Wi-Fi,14,516.16,21027.0,22,500.69,No,Yes,Yes,Solved in Follow-up,Poor Website,5
3,37,1,0,2,29102016,1,1,1,Mobile_Data,11,53.27,25239.56,6,567.66,No,Yes,Yes,Unsolved,Poor Website,5
4,31,0,0,2,12092017,0,0,2,Mobile_Data,20,113.13,24483.66,16,663.06,No,Yes,Yes,Solved,Poor Website,5
6,21,1,1,1,19032015,1,1,1,Mobile_Data,10,55.38,8982.5,28,756.21,Yes,No,Yes,Solved in Follow-up,No reason specified,3
8,44,1,2,5,14122016,0,2,2,Fiber_Optic,15,191.07,18362.31,20,686.882199,Yes,No,Yes,Solved in Follow-up,Poor Customer Service,3


In [None]:
dataset["internet_option"].unique()

array(['Wi-Fi', 'Mobile_Data', 'Fiber_Optic'], dtype=object)

In [None]:
dataset["internet_option"] = le.fit_transform(dataset["internet_option"])

In [None]:
dataset["internet_option"].unique()

array([2, 1, 0])

In [None]:
dataset["internet_option"].value_counts()

1    9012
2    9010
0    8941
Name: internet_option, dtype: int64

#### Used Special Discount

In [None]:
dataset["used_special_discount"].value_counts()

Yes    14840
No     12123
Name: used_special_discount, dtype: int64

In [None]:
dataset["used_special_discount"] = le.fit_transform(dataset["used_special_discount"])

In [None]:
dataset["used_special_discount"].value_counts()

1    14840
0    12123
Name: used_special_discount, dtype: int64

In [None]:
dataset.head()

Unnamed: 0,age,gender,region_category,membership_category,joining_date,joined_through_referral,preferred_offer_types,medium_of_operation,internet_option,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
2,44,0,1,2,11112016,1,1,1,2,14,516.16,21027.0,22,500.69,0,Yes,Yes,Solved in Follow-up,Poor Website,5
3,37,1,0,2,29102016,1,1,1,1,11,53.27,25239.56,6,567.66,0,Yes,Yes,Unsolved,Poor Website,5
4,31,0,0,2,12092017,0,0,2,1,20,113.13,24483.66,16,663.06,0,Yes,Yes,Solved,Poor Website,5
6,21,1,1,1,19032015,1,1,1,1,10,55.38,8982.5,28,756.21,1,No,Yes,Solved in Follow-up,No reason specified,3
8,44,1,2,5,14122016,0,2,2,0,15,191.07,18362.31,20,686.882199,1,No,Yes,Solved in Follow-up,Poor Customer Service,3


#### Offer Application Preference

In [None]:
dataset["offer_application_preference"].value_counts()

Yes    14829
No     12134
Name: offer_application_preference, dtype: int64

In [None]:
dataset["offer_application_preference"] = le.fit_transform(dataset["offer_application_preference"])

In [None]:
dataset["offer_application_preference"].value_counts()

1    14829
0    12134
Name: offer_application_preference, dtype: int64

#### Past Complaint

In [None]:
dataset["past_complaint"].value_counts()

No     13528
Yes    13435
Name: past_complaint, dtype: int64

In [None]:
dataset["past_complaint"] = le.fit_transform(dataset["past_complaint"])

In [None]:
dataset["past_complaint"].value_counts()

0    13528
1    13435
Name: past_complaint, dtype: int64

#### Complaint Status

In [None]:
dataset["complaint_status"].value_counts()

Not Applicable              13528
Unsolved                     3416
Solved                       3397
Solved in Follow-up          3330
No Information Available     3292
Name: complaint_status, dtype: int64

In [None]:
dataset["complaint_status"] = le.fit_transform(dataset["complaint_status"])

In [None]:
dataset["complaint_status"].value_counts()

1    13528
4     3416
2     3397
3     3330
0     3292
Name: complaint_status, dtype: int64

In [None]:
dataset.head()

Unnamed: 0,age,gender,region_category,membership_category,joining_date,joined_through_referral,preferred_offer_types,medium_of_operation,internet_option,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
2,44,0,1,2,11112016,1,1,1,2,14,516.16,21027.0,22,500.69,0,1,1,3,Poor Website,5
3,37,1,0,2,29102016,1,1,1,1,11,53.27,25239.56,6,567.66,0,1,1,4,Poor Website,5
4,31,0,0,2,12092017,0,0,2,1,20,113.13,24483.66,16,663.06,0,1,1,2,Poor Website,5
6,21,1,1,1,19032015,1,1,1,1,10,55.38,8982.5,28,756.21,1,0,1,3,No reason specified,3
8,44,1,2,5,14122016,0,2,2,0,15,191.07,18362.31,20,686.882199,1,0,1,3,Poor Customer Service,3


#### Feedback

In [None]:
dataset["feedback"].unique()

array(['Poor Website', 'No reason specified', 'Poor Customer Service',
       'Poor Product Quality', 'Too many ads', 'User Friendly Website',
       'Quality Customer Care', 'Products always in Stock',
       'Reasonable Price'], dtype=object)

In [None]:
dataset["feedback"] = le.fit_transform(dataset["feedback"])

In [None]:
dataset.head()

Unnamed: 0,age,gender,region_category,membership_category,joining_date,joined_through_referral,preferred_offer_types,medium_of_operation,internet_option,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
2,44,0,1,2,11112016,1,1,1,2,14,516.16,21027.0,22,500.69,0,1,1,3,3,5
3,37,1,0,2,29102016,1,1,1,1,11,53.27,25239.56,6,567.66,0,1,1,4,3,5
4,31,0,0,2,12092017,0,0,2,1,20,113.13,24483.66,16,663.06,0,1,1,2,3,5
6,21,1,1,1,19032015,1,1,1,1,10,55.38,8982.5,28,756.21,1,0,1,3,0,3
8,44,1,2,5,14122016,0,2,2,0,15,191.07,18362.31,20,686.882199,1,0,1,3,1,3
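
The feedback labels have no natural order, so the integer codes above impose an arbitrary ranking. One common alternative for nominal columns is one-hot encoding. The sketch below uses `pd.get_dummies` on a toy series; it is an illustration of that option, not what the notebook does.

```python
import pandas as pd

# Toy series standing in for the original text values of dataset["feedback"]
feedback = pd.Series(
    ["Poor Website", "No reason specified", "Poor Website"],
    name="feedback",
)

# One 0/1 indicator column per distinct label
one_hot = pd.get_dummies(feedback, prefix="feedback", dtype=int)
print(one_hot)
```

With nine distinct feedback values this would add nine indicator columns in place of the single encoded one.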


#### Churn Risk Score

In [None]:
dataset["churn_risk_score"].value_counts()

 3    7608
 4    7365
 5    7136
 2    2027
 1    1950
-1     877
Name: churn_risk_score, dtype: int64

In [None]:
dataset.shape

(26963, 20)

### Outlier Detection and Removal for numeric values

#### Days Since Last Login

In [None]:
dataset.head()

Unnamed: 0,age,gender,region_category,membership_category,joining_date,joined_through_referral,preferred_offer_types,medium_of_operation,internet_option,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
2,44,0,1,2,11112016,1,1,1,2,14,516.16,21027.0,22,500.69,0,1,1,3,3,5
3,37,1,0,2,29102016,1,1,1,1,11,53.27,25239.56,6,567.66,0,1,1,4,3,5
4,31,0,0,2,12092017,0,0,2,1,20,113.13,24483.66,16,663.06,0,1,1,2,3,5
6,21,1,1,1,19032015,1,1,1,1,10,55.38,8982.5,28,756.21,1,0,1,3,0,3
8,44,1,2,5,14122016,0,2,2,0,15,191.07,18362.31,20,686.882199,1,0,1,3,1,3


In [None]:
dataset["days_since_last_login"].value_counts()

 12     1738
 13     1721
 14     1680
 11     1675
 15     1634
 16     1520
 10     1503
-999    1458
 9      1361
 17     1270
 8      1151
 18     1061
 7      1040
 19      997
 6       922
 5       881
 20      840
 21      730
 4       719
 22      672
 3       615
 23      535
 2       447
 24      355
 1       232
 25      145
 26       61
Name: days_since_last_login, dtype: int64

In [None]:
# days_since_last_login cannot be negative; -999 is a placeholder value, so remove those rows
dataset = dataset[dataset["days_since_last_login"]!=-999]

In [None]:
dataset["days_since_last_login"].value_counts()

12    1738
13    1721
14    1680
11    1675
15    1634
16    1520
10    1503
9     1361
17    1270
8     1151
18    1061
7     1040
19     997
6      922
5      881
20     840
21     730
4      719
22     672
3      615
23     535
2      447
24     355
1      232
25     145
26      61
Name: days_since_last_login, dtype: int64

In [None]:
dataset.shape

(25505, 20)
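
Dropping the -999 rows removes 1,458 of the 26,963 records. An alternative, sketched here on made-up values and not something the notebook does, is to treat -999 as missing and impute it, which keeps the row count unchanged. The median is an arbitrary choice of fill value for illustration.

```python
import numpy as np
import pandas as pd

# Toy column with the same -999 sentinel as the real data (values are made up).
toy = pd.DataFrame({"days_since_last_login": [14, 11, -999, 20, 10, -999, 15]})

# Option A (what the notebook does): drop rows carrying the sentinel.
dropped = toy[toy["days_since_last_login"] != -999]

# Option B (assumed alternative): mark the sentinel as missing,
# then fill it with the median of the valid values.
col = toy["days_since_last_login"].replace(-999, np.nan)
imputed = toy.assign(days_since_last_login=col.fillna(col.median()))

print(dropped.shape, imputed.shape)
```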

In [None]:
dataset.head()

Unnamed: 0,age,gender,region_category,membership_category,joining_date,joined_through_referral,preferred_offer_types,medium_of_operation,internet_option,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
2,44,0,1,2,11112016,1,1,1,2,14,516.16,21027.0,22,500.69,0,1,1,3,3,5
3,37,1,0,2,29102016,1,1,1,1,11,53.27,25239.56,6,567.66,0,1,1,4,3,5
4,31,0,0,2,12092017,0,0,2,1,20,113.13,24483.66,16,663.06,0,1,1,2,3,5
6,21,1,1,1,19032015,1,1,1,1,10,55.38,8982.5,28,756.21,1,0,1,3,0,3
8,44,1,2,5,14122016,0,2,2,0,15,191.07,18362.31,20,686.882199,1,0,1,3,1,3


#### Average Time Spent

In [None]:
dataset["avg_time_spent"].value_counts()

 34.100000     17
 32.960000     15
 30.560000     14
 33.280000     13
 33.680000     13
               ..
 784.030000     1
 377.050000     1
-886.577661     1
 54.880000      1
 240.040000     1
Name: avg_time_spent, Length: 19308, dtype: int64

In [None]:
# Average time spent cannot be negative either, so keep only the non-negative values
dataset = dataset[dataset["avg_time_spent"]>=0]

In [None]:
dataset.shape

(24330, 20)

#### Average Transaction Value

In [None]:
print(dataset["avg_transaction_value"].value_counts())

4407.26     2
38138.54    2
61756.98    2
33922.27    2
48440.34    2
           ..
32065.13    1
17721.03    1
25072.82    1
41869.33    1
35302.32    1
Name: avg_transaction_value, Length: 24283, dtype: int64


In [None]:
dataset.head()

Unnamed: 0,age,gender,region_category,membership_category,joining_date,joined_through_referral,preferred_offer_types,medium_of_operation,internet_option,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
2,44,0,1,2,11112016,1,1,1,2,14,516.16,21027.0,22,500.69,0,1,1,3,3,5
3,37,1,0,2,29102016,1,1,1,1,11,53.27,25239.56,6,567.66,0,1,1,4,3,5
4,31,0,0,2,12092017,0,0,2,1,20,113.13,24483.66,16,663.06,0,1,1,2,3,5
6,21,1,1,1,19032015,1,1,1,1,10,55.38,8982.5,28,756.21,1,0,1,3,0,3
8,44,1,2,5,14122016,0,2,2,0,15,191.07,18362.31,20,686.882199,1,0,1,3,1,3


#### Average Frequency Login Days

In [None]:
dataset["avg_frequency_login_days"].value_counts()

Error           2308
13               928
17               899
14               891
6                887
                ... 
-14.86449559       1
17.65581603        1
-10.05103279       1
49.46164518        1
-5.585924221       1
Name: avg_frequency_login_days, Length: 1103, dtype: int64

In [None]:
# Drop rows where the value is the string "Error", which cannot be parsed as a number
dataset = dataset[dataset["avg_frequency_login_days"]!="Error"]

In [None]:
dataset["avg_frequency_login_days"] = pd.to_numeric(dataset["avg_frequency_login_days"])

In [None]:
dataset.shape

(22022, 20)
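
The filter-then-convert steps above can also be written as a single conversion. The sketch below uses made-up values and `pd.to_numeric(..., errors="coerce")`, which turns any unparseable string into `NaN` instead of raising. Note that negative values, which also appear in the counts above, survive the conversion in both versions.

```python
import pandas as pd

# Toy column mixing numeric strings with the "Error" keyword (values are made up).
toy = pd.DataFrame(
    {"avg_frequency_login_days": ["13", "Error", "17.65581603", "-10.05103279", "6"]}
)

# errors="coerce" converts unparseable entries such as "Error" to NaN.
freq = pd.to_numeric(toy["avg_frequency_login_days"], errors="coerce")

# Keep only the rows that parsed; .copy() avoids a SettingWithCopyWarning
# when the column is overwritten on the next line.
clean = toy[freq.notna()].copy()
clean["avg_frequency_login_days"] = freq[clean.index]

print(clean.dtypes)
print(clean)
```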

#### Points in Wallet

In [None]:
dataset["points_in_wallet"].unique()

array([500.69     , 567.66     , 663.06     , ..., 242.9796255,
       725.89     , 197.2644136])

In [None]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 22022 entries, 2 to 36990
Data columns (total 20 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   age                           22022 non-null  int64  
 1   gender                        22022 non-null  int64  
 2   region_category               22022 non-null  int64  
 3   membership_category           22022 non-null  int64  
 4   joining_date                  22022 non-null  int64  
 5   joined_through_referral       22022 non-null  int64  
 6   preferred_offer_types         22022 non-null  int64  
 7   medium_of_operation           22022 non-null  int64  
 8   internet_option               22022 non-null  int64  
 9   days_since_last_login         22022 non-null  int64  
 10  avg_time_spent                22022 non-null  float64
 11  avg_transaction_value         22022 non-null  float64
 12  avg_frequency_login_days      22022 non-null  float64
 13  p

### Machine Learning Implementation

#### Logistic Regression

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

In [None]:
X = dataset.drop(["churn_risk_score"],axis=1)
y = dataset["churn_risk_score"]

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.20,random_state=42)

In [None]:
modelLR = LogisticRegression(max_iter=10000)

modelLR.fit(X_train,y_train)

y_pred = modelLR.predict(X_test)

In [None]:
# Accuracy Score
modelLR.score(X_test,y_test)

0.2606129398410897

In [None]:
# Classification Report
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

          -1       0.00      0.00      0.00       140
           1       0.13      0.12      0.13       312
           2       0.14      0.16      0.15       344
           3       0.28      0.61      0.39      1254
           4       0.30      0.11      0.16      1218
           5       0.28      0.13      0.18      1137

    accuracy                           0.26      4405
   macro avg       0.19      0.19      0.17      4405
weighted avg       0.26      0.26      0.22      4405





In [None]:
# Confusion Matrix
print(confusion_matrix(y_test,y_pred))

[[  0   7  10  88  15  20]
 [  0  39  44 196  12  21]
 [  0  42  55 208  21  18]
 [  0  76 109 770 141 158]
 [  0  65  76 773 137 167]
 [  0  64  93 708 125 147]]
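
The confusion matrix shows logistic regression assigning most rows to class 3. One thing that may help, untested on the challenge data, is standardizing the features first, since columns such as `joining_date` (eight-digit integers) and `avg_transaction_value` sit on very different scales from the 0/1 flags. The sketch below uses synthetic data from `make_classification` as a stand-in; the names `X_toy`, `y_toy`, and `scaled_lr` are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn features (not the real data).
X_toy, y_toy = make_classification(
    n_samples=500, n_features=8, n_informative=5, n_classes=3, random_state=42
)
# Inflate one column to mimic a feature on a much larger scale.
X_toy[:, 0] *= 1e6

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.20, random_state=42
)

# The scaler is fitted on the training split only, then applied to both splits.
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_lr.fit(X_tr, y_tr)

print(scaled_lr.score(X_te, y_te))
```

Wrapping the scaler in a pipeline keeps the test split out of the scaling statistics.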


#### Random Forest

In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
modelRF = RandomForestClassifier()

modelRF.fit(X_train,y_train)

y_pred = modelRF.predict(X_test)

In [None]:
# Accuracy Score
modelRF.score(X_test,y_test)

0.7377979568671964

In [None]:
# Classification Report
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

          -1       0.00      0.00      0.00       140
           1       0.69      0.74      0.71       312
           2       0.74      0.72      0.73       344
           3       0.89      0.89      0.89      1254
           4       0.66      0.60      0.63      1218
           5       0.67      0.81      0.73      1137

    accuracy                           0.74      4405
   macro avg       0.61      0.63      0.62      4405
weighted avg       0.72      0.74      0.72      4405





In [None]:
# Confusion Matrix
print(confusion_matrix(y_test,y_pred))

[[   0   11   13   33   24   59]
 [   0  230   75    2    3    2]
 [   0   91  249    1    3    0]
 [   0    1    0 1119  131    3]
 [   0    0    0   98  726  394]
 [   0    0    0    1  210  926]]


#### XGBoost

In [None]:
from xgboost import XGBClassifier

In [None]:
modelXGB = XGBClassifier()

modelXGB.fit(X_train,y_train)

y_pred = modelXGB.predict(X_test)

In [None]:
# Accuracy Score
modelXGB.score(X_test,y_test)

0.7527809307604995

In [None]:
# Classification Report
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

          -1       0.00      0.00      0.00       140
           1       0.66      0.95      0.78       312
           2       0.89      0.60      0.71       344
           3       0.93      0.84      0.89      1254
           4       0.72      0.53      0.61      1218
           5       0.65      0.97      0.78      1137

    accuracy                           0.75      4405
   macro avg       0.64      0.65      0.63      4405
weighted avg       0.75      0.75      0.73      4405





In [None]:
# Confusion Matrix
print(confusion_matrix(y_test,y_pred))

[[   0   13   11   29   21   66]
 [   0  297   15    0    0    0]
 [   0  139  205    0    0    0]
 [   0    0    0 1058  196    0]
 [   0    0    0   45  651  522]
 [   0    0    0    0   32 1105]]


### Test Data and Submission File

In [None]:
test = pd.read_csv("/content/drive/MyDrive/dataset/test.csv",na_values=['?','-999','Error','xxxxxxxx'])
test_1 = pd.read_csv("/content/drive/MyDrive/dataset/test.csv")

In [None]:
test.head()

Unnamed: 0,customer_id,Name,age,gender,security_no,region_category,membership_category,joining_date,joined_through_referral,referral_id,preferred_offer_types,medium_of_operation,internet_option,last_visit_time,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback
0,fffe43004900440031003700300030003400,Alethia Meints,50,F,OQJ1XAY,Village,Premium Membership,2015-11-02,No,,Without Offers,Smartphone,Wi-Fi,07:19:30,12.0,386.26,40721.44,7.0,733.83,Yes,No,No,Not Applicable,Poor Product Quality
1,fffe43004900440031003900370037003300,Ming Lopez,41,M,OUQRPKO,Village,Gold Membership,2016-03-01,No,,Without Offers,Desktop,Fiber_Optic,22:21:16,11.0,37.8,9644.4,9.0,726.0,Yes,No,No,Not Applicable,Poor Website
2,fffe43004900440034003800360037003000,Carina Flannigan,31,F,02J2RE7,Town,Silver Membership,2017-03-03,No,,Gift Vouchers/Coupons,Both,Mobile_Data,16:40:39,18.0,215.36,3693.25,21.0,713.78,Yes,No,Yes,Solved in Follow-up,No reason specified
3,fffe43004900440036003200370033003400,Kyung Wanner,64,M,5YEQIF1,Town,Silver Membership,2017-08-18,Yes,CID8941,Credit/Debit Card Offers,,Fiber_Optic,14:56:17,,44.57,36809.56,11.0,744.97,Yes,No,Yes,No Information Available,Too many ads
4,fffe43004900440035003000370031003900,Enola Gatto,16,F,100RYB5,Town,No Membership,2015-05-05,Yes,CID5690,Without Offers,Smartphone,Mobile_Data,02:57:53,6.0,349.88,40675.86,8.0,299.048351,No,Yes,Yes,Solved in Follow-up,Poor Website


In [None]:
test = test.drop(["customer_id","Name","security_no","referral_id","last_visit_time"],axis=1)

In [None]:
test.head()

Unnamed: 0,age,gender,region_category,membership_category,joining_date,joined_through_referral,preferred_offer_types,medium_of_operation,internet_option,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback
0,50,F,Village,Premium Membership,2015-11-02,No,Without Offers,Smartphone,Wi-Fi,12.0,386.26,40721.44,7.0,733.83,Yes,No,No,Not Applicable,Poor Product Quality
1,41,M,Village,Gold Membership,2016-03-01,No,Without Offers,Desktop,Fiber_Optic,11.0,37.8,9644.4,9.0,726.0,Yes,No,No,Not Applicable,Poor Website
2,31,F,Town,Silver Membership,2017-03-03,No,Gift Vouchers/Coupons,Both,Mobile_Data,18.0,215.36,3693.25,21.0,713.78,Yes,No,Yes,Solved in Follow-up,No reason specified
3,64,M,Town,Silver Membership,2017-08-18,Yes,Credit/Debit Card Offers,,Fiber_Optic,,44.57,36809.56,11.0,744.97,Yes,No,Yes,No Information Available,Too many ads
4,16,F,Town,No Membership,2015-05-05,Yes,Without Offers,Smartphone,Mobile_Data,6.0,349.88,40675.86,8.0,299.048351,No,Yes,Yes,Solved in Follow-up,Poor Website


#### Apply the same encoding to the test set that was used on the training set

In [None]:
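# Fill missing values: column mean for float columns, forward fill for all others
for i in test.columns:
    if test[i].dtype == 'float64':
        test[i] = test[i].fillna(test[i].mean())
    else:
        # .ffill() replaces the deprecated fillna(method='ffill')
        test[i] = test[i].ffill()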

In [None]:
test["region_category"] = test.region_category.apply(str)

In [None]:
test["joined_through_referral"] = test.joined_through_referral.apply(str)

In [None]:
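# Note: fit_transform re-fits `le` on the test set, so the integer codes match
# the training codes only if both sets contain the same categories in each column
test["gender"] = le.fit_transform(test.gender)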
test["region_category"] = le.fit_transform(test["region_category"])
test["membership_category"] = le.fit_transform(test.membership_category)
test["joined_through_referral"] = le.fit_transform(test["joined_through_referral"])
test["preferred_offer_types"] = le.fit_transform(test["preferred_offer_types"])
test["medium_of_operation"] = le.fit_transform(test["medium_of_operation"])
test["internet_option"] = le.fit_transform(test["internet_option"])
test["used_special_discount"] = le.fit_transform(test["used_special_discount"])
test["offer_application_preference"] = le.fit_transform(test["offer_application_preference"])
test["past_complaint"] = le.fit_transform(test["past_complaint"])
test["complaint_status"] = le.fit_transform(test["complaint_status"])
test["feedback"] = le.fit_transform(test["feedback"])
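
Calling `fit_transform` on the test set learns a fresh mapping from whatever categories the test column contains. A safer pattern, sketched below on made-up data, is to keep one encoder per column fitted on the training set and call only `transform` on the test set. The `encoders` dictionary and the toy frames are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frames: the test set happens to lack the "F" category (values are made up).
train_toy = pd.DataFrame({"gender": ["F", "M", "F", "Unknown"]})
test_toy = pd.DataFrame({"gender": ["M", "Unknown", "M"]})

# Re-fitting on the test set silently changes the codes: M -> 0, Unknown -> 1.
refit_codes = LabelEncoder().fit_transform(test_toy["gender"])

# One encoder per column, fitted on the training data only.
encoders = {}
for col in ["gender"]:
    enc = LabelEncoder()
    train_toy[col] = enc.fit_transform(train_toy[col])
    encoders[col] = enc

# transform() reuses the training mapping: F -> 0, M -> 1, Unknown -> 2.
for col, enc in encoders.items():
    test_toy[col] = enc.transform(test_toy[col])

print(refit_codes.tolist())
print(test_toy["gender"].tolist())
```

`transform` raises a `ValueError` for a category that was not seen during fitting, which surfaces a mismatch instead of hiding it.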

In [None]:
test["joining_date"] = test["joining_date"].replace("-",'',regex=True).astype(int)
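
The training frame shown earlier holds `joining_date` values such as 29102016, which read as day-month-year, whereas stripping the dashes from `2015-11-02` gives 20151102 (year-month-day). If the training column really is day-month-year, which is an assumption based only on the printed rows, the test dates could be converted to the same layout as in this sketch:

```python
import pandas as pd

# Toy test-set dates in the ISO layout seen in test.head().
toy = pd.DataFrame({"joining_date": ["2015-11-02", "2016-03-01", "2017-08-18"]})

# Parse explicitly, then re-emit as ddmmyyyy before casting to int.
parsed = pd.to_datetime(toy["joining_date"], format="%Y-%m-%d")
toy["joining_date"] = parsed.dt.strftime("%d%m%Y").astype(int)

# A leading zero in the day is lost by the int cast (02112015 -> 2112015).
print(toy["joining_date"].tolist())
```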

In [None]:
result = modelRF.predict(test)

In [None]:
# result

In [None]:
submission = pd.DataFrame({
   'customer_id': test_1['customer_id'],
   'churn_risk_score': result,
})

In [None]:
submission.to_csv('/content/drive/MyDrive/dataset/randomforestresults.csv', index=False)