## Predict the churn risk rate


* Churn rate is a marketing metric that describes the number of customers who leave a business over a specific time period. . Every user is assigned a prediction value that estimates their state of churn at any given time. This value is based on:

* User demographic information
* Browsing behavior
* Historical purchase data among other information

* It factors in our unique and proprietary predictions of how long a user will remain a customer. This score is updated every day for all users who have a minimum of one conversion. The values assigned are between 1 and 5.

* Your task is to predict the churn score for a website based on the features provided in the dataset.

* Data description
* The dataset folder contains the following files:

* train.csv: 36992 x 25
* test.csv: 19919 x 24
* sample_submission.csv: 5 x 2

* The columns provided in the dataset are as follows:




* customer_id - Represents the unique identification number of a customer

* Name - Represents the name of a customer

* age - Represents the age of a customer

* security_no - Represents a unique security number that is used to identify a person

* region_category - Represents the region that a customer belongs to 

* membership_category - Represents the category of the membership that a customer is using

* joining_date - Represents the date when a customer became a member 

* joined_through_referral - Represents whether a customer joined using any referral code or ID

* referral_id - Represents a referral ID

* preferred_offer_types - Represents the type of offer that a customer prefers

* medium_of_operation - Represents the medium of operation that a customer uses for transactions

* internet_option	- Represents the type of internet service a customer uses

* last_visit_time - Represents the last time a customer visited the website

* days_since_last_login	 - Represents the no. of days since a customer last logged into the website

* avg_time_spent - Represents the average time spent by a customer on the website

* avg_transaction_value	- Represents the average transaction value of a customer

* avg_frequency_login_days - Represents the no. of times a customer has logged in to the website


* points_in_wallet	- Represents the points awarded to a customer on each transaction 

* used_special_discount	 - Represents whether a customer uses special discounts offered

* offer_application_preference - Represents whether a customer prefers offers 

* past_complaint - Represents whether a customer has raised any complaints 

* complaint_status	- Represents whether the complaints raised by a customer was resolved 

* feedback - Represents the feedback provided by a customer

* churn_risk_score - Represents the churn risk score that ranges from 1 to 5

* Evaluation metric - score = 100 x metrics.f1_score(actual, predicted, average="macro")





### Result submission guidelines
* The index is customer_id and the target is the churn_risk_score column. 
* The submission file must be submitted in .csv format only.
* The size of this submission file must be 19919 x 2.


### Note: Ensure that your submission file contains the following:

* Correct index values as per the test file
* Correct names of columns as provided in the sample_submission.csv file


In [1]:
import numpy as np
import pandas as pd



In [2]:
train = pd.read_csv("C:/Users/user/Downloads/HACKATHON/HackerEarth/98efc33085a711eb/dataset/train.csv")
test = pd.read_csv("C:/Users/user/Downloads/HACKATHON/HackerEarth/98efc33085a711eb/dataset/test.csv")
submission = pd.read_csv("C:/Users/user/Downloads/HACKATHON/HackerEarth/98efc33085a711eb/dataset/sample_submission.csv")

In [3]:
submission

Unnamed: 0,customer_id,churn_risk_score
0,fffe4300490044003600300030003800,2
1,fffe43004900440032003100300035003700,1
2,fffe4300490044003100390032003600,5
3,fffe43004900440036003000330031003600,5
4,fffe43004900440031003900350030003600,5


In [4]:
train.head()

Unnamed: 0,customer_id,Name,age,gender,security_no,region_category,membership_category,joining_date,joined_through_referral,referral_id,...,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score
0,fffe4300490044003600300030003800,Pattie Morrisey,18,F,XW0DQ7H,Village,Platinum Membership,2017-08-17,No,xxxxxxxx,...,300.63,53005.25,17.0,781.75,Yes,Yes,No,Not Applicable,Products always in Stock,2
1,fffe43004900440032003100300035003700,Traci Peery,32,F,5K0N3X1,City,Premium Membership,2017-08-28,?,CID21329,...,306.34,12838.38,10.0,,Yes,No,Yes,Solved,Quality Customer Care,1
2,fffe4300490044003100390032003600,Merideth Mcmeen,44,F,1F2TCL3,Town,No Membership,2016-11-11,Yes,CID12313,...,516.16,21027.0,22.0,500.69,No,Yes,Yes,Solved in Follow-up,Poor Website,5
3,fffe43004900440036003000330031003600,Eufemia Cardwell,37,M,VJGJ33N,City,No Membership,2016-10-29,Yes,CID3793,...,53.27,25239.56,6.0,567.66,No,Yes,Yes,Unsolved,Poor Website,5
4,fffe43004900440031003900350030003600,Meghan Kosak,31,F,SVZXCWB,City,No Membership,2017-09-12,No,xxxxxxxx,...,113.13,24483.66,16.0,663.06,No,Yes,Yes,Solved,Poor Website,5


In [8]:
train.shape

(36992, 25)

In [9]:
train['churn_risk_score'].isna().sum()

0

In [10]:
test.head()

Unnamed: 0,customer_id,Name,age,gender,security_no,region_category,membership_category,joining_date,joined_through_referral,referral_id,...,days_since_last_login,avg_time_spent,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback
0,fffe43004900440031003700300030003400,Alethia Meints,50,F,OQJ1XAY,Village,Premium Membership,2015-11-02,No,xxxxxxxx,...,12,386.26,40721.44,7.0,733.83,Yes,No,No,Not Applicable,Poor Product Quality
1,fffe43004900440031003900370037003300,Ming Lopez,41,M,OUQRPKO,Village,Gold Membership,2016-03-01,No,xxxxxxxx,...,11,37.8,9644.4,9.0,726.0,Yes,No,No,Not Applicable,Poor Website
2,fffe43004900440034003800360037003000,Carina Flannigan,31,F,02J2RE7,Town,Silver Membership,2017-03-03,No,xxxxxxxx,...,18,215.36,3693.25,21.0,713.78,Yes,No,Yes,Solved in Follow-up,No reason specified
3,fffe43004900440036003200370033003400,Kyung Wanner,64,M,5YEQIF1,Town,Silver Membership,2017-08-18,Yes,CID8941,...,-999,44.57,36809.56,11.0,744.97,Yes,No,Yes,No Information Available,Too many ads
4,fffe43004900440035003000370031003900,Enola Gatto,16,F,100RYB5,Town,No Membership,2015-05-05,Yes,CID5690,...,6,349.88,40675.86,8.0,299.048351,No,Yes,Yes,Solved in Follow-up,Poor Website


In [11]:
train.columns

Index(['customer_id', 'Name', 'age', 'gender', 'security_no',
       'region_category', 'membership_category', 'joining_date',
       'joined_through_referral', 'referral_id', 'preferred_offer_types',
       'medium_of_operation', 'internet_option', 'last_visit_time',
       'days_since_last_login', 'avg_time_spent', 'avg_transaction_value',
       'avg_frequency_login_days', 'points_in_wallet', 'used_special_discount',
       'offer_application_preference', 'past_complaint', 'complaint_status',
       'feedback', 'churn_risk_score'],
      dtype='object')

In [12]:
test.columns

Index(['customer_id', 'Name', 'age', 'gender', 'security_no',
       'region_category', 'membership_category', 'joining_date',
       'joined_through_referral', 'referral_id', 'preferred_offer_types',
       'medium_of_operation', 'internet_option', 'last_visit_time',
       'days_since_last_login', 'avg_time_spent', 'avg_transaction_value',
       'avg_frequency_login_days', 'points_in_wallet', 'used_special_discount',
       'offer_application_preference', 'past_complaint', 'complaint_status',
       'feedback'],
      dtype='object')

In [9]:
customer_id = test['customer_id']

In [10]:
train['churn_risk_score'].unique()

array([ 2,  1,  5,  3,  4, -1], dtype=int64)

In [11]:
# Concat both data set
# after concat train, test fetuare engineering and data cleaning is easy
test['churn_risk_score']=np.nan
train['data']='train'
test['data']='test'
test=test[train.columns]
c_all=pd.concat([train,test],axis=0)

In [12]:
c_all.head()

Unnamed: 0,customer_id,Name,age,gender,security_no,region_category,membership_category,joining_date,joined_through_referral,referral_id,...,avg_transaction_value,avg_frequency_login_days,points_in_wallet,used_special_discount,offer_application_preference,past_complaint,complaint_status,feedback,churn_risk_score,data
0,fffe4300490044003600300030003800,Pattie Morrisey,18,F,XW0DQ7H,Village,Platinum Membership,2017-08-17,No,xxxxxxxx,...,53005.25,17.0,781.75,Yes,Yes,No,Not Applicable,Products always in Stock,2.0,train
1,fffe43004900440032003100300035003700,Traci Peery,32,F,5K0N3X1,City,Premium Membership,2017-08-28,?,CID21329,...,12838.38,10.0,,Yes,No,Yes,Solved,Quality Customer Care,1.0,train
2,fffe4300490044003100390032003600,Merideth Mcmeen,44,F,1F2TCL3,Town,No Membership,2016-11-11,Yes,CID12313,...,21027.0,22.0,500.69,No,Yes,Yes,Solved in Follow-up,Poor Website,5.0,train
3,fffe43004900440036003000330031003600,Eufemia Cardwell,37,M,VJGJ33N,City,No Membership,2016-10-29,Yes,CID3793,...,25239.56,6.0,567.66,No,Yes,Yes,Unsolved,Poor Website,5.0,train
4,fffe43004900440031003900350030003600,Meghan Kosak,31,F,SVZXCWB,City,No Membership,2017-09-12,No,xxxxxxxx,...,24483.66,16.0,663.06,No,Yes,Yes,Solved,Poor Website,5.0,train


In [13]:
c_all.isna().sum()

customer_id                         0
Name                                0
age                                 0
gender                              0
security_no                         0
region_category                  8376
membership_category                 0
joining_date                        0
joined_through_referral             0
referral_id                         0
preferred_offer_types             447
medium_of_operation                 0
internet_option                     0
last_visit_time                     0
days_since_last_login               0
avg_time_spent                      0
avg_transaction_value               0
avg_frequency_login_days            0
points_in_wallet                 5406
used_special_discount               0
offer_application_preference        0
past_complaint                      0
complaint_status                    0
feedback                            0
churn_risk_score                19919
data                                0
dtype: int64

In [14]:
c_all.dtypes

customer_id                      object
Name                             object
age                               int64
gender                           object
security_no                      object
region_category                  object
membership_category              object
joining_date                     object
joined_through_referral          object
referral_id                      object
preferred_offer_types            object
medium_of_operation              object
internet_option                  object
last_visit_time                  object
days_since_last_login             int64
avg_time_spent                  float64
avg_transaction_value           float64
avg_frequency_login_days         object
points_in_wallet                float64
used_special_discount            object
offer_application_preference     object
past_complaint                   object
complaint_status                 object
feedback                         object
churn_risk_score                float64


In [15]:
c_all.drop(['customer_id','Name'],axis=1,inplace=True)

In [16]:
# gender
c_all['g_female'] = (c_all['gender'] == "F").astype(int)
c_all["g_male"] = (c_all['gender']=='M').astype(int)
c_all.drop(['gender'],axis=1,inplace=True)

In [17]:
# region_category
c_all['region_category'].unique()

array(['Village', 'City', 'Town', nan], dtype=object)

In [18]:
c_all['region_category'] = c_all['region_category'].fillna(c_all['region_category'].mode()[0])
c_all['rg_vilage'] = (c_all['region_category'] == 'Village').astype(int)
c_all['rg_city'] = (c_all['region_category'] == 'City').astype(int)
c_all.drop(['region_category'],axis=1,inplace=True)

In [19]:
# membership_category
c_all['membership_category'].unique()

array(['Platinum Membership', 'Premium Membership', 'No Membership',
       'Gold Membership', 'Silver Membership', 'Basic Membership'],
      dtype=object)

In [20]:
c_all['mc_Gold'] = (c_all['membership_category']=='Gold Membership').astype(int)
c_all['mc_Silver'] = (c_all['membership_category'] == 'Silver Membership').astype(int)
c_all['mc_Platinum'] = (c_all['membership_category'] == 'Platinum Membership').astype(int)
c_all['mc_Premium'] = (c_all['membership_category'] == 'Premium Membership').astype(int)
c_all['mc_Basic'] = (c_all['membership_category'] == 'Basic Membership').astype(int)
c_all.drop(['membership_category'],axis=1,inplace=True)


In [21]:
# joined_through_referral
c_all['joined_through_referral'].unique()

array(['No', '?', 'Yes'], dtype=object)

In [22]:
c_all['jr_yes'] = (c_all['joined_through_referral'] == 'Yes').astype(int)
c_all['jr_no'] = (c_all['joined_through_referral'] == 'No').astype(int)
c_all.drop(['joined_through_referral'],axis=1,inplace=True)


In [23]:
# preferred_offer_types
c_all['preferred_offer_types'].unique()

array(['Gift Vouchers/Coupons', 'Credit/Debit Card Offers',
       'Without Offers', nan], dtype=object)

In [24]:
c_all['preferred_offer_types'] = c_all['preferred_offer_types'].fillna(c_all['preferred_offer_types'].mode()[0])
c_all['pot_GiftVouchersCoupons'] = (c_all['preferred_offer_types'] == 'Gift Vouchers/Coupons').astype(int)
c_all['CreditDebitCard'] = (c_all['preferred_offer_types'] == 'Credit/Debit Card Offers')
c_all.drop(['preferred_offer_types'],axis=1,inplace=True)



In [25]:
# medium_of_operation
c_all['medium_of_operation'].unique()

array(['?', 'Desktop', 'Smartphone', 'Both'], dtype=object)

In [26]:
c_all['mo_Desktop'] = (c_all['medium_of_operation'] == 'Desktop').astype(int)
c_all['mo_Smartphone'] = (c_all['medium_of_operation'] == 'Smartphone').astype(int)
c_all['mo_Both'] = (c_all['medium_of_operation'] == 'Both').astype(int)
c_all.drop(['medium_of_operation'],axis=1,inplace=True)


In [27]:
# internet_option
c_all['internet_option'].unique()

array(['Wi-Fi', 'Mobile_Data', 'Fiber_Optic'], dtype=object)

In [28]:
c_all['io_Mobile_Data'] = (c_all['internet_option'] == 'Mobile_Data').astype(int)
c_all['io_Fiber_Optic'] = (c_all['internet_option'] == 'Fiber_Optic').astype(int)
c_all.drop(['internet_option'],axis=1,inplace=True)


In [29]:
# used_special_discount
c_all['used_special_discount'].unique()

array(['Yes', 'No'], dtype=object)

In [30]:
c_all['special_discount'] = (c_all['used_special_discount'] == 'Yes').astype(int)
c_all.drop(['used_special_discount'],axis=1,inplace=True)


In [31]:
# offer_application_preference
c_all['offer_application_preference'].unique()

array(['Yes', 'No'], dtype=object)

In [32]:
c_all['offer_application'] = (c_all['offer_application_preference'] == 'Yes').astype(int)
c_all.drop(['offer_application_preference'],axis=1,inplace=True)


In [33]:
# past_complaint
c_all['past_complaint'].unique()

array(['No', 'Yes'], dtype=object)

In [34]:
c_all['past_complaint'] = (c_all['past_complaint'] == 'Yes').astype(int)


In [35]:
# complaint_status
c_all['complaint_status'].unique()

array(['Not Applicable', 'Solved', 'Solved in Follow-up', 'Unsolved',
       'No Information Available'], dtype=object)

In [36]:
c_all['Solved'] = (c_all['complaint_status'] == 'Solved').astype(int)
c_all['Unsolved'] = (c_all['complaint_status'] == 'Unsolved').astype(int)
c_all['SolvedFollowup'] = (c_all['complaint_status']=='Solved in Follow-up').astype(int)
c_all['NotApplicable'] = (c_all['complaint_status'] == 'Not Applicable').astype(int)
c_all.drop(['complaint_status'],axis=1,inplace=True)


In [37]:
# feedback
c_all['feedback'].unique()

array(['Products always in Stock', 'Quality Customer Care',
       'Poor Website', 'No reason specified', 'Poor Product Quality',
       'Poor Customer Service', 'Too many ads', 'User Friendly Website',
       'Reasonable Price'], dtype=object)

In [38]:
c_all['ProductsAlwaysStock'] = (c_all['feedback'] == 'Products always in Stock').astype(int)
c_all['QualityCustomerCare'] = (c_all['feedback'] == 'Quality Customer Care').astype(int)
c_all['PoorWebsite'] = (c_all['feedback'] == 'Poor Website').astype(int)
c_all['PoorProductQuality'] = (c_all['feedback'] == 'Poor Product Quality').astype(int)
c_all['PoorCustomerService'] = (c_all['feedback'] == 'Poor Customer Service').astype(int)
c_all['Toomanyads'] = (c_all['feedback'] == 'Too many ads').astype(int)
c_all['UserFriendlyWebsite'] = (c_all['feedback'] == 'User Friendly Website').astype(int)
c_all['ReasonablePrice'] = (c_all['feedback'] == 'Reasonable Price').astype(int)
c_all.drop(['feedback'],axis=1,inplace=True)



In [39]:
c_all['joining_date'][0:4]

0    2017-08-17
1    2017-08-28
2    2016-11-11
3    2016-10-29
Name: joining_date, dtype: object

In [40]:
# joining_date
c_all['joining_date'] = c_all['joining_date'].str.replace("-",'').astype(int)

In [41]:
# last_visit_time
c_all['last_visit_time'][0:4]

0    16:08:02
1    12:38:13
2    22:53:21
3    15:57:50
Name: last_visit_time, dtype: object

In [42]:
# last_visit_time
c_all['last_visit_time'] = c_all['last_visit_time'].str.replace(":",'').astype(int)


In [43]:
# avg_frequency_login_days
c_all['avg_frequency_login_days'] = c_all['avg_frequency_login_days'].str.replace('Error','0')
c_all['avg_frequency_login_days'] = pd.to_numeric(c_all['avg_frequency_login_days'])


In [44]:
# points_in_wallet
c_all['points_in_wallet'].isna().sum() ,c_all.shape

(5406, (56911, 45))

In [45]:
print(c_all['points_in_wallet'].mean(), c_all['points_in_wallet'].median(), c_all['points_in_wallet'].mode())
c_all['points_in_wallet'] = c_all['points_in_wallet'].fillna(c_all['points_in_wallet'].median())


686.563761728606 697.82 0    710.69
dtype: float64


In [46]:
from keras.preprocessing.text import Tokenizer
tokenizer = Tokenizer()

tokenizer.fit_on_texts(c_all['security_no'])


Using TensorFlow backend.


In [47]:
c_all['security_no'] = tokenizer.texts_to_sequences(c_all['security_no'])


In [48]:
from keras.preprocessing.sequence import pad_sequences

c_all['security_no'] = pad_sequences(c_all['security_no'], maxlen=1)


In [49]:
c_all['referral_id'].unique()

array(['xxxxxxxx', 'CID21329', 'CID12313', ..., 'CID918', 'CID45490',
       'CID56352'], dtype=object)

In [50]:
c_all['referral_id'] = c_all['referral_id'].str.replace('xxxxxxxx','0')

In [51]:
c_all['referral_id'] = c_all['referral_id'].str.replace('CID','')

In [52]:
c_all['referral_id'] = c_all['referral_id'].str.replace('No referral','0')

In [53]:
# pd.set_option('display.max_rows', None)

c_all['referral_id'][5625]


5625    0
5625    0
Name: referral_id, dtype: object

In [54]:
c_all['referral_id'] = pd.to_numeric(c_all['referral_id']) # convert everything to float values


In [55]:
c_all.isnull().sum()

age                             0
security_no                     0
joining_date                    0
referral_id                     0
last_visit_time                 0
days_since_last_login           0
avg_time_spent                  0
avg_transaction_value           0
avg_frequency_login_days        0
points_in_wallet                0
past_complaint                  0
churn_risk_score            19919
data                            0
g_female                        0
g_male                          0
rg_vilage                       0
rg_city                         0
mc_Gold                         0
mc_Silver                       0
mc_Platinum                     0
mc_Premium                      0
mc_Basic                        0
jr_yes                          0
jr_no                           0
pot_GiftVouchersCoupons         0
CreditDebitCard                 0
mo_Desktop                      0
mo_Smartphone                   0
mo_Both                         0
io_Mobile_Data

In [56]:
c_all.dtypes

age                           int64
security_no                   int32
joining_date                  int32
referral_id                   int64
last_visit_time               int32
days_since_last_login         int64
avg_time_spent              float64
avg_transaction_value       float64
avg_frequency_login_days    float64
points_in_wallet            float64
past_complaint                int32
churn_risk_score            float64
data                         object
g_female                      int32
g_male                        int32
rg_vilage                     int32
rg_city                       int32
mc_Gold                       int32
mc_Silver                     int32
mc_Platinum                   int32
mc_Premium                    int32
mc_Basic                      int32
jr_yes                        int32
jr_no                         int32
pot_GiftVouchersCoupons       int32
CreditDebitCard                bool
mo_Desktop                    int32
mo_Smartphone               

In [57]:
c_train=c_all[c_all['data']=='train']
del c_train['data']
c_test=c_all[c_all['data']=='test']
c_test.drop(['churn_risk_score','data'],axis=1,inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


In [58]:
del c_all

In [59]:
from sklearn.model_selection import train_test_split

In [60]:
c_train1,c_train2=train_test_split(c_train,test_size=0.2,random_state=2)

In [61]:
# Notice that only train data is used for imputing missing values in both train and test 

x_train1=c_train1.drop('churn_risk_score',axis=1)
y_train1=c_train1['churn_risk_score']

x_test1 = c_train2.drop('churn_risk_score',axis=1)
x_test2 = c_train2['churn_risk_score']




In [62]:
import pandas as pd
import numpy as np
from sklearn.model_selection import RandomizedSearchCV,train_test_split,GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from xgboost.sklearn import XGBClassifier

In [63]:
# param_dist = {
#               "max_depth": [2,3,4,5,6],
#               "learning_rate":[0.01,0.05,0.1,0.3,0.5],
#     "min_child_weight":[4,5,6],
#               "subsample":[i/10.0 for i in range(6,10)],
#  "colsample_bytree":[i/10.0 for i in range(6,10)],
#                "reg_alpha":[1e-5, 1e-2, 0.1, 1, 100],
#               "gamma":[i/10.0 for i in range(0,5)],
#     "n_estimators":[100,500,700,1000],
#     'scale_pos_weight':[2,3,4,5,6,7,8,9]
    
#               }


In [64]:
from xgboost import XGBClassifier, plot_importance
clf  = XGBClassifier(max_depth = 10,random_state = 10, n_estimators=220, eval_metric = 'auc', min_child_weight = 3,
                    colsample_bytree = 0.75, subsample= 0.9)


clf.fit(x_train1, y_train1)

XGBClassifier(base_score=0.5, booster=None, colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=0.75, eval_metric='auc',
              gamma=0, gpu_id=-1, importance_type='gain',
              interaction_constraints=None, learning_rate=0.300000012,
              max_delta_step=0, max_depth=10, min_child_weight=3, missing=nan,
              monotone_constraints=None, n_estimators=220, n_jobs=0,
              num_parallel_tree=1, objective='multi:softprob', random_state=10,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=None, subsample=0.9,
              tree_method=None, validate_parameters=False, verbosity=None)

In [65]:
# c_train2=c_train2.drop('churn_risk_score',axis=1)

In [66]:
x_train1.shape, y_train1.shape, x_test1.shape, x_test2.shape

((29593, 43), (29593,), (7399, 43), (7399,))

In [67]:
predicted_xg=clf.predict(x_test1)

In [68]:
predicted_xg

array([5., 3., 5., ..., 3., 5., 4.])

In [69]:
from sklearn.metrics import f1_score


In [70]:
predicted_xg.shape, y_train1.shape

((7399,), (29593,))

In [71]:
f1_score(x_test2,predicted_xg,average='macro')

0.6245714808862098

In [72]:
x_train=c_train.drop('churn_risk_score',axis=1)
y_train=c_train['churn_risk_score']

In [73]:
clf.fit(x_train, y_train)

XGBClassifier(base_score=0.5, booster=None, colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=0.75, eval_metric='auc',
              gamma=0, gpu_id=-1, importance_type='gain',
              interaction_constraints=None, learning_rate=0.300000012,
              max_delta_step=0, max_depth=10, min_child_weight=3, missing=nan,
              monotone_constraints=None, n_estimators=220, n_jobs=0,
              num_parallel_tree=1, objective='multi:softprob', random_state=10,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=None, subsample=0.9,
              tree_method=None, validate_parameters=False, verbosity=None)

In [74]:
test_pred=clf.predict(c_test)

In [76]:
test_pred

array([3., 3., 4., ..., 5., 4., 3.])

In [77]:
customer_id

0        fffe43004900440031003700300030003400
1        fffe43004900440031003900370037003300
2        fffe43004900440034003800360037003000
3        fffe43004900440036003200370033003400
4        fffe43004900440035003000370031003900
                         ...                 
19914    fffe43004900440035003600330037003800
19915    fffe43004900440032003900370037003100
19916    fffe43004900440036003100310036003700
19917    fffe43004900440034003200330033003600
19918    fffe43004900440036003200340030003100
Name: customer_id, Length: 19919, dtype: object

In [82]:
submission_XGB = pd.DataFrame({
    'customer_id':customer_id,
    'churn_risk_score':test_pred
})

In [83]:
submission

Unnamed: 0,customer_id,churn_risk_score
0,fffe4300490044003600300030003800,2
1,fffe43004900440032003100300035003700,1
2,fffe4300490044003100390032003600,5
3,fffe43004900440036003000330031003600,5
4,fffe43004900440031003900350030003600,5


In [86]:
submission_XGB.head()

Unnamed: 0,customer_id,churn_risk_score
0,fffe43004900440031003700300030003400,3.0
1,fffe43004900440031003900370037003300,3.0
2,fffe43004900440034003800360037003000,4.0
3,fffe43004900440036003200370033003400,3.0
4,fffe43004900440035003000370031003900,5.0


In [85]:
submission_XGB.to_csv('XGB1.csv',index=False)
