# LTFS Top-up Loan Up-sell Prediction

A loan is money received from a financial institution in exchange for future repayment of the principal plus interest. Financial institutions provide loans to industries, corporates and individuals, and the interest received on these loans is one of their main sources of income.

A top-up loan, true to its name, is a facility for availing further funds on an existing loan. If you have a loan that has already been disbursed and is under repayment, and you need more funds, you can avail additional funding on the same loan, which minimizes the time, effort and cost of applying again.

LTFS provides loan services to its customers and wants to sell more of its Top-up loan services to existing customers, so it has decided to identify when to pitch a Top-up during the original loan tenure. Correctly identifying the most suitable time to offer a top-up should lead to more disbursals and can also help LTFS beat competing offerings from other institutions.

To understand this behaviour, LTFS has provided customer data indicating whether each customer took the Top-up service and, if so, when. This is represented by the target variable Top-up Month.


You are provided with two types of information: 


1. Customer’s Demographics: Along with the target variable and demographic information, the demography table contains variables related to the Frequency of the loan, Tenure of the loan, Disbursal Amount for a loan and LTV.

2. Bureau data: Bureau data contains behavioural and transactional attributes of the customers, such as current balance, Loan Amount and Overdue, for the various tradelines of a given customer.

As a data scientist, you have been tasked by LTFS with building a model: given the Top-up loan bucket of 128655 customers along with demographic and bureau data, predict the right bucket/period for 14745 customers in the test data.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from sklearn.preprocessing import OneHotEncoder
from xgboost.sklearn import XGBClassifier
from sklearn.metrics import classification_report
from sklearn.utils import class_weight
import datetime
sns.set()
pd.options.display.max_columns =200

In [2]:
train_data_path = Path(Path.cwd(),'Train','train_Data.xlsx')
train_bureau_path = Path(Path.cwd(),'Train','train_bureau.xlsx')

In [3]:
data = pd.read_excel(train_data_path)
data.head(3)

Unnamed: 0,ID,Frequency,InstlmentMode,LoanStatus,PaymentMode,BranchID,Area,Tenure,AssetCost,AmountFinance,DisbursalAmount,EMI,DisbursalDate,MaturityDAte,AuthDate,AssetID,ManufacturerID,SupplierID,LTV,SEX,AGE,MonthlyIncome,City,State,ZiPCODE,Top-up Month
0,1,Monthly,Arrear,Closed,PDC_E,1,,48,450000,275000.0,275000.0,24000.0,2012-02-10,2016-01-15,2012-02-10,4022465,1568,21946,61.11,M,49.0,35833.33,RAISEN,MADHYA PRADESH,464993.0,> 48 Months
1,2,Monthly,Advance,Closed,PDC,333,BHOPAL,47,485000,350000.0,350000.0,10500.0,2012-03-31,2016-02-15,2012-03-31,4681175,1062,34802,70.0,M,23.0,666.67,SEHORE,MADHYA PRADESH,466001.0,No Top-up Service
2,3,Quatrly,Arrear,Active,Direct Debit,1,,68,690000,519728.0,519728.0,38300.0,2017-06-17,2023-02-10,2017-06-17,25328146,1060,127335,69.77,M,39.0,45257.0,BHOPAL,MADHYA PRADESH,462030.0,12-18 Months


In [4]:
bureau = pd.read_excel(train_bureau_path)
bureau.head(3)

Unnamed: 0,ID,SELF-INDICATOR,MATCH-TYPE,ACCT-TYPE,CONTRIBUTOR-TYPE,DATE-REPORTED,OWNERSHIP-IND,ACCOUNT-STATUS,DISBURSED-DT,CLOSE-DT,LAST-PAYMENT-DATE,CREDIT-LIMIT/SANC AMT,DISBURSED-AMT/HIGH CREDIT,INSTALLMENT-AMT,CURRENT-BAL,INSTALLMENT-FREQUENCY,OVERDUE-AMT,WRITE-OFF-AMT,ASSET_CLASS,REPORTED DATE - HIST,DPD - HIST,CUR BAL - HIST,AMT OVERDUE - HIST,AMT PAID - HIST,TENURE
0,1,False,PRIMARY,Overdraft,NAB,2018-04-30,Individual,Delinquent,2015-10-05,,2018-02-27,,37352,,37873,,37873.0,0.0,Standard,2018043020180331,030000,3787312820,"37873,,",",,",
1,1,False,PRIMARY,Auto Loan (Personal),NAB,2019-12-31,Individual,Active,2018-03-19,,2019-12-19,,44000,"1,405/Monthly",20797,F03,,0.0,Standard,"20191231,20191130,20191031,20190930,20190831,2...",0000000000000000000000000000000000000000000000...,"20797,21988,23174,24341,25504,26648,27780,2891...",",,,,,,,,,,,,,,,,,,,,1452,,",",,,,,,,,,,,,,,,,,,,,,,",36.0
2,1,True,PRIMARY,Tractor Loan,NBF,2020-01-31,Individual,Active,2019-08-30,,NaT,,145000,,116087,,0.0,0.0,,"20200131,20191231,20191130,20191031,20190930,2...",000000000000000000,116087116087145000145000145000145000,000000,",,,,,,",


In [5]:
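def clean_bureau(df):
    """Cast comma-formatted amount columns to float and fill missing disbursal dates."""
    clean_col = ['CREDIT-LIMIT/SANC AMT','DISBURSED-AMT/HIGH CREDIT','CURRENT-BAL','OVERDUE-AMT']
    for col in clean_col:
        # Remove comma separators before casting the amounts to float
        df[col] = df[col].str.replace(",","",regex=False).astype(float)
    # Placeholder date so that rows without a disbursal date can still be sorted and grouped
    df['DISBURSED-DT'] = df['DISBURSED-DT'].fillna(datetime.datetime(1970,1,1))
    return df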
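
A small illustration of what the comma-stripping step in `clean_bureau` does, on a made-up Series (the values are hypothetical, not taken from the bureau file):

```python
import pandas as pd

# Hypothetical raw amounts, as strings with comma separators and one missing value
raw = pd.Series(["37,352", "1,45,000", float("nan"), "0"])

# Same transformation as in clean_bureau: drop the commas, then cast to float
clean = raw.str.replace(",", "", regex=False).astype(float)
print(clean.tolist())
```

Missing entries stay missing. Note that the `.str` accessor only exists on string columns, so a column that pandas has already parsed as numeric would raise here rather than be cleaned.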

In [6]:
def summarise_bureau(data,data_bureau,train_bureau):
    """Join each loan in `data` to its bureau tradeline and to counts of the customer's earlier tradelines."""
    
    # One tradeline per (ID, disbursal date), in chronological order per customer.
    # Resetting the index keeps these rows aligned with the positional one-hot frame built below.
    sorted_bureau = data_bureau.sort_values(['ID','DISBURSED-DT'])
    sorted_bureau = sorted_bureau.drop_duplicates(subset=['ID','DISBURSED-DT'],keep='first').reset_index(drop=True)
    
    count_columns = ['SELF-INDICATOR','MATCH-TYPE','OWNERSHIP-IND','ACCOUNT-STATUS']
    select_col = ['ID','DISBURSED-DT','SELF-INDICATOR','MATCH-TYPE','ACCT-TYPE','CONTRIBUTOR-TYPE','OWNERSHIP-IND','ACCOUNT-STATUS',
                 'ASSET_CLASS','CURRENT-BAL','OVERDUE-AMT','WRITE-OFF-AMT']
    
    # The encoder is fitted on the training bureau so that train and test share the same hist_* columns
    enc = OneHotEncoder(handle_unknown='ignore')
    enc.fit(train_bureau[count_columns])
    hist_columns = [f"hist_{cat}" for cats in enc.categories_ for cat in cats]
    hist = pd.DataFrame(enc.transform(sorted_bureau[count_columns]).toarray(),columns=hist_columns)
    
    hist['ID'] = sorted_bureau['ID']
    hist['DISBURSED-DT'] = sorted_bureau['DISBURSED-DT']
    
    # shift() leaves out the current tradeline; cumsum() then counts everything disbursed before it
    hist_summary = hist.groupby(['ID','DISBURSED-DT']).sum().groupby(level=0).shift().groupby(level=0).cumsum().reset_index().fillna(0)
    loan_summary = hist.groupby(['ID','DISBURSED-DT']).size().groupby(level=0).shift().groupby(level=0).cumsum().reset_index().fillna(0)
    loan_summary.rename({0:'hist_loan'},axis=1,inplace=True)
    
    bureau_merge = sorted_bureau[select_col].merge(hist_summary,on=['ID','DISBURSED-DT'],how='left')
    bureau_merge = bureau_merge.merge(loan_summary,on=['ID','DISBURSED-DT'],how='left')
    
    # Keep the bureau record that matches the loan itself (same ID and disbursal date).
    # Note that this adds a DISBURSED-DT column to the caller's `data` frame.
    data['DISBURSED-DT'] = data['DisbursalDate']
    bureau_merge = data.merge(bureau_merge,on=['ID','DISBURSED-DT'],how='left')
        
    return bureau_merge
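
The `shift()` followed by `cumsum()` inside `summarise_bureau` is what turns per-loan indicators into counts of earlier loans only. A minimal sketch on a made-up two-customer table (the IDs, dates and the two `hist_*` columns are illustrative):

```python
import pandas as pd

# Hypothetical one-hot status flags, one row per (customer, disbursal date)
loans = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "DISBURSED-DT": pd.to_datetime(
        ["2015-10-05", "2018-03-19", "2019-08-30", "2012-03-31", "2014-01-01"]
    ),
    "hist_Active": [0, 1, 1, 0, 1],
    "hist_Closed": [1, 0, 0, 1, 0],
})

per_loan = loans.groupby(["ID", "DISBURSED-DT"]).sum()

# Within each customer: shift one row down (drop the current loan),
# then accumulate, so each row holds totals over strictly earlier loans
prior = (
    per_loan.groupby(level=0).shift()
    .groupby(level=0).cumsum()
    .fillna(0)
    .reset_index()
)

# The same pattern on group sizes gives the number of earlier loans
prior_loans = (
    loans.groupby(["ID", "DISBURSED-DT"]).size()
    .groupby(level=0).shift()
    .groupby(level=0).cumsum()
    .fillna(0)
    .rename("hist_loan")
    .reset_index()
)
print(prior)
print(prior_loans)
```

For each customer the first loan gets zeros, and every later loan only sees what was disbursed before it, so a tradeline never counts towards its own history.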

In [7]:
bureau = clean_bureau(bureau)
bureau_merge = summarise_bureau(data,bureau,bureau)
bureau_merge.head(3)

Unnamed: 0,ID,Frequency,InstlmentMode,LoanStatus,PaymentMode,BranchID,Area,Tenure,AssetCost,AmountFinance,DisbursalAmount,EMI,DisbursalDate,MaturityDAte,AuthDate,AssetID,ManufacturerID,SupplierID,LTV,SEX,AGE,MonthlyIncome,City,State,ZiPCODE,Top-up Month,DISBURSED-DT,SELF-INDICATOR,MATCH-TYPE,ACCT-TYPE,CONTRIBUTOR-TYPE,OWNERSHIP-IND,ACCOUNT-STATUS,ASSET_CLASS,CURRENT-BAL,OVERDUE-AMT,WRITE-OFF-AMT,hist_False,hist_True,hist_PRIMARY,hist_SECONDARY,hist_Guarantor,hist_Individual,hist_Joint,hist_Primary,hist_Supl Card Holder,hist_Active,hist_Cancelled,hist_Closed,hist_Delinquent,hist_Restructured,hist_SUIT FILED (WILFUL DEFAULT),hist_Settled,hist_Sold/Purchased,hist_Suit Filed,hist_WILFUL DEFAULT,hist_Written Off,hist_loan
0,1,Monthly,Arrear,Closed,PDC_E,1,,48,450000,275000.0,275000.0,24000.0,2012-02-10,2016-01-15,2012-02-10,4022465,1568,21946,61.11,M,49.0,35833.33,RAISEN,MADHYA PRADESH,464993.0,> 48 Months,2012-02-10,True,PRIMARY,Tractor Loan,NBF,Individual,Closed,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2,Monthly,Advance,Closed,PDC,333,BHOPAL,47,485000,350000.0,350000.0,10500.0,2012-03-31,2016-02-15,2012-03-31,4681175,1062,34802,70.0,M,23.0,666.67,SEHORE,MADHYA PRADESH,466001.0,No Top-up Service,2012-03-31,True,PRIMARY,Tractor Loan,NBF,Individual,Closed,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,3,Quatrly,Arrear,Active,Direct Debit,1,,68,690000,519728.0,519728.0,38300.0,2017-06-17,2023-02-10,2017-06-17,25328146,1060,127335,69.77,M,39.0,45257.0,BHOPAL,MADHYA PRADESH,462030.0,12-18 Months,2017-06-17,True,PRIMARY,Tractor Loan,NBF,Individual,Active,,37637.0,0.0,0.0,7.0,1.0,8.0,0.0,0.0,7.0,1.0,0.0,0.0,2.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0


In [8]:
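def compute_features(df):
    # 1 when the financed amount differs from the amount actually disbursed
    df['disburse_diff']= (df['AmountFinance'] != df['DisbursalAmount']).astype(int)
    # Share of the asset cost covered by the loan
    df['per_loan'] = df['DisbursalAmount']/df['AssetCost']
    # Total repayments minus principal (treats Tenure as the number of EMI payments)
    df['total_int']= (df['Tenure']* df['EMI']) - df['DisbursalAmount']
    # EMI relative to monthly income; a zero income yields inf, which fillna(0) does not remove
    df['emi_salary_per'] = df['EMI']/df['MonthlyIncome']
    # LTV is given in percent (e.g. 61.11), so this is the implied asset value divided by 100
    df['asset_value'] = df['DisbursalAmount']/ df['LTV']
    df['emi_asset'] = df['EMI']/df['asset_value']
    # Boolean flag to 0/1; missing values become 0
    df['SELF-INDICATOR'] = (df['SELF-INDICATOR'] == True).astype(int)
    return df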

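def hot_encode(train,data,cat_col):
    """One-hot encode `cat_col` in `data` using an encoder fitted on `train`."""
    enc = OneHotEncoder(handle_unknown='ignore')
    enc.fit(train[cat_col])
    # Build the names from enc.categories_: textually replacing the placeholder "x1"
    # would also rewrite "x10" and "x11" once there are more than ten categorical columns
    columns = [f"{col}_{cat}" for col,cats in zip(cat_col,enc.categories_) for cat in cats]
    df = pd.DataFrame(enc.transform(data[cat_col]).toarray(),columns=columns,index=data.index)
    
    # Swap the raw categorical columns for their encoded counterparts
    return pd.concat([data.drop(cat_col,axis=1),df],axis=1)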
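
When one-hot columns are named by substituting placeholder prefixes (`x0`, `x1`, ...) with the original column names, plain substring replacement becomes fragile once there are ten or more columns, because `x1` is also a prefix of `x10` and `x11`. That would explain names such as `InstlmentMode0_Active` in the encoded output. The sketch below reproduces the effect in plain Python; the three placeholder names and the `rename` helper are hypothetical, and only the list of categorical columns is taken from this notebook.

```python
import re

cat_col = ['Frequency', 'InstlmentMode', 'LoanStatus', 'PaymentMode',
           'SEX', 'State', 'MATCH-TYPE', 'ACCT-TYPE',
           'CONTRIBUTOR-TYPE', 'OWNERSHIP-IND', 'ACCOUNT-STATUS', 'ASSET_CLASS']

# Hypothetical placeholder names in the style "x<index>_<category>"
names = ["x1_Arrear", "x10_Active", "x11_Standard"]

# Plain substring replacement, applied in index order
naive = list(names)
for idx, col in enumerate(cat_col):
    naive = [n.replace(f"x{idx}", col) for n in naive]

def rename(name, cat_col):
    """Parse the full index before the underscore instead of matching a prefix."""
    match = re.match(r"x(\d+)_(.*)", name)
    return f"{cat_col[int(match.group(1))]}_{match.group(2)}"

safe = [rename(n, cat_col) for n in names]
print(naive)
print(safe)
```

The mislabelled columns still carry the right values, so the model is unaffected as long as train and test are encoded the same way; the issue is only that the names no longer say which source column they came from.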

In [9]:
bureau_merge.shape,data.shape

((128655, 58), (128655, 27))

In [17]:
features = ['Frequency', 'InstlmentMode', 'LoanStatus', 'PaymentMode','Tenure', 'AssetCost', 'AmountFinance',
            'DisbursalAmount','EMI','LTV', 'SEX', 'AGE','MonthlyIncome', 'State',
            'SELF-INDICATOR', 'MATCH-TYPE', 'ACCT-TYPE','CONTRIBUTOR-TYPE', 'OWNERSHIP-IND', 'ACCOUNT-STATUS', 
            'ASSET_CLASS','CURRENT-BAL', 'OVERDUE-AMT', 'WRITE-OFF-AMT', 'hist_False', 'hist_True', 
            'hist_PRIMARY', 'hist_SECONDARY', 'hist_Guarantor','hist_Individual', 'hist_Joint', 'hist_Primary',
            'hist_Supl Card Holder', 'hist_Active', 'hist_Cancelled', 'hist_Closed','hist_Delinquent', 
            'hist_Restructured','hist_SUIT FILED (WILFUL DEFAULT)', 'hist_Settled','hist_Sold/Purchased', 
            'hist_Suit Filed', 'hist_WILFUL DEFAULT','hist_Written Off', 'hist_loan']

x_train = bureau_merge[features].copy()
y_train = bureau_merge['Top-up Month']
categorical_var = ['Frequency', 'InstlmentMode', 'LoanStatus', 'PaymentMode',
                   'SEX', 'State', 'MATCH-TYPE', 'ACCT-TYPE',
                    'CONTRIBUTOR-TYPE', 'OWNERSHIP-IND', 'ACCOUNT-STATUS', 'ASSET_CLASS']

x_train[categorical_var] = x_train[categorical_var].fillna('na')
x_train = compute_features(x_train)
x_train_copy = x_train.copy()
x_train = hot_encode(x_train,x_train,categorical_var)
x_train.fillna(0,inplace=True)
x_train.sample(3)

Unnamed: 0,Tenure,AssetCost,AmountFinance,DisbursalAmount,EMI,LTV,AGE,MonthlyIncome,SELF-INDICATOR,CURRENT-BAL,OVERDUE-AMT,WRITE-OFF-AMT,hist_False,hist_True,hist_PRIMARY,hist_SECONDARY,hist_Guarantor,hist_Individual,hist_Joint,hist_Primary,hist_Supl Card Holder,hist_Active,hist_Cancelled,hist_Closed,hist_Delinquent,hist_Restructured,hist_SUIT FILED (WILFUL DEFAULT),hist_Settled,hist_Sold/Purchased,hist_Suit Filed,hist_WILFUL DEFAULT,hist_Written Off,hist_loan,disburse_diff,per_loan,total_int,emi_salary_per,asset_value,emi_asset,Frequency_BI-Monthly,Frequency_Half Yearly,Frequency_Monthly,Frequency_Quatrly,InstlmentMode_Advance,InstlmentMode_Arrear,LoanStatus_Active,LoanStatus_Closed,PaymentMode_Auto Debit,PaymentMode_Billed,PaymentMode_Cheque,PaymentMode_Direct Debit,PaymentMode_ECS,PaymentMode_ECS Reject,PaymentMode_Escrow,PaymentMode_PDC,PaymentMode_PDC Reject,PaymentMode_PDC_E,PaymentMode_SI Reject,SEX_F,SEX_M,SEX_na,State_ANDHRA PRADESH,State_ASSAM,State_BIHAR,State_CHANDIGARH,State_CHATTISGARH,State_DADRA AND NAGAR HAVELI,State_DELHI,State_GUJARAT,State_HARYANA,State_HIMACHAL PRADESH,State_JHARKHAND,State_KARNATAKA,State_MADHYA PRADESH,State_MAHARASHTRA,State_ORISSA,State_PUNJAB,State_RAJASTHAN,State_TAMIL NADU,State_TELANGANA,State_UTTAR PRADESH,State_UTTARAKHAND,State_WEST BENGAL,MATCH-TYPE_PRIMARY,MATCH-TYPE_na,ACCT-TYPE_Auto Loan (Personal),ACCT-TYPE_Business Loan General,ACCT-TYPE_Business Loan Priority Sector Agriculture,ACCT-TYPE_Business Loan Priority Sector Small Business,ACCT-TYPE_Commercial Vehicle Loan,ACCT-TYPE_Construction Equipment Loan,ACCT-TYPE_Consumer Loan,ACCT-TYPE_Credit Card,ACCT-TYPE_Education Loan,ACCT-TYPE_Gold Loan,ACCT-TYPE_JLG Individual,ACCT-TYPE_Kisan Credit Card,ACCT-TYPE_Loan Against Bank Deposits,ACCT-TYPE_Other,ACCT-TYPE_Overdraft,ACCT-TYPE_Personal Loan,ACCT-TYPE_Tractor Loan,ACCT-TYPE_Two-Wheeler 
Loan,ACCT-TYPE_na,CONTRIBUTOR-TYPE_COP,CONTRIBUTOR-TYPE_MFI,CONTRIBUTOR-TYPE_NAB,CONTRIBUTOR-TYPE_NBF,CONTRIBUTOR-TYPE_PRB,CONTRIBUTOR-TYPE_RRB,CONTRIBUTOR-TYPE_na,OWNERSHIP-IND_Guarantor,OWNERSHIP-IND_Individual,OWNERSHIP-IND_Joint,OWNERSHIP-IND_Primary,OWNERSHIP-IND_na,InstlmentMode0_Active,InstlmentMode0_Closed,InstlmentMode0_Delinquent,InstlmentMode0_Suit Filed,InstlmentMode0_na,InstlmentMode1_Doubtful,InstlmentMode1_Special Mention Account,InstlmentMode1_Standard,InstlmentMode1_na
116333,36,493000,350000.0,350000.0,35700.0,70.99,58.0,50000.0,1,160802.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.709939,935200.0,0.714,4930.271869,7.24098,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
40290,60,610000,530000.0,530000.0,86000.0,72.79,25.0,33333.33,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.868852,4630000.0,2.58,7281.219948,11.811208,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
67104,42,520000,326490.0,326490.0,62700.0,50.73,27.0,53666.67,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.627865,2306910.0,1.168323,6435.836783,9.742323,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
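
The derived columns can be checked by hand against the first sampled row above (index 116333). The sketch below copies that row's raw inputs and repeats the arithmetic of `compute_features` in plain Python:

```python
# Raw inputs of sampled row 116333, copied from the output above
tenure, asset_cost, disbursal = 36, 493000, 350000.0
emi, ltv, income = 35700.0, 70.99, 50000.0

per_loan = disbursal / asset_cost        # share of the asset financed
total_int = tenure * emi - disbursal     # repayments minus principal
emi_salary_per = emi / income            # EMI relative to monthly income
asset_value = disbursal / ltv            # LTV is in percent, so this is scaled by 1/100
emi_asset = emi / asset_value

print(round(per_loan, 6), total_int, emi_salary_per)
print(round(asset_value, 6), round(emi_asset, 5))
```

Two observations, offered with some caution. `asset_value` comes out near 4930 while `AssetCost` is 493000, which suggests LTV is stored in percent and the feature is off by a constant factor of 100; that is harmless for tree models. `total_int` multiplies `Tenure` by `EMI` regardless of `Frequency`, and this row is a quarterly loan, so the figure is only an interest estimate if `Tenure` counts instalments, which the data description does not state.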


In [11]:
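# Weight each row inversely to its class frequency so the rare top-up buckets are not swamped
classes = np.unique(y_train)
k_train = class_weight.compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
wt = dict(zip(classes, k_train))
w_array = y_train.map(wt).values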
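
The "balanced" heuristic in scikit-learn sets each class weight to `n_samples / (n_classes * class_count)`. A sketch of that arithmetic in plain Python, using the class counts that appear in the support column of the training classification report in this notebook:

```python
# Class counts, copied from the support column of the training report
counts = {
    "> 48 Months": 8366,
    "12-18 Months": 1034,
    "18-24 Months": 2368,
    "24-30 Months": 3492,
    "30-36 Months": 3062,
    "36-48 Months": 3656,
    "No Top-up Service": 106677,
}

n_samples = sum(counts.values())
n_classes = len(counts)

# Balanced weight: every class contributes the same total weight, n_samples / n_classes
weights = {label: n_samples / (n_classes * c) for label, c in counts.items()}

for label, w in weights.items():
    print(f"{label:>18}: {w:.4f}")
```

The rarest bucket ends up weighted roughly 103 times more heavily than `No Top-up Service`, which is consistent with the pattern in the report of high recall and low precision on the top-up buckets.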

In [12]:
xgb = XGBClassifier(learning_rate = 0.1,n_estimators=1000,max_depth=5,min_child_weight=1,
                     gamma=0,subsample=1, colsample_bytree=1,objective= 'multi:softmax',
                     nthread=4,seed=27)
xgb.fit(x_train,y_train, eval_metric='auc', sample_weight=w_array)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.1, max_delta_step=0, max_depth=5,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=1000, n_jobs=4, nthread=4, num_parallel_tree=1,
              objective='multi:softprob', random_state=27, reg_alpha=0,
              reg_lambda=1, scale_pos_weight=None, seed=27, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

In [13]:
train_predict = xgb.predict(x_train)
train_predict = np.squeeze(train_predict)
print(classification_report(y_train, train_predict))

                   precision    recall  f1-score   support

      > 48 Months       0.23      0.86      0.36      8366
     12-18 Months       0.28      1.00      0.44      1034
     18-24 Months       0.25      0.96      0.40      2368
     24-30 Months       0.30      0.89      0.45      3492
     30-36 Months       0.29      0.87      0.44      3062
     36-48 Months       0.25      0.86      0.38      3656
No Top-up Service       0.98      0.48      0.65    106677

         accuracy                           0.55    128655
        macro avg       0.37      0.85      0.45    128655
     weighted avg       0.86      0.55      0.60    128655
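
The report above is computed on the same rows the model was fitted on, so it says little about how the model generalises. It is still useful for seeing how the averages are formed: the macro average is the unweighted mean over the seven classes, whereas accuracy and the weighted average are dominated by `No Top-up Service`. A quick check in plain Python, using the rounded per-class figures from the report:

```python
# (precision, recall, f1, support) per class, copied from the report above
report = {
    "> 48 Months": (0.23, 0.86, 0.36, 8366),
    "12-18 Months": (0.28, 1.00, 0.44, 1034),
    "18-24 Months": (0.25, 0.96, 0.40, 2368),
    "24-30 Months": (0.30, 0.89, 0.45, 3492),
    "30-36 Months": (0.29, 0.87, 0.44, 3062),
    "36-48 Months": (0.25, 0.86, 0.38, 3656),
    "No Top-up Service": (0.98, 0.48, 0.65, 106677),
}

n = len(report)
macro_precision = sum(r[0] for r in report.values()) / n
macro_recall = sum(r[1] for r in report.values()) / n
macro_f1 = sum(r[2] for r in report.values()) / n

total = sum(r[3] for r in report.values())
majority_share = report["No Top-up Service"][3] / total

print(round(macro_precision, 2), round(macro_recall, 2), round(macro_f1, 2))
print(round(majority_share, 3))
```

The three macro figures reproduce the `macro avg` row, and the majority class accounts for about 83% of the rows, which is why the overall accuracy tracks its recall so closely.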



In [14]:
test_data_path = Path(Path.cwd(),'Test','test_Data.xlsx')
test_bureau_path = Path(Path.cwd(),'Test','test_bureau.xlsx')

test_data = pd.read_excel(test_data_path)
test_bureau = pd.read_excel(test_bureau_path)

test_bureau = clean_bureau(test_bureau)
test_bureau_merge = summarise_bureau(test_data,test_bureau,bureau)

x_test = test_bureau_merge[features].copy()
x_test[categorical_var] = x_test[categorical_var].fillna('na')
x_test = compute_features(x_test)
x_test = hot_encode(x_train_copy,x_test,categorical_var)
x_test.fillna(0,inplace=True)
x_test.head(3)

Unnamed: 0,Tenure,AssetCost,AmountFinance,DisbursalAmount,EMI,LTV,AGE,MonthlyIncome,SELF-INDICATOR,CURRENT-BAL,OVERDUE-AMT,WRITE-OFF-AMT,hist_False,hist_True,hist_PRIMARY,hist_SECONDARY,hist_Guarantor,hist_Individual,hist_Joint,hist_Primary,hist_Supl Card Holder,hist_Active,hist_Cancelled,hist_Closed,hist_Delinquent,hist_Restructured,hist_SUIT FILED (WILFUL DEFAULT),hist_Settled,hist_Sold/Purchased,hist_Suit Filed,hist_WILFUL DEFAULT,hist_Written Off,hist_loan,disburse_diff,per_loan,total_int,emi_salary_per,asset_value,emi_asset,Frequency_BI-Monthly,Frequency_Half Yearly,Frequency_Monthly,Frequency_Quatrly,InstlmentMode_Advance,InstlmentMode_Arrear,LoanStatus_Active,LoanStatus_Closed,PaymentMode_Auto Debit,PaymentMode_Billed,PaymentMode_Cheque,PaymentMode_Direct Debit,PaymentMode_ECS,PaymentMode_ECS Reject,PaymentMode_Escrow,PaymentMode_PDC,PaymentMode_PDC Reject,PaymentMode_PDC_E,PaymentMode_SI Reject,SEX_F,SEX_M,SEX_na,State_ANDHRA PRADESH,State_ASSAM,State_BIHAR,State_CHANDIGARH,State_CHATTISGARH,State_DADRA AND NAGAR HAVELI,State_DELHI,State_GUJARAT,State_HARYANA,State_HIMACHAL PRADESH,State_JHARKHAND,State_KARNATAKA,State_MADHYA PRADESH,State_MAHARASHTRA,State_ORISSA,State_PUNJAB,State_RAJASTHAN,State_TAMIL NADU,State_TELANGANA,State_UTTAR PRADESH,State_UTTARAKHAND,State_WEST BENGAL,MATCH-TYPE_PRIMARY,MATCH-TYPE_na,ACCT-TYPE_Auto Loan (Personal),ACCT-TYPE_Business Loan General,ACCT-TYPE_Business Loan Priority Sector Agriculture,ACCT-TYPE_Business Loan Priority Sector Small Business,ACCT-TYPE_Commercial Vehicle Loan,ACCT-TYPE_Construction Equipment Loan,ACCT-TYPE_Consumer Loan,ACCT-TYPE_Credit Card,ACCT-TYPE_Education Loan,ACCT-TYPE_Gold Loan,ACCT-TYPE_JLG Individual,ACCT-TYPE_Kisan Credit Card,ACCT-TYPE_Loan Against Bank Deposits,ACCT-TYPE_Other,ACCT-TYPE_Overdraft,ACCT-TYPE_Personal Loan,ACCT-TYPE_Tractor Loan,ACCT-TYPE_Two-Wheeler 
Loan,ACCT-TYPE_na,CONTRIBUTOR-TYPE_COP,CONTRIBUTOR-TYPE_MFI,CONTRIBUTOR-TYPE_NAB,CONTRIBUTOR-TYPE_NBF,CONTRIBUTOR-TYPE_PRB,CONTRIBUTOR-TYPE_RRB,CONTRIBUTOR-TYPE_na,OWNERSHIP-IND_Guarantor,OWNERSHIP-IND_Individual,OWNERSHIP-IND_Joint,OWNERSHIP-IND_Primary,OWNERSHIP-IND_na,InstlmentMode0_Active,InstlmentMode0_Closed,InstlmentMode0_Delinquent,InstlmentMode0_Suit Filed,InstlmentMode0_na,InstlmentMode1_Doubtful,InstlmentMode1_Special Mention Account,InstlmentMode1_Standard,InstlmentMode1_na
0,46,480000,365000.0,365000.0,1000.0,75.83,50.0,32069.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.760417,-319000.0,0.031183,4813.398391,0.207753,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
1,45,480000,285000.0,285000.0,9300.0,57.44,35.0,25000.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.59375,133500.0,0.372,4961.699164,1.874358,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
2,48,580000,400000.0,400000.0,35800.0,68.97,37.0,23333.33,1,0.0,0.0,0.0,6.0,1.0,7.0,0.0,2.0,4.0,1.0,0.0,0.0,0.0,0.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,0,0.689655,1318400.0,1.534286,5799.623025,6.172815,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


In [15]:
x_test.shape,test_data.shape

((14745, 125), (14745, 26))
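
The shapes above only confirm that no test rows were lost. The model also needs the test matrix to have the same columns, in the same order, as the training matrix. That should already hold here because the encoder is fitted on the training frame, but an explicit guard is cheap. A hypothetical sketch on toy frames (the frames and the use of `reindex` are illustrative and not part of the original pipeline):

```python
import pandas as pd

# Toy training matrix and a test matrix with a missing column and a different order
train = pd.DataFrame({"Tenure": [48, 47], "SEX_F": [0.0, 1.0], "SEX_M": [1.0, 0.0]})
test = pd.DataFrame({"SEX_M": [1.0], "Tenure": [46]})

# Reorder to the training layout; columns absent from the test data are filled with 0
aligned = test.reindex(columns=train.columns, fill_value=0.0)

assert list(aligned.columns) == list(train.columns)
print(aligned)
```

In the notebook the equivalent check would compare `x_test.columns` with `x_train.columns` before calling `predict`.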

In [16]:
test_predict = xgb.predict(x_test)
test_predict = np.squeeze(test_predict)
test_data['Top-up Month'] = test_predict
test_data[['ID','Top-up Month']].to_csv('submit_2.csv',index=False)