#### This notebook evaluates how well the Bank Login (BL) score separates "good" from "bad" customers.

Because the Bank Login score already aggregates a large number of factors (~6k attributes), we use the `IsFirstDefault` flag (whether the borrower defaulted on the first payment) as the loan outcome against which to judge the score.

In [628]:
import pymysql
import query
import pandas as pd
import seaborn as sns
import sidetable

In [629]:
df = pd.read_csv("/home/vishal/ftp_files_csv/bt_attr.csv",usecols=["setting_app_id","blp_score"])

In [630]:
df.head()

Unnamed: 0,setting_app_id,blp_score
0,10660872622,298
1,10661291196,254
2,10662080444,295
3,10662493120,298
4,10664419139,227


In [631]:
scoredf=df.rename(columns={"setting_app_id": "LoanId", "blp_score": "score"})

In [632]:
scoredf = scoredf.drop_duplicates()

In [633]:
scoredf.shape

(41768, 2)

### Query the DB in the date range '2019-01-01' - '2020-04-30'

In [634]:
dbquerydf = query.iloans("SELECT * FROM view_FCL_Loan WHERE OriginationDate between '2019-01-01' AND '2020-04-30'")

In [635]:
dbdf=dbquerydf[["LoanId","IsFirstDefault"]]

In [636]:
view_df = pd.merge(scoredf, dbdf, how ='left', on ='LoanId') 

In [637]:
view_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 41768 entries, 0 to 41767
Data columns (total 3 columns):
LoanId            41768 non-null int64
score             41768 non-null int64
IsFirstDefault    3859 non-null object
dtypes: int64(2), object(1)
memory usage: 1.3+ MB


In [638]:
view_df.head()

Unnamed: 0,LoanId,score,IsFirstDefault
0,10660872622,298,
1,10661291196,254,
2,10662080444,295,
3,10662493120,298,False
4,10664419139,227,
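
The `info()` output above shows that only 3,859 of the 41,768 scored applications found a matching loan record. A minimal sketch of how that match rate could be checked explicitly, using made-up stand-ins for `scoredf` and `dbdf` (the toy values are assumptions, not data from this notebook):

```python
import pandas as pd

# Toy stand-ins for scoredf and dbdf; the values are made up for illustration.
scores = pd.DataFrame({"LoanId": [1, 2, 3, 4], "score": [298, 254, 295, 227]})
loans = pd.DataFrame({"LoanId": [2, 4], "IsFirstDefault": [False, True]})

# indicator=True adds a "_merge" column ("both" or "left_only" for a left join).
# validate="m:1" raises MergeError if LoanId is duplicated on the right side.
merged = pd.merge(scores, loans, how="left", on="LoanId",
                  indicator=True, validate="m:1")

match_counts = merged["_merge"].value_counts()
matched = int(match_counts["both"])
unmatched = int(match_counts["left_only"])
print(f"matched: {matched}, unmatched: {unmatched}")
```

With `validate="m:1"` the merge would fail loudly if the right-hand frame carried repeated `LoanId`s, rather than silently adding rows.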


### Get Lender Approved Loans Data

In [639]:
query_lender_approved='''
select LoanId, 
LoanPrincipal AS ApprovedLoanAmount,
LoanStatus AS LenderApproved 
from view_FCL_Loan_History
where LoanStatus = 'Lender Approved' 
and TimeAdded >= '2019-01-01'
AND TimeAdded <= '2020-04-30'
ORDER BY LoanId , TimeAdded DESC
'''

In [640]:
dblender_approved_df = query.iloans(query_lender_approved)

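This query can return several rows per `LoanId`. Note that `drop_duplicates()` with no arguments removes only rows that are identical in every column, so repeated `LoanId`s with differing values survive, which may be why the merged frame later grows from 41,768 to 41,834 rows.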

In [641]:
dblender_approved_df = dblender_approved_df.drop_duplicates()
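
For reference, a small sketch of the difference between dropping fully identical rows and keeping one row per `LoanId`. The rows below are made up; the `subset`/`keep` variant is shown as an option, not as what this notebook ran:

```python
import pandas as pd

# Made-up history rows, already sorted by LoanId, TimeAdded DESC as in the query.
history = pd.DataFrame({
    "LoanId": [1, 1, 2, 2],
    "ApprovedLoanAmount": [300.0, 250.0, 500.0, 500.0],
    "LenderApproved": ["Lender Approved"] * 4,
})

# Drops only rows that are identical in every column.
exact = history.drop_duplicates()

# Keeps the first (most recent, given the sort order) row for each LoanId.
per_loan = history.drop_duplicates(subset="LoanId", keep="first")

print(len(exact), len(per_loan))
```

Because the query orders by `TimeAdded DESC` within each `LoanId`, `keep="first"` would retain the latest approval record per loan.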

#### Get funded loans in the date range

In [642]:
query_funded_loans ='''
SELECT 
    LoanId,
    (CASE WHEN LoanId IS NOT NULL THEN 1 ELSE 0 END) AS IsFunded
    FROM
    view_FCL_Loan 
    WHERE LeadTimeAdded >= '2019-01-01'
    AND LeadTimeAdded <= '2020-04-30'
    AND MerchantId IN (15,18)
'''

In [643]:
fundedloansdf = query.iloans(query_funded_loans)

In [644]:
fundedloansdf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12200 entries, 0 to 12199
Data columns (total 2 columns):
LoanId      12200 non-null float64
IsFunded    12200 non-null int64
dtypes: float64(1), int64(1)
memory usage: 190.7 KB


### Bin Data by Deciles and Remap LenderApproved to 0s/1s

In [645]:
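# Left-join approval and funding data onto the scored applications
view2_df = pd.merge(view_df, dblender_approved_df, how='left', on='LoanId')
view2_df = pd.merge(view2_df, fundedloansdf, how='left', on='LoanId')
# Convert True/False to 1/0 (missing values stay missing)
view2_df["IsFirstDefault"] *= 1
# Assign each application to a score decile, labelled 0 (lowest) to 9 (highest)
view2_df['Decile_rank'] = pd.qcut(view2_df['score'], 10, labels=False)
# Applications with no approval record get amount 0 and LenderApproved 0
view2_df['ApprovedLoanAmount'] = view2_df['ApprovedLoanAmount'].fillna(0)
view2_df["LenderApproved"] = view2_df["LenderApproved"].replace(to_replace="Lender Approved", value=1)
view2_df["LenderApproved"] = view2_df["LenderApproved"].fillna(0)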

In [646]:
view2_df.head()

Unnamed: 0,LoanId,score,IsFirstDefault,ApprovedLoanAmount,LenderApproved,IsFunded,Decile_rank
0,10660872622,298,,0.0,0.0,,9
1,10661291196,254,,0.0,0.0,,0
2,10662080444,295,,0.0,0.0,,7
3,10662493120,298,0.0,300.0,1.0,1.0,9
4,10664419139,227,,0.0,0.0,,0
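
`pd.qcut` places bin edges at quantiles, but it cannot split identical scores across bins, so heavily tied integer scores can produce bins of unequal size. A minimal sketch with made-up scores and quartiles instead of deciles:

```python
import pandas as pd

# Made-up integer scores with a cluster of ties at 290.
scores = pd.Series([100, 200, 250, 260, 270, 280, 290, 290, 290, 290, 295, 299])

# labels=False returns the integer bin index (0 = lowest quartile).
quartile = pd.qcut(scores, 4, labels=False)

# Number of scores that landed in each bin, in bin order.
bin_sizes = quartile.value_counts().sort_index().tolist()
print(bin_sizes)
```

All four tied scores of 290 fall in the same bin, so the bins are not equally populated even though the edges sit at the quartiles.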


In [647]:
view2_df.describe()

Unnamed: 0,LoanId,score,ApprovedLoanAmount,LenderApproved,IsFunded,Decile_rank
count,41834.0,41834.0,41834.0,41834.0,3988.0,41834.0
mean,53104440000.0,282.720921,42.289755,0.120643,1.0,4.297246
std,25844940000.0,22.49991,131.811567,0.325716,0.0,2.725292
min,10660110000.0,108.0,0.0,0.0,1.0,0.0
25%,30669180000.0,280.0,0.0,0.0,1.0,2.0
50%,51669920000.0,290.0,0.0,0.0,1.0,4.0
75%,74681850000.0,295.0,0.0,0.0,1.0,7.0
max,99683660000.0,299.0,5000.0,1.0,1.0,9.0


In [648]:
view2_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 41834 entries, 0 to 41833
Data columns (total 7 columns):
LoanId                41834 non-null int64
score                 41834 non-null int64
IsFirstDefault        3915 non-null object
ApprovedLoanAmount    41834 non-null float64
LenderApproved        41834 non-null float64
IsFunded              3988 non-null float64
Decile_rank           41834 non-null int64
dtypes: float64(3), int64(3), object(1)
memory usage: 2.6+ MB


In [649]:
view2_df=view2_df.drop_duplicates()

### Calculate Loans Approved, First Defaults and Total Loans per Decile

In [650]:
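# Rows with no missing values, i.e. loans that have both an IsFirstDefault and an IsFunded record
dropnaview2 = view2_df.dropna()
sums1 = []         # first defaults per decile
sums = []          # lender-approved applications per decile
counts = []        # all applications per decile
fundedcounts = []  # funded loans per decile
for x in range(0,10):
    sums.append(view2_df[view2_df["Decile_rank"]==x].sum()["LenderApproved"])
    counts.append(view2_df[view2_df["Decile_rank"]==x].count()["LenderApproved"])
    sums1.append(dropnaview2[dropnaview2["Decile_rank"]==x].sum()["IsFirstDefault"])
    fundedcounts.append(dropnaview2[dropnaview2["Decile_rank"]==x].sum()["IsFunded"])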

In [651]:
deciledf = pd.DataFrame(list(zip(range(0,10),sums,sums1,counts,fundedcounts)), columns =['Decile','LenderApprovedCount','IsFirstDefaultCount','TotalLoansCount','TotalFunded']) 
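
The same per-decile table could also be built with a single `groupby`. This is a sketch on made-up rows, not the method used above, and it differs in one respect: `sum` skips missing values column by column, whereas the loop first drops any row with a missing value, so the two can give slightly different funded and default counts.

```python
import pandas as pd

# Made-up rows with the same columns as view2_df (two "deciles" for brevity).
toy = pd.DataFrame({
    "Decile_rank":    [0, 0, 0, 1, 1],
    "LenderApproved": [1.0, 0.0, 1.0, 1.0, 0.0],
    "IsFirstDefault": [1, None, 0, 0, None],
    "IsFunded":       [1.0, None, 1.0, 1.0, None],
})

# Named aggregation: output column = (input column, aggregation function).
per_decile = (
    toy.groupby("Decile_rank")
       .agg(LenderApprovedCount=("LenderApproved", "sum"),
            IsFirstDefaultCount=("IsFirstDefault", "sum"),
            TotalLoansCount=("LenderApproved", "count"),
            TotalFunded=("IsFunded", "sum"))
       .reset_index()
       .rename(columns={"Decile_rank": "Decile"})
)
print(per_decile)
```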

### Calculate Percentage of Defaulters per Decile

In [652]:
avg = []
for x in range(0,10):
    avg.append((deciledf['IsFirstDefaultCount'][x]/deciledf['TotalLoansCount'][x])*100)

In [653]:
deciledf["PercentDefault"]=avg

In [654]:
deciledf

Unnamed: 0,Decile,LenderApprovedCount,IsFirstDefaultCount,TotalLoansCount,TotalFunded,PercentDefault
0,0,604.0,418.0,4202,499.0,9.947644
1,1,236.0,11.0,4504,90.0,0.244227
2,2,265.0,14.0,4127,145.0,0.339229
3,3,362.0,9.0,4308,250.0,0.208914
4,4,516.0,13.0,4682,391.0,0.277659
5,5,496.0,27.0,3940,378.0,0.685279
6,6,702.0,28.0,4907,586.0,0.570613
7,7,924.0,34.0,5730,768.0,0.593368
8,8,451.0,17.0,2797,387.0,0.607794
9,9,491.0,9.0,2637,421.0,0.341297


### Calculate Cumulative Fraction of Defaulters per Decile

In [655]:
templist = []
running_total = 0  # avoid shadowing the built-in sum()
for x in range(0,10):
    running_total += (deciledf["IsFirstDefaultCount"][x]/deciledf["IsFirstDefaultCount"].sum())*100
    templist.append(running_total)

In [676]:
deciledf["CumulativePercent"]= templist
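
The cumulative percentage can also be expressed with `cumsum`. A short sketch using the first-default counts from the `deciledf` output above:

```python
import pandas as pd

# First-default counts per decile, copied from the deciledf output above.
first_defaults = pd.Series([418, 11, 14, 9, 13, 27, 28, 34, 17, 9])

# Running total of defaults as a percentage of all first defaults.
cumulative_percent = first_defaults.cumsum() / first_defaults.sum() * 100
print(cumulative_percent.round(1).tolist())
```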

In [677]:
deciledf["LenderPercent"]=(deciledf["LenderApprovedCount"]/deciledf["TotalLoansCount"])*100

In [678]:
deciledf["FundedPercent"]=(deciledf["TotalFunded"]/deciledf["TotalLoansCount"])*100

In [679]:
testdeciledf=sorted(set(pd.qcut(view2_df['score'], 10,retbins= True)[0]))

In [680]:
randomlist = [0,1,2,3,4,5,6,7,8,9]

In [681]:
testdfdecile = pd.DataFrame(data = randomlist, columns = ["Decile"])

In [682]:
testdfdecile["Bins"] = testdeciledf
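
The score range for each bin can also be read directly from the categories of the `qcut` result, which are already ordered from lowest to highest. A minimal sketch on made-up scores, with 5 bins instead of 10:

```python
import pandas as pd

# Made-up scores for illustration only.
scores = pd.Series([100, 200, 250, 260, 270, 280, 290, 291, 295, 299])

# Without labels=False, qcut returns a categorical Series of Interval bins.
binned = pd.qcut(scores, 5)
intervals = binned.cat.categories  # ordered IntervalIndex, one entry per bin

bins = pd.DataFrame({
    "Decile": list(range(len(intervals))),
    "Bins": intervals,
})
print(bins)
```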

## Preparing the final table

In [683]:
finaldf = pd.merge(deciledf, testdfdecile, how ='left', on ='Decile') 

In [684]:
cols = list(finaldf.columns.values)

In [685]:
cols = ['Bins','Decile','TotalLoansCount','LenderApprovedCount','LenderPercent','TotalFunded','FundedPercent','IsFirstDefaultCount','PercentDefault','CumulativePercent']

In [686]:
finaldf=finaldf.reindex(columns=cols)
finaldf.columns = ['Score Range', 'Decile','NumberOfApps',"AppsApproved","FractionAppsApproved","AppsOriginated","FractionOriginated","AppsFirstDefault","FractionDefault","CumulativeDefaultFraction"]

In [687]:
finaldf.round(1)

Unnamed: 0,Score Range,Decile,NumberOfApps,AppsApproved,FractionAppsApproved,AppsOriginated,FractionOriginated,AppsFirstDefault,FractionDefault,CumulativeDefaultFraction
0,"(107.999, 262.0]",0,4202,604.0,14.4,499.0,11.9,418.0,9.9,72.1
1,"(262.0, 277.0]",1,4504,236.0,5.2,90.0,2.0,11.0,0.2,74.0
2,"(277.0, 283.0]",2,4127,265.0,6.4,145.0,3.5,14.0,0.3,76.4
3,"(283.0, 287.0]",3,4308,362.0,8.4,250.0,5.8,9.0,0.2,77.9
4,"(287.0, 290.0]",4,4682,516.0,11.0,391.0,8.4,13.0,0.3,80.2
5,"(290.0, 292.0]",5,3940,496.0,12.6,378.0,9.6,27.0,0.7,84.8
6,"(292.0, 294.0]",6,4907,702.0,14.3,586.0,11.9,28.0,0.6,89.7
7,"(294.0, 296.0]",7,5730,924.0,16.1,768.0,13.4,34.0,0.6,95.5
8,"(296.0, 297.0]",8,2797,451.0,16.1,387.0,13.8,17.0,0.6,98.4
9,"(297.0, 299.0]",9,2637,491.0,18.6,421.0,16.0,9.0,0.3,100.0


In [668]:
print("Totals:")
finaldf[["NumberOfApps","AppsApproved","AppsOriginated","AppsFirstDefault"]].sum()

Totals:


NumberOfApps        41834.0
AppsApproved         5047.0
AppsOriginated       3915.0
AppsFirstDefault      580.0
dtype: float64

## Key Takeaways

1. Loan applicants in the lowest decile (BL score of 262 or below) are much more likely to default on their first payment: they account for 72.1% of all first-payment defaults.

2. Conversely, applicants with a score above 262 are quite unlikely to default on their first payment.

Keeping the above two points in mind, we can say that the Bank Login score and the quality of an application are <b>strongly correlated</b>.

Another interesting thing to note:

3. About 90.0% of loan applicants (37,632 of 41,834) have a BL score above 262, yet they account for only 27.9% of all first-payment defaults (162 of 580).
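
The shares behind these takeaways can be recomputed from the per-decile counts in the final table. A short sketch in plain Python, with the counts copied from that table:

```python
# Per-decile counts copied from the final table (decile 0 first).
apps = [4202, 4504, 4127, 4308, 4682, 3940, 4907, 5730, 2797, 2637]
first_defaults = [418, 11, 14, 9, 13, 27, 28, 34, 17, 9]

# Share of applications that sit above the lowest decile.
share_of_apps_above = sum(apps[1:]) / sum(apps) * 100

# Share of all first-payment defaults in the lowest decile vs. the rest.
share_of_defaults_lowest = first_defaults[0] / sum(first_defaults) * 100
share_of_defaults_above = sum(first_defaults[1:]) / sum(first_defaults) * 100

print(f"applications above lowest decile: {share_of_apps_above:.1f}%")
print(f"defaults in lowest decile:        {share_of_defaults_lowest:.1f}%")
print(f"defaults above lowest decile:     {share_of_defaults_above:.1f}%")
```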

### Opportunities Presented

1. Applications in deciles 1 to 5 (BL scores above 262 and up to 292) are funded less often (2.0% to 9.6% of applications) than those in the lowest decile (11.9%). Funding more loans in this bracket should improve the overall first-payment default rate.

2. Integrating the BL score should expedite the approvals process, increasing the number of loans funded in the higher deciles.
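
To see which deciles are funded less often than the lowest one, the funding rates can be compared directly. The sketch below uses counts from the final table; treating `AppsOriginated` as the denominator for a default rate is an assumption added here for illustration, since the table itself reports defaults as a share of all applications.

```python
# Per-decile counts copied from the final table (decile 0 first).
apps = [4202, 4504, 4127, 4308, 4682, 3940, 4907, 5730, 2797, 2637]
originated = [499, 90, 145, 250, 391, 378, 586, 768, 387, 421]
first_defaults = [418, 11, 14, 9, 13, 27, 28, 34, 17, 9]

# Funded loans as a percentage of applications in each decile.
funded_rate = [o / a * 100 for o, a in zip(originated, apps)]

# Deciles whose funding rate is below that of the lowest decile.
lower_funded = [d for d in range(1, 10) if funded_rate[d] < funded_rate[0]]

# Assumption: first defaults measured against originated loans, not applications.
default_rate_lowest = first_defaults[0] / originated[0] * 100
default_rate_rest = sum(first_defaults[1:]) / sum(originated[1:]) * 100

print("deciles funded less often than decile 0:", lower_funded)
print(f"defaults per originated loan, decile 0:    {default_rate_lowest:.1f}%")
print(f"defaults per originated loan, deciles 1-9: {default_rate_rest:.1f}%")
```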

### Caveats

1. Because the distribution of loans skews heavily towards BL scores above 262, a new question arises: how many of these loans were in the A1/A2 categories of the BV+ score?

It could be that most of these loans fall into the A1/A2 categories, where the BL score wouldn't apply at all. In that case, we'll need to rethink how to use this score.