#### This notebook attempts to evaluate the merit of using the Bank Login score to identify "good" / "bad" customers.

Because the Bank Login (BL) score already accounts for many factors (~6k attributes!), we use the "IsFirstDefault" field to judge how well it correlates with loan outcomes.

In [375]:
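import pymysql   # MySQL driver; not called directly in this notebook
import query     # project-local helper; query.iloans(sql) returns a DataFrame
import pandas as pd
import seaborn as sns
import sidetable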

In [384]:
df = pd.read_csv("/home/vishal/ftp_files_csv/bt_attr.csv",usecols=["setting_app_id","blp_score"])

In [387]:
df.shape

(42484, 2)

In [None]:
scoredf=df.rename(columns={"setting_app_id": "LoanId", "blp_score": "score"})

### Query the DB in the date range '2019-01-01' - '2020-04-30'

In [338]:
dbquerydf = query.iloans("SELECT * FROM view_FCL_Loan WHERE OriginationDate between '2019-01-01' AND '2020-04-30'")
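If `OriginationDate` is a DATETIME column (the notebook does not show its type), `BETWEEN '2019-01-01' AND '2020-04-30'` compares against `2020-04-30 00:00:00` and so drops loans originated later that day. A half-open range (`>= '2019-01-01' AND < '2020-05-01'`) avoids this. The sketch below reproduces the same boundary behaviour in pandas on three made-up rows; `filter_half_open` is a hypothetical helper, not part of the `query` module.

```python
import pandas as pd


def filter_half_open(df: pd.DataFrame, col: str, start: str, end_exclusive: str) -> pd.DataFrame:
    """Keep rows where start <= df[col] < end_exclusive (hypothetical helper)."""
    ts = pd.to_datetime(df[col])
    return df[(ts >= pd.Timestamp(start)) & (ts < pd.Timestamp(end_exclusive))]


# Made-up rows sitting on the edges of the date range
loans = pd.DataFrame({
    "LoanId": [1, 2, 3],
    "OriginationDate": ["2019-01-01 00:00:00", "2020-04-30 15:30:00", "2020-05-01 00:00:00"],
})

# Inclusive range with a date-only upper bound: loan 2 (afternoon of 30 April) is dropped
ts = pd.to_datetime(loans["OriginationDate"])
inclusive = loans[ts.between(pd.Timestamp("2019-01-01"), pd.Timestamp("2020-04-30"))]

# Half-open range: all of 30 April is kept, 1 May is excluded
half_open = filter_half_open(loans, "OriginationDate", "2019-01-01", "2020-05-01")

print(inclusive["LoanId"].tolist())  # [1]
print(half_open["LoanId"].tolist())  # [1, 2]
```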

In [None]:
dbquerydf[["LoanId","IsFirstDefault","OriginationDate"]].info()

In [340]:
dbdf=dbquerydf[["LoanId","IsFirstDefault"]]

In [389]:
dbdf.shape

(16507, 2)

In [342]:
view_df = pd.merge(scoredf, dbdf, how ='left', on ='LoanId') 
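A left merge silently keeps score rows with no match in `view_FCL_Loan`; their "IsFirstDefault" becomes NaN. One way to see how many scored applications were matched is pandas' `indicator=True` option, and `validate="many_to_one"` raises an error if `LoanId` repeats on the right-hand side. The frames below are small made-up stand-ins for `scoredf` and `dbdf`.

```python
import pandas as pd

# Made-up stand-ins for scoredf and dbdf
scores = pd.DataFrame({"LoanId": [101, 102, 103, 104], "score": [250, 280, 295, 299]})
outcomes = pd.DataFrame({"LoanId": [101, 103], "IsFirstDefault": [True, False]})

merged = pd.merge(
    scores,
    outcomes,
    how="left",
    on="LoanId",
    indicator=True,          # adds a "_merge" column: "both" or "left_only"
    validate="many_to_one",  # raises MergeError if a LoanId repeats in `outcomes`
)
match_counts = merged["_merge"].value_counts()
print(match_counts["both"], match_counts["left_only"])  # 2 2
```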

In [None]:
view_df.info()

In [None]:
view_df.head()

### Get Lender Approved Loans Data

In [345]:
query_lender_approved = '''
SELECT LoanId,
       LoanPrincipal AS ApprovedLoanAmount,
       LoanStatus AS LenderApproved
FROM view_FCL_Loan_History
WHERE LoanStatus = 'Lender Approved'
  AND TimeAdded >= '2019-01-01'
  -- if TimeAdded is a DATETIME, this bound stops at 2020-04-30 00:00:00
  AND TimeAdded <= '2020-04-30'
ORDER BY LoanId, TimeAdded DESC
'''

In [346]:
dblender_approved_df = query.iloans(query_lender_approved)

In [391]:
dblender_approved_df.shape

(80407, 3)
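The history query returns 80,407 rows, against 16,507 loans in `dbdf`, which suggests some loans have more than one "Lender Approved" row. A left merge then repeats that loan's score row once per approval, and a later `drop_duplicates()` only removes rows that are identical in every column, so two approvals with different `ApprovedLoanAmount` values both survive. A minimal sketch of keeping one row per loan before merging follows. It assumes `TimeAdded` is added to the `SELECT` list, which the query above does not currently do, and it uses made-up rows.

```python
import pandas as pd

# Made-up approval history; loan 1 was approved twice with different amounts
history = pd.DataFrame({
    "LoanId": [1, 1, 2],
    "ApprovedLoanAmount": [500.0, 800.0, 300.0],
    "LenderApproved": ["Lender Approved"] * 3,
    "TimeAdded": pd.to_datetime(["2019-03-01", "2019-03-05", "2019-04-02"]),  # assumed extra column
})

# Sort newest-first within each loan, then keep the first (most recent) row per LoanId
latest = (
    history.sort_values(["LoanId", "TimeAdded"], ascending=[True, False])
    .drop_duplicates(subset="LoanId", keep="first")
    .reset_index(drop=True)
)
print(latest["LoanId"].tolist(), latest["ApprovedLoanAmount"].tolist())
# [1, 2] [800.0, 300.0]
```

Whether the latest or the earliest approval is the right one to keep is a business decision; swapping `ascending=[True, False]` for `[True, True]` keeps the earliest instead.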

### Bin Data by Deciles and Remap LenderApproved to 0s/1s

In [414]:
view2_df = pd.merge(view_df, dblender_approved_df, how='left', on='LoanId')
view2_df["IsFirstDefault"] *= 1  # bool -> 0/1; loans missing from view_FCL_Loan stay NaN
view2_df['Decile_rank'] = pd.qcut(view2_df['score'], 10, labels=False)  # 0 = lowest scores, 9 = highest
view2_df['ApprovedLoanAmount'] = view2_df['ApprovedLoanAmount'].fillna(0)
# Rows matched in the approval history become 1; unmatched rows (NaN after the left merge) become 0
view2_df["LenderApproved"] = view2_df["LenderApproved"].replace(to_replace="Lender Approved", value=1)
view2_df["LenderApproved"] = view2_df["LenderApproved"].fillna(0)
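Here the decile ranks are computed before `drop_duplicates()`, while the interval labels for the final table come from a second `pd.qcut` call made after it. A single `pd.qcut` call can supply both: the categorical codes give the 0-based rank and the categories give the matching score intervals. `duplicates="drop"` is a `qcut` option that merges bins instead of raising an error when heavily tied scores produce repeated edges. This sketch uses quartiles and made-up scores to keep the output short.

```python
import pandas as pd

# Made-up scores; q=4 (quartiles) instead of 10 to keep the example small
scores = pd.Series([250, 262, 270, 281, 285, 289, 291, 293, 295, 297, 298, 299])

binned, edges = pd.qcut(scores, q=4, retbins=True, duplicates="drop")
ranks = binned.cat.codes           # 0 = lowest-score bin, same as labels=False
intervals = binned.cat.categories  # one right-closed Interval per rank

# Rank -> score-interval lookup, analogous to the notebook's Decile/Bins table
bin_table = pd.DataFrame({"Rank": range(len(intervals)), "Bins": intervals.astype(str)})
print(ranks.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
print(bin_table)
```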

In [None]:
view2_df.head()

In [None]:
view2_df.describe()

In [None]:
view2_df.info()

In [418]:
view2_df=view2_df.drop_duplicates()

### Calculate Loans Approved, First Defaults, and Total Loans per Decile

In [419]:
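dropnaview2 = view2_df.dropna()
sums1 = []   # first defaults per decile
sums = []    # lender-approved applications per decile
counts = []  # all applications per decile
for x in range(10):
    in_decile = view2_df["Decile_rank"] == x
    sums.append(view2_df.loc[in_decile, "LenderApproved"].sum())
    # LenderApproved has no NaNs after fillna(0), so this counts every row in the decile
    counts.append(view2_df.loc[in_decile, "LenderApproved"].count())
    sums1.append(dropnaview2.loc[dropnaview2["Decile_rank"] == x, "IsFirstDefault"].sum())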
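The per-decile loop can also be written as one `groupby` with named aggregation, which builds the decile table directly and skips missing "IsFirstDefault" values when summing. The rows below are made up; on the real data one would pass `view2_df` in place of `rows`.

```python
import pandas as pd

# Made-up rows standing in for view2_df; None/NaN = loan not found in view_FCL_Loan
rows = pd.DataFrame({
    "Decile_rank": [0, 0, 0, 1, 1, 1],
    "LenderApproved": [1, 0, 0, 1, 1, 0],
    "IsFirstDefault": [1, 0, None, 0, 0, None],
})

deciles = (
    rows.groupby("Decile_rank")
    .agg(
        LenderApprovedCount=("LenderApproved", "sum"),
        IsFirstDefaultCount=("IsFirstDefault", "sum"),  # NaN values are skipped
        TotalLoansCount=("LenderApproved", "count"),    # no NaNs here, so = rows per decile
    )
    .reset_index()
    .rename(columns={"Decile_rank": "Decile"})
)
deciles["PercentDefault"] = deciles["IsFirstDefaultCount"] / deciles["TotalLoansCount"] * 100
deciles["LenderPercent"] = deciles["LenderApprovedCount"] / deciles["TotalLoansCount"] * 100
print(deciles)
```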

In [420]:
deciledf = pd.DataFrame(list(zip(range(0,10),sums,sums1,counts)), columns =['Decile','LenderApprovedCount','IsFirstDefaultCount','TotalLoansCount']) 

### Calculate Percentage of Defaulters per Decile

In [421]:
# Share of each decile's applications that had a first-payment default, in percent
avg = (deciledf['IsFirstDefaultCount'] / deciledf['TotalLoansCount'] * 100).tolist()

In [422]:
deciledf["PercentDefault"]=avg

### Calculate Cumulative Percentage of First Defaults by Decile

In [423]:
# Running share of all first defaults captured up to and including each decile, in percent
first_defaults = deciledf["IsFirstDefaultCount"]
templist = (first_defaults.cumsum() / first_defaults.sum() * 100).tolist()

In [424]:
deciledf["CumulativePercent"]= templist

In [425]:
deciledf["LenderPercent"]=(deciledf["LenderApprovedCount"]/deciledf["TotalLoansCount"])*100

In [426]:
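# Score interval for each decile, ordered from lowest to highest.
# These edges are recomputed after drop_duplicates(), so they can differ slightly from the Decile_rank bins.
testdeciledf = list(pd.qcut(view2_df['score'], 10).cat.categories)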

In [427]:
decile_ids = list(range(10))  # decile indices 0-9

In [428]:
testdfdecile = pd.DataFrame(data=decile_ids, columns=["Decile"])

In [429]:
testdfdecile["Bins"] = testdeciledf

## Preparing the final table

In [430]:
finaldf = pd.merge(deciledf, testdfdecile, how ='left', on ='Decile') 

In [432]:
cols = ['Bins','Decile','TotalLoansCount','LenderApprovedCount','LenderPercent','IsFirstDefaultCount','PercentDefault','CumulativePercent']

In [433]:
finaldf=finaldf.reindex(columns=cols)
finaldf.columns = ['Score Range', 'Decile','NumberOfApps',"AppsApproved","FractionAppsApproved","AppsFirstDefault","FractionDefault","CumulativeDefaultFraction"]

In [434]:
finaldf.round(1)

| Score Range | Decile | NumberOfApps | AppsApproved | FractionAppsApproved (%) | AppsFirstDefault | FractionDefault (%) | CumulativeDefaultFraction (%) |
|---|---|---|---|---|---|---|---|
| (107.999, 262.0] | 0 | 3897 | 591 | 15.2 | 418 | 10.7 | 72.1 |
| (262.0, 277.0] | 1 | 5379 | 289 | 5.4 | 13 | 0.2 | 74.3 |
| (277.0, 283.0] | 2 | 4456 | 298 | 6.7 | 13 | 0.3 | 76.6 |
| (283.0, 287.0] | 3 | 4821 | 444 | 9.2 | 11 | 0.2 | 78.4 |
| (287.0, 290.0] | 4 | 5087 | 577 | 11.3 | 22 | 0.4 | 82.2 |
| (290.0, 292.0] | 5 | 4467 | 599 | 13.4 | 28 | 0.6 | 87.1 |
| (292.0, 294.0] | 6 | 2563 | 383 | 14.9 | 15 | 0.6 | 89.7 |
| (294.0, 296.0] | 7 | 5730 | 924 | 16.1 | 34 | 0.6 | 95.5 |
| (296.0, 297.0] | 8 | 2797 | 451 | 16.1 | 17 | 0.6 | 98.4 |
| (297.0, 299.0] | 9 | 2637 | 491 | 18.6 | 9 | 0.3 | 100.0 |


## Key Takeaways

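1. Loan applicants with a BL score of 262 or below (decile 0) are much more likely to default on their first payment: 10.7% of them do, compared with at most 0.6% in every other decile, and they account for 72.1% of all first-payment defaults.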

2. Conversely, applicants with a score above 262 rarely default on their first payment.

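Taken together, these two points show that the Bank Login score and application quality are <b>strongly correlated</b>. The relationship is mostly a threshold effect, though: above 262 the first-default rate stays between 0.2% and 0.6% and does not keep falling as the score rises.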

Another interesting thing to note:

3. About 90.7% of loan applicants (37,937 of 41,834) have a BL score above 262, yet they account for only 27.9% (162 of 580) of all first-payment defaults.
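The shares in point 3 follow directly from the per-decile counts in the decile table: decile 0 covers scores up to 262, so deciles 1-9 are the applicants above that cut-off. A short check using only numbers copied from that table:

```python
import pandas as pd

# Per-decile counts copied from the decile table (deciles 0-9)
table = pd.DataFrame({
    "NumberOfApps": [3897, 5379, 4456, 4821, 5087, 4467, 2563, 5730, 2797, 2637],
    "AppsFirstDefault": [418, 13, 13, 11, 22, 28, 15, 34, 17, 9],
})

above = table.iloc[1:]  # deciles 1-9, i.e. scores above 262
share_apps = 100 * above["NumberOfApps"].sum() / table["NumberOfApps"].sum()
share_defaults = 100 * above["AppsFirstDefault"].sum() / table["AppsFirstDefault"].sum()
rate_above = 100 * above["AppsFirstDefault"].sum() / above["NumberOfApps"].sum()

print(f"{share_apps:.1f}% of apps, {share_defaults:.1f}% of first defaults")
# 90.7% of apps, 27.9% of first defaults
print(f"first-default rate above 262: {rate_above:.2f}%")
# first-default rate above 262: 0.43%
```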

### Opportunities Presented

1. Applications in deciles 1-6 (scores above 262 and up to 294) are approved less often (5.4%-14.9%) than those in decile 0 (15.2%), even though their first-payment default rates are far lower. Funding more loans in this bracket should lower the first-payment default rate.

2. Integrating the BL score into the approvals process could speed up approvals and increase the number of loans funded in the higher deciles.

### Caveats

1. Because the distribution of loans skews heavily toward BL scores above 262, a new question arises: "How many of these were in the A1/A2 categories of the BV+ score?"

Most of these loans may fall into the A1/A2 categories, in which case the BL score would not apply to them at all and we would need to rethink how to use it.