# Applying for a Loan

## Goal

Another area where data science and machine learning play a huge role is deciding whether to grant a loan. This is a particularly hot field, as many start-ups believe that bank loan models can be improved. There is therefore room to come up with better lending strategies that benefit both the lender and the borrower.

In this challenge, you will have access to loan data from a bank and will have to improve their model.


## Challenge Description

We have access to loan data from a specific bank: every loan requested from the bank, whether the bank decided to grant it and, finally, whether the borrower managed to repay it. We also have information about the applicant at the moment she applies for the loan.

You have to come up with a better strategy to grant loans. Specifically you should:

- Build a model which is better than the bank model. Assume that:
  - If you grant the loan and it doesn’t get repaid, you lose 1.
  - If you grant the loan and it does get repaid, you gain 1.
  - If you don’t grant the loan, you gain 0.

- Using the rules above, compare bank profitability vs your model profitability.


- Describe the impact of the most important variables on the prediction. Also, focus on the variable “is_employed”, which describes whether the borrower is employed when she asks for the loan. How does this variable impact the model? Explain why.


- Are there any other variables you’d like to include in the model?
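
The scoring rules above can be written as a small helper. This is a minimal sketch, not the notebook's own method: the function name `total_profit` and the toy inputs are hypothetical, and it assumes `granted` and `repaid` are aligned 0/1 sequences with `NaN` in `repaid` where the outcome is unknown.

```python
import numpy as np


def total_profit(granted, repaid):
    """Profit under the challenge rules: +1 for a granted loan that is repaid,
    -1 for a granted loan that is not repaid, 0 for a loan that is not granted."""
    granted = np.asarray(granted, dtype=bool)
    repaid = np.asarray(repaid, dtype=float)
    # A granted loan with an unknown outcome cannot be scored.
    if np.isnan(repaid[granted]).any():
        raise ValueError("repayment outcome unknown for a granted loan")
    payoff = np.where(repaid == 1, 1, -1)
    # Only granted loans contribute; loans that were not granted add 0.
    return int(payoff[granted].sum())


# Hypothetical toy example: three granted loans (two repaid, one not), one denied.
example_profit = total_profit([1, 1, 0, 1], [1.0, 0.0, np.nan, 1.0])
```

The same helper can score either the bank's decisions or a model's decisions, as long as both are evaluated on loans whose outcome is known.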


In [3]:
ls

12_Applying_for_a_loan.ipynb  loan_table.csv
borrower_table.csv


In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [5]:
df_loan = pd.read_csv('loan_table.csv')

print(df_loan.shape)
df_loan.head(10)

(101100, 5)


|  | loan_id | loan_purpose | date | loan_granted | loan_repaid |
|---|---|---|---|---|---|
| 0 | 19454 | investment | 2012-03-15 | 0 | NaN |
| 1 | 496811 | investment | 2012-01-17 | 0 | NaN |
| 2 | 929493 | other | 2012-02-09 | 0 | NaN |
| 3 | 580653 | other | 2012-06-27 | 1 | 1.0 |
| 4 | 172419 | business | 2012-05-21 | 1 | 0.0 |
| 5 | 77085 | other | 2012-08-31 | 0 | NaN |
| 6 | 780070 | business | 2012-03-14 | 1 | 1.0 |
| 7 | 303138 | emergency_funds | 2012-08-31 | 1 | 0.0 |
| 8 | 91475 | investment | 2012-05-25 | 1 | 1.0 |
| 9 | 422392 | business | 2012-10-25 | 0 | NaN |


- loan_id : the id of the loan. Unique by loan. Can be joined to loan_id in the other table
- loan_purpose : the reason for requesting the loan: investment, other, business, emergency_funds, home
- date : when the loan was requested
- loan_granted : whether the loan was granted
- loan_repaid : whether the loan was repaid. NA means that the loan was not granted
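
The statement that a missing `loan_repaid` means the loan was not granted can be checked directly. The sketch below uses the `loan_granted` and `loan_repaid` values from the first five rows displayed above; the variable names are illustrative, and on the full data the same comparison would be run on `df_loan`.

```python
import numpy as np
import pandas as pd

# loan_granted / loan_repaid values from the first five displayed rows.
loans = pd.DataFrame({
    "loan_granted": [0, 0, 0, 1, 1],
    "loan_repaid": [np.nan, np.nan, np.nan, 1.0, 0.0],
})

missing_outcome = loans["loan_repaid"].isna()
not_granted = loans["loan_granted"] == 0

# True only if the outcome is missing exactly when the loan was not granted.
consistent = bool((missing_outcome == not_granted).all())
```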

In [6]:
df_borrow = pd.read_csv('borrower_table.csv')

print(df_borrow.shape)
df_borrow.head(10)

(101100, 12)


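|  | loan_id | is_first_loan | fully_repaid_previous_loans | currently_repaying_other_loans | total_credit_card_limit | avg_percentage_credit_card_limit_used_last_year | saving_amount | checking_amount | is_employed | yearly_salary | age | dependent_number |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 289774 | 1 | NaN | NaN | 8000 | 0.49 | 3285 | 1073 | 0 | 0 | 47 | 3 |
| 1 | 482590 | 0 | 1.0 | 0.0 | 4500 | 1.03 | 636 | 5299 | 1 | 13500 | 33 | 1 |
| 2 | 135565 | 1 | NaN | NaN | 6900 | 0.82 | 2085 | 3422 | 1 | 24500 | 38 | 8 |
| 3 | 207797 | 0 | 1.0 | 0.0 | 1200 | 0.82 | 358 | 3388 | 0 | 0 | 24 | 1 |
| 4 | 828078 | 0 | 0.0 | 0.0 | 6900 | 0.8 | 2138 | 4282 | 1 | 18100 | 36 | 1 |
| 5 | 423171 | 1 | NaN | NaN | 6100 | 0.53 | 6163 | 5298 | 1 | 29500 | 24 | 1 |
| 6 | 568977 | 1 | NaN | NaN | 600 | 0.89 | 305 | 1456 | 0 | 0 | 50 | 2 |
| 7 | 200139 | 1 | NaN | NaN | 4000 | 0.57 | 602 | 2757 | 1 | 31700 | 36 | 8 |
| 8 | 991294 | 0 | 1.0 | 0.0 | 7000 | 0.52 | 2575 | 2917 | 1 | 58900 | 33 | 3 |
| 9 | 875332 | 0 | 1.0 | 0.0 | 4300 | 0.83 | 722 | 892 | 1 | 5400 | 32 | 7 |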


- loan_id : the id of the loan. Unique by loan. Can be joined to loan_id in the other table
- is_first_loan : whether this is her first loan (1) or she has asked for other loans before (0)
- fully_repaid_previous_loans : did she pay on time all of her previous loans? If this is the first loan, it is NA
- currently_repaying_other_loans : is she currently repaying any other loans? If this is the first loan, it is NA
- total_credit_card_limit : total credit card monthly limit
- avg_percentage_credit_card_limit_used_last_year : on average, how much of her credit card limit she used in the previous 12 months. This number can be >1 since it is possible to go above the credit card limit
- saving_amount : total saving amount balance when she asked for the loan
- checking_amount : total checking amount balance when she asked for the loan
- is_employed : whether she is employed (1) or not (0)
- yearly_salary : how much she earned in the previous year
- age : her age
- dependent_number : number of people she claims as dependent
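
Since both tables share `loan_id` and it is unique per loan, they can be combined with a one-to-one join. The following is a minimal sketch on hypothetical toy rows (the ids and values are made up for illustration); in the notebook the same call would be applied to `df_loan` and `df_borrow`.

```python
import pandas as pd

# Hypothetical toy rows standing in for the loan and borrower tables.
loans = pd.DataFrame({
    "loan_id": [101, 102, 103],
    "loan_granted": [1, 0, 1],
    "loan_repaid": [1.0, None, 0.0],
})
borrowers = pd.DataFrame({
    "loan_id": [103, 101, 102],
    "is_employed": [0, 1, 1],
    "yearly_salary": [0, 24500, 13500],
})

# validate="one_to_one" raises if loan_id is duplicated on either side.
merged = loans.merge(borrowers, on="loan_id", how="inner", validate="one_to_one")
```

An inner join keeps only loans present in both tables; comparing `len(merged)` with the original row counts is a quick way to confirm that nothing was dropped.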

In [7]:
df_loan.dtypes

loan_id           int64
loan_purpose     object
date             object
loan_granted      int64
loan_repaid     float64
dtype: object

In [8]:
df_loan.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101100 entries, 0 to 101099
Data columns (total 5 columns):
loan_id         101100 non-null int64
loan_purpose    101100 non-null object
date            101100 non-null object
loan_granted    101100 non-null int64
loan_repaid     47654 non-null float64
dtypes: float64(1), int64(2), object(2)
memory usage: 3.9+ MB


## Question I

Using the rules above, compare bank profitability vs your model profitability.

In [12]:
# Keep only loans with a known repayment outcome (i.e. loans the bank granted)
df_test = df_loan.loc[df_loan.loan_repaid.notnull(), ['loan_granted', 'loan_repaid']]
df_test.head()

|  | loan_granted | loan_repaid |
|---|---|---|
| 3 | 1 | 1.0 |
| 4 | 1 | 0.0 |
| 6 | 1 | 1.0 |
| 7 | 1 | 0.0 |
| 8 | 1 | 1.0 |


In [21]:
df_test.loan_granted.value_counts()

1    47654
Name: loan_granted, dtype: int64

In [22]:
df_test.loan_repaid.value_counts()

1.0    30706
0.0    16948
Name: loan_repaid, dtype: int64

In [24]:
# Bank profit under the rules: +1 per repaid loan, -1 per loan that was not repaid
len(df_test[df_test.loan_repaid == 1]) - len(df_test[df_test.loan_repaid == 0])

13758
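
So the bank's profit on the 47,654 loans it granted is 30,706 - 16,948 = 13,758. To compare a model against this figure, note that if a model predicts a repayment probability p, the expected gain from granting is p·(+1) + (1 - p)·(-1) = 2p - 1, which is positive only when p > 0.5. The sketch below applies that rule to hypothetical probabilities and outcomes; the arrays are made up for illustration and do not come from any fitted model.

```python
import numpy as np

# Hypothetical predicted repayment probabilities and known outcomes
# for five loans that the bank granted.
p_repaid = np.array([0.9, 0.4, 0.7, 0.2, 0.55])
repaid = np.array([1, 0, 0, 0, 1])

# Expected gain of granting is 2p - 1, so grant only when p > 0.5.
expected_gain = 2 * p_repaid - 1
grant = expected_gain > 0

# Score both strategies with the challenge rules on the same loans.
model_profit = int(np.where(repaid[grant] == 1, 1, -1).sum())
bank_profit = int(np.where(repaid == 1, 1, -1).sum())  # bank granted all five
```

A comparison of this kind can only be made on loans the bank granted, because repayment is not observed for loans it denied.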