<h1 style="color:#1a8cff;"><center>Lending Club Case Study</center></h1>

![Lending%20Club.png](attachment:Lending%20Club.png)
### Problem Statement:
When the company receives a loan application, it has to decide whether to approve the loan based on the applicant’s profile. Two types of risk are associated with this decision:
1. If the applicant is likely to repay the loan, then not approving it results in a loss of business for the company.
2. If the applicant is not likely to repay the loan, i.e. is likely to default, then approving it may lead to a financial loss for the company.



![Loan.png](attachment:Loan.png)
### 4 major parts covered in this case study:
1. Data Understanding
2. Data Cleaning (handling missing values, removing redundant columns, etc.)
3. Data Analysis
4. Recommendations

In [20]:
# Import the relevant libraries and suppress warnings
# (note: this hides all warnings, including deprecation notices)
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# 1. Data Understanding

In [21]:
# Read the dataset, check the information about the dataframe and display the first 10 rows
loan_df = pd.read_csv('loan.csv')
loan_df.info()
print("\nNumber of Rows & Columns: ", loan_df.shape)
loan_df.head(10)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39717 entries, 0 to 39716
Columns: 111 entries, id to total_il_high_credit_limit
dtypes: float64(74), int64(13), object(24)
memory usage: 33.6+ MB

Number of Rows & Columns:  (39717, 111)


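| | id | member_id | loan_amnt | funded_amnt | funded_amnt_inv | term | int_rate | installment | grade | sub_grade | ... | num_tl_90g_dpd_24m | num_tl_op_past_12m | pct_tl_nvr_dlq | percent_bc_gt_75 | pub_rec_bankruptcies | tax_liens | tot_hi_cred_lim | total_bal_ex_mort | total_bc_limit | total_il_high_credit_limit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1077501 | 1296599 | 5000 | 5000 | 4975.0 | 36 months | 10.65% | 162.87 | B | B2 | ... | NaN | NaN | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | NaN |
| 1 | 1077430 | 1314167 | 2500 | 2500 | 2500.0 | 60 months | 15.27% | 59.83 | C | C4 | ... | NaN | NaN | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | NaN |
| 2 | 1077175 | 1313524 | 2400 | 2400 | 2400.0 | 36 months | 15.96% | 84.33 | C | C5 | ... | NaN | NaN | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | NaN |
| 3 | 1076863 | 1277178 | 10000 | 10000 | 10000.0 | 36 months | 13.49% | 339.31 | C | C1 | ... | NaN | NaN | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | NaN |
| 4 | 1075358 | 1311748 | 3000 | 3000 | 3000.0 | 60 months | 12.69% | 67.79 | B | B5 | ... | NaN | NaN | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | NaN |
| 5 | 1075269 | 1311441 | 5000 | 5000 | 5000.0 | 36 months | 7.90% | 156.46 | A | A4 | ... | NaN | NaN | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | NaN |
| 6 | 1069639 | 1304742 | 7000 | 7000 | 7000.0 | 60 months | 15.96% | 170.08 | C | C5 | ... | NaN | NaN | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | NaN |
| 7 | 1072053 | 1288686 | 3000 | 3000 | 3000.0 | 36 months | 18.64% | 109.43 | E | E1 | ... | NaN | NaN | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | NaN |
| 8 | 1071795 | 1306957 | 5600 | 5600 | 5600.0 | 60 months | 21.28% | 152.39 | F | F2 | ... | NaN | NaN | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | NaN |
| 9 | 1071570 | 1306721 | 5375 | 5375 | 5350.0 | 60 months | 12.69% | 121.45 | B | B5 | ... | NaN | NaN | NaN | NaN | 0.0 | 0.0 | NaN | NaN | NaN | NaN |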
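
Several of the trailing columns in the preview hold no values in the first ten rows. A minimal sketch of how the share of missing values per column could be measured is shown below. It uses a small hypothetical stand-in frame (`sample_df`) rather than `loan_df`, and the name `missing_pct` is illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for loan_df; column names are taken from the preview,
# but the frame itself is illustrative only.
sample_df = pd.DataFrame(
    {
        "loan_amnt": [5000, 2500, 2400, 10000],
        "tax_liens": [0.0, 0.0, 0.0, 0.0],
        "tot_hi_cred_lim": [np.nan] * 4,
        "total_bc_limit": [np.nan] * 4,
    }
)

# Percentage of missing values in each column, largest first.
missing_pct = sample_df.isna().mean().mul(100).round(2)
print(missing_pct.sort_values(ascending=False))
```

Applied to `loan_df`, the same expression would presumably highlight columns that are entirely empty and therefore candidates for removal during cleaning.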


#### Comment:
The **loan** dataframe contains __111 features__ and __39717 records__.
(Of these, **74** features are *float64*, **13** are *int64*, and **24** are *object* datatype.)
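
The per-dtype breakdown reported by `info()` can also be obtained directly. The sketch below is one possible way to do it, assuming a small hypothetical frame (`sample_df`) in place of `loan_df`; the values are copied from the preview rows, but the frame is not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for loan_df, built from values shown in the preview.
sample_df = pd.DataFrame(
    {
        "loan_amnt": [5000, 2500, 2400],               # integer column
        "funded_amnt_inv": [4975.0, 2500.0, 2400.0],   # float column
        "installment": [162.87, 59.83, 84.33],         # float column
        "term": ["36 months", "60 months", "36 months"],  # string column
        "grade": ["B", "C", "C"],                      # string column
    }
)

# Count how many columns fall under each dtype.
dtype_counts = sample_df.dtypes.astype(str).value_counts()
print(dtype_counts)
```

Running `loan_df.dtypes.value_counts()` on the real dataframe should reproduce the 74 / 13 / 24 split quoted above.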

In [27]:
# Exploring the feature/column names in the dataframe
list(loan_df.columns)

['id',
 'member_id',
 'loan_amnt',
 'funded_amnt',
 'funded_amnt_inv',
 'term',
 'int_rate',
 'installment',
 'grade',
 'sub_grade',
 'emp_title',
 'emp_length',
 'home_ownership',
 'annual_inc',
 'verification_status',
 'issue_d',
 'loan_status',
 'pymnt_plan',
 'url',
 'desc',
 'purpose',
 'title',
 'zip_code',
 'addr_state',
 'dti',
 'delinq_2yrs',
 'earliest_cr_line',
 'inq_last_6mths',
 'mths_since_last_delinq',
 'mths_since_last_record',
 'open_acc',
 'pub_rec',
 'revol_bal',
 'revol_util',
 'total_acc',
 'initial_list_status',
 'out_prncp',
 'out_prncp_inv',
 'total_pymnt',
 'total_pymnt_inv',
 'total_rec_prncp',
 'total_rec_int',
 'total_rec_late_fee',
 'recoveries',
 'collection_recovery_fee',
 'last_pymnt_d',
 'last_pymnt_amnt',
 'next_pymnt_d',
 'last_credit_pull_d',
 'collections_12_mths_ex_med',
 'mths_since_last_major_derog',
 'policy_code',
 'application_type',
 'annual_inc_joint',
 'dti_joint',
 'verification_status_joint',
 'acc_now_delinq',
 'tot_coll_amt',
 'tot_cur_

#### Comment:
A few important features that will help identify whether a loan applicant will be a *Defaulter* or a *Non-Defaulter*:
>**loan_amnt** - The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value.

>**term** - The number of payments on the loan. Values are in months and can be either 36 or 60.

>**int_rate** - Interest Rate on the loan.

>**grade** - LC assigned loan grade.

>**sub_grade** - LC assigned loan subgrade.

>**emp_length** - Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years. 

>**home_ownership** - The home ownership status provided by the borrower during registration.

>**annual_inc** - The self-reported annual income provided by the borrower during registration.

>**verification_status** - Indicates if income was verified by LC, not verified, or if the income source was verified.

>**issue_d** - The month in which the loan was funded.

>**purpose** - A category provided by the borrower for the loan request. 

>**dti** - *Debt-To-Income* is a ratio calculated by dividing the borrower’s total monthly debt payments on total debt obligations (excluding mortgage and the requested LC loan) by the borrower’s self-reported monthly income.
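
The preview shows `term` and `int_rate` stored as text (for example `36 months` and `10.65%`), so they would need converting before any numeric analysis. The following is a minimal sketch of one way to do that, not necessarily the approach taken later in this case study. The frame `sample_df` and the column names `term_months` and `int_rate_pct` are hypothetical.

```python
import pandas as pd

# Sample values copied from the first three rows of the preview.
sample_df = pd.DataFrame(
    {
        "term": ["36 months", "60 months", "36 months"],
        "int_rate": ["10.65%", "15.27%", "15.96%"],
    }
)

# Pull the digits out of "36 months" / "60 months" and store them as integers.
sample_df["term_months"] = (
    sample_df["term"].str.extract(r"(\d+)", expand=False).astype(int)
)

# Strip the trailing percent sign and store the rate as a float.
sample_df["int_rate_pct"] = sample_df["int_rate"].str.rstrip("%").astype(float)

print(sample_df)
```

Using a regular expression for `term` also tolerates stray whitespace around the value, which is why it is preferred here over a fixed string slice.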

<h3>The target feature/column is <span style="color:Green;"><b>loan_status</b></span></h3>

<h3 align="center" style="color: red;">Customer Behaviour Columns</h3><br>
<b>The customer behaviour variables are not available at the time of loan application, and thus they cannot be used as predictors for credit approval.</b> 
Therefore, the following columns can be safely discarded from the analysis:
<span style="color: red;"><br>delinq_2yrs<br>earliest_cr_line<br>inq_last_6mths<br>open_acc<br>pub_rec<br>revol_bal<br>revol_util<br>total_acc
<br>out_prncp<br>out_prncp_inv<br>total_pymnt<br>total_pymnt_inv<br>total_rec_prncp<br>total_rec_int<br>total_rec_late_fee<br>recoveries<br>collection_recovery_fee<br>last_pymnt_d<br>last_pymnt_amnt<br>last_credit_pull_d<br>application_type</span>
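
Dropping the columns listed above could be sketched as follows. The list `behaviour_cols` mirrors the names given in the text; the frame `sample_df` is a hypothetical stand-in with illustrative values, and `errors="ignore"` is used so the call does not fail if some columns are already absent.

```python
import pandas as pd

# Customer behaviour columns, as listed in the text above.
behaviour_cols = [
    "delinq_2yrs", "earliest_cr_line", "inq_last_6mths", "open_acc",
    "pub_rec", "revol_bal", "revol_util", "total_acc",
    "out_prncp", "out_prncp_inv", "total_pymnt", "total_pymnt_inv",
    "total_rec_prncp", "total_rec_int", "total_rec_late_fee",
    "recoveries", "collection_recovery_fee", "last_pymnt_d",
    "last_pymnt_amnt", "last_credit_pull_d", "application_type",
]

# Hypothetical stand-in for loan_df; the values are illustrative only.
sample_df = pd.DataFrame(
    {
        "loan_amnt": [5000, 2500],
        "grade": ["B", "C"],
        "delinq_2yrs": [0, 0],
        "total_pymnt": [0.0, 0.0],
    }
)

# Drop whichever behaviour columns are present; missing names are skipped.
trimmed_df = sample_df.drop(columns=behaviour_cols, errors="ignore")
print(list(trimmed_df.columns))
```

On the real data the same call would be `loan_df.drop(columns=behaviour_cols, errors="ignore")`.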

# 2. Data Cleaning