# Loan Approval Analysis

## Problem Statement

Dream Housing Finance Inc. specializes in home loans across different market segments: rural, urban and semi-urban. Their loan eligibility process is based on the customer details provided in an online application form. To create a targeted marketing campaign for the different segments, they have asked for a comprehensive analysis of the data collected so far.

## About the Dataset

A snapshot of the dataset you will be working with:

<img src="Loan_snapshot.png">

The dataset has details of 614 customers with the following 13 features:

<img src="Loan_description.png">


## Why solve this project?

After completing this project, you will have a better grip on working with pandas. In this project you will apply the following concepts:

- Dataframe slicing
- Dataframe aggregation
- Pivot table operations

## Step 1 : Load the data
Load the data, then check which variables are categorical and which are numerical to get a basic idea of the features in the bank dataset.


In [1]:
# Import packages
import numpy as np
import pandas as pd

path = './file.csv'

# Load the dataset
bank = pd.read_csv(path)

# Display categorical variables
categorical_var = bank.select_dtypes(include='object')
categorical_var

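|     | Loan_ID  | Gender | Married | Dependents | Education    | Self_Employed | Property_Area | Loan_Status |
|-----|----------|--------|---------|------------|--------------|---------------|---------------|-------------|
| 0   | LP001002 | Male   | No      | 0          | Graduate     | No            | Urban         | Y           |
| 1   | LP001003 | Male   | Yes     | 1          | Graduate     | No            | Rural         | N           |
| 2   | LP001005 | Male   | Yes     | 0          | Graduate     | Yes           | Urban         | Y           |
| 3   | LP001006 | Male   | Yes     | 0          | Not Graduate | No            | Urban         | Y           |
| 4   | LP001008 | Male   | No      | 0          | Graduate     | No            | Urban         | Y           |
| ... | ...      | ...    | ...     | ...        | ...          | ...           | ...           | ...         |
| 609 | LP002978 | Female | No      | 0          | Graduate     | No            | Rural         | Y           |
| 610 | LP002979 | Male   | Yes     | 3+         | Graduate     | No            | Rural         | Y           |
| 611 | LP002983 | Male   | Yes     | 1          | Graduate     | No            | Urban         | Y           |
| 612 | LP002984 | Male   | Yes     | 2          | Graduate     | No            | Urban         | Y           |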


In [2]:
# Display numerical variables
numerical_var=bank.select_dtypes(include='number')
numerical_var

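|     | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History |
|-----|-----------------|-------------------|------------|------------------|----------------|
| 0   | 5849            | 0.0               | NaN        | 360.0            | 1.0            |
| 1   | 4583            | 1508.0            | 128.0      | 360.0            | 1.0            |
| 2   | 3000            | 0.0               | 66.0       | 360.0            | 1.0            |
| 3   | 2583            | 2358.0            | 120.0      | 360.0            | 1.0            |
| 4   | 6000            | 0.0               | 141.0      | 360.0            | 1.0            |
| ... | ...             | ...               | ...        | ...              | ...            |
| 609 | 2900            | 0.0               | 71.0       | 360.0            | 1.0            |
| 610 | 4106            | 0.0               | 40.0       | 180.0            | 1.0            |
| 611 | 8072            | 240.0             | 253.0      | 360.0            | 1.0            |
| 612 | 7583            | 0.0               | 187.0      | 360.0            | 1.0            |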
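
The split between categorical and numerical columns can be tried on a small frame first. The following is a minimal sketch, assuming a made-up three-row frame; the helper name `split_by_dtype` and the toy values are illustrative and not part of the loan dataset. It uses `exclude="number"` for the categorical side so that text columns are picked up regardless of how pandas stores strings.

```python
import pandas as pd


def split_by_dtype(df):
    """Return (categorical, numerical) column-name lists for a DataFrame."""
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    numerical = df.select_dtypes(include="number").columns.tolist()
    return categorical, numerical


# Hypothetical toy data mirroring a few loan columns
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Dependents": ["0", "3+", "1"],      # stored as text because of "3+"
    "ApplicantIncome": [5849, 4583, 3000],
    "LoanAmount": [None, 128.0, 66.0],   # the missing value becomes NaN in a float column
})

cat_cols, num_cols = split_by_dtype(toy)
print(cat_cols)
print(num_cols)
```

Note that a column such as `Dependents` lands on the categorical side even though it looks numeric, because the value `3+` forces it to be stored as text.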


## Step 2 : Something is Missing!
Sometimes customers forget to fill in all the details, or they prefer not to share some of them. As a result, some fields in the dataset have missing values. First check which columns have missing values and how many each column has, then fill them. The code below fills each column with its mode (most frequent value).

In [3]:
# Drop the Loan_ID column, which is an identifier rather than a feature
banks = bank.drop(columns='Loan_ID')

# Count the missing values in each column
print("Before Imputation:")
print(banks.isnull().sum())

# Most frequent value of each column (first row of the mode table)
bank_mode = banks.mode().iloc[0]

print("--"*40)
print("After Imputation:")

# Fill the missing values with each column's mode
banks.fillna(bank_mode, inplace=True)

# Confirm that no missing values remain
print(banks.isnull().sum())


Before Imputation:
Gender               13
Married               3
Dependents           15
Education             0
Self_Employed        32
ApplicantIncome       0
CoapplicantIncome     0
LoanAmount           22
Loan_Amount_Term     14
Credit_History       50
Property_Area         0
Loan_Status           0
dtype: int64
--------------------------------------------------------------------------------
After Imputation:
Gender               0
Married              0
Dependents           0
Education            0
Self_Employed        0
ApplicantIncome      0
CoapplicantIncome    0
LoanAmount           0
Loan_Amount_Term     0
Credit_History       0
Property_Area        0
Loan_Status          0
dtype: int64
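
To see why the imputation takes `.iloc[0]` of the mode, here is a minimal sketch on a made-up four-row frame (the values are hypothetical, chosen only so that each column has one clear mode). It is an illustration of the same idea, not a replacement for the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in a text column and a numeric column
toy = pd.DataFrame({
    "Gender": ["Male", "Male", np.nan, "Female"],
    "Loan_Amount_Term": [360.0, np.nan, 360.0, 180.0],
})

# DataFrame.mode() returns a DataFrame (ties would produce extra rows),
# so .iloc[0] picks the first modal value of every column as a Series.
fill_values = toy.mode().iloc[0]

# fillna with a Series fills each column with the value stored under its label
filled = toy.fillna(fill_values)

print(fill_values.to_dict())
print(filled)
```

Mode imputation works for both text and numeric columns, which is convenient here, though for skewed numeric columns such as `LoanAmount` a median fill is a common alternative.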


## Step 3 : Loan Amount vs Gender
Now let's check the average loan amount for each combination of 'Gender', 'Married' and 'Self_Employed'. This gives a basic idea of how the typical loan amount varies across these groups.

In [4]:
# Average loan amount for each Gender / Married / Self_Employed combination
avg_loan_amount = banks.pivot_table(values=["LoanAmount"], index=["Gender", "Married", "Self_Employed"], aggfunc="mean")
avg_loan_amount

| Gender | Married | Self_Employed | LoanAmount |
|--------|---------|---------------|------------|
| Female | No      | No            | 114.768116 |
| Female | No      | Yes           | 125.272727 |
| Female | Yes     | No            | 133.714286 |
| Female | Yes     | Yes           | 282.25     |
| Male   | No      | No            | 129.508621 |
| Male   | No      | Yes           | 180.588235 |
| Male   | Yes     | No            | 152.60815  |
| Male   | Yes     | Yes           | 167.42     |
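
A pivot table with only an `index` and a mean aggregation is closely related to a `groupby`. The sketch below, on made-up data with two grouping columns instead of three, shows both routes side by side; the toy values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data; values are made up for illustration
toy = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Married": ["Yes", "Yes", "No", "No", "No"],
    "LoanAmount": [100.0, 200.0, 90.0, 110.0, 150.0],
})

# Pivot table: one row per (Gender, Married) pair, mean LoanAmount in each
pivot = toy.pivot_table(values="LoanAmount", index=["Gender", "Married"], aggfunc="mean")

# The same means via groupby, returned as a Series
grouped = toy.groupby(["Gender", "Married"])["LoanAmount"].mean()

print(pivot)
print(grouped)
```

Combinations that never occur in the data (here, married women) simply do not appear in either result.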


## Step 4 : Loan Approval vs Employment
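Now let's check the percentage of loans approved by employment type. The code below divides each approved count by the total number of applicants (614), so each figure is the share of all applicants who belong to that group and were approved, not the approval rate within the group.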

In [5]:
# Number of approved loans for self-employed applicants
loan_approved_se = banks.loc[(banks["Self_Employed"]=="Yes") & (banks["Loan_Status"]=="Y"), ["Loan_Status"]].count()
print(loan_approved_se)

# Number of approved loans for non-self-employed applicants
loan_approved_nse = banks.loc[(banks["Self_Employed"]=="No") & (banks["Loan_Status"]=="Y"), ["Loan_Status"]].count()
print(loan_approved_nse)

# Approved self-employed applicants as a percentage of all 614 applicants
percentage_se = (loan_approved_se * 100 / 614)
# .iloc[0] extracts the single value by position
percentage_se = percentage_se.iloc[0]

# Print the percentage for self-employed applicants
print(f"Percentage of loan approved for self employed : {percentage_se}")

# Approved non-self-employed applicants as a percentage of all 614 applicants
percentage_nse = (loan_approved_nse * 100 / 614)
percentage_nse = percentage_nse.iloc[0]

# Print the percentage for non-self-employed applicants
print (f"Percentage of loan for non self employed : {percentage_nse}")


Loan_Status    56
dtype: int64
Loan_Status    366
dtype: int64
Percentage of loan approved for self employed : 9.120521172638437
Percentage of loan for non self employed : 59.60912052117264
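
The percentages in this step use the total number of applicants (614) as the denominator. If the approval rate within each employment group is wanted instead, the denominator becomes the group size. The following is a minimal sketch of both calculations on made-up data; the six rows are hypothetical and the variable names are illustrative.

```python
import pandas as pd

# Hypothetical toy data; values are made up for illustration
toy = pd.DataFrame({
    "Self_Employed": ["Yes", "Yes", "No", "No", "No", "No"],
    "Loan_Status":   ["Y",   "N",   "Y",  "Y",  "Y",  "N"],
})

# Boolean Series: True where the loan was approved
approved = toy["Loan_Status"].eq("Y")

# Share of ALL applicants who are in each group and approved
share_of_all = approved.groupby(toy["Self_Employed"]).sum() * 100 / len(toy)

# Approval rate WITHIN each group: the mean of a boolean is the fraction of True values
rate_within = approved.groupby(toy["Self_Employed"]).mean() * 100

print(share_of_all)
print(rate_within)
```

The two measures answer different questions: the first is dominated by group size, while the second allows a like-for-like comparison between groups.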


## Step 5 : Transform the loan tenure from months to years
A government audit is coming up soon, so the company wants to identify applicants with a long loan term. Here a term of 25 years or more counts as long, so the term is first converted from months to years.

In [6]:
# Convert the loan term from months to years
loan_term = banks['Loan_Amount_Term'].apply(lambda x: int(x)/12)

# Count the applicants whose loan term is 25 years or more
big_loan_term = len(loan_term[loan_term>=25])

print(big_loan_term)


554
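
The same conversion can also be written without `apply`, since arithmetic on a pandas Series is applied element-wise. A minimal sketch, assuming a made-up Series of loan terms in months:

```python
import pandas as pd

# Hypothetical loan terms in months
terms_months = pd.Series([360.0, 180.0, 300.0, 120.0, 480.0], name="Loan_Amount_Term")

# Dividing the Series directly converts every value at once, no lambda needed
terms_years = terms_months / 12

# Summing a boolean mask counts the terms of 25 years or more
long_term_count = int((terms_years >= 25).sum())

print(terms_years.tolist())
print(long_term_count)
```

Unlike `int(x)/12`, this version does not truncate fractional months first, which makes no difference when the terms are whole numbers of months.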


## Step 6 : Income / Credit History vs Loan Status
Now let's compare the average applicant income and the average credit history between applicants whose loans were approved ('Y') and those whose loans were rejected ('N').

In [7]:
columns_to_show = ['ApplicantIncome', 'Credit_History']

# Group the applicants by loan status and keep only the columns of interest
loan_groupby = banks.groupby('Loan_Status')[columns_to_show]

# Mean of each selected column per loan status
mean_values = loan_groupby.agg(['mean'])
mean_values

| Loan_Status | ApplicantIncome (mean) | Credit_History (mean) |
|-------------|------------------------|-----------------------|
| N           | 5446.078125            | 0.572917              |
| Y           | 5384.06872             | 0.983412              |
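
Passing a list such as `['mean']` to `agg` produces a two-level column header. When flat column names are preferred, pandas named aggregation is one option. The sketch below uses made-up data, and the output column names `mean_income` and `mean_credit_history` are illustrative choices.

```python
import pandas as pd

# Hypothetical toy data; values are made up for illustration
toy = pd.DataFrame({
    "Loan_Status": ["Y", "Y", "N", "N"],
    "ApplicantIncome": [4000, 6000, 3000, 5000],
    "Credit_History": [1.0, 1.0, 0.0, 1.0],
})

# Named aggregation: new_column=(source_column, aggregation)
summary = toy.groupby("Loan_Status").agg(
    mean_income=("ApplicantIncome", "mean"),
    mean_credit_history=("Credit_History", "mean"),
)

print(summary)
```

The result has one row per loan status and one plainly named column per statistic, which is easier to reference in later steps.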
