# Loan Approval Analysis

### Step 1:
Let's check which variables are categorical and which are numerical, to get a basic idea of the features in the bank dataset.

In [1]:
import pandas as pd
import numpy as np

In [2]:
bank = pd.read_csv("C:\\Users\\Shashank\\Documents\\bank.csv")

In [3]:
categorical_var = bank.select_dtypes(include = 'object')
print(categorical_var)

      Loan_ID  Gender Married Dependents     Education Self_Employed  \
0    LP001002    Male      No          0      Graduate            No   
1    LP001003    Male     Yes          1      Graduate            No   
2    LP001005    Male     Yes          0      Graduate           Yes   
3    LP001006    Male     Yes          0  Not Graduate            No   
4    LP001008    Male      No          0      Graduate            No   
..        ...     ...     ...        ...           ...           ...   
609  LP002978  Female      No          0      Graduate            No   
610  LP002979    Male     Yes         3+      Graduate            No   
611  LP002983    Male     Yes          1      Graduate            No   
612  LP002984    Male     Yes          2      Graduate            No   
613  LP002990  Female      No          0      Graduate           Yes   

    Property_Area Loan_Status  
0           Urban           Y  
1           Rural           N  
2           Urban           Y  
3      

In [4]:
numerical_var = bank.select_dtypes(include = 'number')
print(numerical_var)

     ApplicantIncome  CoapplicantIncome  LoanAmount  Loan_Amount_Term  \
0               5849                0.0         NaN             360.0   
1               4583             1508.0       128.0             360.0   
2               3000                0.0        66.0             360.0   
3               2583             2358.0       120.0             360.0   
4               6000                0.0       141.0             360.0   
..               ...                ...         ...               ...   
609             2900                0.0        71.0             360.0   
610             4106                0.0        40.0             180.0   
611             8072              240.0       253.0             360.0   
612             7583                0.0       187.0             360.0   
613             4583                0.0       133.0             360.0   

     Credit_History  
0               1.0  
1               1.0  
2               1.0  
3               1.0  
4            

In this first step we split the features into two types: categorical (text values, `object` dtype) and numerical.
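Printing the full sub-DataFrames shows the data but not a compact list of which columns fall into each group. A minimal sketch on hypothetical toy data (not the real `bank.csv`) that prints only the column names. It uses `exclude="number"` for the categorical side, a slightly more defensive choice than `include='object'` because it catches any non-numeric column regardless of how pandas stores the strings:

```python
import pandas as pd

# Hypothetical toy data mixing text and numeric columns (illustration only)
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Married": ["Yes", "No", "Yes"],
    "ApplicantIncome": [5849, 4583, 3000],
    "LoanAmount": [None, 128.0, 66.0],
})

# Column names only, split by dtype
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
numerical_cols = df.select_dtypes(include="number").columns.tolist()

print("Categorical:", categorical_cols)
print("Numerical:", numerical_cols)
```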

### Step 2: 
Sometimes customers forget to fill in all the details, or they don't want to share some of them. Because of that, some fields in the dataset have missing values. Check which columns have missing values and how many each column has, then fill them in.



In [5]:
banks = bank.drop('Loan_ID', axis = 1)

In [6]:
print(banks.isnull().sum())

Gender               13
Married               3
Dependents           15
Education             0
Self_Employed        32
ApplicantIncome       0
CoapplicantIncome     0
LoanAmount           22
Loan_Amount_Term     14
Credit_History       50
Property_Area         0
Loan_Status           0
dtype: int64
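Only the columns with a non-zero count actually need filling. A small sketch on hypothetical toy data that keeps just those columns and sorts them by how many values are missing:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few gaps (illustration only)
df = pd.DataFrame({
    "Gender": ["Male", np.nan, "Female", "Male"],
    "Education": ["Graduate", "Graduate", "Not Graduate", "Graduate"],
    "LoanAmount": [np.nan, 128.0, np.nan, 66.0],
})

missing = df.isnull().sum()
# Keep only columns with at least one missing value, largest count first
missing_only = missing[missing > 0].sort_values(ascending=False)
print(missing_only)
```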


In [7]:
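# NOTE: mode() returns a DataFrame (one row per tied mode), not a Series,
# so fillna() aligns it on the row index and only fills gaps in row 0.
bank_mode = banks.mode()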

In [8]:
banks.fillna(bank_mode, inplace = True)

In [9]:
print(banks.isnull().sum().values.sum())

148
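The 148 means the fill barely worked. Before filling, the counts above add up to 13 + 3 + 15 + 32 + 22 + 14 + 50 = 149 missing values, so only one was filled. That fits how `fillna` treats a DataFrame argument: `banks.mode()` is a DataFrame, and `fillna` aligns it on the row index, so only row 0 (whose `LoanAmount` was `NaN`) gets a value. A minimal sketch on hypothetical toy data comparing that behaviour with taking the first mode of each column via `.iloc[0]`, which yields a Series keyed by column name that `fillna` applies column by column:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (illustration only)
df = pd.DataFrame({
    "Gender": ["Male", np.nan, "Male", "Female"],
    "LoanAmount": [np.nan, 128.0, 128.0, 66.0],
    "Credit_History": [1.0, np.nan, 1.0, 0.0],
})

# DataFrame argument: aligned on row labels, so only row 0 can be filled
broken = df.fillna(df.mode())

# Series argument (first mode per column): filled column by column
fixed = df.fillna(df.mode().iloc[0])

print(broken.isnull().sum().sum())  # gaps left with the DataFrame approach
print(fixed.isnull().sum().sum())   # gaps left with the Series approach
```

With the Series version applied to `banks`, the final check would be expected to print 0, although the later outputs in this notebook were produced with the original cells.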


### Step 3: 
Now let's look at the average loan amount for each combination of 'Gender', 'Married', and 'Self_Employed'. This gives a basic idea of how the average loan amount varies across these groups.

In [10]:
avg_loan_amount = pd.pivot_table(banks, index = ['Gender', 'Married', 'Self_Employed'], values  = 'LoanAmount', aggfunc = 'mean')
print(avg_loan_amount)

                              LoanAmount
Gender Married Self_Employed            
Female No      No             110.596774
               Yes            125.800000
       Yes     No             135.480000
               Yes            282.250000
Male   No      No             128.058252
               Yes            173.625000
       Yes     No             151.709220
               Yes            169.355556
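The same table can also be built with `groupby`, which some readers find easier to extend. A minimal sketch on hypothetical toy rows (not from `bank.csv`) that builds both versions side by side:

```python
import pandas as pd

# Hypothetical toy rows (illustration only)
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female"],
    "Married": ["Yes", "Yes", "No", "No"],
    "Self_Employed": ["No", "No", "Yes", "Yes"],
    "LoanAmount": [100.0, 200.0, 120.0, 140.0],
})

keys = ["Gender", "Married", "Self_Employed"]

# pivot_table as in the cell above
pivot = pd.pivot_table(df, index=keys, values="LoanAmount", aggfunc="mean")

# Equivalent groupby: mean LoanAmount per combination of the three keys
grouped = df.groupby(keys)[["LoanAmount"]].mean()

print(pivot)
print(grouped)
```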


### Step 4: 
Now let's check the percentage of loans approved for each employment type (self-employed or not).

In [11]:
loan_approved_se = banks[(banks['Self_Employed'] == 'Yes') & (banks['Loan_Status'] == 'Y')].count()
print(loan_approved_se)

Gender               52
Married              56
Dependents           55
Education            56
Self_Employed        56
ApplicantIncome      56
CoapplicantIncome    56
LoanAmount           54
Loan_Amount_Term     54
Credit_History       50
Property_Area        56
Loan_Status          56
dtype: int64


In [12]:
loan_approved_nse = banks[(banks['Self_Employed'] == 'No') & (banks['Loan_Status'] == 'Y')].count()
print(loan_approved_nse)

Gender               339
Married              340
Dependents           335
Education            343
Self_Employed        343
ApplicantIncome      343
CoapplicantIncome    343
LoanAmount           335
Loan_Amount_Term     338
Credit_History       313
Property_Area        343
Loan_Status          343
dtype: int64


In [13]:
percentage_se = loan_approved_se * (100 / len(banks))  # len(banks) == 614 applicants

In [14]:
print(round(percentage_se, 2))

Gender               8.47
Married              9.12
Dependents           8.96
Education            9.12
Self_Employed        9.12
ApplicantIncome      9.12
CoapplicantIncome    9.12
LoanAmount           8.79
Loan_Amount_Term     8.79
Credit_History       8.14
Property_Area        9.12
Loan_Status          9.12
dtype: float64


In [15]:
percentage_nse = loan_approved_nse * (100 / len(banks))  # len(banks) == 614 applicants
print(round(percentage_nse, 2))

Gender               55.21
Married              55.37
Dependents           54.56
Education            55.86
Self_Employed        55.86
ApplicantIncome      55.86
CoapplicantIncome    55.86
LoanAmount           54.56
Loan_Amount_Term     55.05
Credit_History       50.98
Property_Area        55.86
Loan_Status          55.86
dtype: float64
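Two things are worth noting about these cells. `DataFrame.count()` counts non-null values per column, which is why twelve slightly different numbers appear (for example `Credit_History` 50 vs `Loan_Status` 56). The `Loan_Status` row, which has no missing values, gives the actual number of approved rows: 56 self-employed and 343 not self-employed. Dividing by 614 also gives each group's share of *all* applicants, not the approval rate *within* the group. A minimal sketch on hypothetical toy data computing one number per group, plus the within-group rate as an alternative reading of "percentage approved":

```python
import pandas as pd

# Hypothetical toy data (illustration only); None = missing Self_Employed
df = pd.DataFrame({
    "Self_Employed": ["Yes", "Yes", "No", "No", "No", None],
    "Loan_Status":   ["Y",   "N",   "Y",  "Y",  "N",  "Y"],
})

total = len(df)  # plays the role of the hard-coded 614

# One number per group: rows that are approved, independent of NaNs elsewhere
approved_se = ((df["Self_Employed"] == "Yes") & (df["Loan_Status"] == "Y")).sum()
approved_nse = ((df["Self_Employed"] == "No") & (df["Loan_Status"] == "Y")).sum()

# Share of ALL applicants (what the cells above compute)
share_se = round(approved_se * 100 / total, 2)
share_nse = round(approved_nse * 100 / total, 2)

# Approval rate WITHIN each group; rows with missing Self_Employed are dropped
approval_rate = (df["Loan_Status"].eq("Y").groupby(df["Self_Employed"]).mean() * 100).round(2)

print(share_se, share_nse)
print(approval_rate)
```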


### Step 5: 
A government audit is coming up soon, so the company wants to find the applicants with a long loan term (25 years or more).

In [16]:
# Loan_Amount_Term is in months; convert it to years
loan_term = banks['Loan_Amount_Term'] / 12

In [17]:
def big_loan(term):
    count = 0
    for n in term:
        if n >= 25:
            count = count + 1
    return count
big_loan_term = big_loan(loan_term) 
print(big_loan_term)

540
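The loop gives the count but not *which* applicants have long terms, which is what an audit would need. A vectorized sketch on hypothetical toy data: a boolean mask counts them in one step and selects their IDs. Note that `banks` dropped `Loan_ID`, so on the real data the mask would be applied to `bank`, which keeps that column and shares the same row index (an assumption worth checking before relying on it):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; Loan_Amount_Term is in months as in bank.csv
df = pd.DataFrame({
    "Loan_ID": ["LP1", "LP2", "LP3", "LP4"],
    "Loan_Amount_Term": [360.0, 180.0, np.nan, 300.0],
})

term_years = df["Loan_Amount_Term"] / 12

# Boolean mask; NaN >= 25 evaluates to False, matching the loop above
long_term = term_years >= 25

print(long_term.sum())                        # how many applicants
print(df.loc[long_term, "Loan_ID"].tolist())  # which applicants
```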


### Step 6: 
Now let's compare the average applicant income and the average credit history of applicants whose loans were approved (Y) and rejected (N).

In [18]:
loan_groupby = banks.groupby('Loan_Status')

In [19]:
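# Select several columns with a list; tuple-style selection is deprecated and raises in recent pandas
loan_groupby = loan_groupby[['ApplicantIncome', 'Credit_History']]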



In [20]:
mean_values = loan_groupby.mean()

In [21]:
print(round(mean_values, 2))

             ApplicantIncome  Credit_History
Loan_Status                                 
N                    5446.08            0.54
Y                    5384.07            0.98
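If you also want the average loan amount per outcome alongside income and credit history, named aggregation builds the whole summary in one call with readable column names. A minimal sketch on hypothetical toy data; the output column names (`mean_income` etc.) are illustrative choices, not from the original:

```python
import pandas as pd

# Hypothetical toy data (illustration only)
df = pd.DataFrame({
    "Loan_Status": ["Y", "Y", "N", "N"],
    "ApplicantIncome": [5000, 6000, 4000, 3000],
    "Credit_History": [1.0, 1.0, 0.0, 1.0],
    "LoanAmount": [120.0, 140.0, 100.0, None],
})

summary = df.groupby("Loan_Status").agg(
    mean_income=("ApplicantIncome", "mean"),
    mean_credit_history=("Credit_History", "mean"),
    mean_loan_amount=("LoanAmount", "mean"),  # NaN rows are skipped by mean()
)
print(summary.round(2))
```

Because `Credit_History` is coded as 1.0/0.0, its group mean reads as the fraction of applicants with a credit history.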
