### Loan Prediction via Pandas

Gareth Duffy 

https://www.analyticsvidhya.com/blog/2016/01/12-pandas-techniques-python-data-manipulation/

https://github.com/shri1407/Loan-Prediction-Dataset/blob/master/LoanPrediction.ipynb

In [15]:
# Load the data

import pandas as pd
import numpy as np
data = pd.read_csv("https://raw.githubusercontent.com/shri1407/Loan-Prediction-Dataset/master/test.csv", index_col="Loan_ID")
data

Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area
LP001015,Male,Yes,0,Graduate,No,5720,0,110.0,360.0,1.0,Urban
LP001022,Male,Yes,1,Graduate,No,3076,1500,126.0,360.0,1.0,Urban
LP001031,Male,Yes,2,Graduate,No,5000,1800,208.0,360.0,1.0,Urban
LP001035,Male,Yes,2,Graduate,No,2340,2546,100.0,360.0,,Urban
LP001051,Male,No,0,Not Graduate,No,3276,0,78.0,360.0,1.0,Urban
LP001054,Male,Yes,0,Not Graduate,Yes,2165,3422,152.0,360.0,1.0,Urban
LP001055,Female,No,1,Not Graduate,No,2226,0,59.0,360.0,1.0,Semiurban
LP001056,Male,Yes,2,Not Graduate,No,3881,0,147.0,360.0,0.0,Rural
LP001059,Male,Yes,2,Graduate,,13633,0,280.0,240.0,1.0,Urban
LP001067,Male,No,0,Not Graduate,No,2400,2400,123.0,360.0,1.0,Semiurban
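The effect of `index_col` and of empty CSV fields can be checked without a network connection. The following is a minimal sketch that parses two rows copied from the output above out of an in-memory string; the names `csv_text` and `sample` are only illustrative.

```python
import io

import pandas as pd

# Two rows copied from the output above; the second has an empty Credit_History field
csv_text = """Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area
LP001015,Male,Yes,0,Graduate,No,5720,0,110.0,360.0,1.0,Urban
LP001035,Male,Yes,2,Graduate,No,2340,2546,100.0,360.0,,Urban
"""

# index_col moves Loan_ID out of the columns and into the row index
sample = pd.read_csv(io.StringIO(csv_text), index_col="Loan_ID")

print(sample.shape)
# The empty field is parsed as NaN
print(sample.loc["LP001035", "Credit_History"])
```

Because `Loan_ID` becomes the index, rows can be looked up by loan identifier with `.loc`.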


In [19]:
# Boolean indexing
# List all married women who are not graduates.

data.loc[(data["Gender"]=="Female") & (data["Education"]=="Not Graduate") & (data["Married"]=="Yes"), ["Gender","Education","Married"]]



Loan_ID,Gender,Education,Married
LP002442,Female,Not Graduate,Yes
LP002476,Female,Not Graduate,Yes
LP002635,Female,Not Graduate,Yes
LP002786,Female,Not Graduate,Yes
LP002826,Female,Not Graduate,Yes
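To make the mechanics of the filter above easier to follow, here is a minimal sketch on a small made-up frame (`toy`, with illustrative values that are not taken from the loan file). It builds the boolean mask as a separate step and then shows what should be an equivalent filter written with `DataFrame.query`.

```python
import pandas as pd

# Toy stand-in for the loan data; the values are illustrative only
toy = pd.DataFrame(
    {
        "Gender": ["Female", "Male", "Female", "Female"],
        "Education": ["Not Graduate", "Not Graduate", "Graduate", "Not Graduate"],
        "Married": ["Yes", "Yes", "Yes", "No"],
    },
    index=pd.Index(["A1", "A2", "A3", "A4"], name="Loan_ID"),
)

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds more tightly than ==.
mask = (
    (toy["Gender"] == "Female")
    & (toy["Education"] == "Not Graduate")
    & (toy["Married"] == "Yes")
)

# .loc takes the row mask first and the list of columns second
selected = toy.loc[mask, ["Gender", "Education", "Married"]]

# The same condition expressed as a query string
selected_query = toy.query(
    'Gender == "Female" and Education == "Not Graduate" and Married == "Yes"'
)[["Gender", "Education", "Married"]]

print(selected)
```

Only row `A1` satisfies all three conditions, so both `selected` and `selected_query` contain that single row.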


In [21]:
# Apply function

# apply() passes each row or column of a DataFrame to a function and collects the results. The function can be built-in or user-defined.
# For instance, here it is used to count the missing values in each column and in each row.

# Create the function:
def num_missing(x):
    return x.isnull().sum()

# Apply per column:
print("Missing values per column:")
print(data.apply(num_missing, axis=0))  # axis=0 applies the function to each column

# Apply per row:
print("\nMissing values per row:")
print(data.apply(num_missing, axis=1).head())  # axis=1 applies the function to each row


Missing values per column:
Gender               11
Married               0
Dependents           10
Education             0
Self_Employed        23
ApplicantIncome       0
CoapplicantIncome     0
LoanAmount            5
Loan_Amount_Term      6
Credit_History       29
Property_Area         0
dtype: int64

Missing values per row:
Loan_ID
LP001015    0
LP001022    0
LP001031    0
LP001035    1
LP001051    0
dtype: int64
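For this particular task, `apply` is not strictly needed: `isnull()` on the whole DataFrame followed by `sum()` should give the same counts. A minimal sketch on a made-up frame (`toy`, illustrative values only) comparing the two approaches:

```python
import numpy as np
import pandas as pd

# Toy frame with a few missing entries; the values are illustrative only
toy = pd.DataFrame(
    {
        "Gender": ["Male", None, "Female"],
        "LoanAmount": [110.0, np.nan, 208.0],
        "Credit_History": [1.0, 1.0, np.nan],
    }
)


def num_missing(x):
    """Count the missing values in a Series."""
    return x.isnull().sum()


# apply() calls num_missing once per column (axis=0) or once per row (axis=1)
per_column_apply = toy.apply(num_missing, axis=0)
per_row_apply = toy.apply(num_missing, axis=1)

# Vectorized equivalents that skip the Python-level function call
per_column_vectorized = toy.isnull().sum()
per_row_vectorized = toy.isnull().sum(axis=1)

print(per_column_vectorized)
print(per_row_vectorized)
```

`apply` remains the more general tool when the per-row or per-column logic has no built-in vectorized form.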


In [34]:
# Pivot table

# Mean LoanAmount for each Gender / Married / Self_Employed group
impute_grps = data.pivot_table(values=["LoanAmount"], index=["Gender","Married","Self_Employed"], aggfunc="mean")
print(impute_grps)

                              LoanAmount
Gender Married Self_Employed            
Female No      No             112.868421
               Yes            171.000000
       Yes     No             141.708333
Male   No      No             124.671429
               Yes            127.285714
       Yes     No             141.919255
               Yes            157.333333
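A pivot table with a single aggregation is closely related to a `groupby`. The sketch below uses a made-up frame (`toy`, illustrative values only) to show the correspondence. The name `impute_grps` suggests the group means are meant for filling missing loan amounts, so the last step shows one possible way to do that with `transform`; this is an assumption about the intent, not something the notebook itself carries out.

```python
import numpy as np
import pandas as pd

# Toy frame; the values are illustrative only
toy = pd.DataFrame(
    {
        "Gender": ["Female", "Female", "Male", "Male", "Male"],
        "Married": ["No", "No", "Yes", "Yes", "No"],
        "LoanAmount": [100.0, 120.0, 150.0, np.nan, 90.0],
    }
)

# Group means via pivot_table (missing values are skipped by the mean)
pivot = toy.pivot_table(values="LoanAmount", index=["Gender", "Married"], aggfunc="mean")

# The same group means via groupby
grouped = toy.groupby(["Gender", "Married"])["LoanAmount"].mean()

# Assumed use: fill each missing LoanAmount with the mean of its own group.
# transform("mean") returns one value per original row, aligned to toy's index.
group_mean = toy.groupby(["Gender", "Married"])["LoanAmount"].transform("mean")
filled = toy["LoanAmount"].fillna(group_mean)

print(pivot)
print(filled)
```

Here the missing amount belongs to the Male / Yes group, whose only known value is 150.0, so that is the value filled in.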


In [35]:
# Merge DataFrames

prop_rates = pd.DataFrame([1000, 5000, 12000], index=['Rural','Semiurban','Urban'],columns=['rates'])
prop_rates

,rates
Rural,1000
Semiurban,5000
Urban,12000


In [36]:
# Merge the rates into the original DataFrame, matching Property_Area against the index of prop_rates:

data_merged = data.merge(right=prop_rates, how='inner', left_on='Property_Area', right_index=True, sort=False)
# aggfunc=len counts the rows in each group, including rows where Credit_History is missing
data_merged.pivot_table(values='Credit_History', index=['Property_Area', 'rates'], aggfunc=len)

Property_Area,rates,Credit_History
Rural,1000,111.0
Semiurban,5000,116.0
Urban,12000,140.0
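An inner merge silently drops rows whose key has no match in the lookup table. The following minimal sketch, on a made-up frame (`toy`, illustrative values, with a hypothetical `Unknown` area that does not occur in the loan data), uses a left merge with `indicator=True` to surface such rows. It also contrasts counting all rows per group with counting only the non-missing `Credit_History` values.

```python
import numpy as np
import pandas as pd

# Toy frame; "Unknown" is a hypothetical area with no entry in prop_rates
toy = pd.DataFrame(
    {
        "Property_Area": ["Urban", "Rural", "Urban", "Unknown"],
        "Credit_History": [1.0, np.nan, 0.0, 1.0],
    },
    index=pd.Index(["A1", "A2", "A3", "A4"], name="Loan_ID"),
)

prop_rates = pd.DataFrame(
    [1000, 5000, 12000], index=["Rural", "Semiurban", "Urban"], columns=["rates"]
)

# A left merge keeps every row of toy; indicator=True adds a _merge column
# that records whether a match was found in prop_rates
merged = toy.merge(
    prop_rates, how="left", left_on="Property_Area", right_index=True, indicator=True
)
unmatched = merged.loc[merged["_merge"] == "left_only", "Property_Area"]

# size() counts every row in the group; count() skips missing values
rows_per_area = merged.groupby("Property_Area")["Credit_History"].size()
known_per_area = merged.groupby("Property_Area")["Credit_History"].count()

print(unmatched)
print(rows_per_area)
print(known_per_area)
```

In the toy data the Rural group has one row but no known `Credit_History`, so the two counts differ for that group.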


In [37]:
# Iterating over rows of a dataframe

# Check the current column types:
data.dtypes

Gender                object
Married               object
Dependents            object
Education             object
Self_Employed         object
ApplicantIncome        int64
CoapplicantIncome      int64
LoanAmount           float64
Loan_Amount_Term     float64
Credit_History       float64
Property_Area         object
dtype: object
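The output shows `Credit_History` stored as `float64` even though it holds a 0/1 flag. One possible use of row iteration here is to walk over a small lookup table of intended column types and cast each column accordingly. The sketch below assumes such a table; `col_types`, its `feature` and `type` columns, and the `toy` frame are all hypothetical and not part of the notebook.

```python
import pandas as pd

# Toy frame; the values are illustrative only
toy = pd.DataFrame(
    {
        "Credit_History": [1.0, 0.0, 1.0],
        "ApplicantIncome": [5720, 3076, 5000],
    }
)

# Hypothetical lookup table: one row per column with its intended type
col_types = pd.DataFrame(
    {
        "feature": ["Credit_History", "ApplicantIncome"],
        "type": ["categorical", "continuous"],
    }
)

# iterrows() yields (index label, row as a Series) pairs
for _, row in col_types.iterrows():
    if row["type"] == "categorical":
        toy[row["feature"]] = toy[row["feature"]].astype("category")
    elif row["type"] == "continuous":
        toy[row["feature"]] = toy[row["feature"]].astype("float64")

print(toy.dtypes)
```

Iterating is reasonable here because the lookup table has only one row per column; for row-wise work on the loan records themselves, vectorized operations are usually much faster than `iterrows()`.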