# Credit Loan Case Study using Python 

# Introduction
This assignment aims to give you an idea of applying EDA in a real business
scenario. In this assignment, apart from applying the techniques that you have
learnt in the EDA module, you will also develop a basic understanding of risk
analytics in banking and financial services and understand how data is used to
minimise the risk of losing money while lending to customers.

# Business Understanding
Lenders find it hard to give loans to people who have an insufficient or
non-existent credit history, and some consumers take advantage of this by
defaulting. Suppose you work for a consumer finance company which specialises
in lending various types of loans to urban customers. You have to use EDA to
analyse the patterns present in the data, so that applicants capable of
repaying the loan are not rejected.


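When the company receives a loan application, it has to decide whether to
approve the loan based on the applicant's profile. Two types of risk are
associated with this decision: rejecting an applicant who is likely to repay
means lost business, while approving an applicant who is likely to default
may mean a financial loss.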

# Business Objectives
This case study aims to identify patterns which indicate whether a client has
difficulty paying their instalments. These patterns may be used for taking
actions such as denying the loan, reducing the amount of the loan, or lending
(to risky applicants) at a higher interest rate. This will ensure that
consumers capable of repaying the loan are not rejected. Identifying such
applicants using EDA is the aim of this case study.
In other words, the company wants to understand the driving factors (or
driver variables) behind loan default, i.e. the variables which are strong
indicators of default. The company can utilise this knowledge for its portfolio
and risk assessment.


To develop your understanding of the domain, you are advised to
independently research a little about risk analytics - understanding the types
of variables and their significance should be enough.

# Importing all Libraries

In [0]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Display all columns of the dataset

In [0]:
pd.set_option('display.max_columns', None)


# Files Path

In [0]:
path = r'E:\CSV_Files\Project\Credit EDA Case Study'

# Read the Datasets

In [0]:
df = pd.read_csv(f'{path}/application_data.csv')

In [0]:
df

# Step 1:   Data Cleaning

In [0]:
df.shape

# Check null values per column and drop columns with more than 10% nulls

In [0]:
df.isnull().sum()

In [0]:
null_col = df.isnull().sum()
null_col = null_col.loc[null_col.values > df.shape[0] * 0.1]
null_col

In [0]:
df.drop(columns=null_col.index,inplace=True)
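
The column filter above compares null counts against 10% of the row count. As a minimal sketch of the same idea, the null fraction can be read directly from `isnull().mean()`. The frame `toy` and its column names are made up for illustration and are not part of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column "b" is 50% null, columns "a" and "c" have no nulls
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [np.nan, np.nan, 1.0, 2.0],
    "c": ["w", "x", "y", "z"],
})

# Fraction of null values per column (mean of a boolean mask)
null_frac = toy.isnull().mean()

# Columns whose null fraction exceeds the 10% threshold
cols_to_drop = null_frac[null_frac > 0.10].index

toy_clean = toy.drop(columns=cols_to_drop)
print(list(toy_clean.columns))
```

Working with fractions rather than counts keeps the threshold readable and independent of the number of rows.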

# Check null values per row and drop rows with more than 10% nulls

In [0]:
df.dropna(thresh=df.shape[1] - df.shape[1] * 0.1,inplace=True)
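
The `thresh` argument of `dropna` is the minimum number of non-null values a row must have to be kept, so "at most 10% nulls" translates to "at least 90% non-null". The sketch below illustrates this on a made-up 10-column frame; `toy` and `min_non_null` are hypothetical names used only for this example.

```python
import math
import numpy as np
import pandas as pd

# Hypothetical toy data with 10 columns and 3 rows
toy = pd.DataFrame(np.ones((3, 10)), columns=[f"c{i}" for i in range(10)])
toy.iloc[1, :2] = np.nan   # row 1: 2 of 10 values missing (20%)
toy.iloc[2, :1] = np.nan   # row 2: 1 of 10 values missing (10%)

# Minimum number of non-null values a row needs in order to be kept
min_non_null = math.ceil(toy.shape[1] * 9 / 10)

kept = toy.dropna(thresh=min_non_null)
print(kept.index.tolist())
```

Row 1 is dropped because it has only 8 non-null values, while row 2 sits exactly on the threshold and is kept.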

# Find the outliers and remove them

In [0]:
plt.figure(figsize=(25,10))
sns.boxplot(data = df, x='AMT_INCOME_TOTAL')

In [0]:
df['AMT_INCOME_TOTAL'].min()

In [0]:
df['AMT_INCOME_TOTAL'].max()

The interquartile range (IQR) measures variability by dividing a data set into quartiles. The data is sorted in ascending order and split into 4 equal parts. Q1, Q2 and Q3, called the first, second and third quartiles, are the values which separate the 4 parts.

* Q1 represents the 25th percentile of the data.
* Q2 represents the 50th percentile of the data.
* Q3 represents the 75th percentile of the data.

In [0]:
Q1, Q3 = np.percentile(df['AMT_INCOME_TOTAL'],[25,75])

In [0]:
Q1

In [0]:
Q3

IQR is the range between the first and the third quartiles, Q1 and Q3: IQR = Q3 - Q1. Data points which fall below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are treated as outliers.
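
A small worked example may make the rule concrete. The values below are made up, and the names `lower_fence`, `upper_fence` and `inliers` are illustrative only; this is a sketch of the 1.5 × IQR rule, not the notebook's own cell.

```python
import numpy as np

# Hypothetical values; 1000 is an obvious extreme value
values = np.array([10, 12, 14, 16, 18, 20, 22, 1000], dtype=float)

# First and third quartiles (NumPy's default linear interpolation)
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Fences of the 1.5 * IQR rule
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Keep only the points that lie inside both fences
inliers = values[(values >= lower_fence) & (values <= upper_fence)]
print(q1, q3, iqr, lower_fence, upper_fence)
print(inliers)
```

Here Q1 is 13.5 and Q3 is 20.5, so the fences are 3.0 and 31.0 and only the value 1000 is removed.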

In [0]:
IQR = Q3 - Q1

In [0]:
IQR

In [0]:
lower_x = Q1 - 1.5 * IQR
Upper_x = Q3 + 1.5 * IQR

In [0]:
lower_x

In [0]:
Upper_x

In [0]:
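# Keep rows below the upper fence; .copy() avoids chained-assignment warnings in later column edits
df = df.loc[df['AMT_INCOME_TOTAL'] < Upper_x].copy()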

In [0]:
plt.figure(figsize=(25,10))
sns.boxplot(data = df, x='AMT_INCOME_TOTAL')

In [0]:
df['AMT_INCOME_TOTAL'].min()

In [0]:
df['AMT_INCOME_TOTAL'].max()

# Handling "XNA" in the CODE_GENDER Column

In [0]:
df['CODE_GENDER'].value_counts()

In [0]:
df['CODE_GENDER'] = df['CODE_GENDER'].str.replace('XNA','F')

In [0]:
df['CODE_GENDER'].value_counts()

# Handling "XNA" in the ORGANIZATION_TYPE Column

In [0]:
df['ORGANIZATION_TYPE'].value_counts()

In [0]:
df = df.loc[df['ORGANIZATION_TYPE'] != 'XNA' ]

In [0]:
df['ORGANIZATION_TYPE'].value_counts()

In [0]:
df

# Exploratory Data Analysis

# Split the Data by TARGET
* 0 means the client makes payments on time
* 1 means the client is a defaulter

In [0]:
df0 = df.loc[df['TARGET'] == 0 ]
df1 = df.loc[df['TARGET'] == 1 ]

In [0]:
df0

In [0]:
df1

# Exporting the Cleaned Data in CSV Format

In [0]:
df.to_csv(r'E:\CSV_Files\Project\Credit EDA Case Study\CleaningData.csv')

# Target Analysis

In [0]:
plt.figure(figsize=(25,10))
# Draw the plot first so the custom labels below are not replaced by seaborn's default axis labels
sns.countplot(data= df,x='TARGET')
plt.title('Defaulter VS Non-Defaulter',fontsize= 25)
plt.xlabel('Targets',fontsize= 15)
plt.ylabel('Count',fontsize= 15)
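
Alongside the count plot, the class balance can be expressed as proportions. The sketch below uses a made-up `TARGET` column with the same 0/1 coding; `share` and `imbalance_ratio` are hypothetical names chosen for this example.

```python
import pandas as pd

# Hypothetical labels: 0 = pays on time, 1 = defaulter
toy = pd.DataFrame({"TARGET": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]})

# Share of each class instead of raw counts
share = toy["TARGET"].value_counts(normalize=True)

# Number of non-defaulters per defaulter
imbalance_ratio = (toy["TARGET"] == 0).sum() / (toy["TARGET"] == 1).sum()

print(share)
print(imbalance_ratio)
```

Knowing the imbalance helps when reading the paired plots that follow, since raw counts for the two groups sit on very different scales.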

# Gender for Non-Defaulter VS Defaulter

In [0]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 10))
sns.countplot(data=df0,x='CODE_GENDER',ax=ax1)
sns.countplot(data=df1,x='CODE_GENDER',ax=ax2)
ax1.set_title('Gender for Non-Defaulter (0)',fontsize=25,fontweight='bold')
ax2.set_title('Gender for Defaulter (1)',fontsize=25,fontweight='bold')
ax1.set_xlabel('Gender',fontsize=15,fontweight='bold')
ax1.set_ylabel('Counts',fontsize=15,fontweight='bold')
ax2.set_xlabel('Gender',fontsize=15,fontweight='bold')
ax2.set_ylabel('Counts',fontsize=15,fontweight='bold')
ax1.grid(True)  # Enable grid lines
ax2.grid(True)  # Enable grid lines

plt.tight_layout()
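
The paired count plots show raw counts for each group. Because the two groups differ in size, it can also help to compare the default rate within each category. A minimal sketch on made-up data, reusing the dataset's column names `CODE_GENDER` and `TARGET`; the values themselves are hypothetical.

```python
import pandas as pd

# Hypothetical toy data using the same column names as the dataset
toy = pd.DataFrame({
    "CODE_GENDER": ["F", "F", "F", "F", "M", "M", "M", "M"],
    "TARGET":      [0,   0,   0,   1,   0,   0,   1,   1],
})

# The mean of a 0/1 target within each category is that category's default rate
default_rate = toy.groupby("CODE_GENDER")["TARGET"].mean()
print(default_rate)
```

The same `groupby(...)["TARGET"].mean()` pattern could be applied to any of the categorical columns plotted in this notebook.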

# Distribution of Income for Non-Defaulter VS Defaulter

In [0]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 10))
ax1.hist(df0['AMT_INCOME_TOTAL'])
ax2.hist(df1['AMT_INCOME_TOTAL'])
ax1.set_title('Distribution of Total Income for Non-Defaulter',fontsize=25,fontweight='bold')
ax2.set_title('Distribution of Total Income for Defaulter',fontsize=25,fontweight='bold')
ax1.set_xlabel('Total Income',fontsize=15,fontweight='bold')
ax1.set_ylabel('Counts',fontsize=15,fontweight='bold')
ax2.set_xlabel('Total Income',fontsize=15,fontweight='bold')
ax2.set_ylabel('Counts',fontsize=15,fontweight='bold')

plt.tight_layout()


# Loan Contract Type for Non-Defaulter VS Defaulter

In [0]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 10))
sns.countplot(data=df0,x='NAME_CONTRACT_TYPE',ax=ax1)
sns.countplot(data=df1,x='NAME_CONTRACT_TYPE',ax=ax2)
ax1.set_title('Loan Contract Types for Non-Defaulter (0)',fontsize=25,fontweight='bold')
ax2.set_title('Loan Contract Types for Defaulter (1)',fontsize=25,fontweight='bold')
ax1.set_xlabel('Name Contract Types',fontsize=15,fontweight='bold')
ax1.set_ylabel('Counts',fontsize=15,fontweight='bold')
ax2.set_xlabel('Name Contract Types',fontsize=15,fontweight='bold')
ax2.set_ylabel('Counts',fontsize=15,fontweight='bold')
ax1.grid(True)  # Enable grid lines
ax2.grid(True)  # Enable grid lines

plt.tight_layout()

# Gender Wise Income for Non-Defaulter VS Defaulter

In [0]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 10))
sns.barplot(data=df0,x='CODE_GENDER',y='AMT_INCOME_TOTAL',ax=ax1)
sns.barplot(data=df1,x='CODE_GENDER',y='AMT_INCOME_TOTAL',ax=ax2)
ax1.set_title('Gender wise Income for Non-Defaulter',fontsize=25,fontweight='bold')
ax2.set_title('Gender wise Income for Defaulter',fontsize=25,fontweight='bold')
ax1.set_xlabel('Gender',fontsize=15,fontweight='bold')
ax1.set_ylabel('Income',fontsize=15,fontweight='bold')
ax2.set_xlabel('Gender',fontsize=15,fontweight='bold')
ax2.set_ylabel('Income',fontsize=15,fontweight='bold')
ax1.grid(True)  # Enable grid lines
ax2.grid(True)  # Enable grid lines

plt.tight_layout()

# Organisation Type for Non-Defaulter VS Defaulter

In [0]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 20))
sns.countplot(data=df0,y='ORGANIZATION_TYPE',ax=ax1)
sns.countplot(data=df1,y='ORGANIZATION_TYPE',ax=ax2)
ax1.set_title('Organisation Types for Non-Defaulter (0)',fontsize=25,fontweight='bold')
ax2.set_title('Organisation Types for Defaulter (1)',fontsize=25,fontweight='bold')
ax1.set_xlabel('Counts',fontsize=15,fontweight='bold')
ax1.set_ylabel('ORGANIZATION_TYPE',fontsize=15,fontweight='bold')
ax2.set_xlabel('Counts',fontsize=15,fontweight='bold')
ax2.set_ylabel('ORGANIZATION_TYPE',fontsize=15,fontweight='bold')


plt.tight_layout()

In [0]:
df['AMT_CREDIT'].min()

In [0]:
df['AMT_CREDIT'].max()

# Distribution of Credit Amount for Non-Defaulter VS Defaulter

In [0]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 10))
ax1.hist(df0['AMT_CREDIT'])
ax2.hist(df1['AMT_CREDIT'])
ax1.set_title('Distribution of Credit Amount for Non-Defaulter',fontsize=25,fontweight='bold')
ax2.set_title('Distribution of Credit Amount for Defaulter',fontsize=25,fontweight='bold')
ax1.set_xlabel('Credit Amount',fontsize=15,fontweight='bold')
ax1.set_ylabel('Counts',fontsize=15,fontweight='bold')
ax2.set_xlabel('Credit Amount',fontsize=15,fontweight='bold')
ax2.set_ylabel('Counts',fontsize=15,fontweight='bold')


plt.tight_layout()
plt.show()


# Gender Wise Credit Amount for Non-Defaulter VS Defaulter

In [0]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 10))
sns.barplot(data=df0,x='CODE_GENDER',y='AMT_CREDIT',ax=ax1)
sns.barplot(data=df1,x='CODE_GENDER',y='AMT_CREDIT',ax=ax2)
ax1.set_title('Gender wise Credit Amount for Non-Defaulter',fontsize=25,fontweight='bold')
ax2.set_title('Gender wise Credit Amount for Defaulter',fontsize=25,fontweight='bold')
ax1.set_xlabel('Gender',fontsize=15,fontweight='bold')
ax1.set_ylabel('Credit Amount',fontsize=15,fontweight='bold')
ax2.set_xlabel('Gender',fontsize=15,fontweight='bold')
ax2.set_ylabel('Credit Amount',fontsize=15,fontweight='bold')


plt.tight_layout()

# Gender Wise Education Type for Non-Defaulter VS Defaulter

In [0]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 10))
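# hue splits each education type by gender, matching the "Gender wise" titles
sns.countplot(data=df0,x='NAME_EDUCATION_TYPE',hue='CODE_GENDER',ax=ax1)
sns.countplot(data=df1,x='NAME_EDUCATION_TYPE',hue='CODE_GENDER',ax=ax2)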
ax1.set_title('Gender wise Education Type for Non-Defaulter',fontsize=25,fontweight='bold')
ax2.set_title('Gender wise Education Type for Defaulter',fontsize=25,fontweight='bold')
ax1.set_xlabel('Education Types',fontsize=15,fontweight='bold')
ax1.set_ylabel('Counts',fontsize=15,fontweight='bold')
ax2.set_xlabel('Education Types',fontsize=15,fontweight='bold')
ax2.set_ylabel('Counts',fontsize=15,fontweight='bold')


plt.tight_layout()

# Family Status for Non-Defaulter VS Defaulter

In [0]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 10))
sns.countplot(data=df0,x='NAME_FAMILY_STATUS',ax=ax1)
sns.countplot(data=df1,x='NAME_FAMILY_STATUS',ax=ax2)
ax1.set_title('Distribution of Family Status for Non-Defaulter',fontsize=25,fontweight='bold')
ax2.set_title('Distribution of Family Status for Defaulter',fontsize=25,fontweight='bold')
ax1.set_xlabel('Name Family Status',fontsize=15,fontweight='bold')
ax1.set_ylabel('Counts',fontsize=15,fontweight='bold')
ax2.set_xlabel('Name Family Status',fontsize=15,fontweight='bold')
ax2.set_ylabel('Counts',fontsize=15,fontweight='bold')


plt.tight_layout()

# Name Housing Type for Non-Defaulter VS Defaulter

In [0]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 10))
sns.countplot(data=df0,x='NAME_HOUSING_TYPE',ax=ax1)
sns.countplot(data=df1,x='NAME_HOUSING_TYPE',ax=ax2)
ax1.set_title('Distribution of Housing Type for Non-Defaulter',fontsize=25,fontweight='bold')
ax2.set_title('Distribution of Housing Type for Defaulter',fontsize=25,fontweight='bold')
ax1.set_xlabel('Name Housing Type',fontsize=15,fontweight='bold')
ax1.set_ylabel('Counts',fontsize=15,fontweight='bold')
ax2.set_xlabel('Name Housing Type',fontsize=15,fontweight='bold')
ax2.set_ylabel('Counts',fontsize=15,fontweight='bold')


plt.tight_layout()

# Distribution of Annuity Amount for Non-Defaulter VS Defaulter

In [0]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 10))
ax1.hist(df0['AMT_ANNUITY'])
ax2.hist(df1['AMT_ANNUITY'])
ax1.set_title('Distribution of Annuity Amount for Non-Defaulter',fontsize=25,fontweight='bold')
ax2.set_title('Distribution of Annuity Amount for Defaulter',fontsize=25,fontweight='bold')
ax1.set_xlabel('Annuity Amount',fontsize=15,fontweight='bold')
ax1.set_ylabel('Counts',fontsize=15,fontweight='bold')
ax2.set_xlabel('Annuity Amount',fontsize=15,fontweight='bold')
ax2.set_ylabel('Counts',fontsize=15,fontweight='bold')


plt.tight_layout()
plt.show()


# Car Ownership for Non-Defaulter VS Defaulter

In [0]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 10))
sns.countplot(data=df0,x='FLAG_OWN_CAR',ax=ax1)
sns.countplot(data=df1,x='FLAG_OWN_CAR',ax=ax2)
ax1.set_title('Distribution of Car Ownership for Non-Defaulter',fontsize=25,fontweight='bold')
ax2.set_title('Distribution of Car Ownership for Defaulter',fontsize=25,fontweight='bold')
ax1.set_xlabel('Own Car Type',fontsize=15,fontweight='bold')
ax1.set_ylabel('Counts',fontsize=15,fontweight='bold')
ax2.set_xlabel('Own Car Type',fontsize=15,fontweight='bold')
ax2.set_ylabel('Counts',fontsize=15,fontweight='bold')


plt.tight_layout()

# Distribution of Number of Family Members for Non-Defaulter VS Defaulter

In [0]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 10))
sns.countplot(data=df0,x='CNT_FAM_MEMBERS',ax=ax1)
sns.countplot(data=df1,x='CNT_FAM_MEMBERS',ax=ax2)
ax1.set_title('Distribution of Number of Family Members for Non-Defaulter',fontsize=25,fontweight='bold')
ax2.set_title('Distribution of Number of Family Members for Defaulter',fontsize=25,fontweight='bold')
ax1.set_xlabel('Number of Family Members',fontsize=15,fontweight='bold')
ax1.set_ylabel('Counts',fontsize=15,fontweight='bold')
ax2.set_xlabel('Number of Family Members',fontsize=15,fontweight='bold')
ax2.set_ylabel('Counts',fontsize=15,fontweight='bold')


plt.tight_layout()

# Name Suite Type for Non-Defaulter VS Defaulter

In [0]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 10))
sns.countplot(data=df0,x='NAME_TYPE_SUITE',ax=ax1)
sns.countplot(data=df1,x='NAME_TYPE_SUITE',ax=ax2)
ax1.set_title('Distribution of Name Suite Type for Non-Defaulter',fontsize=25,fontweight='bold')
ax2.set_title('Distribution of Name Suite Type for Defaulter',fontsize=25,fontweight='bold')
ax1.set_xlabel('Name Suite Type',fontsize=15,fontweight='bold')
ax1.set_ylabel('Counts',fontsize=15,fontweight='bold')
ax2.set_xlabel('Name Suite Type',fontsize=15,fontweight='bold')
ax2.set_ylabel('Counts',fontsize=15,fontweight='bold')


plt.tight_layout()

# Marital Status for Non-Defaulter VS Defaulter

In [0]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 10))
sns.countplot(data=df0,x='NAME_FAMILY_STATUS',ax=ax1)
sns.countplot(data=df1,x='NAME_FAMILY_STATUS',ax=ax2)
ax1.set_title('Distribution of Marital Status for Non-Defaulter',fontsize=25,fontweight='bold')
ax2.set_title('Distribution of Marital Status for Defaulter',fontsize=25,fontweight='bold')
ax1.set_xlabel('Marital Status',fontsize=15,fontweight='bold')
ax1.set_ylabel('Counts',fontsize=15,fontweight='bold')
ax2.set_xlabel('Marital Status',fontsize=15,fontweight='bold')
ax2.set_ylabel('Counts',fontsize=15,fontweight='bold')


plt.tight_layout()

# Realty Ownership for Non-Defaulter VS Defaulter

In [0]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 10))
sns.countplot(data=df0,x='FLAG_OWN_REALTY',ax=ax1)
sns.countplot(data=df1,x='FLAG_OWN_REALTY',ax=ax2)
ax1.set_title('Distribution of Client Own Realty for Non-Defaulter',fontsize=25,fontweight='bold')
ax2.set_title('Distribution of Client Own Realty for Defaulter',fontsize=25,fontweight='bold')
ax1.set_xlabel('Client Own Realty',fontsize=15,fontweight='bold')
ax1.set_ylabel('Counts',fontsize=15,fontweight='bold')
ax2.set_xlabel('Client Own Realty',fontsize=15,fontweight='bold')
ax2.set_ylabel('Counts',fontsize=15,fontweight='bold')


plt.tight_layout()

# Number of Children for Non-Defaulter VS Defaulter

In [0]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 10))
sns.countplot(data=df0,x='CNT_CHILDREN',ax=ax1)
sns.countplot(data=df1,x='CNT_CHILDREN',ax=ax2)
ax1.set_title('Distribution of Number of Children for Non-Defaulter',fontsize=25,fontweight='bold')
ax2.set_title('Distribution of Number of Children for Defaulter',fontsize=25,fontweight='bold')
ax1.set_xlabel('Number of Children',fontsize=15,fontweight='bold')
ax1.set_ylabel('Counts',fontsize=15,fontweight='bold')
ax2.set_xlabel('Number of Children',fontsize=15,fontweight='bold')
ax2.set_ylabel('Counts',fontsize=15,fontweight='bold')


plt.tight_layout()

# Housing Type

In [0]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(25, 10))

sns.countplot(data=df0,x='NAME_HOUSING_TYPE',ax=ax1)
sns.countplot(data=df1,x='NAME_HOUSING_TYPE',ax=ax2)
ax1.set_title('Distribution of Housing Type for Non-Defaulter',fontsize=25,fontweight='bold')
ax2.set_title('Distribution of Housing Type for Defaulter',fontsize=25,fontweight='bold')
ax1.set_xlabel('Housing Type',fontsize=15,fontweight='bold')
ax1.set_ylabel('Counts',fontsize=15,fontweight='bold')
ax2.set_xlabel('Housing Type',fontsize=15,fontweight='bold')
ax2.set_ylabel('Counts',fontsize=15,fontweight='bold')


plt.tight_layout()