## Problem Statement: 

Most of a bank's wealth typically comes from extending credit, so a bank must keep the risk of non-performing loans low. One way to reduce that risk is to study patterns in existing lending data. Data mining techniques, and classification in particular, make it possible to uncover hidden information in large data sets.


### The goal of this project is to build a model that predicts whether a person, described by the attributes of the dataset, is a good (1) or a bad (0) credit risk
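The records displayed further down store the target as the strings `good` and `bad`, so the 1/0 coding in the goal needs an explicit mapping. The following is a minimal sketch, assuming the target column is named `credit_risk` as in the data dictionary; the toy frame `toy` and the column `credit_risk_encoded` are hypothetical names used only for illustration.

```python
import pandas as pd

# Toy stand-in for the real data; only the target column is reproduced here.
toy = pd.DataFrame({"credit_risk": ["good", "bad", "good"]})

# Coding taken from the project goal: good -> 1, bad -> 0.
risk_codes = {"good": 1, "bad": 0}
toy["credit_risk_encoded"] = toy["credit_risk"].map(risk_codes)

# Labels missing from the mapping become NaN, so fail loudly if one appears.
assert toy["credit_risk_encoded"].notna().all()
print(toy)
```

Keeping the original string column alongside the encoded one makes it easy to check the mapping before the strings are dropped.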

In [3]:
import pandas as pd
import seaborn as sns
import numpy as np
from statistics import mean
import matplotlib.pyplot as plt
import warnings

In [2]:
## load the data as pandas dataframe

df = pd.read_csv("dataset.csv")

In [4]:
## Show top 5 records

df.head()

|  | status | duration | credit_history | purpose | amount | savings | employment_duration | installment_rate | personal_status_sex | other_debtors | ... | property | age | other_installment_plans | housing | number_credits | job | people_liable | telephone | foreign_worker | credit_risk |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | no checking account | 18 | all credits at this bank paid back duly | car (used) | 1049 | unknown/no savings account | < 1 yr | < 20 | female : non-single or male : single | none | ... | car or other | 21 | none | for free | 1 | skilled employee/official | 0 to 2 | no | no | good |
| 1 | no checking account | 9 | all credits at this bank paid back duly | others | 2799 | unknown/no savings account | 1 <= ... < 4 yrs | 25 <= ... < 35 | male : married/widowed | none | ... | unknown / no property | 36 | none | for free | 3-Feb | skilled employee/official | 3 or more | no | no | good |
| 2 | ... < 0 DM | 12 | no credits taken/all credits paid back duly | retraining | 841 | ... < 100 DM | 4 <= ... < 7 yrs | 25 <= ... < 35 | female : non-single or male : single | none | ... | unknown / no property | 23 | none | for free | 1 | unskilled - resident | 0 to 2 | no | no | good |
| 3 | no checking account | 12 | all credits at this bank paid back duly | others | 2122 | unknown/no savings account | 1 <= ... < 4 yrs | 20 <= ... < 25 | male : married/widowed | none | ... | unknown / no property | 39 | none | for free | 3-Feb | unskilled - resident | 3 or more | no | yes | good |
| 4 | no checking account | 12 | all credits at this bank paid back duly | others | 2171 | unknown/no savings account | 1 <= ... < 4 yrs | < 20 | male : married/widowed | none | ... | car or other | 38 | bank | rent | 3-Feb | unskilled - resident | 0 to 2 | no | yes | good |


In [5]:
## Show bottom 5 records
df.tail()

|  | status | duration | credit_history | purpose | amount | savings | employment_duration | installment_rate | personal_status_sex | other_debtors | ... | property | age | other_installment_plans | housing | number_credits | job | people_liable | telephone | foreign_worker | credit_risk |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 995 | no checking account | 24 | no credits taken/all credits paid back duly | furniture/equipment | 1987 | unknown/no savings account | 1 <= ... < 4 yrs | 25 <= ... < 35 | male : married/widowed | none | ... | unknown / no property | 21 | none | for free | 1 | unskilled - resident | 3 or more | no | no | bad |
| 996 | no checking account | 24 | no credits taken/all credits paid back duly | others | 2303 | unknown/no savings account | >= 7 yrs | < 20 | male : married/widowed | co-applicant | ... | unknown / no property | 45 | none | rent | 1 | skilled employee/official | 0 to 2 | no | no | bad |
| 997 | ... >= 200 DM / salary for at least 1 year | 21 | all credits at this bank paid back duly | others | 12680 | ... >= 1000 DM | >= 7 yrs | < 20 | male : married/widowed | none | ... | real estate | 30 | none | own | 1 | manager/self-empl./highly qualif. employee | 0 to 2 | yes (under customer name) | no | bad |
| 998 | ... < 0 DM | 12 | no credits taken/all credits paid back duly | furniture/equipment | 6468 | ... >= 1000 DM | unemployed | 25 <= ... < 35 | male : married/widowed | none | ... | real estate | 52 | none | rent | 1 | manager/self-empl./highly qualif. employee | 0 to 2 | yes (under customer name) | no | bad |
| 999 | no checking account | 30 | no credits taken/all credits paid back duly | car (used) | 6350 | ... >= 1000 DM | >= 7 yrs | < 20 | male : married/widowed | none | ... | car or other | 31 | none | rent | 1 | skilled employee/official | 0 to 2 | no | no | bad |


In [6]:
## Shape of the data

df.shape
print('The Number of rows of the dataframe is:', df.shape[0], '.')
print('The Number of columns of the dataframe is:', df.shape[1], '.')

The Number of rows of the dataframe is: 1000 .
The Number of columns of the dataframe is: 21 .


### Data Dictionary:

- `status`: status of the debtor's checking account with the bank (categorical)
- `duration`: credit duration in months (quantitative)
- `credit_history`: history of compliance with previous or concurrent credit contracts (categorical)
- `purpose`: purpose for which the credit is needed (categorical)
- `amount`: credit amount in DM (quantitative)
- `savings`: debtor's savings (categorical)
- `employment_duration`: duration of debtor's employment with current employer (ordinal; discretized quantitative)
- `installment_rate`: credit installments as a percentage of debtor's disposable income (ordinal; discretized quantitative)
- `personal_status_sex`: combined information on sex and marital status
- `other_debtors`: Is there another debtor or a guarantor for the credit? (categorical)
- `present_residence`: length of time (in years) the debtor has lived in the present residence (ordinal; discretized quantitative)
- `property`: the debtor's most valuable property
- `age`: age in years (quantitative)
- `other_installment_plans`: installment plans from providers other than the credit-giving bank (categorical)
- `housing`: type of housing the debtor lives in (categorical)
- `number_credits`: number of credits, including the current one, that the debtor has (or had) at this bank
- `job`: quality of debtor's job (ordinal)
- `people_liable`: number of persons who financially depend on the debtor (i.e., are entitled to maintenance)
- `telephone`: Is there a telephone landline registered in the debtor's name?
- `foreign_worker`: Is the debtor a foreign worker? (binary)
- `credit_risk`: Has the credit contract been complied with (good) or not (bad)?
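
Several attributes in the dictionary are described as ordinal, but pandas reads them as plain strings with no order. The following is a minimal sketch of how one of them, `employment_duration`, could be turned into an ordered categorical. The labels are the ones visible in the records above; their ordering from shortest to longest tenure is an assumption, and `toy`, `tenure_order`, and `employment_rank` are hypothetical names.

```python
import pandas as pd

# Toy sample using employment_duration labels that appear in the displayed records.
toy = pd.DataFrame(
    {"employment_duration": ["< 1 yr", ">= 7 yrs", "unemployed", "1 <= ... < 4 yrs"]}
)

# Assumed ordering from shortest to longest tenure (not stated in the source).
tenure_order = ["unemployed", "< 1 yr", "1 <= ... < 4 yrs", "4 <= ... < 7 yrs", ">= 7 yrs"]

tenure_dtype = pd.CategoricalDtype(categories=tenure_order, ordered=True)
toy["employment_duration"] = toy["employment_duration"].astype(tenure_dtype)

# Ordered categoricals expose integer codes that follow the declared order.
toy["employment_rank"] = toy["employment_duration"].cat.codes
print(toy.sort_values("employment_duration"))
```

With an ordered dtype, sorting and comparisons follow the declared order rather than alphabetical order, which matters for any model that treats the codes as ranks.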



In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 21 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   status                   1000 non-null   object
 1   duration                 1000 non-null   int64 
 2   credit_history           1000 non-null   object
 3   purpose                  1000 non-null   object
 4   amount                   1000 non-null   int64 
 5   savings                  1000 non-null   object
 6   employment_duration      1000 non-null   object
 7   installment_rate         1000 non-null   object
 8   personal_status_sex      1000 non-null   object
 9   other_debtors            1000 non-null   object
 10  present_residence        1000 non-null   object
 11  property                 1000 non-null   object
 12  age                      1000 non-null   int64 
 13  other_installment_plans  1000 non-null   object
 14  housing                  1000 non-null   

In [8]:
df.dtypes.value_counts()

object    18
int64      3
dtype: int64
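
Because 18 of the 21 columns are non-numeric, it can help to split the column names by dtype before preprocessing. The following is a minimal sketch on a toy frame that reuses a few values from the first three records; `toy`, `numeric_cols`, and `categorical_cols` are hypothetical names.

```python
import pandas as pd

# Toy frame mirroring four of the real columns (values from the first three records).
toy = pd.DataFrame(
    {
        "duration": [18, 9, 12],
        "amount": [1049, 2799, 841],
        "housing": ["for free", "for free", "for free"],
        "credit_risk": ["good", "good", "good"],
    }
)

# Split the column names by dtype so each group can be preprocessed differently.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

Selecting the non-numeric group with `exclude="number"` avoids depending on whether a given pandas version stores strings as `object` or as a dedicated string dtype.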

In [10]:
df.describe().T.round(2)

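|  | count | mean | std | min | 25% | 50% | 75% | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| duration | 1000.0 | 20.9 | 12.06 | 4.0 | 12.0 | 18.0 | 24.0 | 72.0 |
| amount | 1000.0 | 3271.25 | 2822.75 | 250.0 | 1365.5 | 2319.5 | 3972.25 | 18424.0 |
| age | 1000.0 | 35.54 | 11.35 | 19.0 | 27.0 | 33.0 | 42.0 | 75.0 |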
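
In the summary, the mean of `amount` (3271.25) lies well above its median (2319.5), which suggests a right-skewed distribution. The following is a minimal sketch of how that could be checked directly. It uses only the ten `amount` values visible in the head and tail outputs, so the numbers it prints describe that small sample, not the full dataset; the names `amount_sample` and `summary` are hypothetical.

```python
import pandas as pd

# The ten amount values shown in the head() and tail() outputs, not the full column.
amount_sample = pd.Series(
    [1049, 2799, 841, 2122, 2171, 1987, 2303, 12680, 6468, 6350], name="amount"
)

# A mean above the median and a positive skewness both point to a long right tail.
summary = {
    "mean": amount_sample.mean(),
    "median": amount_sample.median(),
    "skew": amount_sample.skew(),
}
print(summary)
```

On the real data the same three calls can be applied to `df["amount"]`.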


### Insights:

