# Credit Score Classification

#### About Dataset

**Problem Statement**
- You are working as a data scientist at a global finance company. Over the years, the company has collected basic bank details and a large amount of credit-related information. Management wants an intelligent system that sorts people into credit score brackets to reduce manual effort.

**Task**
- Given a person’s credit-related information, build a machine learning model that can classify the credit score.

**Features Information**

- **ID -- Represents a unique identification of an entry**
- **Customer_ID -- Represents a unique identification of a person**
- **Month -- Represents the month of the year**
- **Name -- Represents the name of a person**
- **Age -- Represents the age of the person**
- **SSN -- Represents the social security number of a person**
- **Occupation -- Represents the occupation of the person**
- **Annual_Income -- Represents the annual income of the person**
- **Monthly_Inhand_Salary -- Represents the monthly base salary of a person**
- **Num_Bank_Accounts -- Represents the number of bank accounts a person holds**
- **Num_Credit_Card -- Represents the number of other credit cards held by a person**
- **Interest_Rate -- Represents the interest rate on credit card**
- **Num_of_Loan -- Represents the number of loans taken from the bank**
- **Type_of_Loan -- Represents the types of loan taken by a person**
- **Delay_from_due_date -- Represents the average number of days delayed from the payment date**
- **Num_of_Delayed_Payment -- Represents the average number of payments delayed by a person**
- **Changed_Credit_Limit -- Represents the percentage change in credit card limit**
- **Num_Credit_Inquiries -- Represents the number of credit card inquiries**
- **Credit_Mix -- Represents the classification of the mix of credits**
- **Outstanding_Debt -- Represents the remaining debt to be paid (in USD)**
- **Credit_Utilization_Ratio -- Represents the utilization ratio of credit card**
- **Credit_History_Age -- Represents the age of credit history of the person**
- **Payment_of_Min_Amount -- Represents whether only the minimum amount was paid by the person**
- **Total_EMI_per_month -- Represents the monthly EMI payments (in USD)**
- **Amount_invested_monthly -- Represents the monthly amount invested by the customer (in USD)**
- **Payment_Behaviour -- Represents the payment behavior of the customer (categorical, e.g. High_spent_Small_value_payments)**
- **Monthly_Balance -- Represents the monthly balance amount of the customer (in USD)**
- **Credit_Score -- Represents the bracket of credit score (Poor, Standard, Good)**

## Import libraries


In [31]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

## Load Dataset

In [32]:
pd.set_option('display.max_columns', None)

In [33]:
df = pd.read_csv('./dataset.csv')
df.head()

Unnamed: 0,ID,Customer_ID,Month,Name,Age,SSN,Occupation,Annual_Income,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Type_of_Loan,Delay_from_due_date,Num_of_Delayed_Payment,Changed_Credit_Limit,Num_Credit_Inquiries,Credit_Mix,Outstanding_Debt,Credit_Utilization_Ratio,Credit_History_Age,Payment_of_Min_Amount,Total_EMI_per_month,Amount_invested_monthly,Payment_Behaviour,Monthly_Balance,Credit_Score
0,0x1602,CUS_0xd40,January,Aaron Maashoh,23,821-00-0265,Scientist,19114.12,1824.843333,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,7.0,11.27,4.0,_,809.98,26.82262,22 Years and 1 Months,No,49.574949,80.41529543900253,High_spent_Small_value_payments,312.49408867943663,Good
1,0x1603,CUS_0xd40,February,Aaron Maashoh,23,821-00-0265,Scientist,19114.12,,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",-1,,11.27,4.0,Good,809.98,31.94496,,No,49.574949,118.28022162236736,Low_spent_Large_value_payments,284.62916249607184,Good
2,0x1604,CUS_0xd40,March,Aaron Maashoh,-500,821-00-0265,Scientist,19114.12,,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,7.0,_,4.0,Good,809.98,28.609352,22 Years and 3 Months,No,49.574949,81.699521264648,Low_spent_Medium_value_payments,331.2098628537912,Good
3,0x1605,CUS_0xd40,April,Aaron Maashoh,23,821-00-0265,Scientist,19114.12,,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",5,4.0,6.27,4.0,Good,809.98,31.377862,22 Years and 4 Months,No,49.574949,199.4580743910713,Low_spent_Small_value_payments,223.45130972736783,Good
4,0x1606,CUS_0xd40,May,Aaron Maashoh,23,821-00-0265,Scientist,19114.12,1824.843333,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",6,,11.27,4.0,Good,809.98,24.797347,22 Years and 5 Months,No,49.574949,41.420153086217326,High_spent_Medium_value_payments,341.48923103222177,Good


-------

## Data Preprocessing

### Data Errors Handling

- Missing values occur in Name, Monthly_Inhand_Salary, Type_of_Loan, Num_of_Delayed_Payment, Num_Credit_Inquiries, Credit_History_Age, Amount_invested_monthly, and Monthly_Balance
- Convert Age, Num_of_Loan, and Num_of_Delayed_Payment to integer
- Convert Annual_Income, Changed_Credit_Limit, Outstanding_Debt, Amount_invested_monthly, and Monthly_Balance to float
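The missing-value observation above can be checked per column. The following is a minimal sketch, not the notebook's own code: `missing_summary` is a hypothetical helper, and the toy frame only stands in for `df` with illustrative values.

```python
import numpy as np
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values per column, largest first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only the columns that actually contain gaps.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Toy frame standing in for the real dataset (values are illustrative only).
toy = pd.DataFrame({
    "Name": ["Aaron", None, "Annk", None],
    "Monthly_Inhand_Salary": [1824.84, np.nan, np.nan, np.nan],
    "Occupation": ["Scientist", "Scientist", "Lawyer", "Lawyer"],
})
report = missing_summary(toy)
print(report)
```

On the real data the call would be `missing_summary(df)`. Note that placeholder strings such as `_` are not counted as missing until they are converted.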

Convert the listed columns to nullable integer (Int64). Values that cannot be parsed as numbers become missing.

In [34]:
into_int = ['Age', 'Num_of_Loan', 'Num_of_Delayed_Payment', 'Num_Bank_Accounts','Num_Credit_Card','Interest_Rate','Delay_from_due_date']

df[into_int] = df[into_int].apply(pd.to_numeric, errors='coerce').astype('Int64')
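With `errors='coerce'`, every entry that is not a clean number becomes missing, including entries that might only carry a stray underscore. The sketch below shows one possible way to strip such characters before converting. It assumes the raw file contains values of that shape (the sample `'28_'` is illustrative, not taken from the dataset), and `to_nullable_int` is a hypothetical helper rather than part of this notebook.

```python
import pandas as pd

def to_nullable_int(series: pd.Series) -> pd.Series:
    """Strip stray underscores/spaces, then convert to pandas' nullable Int64."""
    # Work on string representations so mixed-type columns are handled uniformly.
    cleaned = series.astype(str).str.strip("_ ")
    # Anything still unparseable (empty string, 'None', 'nan') becomes NaN.
    numeric = pd.to_numeric(cleaned, errors="coerce")
    # Round first so float values such as 7.0 cast safely to Int64.
    return numeric.round().astype("Int64")

# Illustrative raw values: clean, trailing underscore, negative, placeholder, missing.
raw = pd.Series(["23", "28_", "-500", "_", None, "7.0"])
result = to_nullable_int(raw)
print(result)
```

Applied to the notebook's columns, this would look like `df[into_int] = df[into_int].apply(to_nullable_int)`. Whether stripping is appropriate depends on what the underscores mean in the source data.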

Convert the listed object-dtype columns to float. Values that cannot be parsed as numbers become NaN.

In [35]:
into_float = ['Annual_Income', 'Changed_Credit_Limit', 'Outstanding_Debt', 'Amount_invested_monthly', 'Monthly_Balance']

df[into_float] = df[into_float].apply(pd.to_numeric, errors='coerce')

### Summary of Dataset

In [36]:
df.info()
df.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 28 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   ID                        100000 non-null  object 
 1   Customer_ID               100000 non-null  object 
 2   Month                     100000 non-null  object 
 3   Name                      90015 non-null   object 
 4   Age                       95061 non-null   Int64  
 5   SSN                       100000 non-null  object 
 6   Occupation                100000 non-null  object 
 7   Annual_Income             93020 non-null   float64
 8   Monthly_Inhand_Salary     84998 non-null   float64
 9   Num_Bank_Accounts         100000 non-null  Int64  
 10  Num_Credit_Card           100000 non-null  Int64  
 11  Interest_Rate             100000 non-null  Int64  
 12  Num_of_Loan               95215 non-null   Int64  
 13  Type_of_Loan              88592 non-null   ob

Unnamed: 0,Age,Annual_Income,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Delay_from_due_date,Num_of_Delayed_Payment,Changed_Credit_Limit,Num_Credit_Inquiries,Outstanding_Debt,Credit_Utilization_Ratio,Total_EMI_per_month,Amount_invested_monthly,Monthly_Balance
count,95061.0,93020.0,84998.0,100000.0,100000.0,100000.0,95215.0,100000.0,90254.0,97909.0,98035.0,98991.0,100000.0,100000.0,91216.0,98791.0
mean,110.934505,178579.0,4194.17085,17.09128,22.47443,72.46604,2.780339,21.06878,31.033051,10.389025,27.754251,1426.5037,32.285173,1403.118217,195.539456,402.551258
std,689.407864,1442878.0,3183.686167,117.404834,129.05741,466.422621,62.50094,14.860104,226.955758,6.789496,193.177339,1155.045753,5.116875,8306.04127,199.564527,213.925499
min,-500.0,7005.93,303.645417,-1.0,0.0,1.0,-100.0,-5.0,-3.0,-6.49,0.0,0.23,20.0,0.0,0.0,0.00776
25%,24.0,19435.6,1625.568229,3.0,4.0,8.0,1.0,10.0,9.0,5.32,3.0,566.08,28.052567,30.30666,72.236692,270.10663
50%,33.0,37550.74,3093.745,6.0,5.0,13.0,3.0,18.0,14.0,9.4,6.0,1166.37,32.305784,69.249473,128.954538,336.731225
75%,42.0,72843.38,5957.448333,7.0,7.0,20.0,5.0,28.0,18.0,14.87,9.0,1948.2,36.496663,161.224249,236.815814,470.262938
max,8698.0,24198060.0,15204.633333,1798.0,1499.0,5797.0,1496.0,67.0,4397.0,36.97,2597.0,4998.07,50.0,82331.0,1977.326102,1602.040519


- The dataset has 100,000 rows (1 lakh) and 28 columns
- After conversion there are 7 integer (Int64), 9 float, and 12 object columns
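The dtype tally above can be reproduced directly instead of counting by hand. A minimal sketch on a toy frame (the column values are illustrative only):

```python
import pandas as pd

# Toy frame with one nullable-integer, one float, and two object columns.
toy = pd.DataFrame({
    "Age": pd.array([23, None], dtype="Int64"),
    "Annual_Income": [19114.12, 34847.84],
    "Name": pd.Series(["Aaron Maashoh", "Annk"], dtype=object),
    "Occupation": pd.Series(["Scientist", "Media_Manager"], dtype=object),
})

# Count how many columns share each dtype.
dtype_counts = toy.dtypes.astype(str).value_counts()
print(dtype_counts)
```

On the real data, `df.dtypes.astype(str).value_counts()` gives the same kind of summary.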

In [75]:
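# Numeric columns only (Int64 and float64)
num = df.select_dtypes(include=['number']).columns

# For each numeric column, print its cardinality, value counts, and unique values
for i in num:
    print(f'{i}: {df[i].nunique()}')
    print(f'Values Counts of: {df[i].value_counts()}')
    print(f'Unique values: {df[i].unique()}')

    print('=' * 70)
    print('\n')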

Age: 1661
Values Counts of: Age
38      2833
28      2829
31      2806
26      2792
32      2749
        ... 
5741       1
7178       1
5621       1
1908       1
1342       1
Name: count, Length: 1661, dtype: Int64
Unique values: <IntegerArray>
[  23, -500, <NA>,   28,   34,   54,   55,   21,   31,   33,
 ...
 6135,  920, 4402, 8490, 2406, 8315, 8425, 6476, 2263, 1342]
Length: 1662, dtype: Int64


Annual_Income: 13437
Values Counts of: Annual_Income
17273.83       16
20867.67       16
36585.12       16
95596.35       15
9141.63        15
               ..
14855994.00     1
21725043.00     1
16237903.00     1
4894060.00      1
12029909.00     1
Name: count, Length: 13437, dtype: int64
Unique values: [19114.12 34847.84      nan ... 37188.1  20002.88 39628.99]


Monthly_Inhand_Salary: 13235
Values Counts of: Monthly_Inhand_Salary
6769.130000    15
6358.956667    15
2295.058333    15
6082.187500    15
3080.555000    14
               ..
1087.546445     1
3189.212103     1
5640.117744     1

In [79]:
df[(df['Age'] < 18) | (df['Age'] > 80)]

Unnamed: 0,ID,Customer_ID,Month,Name,Age,SSN,Occupation,Annual_Income,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Type_of_Loan,Delay_from_due_date,Num_of_Delayed_Payment,Changed_Credit_Limit,Num_Credit_Inquiries,Credit_Mix,Outstanding_Debt,Credit_Utilization_Ratio,Credit_History_Age,Payment_of_Min_Amount,Total_EMI_per_month,Amount_invested_monthly,Payment_Behaviour,Monthly_Balance,Credit_Score
2,0x1604,CUS_0xd40,March,Aaron Maashoh,-500,821-00-0265,Scientist,19114.12,,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,7,,4.0,Good,809.98,28.609352,22 Years and 3 Months,No,49.574949,81.699521,Low_spent_Medium_value_payments,331.209863,Good
56,0x1656,CUS_0x5407,January,Annk,7580,500-92-6408,Media_Manager,,,8,7,15,3,"Not Specified, Auto Loan, and Student Loan",30,11,17.13,5.0,Standard,1704.18,24.448063,,NM,70.478333,162.441009,Low_spent_Large_value_payments,298.192158,Poor
113,0x16ab,CUS_0xff4,February,,-500,655-05-7666,Entrepreneur,25546.26,,8,7,14,5,"Not Specified, Student Loan, Student Loan, Cre...",16,13,7.83,,Standard,758.44,29.711376,18 Years and 3 Months,Yes,101.328637,300.323232,Low_spent_Small_value_payments,129.933631,Standard
122,0x16b8,CUS_0x33d2,March,Chalmersa,181,965-46-2491,Scientist,31993.78,2942.148333,6,6,7,2,"Payday Loan, and Home Equity Loan",8,14,10.28,1.0,Standard,818.22,27.380109,17 Years and 0 Months,Yes,45.141298,264.257089,Low_spent_Small_value_payments,274.816447,Standard
219,0x1749,CUS_0x3edc,April,Williamso,995,663-16-3845,Accountant,43070.24,3622.186667,3,3,18,1,Debt Consolidation Loan,11,8,8.97,4.0,Standard,1233.10,24.331772,19 Years and 5 Months,Yes,30.576085,74.920375,High_spent_Medium_value_payments,506.722207,Standard
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99913,0x25f6f,CUS_0x1619,February,Phil Wahbao,2263,683-59-7399,Media_Manager,20059.98,1523.665000,8,5,12,4,"Home Equity Loan, Payday Loan, Not Specified, ...",25,11,2.97,3.0,Good,909.01,25.982154,16 Years and 4 Months,No,45.076827,74.713580,High_spent_Small_value_payments,292.576093,Poor
99937,0x25f93,CUS_0xad4f,February,Sabina Zawadzkig,-500,226-45-0652,_______,22620.79,1722.065833,7,3,9,,,25,,5.31,2.0,Standard,642.46,31.841872,,No,0.000000,105.076293,Low_spent_Large_value_payments,337.130290,Standard
99950,0x25fa4,CUS_0x51b3,July,Ryana,1342,837-85-9800,Media_Manager,59146.36,4908.863333,3,6,6,1,Personal Loan,8,6,6.68,5.0,_,418.03,38.199635,20 Years and 7 Months,No,26.778419,502.376320,Low_spent_Small_value_payments,251.731594,Standard
99963,0x25fb9,CUS_0x372c,April,Lucia Mutikanik,-500,340-85-7301,Lawyer,42903.79,,0,4,6,1,Not Specified,14,0,5.10,1.0,Good,1079.48,30.625298,,No,34.975457,31.193919,High_spent_Large_value_payments,520.662207,Standard
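The rows above show implausible ages (for example -500 or 7580) mixed in with plausible ones for the same Customer_ID. One possible repair, sketched below, treats ages outside the 18 to 80 range used in the filter as missing and fills them with the customer's median valid age. This is an assumption about how the column could be cleaned, not a step the notebook performs; `repair_age` and the toy frame are hypothetical.

```python
import pandas as pd

def repair_age(frame: pd.DataFrame, low: int = 18, high: int = 80) -> pd.DataFrame:
    """Replace out-of-range ages with the customer's median valid age."""
    out = frame.copy()
    # Work in plain floats so missing values are ordinary NaN.
    age = out["Age"].astype("float64")
    # Ages outside [low, high] (and NaN) fail the check and become NaN.
    age = age.where(age.between(low, high))
    # Median of the remaining valid ages for each customer; NaN if none are valid.
    typical = age.groupby(out["Customer_ID"]).transform("median")
    age = age.fillna(typical)
    # Back to nullable integers; customers with no valid age stay missing.
    out["Age"] = age.round().astype("Int64")
    return out

# Toy frame: customer C has no valid age at all.
toy = pd.DataFrame({
    "Customer_ID": ["A", "A", "A", "B", "B", "B", "C"],
    "Age": pd.array([23, -500, 23, 7580, 30, 30, -500], dtype="Int64"),
})
fixed = repair_age(toy)
print(fixed)
```

A median that lands on a half value (for instance when a customer has a birthday during the year) is rounded to the nearest even integer by `round()`, so the per-customer mode may be a better choice in that case.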
