# **Credit Score Classification with Machine Learning**

Banks and credit card companies use credit scores to assess a person's creditworthiness and to decide whether to approve loan and credit card applications. Many of them now use Machine Learning algorithms to classify their large customer bases automatically, based on credit history. If you want to learn how to apply Machine Learning to credit score classification, this article is for you. We'll walk through the task of credit score classification with Machine Learning using Python.

# **Credit Score Classification**

There are three credit scores that banks and credit card companies use to label their customers:

1. Good
2. Standard
3. Poor

A person with a good credit score is more likely to be approved for loans by banks and other financial institutions. For the task of Credit Score Classification, we need a labelled dataset in which each record carries one of these three credit scores.

In the section below, I will take you through the task of credit score classification with Machine Learning using Python.

# **Credit Score Classification: Case Study**
The credit score of a person determines the creditworthiness of the person. It helps financial companies determine if you can repay the loan or credit you are applying for.

The case study uses a labelled dataset built for credit score classification. Below are all the features in the dataset:

1. ID: Unique ID of the record
2. Customer_ID: Unique ID of the customer
3. Month: Month of the year
4. Name: The name of the person
5. Age: The age of the person
6. SSN: Social Security Number of the person
7. Occupation: The occupation of the person
8. Annual_Income: The Annual Income of the person
9. Monthly_Inhand_Salary: Monthly in-hand salary of the person
10. Num_Bank_Accounts: The number of bank accounts of the person
11. Num_Credit_Card: The number of credit cards the person has
12. Interest_Rate: The interest rate on the credit card of the person
13. Num_of_Loan: The number of loans taken by the person from the bank
14. Type_of_Loan: The types of loans taken by the person from the bank
15. Delay_from_due_date: The average number of days delayed by the person from the date of payment
16. Num_of_Delayed_Payment: Number of payments delayed by the person
17. Changed_Credit_Limit: The percentage change in the credit card limit of the person
18. Num_Credit_Inquiries: The number of credit card inquiries by the person
19. Credit_Mix: Classification of Credit Mix of the customer
20. Outstanding_Debt: The outstanding balance of the person
21. Credit_Utilization_Ratio: The credit utilization ratio of the credit card of the customer
22. Credit_History_Age: The age of the credit history of the person
23. Payment_of_Min_Amount: Yes if the person paid only the minimum amount due, otherwise No
24. Total_EMI_per_month: The total EMI per month of the person
25. Amount_invested_monthly: The monthly amount invested by the person
26. Payment_Behaviour: The payment behaviour of the person
27. Monthly_Balance: The monthly balance left in the account of the person
28. Credit_Score: The credit score of the person

The Credit_Score column is the target variable in this problem. The task is to explore how the other features relate to the credit score a bank assigns, and then to train a model that classifies the credit score of a person.
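As a minimal sketch of what "target variable" means in practice, the snippet below separates a Credit_Score column from the remaining feature columns. The `toy` frame and its values are made up for illustration, and the `Poor < Standard < Good` ordering used for the integer codes is an assumption, not something the dataset prescribes.

```python
import pandas as pd

# Toy stand-in for the real dataset (values are invented for illustration)
toy = pd.DataFrame({
    "Annual_Income": [19114.12, 34847.84, 143162.64],
    "Outstanding_Debt": [809.98, 605.03, 1303.01],
    "Credit_Score": ["Good", "Standard", "Poor"],
})

# Separate the target column from the feature columns
X = toy.drop(columns=["Credit_Score"])
y = toy["Credit_Score"]

# Optional: map the three labels to ordered integer codes
# (the ordering Poor < Standard < Good is an assumption)
label_order = ["Poor", "Standard", "Good"]
y_codes = pd.Categorical(y, categories=label_order, ordered=True).codes

print(X.columns.tolist())
print(list(y_codes))
```

Many classifiers accept string labels directly, so the integer codes are only needed when a library or a plot expects numeric classes.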

In [4]:
# Import required libraries
import pandas as pd  # For data manipulation and analysis
import numpy as np  # For numerical operations
import plotly.express as px  # For creating interactive data visualizations
import plotly.graph_objects as go  # For creating and customizing Plotly figures
import plotly.io as pio  # For configuring Plotly's default settings

# Set the default Plotly template to a clean, white background
pio.templates.default = "plotly_white"

# Read the credit score dataset from the specified file path
data = pd.read_csv("/content/drive/MyDrive/Dataset/credit-score.csv")

# Print the first few rows of the dataset to get a glimpse of the data
print(data.head())

     ID  Customer_ID  Month          Name  Age        SSN Occupation  \
0  5634         3392      1  Cloud Strife   23  821000265  Scientist   
1  5635         3392      2  Cloud Strife   23  821000265  Scientist   
2  5636         3392      3  Cloud Strife   23  821000265  Scientist   
3  5637         3392      4  Cloud Strife   23  821000265  Scientist   
4  5638         3392      5  Cloud Strife   23  821000265  Scientist   

   Annual_Income  Monthly_Inhand_Salary  Num_Bank_Accounts  ...  Credit_Mix  \
0       19114.12            1824.843333                  3  ...        Good   
1       19114.12            1824.843333                  3  ...        Good   
2       19114.12            1824.843333                  3  ...        Good   
3       19114.12            1824.843333                  3  ...        Good   
4       19114.12            1824.843333                  3  ...        Good   

   Outstanding_Debt  Credit_Utilization_Ratio Credit_History_Age  \
0            809.98     
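With 28 columns, pandas elides the middle of the table with `...` when printing. One way to see every column, sketched here on a hypothetical wide frame named `wide` rather than on the real data, is to lift the display limit temporarily with `pd.option_context`:

```python
import pandas as pd

# Hypothetical wide frame standing in for the 28-column dataset
wide = pd.DataFrame({f"col_{i}": [i, i + 1] for i in range(30)})

# Temporarily remove the column limit so no columns are elided
with pd.option_context("display.max_columns", None, "display.width", 200):
    shown = repr(wide.head())
    print(shown)
```

The option change only applies inside the `with` block, so later cells keep the default display settings.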

Let’s have a look at the information about the columns in the dataset:

In [5]:
# Show column names, data types, and non-null counts.
# DataFrame.info() prints its summary itself and returns None,
# so it is called directly rather than wrapped in print().
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 28 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   ID                        100000 non-null  int64  
 1   Customer_ID               100000 non-null  int64  
 2   Month                     100000 non-null  int64  
 3   Name                      100000 non-null  object 
 4   Age                       100000 non-null  int64  
 5   SSN                       100000 non-null  int64  
 6   Occupation                100000 non-null  object 
 7   Annual_Income             100000 non-null  float64
 8   Monthly_Inhand_Salary     100000 non-null  float64
 9   Num_Bank_Accounts         100000 non-null  int64  
 10  Num_Credit_Card           100000 non-null  int64  
 11  Interest_Rate             100000 non-null  int64  
 12  Num_of_Loan               100000 non-null  int64  
 13  Type_of_Loan              100000 non-null  ob
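The summary mixes numeric columns with text columns. A short sketch of splitting columns by type with `select_dtypes` follows; the `toy` frame is a made-up stand-in, and treating every non-numeric column as categorical is a simplifying assumption.

```python
import pandas as pd

# Toy frame with a few columns named after the dataset's features
toy = pd.DataFrame({
    "Age": [23, 28],
    "Annual_Income": [19114.12, 34847.84],
    "Occupation": ["Scientist", "Teacher"],
    "Credit_Mix": ["Good", "Standard"],
})

# Numeric columns (integers and floats)
numeric_cols = toy.select_dtypes(include="number").columns.tolist()

# Everything else is treated as categorical here (an assumption)
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(categorical_cols)
```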

Before going further, let's check whether the dataset contains any null values:

In [7]:
# Check for the presence of null values in each column of the dataset
print(data.isnull().sum())

ID                          0
Customer_ID                 0
Month                       0
Name                        0
Age                         0
SSN                         0
Occupation                  0
Annual_Income               0
Monthly_Inhand_Salary       0
Num_Bank_Accounts           0
Num_Credit_Card             0
Interest_Rate               0
Num_of_Loan                 0
Type_of_Loan                0
Delay_from_due_date         0
Num_of_Delayed_Payment      0
Changed_Credit_Limit        0
Num_Credit_Inquiries        0
Credit_Mix                  0
Outstanding_Debt            0
Credit_Utilization_Ratio    0
Credit_History_Age          0
Payment_of_Min_Amount       0
Total_EMI_per_month         0
Amount_invested_monthly     0
Payment_Behaviour           0
Monthly_Balance             0
Credit_Score                0
dtype: int64
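Every count above is zero, but the same check is more informative on data that does have gaps. The sketch below uses an invented `toy` frame with missing entries and keeps only the columns that contain nulls. Note that `isnull()` detects only true missing values such as `NaN` or `None`; placeholder strings like "NA" would not be counted.

```python
import numpy as np
import pandas as pd

# Toy frame with deliberately missing entries (values are invented)
toy = pd.DataFrame({
    "Age": [23, np.nan, 31],
    "Occupation": ["Scientist", "Teacher", None],
    "Credit_Score": ["Good", "Standard", "Poor"],
})

# Count nulls per column and keep only the columns that have any
null_counts = toy.isnull().sum()
cols_with_nulls = null_counts[null_counts > 0]

# Single yes/no answer for the whole frame
has_missing = bool(toy.isnull().values.any())

print(cols_with_nulls)
print(has_missing)
```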


The dataset has no null values. Since it is labelled, let's look at the values in the Credit_Score column:

In [8]:
# Get the count of unique values in the 'Credit_Score' column
data["Credit_Score"].value_counts()

Standard    53174
Poor        28998
Good        17828
Name: Credit_Score, dtype: int64
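The three classes are not equally represented. As a rough sketch, the counts printed above can be turned into percentage shares to make the imbalance easier to read. The counts are copied from that output, while the names `shares` and `imbalance_ratio` are introduced here for illustration only.

```python
import pandas as pd

# Class counts copied from the value_counts() output above
counts = pd.Series(
    {"Standard": 53174, "Poor": 28998, "Good": 17828},
    name="Credit_Score",
)

# Share of each class as a percentage of all records
shares = (counts / counts.sum() * 100).round(2)

# Ratio between the largest and the smallest class
imbalance_ratio = counts.max() / counts.min()

print(shares)
print(round(imbalance_ratio, 2))
```

Roughly 53% of the records are Standard, 29% are Poor, and 18% are Good. On the loaded data, `data["Credit_Score"].value_counts(normalize=True)` gives the same shares as fractions.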