<a href="https://colab.research.google.com/github/Jayavathsan/MachineLearning/blob/main/Credit_Score_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Import necessary libraries

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.templates.default = "plotly_white"

Load the dataset

In [None]:
data = pd.read_csv("/content/train.csv")

Check the data head

In [None]:
print(data.head())

Information about columns in the dataset

In [None]:
print(data.info())

Check for null values in the dataset

In [None]:
print(data.isnull().sum())

Dropping rows with null values

In [None]:
data = data.dropna()

Check for null values in new data

In [None]:
print(data.isnull().sum())
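
Dropping every row that contains a null can remove a large share of the data, so it helps to measure the loss. A minimal sketch on a made-up toy frame (the values below are hypothetical and not taken from train.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset.
toy = pd.DataFrame({
    "Annual_Income": [50000.0, np.nan, 72000.0, 31000.0],
    "Num_of_Loan": [2.0, 1.0, np.nan, 4.0],
    "Credit_Score": ["Good", "Poor", "Standard", "Poor"],
})

rows_before = len(toy)
cleaned = toy.dropna()  # drops any row with at least one null
rows_after = len(cleaned)

# Share of rows lost to dropna().
dropped_fraction = (rows_before - rows_after) / rows_before
print(f"Dropped {rows_before - rows_after} of {rows_before} rows "
      f"({dropped_fraction:.0%})")
```

If the dropped share is large on the real data, imputing the missing values may be a better option than dropping rows.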

Credit_Score column values

In [None]:
data["Credit_Score"].value_counts()

# Data exploration

Credit score based on occupation

In [None]:
fig = px.box(data,
             x="Occupation",
             color="Credit_Score",
             title="Credit Scores based on Occupation",
             color_discrete_map={'Poor':'red',
                                 'Standard':'yellow',
                                 'Good':'green'})

fig.show()

Occupation does not impact the credit score much.

Impact of annual income

In [None]:
fig = px.box(data,
             x="Credit_Score",
             y="Annual_Income",
             color="Credit_Score",
             title="Credit Scores based on Annual income",
             color_discrete_map={'Poor':'red',
                                 'Standard':'yellow',
                                 'Good':'green'})

fig.update_traces(quartilemethod="exclusive")
fig.show()

According to the above visualization, a higher annual income is associated with a better credit score.
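
A reading taken from a box plot can be cross-checked numerically by comparing the per-class medians. A minimal sketch, assuming a toy frame with hypothetical values in place of the real data:

```python
import pandas as pd

# Hypothetical values; the real notebook would use `data` instead of `toy`.
toy = pd.DataFrame({
    "Credit_Score": ["Poor", "Poor", "Standard", "Standard", "Good", "Good"],
    "Annual_Income": [20000.0, 30000.0, 40000.0, 50000.0, 60000.0, 80000.0],
})

# Median income per credit score class, ordered from worst to best.
medians = toy.groupby("Credit_Score")["Annual_Income"].median()
medians = medians.reindex(["Poor", "Standard", "Good"])
print(medians)
```

The same groupby call works for any of the numeric columns plotted in this section.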

Impact of monthly in hand salary

In [None]:
fig = px.box(data,
             x="Credit_Score",
             y="Monthly_Inhand_Salary",
             color="Credit_Score",
             title="Credit Scores Based on Monthly in hand salary",
             color_discrete_map={'Poor':'red',
                                 'Standard':'yellow',
                                 'Good':'green'})

fig.update_traces(quartilemethod="exclusive")
fig.show()

According to the above visualization, a higher monthly in-hand salary is associated with a better credit score.

Impact of number of bank accounts

In [None]:
fig = px.box(data,
             x="Credit_Score",
             y="Num_Bank_Accounts",
             color="Credit_Score",
             title="Credit Scores Based on Number of bank accounts",
             color_discrete_map={'Poor':'red',
                                 'Standard':'yellow',
                                 'Good':'green'})

fig.update_traces(quartilemethod="exclusive")
fig.show()

According to the above visualization, having more bank accounts does not positively impact the credit score.

Impact of number of credit cards

In [None]:
fig = px.box(data,
             x="Credit_Score",
             y="Num_Credit_Card",
             color="Credit_Score",
             title="Credit Scores Based on number of credit cards",
             color_discrete_map={'Poor':'red',
                                 'Standard':'yellow',
                                 'Good':'green'})

fig.update_traces(quartilemethod="exclusive")
fig.show()

According to the above visualization, having more credit cards does not positively impact credit scores. Having 3-5 cards is associated with better credit scores.

Credit score based on average interest paid in loans and EMIs

In [None]:
fig = px.box(data,
             x="Credit_Score",
             y="Interest_Rate",
             color="Credit_Score",
             title="Credit Scores Based on average interest rate",
             color_discrete_map={'Poor':'red',
                                 'Standard':'yellow',
                                 'Good':'green'})

fig.update_traces(quartilemethod="exclusive")
fig.show()

According to the above visualization, an average interest rate of 4-11% is associated with a Good credit score, while an interest rate above 15% is associated with a Poor credit score.

Impact of number of loans on Credit Score

In [None]:
fig = px.box(data,
             x="Credit_Score",
             y="Num_of_Loan",
             color="Credit_Score",
             title="Credit Scores Based on Number of Loans",
             color_discrete_map={'Poor':'red',
                                 'Standard':'yellow',
                                 'Good':'green'})

fig.update_traces(quartilemethod="exclusive")
fig.show()

According to the above visualization, having more than 3 loans negatively impacts the credit score.

Impact of delaying payments on the due date

In [None]:
fig = px.box(data,
             x="Credit_Score",
             y="Delay_from_due_date",
             color="Credit_Score",
             title="Credit Scores Based on Average Number of days Delayed for Credit card Payments",
             color_discrete_map={'Poor':'red',
                                 'Standard':'yellow',
                                 'Good':'green'})

fig.update_traces(quartilemethod="exclusive")
fig.show()

According to the above visualization, delaying payments by more than 17 days impacts the credit score negatively.

Impact of number of delayed payments on credit score

In [None]:
fig = px.box(data,
             x="Credit_Score",
             y="Num_of_Delayed_Payment",
             color="Credit_Score",
             title="Credit Scores Based on Number of delayed payments",
             color_discrete_map={'Poor':'red',
                                 'Standard':'yellow',
                                 'Good':'green'})

fig.update_traces(quartilemethod="exclusive")
fig.show()

According to the above visualization, delaying more than 12 payments from the due date affects the credit scores negatively.

In [None]:
fig = px.box(data,
             x="Credit_Score",
             y="Outstanding_Debt",
             color="Credit_Score",
             title="Credit Scores Based on Outstanding Debt",
             color_discrete_map={'Poor':'red',
                                 'Standard':'yellow',
                                 'Good':'green'})
fig.update_traces(quartilemethod="exclusive")
fig.show()

According to the above visualization, having a debt of more than $1338 affects credit scores negatively.

In [None]:
fig = px.box(data,
             x="Credit_Score",
             y="Credit_Utilization_Ratio",
             color="Credit_Score",
             title="Credit Scores Based on Credit Utilization Ratio",
             color_discrete_map={'Poor':'red',
                                 'Standard':'yellow',
                                 'Good':'green'})
fig.update_traces(quartilemethod="exclusive")
fig.show()

According to the above visualization, the credit utilization ratio does not affect credit scores.

In [None]:
fig = px.box(data,
             x="Credit_Score",
             y="Credit_History_Age",
             color="Credit_Score",
             title="Credit Scores Based on Credit History Age",
             color_discrete_map={'Poor':'red',
                                 'Standard':'yellow',
                                 'Good':'green'})
fig.update_traces(quartilemethod="exclusive")
fig.show()

According to the above visualization, having a long credit history results in better credit scores.

In [None]:
fig = px.box(data,
             x="Credit_Score",
             y="Total_EMI_per_month",
             color="Credit_Score",
             title="Credit Scores Based on Total Number of EMIs per Month",
             color_discrete_map={'Poor':'red',
                                 'Standard':'yellow',
                                 'Good':'green'})
fig.update_traces(quartilemethod="exclusive")
fig.show()

According to the above visualization, the number of EMIs being paid in a month does not affect credit scores much.

In [None]:
fig = px.box(data,
             x="Credit_Score",
             y="Amount_invested_monthly",
             color="Credit_Score",
             title="Credit Scores Based on Amount Invested Monthly",
             color_discrete_map={'Poor':'red',
                                 'Standard':'yellow',
                                 'Good':'green'})
fig.update_traces(quartilemethod="exclusive")
fig.show()

According to the above visualization, the amount of money invested monthly does not affect credit scores.

In [None]:
fig = px.box(data,
             x="Credit_Score",
             y="Monthly_Balance",
             color="Credit_Score",
             title="Credit Scores Based on Monthly Balance Left",
             color_discrete_map={'Poor':'red',
                                 'Standard':'yellow',
                                 'Good':'green'})
fig.update_traces(quartilemethod="exclusive")
fig.show()

According to the above visualization, having a high monthly balance in your account at the end of the month is good for your credit scores. A monthly balance of less than $250 is bad for credit scores.

# Credit Score Classification Model

The Credit_Mix feature represents the types of credit and loans taken.

Since the Credit_Mix column is categorical, it is transformed into a numerical feature so it can be used in machine learning models.

In [None]:
data["Credit_Mix"] = data["Credit_Mix"].map({"Standard": 1,
                               "Good": 2,
                               "Bad": 0})

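One thing to watch with `map` is that any label missing from the dictionary becomes NaN. A minimal sketch of checking for that, using a hypothetical series in which `"_"` stands in for an unexpected label:

```python
import pandas as pd

# Same encoding as the notebook cell above.
credit_mix_codes = {"Bad": 0, "Standard": 1, "Good": 2}

# Hypothetical values; "_" represents any label absent from the mapping.
toy = pd.Series(["Good", "Bad", "Standard", "_"], name="Credit_Mix")
encoded = toy.map(credit_mix_codes)

# Labels not found in the dictionary are mapped to NaN, so count them
# before passing the column to a model.
unmapped = int(encoded.isna().sum())
print(encoded.tolist())
print("Unmapped values:", unmapped)
```
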
# Splitting the data into features and labels, selecting the features that have an impact on the credit score

In [None]:
from sklearn.model_selection import train_test_split
x = np.array(data[["Annual_Income", "Monthly_Inhand_Salary", "Num_Bank_Accounts",
                   "Num_Credit_Card", "Interest_Rate", "Num_of_Loan",
                   "Delay_from_due_date", "Num_of_Delayed_Payment",
                   "Credit_Mix", "Outstanding_Debt",
                   "Credit_History_Age", "Monthly_Balance"]])

# Use a 1-D label array; scikit-learn expects y of shape (n_samples,)
y = np.array(data["Credit_Score"])

Split the data into training and test sets to train and evaluate the model.


In [None]:
xtrain, xtest, ytrain, ytest = train_test_split(x, y,
                                                    test_size=0.33,
                                                    random_state=42)
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(xtrain, ytrain)
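
The held-out test set can be used to score the fitted model. A minimal sketch of that evaluation step, run on synthetic data so it is self-contained; the features, labels, and variable names here are assumptions, and the numbers say nothing about the real dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
labels = np.where(X[:, 0] > 0.5, "Good",
                  np.where(X[:, 0] < -0.5, "Poor", "Standard"))

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.33, random_state=42)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Score the model on rows it has not seen during training.
pred = clf.predict(X_test)
acc = accuracy_score(y_test, pred)
print(f"Accuracy: {acc:.3f}")
print(classification_report(y_test, pred))
```

In the notebook itself the equivalent calls would be `model.predict(xtest)` followed by `accuracy_score(ytest, ...)`.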

Make predictions from the model by giving inputs according to the features used in training

In [None]:
print("Credit Score Prediction : ")
a = float(input("Annual Income: "))
b = float(input("Monthly Inhand Salary: "))
c = float(input("Number of Bank Accounts: "))
d = float(input("Number of Credit cards: "))
e = float(input("Interest rate: "))
f = float(input("Number of Loans: "))
g = float(input("Average number of days delayed by the person: "))
h = float(input("Number of delayed payments: "))
i = float(input("Credit Mix (Bad: 0, Standard: 1, Good: 2): "))
j = float(input("Outstanding Debt: "))
k = float(input("Credit History Age: "))
l = float(input("Monthly Balance: "))

features = np.array([[a, b, c, d, e, f, g, h, i, j, k, l]])
print("Predicted Credit Score = ", model.predict(features))
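
For scripted use, the same feature row can be built without `input()`. A minimal sketch, assuming a hypothetical helper `build_feature_row` and made-up example values; the column order mirrors the feature list used for training above:

```python
import numpy as np

# Column order must match the order used when the model was trained.
FEATURE_ORDER = [
    "Annual_Income", "Monthly_Inhand_Salary", "Num_Bank_Accounts",
    "Num_Credit_Card", "Interest_Rate", "Num_of_Loan",
    "Delay_from_due_date", "Num_of_Delayed_Payment",
    "Credit_Mix", "Outstanding_Debt",
    "Credit_History_Age", "Monthly_Balance",
]

def build_feature_row(values):
    """Return a (1, 12) float array in training column order."""
    missing = [name for name in FEATURE_ORDER if name not in values]
    if missing:
        raise KeyError(f"Missing features: {missing}")
    return np.array([[float(values[name]) for name in FEATURE_ORDER]])

# Made-up applicant; Credit_Mix uses the encoding Bad: 0, Standard: 1, Good: 2.
example = {
    "Annual_Income": 19114.12, "Monthly_Inhand_Salary": 1824.84,
    "Num_Bank_Accounts": 2, "Num_Credit_Card": 2, "Interest_Rate": 9,
    "Num_of_Loan": 2, "Delay_from_due_date": 12,
    "Num_of_Delayed_Payment": 3, "Credit_Mix": 2,
    "Outstanding_Debt": 250, "Credit_History_Age": 200,
    "Monthly_Balance": 310,
}
row = build_feature_row(example)
print(row.shape)
# With the fitted model from above: model.predict(row)
```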