<a href="https://colab.research.google.com/github/Swathy1209/CREDIT-SCORE-CLASSIFICATION/blob/main/CREDIT_SCORE_CLASSIFICATION.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Credit Score Classification**


Banks and credit card companies use four credit score categories to label their customers:

1.   Excellent
2.   Good
3.   Standard
4.   Poor

A person with a good credit score is more likely to be approved for loans by banks and other financial institutions. Credit score classification needs a labelled dataset. Because this dataset does not include credit scores directly, I have used the credit numbers (the credits_no column) to represent the four categories listed above.

# 1. *Data Preprocessing*:
   - Import necessary libraries: pandas, numpy, and sklearn's LabelEncoder for encoding categorical variables.
   - Set the default template for Plotly plots to "plotly_white" using Plotly's pio module.


In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
from sklearn.preprocessing import LabelEncoder
pio.templates.default = "plotly_white"

- Read the CSV file containing the dataset into a DataFrame using pd.read_csv().
   - Print the first few rows of the dataset using data.head().
   - Print information about the dataset using data.info() to understand its structure.


In [4]:
data = pd.read_csv("/content/C&T_test.csv")
print(data.head())
print(data.info())

   sno acc_info  duration_month credit_history purpose savings_acc  \
0    1      A14              24            A34     A46         A61   
1    2      A12              18            A34     A43         A61   
2    3      A11              20            A34     A42         A61   
3    4      A14              12            A34     A43         A65   
4    5      A12              12            A32     A40         A65   

  employment_st poi personal_status gurantors  resident_since property_type  \
0           A75   4             A93      A101               4          A124   
1           A75   3             A92      A103               4          A121   
2           A75   1             A92      A101               4          A122   
3           A75   4             A93      A101               4          A123   
4           A71   1             A92      A101               2          A121   

  age installment_type housing_type  credits_no job_type  liables telephone  \
0  54             A143   

   - Check for missing values in the dataset using data.isnull().sum().
   - Columns with missing values could be dropped using data.dropna(axis=1); here every column reports 0 missing values, so nothing needs to be dropped.


In [5]:
print(data.isnull().sum())

sno                 0
acc_info            0
duration_month      0
credit_history      0
purpose             0
savings_acc         0
employment_st       0
poi                 0
personal_status     0
gurantors           0
resident_since      0
property_type       0
age                 0
installment_type    0
housing_type        0
credits_no          0
job_type            0
liables             0
telephone           0
foreigner           0
dtype: int64
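
As a minimal sketch of the two calls mentioned above, the example below counts missing values per column and then drops any column that contains one. The toy DataFrame and its values are assumptions made for illustration; they are not rows from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values: "age" has one missing entry
toy = pd.DataFrame({
    "duration_month": [24, 18, 20],
    "age": [54.0, np.nan, 41.0],
})

# Number of missing values in each column
missing_per_column = toy.isnull().sum()

# Drop every column that contains at least one missing value
cleaned = toy.dropna(axis=1)

print(missing_per_column)
print(cleaned.columns.tolist())
```

Dropping whole columns is a blunt tool; it only makes sense when a column is mostly empty, which is not the case in this dataset.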


- Check the distribution of the target variable (credits_no) using data["credits_no"].value_counts()

CONSIDERATION OF CREDITWORTHINESS:

1.   Excellent
2.   Good
3.   Fair
4.   Bad

In [6]:
print(data["credits_no"].value_counts())

credits_no
1    120
2     72
3      6
4      2
Name: count, dtype: int64
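
The counts above are strongly imbalanced. The short sketch below turns them into class shares to make that explicit. The counts are copied from the output above; the variable names are only illustrative.

```python
import pandas as pd

# Class counts copied from the value_counts() output above
counts = pd.Series({1: 120, 2: 72, 3: 6, 4: 2}, name="count")

# Share of each class in the 200 rows
shares = counts / counts.sum()
print(shares)
```

Classes 3 and 4 together make up 4% of the rows, so a model can reach high accuracy while rarely predicting them. Per-class metrics are more informative than overall accuracy in this situation.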


# 2. *Exploratory Data Analysis (EDA)*:
   - Create box plots using Plotly Express (px.box()) to visualize the relationships between different features and the target variable (credits_no).



CREDIT SCORES BASED ON EMPLOYMENT STATUS:

In [24]:
fig = px.box(data,
             x="employment_st",
             color="credits_no",
             title="Credit Scores Based on Employment Status",
             color_discrete_map={'1':'red',
                                 '2':'yellow',
                                 '3':'green',
                                 '4':'orange'})
fig.show()

CREDIT SCORES BASED ON DURATION MONTH:

In [25]:
fig = px.box(data,
             x="credits_no",
             y="duration_month",
             color="credits_no",
             title="Credit Scores Based on Duration Month",
             color_discrete_map={'1':'red',
                                 '2':'yellow',
                                 '3':'green',
                                 '4':'orange'})
fig.show()

CREDIT SCORES BASED ON CREDIT HISTORY:

In [26]:
fig = px.box(data,
             x="credits_no",
             y="credit_history",
             color="credits_no",
             title="Credit Scores Based on Credit History",
             color_discrete_map={'1':'red',
                                 '2':'yellow',
                                 '3':'green',
                                 '4':'orange'})
fig.show()

CREDIT SCORE BASED ON PURPOSE:

In [27]:
fig = px.box(data,
             x="credits_no",
             y="purpose",
             color="credits_no",
             title="Credit Scores Based on Purpose",
             color_discrete_map={'1':'red',
                                 '2':'yellow',
                                 '3':'green',
                                 '4':'orange'})
fig.show()

CREDIT SCORE BASED ON SAVINGS ACCOUNT:

In [28]:
fig = px.box(data,
             x="credits_no",
             y="savings_acc",
             color="credits_no",
             title="Credit Scores Based on Savings Account",
             color_discrete_map={'1':'red',
                                 '2':'yellow',
                                 '3':'green',
                                 '4':'orange'})
fig.show()


CREDIT SCORES BASED ON PERCENTAGE OF INCOME SPENT (POI):

In [29]:
fig = px.box(data,
             x="credits_no",
             y="poi",
             color="credits_no",
             title="Credit Scores Based on Percentage of Income Spent on Loan Interest",
             color_discrete_map={'1':'red',
                                 '2':'yellow',
                                 '3':'green',
                                 '4':'orange'})
fig.show()

CREDIT SCORES BASED ON AGE:

In [33]:
fig = px.box(data,
             x="credits_no",
             y="age",
             color="credits_no",
             title="Credit Scores Based on Age",
             color_discrete_map={'1':'red',
                                 '2':'yellow',
                                 '3':'green',
                                 '4':'orange'})
fig.show()

CREDIT SCORES BASED ON JOB TYPE:

In [34]:
fig = px.box(data,
             x="credits_no",
             y="job_type",
             color="credits_no",
             title="Credit Scores Based on Job Type",
             color_discrete_map={'1':'red',
                                 '2':'yellow',
                                 '3':'green',
                                 '4':'orange'})
fig.show()

CREDIT SCORES BASED ON HOUSING TYPE:

In [35]:
fig = px.box(data,
             x="credits_no",
             y="housing_type",
             color="credits_no",
             title="Credit Scores Based on Housing Type",
             color_discrete_map={'1':'red',
                                 '2':'yellow',
                                 '3':'green',
                                 '4':'orange'})
fig.show()

CREDIT SCORE BASED ON TELEPHONE:

In [36]:
fig = px.box(data,
             x="credits_no",
             y="telephone",
             color="credits_no",
             title="Credit Scores Based on Telephone",
             color_discrete_map={'1':'red',
                                 '2':'yellow',
                                 '3':'green',
                                 '4':'orange'})
fig.show()

CREDIT SCORE BASED ON FOREIGNER:

In [37]:
fig = px.box(data,
             x="credits_no",
             y="foreigner",
             color="credits_no",
             title="Credit Scores Based on Foreigner",
             color_discrete_map={'1':'red',
                                 '2':'yellow',
                                 '3':'green',
                                 '4':'orange'})
fig.show()
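
Several of the features plotted above are categorical codes, where a box plot has little to summarize. As a complementary sketch, a cross-tabulation shows the share of each credit class within each category. The toy rows below are hypothetical; only the column names come from the dataset.

```python
import pandas as pd

# Toy rows with hypothetical values, using the dataset's column names
toy = pd.DataFrame({
    "employment_st": ["A75", "A75", "A71", "A73", "A73", "A71"],
    "credits_no":    [1,     2,     1,     1,     2,     1],
})

# Rows: category codes; columns: credit classes; each row sums to 1
table = pd.crosstab(toy["employment_st"], toy["credits_no"], normalize="index")
print(table)
```

On the real data, the same call with data["employment_st"] and data["credits_no"] would give one row per employment code.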

# 3. *Feature Engineering*:
   - Map the credit_history column to numerical values using a dictionary mapping.


Credit Score Classification Model:
Credit history is another valuable feature in the dataset for determining credit scores. It describes the types of credits and loans a customer has taken.

As the credit_history column is categorical, I will transform it into a numerical feature so that we can use it to train a Machine Learning model for the task of credit score classification:

In [18]:
data["credits_history"] = data["credit_history"].map({"A34": 4,
                                                      "A32": 3,
                                                      "A33": 2,
                                                      "A31": 1,
                                                      "A30": 0})
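
One detail of map() is worth seeing in isolation: any code that is missing from the dictionary becomes NaN rather than raising an error. The sketch below uses the same mapping on a toy Series; "A99" is a made-up code included only to show this behaviour.

```python
import pandas as pd

history_map = {"A34": 4, "A32": 3, "A33": 2, "A31": 1, "A30": 0}

# "A99" is a hypothetical code that is not in the mapping
codes = pd.Series(["A34", "A32", "A99"])
mapped = codes.map(history_map)

print(mapped.tolist())
print("unmapped values:", mapped.isnull().sum())
```

Any such NaN values are what a later imputation step would fill in.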

# 4. *Splitting Data*:
   - Split the dataset into features (x) and the target variable (y).
   - Split the dataset into training and testing sets using train_test_split() from sklearn.model_selection.


Now I will split the data into features and labels by selecting the features we found important for our model:

In [19]:
from sklearn.model_selection import train_test_split

# Encode the remaining categorical columns as integers.
# A fresh LabelEncoder is fitted for each column.
categorical_columns = ["employment_st", "purpose", "job_type", "savings_acc",
                       "housing_type", "telephone", "foreigner"]
for column in categorical_columns:
    data[column] = LabelEncoder().fit_transform(data[column])

# credit_history is not label-encoded: the mapped "credits_history" column
# created above is used, so training matches the codes in the prediction prompt.
x = np.array(data[["employment_st", "duration_month",
                   "purpose", "job_type",
                   "savings_acc", "credits_history",
                   "poi", "age", "liables",
                   "housing_type", "telephone", "foreigner"]])
y = np.array(data["credits_no"]).ravel()
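
Fitted encoders are needed again at prediction time if raw codes such as "A75" are to be converted to the integers the model saw. The sketch below keeps one fitted LabelEncoder per column in a dictionary. The encoders dictionary and the toy values are assumptions for illustration, not part of the notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy rows with hypothetical values, using two of the dataset's column names
toy = pd.DataFrame({
    "employment_st": ["A75", "A71", "A73", "A75"],
    "telephone": ["A192", "A191", "A191", "A192"],
})

# Keep one fitted encoder per column so the mapping can be reused later
encoders = {}
for column in ["employment_st", "telephone"]:
    encoder = LabelEncoder()
    toy[column] = encoder.fit_transform(toy[column])
    encoders[column] = encoder

# Encode a raw code for prediction, then decode it again
code = encoders["employment_st"].transform(["A75"])[0]
original = encoders["employment_st"].inverse_transform([code])[0]
print(code, original)
```

LabelEncoder assigns integers in sorted order of the labels, so the codes carry no meaning beyond identity.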


# 5. *Model Training*:
   - Initialize a RandomForestClassifier from sklearn.ensemble.
   - Impute missing values in the training set using SimpleImputer from sklearn.impute.
   - Train the model on the training data using model.fit().



Now I split the data into training and test sets, fill any missing values with the column mean (fitting the imputer on the training set only), and train a random forest classifier on the imputed training data:

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer

# Hold out 33% of the rows for testing
xtrain, xtest, ytrain, ytest = train_test_split(x, y,
                                                test_size=0.33,
                                                random_state=42)

# Fit the mean imputer on the training set only, then apply it to both sets
imputer = SimpleImputer(strategy='mean')
xtrain_imputed = imputer.fit_transform(xtrain)
xtest_imputed = imputer.transform(xtest)

model = RandomForestClassifier()
model.fit(xtrain_imputed, ytrain)
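
An alternative arrangement, not the one used in this notebook, is to bundle the imputer and the classifier into a single scikit-learn pipeline so the same preprocessing is applied automatically at prediction time. The sketch below runs on synthetic data; the array names and the random_state on the classifier are assumptions added for reproducibility.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for x and y (not the credit dataset)
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(60, 4))
y_demo = (x_demo[:, 0] > 0).astype(int) + 1        # labels 1 or 2
x_demo[rng.random(x_demo.shape) < 0.05] = np.nan   # inject a few missing values

x_tr, x_te, y_tr, y_te = train_test_split(x_demo, y_demo,
                                          test_size=0.33,
                                          random_state=42)

# The imputer is fitted on the training data only, inside the pipeline
pipeline = make_pipeline(SimpleImputer(strategy="mean"),
                         RandomForestClassifier(random_state=42))
pipeline.fit(x_tr, y_tr)

predictions = pipeline.predict(x_te)
print(predictions[:5])
```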

# 6. *Model Evaluation and Prediction*:
   - Prompt the user to input values for different features to predict the credit score.
   - Prepare the input features as a numpy array.
   - Use the trained model to predict the credit score for the given input features using model.predict().
   - Print the predicted credit score.
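
The steps above cover prediction only. A minimal sketch of an evaluation step on the held-out test set is shown below, using synthetic data so it runs on its own. In the notebook, the equivalent calls would compare ytest with model.predict(xtest_imputed); the demo names here are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the features and labels (not the credit dataset)
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(200, 3))
y_demo = (x_demo[:, 0] + x_demo[:, 1] > 0).astype(int) + 1   # labels 1 or 2

x_tr, x_te, y_tr, y_te = train_test_split(x_demo, y_demo,
                                          test_size=0.33,
                                          random_state=42)

demo_model = RandomForestClassifier(random_state=42).fit(x_tr, y_tr)
y_pred = demo_model.predict(x_te)

# Overall accuracy plus per-class precision, recall and F1
accuracy = accuracy_score(y_te, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(classification_report(y_te, y_pred, zero_division=0))
```

The per-class report matters here because the rare classes contribute very little to overall accuracy.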


# 7. *Deployment*:
   - Once it has been evaluated on held-out data, the trained model can be used to predict credit scores for new input data, as in the interactive example that follows.


Now I make predictions by giving the model input values for the same features, in the same order, that were used to train it. The categorical inputs must be entered as their encoded integers:

In [22]:
print("Credit Score Prediction : ")
a = float(input("employment_st: "))
b = float(input("duration_month: "))
c = float(input("purpose: "))
d = float(input("job_type: "))
e = float(input("savings_acc: "))
f = float(input("poi: "))
g = float(input("age: "))
h = float(input("Credit_history (A34: 4, A32: 3, A33: 2, A31: 1, A30: 0) : "))
i = float(input("liables: "))
j = float(input("housing_type: "))
k = float(input("telephone: "))
l = float(input("foreigner:"))

# Order the values exactly like the training columns in x:
# credit history comes before poi and age.
features = np.array([[a, b, c, d, e, h, f, g, i, j, k, l]])
print("Predicted Credit Score = ", model.predict(features))

Credit Score Prediction : 
employment_st: 2.0
duration_month: 24.0
purpose: 1.0
job_type: 1.0
savings_acc: 0.0
poi: 4.0
age: 35.0
Credit_history (A34: 4, A32: 3, A33: 2, A31: 1, A30: 0) : 3
liables: 2.0
housing_type: 1.0
telephone: 1.0
foreigner:0.0
Predicted Credit Score =  [2]
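
Because the model only sees positions, not names, the order of the input values matters. As a hedged sketch, a small helper can build the input row from a dictionary so the order always follows the training columns. The helper name build_feature_row and the FEATURES list are assumptions for illustration; the example values are the ones entered in the session above.

```python
import numpy as np

# Column order used to build x for training (credit history is the mapped 0-4 value)
FEATURES = ["employment_st", "duration_month", "purpose", "job_type",
            "savings_acc", "credit_history", "poi", "age", "liables",
            "housing_type", "telephone", "foreigner"]

def build_feature_row(values):
    """Return a (1, n_features) float array ordered like FEATURES."""
    missing = [name for name in FEATURES if name not in values]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return np.array([[float(values[name]) for name in FEATURES]])

# Values from the recorded session above
example = {"employment_st": 2, "duration_month": 24, "purpose": 1,
           "job_type": 1, "savings_acc": 0, "poi": 4, "age": 35,
           "credit_history": 3, "liables": 2, "housing_type": 1,
           "telephone": 1, "foreigner": 0}

row = build_feature_row(example)
print(row.shape)
# In the notebook, the prediction would then be model.predict(row)
```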


# 8. Summary

Classifying customers by credit score helps banks and credit card companies decide quickly whether to issue loans to customers with good creditworthiness. A person with a good credit score is more likely to be approved for loans by banks and other financial institutions.