<a href="https://colab.research.google.com/github/akinolanath5519/Credit-Scoring-and-Segmentation-Financial-Institutions-/blob/main/Credit_Scoring_and_Segementation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
pio.templates.default = "plotly_white"


Credit scoring aims to determine the creditworthiness of individuals based on their credit profiles. By analyzing factors such as payment history, credit utilization ratio, and number of credit accounts, we can assign a credit score to each individual, providing a quantitative measure of their creditworthiness.

The given dataset includes features such as age, gender, marital status, education level, employment status, credit utilization ratio, payment history, number of credit accounts, loan amount, interest rate, loan term, type of loan, and income level.


In [4]:
from google.colab import files
data=files.upload()

Saving credit_scoring.csv to credit_scoring (1).csv


In [6]:
data = pd.read_csv("credit_scoring.csv")
data

Unnamed: 0,Age,Gender,Marital Status,Education Level,Employment Status,Credit Utilization Ratio,Payment History,Number of Credit Accounts,Loan Amount,Interest Rate,Loan Term,Type of Loan
0,60,Male,Married,Master,Employed,0.22,2685.0,2,4675000,2.65,48,Personal Loan
1,25,Male,Married,High School,Unemployed,0.20,2371.0,9,3619000,5.19,60,Auto Loan
2,30,Female,Single,Master,Employed,0.22,2771.0,6,957000,2.76,12,Auto Loan
3,58,Female,Married,PhD,Unemployed,0.12,1371.0,2,4731000,6.57,60,Auto Loan
4,32,Male,Married,Bachelor,Self-Employed,0.99,828.0,2,3289000,6.28,36,Personal Loan
...,...,...,...,...,...,...,...,...,...,...,...,...
995,59,Male,Divorced,High School,Employed,0.74,1285.0,8,3530000,12.99,48,Auto Loan
996,64,Male,Divorced,Bachelor,Unemployed,0.77,1857.0,2,1377000,18.02,60,Home Loan
997,63,Female,Single,Master,Self-Employed,0.18,2628.0,10,2443000,18.95,12,Personal Loan
998,51,Female,Married,PhD,Self-Employed,0.32,1142.0,3,1301000,1.80,24,Auto Loan


In [7]:
print(data.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 12 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Age                        1000 non-null   int64  
 1   Gender                     1000 non-null   object 
 2   Marital Status             1000 non-null   object 
 3   Education Level            1000 non-null   object 
 4   Employment Status          1000 non-null   object 
 5   Credit Utilization Ratio   1000 non-null   float64
 6   Payment History            1000 non-null   float64
 7   Number of Credit Accounts  1000 non-null   int64  
 8   Loan Amount                1000 non-null   int64  
 9   Interest Rate              1000 non-null   float64
 10  Loan Term                  1000 non-null   int64  
 11  Type of Loan               1000 non-null   object 
dtypes: float64(3), int64(4), object(5)
memory usage: 93.9+ KB
None


In [8]:
print(data.describe())

               Age  Credit Utilization Ratio  Payment History  \
count  1000.000000               1000.000000      1000.000000   
mean     42.702000                  0.509950      1452.814000   
std      13.266771                  0.291057       827.934146   
min      20.000000                  0.000000         0.000000   
25%      31.000000                  0.250000       763.750000   
50%      42.000000                  0.530000      1428.000000   
75%      54.000000                  0.750000      2142.000000   
max      65.000000                  1.000000      2857.000000   

       Number of Credit Accounts   Loan Amount  Interest Rate    Loan Term  
count                1000.000000  1.000000e+03    1000.000000  1000.000000  
mean                    5.580000  2.471401e+06      10.686600    37.128000  
std                     2.933634  1.387047e+06       5.479058    17.436274  
min                     1.000000  1.080000e+05       1.010000    12.000000  
25%                     3.000

In [23]:
data.isnull().sum()

Age                             0
Gender                          0
Marital Status                  0
Education Level              1000
Employment Status            1000
Credit Utilization Ratio        0
Payment History                 0
Number of Credit Accounts       0
Loan Amount                     0
Interest Rate                   0
Loan Term                       0
Type of Loan                    0
Credit Score                 1000
dtype: int64

In [27]:
data.shape

(1000, 13)

Now let’s have a look at the distribution of the credit utilization ratio in the data:

In [9]:
credit_utilization_fig = px.box(data, y='Credit Utilization Ratio',
                                title='Credit Utilization Ratio Distribution')
credit_utilization_fig.show()

Now let’s have a look at the distribution of the loan amount in the data:



In [10]:
loan_amount_fig = px.histogram(data, x='Loan Amount',
                               nbins=20,
                               title='Loan Amount Distribution')
loan_amount_fig.show()

Now let’s have a look at the correlation in the data:

In [11]:
numeric_df = data[['Credit Utilization Ratio',
                   'Payment History',
                   'Number of Credit Accounts',
                   'Loan Amount', 'Interest Rate',
                   'Loan Term']]
correlation_fig = px.imshow(numeric_df.corr(),
                            title='Correlation Heatmap')
correlation_fig.show()

# Calculating Credit Scores
The dataset doesn’t have any feature representing the credit scores of individuals. To calculate the credit scores, we need to use an appropriate technique. There are several widely used techniques for calculating credit scores, each with its own calculation process. One example is the FICO score, a commonly used credit scoring model in the industry.

Below is how we can implement the FICO score method to calculate credit scores:

In [29]:
# Import necessary libraries
import pandas as pd

# Define the mapping for categorical features
education_level_mapping = {'High School': 1, 'Bachelor': 2, 'Master': 3, 'PhD': 4}
employment_status_mapping = {'Unemployed': 0, 'Employed': 1, 'Self-Employed': 2}

# Apply mapping to categorical features
data['Education Level'] = data['Education Level'].map(education_level_mapping)
data['Employment Status'] = data['Employment Status'].map(employment_status_mapping)

# Define weights for the FICO formula
weights = {'Payment History': 0.35, 'Credit Utilization Ratio': 0.30, 'Number of Credit Accounts': 0.15, 'Education Level': 0.10, 'Employment Status': 0.10}

# Calculate credit scores using the FICO formula with vectorized operations
data['Credit Score'] = (data[list(weights.keys())] * list(weights.values())).sum(axis=1)

print(data.head())


   Age  Gender Marital Status  Education Level  Employment Status  \
0   60    Male        Married              NaN                NaN   
1   25    Male        Married              NaN                NaN   
2   30  Female         Single              NaN                NaN   
3   58  Female        Married              NaN                NaN   
4   32    Male        Married              NaN                NaN   

   Credit Utilization Ratio  Payment History  Number of Credit Accounts  \
0                      0.22           2685.0                          2   
1                      0.20           2371.0                          9   
2                      0.22           2771.0                          6   
3                      0.12           1371.0                          2   
4                      0.99            828.0                          2   

   Loan Amount  Interest Rate  Loan Term   Type of Loan  Credit Score  
0      4675000           2.65         48  Personal Loan       

In [30]:
# Apply mapping to categorical features
data['Education Level'] = data['Education Level'].map(education_level_mapping)
data['Employment Status'] = data['Employment Status'].map(employment_status_mapping)

In [31]:
# Calculate credit scores using the complete FICO formula
credit_scores = []

for index, row in data.iterrows():
    payment_history = row['Payment History']
    credit_utilization_ratio = row['Credit Utilization Ratio']
    number_of_credit_accounts = row['Number of Credit Accounts']
    education_level = row['Education Level']
    employment_status = row['Employment Status']

    # Apply the FICO formula to calculate the credit score
    credit_score = (payment_history * 0.35) + (credit_utilization_ratio * 0.30) + (number_of_credit_accounts * 0.15) + (education_level * 0.10) + (employment_status * 0.10)
    credit_scores.append(credit_score)

Below is how the above code works:

Firstly, it defines mappings for two categorical features: “Education Level” and “Employment Status”. The “Education Level” mapping assigns numerical values to different levels of education, such as “High School” being mapped to 1, “Bachelor” to 2, “Master” to 3, and “PhD” to 4. The “Employment Status” mapping assigns numerical values to different employment statuses, such as “Unemployed” being mapped to 0, “Employed” to 1, and “Self-Employed” to 2.
Next, the code applies the defined mappings to the corresponding columns in the DataFrame. It transforms the values of the “Education Level” and “Employment Status” columns from their original categorical form to the mapped numerical representations.
After that, the code initiates an iteration over each row of the DataFrame to calculate the credit scores for each individual. It retrieves the values of relevant features, such as “Payment History”, “Credit Utilization Ratio”, “Number of Credit Accounts”, “Education Level”, and “Employment Status”, from each row.
Within the iteration, the FICO formula is applied to calculate the credit score for each individual. The formula incorporates the weighted values of the features mentioned earlier:

35% weight for “Payment History”,
30% weight for “Credit Utilization Ratio”,
15% weight for “Number of Credit Accounts”,
10% weight for “Education Level”,
and 10% weight for “Employment Status”.
The calculated credit score is then stored in a list called “credit_scores”.



## Segmentation Based on Credit Scores

Now, let’s use the KMeans clustering algorithm to segment customers based on their credit scores:

In [32]:
from sklearn.cluster import KMeans

X = data[['Credit Score']]
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
kmeans.fit(X)
data['Segment'] = kmeans.labels_

In [33]:
# Convert the 'Segment' column to category data type
data['Segment'] = data['Segment'].astype('category')

# Visualize the segments using Plotly
fig = px.scatter(data, x=data.index, y='Credit Score', color='Segment',
                 color_discrete_sequence=['green', 'blue', 'yellow', 'red'])
fig.update_layout(
    xaxis_title='Customer Index',
    yaxis_title='Credit Score',
    title='Customer Segmentation based on Credit Scores'
)
fig.show()

In [34]:
data['Segment'] = data['Segment'].map({2: 'Very Low',
                                       0: 'Low',
                                       1: 'Good',
                                       3: "Excellent"})

# Convert the 'Segment' column to category data type
data['Segment'] = data['Segment'].astype('category')

# Visualize the segments using Plotly
fig = px.scatter(data, x=data.index, y='Credit Score', color='Segment',
                 color_discrete_sequence=['green', 'blue', 'yellow', 'red'])
fig.update_layout(
    xaxis_title='Customer Index',
    yaxis_title='Credit Score',
    title='Customer Segmentation based on Credit Scores'
)
fig.show()

# Conclusion

Credit scoring and segmentation refer to the process of evaluating the creditworthiness of individuals or businesses and dividing them into distinct groups based on their credit profiles. It aims to assess the likelihood of borrowers repaying their debts and helps financial institutions make informed decisions regarding lending and managing credit risk.