## CREDIT RISK ASSESSMENT

Introduction:
This notebook builds a machine learning model that predicts the probability of a borrower defaulting on a loan from historical financial and borrower data.
Accurate credit risk prediction helps financial institutions make informed lending decisions, reduce default rates, and manage financial risk effectively.

**About Dataset**

- Labels: "Status": "Approved" or "Denied"

- Features:

  - Duration: Time in months for the credit request or grant.
  - Credit History: Historical credit behavior and reliability.
  - Purpose: Intended use of the credit.
  - Amount: Requested or granted credit amount.
  - Savings: Level of applicant's savings or financial stability.
  - Employment Duration: Time employed at current job.
  - Installment Rate: Installment payments as a percentage of disposable income (coded 1 to 4).
  - Person Status: Marital or personal status of the applicant.
  - Other Debtors: Presence of other debtors or co-applicants.

In [2]:
# Import dependencies

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [13]:
# Reading and exploring the dataset

credit = pd.read_csv('credit_risk.csv')

- Data Exploration

In [43]:
credit.head()

Unnamed: 0,status,duration,credit_history,purpose,amount,savings,employment_duration,installment_rate,personal_status_sex,other_debtors,...,property,age,other_installment_plans,housing,number_credits,job,people_liable,telephone,foreign_worker,credit_risk
0,... < 100 DM,6,critical account/other credits existing,domestic appliances,1169,unknown/no savings account,... >= 7 years,4,male : single,none,...,real estate,67,none,own,2,skilled employee/official,1,yes,yes,1
1,0 <= ... < 200 DM,48,existing credits paid back duly till now,domestic appliances,5951,... < 100 DM,1 <= ... < 4 years,2,female : divorced/separated/married,none,...,real estate,22,none,own,1,skilled employee/official,1,no,yes,0
2,no checking account,12,critical account/other credits existing,retraining,2096,... < 100 DM,4 <= ... < 7 years,2,male : single,none,...,real estate,49,none,own,1,unskilled - resident,2,no,yes,1
3,... < 100 DM,42,existing credits paid back duly till now,radio/television,7882,... < 100 DM,4 <= ... < 7 years,2,male : single,guarantor,...,building society savings agreement/life insurance,45,none,for free,1,skilled employee/official,2,no,yes,1
4,... < 100 DM,24,delay in paying off in the past,car (new),4870,... < 100 DM,1 <= ... < 4 years,3,male : single,none,...,unknown/no property,53,none,for free,2,skilled employee/official,2,no,yes,0


In [44]:
# Checking for the dataset information
credit.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 21 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   status                   1000 non-null   object
 1   duration                 1000 non-null   int64 
 2   credit_history           1000 non-null   object
 3   purpose                  1000 non-null   object
 4   amount                   1000 non-null   int64 
 5   savings                  1000 non-null   object
 6   employment_duration      1000 non-null   object
 7   installment_rate         1000 non-null   int64 
 8   personal_status_sex      1000 non-null   object
 9   other_debtors            1000 non-null   object
 10  present_residence        1000 non-null   int64 
 11  property                 1000 non-null   object
 12  age                      1000 non-null   int64 
 13  other_installment_plans  1000 non-null   object
 14  housing                  1000 non-null   

In [45]:
# Checking for NaN values
credit.isnull().sum()

status                     0
duration                   0
credit_history             0
purpose                    0
amount                     0
savings                    0
employment_duration        0
installment_rate           0
personal_status_sex        0
other_debtors              0
present_residence          0
property                   0
age                        0
other_installment_plans    0
housing                    0
number_credits             0
job                        0
people_liable              0
telephone                  0
foreign_worker             0
credit_risk                0
dtype: int64

In [46]:
# Checking for the statistical information
credit.describe()

Unnamed: 0,duration,amount,installment_rate,present_residence,age,number_credits,people_liable,credit_risk
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,20.903,3271.258,2.973,2.845,35.546,1.407,1.155,0.7
std,12.058814,2822.736876,1.118715,1.103718,11.375469,0.577654,0.362086,0.458487
min,4.0,250.0,1.0,1.0,19.0,1.0,1.0,0.0
25%,12.0,1365.5,2.0,2.0,27.0,1.0,1.0,0.0
50%,18.0,2319.5,3.0,3.0,33.0,1.0,1.0,1.0
75%,24.0,3972.25,4.0,4.0,42.0,2.0,1.0,1.0
max,72.0,18424.0,4.0,4.0,75.0,4.0,2.0,1.0
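
The `credit_risk` mean of 0.7 in the summary above implies that roughly 70% of the rows carry label 1, so the classes are imbalanced. A minimal sketch of checking the class share directly is shown here; `toy_risk` is a hypothetical stand-in for `credit['credit_risk']`, not the real column.

```python
import pandas as pd

# Hypothetical stand-in for credit['credit_risk']: 7 ones and 3 zeros
toy_risk = pd.Series([1, 0, 1, 1, 0, 1, 1, 1, 0, 1], name="credit_risk")

# Share of each class; for a 0/1 label the share of 1s equals the mean
class_share = toy_risk.value_counts(normalize=True)
print(class_share)
print(toy_risk.mean())
```

On the real data the same call would be `credit['credit_risk'].value_counts(normalize=True)`.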


In [47]:
# Checking for duplicate values
credit.duplicated().sum()

0

In [48]:
# Checking for the feature correlation (numeric columns only)
credit.corr(numeric_only=True)


Unnamed: 0,duration,amount,installment_rate,present_residence,age,number_credits,people_liable,credit_risk
duration,1.0,0.624984,0.074749,0.034067,-0.036136,-0.011284,-0.023834,-0.214927
amount,0.624984,1.0,-0.271316,0.028926,0.032716,0.020795,0.017142,-0.154739
installment_rate,0.074749,-0.271316,1.0,0.049302,0.058266,0.021669,-0.071207,-0.072404
present_residence,0.034067,0.028926,0.049302,1.0,0.266419,0.089625,0.042643,-0.002967
age,-0.036136,0.032716,0.058266,0.266419,1.0,0.149254,0.118201,0.091127
number_credits,-0.011284,0.020795,0.021669,0.089625,0.149254,1.0,0.109667,0.045732
people_liable,-0.023834,0.017142,-0.071207,0.042643,0.118201,0.109667,1.0,0.003015
credit_risk,-0.214927,-0.154739,-0.072404,-0.002967,0.091127,0.045732,0.003015,1.0
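
To read the matrix above with the target in mind, it can help to pull out only the `credit_risk` column and sort it. The sketch below assumes a small toy frame (`toy`, hypothetical values) that mixes numeric and text columns; `numeric_only=True` restricts the calculation to the numeric columns, which recent pandas versions require when text columns are present.

```python
import pandas as pd

# Hypothetical toy frame mixing numeric and text columns
toy = pd.DataFrame({
    "duration": [6, 48, 12, 42, 24, 36],
    "amount": [1169, 5951, 2096, 7882, 4870, 9055],
    "housing": ["own", "own", "own", "for free", "for free", "rent"],
    "credit_risk": [1, 0, 1, 1, 0, 0],
})

# Correlation of every numeric feature with the target, sorted ascending
target_corr = (
    toy.corr(numeric_only=True)["credit_risk"]
    .drop("credit_risk")
    .sort_values()
)
print(target_corr)
```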


In [49]:
# Checking for the list of the features
list(credit.columns)

['status',
 'duration',
 'credit_history',
 'purpose',
 'amount',
 'savings',
 'employment_duration',
 'installment_rate',
 'personal_status_sex',
 'other_debtors',
 'present_residence',
 'property',
 'age',
 'other_installment_plans',
 'housing',
 'number_credits',
 'job',
 'people_liable',
 'telephone',
 'foreign_worker',
 'credit_risk']

In [34]:
# credit['credit_risk']

- Separating the dataset into dependent and independent variables

In [50]:
X = credit.drop('credit_risk', axis = 1)

In [51]:
y = credit['credit_risk']

In [52]:
X.head()

Unnamed: 0,status,duration,credit_history,purpose,amount,savings,employment_duration,installment_rate,personal_status_sex,other_debtors,present_residence,property,age,other_installment_plans,housing,number_credits,job,people_liable,telephone,foreign_worker
0,... < 100 DM,6,critical account/other credits existing,domestic appliances,1169,unknown/no savings account,... >= 7 years,4,male : single,none,4,real estate,67,none,own,2,skilled employee/official,1,yes,yes
1,0 <= ... < 200 DM,48,existing credits paid back duly till now,domestic appliances,5951,... < 100 DM,1 <= ... < 4 years,2,female : divorced/separated/married,none,2,real estate,22,none,own,1,skilled employee/official,1,no,yes
2,no checking account,12,critical account/other credits existing,retraining,2096,... < 100 DM,4 <= ... < 7 years,2,male : single,none,3,real estate,49,none,own,1,unskilled - resident,2,no,yes
3,... < 100 DM,42,existing credits paid back duly till now,radio/television,7882,... < 100 DM,4 <= ... < 7 years,2,male : single,guarantor,4,building society savings agreement/life insurance,45,none,for free,1,skilled employee/official,2,no,yes
4,... < 100 DM,24,delay in paying off in the past,car (new),4870,... < 100 DM,1 <= ... < 4 years,3,male : single,none,4,unknown/no property,53,none,for free,2,skilled employee/official,2,no,yes


In [53]:
y.head()

0    1
1    0
2    1
3    1
4    0
Name: credit_risk, dtype: int64

In [79]:
# Splitting the data into Training and Testing Dataset

from sklearn.model_selection import train_test_split

In [80]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 42)
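
The split above is random and not stratified. Because the label is imbalanced (about 70% of rows are class 1), one optional variant is to pass `stratify` so both partitions keep the same class ratio. The sketch below uses hypothetical toy arrays (`X_toy`, `y_toy`), not the notebook's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data with a 70/30 label ratio
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([1] * 70 + [0] * 30)

# stratify=y_toy keeps the class ratio the same in both partitions
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy
)

print(y_tr.mean(), y_te.mean())
```

Switching the notebook to a stratified split would change which rows land in the test set, so the metrics reported later would not be directly comparable.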

- Preprocessing for numerical and categorical features.

In [81]:
credit.columns

Index(['status', 'duration', 'credit_history', 'purpose', 'amount', 'savings',
       'employment_duration', 'installment_rate', 'personal_status_sex',
       'other_debtors', 'present_residence', 'property', 'age',
       'other_installment_plans', 'housing', 'number_credits', 'job',
       'people_liable', 'telephone', 'foreign_worker', 'credit_risk'],
      dtype='object')

In [82]:
numeric_features = ['duration', 'amount', 'installment_rate', 'age', 'present_residence','number_credits', 'people_liable']

categorical_features = ['status','credit_history','purpose', 'savings','employment_duration','personal_status_sex','other_debtors','property','other_installment_plans','housing','telephone', 'foreign_worker'] 

- Feature Transformation

In [83]:
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression

In [85]:
numerical_transformer = StandardScaler()

In [86]:
categorical_transformer = OneHotEncoder(handle_unknown = 'ignore')

In [87]:
# Combine Transformer into a Preprocessor

preprocessor = ColumnTransformer(transformers = [('num', numerical_transformer, numeric_features),
                                                 ('cat', categorical_transformer, categorical_features)])

In [88]:
# Creating a Pipeline that includes preprocessor and the model

from sklearn.pipeline import Pipeline

In [89]:
model = Pipeline(steps = [('preprocessor', preprocessor),
                         ('classifier', LogisticRegression(max_iter = 1000))])

In [90]:
# Model fitting

model.fit(X_train, y_train)
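
A single train/test split gives one estimate of performance. Since the preprocessing lives inside the pipeline, the whole object can be passed to `cross_val_score`, which refits the scaler and encoder on each training fold. This is a minimal sketch on synthetic data; the frame `toy`, the label `toy_y`, and the reduced column set are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the credit data (not the real file)
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "amount": rng.integers(250, 18000, size=n),
    "duration": rng.integers(4, 72, size=n),
    "housing": rng.choice(["own", "rent", "for free"], size=n),
})
toy_y = (toy["duration"] < 30).astype(int)  # synthetic label

toy_pre = ColumnTransformer(transformers=[
    ("num", StandardScaler(), ["amount", "duration"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["housing"]),
])
toy_model = Pipeline(steps=[
    ("preprocessor", toy_pre),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# 5-fold cross-validated ROC AUC; preprocessing is refit inside each fold
scores = cross_val_score(toy_model, toy, toy_y, cv=5, scoring="roc_auc")
print(scores.mean())
```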

- Model Evaluation

In [91]:
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

In [92]:
credit_prediction = model.predict(X_test)

In [94]:
# credit_prediction

In [106]:
print('\nThe Classification Report of the Model: ')
print(classification_report(y_test, credit_prediction))


The Classification Report of the Model: 
              precision    recall  f1-score   support

           0       0.68      0.48      0.56        91
           1       0.80      0.90      0.85       209

    accuracy                           0.77       300
   macro avg       0.74      0.69      0.71       300
weighted avg       0.76      0.77      0.76       300



In [108]:
print('Confusion Matrix: ')
confusion_matrix(y_test, credit_prediction)

Confusion Matrix: 


array([[ 44,  47],
       [ 21, 188]], dtype=int64)
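
The figures in the classification report follow directly from this matrix, where rows are the true class and columns the predicted class. The short sketch below recomputes them from the four reported counts.

```python
import numpy as np

# Confusion matrix reported above: rows = true class, columns = predicted class
cm = np.array([[44, 47],
               [21, 188]])

tn, fp, fn, tp = cm.ravel()

precision_1 = tp / (tp + fp)     # 188 / 235
recall_1 = tp / (tp + fn)        # 188 / 209
precision_0 = tn / (tn + fn)     # 44 / 65
recall_0 = tn / (tn + fp)        # 44 / 91
accuracy = (tp + tn) / cm.sum()  # 232 / 300

print(round(precision_1, 2), round(recall_1, 2))
print(round(precision_0, 2), round(recall_0, 2))
print(round(accuracy, 2))
```

The low recall for class 0 (44 of 91) means that just over half of the class-0 cases in the test set are predicted as class 1.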

In [109]:
print("\nROC AUC Score:")
print(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))


ROC AUC Score:
0.8172353961827646


- Feature Importance and Interpretation

In [111]:
# Extracting the feature names from the OneHotEncoder

feature_names = (numeric_features + list(model.named_steps['preprocessor'].named_transformers_['cat'].get_feature_names_out()))

In [113]:
# Getting the Coefficients

coefficients = model.named_steps['classifier'].coef_.flatten()

In [115]:
# Creating a DataFrame for Feature Importance

feature_importance = pd.DataFrame({'Feature': feature_names, 'Coefficient': coefficients}).sort_values(by = 'Coefficient', ascending = False)

- Model Deployment

In [116]:
import joblib

In [117]:
# Save the model

joblib.dump(model, 'credit_risk_model.pkl')

['credit_risk_model.pkl']

In [118]:
# Load the model (for future use)

loaded_model = joblib.load('credit_risk_model.pkl')
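
Once reloaded, the pipeline can score new applicants directly, since the preprocessing is stored with the classifier. The sketch below shows the round trip on a tiny hypothetical model (`toy_model`, two numeric columns only) written to a temporary directory; the new applicant's values are made up.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy training data, not the real credit file
toy_X = pd.DataFrame({
    "duration": [6, 48, 12, 42, 24, 36],
    "amount": [1169, 5951, 2096, 7882, 4870, 9055],
})
toy_y = [1, 0, 1, 1, 0, 0]

toy_model = Pipeline(steps=[
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(toy_X, toy_y)

# Save and reload inside a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_model.pkl")
    joblib.dump(toy_model, path)
    restored = joblib.load(path)

# A new row must use the same column names as the training frame
new_applicant = pd.DataFrame({"duration": [18], "amount": [2500]})
proba = restored.predict_proba(new_applicant)[:, 1]
print(proba)
```

A pickled pipeline is generally only safe to load with the same scikit-learn version that produced it, and only from a trusted source.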

In [119]:
loaded_model

In [None]:
preprocesso