## Credit Risk Modeling

* Analyze and break down the German credit portfolio to learn about possible trends
 * POC credit scoring model
   * **Project Goal:** The credit risk application scorecard assists in evaluating the probability of customer default. This end-to-end credit risk analysis focuses on:
     * Meeting business goals
     * Optimizing the overall customer acquisition funnel
     * Building a holistic customer risk profile
  ##### PART 1 Data Preparation and Exploratory Data Analysis

  * Data Audit: Verifying data quality
  * Distribution of categories for each variable
  * Trend of variable vs default 
  * Handle missing values

    
 ##### PART 2 Feature Engineering
  * Cross variables
  * Ratios

  ##### PART 3 Characteristics Analysis
  * Characteristics Analysis Report
    * Fine Classing 
    * Coarse Classing 
    * Information value 
    * WoE transformations 
    * Univariate Analysis/ Feature Selection
  
 ##### PART 4 Scorecard Development
  * Modelling Phase
   * Variable Selection
  * Model Correlation Analysis
  * Model evaluation: 
     * Hold Out Sample Validation


In [1]:
#import necessary libraries
import re

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import pyplot
from scipy.stats import pearsonr

import statsmodels.api as sm
from sklearn import metrics
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import (accuracy_score, auc, fbeta_score, make_scorer,
                             precision_recall_curve, roc_auc_score, roc_curve)
from sklearn.model_selection import (RepeatedStratifiedKFold, cross_val_score,
                                     train_test_split)


#import utility file
%run utils.ipynb

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

Util Libraries successfully imported


In [2]:
#read data
Train = pd.read_csv('Train.csv')
Test = pd.read_csv('Test.csv')

Train.head()

Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,checking_balance,months_loan_duration,credit_history,purpose,amount,savings_balance,employment_length,installment_rate,personal_status,other_debtors,residence_history,property,age,installment_plan,housing,existing_credits,default,dependents,telephone,foreign_worker,job,gender,res_split1,residence_history_in mnths,employment_length_in mnths,purpose_grped,GBFlag,job_grped,credit_history_grped,Age_grped,residence_history_in mnths_grped,employment_length_in mnths_grped,savings_balance_grped,months_loan_duration_grped,checking_balance_grped
0,521,521,-39.0,18,repaid,radio/tv,3190,89.0,2 years,2,single,none,0 months,real estate,24,none,own,1,1,1,,yes,skilled employee,female,0.0,0.0,24.0,domestic appliance,Bad,skilled employee,delayed/repaid,"[24, 34)","[-inf, 2.0)","[3.0, 24.0)","[31.0, 155.0)","[16, 19)","[-inf, -15.0)"
1,737,737,-8.0,18,repaid,car (new),4380,157.0,2 years,3,single,none,18 years,other,35,none,own,1,0,2,2342986000.0,yes,unskilled resident,male,18.0,216.0,24.0,Car,Good,unskilled resident,delayed/repaid,"[34, 40)","[132.0, 240.0)","[3.0, 24.0)","[155.0, inf)","[16, 19)","[-15.0, 0.0)"
2,740,740,-20.0,24,fully repaid this bank,car (new),2325,436.0,5 years,2,single,none,4 years,other,32,bank,own,1,0,1,,yes,skilled employee,male,4.0,48.0,60.0,Car,Good,skilled employee,fully repaid,"[24, 34)","[2.0, 132.0)","[36.0, 84.0)","[155.0, inf)","[19, 31)","[-inf, -15.0)"
3,660,660,925.0,12,repaid,radio/tv,1297,11.0,1 years,3,married,none,14 years,real estate,23,none,rent,1,0,1,,yes,skilled employee,male,14.0,168.0,12.0,domestic appliance,Good,skilled employee,delayed/repaid,"[18, 24)","[132.0, 240.0)","[3.0, 24.0)","[1.0, 31.0)","[10, 16)","[102.0, inf)"
4,411,411,0.0,33,critical,car (used),7253,36.0,4 years,3,single,none,1 months,other,35,none,own,2,0,1,2349994000.0,yes,mangement self-employed,male,1.0,1.0,48.0,Car,Good,Others,critical,"[34, 40)","[-inf, 2.0)","[36.0, 84.0)","[31.0, 155.0)","[31, 37)","[-15.0, 0.0)"


Weight of Evidence (WoE) is a statistical technique used in credit scoring and predictive modeling to assess the predictive power of independent variables and their relationship with the target variable. WoE measures the strength of association between a categorical independent variable and the target variable by examining the distribution of the target variable within each category of the independent variable.
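
The definition above can be made concrete with a small sketch. The notebook's own `woe_iv` helper lives in `utils.ipynb` and is not shown here, so the function below is a hypothetical stand-in rather than that implementation. It assumes the convention WoE = ln(share of goods / share of bads) per category and IV = sum of (share of goods - share of bads) × WoE; some implementations use the opposite sign for WoE.

```python
import math
from collections import Counter


def woe_iv_sketch(categories, flags, good="Good", bad="Bad"):
    """Return ({category: WoE}, IV) for one categorical variable.

    Assumed convention: WoE = ln(share of goods / share of bads).
    """
    goods = Counter(c for c, f in zip(categories, flags) if f == good)
    bads = Counter(c for c, f in zip(categories, flags) if f == bad)
    total_good, total_bad = sum(goods.values()), sum(bads.values())

    woe, iv = {}, 0.0
    for cat in sorted(set(categories)):
        pct_good = goods[cat] / total_good
        pct_bad = bads[cat] / total_bad
        if pct_good == 0 or pct_bad == 0:
            # A category with no goods (or no bads) has an infinite WoE,
            # which also makes the variable's IV infinite.
            woe[cat] = math.copysign(math.inf, pct_good - pct_bad)
            iv = math.inf
            continue
        woe[cat] = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe[cat]
    return woe, iv


# Toy data: "own" has 3 goods and 1 bad, "rent" has 1 good and 3 bads.
categories = ["own"] * 4 + ["rent"] * 4
flags = ["Good", "Good", "Good", "Bad", "Good", "Bad", "Bad", "Bad"]
woe, iv = woe_iv_sketch(categories, flags)
print(woe, iv)
```

Under this convention a category that contains only bads gets a WoE of `-inf` and pushes the IV to `inf`. That may explain the `-inf` / `inf` rows reported for ID-like columns such as `Unnamed: 0.1` in the `variables_woe_iv` output, where each category is likely a single row.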

### Modeling

Several statistical modelling techniques can be used to model credit risk. The Central Bank of Nigeria, the governing body of Nigerian banks, emphasizes that financial institutions should use interpretable models rather than black-box models that cannot be explained. This notebook therefore uses:
• Stepwise logistic regression on Weight of Evidence (WoE) transformed variables.

(i) A minimum Gini of 30 on both train and test is stipulated for a model to be deemed relevant.
(ii) One-hot encoding of variables (dummification) is an alternative to the WoE approach.

In [3]:
cols = Train.columns.to_list()
cols.remove('GBFlag')
# cols.remove('LoanId')

## Get all WoE and IV of the variables and their categories 
variables_woe_iv = woe_iv(Train, cols, 'GBFlag')

In [4]:
variables_woe_iv.head()

Unnamed: 0,Variables,Categories,WOE,Variable IV
0,Unnamed: 0.1,0,-inf,inf
1,Unnamed: 0.1,2,-inf,inf
2,Unnamed: 0.1,3,-inf,inf
3,Unnamed: 0.1,5,-inf,inf
4,Unnamed: 0.1,6,-inf,inf


In [5]:
## Replace categories with their respective WoEs
train = replace_with_woe(Train)
test = replace_with_woe(Test)

train['GBFlag']= train['GBFlag'].replace({'Good': 0, 'Bad': 1})

test['GBFlag']= test['GBFlag'].replace({'Good': 0, 'Bad': 1})

Information Value (IV) is used to select predictive variables. The rule of thumb for interpreting IV is summarized in the table below.

| Information Value | Variable Predictiveness |
| -------------- | -------------- |
| Less than 0.02 | Not useful for prediction |
| 0.02 to 0.1 | Weak predictive power |
| 0.1 to 0.3 | Medium predictive power |
| 0.3 to 0.5 | Strong predictive power |
| Above 0.5 | Too good to be true |
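
As a small illustration, the bands in the table can be written as a lookup function. The name `iv_label` is hypothetical, and the handling of the shared boundaries (0.1 and 0.3 go to the higher band, 0.5 stays in the "strong" band) is an assumption, since the table does not say which side they belong to.

```python
def iv_label(iv):
    """Map an Information Value to its rule-of-thumb band."""
    if iv < 0.02:
        return "Not useful for prediction"
    if iv < 0.1:
        return "Weak predictive power"
    if iv < 0.3:
        return "Medium predictive power"
    if iv <= 0.5:
        return "Strong predictive power"
    # Very high (or infinite) IV usually points to leakage or an ID-like column.
    return "Too good to be true"


for value in (0.01, 0.05, 0.2, 0.4, float("inf")):
    print(value, "->", iv_label(value))
```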

In [6]:
### Feature Selection With Variable Information Value (IV)
var_list = []
for var in variables_woe_iv['Variables'].unique():
    # Get Variable IV
    IV = variables_woe_iv[variables_woe_iv['Variables']==var]['Variable IV'].unique()[0]
    # Keep variables with weak-to-strong predictive power (0.02 <= IV <= 0.5)
    if 0.02 <= IV <= 0.5:
        var_list.append(var)

var_list

['credit_history',
 'purpose',
 'employment_length',
 'installment_rate',
 'personal_status',
 'other_debtors',
 'residence_history',
 'property',
 'installment_plan',
 'housing',
 'foreign_worker',
 'job',
 'gender',
 'res_split1',
 'residence_history_in mnths',
 'employment_length_in mnths',
 'purpose_grped',
 'job_grped',
 'credit_history_grped',
 'Age_grped',
 'employment_length_in mnths_grped',
 'savings_balance_grped',
 'months_loan_duration_grped',
 'checking_balance_grped']

In [7]:
X_train = Train[var_list]
y_train = Train['GBFlag']
                
X_test = Test[var_list]
y_test = Test['GBFlag']

In [8]:
#since we've regrouped some of these variables, it's okay to drop the originals
regrouped = ['employment_length', 'residence_history', 'residence_history_in mnths', 'employment_length_in mnths']
# reassign instead of using inplace=True: X_train and X_test are slices of Train and Test
X_train = X_train.drop(columns=regrouped)
X_test = X_test.drop(columns=regrouped)


In [9]:
X_train.columns

Index(['credit_history', 'purpose', 'installment_rate', 'personal_status',
       'other_debtors', 'property', 'installment_plan', 'housing',
       'foreign_worker', 'job', 'gender', 'res_split1', 'purpose_grped',
       'job_grped', 'credit_history_grped', 'Age_grped',
       'employment_length_in mnths_grped', 'savings_balance_grped',
       'months_loan_duration_grped', 'checking_balance_grped'],
      dtype='object')

#### Correlation vs. Collinearity vs. Multicollinearity
  - Correlation measures the strength and direction of the linear relationship between two columns in your dataset. It is often used to assess the relationship between a feature and the target
  - Collinearity, on the other hand, is a situation where two features that are both used as predictors for the target are linearly associated (highly correlated) with each other
  - Multicollinearity is a special case of collinearity where a feature exhibits a linear relationship with two or more other features
#### Fixing Multicollinearity

The Variance Inflation Factor (VIF) of a feature is obtained by regressing that feature on all the other features and computing VIF = 1 / (1 - R²), where R² is taken from that regression. A small R² (close to 0) makes the denominator close to 1, which results in a small VIF. A small VIF indicates that the feature exhibits low multicollinearity with the other features.
(1 - R²) is also known as the tolerance.
#### Interpreting VIF Values
VIF values range from 1 to infinity. A rule of thumb for interpreting them is:
- VIF = 1: the feature is not correlated with the other features
- 1 < VIF ≤ 5: features are moderately correlated
- 5 < VIF ≤ 10: features are highly correlated
- VIF > 10: very high correlation between features, which is cause for concern
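
To see the rule of thumb in action, here is a minimal sketch on synthetic data, assuming the usual definition VIF = 1 / (1 - R²) with R² taken from a least-squares fit (including an intercept) of each feature on the others. The function name `vif_sketch` and the three synthetic columns are made up for illustration and are not part of the credit dataset.

```python
import numpy as np


def vif_sketch(X):
    """Return the VIF of each column of a 2-D array.

    Each column is regressed on all other columns (plus an intercept)
    and VIF = 1 / (1 - R^2).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other features
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)


rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)            # independent of a
c = a + 0.1 * rng.normal(size=500)  # almost a copy of a
vifs = vif_sketch(np.column_stack([a, b, c]))
print(dict(zip("abc", vifs.round(2))))
```

The independent column `b` should land close to 1, while `a` and `c`, which are nearly collinear, should come out far above 10.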

In [10]:

def calculate_vif(df, features):    
    vif, tolerance = {}, {}
    # all the features that you want to examine
    for feature in features:
        # extract all the other features you will regress against
        X = [f for f in features if f != feature]        
        X, y = df[X], df[feature]
        # extract r-squared from the fit
        r2 = LinearRegression().fit(X, y).score(X, y)                
        
        # calculate tolerance
        tolerance[feature] = 1 - r2
        # calculate VIF
        vif[feature] = 1/(tolerance[feature])
    # return VIF DataFrame
    return pd.DataFrame({'VIF': vif, 'Tolerance': tolerance}).sort_values(by='VIF', ascending=False)
  
vif = calculate_vif(df=X_train, features=[  'installment_rate',
       'personal_status', 'other_debtors', 'property', 'installment_plan',
       'housing', 'foreign_worker',  'gender', 'purpose_grped',
       'job_grped', 'credit_history_grped', 'Age_grped',
       'employment_length_in mnths_grped', 'savings_balance_grped',
       'months_loan_duration_grped', 'checking_balance_grped'])

In [11]:
vif

Unnamed: 0,VIF,Tolerance
property,1.199387,0.833759
housing,1.131417,0.883848
Age_grped,1.129897,0.885037
months_loan_duration_grped,1.107152,0.903218
gender,1.102866,0.906729
credit_history_grped,1.093901,0.91416
job_grped,1.083526,0.922913
employment_length_in mnths_grped,1.080101,0.925839
installment_plan,1.075423,0.929867
purpose_grped,1.071518,0.933256


#### Test of significance using chi-square test of independence

In [17]:
from scipy.stats import chi2_contingency
a = [col for col in X_train.columns]
X= Train[[  'installment_rate',
       'personal_status', 'other_debtors', 'property', 'installment_plan',
       'housing', 'foreign_worker',  'gender', 'purpose_grped',
       'job_grped', 'credit_history_grped', 'Age_grped',
       'employment_length_in mnths_grped', 'savings_balance_grped',
       'months_loan_duration_grped', 'checking_balance_grped','GBFlag']]
def select_features(X, target='GBFlag'):
    '''
    Estimate the statistical significance of each categorical variable
    with respect to the target, using a chi-square test of independence.

    parameters ::
    - X :: DataFrame of predictors that also contains the target column
    - target :: name of the target column

    output ::
    - DataFrame of variables ordered by p-value (most significant first)
    '''
    # dictionary to store chi-square test results
    chi2_check = {'Features': [], 'p-values': []}

    # run the chi-square test between the target and every other column
    for column in X.columns.drop(target):
        chi, p, dof, ex = chi2_contingency(pd.crosstab(X[target], X[column]))
        chi2_check['Features'].append(column)
        chi2_check['p-values'].append(round(p, 3))

    # convert the dictionary to a DataFrame sorted by p-value
    chi2_result = pd.DataFrame(data=chi2_check).sort_values('p-values', ascending=True, ignore_index=True)

    # return result
    return chi2_result


# store the result under a different name so the function is not overwritten
chi2_result = select_features(X)
chi2_result

Unnamed: 0,Features,p-values
0,credit_history_grped,0.0
1,savings_balance_grped,0.0
2,months_loan_duration_grped,0.0
3,checking_balance_grped,0.0
4,installment_plan,0.001
5,employment_length_in mnths_grped,0.001
6,property,0.006
7,housing,0.009
8,gender,0.01
9,Age_grped,0.027
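
For intuition about what `chi2_contingency` computes, the sketch below works through Pearson's chi-square statistic by hand on a made-up 2×2 table (rows: Good/Bad, columns: two categories of a hypothetical feature). The p-value shortcut via `math.erfc` is valid only for one degree of freedom. Note also that `scipy.stats.chi2_contingency` applies Yates' continuity correction by default on 2×2 tables, so it would report a somewhat smaller statistic for this table.

```python
import math


def chi2_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


table = [[30, 10],   # Good: category A, category B
         [10, 30]]   # Bad:  category A, category B
stat, dof = chi2_statistic(table)
p_value = math.erfc(math.sqrt(stat / 2))  # survival function, dof == 1 only
print(stat, dof, p_value)
```

A small p-value means the observed counts are unlikely under independence, which is why the features at the top of the table above are treated as the most significant.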


In [374]:
vars = [  'installment_rate',
       'personal_status', 'other_debtors', 'property', 'installment_plan',
       'housing', 'foreign_worker',  'gender', 'purpose_grped',
       'job_grped', 'credit_history_grped', 'Age_grped',
       'employment_length_in mnths_grped', 'savings_balance_grped',
       'months_loan_duration_grped', 'checking_balance_grped']

The Gini coefficient is used to evaluate model performance. It is computed as (2 * AUC) - 1, so for a model that ranks at least as well as random it ranges from 0 (no discriminatory power) to 1 (perfect separation of goods and bads).

In [375]:
X_train_sm = X_train[vars]
# X_test_sm = X_test[vars]

X_train_sm = sm.add_constant(X_train_sm)
X_test_sm = sm.add_constant(X_test[vars])


model = sm.Logit(y_train, X_train_sm)
result = model.fit()
display(result.summary2())
# Extract variables used for training from X_test
## Get Model Predictions 

test_pred = result.predict(X_test_sm)
train_pred = result.predict(X_train_sm)

Test_Gini, Train_Gini = compute_gini(y_test, y_train, test_pred, train_pred)
print('Model AIC:', round(result.aic, 3))
print('Model Train Gini:', round(Train_Gini, 3))
print('Model Test Gini:', round(Test_Gini, 3))

Optimization terminated successfully.
         Current function value: 0.482940
         Iterations 6


0,1,2,3
Model:,Logit,Method:,MLE
Dependent Variable:,GBFlag,Pseudo R-squared:,0.208
Date:,2023-06-29 00:17,AIC:,758.4101
No. Observations:,750,BIC:,836.9513
Df Model:,16,Log-Likelihood:,-362.21
Df Residuals:,733,LL-Null:,-457.30
Converged:,1.0000,LLR p-value:,7.5715e-32
No. Iterations:,6.0000,Scale:,1.0000

0,1,2,3,4,5,6
,Coef.,Std.Err.,z,P>|z|,[0.025,0.975]
const,-0.8596,0.0936,-9.1807,0.0000,-1.0431,-0.6761
installment_rate,1.7845,0.6107,2.9223,0.0035,0.5876,2.9814
personal_status,1.4512,0.5594,2.5943,0.0095,0.3548,2.5476
other_debtors,1.1177,0.5716,1.9555,0.0505,-0.0025,2.2380
property,0.4205,0.3508,1.1990,0.2305,-0.2669,1.1080
installment_plan,0.8966,0.3155,2.8423,0.0045,0.2783,1.5149
housing,0.5030,0.4002,1.2569,0.2088,-0.2814,1.2874
foreign_worker,0.6934,0.5595,1.2392,0.2153,-0.4033,1.7900
gender,0.9972,0.4539,2.1970,0.0280,0.1076,1.8869


Model AIC: 758.41
Model Train Gini: 0.607
Model Test Gini: 0.464


In [377]:
from sklearn.metrics import classification_report
prediction_test = list(map(round, test_pred))
prediction_train = list(map(round, train_pred))

print(classification_report(y_test, prediction_test))

              precision    recall  f1-score   support

           0       0.77      0.90      0.83       174
           1       0.62      0.38      0.47        76

    accuracy                           0.74       250
   macro avg       0.69      0.64      0.65       250
weighted avg       0.72      0.74      0.72       250



In [378]:
# cross validation

data = pd.concat([X_train[vars], X_test[vars]])
X= data

y = pd.concat([y_train, y_test])

from sklearn.model_selection import cross_val_score

clf = LogisticRegression()
scores = cross_val_score(clf, X, y, cv=5, scoring='roc_auc')
scores.mean()

0.77

This corresponds to a cross-validated Gini of about 0.54, using the formula Gini = (2 * auc_score) - 1.
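
The relationship between AUC and Gini can be checked with a short sketch. The notebook's `compute_gini` helper is defined in `utils.ipynb` and not shown, so the functions below are hypothetical stand-ins: AUC is computed as the share of (bad, good) pairs in which the bad case receives the higher score, with ties counted as half, and Gini is then 2 × AUC - 1.

```python
def auc_pairwise(y_true, scores):
    """AUC as the probability that a positive (1) outscores a negative (0)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini_from_auc(auc_score):
    """Gini coefficient from an AUC value."""
    return 2 * auc_score - 1


# Toy example: 3 of the 4 (positive, negative) pairs are ranked correctly.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc_score = auc_pairwise(y_true, scores)
gini = gini_from_auc(auc_score)
print(auc_score, gini)
```

The pairwise loop is quadratic in the sample size, so it is meant for illustration only; `sklearn.metrics.roc_auc_score` is the practical choice on real data.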

After trying backward elimination, forward selection and stepwise selection, the variables below gave the best model: the difference between Train and Test Gini is below 0.13, and the test set shows a good balance between recall and precision.

In [417]:
vars = [  'installment_rate',
       'personal_status', 
       'other_debtors', 
       #'property', 
       'installment_plan',
       #'housing', 
       #'foreign_worker', 
       'gender',
       #'purpose_grped',
       #'job_grped', 
       'credit_history_grped', 
       'Age_grped',
       #'employment_length_in mnths_grped', 
       'savings_balance_grped',
       'months_loan_duration_grped',
       'checking_balance_grped']



X_train_sm = X_train[vars]
# X_test_sm = X_test[vars]

X_train_sm = sm.add_constant(X_train_sm)
X_test_sm = sm.add_constant(X_test[vars])


model = sm.Logit(y_train, X_train_sm)
result = model.fit()
display(result.summary2())
# Extract variables used for training from X_test
## Get Model Predictions 

test_pred = result.predict(X_test_sm)
train_pred = result.predict(X_train_sm)

Test_Gini, Train_Gini = compute_gini(y_test, y_train, test_pred, train_pred)
print('Model AIC:', round(result.aic, 3))
print('Model Train Gini:', round(Train_Gini, 3))
print('Model Test Gini:', round(Test_Gini, 3))

from sklearn.metrics import classification_report
prediction_test = list(map(round, test_pred))
prediction_train = list(map(round, train_pred))

print(classification_report(y_test, prediction_test))

# cross validation
data = pd.concat([X_train[vars], X_test[vars]])
X= data

y = pd.concat([y_train, y_test])

from sklearn.model_selection import cross_val_score

clf = LogisticRegression()
scores = cross_val_score(clf, X, y, cv=5, scoring='roc_auc')
scores.mean()

Optimization terminated successfully.
         Current function value: 0.493652
         Iterations 6


0,1,2,3
Model:,Logit,Method:,MLE
Dependent Variable:,GBFlag,Pseudo R-squared:,0.190
Date:,2023-06-29 00:41,AIC:,762.4785
No. Observations:,750,BIC:,813.2993
Df Model:,10,Log-Likelihood:,-370.24
Df Residuals:,739,LL-Null:,-457.30
Converged:,1.0000,LLR p-value:,3.8921e-32
No. Iterations:,6.0000,Scale:,1.0000

0,1,2,3,4,5,6
,Coef.,Std.Err.,z,P>|z|,[0.025,0.975]
const,-0.8606,0.0921,-9.3408,0.0000,-1.0412,-0.6801
installment_rate,1.7253,0.6005,2.8732,0.0041,0.5484,2.9023
personal_status,1.6110,0.5505,2.9263,0.0034,0.5320,2.6900
other_debtors,1.4067,0.5462,2.5752,0.0100,0.3361,2.4773
installment_plan,0.9309,0.3122,2.9814,0.0029,0.3189,1.5429
gender,1.1790,0.4400,2.6795,0.0074,0.3166,2.0415
credit_history_grped,0.8684,0.1899,4.5733,0.0000,0.4962,1.2405
Age_grped,0.7738,0.3305,2.3416,0.0192,0.1261,1.4215
savings_balance_grped,1.0958,0.2590,4.2303,0.0000,0.5881,1.6035


Model AIC: 762.478
Model Train Gini: 0.58
Model Test Gini: 0.454
              precision    recall  f1-score   support

           0       0.78      0.88      0.82       174
           1       0.60      0.42      0.50        76

    accuracy                           0.74       250
   macro avg       0.69      0.65      0.66       250
weighted avg       0.72      0.74      0.72       250



0.7664761904761905

The difference between Train and Test Gini is 0.126 (0.58 - 0.454).

In [445]:

import json
with open('Credit_bureau_sample_data.json') as json_file:
    # json.load takes the file object itself; a file object cannot be indexed
    data_dict = json.load(json_file)

# the file holds a list of records, so select the first one before normalizing
json_data = pd.json_normalize(data_dict[0]['data']['consumerfullcredit'])
json_data

In [455]:
data_dict[0]['data']['consumerfullcredit']

{'subjectlist': {'reference': '12876566',
  'consumerid': '17628566',
  'searchoutput': 'XXX '},
 'accountrating': {'noofotheraccountsbad': '0',
  'noofotheraccountsgood': '3',
  'noofretailaccountsbad': '0',
  'noofretailaccountsgood': '2',
  'nooftelecomaccountsbad': '0',
  'noofautoloanaccountsbad': '0',
  'noofautoloanccountsgood': '0',
  'noofhomeloanaccountsbad': '0',
  'nooftelecomaccountsgood': '0',
  'noofhomeloanaccountsgood': '0',
  'noofjointloanaccountsbad': '0',
  'noofstudyloanaccountsbad': '0',
  'noofcreditcardaccountsbad': '0',
  'noofjointloanaccountsgood': '0',
  'noofstudyloanaccountsgood': '0',
  'noofcreditcardaccountsgood': '1',
  'noofpersonalloanaccountsbad': '0',
  'noofpersonalloanaccountsgood': '1'},
 'enquirydetails': {'productid': '45',
  'matchingrate': '90',
  'subscriberenquiryengineid': '5012874225',
  'subscriberenquiryresultid': '6381470'},
 'guarantorcount': {'accounts': '0', 'guarantorssecured': '0'},
 'guarantordetails': {'guarantorgender': None,

In [458]:
data_dict[0]['data']['consumerfullcredit']['creditaccountsummary']

{'rating': '13',
 'amountarrear': '24,041.00',
 'amountarrear1': '0.00',
 'totalaccounts': '7',
 'totalaccounts1': '0',
 'lastjudgementdate': '-',
 'lastjudgementdate1': '-',
 'totalaccountarrear': '2',
 'totalaccountarrear1': '0',
 'totaljudgementamount': '0',
 'totaloutstandingdebt': '105,435.00',
 'totaljudgementamount1': '0',
 'totaloutstandingdebt1': '0.00',
 'totaldishonouredamount': '0.00',
 'totalmonthlyinstalment': '77,404.00',
 'totalnumberofjudgement': '0',
 'totaldishonouredamount1': '0.00',
 'totalmonthlyinstalment1': '0.00',
 'totalnumberofjudgement1': '0',
 'totalnumberofdishonoured': '0',
 'totalnumberofdishonoured1': '0',
 'totalaccountingodcondition': '0',
 'totalaccountingodcondition1': '0'}
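
All values in this summary are strings, some with thousands separators (`'24,041.00'`) and some with `'-'` as a placeholder. Before such fields can be used as model inputs they would need to be converted to numbers. The helper below is a hypothetical sketch of one way to do that; treating `'-'` and empty strings as missing is an assumption about the bureau format, not something the data confirms.

```python
def to_number(value):
    """Convert strings such as '24,041.00' to float; return None when missing."""
    if value is None:
        return None
    text = str(value).replace(",", "").strip()
    if text in ("", "-"):
        return None  # assumed placeholder for "no value"
    try:
        return float(text)
    except ValueError:
        return None  # leave non-numeric text (for example dates) as missing


# A few fields copied from the summary above.
summary = {
    "rating": "13",
    "amountarrear": "24,041.00",
    "lastjudgementdate": "-",
    "totaloutstandingdebt": "105,435.00",
}
numeric_summary = {key: to_number(val) for key, val in summary.items()}
print(numeric_summary)
```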

In [None]:
vars = ['accountrating', 'creditaccountsummary','enquirydetails', 'deliquencyinformation', 'creditagreementsummary' ]

In [414]:
2*0.7634

1.5268

#### Second iteration

In [385]:
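### Insignificant variables (p > 0.05)
# Note: in the recorded run `const` is significant, so it is not in this list. The `[1:]`
# slice therefore skips the first insignificant variable (other_debtors, p = 0.0505),
# which is why it remains in the second model.
insignificant = result.pvalues[result.pvalues > 0.05].index.tolist()[1:]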
len(insignificant)

6

In [386]:
X_train_sm.drop(insignificant, axis=1, inplace=True)
X_test_sm.drop(insignificant, axis=1, inplace=True)

## Train a new model with the significant variables from earlier model
model = sm.Logit(np.array(y_train), X_train_sm)
result_2 = model.fit()

Optimization terminated successfully.
         Current function value: 0.491976
         Iterations 6


In [387]:
## Get Model Predictions 
display(result_2.summary())
test_pred = result_2.predict(X_test_sm)
train_pred = result_2.predict(X_train_sm)
Test_Gini, Train_Gini = compute_gini(y_test, y_train, test_pred, train_pred)
print('Model AIC:', round(result_2.aic, 3))
print('Model Train Gini:', round(Train_Gini, 3))
print('Model Test Gini:', round(Test_Gini, 3))

0,1,2,3
Dep. Variable:,y,No. Observations:,750.0
Model:,Logit,Df Residuals:,739.0
Method:,MLE,Df Model:,10.0
Date:,"Thu, 29 Jun 2023",Pseudo R-squ.:,0.1931
Time:,00:26:17,Log-Likelihood:,-368.98
converged:,True,LL-Null:,-457.3
Covariance Type:,nonrobust,LLR p-value:,1.172e-32

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
const,-0.8575,0.092,-9.292,0.000,-1.038,-0.677
installment_rate,1.6897,0.600,2.816,0.005,0.514,2.866
personal_status,1.4527,0.548,2.652,0.008,0.379,2.526
other_debtors,1.3002,0.557,2.336,0.020,0.209,2.391
installment_plan,0.8836,0.311,2.842,0.004,0.274,1.493
gender,1.2305,0.437,2.814,0.005,0.373,2.088
credit_history_grped,0.8218,0.189,4.340,0.000,0.451,1.193
employment_length_in mnths_grped,0.7425,0.265,2.797,0.005,0.222,1.263
savings_balance_grped,1.0928,0.259,4.220,0.000,0.585,1.600


Model AIC: 759.964
Model Train Gini: 0.585
Model Test Gini: 0.425


In [388]:
from sklearn.metrics import classification_report
prediction_test = list(map(round, test_pred))
prediction_train = list(map(round, train_pred))

print(classification_report(y_test, prediction_test))

              precision    recall  f1-score   support

           0       0.76      0.89      0.82       174
           1       0.57      0.36      0.44        76

    accuracy                           0.72       250
   macro avg       0.67      0.62      0.63       250
weighted avg       0.70      0.72      0.70       250

