### Variables Description

    1. Age (numeric)
    2. Sex (text: male, female)
    3. Job (numeric: 0 - unskilled and non-resident, 1 - unskilled and resident, 2 - skilled, 3 - highly skilled)
    4. Housing (text: own, rent, or free)
    5. Saving accounts (text: little, moderate, quite rich, rich)
    6. Checking account (text: little, moderate, rich)
    7. Duration (numeric, in months)
    8. Purpose (text: car, furniture/equipment, radio/TV, domestic appliances, repairs, education, business, vacation/others)
    9. Risk (good or bad. bad means default. Dependent Variable)
    10. Credit amount (numeric, in DM)
    11. Recovered Principle (amount of principal recovered from bad cases)
    12. Recoveries (total recoveries made from bad cases)
    


# Population Stability Index

Topics Covered Here
    1. Loading the Libraries & Dataset
    2. Missing Value Treatment
    3. Binning Categorical Variables
    4. Binning Continuous Variables
    5. Loading the Scorecard Dataset
    6. Making Bins of Scores
    7. Population Stability Index




## 1. Loading the Libraries & Dataset

In [1]:
# Import the necessary libraries

import os
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

color = sns.color_palette()
sns.set_style('darkgrid')
plt.figure(figsize=(12,8))

import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.float_format', lambda x: '{:.2f}'.format(x)) # Limit float output to 2 decimal places
os.getcwd()

'C:\\Users\\sohai\\Desktop\\My Personal\\1. FinTech\\Stephanie\\Credit Risk'

<Figure size 864x576 with 0 Axes>

In [2]:
#Change the working directory
%cd "C:\Users\sohai\Desktop\My Personal\1. FinTech\Stephanie\Credit Risk"
print(os.listdir())

C:\Users\sohai\Desktop\My Personal\1. FinTech\Stephanie\Credit Risk
['.ipynb_checkpoints', '3.A- Binning of Variables.ipynb', '3.B- PD & Scorecard Model.ipynb', '3.C- Population Stability Index.ipynb', '4. LGD, EAD & EL Models.ipynb', 'German binned_data.xlsx', 'German Credit Data.xlsx', 'German Credit Final CRM.xlsx', 'lgd_model_stage_2.sav', 'pd_model.sav', 'reg_lgd.sav', 'scorecard values.xlsx', 'Test Set Score.csv']


In [3]:
# Load the 'PSI' sheet of the workbook into a pandas DataFrame

df = pd.read_excel("German Credit Data.xlsx", sheet_name='PSI')
df.head(3)

Unnamed: 0,Age,Sex,Job,Housing,Saving accounts,Checking account,Duration,Purpose,Risk,Credit amount,Recovered_Principle,Recoveries
0,25,female,1,own,little,moderate,24,furniture/equipment,bad,25000,0.0,26308.47
1,31,male,2,rent,little,little,24,business,bad,18000,0.0,21941.26
2,32,female,3,own,little,moderate,48,vacation/others,bad,14700,374.1,14746.89


In [4]:

url = 'https://raw.githubusercontent.com/TheJuniorLebowski/Data/master/German%20Credit%20Data.csv'

# The file holds both the CRM training data and the PSI data; we only need the PSI rows.
df = pd.read_csv(url)
df = df[df['type'] == 'PSI Data']
df = df.drop(['type'], axis=1)
df.head(4)

Unnamed: 0,Age,Sex,Job,Housing,Saving accounts,Checking account,Duration,Purpose,Risk,Credit amount,Recovered_Principle,Recoveries
901,25,female,1,own,little,moderate,24,furniture/equipment,bad,25000,0.0,26308.47
902,31,male,2,rent,little,little,24,business,bad,18000,0.0,21941.26
903,32,female,3,own,little,moderate,48,vacation/others,bad,14700,374.1,14746.89
904,68,male,3,own,little,little,6,car,bad,12000,0.0,12679.5


## 2. Missing Value Treatment

    1. We delete any variable with more than 20% of its observations missing. We already did this.
    2. We replace the missing values of a numeric variable with its median.
    3. We replace the missing values of a categorical variable with its mode.

    

In [5]:
#missing data
total = df.isnull().sum().sort_values(ascending=False)
percent = (df.isnull().sum()/df.isnull().count()).sort_values(ascending=False)
missing_data = pd.concat([total, percent], axis=1, keys=['Total Missing', 'Percent Missing'])
missing_data.head(20)

Unnamed: 0,Total Missing,Percent Missing
Checking account,38,0.38
Saving accounts,20,0.2
Recoveries,0,0.0
Recovered_Principle,0,0.0
Credit amount,0,0.0
Risk,0,0.0
Purpose,0,0.0
Duration,0,0.0
Housing,0,0.0
Job,0,0.0


In [6]:
# Delete any variable that has more than 20% of the data as missing. We already did this above
#df = df.dropna(thresh=0.8*len(df), axis=1)
#df.isnull().sum()

In [7]:
# Replace missing values with median in numeric variables
for col in df.select_dtypes(include=np.number):
    df[col] = df[col].fillna(df[col].median())
    

df.isnull().sum()

Age                     0
Sex                     0
Job                     0
Housing                 0
Saving accounts        20
Checking account       38
Duration                0
Purpose                 0
Risk                    0
Credit amount           0
Recovered_Principle     0
Recoveries              0
dtype: int64

In [8]:
# Replace missing values with mode for categorical variables
df.fillna(df.select_dtypes(include='object').mode().iloc[0], inplace=True)
    

df.isnull().sum()

Age                    0
Sex                    0
Job                    0
Housing                0
Saving accounts        0
Checking account       0
Duration               0
Purpose                0
Risk                   0
Credit amount          0
Recovered_Principle    0
Recoveries             0
dtype: int64

In [9]:
df['Risk'].value_counts()

good    68
bad     31
Name: Risk, dtype: int64

## 3. Binning Categorical Variables

We bin the categorical variables exactly as we did in the "Binning of Variables" notebook.

In [10]:
# Here we code good cases as 0 and bad cases (defaults) as 1
df['default'] = np.where(df['Risk'].isin(['good']),0,1)
df.head(5)

Unnamed: 0,Age,Sex,Job,Housing,Saving accounts,Checking account,Duration,Purpose,Risk,Credit amount,Recovered_Principle,Recoveries,default
901,25,female,1,own,little,moderate,24,furniture/equipment,bad,25000,0.0,26308.47,1
902,31,male,2,rent,little,little,24,business,bad,18000,0.0,21941.26,1
903,32,female,3,own,little,moderate,48,vacation/others,bad,14700,374.1,14746.89,1
904,68,male,3,own,little,little,6,car,bad,12000,0.0,12679.5,1
905,33,male,2,own,moderate,little,24,furniture/equipment,bad,12000,0.0,12427.85,1


In [11]:
df.columns

Index(['Age', 'Sex', 'Job', 'Housing', 'Saving accounts', 'Checking account',
       'Duration', 'Purpose', 'Risk', 'Credit amount', 'Recovered_Principle',
       'Recoveries', 'default'],
      dtype='object')

In [12]:
# Let's drop variables that aren't used in the PD model
df_PSI = df.drop(['Credit amount', 'Purpose', 'Risk', 'Recovered_Principle', 'Recoveries' ], axis=1)
df_PSI.head(3)

Unnamed: 0,Age,Sex,Job,Housing,Saving accounts,Checking account,Duration,default
901,25,female,1,own,little,moderate,24,1
902,31,male,2,rent,little,little,24,1
903,32,female,3,own,little,moderate,48,1


In [13]:
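# Note: the dummies are built from df, not from the reduced df_PSI above,
# so 'Credit amount' stays available for the loan bins created later.
df_PSI = pd.get_dummies(df, columns = ['Sex','Job', 'Housing', 'Saving accounts', 'Checking account'])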
df_PSI.head(3)


Unnamed: 0,Age,Duration,Purpose,Risk,Credit amount,Recovered_Principle,Recoveries,default,Sex_female,Sex_male,...,Housing_free,Housing_own,Housing_rent,Saving accounts_little,Saving accounts_moderate,Saving accounts_quite rich,Saving accounts_rich,Checking account_little,Checking account_moderate,Checking account_rich
901,25,24,furniture/equipment,bad,25000,0.0,26308.47,1,1,0,...,0,1,0,1,0,0,0,0,1,0
902,31,24,business,bad,18000,0.0,21941.26,1,0,1,...,0,0,1,1,0,0,0,1,0,0
903,32,48,vacation/others,bad,14700,374.1,14746.89,1,1,0,...,0,1,0,1,0,0,0,0,1,0


## 4. Binning Continuous Variables

We bin the numeric variables exactly as we did in the "Binning of Variables" notebook.
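
As a side note, left-closed bands like the ones used for Age can also be built in one step with `pd.cut` and `pd.get_dummies`. The sketch below uses made-up ages; the edges and labels mirror the Age columns of this notebook, while the names `ages`, `edges`, `labels`, `age_band` and `age_dummies` are only illustrative.

```python
import numpy as np
import pandas as pd

# Toy ages (made up); the edges and labels mirror the Age bins used in this notebook
ages = pd.Series([22, 25, 34, 35, 47, 50, 68], name="Age")

edges = [-np.inf, 25, 35, 45, 50, np.inf]
labels = ["Age:<25", "Age:25-35", "Age:35-45", "Age:45-50", "Age:>= 50"]

# right=False gives left-closed bands [a, b), matching the >= / < comparisons
age_band = pd.cut(ages, bins=edges, labels=labels, right=False)

# One 0/1 column per band; every row falls in exactly one band
age_dummies = pd.get_dummies(age_band).astype(int)
print(age_dummies.sum(axis=1).tolist())
```

Because `pd.cut` assigns each value to a single interval, the resulting dummies cannot overlap, which is a useful check against hand-typed boundaries.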

In [14]:
df_PSI['Age:<25'] = np.where((df_PSI['Age'] < 25), 1, 0)
df_PSI['Age:25-35'] = np.where((df_PSI['Age']>=25) & (df_PSI['Age'] < 35), 1, 0)
df_PSI['Age:35-45'] = np.where((df_PSI['Age']>=35) & (df_PSI['Age'] < 45), 1, 0)
df_PSI['Age:45-50'] = np.where((df_PSI['Age']>=45) & (df_PSI['Age'] < 50), 1, 0)

df_PSI['Age:>= 50'] = np.where((df_PSI['Age'] >= 50), 1, 0)

df_PSI.tail(5)

Unnamed: 0,Age,Duration,Purpose,Risk,Credit amount,Recovered_Principle,Recoveries,default,Sex_female,Sex_male,...,Saving accounts_quite rich,Saving accounts_rich,Checking account_little,Checking account_moderate,Checking account_rich,Age:<25,Age:25-35,Age:35-45,Age:45-50,Age:>= 50
995,50,12,car,good,20000,20000.0,0.0,0,0,1,...,0,0,1,0,0,0,0,0,0,1
996,31,12,furniture/equipment,good,11350,11350.0,0.0,0,1,0,...,0,0,1,0,0,0,1,0,0,0
997,40,30,car,good,3200,3200.0,0.0,0,0,1,...,0,0,1,0,0,0,0,1,0,0
998,38,12,radio/TV,good,15475,15475.0,0.0,0,0,1,...,0,0,1,0,0,0,0,1,0,0
999,27,45,car,good,10000,10000.0,0.0,0,0,1,...,0,0,0,1,0,0,1,0,0,0


In [15]:
df_PSI['Duration:<7'] = np.where((df_PSI['Duration'] < 7), 1, 0)
df_PSI['Duration:7-14'] = np.where((df_PSI['Duration']>=7) & (df_PSI['Duration'] < 14), 1, 0)
df_PSI['Duration:14-21'] = np.where((df_PSI['Duration']>=14) & (df_PSI['Duration'] < 21), 1, 0)
df_PSI['Duration:21-36'] = np.where((df_PSI['Duration']>=21) & (df_PSI['Duration'] < 36), 1, 0)

df_PSI['Duration:>= 36'] = np.where((df_PSI['Duration'] >= 36), 1, 0)

df_PSI.tail(5)


Unnamed: 0,Age,Duration,Purpose,Risk,Credit amount,Recovered_Principle,Recoveries,default,Sex_female,Sex_male,...,Age:<25,Age:25-35,Age:35-45,Age:45-50,Age:>= 50,Duration:<7,Duration:7-14,Duration:14-21,Duration:21-36,Duration:>= 36
995,50,12,car,good,20000,20000.0,0.0,0,0,1,...,0,0,0,0,1,0,1,0,0,0
996,31,12,furniture/equipment,good,11350,11350.0,0.0,0,1,0,...,0,1,0,0,0,0,1,0,0,0
997,40,30,car,good,3200,3200.0,0.0,0,0,1,...,0,0,1,0,0,0,0,0,1,0
998,38,12,radio/TV,good,15475,15475.0,0.0,0,0,1,...,0,0,1,0,0,0,1,0,0,0
999,27,45,car,good,10000,10000.0,0.0,0,0,1,...,0,1,0,0,0,0,0,0,0,1


In [16]:
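# Note: the lower bounds of 'Loan:8k-12k', 'Loan:12k-20k' and 'Loan:20-26k' (3000, 4500, 7000)
# do not match their labels, so these bins overlap. The test-set scores loaded later show the
# same overlap, so the definitions are left unchanged here to stay comparable with them.
df_PSI['Loan:<3500'] = np.where((df_PSI['Credit amount'] < 3500), 1, 0)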
df_PSI['Loan:3.5k-8k'] = np.where((df_PSI['Credit amount']>=3500) & (df_PSI['Credit amount'] < 8000), 1, 0)
df_PSI['Loan:8k-12k'] = np.where((df_PSI['Credit amount']>=3000) & (df_PSI['Credit amount'] < 12000), 1, 0)
df_PSI['Loan:12k-20k'] = np.where((df_PSI['Credit amount']>=4500) & (df_PSI['Credit amount'] < 20000), 1, 0)
df_PSI['Loan:20-26k'] = np.where((df_PSI['Credit amount']>=7000) & (df_PSI['Credit amount'] < 26000), 1, 0)


df_PSI['Loan:>= 26k'] = np.where((df_PSI['Credit amount'] >= 26000), 1, 0)

df_PSI.tail(5)


Unnamed: 0,Age,Duration,Purpose,Risk,Credit amount,Recovered_Principle,Recoveries,default,Sex_female,Sex_male,...,Duration:7-14,Duration:14-21,Duration:21-36,Duration:>= 36,Loan:<3500,Loan:3.5k-8k,Loan:8k-12k,Loan:12k-20k,Loan:20-26k,Loan:>= 26k
995,50,12,car,good,20000,20000.0,0.0,0,0,1,...,1,0,0,0,0,0,0,0,1,0
996,31,12,furniture/equipment,good,11350,11350.0,0.0,0,1,0,...,1,0,0,0,0,0,1,1,1,0
997,40,30,car,good,3200,3200.0,0.0,0,0,1,...,0,0,1,0,1,0,1,0,0,0
998,38,12,radio/TV,good,15475,15475.0,0.0,0,0,1,...,1,0,0,0,0,0,0,1,1,0
999,27,45,car,good,10000,10000.0,0.0,0,0,1,...,0,0,0,1,0,0,1,1,1,0


## 5. Loading the Scorecard Dataset

We saved this file in the "PD & Scorecard Model" notebook. We load it here to score both the old and the new dataset, and then compare the two score distributions to see whether the populations differ significantly.
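
The scoring step itself is a row-wise sum of products between the 0/1 dummy columns and the scorecard points. Here is a minimal sketch under stated assumptions: the mini scorecard and the two applicants are made up (the point values are not the notebook's), and only the column names `Feature name` and `Score - Final` follow the scorecard file.

```python
import numpy as np
import pandas as pd

# Hypothetical mini scorecard (points are made up, not the notebook's values)
toy_scorecard = pd.DataFrame({
    "Feature name": ["Intercept", "Sex_male", "Housing_own"],
    "Score - Final": [600, -15, -56],
})

# Two toy applicants as 0/1 dummies, columns in any order
applicants = pd.DataFrame({
    "Housing_own": [1, 0],
    "Sex_male": [1, 1],
})
applicants.insert(0, "Intercept", 1)  # every applicant gets the intercept points

# Align columns to the scorecard order, then take the row-wise sum of products
X = applicants[toy_scorecard["Feature name"].values]
scores = X.dot(toy_scorecard["Score - Final"].values)
print(scores.tolist())  # [529, 585]
```

Selecting the columns by the scorecard's feature names before calling `dot` is what keeps each dummy paired with its own points.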

In [17]:
scorecard = pd.read_excel('scorecard values.xlsx', sheet_name='Sheet1')
scorecard.head(3)

Unnamed: 0.1,Unnamed: 0,index,Feature name,Coefficients,p_values,Original feature name,Score - Calculation,Score - Preliminary,Difference,Score - Final
0,0,0,Intercept,-1.14,,Intercept,595.47,595,-0.47,595
1,1,1,Sex_male,-0.18,0.35,Sex,-14.96,-15,-0.04,-15
2,2,2,Housing_own,-0.68,0.01,Housing,-56.39,-56,0.39,-56


In [18]:
# The only column still missing is the intercept. We add it with a value of 1 for every row
df_PSI.insert(0, 'Intercept', 1)

In [19]:
scorecard = scorecard.drop(['Unnamed: 0', 'index'], axis=1)
scorecard.head(3)

Unnamed: 0,Feature name,Coefficients,p_values,Original feature name,Score - Calculation,Score - Preliminary,Difference,Score - Final
0,Intercept,-1.14,,Intercept,595.47,595,-0.47,595
1,Sex_male,-0.18,0.35,Sex,-14.96,-15,-0.04,-15
2,Housing_own,-0.68,0.01,Housing,-56.39,-56,0.39,-56


In [20]:
print(df_PSI.shape)
df_PSI_final = df_PSI[scorecard['Feature name'].values].copy()  # copy, so new columns can be added safely
df_PSI_final.shape

(99, 41)


(99, 29)

In [21]:
# Verify that the columns are in the same order as the scorecard features
display(df_PSI_final.columns)
display(scorecard['Feature name'].values)

Index(['Intercept', 'Sex_male', 'Housing_own', 'Housing_rent',
       'Saving accounts_moderate', 'Saving accounts_quite rich',
       'Saving accounts_rich', 'Checking account_moderate',
       'Checking account_rich', 'Age:25-35', 'Age:35-45', 'Age:45-50',
       'Age:>= 50', 'Duration:7-14', 'Duration:14-21', 'Duration:21-36',
       'Duration:>= 36', 'Loan:3.5k-8k', 'Loan:8k-12k', 'Loan:12k-20k',
       'Loan:20-26k', 'Loan:>= 26k', 'Sex_female', 'Housing_free',
       'Saving accounts_little', 'Checking account_little', 'Age:<25',
       'Duration:<7', 'Loan:<3500'],
      dtype='object')

array(['Intercept', 'Sex_male', 'Housing_own', 'Housing_rent',
       'Saving accounts_moderate', 'Saving accounts_quite rich',
       'Saving accounts_rich', 'Checking account_moderate',
       'Checking account_rich', 'Age:25-35', 'Age:35-45', 'Age:45-50',
       'Age:>= 50', 'Duration:7-14', 'Duration:14-21', 'Duration:21-36',
       'Duration:>= 36', 'Loan:3.5k-8k', 'Loan:8k-12k', 'Loan:12k-20k',
       'Loan:20-26k', 'Loan:>= 26k', 'Sex_female', 'Housing_free',
       'Saving accounts_little', 'Checking account_little', 'Age:<25',
       'Duration:<7', 'Loan:<3500'], dtype=object)

In [22]:
scorecard_scores = scorecard['Score - Final']
scorecard_scores = scorecard_scores.values.reshape(scorecard.shape[0], 1)

In [23]:

PSI_scores = df_PSI_final.dot(scorecard_scores)
# Here we multiply the values of each row of the dataframe by the values of each column of the variable,
# which is an argument of the 'dot' method, and sum them. It's essentially the sum of the products.
PSI_scores.head()


Unnamed: 0,0
901,715
902,665
903,757
904,536
905,613


In [24]:
#Adding the score column
df_PSI_final['Score'] = PSI_scores


In [25]:
test = pd.read_csv("Test Set Score.csv")

test.rename(columns = {'Final_score': 'Score'}, inplace=True)

test.head(2)

Unnamed: 0.1,Unnamed: 0,Sex_female,Sex_male,Job_0,Job_1,Job_2,Job_3,Housing_free,Housing_own,Housing_rent,...,Loan:<3500,Loan:3.5k-8k,Loan:8k-12k,Loan:12k-20k,Loan:20-26k,Loan:>= 26k,Predicted_Class,Actual,Predicted Prob,Score
0,60,0,1,0,0,1,0,0,1,0,...,0,0,1,1,1,0,0,0,0.39,652.0
1,338,0,1,0,0,1,0,0,1,0,...,0,0,1,1,1,0,0,0,0.32,625.0


## 6. Making Bins of Scores

In [26]:
test['Score:300-350'] = np.where((test['Score'] >= 300) & (test['Score'] < 350), 1, 0)
test['Score:350-400'] = np.where((test['Score'] >= 350) & (test['Score'] < 400), 1, 0)
test['Score:400-450'] = np.where((test['Score'] >= 400) & (test['Score'] < 450), 1, 0)
test['Score:450-500'] = np.where((test['Score'] >= 450) & (test['Score'] < 500), 1, 0)
test['Score:500-550'] = np.where((test['Score'] >= 500) & (test['Score'] < 550), 1, 0)
test['Score:550-600'] = np.where((test['Score'] >= 550) & (test['Score'] < 600), 1, 0)
test['Score:600-650'] = np.where((test['Score'] >= 600) & (test['Score'] < 650), 1, 0)
test['Score:650-700'] = np.where((test['Score'] >= 650) & (test['Score'] < 700), 1, 0)
test['Score:700-750'] = np.where((test['Score'] >= 700) & (test['Score'] < 750), 1, 0)
test['Score:750-800'] = np.where((test['Score'] >= 750) & (test['Score'] < 800), 1, 0)
test['Score:800-850'] = np.where((test['Score'] >= 800) & (test['Score'] <= 850), 1, 0)

In [27]:
df_PSI_final['Score:300-350'] = np.where((df_PSI_final['Score'] >= 300) & (df_PSI_final['Score'] < 350), 1, 0)
df_PSI_final['Score:350-400'] = np.where((df_PSI_final['Score'] >= 350) & (df_PSI_final['Score'] < 400), 1, 0)
df_PSI_final['Score:400-450'] = np.where((df_PSI_final['Score'] >= 400) & (df_PSI_final['Score'] < 450), 1, 0)
df_PSI_final['Score:450-500'] = np.where((df_PSI_final['Score'] >= 450) & (df_PSI_final['Score'] < 500), 1, 0)
df_PSI_final['Score:500-550'] = np.where((df_PSI_final['Score'] >= 500) & (df_PSI_final['Score'] < 550), 1, 0)
df_PSI_final['Score:550-600'] = np.where((df_PSI_final['Score'] >= 550) & (df_PSI_final['Score'] < 600), 1, 0)
df_PSI_final['Score:600-650'] = np.where((df_PSI_final['Score'] >= 600) & (df_PSI_final['Score'] < 650), 1, 0)
df_PSI_final['Score:650-700'] = np.where((df_PSI_final['Score'] >= 650) & (df_PSI_final['Score'] < 700), 1, 0)
df_PSI_final['Score:700-750'] = np.where((df_PSI_final['Score'] >= 700) & (df_PSI_final['Score'] < 750), 1, 0)
df_PSI_final['Score:750-800'] = np.where((df_PSI_final['Score'] >= 750) & (df_PSI_final['Score'] < 800), 1, 0)
df_PSI_final['Score:800-850'] = np.where((df_PSI_final['Score'] >= 800) & (df_PSI_final['Score'] <= 850), 1, 0)
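
Since the same eleven bands are applied to both datasets, the two cells above could be folded into one small helper. This is only a sketch: `add_score_bins` is a hypothetical function, not part of the original notebook, and it is shown on made-up scores.

```python
import numpy as np
import pandas as pd

def add_score_bins(frame, score_col="Score", low=300, high=850, step=50):
    """Return a copy of frame with one 0/1 column per score band.

    Hypothetical helper: bands are [a, a+step), except the last one,
    which also includes the upper limit `high`.
    """
    out = frame.copy()
    for start in range(low, high, step):
        end = start + step
        in_band = out[score_col] >= start
        if end >= high:
            in_band &= out[score_col] <= high   # last band is closed on the right
        else:
            in_band &= out[score_col] < end
        out[f"Score:{start}-{end}"] = np.where(in_band, 1, 0)
    return out

# Toy scores (made up) to show the helper on the edges of the range
toy = add_score_bins(pd.DataFrame({"Score": [300, 649, 650, 850]}))
```

Under these assumptions, `test = add_score_bins(test)` and `df_PSI_final = add_score_bins(df_PSI_final)` would produce the same column names as the cells above.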


## 7. Population Stability Index

#### Calculation and Interpretation

The main goal of PSI is to estimate whether the new data differs too much from the original data on which the scorecard was built. A PSI of 0 means the two distributions are identical. The larger the PSI, the more the new data differs from the original data, and depending on the actual value we might have to take some action (for example, retrain the statistical model). PSI has no fixed upper bound; a value of 1 or more indicates that the two populations are very different.

    PSI = 0            No difference
    PSI < 0.1          Little to no difference
    0.1 <= PSI < 0.25  Little difference (no action is taken)
    PSI >= 0.25        Big difference (action needs to be taken)
    PSI >= 1           Very large difference
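
For reference, the quantity computed in the cells that follow can be written as below. The symbols are introduced here for illustration: $E_i$ is the proportion of the old ("expected") data in category $i$, $A_i$ is the proportion of the new ("actual") data in the same category, and $k$ is the number of categories of the variable.

$$
\text{PSI} = \sum_{i=1}^{k} \left(A_i - E_i\right)\,\ln\frac{A_i}{E_i}
$$

Each term is non-negative, because $A_i - E_i$ and $\ln(A_i / E_i)$ always have the same sign. As a made-up example, a category holding 20% of the old data and 30% of the new data contributes $(0.30 - 0.20)\,\ln(0.30 / 0.20) \approx 0.041$.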


In [28]:
PSI_calc_test = test.sum() / test.shape[0]
# We compute the proportion of observations for each dummy variable in the old ("expected") data

PSI_calc_latest = df_PSI_final.sum() / df_PSI_final.shape[0]
# We compute the proportion of observations for each dummy variable in the new ("actual") data

PSI_calc = pd.concat([PSI_calc_test, PSI_calc_latest], axis = 1)
# We concatenate the two series along the columns.

In [29]:
PSI_calc = PSI_calc.reset_index()
# We reset the index of the dataframe. The index becomes from 0 to the total number of rows less one.
# The old index, which is the dummy variable name, becomes a column, named 'index'.
PSI_calc['Original feature name'] = PSI_calc['index'].str.split('[:_]').str[0]
# We create a new column, called 'Original feature name', which contains the part of the dummy variable name
# before the first colon or underscore.
PSI_calc.columns = ['index', 'Proportions_Test', 'Proportions_New', 'Original feature name']
# We change the names of the columns of the dataframe.

PSI_calc = PSI_calc[['index', 'Original feature name', 'Proportions_Test', 'Proportions_New']]
# We reorder the columns.

PSI_calc

Unnamed: 0,index,Original feature name,Proportions_Test,Proportions_New
0,Unnamed: 0,Unnamed,470.95,
1,Sex_female,Sex,0.32,0.28
2,Sex_male,Sex,0.68,0.72
3,Job_0,Job,0.02,
4,Job_1,Job,0.17,
5,Job_2,Job,0.66,
6,Job_3,Job,0.15,
7,Housing_free,Housing,0.13,0.07
8,Housing_own,Housing,0.69,0.73
9,Housing_rent,Housing,0.19,0.2


In [30]:
PSI_calc = PSI_calc[(PSI_calc['index'] != 'Intercept') & (PSI_calc['index'] != 'Score')]
# We remove the rows with values in the 'index' column 'Intercept' and 'Score'.

In [31]:
PSI_calc['Contribution'] = np.where((PSI_calc['Proportions_Test'] == 0) | (PSI_calc['Proportions_New'] == 0), 0, (PSI_calc['Proportions_New'] - PSI_calc['Proportions_Test']) * np.log(PSI_calc['Proportions_New'] / PSI_calc['Proportions_Test']))
# We calculate the contribution of each dummy variable to the PSI of each original variable it comes from.
# If either the proportion of old data or the proportion of new data are 0, the contribution is 0.
# Otherwise, we apply the PSI formula for each contribution.

In [32]:
PSI_calc

Unnamed: 0,index,Original feature name,Proportions_Test,Proportions_New,Contribution
0,Unnamed: 0,Unnamed,470.95,,
1,Sex_female,Sex,0.32,0.28,0.0
2,Sex_male,Sex,0.68,0.72,0.0
3,Job_0,Job,0.02,,
4,Job_1,Job,0.17,,
5,Job_2,Job,0.66,,
6,Job_3,Job,0.15,,
7,Housing_free,Housing,0.13,0.07,0.03
8,Housing_own,Housing,0.69,0.73,0.0
9,Housing_rent,Housing,0.19,0.2,0.0


In [33]:
PSI_calc.groupby('Original feature name')['Contribution'].sum()
# Finally, we sum all contributions for each original independent variable and the 'Score' variable.

Original feature name
Actual             0.00
Age                0.07
Checking account   0.00
Duration           0.06
Housing            0.04
Job                0.00
Loan               0.18
Predicted          0.00
Predicted Prob     0.00
Purpose            0.00
Saving accounts    0.05
Score              0.06
Sex                0.01
Unnamed            0.00
Name: Contribution, dtype: float64



We want to calculate PSI not only for the original independent variables of the PD model but also for its outcome, the credit score. We want to see whether the distribution of the score itself has changed.
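
The PSI of the score can also be computed directly from the two score vectors, without building dummy columns first. The following is a minimal sketch, not the notebook's method: `psi` is a hypothetical helper, the scores are made up, and bins where either share is zero contribute nothing, the same convention as the `Contribution` column above.

```python
import numpy as np

def psi(expected, actual, edges):
    """Population Stability Index between two samples over shared bin edges.

    Hypothetical helper: bins where either share is 0 contribute 0,
    the same convention as the Contribution column in this notebook.
    """
    e_counts, _ = np.histogram(expected, bins=edges)
    a_counts, _ = np.histogram(actual, bins=edges)
    e = e_counts / len(expected)
    a = a_counts / len(actual)
    ok = (e > 0) & (a > 0)
    return float(np.sum((a[ok] - e[ok]) * np.log(a[ok] / e[ok])))

edges = np.arange(300, 851, 50)          # 300, 350, ..., 850

# Made-up scores, only to exercise the function
old_scores = np.array([520, 560, 610, 640, 660, 690, 710, 730])
new_scores = np.array([510, 530, 570, 620, 630, 680, 720, 760])

print(round(psi(old_scores, old_scores, edges), 4))  # 0.0 for identical samples
print(psi(old_scores, new_scores, edges) > 0)        # True
```

`np.histogram` treats every bin as left-closed except the last, which also includes its right edge, so the bands line up with the `Score:300-350` to `Score:800-850` columns.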

We had already saved the scorecard file with the dummy variables, their coefficients and the intercept. We just need to take the corresponding dot products to calculate the scores for the training dataset as well as the new dataset.
