# Credit Risk Model (Home Loan) — Model Building (Step 5)

In [146]:
import numpy as np
import pandas as pd

In [147]:
train_df=pd.read_csv("/Users/abhijit/Desktop/Courses/Application Score Card - Home Loan new/Data/4. Feature Selection/train_woe_iv_corr_vif.csv")

In [148]:
train_df.columns

Index(['Unnamed: 0', 'flag_%90p_18MOB', 'Auto Loan_Enq_L30D_max_woe',
       'OFF_US_Auto Loan_Inquiry_Requested_Amount_max_woe',
       'OFF_US_Property Loan_DPD30p_L3M_sum_woe',
       'OFF_US_Personal Loan_Inquiry_Requested_Amount_max_woe',
       'Personal Loan_DPD30p_L3M_sum_woe',
       'OFF_US_Auto Loan_DPD30p_L3M_sum_woe', 'ON_US_DPD30p_L12M_sum_woe',
       'Loan_Amount_woe', 'Property Loan_DPD30p_L12M_sum_woe',
       'ON_US_DPD90p_L6M_max_woe', 'OFF_US_Credit Card_DPD30p_L3M_sum_woe',
       'Property Loan_DPD30p_L3M_max_woe', 'Auto Loan_DPD60p_L6M_sum_woe',
       'OFF_US_Credit Card_Inquiry_Requested_Amount_mean_woe',
       'Personal Loan_DPD60p_L6M_sum_woe', 'Property Loan_DPD60p_L3M_sum_woe',
       'Auto Loan_Inquiry_Requested_Amount_sum_woe',
       'OFF_US_Personal Loan_Enq_L30D_max_woe',
       'ON_US_Home Loan_DPD30p_L3M_max_woe', 'OFF_US_Enq_Loan_Count_woe',
       'Personal Loan_Enq_L30D_sum_woe', 'All_DPD30p_L3M_max_woe',
       'OFF_US_Credit Card_Enq_L30D_max_

In [149]:
train_df=train_df.drop(columns=['Unnamed: 0'])

In [150]:
train_df.shape

(70000, 70)

In [151]:
X_train=train_df.drop(columns=["flag_%90p_18MOB"]).copy()

In [152]:
X_train.shape

(70000, 69)

In [153]:
y_train=train_df["flag_%90p_18MOB"].copy()

In [154]:
y_train.shape

(70000,)

In [155]:
test_df=pd.read_csv("/Users/abhijit/Desktop/Courses/Application Score Card - Home Loan new/Data/4. Feature Selection/test_woe_final.csv")

In [156]:
test_df.shape

(30000, 718)

In [157]:
test_df=test_df[train_df.columns].copy()

In [158]:
test_df.shape

(30000, 70)

In [159]:
X_test=test_df.drop(columns=["flag_%90p_18MOB"]).copy()

In [160]:
X_test.shape

(30000, 69)

In [161]:
y_test=test_df["flag_%90p_18MOB"].copy()

In [162]:
y_test.shape

(30000,)

### Model Building 

### Logistic Regression

### Why Logistic Regression for a Scorecard

We use **Logistic Regression** because it is the standard model for binary risk outcomes (Good/Bad):
- It predicts a **probability of being bad**: `P(Bad=1)`.
- The log-odds are linear in the predictors: `log(p/(1-p)) = β0 + β1x1 + ... + βkxk`.
- It works well with **WOE variables**, which are designed to have a near-linear relationship with the log-odds.
- Coefficients are **interpretable** (the sign and size of each coefficient show its risk impact) and easy to convert into **scorecard points**.
- It is widely accepted for governance and validation.
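To make the linear log-odds relationship concrete, here is a minimal sketch in plain Python. The intercept, coefficients, and WOE values are hypothetical numbers chosen for illustration; they are not taken from the fitted model.

```python
import math

def log_odds(intercept, coefs, woe_values):
    """Linear predictor: b0 + b1*x1 + ... + bk*xk."""
    return intercept + sum(b * x for b, x in zip(coefs, woe_values))

def bad_probability(z):
    """Invert log(p / (1 - p)) = z to recover p = P(Bad=1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values for illustration only (not the fitted model)
intercept = -2.4
coefs = [0.57, 0.12]
woe_values = [0.30, -0.20]

z = log_odds(intercept, coefs, woe_values)
p = bad_probability(z)
print(f"log-odds = {z:.3f}, P(Bad=1) = {p:.3f}")
```

Each term `βj * xj` shifts the log-odds additively, which is what makes a later conversion to points per variable straightforward.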

---

### Regularization in Logistic Regression (L1 and L2)

**Regularization** adds a penalty to the logistic regression objective to discourage overly large coefficients.  
This helps prevent **overfitting** and improves **out-of-sample stability**, especially when the model has many or correlated predictors.

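- **L1 (Lasso) Regularization**
  - Adds penalty: `λ Σ |βj|`
  - Shrinks some coefficients **exactly to zero**
  - Acts as **automatic feature selection**
  - Useful for removing weak or redundant variables and reducing model complexity

- **L2 (Ridge) Regularization**
  - Adds penalty: `λ Σ βj²`
  - Shrinks coefficients toward zero but does not set them exactly to zero
  - Useful for stabilizing coefficients when predictors are correlated

**In short:**
- **L1 → select variables**
- **L2 → shrink and stabilize coefficients**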
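One way to see why L1 produces exact zeros while L2 does not is to compare the shrinkage step each penalty induces on a coefficient. The sketch below uses the soft-thresholding operator (the proximal step for an L1 penalty) and a proportional shrink (the proximal step for an L2 penalty). The coefficient values and the step size `t` are made up for illustration, and this is not a description of how `liblinear` works internally.

```python
import numpy as np

def soft_threshold(beta, t):
    """Proximal step for the L1 penalty t * |beta|: values within t of zero become zero."""
    return np.sign(beta) * np.maximum(np.abs(beta) - t, 0.0)

def ridge_shrink(beta, t):
    """Proximal step for the L2 penalty (t / 2) * beta**2: proportional shrink, never exactly zero."""
    return beta / (1.0 + t)

# Hypothetical coefficients and step size, for illustration only
beta = np.array([0.60, -0.05, 0.02, 0.25])
t = 0.1

l1_shrunk = soft_threshold(beta, t)
l2_shrunk = ridge_shrink(beta, t)

print("L1 non-zero count:", np.count_nonzero(l1_shrunk))
print("L2 non-zero count:", np.count_nonzero(l2_shrunk))
```

Under L1 the two small coefficients are removed entirely, while under L2 all four survive at a reduced size.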



#### L1 Regularization

In [163]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

In [164]:
# C is the inverse of the regularization strength: a smaller C means a stronger penalty
Cs = [0.1, 0.2, 0.3, 0.5, 1, 3, 10]

In [165]:
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

In [166]:
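l1_rows = []
for C in Cs:
    l1 = LogisticRegression(penalty="l1", solver="liblinear", C=C, max_iter=2000)
    # Mean 5-fold cross-validated AUC for this value of C
    auc = cross_val_score(l1, X_train, y_train, cv=cv, scoring="roc_auc").mean()
    # Refit on the full training set to count the variables L1 keeps (non-zero coefficients)
    l1.fit(X_train, y_train)
    n_selected = (np.abs(l1.coef_[0]) > 1e-6).sum()
    l1_rows.append([C, auc, n_selected])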

In [167]:
l1_table=pd.DataFrame(l1_rows,columns=["C","cv_auc","n_selected"])

In [168]:
l1_table.sort_values("cv_auc", ascending=False, inplace=True)

In [169]:
print("L1 tuning results (top):")
print(l1_table.head(10))

L1 tuning results (top):
      C    cv_auc  n_selected
0   0.1  0.668664          38
1   0.2  0.668398          51
2   0.3  0.668069          54
3   0.5  0.667537          58
4   1.0  0.667106          62
5   3.0  0.666690          68
6  10.0  0.666524          69


In [170]:
best_C_l1=l1_table.iloc[0]["C"]
best_C_l1

np.float64(0.1)

In [171]:
l1_best = LogisticRegression(penalty="l1", solver="liblinear", C=float(best_C_l1), max_iter=2000)
l1_best.fit(X_train,y_train)

In [172]:
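Variables=X_train.columns
Coefficient=l1_best.coef_[0]

parameters=pd.DataFrame({"Variables":Variables,"Coefficient":Coefficient})

# Keep only variables with a strictly positive coefficient. This drops the variables
# L1 set to zero and also any variable with a negative coefficient, which would explain
# why fewer variables remain here than the 38 non-zero ones counted during tuning.
print("Final Variables and their Coefficients:")
final_variables=parameters[parameters["Coefficient"]>1e-12].copy().reset_index()
final_variables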

Final Variables and their Coefficients:


| | index | Variables | Coefficient |
|---|---|---|---|
| 0 | 2 | OFF_US_Property Loan_DPD30p_L3M_sum_woe | 0.046852 |
| 1 | 3 | OFF_US_Personal Loan_Inquiry_Requested_Amount_... | 0.012495 |
| 2 | 5 | OFF_US_Auto Loan_DPD30p_L3M_sum_woe | 0.043559 |
| 3 | 7 | Loan_Amount_woe | 0.568403 |
| 4 | 8 | Property Loan_DPD30p_L12M_sum_woe | 0.085311 |
| 5 | 12 | Auto Loan_DPD60p_L6M_sum_woe | 0.086461 |
| 6 | 13 | OFF_US_Credit Card_Inquiry_Requested_Amount_me... | 0.122078 |
| 7 | 14 | Personal Loan_DPD60p_L6M_sum_woe | 0.001258 |
| 8 | 20 | Personal Loan_Enq_L30D_sum_woe | 0.086965 |
| 9 | 21 | All_DPD30p_L3M_max_woe | 0.085235 |


In [71]:
final_variables_list=final_variables["Variables"].to_list()
len(final_variables_list)

32

In [72]:
X_train_final=X_train[final_variables_list].copy()
X_test_final=X_test[final_variables_list].copy()

#### Final Model

In [73]:
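# The default penalty is L2 (C=1.0)
LR=LogisticRegression(solver="liblinear",max_iter=2000)
# Cross-validate on the selected variables only, the same features the final model is fit on
cv_auc=cross_val_score(LR,X_train_final,y_train,cv=cv,scoring="roc_auc").mean()
LR.fit(X_train_final,y_train)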


### Model Performance


### AUC (Area Under the ROC Curve)

**What it measures:**  
AUC tells you how well the model **ranks** borrowers by risk: how often a randomly chosen “Bad” receives a higher predicted risk than a randomly chosen “Good”.

**Intuition (best way to remember):**  
> **AUC = Probability( score_bad > score_good )**

So if AUC = 0.78, it means:
- In **78%** of random (Bad, Good) pairs, the model assigns a higher predicted risk to the Bad.
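The pairwise interpretation can be checked directly by counting (Bad, Good) pairs. The sketch below is a plain-Python illustration on invented toy data; the function name `pairwise_auc` is hypothetical, and ties are counted as half a win.

```python
from itertools import product

def pairwise_auc(y_true, scores):
    """Share of (Bad, Good) pairs where the Bad has the higher score; ties count half."""
    bad = [s for y, s in zip(y_true, scores) if y == 1]
    good = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for b, g in product(bad, good):
        if b > g:
            wins += 1.0
        elif b == g:
            wins += 0.5
    return wins / (len(bad) * len(good))

# Toy labels and scores, invented for illustration
y_toy = [1, 0, 0, 1, 0]
scores_toy = [0.9, 0.1, 0.5, 0.4, 0.3]

auc_toy = pairwise_auc(y_toy, scores_toy)
print(f"Pairwise AUC: {auc_toy:.4f}")
```

The loop is quadratic in the sample size, so it is only meant to show the definition; on the same inputs `roc_auc_score` should give the same value far more efficiently.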

**Why it’s useful in scorecards:**
- It is **threshold-independent** (no need to pick 0.5 or any cutoff).
- It measures **overall discrimination** (separation power).

**Typical interpretation:**
- 0.50 = random
- 0.60–0.70 = fair
- 0.70–0.80 = good
- 0.80–0.90 = very good
- greater than 0.90 = unusually high (check leakage)


In [74]:
from sklearn.metrics import roc_auc_score, roc_curve, confusion_matrix

In [75]:
y_prob_train=LR.predict_proba(X_train_final)[:,1]

In [76]:
y_prob=LR.predict_proba(X_test_final)[:,1]

In [77]:
auc=roc_auc_score(y_test,y_prob)

In [79]:
round(auc,2)

np.float64(0.68)

An AUC of 0.6781 falls in the "fair" band above: the scorecard separates risky borrowers from safer ones clearly better than random, but the separation is only moderate.

### Gini (in scorecards)

**Gini** measures how well a model separates **Good (0)** from **Bad (1)** (discriminatory power).  
It is derived from ROC–AUC:

`Gini = 2 * AUC - 1`


**Interpretation**
- **Gini = 0** (AUC = 0.50): no separation (random).
- **Higher Gini**: better rank-ordering (bads get higher scores than goods more often).

In [80]:
gini = 2 * auc - 1
round(gini,2)

np.float64(0.36)

- Gini = 0.36 means the model is moderately good at separating risky borrowers from safe ones: better than random, but not very strong.


### KS Statistic (Kolmogorov–Smirnov) in credit risk

**KS** measures how well a score separates **Good (0)** and **Bad (1)** by taking the **maximum distance** between their cumulative distributions.


**Interpretation**
- **KS = 0**: no separation (goods and bads overlap completely).
- **Higher KS**: better separation; there exists a cutoff score where the gap between bads and goods is largest.
- In practice, KS is often reported as a **percentage** (e.g., KS = 0.30 → 30%).
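The definition can be sketched directly from the two empirical cumulative distributions. This is a minimal NumPy illustration on invented toy data; the function name `ks_statistic` is hypothetical.

```python
import numpy as np

def ks_statistic(y_true, scores):
    """Maximum gap between the empirical score CDFs of Goods (0) and Bads (1)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    bad = np.sort(scores[y_true == 1])
    good = np.sort(scores[y_true == 0])
    # Evaluate both CDFs at every distinct observed score
    thresholds = np.unique(scores)
    cdf_bad = np.searchsorted(bad, thresholds, side="right") / bad.size
    cdf_good = np.searchsorted(good, thresholds, side="right") / good.size
    return float(np.max(np.abs(cdf_good - cdf_bad)))

# Toy labels and scores, invented for illustration
y_toy = [1, 0, 0, 1, 0]
scores_toy = [0.9, 0.1, 0.5, 0.4, 0.3]

ks_toy = ks_statistic(y_toy, scores_toy)
print(f"KS: {ks_toy:.4f}")
```

On the same inputs this should agree with the maximum of `tpr - fpr` taken over the ROC curve, since TPR and FPR are one minus the two cumulative distributions.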


In [81]:
fpr, tpr, thresholds = roc_curve(y_test,y_prob)
ks = max(tpr - fpr)
print(f"KS   : {ks:.2f}")

KS   : 0.28


- KS = 0.28 means the model has moderate separation between risky and safe customers (a maximum gap of about 28 percentage points between their cumulative score distributions), which is acceptable but not strong for a scorecard.

### Decile Ranking

In [82]:
train_df['pred_prob']= y_prob_train

In [83]:
test_df['pred_prob']=y_prob

In [87]:
cuts=train_df['pred_prob'].quantile([i/10 for i in range(11)]).values
cuts[0], cuts[-1] = -float("inf"), float("inf")

In [89]:
train_df['decile']=pd.cut(train_df['pred_prob'], bins=cuts, labels=range(1, 11)).astype(int)
test_df['decile']=pd.cut(test_df['pred_prob'], bins=cuts, labels=range(1, 11)).astype(int)

In [131]:
decile_train=train_df.groupby('decile')['flag_%90p_18MOB'].agg(['count','sum']).reset_index()

In [132]:
decile_train=decile_train.rename(columns={'count':'Total loans train','sum':'Total loans 90+ at 18 MOB train'})

In [133]:
decile_train['%90+ at 18MOB train']=round(decile_train['Total loans 90+ at 18 MOB train']/decile_train['Total loans train'],3)

In [134]:
decile_train

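| | decile | Total loans train | Total loans 90+ at 18 MOB train | %90+ at 18MOB train |
|---|---|---|---|---|
| 0 | 1 | 7009 | 276 | 0.039 |
| 1 | 2 | 6991 | 319 | 0.046 |
| 2 | 3 | 7000 | 361 | 0.052 |
| 3 | 4 | 7073 | 402 | 0.057 |
| 4 | 5 | 7046 | 452 | 0.064 |
| 5 | 6 | 6881 | 483 | 0.07 |
| 6 | 7 | 7000 | 555 | 0.079 |
| 7 | 8 | 7000 | 615 | 0.088 |
| 8 | 9 | 7000 | 869 | 0.124 |
| 9 | 10 | 7000 | 1699 | 0.243 |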


In [135]:
decile_test=test_df.groupby('decile')['flag_%90p_18MOB'].agg(['count','sum']).reset_index()

In [136]:
decile_test=decile_test.rename(columns={'count':'Total loans test','sum':'Total loans 90+ at 18 MOB test'})

In [137]:
decile_test['%90+ at 18MOB test']=round(decile_test['Total loans 90+ at 18 MOB test']/decile_test['Total loans test'],3)

In [138]:
decile_test

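| | decile | Total loans test | Total loans 90+ at 18 MOB test | %90+ at 18MOB test |
|---|---|---|---|---|
| 0 | 1 | 2926 | 110 | 0.038 |
| 1 | 2 | 2963 | 144 | 0.049 |
| 2 | 3 | 2994 | 155 | 0.052 |
| 3 | 4 | 3091 | 183 | 0.059 |
| 4 | 5 | 3124 | 186 | 0.06 |
| 5 | 6 | 2996 | 185 | 0.062 |
| 6 | 7 | 2947 | 229 | 0.078 |
| 7 | 8 | 3003 | 242 | 0.081 |
| 8 | 9 | 3019 | 360 | 0.119 |
| 9 | 10 | 2937 | 790 | 0.269 |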


In [139]:
decile_df=decile_train.merge(decile_test,on='decile',how='left')

In [140]:
decile_df

| | decile | Total loans train | Total loans 90+ at 18 MOB train | %90+ at 18MOB train | Total loans test | Total loans 90+ at 18 MOB test | %90+ at 18MOB test |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 7009 | 276 | 0.039 | 2926 | 110 | 0.038 |
| 1 | 2 | 6991 | 319 | 0.046 | 2963 | 144 | 0.049 |
| 2 | 3 | 7000 | 361 | 0.052 | 2994 | 155 | 0.052 |
| 3 | 4 | 7073 | 402 | 0.057 | 3091 | 183 | 0.059 |
| 4 | 5 | 7046 | 452 | 0.064 | 3124 | 186 | 0.06 |
| 5 | 6 | 6881 | 483 | 0.07 | 2996 | 185 | 0.062 |
| 6 | 7 | 7000 | 555 | 0.079 | 2947 | 229 | 0.078 |
| 7 | 8 | 7000 | 615 | 0.088 | 3003 | 242 | 0.081 |
| 8 | 9 | 7000 | 869 | 0.124 | 3019 | 360 | 0.119 |
| 9 | 10 | 7000 | 1699 | 0.243 | 2937 | 790 | 0.269 |


- The decile cut-offs were derived from training data and applied unchanged to test data, ensuring no data leakage. A clear and monotonic increase in 90+ delinquency across deciles in both samples indicates good risk ranking and model stability.
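The monotonicity claim can be checked mechanically rather than by eye. The sketch below copies the bad rates from the train and test decile tables above; the helper name `is_strictly_increasing` is hypothetical.

```python
# Bad rates by decile (1..10), copied from the train and test decile tables
train_rates = [0.039, 0.046, 0.052, 0.057, 0.064, 0.070, 0.079, 0.088, 0.124, 0.243]
test_rates = [0.038, 0.049, 0.052, 0.059, 0.060, 0.062, 0.078, 0.081, 0.119, 0.269]

def is_strictly_increasing(values):
    """True when every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

train_monotonic = is_strictly_increasing(train_rates)
test_monotonic = is_strictly_increasing(test_rates)

# Largest absolute train-vs-test difference in bad rate across deciles
max_gap = max(abs(a - b) for a, b in zip(train_rates, test_rates))

print("Train monotonic:", train_monotonic)
print("Test monotonic:", test_monotonic)
print(f"Largest train/test gap: {max_gap:.3f}")
```

The largest train-versus-test gap sits in decile 10, so that is the decile to watch when monitoring stability.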

In [142]:
decile_df.to_excel("/Users/abhijit/Desktop/Courses/Application Score Card - Home Loan new/Data/5. Model Development and Performance/decile_summary.xlsx", index=False)


With AUC = 0.68, Gini = 0.36, and KS = 0.28, however, the model shows only moderate separation, which suggests that the decision threshold should be calibrated before deployment.

### Conclusion

- The decile cut-offs were derived from the training data and then applied to the testing data, ensuring no data leakage. A clear and monotonic increase in 90+ delinquency at 18 MOB across deciles in both training and testing data demonstrates good risk ranking and stable behavior.

- Approving only loans in deciles 1 to 7 rejects about 30% of the loans in the testing data and reduces the 90+ delinquency rate at 18 MOB from 8.61% to 5.67%. In other words, rejecting 30% of the portfolio at the time of application removes about 54% of the bad loans.

- Because the data used for training and testing the model was synthesized, the observed AUC of 0.68, Gini of 0.36, and KS of 0.28 reflect only moderate separation. Stronger performance may be achievable when the model is trained and validated on real-world data, but that remains to be confirmed.
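The approval figures quoted in the conclusion can be reproduced from the test decile table with a few lines of arithmetic. The sketch below copies the test counts by decile from that table; the variable names are illustrative.

```python
# Test-set counts by decile (1..10), copied from the test decile table
loans = [2926, 2963, 2994, 3091, 3124, 2996, 2947, 3003, 3019, 2937]
bads = [110, 144, 155, 183, 186, 185, 229, 242, 360, 790]

approved = slice(0, 7)  # deciles 1 to 7 are approved, 8 to 10 rejected

overall_rate = sum(bads) / sum(loans)
approved_rate = sum(bads[approved]) / sum(loans[approved])
share_rejected = 1 - sum(loans[approved]) / sum(loans)
share_bads_removed = 1 - sum(bads[approved]) / sum(bads)

print(f"Overall bad rate:    {overall_rate:.2%}")
print(f"Approved bad rate:   {approved_rate:.2%}")
print(f"Share of loans cut:  {share_rejected:.1%}")
print(f"Share of bads cut:   {share_bads_removed:.1%}")
```

Changing the `approved` slice to a different cut-off decile gives the same trade-off for other approval strategies.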
