# Lazy Prediction of Loan Interest Rate Model

First, all the necessary libraries are imported. 

In [1]:
```python
import lazypredict
import pandas as pd
from lazypredict.Supervised import LazyRegressor
from helper_functions.ml_data_prep import (
    stratified_sample,
    X_y_spilt,
)
```

Regressors that are computationally expensive or fail to run on this data are removed from lazypredict's list of candidate models.

In [2]:
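```python
# Module-level list of (name, estimator_class) pairs that LazyRegressor
# uses by default (regressors="all").
regressors = lazypredict.Supervised.REGRESSORS
regressors_to_remove = {
    "QuantileRegressor",
    "GaussianProcessRegressor",
    "KernelRidge",
    "NuSVR",
    "SVR",
    "RandomForestRegressor",
    "ExtraTreesRegressor",
}
# Filter in place (slice assignment) so the shared list itself changes.
regressors[:] = [
    (name, estimator)
    for name, estimator in regressors
    if name not in regressors_to_remove
]
```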
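
The filtering code above edits the list in place instead of building a new one. Assuming, as it appears, that `LazyRegressor` with its default `regressors="all"` looks up `lazypredict.Supervised.REGRESSORS` when `fit` runs, only an in-place change reaches it. The minimal sketch below uses a hypothetical stand-in registry (`REGISTRY`, `run_all`), not lazypredict itself, to show the difference between rebinding a name and assigning to a slice.

```python
# Hypothetical stand-in for a library's module-level registry of (name, class) pairs.
REGISTRY = [("LinearRegression", object), ("SVR", object), ("NuSVR", object)]


def run_all():
    """Stand-in for a library function that reads the registry at call time."""
    return [name for name, _ in REGISTRY]


to_remove = {"SVR", "NuSVR"}

# Rebinding a new name to a filtered copy leaves REGISTRY untouched.
filtered_copy = [(n, c) for n, c in REGISTRY if n not in to_remove]
print(run_all())  # ['LinearRegression', 'SVR', 'NuSVR']

# Slice assignment mutates the shared list object, so run_all() sees the change.
alias = REGISTRY
alias[:] = [(n, c) for n, c in alias if n not in to_remove]
print(run_all())  # ['LinearRegression']
```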

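The data is loaded, the grade-related columns are dropped, and each set is split into features and the `int_rate` target. Training uses the balanced training set, while only a 25% sample of the validation set, stratified by `sub_grade`, is used for evaluation.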

In [3]:
```python
# Grade-related columns excluded from the features
drop_cols = ["sub_grade", "sub_grade_enc", "grade_enc"]
X_train, y_train = (
    pd.read_pickle("./data/data_train_balanced_mod2.pkl")
    .drop(columns=drop_cols)
    .pipe(X_y_spilt, target="int_rate")
)
# Sample 25% of the validation set per sub_grade, then drop the grade columns
X_val, y_val = (
    pd.read_pickle("./data/data_val_mod2.pkl")
    .pipe(stratified_sample, frac=0.25, col="sub_grade")
    .drop(columns=drop_cols)
    .pipe(X_y_spilt, target="int_rate")
)
print(f"Number of training instances {X_train.shape[0]}")
print(f"Number of validation instances {X_val.shape[0]}")
```

```
Number of training instances 53935
Number of validation instances 59660
```
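
The helpers `stratified_sample` and `X_y_spilt` come from the project's own `helper_functions.ml_data_prep` module, whose code is not shown here. A minimal sketch of what such helpers could look like is below; the names `stratified_sample` and `x_y_split`, the `random_state` default, and the toy data are all hypothetical, and the sampling relies on pandas' `DataFrameGroupBy.sample` (pandas 1.1 or later).

```python
import pandas as pd


def stratified_sample(df: pd.DataFrame, frac: float, col: str, random_state: int = 42) -> pd.DataFrame:
    """Hypothetical helper: draw `frac` of the rows from each level of `col`."""
    # DataFrameGroupBy.sample samples within each group independently.
    return df.groupby(col).sample(frac=frac, random_state=random_state)


def x_y_split(df: pd.DataFrame, target: str) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper: separate the feature matrix from the target column."""
    return df.drop(columns=[target]), df[target]


# Toy data: 8 rows in sub_grade A1 and 4 rows in B2.
toy = pd.DataFrame({
    "sub_grade": ["A1"] * 8 + ["B2"] * 4,
    "feature": range(12),
    "int_rate": [7.0] * 8 + [11.0] * 4,
})
sample = stratified_sample(toy, frac=0.25, col="sub_grade")
X, y = x_y_split(sample.drop(columns=["sub_grade"]), target="int_rate")
print(sample["sub_grade"].value_counts().to_dict())  # {'A1': 2, 'B2': 1}
```

Sampling within each `sub_grade` keeps the grade mix of the reduced validation set close to that of the full set, which a plain `df.sample(frac=0.25)` would only do on average.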


The 35 remaining regressors are each trained on the training set and evaluated on the validation sample.
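
Conceptually, `LazyRegressor` automates a loop like the simplified sketch below: each model is fitted with its default hyperparameters, scored on the validation set, timed, and the results are sorted. This is an illustration under assumptions, not lazypredict's implementation (as we understand it, its preprocessing also imputes missing values and encodes categorical columns). The function `lazy_fit`, the three example models, and the synthetic `make_regression` data are hypothetical choices for the sketch.

```python
import time

import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor


def lazy_fit(models, X_train, X_val, y_train, y_val):
    """Fit each model with default settings and score it on the validation set."""
    n, p = X_val.shape
    rows = []
    for name, estimator in models:
        start = time.time()
        # Scale the features, then fit the estimator with its defaults.
        pipe = make_pipeline(StandardScaler(), estimator)
        pipe.fit(X_train, y_train)
        y_pred = pipe.predict(X_val)
        r2 = r2_score(y_val, y_pred)
        rows.append({
            "Model": name,
            "Adjusted R-Squared": 1 - (1 - r2) * (n - 1) / (n - p - 1),
            "R-Squared": r2,
            "RMSE": mean_squared_error(y_val, y_pred) ** 0.5,
            "Time Taken": time.time() - start,
        })
    table = pd.DataFrame(rows).set_index("Model")
    return table.sort_values("Adjusted R-Squared", ascending=False)


X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=42)
models = [
    ("LinearRegression", LinearRegression()),
    ("Ridge", Ridge()),
    ("DecisionTreeRegressor", DecisionTreeRegressor(random_state=42)),
]
print(lazy_fit(models, X_tr, X_va, y_tr, y_va).round(2))
```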

In [4]:
```python
reg = LazyRegressor(random_state=42)
reg_models, predictions = reg.fit(X_train, X_val, y_train, y_val)
reg_models
```


```
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.003131 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2765
[LightGBM] [Info] Number of data points in the train set: 53935, number of used features: 38
[LightGBM] [Info] Start training from score 17.816268
```



| Model | Adjusted R-Squared | R-Squared | RMSE | Time Taken (s) |
|---|---|---|---|---|
| HistGradientBoostingRegressor | 0.95 | 0.95 | 1.17 | 1.39 |
| LGBMRegressor | 0.95 | 0.95 | 1.17 | 0.66 |
| XGBRegressor | 0.94 | 0.94 | 1.20 | 0.58 |
| GradientBoostingRegressor | 0.94 | 0.94 | 1.22 | 23.20 |
| BaggingRegressor | 0.94 | 0.94 | 1.24 | 16.15 |
| MLPRegressor | 0.93 | 0.93 | 1.30 | 42.88 |
| LassoCV | 0.92 | 0.92 | 1.44 | 1.27 |
| LassoLarsCV | 0.92 | 0.92 | 1.45 | 0.75 |
| RANSACRegressor | 0.92 | 0.92 | 1.46 | 0.70 |
| LinearRegression | 0.92 | 0.92 | 1.47 | 0.38 |
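
The Adjusted R-Squared and R-Squared columns are identical at two decimals. Under the usual definition $\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, the penalty for $p$ features is negligible when the number of validation rows $n$ is much larger than $p$. The sketch below assumes $n = 59660$ (the validation size printed earlier) and $p = 38$ (roughly the feature count reported in the LightGBM log); the function names are illustrative, and RMSE is expressed in the units of `int_rate`.

```python
import math


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def rmse(y_true, y_pred) -> float:
    """Root mean squared error, in the units of the target."""
    return math.sqrt(sum((t - q) ** 2 for t, q in zip(y_true, y_pred)) / len(y_true))


n, p = 59660, 38  # assumed validation rows and feature count
for r2 in (0.95, 0.92):
    # The factor (n - 1) / (n - p - 1) is only about 1.0006 here.
    print(f"R^2 = {r2:.2f} -> adjusted R^2 = {adjusted_r2(r2, n, p):.4f}")
```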


## Outcome

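HistGradientBoostingRegressor performs similarly to LGBMRegressor: both reach an R-Squared of 0.95 and an RMSE of 1.17 on the validation sample, but LGBMRegressor is about twice as fast (0.66 s vs. 1.39 s). LGBMRegressor is therefore selected for further tuning to predict loan interest rates.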
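
The selection rule applied here, lowest validation RMSE with training time as the tie-breaker, can be written down explicitly. The sketch below hard-codes the top rows of the results table; note that the RMSE values tie only after rounding to two decimals, and the tie-breaking rule is our reading of the outcome rather than something the notebook computes.

```python
# (model, RMSE, time taken in seconds) copied from the top of the results table.
results = [
    ("HistGradientBoostingRegressor", 1.17, 1.39),
    ("LGBMRegressor", 1.17, 0.66),
    ("XGBRegressor", 1.20, 0.58),
    ("GradientBoostingRegressor", 1.22, 23.20),
]

# Sort by RMSE first; among equal RMSE values, prefer the faster model.
ranked = sorted(results, key=lambda row: (row[1], row[2]))
best_name = ranked[0][0]
print(best_name)  # LGBMRegressor
```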