# 7.0.0 Building a Final Classifier Model

### Methodology

The modeling process culminates in the final XGBoost classifier. Its parameters come from the earlier optimization steps and were chosen to maximize AUC while coping with the dataset's class imbalance (for example, through `scale_pos_weight` and `max_delta_step`), with the aim of a model that is robust and generalizes to unseen data.

Final Model Parameters:

    - 'objective': 'binary:logistic'
    - 'booster': 'gbtree'
    - 'eval_metric': 'auc'
    - 'eta': 0.01
    - 'gamma': 0.1
    - 'max_depth': 6
    - 'min_child_weight': 3
    - 'subsample': 0.8
    - 'colsample_bytree': 0.8
    - 'scale_pos_weight': 4
    - 'lambda': 1
    - 'alpha': 0.1
    - 'max_delta_step': 1
    - 'n_estimators': 100
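
The `scale_pos_weight` parameter is the main lever for class imbalance in the list above. This section does not say how the value 4 was chosen; a common starting point, suggested in the XGBoost documentation, is the ratio of negative to positive training examples. The sketch below illustrates that heuristic only. The helper name `suggest_scale_pos_weight` and the toy labels are assumptions made for the example, not part of the project code or data.

```python
import numpy as np


def suggest_scale_pos_weight(y):
    """Return n_negative / n_positive for a binary 0/1 label array (hypothetical helper)."""
    y = np.asarray(y)
    n_pos = int((y == 1).sum())
    n_neg = int((y == 0).sum())
    if n_pos == 0:
        raise ValueError("y contains no positive examples")
    return n_neg / n_pos


# Toy labels invented for illustration: 8 negatives and 2 positives.
y_example = np.array([0] * 8 + [1] * 2)
ratio = suggest_scale_pos_weight(y_example)
print(ratio)  # 4.0
```

In the notebook, the same function could be applied to `Y_train` to check whether the configured value is close to the observed class ratio.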

Final Features: 

    - 'previous_internal_apps__account_to_application_days'
    - 'previous_internal_apps__n_bnpl_approved_apps'
    - 'previous_internal_apps__n_sf_apps'
    - 'previous_internal_apps__ratio_bnpl_approved'
    - 'credit_reports__balance_due_ratio_median'
    - 'credit_reports__balance_due_std_revolvente'
    - 'credit_reports__balance_due_worst_delay_ratio_median_pagos_fijos'
    - 'credit_reports__cdc_inquiry_id_count_por_determinar'
    - 'credit_reports__credit_limit_median_revolvente'
    - 'credit_reports__debt_ratio_median_pagos_fijos'
    - 'credit_reports__loans_with_at_least_one_delayed_ratio'
    - 'credit_reports__severity_delayed_payments_median'
    - 'credit_reports__severity_delayed_payments_median_pagos_fijos'
    - 'credit_reports__severity_delayed_payments_median_revolvente'


### Conclusion
The final model achieved modest predictive performance on the validation set:

- **ROC AUC Score:** 0.6111
- **Precision-Recall AUC:** 0.2832
- **Kolmogorov-Smirnov Statistic:** 0.1780

These results indicate that the model separates the positive and negative classes better than chance (a ROC AUC of 0.5), though only modestly, which reflects the difficulty of the imbalanced dataset. The ROC AUC summarizes how well the model ranks positives above negatives across all thresholds. The Precision-Recall AUC, which is more informative for imbalanced data, suggests room for improvement, especially in capturing the minority class.
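
The three metrics above come from `calculate_metrics` in `src.utils`, whose implementation is not shown in this notebook. As a rough sketch of how such a helper might work, the function below computes the same three keys with scikit-learn. It assumes that `pr_auc` is the average precision and that `ks` is the largest gap between the true positive rate and the false positive rate; the project's helper may define either differently. The name `calculate_metrics_sketch` and the toy inputs are hypothetical.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score, roc_curve


def calculate_metrics_sketch(y_true, y_score):
    """Hypothetical stand-in for src.utils.calculate_metrics (assumed definitions)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    # ROC points at every distinct threshold; KS is the largest TPR - FPR gap.
    fpr, tpr, _ = roc_curve(y_true, y_score, drop_intermediate=False)
    return {
        "roc_auc_score": float(roc_auc_score(y_true, y_score)),
        "pr_auc": float(average_precision_score(y_true, y_score)),
        "ks": float(np.max(tpr - fpr)),
    }


# Toy labels and scores invented for illustration.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_metrics = calculate_metrics_sketch(toy_labels, toy_scores)
print(toy_metrics)
```

On the toy data, one negative (score 0.4) outranks one positive (score 0.35), so three of the four positive-negative pairs are ordered correctly and the ROC AUC is 0.75.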

**Future Work:**
- Further tuning of hyperparameters might yield better performance, particularly in improving the Precision-Recall AUC.
- Incorporating additional features or exploring more advanced feature engineering techniques could also enhance the model's ability to capture complex patterns.
- Experimentation with different model architectures or ensemble methods could improve overall accuracy and stability.

These steps will help refine the model to better meet the business goals of reducing credit risk while ensuring fair and efficient loan approval processes.

In [1]:
import pandas as pd
import numpy as np
import yaml
from pathlib import Path
from xgboost import XGBClassifier

from src.utils import calculate_metrics, save_pickle

### 1. Loan Data


In [2]:
# Load the experiment configuration.
with open("config.yaml", "r") as f:
    config = yaml.safe_load(f)

# Model hyperparameters, feature list, and target column come from the config.
model_parameters = config["model_parameters"]["xgbm"]
numeric_features = config["filter_features"]["numerical"]
features = numeric_features
target = config["main"]["target"]

# Dataset paths are resolved relative to the parent of the working directory.
data_train_path = Path.cwd().parent / config["main"]["data_train_path"]
train_validation_path = Path.cwd().parent / config["main"]["data_validation_path"]

train_df = pd.read_pickle(data_train_path)
validation_df = pd.read_pickle(train_validation_path)

X_train, Y_train = train_df[features], train_df[target]
X_valid, Y_valid = validation_df[features], validation_df[target]

# Seed reused below as the model's random_state.
split_seed = config["main"]["random_seed"]

X_train.shape

(9479, 14)

### 2. Train Model

In [3]:
xgbm_model = XGBClassifier(missing=np.nan, **model_parameters, random_state=split_seed)

xgbm_model.fit(X_train, Y_train)
xgbm_preds = xgbm_model.predict_proba(X_valid)[:, 1]

model_results = calculate_metrics(Y_valid, xgbm_preds)
model_results

{'roc_auc_score': 0.6110941648308197,
 'pr_auc': 0.28320533857646,
 'ks': 0.17798678190137265}

### 3. Save Model Classifier Object 

In [6]:
# The first six characters of the validation file name give the dataset prefix.
dataset_date = train_validation_path.name[:6]
model_train_path = Path.cwd().parent / f"models/develop/{dataset_date}_xgbm_classifier.pickle"

save_pickle(xgbm_model, model_train_path)
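
The `save_pickle` helper comes from `src.utils` and is not shown here. A minimal sketch of what such a helper and its matching loader could look like, using only the standard library, is given below. The names `save_pickle_sketch` and `load_pickle_sketch` are hypothetical, and a plain dictionary stands in for the fitted classifier so that the example runs without XGBoost.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickle_sketch(obj, path):
    """Hypothetical stand-in for src.utils.save_pickle: write obj to path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the target folder if missing
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle_sketch(path):
    """Read back an object written by save_pickle_sketch."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Toy stand-in for the fitted classifier (invented for illustration).
toy_model = {"name": "xgbm_classifier", "n_features": 14}

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = Path(tmp_dir) / "models" / "develop" / "toy_classifier.pickle"
    save_pickle_sketch(toy_model, toy_path)
    restored = load_pickle_sketch(toy_path)

print(restored == toy_model)  # True
```

A pickled model is tied to the library versions that produced it, so it should be loaded with the same XGBoost version used for training. Only files from a trusted source should be unpickled.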