# Introduction

This notebook sets up cross-validation of a logistic regression model with both built-in and custom metrics.

This seemingly simple task turns out to be less simple than it looks. For `LogisticRegression` models, scorers built with `make_scorer()` and passed to `cross_validate()` receive the binary class predictions from `predict()` rather than predicted probabilities, which makes probability-based metrics such as AUC inaccurate.

There are likely several solutions to this problem. The one implemented in this notebook is to create a subclass of `LogisticRegression` whose `predict()` method returns the output of `predict_proba()`.

In [1]:
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate, KFold, cross_val_predict
from sklearn.metrics import make_scorer, roc_auc_score, accuracy_score, precision_score, recall_score, f1_score, fbeta_score
import numpy as np

In [2]:
# Load breast cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Create logistic regression model
model = LogisticRegression(max_iter = 5000)


# Set up k-fold cross-validation
kfold = KFold(n_splits=5, shuffle=True, random_state=123)

# Define scoring dictionary with multiple metrics
scoring = {
    'gini': make_scorer(lambda y_true, y_pred_prob: 
                        2*roc_auc_score(y_true, y_pred_prob)-1),
    'auc': make_scorer(lambda y_true, y_pred_prob: 
                        roc_auc_score(y_true, y_pred_prob))
}
# Perform cross-validation with multiple metrics
scores = cross_validate(model, X, y, cv=kfold, scoring=scoring)

predicts = cross_val_predict(model, X, y, cv=kfold, method = 'predict_proba')

# Print the scores for each metric
print("Gini scores:", scores['test_gini'])
print("AUC scores:", scores['test_auc'])
print("Mean Gini:       ", np.mean(scores['test_gini']))
print("Mean AUC scores: ", np.mean(scores['test_auc']))
print("Predicts:        ", predicts[:2])

Gini scores: [0.95121951 0.94887266 0.89294489 0.93452381 0.81503268]
AUC scores: [0.97560976 0.97443633 0.94647245 0.9672619  0.90751634]
Mean Gini:        0.9085187104778096
Mean AUC scores:  0.9542593552389048
Predicts:         [[1.00000000e+00 2.24267433e-13]
 [9.99997384e-01 2.61599435e-06]]


In many real-world business cases, we might want to use a custom metric to identify the optimal model for our own use case. This is easy to implement with sklearn.

Below is a (presumably) simple example: a function that calculates the Fbeta score is passed to `make_scorer()` so that it can be evaluated during cross-validation. (For more on business cases, custom functions and Fbeta, see `metrics-fbeta.ipynb`, which accompanies this notebook.)
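
As a quick refresher on what Fbeta measures, the sketch below computes it by hand from precision and recall. The helper name `fbeta_from_counts` and the toy labels are made up for illustration. With beta below 1 (such as the 0.8 used in this notebook), precision is weighted more heavily than recall.

```python
def fbeta_from_counts(y_true, y_pred, beta):
    """Hypothetical helper: Fbeta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Toy labels (illustrative only): 3 true positives, 1 false positive, 2 false negatives
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]

# Precision is 0.75 and recall is 0.6 here, so a precision-leaning beta scores higher
precision_heavy = fbeta_from_counts(y_true, y_pred, beta=0.8)
recall_heavy = fbeta_from_counts(y_true, y_pred, beta=2.0)
print("Fbeta (beta=0.8):", precision_heavy)
print("Fbeta (beta=2.0):", recall_heavy)
```

The helper does not guard against division by zero, so it is only meant for toy inputs like these.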

In [3]:
def f_beta_prob(y_true, y_pred, threshold, beta):
    """
    Calculate the Fbeta score after binarizing y_pred at `threshold`:
    cases with a predicted probability of at least `threshold`
    are judged as positive by the model.
    """
    y_pred_binary = (y_pred >= threshold).astype(int)
    return fbeta_score(y_true, y_pred_binary, beta = beta)

# Update the scoring dictionary with the new custom metrics
scoring.update([ 
    ('f_beta', make_scorer(f_beta_prob, threshold=0.7, beta=0.8))
])

# Perform cross-validation with multiple metrics
scores = cross_validate(model, X, y, cv=kfold, scoring=scoring)

# Print the scores for each metric
print("Gini scores:", scores['test_gini'])
print("AUC scores:", scores['test_auc'])
print('F_beta scores', scores['test_f_beta'])
print("Mean Gini:         ", np.mean(scores['test_gini']))
print("Mean AUC score:    ", np.mean(scores['test_auc']))
print("Mean F_beta score: ", np.mean(scores['test_f_beta']))

Gini scores: [0.95121951 0.94887266 0.89294489 0.93452381 0.81503268]
AUC scores: [0.97560976 0.97443633 0.94647245 0.9672619  0.90751634]
F_beta scores [0.98356885 0.97905952 0.96637335 0.97484493 0.92893924]
Mean Gini:          0.9085187104778096
Mean AUC score:     0.9542593552389048
Mean F_beta score:  0.9665571768233491


The above code seems straightforward and reasonable, but the result is actually not accurate. As the check further below confirms, the `y_pred` fed into the function is a binary judgement (based on the default threshold of 0.5) rather than a predicted probability, so applying the 0.7 threshold to it has no effect.

A related question is: "Is `roc_auc_score()` receiving a predicted probability, as it should?" If not, the cross-validated AUC and Gini scores printed above are misleading as well. The following cell can confirm our suspicion.

In [None]:
# Run the following cell to print the y_pred_prob values, which turn out to be binary only
def auc_with_print(y_true, y_prob):
    # Print what the scorer actually receives, then return the score itself
    # (the scorer must return a single number, not a tuple)
    print("y_pred_prob values:", y_prob)
    return roc_auc_score(y_true, y_prob)

scoring = {
    'gini': make_scorer(lambda y_true, y_prob: 2*auc_with_print(y_true, y_prob)-1),
    'auc': make_scorer(auc_with_print)
}

scores = cross_validate(model, X, y, cv=kfold, scoring=scoring)
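
The effect on AUC can also be seen on a toy example. In the sketch below (the four labels and probabilities are invented for illustration), `roc_auc_score()` is computed once on probabilities and once on the same probabilities thresholded at 0.5. Hard labels discard the ranking information that AUC depends on, so the two values can differ.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy data (illustrative only, not taken from the breast cancer dataset)
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.2, 0.6, 0.7, 0.9])

# Roughly what predict() would return with the default 0.5 threshold
y_label = (y_prob >= 0.5).astype(int)

auc_from_prob = roc_auc_score(y_true, y_prob)
auc_from_label = roc_auc_score(y_true, y_label)
print("AUC from probabilities:", auc_from_prob)
print("AUC from hard labels:  ", auc_from_label)
```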

So now we have a problem to solve: we need to feed the function wrapped by `make_scorer()` with predicted probabilities to get correct cross-validated metrics.
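
As a reference point for what the corrected numbers should look like, the AUC can be computed by hand, one fold at a time, by calling `predict_proba()` directly. This is only a sanity-check sketch that repeats the setup used earlier in the notebook. The names `fold_model` and `manual_auc` are illustrative, and it is not meant to replace `cross_validate()`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold

X, y = load_breast_cancer(return_X_y=True)
kfold = KFold(n_splits=5, shuffle=True, random_state=123)

manual_auc = []
for train_idx, test_idx in kfold.split(X):
    # Fit a fresh model on the training part of the fold
    fold_model = LogisticRegression(max_iter=5000)
    fold_model.fit(X[train_idx], y[train_idx])
    # Column 1 holds the probability of the positive class (label 1)
    y_prob = fold_model.predict_proba(X[test_idx])[:, 1]
    manual_auc.append(roc_auc_score(y[test_idx], y_prob))

print("Manual AUC per fold:", np.round(manual_auc, 4))
print("Manual mean AUC:    ", np.mean(manual_auc))
```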

There are likely several ways to do this. One that comes to mind is to create a new class
* that is a subclass of `LogisticRegression` and therefore inherits its abilities
* whose `predict()` method is overridden to return the predicted probabilities (rather than binary predictions)

The code below implements this simple idea.

In [5]:
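class proba_logreg(LogisticRegression):
    # LogisticRegression whose predict() returns class probabilities
    def __init__(self):
        super().__init__(max_iter=5000)
    def predict(self, X):
        # Return the (n_samples, 2) probability matrix instead of class labels
        return LogisticRegression.predict_proba(self, X)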
    
model_proba = proba_logreg()

def f_beta_prob(y_true, y_prob, threshold, beta):
    """
    Calculate the Fbeta score after binarizing the positive-class
    column of y_prob at `threshold`: cases with a predicted probability
    of at least `threshold` are judged as positive by the model.
    """
    y_pred = (y_prob[:,1] >= threshold).astype(int)
    return fbeta_score(y_true, y_pred, beta = beta)

# Rebuild the scoring dictionary so each metric uses the positive-class probability column
scoring = {
    'gini': make_scorer(lambda y_true, y_prob: 
                            2*roc_auc_score(y_true, y_prob[:,1])-1
                        ),
    'auc': make_scorer(lambda y_true, y_prob:
                            roc_auc_score(y_true, y_prob[:,1])
                        ),
    'f_beta': make_scorer(lambda y_true, y_prob:
                            f_beta_prob(y_true, y_prob, threshold=0.7, beta=0.8)
                        )
}

scores_proba = cross_validate(model_proba, X, y, cv=kfold, scoring=scoring)

print("Gini scores:", scores_proba['test_gini'])
print("AUC scores:", scores_proba['test_auc'])
print("f-beta:", scores_proba['test_f_beta'])
print("Mean Gini (proba):           ", np.mean(scores_proba['test_gini']))
print("Mean f-beta (proba):         ", np.mean(scores_proba['test_f_beta']))
print("Mean AUC scores (proba):     ", np.mean(scores_proba['test_auc']))
print("Mean AUC scores (original):  ", np.mean(scores['test_auc']))


Gini scores: [0.98128968 0.99301366 0.96489996 0.98941799 0.97908497]
AUC scores: [0.99064484 0.99650683 0.98244998 0.99470899 0.98954248]
f-beta: [0.98356885 0.97905952 0.95011265 0.95719844 0.94738713]
Mean Gini (proba):            0.981541250535457
Mean f-beta (proba):          0.9634653170418692
Mean AUC scores (proba):      0.9907706252677286
Mean AUC scores (original):   0.9542593552389048
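
For comparison, here is a sketch of another possible route that avoids subclassing: the built-in `"roc_auc"` scorer name, plus `make_scorer()` configured to request probabilities. It rests on two assumptions. First, that newer scikit-learn releases accept a `response_method` argument while older ones use `needs_proba`, so the code checks which is available. Second, that for binary problems the scorer then passes only the positive-class probabilities as a 1-D array. The names `plain_model`, `alt_scoring` and `f_beta_at_threshold` are illustrative.

```python
import inspect
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer, roc_auc_score
from sklearn.model_selection import KFold, cross_validate

X, y = load_breast_cancer(return_X_y=True)
kfold = KFold(n_splits=5, shuffle=True, random_state=123)
plain_model = LogisticRegression(max_iter=5000)

# Assumption: pick whichever keyword this scikit-learn version supports
if "response_method" in inspect.signature(make_scorer).parameters:
    proba_kwargs = {"response_method": "predict_proba"}
else:
    proba_kwargs = {"needs_proba": True}

def f_beta_at_threshold(y_true, y_prob, threshold, beta):
    # Hypothetical helper: y_prob is expected to be the 1-D
    # positive-class probability handed over by the scorer
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    return fbeta_score(y_true, y_pred, beta=beta)

alt_scoring = {
    "auc": "roc_auc",  # built-in scorer name
    "gini": make_scorer(
        lambda y_true, y_prob: 2 * roc_auc_score(y_true, y_prob) - 1,
        **proba_kwargs,
    ),
    "f_beta": make_scorer(
        f_beta_at_threshold, threshold=0.7, beta=0.8, **proba_kwargs
    ),
}

alt_scores = cross_validate(plain_model, X, y, cv=kfold, scoring=alt_scoring)
print("Mean AUC (alt):    ", np.mean(alt_scores["test_auc"]))
print("Mean Gini (alt):   ", np.mean(alt_scores["test_gini"]))
print("Mean f-beta (alt): ", np.mean(alt_scores["test_f_beta"]))
```

If these assumptions hold, the AUC and Gini values should land close to those from the subclass approach above, without overriding `predict()`.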
