## Precision, Recall and Fbeta

To navigate the tricky trade-off between precision and recall ⚖️, we can optimize F1, the harmonic mean of the two. However, in many scenarios we need to prioritize one over the other.

For instance, in financial services, missing a fraudulent transaction can result in significant 💰 financial losses and damage a company's reputation. To minimize such occurrences, optimizing recall (minimizing false negatives) is crucial, even if it means tolerating more false alarms. 

This is where the F-beta score comes in handy 🎯. While still balancing precision and recall, it lets us tune the relative importance of one over the other. A beta value greater than one places more emphasis on recall, while a beta value less than one emphasizes precision. For a simple demo of how F-beta works, please see [here](https://github.com/MenaWANG/ML_toy_examples/blob/main/train%20models/metrics-fbeta.ipynb).
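
To make the effect of beta concrete, here is a minimal from-scratch sketch of the standard formula, F_beta = (1 + beta²) · P · R / (beta² · P + R). The helper name `f_beta` and the precision/recall values are made up for illustration; in practice `sklearn.metrics.fbeta_score` computes this from labels.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical classifier: high recall, modest precision
precision, recall = 0.5, 0.9
for beta in (0.5, 1, 2):
    print(f"beta={beta}: {f_beta(precision, recall, beta):.3f}")
# beta=0.5: 0.549
# beta=1: 0.643
# beta=2: 0.776
```

Because recall is the stronger of the two here, the score rises as beta grows and recall gets more weight.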

Now, let's dive into a practical demo on how to fine-tune a LightGBM classifier to optimize F-beta scores using `hyperopt`. 🚀🧐

In [87]:
import lightgbm as lgb
from sklearn.metrics import make_scorer, fbeta_score
from hyperopt import fmin, tpe, hp, Trials, STATUS_OK, space_eval
from hyperopt.early_stop import no_progress_loss
from sklearn.model_selection import StratifiedKFold, train_test_split
import numpy as np

# Load the breast cancer dataset and hold out 20% for the final evaluation
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score

data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, test_size = 0.2, random_state=321, stratify = y)

print(y.mean(), y_train.mean())

0.6274165202108963 0.6263736263736264


## Tune lightgbm classifier to optimize Fbeta (default threshold)

In [113]:
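beta = 2
# Define a scorer for F2 and a cross-validation method
def custom_f2_score(y_true, y_pred):
    # Use the module-level `beta`; re-assigning it here (`beta = beta`)
    # would make it local and raise UnboundLocalError
    return fbeta_score(y_true, y_pred, beta=beta)
# make_scorer scores hard class predictions by default
f2_scorer = make_scorer(custom_f2_score)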
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

# Define the objective function for hyperparameter tuning
def objective(params):
    params['num_leaves'] = int(params['num_leaves'])
    params['n_estimators'] = int(params['n_estimators'])
    clf = lgb.LGBMClassifier(**params, verbose=-1)
    score = cross_val_score(clf, X_train, y_train, cv=cv, scoring=f2_scorer).mean()
    return {'loss': -score, 'status': STATUS_OK}

# Define the hyperparameter search space for the LightGBM classifier
space = {
    'num_leaves': hp.quniform('num_leaves', 10, 50, 1),
    'learning_rate': hp.loguniform('learning_rate', -5, 0),
    'n_estimators': hp.quniform('n_estimators', 50, 300, 1),
    'boosting_type': hp.choice('boosting_type', ['gbdt', 'dart', 'goss']),
}

# Run optimization
trials = Trials()
best = fmin(fn=objective,
            space=space,
            algo=tpe.suggest,
            max_evals=100,
            early_stop_fn=no_progress_loss(20),
            trials=trials)

# Retrieve the hyperparameters that gave the highest cross-validated F2
best_params = space_eval(space, best)
best_params['num_leaves'] = int(best_params['num_leaves'])
best_params['n_estimators'] = int(best_params['n_estimators'])
print("Best hyperparameters:", best_params)

  0%|          | 0/100 [00:00<?, ?trial/s, best loss=?]

 54%|█████▍    | 54/100 [00:08<00:07,  6.25trial/s, best loss: -0.9838817508687244]
Best hyperparameters: {'boosting_type': 'goss', 'learning_rate': 0.11365471176804709, 'n_estimators': 225, 'num_leaves': 43}


In [114]:
# performance on holdout set 
best_model = lgb.LGBMClassifier(**best_params, verbose=-1)
best_model.fit(X_train, y_train)
y_pred = best_model.predict(X_holdout)
f2 = fbeta_score(y_holdout, y_pred, beta=2)
print(f'F2 score on the holdout set: {f2:.3f}')

F2 score on the holdout set: 0.970


## 2nd Iteration: Optimize the threshold

We've just completed our initial, minimalist iteration. Now let's make some further improvements in a second round.

When calculating F_beta, we didn't specify the threshold used to distinguish between positive and negative cases. This led to the use of the default threshold, 0.5. However, this threshold should be fine-tuned as well. Let's do it in this second iteration. 
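
To see why the threshold matters, here is a small self-contained sketch that scores the same predicted probabilities at several thresholds. The labels, probabilities, and the helper `fbeta_from_labels` are made up for illustration and are not taken from the dataset used in this notebook.

```python
def fbeta_from_labels(y_true, y_pred, beta):
    """F-beta computed from true-positive, false-positive and false-negative counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Made-up labels and predicted probabilities of the positive class
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.45, 0.35, 0.4, 0.2, 0.1, 0.6]

results = {}
for threshold in (0.3, 0.5, 0.7):
    # Same rule as in the objective below: positive if probability > threshold
    y_pred = [int(p > threshold) for p in y_prob]
    results[threshold] = fbeta_from_labels(y_true, y_pred, beta=2)
    print(f"threshold={threshold}: F2={results[threshold]:.3f}")
# threshold=0.3: F2=0.962
# threshold=0.5: F2=0.652
# threshold=0.7: F2=0.455
```

In this toy case, lowering the threshold recovers the false negatives at the cost of one false positive, which F2 rewards because it weights recall more heavily.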

In [105]:
# Define the space
beta = 2
space = {
    'hyper_params': {
        'num_leaves': hp.quniform('num_leaves', 10, 50, 1),
        'learning_rate': hp.loguniform('learning_rate', -5, 0),
        'n_estimators': hp.quniform('n_estimators', 50, 300, 1),
        'boosting_type': hp.choice('boosting_type', ['gbdt', 'dart', 'goss'])
    },
    'score_params': {
        'threshold': hp.uniform('threshold', 0.0, 1.0),  # Specify threshold as a hyperparameter
        'beta': hp.choice('beta', [beta])
    },
}

# Define the objective function for hyperparameter tuning and threshold optimization
def objective(params):
    # should've been `hyper_params = params['hyper_params']` only, see discussions at the end
    hyper_params = {
        'num_leaves': int(params['hyper_params']['num_leaves']),
        'learning_rate': params['hyper_params']['learning_rate'],
        'n_estimators': int(params['hyper_params']['n_estimators']),
        'boosting_type': params['hyper_params']['boosting_type']
         }
    
    threshold = params['score_params']['threshold']
    beta = params['score_params']['beta']

    clf = lgb.LGBMClassifier(**hyper_params,verbose=-1)
    scores = []
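    # Split only the training data so the holdout set stays unseen during tuning
    for train_idx, valid_idx in cv.split(X_train, y_train):
        X_tr, X_valid = X_train[train_idx], X_train[valid_idx]
        y_tr, y_valid = y_train[train_idx], y_train[valid_idx]

        clf.fit(X_tr, y_tr)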
        y_pred_prob = clf.predict_proba(X_valid)[:, 1]
        # Apply the threshold to convert probabilities to predictions
        y_pred = (y_pred_prob > threshold).astype(int)
        # Calculate f score with the specified beta
        score = fbeta_score(y_valid, y_pred, beta = beta)
        scores.append(score)

    # Calculate the mean F score
    mean_score = np.mean(scores)

    return {'loss': -mean_score, 'status': STATUS_OK}
    

In [111]:
# Run the optimization
trials = Trials()
best = fmin(fn=objective,
            space=space,
            algo=tpe.suggest,
            max_evals=100,
            early_stop_fn=no_progress_loss(20),
            trials=trials)

best_params = space_eval(space, best)
best_params['hyper_params']['num_leaves'] = int(best_params['hyper_params']['num_leaves'])
best_params['hyper_params']['n_estimators'] = int(best_params['hyper_params']['n_estimators'])
print("Best hyperparameters:", best_params)

  0%|          | 0/100 [00:00<?, ?trial/s, best loss=?]

 74%|███████▍  | 74/100 [00:17<00:06,  4.23trial/s, best loss: -0.9860798668549601]
Best hyperparameters: {'hyper_params': {'boosting_type': 'dart', 'learning_rate': 0.7781450728394775, 'n_estimators': 203, 'num_leaves': 27}, 'score_params': {'beta': 2, 'threshold': 0.3746420531186503}}


In [112]:
# performance on holdout set 
best_model = lgb.LGBMClassifier(**best_params['hyper_params'], verbose=-1)
best_model.fit(X_train, y_train)
y_prob = best_model.predict_proba(X_holdout)
y_pred = np.where(y_prob[:,1]>best_params['score_params']['threshold'], 1, 0)
f2 = fbeta_score(y_holdout, y_pred, beta=2)
print(f'F2 score on the holdout set: {f2:.3f}')

F2 score on the holdout set: 0.983


## Discussion

In the code above, you might notice some redundant lines. They work around a known issue: `hp.quniform()` returns floats rather than integers. This issue has been reported and discussed in detail on the Hyperopt GitHub repository, and you can find the discussion [here](https://github.com/hyperopt/hyperopt/issues/253).

While waiting for the bug to be fixed, there are at least two temporary solutions that can be applied:

A. Utilize `hp.choice()` as an alternative.
B. Manually cast the parameter values to integers. For example, `params['param'] = int(params['param'])`.

In the code provided, I've opted for option B, despite it being slightly more verbose. This approach keeps the search values ordered, which can be particularly beneficial for sequential search algorithms such as TPE (`tpe.suggest`), whereas `hp.choice()` treats its options as unordered categories.
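
One way to keep option B tidy is to wrap the casts in a small helper. The sketch below is only an illustration: `cast_int_params` is a hypothetical name, not part of Hyperopt, and the sampled values are made up to mimic the floats that `hp.quniform()` returns.

```python
def cast_int_params(params, int_keys=("num_leaves", "n_estimators")):
    """Return a copy of `params` with the listed keys cast to int."""
    return {k: int(v) if k in int_keys else v for k, v in params.items()}


# Made-up sample resembling what the objective function receives
sampled = {
    "num_leaves": 43.0,
    "n_estimators": 225.0,
    "learning_rate": 0.11,
    "boosting_type": "goss",
}
clean = cast_int_params(sampled)
print(clean)
# {'num_leaves': 43, 'n_estimators': 225, 'learning_rate': 0.11, 'boosting_type': 'goss'}
```

Returning a copy instead of mutating the input leaves the dictionary that Hyperopt passed in untouched.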