## Precision, Recall and Fbeta

To navigate the tricky trade-off between precision and recall ⚖️, we have the option to optimize F1, the harmonious mean of the two. However, in many scenarios, we need to prioritize one over another. 

For instance, in financial services, missing a fraudulent transaction can result in significant 💰 financial losses and damage a company's reputation. To minimize such occurrences, optimizing recall (minimizing false negatives) is crucial, even if it means tolerating more false alarms. 

This is where `f_beta` score comes in handy 🎯. While still maintaining a balance between precision and recall, it allows us to fine-tune the importance of one over the other. When the beta value is greater than one, it places more emphasis on recall, while a beta value less than one emphasizes precision. For a simple demo on how f_beta works, please see [here](https://github.com/MenaWANG/ML_toy_examples/blob/main/train%20models/metrics-fbeta.ipynb).  

Now, let's dive into a practical demo on how to fine-tune a LightGBM classifier to optimize Fbeta scores using {hyperopt}. 🚀🧐

In [10]:
import lightgbm as lgb
from sklearn.metrics import make_scorer, fbeta_score
from hyperopt import fmin, tpe, hp, Trials, STATUS_OK, space_eval
from hyperopt.early_stop import no_progress_loss
from hyperopt.pyll.base import scope
import hyperopt
from sklearn.model_selection import StratifiedKFold, train_test_split
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score

# demo data, a holdout set to evaluate the optimized model and avoid overfitting
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, test_size = 0.2, random_state=321, stratify = y)

print(y.mean(), y_train.mean())

0.6274165202108963 0.6263736263736264


In [13]:
# Define the space
beta = 2
space = {
    'hyper_params': {
        'num_leaves': scope.int(hp.quniform('num_leaves', 10, 50, 1)),
        'learning_rate': hp.loguniform('learning_rate', -5, 0),
        'n_estimators': scope.int(hp.quniform('n_estimators', 50, 300, 1)),
        'boosting_type': hp.choice('boosting_type', ['gbdt', 'dart', 'goss']),
    },
    'score_params': {
        'threshold': hp.uniform('threshold', 0.0, 1.0),  # Specify threshold as a hyperparameter
        'beta': hp.choice('beta', [beta])
    },
}

# Define the objective function for hyperparameter tuning and threshold optimization
def objective(params):
    hyper_params = params['hyper_params']    
    threshold = params['score_params']['threshold']
    beta = params['score_params']['beta']

    clf = lgb.LGBMClassifier(**hyper_params,verbose=-1)
    scores = []
    for train_idx, valid_idx in cv.split(X, y):
        X_train, X_valid = X[train_idx], X[valid_idx]
        y_train, y_valid = y[train_idx], y[valid_idx]

        clf.fit(X_train, y_train)
        y_pred_prob = clf.predict_proba(X_valid)[:, 1]
        # Apply the threshold to convert probabilities to predictions
        y_pred = (y_pred_prob > threshold).astype(int)
        # Calculate f score with the specified beta
        score = fbeta_score(y_valid, y_pred, beta = beta)
        scores.append(score)

    # Calculate the mean F score
    mean_score = np.mean(scores)

    return {'loss': -mean_score, 'status': STATUS_OK}
    

In [14]:
# Run the optimization
trials = Trials()
best = fmin(fn=objective,
            space=space,
            algo=tpe.suggest,
            max_evals=100,
            early_stop_fn=no_progress_loss(20),
            trials=trials)

best_params = space_eval(space, best)
print("Best hyperparameters:", best_params)

 50%|█████     | 50/100 [00:10<00:10,  4.90trial/s, best loss: -0.9860798668549601]
Best hyperparameters: {'hyper_params': {'boosting_type': 'dart', 'learning_rate': 0.8773315695314797, 'n_estimators': 205, 'num_leaves': 18}, 'score_params': {'beta': 2, 'threshold': 0.27613312698849946}}


In [15]:
# performance on holdout set 
best_model = lgb.LGBMClassifier(**best_params['hyper_params'], verbose=-1)
best_model.fit(X_train, y_train)
y_prob = best_model.predict_proba(X_holdout)
y_pred = np.where(y_prob[:,1]>best_params['score_params']['threshold'], 1, 0)
f2 = fbeta_score(y_holdout, y_pred, beta=2)
print(f'F2 score on the holdout set: {f2:.3f}')

F2 score on the holdout set: 0.994


## Discussions


## Troubleshooting with `hp.quniform()`

At first the code returned an error, complaining that `num_leaves` should be integer but received a float. See below

In [18]:
# Define the hyperparameter search space using `hp.quniform() directly`
space = {
    'num_leaves': hp.quniform('num_leaves', 10, 50, 1),
    'n_estimators': hp.quniform('n_estimators', 50, 300, 1),
}
print(hyperopt.pyll.stochastic.sample(space))

{'n_estimators': 282.0, 'num_leaves': 24.0}


Unpon search I found this relates to the known issue that `hp.quniform()` function not returning integers. A detailed report and discussions are available on the Hyperopt GitHub repository [here](https://github.com/hyperopt/hyperopt/issues/253).

While waiting for the bug to be fixed, two temporary solutions were compared:

A. Utilize `hp.choice()` as an alternative.    
B. Manually cast the parameter values to integers. For example, `params['param'] = int(params['param'])`.

In the first iteration of the project, I've opted for choice B, despite its slightly more verbose nature (pls see the next section). This approach ensures that the search values remain ordered, which can be particularly beneficial when using iterative search methods with `Hyperopt`.

But then I realize there is also a more updated discussion on the same issue [here](https://github.com/hyperopt/hyperopt/issues/508), where I discovered a better solution, also went back to share at the 1st thread, just in case anyone else stumbled upon it first like I did 🤪 

In [20]:
# in short, the solution is `scope.int()`
space = {
    'num_leaves': scope.int(hp.quniform('num_leaves', 10, 50, 1)),
    'n_estimators': scope.int(hp.quniform('n_estimators', 50, 300, 1)),
}
print(hyperopt.pyll.stochastic.sample(space))

{'n_estimators': 292, 'num_leaves': 31}
{'n_estimators': 60, 'num_leaves': 44}


## The 1st iteration and the benefit of Iterative Development

Below was the 1st iteration of the project. Thee issues were fixed from there

1. A less than ideal fix for `hp.quniform()` (see the section above)
2. Used the default threshold (0.5) for determining positve/negative class 
3. Beta value can be better parameterized. 

The 1st iteration wasn't perfect, but it was quick and easy. In business context, a quick POC can be helpful in many ways. Just to list a few:

* **Rapid Validation (or Fail Fast)**: POCs help validate the feasibility and viability of a machine learning project in a short timeframe. This prevents investing significant resources in an idea that might not work.
* **Stakeholder Buy-In**: POCs provide a tangible result that you can show to stakeholders, making it easier to gain their support and secure budget for the full project.
* **Baseline for Comparison**: POC performance provides a baseline against which you can compare the performance of the full-scale model. This helps in assessing the progress made and the actual impact of your machine learning solution.
* **Iterative Improvement**: Results from a POC can be used to fine-tune and improve the full project. Lessons learned from a POC can guide you in making necessary adjustments.


In [None]:
# Define scorer for F2 and a cross-validation method
def custom_f2_score(y_true, y_preb):
    beta = 2 
    return fbeta_score(y_true, y_preb, beta=beta)

f_scorer = make_scorer(custom_f2_score, needs_proba=False)
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

# Define the hyperparameter search space for the LightGBM classifier
space = {
    'num_leaves': scope.int(hp.quniform('num_leaves', 10, 50, 1)),
    'learning_rate': hp.loguniform('learning_rate', -5, 0),
    'n_estimators': scope.int(hp.quniform('n_estimators', 50, 300, 1)),
    'boosting_type': hp.choice('boosting_type', ['gbdt', 'dart', 'goss']),
}

# Define the objective function for hyperparameter tuning
def objective(params):
    clf = lgb.LGBMClassifier(**params, verbose=-1)
    score = cross_val_score(clf, X_train, y_train, cv=cv, scoring=f_scorer).mean()
    return {'loss': -score, 'status': STATUS_OK}

# Run optimization
trials = Trials()
best = fmin(fn=objective,
            space=space,
            algo=tpe.suggest,
            max_evals=100,
            early_stop_fn=no_progress_loss(20),
            trials=trials)

# Retrieve the best hyperparameters which returns the highest F2
best_params = space_eval(space, best)
print("Best hyperparameters:", best_params)

# performance on holdout set 
best_model = lgb.LGBMClassifier(**best_params, verbose=-1)
best_model.fit(X_train, y_train)
y_pred = best_model.predict(X_holdout)
f2 = fbeta_score(y_holdout, y_pred, beta=2)
print(f'F2 score on the holdout set: {f2:.3f}')