
Error using loss and custom metric #9782

Closed
koch-tobias opened this issue Nov 12, 2023 · 4 comments


@koch-tobias

Problem
I have a binary classification task with a highly imbalanced dataset. Therefore, I want to use logloss in combination with a custom function (fbeta score) as the scoring metric. However, I get the error (Unknown metric function <function Classifier.xgb_custom_fbeta_score at 0x0000015F662DADD0>) with the following implementation.

If I use only the custom metric, it works, but then I cannot track the logloss during training and validation.

Implementation

model = XGBClassifier(booster=config["xgb_params_binary"]["boosting_type"], 
                    objective="binary:logistic",
                    disable_default_eval_metric=True,
                    eval_metric = None,
                    max_depth=hp["max_depth"],   
                    learning_rate=hp["lr"],
                    colsample_bytree=hp["colsample_bytree"],
                    scale_pos_weight=weight_factor,
                    gamma=hp["gamma"],
                    n_estimators= config["train_settings"]["n_estimators"],
                    verbosity=0,
                    random_state=42
                    )  
def xgb_custom_fbeta_score(preds, dtrain):
    beta = 2
    labels = dtrain.get_label()

    preds_binary = np.round(preds)

    fbeta = fbeta_score(labels, preds_binary, beta=beta)

    return 'fbeta', fbeta
callbacks = [xgb.callback.EarlyStopping(config["train_settings"]["early_stopping"], metric_name='fbeta')]

metrics = ['logloss', xgb_custom_fbeta_score]
model.fit(X_train, y_train,
        eval_set=[(X_train, y_train), (X_val, y_val)],
        eval_metric=metrics, 
        callbacks=callbacks
       )      
@trivialfis
Member

Hi, due to limitations in the existing implementation, no additional metric names are allowed when a custom metric is used. However, feel free to remove the disable_default_eval_metric parameter, since the default metric is already logloss; then pass the custom metric as:

xgb.XGBClassifier(eval_metric=fbeta_score)

assuming the fbeta_score is similar to the scores in sklearn. Turn it into an error function if you need early stopping.

@pangjac

pangjac commented Nov 15, 2023

Hi @trivialfis ,

Could you help clarify the suggestion below?

"when a custom metric is used, no additional metric names are allowed."

If I want to track both logloss and one custom metric, fbeta, can I write them into a single function such as the one below, or should the custom metric contain only a single metric?

def xg_custom_composed_score(preds, dtrain):
    logloss = function_calculate_logloss(preds, dtrain)
    fbeta = fbeta_score(preds, dtrain, beta)
    return [("logloss", logloss), ("fbeta", fbeta)]

Thank you so much!

@trivialfis
Member

Unfortunately, XGBoost doesn't work with composite metrics. However, you can use the logloss inside XGBoost along with the fbeta score:

import numpy as np
import xgboost as xgb
from sklearn.datasets import load_digits
from sklearn.metrics import fbeta_score

X, y = load_digits(n_class=2, return_X_y=True)


def error(y_true, y_pred, *args, **kwargs) -> float:
    classes = np.repeat(0, y_pred.shape[0])
    classes[y_pred > 0.5] = 1
    y_pred = classes
    return -fbeta_score(y_true, y_pred, *args, **kwargs, beta=1.0)


clf = xgb.XGBClassifier(eval_metric=error)

clf.fit(X, y, eval_set=[(X, y)])
[0]     validation_0-logloss:0.44484    validation_0-error:0.99726
[1]     validation_0-logloss:0.30749    validation_0-error:0.99726
[2]     validation_0-logloss:0.21896    validation_0-error:0.99726

@pangjac

pangjac commented Nov 27, 2023

In case anyone is using the XGBoost native API rather than the sklearn API, here is an example:

from typing import Tuple

import numpy as np
import xgboost as xgb
from sklearn.datasets import load_digits
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split

X, y = load_digits(n_class=2, return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=131)

dtrain = xgb.DMatrix(data=X_train, label=y_train, missing=-999999999)
dvalid = xgb.DMatrix(data=X_test,  label=y_test,  missing=-999999999)

def error_func(predt: np.ndarray, dtrain:xgb.DMatrix) -> Tuple[str, float]:
    y_true = dtrain.get_label()
    classes = np.repeat(0, y_true.shape[0])
    classes[predt > 0.5] = 1
    y_pred = classes
    return "-fbeta", -fbeta_score(y_true, y_pred, beta=0.5)

params = {
    "base_score": 0.5,
    "gamma": 0,
    "eta": 0.1,
    "max_depth": 3,
    "reg_alpha": 0,
    "reg_lambda": 1,
    "objective": "binary:logistic",
}

output = xgb.train(params=params, 
                   dtrain=dtrain, 
                   num_boost_round=100,  # set a high number
                   evals=[(dtrain, "train"),(dvalid, "validation")], 
                   custom_metric=error_func,
                   verbose_eval=True
                   )
    
