m2cgen output for xgboost with binary:logistic objective returns raw (not transformed) scores #96

Open · ehoppmann opened this issue Aug 23, 2019 · 5 comments
Labels: bug, good first issue, help wanted


@ehoppmann

Our xgboost models use the binary:logistic objective function; however, the m2cgen-converted versions of the models return raw scores instead of the transformed scores.

This is fine as long as the user knows it is happening! I didn't, so it took a while to figure out what was going on. I'm wondering whether a useful warning could be raised to alert users to this issue? The warning could include a note that they can transform these scores back to the expected probabilities in [0, 1] via prob = logistic.cdf(score - base_score), where base_score is an attribute of the xgboost model.
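
For illustration, a minimal sketch of the back-transform described above, assuming the generated code really does return the raw score and that base_score still needs to be subtracted as the reporter describes (the helper name is illustrative, not part of any API):

```python
from scipy.stats import logistic  # logistic.cdf(x) == 1 / (1 + exp(-x))

def raw_score_to_prob(raw_score, base_score=0.5):
    # Back-transform the raw model output into a probability in [0, 1],
    # following the prob = logistic.cdf(score - base_score) recipe above.
    return logistic.cdf(raw_score - base_score)
```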

In our case, I'd like to minimize unnecessary processing on the device, so I am actually happy with the current m2cgen output and will instead inverse-transform our threshold when evaluating the output of the transpiled model... but it did take me a while to figure out what was going on, which is why I'm suggesting that a user-friendly message be raised when an unsupported objective function is encountered.

Thanks for creating & sharing this great tool!

@izeigerman
Member

Hey @ehoppmann! Thank you so much for reporting this issue and for your kind feedback!

When you say "transformed", do you mean a probability value between 0 and 1? If so, then this is exactly what the code generated by m2cgen is supposed to return.
The code generated by m2cgen from an XGBoost model applies a sigmoid function to the computed result before returning it (for the binary:logistic objective). This behavior is consistent with the predict_proba method of XGBClassifier.
So the value shouldn't be "raw" unless you accidentally supplied an XGBRegressor instance instead of an XGBClassifier. If your input is correct, then there may be a bug here.

Could you perhaps share the very last line of the score function in the code generated by m2cgen, so we can check whether the sigmoid was applied there?
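
For context, one way to confirm on the Python side that predict_proba for a binary:logistic model is simply a sigmoid applied to the raw margin (a sketch; it assumes an already-trained XGBClassifier named clf and a feature matrix X):

```python
import numpy as np
import xgboost as xgb
from scipy.special import expit  # the sigmoid function

# Raw, untransformed margins from the underlying booster.
raw_margin = clf.get_booster().predict(xgb.DMatrix(X), output_margin=True)

# Positive-class probabilities from the sklearn wrapper.
proba = clf.predict_proba(X)[:, 1]

# For binary:logistic these should agree up to floating-point error.
print(np.allclose(expit(raw_margin), proba, atol=1e-6))
```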

@ehoppmann
Author

Hey there. I am indeed passing an instance of type xgboost.sklearn.XGBRegressor to m2cgen. If I call the .predict() method on that instance I get back values between 0 and 1, so I had assumed that the code generated by m2cgen would do the same thing. Perhaps this is a misunderstanding of the intended usage patterns of xgb models on my part?

@izeigerman
Member

@ehoppmann I see, thank you for sharing!
The current logic in m2cgen implies that the sigmoid should only be applied if an XGBClassifier was passed, which is clearly wrong based on your example.

Apparently the decision of whether to apply the sigmoid should instead be made based on the objective function.

I believe that the described issue is a bug and that it should be fixed. Thank you so much for reporting it!
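
For illustration, the kind of objective-based check such a fix might involve (a sketch only, not m2cgen's actual internals; get_xgb_params is the public xgboost sklearn API):

```python
def needs_sigmoid(model):
    # Works for both XGBClassifier and XGBRegressor: read the configured
    # objective from the sklearn wrapper and key the transform off it.
    objective = model.get_xgb_params().get("objective", "")
    return objective == "binary:logistic"
```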

izeigerman added the bug, good first issue, and help wanted labels on Aug 26, 2019
@izeigerman
Member

@ehoppmann As a workaround, you can try passing an XGBClassifier instance instead of an XGBRegressor one to ensure that a sigmoid is applied in the generated code.
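
A sketch of this workaround, assuming the goal is C output and that training data X_train / y_train are available (m2cgen also provides exporters for other languages):

```python
import m2cgen as m2c
from xgboost import XGBClassifier

# Same binary:logistic objective, but via the classifier wrapper so that
# the generated code applies the sigmoid before returning the score.
clf = XGBClassifier(objective="binary:logistic")
clf.fit(X_train, y_train)  # X_train / y_train assumed to exist

c_code = m2c.export_to_c(clf)
```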

@HanLiii

HanLiii commented Dec 10, 2020

Hi, in the C code generated from the xgboost model, what does the output parameter of the score function mean, and how can I get the predicted probability in [0, 1]? Thanks.
