Imported JSON xgb.dump yields incorrect predictions due to internal single precision floats #4097
Comments
See the comment in #3960. Can you convert your input data into 32-bit floats first?
Thanks for the reference. In current applications, much of the data comes from a very large DB, so converting it to 32-bit floats would be very cumbersome. I use xgb.dump to translate the model scoring process into SQL so that I can score in-database (instead of attempting to score in memory). The features in the database would therefore also need to be converted, and that is unlikely to happen.
There may not be an easy solution, since XGBoost converts training data into 32-bit floats internally at training time, so splits are chosen assuming that the input data is 32-bit. To accommodate your use case, we would need to dump 32-bit floating-point …
This may or may not be practicable. EDIT: In fact, such a guarantee is NOT possible. There will always be a gap between …
@khotilov Does my explanation seem reasonable to you?
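A minimal Python/NumPy sketch of the effect described above, assuming the split test has the form `x < split_condition` (the rule shown in XGBoost's text and JSON dumps, where the "yes" branch is taken when the condition holds). The helper name `takes_yes_branch` and the numbers are illustrative only.

```python
import numpy as np


def takes_yes_branch(x, split_condition, use_float32):
    """Evaluate a split test of the form x < split_condition.

    With use_float32=True the input is rounded to single precision first,
    as XGBoost does with its feature values; otherwise the comparison is
    done in double precision, as a scorer fed 64-bit data would do it.
    """
    if use_float32:
        return bool(np.float32(x) < np.float32(split_condition))
    return float(x) < float(split_condition)


# Split threshold as stored by the model: a 32-bit float (about 0.10000000149).
threshold = np.float32(0.1)
# A 64-bit feature value just below that threshold.
x = 0.09999999999

in_double = takes_yes_branch(x, threshold, use_float32=False)
in_single = takes_yes_branch(x, threshold, use_float32=True)
```

Here the 64-bit value lies below the stored threshold, but after rounding to float32 it equals the threshold, so the two scorers take different branches. Only values within half a float32 ulp below a split disagree like this, but no choice of dump format alone removes them, which is consistent with the point that such a guarantee is not possible.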
I agree. I think the only reasonable solution is to convert all data to 32-bit. Thanks for sharing.
In case anyone is interested, I was able to calculate the same values from the JSON output by using the following code. Note the use of the float library to convert both the input data and the tree values to 32-bit floats:
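The same idea can be sketched in Python (this is not the commenter's code, which relied on the float library): walk each tree of the parsed JSON dump, round every feature value, split condition, and leaf value to float32, and accumulate the margin in float32 before applying the logistic link. The node keys (`nodeid`, `split`, `split_condition`, `yes`, `no`, `missing`, `children`, `leaf`) follow XGBoost's JSON dump format; the function names, the `base_margin=0.0` default (assumed to correspond to the default `base_score` of 0.5) and the one-tree dump are hypothetical, and the summation order may not match XGBoost's exactly.

```python
import json
import math

import numpy as np


def find_child(node, child_id):
    """Return the child of `node` whose nodeid equals `child_id`."""
    for child in node["children"]:
        if child["nodeid"] == child_id:
            return child
    raise KeyError(f"node {node['nodeid']} has no child {child_id}")


def leaf_value_f32(tree, features):
    """Walk one tree of a JSON dump, doing every comparison in float32.

    `features` maps the dump's feature names (e.g. "f0") to values;
    a missing key or NaN follows the node's "missing" branch.
    """
    node = tree
    while "leaf" not in node:
        value = features.get(node["split"])
        if value is None or math.isnan(value):
            next_id = node["missing"]
        elif np.float32(value) < np.float32(node["split_condition"]):
            next_id = node["yes"]
        else:
            next_id = node["no"]
        node = find_child(node, next_id)
    return np.float32(node["leaf"])


def predict_logistic_f32(trees, features, base_margin=0.0):
    """Sum leaf values in float32, then apply the logistic link.

    base_margin=0.0 assumes base_score == 0.5 (logit(0.5) == 0); adjust it
    for models trained with a different base_score.
    """
    margin = np.float32(base_margin)
    for tree in trees:
        margin = np.float32(margin + leaf_value_f32(tree, features))
    return 1.0 / (1.0 + math.exp(-float(margin)))


# Usage with a hypothetical one-tree dump.
DUMP = """[
  {"nodeid": 0, "depth": 0, "split": "f0", "split_condition": 0.1,
   "yes": 1, "no": 2, "missing": 1,
   "children": [{"nodeid": 1, "leaf": 0.5}, {"nodeid": 2, "leaf": -0.5}]}
]"""
trees = json.loads(DUMP)
p_near_threshold = predict_logistic_f32(trees, {"f0": 0.09999999999})
p_missing = predict_logistic_f32(trees, {})
```

With `f0 = 0.09999999999`, a double-precision comparison against the dumped 0.1 would take the "yes" branch, while the float32 walk takes "no", as XGBoost would if the stored threshold is float32(0.1). In an SQL scorer the equivalent step would be casting both columns and constants to a 32-bit type, if the database offers one.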
Related to the discussion in #3960.
XGBoost handles values internally as single-precision floats; however, when the model is exported as JSON, the values in the leaf nodes lose precision. The effect is exacerbated for binary logistic scores when the JSON model is imported and parsed for scoring. Importing a model saved in binary format does not have this issue, apparently because it preserves the single-precision float values.
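One plausible mechanism for the precision loss (an assumption; the issue does not say how the JSON values are formatted) is printing single-precision values with too few decimal digits. Nine significant digits are enough to round-trip any IEEE 754 single-precision value, whereas six are not, as this hypothetical `round_trips` helper shows:

```python
import numpy as np


def round_trips(value, sig_digits):
    """Return True if `value`, stored as float32, survives being printed
    with `sig_digits` significant digits and parsed back to float32."""
    v = np.float32(value)
    text = format(float(v), f".{sig_digits}g")
    return bool(np.float32(float(text)) == v)


leaf = 1 / 3  # as float32 this is 0.3333333432...
short_ok = round_trips(leaf, 6)  # printed as "0.333333"
full_ok = round_trips(leaf, 9)   # printed as "0.333333343"
```

Printing with at least 9 significant digits lets a reader recover the exact float32 values. Each individual error is small, but a prediction sums leaf values across many trees before the logistic link is applied.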