Handle floating point values more accurately #277
Conversation
```diff
@@ -151,7 +151,7 @@ def __init__(self, model):

     def _assemble_tree(self, tree):
         if "leaf" in tree:
-            return ast.NumVal(tree["leaf"])
+            return ast.NumVal(tree["leaf"], dtype=np.float32)
```
Interestingly, `weight` and `bias` are also `float` internally. But on the Python side they are loaded into a `double` numpy array:
https://github.com/dmlc/xgboost/blob/12110c900eff0aaa06045ecf717e6c5a36a164d5/python-package/xgboost/sklearn.py#L717-L718
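A minimal sketch (not from the PR) of what this widening looks like; the leaf value `0.3` is an arbitrary illustration:

```python
import numpy as np

# The native library stores the leaf weight as a 32-bit float...
leaf_f32 = np.float32(0.3)
# ...but the Python wrapper hands it back inside a float64 array.
leaf_f64 = np.asarray([leaf_f32], dtype=np.float64)[0]

# Widening float32 -> float64 is exact, so the values still compare equal...
assert leaf_f64 == leaf_f32
# ...yet the widened value differs from the double parsed from the
# decimal literal "0.3", which is where the mismatches come from.
assert leaf_f64 != np.float64(0.3)
```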
```diff
-# all thresholds into float32.
-threshold_num_val = ast.NumVal(threshold, dtype=np.float32)
+threshold_num_val = ast.NumVal(self._tree.threshold[node_id])
```
Refer to #190 (review). Now the threshold matches the original type in scikit-learn (`double`).
```python
def format_float(value):
    return np.format_float_positional(value, unique=True, trim="0")
```
Maybe `format_float_scientific` would be better: https://numpy.org/doc/stable/reference/generated/numpy.format_float_scientific.html. But I'm not sure how many target languages support scientific notation.
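For reference, a quick comparison of the two formatters (the input values here are illustrative):

```python
import numpy as np

def format_float(value):
    # Shortest decimal string that uniquely round-trips to the same float.
    return np.format_float_positional(value, unique=True, trim="0")

print(format_float(np.float32(0.1)))    # 0.1
print(format_float(np.float64(1) / 3))  # 0.3333333333333333

# The scientific variant emits an exponent, which not every target
# language's numeric-literal syntax accepts out of the box.
print(np.format_float_scientific(np.float64(1) / 3, unique=True))
```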
```python
y_pred_executed = np.array(
    y_pred_executed, dtype=y_pred_true.dtype, copy=False)
```
Quite often, packages not only cast input values in the `predict` method but also return results with a different type. For instance, XGBoost always returns `float`: https://github.com/dmlc/xgboost/blob/12110c900eff0aaa06045ecf717e6c5a36a164d5/python-package/xgboost/core.py#L1373
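A hedged sketch of why normalizing the dtype matters when comparing predictions; the values and dtype below are made up for illustration:

```python
import numpy as np

# Suppose the reference library returned float32 predictions...
y_pred_true = np.array([0.1, 0.2], dtype=np.float32)
# ...while the generated code's output was parsed into Python doubles.
y_pred_executed = [0.1, 0.2]

# Casting to the reference dtype makes the comparison fair:
y_pred_executed = np.array(y_pred_executed, dtype=y_pred_true.dtype)
assert y_pred_executed.dtype == y_pred_true.dtype
assert np.allclose(y_pred_executed, y_pred_true)
```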
```python
if isinstance(estimator, (BaseDecisionTree, BaseForest)):
    self.X_test = self.X_test.astype(np.float32, copy=False)
```
`BaseForest` also casts inputs to `float32`: https://github.com/scikit-learn/scikit-learn/blob/4a60ec129d3088d095b30cf54a670fd596ca4cc8/sklearn/ensemble/_forest.py#L421 (same as for `BaseDecisionTree`).
Interestingly, XGBoost also casts all inputs to `float` during `predict`. But including it here makes a lot of tests fail...
I'm a little concerned that here we treat a symptom, not the cause. By casting the input vector to float32 and then passing it as strings into the estimators, we don't exactly reproduce the actual environment, where the cast values will be transformed back to doubles because of the `score` function signature. What do you think? Am I overthinking this?
> I'm a little concerned that here we treat a symptom, not the cause.

Yeah, you're absolutely right! To fix the root cause we should support multiple floating-point types in the target languages. In this PR I just propose making our tests a little fairer. Native libraries do a `double -> float` conversion, while in our tests we perform `double -> float -> double`. Unfortunately, I'm not a numerical expert, but it seems that `float -> double` is a safe conversion: https://stackoverflow.com/questions/29648271/convert-float-double-float.
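A quick check of that claim (not part of the PR): widening `float -> double` is lossless, while narrowing `double -> float` generally is not.

```python
import numpy as np

# float -> double is safe: every float32 survives the round trip exactly.
xs = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
assert np.all(xs.astype(np.float64).astype(np.float32) == xs)

# double -> float is lossy: narrowing generally changes the value.
x = np.float64(0.1)
assert np.float64(np.float32(x)) != x
```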
```python
elif isinstance(estimator, BaseLibSVM):
    self.X_test = self.X_test.astype(np.float64, copy=False)
```
```diff
     return np.float64(items[0])
 else:
-    return [float(i) for i in items]
+    return [np.float64(i) for i in items]
```
Let's use numpy types across the whole codebase for consistency.
This is some amazing investigation (as always)! I'll take a look soon. Thank you 👍
Looks great overall, but I don't understand why the numbers in the tests shifted so dramatically.
tests/assemblers/test_xgboost.py (Outdated)
```python
ast.IfExpr(
    ast.CompExpr(
        ast.FeatureRef(5),
        ast.NumVal(6.79699993),
ast.FeatureRef(12),
```
Not sure I understand why the feature index changed here.
Oh, all tests have changed due to the new train/test splitting routine, which is now done by a scikit-learn function (`train_test_split`). I found it easier to change the `random_state` param in only one place, compared to manual `shuffle`-ing in multiple places (see the diff in my opening comment).
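A minimal sketch of such a split via scikit-learn; the dataset and parameter values below are illustrative, not the PR's actual ones:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# A single random_state in one place controls the whole shuffle + split,
# instead of calling shuffle manually before every test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=13)

assert len(X_train) == 8 and len(X_test) == 2
```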
Oh, stupid me. I overlooked that change. It's a bit harder to compare apples to apples because of this though. Do you by chance remember whether there were any changes to expected values before you updated the splitting logic? I'm trying to identify the impact of this update.
Sorry, updating the splitting logic was the first step, as I wanted to play around with `random_state`. Let's choose the easier way: let me split this PR into two.
Done in aeb4301.
Thank you, this is so much better!
Awesome, this looks great. Thanks a lot 👍
I ran all tests with an increased test dataset fraction (`0.6`) and compared the results with ones obtained from `master` (but with the changed dataset splitting routine, to ensure that the inputs are the same in both cases). It seems that at least it doesn't make things worse, and sometimes it even decreases the number of failed tests.