-
Notifications
You must be signed in to change notification settings - Fork 312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade xgboost dependency to be 1.6.1 version #822
Conversation
@@ -33,6 +33,8 @@ class XGBoostClassificationModelParitySpec extends SparkParityBase { | |||
// These params are not needed for making predictions, so we don't serialize them | |||
override val unserializedParams = Set("labelCol", "evalMetric", "objective") | |||
|
|||
override val excludedColsForComparison = Array[String]("prediction") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exclude this column when comparing result.
The reason is xgboost might generate some prediction result with equal probability ( 0.5/0.5 ) on each label, then the prediction label might be nondeterministic 0 or 1.
So I exclude the prediction column.
But, we still compare the "probability" column which can ensure the model inference parity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we set the random seed can we get things to be deterministic for the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seemingly no. There're some other non-deterministic factors such as multiple spark tasks run in parallelism and causes different sum order when aggregation.
Assuming it works, I still don't recommend the approach because next time xgboost upgrading we might need to update test again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😢 but makes sense
CC @jsleight Could you help review and merge if it is good ? |
Signed-off-by: Weichen Xu weichen.xu@databricks.com
Upgrade xgboost dependency to be 1.6.1 version