Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade xgboost dependency to be 1.6.1 version #822

Merged
merged 4 commits into from
Aug 18, 2022

Conversation

WeichenXu123
Copy link
Contributor

Signed-off-by: Weichen Xu weichen.xu@databricks.com

Upgrade xgboost dependency to be 1.6.1 version

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@@ -33,6 +33,8 @@ class XGBoostClassificationModelParitySpec extends SparkParityBase {
// These params are not needed for making predictions, so we don't serialize them
override val unserializedParams = Set("labelCol", "evalMetric", "objective")

override val excludedColsForComparison = Array[String]("prediction")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exclude this column when comparing result.
The reason is xgboost might generate some prediction result with equal probability ( 0.5/0.5 ) on each label, then the prediction label might be nondeterministic 0 or 1.
So I exclude the prediction column.
But, we still compare the "probability" column which can ensure the model inference parity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we set the random seed can we get things to be deterministic for the test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seemingly no. There're some other non-deterministic factors such as multiple spark tasks run in parallelism and causes different sum order when aggregation.

Assuming it works, I still don't recommend the approach because next time xgboost upgrading we might need to update test again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😢 but makes sense

@WeichenXu123
Copy link
Contributor Author

CC @jsleight Could you help review and merge if it is good ?

@jsleight jsleight merged commit fc45c5a into combust:master Aug 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants