
Apache Spark 3.0 compatibility #4926

Closed
jkbradley opened this issue Oct 9, 2019 · 4 comments

Comments

@jkbradley
Contributor

This is meant to be a discussion around preparing xgboost for the upcoming Apache Spark 3.0 release.

I participate in the Apache Spark project and would like to make it easier for xgboost to adapt to Spark's upcoming major release. For reference, the current Spark dev list discussions indicate rough dates for the release.

Some questions to get started:

  • What are reasonable timelines around this?
  • Any comments about coordination with ongoing efforts around an XGBoost 1.0.0 release + PySpark support?

Notes about Spark 3.0:

  • Overall, I would not expect the newer DataFrame-based API for ML in Spark to change much, but private APIs and the older RDD-based ML API may change.
  • The other big change will be a shift to Scala 2.12.
  • Let me know if there are more questions around Spark 3.0 previews, testing, etc.
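[Editor's note: the Scala 2.12 shift mentioned above can be handled by cross-building. A minimal sbt sketch follows; the version numbers are illustrative, and the actual xgboost4j build may be organized differently.]

```scala
// build.sbt -- illustrative cross-build covering the Scala 2.11 -> 2.12 shift.
// Versions are examples only, not the project's actual settings.
ThisBuild / scalaVersion       := "2.12.10"
ThisBuild / crossScalaVersions := Seq("2.11.12", "2.12.10")
```

Running `sbt +test` would then compile and test against both Scala versions.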

Thanks!

@jkbradley
Contributor Author

jkbradley commented Oct 9, 2019

CC @CodingCat and @mengxr :)

@CodingCat
Member

Hi, @jkbradley

One of the expectations for the 1.0.0 release of XGBoost is that we coordinate with Spark 3.0 if possible. Specifically, we are looking forward to GPU scheduling, per-stage spark.task.cpus, etc.

For the 1.0.0 preview of XGBoost, Scala 2.12 has been the way to go, so I don't worry about that. But I would think more about whether to support the 3.0 preview or stay with 2.4 (or both).

PySpark is something I would like to finish reviewing soon, but I am out of bandwidth (maybe you or @mengxr can help with that ;) )
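[Editor's note: the GPU scheduling mentioned above is driven in Spark 3.0 by resource configuration properties. A minimal spark-submit sketch follows; the amounts and script path are illustrative, not values from this thread.]

```shell
# Illustrative Spark 3.0 GPU scheduling configuration (values are examples).
spark-submit \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=1 \
  --conf spark.executor.resource.gpu.discoveryScript=/opt/spark/getGpus.sh \
  --conf spark.task.cpus=1 \
  my-xgboost-job.jar
```

With these settings Spark schedules at most one task per GPU per executor; the discovery script reports the GPU addresses available on each worker.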

@jkbradley
Contributor Author

Sorry for the slow response! We've been a bit underwater ourselves. But I talked with Xiangrui, and he recommended we ask @WeichenXu123 to help out with reviewing the PySpark integration. I've messaged Weichen offline about that, and we can follow up once we know when that can be prioritized.

That's great about Scala 2.12. For Spark 3.0 coordination, let me (or the dev list) know if you have specific questions!

@hcho3
Collaborator

hcho3 commented Jul 16, 2020

Closing this, since we upgraded to Spark 3.0.0 in the latest source.

@hcho3 hcho3 closed this as completed Jul 16, 2020
3 participants