Integration of Distributed XGBoost on Modin #7094

Open
prutskov opened this issue Jul 8, 2021 · 2 comments

Comments

prutskov commented Jul 8, 2021

Hi XGBoost!

I am from the Modin team. Modin provides efficient distributed DataFrames and has a distributed implementation of XGBoost.

XGBoost already supports Modin DataFrames, but currently the partitions of a Modin DataFrame are simply converted to NumPy arrays and concatenated into one:

data = np.ascontiguousarray(data.values, dtype=dtype)

so the possible parallelism across partitions is not used.
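To make the point concrete, here is a minimal sketch that uses plain NumPy arrays as stand-ins for DataFrame partitions. The `partitions` list and the `to_single_array` helper are hypothetical names for illustration only, not XGBoost or Modin internals. It shows that once the partitions are concatenated into a single contiguous array, the partition boundaries are no longer visible to downstream code.

```python
import numpy as np

# Hypothetical stand-ins for the row partitions of a distributed DataFrame.
partitions = [
    np.arange(6, dtype=np.float64).reshape(3, 2),      # 3 rows, 2 columns
    np.arange(6, 10, dtype=np.float64).reshape(2, 2),  # 2 rows, 2 columns
]


def to_single_array(parts, dtype=np.float32):
    """Collapse all partitions into one contiguous array (illustrative helper)."""
    combined = np.concatenate(parts, axis=0)
    return np.ascontiguousarray(combined, dtype=dtype)


data = to_single_array(partitions)
print(data.shape)  # (5, 2)
```

After this step there is a single in-memory block, so any work on `data` happens in one place rather than per partition.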

Modin XGBoost is implemented on top of Ray, but support for the other execution engines used in Modin (e.g. Dask) will be added as well. Training and inference run in parallel across the partitions of a Modin DataFrame.
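As a rough illustration of per-partition parallelism, the sketch below scores each partition independently and then stitches the results back together in order. It is an assumption-laden toy: it uses a standard-library thread pool rather than Ray, and `WEIGHTS`, `predict_partition`, and `predict_in_parallel` are hypothetical names standing in for a trained booster and its prediction routine. It covers inference only; distributed training additionally needs workers to synchronize, which is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "model": a fixed weight vector standing in for a trained booster (assumption).
WEIGHTS = [0.5, -1.0]


def predict_partition(rows):
    """Score one partition independently of all the others."""
    return [sum(w * x for w, x in zip(WEIGHTS, row)) for row in rows]


def predict_in_parallel(partitions, max_workers=2):
    """Run predict_partition on every partition concurrently, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_partition = list(pool.map(predict_partition, partitions))
    # Flatten the per-partition results while keeping the original row order.
    return [score for part in per_partition for score in part]


partitions = [
    [[2.0, 1.0], [4.0, 0.0]],  # first partition: two rows
    [[0.0, 3.0]],              # second partition: one row
]
scores = predict_in_parallel(partitions)
print(scores)  # [0.0, 2.0, -3.0]
```

Because each partition is scored without touching the others, the same pattern maps onto remote workers that each hold one partition.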

The Modin team would like to start integrating Modin XGBoost into your repository so that the main xgboost package supports distributed Modin DataFrames.

The high-level Modin XGBoost documentation can be found here. The developer's documentation with implementation details is here.

Are there any requirements for starting the integration?

@trivialfis
Member

> other execution engines used in Modin (Dask e.g.) will be added as well

Can I assume this issue is about Dask, since Dask is maintained in XGBoost's source tree?

@trivialfis
Member

I will look into the example code you linked. Sorry for the early reply.

@trivialfis trivialfis added this to Need prioritize in 2.0 Roadmap Oct 28, 2021