Support dask dataframe in train_test_split #351

Closed
TomAugspurger opened this Issue Aug 31, 2018 · 3 comments

TomAugspurger commented Aug 31, 2018

The easy way: call to_dask_array(lengths=True) on the dataframe. This will take some computation, since the length of each partition has to be computed.

The harder (maybe not too hard) way to do this would be to directly support dask dataframes.

TomAugspurger commented Aug 31, 2018

Yeah, I think so. It will just take a bit of work to ensure that multiple dataframes are split the same way.

mrocklin commented Aug 31, 2018

TomAugspurger added a commit to TomAugspurger/dask-ml that referenced this issue Aug 31, 2018

TomAugspurger added a commit that referenced this issue Sep 4, 2018

ENH: Support dask dataframe in train_test_split (#352)
* ENH: Support dask dataframe in train_test_split

Closes #351