Test P2P Shuffle in integration tests #569

Closed
3 of 4 tasks
hendrikmakait opened this issue Dec 1, 2022 · 1 comment · Fixed by #597

hendrikmakait commented Dec 1, 2022

Parametrize the following tests so that each runs with both the tasks and the p2p shuffle:

  • test_shuffle
  • test_join
    • BONUS: Enable larger data sizes for p2p
  • test_h2o_benchmarks

Only run the p2p variants on versions 2022.11.0 and later (which include dask/distributed@9c6904d).
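
The version gate above could be computed with something like the sketch below. This is only an illustration, not the code from the fixing PR: the helper names `parse_version` and `shuffle_methods` are hypothetical, and the sketch assumes the installed version is available as a CalVer string such as `2022.11.0`.

```python
from __future__ import annotations

import re

# Per the issue: p2p is only tested on 2022.11.0 and later.
P2P_MIN_VERSION = (2022, 11, 0)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a CalVer string such as '2022.11.0' into a tuple of ints.

    Only the leading dotted numeric part is kept, so a local or
    pre-release suffix (e.g. '+12.gabc123') is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def shuffle_methods(version: str) -> list[str]:
    """Return the shuffle methods a test should be parametrized over."""
    methods = ["tasks"]  # the tasks shuffle is always tested
    if parse_version(version) >= P2P_MIN_VERSION:
        methods.append("p2p")
    return methods


# Hypothetical use in a test module (not part of this sketch):
#   @pytest.mark.parametrize("shuffle", shuffle_methods(distributed.__version__))
#   def test_shuffle(shuffle): ...
```

Building the parameter list up front means older versions simply never collect a p2p case; an alternative would be to keep both parameters and attach a skip marker to the p2p one so the skip shows up in the test report.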

@ncclementi
Contributor

For the BONUS, we can look into that. We have 50 GB and 500 GB versions of the h2o-benchmark groupby data on S3 in which id1, id2, and id3 are stored as pyarrow strings:
50GB: s3://coiled-datasets/h2o-benchmark/pyarrow_strings/N_1e9_K_1e2/
500GB: s3://coiled-datasets/h2o-benchmark/pyarrow_strings/N_1e10_K_1e2/
