
Allow p2p shuffle kwarg for DataFrame merges #9900

Merged
fjetter merged 4 commits into dask:main from fjetter:use_hash_join_p2p
Feb 23, 2023

Conversation

@fjetter
Member

fjetter commented Jan 31, 2023

Companion change to dask/distributed#7514, which will use a specialized HashJoin layer.
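For readers unfamiliar with the technique, the general idea behind a hash join can be sketched in plain Python: both inputs are hash-partitioned on the join key so that matching rows land in the same partition, and each pair of partitions is then joined locally. This is only an illustrative sketch on lists of dicts; the helper names (hash_partition, local_hash_join, hash_join) are hypothetical, and it is not the HashJoin layer from dask/distributed#7514, which works on distributed DataFrame partitions.

```python
from collections import defaultdict


def hash_partition(rows, key, npartitions):
    """Split rows into npartitions buckets by hashing the join key."""
    buckets = [[] for _ in range(npartitions)]
    for row in rows:
        # Equal keys always hash to the same bucket, so matches are co-located.
        buckets[hash(row[key]) % npartitions].append(row)
    return buckets


def local_hash_join(left, right, key):
    """Inner-join two co-located partitions using an in-memory hash index."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    joined = []
    for lrow in left:
        # .get() avoids inserting empty entries into the defaultdict.
        for rrow in index.get(lrow[key], []):
            joined.append({**lrow, **rrow})
    return joined


def hash_join(left, right, key, npartitions=4):
    """Co-partition both inputs by key, then join partition i with partition i."""
    left_parts = hash_partition(left, key, npartitions)
    right_parts = hash_partition(right, key, npartitions)
    result = []
    for lp, rp in zip(left_parts, right_parts):
        result.extend(local_hash_join(lp, rp, key))
    return result


if __name__ == "__main__":
    left = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}, {"id": 3, "x": "c"}]
    right = [{"id": 2, "y": 20}, {"id": 3, "y": 30}, {"id": 4, "y": 40}]
    print(sorted(hash_join(left, right, "id"), key=lambda r: r["id"]))
```

The point of partitioning both sides by the same hash is that no partition ever needs to look at rows outside its own bucket, which is what makes the per-partition joins independent of each other.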

Not sure how to test this yet. Suggestions welcome.

TODO:

  • This does not yet respect the dask.config.get("shuffle") value

@fjetter
Member Author

fjetter commented Feb 23, 2023

I'm now using get_default_shuffle_algorithm for the merge as well. With #9991, this would also be enabled by default.
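The precedence this comment relies on (an explicit shuffle argument first, then the "shuffle" config value, then a default) can be sketched as follows. This is a hypothetical model, not dask's get_default_shuffle_algorithm: resolve_shuffle_method and merge_strategy are illustrative names, the config is a plain dict, and the "tasks"/"disk" fallback depending on whether a distributed client is active is an assumption.

```python
def resolve_shuffle_method(shuffle=None, config=None, distributed_client=False):
    """Pick a shuffle method: explicit kwarg, then config, then a fallback.

    Hypothetical sketch; `config` stands in for a configuration mapping.
    """
    config = {} if config is None else config
    if shuffle is not None:
        # An explicitly passed keyword always wins.
        return shuffle
    configured = config.get("shuffle")
    if configured is not None:
        # Otherwise respect the configured value.
        return configured
    # Assumed fallback: "tasks" with a distributed client, "disk" without one.
    return "tasks" if distributed_client else "disk"


def merge_strategy(shuffle=None, config=None, distributed_client=False):
    """Return a label for the merge code path the resolved method selects."""
    method = resolve_shuffle_method(shuffle, config, distributed_client)
    # Only "p2p" opts into the specialized hash-join path; any other
    # method keeps the existing merge path unchanged.
    if method == "p2p":
        return "hash_join_p2p"
    return f"existing_merge[{method}]"


if __name__ == "__main__":
    print(merge_strategy("p2p"))
    print(merge_strategy(config={"shuffle": "p2p"}))
    print(merge_strategy(distributed_client=True))
```

Modeling the p2p path as strictly opt-in mirrors why the change is described as low impact: unless "p2p" is resolved, the behaviour is the same as before.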

fjetter marked this pull request as ready for review February 23, 2023 17:12
@fjetter
Member Author

fjetter commented Feb 23, 2023

Since this is a very low-impact change that only has an effect if one passes p2p as the shuffle keyword, and it is blocking dask/distributed#7514, I will go ahead and merge this.

If anything turns out to be wrong with it, I'll of course take care of it.

fjetter changed the title from "Use hash join p2p" to "Allow p2p shuffle kwarg for DataFrame merges" Feb 23, 2023
fjetter merged commit 769f672 into dask:main Feb 23, 2023
fjetter deleted the use_hash_join_p2p branch February 23, 2023 18:03

1 participant