
Improve multi dataframe join performance #8740

Merged: jsignell merged 3 commits into dask:main from holdenk:improve-multi-dataframe-merges on Feb 22, 2022


@holdenk (Contributor) commented Feb 19, 2022

Improves multi-dataframe join performance by allowing the recursive merge function to also operate on the initial frame, rather than merging that frame in a separate, non-parallel step.


Fix
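A minimal sketch of the idea in plain Python, assuming toy dict-based "frames" instead of Dask DataFrames. The helpers recursive_pairwise, outer_join, join_before, join_after and tree_depth are hypothetical illustrations, not Dask's actual _recursive_pairwise_outer_join or DataFrame.join code. The sketch contrasts two shapes: reducing the other frames as a balanced tree and then joining the initial frame in one extra sequential step, versus treating the initial frame as just another leaf of the same tree.

```python
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")
# Hypothetical stand-in for a DataFrame: index -> {column: value}.
Frame = Dict[int, Dict[str, int]]


def recursive_pairwise(items: List[T], combine: Callable[[T, T], T]) -> T:
    """Reduce items as a balanced binary tree: split in half, combine the halves."""
    if not items:
        raise ValueError("need at least one item")
    if len(items) == 1:
        return items[0]
    middle = len(items) // 2
    return combine(
        recursive_pairwise(items[:middle], combine),
        recursive_pairwise(items[middle:], combine),
    )


def outer_join(left: Frame, right: Frame) -> Frame:
    """Toy outer join on the index: keep every key present on either side."""
    keys = sorted(left.keys() | right.keys())
    return {k: {**left.get(k, {}), **right.get(k, {})} for k in keys}


def join_before(first: Frame, others: List[Frame]) -> Frame:
    # The other frames form a tree, but `first` is joined in a final extra step.
    return outer_join(first, recursive_pairwise(others, outer_join))


def join_after(first: Frame, others: List[Frame]) -> Frame:
    # `first` is simply one more leaf of the same balanced tree.
    return recursive_pairwise([first] + others, outer_join)


def tree_depth(n: int) -> int:
    """Sequential combine levels recursive_pairwise needs for n leaves."""
    if n == 1:
        return 0
    middle = n // 2
    return 1 + max(tree_depth(middle), tree_depth(n - middle))


# Four toy frames with overlapping indexes and disjoint column names.
frames = [{i: {f"c{j}": j} for i in range(j, j + 2)} for j in range(4)]
assert join_before(frames[0], frames[1:]) == join_after(frames[0], frames[1:])
print(tree_depth(4), 1 + tree_depth(3))  # 2 3
```

For four frames, the "after" shape needs two sequential merge levels rather than three. Merges on the same level do not depend on each other, which is what would let a scheduler run them concurrently. Both shapes give the same result here only because the toy frames have disjoint column names.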
@GPUtester (Collaborator) commented:

Can one of the admins verify this patch?

@holdenk changed the title from "[WIP] Try and improve multi dataframe merges performance" to "Improve multi dataframe join performance" on Feb 19, 2022
@holdenk (Contributor, Author) commented Feb 19, 2022

cc @KrishanBhasin, who was the last to work on _recursive_pairwise_outer_join.

@KrishanBhasin (Contributor) left a comment

Oh nice! LGTM 🚀

@jsignell (Member) commented:

add to allowlist

@jsignell merged commit 1f8d2c1 into dask:main on Feb 22, 2022
@jsignell (Member) commented:

Thanks for this change @holdenk and thanks @KrishanBhasin for the review!



Development

Successfully merging this pull request may close these issues.

join should take maximal advantage of _recursive_pairwise_outer_join when present

4 participants