Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix divisions when joining two single-partition dataframes #4636

Merged
merged 5 commits into from Mar 27, 2019
Merged

Fix divisions when joining two single-partition dataframes #4636

merged 5 commits into from Mar 27, 2019

Conversation

@bluecoconut
Copy link
Contributor

@bluecoconut bluecoconut commented Mar 27, 2019

There was an error where if npartition=1 for both left and right on a merge, then the check inside of single_partition_join would short to using the right divisions, even if it was supposed to be left table's divisions that are used.

  • Tests added / passed
  • Passes flake8 dask

Note: the test I added first validated the bug (on previous code), and then was resolved with the fix in single_partition_join

This answers the bug identified in #4617

@bluecoconut bluecoconut changed the title Single parititon dask error Single parititon merge bug Mar 27, 2019
@@ -295,7 +295,7 @@ def single_partition_join(left, right, **kwargs):
else:
divisions = [None for _ in right.divisions]

elif right.npartitions == 1:
elif right.npartitions == 1 and kwargs['how'] in ('inner', 'left'):
Copy link
Member

@mrocklin mrocklin Mar 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there cases where neither of these two branches is taken?

Loading

Copy link
Contributor Author

@bluecoconut bluecoconut Mar 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the conditions required to enter single_partition_join method.

dask.dataframe.multi.py#370-371

Loading

Copy link
Contributor Author

@bluecoconut bluecoconut Mar 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    # Single partition on one side
    elif (left.npartitions == 1 and how in ('inner', 'right') or
          right.npartitions == 1 and how in ('inner', 'left')):
        return single_partition_join(left, right, how=how, right_on=right_on,
                                     left_on=left_on, left_index=left_index,
                                     right_index=right_index,
                                     suffixes=suffixes, indicator=indicator)

Loading

Copy link
Contributor Author

@bluecoconut bluecoconut Mar 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could add (if desired for future developer ease) an else: raise NotImplementedError... Not sure what best practices in this code base are. At the moment, the method is only called inside of that parent elif statement.

Loading

Copy link
Member

@mrocklin mrocklin Mar 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, sounds like it's safe then. An else: raise block seems reasonable to me. I'm happy to defer to your preferences here.

Loading

Copy link
Contributor Author

@bluecoconut bluecoconut Mar 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the raise error~

Loading

@mrocklin
Copy link
Member

@mrocklin mrocklin commented Mar 27, 2019

Thanks for taking this on @bluecoconut ! Sorry for the slow response on your issue. A combination of conferences and newborns I think is saturating most maintainer cycles these days :)

Loading

@mrocklin mrocklin changed the title Single parititon merge bug Fix divisions when joining two single-partition dataframes Mar 27, 2019
@mrocklin mrocklin merged commit a611e60 into dask:master Mar 27, 2019
2 checks passed
Loading
@mrocklin
Copy link
Member

@mrocklin mrocklin commented Mar 27, 2019

Thanks @bluecoconut !

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants