Fix divisions when joining two single-partition dataframes #4636
mrocklin merged 5 commits into dask:master from bluecoconut:single-parititon-dask-error
         divisions = [None for _ in right.divisions]

-    elif right.npartitions == 1:
+    elif right.npartitions == 1 and kwargs['how'] in ('inner', 'left'):
Are there cases where neither of these two branches is taken?
These are the conditions required to enter the single_partition_join method (dask.dataframe.multi.py#370-371):
    # Single partition on one side
    elif (left.npartitions == 1 and how in ('inner', 'right') or
          right.npartitions == 1 and how in ('inner', 'left')):
        return single_partition_join(left, right, how=how, right_on=right_on,
                                     left_on=left_on, left_index=left_index,
                                     right_index=right_index,
                                     suffixes=suffixes, indicator=indicator)
Could add (if desired for future developer ease) an else: raise NotImplementedError... Not sure what best practices in this code base are. At the moment, the method is only called inside of that parent elif statement.
Ah, ok, sounds like it's safe then. An else: raise block seems reasonable to me. I'm happy to defer to your preferences here.
Added the raise error~
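To illustrate the question above about whether both branches can be skipped, here is a minimal sketch that enumerates the cases. `caller_allows` mirrors the quoted `elif` condition; `branch_taken` is a hypothetical stand-in for the branch selection inside `single_partition_join`, not the actual dask code, and the error message text is an assumption.

```python
from itertools import product


def caller_allows(left_n, right_n, how):
    """Mirror of the quoted elif condition in the calling code."""
    return (left_n == 1 and how in ('inner', 'right') or
            right_n == 1 and how in ('inner', 'left'))


def branch_taken(left_n, right_n, how):
    """Hypothetical stand-in for the branch selection inside the join."""
    if left_n == 1 and how in ('inner', 'right'):
        return 'left-is-single'
    elif right_n == 1 and how in ('inner', 'left'):
        return 'right-is-single'
    else:
        # Defensive guard for future callers (message text is illustrative).
        raise NotImplementedError(
            "single-partition join does not support how=%r with "
            "npartitions (%d, %d)" % (how, left_n, right_n))


# Every combination the caller lets through reaches one of the two branches.
for left_n, right_n, how in product((1, 2), (1, 2),
                                    ('inner', 'left', 'right', 'outer')):
    if caller_allows(left_n, right_n, how):
        branch_taken(left_n, right_n, how)  # must not raise
```

Under these assumptions the loop completes without raising, so the `else` branch is only reachable if the function is called from somewhere other than the guarded `elif`.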
Thanks for taking this on @bluecoconut ! Sorry for the slow response on your issue. A combination of conferences and newborns I think is saturating most maintainer cycles these days :)
Thanks @bluecoconut !
There was an error where, if npartitions=1 for both left and right on a merge, the check inside single_partition_join would short-circuit to using the right divisions, even when the left table's divisions were supposed to be used.

flake8 dask

Note: the test I added first validated the bug (on previous code), and then was resolved with the fix in single_partition_join.

This answers the bug identified in #4617
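The bug described above can be sketched in plain Python. This is a simplified model, not the dask implementation: the function names are hypothetical, the pre-fix first branch is assumed to have been a bare `left.npartitions == 1` check, and the join is assumed to be on the index so that divisions are kept rather than reset to `None`.

```python
def divisions_before_fix(left_div, right_div, left_n, right_n, how):
    # Assumed pre-fix shape: neither branch looks at `how`, so when both
    # sides have one partition the first branch always wins.
    if left_n == 1:
        return right_div
    elif right_n == 1:
        return left_div


def divisions_after_fix(left_div, right_div, left_n, right_n, how):
    # Post-fix shape: each branch also checks the join type.
    if left_n == 1 and how in ('inner', 'right'):
        return right_div
    elif right_n == 1 and how in ('inner', 'left'):
        return left_div
    else:
        raise NotImplementedError("unsupported single-partition join")


# Illustrative divisions for two single-partition frames.
left_div, right_div = (0, 9), (100, 200)

# A left join keeps the left index, so the left divisions are expected.
buggy = divisions_before_fix(left_div, right_div, 1, 1, 'left')
fixed = divisions_after_fix(left_div, right_div, 1, 1, 'left')
```

In this model `buggy` holds the right-hand divisions while `fixed` holds the left-hand ones, which is the difference the added test is meant to catch.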