dask.bag.from_sequence : ensure that it always includes at least a single partition even if that partition is empty #4475

Merged
martindurant merged 2 commits into dask:master from andersy005:fix/from_empty_sequence on Feb 17, 2019

@andersy005
Contributor

commented Feb 12, 2019

This addresses #4321

andersy005 added some commits Feb 12, 2019

@martindurant

Member

commented Feb 12, 2019

Does npartitions==0 break functionality elsewhere? I can see that that case and this one would be semantically equivalent, and I am happy with how this is achieved. I am just wondering why the change was necessary.

@andersy005

Contributor Author

commented Feb 12, 2019

As pointed out in #4321, creating a dask dataframe from an empty bag with zero partitions currently gives an unexpected, inconsistent return type on compute (an empty tuple) instead of an empty dataframe:

In [1]: import dask.bag                                                      

In [2]: a = dask.bag.from_sequence([])                                       

In [3]: a                                                                    
Out[3]: dask.bag<from_se..., npartitions=0>

In [4]: df = a.to_dataframe(meta={'a': 'int'})                               

In [5]: df.compute()                                                         
Out[5]: ()
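
For illustration, the idea behind the fix can be sketched in plain Python. This is a minimal sketch, not dask's actual implementation: `partition_sequence` and its `partition_size` parameter are hypothetical names, and the only point is the fallback to a single empty partition when the input sequence is empty.

```python
from typing import Any, List, Sequence


def partition_sequence(seq: Sequence[Any], partition_size: int) -> List[List[Any]]:
    """Split ``seq`` into chunks of at most ``partition_size`` items.

    Hypothetical helper for illustration only; not dask's code.
    """
    if partition_size < 1:
        raise ValueError("partition_size must be at least 1")
    parts = [
        list(seq[i:i + partition_size])
        for i in range(0, len(seq), partition_size)
    ]
    # An empty input would otherwise yield zero partitions; fall back to
    # one empty partition so there is always something to map over.
    return parts or [[]]


print(partition_sequence([], 2))         # [[]]
print(partition_sequence([1, 2, 3], 2))  # [[1, 2], [3]]
```

With at least one (possibly empty) partition, a per-partition conversion always has a partition to run on, which is presumably what lets the empty case produce an empty dataframe rather than an empty tuple.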

@martindurant

Member

commented Feb 12, 2019

OK, that makes sense, and explains your test :)

+1

@martindurant martindurant merged commit 95c6750 into dask:master Feb 17, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed

@andersy005 andersy005 deleted the andersy005:fix/from_empty_sequence branch Feb 17, 2019

jorge-pessoa pushed a commit to jorge-pessoa/dask that referenced this pull request May 14, 2019

dask.bag.from_sequence : ensure that it always includes at least a single partition even if that partition is empty (dask#4475)

* dask.bag.from_sequence : ensure that it always includes at least a single partition

* Fix linting issues