dask Bag frequencies call generates error for empty partitions #1938

Closed
jspreston opened this issue Jan 27, 2017 · 2 comments

@jspreston
Contributor

Calling dask.bag.Bag.frequencies() raises an IndexError if the bag has empty partitions. The example below reproduces it:

import dask
import dask.bag as db
import numpy as np

bag = db.from_sequence(np.arange(100), partition_size=10)
freq = bag.filter(lambda x: x < 50).frequencies()
freq.compute()

generates:

IndexError: list index out of range

Traceback
---------
  File "dask/async.py", line 266, in execute_task
    result = _execute_task(task, data)
  File "dask/async.py", line 247, in _execute_task
    return func(*args2)
  File "dask/bag/core.py", line 1651, in empty_safe_aggregate
    return empty_safe_apply(func, parts2)
  File "dask/bag/core.py", line 1646, in empty_safe_apply
    return func(part)
  File "dask/bag/core.py", line 1452, in merge_frequencies
    first, rest = seqs[0], seqs[1:]
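
The traceback ends at `seqs[0]`, which suggests that `merge_frequencies` was handed an empty sequence. The pure-Python sketch below only illustrates that failure mode and one possible guard. It is not dask's code and not necessarily the fix in #1939; the names `merge_frequencies_naive` and `merge_frequencies_guarded` are hypothetical, and the inputs are assumed to be per-partition dicts of counts.

```python
from collections import Counter


def merge_frequencies_naive(seqs):
    """Merge per-partition count dicts; assumes seqs is non-empty."""
    # Same shape as the failing line in the traceback:
    # indexing seqs[0] raises IndexError when seqs is empty.
    first, rest = seqs[0], seqs[1:]
    out = Counter(first)
    for counts in rest:
        out.update(counts)  # Counter.update adds counts key by key
    return dict(out)


def merge_frequencies_guarded(seqs):
    """Hypothetical guard: return an empty result when there is nothing to merge."""
    if not seqs:
        return {}
    return merge_frequencies_naive(seqs)


if __name__ == "__main__":
    try:
        merge_frequencies_naive([])
    except IndexError as exc:
        print("naive version failed:", exc)

    print(merge_frequencies_guarded([]))
    print(merge_frequencies_guarded([{1: 2, 2: 1}, {1: 1}]))
```

Whether an empty input should yield an empty result or be filtered out earlier is a design choice; the guard here simply makes the empty case explicit.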
@jspreston
Contributor Author

Added a fix in pull request #1939.

jspreston pushed a commit to jspreston/dask that referenced this issue Jan 31, 2017
@jcrist
Member

jcrist commented Jan 31, 2017

Fixed by #1939. Closing.

@jcrist jcrist closed this as completed Jan 31, 2017
@sinhrks sinhrks added this to the 0.14.0 milestone Mar 4, 2017