Error when using a Numpy boolean mask on a Dask array #6089

@astrofrog

I am trying to extract values from a Dask array using a boolean NumPy array with the same shape as the Dask array, and am running into the following error:

In [8]: import numpy as np

In [9]: from dask.array import random

In [10]: array = random.random((4, 3, 2))

In [11]: mask = np.random.random((4, 3, 2)) > 0.5

In [12]: array[mask]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-12-38297b4b7b0f> in <module>
----> 1 array[mask]

~/python/dev/lib/python3.8/site-packages/dask/array/core.py in __getitem__(self, index)
   1537         )
   1538 
-> 1539         index2 = normalize_index(index, self.shape)
   1540 
   1541         dependencies = {self.name}

~/python/dev/lib/python3.8/site-packages/dask/array/slicing.py in normalize_index(idx, shape)
    813     for i, d in zip(idx, none_shape):
    814         if d is not None:
--> 815             check_index(i, d)
    816     idx = tuple(map(sanitize_index, idx))
    817     idx = tuple(map(normalize_slice, idx, none_shape))

~/python/dev/lib/python3.8/site-packages/dask/array/slicing.py in check_index(ind, dimension)
    867         if x.dtype == bool:
    868             if x.size != dimension:
--> 869                 raise IndexError(
    870                     "Boolean array length %s doesn't equal dimension %s"
    871                     % (x.size, dimension)

IndexError: Boolean array length 24 doesn't equal dimension 4
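The two numbers in the message can be reproduced with plain NumPy. The sketch below is only an illustration, not Dask's code: the names `values`, `mask_size`, and `first_dim` are made up for this example. It shows that 24 is the total number of elements in the mask (4 * 3 * 2) and 4 is the length of the first axis, which is consistent with the check in the traceback comparing the whole mask against a single dimension. NumPy itself accepts a mask of the full shape and returns a 1-D array.

```python
import numpy as np

shape = (4, 3, 2)
rng = np.random.default_rng(0)

# Stand-ins for the array and mask from the session above (hypothetical names).
values = rng.random(shape)
mask = rng.random(shape) > 0.5

# The two quantities that appear in the error message.
mask_size = mask.size        # total number of elements in the mask
first_dim = values.shape[0]  # length of the first axis only

print(mask_size, first_dim)

# Plain NumPy treats a full-shape boolean mask as an element-wise selection
# and returns a 1-D array with one entry per True value.
selected = values[mask]
print(selected.shape)
```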

Note that converting the mask to a Dask array first avoids the error:

In [14]: from dask.array import asarray

In [15]: array[asarray(mask)]
Out[15]: dask.array<getitem, shape=(nan,), dtype=float64, chunksize=(nan,), chunktype=numpy.ndarray>

In [16]: array[asarray(mask)].compute()
Out[16]:
array([0.87541677, 0.72561386, 0.80415762, 0.90781457, 0.41807167,
       0.88501242, 0.33921153, 0.5392456 , 0.31133289, 0.86372707,
       0.32839977, 0.77310795])

but I thought I should report this as a bug.
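For reference, here is a self-contained version of the workaround as a minimal sketch. It assumes `dask` and `numpy` are installed, and it swaps in a seeded NumPy array wrapped with `da.from_array` (instead of `dask.array.random`) so the result can be compared against plain NumPy indexing. The names `data`, `dask_mask`, and `result` and the chunking are chosen for illustration only.

```python
import numpy as np
import dask.array as da

rng = np.random.default_rng(0)

# Seeded NumPy data wrapped in a Dask array (illustrative chunking along axis 0).
data = rng.random((4, 3, 2))
array = da.from_array(data, chunks=(2, 3, 2))

# Boolean NumPy mask with the same shape as the array.
mask = rng.random((4, 3, 2)) > 0.5

# Workaround: wrap the mask in a Dask array, using the same chunks as `array`.
dask_mask = da.from_array(mask, chunks=array.chunks)

# The number of selected elements is unknown until compute time,
# so the lazy result reports its shape as (nan,).
selected = array[dask_mask]
print(selected)

result = selected.compute()
print(result.shape)
```

The computed result is a 1-D NumPy array holding one value per `True` entry in the mask.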
