dask.array indexing bug with slicing with small chunk size on large arrays #452

shoyer · 2015-07-21T18:41:36Z

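import dask.array as da
import numpy as np

# Plain NumPy returns the expected value
print(np.arange(15000)[12120:12170][4])
# 12124
# dask.array with chunks=1 returns a different, wrong value
print(da.from_array(np.arange(15000), chunks=1)[12120:12170][4].compute())
# 12164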

mrocklin · 2015-07-21T18:58:48Z

That is indeed odd. Checking it out now.

mrocklin · 2015-07-21T19:01:30Z

Perhaps more telling, here is the culled task graph for a similar case:

In [1]: import dask.array as da

In [2]: import numpy as np

In [3]: x = da.from_array(np.arange(10000), chunks=1)[8000:9000][4]

In [4]: x.compute()
Out[4]: array(8196)

In [5]: from dask.optimize import cull

In [6]: cull(x.dask, x._keys())
Out[6]: 
{'from-array-1': array([   0,    1,    2, ..., 9997, 9998, 9999]),
 ('from-array-1', 8196): (<function dask.array.core.getarray>,
  'from-array-1',
  (slice(8196, 8197, None),)),
 ('x_1', 4): (<function operator.getitem>,
  ('from-array-1', 8196),
  (slice(None, None, None),)),
 ('x_2',): (<function operator.getitem>, ('x_1', 4), (0,))}
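Reading this graph: with chunks=1, element 4 of the slice [8000:9000] should come from block 8000 + 4 = 8004, but ('x_1', 4) points at ('from-array-1', 8196). A minimal sketch of that arithmetic follows; expected_block is a hypothetical helper written for illustration and is not part of dask.

```python
def expected_block(slice_start, position, chunk_size=1):
    """Return (block index, offset within block) for element `position`
    of a slice starting at `slice_start` in a 1-D array with uniform chunks.

    Hypothetical helper for illustration; not a dask function.
    """
    return divmod(slice_start + position, chunk_size)


block, offset = expected_block(8000, 4)
print(block, offset)  # 8004 0

# The culled graph above uses block 8196 instead, 192 blocks too far.
print(8196 - block)  # 192
```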

shoyer · 2015-07-21T20:01:31Z

Secondary indexing is not even necessary:

>>> da.from_array(np.arange(10000), chunks=1)[8000:8200].compute()
array([8192, 8193, 8194, 8195, 8196, 8197, 8198, 8199, 8000, 8001, 8002,
       8003, 8004, 8005, 8006, 8007, 8008, 8009, 8010, 8011, 8012, 8013,
       8014, 8015, 8016, 8017, 8018, 8019, 8020, 8021, 8022, 8023, 8024,
       8025, 8026, 8027, 8028, 8029, 8030, 8031, 8032, 8033, 8034, 8035,
       8036, 8037, 8038, 8039, 8040, 8041, 8042, 8043, 8044, 8045, 8046,
       8047, 8048, 8049, 8050, 8051, 8052, 8053, 8054, 8055, 8056, 8057,
       8058, 8059, 8060, 8061, 8062, 8063, 8064, 8065, 8066, 8067, 8068,
       8069, 8070, 8071, 8072, 8073, 8074, 8075, 8076, 8077, 8078, 8079,
       8080, 8081, 8082, 8083, 8084, 8085, 8086, 8087, 8088, 8089, 8090,
       8091, 8092, 8093, 8094, 8095, 8096, 8097, 8098, 8099, 8100, 8101,
       8102, 8103, 8104, 8105, 8106, 8107, 8108, 8109, 8110, 8111, 8112,
       8113, 8114, 8115, 8116, 8117, 8118, 8119, 8120, 8121, 8122, 8123,
       8124, 8125, 8126, 8127, 8128, 8129, 8130, 8131, 8132, 8133, 8134,
       8135, 8136, 8137, 8138, 8139, 8140, 8141, 8142, 8143, 8144, 8145,
       8146, 8147, 8148, 8149, 8150, 8151, 8152, 8153, 8154, 8155, 8156,
       8157, 8158, 8159, 8160, 8161, 8162, 8163, 8164, 8165, 8166, 8167,
       8168, 8169, 8170, 8171, 8172, 8173, 8174, 8175, 8176, 8177, 8178,
       8179, 8180, 8181, 8182, 8183, 8184, 8185, 8186, 8187, 8188, 8189,
       8190, 8191])
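
The result above looks misordered rather than corrupted: every value from 8000 to 8199 is present, but the last eight single-element blocks come first. A small sketch that checks this against the printed output; the observed list is reconstructed from the output above, not recomputed with dask.

```python
# Values as printed above: 8192..8199 first, then 8000..8191.
observed = list(range(8192, 8200)) + list(range(8000, 8192))
expected = list(range(8000, 8200))

# No element is missing or duplicated; only the order is wrong.
print(sorted(observed) == expected)  # True

# Positions where the sequence steps backwards mark ordering breaks.
breaks = [i for i in range(1, len(observed)) if observed[i] < observed[i - 1]]
print(breaks)  # [8]

# Undoing the rotation at the break recovers the expected slice.
k = breaks[0]
print(observed[k:] + observed[:k] == expected)  # True
```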

mrocklin · 2015-07-21T20:02:08Z

Whoa, that's bizarre

mrocklin · 2015-07-21T20:09:44Z

I think I've tracked this down to how dicts order keys: we weren't calling sorted on the keys where we should have been.
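
To illustrate the failure mode described here, a minimal sketch that assembles blocks from a key-to-block mapping with and without sorting. This is not dask's code: the blocks dict and the key name "x" are hypothetical, and because Python 3.7+ dicts preserve insertion order, the keys are inserted in a rotated order on purpose to stand in for the arbitrary iteration order of older dicts.

```python
# Hypothetical stand-in for a task graph: one single-element block per key.
# Keys are inserted out of order to mimic arbitrary dict iteration order.
block_ids = list(range(8192, 8200)) + list(range(8000, 8192))
blocks = {("x", i): [i] for i in block_ids}

# Buggy: assemble blocks in whatever order the mapping yields its keys.
unsorted_result = [value for key in blocks for value in blocks[key]]

# Fixed: sort the keys first so blocks are assembled by position.
sorted_result = [value for key in sorted(blocks) for value in blocks[key]]

print(unsorted_result[:9])  # [8192, 8193, 8194, 8195, 8196, 8197, 8198, 8199, 8000]
print(sorted_result[:3], sorted_result[-1])  # [8000, 8001, 8002] 8199
```

Sorting works in this sketch because the keys are tuples whose second element is the block index, so tuple comparison orders them by position.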

shoyer changed the title ~~dask.array indexing bug with slicing followed by point selection~~ dask.array indexing bug with slicing with small chunk size on large arrays Jul 21, 2015

mrocklin added a commit to mrocklin/dask that referenced this issue Jul 21, 2015

Fix long slicing from dask.keys() sorted issue

d4bb563

Fixes dask#452

mrocklin mentioned this issue Jul 21, 2015

Fix long slicing from dask.keys() sorted issue #453

Merged

mrocklin closed this as completed in #453 Jul 21, 2015
