Add indices by jakirkham · Pull Request #2268 · dask/dask

jakirkham · 2017-04-28T00:01:14Z

Adds an equivalent function to numpy.indices.

Behaves like NumPy's function of the same name, except that it returns a Dask Array.

mrocklin · 2017-04-28T12:34:33Z

dask/array/creation.py

+            r = r[s]
+
+            for j in chain(range(i), range(i + 1, len(dimensions))):
+                r = r.repeat(dimensions[j], axis=j).rechunk({j: chunks[j + 1]})


The repeated rechunking here might be expensive in terms of both graph creation and NumPy copies.

Also a slight preference for writing `itertools.chain` rather than a bare `chain`.

Hrm, is it possible to get by with `arange` and slicing (as you've done above), and multiplying by `da.ones(full_shape, chunks=full_chunks)`? This might also handle the rechunking automatically. This multiplication is probably more expensive on the workers than `repeat` is, but I suspect that the difference isn't significant and that it's cheaper when constructing the graph.

I was initially worried because I saw some chunking by 1 in earlier code, but it seems `repeat` here actually results in chunks equal to the number of repeats. Once everything is stacked, the chunking appears to be right for every dimension except the first. The result of `stack` does still need to be rechunked on the first dimension, as that is chunked by 1.

I should also add that, based on some simple testing, it seems to go a little faster when the rechunk after `repeat` is dropped.

It appears that calling `repeat` here results in a chunk size that is equal to the number of repetitions. As such these repeated dimensions will be rechunked to the right chunk size automatically. Namely rechunking will be done according to the chunk size of all of the `arange` calls. However when it comes to `stack` rechunking is still required to return the intended chunk size.

Use `chain` from the module instead of importing the function directly.
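As a rough illustration of the `arange`-plus-`da.ones` idea raised in this thread, here is a minimal sketch. The helper name `indices_via_ones` and its signature are hypothetical and this is not the code that was merged; it assumes `chunks` gives one chunk length per output dimension.

```python
import numpy as np
import dask.array as da


def indices_via_ones(dimensions, chunks, dtype=int):
    """Hypothetical helper: build index grids by broadcasting, not repeat."""
    dimensions = tuple(dimensions)
    chunks = tuple(chunks)
    ndim = len(dimensions)

    # Template carrying the full shape and chunking of a single grid.
    template = da.ones(dimensions, chunks=chunks, dtype=dtype)

    grids = []
    for i, (n, c) in enumerate(zip(dimensions, chunks)):
        r = da.arange(n, chunks=c, dtype=dtype)
        # Keep axis i and insert length-1 axes everywhere else.
        s = tuple(slice(None) if j == i else None for j in range(ndim))
        # Broadcasting against the template expands the length-1 axes
        # and picks up the template's chunks along them.
        grids.append(r[s] * template)

    # Stacking adds a leading axis with one chunk per grid.
    return da.stack(grids)


grid = indices_via_ones((4, 6), chunks=(2, 3))
print(grid.shape)
print(grid.chunks)
```

Comparing `grid.compute()` against `np.indices((4, 6))` is a quick way to check that the values agree. Whether this is actually cheaper than `repeat` for graph construction is the open question in the comment above, and the sketch does not measure it.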

jakirkham · 2017-04-28T14:24:00Z

dask/array/creation.py

+            grid.append(r)
+
+    if grid:
+        grid = stack(grid).rechunk({0: chunks[0]})


I suppose we could actually have the chunks just be 1 here. Then the user would only specify chunks for the rest of the array. That way there is no need to rechunk. Thoughts?

The counterpoint is that it could be confusing for the user if the chunks they specify don't have the same dimensionality as the expected shape.

So we have k arrays of the same shape and we want to stack them along a new axis. The question is whether we should chunk those arrays together or leave them as is? If that's the question, then leaving them as is seems fine to me.

Agreed. That simplifies some code as well. Have pushed a change.

jakirkham · 2017-04-28T14:26:52Z

Have addressed your comments @mrocklin. Also have raised a question of my own above.

This seems to be passing except for one failure due to the test_interrupt issue (#2192).

The `indices` function was chunking the 0th dimension based on the user-supplied `chunks` argument. Following some discussion, it was decided that chunking along this dimension should simply be skipped. After all, users can easily rechunk it themselves if they want. This also simplifies a bit of code in the process. So now the chunks are only used for the `arange` function calls and thus determine the chunking of the 1st dimension onwards. The 0th dimension keeps chunks of length 1.
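To make the resulting behavior concrete, the following sketch calls `da.indices` as described above, assuming a Dask version in which `chunks` takes one entry per entry of `dimensions`. The shape and chunk sizes are arbitrary values chosen for illustration.

```python
import numpy as np
import dask.array as da

# `chunks` has one entry per requested dimension, not per output axis.
grid = da.indices((4, 6), chunks=(2, 3))

print(grid.shape)      # leading axis has one entry per dimension
print(grid.chunks[0])  # chunking of the stacked (0th) axis
print(grid.chunks[1:]) # chunking taken from the `chunks` argument

# A user who wants the 0th axis in a single chunk can rechunk it directly.
merged = grid.rechunk({0: 2})
print(merged.chunks[0])
```

The values themselves should match `np.indices((4, 6))` once computed; only the chunk layout is specific to Dask.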

jakirkham · 2017-04-28T16:36:54Z

dask/array/creation.py

    return Array(dsk, name, chunks, dtype=dtype)
+
+
+@wraps(np.indices)


I basically got this from copy-n-pasting something from dask.array.core. Is there any value in having this for Dask beyond getting the docstring? If not, was planning to replace the docstring given our parameters differ.

Replacing with a custom docstring is preferred if you're willing.

Cool, already have something ready to go. 😉
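For context on why a custom docstring is preferable when the parameters differ, here is a small sketch assuming the decorator behaves like `functools.wraps`. The functions `reference`, `wrapped`, and `documented` are hypothetical stand-ins, not code from this PR.

```python
from functools import wraps


def reference(dimensions, dtype=int):
    """Reference docstring describing only `dimensions` and `dtype`."""


@wraps(reference)
def wrapped(dimensions, dtype=int, chunks=None):
    # The extra `chunks` parameter is not mentioned in the copied docstring.
    return dimensions, dtype, chunks


def documented(dimensions, dtype=int, chunks=None):
    """Custom docstring that also describes `chunks`."""
    return dimensions, dtype, chunks


# `wraps` copies metadata such as __name__ and __doc__ from `reference`.
print(wrapped.__name__)
print(wrapped.__doc__)
print(documented.__doc__)
```

With `wraps`, the copied docstring says nothing about the extra parameter, which is the mismatch the custom docstring avoids.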

jakirkham · 2017-04-28T19:00:28Z

Well I can't think of anything else needed on my end. Anything else you'd like to see @mrocklin or is this good to go?

jakirkham · 2017-04-28T19:25:17Z

Hmm... not sure why the AppVeyor tests mysteriously stopped, but it seems unrelated. Unfortunately, the log provides no more information.

ref: https://ci.appveyor.com/project/daskdev/dask/build/1.0.1180#L467

jakirkham · 2017-04-28T21:14:21Z

Repushed and it seems everything is green.

mrocklin · 2017-04-29T00:10:15Z

Thanks @jakirkham !

jakirkham · 2017-04-29T18:35:57Z

Thanks @mrocklin. Just realized I forgot to add this to the docs. Fixing that with PR ( #2275 ).

jakirkham mentioned this pull request Apr 28, 2017

Adding indices #2267

Closed

jakirkham force-pushed the add_indices branch 4 times, most recently from 1903f8c to 36af208 Compare April 28, 2017 00:44

Add an indices function

d67baa3

Behaves like NumPy's equivalently named function except with Dask Arrays instead.

jakirkham force-pushed the add_indices branch 4 times, most recently from 1cbe508 to 115d49c Compare April 28, 2017 01:48

Include some tests for the new indices function

92392c4

jakirkham force-pushed the add_indices branch from 115d49c to 92392c4 Compare April 28, 2017 01:51

mrocklin reviewed Apr 28, 2017


jakirkham added 2 commits April 28, 2017 09:20

Simply import itertools

3d29e50

Use `chain` from the module instead of importing the function directly.

jakirkham commented Apr 28, 2017


Document the indices function

e117c6b

jakirkham force-pushed the add_indices branch from aebda72 to e117c6b Compare April 28, 2017 20:36

mrocklin merged commit ee3b1ed into dask:master Apr 29, 2017

jakirkham deleted the add_indices branch April 29, 2017 18:30

jakirkham mentioned this pull request Apr 29, 2017

Document the indices function #2275

Merged

jakirkham mentioned this pull request May 9, 2017

Implement fourier_shift with Dask Array dask-image/dask-ndfourier#3

Merged

sinhrks added this to the 0.14.2 milestone May 11, 2017
