Behaves like NumPy's equivalently named function, but works with Dask Arrays.
dask/array/creation.py
Outdated
r = r[s]

for j in chain(range(i), range(i + 1, len(dimensions))):
    r = r.repeat(dimensions[j], axis=j).rechunk({j: chunks[j + 1]})
The repeated rechunking here might be expensive in terms of both graph creation and numpy copies.
Also, a slight preference to use itertools.chain rather than chain.
Hrm, is it possible to get by with arange and slicing (as you've done above) and multiplying by da.ones(full_shape, chunks=full_chunks)? This might also handle the rechunking automatically. This multiplication is probably more expensive on the workers than repeat is, but I suspect that the difference isn't significant and that it's cheaper when constructing the graph.
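To make the suggestion above concrete, here is a minimal NumPy-only sketch of the arithmetic. It is an illustration, not the code from this PR: an arange sliced so that it broadcasts along one axis, multiplied by an array of ones, reproduces one plane of np.indices. The names `shape` and `planes` are made up for the example; a Dask version would presumably swap in da.arange and da.ones with the desired chunks.

```python
import numpy as np

shape = (3, 4)  # hypothetical grid shape, chosen only for illustration

planes = []
for i, n in enumerate(shape):
    # Index with None on every axis except axis i,
    # e.g. (slice(None), None) when i == 0.
    s = [None] * len(shape)
    s[i] = slice(None)
    r = np.arange(n)[tuple(s)]

    # Broadcasting against ones fills in the remaining axes,
    # so no explicit repeat() call is needed.
    planes.append(r * np.ones(shape, dtype=int))

# Stack the per-axis planes along a new leading axis.
grid = np.stack(planes)
```

The stacked result should match np.indices(shape), which is the property the multiplication approach relies on.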
Was initially worried as I saw some chunking by 1 in earlier code, but it seems repeat here actually results in chunks equal to the number of repeats. Once everything is stacked, it does appear to get the right chunking for everything except the first dimension. However, stack does need to be rechunked on the first dimension as that is chunked by 1.
Should also add that, based on some simple testing, it does seem to go a little faster when the rechunking after repeat is dropped.
It appears that calling `repeat` here results in a chunk size that is equal to the number of repetitions. As such, these repeated dimensions will be rechunked to the right chunk size automatically. Namely, rechunking will be done according to the chunk sizes of all of the `arange` calls. However, when it comes to `stack`, rechunking is still required to return the intended chunk size.
Use `chain` from the module instead of importing the function directly.
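For reference, a small sketch of what the chained ranges in the loop compute, written with the module-qualified call. The helper name `other_axes` is hypothetical and exists only for this example.

```python
import itertools


def other_axes(i, ndim):
    """Return every axis index in range(ndim) except i, in order."""
    # range(i) covers the axes before i, range(i + 1, ndim) the axes after it.
    return list(itertools.chain(range(i), range(i + 1, ndim)))
```

Calling `itertools.chain` through the module keeps it obvious at the call site where `chain` comes from.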
dask/array/creation.py
Outdated
grid.append(r)

if grid:
    grid = stack(grid).rechunk({0: chunks[0]})
I suppose we could actually have chunks just be 1 here. Then the user would only specify chunks for the rest of the array. That way there is no need to rechunk. Thoughts?
The counterpoint would be that it could confuse users to specify chunks whose dimensionality differs from that of the expected shape.
So we have k arrays of the same shape and we want to stack them along a new axis. The question is if we should chunk those arrays together or leave them as is? If so then leaving them as is seems fine to me.
Agreed. That simplifies some code as well. Have pushed a change.
The `indices` function was chunking the 0th dimension based on the user-supplied `chunks` argument. Following some discussion, it was decided that chunking along this dimension should simply be skipped. After all, users can easily do this themselves if they want, and it simplifies a bit of code in the process. So now `chunks` is only used for the `arange` function calls, which means it determines the chunking from the 1st dimension onwards. The 0th dimension keeps chunks of length 1.
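A short usage sketch of the behaviour described above, assuming the function is exposed as `da.indices` and accepts one chunk size per entry of the requested shape. The shape and chunk sizes are arbitrary values picked for illustration.

```python
import dask.array as da
import numpy as np

# `chunks` describes only the requested shape (4, 6), not the stacked axis.
grid = da.indices((4, 6), chunks=(2, 3))

# The leading (stacked) axis is left in chunks of length 1.
leading_chunks = grid.chunks[0]

# The remaining axes follow the user-supplied chunk sizes.
trailing_chunks = grid.chunks[1:]
```

A caller who wants the leading axis grouped into one chunk can rechunk afterwards, for example with `grid.rechunk({0: -1})`.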
dask/array/creation.py
Outdated
return Array(dsk, name, chunks, dtype=dtype)


@wraps(np.indices)
I basically got this from copy-n-pasting something from dask.array.core. Is there any value in having this for Dask beyond getting the docstring? If not, was planning to replace the docstring given our parameters differ.
Replacing with a custom docstring is preferred if you're willing.
Cool, already have something ready to go. 😉
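To illustrate the trade-off discussed here, the following sketch uses the standard library's `functools.wraps`, on the assumption that the `wraps` helper in the diff behaves similarly. All three function names are hypothetical stand-ins, not Dask or NumPy code.

```python
import functools


def reference(dimensions, dtype=int):
    """Docstring of the function being mirrored."""


@functools.wraps(reference)
def wrapped(dimensions, dtype=int, chunks=None):
    # The copied docstring knows nothing about the extra `chunks` parameter.
    return dimensions, dtype, chunks


def custom(dimensions, dtype=int, chunks=None):
    """Docstring written for this signature, including ``chunks``."""
    return dimensions, dtype, chunks
```

Here `wrapped` inherits the name and docstring of `reference`, so its documentation says nothing about `chunks`, whereas `custom` documents the signature it actually has.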
Well, I can't think of anything else needed on my end. Anything else you'd like to see, @mrocklin, or is this good to go?
Hmm...not sure why the AppVeyor tests mysteriously stopped, but it seems unrelated. Unfortunately, no more information is provided by the log. ref: https://ci.appveyor.com/project/daskdev/dask/build/1.0.1180#L467
Repushed and it seems everything is green.
Thanks @jakirkham !
Fixes #2267
Adds an equivalent function to numpy.indices.