Skip to content

Array: non-dask-arrays and broadcasting in ravel_multi_index#7594

Merged
jrbourbeau merged 9 commits intodask:mainfrom
gjoseph92:array/ravel_multi_index_non_array_args
Jun 9, 2021
Merged

Array: non-dask-arrays and broadcasting in ravel_multi_index#7594
jrbourbeau merged 9 commits intodask:mainfrom
gjoseph92:array/ravel_multi_index_non_array_args

Conversation

@gjoseph92
Copy link
Copy Markdown
Collaborator

@gjoseph92 gjoseph92 commented Apr 23, 2021

ravel_multi_index assumed the input was always a dask array (#7580). This fixes that, but also resolves other things that diverged from the NumPy implementation. Now:

  • Each index array can be >1D
  • Index arrays are broadcast to a common shape first, so they can have different numbers of dimensions (including scalars)

With a little bit more thinking, I'm sure we could get this to use blockwise for better graph performance—it just wasn't working on the multidimensional/broadcasting case. But hopefully, we can rewrite broadcast_arrays and stack soon to use blockwise, so we can use these higher-level operations and get the benefits.

@Illviljan sorry, I found this issue independently and didn't see #7584 until I had already written all this... how does this seem to you?

cc @jrbourbeau @dougiesquire

Note: requires #7593 for tests to pass.

@github-actions github-actions bot added the array label Apr 23, 2021
Copy link
Copy Markdown
Contributor

@Illviljan Illviljan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems more thought out than my version so I think we'll go with this one.

gjoseph92 and others added 6 commits April 26, 2021 12:46
Fixes dask#7580, but is still an incomplete fix: `ravel_multi_index` doesn't correctly broadcast the index arrays against each other like NumPy does, and gets the output chunks/shape wrong when index arrays are multidimensional.

NOTE: tests require dask#7593 to pass.
Works for all cases besides when some input arrays require broadcasting. Also the logic when multi_index is arraylike isn't working.
This reverts commit e1df54e.

Hopefully we can rewrite `broadcast`-style functions, plus `stack`,
to use Blockwise/HLGs and get some optimizations for them, so eventually
the non-blockwise implementation will be just as performant.
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
Copy link
Copy Markdown
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gjoseph92! One small comment, otherwise this looks great

@gjoseph92
Copy link
Copy Markdown
Collaborator Author

@jrbourbeau good point, I've now left it to downstream methods be restrictive about unknown chunksizes.

All green except for one DataFrame test_pivot_table[sum-values1] test on 3.8. Seems flaky, since that should be totally unrelated?

Copy link
Copy Markdown
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, agreed that looks unrelated. Just restarted CI to see if the failure is transient

@jrbourbeau
Copy link
Copy Markdown
Member

jrbourbeau commented May 6, 2021

I've restarted CI a few times and test_pivot_table appears to be failing consistently. I also opened a separate PR (xref #7629) to test CI against the current main branch and didn't see any test failures. That said, I don't see how the changes here are related to test_pivot_table. Could I ask you investigate a bit further before we merge this?

@gjoseph92
Copy link
Copy Markdown
Collaborator Author

Yup, I'll dig into it more. This is very strange.

Copy link
Copy Markdown
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just merged main into this PR and test_pivot_table is now passing ✨. I'm going to merge this and keep an eye out for any test_pivot_table failures. Thanks @gjoseph92!

@jrbourbeau jrbourbeau merged commit 35c926c into dask:main Jun 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

dask.array.ravel_multi_index doesn't replicate numpy API

3 participants