Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for map_blocks with no array arguments #4713

Merged
merged 4 commits into from Apr 24, 2019

Conversation

Projects
None yet
2 participants
@bmerry
Copy link
Contributor

commented Apr 18, 2019

This comprises multiple pieces:

  • Make unify_chunks work correctly when given only constants

  • If there are no arguments, consider the output to be zero-dimensional
    prior to application of new_axis.

  • Allow blockwise to take a list of chunk sizes in new_axes instead of
    just a length.

  • Add an entry to block_info to describe the output.

  • Tests added / passed

  • Passes flake8 dask

Support for map_blocks with no array arguments
This comprises multiple pieces:
- Make `unify_chunks` work correctly when given only constants
- If there are no arguments, consider the output to be zero-dimensional
  prior to application of `new_axis`.
- Allow blockwise to take a list of chunk sizes in `new_axes` instead of
  just a length.
- Add an entry to `block_info` to describe the output.
@bmerry

This comment has been minimized.

Copy link
Contributor Author

commented Apr 18, 2019

@jakirkham @mrocklin this is an alternative to #4476 which makes it possible to achieve the same things using map_blocks. It requires slightly more boilerplate to achieve the same thing, and maybe we'd still want to add a from_block_function on top of it, but the implementation is a lot simpler since it's just dealing with some special cases in the existing machinery. Here's an example:

    >>> def func(block_info=None):
    ...     loc = block_info[None]['array-location'][0]
    ...     return np.arange(loc[0], loc[1])

    >>> da.map_blocks(func, new_axis=[0], chunks=((4, 4),), dtype=np.float_)
    dask.array<func, shape=(8,), dtype=float64, chunksize=(4,)>

    >>> _.compute()
    array([0, 1, 2, 3, 4, 5, 6, 7])
@mrocklin
Copy link
Member

left a comment

Things here look reasonable to me. Merging in 24 hours if there are no further comments.

I did leave a couple comments for potential improvement, one of which might be easy to accomplish short term, but there's no obligation here (and I suspect that you'd like to see things merged finally :))


x = da.blockwise(f, 'ab', np.float32, None, new_axes={'a': 2, 'b': (3, 3)}, dtype=np.float32)
assert x.chunks == ((2,), (3, 3))
assert_eq(x, np.ones((2, 6), np.float32))

This comment has been minimized.

Copy link
@mrocklin

mrocklin Apr 22, 2019

Member

It's quite cool to see these. In the future I wonder if we might be able to simplify operations like da.ones and da.random to use blockwise.

This comment has been minimized.

Copy link
@bmerry

bmerry Apr 22, 2019

Author Contributor

Quite possibly.

One thing I've found slightly annoying with this interface is that there isn't a field that directly tells you the shape you need to output; you have to compute it by taking the difference of the lower and upper corners from array-location. Do you think it's worth adding another field to the dict for this? Another candidate would be the dtype, which would make it easier to write generic callback functions without having to pass the dtype both to the callback function and to map_blocks itself (although I'm not sure how dtype inference would work).


x = da.map_blocks(func, np.float32, new_axis=[0], chunks=((5, 3),), dtype=np.float32)
assert x.chunks == ((5, 3),)
assert_eq(x, np.arange(8, dtype=np.float32))

This comment has been minimized.

Copy link
@mrocklin

mrocklin Apr 22, 2019

Member

I tried using this naively and came up with.

In [26]: def f():
    ...:     return np.ones((2, 3), np.float32)
    ...:

In [27]: x = da.map_blocks(f, chunks=((2, 2, 2), (3, 3)))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-27-f1c3322fbb11> in <module>
----> 1 x = da.map_blocks(f, chunks=((2, 2, 2), (3, 3)))

~/workspace/dask/dask/array/core.py in map_blocks(func, *args, **kwargs)
    523         if len(chunks) != len(out.numblocks):
    524             raise ValueError("Provided chunks have {0} dims, expected {1} "
--> 525                              "dims.".format(len(chunks), len(out.numblocks)))
    526         chunks2 = []
    527         for i, (c, nb) in enumerate(zip(chunks, out.numblocks)):

ValueError: Provided chunks have 2 dims, expected 0 dims.

I then realized that I needed to provide the new_axis= keyword argument (by looking at this test)

x = da.map_blocks(f, chunks=((2, 2, 2), (3, 3)), new_axis=[0, 1])

But I can imagine that it'd be nicer not to have to do this. No obligation to handle this, but do you have any thoughts on nice ways to improve this for users?

This comment has been minimized.

Copy link
@bmerry

bmerry Apr 22, 2019

Author Contributor

That bugged me too. I wanted to come up with something generic rather than a special rule for the zero-input case.

How about the following: if chunks is specified and new_axis is not specified, then new_axis will default to insert the appropriate number of axes on the left (inserting new axes on the left is consistent with numpy's broadcasting rules for mixed-dimensionality operations).

If you're happy with that, I'll check the code to figure out what "appropriate number" actually means in code, and make sure it interacts sensibly with drop_axis.

This comment has been minimized.

Copy link
@mrocklin

mrocklin Apr 22, 2019

Member

That sounds sensible to me (though I wouldn't be surprised if there were odd complications) I say lets go for it! (assuming you're willing to do all the work :) )

@bmerry

This comment has been minimized.

Copy link
Contributor Author

commented Apr 22, 2019

and I suspect that you'd like to see things merged finally

I'm not in a mad rush - let's do it right the first time. For example, are you happy with the choice of None as the block_info key for the output? Would it be worth introducing a constant for it (similar to how np.newaxis is None but makes code clearer)?

@mrocklin

This comment has been minimized.

Copy link
Member

commented Apr 22, 2019

For example, are you happy with the choice of None as the block_info key for the output? Would it be worth introducing a constant for it (similar to how np.newaxis is None but makes code clearer)?

My first reaction was to use 'out' actually, but decided not to mention it and instead defer to your aesthetic. I don't think that introducing a new constant like np.newaxis for this makes sense. It's not widely enough used that people are likely to come across it and understand what it means.

@bmerry

This comment has been minimized.

Copy link
Contributor Author

commented Apr 22, 2019

My first reaction was to use 'out' actually, but decided not to mention it and instead defer to your aesthetic.

I also considered using a string like this, but it's at least theoretically possible that someone has a kwarg array called out and this would clash with it.

@mrocklin

This comment has been minimized.

Copy link
Member

commented Apr 22, 2019

Good point. Lets just stick with None then, and maybe leave the constant for future discussion?

@bmerry

This comment has been minimized.

Copy link
Contributor Author

commented Apr 22, 2019

Good point. Lets just stick with None then, and maybe leave the constant for future discussion?

Sounds good. I'll take a look at the new_axis thing this week.

bmerry added some commits Apr 23, 2019

map_blocks: infer new_axis from chunks
If chunks is specified and new_axis is not, new_axis defaults to
inserting axes on the left to make the output dimension match the length
of chunks. This is mainly expected to be used for zero-input map_blocks.

This made one of the existing tests fail because it was expecting a
ValueError in a case that is now legal. Changed the test to test the
case where `chunks` has too few dimensions instead of too many, and also
fixed a bug in `map_blocks` where `chunks=()` was being treated as if
`chunks` wasn't supplied.
@bmerry

This comment has been minimized.

Copy link
Contributor Author

commented Apr 23, 2019

I've added two commits:

  • 862eda7 adds the new_axis inference
  • b08f0ee adds extra convenience fields to the output block_info. Let me know what you think - if you're happy with it I should add a test that the right information is being passed.

@bmerry bmerry referenced this pull request Apr 23, 2019

Closed

Add from_block_function to array API #4476

2 of 3 tasks complete

bmerry referenced this pull request Apr 24, 2019

Update block_info tests for output info
This includes fixing a bug from #4301 where `old_k` has an off-by-one
error (and the unit test didn't actually run any of the assertions,
which is how it slipped through).
@bmerry

This comment has been minimized.

Copy link
Contributor Author

commented Apr 24, 2019

@mrocklin I think this should be ready to go now.

@mrocklin

This comment has been minimized.

Copy link
Member

commented Apr 24, 2019

Thanks @bmerry for all of your work here . Merging in!

@mrocklin mrocklin merged commit 21f8de8 into dask:master Apr 24, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@bmerry bmerry deleted the bmerry:map_blocks_no_inputs branch Apr 25, 2019

jorge-pessoa pushed a commit to jorge-pessoa/dask that referenced this pull request May 14, 2019

Support for map_blocks with no array arguments (dask#4713)
* Support for map_blocks with no array arguments

This comprises multiple pieces:
- Make `unify_chunks` work correctly when given only constants
- If there are no arguments, consider the output to be zero-dimensional
  prior to application of `new_axis`.
- Allow blockwise to take a list of chunk sizes in `new_axes` instead of
  just a length.
- Add an entry to `block_info` to describe the output.

* map_blocks: infer new_axis from chunks

If chunks is specified and new_axis is not, new_axis defaults to
inserting axes on the left to make the output dimension match the length
of chunks. This is mainly expected to be used for zero-input map_blocks.

This made one of the existing tests fail because it was expecting a
ValueError in a case that is now legal. Changed the test to test the
case where `chunks` has too few dimensions instead of too many, and also
fixed a bug in `map_blocks` where `chunks=()` was being treated as if
`chunks` wasn't supplied.

* Add 'chunk-shape' and 'dtype' to map_blocks output block_info

* Update block_info tests for output info

This includes fixing a bug from dask#4301 where `old_k` has an off-by-one
error (and the unit test didn't actually run any of the assertions,
which is how it slipped through).

Thomas-Z added a commit to Thomas-Z/dask that referenced this pull request May 17, 2019

Support for map_blocks with no array arguments (dask#4713)
* Support for map_blocks with no array arguments

This comprises multiple pieces:
- Make `unify_chunks` work correctly when given only constants
- If there are no arguments, consider the output to be zero-dimensional
  prior to application of `new_axis`.
- Allow blockwise to take a list of chunk sizes in `new_axes` instead of
  just a length.
- Add an entry to `block_info` to describe the output.

* map_blocks: infer new_axis from chunks

If chunks is specified and new_axis is not, new_axis defaults to
inserting axes on the left to make the output dimension match the length
of chunks. This is mainly expected to be used for zero-input map_blocks.

This made one of the existing tests fail because it was expecting a
ValueError in a case that is now legal. Changed the test to test the
case where `chunks` has too few dimensions instead of too many, and also
fixed a bug in `map_blocks` where `chunks=()` was being treated as if
`chunks` wasn't supplied.

* Add 'chunk-shape' and 'dtype' to map_blocks output block_info

* Update block_info tests for output info

This includes fixing a bug from dask#4301 where `old_k` has an off-by-one
error (and the unit test didn't actually run any of the assertions,
which is how it slipped through).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.