Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add draft of array best practices #4705

Merged
merged 9 commits into from Apr 30, 2019
Merged

Conversation

@mrocklin
Copy link
Member

@mrocklin mrocklin commented Apr 16, 2019

Direct edits to this would be particularly welcome, either as pushes to this branch (for people who have permission) or as comments.

  • Tests added / passed
  • Passes flake8 dask
Copy link
Contributor

@djhoese djhoese left a comment

I don't have permissions on this repository so just made some comments. Looks really good. Thanks for adding this information. I think it will be a really helpful resource.

docs/source/array-best-practices.rst Outdated Show resolved Hide resolved
docs/source/array-best-practices.rst Outdated Show resolved Hide resolved
docs/source/array-best-practices.rst Outdated Show resolved Hide resolved
@mrocklin
Copy link
Member Author

@mrocklin mrocklin commented Apr 16, 2019

Thanks @djhoese . I've integrated your suggestions.

@mrocklin
Copy link
Member Author

@mrocklin mrocklin commented Apr 16, 2019

I'd also encourage you to think about new sections that might be appropriate here, if you have time.

@djhoese
Copy link
Contributor

@djhoese djhoese commented Apr 16, 2019

I think I mentioned this in the other thread but may have forgotten, not using nested functions as callbacks. I'm not sure if there is a more general name for this rule but doing the following works in a threaded scheduler but not in others:

def my_processing(dask_arr):
    def my_block_func(chunk_arr, arg1, arg2):
        # complex logic
        return result_arr
    return dask_arr.map_blocks(my_block_func, 5, 6)

The my_block_func should be moved to a global scope.

Copy link
Contributor

@rabernat rabernat left a comment

A few minor suggestions and additions. I have another larger suggestion that I will add via PR to your branch.

docs/source/array-best-practices.rst Outdated Show resolved Hide resolved
docs/source/array-best-practices.rst Show resolved Hide resolved
>>> x = da.from_array(storage, chunks=(1280, 6400))
Note that if you provide ``chunks='auto'`` then Dask Array will look for a
``.chunks`` attribute and use that to provide a good chunking.
Copy link
Contributor

@rabernat rabernat Apr 17, 2019

Unrelated to this PR, but we should make sure that xarray is passing this .chunks attribute properly to dask for auto-chunking with netCDF and zarr storage.

Copy link
Member

@quasiben quasiben Apr 29, 2019

@rabernat have you verified that xarray is doing the correct thing (passing chunks) ? Should we open another issue ?

Copy link
Member

@jakirkham jakirkham Apr 30, 2019

It sounds like this should be raised in Xarray. Though would be happy to follow along in that issue.

docs/source/array-best-practices.rst Outdated Show resolved Hide resolved
@mrocklin mrocklin changed the title [WIP] Add draft of array best practices Add draft of array best practices Apr 27, 2019
@mrocklin
Copy link
Member Author

@mrocklin mrocklin commented Apr 27, 2019

See #4745 for master best practices

@quasiben
Copy link
Member

@quasiben quasiben commented Apr 29, 2019

This has been lingering for a bit. Anything else that needs to be done before this is merged in ?

@mrocklin
Copy link
Member Author

@mrocklin mrocklin commented Apr 30, 2019

Fixes #4514

@mrocklin
Copy link
Member Author

@mrocklin mrocklin commented Apr 30, 2019

@jakirkham can I ask you to take a look here and merge in if things look ok enough?

Co-Authored-By: mrocklin <mrocklin@gmail.com>
jakirkham and others added 2 commits Apr 30, 2019
Co-Authored-By: mrocklin <mrocklin@gmail.com>
Co-Authored-By: mrocklin <mrocklin@gmail.com>
.. autosummary::
map_blocks
reduction
map_overlap
Copy link
Member

@jakirkham jakirkham Apr 30, 2019

Minor suggestion would be to group map_blocks and map_overlap together. Though no strong feelings if that is opposed.

Note: Sorry for not adding a suggested change. GitHub had trouble doing this correctly here.

Copy link
Member Author

@mrocklin mrocklin Apr 30, 2019

Yeah, the single-line restriction is limiting. I'll make the change and push up normally.

Copy link
Member

@jakirkham jakirkham left a comment

Generally looks great! Very helpful. Certainly know a few people that would benefit from having something like this as reference. Thanks for working on it @mrocklin !

Made a few minor comments inline with suggested code changes. Though should be pretty easy to go through.

jorge-pessoa pushed a commit to jorge-pessoa/dask that referenced this issue May 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

7 participants