Add draft of array best practices #4705

Merged
merged 9 commits into from Apr 30, 2019

7 participants
@mrocklin
Member

commented Apr 16, 2019

Direct edits to this would be particularly welcome, either as pushes to this branch (for people who have permission) or as comments.

  • Tests added / passed
  • Passes flake8 dask
@djhoese
Contributor

left a comment

I don't have permissions on this repository, so I just left some comments. Looks really good. Thanks for adding this information; I think it will be a really helpful resource.

@mrocklin

Member Author

commented Apr 16, 2019

Thanks @djhoese. I've integrated your suggestions.

@mrocklin

Member Author

commented Apr 16, 2019

I'd also encourage you to think about new sections that might be appropriate here, if you have time.

@djhoese

Contributor

commented Apr 16, 2019

I think I mentioned this in the other thread, but may have forgotten: avoid using nested functions as callbacks. I'm not sure whether there is a more general name for this rule, but the following works with the threaded scheduler and not with the others:

def my_processing(dask_arr):
    def my_block_func(chunk_arr, arg1, arg2):
        # complex logic
        return result_arr
    return dask_arr.map_blocks(my_block_func, 5, 6)

The my_block_func function should be moved to global (module-level) scope.
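
As a minimal sketch of that suggestion, the callback can be defined at module level and only referenced from my_processing. The arithmetic below is a placeholder standing in for the elided complex logic, and the `dtype=` argument is an addition made here so that Dask does not need to infer the output type; neither comes from the original comment.

```python
import dask.array as da


def my_block_func(chunk_arr, arg1, arg2):
    # Placeholder for the "complex logic" (an assumption for illustration):
    # scale each chunk by arg1 and shift it by arg2.
    return chunk_arr * arg1 + arg2


def my_processing(dask_arr):
    # The callback now lives at module level rather than being nested here.
    # Extra positional arguments (5, 6) are forwarded to my_block_func.
    return dask_arr.map_blocks(my_block_func, 5, 6, dtype=dask_arr.dtype)


x = da.ones((4, 4), chunks=(2, 2))
result = my_processing(x).compute()
```

Here every element of `x` is 1, so each element of `result` is 1 * 5 + 6.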

@rabernat

left a comment

A few minor suggestions and additions. I have another larger suggestion that I will add via PR to your branch.

>>> x = da.from_array(storage, chunks=(1280, 6400))
Note that if you provide ``chunks='auto'`` then Dask Array will look for a
``.chunks`` attribute and use that to provide a good chunking.
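
A small sketch of the two calling styles quoted above, using a tiny NumPy array as a stand-in for `storage`. The shape and chunk sizes are illustrative assumptions. A NumPy array has no `.chunks` attribute, so in this sketch `chunks='auto'` relies on Dask's size-based defaults; with a store that does expose `.chunks` (for example an HDF5 or Zarr dataset), that attribute would be consulted as the excerpt describes.

```python
import numpy as np
import dask.array as da

# Stand-in for an on-disk dataset (assumption: any array-like object with
# shape, dtype, and NumPy-style slicing can be passed to from_array).
storage = np.zeros((4, 6), dtype="f4")

# Explicit chunk shape: every block is 2 rows by 3 columns.
x = da.from_array(storage, chunks=(2, 3))
print(x.chunks)  # ((2, 2), (3, 3))

# Let Dask choose the chunking; the result depends on Dask's configuration.
y = da.from_array(storage, chunks="auto")
print(y.chunks)
```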

@rabernat

rabernat Apr 17, 2019

Unrelated to this PR, but we should make sure that xarray is passing this .chunks attribute properly to dask for auto-chunking with netCDF and zarr storage.

@quasiben

quasiben Apr 29, 2019

Member

@rabernat have you verified that xarray is doing the correct thing (passing chunks)? Should we open another issue?

@jakirkham

jakirkham Apr 30, 2019

Member

It sounds like this should be raised in Xarray, though I would be happy to follow along in that issue.


@mrocklin mrocklin changed the title [WIP] Add draft of array best practices Add draft of array best practices Apr 27, 2019

@mrocklin

Member Author

commented Apr 27, 2019

See #4745 for master best practices

@quasiben

Member

commented Apr 29, 2019

This has been lingering for a bit. Is there anything else that needs to be done before this is merged in?

@mrocklin

Member Author

commented Apr 30, 2019

Fixes #4514

@mrocklin

Member Author

commented Apr 30, 2019

@jakirkham can I ask you to take a look here and merge in if things look ok enough?

Update docs/source/array-best-practices.rst
Co-Authored-By: mrocklin <mrocklin@gmail.com>

jakirkham and others added some commits Apr 30, 2019

Update docs/source/array-best-practices.rst [skip ci]
Co-Authored-By: mrocklin <mrocklin@gmail.com>
Update docs/source/array-best-practices.rst [skip ci]
Co-Authored-By: mrocklin <mrocklin@gmail.com>
.. autosummary::
   map_blocks
   reduction
   map_overlap

@jakirkham

jakirkham Apr 30, 2019

Member

A minor suggestion would be to group map_blocks and map_overlap together, though I have no strong feelings if that is opposed.

Note: Sorry for not adding a suggested change. GitHub had trouble doing this correctly here.

@mrocklin

mrocklin Apr 30, 2019

Author Member

Yeah, the single-line restriction is limiting. I'll make the change and push up normally.

@jakirkham
Member

left a comment

Generally looks great! Very helpful. Certainly know a few people that would benefit from having something like this as reference. Thanks for working on it @mrocklin !

I made a few minor inline comments with suggested code changes. They should be pretty easy to go through.

@mrocklin mrocklin merged commit 274c4f6 into dask:master Apr 30, 2019

jorge-pessoa pushed a commit to jorge-pessoa/dask that referenced this pull request May 14, 2019

Thomas-Z added a commit to Thomas-Z/dask that referenced this pull request May 17, 2019
