Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add zarr to CI environment #4604

Merged
merged 4 commits into from Mar 26, 2019

Conversation

Projects
None yet
5 participants
@jrbourbeau
Copy link
Member

commented Mar 17, 2019

This PR adds zarr to the CI test environment. Right now all zarr-related tests are being skipped (see for example https://travis-ci.org/dask/dask/jobs/507332049#L2190-L2193).

While running tests with zarr installed locally, dask/array/tests/test_array_core.py::test_zarr_return_stored[False] fails.

  • Tests added / passed
  • Passes flake8 dask

@jrbourbeau jrbourbeau marked this pull request as ready for review Mar 17, 2019

@mrocklin

This comment has been minimized.

Copy link
Member

commented Mar 17, 2019

While running tests with zarr installed locally, dask/array/tests/test_array_core.py::test_zarr_return_stored[False] fails.

Should we be concerned that tests passed?

@jrbourbeau

This comment has been minimized.

Copy link
Member Author

commented Mar 18, 2019

The test failures here are occurring because the HighLevelGraph.merge method is creating layers with int keys

dask/dask/highlevelgraph.py

Lines 168 to 170 in aea27d1

elif isinstance(g, Mapping):
layers[id(g)] = g
dependencies[id(g)] = set()

and then dask.array.assert_eq, which checks that HighLevelGraph layer keys are either strings or tuples, raises

assert all(isinstance(k, (tuple, str)) for k in dsk.layers)

@jrbourbeau

This comment has been minimized.

Copy link
Member Author

commented Mar 18, 2019

We could:

  1. Convert int keys to strings (e.g. id(g) -> str(id(g)))
  2. Relax the condition that layers must have keys that are either tuple or str

I don't see an immediate preferred option between these two, but I could be missing something. @mrocklin any thoughts here?

@jrbourbeau

This comment has been minimized.

Copy link
Member Author

commented Mar 18, 2019

Another note: I don't know why the Python 2.7 build is passing on Travis

@mrocklin

This comment has been minimized.

Copy link
Member

commented Mar 18, 2019

This check is intended to enforce that we name our layers, and generally produce well formed HighLevelGraphs. If we provide a raw dictionary (as appears to be happening with the result of retrieve_from_ooc then HighLevelGraphs aren't able to do any high-level deduplication.

My guess is that we want to create a HighLevelGraph from load_store_dsk around line 728 in core.py

@jakirkham

This comment has been minimized.

Copy link
Member

commented Mar 21, 2019

So just call HighLevelGraph.merge() on it or something else?

@mrocklin

This comment has been minimized.

Copy link
Member

commented Mar 22, 2019

@jrbourbeau

This comment has been minimized.

Copy link
Member Author

commented Mar 22, 2019

test_zarr_return_stored is failing for the compute=False case. So it looks like the HighLevelGraph created here

store_dsk = HighLevelGraph.merge(store_dsk, targets_dsk, sources_dsk)

is causing the failure because both targets_dsk and sources_dsk are raw dictionaries

@jakirkham

This comment has been minimized.

Copy link
Member

commented Mar 23, 2019

My guess would be this code from insert_to_ooc and this code from retrieve_from_ooc should change to use HighLevelGraph objects.

targets_dsk should be a list of graphs collected from Delayed objects. Would figure the Delayed objects also return HighLevelGraphs, but it sounds like that is not the case? Or perhaps this issue lies with Delayed.__dask_optimize__? In either case, this particular part of the issue is starting to sound like a problem with Delayed objects not using HighLevelGraphs.

@jakirkham

This comment has been minimized.

Copy link
Member

commented Mar 23, 2019

Looking a little closer it does appear that __dask_optimize__ is returning raw dicts. ( #4626 )

@jrbourbeau

This comment has been minimized.

Copy link
Member Author

commented Mar 25, 2019

Thank you for investigating @jakirkham

In light of the __dask_optimize__ behavior, I'm inclined to either set check_graph=False in the failing assert_eq or mark this test as an xfail for the time being. That way the CI can start running all the zarr-related tests in the meantime, and we can revisit fixing test_zarr_return_stored as a separate issue. Does this seem like a reasonable way forward?

@jakirkham

This comment has been minimized.

Copy link
Member

commented Mar 26, 2019

Happy to help @jrbourbeau.

Working around the problem seems reasonable. Would suggest making the smaller of the two changes namely setting check_graph=False for now if that sounds good to you.

@jrbourbeau

This comment has been minimized.

Copy link
Member Author

commented Mar 26, 2019

Merging in a few hours, unless there are additional comments

@martindurant

This comment has been minimized.

Copy link
Member

commented Mar 26, 2019

+1

@jakirkham jakirkham merged commit ec3e016 into dask:master Mar 26, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@jrbourbeau jrbourbeau deleted the jrbourbeau:add-zarr-ci branch Mar 26, 2019

@jakirkham

This comment has been minimized.

Copy link
Member

commented Mar 26, 2019

Noted we should revert the check_graph=False workaround later.

@jakirkham

This comment has been minimized.

Copy link
Member

commented Mar 28, 2019

Thanks @jrbourbeau :)

asmith26 added a commit to asmith26/dask that referenced this pull request Apr 22, 2019

Add zarr to CI environment (dask#4604)
* Add zarr to CI environment

* Remove NUMPY_EXPERIMENTAL_ARRAY_FUNCTION export

* Set check_graph=False

jorge-pessoa pushed a commit to jorge-pessoa/dask that referenced this pull request May 14, 2019

Add zarr to CI environment (dask#4604)
* Add zarr to CI environment

* Remove NUMPY_EXPERIMENTAL_ARRAY_FUNCTION export

* Set check_graph=False
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.