
Require additional dependencies: cloudpickle, partd, fsspec, toolz #7345

Merged
merged 30 commits into from Mar 12, 2021

Conversation

@jsignell (Member) commented Mar 9, 2021

I am planning on adding commits to remove handling of cases where these packages aren't available.

@jrbourbeau (Member) left a comment

Nice! Thanks @jsignell -- let me know when you'd like someone to review

@jsignell (Member, Author) commented Mar 9, 2021

This failure has me kind of stumped.

=================================== FAILURES ===================================
_______________________ test_works_with_highlevel_graph ________________________

    def test_works_with_highlevel_graph():
        """Previously `dask.multiprocessing.get` would accidentally forward
        `HighLevelGraph` graphs through the dask optimization/scheduling routines,
        resulting in odd errors. One way to trigger this was to have a
        non-indexable object in a task. This is just a smoketest to ensure that
        things work properly even if `HighLevelGraph` objects get passed to
        `dask.multiprocessing.get`. See https://github.com/dask/dask/issues/7190.
        """
    
        class NoIndex:
            def __init__(self, x):
                self.x = x
    
            def __getitem__(self, key):
                raise Exception("Oh no!")
    
        x = delayed(lambda x: x)(NoIndex(1))
        (res,) = get(x.dask, x.__dask_keys__())
>       assert isinstance(res, NoIndex)
E       AssertionError: assert False
E        +  where False = isinstance(<dask.tests.test_multiprocessing.NoIndex object at 0x7fd42c727490>, <class 'dask.tests.test_multiprocessing.test_works_with_highlevel_graph.<locals>.NoIndex'>)

dask/tests/test_multiprocessing.py:174: AssertionError
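(For context, a stdlib-only sketch of why this round trip is tricky: plain pickle stores classes by reference to an importable module-level name, so a class defined inside a function body cannot be pickled at all. cloudpickle serializes such classes by value instead; the fix in its 1.1.1 release, noted below, is presumably what makes the deserialized class compare equal to the original again.)

```python
import pickle

def make_local_class():
    # Mirrors NoIndex in the failing test: the class is defined in a local scope
    class NoIndex:
        def __init__(self, x):
            self.x = x
    return NoIndex

cls = make_local_class()
try:
    pickle.dumps(cls)
    picklable = True
except Exception:
    # pickle refers to classes by module.qualname;
    # 'make_local_class.<locals>.NoIndex' is not importable, so it fails outright
    picklable = False
print(picklable)
```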

@jrbourbeau (Member):

That looks to be an issue with cloudpickle that was fixed in the 1.1.1 release. We can fix things either by moving the NoIndex class definition outside the body of test_works_with_highlevel_graph, or by updating our minimum cloudpickle to >=1.1.1 (which is coming up on two years old).

@jsignell (Member, Author) commented Mar 9, 2021

Oh nice find! I say let's bump up our cloudpickle.

@@ -74,6 +74,28 @@ jobs:
shell: bash -l {0}
run: pytest -v --doctest-modules --ignore-glob='*/test_*.py' dask

doctest-pip:
@jsignell (Member, Author):

I'm not sure if we care about this, but since it came up in #7358 I thought I'd try it out. I added a skip in conftest.py to support this.

@jsignell jsignell marked this pull request as ready for review March 10, 2021 16:30
Resolved review threads: dask/bytes/__init__.py, setup.py
@jsignell (Member, Author):

Ok @crusaderky and @jrbourbeau I think this is ready for you to take a look. I put some questions inline in comments.

@jrbourbeau (Member) left a comment

Overall the changes here look good, thanks for working on this @jsignell. Some comments in addition to the ones below:

  • There are a couple of pytest.importorskip("tlz") occurrences we can remove
  • Same with pytest.importorskip("dask.bag") and import_or_none("dask.bag")

Resolved review threads: .github/workflows/ci-additional.yml, conftest.py, dask/bag/text.py, dask/bytes/__init__.py, dask/bytes/core.py, docs/source/install.rst, setup.py, dask/array/core.py, dask/bag/avro.py, dask/dataframe/io/json.py, dask/diagnostics/profile_visualize.py, dask/tests/test_delayed.py, dask/tests/test_multiprocessing.py
@crusaderky (Collaborator):

Please fix the docstring of dask.multiprocessing.get:

    func_dumps : function
        Function to use for function serialization
        (defaults to cloudpickle.dumps if available, otherwise pickle.dumps)
    func_loads : function
        Function to use for function deserialization
        (defaults to cloudpickle.loads if available, otherwise pickle.loads)
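A possible rewrite of that fragment once cloudpickle is unconditionally required (a suggestion only, not the merged wording):

```python
# Suggested docstring fragment for dask.multiprocessing.get now that
# cloudpickle is a hard dependency: the pickle fallback clause goes away.
DOC_FRAGMENT = """\
func_dumps : function
    Function to use for function serialization (defaults to cloudpickle.dumps)
func_loads : function
    Function to use for function deserialization (defaults to cloudpickle.loads)
"""
print("fallback clause removed:", "otherwise" not in DOC_FRAGMENT)
```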

@crusaderky (Collaborator):

In docs/source/bag.rst, please remove:

Because the multiprocessing scheduler requires moving functions between multiple
processes, we encourage that Dask Bag users also install the cloudpickle_ library to
enable the transfer of more complex functions.

[...]
.. _cloudpickle: https://github.com/cloudpipe/cloudpickle

@jsignell (Member, Author):

Ok! I have addressed all the comments. Thanks for the reviews @crusaderky and @jrbourbeau!

@jsignell (Member, Author):

Note that this will require changes in the conda recipes. @martindurant I don't know if we need to let the Anaconda conda team know about this. I'll open a PR in the conda-forge recipe now.

@martindurant (Member):

Thanks @jsignell. I don't think so normally, but I passed it on anyway.

@jsignell (Member, Author):

🎉 green! 🎉 @jrbourbeau @crusaderky are you happy with the state of this?

Comment on lines +11 to +13
"array": ["numpy >= 1.15.1"],
"bag": [], # keeping for backwards compatibility
"dataframe": ["numpy >= 1.15.1", "pandas >= 0.25.0"],
Member:

No need to do anything in this PR, but I'm curious: when do we want to bump these, and what versions should we bump them to?

Member Author:

We bumped them a few months ago to these values. There was some talk in the maintenance meeting about using "released more than n months ago" as the floor. Apparently xarray does something like this? But we didn't choose a specific policy.

Collaborator:

At xarray and pint we use a rolling policy based on NEP-29:
https://xarray.pydata.org/en/stable/installing.html#minimum-dependency-versions
The number of months listed is arbitrary and subject to negotiation. The key benefit of such a policy is that it allows developers to bump minimum dependencies as needed, without initiating a discussion every time.
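As an illustration, such a rolling floor can be computed mechanically (the 24-month window and 30-day month below are illustrative assumptions, not an agreed dask policy):

```python
from datetime import date, timedelta

def rolling_support_floor(today: date, months: int = 24) -> date:
    """NEP-29-style rule of thumb: dependency releases older than
    `months` months are fair game to drop without a discussion."""
    return today - timedelta(days=30 * months)  # approximate a month as 30 days

# Example: the cutoff as of this PR's merge date
cutoff = rolling_support_floor(date(2021, 3, 12), months=24)
# Dependency versions released before `cutoff` could then be dropped.
```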

Member:

Leaving it alone is fine. I think we have some workarounds for NumPy pre-1.17 that could be cleaned out once that is the minimum.

cc @pentschev (in case you have thoughts here 🙂)

Member:

Yeah, I also agree we could consider dropping older versions of NumPy. Since @crusaderky mentioned NEP-29, it's worth noting that NumPy supports Python releases for 18 months. Is there any reason for us to keep supporting NumPy versions (or any other dependencies, for that matter) that are much older than that?

Member Author:

I opened #7378 to continue this conversation

- install_requires = ["pyyaml"]
+ install_requires = [
+     "pyyaml",
+     "cloudpickle >= 1.1.1",
Member:

I wonder what we think about bumping this to 1.5.0. That version added support for the pickle5 backport package. Admittedly that is only needed for Python pre-3.8, which for Dask now means just Python 3.7.

Collaborator:

I don't think dask/dask has any need to directly acknowledge pickle protocol 5? 1.5.0 is very recent...

Member Author:

distributed pins to 1.5.0, but since that is less than a year old it seemed like overkill for dask/dask.

Member:

Cloudpickle will use it under the hood if installed.

Yeah I don't have strong feelings here. Just wanted to raise for discussion. This also will become irrelevant once Python 3.7 is dropped (guessing that will coincide with the Python 3.10 release)
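For reference, a sketch of the usual conditional-import pattern for the pickle5 backport mentioned above (the pickle5 package name is real; whether dask would need this directly is exactly the open question in this thread):

```python
import sys

if sys.version_info >= (3, 8):
    import pickle  # protocol 5 is in the stdlib from Python 3.8 onward
else:
    import pickle5 as pickle  # backport package, needed only on Python 3.7

# Protocol 5 round trip; on 3.8+ this needs no extra dependency at all
data = {"chunks": list(range(10))}
stream = pickle.dumps(data, protocol=5)
restored = pickle.loads(stream)
```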

Resolved review threads: dask/diagnostics/profile_visualize.py, dask/multiprocessing.py
jsignell and others added 2 commits March 11, 2021 12:55
Co-authored-by: crusaderky <crusaderky@gmail.com>
@jrbourbeau (Member) left a comment

Thanks @jsignell!

@jrbourbeau jrbourbeau merged commit a62fced into dask:main Mar 12, 2021
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request Mar 12, 2021
dask/dask#7345 removed some imports that we were improperly using from a dask module. Fix the imports to properly target `fsspec`.

Authors:
  - Keith Kraus (@kkraus14)

Approvers:
  - @jakirkham
  - Ashwin Srinath (@shwina)

URL: #7580
hyperbolic2346 pushed a commit to hyperbolic2346/cudf that referenced this pull request Mar 25, 2021
Development

Successfully merging this pull request may close these issues.

Make trivial dependencies mandatory
6 participants