Require additional dependencies: cloudpickle, partd, fsspec, toolz #7345
Conversation
Nice! Thanks @jsignell -- let me know when you'd like someone to review
This failure has me kind of stumped.

=================================== FAILURES ===================================
_______________________ test_works_with_highlevel_graph ________________________

    def test_works_with_highlevel_graph():
        """Previously `dask.multiprocessing.get` would accidentally forward
        `HighLevelGraph` graphs through the dask optimization/scheduling routines,
        resulting in odd errors. One way to trigger this was to have a
        non-indexable object in a task. This is just a smoketest to ensure that
        things work properly even if `HighLevelGraph` objects get passed to
        `dask.multiprocessing.get`. See https://github.com/dask/dask/issues/7190.
        """

        class NoIndex:
            def __init__(self, x):
                self.x = x

            def __getitem__(self, key):
                raise Exception("Oh no!")

        x = delayed(lambda x: x)(NoIndex(1))
        (res,) = get(x.dask, x.__dask_keys__())
>       assert isinstance(res, NoIndex)
E       AssertionError: assert False
E        +  where False = isinstance(<dask.tests.test_multiprocessing.NoIndex object at 0x7fd42c727490>, <class 'dask.tests.test_multiprocessing.test_works_with_highlevel_graph.<locals>.NoIndex'>)

dask/tests/test_multiprocessing.py:174: AssertionError
That looks to be an issue with cloudpickle.
Oh nice find! I say let's bump up our cloudpickle.
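For context, here is a minimal sketch (not code from this PR) of the failure mode: cloudpickle has to serialize classes defined in a local scope by value, and older releases rebuilt a brand-new class object on load, which breaks `isinstance` checks like the one in the test; newer cloudpickle releases track dynamic classes so identity is preserved within a process.

```python
import pickle
import cloudpickle

def make_class():
    # A class defined inside a function cannot be pickled by reference,
    # so cloudpickle serializes it by value.
    class NoIndex:
        pass
    return NoIndex

NoIndex = make_class()
restored = pickle.loads(cloudpickle.dumps(NoIndex()))

# With old cloudpickle releases the class is reconstructed as a distinct
# class object, so this can print False; recent versions track dynamic
# classes and preserve identity within a single process.
print(isinstance(restored, NoIndex))
```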
.github/workflows/ci-additional.yml (outdated)

@@ -74,6 +74,28 @@ jobs:
        shell: bash -l {0}
        run: pytest -v --doctest-modules --ignore-glob='*/test_*.py' dask

  doctest-pip:
I'm not sure if we care about this, but since it came up in #7358 I thought I'd try it out. I added a skip in conftest.py to support this.
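In case it helps reviewers, a hypothetical sketch of what such a `conftest.py` skip could look like (the paths and dependency names here are illustrative, not the actual change in this PR):

```python
# conftest.py -- illustrative sketch only
collect_ignore = []

# Skip collection (including doctests) for subpackages whose optional
# dependencies are not installed in the pip-only environment.
for dependency, path in [("numpy", "dask/array"), ("pandas", "dask/dataframe")]:
    try:
        __import__(dependency)
    except ImportError:
        collect_ignore.append(path)
```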
Ok @crusaderky and @jrbourbeau, I think this is ready for you to take a look. I put some questions inline in comments.
Overall the changes here look good, thanks for working on this @jsignell. Some comments in addition to the ones below:

- There are a couple of `pytest.importorskip("tlz")` occurrences we can remove
- Same with `pytest.importorskip("dask.bag")` and `import_or_none("dask.bag")`
Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>
Please fix the docstring of `dask.multiprocessing.get`:
In `docs/source/bag.rst`, please remove:
Co-authored-by: crusaderky <crusaderky@gmail.com>
Ok! I have addressed all the comments. Thanks for the reviews @crusaderky and @jrbourbeau!
Note that this will require changes in the conda recipes. @martindurant I don't know if we need to let the Anaconda conda team know about this. I'll open a PR in the conda-forge recipe now.
Thanks @jsignell. I don't think so normally, but I passed it on anyway.
🎉 green! 🎉 @jrbourbeau @crusaderky are you happy with the state of this?
"array": ["numpy >= 1.15.1"], | ||
"bag": [], # keeping for backwards compatibility | ||
"dataframe": ["numpy >= 1.15.1", "pandas >= 0.25.0"], |
No need to do anything in this PR, but I'm curious when do we want to bump these and what versions should we bump them to?
We bumped them a few months ago to this. There was some talk in the maintenance meeting about using "released more than n months ago" as the floor. Apparently xarray does something like this? But we didn't choose a specific policy.
At xarray and pint we use a rolling policy based on NEP-29:
https://xarray.pydata.org/en/stable/installing.html#minimum-dependency-versions
The number of months listed is arbitrary and subject to negotiation. The key benefit of such a policy is that it allows developers to bump up minimum dependencies as needed, without initiating a discussion every time.
Leaving it alone is fine. I think we have some workarounds for NumPy pre-1.17 that could be cleaned out once that is the minimum.
cc @pentschev (in case you have thoughts here 🙂)
Yeah, I also agree we could consider dropping older versions of NumPy. Since @crusaderky mentioned NEP-29, it's worth noting that NumPy supports Python releases for 18 months. Is there any reason for us to keep supporting NumPy versions (or all other dependencies, for that matter) that are much older than that?
I opened #7378 to continue this conversation
install_requires = ["pyyaml"] | ||
install_requires = [ | ||
"pyyaml", | ||
"cloudpickle >= 1.1.1", |
I wonder what we think about bumping this to 1.5.0. That version added support for the `pickle5` backport package. Admittedly that is only needed for Python pre-3.8, which is just Python 3.7 for Dask now.
I don't think dask/dask has any need to directly acknowledge pickle protocol 5? 1.5.0 is very recent...
distributed pins to 1.5.0, but since that is less than a year old it seemed like overkill for dask/dask.
Cloudpickle will use it under the hood if installed.
Yeah I don't have strong feelings here. Just wanted to raise for discussion. This also will become irrelevant once Python 3.7 is dropped (guessing that will coincide with the Python 3.10 release)
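For anyone following along, a rough sketch of what protocol 5 out-of-band buffers look like with cloudpickle (assuming Python 3.8+, where `pickle.PickleBuffer` is built in; on 3.7 the `pickle5` backport provides it, which is what cloudpickle >= 1.5.0 can take advantage of):

```python
import pickle
import cloudpickle

# Wrap a large, writable buffer so it can be serialized out-of-band.
payload = pickle.PickleBuffer(bytearray(b"x" * 1024))

buffers = []
frames = cloudpickle.dumps(payload, protocol=5, buffer_callback=buffers.append)

# The buffer travels separately in `buffers` instead of being copied into
# the pickle stream; a writable buffer round-trips back as a bytearray.
restored = pickle.loads(frames, buffers=buffers)
assert isinstance(restored, bytearray)
```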
Co-authored-by: crusaderky <crusaderky@gmail.com>
Thanks @jsignell!
dask/dask#7345 removed some imports that we were improperly using from a dask module. Fix the imports to properly target `fsspec`.

Authors:
- Keith Kraus (@kkraus14)

Approvers:
- @jakirkham
- Ashwin Srinath (@shwina)

URL: #7580
`black dask` / `flake8 dask`
I am planning on adding commits to remove handling of cases where these packages aren't available.