Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify has_parallel_type() #6927

Merged
merged 2 commits into from Dec 16, 2020
Merged

Conversation

madsbk
Copy link
Contributor

@madsbk madsbk commented Dec 3, 2020

This PR simplify the implementation of has_parallel_type() and make it easy to extend by down stream projects like rapidsai/dask-cuda#451

  • Tests added / passed
  • Passes black dask / flake8 dask

@quasiben
Copy link
Member

quasiben commented Dec 3, 2020

I think this is ok. @mrocklin do you think this will adversely affect GeoPandas users ? My guess is no -- GeoPandas is the only the library (aside from cuDF) I am familiar with which extends dask dataframes

Comment on lines -6418 to +6415
def parallel_types():
return tuple(
k
for k, v in get_parallel_type._lookup.items()
if v is not get_parallel_type_object
)
@get_parallel_type.register(object)
def get_parallel_type_object(_):
return Scalar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitley like your solution better. Do you think there is any need for a get_parallel_type test, or do you think the functionality is covered elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be covered in all the has_parallel_type() and new_dd_object() tests

@rjzamora
Copy link
Member

rjzamora commented Dec 4, 2020

...and make it easy to extend by down stream projects...

I like the simplification here, but can you explain how it helps down-stream projects? (Sorry if this is a silly question)

madsbk added a commit to madsbk/dask-cuda that referenced this pull request Dec 4, 2020
@madsbk
Copy link
Contributor Author

madsbk commented Dec 4, 2020

I think this is ok. @mrocklin do you think this will adversely affect GeoPandas users ? My guess is no -- GeoPandas is the only the library (aside from cuDF) I am familiar with which extends dask dataframes

The semantic of get_parallel_type and has_parallel_type is exactly the same. Extending it using get_parallel_type.register() is also exactly the same.
The only potential breakage is the removal of parallel_types() but AFAIK this is a help function used by has_parallel_type exclusively. At least, searching through github I haven't find any projects using it (GeoPandas doesn't use it).
Having said that, it is easy enough to re-implement if someone is using it.

@madsbk
Copy link
Contributor Author

madsbk commented Dec 4, 2020

I like the simplification here, but can you explain how it helps down-stream projects? (Sorry if this is a silly question)

My fault should have been more clear.
The current implementation makes it hard (if not impossible) for a wrapper class like the proxy object in Dask-CUDA to implement a dispatch handle of get_parallel_type(). The problem is that by registering the dispatch handle, parallel_types() will always include it as a parallel_type even if the instance being wrapped isn't a parallel type.
This is also why the current implementation of parallel_types() has to exclude get_parallel_type_object explicitly.

PS: The device memory spilling in Dask-CUDA makes use of this PR here: rapidsai/dask-cuda@8b977b7

madsbk added a commit to madsbk/dask-cuda that referenced this pull request Dec 7, 2020
@madsbk
Copy link
Contributor Author

madsbk commented Dec 8, 2020

@mrocklin @jrbourbeau, if any of you have time to review this, it would be great. It is currently blocking rapidsai/dask-cuda#451

madsbk added a commit to madsbk/dask-cuda that referenced this pull request Dec 9, 2020
madsbk added a commit to madsbk/dask-cuda that referenced this pull request Dec 10, 2020
madsbk added a commit to madsbk/dask-cuda that referenced this pull request Dec 10, 2020
madsbk added a commit to madsbk/dask-cuda that referenced this pull request Dec 14, 2020
Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @madsbk! Apologies for the delayed review

@jrbourbeau jrbourbeau merged commit 781b3eb into dask:master Dec 16, 2020
@madsbk
Copy link
Contributor Author

madsbk commented Dec 16, 2020

Thanks @madsbk! Apologies for the delayed review

No worries :)

madsbk added a commit to madsbk/dask-cuda that referenced this pull request Dec 17, 2020
@madsbk madsbk deleted the simplify_has_parallel_type branch March 1, 2021 08:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants