Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SimpleShuffleLayer should compare parts_out with set(self.parts_out) #7787

Merged
merged 2 commits into from Jun 22, 2021

Conversation

GenevieveBuckley
Copy link
Contributor

I noticed that the SimpleShuffleLayer cull method should probably be comparing:

        if parts_out != set(self.parts_out):

instead of the current

        if parts_out != self.parts_out:

The reason for this is that we often find self.parts_out is something like range(0, 30), and python won't recognize if a set and range are equal or not, unless we turn the range into a set too.

The BroadcastJoinLayer does do this comparison with a set, I think this is the right way:

if parts_out != set(self.parts_out):

The consequences aren't super severe, I think it's just taking up extra time running an optimization that doesn't have any real effect (so we may as well save time and skip it).

Here's an example you can use where self.parts_out is a range and not a set. I put a breakpoint in before the line I've edited and looked at the values from there.

from dask.datasets import timeseries

ddf = timeseries().shuffle("id", shuffle="tasks")
ddf.compute()
 
  • Tests added / passed
  • Passes black dask / flake8 dask / isort dask

@GenevieveBuckley
Copy link
Contributor Author

@rjzamora I think you're the best person to tell me whether I have the right idea or not

Copy link
Member

@rjzamora rjzamora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @GenevieveBuckley ! It looks like you are correct.

I'm curious what is causing all the CI failures here :/

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just merged main to resolve some unrelated CI failures

@GenevieveBuckley
Copy link
Contributor Author

@dask/maintenance I'd say this is ready. It's been approved by Rick, and James merging main has fixed the CI.

@martindurant martindurant merged commit 486b789 into dask:main Jun 22, 2021
@GenevieveBuckley GenevieveBuckley deleted the set-parts-out branch June 23, 2021 00:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants