Warn if partitions overlap in compute_divisions #4600

Merged
merged 4 commits into from
Mar 27, 2019

Conversation

Contributor

@bchu bchu commented Mar 15, 2019

This is my suggestion from #4591. Let me know if you think this is worth adding.

@bchu bchu changed the title Add warning if partitions overlap in compute_divisions Raise error if partitions overlap in compute_divisions Mar 16, 2019
Member

@mrocklin mrocklin left a comment

Thanks for the PR. The implementation looks clean to me, but it raises some questions. Some thoughts below. Conversation welcome.

@@ -602,6 +602,9 @@ def test_set_index_sorted_true():
    with pytest.raises(ValueError):
        a.set_index(a.z, sorted=True)

    with pytest.raises(ValueError):
        a.set_index(a.y, sorted=True)
Member

Ideally this would still work correctly, and we would move the extra data from one partition over to the neighboring one with a little bit of communication.

If we accept the change that you've proposed then we probably don't want to enforce this as a test, but might instead prefer an xfailed test that ensures that the resulting divisions are valid, even if the graph we have to produce is a little bit more complex.
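For illustration only (not part of the PR): the idea of moving the extra data to the neighboring partition can be sketched on plain Python lists. The helper name `shift_overlap` is hypothetical, each partition is assumed to be a non-empty sorted list of index values, and no partition is assumed to be swallowed entirely by its neighbor. A real implementation would operate on dask partitions and would need communication between them.

```python
def shift_overlap(partitions):
    """Move values that spill past a partition boundary into the next partition.

    `partitions` is a list of sorted lists of index values (an assumption of
    this sketch). After the call, every value in partition i is strictly less
    than the first value of partition i + 1, so the partition minimums can
    serve as valid division boundaries.
    """
    fixed = [list(p) for p in partitions]
    for i in range(len(fixed) - 1):
        boundary = fixed[i + 1][0]  # smallest index value of the next partition
        keep = [v for v in fixed[i] if v < boundary]
        spill = [v for v in fixed[i] if v >= boundary]
        fixed[i] = keep
        # Values at or above the boundary belong to the neighbor.
        fixed[i + 1] = sorted(spill + fixed[i + 1])
    return fixed


# Partition 0 ends with index value 3, which partition 1 also starts with.
print(shift_overlap([[1, 2, 3, 3], [3, 4, 5]]))
```

An xfailed test along the lines suggested above would then assert that, for the repaired partitions, each partition's maximum is below the next partition's minimum.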

Member

It may also be that we want to accept our current incorrect behavior rather than err. My guess is that in most cases where this happens today it doesn't negatively affect users. This is hard to judge, though.

Contributor Author

Seems pretty complicated. :)

Perhaps I could just change this to a warning.

Member

Alternatively, you could leave the existing semi-invalid divisions in place and then call repartition(divisions=...) with a suitable new set of divisions. I suspect that would work today and may not be difficult to implement.

Contributor Author

I tried that and it does not appear to work (without modifying repartition_divisions or creating a new method of repartitioning specific to this almost-partitioned case).

Member

mrocklin commented Mar 16, 2019 via email

@bchu bchu changed the title Raise error if partitions overlap in compute_divisions Warn if partitions overlap in compute_divisions Mar 26, 2019
Contributor Author

bchu commented Mar 26, 2019

Changed this to a warning only.
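
A minimal sketch of what a warning-only check could look like, assuming the per-partition index minimums and maximums have already been computed. The function name `divisions_with_overlap_warning`, the messages, and the exact overlap condition are assumptions for illustration and are not taken from the merged diff.

```python
import warnings


def divisions_with_overlap_warning(mins, maxes):
    """Build divisions from per-partition index minimums and maximums.

    Unsorted partitions are still an error; partitions that merely share
    index values across a boundary only trigger a warning.
    Assumes at least one partition.
    """
    mins, maxes = list(mins), list(maxes)
    if sorted(mins) != mins or sorted(maxes) != maxes:
        raise ValueError("Partitions must be sorted ascending by index")
    # Overlap: a partition's maximum reaches the next partition's minimum.
    if any(prev_max >= next_min
           for prev_max, next_min in zip(maxes[:-1], mins[1:])):
        warnings.warn("Partition indices have overlap.", UserWarning)
    return tuple(mins) + (maxes[-1],)


# Index value 3 appears in both partitions: divisions are returned, with a warning.
print(divisions_with_overlap_warning([1, 3], [3, 5]))
```

With a warning rather than an error, existing calls such as set_index(..., sorted=True) keep working on overlapping data, and the caller is told that the resulting divisions may not be strictly valid.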

@mrocklin mrocklin merged commit 96c3381 into dask:master Mar 27, 2019
@mrocklin
Member

Thanks @bchu. This is in.

Also, I notice that this is your first code contribution to this repository. Welcome!

@bchu bchu deleted the overlaperror branch April 23, 2019 10:26
jorge-pessoa pushed a commit to jorge-pessoa/dask that referenced this pull request May 14, 2019