Warn if partitions overlap in compute_divisions #4600

Merged
@mrocklin merged 4 commits into dask:master from bchu:overlaperror on Mar 27, 2019

@bchu
Contributor

commented Mar 15, 2019

This is my suggestion from #4591. Let me know if you think this is worth adding.

@bchu changed the title from "Add warning if partitions overlap in compute_divisions" to "Raise error if partitions overlap in compute_divisions" on Mar 16, 2019

@mrocklin
Member

left a comment

Thanks for the PR. The implementation looks clean to me, but it raises some questions. Some thoughts below. Conversation welcome.

@@ -602,6 +602,9 @@ def test_set_index_sorted_true():
     with pytest.raises(ValueError):
         a.set_index(a.z, sorted=True)

     with pytest.raises(ValueError):
         a.set_index(a.y, sorted=True)

mrocklin Mar 16, 2019

Member

Ideally this would still work correctly, and we would move the extra data from one partition over to the neighboring one with a little bit of communication.

If we accept the change that you've proposed then we probably don't want to enforce this as a test, but might instead prefer an xfailed test that ensures that the resulting divisions are valid, even if the graph we have to produce is a little bit more complex.
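To make "move the extra data over to the neighboring partition" concrete, here is a minimal sketch on plain Python lists rather than dask partitions. The function name `shift_boundary_duplicates` is hypothetical and this is not how dask implements anything; it assumes the concatenated partitions are already globally sorted and that only boundary keys are duplicated across neighbors.

```python
def shift_boundary_duplicates(partitions):
    """Move keys equal to the previous partition's max into that partition.

    `partitions` is a list of non-empty lists of keys whose concatenation
    is assumed to be globally sorted already.
    """
    fixed = [list(partitions[0])]
    for part in partitions[1:]:
        boundary = fixed[-1][-1]  # max key of the previous partition
        # Count leading keys that duplicate the boundary key.
        n = 0
        while n < len(part) and part[n] == boundary:
            n += 1
        # Those keys belong with the previous partition.
        fixed[-1].extend(part[:n])
        rest = part[n:]
        if rest:  # skip partitions that were emptied entirely
            fixed.append(rest)
    return fixed


parts = [[1, 2, 2], [2, 3, 3], [3, 4]]
fixed = shift_boundary_duplicates(parts)
# fixed == [[1, 2, 2, 2], [3, 3, 3], [4]]

# Divisions in the dask sense: each partition's min, plus the last max.
divisions = [p[0] for p in fixed] + [fixed[-1][-1]]
# divisions == [1, 3, 4, 4]
```

In a real task graph each shift would mean a small amount of communication between neighboring partitions, which is the extra complexity mentioned above.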

mrocklin Mar 16, 2019

Member

It may also be that we want to accept our current incorrect behavior rather than err. My guess is that in most cases where this happens today it doesn't negatively affect users. This is hard to judge though.

bchu Mar 16, 2019

Author Contributor

Seems pretty complicated. :)

Perhaps I could just change this to a warning.

mrocklin Mar 17, 2019

Member

It could also be that you could leave the existing semi-invalid divisions, and then call repartition(divisions=...) with a suitable new suggestion. I suspect that that would work today and may not be difficult to implement.

bchu Mar 26, 2019

Author Contributor

I tried that and it does not appear to work (without modifying repartition_divisions or creating a new method of repartitioning specific to this almost-partitioned case).

@bchu changed the title from "Raise error if partitions overlap in compute_divisions" to "Warn if partitions overlap in compute_divisions" on Mar 26, 2019

@bchu

Contributor Author

commented Mar 26, 2019

Changed this to a warning only.
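For readers skimming the thread, a rough sketch of the kind of check being discussed: given each partition's min and max index value, warn when one partition's max reaches into the next partition's min. The helper names `find_overlaps` and `warn_if_overlapping` and the message text are illustrative assumptions, not the code merged in this PR.

```python
import warnings


def find_overlaps(mins, maxes):
    """Return indices i where partition i overlaps partition i + 1."""
    return [
        i
        for i, (prev_max, next_min) in enumerate(zip(maxes[:-1], mins[1:]))
        if prev_max >= next_min
    ]


def warn_if_overlapping(mins, maxes):
    """Warn (rather than raise) when neighboring partitions overlap."""
    overlaps = find_overlaps(mins, maxes)
    if overlaps:
        warnings.warn(
            "Partitions have overlapping index ranges at boundaries "
            f"{overlaps}; the resulting divisions may be invalid.",
            UserWarning,
        )
    return overlaps


# Partitions [1..2], [2..3], [3..4] share a key at both boundaries.
overlapping = find_overlaps([1, 2, 3], [2, 3, 4])  # [0, 1]
# Partitions [1..2], [3..4], [5..6] are cleanly separated.
clean = find_overlaps([1, 3, 5], [2, 4, 6])  # []
```

Warning instead of raising keeps existing `set_index(..., sorted=True)` calls working while still flagging the suspicious divisions.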

@mrocklin merged commit 96c3381 into dask:master on Mar 27, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed
@mrocklin

Member

commented Mar 27, 2019

Thanks @bchu. This is in.

Also, I notice that this is your first code contribution to this repository. Welcome!

asmith26 added a commit to asmith26/dask that referenced this pull request Apr 22, 2019

@bchu deleted the bchu:overlaperror branch on Apr 23, 2019

jorge-pessoa pushed a commit to jorge-pessoa/dask that referenced this pull request May 14, 2019
