Warn if partitions overlap in compute_divisions #4600

bchu · 2019-03-15T23:24:18Z

This is my suggestion from #4591. Let me know if you think this is worth adding.

mrocklin

Thanks for the PR. The implementation looks clean to me, but it raises some questions. Some thoughts below. Conversation welcome.

mrocklin · 2019-03-16T02:39:41Z

dask/dataframe/tests/test_shuffle.py

@@ -602,6 +602,9 @@ def test_set_index_sorted_true():
    with pytest.raises(ValueError):
        a.set_index(a.z, sorted=True)

+    with pytest.raises(ValueError):
+        a.set_index(a.y, sorted=True)


Ideally this would still work correctly, and we would move the extra data from one partition over to the neighboring one with a little bit of communication.

If we accept the change that you've proposed then we probably don't want to enforce this as a test, but might instead prefer an xfailed test that ensures that the resulting divisions are valid, even if the graph we have to produce is a little bit more complex.

It may also be that we want to accept our current incorrect behavior rather than err. My guess is that in most cases where this happens today it doesn't negatively affect users. This is hard to judge, though.
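To make the "move the extra data to the neighboring partition" idea concrete, here is a minimal pure-Python sketch. It is not dask code: `fix_boundary_overlap` is a hypothetical helper, partitions are modeled as plain sorted lists of index values, and the choice to push boundary rows to the right-hand neighbor is an assumption made for illustration.

```python
def fix_boundary_overlap(partitions):
    """Return copies of `partitions` in which no value spans two partitions.

    `partitions` is a list of individually sorted lists of index values
    (a stand-in for dask partitions; hypothetical, for illustration only).
    Rows in partition i that are >= the first value of partition i + 1
    are moved into partition i + 1.
    """
    fixed = [list(p) for p in partitions]
    for i in range(len(fixed) - 1):
        left, right = fixed[i], fixed[i + 1]
        if not left or not right:
            continue  # nothing to compare against
        boundary = right[0]
        # Rows that belong with the right-hand neighbor.
        moved = [v for v in left if v >= boundary]
        fixed[i] = [v for v in left if v < boundary]
        # Re-sort so the receiving partition stays ordered.
        fixed[i + 1] = sorted(moved + right)
    return fixed


# Illustrative values: both partitions contain the index value 20.
print(fix_boundary_overlap([[10, 20], [20, 40]]))
```

In a real graph this would need one small exchange between each pair of adjacent partitions, which is the extra communication mentioned above.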

bchu · 2019-03-16

Seems pretty complicated. :)

Perhaps I could just change this to a warning.

mrocklin · 2019-03-17

It could also be that you could leave the existing semi-invalid divisions, and then call repartition(divisions=...) with a suitable new suggestion. I suspect that that would work today and may not be difficult to implement.

bchu · 2019-03-26

I tried that and it does not appear to work (without modifying repartition_divisions or creating a new method of repartitioning specific to this almost-partitioned case).

mrocklin · 2019-03-16T15:48:50Z

Sure


bchu · 2019-03-26T23:38:35Z

Changed this to a warning only.
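For readers following along, a rough sketch of what "warn instead of raise" could look like, given each partition's minimum and maximum index value. This is not the code merged in this PR: `divisions_from_min_max` is a hypothetical name, the message text is made up, and the overlap test (`mins[i] <= maxes[i - 1]`) is an assumption about what counts as overlapping. It also assumes at least one partition.

```python
import warnings


def divisions_from_min_max(mins, maxes):
    """Build divisions from per-partition min/max index values.

    Hypothetical helper for illustration. Unsorted partitions are still
    an error; partitions that merely overlap at their boundaries only
    trigger a warning.
    """
    mins, maxes = list(mins), list(maxes)
    if sorted(mins) != mins or sorted(maxes) != maxes:
        raise ValueError("Partitions are not sorted by index")

    # Partition i overlaps its predecessor if it starts at or before
    # the predecessor's largest value.
    overlapping = [i for i in range(1, len(mins)) if mins[i] <= maxes[i - 1]]
    if overlapping:
        warnings.warn(
            f"Partitions {overlapping} overlap with the partition before "
            "them; the resulting divisions may be imprecise.",
            UserWarning,
        )
    return tuple(mins) + (maxes[-1],)


# Illustrative values: the index value 20 appears in both partitions.
print(divisions_from_min_max([10, 20], [20, 40]))
```

The trade-off is the one discussed in the review: existing user code keeps running with slightly imprecise divisions, and the warning makes the situation visible.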

mrocklin · 2019-03-27T05:16:19Z

Thanks @bchu. This is in.

Also, I notice that this is your first code contribution to this repository. Welcome!

bchu added 3 commits March 15, 2019 15:55

Overlap error

a913b4b

Test case

b3f8759

Add another case

1b7b1ab

bchu changed the title ~~Add warning if partitions overlap in compute_divisions~~ Raise error if partitions overlap in compute_divisions Mar 16, 2019

mrocklin reviewed Mar 16, 2019

Change to warning

503d7b8

bchu changed the title ~~Raise error if partitions overlap in compute_divisions~~ Warn if partitions overlap in compute_divisions Mar 26, 2019

mrocklin merged commit 96c3381 into dask:master Mar 27, 2019

bchu deleted the overlaperror branch April 23, 2019 10:26

jorge-pessoa pushed a commit to jorge-pessoa/dask that referenced this pull request May 14, 2019

Warn if partitions overlap in compute_divisions (dask#4600)

76948f4
