Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-3032][Shuffle] Fix key comparison integer overflow introduced sorting exception #2514

Closed
wants to merge 4 commits into from

Conversation

jerryshao
Copy link
Contributor

Previous key comparison in ExternalSorter will get wrong sorting result or exception when key comparison overflows, details can be seen in SPARK-3032. Here fix this and add a unit test to prove it.

@@ -152,7 +152,7 @@ private[spark] class ExternalSorter[K, V, C](
override def compare(a: K, b: K): Int = {
val h1 = if (a == null) 0 else a.hashCode()
val h2 = if (b == null) 0 else b.hashCode()
h1 - h2
Integer.compare(h1, h2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method does not exist in Java 6. There is an equivalent in Guava or you can just write the comparisons directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks Sean.

@SparkQA
Copy link

SparkQA commented Sep 24, 2014

QA tests have started for PR 2514 at commit fa2a08f.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 24, 2014

QA tests have started for PR 2514 at commit 83acb38.

  • This patch merges cleanly.

@@ -152,7 +152,7 @@ private[spark] class ExternalSorter[K, V, C](
override def compare(a: K, b: K): Int = {
val h1 = if (a == null) 0 else a.hashCode()
val h2 = if (b == null) 0 else b.hashCode()
h1 - h2
if (h1 < h2) -1 else if (h1 == h2) 0 else 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use this, use Integer.compare instead. Reason is Java will likely optimize the latter one. You can fix it in ExternalAppendOnlyMap too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what about Java 6 compatibility?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mateiz per my comment, that would no longer run in Java 6 as Integer.compare doesn't exist before Java 7.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, we can keep it as is then.

@SparkQA
Copy link

SparkQA commented Sep 24, 2014

QA tests have finished for PR 2514 at commit fa2a08f.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 24, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20741/

@SparkQA
Copy link

SparkQA commented Sep 24, 2014

QA tests have finished for PR 2514 at commit 83acb38.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 24, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20742/

@SparkQA
Copy link

SparkQA commented Sep 25, 2014

QA tests have started for PR 2514 at commit 01911e6.

  • This patch merges cleanly.

@jerryshao
Copy link
Contributor Author

Hi Matei, I modify the unit test to reproduce the exception. Seems it is difficult to reproduce this exception with small dataset manually, as this exception is unreliable.

Here is the reason searched from stack overflow

the exception behavior is unreliable: As long as you have small data sets (so small that a generated run may never gallop, as MIN_GALLOP is 7) or the generated runs always coincidentally generate a merge that never gallops, you will never receive the exception. Thus, without further reviewing the gallopRight method, we can come to the conclusion that you cannot rely on the exception: It may never be thrown no matter how wrong your comparator is.

So here I generate 1m random integer values to reproduce the exception. Seems in my local test with above 1000 rounds of test, this exception can always be produced. But it cannot be logically proved and still have chance to not throw exception.

Also I tested with 1k random integer plus some large data like Integer.MaxValue and Integer.MinValue, hard to reproduce this exception. And with 10k dataset, 1/3 of chance to get the exception.

I think unless someone familiar with TimSort can manually create effectively small dataset, potentially this unit test may fail.

So would you give me some suggestions? Thanks a lot.

@SparkQA
Copy link

SparkQA commented Sep 25, 2014

QA tests have finished for PR 2514 at commit 01911e6.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20788/

@mateiz
Copy link
Contributor

mateiz commented Sep 25, 2014

If you have a test that reproduces it in 1000 runs, it's fine. I'd improve it by seeding a random generator with a fixed seed that you saw produce the problem. TimSort is deterministic, so it will always throw it in that case (though this may break if we change the implementation).

sc = new SparkContext("local-cluster[1,1,512]", "test", conf)

// Using wrongHashOrdering to show integer overflow introduced exception.
val rand = new Random
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just seed this at a given value that you see causes the exception to occur.

@SparkQA
Copy link

SparkQA commented Sep 28, 2014

QA tests have started for PR 2514 at commit 6f3c302.

  • This patch merges cleanly.

@jerryshao
Copy link
Contributor Author

Hi Matei, thanks a lot for your suggestions. I've updated the code with fixed seed. Would you mind taking a look at this? Thanks a lot.

@SparkQA
Copy link

SparkQA commented Sep 28, 2014

QA tests have finished for PR 2514 at commit 6f3c302.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class IndexedRecordToJavaConverter extends Converter[IndexedRecord, JMap[String, Any]]
    • class AvroWrapperToJavaConverter extends Converter[Any, Any]

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20928/

@jerryshao
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Sep 28, 2014

QA tests have started for PR 2514 at commit 6f3c302.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 28, 2014

QA tests have finished for PR 2514 at commit 6f3c302.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class IndexedRecordToJavaConverter extends Converter[IndexedRecord, JMap[String, Any]]
    • class AvroWrapperToJavaConverter extends Converter[Any, Any]

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20931/

asfgit pushed a commit that referenced this pull request Sep 29, 2014
…sorting exception

Previous key comparison in `ExternalSorter` will get wrong sorting result or exception when key comparison overflows, details can be seen in [SPARK-3032](https://issues.apache.org/jira/browse/SPARK-3032). Here fix this and add a unit test to prove it.

Author: jerryshao <saisai.shao@intel.com>

Closes #2514 from jerryshao/SPARK-3032 and squashes the following commits:

6f3c302 [jerryshao] Improve the unit test according to comments
01911e6 [jerryshao] Change the test to show the contract violate exception
83acb38 [jerryshao] Minor changes according to comments
fa2a08f [jerryshao] Fix key comparison integer overflow introduced sorting exception

(cherry picked from commit dab1b0a)
Signed-off-by: Matei Zaharia <matei@databricks.com>
@mateiz
Copy link
Contributor

mateiz commented Sep 29, 2014

Thanks Saisai, this looks good. I've merged it.

@asfgit asfgit closed this in dab1b0a Sep 29, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants