[SPARK-16356][Follow-up][ML] Enforce ML test of exception for local/distributed Dataset. #15261

yanboliang · 2016-09-27T11:15:17Z

What changes were proposed in this pull request?

#14035 added testImplicits to the ML unit tests and promoted the use of toDF(), but left one minor issue in VectorIndexerSuite. If we create the DataFrame with Seq(...).toDF(), one of the test cases throws a different error/exception than it does with sc.parallelize(Seq(...)).toDF().

After an in-depth study, I found this is caused by the different behavior of local and distributed Datasets when a UDF fails at an assert. If the data is a local Dataset, it throws AssertionError directly; if the data is a distributed Dataset, it throws SparkException, which wraps the AssertionError. I think we should strengthen this test to cover both cases.
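To make the difference concrete, here is a minimal sketch rather than the code in this patch. It assumes a Spark 2.x-style `SparkSession` running in local mode; the object name, the `thrownClass` helper, and the always-failing UDF are hypothetical and exist only for illustration. It prints the exception class seen on each path instead of asserting one, since the exact behavior may differ across Spark versions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object LocalVsDistributedAssert {
  // Hypothetical helper: runs `body` and returns the class name of
  // whatever it throws, or None if it completes normally.
  def thrownClass(body: => Unit): Option[String] =
    try { body; None }
    catch { case t: Throwable => Some(t.getClass.getName) }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("local-vs-distributed-assert")
      .getOrCreate()
    import spark.implicits._

    try {
      // Hypothetical UDF whose assert fails for every non-negative input.
      val failing = udf { (x: Int) =>
        assert(x < 0, s"unexpected value $x")
        x
      }

      // Local Dataset: built directly from a driver-side Seq.
      val local: DataFrame = Seq(1, 2, 3).toDF("x")
      // Distributed Dataset: built from an RDD with two partitions.
      val distributed: DataFrame =
        spark.sparkContext.parallelize(Seq(1, 2, 3), 2).toDF("x")

      // Report which exception type surfaces on each path.
      println("local:       " + thrownClass { local.select(failing(col("x"))).collect() })
      println("distributed: " + thrownClass { distributed.select(failing(col("x"))).collect() })
    } finally {
      spark.stop()
    }
  }
}
```

According to the description above, the local path should surface the AssertionError itself, while the distributed path should surface a SparkException that wraps it.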

How was this patch tested?

Unit test.

yanboliang · 2016-09-27T11:28:07Z

cc @HyukjinKwon

HyukjinKwon · 2016-09-27T11:56:47Z

Oh, I see. Thanks for looking into this. +1 for this PR.

HyukjinKwon · 2016-09-27T12:10:50Z

So, this is roughly an error from the driver side vs. the executor side. I see.

SparkQA · 2016-09-27T12:15:28Z

Test build #65964 has finished for PR 15261 at commit c24021e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2016-09-27T12:22:45Z

@HyukjinKwon The root cause is that Spark supports creating a local Dataset, which may not trigger a Spark job. This is consistent with the design of Dataset, and in most cases local and distributed Datasets behave the same, except for whether the exception is wrapped. We need to enforce the test in ML. If the Spark SQL guys unify them in the future, we can make the corresponding change. Thanks!

yanboliang · 2016-09-27T12:25:10Z

@srowen Would you mind taking a look when you are available? Thanks.

srowen · 2016-09-28T13:26:39Z

mllib/src/test/scala/org/apache/spark/ml/feature/VectorIndexerSuite.scala

@@ -121,10 +119,17 @@ class VectorIndexerSuite extends SparkFunSuite with MLlibTestSparkContext

    model.transform(densePoints1) // should work
    model.transform(sparsePoints1) // should work
-    intercept[SparkException] {
+    // If the data is local Dataset, it throws AssertionError directly.
+    intercept[AssertionError] {


Maybe off-topic for this review, but is this an assertion error? Bad input shouldn't cause an assertion to trip.

yanboliang · 2016-09-29

Yes, this is indeed a problem. Let's find all similar cases and resolve them in a separate PR. Thanks.
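For illustration only, the distinction raised in this review thread can be sketched in plain Scala. The object and method names below are hypothetical and are not taken from VectorIndexer. In Scala's `Predef`, `assert` throws `java.lang.AssertionError`, which signals a violated internal invariant, while `require` throws `IllegalArgumentException`, which is the conventional signal for bad caller input.

```scala
object InputValidation {
  // Hypothetical check using assert: a failure throws java.lang.AssertionError.
  def checkWithAssert(numFeatures: Int, actual: Int): Unit =
    assert(actual == numFeatures, s"expected $numFeatures features but found $actual")

  // Hypothetical check using require: a failure throws IllegalArgumentException.
  def checkWithRequire(numFeatures: Int, actual: Int): Unit =
    require(actual == numFeatures, s"expected $numFeatures features but found $actual")
}
```

Which of the two a given check should use is a design decision for the follow-up PR mentioned above; this sketch only shows the two failure types.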

yanboliang · 2016-09-29T07:53:54Z

Merged into master, thanks for the review.

Enforce ML test of exception for local/distributed Dataset.

c24021e

srowen reviewed Sep 28, 2016


srowen approved these changes Sep 28, 2016


asfgit closed this in a19a1bb Sep 29, 2016

yanboliang deleted the spark-16356 branch September 29, 2016 07:57
