[SPARK-10188] [Pyspark] Pyspark CrossValidator with RMSE selects incorrect model #8399

noel-smith · 2015-08-24T19:07:18Z

Added isLargerBetter() method to Pyspark Evaluator to match the Scala version.
JavaEvaluator delegates isLargerBetter() to underlying Scala object.
Added check for isLargerBetter() in CrossValidator to determine whether to use argmin or argmax.
Added test cases for where smaller is better (RMSE) and larger is better (R-Squared).
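
For readers skimming the thread, the selection rule described above can be sketched in plain Python. This is a minimal illustration, not the code merged in this PR: `select_best_index` and its parameters are hypothetical names, and the metrics are assumed to be a plain list with one averaged value per parameter map.

```python
def select_best_index(metrics, is_larger_better):
    """Return the index of the best metric in a list of floats.

    Hypothetical helper for illustration only. It picks the largest
    value when is_larger_better is True (e.g. R-squared) and the
    smallest value otherwise (e.g. RMSE).
    """
    if not metrics:
        raise ValueError("metrics must not be empty")
    # max/min over the indices, compared by the metric at each index,
    # gives the argmax/argmax-style index without needing NumPy.
    pick = max if is_larger_better else min
    return pick(range(len(metrics)), key=lambda i: metrics[i])


# RMSE: smaller is better, so the second candidate should win.
rmse_per_candidate = [0.9, 0.4, 0.7]
best_rmse = select_best_index(rmse_per_candidate, is_larger_better=False)

# R-squared: larger is better, so the third candidate should win.
r2_per_candidate = [0.55, 0.61, 0.83]
best_r2 = select_best_index(r2_per_candidate, is_larger_better=True)
```

Always taking the maximum, as the description implies the code did before this change, would have picked the candidate with the worst RMSE.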

(This contribution is my original work and I license the work to the project under Spark's open source license.)

jkbradley · 2015-08-26T22:27:37Z

ok to test

jkbradley · 2015-08-26T22:27:48Z

Thanks for the PR! I'll take a look now.

jkbradley · 2015-08-26T23:23:10Z

python/pyspark/ml/evaluation.py

@@ -66,6 +66,9 @@ def evaluate(self, dataset, params=None):
        else:
            raise ValueError("Params must be a param map but got %s." % type(params))

+    def isLargerBetter(self):


Can you please copy the doc from Scala here? (no need to copy to child classes since the "inherit_doc" tag will handle that)
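
A rough sketch of what the requested docstring could look like follows. The class names here are hypothetical stand-ins rather than PySpark's own classes, the docstring wording is a paraphrase and not a verbatim copy of the Scala doc, and the dataset is assumed to be a plain list of (prediction, label) pairs.

```python
class SketchEvaluator(object):
    """Hypothetical stand-in for an evaluator base class (not PySpark's)."""

    def evaluate(self, dataset):
        raise NotImplementedError()

    def isLargerBetter(self):
        """
        Indicates whether the metric returned by evaluate() should be
        maximized (True, the default) or minimized (False).
        """
        return True


class SketchRmseEvaluator(SketchEvaluator):
    """Hypothetical RMSE evaluator: a smaller error is better."""

    def evaluate(self, dataset):
        # dataset is assumed to be a non-empty list of
        # (prediction, label) pairs of floats.
        n = len(dataset)
        return (sum((p - y) ** 2 for p, y in dataset) / n) ** 0.5

    def isLargerBetter(self):
        # RMSE is an error measure, so it should be minimized.
        return False
```

Documenting the method once on the base class keeps the contract in a single place; subclasses override only the return value where the default does not hold.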

jkbradley · 2015-08-26T23:23:57Z

Looks good except for those 2 items.

SparkQA · 2015-08-26T23:26:24Z

Test build #1695 has finished for PR 8399 at commit 7794cf7.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-08-27T22:49:02Z

Test build #1699 has finished for PR 8399 at commit 7794cf7.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2015-08-28T01:56:56Z

Ping @mengxr: in case I can't check this soon, it would be great to get this into 1.5 if there is an RC3.

SparkQA · 2015-08-28T02:11:53Z

Test build #1700 has finished for PR 8399 at commit bada453.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

noel-smith · 2015-08-28T06:47:13Z

That would be great - I've just messaged him. If there are any other changes you need to get this into 1.5 I'll get them in ASAP today.

jkbradley · 2015-08-28T06:59:03Z

No, this LGTM. I'll merge this with branch-1.5 and master now. Thanks very much!

…rrect model

* Added isLargerBetter() method to Pyspark Evaluator to match the Scala version.
* JavaEvaluator delegates isLargerBetter() to underlying Scala object.
* Added check for isLargerBetter() in CrossValidator to determine whether to use argmin or argmax.
* Added test cases for where smaller is better (RMSE) and larger is better (R-Squared).

(This contribution is my original work and that I license the work to the project under Sparks' open source license)

Author: noelsmith <mail@noelsmith.com>

Closes #8399 from noel-smith/pyspark-rmse-xval-fix.

(cherry picked from commit 7583681)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>

noel-smith added 4 commits August 24, 2015 18:02

Added test for cross validation

d00357e

Added/fixed tests for cross validation

6cd4ed1

Removed print statements

63b3835

Added checks for isLargerBetter()

7794cf7

jkbradley reviewed Aug 26, 2015

Added fixes from PR notes. Fixed style test failures.

bada453

Style fix - missed a line of whitespace

84c8a40

asfgit closed this in 7583681 Aug 28, 2015
