[SPARK-10188] [Pyspark] Pyspark CrossValidator with RMSE selects incorrect model #8399

Closed
wants to merge 6 commits into from


noel-smith
Contributor

  • Added an isLargerBetter() method to the PySpark Evaluator to match the Scala version.
  • JavaEvaluator delegates isLargerBetter() to the underlying Scala object.
  • Added a check for isLargerBetter() in CrossValidator to determine whether to use argmin or argmax.
  • Added test cases for the case where smaller is better (RMSE) and the case where larger is better (R-squared).
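The CrossValidator change in the third bullet comes down to choosing argmax or argmin over the per-param-map metrics depending on isLargerBetter(). The sketch below illustrates that rule in plain Python so it runs without Spark. SketchEvaluator, RMSEEvaluator, R2Evaluator, best_index and buggy_best_index are hypothetical stand-ins, not PySpark classes, and the candidate predictions are made-up data. It assumes each candidate's metric has already been computed (the real CrossValidator first averages metrics over folds), and buggy_best_index is assumed to model the always-argmax rule that the check replaces.

```python
import math


class SketchEvaluator:
    """Hypothetical stand-in for a PySpark Evaluator (no Spark needed)."""

    def evaluate(self, labels, predictions):
        raise NotImplementedError

    def isLargerBetter(self):
        # Default, as in the Scala API: larger metric values are better.
        return True


class RMSEEvaluator(SketchEvaluator):
    def evaluate(self, labels, predictions):
        squared = [(y - p) ** 2 for y, p in zip(labels, predictions)]
        return math.sqrt(sum(squared) / len(squared))

    def isLargerBetter(self):
        # RMSE is an error metric, so smaller is better.
        return False


class R2Evaluator(SketchEvaluator):
    # Inherits isLargerBetter() -> True: higher R-squared is better.
    def evaluate(self, labels, predictions):
        mean = sum(labels) / len(labels)
        ss_res = sum((y - p) ** 2 for y, p in zip(labels, predictions))
        ss_tot = sum((y - mean) ** 2 for y in labels)
        return 1.0 - ss_res / ss_tot


def best_index(metrics, evaluator):
    """Fixed rule: argmax if larger is better, otherwise argmin."""
    indices = range(len(metrics))
    if evaluator.isLargerBetter():
        return max(indices, key=lambda i: metrics[i])
    return min(indices, key=lambda i: metrics[i])


def buggy_best_index(metrics):
    """Assumed old rule: always argmax, regardless of the metric's direction."""
    return max(range(len(metrics)), key=lambda i: metrics[i])


labels = [1.0, 2.0, 3.0, 4.0]
# Made-up predictions from three hypothetical candidate param maps.
candidates = [
    [1.1, 1.9, 3.2, 3.8],  # close fit
    [2.0, 2.0, 2.0, 2.0],  # constant prediction
    [0.0, 5.0, 1.0, 6.0],  # very poor fit
]

for evaluator in (RMSEEvaluator(), R2Evaluator()):
    metrics = [evaluator.evaluate(labels, p) for p in candidates]
    print(type(evaluator).__name__, best_index(metrics, evaluator), buggy_best_index(metrics))
# RMSEEvaluator 0 2
# R2Evaluator 0 0
```

With RMSE, the always-argmax rule picks the worst candidate (index 2), consistent with the symptom in the PR title, while the isLargerBetter() check picks the close fit (index 0). For R-squared both rules agree, which is why only smaller-is-better metrics were affected.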

(This contribution is my original work, and I license the work to the project under Spark's open source license.)

@jkbradley
Member

ok to test

@jkbradley
Member

Thanks for the PR! I'll take a look now.

@@ -66,6 +66,9 @@ def evaluate(self, dataset, params=None):
        else:
            raise ValueError("Params must be a param map but got %s." % type(params))

    def isLargerBetter(self):
Member


Can you please copy the doc from Scala here? (There is no need to copy it to child classes, since the "inherit_doc" decorator will handle that.)
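The review comment and the second bullet involve two mechanisms: a documented default isLargerBetter() on the base class, and a Java-backed subclass that forwards the call to its JVM object while inheriting the base docstring. The sketch below imitates both without Spark or Py4J. EvaluatorSketch, JavaEvaluatorSketch, FakeScalaEvaluator and inherit_doc_sketch are hypothetical names; the docstring wording is modeled on the Scala Evaluator.isLargerBetter doc, and the metric-to-direction mapping in FakeScalaEvaluator is illustrative only, not the Scala implementation.

```python
def inherit_doc_sketch(cls):
    """Copy missing docstrings of public methods from the first base that has one.

    Hypothetical, simplified imitation of the "inherit_doc" mechanism.
    """
    for name, member in vars(cls).items():
        # Only public callables that lack their own docstring.
        if name.startswith("_") or not callable(member) or member.__doc__:
            continue
        for parent in cls.__bases__:
            parent_member = getattr(parent, name, None)
            if parent_member is not None and parent_member.__doc__:
                member.__doc__ = parent_member.__doc__
                break
    return cls


class EvaluatorSketch:
    def isLargerBetter(self):
        """
        Indicates whether the metric returned by evaluate() should be
        maximized (True, default) or minimized (False).
        A given evaluator may support multiple metrics which may be
        maximized or minimized.
        """
        return True


class FakeScalaEvaluator:
    """Hypothetical stand-in for the JVM object behind a Java-backed evaluator."""

    def __init__(self, metric_name):
        self.metric_name = metric_name

    def isLargerBetter(self):
        # Illustrative mapping only: error metrics are minimized.
        return self.metric_name not in ("rmse", "mse", "mae")


@inherit_doc_sketch
class JavaEvaluatorSketch(EvaluatorSketch):
    """Forwards isLargerBetter() to the wrapped JVM-side object."""

    def __init__(self, java_obj):
        self._java_obj = java_obj

    def isLargerBetter(self):
        # No docstring here: the decorator copies the base-class one.
        # Delegating keeps Python and the Scala object in agreement.
        return self._java_obj.isLargerBetter()


ev = JavaEvaluatorSketch(FakeScalaEvaluator("rmse"))
print(ev.isLargerBetter())  # False
print(JavaEvaluatorSketch.isLargerBetter.__doc__ == EvaluatorSketch.isLargerBetter.__doc__)  # True
```

Because the subclass's isLargerBetter() has no docstring of its own, the decorator fills it in from the base class, which is the reason the reviewer says the doc only needs to be written once.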

@jkbradley
Member

Looks good except for those 2 items.

@SparkQA

SparkQA commented Aug 26, 2015

Test build #1695 has finished for PR 8399 at commit 7794cf7.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 27, 2015

Test build #1699 has finished for PR 8399 at commit 7794cf7.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

Ping @mengxr: in case I can't check this soon, it would be great to get this into 1.5 if there is an RC3.

@SparkQA

SparkQA commented Aug 28, 2015

Test build #1700 has finished for PR 8399 at commit bada453.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@noel-smith
Contributor Author

That would be great; I've just messaged him. If there are any other changes you need to get this into 1.5, I'll get them in ASAP today.

@jkbradley
Member

No, this LGTM. I'll merge this with branch-1.5 and master now. Thanks very much!

asfgit pushed a commit that referenced this pull request Aug 28, 2015
…rrect model

* Added an isLargerBetter() method to the PySpark Evaluator to match the Scala version.
* JavaEvaluator delegates isLargerBetter() to the underlying Scala object.
* Added a check for isLargerBetter() in CrossValidator to determine whether to use argmin or argmax.
* Added test cases for the case where smaller is better (RMSE) and the case where larger is better (R-squared).

(This contribution is my original work, and I license the work to the project under Spark's open source license.)

Author: noelsmith <mail@noelsmith.com>

Closes #8399 from noel-smith/pyspark-rmse-xval-fix.

(cherry picked from commit 7583681)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
asfgit closed this in 7583681 on Aug 28, 2015
3 participants