
[SPARK-7333][MLLIB] Add BinaryClassificationEvaluator to PySpark #5885

Closed
wants to merge 3 commits

Conversation

mengxr
Contributor

@mengxr mengxr commented May 4, 2015

This PR adds BinaryClassificationEvaluator to the Python ML Pipelines API, as a simple wrapper around the Scala implementation. @oefirouz
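
For context, a rough usage sketch of the wrapper being added. It assumes an existing SQLContext named sqlContext; the toy data and column names are illustrative, not taken from this PR:

from pyspark.mllib.linalg import Vectors
from pyspark.ml.evaluation import BinaryClassificationEvaluator

# Toy scored data: (raw prediction vector, label)
scoreAndLabels = [(Vectors.dense([1.0 - s, s]), float(l))
                  for s, l in [(0.1, 0.0), (0.4, 0.0), (0.6, 1.0), (0.8, 1.0)]]
dataset = sqlContext.createDataFrame(scoreAndLabels, ["rawPrediction", "label"])

evaluator = BinaryClassificationEvaluator(rawPredictionCol="rawPrediction",
                                          labelCol="label")
print(evaluator.evaluate(dataset))  # areaUnderROC by default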

@SparkQA

SparkQA commented May 4, 2015

Test build #31751 has finished for PR 5885 at commit babdde7.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPredictionCol):
    • class HasRawPredictionCol(Params):
    • class Evaluator(object):
    • class JavaEvaluator(Evaluator, JavaWrapper):

@mengxr
Contributor Author

mengxr commented May 4, 2015

test this please

@SparkQA

SparkQA commented May 4, 2015

Test build #31752 has finished for PR 5885 at commit babdde7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait LDAOptimizer
    • class EMLDAOptimizer extends LDAOptimizer
    • class OnlineLDAOptimizer extends LDAOptimizer
    • class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPredictionCol):
    • class HasRawPredictionCol(Params):
    • class Evaluator(object):
    • class JavaEvaluator(Evaluator, JavaWrapper):

@mengxr
Contributor Author

mengxr commented May 4, 2015

test this please

@SparkQA

SparkQA commented May 4, 2015

Test build #31767 has finished for PR 5885 at commit babdde7.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPredictionCol):
    • class HasRawPredictionCol(Params):
    • class Evaluator(object):
    • class JavaEvaluator(Evaluator, JavaWrapper):

@mengxr
Contributor Author

mengxr commented May 4, 2015

test this please

@SparkQA

SparkQA commented May 4, 2015

Test build #31768 has finished for PR 5885 at commit babdde7.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPredictionCol):
    • class HasRawPredictionCol(Params):
    • class Evaluator(object):
    • class JavaEvaluator(Evaluator, JavaWrapper):

super(HasRawPredictionCol, self).__init__()
#: param for raw prediction column name
self.rawPredictionCol = Param(self, "rawPredictionCol", "raw prediction column name")
if 'rawPrediction' is not None:

I think this might be a mistake? You are comparing a string to None.

Contributor Author

No, this code is generated. Please see _shared_param_code_gen.py.


Ah, my mistake!
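
(As an aside, a minimal sketch of how a template-based generator can end up emitting a literal check like if 'rawPrediction' is not None: the default value's string is substituted directly into the template. Illustrative only; the template and function names below are made up, and the real logic lives in _shared_param_code_gen.py.)

init_template = """\
        super(Has$Name, self).__init__()
        #: param for $doc
        self.$name = Param(self, "$name", "$doc")
        if $defaultValueStr is not None:
            self._setDefault($name=$defaultValueStr)
"""

def gen_shared_param(name, doc, defaultValueStr):
    # Straight string substitution: whatever literal is passed as the
    # default ends up verbatim inside the `if ... is not None` check.
    upperCamelName = name[0].upper() + name[1:]
    return (init_template
            .replace("$Name", upperCamelName)
            .replace("$name", name)
            .replace("$doc", doc)
            .replace("$defaultValueStr", str(defaultValueStr)))

print(gen_shared_param("rawPredictionCol", "raw prediction column name", "'rawPrediction'"))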

@@ -652,7 +652,7 @@ def _python_to_sql_converter(dataType):

 if isinstance(dataType, StructType):
     names, types = zip(*[(f.name, f.dataType) for f in dataType.fields])
-    converters = map(_python_to_sql_converter, types)
+    converters = [_python_to_sql_converter(t) for t in types]
Contributor Author

In Python 3, map returns a map object instead of a list, so I changed it to [...], which is compatible with both 2 and 3.
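
(A quick stand-alone illustration of the 2-vs-3 difference; this snippet is hypothetical and not part of the patch:)

types = ["a", "b", "c"]

converters = map(len, types)
# Python 2: `converters` is a plain list, e.g. [1, 1, 1]
# Python 3: `converters` is a lazy map object; converters[0] raises TypeError

converters = [len(t) for t in types]
print(converters[0])  # a real list on both Python 2 and 3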

@SparkQA

SparkQA commented May 5, 2015

Test build #31840 has finished for PR 5885 at commit 25d7451.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPredictionCol):
    • class HasRawPredictionCol(Params):
    • class Evaluator(object):
    • class JavaEvaluator(Evaluator, JavaWrapper):

@mengxr
Contributor Author

mengxr commented May 5, 2015

test this please

@SparkQA

SparkQA commented May 5, 2015

Test build #31841 has finished for PR 5885 at commit 25d7451.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPredictionCol):
    • class HasRawPredictionCol(Params):
    • class Evaluator(object):
    • class JavaEvaluator(Evaluator, JavaWrapper):

@mengxr
Contributor Author

mengxr commented May 5, 2015

test this please

__metaclass__ = ABCMeta

@abstractmethod
def evaluate(self, dataset, params={}):
Member

Should "params" be "paramMap" to match Scala?

Contributor Author

Python cannot overload methods. So it should be both paramMaps and paramMap. I used params here.

Member

I realized I didn't get this. What does "it should be both paramMaps and paramMap" mean?
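
(To make the point concrete, a minimal sketch of the single-entry-point pattern under discussion; the class and metric names below are placeholders, not the actual pyspark/ml code:)

from abc import ABCMeta, abstractmethod

class Evaluator(object):
    # Python 2 style metaclass declaration, as in the diff above;
    # Python 3 syntax would be `class Evaluator(metaclass=ABCMeta)`.
    __metaclass__ = ABCMeta

    @abstractmethod
    def evaluate(self, dataset, params={}):
        # A single `params` argument is used instead of separate
        # paramMap/paramMaps names, since Python cannot overload evaluate().
        raise NotImplementedError()

class ConstantEvaluator(Evaluator):  # hypothetical subclass for illustration
    def evaluate(self, dataset, params={}):
        return 0.5  # placeholder metric value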

@jkbradley
Member

LGTM pending tests

@SparkQA

SparkQA commented May 5, 2015

Test build #31887 has finished for PR 5885 at commit 25d7451.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPredictionCol):
    • class HasRawPredictionCol(Params):
    • class Evaluator(object):
    • class JavaEvaluator(Evaluator, JavaWrapper):

@mengxr
Contributor Author

mengxr commented May 5, 2015

Merged into master and branch-1.4.

@asfgit asfgit closed this in ee374e8 May 5, 2015
asfgit pushed a commit that referenced this pull request May 5, 2015
This PR adds `BinaryClassificationEvaluator` to Python ML Pipelines API, which is a simple wrapper of the Scala implementation. oefirouz

Author: Xiangrui Meng <meng@databricks.com>

Closes #5885 from mengxr/SPARK-7333 and squashes the following commits:

25d7451 [Xiangrui Meng] fix tests in python 3
babdde7 [Xiangrui Meng] fix doc
cb51e6a [Xiangrui Meng] add BinaryClassificationEvaluator in PySpark

(cherry picked from commit ee374e8)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015