
[SPARK-7333][MLLIB] Add BinaryClassificationEvaluator to PySpark #5885

Closed
wants to merge 3 commits

Conversation

mengxr
Contributor

@mengxr mengxr commented May 4, 2015

This PR adds BinaryClassificationEvaluator to the Python ML Pipelines API, as a simple wrapper around the Scala implementation. @oefirouz
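
For context, a rough usage sketch of the wrapper being added. It assumes an existing SQLContext named sqlContext; the toy data and column names are illustrative, not taken from this PR:

from pyspark.mllib.linalg import Vectors
from pyspark.ml.evaluation import BinaryClassificationEvaluator

# Toy scored data: (raw prediction vector, label)
scoreAndLabels = [(Vectors.dense([1.0 - s, s]), float(l))
                  for s, l in [(0.1, 0.0), (0.4, 0.0), (0.6, 1.0), (0.8, 1.0)]]
dataset = sqlContext.createDataFrame(scoreAndLabels, ["rawPrediction", "label"])

evaluator = BinaryClassificationEvaluator(rawPredictionCol="rawPrediction",
                                          labelCol="label")
print(evaluator.evaluate(dataset))  # areaUnderROC by default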

@SparkQA

SparkQA commented May 4, 2015

Test build #31751 has finished for PR 5885 at commit babdde7.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPredictionCol):
    • class HasRawPredictionCol(Params):
    • class Evaluator(object):
    • class JavaEvaluator(Evaluator, JavaWrapper):

@mengxr
Contributor Author

mengxr commented May 4, 2015

test this please

@SparkQA

SparkQA commented May 4, 2015

Test build #31752 has finished for PR 5885 at commit babdde7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait LDAOptimizer
    • class EMLDAOptimizer extends LDAOptimizer
    • class OnlineLDAOptimizer extends LDAOptimizer
    • class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPredictionCol):
    • class HasRawPredictionCol(Params):
    • class Evaluator(object):
    • class JavaEvaluator(Evaluator, JavaWrapper):

@mengxr
Contributor Author

mengxr commented May 4, 2015

test this please

@SparkQA

SparkQA commented May 4, 2015

Test build #31767 has finished for PR 5885 at commit babdde7.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPredictionCol):
    • class HasRawPredictionCol(Params):
    • class Evaluator(object):
    • class JavaEvaluator(Evaluator, JavaWrapper):

@mengxr
Contributor Author

mengxr commented May 4, 2015

test this please

@SparkQA

SparkQA commented May 4, 2015

Test build #31768 has finished for PR 5885 at commit babdde7.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPredictionCol):
    • class HasRawPredictionCol(Params):
    • class Evaluator(object):
    • class JavaEvaluator(Evaluator, JavaWrapper):

super(HasRawPredictionCol, self).__init__()
#: param for raw prediction column name
self.rawPredictionCol = Param(self, "rawPredictionCol", "raw prediction column name")
if 'rawPrediction' is not None:

I think this might be a mistake? You are comparing a string to None.

Contributor Author

No, this code is generated. Please see _shared_param_code_gen.py.


Ah, my mistake!
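
(As an aside, a minimal sketch of how a template-based generator can end up emitting a literal check like if 'rawPrediction' is not None: the default value's string is substituted directly into the template. Illustrative only; the template and function names below are made up, and the real logic lives in _shared_param_code_gen.py.)

init_template = """\
        super(Has$Name, self).__init__()
        #: param for $doc
        self.$name = Param(self, "$name", "$doc")
        if $defaultValueStr is not None:
            self._setDefault($name=$defaultValueStr)
"""

def gen_shared_param(name, doc, defaultValueStr):
    # Straight string substitution: whatever literal is passed as the
    # default ends up verbatim inside the `if ... is not None` check.
    upperCamelName = name[0].upper() + name[1:]
    return (init_template
            .replace("$Name", upperCamelName)
            .replace("$name", name)
            .replace("$doc", doc)
            .replace("$defaultValueStr", str(defaultValueStr)))

print(gen_shared_param("rawPredictionCol", "raw prediction column name", "'rawPrediction'"))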

@@ -652,7 +652,7 @@ def _python_to_sql_converter(dataType):

 if isinstance(dataType, StructType):
     names, types = zip(*[(f.name, f.dataType) for f in dataType.fields])
-    converters = map(_python_to_sql_converter, types)
+    converters = [_python_to_sql_converter(t) for t in types]
Contributor Author

In Python 3, map returns a map object instead of a list, so I changed it to [...], which is compatible with both 2 and 3.
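
(A quick stand-alone illustration of the 2-vs-3 difference; this snippet is hypothetical and not part of the patch:)

types = ["a", "b", "c"]

converters = map(len, types)
# Python 2: `converters` is a plain list, e.g. [1, 1, 1]
# Python 3: `converters` is a lazy map object; converters[0] raises TypeError

converters = [len(t) for t in types]
print(converters[0])  # a real list on both Python 2 and 3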

@SparkQA

SparkQA commented May 5, 2015

Test build #31840 has finished for PR 5885 at commit 25d7451.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPredictionCol):
    • class HasRawPredictionCol(Params):
    • class Evaluator(object):
    • class JavaEvaluator(Evaluator, JavaWrapper):

@mengxr
Contributor Author

mengxr commented May 5, 2015

test this please

@SparkQA

SparkQA commented May 5, 2015

Test build #31841 has finished for PR 5885 at commit 25d7451.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPredictionCol):
    • class HasRawPredictionCol(Params):
    • class Evaluator(object):
    • class JavaEvaluator(Evaluator, JavaWrapper):

@mengxr
Contributor Author

mengxr commented May 5, 2015

test this please

__metaclass__ = ABCMeta

@abstractmethod
def evaluate(self, dataset, params={}):
Member

Should "params" be "paramMap" to match Scala?

Contributor Author

Python cannot overload methods. So it should be both paramMaps and paramMap. I used params here.

Member

I realized I didn't get this. What does "it should be both paramMaps and paramMap" mean?
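
(To make the point concrete, a minimal sketch of the single-entry-point pattern under discussion; the class and metric names below are placeholders, not the actual pyspark/ml code:)

from abc import ABCMeta, abstractmethod

class Evaluator(object):
    # Python 2 style metaclass declaration, as in the diff above;
    # Python 3 syntax would be `class Evaluator(metaclass=ABCMeta)`.
    __metaclass__ = ABCMeta

    @abstractmethod
    def evaluate(self, dataset, params={}):
        # A single `params` argument is used instead of separate
        # paramMap/paramMaps names, since Python cannot overload evaluate().
        raise NotImplementedError()

class ConstantEvaluator(Evaluator):  # hypothetical subclass for illustration
    def evaluate(self, dataset, params={}):
        return 0.5  # placeholder metric value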

@jkbradley
Member

LGTM pending tests

@SparkQA

SparkQA commented May 5, 2015

Test build #31887 has finished for PR 5885 at commit 25d7451.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPredictionCol):
    • class HasRawPredictionCol(Params):
    • class Evaluator(object):
    • class JavaEvaluator(Evaluator, JavaWrapper):

@mengxr
Contributor Author

mengxr commented May 5, 2015

Merged into master and branch-1.4.

@asfgit asfgit closed this in ee374e8 May 5, 2015
asfgit pushed a commit that referenced this pull request May 5, 2015
This PR adds `BinaryClassificationEvaluator` to Python ML Pipelines API, which is a simple wrapper of the Scala implementation. oefirouz

Author: Xiangrui Meng <meng@databricks.com>

Closes #5885 from mengxr/SPARK-7333 and squashes the following commits:

25d7451 [Xiangrui Meng] fix tests in python 3
babdde7 [Xiangrui Meng] fix doc
cb51e6a [Xiangrui Meng] add BinaryClassificationEvaluator in PySpark

(cherry picked from commit ee374e8)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015