[SPARK-9773] [ML] [PySpark] Add Python API for MultilayerPerceptronClassifier #8067

yanboliang · 2015-08-10T11:19:03Z

Add Python API for MultilayerPerceptronClassifier.

SparkQA · 2015-08-10T17:35:56Z

Test build #40288 has finished for PR 8067 at commit 70d1da9.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,
- class MultilayerPerceptronClassifierModel(JavaModel):

yanboliang · 2015-08-11T04:41:38Z

It looks like we hit SPARK-7379. I will try to track down the cause.

SparkQA · 2015-08-11T09:41:07Z

Test build #40433 has finished for PR 8067 at commit 8510817.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,
- class MultilayerPerceptronClassifierModel(JavaModel):

mengxr · 2015-08-11T17:17:08Z

@yanboliang Could you describe the bug in the PR description?

yanboliang · 2015-08-15T10:00:29Z

Jenkins, test this please.

SparkQA · 2015-08-15T10:28:32Z

Test build #40953 has finished for PR 8067 at commit 8c94570.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,
- class MultilayerPerceptronClassificationModel(JavaModel):
- case class FilterNode(condition: Expression, child: LocalNode) extends UnaryLocalNode
- abstract class LocalNode extends TreeNode[LocalNode]
- abstract class LeafLocalNode extends LocalNode
- abstract class UnaryLocalNode extends LocalNode
- case class ProjectNode(projectList: Seq[NamedExpression], child: LocalNode) extends UnaryLocalNode
- case class SeqScanNode(output: Seq[Attribute], data: Seq[InternalRow]) extends LeafLocalNode
- public final class UTF8String implements Comparable<UTF8String>, Externalizable

SparkQA · 2015-08-15T10:39:56Z

Test build #40954 has finished for PR 8067 at commit 8c94570.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class StringIndexerModel (
- class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,
- class MultilayerPerceptronClassificationModel(JavaModel):
- case class FilterNode(condition: Expression, child: LocalNode) extends UnaryLocalNode
- abstract class LocalNode extends TreeNode[LocalNode]
- abstract class LeafLocalNode extends LocalNode
- abstract class UnaryLocalNode extends LocalNode
- case class ProjectNode(projectList: Seq[NamedExpression], child: LocalNode) extends UnaryLocalNode
- case class SeqScanNode(output: Seq[Attribute], data: Seq[InternalRow]) extends LeafLocalNode
- public final class UTF8String implements Comparable<UTF8String>, Externalizable

yanboliang · 2015-08-15T13:11:14Z

Jenkins, test this please.

SparkQA · 2015-08-15T14:00:02Z

Test build #40958 has finished for PR 8067 at commit 8c94570.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,
- class MultilayerPerceptronClassificationModel(JavaModel):
- case class FilterNode(condition: Expression, child: LocalNode) extends UnaryLocalNode
- abstract class LocalNode extends TreeNode[LocalNode]
- abstract class LeafLocalNode extends LocalNode
- abstract class UnaryLocalNode extends LocalNode
- case class ProjectNode(projectList: Seq[NamedExpression], child: LocalNode) extends UnaryLocalNode
- case class SeqScanNode(output: Seq[Attribute], data: Seq[InternalRow]) extends LeafLocalNode
- public final class UTF8String implements Comparable<UTF8String>, Externalizable

mengxr · 2015-08-18T01:10:54Z

@yanboliang Shall we split this PR into two? One makes layers and weights public, which can be merged into 1.5, and the other for the Python API, which will be merged into 1.6.

yanboliang · 2015-08-18T01:55:46Z

@mengxr Sure. I have submitted #8263 to make layers and weights public, so this PR can focus on the Python API.

yanboliang · 2015-09-09T09:18:38Z

python/pyspark/ml/classification.py

+    """
+    Model fitted by MultilayerPerceptronClassifier.
+    """
+


Here we also need to expose layers as a property, but if we do it the same way as weights in the following lines, we hit SPARK-7379: it works in Python 2 but raises errors in Python 3.
I propose changing the layers of MultilayerPerceptronClassificationModel from Array[Int] to Vector on the Scala side, because PySpark handles Vector cleanly, and all the other ML interfaces I found use Vector rather than Array. @mengxr

mengxr · 2015-09-09

Shall we add a package-private method to Scala's MultilayerPerceptronClassificationModel (MPCM) that returns a Java list of integers?

yanboliang · 2015-09-10

Agreed, done.
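For readers following the exchange above, here is a minimal sketch of the agreed approach: the Python model exposes layers as a read-only property that delegates to a Java-friendly method (javaLayers, the name used in commit b093862). FakeJavaModel, ModelWrapper, and this _call_java helper are hypothetical stand-ins so that the sketch runs without Spark. They are not the PR's actual code.

```python
class FakeJavaModel(object):
    """Hypothetical stand-in for the JVM-side model; not Spark code."""

    def __init__(self, layers, weights):
        self._layers = layers
        self._weights = weights

    def javaLayers(self):
        # In this sketch the layer sizes come back as a plain Python list.
        return list(self._layers)

    def weights(self):
        return list(self._weights)


class ModelWrapper(object):
    """Hypothetical Python-side wrapper exposing read-only properties."""

    def __init__(self, java_model):
        self._java_obj = java_model

    def _call_java(self, name, *args):
        # Stand-in for a bridge call: look the method up by name and invoke it.
        return getattr(self._java_obj, name)(*args)

    @property
    def layers(self):
        # Delegates to the Java-friendly accessor instead of reading Array[Int].
        return self._call_java("javaLayers")

    @property
    def weights(self):
        return self._call_java("weights")


model = ModelWrapper(FakeJavaModel([2, 5, 2], [0.1, 0.2]))
layer_sizes = model.layers
```

The point of the design is that the Python side only ever sees a list of integers, so it does not depend on how a Scala Array[Int] is converted across the bridge.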

SparkQA · 2015-09-09T09:31:46Z

Test build #42198 has finished for PR 8067 at commit db3c676.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,
- class MultilayerPerceptronClassificationModel(JavaModel):

SparkQA · 2015-09-10T10:37:20Z

Test build #42261 has finished for PR 8067 at commit b093862.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,
- class MultilayerPerceptronClassificationModel(JavaModel):

mengxr · 2015-09-11T03:46:09Z

python/pyspark/ml/classification.py

+
+    >>> from pyspark.sql import Row
+    >>> from pyspark.mllib.linalg import Vectors
+    >>> df = sc.parallelize([


Use sqlContext.createDataFrame([ here instead of sc.parallelize([.

SparkQA · 2015-09-11T08:16:14Z

Test build #42315 has finished for PR 8067 at commit 5ac6a70.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,
- class MultilayerPerceptronClassificationModel(JavaModel):

yanboliang · 2015-09-11T13:31:54Z

python/pyspark/ml/classification.py

+                  maxIter=100, tol=1e-4, seed=None, layers=None, blockSize=128):
+        """
+        setParams(self, featuresCol="features", labelCol="label", predictionCol="prediction", \
+                  maxIter=100, tol=1e-4, seed=None, layers=[1, 1], blockSize=128)


At L882 we still keep layers=[1, 1] in the docstring to tell users the default value.
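As background for the signature change from layers=[1, 1] to layers=None, here is a minimal sketch of the mutable-default pitfall in plain Python. The function names are hypothetical and the append call is only there to make the sharing visible. This is not the PR's code.

```python
def set_layers_shared(layers=[1, 1]):
    """Hypothetical setter with a mutable default: the list is created once."""
    layers.append(0)  # mutation leaks into every later call that omits layers
    return layers


def set_layers_safe(layers=None):
    """Hypothetical setter using None as a sentinel for the documented default."""
    if layers is None:
        layers = [1, 1]  # a fresh list on every call
    layers.append(0)
    return layers


first = set_layers_shared()
second = set_layers_shared()
# Both names refer to the single default list, which has now grown twice.
shared_is_same = first is second

safe_first = set_layers_safe()
safe_second = set_layers_safe()
```

With the None sentinel each call starts from its own [1, 1], which is why the signature can use None while the docstring still shows the effective default.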

mengxr · 2015-09-11T15:52:54Z

LGTM. Merged into master. Thanks!

Add Python API for MultilayerPerceptronClassifier and fix bug

70d1da9

workaround for python 2&3 compatibility

8510817

yanboliang added 2 commits August 15, 2015 17:39

fix merge conflicts

abec976

Rename MultilayerPerceptronClassifierModel to MultilayerPerceptronClassificationModel

8c94570

yanboliang changed the title ~~[SPARK-9773] [ML] [PySpark] Add Python API for MultilayerPerceptronClassifier and fix bug~~ [SPARK-9773] [ML] [PySpark] Add Python API for MultilayerPerceptronClassifier and make "layers" public Aug 16, 2015

yanboliang changed the title ~~[SPARK-9773] [ML] [PySpark] Add Python API for MultilayerPerceptronClassifier and make "layers" public~~ [SPARK-9773] [ML] [PySpark] Add Python API for MultilayerPerceptronClassifier and make "layers" and "weights" public Aug 16, 2015

yanboliang changed the title ~~[SPARK-9773] [ML] [PySpark] Add Python API for MultilayerPerceptronClassifier and make "layers" and "weights" public~~ [SPARK-9773] [ML] [PySpark] Add Python API for MultilayerPerceptronClassifier Aug 18, 2015

merge to master

db3c676

yanboliang reviewed Sep 9, 2015

add Java-friendly method javaLayers

b093862

mengxr reviewed Sep 11, 2015

remove mutable default arguments and fix doc test

5ac6a70

yanboliang reviewed Sep 11, 2015

asfgit closed this in b01b262 Sep 11, 2015

yanboliang deleted the SPARK-9773 branch May 5, 2016 07:34
