
[SPARK-9773] [ML] [PySpark] Add Python API for MultilayerPerceptronClassifier #8067

Closed
wants to merge 7 commits into from

yanboliang
Contributor

Add Python API for MultilayerPerceptronClassifier.

@SparkQA

SparkQA commented Aug 10, 2015

Test build #40288 has finished for PR 8067 at commit 70d1da9.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,
    • class MultilayerPerceptronClassifierModel(JavaModel):

@yanboliang
Contributor Author

It looks like we hit SPARK-7379; I'm trying to find the cause.

@SparkQA

SparkQA commented Aug 11, 2015

Test build #40433 has finished for PR 8067 at commit 8510817.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,
    • class MultilayerPerceptronClassifierModel(JavaModel):

@mengxr
Contributor

mengxr commented Aug 11, 2015

@yanboliang Could you describe the bug in the PR description?

@yanboliang
Contributor Author

Jenkins, test this please.

@SparkQA

SparkQA commented Aug 15, 2015

Test build #40953 has finished for PR 8067 at commit 8c94570.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,
    • class MultilayerPerceptronClassificationModel(JavaModel):
    • case class FilterNode(condition: Expression, child: LocalNode) extends UnaryLocalNode
    • abstract class LocalNode extends TreeNode[LocalNode]
    • abstract class LeafLocalNode extends LocalNode
    • abstract class UnaryLocalNode extends LocalNode
    • case class ProjectNode(projectList: Seq[NamedExpression], child: LocalNode) extends UnaryLocalNode
    • case class SeqScanNode(output: Seq[Attribute], data: Seq[InternalRow]) extends LeafLocalNode
    • public final class UTF8String implements Comparable<UTF8String>, Externalizable

@SparkQA

SparkQA commented Aug 15, 2015

Test build #40954 has finished for PR 8067 at commit 8c94570.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StringIndexerModel (
    • class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,
    • class MultilayerPerceptronClassificationModel(JavaModel):
    • case class FilterNode(condition: Expression, child: LocalNode) extends UnaryLocalNode
    • abstract class LocalNode extends TreeNode[LocalNode]
    • abstract class LeafLocalNode extends LocalNode
    • abstract class UnaryLocalNode extends LocalNode
    • case class ProjectNode(projectList: Seq[NamedExpression], child: LocalNode) extends UnaryLocalNode
    • case class SeqScanNode(output: Seq[Attribute], data: Seq[InternalRow]) extends LeafLocalNode
    • public final class UTF8String implements Comparable<UTF8String>, Externalizable

@yanboliang
Contributor Author

Jenkins, test this please.

@SparkQA

SparkQA commented Aug 15, 2015

Test build #40958 has finished for PR 8067 at commit 8c94570.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,
    • class MultilayerPerceptronClassificationModel(JavaModel):
    • case class FilterNode(condition: Expression, child: LocalNode) extends UnaryLocalNode
    • abstract class LocalNode extends TreeNode[LocalNode]
    • abstract class LeafLocalNode extends LocalNode
    • abstract class UnaryLocalNode extends LocalNode
    • case class ProjectNode(projectList: Seq[NamedExpression], child: LocalNode) extends UnaryLocalNode
    • case class SeqScanNode(output: Seq[Attribute], data: Seq[InternalRow]) extends LeafLocalNode
    • public final class UTF8String implements Comparable<UTF8String>, Externalizable

@yanboliang yanboliang changed the title [SPARK-9773] [ML] [PySpark] Add Python API for MultilayerPerceptronClassifier and fix bug [SPARK-9773] [ML] [PySpark] Add Python API for MultilayerPerceptronClassifier and make "layers" public Aug 16, 2015
@yanboliang yanboliang changed the title [SPARK-9773] [ML] [PySpark] Add Python API for MultilayerPerceptronClassifier and make "layers" public [SPARK-9773] [ML] [PySpark] Add Python API for MultilayerPerceptronClassifier and make "layers" and "weights" public Aug 16, 2015
@mengxr
Contributor

mengxr commented Aug 18, 2015

@yanboliang Shall we split this PR into two? One would make layers and weights public and could be merged into 1.5; the other would add the Python API and be merged into 1.6.

@yanboliang
Contributor Author

@mengxr Sure. I have submitted #8263 to make layers and weights public, and this PR will now focus on the Python API.

@yanboliang yanboliang changed the title [SPARK-9773] [ML] [PySpark] Add Python API for MultilayerPerceptronClassifier and make "layers" and "weights" public [SPARK-9773] [ML] [PySpark] Add Python API for MultilayerPerceptronClassifier Aug 18, 2015
"""
Model fitted by MultilayerPerceptronClassifier.
"""

Contributor Author

Here we also need to expose layers as a property, but if we do it the same way as weights in the following lines, we hit SPARK-7379: it works in Python 2 but raises errors in Python 3.
I propose changing layers of MultilayerPerceptronClassificationModel from Array[Int] to Vector on the Scala side, because PySpark handles Vector cleanly, and I found that all other ML interfaces use Vector rather than Array. @mengxr

Contributor

Shall we add a package-private method to the Scala MultilayerPerceptronClassificationModel (MPCM) that returns a Java list of integers?

Contributor Author

Agreed, done.
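The agreed approach can be sketched in plain Python. This is a minimal sketch, not the merged code: FakeJavaModel is a hypothetical stand-in for the Py4J-backed Scala model, and the method name javaLayers is an assumption for illustration. The Python property asks the JVM side for a list of integers and converts it to a plain Python list, so no Array[Int] has to cross the Py4J boundary.

```python
class FakeJavaModel:
    """Hypothetical stand-in for the Scala model reached through Py4J."""

    def __init__(self, layers, weights):
        self._layers = list(layers)
        self._weights = list(weights)

    def javaLayers(self):
        # Assumed package-private helper: returns a Java list of integers,
        # which Py4J would present to Python as a list-like object.
        return list(self._layers)

    def weights(self):
        return list(self._weights)


class MultilayerPerceptronClassificationModelSketch:
    """Hypothetical Python wrapper exposing model attributes as read-only properties."""

    def __init__(self, java_model):
        self._java_obj = java_model

    def _call_java(self, name, *args):
        # Delegate the call to the wrapped (here: fake) JVM object by method name.
        return getattr(self._java_obj, name)(*args)

    @property
    def layers(self):
        # Convert the list-like result into a plain Python list of ints.
        return [int(n) for n in self._call_java("javaLayers")]

    @property
    def weights(self):
        return self._call_java("weights")


model = MultilayerPerceptronClassificationModelSketch(
    FakeJavaModel(layers=[2, 5, 2], weights=[0.1, -0.2, 0.3]))
print(model.layers)  # [2, 5, 2]
```

Keeping the conversion on the Python side of the property means callers always see ordinary Python ints, whatever list-like type the bridge hands back.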

@SparkQA

SparkQA commented Sep 9, 2015

Test build #42198 has finished for PR 8067 at commit db3c676.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,
    • class MultilayerPerceptronClassificationModel(JavaModel):

@SparkQA

SparkQA commented Sep 10, 2015

Test build #42261 has finished for PR 8067 at commit b093862.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,
    • class MultilayerPerceptronClassificationModel(JavaModel):


>>> from pyspark.sql import Row
>>> from pyspark.mllib.linalg import Vectors
>>> df = sc.parallelize([
Contributor

sqlContext.createDataFrame([
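A sketch of the reviewer's suggestion, assuming the Spark 1.x doctest global sqlContext and the pyspark.mllib.linalg.Vectors import already shown in the snippet. The XOR-style rows and the helper name make_training_df are hypothetical and for illustration only. Building the DataFrame directly with createDataFrame avoids constructing an RDD by hand first.

```python
# Illustrative training data: (label, feature values) pairs for an XOR-like problem.
XOR_ROWS = [
    (0.0, [0.0, 0.0]),
    (1.0, [0.0, 1.0]),
    (1.0, [1.0, 0.0]),
    (0.0, [1.0, 1.0]),
]


def make_training_df(sqlContext):
    """Build a (label, features) DataFrame directly from a local list."""
    # Imported lazily so this module still loads where PySpark is absent.
    from pyspark.mllib.linalg import Vectors

    return sqlContext.createDataFrame(
        [(label, Vectors.dense(values)) for label, values in XOR_ROWS],
        ["label", "features"])
```

In a doctest, df = make_training_df(sqlContext) would replace the sc.parallelize([...]) construction.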

@SparkQA

SparkQA commented Sep 11, 2015

Test build #42315 has finished for PR 8067 at commit 5ac6a70.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol,
    • class MultilayerPerceptronClassificationModel(JavaModel):

maxIter=100, tol=1e-4, seed=None, layers=None, blockSize=128):
"""
setParams(self, featuresCol="features", labelCol="label", predictionCol="prediction", \
maxIter=100, tol=1e-4, seed=None, layers=[1, 1], blockSize=128)
Contributor Author

At L882 we still keep layers=[1, 1] in the docstring to tell users the default value.
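The signature defaults layers=None while the docstring advertises layers=[1, 1]. One common Python reason for this split is to avoid a shared mutable default argument. A minimal sketch of the pattern, using a hypothetical ParamsSketch class rather than the actual PySpark param machinery:

```python
class ParamsSketch:
    """Hypothetical estimator-like class illustrating a None sentinel default."""

    DEFAULT_LAYERS = (1, 1)  # documented default, stored as an immutable tuple

    def __init__(self, maxIter=100, tol=1e-4, seed=None, layers=None, blockSize=128):
        self.maxIter = maxIter
        self.tol = tol
        self.seed = seed
        # Substitute the documented default only when the caller passed nothing,
        # giving each instance its own fresh list.
        self.layers = list(self.DEFAULT_LAYERS) if layers is None else list(layers)
        self.blockSize = blockSize


a = ParamsSketch()
b = ParamsSketch()
a.layers.append(3)  # mutating one instance's list...
print(b.layers)     # ...leaves the other untouched: [1, 1]
```

Had the signature used layers=[1, 1] directly, every call relying on the default would share one list object, so a mutation through one instance would leak into the others.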

@mengxr
Contributor

mengxr commented Sep 11, 2015

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in b01b262 Sep 11, 2015
@yanboliang yanboliang deleted the SPARK-9773 branch May 5, 2016 07:34