[SPARK-9773] [ML] [PySpark] Add Python API for MultilayerPerceptronClassifier #8067
Test build #40288 has finished for PR 8067 at commit
It looks like we hit SPARK-7379. I'll try to find some clues.
Test build #40433 has finished for PR 8067 at commit
@yanboliang Could you describe the bug in the PR description?
Jenkins, test this please.
Test build #40953 has finished for PR 8067 at commit
Test build #40954 has finished for PR 8067 at commit
Jenkins, test this please.
Test build #40958 has finished for PR 8067 at commit
@yanboliang Shall we split this PR into two? One makes |
"""
Model fitted by MultilayerPerceptronClassifier.
"""
Here we also need to expose `layers` as a property, but if we do that the same way as `weights` in the following lines, we will hit SPARK-7379. It works in Python 2 but raises errors in Python 3.
I propose making `layers` of `MultilayerPerceptronClassificationModel` a `Vector` rather than an `Array[Int]` on the Scala side, because PySpark handles `Vector` elegantly. I also found that all other ML interfaces use `Vector` rather than `Array`. @mengxr
Shall we add a package-private method to Scala's `MultilayerPerceptronClassificationModel` (MPCM) that returns a Java list of integers?
Agree, done.
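For readers following the thread, here is a rough sketch of what the Python side of that suggestion could look like. It is not the code from this PR: `FakeJavaModel`, `ModelWrapper`, and the method name `javaLayers` are hypothetical stand-ins, and the JVM object is replaced by a plain Python class so the snippet runs without Spark. The idea is only that the wrapper asks the JVM-side model for a list of integers and returns it from a read-only property.

```python
class FakeJavaModel:
    """Stand-in for the JVM-side model; real code would reach it through Py4J."""

    def __init__(self, layers):
        self._layers = list(layers)

    def javaLayers(self):
        # Hypothetical helper: returns the layer sizes as a plain list of ints,
        # mirroring a Scala method that would return a Java list of integers.
        return list(self._layers)


class ModelWrapper:
    """Hypothetical Python wrapper exposing `layers` as a read-only property."""

    def __init__(self, java_model):
        self._java_obj = java_model

    def _call_java(self, name, *args):
        # Look up the method by name on the wrapped object and invoke it.
        return getattr(self._java_obj, name)(*args)

    @property
    def layers(self):
        # A list of ints needs no Vector or array conversion on the Python side.
        return self._call_java("javaLayers")


model = ModelWrapper(FakeJavaModel([2, 5, 2]))
print(model.layers)
```

Returning a list of integers keeps the Scala public API (`Array[Int]`) untouched while giving Python a type it can consume directly.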
Test build #42198 has finished for PR 8067 at commit
Test build #42261 has finished for PR 8067 at commit
>>> from pyspark.sql import Row
>>> from pyspark.mllib.linalg import Vectors
>>> df = sc.parallelize([
sqlContext.createDataFrame([
Test build #42315 has finished for PR 8067 at commit
maxIter=100, tol=1e-4, seed=None, layers=None, blockSize=128):
"""
setParams(self, featuresCol="features", labelCol="label", predictionCol="prediction", \
maxIter=100, tol=1e-4, seed=None, layers=[1, 1], blockSize=128)
At L882 we still keep `layers=[1, 1]` in the doc to tell users the default value.
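The signature under review defaults `layers` to `None` while the docstring advertises `[1, 1]`. One common reason for that split is Python's shared-mutable-default pitfall. The snippet below is a minimal sketch of the general idiom only, not the PR's implementation: `set_params` and `DEFAULT_LAYERS` are hypothetical names, and the real class may resolve its defaults through a different mechanism.

```python
DEFAULT_LAYERS = [1, 1]  # assumed default, taken from the docstring in the diff


def set_params(max_iter=100, tol=1e-4, seed=None, layers=None, block_size=128):
    """Hypothetical stand-in for setParams; returns the resolved parameters."""
    if layers is None:
        # Copy the default so callers never share (and mutate) one list object.
        layers = list(DEFAULT_LAYERS)
    return {
        "maxIter": max_iter,
        "tol": tol,
        "seed": seed,
        "layers": layers,
        "blockSize": block_size,
    }


first = set_params()
first["layers"].append(3)  # mutating one result must not leak into later calls
second = set_params()
print(first["layers"], second["layers"])
```

With a literal `layers=[1, 1]` in the signature, every call that omits `layers` would receive the same list object, so the `append` above would change the default for later callers.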
LGTM. Merged into master. Thanks!
Add Python API for MultilayerPerceptronClassifier.