-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-15643] [Doc] [ML] Update spark.ml and spark.mllib migration guide from 1.6 to 2.0 #13378
Conversation
Test build #59561 has finished for PR 13378 at commit
|
`spark.ml.regression.LinearRegressionModel`, the `weights` field has been deprecated in favor of | ||
the new name `coefficients`. This helps disambiguate from instance (row) "weights" given to | ||
algorithms. | ||
* [SPARK-14984](https://issues.apache.org/jira/browse/SPARK-14984): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yanboliang there are breaking changes for removing some deprecated methods in https://issues.apache.org/jira/browse/SPARK-14089 and https://issues.apache.org/jira/browse/SPARK-14952 that we should highlight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I'm happy to just do that in a follow up PR once I've made a final pass through for MiMa changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points. I forgot to record all removed deprecated methods. It's great that you can do that in a follow up PR. Thanks!
Test build #59611 has finished for PR 13378 at commit
|
How do we want to handle the new vectors (i.e. |
@@ -102,32 +102,54 @@ MLlib is under active development. | |||
The APIs marked `Experimental`/`DeveloperApi` may change in future releases, | |||
and the migration guide below will explain all changes between releases. | |||
|
|||
## From 1.5 to 1.6 | |||
## From 1.6 to 2.0 | |||
|
|||
There are no breaking API changes in the `spark.mllib` or `spark.ml` packages, but there are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the case for this release
There are also some changes from these 2 JIRAs/PRs which should be noted here:
For linear algebra, we should definitely discuss the change in the migration guide. @mengxr is also thinking about whether we can add a little functionality to make that transition easier. Documenting/improving this could happen in a follow-up PR. |
Test build #59734 has finished for PR 13378 at commit
|
@MLnick Would you like to update corresponding migration docs for changes in SPARK-14810 in a follow up PR? I saw you left comments to do that. If not, please let me know. |
I'm happy to do the breaking changes in a separate PR (I still need to do a final pass through of those to confirm I've caught them all). |
* [SPARK-13600](https://issues.apache.org/jira/browse/SPARK-13600): | ||
`QuantileDiscretizer` now uses `spark.sql.DataFrameStatFunctions.approxQuantile` to find splits (previously used custom sampling logic). | ||
The output buckets will differ for same input data and params. | ||
* [SPARK-14814](https://issues.apache.org/jira/browse/SPARK-14814): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noticed that this is a breaking API change, not a change of behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just added it to the list in SPARK-14810. We can either remove it here from this PR and I will include it when I do the one for breaking changes, or add it to a breaking changes section in this PR, which I will update with the others later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it in this PR. @MLnick Please add it in your follow up PR. Thanks!
Separating the work SGTM too. |
Test build #59809 has finished for PR 13378 at commit
|
@yanboliang how is this coming along? I have a PR ready for the breaking changes. I can either do that separately or push a PR to your branch. We need to update this PR with a few items mentioned in the JIRA by @jkbradley & @mengxr. |
@MLnick What about merging this PR firstly and then sending your PR for breaking changes separately? If this is OK, please go ahead to get it in. Thanks! |
@yanboliang I'm happy with that - we need to merge this one first so I can slot my changes in format-wise. Could you update for the new deprecations in the JIRA (https://issues.apache.org/jira/browse/SPARK-15643?focusedCommentId=15343059&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15343059)? Also the vector conversion (https://issues.apache.org/jira/browse/SPARK-15643?focusedCommentId=15334729&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15334729) |
@yanboliang I opened #13924 with my changes. If you prefer, I can incorporate the part about vector conversions into my section on the new linalg classes (since it perhaps fits best there?). |
Test build #61305 has finished for PR 13378 at commit
|
@@ -121,6 +121,9 @@ Deprecations: | |||
We encourage users to use `spark.ml.regression.LinearRegresson` and `spark.ml.classification.LogisticRegresson`. | |||
* [SPARK-14900](https://issues.apache.org/jira/browse/SPARK-14900): | |||
In `spark.mllib.evaluation.MulticlassMetrics`, the parameters `precision`, `recall` and `fMeasure` have been deprecated in favor of `accuracy`. | |||
* [SPARK-15644](https://issues.apache.org/jira/browse/SPARK-15644): | |||
In `spark.ml.util.BaseReadWrite`, the `context` method has been deprecated in favor of `session`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please list this as MLReader and MLWriter instead of BaseReadWrite? Those are the public APIs.
Other than that, this looks good. |
Test build #61364 has finished for PR 13378 at commit
|
LGTM |
…e from 1.6 to 2.0 ## What changes were proposed in this pull request? Update ```spark.ml``` and ```spark.mllib``` migration guide from 1.6 to 2.0. ## How was this patch tested? Docs update, no tests. Author: Yanbo Liang <ybliang8@gmail.com> Closes #13378 from yanboliang/spark-13448. (cherry picked from commit 26252f7) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
What changes were proposed in this pull request?
Update
spark.ml
andspark.mllib
migration guide from 1.6 to 2.0.How was this patch tested?
Docs update, no tests.