[SPARK-15643] [Doc] [ML] Update spark.ml and spark.mllib migration guide from 1.6 to 2.0 #13378

yanboliang · 2016-05-28T13:40:41Z

What changes were proposed in this pull request?

Update spark.ml and spark.mllib migration guide from 1.6 to 2.0.

How was this patch tested?

Docs update, no tests.

SparkQA · 2016-05-28T13:48:57Z

Test build #59561 has finished for PR 13378 at commit fb610d2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2016-05-28T13:52:08Z

cc @jkbradley @mengxr

MLnick · 2016-05-29T12:44:48Z

docs/mllib-guide.md

- `spark.ml.regression.LinearRegressionModel`, the `weights` field has been deprecated in favor of
- the new name `coefficients`.  This helps disambiguate from instance (row) "weights" given to
- algorithms.
+* [SPARK-14984](https://issues.apache.org/jira/browse/SPARK-14984):


@yanboliang there are breaking changes for removing some deprecated methods in https://issues.apache.org/jira/browse/SPARK-14089 and https://issues.apache.org/jira/browse/SPARK-14952 that we should highlight.

Though I'm happy to just do that in a follow up PR once I've made a final pass through for MiMa changes.

Good points. I forgot to record all removed deprecated methods. It's great that you can do that in a follow up PR. Thanks!

SparkQA · 2016-05-30T08:54:04Z

Test build #59611 has finished for PR 13378 at commit 260f3a3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

MLnick · 2016-05-31T22:05:19Z

How do we want to handle the new vectors (i.e. ml APIs / VectorUDT only works for ml.linalg Vectors and not for old mllib.linalg Vectors)? A note about that should probably go into the migration guide (or elsewhere).

jkbradley · 2016-06-01T01:05:02Z

docs/mllib-guide.md

@@ -102,32 +102,54 @@ MLlib is under active development.
 The APIs marked `Experimental`/`DeveloperApi` may change in future releases,
 and the migration guide below will explain all changes between releases.

-## From 1.5 to 1.6
+## From 1.6 to 2.0

 There are no breaking API changes in the `spark.mllib` or `spark.ml` packages, but there are


Not the case for this release

jkbradley · 2016-06-01T01:12:11Z

There are also some changes from these 2 JIRAs/PRs which should be noted here:

https://issues.apache.org/jira/browse/SPARK-14810
https://issues.apache.org/jira/browse/SPARK-14814

For linear algebra, we should definitely discuss the change in the migration guide. @mengxr is also thinking about whether we can add a little functionality to make that transition easier. Documenting/improving this could happen in a follow-up PR.

SparkQA · 2016-06-01T10:27:59Z

Test build #59734 has finished for PR 13378 at commit 235930d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2016-06-01T10:35:58Z

@MLnick Would you like to update the corresponding migration docs for the changes in SPARK-14810 in a follow-up PR? I saw you left comments about doing that. If not, please let me know.
For linear algebra, we can document the changes after we have a final decision. It would also be better to have a converter that scans a DataFrame and updates its schema to use the new vectors. Otherwise, previously stored DataFrames or MLlib models will be loaded incorrectly in Spark 2.0.
@jkbradley @MLnick Let's focus on deprecations and changes of behavior for this PR and get it in first. We can leave the JIRA open for follow-up work.

MLnick · 2016-06-01T17:47:57Z

I'm happy to do the breaking changes in a separate PR (I still need to do a final pass through of those to confirm I've caught them all).

jkbradley · 2016-06-01T17:50:00Z

docs/mllib-guide.md

+* [SPARK-13600](https://issues.apache.org/jira/browse/SPARK-13600):
+ `QuantileDiscretizer` now uses `spark.sql.DataFrameStatFunctions.approxQuantile` to find splits (previously used custom sampling logic).
+ The output buckets will differ for same input data and params.
+* [SPARK-14814](https://issues.apache.org/jira/browse/SPARK-14814):


Just noticed that this is a breaking API change, not a change of behavior.

I've just added it to the list in SPARK-14810. We can either remove it here from this PR and I will include it when I do the one for breaking changes, or add it to a breaking changes section in this PR, which I will update with the others later.

I removed it in this PR. @MLnick Please add it in your follow up PR. Thanks!
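
To illustrate the SPARK-13600 note in the hunk above (the buckets differ because the split points are now estimated differently), here is a minimal sketch in plain Python. It is not Spark's code: `quantile_splits` and `assign_bucket` are hypothetical helpers, and the random subsample only stands in for "some other way of estimating quantiles". The point is just that two estimators give two sets of split points, so the same row can land in a different bucket.

```python
import bisect
import random


def quantile_splits(values, num_buckets):
    """Return num_buckets - 1 interior split points at evenly spaced ranks."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(n - 1, (i * n) // num_buckets)]
            for i in range(1, num_buckets)]


def assign_bucket(x, splits):
    """Index of the bucket x falls into, given sorted split points."""
    return bisect.bisect_right(splits, x)


data = list(range(100))

# Splits computed from every row.
exact = quantile_splits(data, 4)

# Splits computed from a 20-row subsample (hypothetical stand-in for a
# different estimation strategy; the seed is fixed only for repeatability).
rng = random.Random(0)
sampled = quantile_splits(rng.sample(data, 20), 4)

# Same value, same number of buckets, possibly different bucket index.
value = 30
print(assign_bucket(value, exact), assign_bucket(value, sampled))
```

Anyone who persisted bucketed output from 1.6 and compares it with 2.0 output would want to refit rather than assume the bucket indices line up.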

jkbradley · 2016-06-01T17:50:20Z

Separating the work SGTM too.

SparkQA · 2016-06-02T03:50:58Z

Test build #59809 has finished for PR 13378 at commit 2339200.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

MLnick · 2016-06-27T12:44:37Z

@yanboliang how is this coming along? I have a PR ready for the breaking changes. I can either do that separately or push a PR to your branch.

We need to update this PR with a few items mentioned in the JIRA by @jkbradley & @mengxr.

yanboliang · 2016-06-27T12:51:26Z

@MLnick What about merging this PR first and then sending your PR for breaking changes separately? If this is OK, please go ahead and get it in. Thanks!

MLnick · 2016-06-27T12:54:43Z

@yanboliang I'm happy with that - we need to merge this one first so I can slot my changes in format-wise.

Could you update for the new deprecations in the JIRA (https://issues.apache.org/jira/browse/SPARK-15643?focusedCommentId=15343059&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15343059)? Also the vector conversion (https://issues.apache.org/jira/browse/SPARK-15643?focusedCommentId=15334729&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15334729)

MLnick · 2016-06-27T13:25:19Z

@yanboliang I opened #13924 with my changes. If you prefer, I can incorporate the part about vector conversions into my section on the new linalg classes (since it perhaps fits best there?).

yanboliang · 2016-06-27T14:22:21Z

@MLnick I have added the two new deprecations from the JIRA to this PR. As for the vector conversions, I think they fit better in your section, so please feel free to add them there. Thanks!

SparkQA · 2016-06-27T14:42:06Z

Test build #61305 has finished for PR 13378 at commit d2666ac.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-06-27T19:28:05Z

docs/mllib-guide.md

@@ -121,6 +121,9 @@ Deprecations:
 We encourage users to use `spark.ml.regression.LinearRegresson` and `spark.ml.classification.LogisticRegresson`.
 * [SPARK-14900](https://issues.apache.org/jira/browse/SPARK-14900):
 In `spark.mllib.evaluation.MulticlassMetrics`, the parameters `precision`, `recall` and `fMeasure` have been deprecated in favor of `accuracy`.
+* [SPARK-15644](https://issues.apache.org/jira/browse/SPARK-15644):
+ In `spark.ml.util.BaseReadWrite`, the `context` method has been deprecated in favor of `session`.


Could you please list this as MLReader and MLWriter instead of BaseReadWrite? Those are the public APIs.
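
On the SPARK-14900 entry in the same hunk: as far as I understand the rationale, the label-free `precision`, `recall` and `fMeasure` values were overall (micro-averaged) figures, and for single-label multiclass predictions those all collapse to plain accuracy, which is why one name suffices. A minimal sketch in plain Python, assuming a non-empty list of `(prediction, label)` pairs; `micro_metrics` is a hypothetical helper, not part of any Spark API:

```python
from collections import Counter


def micro_metrics(pairs):
    """Micro-averaged precision/recall/F1 plus accuracy for (prediction, label) pairs."""
    pairs = list(pairs)
    tp, fp, fn = Counter(), Counter(), Counter()
    for pred, label in pairs:
        if pred == label:
            tp[label] += 1
        else:
            # One wrong prediction is a false positive for the predicted
            # class and a false negative for the true class.
            fp[pred] += 1
            fn[label] += 1
    tp_total = sum(tp.values())
    fp_total = sum(fp.values())
    fn_total = sum(fn.values())
    precision = tp_total / (tp_total + fp_total)
    recall = tp_total / (tp_total + fn_total)
    # Guard the degenerate case where nothing was predicted correctly.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp_total / len(pairs)
    return precision, recall, f1, accuracy


example = [(0, 0), (1, 1), (2, 1), (0, 2), (2, 2)]
precision, recall, f1, accuracy = micro_metrics(example)
print(precision, recall, f1, accuracy)
```

Because every error adds exactly one false positive and one false negative, the pooled denominators are both equal to the number of rows, so all four numbers agree up to floating-point rounding.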

jkbradley · 2016-06-27T19:28:28Z

Other than that, this looks good.

SparkQA · 2016-06-28T07:16:54Z

Test build #61364 has finished for PR 13378 at commit 5472fb9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-06-28T18:53:20Z

LGTM
Merging with master and branch-2.0
Thank you!

…e from 1.6 to 2.0

## What changes were proposed in this pull request?

Update `spark.ml` and `spark.mllib` migration guide from 1.6 to 2.0.

## How was this patch tested?

Docs update, no tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #13378 from yanboliang/spark-13448.

(cherry picked from commit 26252f7)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>

yanboliang added 2 commits May 28, 2016 05:43

Document MLlib deprecations and behavior changes in Spark 2.0

182414b

fix typos

fb610d2

MLnick reviewed May 29, 2016

update docs

260f3a3

jkbradley reviewed Jun 1, 2016

fix typos

235930d

jkbradley reviewed Jun 1, 2016

Remove SPARK-14814 which is a breaking change

2339200

MLnick mentioned this pull request Jun 27, 2016

[SPARK-15643][DOC][ML] Add breaking changes to ML migration guide #13924

Closed

Add two deprecations

d2666ac

jkbradley reviewed Jun 27, 2016

BaseReadWrite to MLReader/MLWriter

5472fb9

asfgit closed this in 26252f7 Jun 28, 2016
