Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-15643] [Doc] [ML] Update spark.ml and spark.mllib migration guide from 1.6 to 2.0 #13378

Closed
wants to merge 7 commits into from

Conversation

yanboliang
Copy link
Contributor

What changes were proposed in this pull request?

Update spark.ml and spark.mllib migration guide from 1.6 to 2.0.

How was this patch tested?

Docs update, no tests.

@SparkQA
Copy link

SparkQA commented May 28, 2016

Test build #59561 has finished for PR 13378 at commit fb610d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

cc @jkbradley @mengxr

`spark.ml.regression.LinearRegressionModel`, the `weights` field has been deprecated in favor of
the new name `coefficients`. This helps disambiguate from instance (row) "weights" given to
algorithms.
* [SPARK-14984](https://issues.apache.org/jira/browse/SPARK-14984):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yanboliang there are breaking changes for removing some deprecated methods in https://issues.apache.org/jira/browse/SPARK-14089 and https://issues.apache.org/jira/browse/SPARK-14952 that we should highlight.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I'm happy to just do that in a follow up PR once I've made a final pass through for MiMa changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points. I forgot to record all removed deprecated methods. It's great that you can do that in a follow up PR. Thanks!

@SparkQA
Copy link

SparkQA commented May 30, 2016

Test build #59611 has finished for PR 13378 at commit 260f3a3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MLnick
Copy link
Contributor

MLnick commented May 31, 2016

How do we want to handle the new vectors (i.e. ml APIs / VectorUDT only works for ml.linalg Vectors and not for old mllib.linalg Vectors)? A note about that should probably go into the migration guide (or elsewhere).

@@ -102,32 +102,54 @@ MLlib is under active development.
The APIs marked `Experimental`/`DeveloperApi` may change in future releases,
and the migration guide below will explain all changes between releases.

## From 1.5 to 1.6
## From 1.6 to 2.0

There are no breaking API changes in the `spark.mllib` or `spark.ml` packages, but there are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the case for this release

@jkbradley
Copy link
Member

There are also some changes from these 2 JIRAs/PRs which should be noted here:

For linear algebra, we should definitely discuss the change in the migration guide. @mengxr is also thinking about whether we can add a little functionality to make that transition easier. Documenting/improving this could happen in a follow-up PR.

@SparkQA
Copy link

SparkQA commented Jun 1, 2016

Test build #59734 has finished for PR 13378 at commit 235930d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor Author

yanboliang commented Jun 1, 2016

@MLnick Would you like to update corresponding migration docs for changes in SPARK-14810 in a follow up PR? I saw you left comments to do that. If not, please let me know.
For linear algebra, we can document them after we have final decision. It's also better we can have a converter that scans a DataFrame and update its schema to use new vectors. Otherwise, the previously stored DataFrame or MLlib models will be loaded incorrectly in Spark 2.0.
@jkbradley @MLnick Let's focus on deprecations and changes of behavior for this PR and get it in firstly. We can left the JIRA open for follow up work.

@MLnick
Copy link
Contributor

MLnick commented Jun 1, 2016

I'm happy to do the breaking changes in a separate PR (I still need to do a final pass through of those to confirm I've caught them all).

* [SPARK-13600](https://issues.apache.org/jira/browse/SPARK-13600):
`QuantileDiscretizer` now uses `spark.sql.DataFrameStatFunctions.approxQuantile` to find splits (previously used custom sampling logic).
The output buckets will differ for same input data and params.
* [SPARK-14814](https://issues.apache.org/jira/browse/SPARK-14814):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed that this is a breaking API change, not a change of behavior.

Copy link
Contributor

@MLnick MLnick Jun 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just added it to the list in SPARK-14810. We can either remove it here from this PR and I will include it when I do the one for breaking changes, or add it to a breaking changes section in this PR, which I will update with the others later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it in this PR. @MLnick Please add it in your follow up PR. Thanks!

@jkbradley
Copy link
Member

Separating the work SGTM too.

@SparkQA
Copy link

SparkQA commented Jun 2, 2016

Test build #59809 has finished for PR 13378 at commit 2339200.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MLnick
Copy link
Contributor

MLnick commented Jun 27, 2016

@yanboliang how is this coming along? I have a PR ready for the breaking changes. I can either do that separately or push a PR to your branch.

We need to update this PR with a few items mentioned in the JIRA by @jkbradley & @mengxr.

@yanboliang
Copy link
Contributor Author

@MLnick What about merging this PR firstly and then sending your PR for breaking changes separately? If this is OK, please go ahead to get it in. Thanks!

@MLnick
Copy link
Contributor

MLnick commented Jun 27, 2016

@MLnick
Copy link
Contributor

MLnick commented Jun 27, 2016

@yanboliang I opened #13924 with my changes. If you prefer, I can incorporate the part about vector conversions into my section on the new linalg classes (since it perhaps fits best there?).

@yanboliang
Copy link
Contributor Author

yanboliang commented Jun 27, 2016

@MLnick I have updated the two new deprecations in the JIRA in this PR. To the vector conversions issue, I think it fits more to add them in your section and please feel free to do that. Thanks!

@SparkQA
Copy link

SparkQA commented Jun 27, 2016

Test build #61305 has finished for PR 13378 at commit d2666ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -121,6 +121,9 @@ Deprecations:
We encourage users to use `spark.ml.regression.LinearRegresson` and `spark.ml.classification.LogisticRegresson`.
* [SPARK-14900](https://issues.apache.org/jira/browse/SPARK-14900):
In `spark.mllib.evaluation.MulticlassMetrics`, the parameters `precision`, `recall` and `fMeasure` have been deprecated in favor of `accuracy`.
* [SPARK-15644](https://issues.apache.org/jira/browse/SPARK-15644):
In `spark.ml.util.BaseReadWrite`, the `context` method has been deprecated in favor of `session`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please list this as MLReader and MLWriter instead of BaseReadWrite? Those are the public APIs.

@jkbradley
Copy link
Member

Other than that, this looks good.

@SparkQA
Copy link

SparkQA commented Jun 28, 2016

Test build #61364 has finished for PR 13378 at commit 5472fb9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

jkbradley commented Jun 28, 2016

LGTM
Merging with master and branch-2.0
Thank you!

asfgit pushed a commit that referenced this pull request Jun 28, 2016
…e from 1.6 to 2.0

## What changes were proposed in this pull request?
Update ```spark.ml``` and ```spark.mllib``` migration guide from 1.6 to 2.0.

## How was this patch tested?
Docs update, no tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #13378 from yanboliang/spark-13448.

(cherry picked from commit 26252f7)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
@asfgit asfgit closed this in 26252f7 Jun 28, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants