
[SPARK-9671] [MLLIB] re-org user guide and add migration guide #8498

Closed · wants to merge 2 commits

Conversation


@mengxr (Contributor) commented Aug 28, 2015

This PR updates the MLlib user guide and adds migration guide for 1.4->1.5.

  • merge migration guide for `spark.mllib` and `spark.ml` packages
  • remove dependency section from `spark.ml` guide
  • move the paragraph about `spark.mllib` and `spark.ml` to the top and recommend `spark.ml`
  • move Sam's talk to a footnote to keep the section focused on dependencies

Minor changes to code examples and other wording will be in a separate PR.

@jkbradley @srowen @feynmanliang

@@ -7,6 +7,25 @@ description: MLlib migration guides from before Spark SPARK_VERSION_SHORT

The migration guide for the current Spark version is kept on the [MLlib Programming Guide main page](mllib-guide.html#migration-guide).

## From 1.3 to 1.4
mengxr (Contributor Author):

No content change here. Just moved the paragraphs from mllib-guide and ml-guide.

@SparkQA commented Aug 28, 2015

Test build #41734 has finished for PR 8498 at commit 2790270.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • *(Breaking change)* The `apply` and `copy` methods for the case class [BoostingStrategy](api/scala/index.html#org.apache.spark.mllib.tree.configuration.BoostingStrategy) have been changed because of a modification to the case class fields. This could be an issue for users who use `BoostingStrategy` to set GBT parameters.
    • *(Breaking change)* The return value of [LDA.run](api/scala/index.html#org.apache.spark.mllib.clustering.LDA) has changed. It now returns an abstract class `LDAModel` instead of the concrete class `DistributedLDAModel`. The object of type `LDAModel` can still be cast to the appropriate concrete type, which depends on the optimization algorithm.
    • The `scoreCol` output column (with default value "score") was renamed to `probabilityCol` (with default value "probability"). The type was originally `Double` (for the probability of class 1.0), but it is now `Vector` (for the probability of each class, to support multiclass classification in the future).
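For users affected by the `LDA.run` change above, migration amounts to matching on (or casting to) the concrete model type. A minimal, hypothetical sketch (assumes a Spark 1.5 context and an existing `corpus: RDD[(Long, Vector)]`; not runnable without a Spark cluster):

```scala
import org.apache.spark.mllib.clustering.{DistributedLDAModel, LDA, LDAModel}

// In Spark 1.5, LDA.run returns the abstract LDAModel rather than
// DistributedLDAModel directly as in 1.4.
val model: LDAModel = new LDA().setK(10).run(corpus)

// Code that relied on DistributedLDAModel members must now recover the
// concrete type first; which type you get depends on the optimizer
// (the EM optimizer yields a DistributedLDAModel).
model match {
  case distModel: DistributedLDAModel =>
    val topics = distModel.topicDistributions // per-document topic mixtures
  case _ =>
    // e.g. a LocalLDAModel, which does not carry per-document distributions
}
```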

See the [Algorithm Guides section](#algorithm-guides) below for guides on sub-packages of `spark.ml`, including feature transformers unique to the Pipelines API, ensembles, and more.

The `spark.ml` package aims to provide a uniform set of high-level APIs that help users create and
tune practical machine learning pipelines.
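As a concrete illustration of such a pipeline, here is a minimal sketch against the Spark 1.5-era `spark.ml` API; the column names and the `training` DataFrame are assumed for illustration, and the snippet needs a Spark runtime to execute:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

// Each stage reads and writes DataFrame columns; the Pipeline chains
// the stages into a single Estimator.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10)
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

// Fitting runs the stages in order and returns a PipelineModel that can
// be applied to new data with the same schema.
val model = pipeline.fit(training)
```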
Contributor: I know it is already mentioned in mllib-guide, but should we mention DataFrames here in ml-guide's introduction, since they are the key distinction between `spark.ml` and `spark.mllib`?

mengxr (Contributor Author): okay

@SparkQA commented Aug 28, 2015

Test build #41757 has finished for PR 8498 at commit f8efdcc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • *(Breaking change)* The `apply` and `copy` methods for the case class [BoostingStrategy](api/scala/index.html#org.apache.spark.mllib.tree.configuration.BoostingStrategy) have been changed because of a modification to the case class fields. This could be an issue for users who use `BoostingStrategy` to set GBT parameters.
    • *(Breaking change)* The return value of [LDA.run](api/scala/index.html#org.apache.spark.mllib.clustering.LDA) has changed. It now returns an abstract class `LDAModel` instead of the concrete class `DistributedLDAModel`. The object of type `LDAModel` can still be cast to the appropriate concrete type, which depends on the optimization algorithm.
    • The `scoreCol` output column (with default value "score") was renamed to `probabilityCol` (with default value "probability"). The type was originally `Double` (for the probability of class 1.0), but it is now `Vector` (for the probability of each class, to support multiclass classification in the future).

@feynmanliang (Contributor):
LGTM, the ml-ann.md filename is inconsistent with all referencing text (which refers to it as MLP) but that's unrelated to this PR

@mengxr (Contributor Author) commented Aug 28, 2015

Merged into master and branch-1.5.

@asfgit asfgit closed this in 88032ec Aug 28, 2015
asfgit pushed a commit that referenced this pull request Aug 28, 2015
This PR updates the MLlib user guide and adds migration guide for 1.4->1.5.

* merge migration guide for `spark.mllib` and `spark.ml` packages
* remove dependency section from `spark.ml` guide
* move the paragraph about `spark.mllib` and `spark.ml` to the top and recommend `spark.ml`
* move Sam's talk to a footnote to keep the section focused on dependencies

Minor changes to code examples and other wording will be in a separate PR.

jkbradley srowen feynmanliang

Author: Xiangrui Meng <meng@databricks.com>

Closes #8498 from mengxr/SPARK-9671.

(cherry picked from commit 88032ec)
Signed-off-by: Xiangrui Meng <meng@databricks.com>