
[SPARK-9671] [MLLIB] re-org user guide and add migration guide #8498

Closed · wants to merge 2 commits

Conversation


@mengxr (Contributor) commented Aug 28, 2015

This PR updates the MLlib user guide and adds migration guide for 1.4->1.5.

  • merge migration guide for `spark.mllib` and `spark.ml` packages
  • remove dependency section from `spark.ml` guide
  • move the paragraph about `spark.mllib` and `spark.ml` to the top and recommend `spark.ml`
  • move Sam's talk to a footnote to keep the section focused on dependencies

Minor changes to code examples and other wording will be in a separate PR.

@jkbradley @srowen @feynmanliang

@@ -7,6 +7,25 @@ description: MLlib migration guides from before Spark SPARK_VERSION_SHORT

The migration guide for the current Spark version is kept on the [MLlib Programming Guide main page](mllib-guide.html#migration-guide).

## From 1.3 to 1.4
mengxr (Contributor Author):

No content change here. Just moved the paragraphs from mllib-guide and ml-guide.

@SparkQA commented Aug 28, 2015

Test build #41734 has finished for PR 8498 at commit 2790270.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • *(Breaking change)* The `apply` and `copy` methods for the case class [BoostingStrategy](api/scala/index.html#org.apache.spark.mllib.tree.configuration.BoostingStrategy) have been changed because of a modification to the case class fields. This could be an issue for users who use `BoostingStrategy` to set GBT parameters.
    • *(Breaking change)* The return value of [LDA.run](api/scala/index.html#org.apache.spark.mllib.clustering.LDA) has changed. It now returns an abstract class `LDAModel` instead of the concrete class `DistributedLDAModel`. The object of type `LDAModel` can still be cast to the appropriate concrete type, which depends on the optimization algorithm.
    • The `scoreCol` output column (with default value "score") was renamed to `probabilityCol` (with default value "probability"). The type was originally `Double` (for the probability of class 1.0), but it is now `Vector` (for the probability of each class, to support multiclass classification in the future).
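For users affected by the `LDA.run` change above, migration amounts to matching on (or casting to) the concrete model type. A minimal, hypothetical sketch (assumes a Spark 1.5 context and an existing `corpus: RDD[(Long, Vector)]`; not runnable without a Spark cluster):

```scala
import org.apache.spark.mllib.clustering.{DistributedLDAModel, LDA, LDAModel}

// In Spark 1.5, LDA.run returns the abstract LDAModel rather than
// DistributedLDAModel directly as in 1.4.
val model: LDAModel = new LDA().setK(10).run(corpus)

// Code that relied on DistributedLDAModel members must now recover the
// concrete type first; which type you get depends on the optimizer
// (the EM optimizer yields a DistributedLDAModel).
model match {
  case distModel: DistributedLDAModel =>
    val topics = distModel.topicDistributions // per-document topic mixtures
  case _ =>
    // e.g. a LocalLDAModel, which does not carry per-document distributions
}
```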

See the [Algorithm Guides section](#algorithm-guides) below for guides on sub-packages of `spark.ml`, including feature transformers unique to the Pipelines API, ensembles, and more.

The `spark.ml` package aims to provide a uniform set of high-level APIs that help users create and
tune practical machine learning pipelines.
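As a concrete illustration of such a pipeline, here is a minimal sketch against the Spark 1.5-era `spark.ml` API; the column names and the `training` DataFrame are assumed for illustration, and the snippet needs a Spark runtime to execute:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

// Each stage reads and writes DataFrame columns; the Pipeline chains
// the stages into a single Estimator.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10)
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

// Fitting runs the stages in order and returns a PipelineModel that can
// be applied to new data with the same schema.
val model = pipeline.fit(training)
```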
Contributor: I know it is already mentioned in mllib-guide, but should we mention DataFrames here in ml-guide's introduction, since they are the key distinction between `spark.ml` and `spark.mllib`?

mengxr (Contributor Author): okay

@SparkQA commented Aug 28, 2015

Test build #41757 has finished for PR 8498 at commit f8efdcc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • *(Breaking change)* The `apply` and `copy` methods for the case class [BoostingStrategy](api/scala/index.html#org.apache.spark.mllib.tree.configuration.BoostingStrategy) have been changed because of a modification to the case class fields. This could be an issue for users who use `BoostingStrategy` to set GBT parameters.
    • *(Breaking change)* The return value of [LDA.run](api/scala/index.html#org.apache.spark.mllib.clustering.LDA) has changed. It now returns an abstract class `LDAModel` instead of the concrete class `DistributedLDAModel`. The object of type `LDAModel` can still be cast to the appropriate concrete type, which depends on the optimization algorithm.
    • The `scoreCol` output column (with default value "score") was renamed to `probabilityCol` (with default value "probability"). The type was originally `Double` (for the probability of class 1.0), but it is now `Vector` (for the probability of each class, to support multiclass classification in the future).

@feynmanliang (Contributor):
LGTM, the ml-ann.md filename is inconsistent with all referencing text (which refers to it as MLP) but that's unrelated to this PR

@mengxr (Contributor Author) commented Aug 28, 2015

Merged into master and branch-1.5.

@asfgit asfgit closed this in 88032ec Aug 28, 2015
asfgit pushed a commit that referenced this pull request Aug 28, 2015
This PR updates the MLlib user guide and adds migration guide for 1.4->1.5.

* merge migration guide for `spark.mllib` and `spark.ml` packages
* remove dependency section from `spark.ml` guide
* move the paragraph about `spark.mllib` and `spark.ml` to the top and recommend `spark.ml`
* move Sam's talk to a footnote to keep the section focused on dependencies

Minor changes to code examples and other wording will be in a separate PR.

jkbradley srowen feynmanliang

Author: Xiangrui Meng <meng@databricks.com>

Closes #8498 from mengxr/SPARK-9671.

(cherry picked from commit 88032ec)
Signed-off-by: Xiangrui Meng <meng@databricks.com>