[SPARK-10348] [MLLIB] updates ml-guide #8517

mengxr · 2015-08-29T06:56:42Z

* replace `ML Dataset` by `DataFrame` to unify the abstraction
* ML algorithms -> pipeline components to describe the main concept
* remove Scala API doc links from the main guide
* `Section Title` -> `Section title` to be consistent with other section titles in MLlib guide
* modified lines break at 100 chars or periods

@jkbradley @feynmanliang

SparkQA · 2015-08-29T07:18:24Z

Test build #41780 has finished for PR 8517 at commit 18d4122.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- public class JavaTrainValidationSplitExample

feynmanliang · 2015-08-29T19:46:27Z

docs/ml-guide.md


 Machine learning can be applied to a wide variety of data types, such as vectors, text, images, and structured data.
-Spark ML adopts the [`DataFrame`](api/scala/index.html#org.apache.spark.sql.DataFrame) from Spark SQL in order to support a variety of data types under a unified Dataset concept.
+Spark ML adopts the `DataFrame` from Spark SQL in order to support a variety of data types.


feynmanliang · 2015-08-29

nit: spark.ml

mengxr · 2015-08-30

I thought about this but didn't figure out a good solution. Using spark.ml everywhere is accurate, but it makes the guide a little strange to read. Another solution is to define Spark ML precisely somewhere in the doc. Let me think about this and make a new PR if necessary.

feynmanliang · 2015-08-29T19:48:31Z

LGTM, made one minor comment

* replace `ML Dataset` by `DataFrame` to unify the abstraction
* ML algorithms -> pipeline components to describe the main concept
* remove Scala API doc links from the main guide
* `Section Title` -> `Section title` to be consistent with other section titles in MLlib guide
* modified lines break at 100 chars or periods

jkbradley feynmanliang

Author: Xiangrui Meng <meng@databricks.com>

Closes #8517 from mengxr/SPARK-10348.

(cherry picked from commit 905fbe4)
Signed-off-by: Xiangrui Meng <meng@databricks.com>

mengxr · 2015-08-30T06:26:53Z

Merged into master and branch-1.5.

mengxr added 2 commits August 28, 2015 21:55

update mllib-guide

7dd3552

replace ML Dataset by DataFrame to simplify the abstraction

18d4122

remove links to Scala API doc in the main guide; change ML algorithms to pipeline components

feynmanliang reviewed Aug 29, 2015
asfgit closed this in 905fbe4 Aug 30, 2015
