[SPARK-10348] [MLLIB] updates ml-guide #8517

Closed
mengxr wants to merge 2 commits

mengxr commented Aug 29, 2015

  • replace ML Dataset by DataFrame to unify the abstraction
  • ML algorithms -> pipeline components to describe the main concept
  • remove Scala API doc links from the main guide
  • Section Title -> Section title, to be consistent with other section titles in the MLlib guide
  • modified lines break at 100 chars or periods

@jkbradley @feynmanliang

remove links to Scala API doc in the main guide
change ML algorithms to pipeline components
SparkQA commented Aug 29, 2015

Test build #41780 has finished for PR 8517 at commit 18d4122.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaTrainValidationSplitExample


  Machine learning can be applied to a wide variety of data types, such as vectors, text, images, and structured data.
- Spark ML adopts the [`DataFrame`](api/scala/index.html#org.apache.spark.sql.DataFrame) from Spark SQL in order to support a variety of data types under a unified Dataset concept.
+ Spark ML adopts the `DataFrame` from Spark SQL in order to support a variety of data types.
Contributor

nit: spark.ml

Contributor Author

I thought about this but didn't figure out a good solution. Using spark.ml everywhere is accurate but it makes the guide a little bit strange to read. Another solution is to define Spark ML precisely somewhere in the doc. Let me think about this and make a new PR if necessary.

@feynmanliang
Contributor

LGTM, made one minor comment

asfgit pushed a commit that referenced this pull request Aug 30, 2015
* replace `ML Dataset` by `DataFrame` to unify the abstraction
* ML algorithms -> pipeline components to describe the main concept
* remove Scala API doc links from the main guide
* `Section Title` -> `Section title` to be consistent with other section titles in the MLlib guide
* modified lines break at 100 chars or periods

jkbradley feynmanliang

Author: Xiangrui Meng <meng@databricks.com>

Closes #8517 from mengxr/SPARK-10348.

(cherry picked from commit 905fbe4)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
mengxr commented Aug 30, 2015

Merged into master and branch-1.5.

asfgit closed this in 905fbe4 on Aug 30, 2015