Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPARK-1727. Correct small compile errors, typos, and markdown issues in (primarly) MLlib docs #653

Closed
wants to merge 5 commits into from

Conversation

srowen
Copy link
Member

@srowen srowen commented May 5, 2014

While play-testing the Scala and Java code examples in the MLlib docs, I noticed a number of small compile errors, and some typos. This led to finding and fixing a few similar items in other docs.

Then in the course of building the site docs to check the result, I found a few small suggestions for the build instructions. I also found a few more formatting and markdown issues uncovered when I accidentally used maruku instead of kramdown.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val data = sc.textFile("mllib/data/sample_naive_bayes_data.txt")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS here I fleshed out the example since the project contains a data file for Naive bayes.

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14683/

.setAppName("My application")
.set("spark.executor.memory", "1g")
val conf = new SparkConf().
setMaster("local").
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we ask users to use :paste instead of putting . at the end of the line?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the problem is that it didn't work as-is since the first line can be interpreted as a complete statement. I figured that it's best if the snippets work as given, without additional commands or config. I reviewed most of the other Scala snippets in the docs here, and there were only a few cases like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not for running in the REPL, where we don't really need to create SparkConf. With the latest spark-submit, this line should change to new SparkConf().setAppName("My application").

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, this snippet won't be pasted into a REPL, true. (I think the other case I saw is for the REPL, so should have this kind of change.)

But you're saying it can be simplified anyway to that one line? I can change that but I wonder if the idea is to just show use of setters? if so I could revert the change... or just leave for consistency with the other REPL-friendly snippet?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In v1.0, the recommended way of launching an app is through spark-submit, where you set Spark configurations through command-line arguments. It is easier to switch masters than hard code setMaster("local"). Also, it works for YARN.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, although on re-reading I think the purpose of this snippet is to explain how one would invoke Spark programmatically via SparkConf (or else the whole thing should go away). It is something you might want to do in a Scala program, and might even want to pop into a Scala REPL (i.e. not spark-shell). I suggest leaving it; am I really off-base on that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove setMaster and set("spark.executor.memory" ...), then it fits in a single line. Those properties should be set with spark-submit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @mengxr we want to allow these properties to be set directly as well... some applications will use this.

I think it might be good here to show both. First say that they can be set directly and then explain that they can be set through arguments to spark-submit if you create a SparkContext with an empty conf.

./bin/spark-submit --name "My application" --master local --executor-memory 1g

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14714/

@mengxr
Copy link
Contributor

mengxr commented May 7, 2014

@pwendell Could you help merge this PR? Then I can fix minor issues mentioned in the discussion.

@pwendell
Copy link
Contributor

pwendell commented May 7, 2014

Okay I can merge this and then we can take the other discussion in a new PR/JIRA.

asfgit pushed a commit that referenced this pull request May 7, 2014
…in (primarly) MLlib docs

While play-testing the Scala and Java code examples in the MLlib docs, I noticed a number of small compile errors, and some typos. This led to finding and fixing a few similar items in other docs.

Then in the course of building the site docs to check the result, I found a few small suggestions for the build instructions. I also found a few more formatting and markdown issues uncovered when I accidentally used maruku instead of kramdown.

Author: Sean Owen <sowen@cloudera.com>

Closes #653 from srowen/SPARK-1727 and squashes the following commits:

6e7c38a [Sean Owen] Final doc updates - one more compile error, and use of mean instead of sum and count
8f5e847 [Sean Owen] Fix markdown syntax issues that maruku flags, even though we use kramdown (but only those that do not affect kramdown's output)
99966a9 [Sean Owen] Update issue tracker URL in docs
23c9ac3 [Sean Owen] Add Scala Naive Bayes example, to use existing example data file (whose format needed a tweak)
8c81982 [Sean Owen] Fix small compile errors and typos across MLlib docs
(cherry picked from commit 25ad8f9)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
@asfgit asfgit closed this in 25ad8f9 May 7, 2014
@srowen srowen deleted the SPARK-1727 branch May 7, 2014 11:05
asfgit pushed a commit that referenced this pull request May 15, 2014
…cationModel, and KMeans

`model.predict` returns a RDD of Scala primitive type (Int/Double), which is recognized as Object in Java. Adding predict(JavaRDD) could make life easier for Java users.

Added tests for KMeans, LinearRegression, and NaiveBayes.

Will update examples after #653 gets merged.

cc: @srowen

Author: Xiangrui Meng <meng@databricks.com>

Closes #670 from mengxr/predict-javardd and squashes the following commits:

b77ccd8 [Xiangrui Meng] Merge branch 'master' into predict-javardd
43caac9 [Xiangrui Meng] add predict(JavaRDD) to RegressionModel, ClassificationModel, and KMeans
asfgit pushed a commit that referenced this pull request May 15, 2014
…cationModel, and KMeans

`model.predict` returns a RDD of Scala primitive type (Int/Double), which is recognized as Object in Java. Adding predict(JavaRDD) could make life easier for Java users.

Added tests for KMeans, LinearRegression, and NaiveBayes.

Will update examples after #653 gets merged.

cc: @srowen

Author: Xiangrui Meng <meng@databricks.com>

Closes #670 from mengxr/predict-javardd and squashes the following commits:

b77ccd8 [Xiangrui Meng] Merge branch 'master' into predict-javardd
43caac9 [Xiangrui Meng] add predict(JavaRDD) to RegressionModel, ClassificationModel, and KMeans
(cherry picked from commit d52761d)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
…in (primarly) MLlib docs

While play-testing the Scala and Java code examples in the MLlib docs, I noticed a number of small compile errors, and some typos. This led to finding and fixing a few similar items in other docs.

Then in the course of building the site docs to check the result, I found a few small suggestions for the build instructions. I also found a few more formatting and markdown issues uncovered when I accidentally used maruku instead of kramdown.

Author: Sean Owen <sowen@cloudera.com>

Closes apache#653 from srowen/SPARK-1727 and squashes the following commits:

6e7c38a [Sean Owen] Final doc updates - one more compile error, and use of mean instead of sum and count
8f5e847 [Sean Owen] Fix markdown syntax issues that maruku flags, even though we use kramdown (but only those that do not affect kramdown's output)
99966a9 [Sean Owen] Update issue tracker URL in docs
23c9ac3 [Sean Owen] Add Scala Naive Bayes example, to use existing example data file (whose format needed a tweak)
8c81982 [Sean Owen] Fix small compile errors and typos across MLlib docs
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
…cationModel, and KMeans

`model.predict` returns a RDD of Scala primitive type (Int/Double), which is recognized as Object in Java. Adding predict(JavaRDD) could make life easier for Java users.

Added tests for KMeans, LinearRegression, and NaiveBayes.

Will update examples after apache#653 gets merged.

cc: @srowen

Author: Xiangrui Meng <meng@databricks.com>

Closes apache#670 from mengxr/predict-javardd and squashes the following commits:

b77ccd8 [Xiangrui Meng] Merge branch 'master' into predict-javardd
43caac9 [Xiangrui Meng] add predict(JavaRDD) to RegressionModel, ClassificationModel, and KMeans
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants