[DOCS] Small fixes to Spark on Yarn doc #8762

Closed
12 changes: 6 additions & 6 deletions docs/running-on-yarn.md
@@ -18,16 +18,16 @@ Spark application's configuration (driver, executors, and the AM when running in

There are two deploy modes that can be used to launch Spark applications on YARN. In `yarn-cluster` mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In `yarn-client` mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.

Unlike in Spark standalone and Mesos mode, in which the master's address is specified in the `--master` parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the `--master` parameter is `yarn-client` or `yarn-cluster`.
Unlike [Spark standalone](spark-standalone.html) and [Mesos](running-on-mesos.html) modes, in which the master's address is specified in the `--master` parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the `--master` parameter is `yarn-client` or `yarn-cluster`.
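
Since the ResourceManager's address is picked up from the Hadoop configuration, the client machine must be able to find the cluster's client-side configuration files. A minimal sketch, assuming the configs live under `/etc/hadoop/conf` (the path is an assumption; use your cluster's actual config directory):

    # a sketch: point Spark at the directory containing the client-side
    # Hadoop/YARN configuration files so the ResourceManager address is found
    $ export HADOOP_CONF_DIR=/etc/hadoop/conf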

To launch a Spark application in `yarn-cluster` mode:

`$ ./bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] <app jar> [app options]`
    $ ./bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] <app jar> [app options]

For example:

    $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn-cluster \
        --num-executors 3 \
        --driver-memory 4g \
        --executor-memory 2g \
        --executor-cores 1 \
@@ -37,7 +37,7 @@ For example:

The above starts a YARN client program which starts the default Application Master. Then SparkPi will be run as a child thread of Application Master. The client will periodically poll the Application Master for status updates and display them in the console. The client will exit once your application has finished running. Refer to the "Debugging your Application" section below for how to see driver and executor logs.
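
As a hedged illustration of that last step, once the application finishes (and if YARN log aggregation is enabled), the driver and executor logs can be pulled with the YARN CLI; the application ID below is only a placeholder:

    # fetch aggregated container logs for a finished application (placeholder ID)
    $ yarn logs -applicationId application_1432210000000_0001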

To launch a Spark application in `yarn-client` mode, do the same, but replace `yarn-cluster` with `yarn-client`. To run spark-shell:
To launch a Spark application in `yarn-client` mode, do the same, but replace `yarn-cluster` with `yarn-client`. The following shows how you can run `spark-shell` in `yarn-client` mode:

    $ ./bin/spark-shell --master yarn-client

@@ -54,8 +54,8 @@ In `yarn-cluster` mode, the driver runs on a different machine than the client,

# Preparations

Running Spark-on-YARN requires a binary distribution of Spark which is built with YARN support.
Binary distributions can be downloaded from the Spark project website.
Running Spark on YARN requires a binary distribution of Spark which is built with YARN support.
Member

I don't see the value in the other changes; these two lines seem OK but this is pretty trivial. Would you consider taking one big review of the docs, or at least a logical set of docs, and identify meaningful improvements, and make a JIRA covering small improvements to a lot of docs? I don't think it's worth many tiny PRs for mostly non-functional changes.

Contributor Author

What about lines 30 and 24? They are the most important; the others are just amendments I made while reviewing the entire doc. The reason for the change was commit 16b6d18, where I learnt that the command-line option is no longer supported, hence the change in the doc.

Member

Yes, removing the arg is a good idea. Line 24 is just a formatting change... OK, but it renders the same way. I'm not against merging some of these changes per se, just requesting maybe a different approach going forward.

Contributor Author

See http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/running-on-yarn.html and search for ./bin/spark-submit --class path.to.your.Class. It renders incorrectly since there's the indent + backticks that make the docs go awry. That's why I fixed that, too.

What do you propose as a different approach going forward? When I see a change without changes to the docs, what's the proper approach?

Member

I see, it renders in code font already just not in the code box format. OK that seems fine. I think it's OK to merge this.

I'm referring to piecemeal edits since you've made a number of tiny PRs without a JIRA. It would be more efficient to have a review of whole logical sections of docs, and focus on small but not trivial changes in one PR. You'll probably end up spotting some broader opportunities to make docs consistent. Yes several of these changes here end up being good small fixes.

Binary distributions can be downloaded from the [downloads page](http://spark.apache.org/downloads.html) of the project website.
To build Spark yourself, refer to [Building Spark](building-spark.html).
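
For reference, a minimal sketch of producing a YARN-enabled build from source; the Hadoop profile below is an assumption, so pick the one that matches your cluster as described in [Building Spark](building-spark.html):

    # build Spark with YARN support; -Phadoop-2.4 is an assumed example profile
    $ build/mvn -Pyarn -Phadoop-2.4 -DskipTests clean package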

# Configuration