Docs small fixes #8629
Closed
Changes to `docs/cluster-overview.md`, first hunk:

```diff
@@ -5,18 +5,19 @@ title: Cluster Mode Overview

 This document gives a short overview of how Spark runs on clusters, to make it easier to understand
 the components involved. Read through the [application submission guide](submitting-applications.html)
-to submit applications to a cluster.
+to learn about launching applications on a cluster.

 # Components

-Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext
+Spark applications run as independent sets of processes on a cluster, coordinated by the `SparkContext`
 object in your main program (called the _driver program_).

 Specifically, to run on a cluster, the SparkContext can connect to several types of _cluster managers_
-(either Spark's own standalone cluster manager or Mesos/YARN), which allocate resources across
+(either Spark's own standalone cluster manager, Mesos or YARN), which allocate resources across
 applications. Once connected, Spark acquires *executors* on nodes in the cluster, which are
 processes that run computations and store data for your application.
 Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to
-the executors. Finally, SparkContext sends *tasks* for the executors to run.
+the executors. Finally, SparkContext sends *tasks* to the executors to run.

 <p style="text-align: center;">
   <img src="img/cluster-overview.png" title="Spark cluster components" alt="Spark cluster components" />
```
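The architecture described in the hunk above (a driver program that plans *tasks* and ships them to executor processes, then collects the results) can be sketched with a stdlib-only analogy. This is not Spark's API, just an illustration of the driver/executor division of labor using Python's `concurrent.futures`:

```python
# Analogy only, not Spark: the "driver" splits work into tasks and a pool
# of worker processes plays the role of executors running those tasks.
from concurrent.futures import ProcessPoolExecutor


def task(partition):
    # Each "executor" computes a partial result for its partition.
    return sum(x * x for x in partition)


def driver():
    # The driver splits the data and schedules one task per partition.
    partitions = [range(0, 5), range(5, 10)]
    with ProcessPoolExecutor(max_workers=2) as executors:
        partials = list(executors.map(task, partitions))
    # The driver aggregates the executors' partial results.
    return sum(partials)


if __name__ == "__main__":
    print(driver())  # sum of squares 0..9 = 285
```

In real Spark the equivalent of `driver()` lives in the program that creates the `SparkContext`, and the executors are long-lived JVM processes on cluster nodes rather than a local process pool.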
Second hunk:

```diff
@@ -33,9 +34,9 @@ There are several useful things to note about this architecture:
 2. Spark is agnostic to the underlying cluster manager. As long as it can acquire executor
    processes, and these communicate with each other, it is relatively easy to run it even on a
    cluster manager that also supports other applications (e.g. Mesos/YARN).
-3. The driver program must listen for and accept incoming connections from its executors throughout
-   its lifetime (e.g., see [spark.driver.port and spark.fileserver.port in the network config
-   section](configuration.html#networking)). As such, the driver program must be network
+3. The driver program must listen for and accept incoming connections from its executors throughout
+   its lifetime (e.g., see [spark.driver.port and spark.fileserver.port in the network config
+   section](configuration.html#networking)). As such, the driver program must be network
    addressable from the worker nodes.
 4. Because the driver schedules tasks on the cluster, it should be run close to the worker
    nodes, preferably on the same local area network. If you'd like to send requests to the
```

Review thread on this hunk:

Reviewer: Also, what's the change here? The rest LGTM.

Author: The additional spaces at the end. Thanks a lot for reviewing the change and accepting it!
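Point 3 in the hunk above names `spark.driver.port` and `spark.fileserver.port`. As a hedged sketch of what acting on that note might look like, pinning those ports in a Spark properties file could be written as follows (only the property names come from the text; the file location and port values are illustrative assumptions):

```
# conf/spark-defaults.conf -- illustrative values only
spark.driver.port      7078
spark.fileserver.port  7079
```

Fixing the ports this way is one approach to making the driver reachable through a firewall from the worker nodes; see the networking section of the configuration page for the authoritative list of properties.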
Review thread:

Reviewer: What's the change here -- I assume it's still code-formatted, but it's indented now?

Author: It's merged already, but for the sake of completeness I'm answering now - yes, it's properly code-formatted.