[MINOR][DOCS] Add note about Spark network security
## What changes were proposed in this pull request?

In response to a recent question, this reiterates that network access to a Spark cluster should be disabled by default, and that access to its hosts and services from outside a private network should be added back explicitly.

Also, some minor touch-ups while I was at it.

## How was this patch tested?

N/A

Author: Sean Owen <srowen@gmail.com>

Closes #21947 from srowen/SecurityNote.
srowen authored and HyukjinKwon committed Aug 2, 2018
1 parent c5fe412 commit c9914cf
Showing 2 changed files with 29 additions and 9 deletions.
docs/security.md: 23 changes (18 additions & 5 deletions)
@@ -278,7 +278,7 @@ To enable authorization in the SHS, a few extra options are used:
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td>spark.history.ui.acls.enable</td>
<td><code>spark.history.ui.acls.enable</code></td>
<td>false</td>
<td>
Specifies whether ACLs should be checked to authorize users viewing the applications in
@@ -292,15 +292,15 @@ To enable authorization in the SHS, a few extra options are used:
</td>
</tr>
<tr>
<td>spark.history.ui.admin.acls</td>
<td><code>spark.history.ui.admin.acls</code></td>
<td>None</td>
<td>
Comma separated list of users that have view access to all the Spark applications in history
server.
</td>
</tr>
<tr>
<td>spark.history.ui.admin.acls.groups</td>
<td><code>spark.history.ui.admin.acls.groups</code></td>
<td>None</td>
<td>
Comma separated list of groups that have view access to all the Spark applications in history
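
As an illustration of these history-server ACL properties together, a minimal sketch via `SPARK_HISTORY_OPTS` in `conf/spark-env.sh` (the user and group names are placeholders; the same `spark.history.*` settings can also go in the properties file the history server reads):

```bash
# conf/spark-env.sh on the history server host -- illustrative only;
# the user and group names below are placeholders.
export SPARK_HISTORY_OPTS="-Dspark.history.ui.acls.enable=true \
  -Dspark.history.ui.admin.acls=alice,bob \
  -Dspark.history.ui.admin.acls.groups=spark-admins"
```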
@@ -501,6 +501,7 @@ can be accomplished by setting `spark.ssl.useNodeLocalConf` to `true`. In that c
provided by the user on the client side are not used.

### Mesos mode

Mesos 1.3.0 and newer supports `Secrets` primitives as both file-based and environment based
secrets. Spark allows the specification of file-based and environment variable based secrets with
`spark.mesos.driver.secret.filenames` and `spark.mesos.driver.secret.envkeys`, respectively.
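
A sketch of how those two properties might look on a cluster-mode submission; the dispatcher URL, the secret reference (the companion `spark.mesos.driver.secret.names` property is assumed here for a reference-type secret), file name, environment variable, class, and jar are all placeholders:

```bash
# Illustrative spark-submit invocation -- every name and path here is a placeholder.
./bin/spark-submit \
  --master mesos://dispatcher.internal:7077 \
  --deploy-mode cluster \
  --conf spark.mesos.driver.secret.names=/db/password \
  --conf spark.mesos.driver.secret.filenames=db-password.txt \
  --conf spark.mesos.driver.secret.envkeys=DB_PASSWORD \
  --class org.example.MyApp \
  http://repo.internal/jars/my-app.jar
```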
@@ -562,8 +563,12 @@ Security.

# Configuring Ports for Network Security

Spark makes heavy use of the network, and some environments have strict requirements for using tight
firewall settings. Below are the primary ports that Spark uses for its communication and how to
Generally speaking, a Spark cluster and its services are not deployed on the public internet.
They are generally private services, and should only be accessible within the network of the
organization that deploys Spark. Access to the hosts and ports used by Spark services should
be limited to origin hosts that need to access the services.
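
One way to express that restriction at the host level is with ordinary firewall rules. A sketch using iptables, where 10.0.0.0/16 stands in for the organization's private network and 7077/8080 are used as examples (the standalone master's default RPC and web UI ports):

```bash
# Illustrative only: accept the standalone master's RPC (7077) and web UI (8080)
# ports from the internal network, and drop them from everywhere else.
iptables -A INPUT -p tcp -s 10.0.0.0/16 --dport 7077 -j ACCEPT
iptables -A INPUT -p tcp -s 10.0.0.0/16 --dport 8080 -j ACCEPT
iptables -A INPUT -p tcp --dport 7077 -j DROP
iptables -A INPUT -p tcp --dport 8080 -j DROP
```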

Below are the primary ports that Spark uses for its communication and how to
configure those ports.

## Standalone mode only
@@ -597,6 +602,14 @@ configure those ports.
<td><code>SPARK_MASTER_PORT</code></td>
<td>Set to "0" to choose a port randomly. Standalone mode only.</td>
</tr>
<tr>
<td>External Service</td>
<td>Standalone Master</td>
<td>6066</td>
<td>Submit job to cluster via REST API</td>
<td><code>spark.master.rest.port</code></td>
<td>Use <code>spark.master.rest.enabled</code> to enable/disable this service. Standalone mode only.</td>
</tr>
<tr>
<td>Standalone Master</td>
<td>Standalone Worker</td>
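
The new table row above points at `spark.master.rest.enabled` for the REST submission service on port 6066. A minimal sketch of switching it off when it is not needed, assuming the setting is passed through `SPARK_MASTER_OPTS` in `conf/spark-env.sh`:

```bash
# conf/spark-env.sh on the standalone master -- illustrative only.
# Turn off the REST submission server; spark.master.rest.port can move it
# to another port instead if the service is actually required.
export SPARK_MASTER_OPTS="-Dspark.master.rest.enabled=false"
```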
docs/spark-standalone.md: 15 changes (11 additions & 4 deletions)
@@ -362,8 +362,15 @@ You can run Spark alongside your existing Hadoop cluster by just launching it as

# Configuring Ports for Network Security

Spark makes heavy use of the network, and some environments have strict requirements for using
tight firewall settings. For a complete list of ports to configure, see the
Generally speaking, a Spark cluster and its services are not deployed on the public internet.
They are generally private services, and should only be accessible within the network of the
organization that deploys Spark. Access to the hosts and ports used by Spark services should
be limited to origin hosts that need to access the services.

This is particularly important for clusters using the standalone resource manager, as they do
not support fine-grained access control in a way that other resource managers do.
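
A complementary measure on standalone clusters, sketched below, is to bind the master to a private interface with `SPARK_MASTER_HOST` in addition to firewalling its ports; the address is a placeholder for an internal, non-routable one:

```bash
# conf/spark-env.sh on the master -- illustrative only; the address is a placeholder.
# Bind the standalone master to an internal interface so it is not reachable
# from outside the private network in the first place.
export SPARK_MASTER_HOST=10.0.1.5
```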

For a complete list of ports to configure, see the
[security page](security.html#configuring-ports-for-network-security).

# High Availability
@@ -376,7 +383,7 @@ By default, standalone scheduling clusters are resilient to Worker failures (ins

Utilizing ZooKeeper to provide leader election and some state storage, you can launch multiple Masters in your cluster connected to the same ZooKeeper instance. One will be elected "leader" and the others will remain in standby mode. If the current leader dies, another Master will be elected, recover the old Master's state, and then resume scheduling. The entire recovery process (from the time the first leader goes down) should take between 1 and 2 minutes. Note that this delay only affects scheduling _new_ applications -- applications that were already running during Master failover are unaffected.

Learn more about getting started with ZooKeeper [here](http://zookeeper.apache.org/doc/current/zookeeperStarted.html).
Learn more about getting started with ZooKeeper [here](https://zookeeper.apache.org/doc/current/zookeeperStarted.html).
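
As a sketch of the recovery properties described under **Configuration** below, the Masters might all be started with something like this (the ZooKeeper hosts and the znode path are placeholders):

```bash
# conf/spark-env.sh on every Master -- illustrative only; hosts and paths are placeholders.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1.internal:2181,zk2.internal:2181,zk3.internal:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"
```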

**Configuration**

@@ -419,6 +426,6 @@ In order to enable this recovery mode, you can set SPARK_DAEMON_JAVA_OPTS in spa

**Details**

* This solution can be used in tandem with a process monitor/manager like [monit](http://mmonit.com/monit/), or just to enable manual recovery via restart.
* This solution can be used in tandem with a process monitor/manager like [monit](https://mmonit.com/monit/), or just to enable manual recovery via restart.
* While filesystem recovery seems straightforwardly better than not doing any recovery at all, this mode may be suboptimal for certain development or experimental purposes. In particular, killing a master via stop-master.sh does not clean up its recovery state, so whenever you start a new Master, it will enter recovery mode. This could increase the startup time by up to 1 minute if it needs to wait for all previously-registered Workers/clients to timeout.
* While it's not officially supported, you could mount an NFS directory as the recovery directory. If the original Master node dies completely, you could then start a Master on a different node, which would correctly recover all previously registered Workers/applications (equivalent to ZooKeeper recovery). Future applications will have to be able to find the new Master, however, in order to register.
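
For the single-node, filesystem-based recovery mode these notes describe, the corresponding setting has the same shape; a sketch, with the recovery directory (local or NFS-mounted) as a placeholder:

```bash
# conf/spark-env.sh -- illustrative only; the directory (local or NFS-mounted) is a placeholder.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM \
  -Dspark.deploy.recoveryDirectory=/var/lib/spark/recovery"
```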
