[SPARK-4563][core] Allow driver to advertise a different network address. #15120
Conversation
The goal of this feature is to allow the Spark driver to run in an isolated environment, such as a Docker container, and to use the host's port forwarding mechanism to accept connections from the outside world.

The change is restricted to the driver: there is no support for achieving the same thing on executors (or the YARN AM, for that matter). Those still need full access to the outside world so that, for example, connections can be made to an executor's block manager.

The core of the change is simple: add a new configuration that specifies the address the driver should bind to, which can be different from the address it advertises to executors (spark.driver.host). Everything else is plumbing the new configuration to where it's needed.

To use the feature, the host starting the container needs to set up the driver's ports to fall into a range that is being forwarded; this required a special block manager port configuration just for the driver, which falls back to the existing spark.blockManager.port when not set. This way, users can modify the driver settings without affecting the executors. It would theoretically be nice to also have different retry counts for driver and executors, but given that Docker (at least) allows forwarding port ranges, we can probably live without that for now.

Because of the nature of the feature it's hard to add unit tests; I just added a simple one to make sure the configuration works.

This was tested with a Docker image running spark-shell with the following command:

    docker blah blah blah \
      -p 38000-38100:38000-38100 \
      [image] \
      spark-shell \
        --num-executors 3 \
        --conf spark.shuffle.service.enabled=false \
        --conf spark.dynamicAllocation.enabled=false \
        --conf spark.driver.host=[host's address] \
        --conf spark.driver.port=38000 \
        --conf spark.driver.blockManager.port=38020 \
        --conf spark.ui.port=38040

Running on YARN, I verified that the driver works, that executors start up and listen on ephemeral ports (instead of using the driver's config), and that caching and shuffling (without the shuffle service) work. I clicked through the UI to make sure all pages (including executor thread dumps) work. Also tested apps without Docker, and ran unit tests.
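For readers skimming the description, here is a rough sketch of how the new settings are meant to combine for a containerized driver. The address and port values below are placeholders, and the snippet assumes the semantics described above (spark.driver.bindAddress controls where the listen sockets bind, spark.driver.host is what gets advertised to executors):

    import org.apache.spark.SparkConf

    // Hypothetical values: the driver binds inside the container, while executors
    // are told to connect back through the docker host's forwarded ports.
    val conf = new SparkConf()
      .set("spark.driver.bindAddress", "0.0.0.0")        // where the driver's sockets listen (inside the container)
      .set("spark.driver.host", "docker-host.example")   // address advertised to executors (the host running docker)
      .set("spark.driver.port", "38000")
      .set("spark.driver.blockManager.port", "38020")    // driver-only port, falls back to spark.blockManager.port
      .set("spark.ui.port", "38040")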
Test build #65509 has finished for PR 15120 at commit
Test build #65516 has finished for PR 15120 at commit
/cc @zsxwing
private[spark] val DRIVER_BIND_ADDRESS = ConfigBuilder("spark.driver.bindAddress")
  .doc("Address where to bind network listen sockets on the driver.")
  .stringConf
  .createWithDefault(Utils.localHostName())
This is a breaking change. If a user uses spark.driver.host to specify the bind address, it won't work now. Right?
It should, as long as he doesn't set "spark.driver.bindAddress" - which is a new setting that is being added to override that behavior.
Maybe I missed something. val bindAddress = conf.get(DRIVER_BIND_ADDRESS) won't use spark.driver.host. Right?
Ah I see what you mean. I might have inverted the config resolution order... let me take a look.
Fixed the resolution order so that spark.driver.bindAddress falls back to spark.driver.host, and not the other way around, for backwards compatibility. Includes a small change to the resolution of fallback conf entry values.
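In other words, after the fix the addresses resolve roughly like this (a simplified sketch of the intended fallback chain; the real code goes through Spark's internal config entries, and Utils.localHostName() is a Spark-internal helper):

    // Simplified illustration, not the actual config-entry code.
    val advertisedAddress = conf.getOption("spark.driver.host")
      .getOrElse(Utils.localHostName())                 // default: local host name
    val bindAddress = conf.getOption("spark.driver.bindAddress")
      .getOrElse(advertisedAddress)                     // new setting overrides; otherwise follow spark.driver.host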
Test build #65692 has finished for PR 15120 at commit
LGTM. Just some nits
s"service$serviceString (for example spark.ui.port for SparkUI) to an available " + | ||
"port or increasing spark.port.maxRetries." | ||
s"$maxRetries retries (starting from $startPort)! Consider explicitly setting " + | ||
s"the appropriate port for the service$serviceString (for example spark.ui.port " + |
nit: since you are touching this, could you add a space between service and $serviceString?
The space is actually in $serviceString, because the service name is allowed to be empty. (Historical reasons? Don't ask me.)
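For context, the helper builds that string roughly like this (a paraphrase, not necessarily the exact line in Utils), which is why the leading space travels with the name:

    // Empty when no service name is given; otherwise the string carries its own leading space.
    val serviceString = if (serviceName.isEmpty) "" else s" '$serviceName'"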
Got it.
_conf.setIfMissing("spark.driver.host", Utils.localHostName())
// Set Spark driver host and port system properties. This explicitly sets the configuration
// instead of relying on the default value of the config constant.
_conf.set(DRIVER_HOST_ADDRESS, conf.get(DRIVER_HOST_ADDRESS))
nit: could you use _conf instead? I thought _conf and conf were different at first glance, but actually they are the same.
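Presumably the nit amounts to something like the following (my reading of the request, not the committed line):

    _conf.set(DRIVER_HOST_ADDRESS, _conf.get(DRIVER_HOST_ADDRESS))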
@@ -66,7 +66,7 @@ private[spark] class SparkConfigProvider(conf: JMap[String, String]) extends Con
     findEntry(key) match {
       case e: ConfigEntryWithDefault[_] => Option(e.defaultValueString)
       case e: ConfigEntryWithDefaultString[_] => Option(e.defaultValueString)
-      case e: FallbackConfigEntry[_] => defaultValueString(e.fallback.key)
+      case e: FallbackConfigEntry[_] => get(e.fallback.key)
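A toy illustration of the observable difference, using spark.driver.blockManager.port (which falls back to spark.blockManager.port) as an example; the map-based lookup below only models the behavior and is not the actual SparkConfigProvider logic:

    // The user explicitly set only the fallback key.
    val userSettings = Map("spark.blockManager.port" -> "39000")
    val fallbackDefault = "0"   // assumed default of spark.blockManager.port (0 = random port)

    // Resolving spark.driver.blockManager.port when it is unset:
    val before = fallbackDefault                                    // old code: only the fallback key's default
    val after = userSettings.getOrElse("spark.blockManager.port",   // new code: full lookup of the fallback key,
      fallbackDefault)                                              // so the user's value ("39000") wins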
do we need to backport this fix to 2.0.1?
No, this code is only in master.
We are facing this issue with Spark 1.6. Are we going to backport this?
What issue? The code you're commenting on does not exist in 1.6. If you're having issues, please ask questions on the mailing lists or use the bug tracker.
LGTM
Test build #65726 has finished for PR 15120 at commit
Merging to master. Thanks!
Hi, I'm currently in need of using this change because we're running inside containers that need port mapping. We're not using master (currently using 2.0.2). Do you know if there are any alternatives while the commit isn't released?
Not really.