
[SPARK-21642][CORE] Use FQDN for DRIVER_HOST_ADDRESS instead of ip address #18846

Closed
wants to merge 1 commit

Conversation

6 participants
@akitanaka (Contributor) commented Aug 4, 2017

What changes were proposed in this pull request?

The patch makes the Spark web UI use the FQDN of the driver host as its hostname instead of its IP address.

In the current implementation, the IP address of the driver host is set as DRIVER_HOST_ADDRESS. This becomes a problem when SSL is enabled via the "spark.ssl.enabled", "spark.ssl.trustStore" and "spark.ssl.keyStore" properties: the web UI is then launched with SSL enabled, and the HTTPS server uses the custom SSL certificate configured in those properties. In this case, a client that accesses the web UI gets a javax.net.ssl.SSLPeerUnverifiedException, because it cannot verify the SSL certificate (the certificate's Common Name does not match DRIVER_HOST_ADDRESS).

To avoid the exception, we should use FQDN of the driver host for DRIVER_HOST_ADDRESS.

Error message the client gets when accessing the Spark web UI:
javax.net.ssl.SSLPeerUnverifiedException: Certificate for <10.102.138.239> doesn't match any of the subject alternative names: []
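The core of the proposal can be sketched in a few lines. This is an illustration only, not the actual Spark change; the helper name is made up. `InetAddress.getCanonicalHostName` performs a reverse DNS lookup and, when that lookup fails, falls back to returning the textual IP address:

```scala
import java.net.InetAddress

// Illustration of the proposed behavior (not the actual Spark code):
// prefer the FQDN of the local host. getCanonicalHostName does a reverse
// DNS lookup and falls back to the textual IP address if the lookup fails,
// so this never returns an empty value on a resolvable host.
def localCanonicalHostName(): String =
  InetAddress.getLocalHost.getCanonicalHostName

println(s"DRIVER_HOST_ADDRESS would be: ${localCanonicalHostName()}")
```

With an FQDN in DRIVER_HOST_ADDRESS, the hostname the client connects to can match the certificate's Common Name (or a subject alternative name), avoiding the exception above.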

How was this patch tested?

manual tests

@jiangxb1987 (Contributor) commented Aug 6, 2017

OK to test

@jiangxb1987 (Contributor) commented Aug 6, 2017

This looks reasonable, cc @cloud-fan

@cloud-fan (Contributor) commented Aug 7, 2017

ok to test

@jiangxb1987 (Contributor) commented Aug 7, 2017

Should we also apply this change to RpcEnv? @zsxwing

@SparkQA commented Aug 7, 2017

Test build #80323 has finished for PR 18846 at commit afc07ee.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.
@tgravescs (Contributor) commented Aug 7, 2017

I'm in favor of the change, but I don't know the history of why it's an IP address to start with, so we should make sure it's not going to break use cases.
Did you test that it falls back to the IP if the host doesn't have an FQDN?

@cloud-fan (Contributor) commented Aug 7, 2017

retest this please

@akitanaka (Contributor, Author) commented Aug 7, 2017

It seems that Spark started using the IP address in SPARK-6440 (#5424). From what I can tell, that change was made only for visibility.

@akitanaka (Contributor, Author) commented Aug 7, 2017

Sorry, I forgot to answer one of your questions.

Did you test if the host doesn't have fqdn it falls back to ip?

Yes, I tested it manually. InetAddress.getCanonicalHostName() returns the IP address when the reverse DNS lookup fails.
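The fallback described here can be observed directly. A hedged sketch, with `resolvedToFqdn` being a made-up helper name: since a failed reverse lookup makes `getCanonicalHostName` return the textual IP, comparing it against `getHostAddress` reveals whether an FQDN was actually found.

```scala
import java.net.InetAddress

// If reverse DNS fails, getCanonicalHostName returns the same string as
// getHostAddress (the textual IP), so the two differing means an FQDN
// (or at least a resolvable name) was found. Helper name is illustrative.
def resolvedToFqdn(addr: InetAddress): Boolean =
  addr.getCanonicalHostName != addr.getHostAddress

val local = InetAddress.getLocalHost
println(s"canonical name: ${local.getCanonicalHostName} " +
  s"(name resolved: ${resolvedToFqdn(local)})")
```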

@SparkQA commented Aug 7, 2017

Test build #80347 has finished for PR 18846 at commit afc07ee.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@zsxwing (Member) commented Aug 7, 2017

Does this affect how we detect driver/executor connection lost?

@akitanaka (Contributor, Author) commented Aug 8, 2017

In my understanding, this does not affect the driver/executor heartbeat behavior, since this PR only changes DRIVER_HOST_ADDRESS. However, I am not sure why the unit tests failed.

The reason I want to apply this change is that the clients of the Spark context web UI are not Spark components (they will be web browsers and the Hadoop ResourceManager's web proxy). Since it is difficult to change those clients' behavior, I thought we should change the behavior on the Spark side.

@cloud-fan (Contributor) commented Aug 9, 2017

retest this please

@zsxwing (Member) commented Aug 9, 2017

@thideeeee I think DRIVER_HOST_ADDRESS will be used to generate the driver url. Could you check if this line still works after your change?

} else if (driver.exists(_.address == remoteAddress)) {
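The concern behind that quoted line is that addresses are compared for equality. A hypothetical, simplified stand-in for Spark's RpcAddress (not the real class) shows why a hostname/IP mismatch would matter:

```scala
// Hypothetical, simplified stand-in for Spark's RpcAddress: equality is
// a plain comparison on host and port, so an FQDN-based address and an
// IP-based address are different values even when they name the same
// machine. The check only works if both sides are built from the same
// DRIVER_HOST_ADDRESS value.
case class Address(host: String, port: Int)

val byFqdn = Address("ip-10-46-12-135.us-west-2.compute.internal", 34105)
val byIp   = Address("10.46.12.135", 34105)

assert(byFqdn != byIp) // same machine, but the equality check fails
```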

@SparkQA commented Aug 9, 2017

Test build #80443 has finished for PR 18846 at commit afc07ee.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.
@jiangxb1987 (Contributor) commented Aug 9, 2017

retest this please

@SparkQA commented Aug 9, 2017

Test build #80446 has finished for PR 18846 at commit afc07ee.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@akitanaka (Contributor, Author) commented Aug 11, 2017

@zsxwing Thank you for the pointer. I tested manually and, as far as I can tell, Spark works as expected with this patch applied. I was able to confirm that the driver/executor shuts down when its connection is lost.

On an env where reverse DNS lookup works:
17/08/10 19:11:06 ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Driver ip-10-46-12-135.us-west-2.compute.internal:34105 disassociated! Shutting down.

On an env where reverse DNS lookup does not work:
17/08/11 02:14:32 ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Driver 172.16.1.26:36997 disassociated! Shutting down.

On the other hand, all the test runs failed, each for a different reason; it looks like the tests are failing randomly. I'm still looking into why.

@tgravescs (Contributor) commented Aug 11, 2017

retest this please

@SparkQA commented Aug 11, 2017

Test build #80535 has finished for PR 18846 at commit afc07ee.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@tgravescs (Contributor) commented Aug 16, 2017

retest this please

1 similar comment
@tgravescs (Contributor) commented Aug 16, 2017

retest this please

@tgravescs (Contributor) commented Aug 16, 2017

Jenkins, test this please

@SparkQA commented Aug 16, 2017

Test build #80752 has finished for PR 18846 at commit afc07ee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@cloud-fan (Contributor) commented Aug 17, 2017

LGTM, merging to master!

@asfgit closed this in d695a52 Aug 17, 2017

ash211 added a commit to apache-spark-on-k8s/spark that referenced this pull request Sep 7, 2017

[SPARK-21642][CORE] Use FQDN for DRIVER_HOST_ADDRESS instead of ip address

Author: Hideaki Tanaka <tanakah@amazon.com>

Closes apache#18846 from thideeeee/SPARK-21642.

(cherry picked from commit d695a52)

mccheah added a commit to palantir/spark that referenced this pull request Sep 26, 2017

[SPARK-21642][CORE] Use FQDN for DRIVER_HOST_ADDRESS instead of ip address

Author: Hideaki Tanaka <tanakah@amazon.com>

Closes apache#18846 from thideeeee/SPARK-21642.