Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-22992][K8S] Remove assumption of the DNS domain #20187

Closed
wants to merge 1 commit into from

Conversation

foxish
Copy link
Contributor

@foxish foxish commented Jan 8, 2018

What changes were proposed in this pull request?

Remove the use of FQDN to access the driver because it assumes that it's set up in a DNS zone - cluster.local which is common but not ubiquitous
Note that we already access the in-cluster API server through kubernetes.default.svc, so, by extension, this should work as well.
The alternative is to introduce DNS zones for both of those addresses.

How was this patch tested?

Unit tests

cc @vanzin @liyinan926 @mridulm @mccheah

@SparkQA
Copy link

SparkQA commented Jan 8, 2018

Test build #85805 has finished for PR 20187 at commit bfaf466.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@foxish
Copy link
Contributor Author

foxish commented Jan 8, 2018

cc/ @MrHohn @bowei from K8s networking - can you guys confirm that using <service_name>.<namespace>.svc is strictly better than using the FQDN which made an assumption of the dns zone (cluster.local)?

@bowei
Copy link

bowei commented Jan 8, 2018

Confirmed -- the FQDN is not going to work on clusters that have a custom cluster suffix.

@liyinan926
Copy link
Contributor

@foxish have you run the integration tests against this on both minikube and GKE?

@foxish
Copy link
Contributor Author

foxish commented Jan 8, 2018

@liyinan926 -- not yet, will be running them shortly.

@vanzin
Copy link
Contributor

vanzin commented Jan 8, 2018

Let me know how those tests go and then I'll merge this.

@foxish
Copy link
Contributor Author

foxish commented Jan 8, 2018

Discovery starting.
Discovery completed in 191 milliseconds.
Run starting. Expected test count is: 8
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi using the remote example jar.
- Run SparkPi with custom driver pod name, labels, annotations, and environment variables.
- Run SparkPi with a test secret mounted into the driver and executor pods
- Run SparkPi using the remote example jar with a test secret mounted into the driver and executor pods
Run completed in 3 minutes, 29 seconds.
Total number of tests run: 8
Suites: completed 2, aborted 0
Tests: succeeded 8, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:55 min
[INFO] Finished at: 2018-01-08T12:49:03-08:00
[INFO] Final Memory: 18M/241M
[INFO] ------------------------------------------------------------------------

All tests passed. @vanzin, this is ready to be merged.
Also, FYI, we now have a way to visualize these results in CI - running against HEAD apache/spark.

https://k8s-testgrid.appspot.com/sig-big-data#spark-periodic-default-gke

@vanzin
Copy link
Contributor

vanzin commented Jan 8, 2018

Merging to master / 2.3.

asfgit pushed a commit that referenced this pull request Jan 8, 2018
## What changes were proposed in this pull request?

Remove the use of FQDN to access the driver because it assumes that it's set up in a DNS zone - `cluster.local` which is common but not ubiquitous
Note that we already access the in-cluster API server through `kubernetes.default.svc`, so, by extension, this should work as well.
The alternative is to introduce DNS zones for both of those addresses.

## How was this patch tested?
Unit tests

cc vanzin liyinan926 mridulm mccheah

Author: foxish <ramanathana@google.com>

Closes #20187 from foxish/cluster.local.

(cherry picked from commit eed82a0)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
@asfgit asfgit closed this in eed82a0 Jan 8, 2018
@foxish foxish deleted the cluster.local branch January 8, 2018 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants