[SPARK-27997][K8S] Add support for kubernetes OAuth Token refresh #33675
haodemon wants to merge 11 commits into apache:master
Conversation
ConfigBuilder("spark.kubernetes.client.oauth.token.provider.class")
  .doc("A class that implements fabric's OAuthTokenProvider interface to " +
    "provide a token refresh for long running jobs.")
  .version("3.1.3")
Could you use 3.3.0 because the master branch is 3.3.0-SNAPSHOT?
Thank you for the review. Changed to 3.3.0
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
Thank you for making a PR, @haodemon.
- Apache Spark mainly runs the unit tests on your GitHub Actions. Could you enable the GitHub Actions build_and_test job in your Spark fork? We also have a Jenkins server; I'll trigger it, though.
- Apache Spark 3.3.0 will use kubernetes-client 5.6.0. Could you update the link in the PR description to the 5.6.0 one?
ok to test

Test build #142193 has finished for PR 33675 at commit

Kubernetes integration test starting

Kubernetes integration test status failure

Test build #142197 has finished for PR 33675 at commit

Kubernetes integration test starting

Kubernetes integration test status failure
@dongjoon-hyun, answering inline.
I'm not sure why they didn't start before, but it seems like they're passing now.
Updated, thank you.
- .withOption(oauthTokenValue) {
+ .withOption(oauthTokenProviderInstance) {
+   (provider, configBuilder) => configBuilder.withOauthTokenProvider(provider)
+ }.withOption(oauthTokenValue) {
To be clear, if we have two configurations at the same time mistakenly, we invoke OAuthTokenProvider and override it with oauthTokenValue?
Could you check what happens when we call configBuilder.withOauthTokenProvider(provider) and configBuilder.withOauthToken(token) together?
The token that was specified in configBuilder.withOauthToken(token) will stay unused, and the class that implements OAuthTokenProvider will be used instead. I confirmed this via tests on my cluster by simultaneously:
1. Adding a debugging log into my class and providing it via
   --conf spark.kubernetes.client.oauth.token.provider.class <myClass>
2. Specifying tokens via
   --conf spark.kubernetes.authenticate.submission.oauthToken=<token>
   --conf spark.kubernetes.authenticate.driver.oauthToken=<token>
   --conf spark.kubernetes.authenticate.oauthToken=<token>
My log:
[2021-08-11 12:51:58,012] DEBUG (OAuthGoogleTokenProvider.java:62) - Refreshing kubernetes oauth token via [/usr/lib/google-cloud-sdk/bin/gcloud, config, config-helper, --format=json]
[2021-08-11 12:51:58,498] DEBUG (OAuthGoogleTokenProvider.java:73) - New token expiry time is 3511s
From the log it can be concluded that OAuthTokenProvider takes precedence over the token specified via configBuilder.withOauthToken(token). This can also be confirmed by looking at the fabric8 code in
https://github.com/fabric8io/kubernetes-client/blob/74cc63df9b6333d083ee24a6ff3455eaad0a6da8/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/RequestConfig.java#L136
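For reference, here is a minimal sketch of that misconfiguration (assuming fabric8's kubernetes-client is on the classpath; the dummy provider and both token strings are purely illustrative), which makes it easy to see which credential the client ends up presenting:

```scala
import io.fabric8.kubernetes.client.{ConfigBuilder, OAuthTokenProvider}

object PrecedenceSketch {
  def main(args: Array[String]): Unit = {
    // Dummy provider returning a recognizable value, so the winning credential
    // is easy to spot in the client's requests.
    val provider = new OAuthTokenProvider {
      override def getToken(): String = "token-from-provider"
    }

    // Both a static token and a provider are configured, mirroring the
    // misconfiguration discussed above. Per the test and the RequestConfig
    // link, the provider wins and "static-token" stays unused.
    val config = new ConfigBuilder()
      .withOauthToken("static-token")
      .withOauthTokenProvider(provider)
      .build()
  }
}
```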
Thank you for checking. In that case, the current situation may cause confusion for Spark users. From the Spark perspective, we have two options.
- Option A: We can revise the description of this new configuration to document that precedence.
- Option B: We can ignore spark.kubernetes.client.oauth.token.provider.class when oauthToken exists, on the Spark side.
Which one do you prefer?
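For concreteness, Option B could look roughly like the following. This is a hypothetical sketch, not the PR's code: the helper name and the Option-typed parameters are illustrative stand-ins for the withOption chain shown in the diff above.

```scala
import io.fabric8.kubernetes.client.{ConfigBuilder, OAuthTokenProvider}

object OptionBSketch {
  // Hypothetical Option B: the provider is only wired up when no static
  // oauthToken was configured, so the existing token behaviour is unchanged.
  def applyAuth(
      builder: ConfigBuilder,
      oauthTokenValue: Option[String],
      oauthTokenProviderInstance: Option[OAuthTokenProvider]): ConfigBuilder = {
    (oauthTokenValue, oauthTokenProviderInstance) match {
      case (Some(token), _) => builder.withOauthToken(token)      // static token wins
      case (None, Some(p))  => builder.withOauthTokenProvider(p)  // provider only as a fallback
      case (None, None)     => builder
    }
  }
}
```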
> the current situation may cause confusion for Spark users.

Indeed. Thanks for making me realize this. The patch is missing a way for a Spark user to specify how they would like Spark to authenticate to Kubernetes when running in client or cluster mode. There are several options present for oauthToken:
spark.kubernetes.authenticate.submission.oauthToken
spark.kubernetes.authenticate.driver.oauthToken
spark.kubernetes.authenticate.oauthToken
And I think we need the same for the token provider, something like spark.kubernetes.authenticate.*.oauthTokenProvider (a sketch of what those entries might look like follows below).
This change would:
- Make the new options consistent with the rest of the Kubernetes options.
- Make the existing options mutually exclusive. For every mode, only one of oauthToken, oauthTokenFile, oauthTokenProvider would be allowed.
If we do this, we won't have to add anything about precedence to the docs, and there will be no need to ignore anything in the code.
@dongjoon-hyun, sorry for the wall of text.
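A rough sketch of what those per-mode entries might look like inside the existing Config object, following the suffix pattern of the other authentication options (the first two names are sketched from Config.scala's existing suffix style, alongside the clientCertFile/caCertFile suffixes quoted later in this thread; the provider suffix is hypothetical, not necessarily what the patch adds):

```scala
// In org.apache.spark.deploy.k8s.Config (sketch; only the last entry is new
// and its name is illustrative):
val OAUTH_TOKEN_CONF_SUFFIX = "oauthToken"
val OAUTH_TOKEN_FILE_CONF_SUFFIX = "oauthTokenFile"
val OAUTH_TOKEN_PROVIDER_CONF_SUFFIX = "oauthTokenProvider"

// Combined with the per-mode prefixes, this would yield keys such as:
//   spark.kubernetes.authenticate.submission.oauthTokenProvider
//   spark.kubernetes.authenticate.driver.oauthTokenProvider
//   spark.kubernetes.authenticate.oauthTokenProvider
```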
I think either precedence or doing an assertion that only one of these is set is fine, but let's just pick one and do it so we can get this in.
I prepared the changes, so now we have:
- OAuthTokenProvider options are now consistent with the other two options, OAuthToken and OAuthTokenFile.
- Added an assertion that only one of the options is set (either OAuthToken, OAuthTokenFile, or OAuthTokenProvider); a sketch of such a check follows below.
- Added docs.
I have tested this in client mode on Kubernetes.
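A hypothetical sketch of such a mutual-exclusivity check; the parameter names are illustrative, and the actual assertion in the patch may be worded differently and live elsewhere:

```scala
object AuthSketch {
  // Fails fast if more than one token source is configured for a given mode.
  def requireSingleTokenSource(
      oauthToken: Option[String],
      oauthTokenFile: Option[String],
      oauthTokenProviderClass: Option[String]): Unit = {
    val provided = Seq(
      "oauthToken" -> oauthToken,
      "oauthTokenFile" -> oauthTokenFile,
      "oauthTokenProvider" -> oauthTokenProviderClass
    ).collect { case (name, Some(_)) => name }
    require(provided.size <= 1,
      s"Only one of oauthToken, oauthTokenFile and oauthTokenProvider may be set, " +
        s"but found: ${provided.mkString(", ")}")
  }
}
```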
val CLIENT_CERT_FILE_CONF_SUFFIX = "clientCertFile"
val CA_CERT_FILE_CONF_SUFFIX = "caCertFile"

nit. Shall we remove the redundant empty line?
BTW, I embedded your example into the PR description to provide it as a commit message, @haodemon. Please let me know if that is not what you want.
Test build #142330 has finished for PR 33675 at commit

Kubernetes integration test unable to build dist. exiting with code: 1

Kubernetes integration test starting

Kubernetes integration test status failure

Jenkins retest this please.

Test build #143220 has finished for PR 33675 at commit

Kubernetes integration test starting

Kubernetes integration test status failure

Test build #143289 has finished for PR 33675 at commit

Kubernetes integration test starting

Kubernetes integration test starting

Kubernetes integration test status failure

Kubernetes integration test starting

Kubernetes integration test status failure

Test build #143997 has started for PR 33675 at commit

Test build #144684 has finished for PR 33675 at commit

Kubernetes integration test starting

Kubernetes integration test status failure

Kubernetes integration test starting

Kubernetes integration test status failure
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

@haodemon Can we reopen this, and I'll help with anything needed to get the tests to pass?

This is reopened according to @tiagovrtr's request.
Thank you, @tiagovrtr. I'll take a look tomorrow and will try to update the PR. In any case, I'll post an update here.

Can one of the admins verify this patch?

This patch seems only to bring in the latest changes from master; is there anything else to do here?

Looping in @renato-farias, as I'm leaving the @bbc, and he'll very much welcome this fix in a future release 🙂

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Hi, All.

Here is a rebased PR.
What changes were proposed in this pull request?
This change allows a Spark user to provide a class which implements fabric8's OAuthTokenProvider to refresh tokens throughout the life of the Spark app.
https://javadoc.io/doc/io.fabric8/kubernetes-client/5.6.0/io/fabric8/kubernetes/client/OAuthTokenProvider.html
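For illustration, a minimal sketch of what such a user-provided class could look like (this is not the gist linked at the end of this description; the class name and token path are placeholders):

```scala
import io.fabric8.kubernetes.client.OAuthTokenProvider

// Minimal, illustrative provider: the class returns a freshly-read value each
// time it is asked for a token, so rotated tokens are picked up automatically.
// The file path is a placeholder.
class FileBackedTokenProvider extends OAuthTokenProvider {
  override def getToken(): String = {
    val src = scala.io.Source.fromFile("/var/run/secrets/tokens/spark-token")
    try src.mkString.trim finally src.close()
  }
}
```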
Why are the changes needed?
Currently, while running Spark on Kubernetes, one has to specify OAuth tokens via config before starting an application.
The token has an expiration time (usually an hour, for GKE) and there is no way to refresh the token at runtime, so the Spark app starts to throw exceptions once the token expires.
Does this PR introduce any user-facing change?
No, this is a new feature.
How was this patch tested?
A class which implements the OAuthTokenProvider interface [0] was added to the classpath on the driver node, with no other Spark token options specified.
It was also tested with expired tokens specified, and the token was refreshed via the user-provided class.
There is no need to use any other token-related configuration options if this class is provided.
An example of the user-provided class for GKE:
[0] https://gist.github.com/haodemon/5490fefdb258275c1f805d584319090b
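The gist is not reproduced here; as a rough, hypothetical sketch of what a GKE-flavoured provider might look like, the following shells out to `gcloud auth print-access-token` (for brevity, instead of parsing the config-helper JSON shown in the debug log earlier in the thread) and uses a fixed caching window rather than the real expiry:

```scala
import java.time.Instant
import scala.sys.process._
import io.fabric8.kubernetes.client.OAuthTokenProvider

// Rough sketch only: shells out to gcloud and caches the token for a fixed
// window. A real implementation should read the actual expiry and handle
// gcloud failures.
class GcloudTokenProvider extends OAuthTokenProvider {
  private val cacheSeconds = 300L
  private var cachedToken: String = null
  private var fetchedAt: Instant = Instant.EPOCH

  override def getToken(): String = synchronized {
    val stale = cachedToken == null ||
      Instant.now.isAfter(fetchedAt.plusSeconds(cacheSeconds))
    if (stale) {
      cachedToken = Seq("gcloud", "auth", "print-access-token").!!.trim
      fetchedAt = Instant.now
    }
    cachedToken
  }
}
```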