[SPARK-11662] [YARN] In Client mode, make sure we re-login before at… #9875
Conversation
…tempting to create new delegation tokens if a new SparkContext is created within the same application. Since Hadoop gives precedence to delegation tokens, we must make sure we log in as a different user, get new tokens, and replace the old ones in the current user's credentials cache, to avoid not being able to get new ones.
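A minimal sketch of the re-login-and-replace idea described above, with Hadoop's `UserGroupInformation`/`Credentials` machinery replaced by a toy in-memory model (all names here are hypothetical, for illustration only — in real code this would go through `UserGroupInformation.loginUserFromKeytab` and the user's `Credentials`):

```scala
// Toy model of the precedence problem: Hadoop prefers existing delegation
// tokens over a keytab login, so stale tokens must be overwritten in the
// credentials cache after re-logging in.
case class Token(service: String, issuedAt: Long)

class CredentialsCache {
  private var tokens = Map.empty[String, Token]
  def addToken(t: Token): Unit = tokens += (t.service -> t)
  def token(service: String): Option[Token] = tokens.get(service)
}

object ReloginSketch {
  // Hypothetical stand-in for "re-login from keytab and fetch fresh tokens":
  // the fresh token replaces the stale one for the same service key.
  def reloginAndReplaceTokens(cache: CredentialsCache, now: Long): Unit = {
    val fresh = Token("hdfs-namenode", issuedAt = now)
    cache.addToken(fresh) // overwrites the old token for the same service
  }
}
```

The key point is the overwrite: merely logging in again is not enough, because Hadoop's security code keeps consulting the old tokens until they are replaced.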
Jenkins, test this please
Test build #46458 has finished for PR 9875 at commit
Test build #46454 has finished for PR 9875 at commit
retest this please
Test build #46466 has finished for PR 9875 at commit
@harishreedharan I thought this issue existed even without reusing the JVM (going by other JIRAs and PRs that were filed)? For instance, if you just have a long-running process and had specified a keytab to use. I thought they had said it wasn't re-logging in because of the token that it was acquiring.
Yes, that is correct, but this happens even without the tokens expiring. What do you think about doing the re-login in YarnClientSchedulerBackend? This is messy with client mode, but I don't think we have another option.
Also, to be clear: it would work fine in cluster mode. In client mode, #7394 should have taken care of the long-running app issue (though there was one case @SaintBacchus mentioned, I think). So in either case, I am wondering whether in client mode we should simply re-login using the keytab and not bother with tokens on the driver app at all (so the AM would log in, update tokens etc., while the client app always just logs in). Do you think that makes sense, @tgravescs?
Does the situation where the client can only have tokens need to be covered? Users may not have keytabs, only kinit-granted tickets, and they still have the right to submit work with a lifespan <= the ticket life.
If we are going to include the fix for the issue @SaintBacchus mentioned, then I think the right thing is to either log in from the keytab or get the tokens (not both). If a keytab is supplied, always use that and don't bother with tokens on the driver; otherwise get the tokens.
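The rule proposed above ("keytab or tokens, never both on the driver") could be sketched as a small decision function — the names here are hypothetical, not from the patch:

```scala
// Sketch of the proposed driver-side rule: a supplied keytab always wins,
// and delegation tokens are only used when no keytab is configured
// (e.g. a user with only a kinit-granted ticket).
sealed trait DriverAuth
case object KeytabLogin extends DriverAuth
case object DelegationTokens extends DriverAuth

def chooseDriverAuth(keytab: Option[String]): DriverAuth =
  keytab match {
    case Some(_) => KeytabLogin      // never mix the two on the driver
    case None    => DelegationTokens // ticket-only users still covered
  }
```

This keeps the two credential sources mutually exclusive on the driver, which is exactly what avoids Hadoop preferring a stale token over the keytab login.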
After applying the provided patch, things still do not work. While debugging I found some additional information. When it works, two tokens are created, with the renewal interval set for the first one using the getTokenRenewalInterval(stagingDirPath) function in Client.scala. The second time around (after stopping and restarting the context), however, it prints a message saying one token was created, but no renewal interval is set. Finally, it dies saying the token can't be found in the cache. The relevant output is below (IPs/hostnames removed):

---------------Successful run--------------
Test build #58467 has finished for PR 9875 at commit
    // If this JVM hosted a yarn-client mode driver before, the credentials of the current user
    // now have delegation tokens, which means the Hadoop security code will look at those and
    // not the keytab login. So we must re-login and get new tokens.
    if (reusedJVM && loginFromKeytab && !isClusterMode) {
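The guard in the diff above can be read as a pure predicate (the three flag names are taken from the diff; extracting them into a function like this is just for illustration):

```scala
// Re-login is only needed when all three hold: this JVM already hosted a
// yarn-client driver (so stale tokens exist in the credentials cache),
// a keytab login is configured, and we are in client mode (in cluster
// mode the AM handles credentials).
def shouldRelogin(reusedJVM: Boolean,
                  loginFromKeytab: Boolean,
                  isClusterMode: Boolean): Boolean =
  reusedJVM && loginFromKeytab && !isClusterMode
```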
Hey @harishreedharan , do you plan on updating this patch?
If yes, I'm wondering why not do this in all cases, not just when a new context is created. The same code should work in both scenarios, right?
If not, should probably close the PR.
ping @harishreedharan, please close the PR if you don't intend to work on it.
/cc @tedyu @tgravescs