-
Notifications
You must be signed in to change notification settings - Fork 9.2k
YARN-11112 Avoid renewing delegation token when app is first submitte… #4198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
💔 -1 overall
This message was automatically generated. |
steveloughran
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Someone who understands YARN properly will need to review this ... not me
It does seem excessive to renew tokens immediately. But it will be good to understand why that decision was made.
Does the Git history of the file provide any information?
Looking at the patch, the process must renew tokens just before they expire rather than waiting until they have actually expired. And we can't assume that every hosts clocks in perfect sync.
| import java.util.concurrent.locks.ReadWriteLock; | ||
| import java.util.concurrent.locks.ReentrantReadWriteLock; | ||
|
|
||
| import org.apache.hadoop.security.token.TokenIdentifier; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should go in to the other org.apache. imports
| try { | ||
| renewToken(dttr); | ||
| // if expire date is not greater than now, renew token. | ||
| if (expirationDate <= now) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's going to break if the clocks are out of sync; tokens also need to be renewed if they are going to expire soon
|
💔 -1 overall
This message was automatically generated. |
|
@steveloughran Thanks for your comment. Let's say a user submit an app,
The Step 6 is not very necesarry as the expire time does not vary a lot (only several minutes) . Renewing token will be a heavy opertion in a very busy cluster(in our case, there are nearly 40, 000 apps an hour), as it uses namesystem write lock. --> And we can't assume that every hosts clocks in perfect sync |
Hexiaoqiao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good improvement to me. leave comment inline, FYI.
| dttr = new DelegationTokenToRenew(Arrays.asList(applicationId), token, | ||
| tokenConf, now, shouldCancelAtEnd, evt.getUser()); | ||
| // decode identify to get maxDate. | ||
| TokenIdentifier tokenIdentifier = token.decodeIdentifier(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good improvement, especially for big and busy production cluster. IMO this changes is safe for HDFS, but I am not sure if it is proper for other system.
Another way, just suggest to update org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer#skipTokenRenewal rather than here to avoid double decode token. Also it is reasonable and readable.
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
zuston
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice improvement. @yuanboliu
Thanks for your effort on this ticket which is also useful to us.
| hasHdfsToken = true; | ||
| } | ||
| if (skipTokenRenewal(token)) { | ||
| if (skipTokenRenewal(token, tokenExpiredTime)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The var of tokenExpiredTime will be changed in the method of skipTokenRenewal, however we can't found it according to its method name directly. Maybe skipTokenRenewal should be renamed to skipTokenRenewalAndUpdateExpiredTime.
...main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
Outdated
Show resolved
Hide resolved
| if (identifier == null) { | ||
| return false; | ||
| } | ||
| expiredTime.set(identifier.getMaxDate()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we just to reset the expireTime only when the result of skipTokenRenewal is false?
|
@dineshchitlangia @9uapaw @brumi1024 If u have time, could u help review it? I think it's a nice improvement. |
|
@zuston thanks for your comment. We're also working on HDFS-16558 to change the write lock of delegation token to read lock |
|
🎊 +1 overall
This message was automatically generated. |
| //to cause this one to be set for renew in 2 secs | ||
| Renewer.tokenToRenewIn2Sec = token1; | ||
| AbstractDelegationTokenIdentifier identifier1 = token1.decodeIdentifier(); | ||
| identifier1.setMaxDate(System.currentTimeMillis() + 2 * 1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, use 2_000 for representing 2 seconds
| import java.util.concurrent.atomic.AtomicBoolean; | ||
| import java.util.concurrent.atomic.AtomicInteger; | ||
|
|
||
| import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, should go in the right place in the org.apache imports.
| + applicationId + ", user=" + evt.getUser(), ioe); | ||
| requestNewHdfsDelegationTokenAsProxyUser( | ||
| Arrays.asList(applicationId), evt.getUser(), | ||
| Arrays.asList(applicationId), evt.getUser(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
roll this change back just to keep the diff smaller
| import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEvent; | ||
| import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType; | ||
| import org.apache.hadoop.yarn.server.utils.YarnServerBuilderUtils; | ||
| import org.slf4j.Logger; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert moving these...they were in the right place.
imports are where much needless backport/cherrypick issues surface, so minimising changes here is really important.
| try { | ||
| renewToken(dttr); | ||
| // if expire date is not greater than now, renew token. | ||
| if (tokenExpiredTime.get() <= now) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is too brittle. it should allow for some clock skew. maybe renew if within 10 minutes of expiry.
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
For those who is interesting in this patch, we've applied it in our product env, and it works as expected. |
Hexiaoqiao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuanboliu Thanks for involving me here. Thanks for your good catch here. Leave some nit comment inline. Thanks again.
| renewToken(dttr); | ||
| // if expire date is not greater than now, renew token. | ||
| // add extra time in case of clock skew | ||
| if (tokenExpiredTime.get() <= now + clockSkewExtraTime) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't get why add clock skew condition check here. I think we should file another JIRA to fix if it is issue indeed.
| } | ||
| expiredTime.set(identifier.getMaxDate()); | ||
| Text renewer = identifier.getRenewer(); | ||
| return (renewer != null && renewer.toString().equals("")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just suggest to check if delegation token need to renew while app is just submitted.
|
Thanks for the patch @yuanboliu. I was not involved in the implementation of this class, however, according to: https://blog.cloudera.com/hadoop-delegation-tokens-explained/ the immediate renewal is necessary due to "YARN RM verifies the Delegation Tokens are valid by immediately renewing them." What is you opinion about it @steveloughran @Hexiaoqiao? |
|
@9uapaw @Hexiaoqiao @yuanboliu I would like to ask, RM_DELEGATIONTOKEN, this TOKEN is generated by RM, but I don't see the logic to verify this TOKEN,can you give some advice? |
|
Sorry for the late reply |
|
@slfan1989 As for the verification, you can refer it to RMDelegationTokenSecretManager which extends from AbstractDelegationTokenSecretManager checkToken. |
|
For those who's interesting in this patch: |
| if (identifier == null) { | ||
| return false; | ||
| } | ||
| expiredTime.set(identifier.getMaxDate()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be identifier.getIssueDate()
|
We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
…d to RM
Description of PR
When auth is enabled by NameNode, then delegation token is required if application needs to acess files/directoies. We find that when app is first submitted to RM, RM renewer will renew app token no matter whether token is expired or not. Renewing token is a bit heavy since it uses global write lock. Here is the result when delegation token is required in a very busy cluster.
How was this patch tested?
For code changes:
LICENSE,LICENSE-binary,NOTICE-binaryfiles?