-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-25907][runtime][security] Add pluggable delegation token manager #18664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit 20cf691 (Tue Feb 08 13:02:17 UTC 2022) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
ade4165
to
323538b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @gaborgsomogyi, I think this goes in a right direction. My main concern is about how / where the DTM is instantiated. Also it would be good to hide the implementation behind an interface for easier testing.
|
||
private static final Logger LOG = LoggerFactory.getLogger(HadoopDependency.class); | ||
|
||
public static boolean isHadoopCommonOnClasspath(ClassLoader classLoader) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: javadoc
I really like this, that class not found exception was confusing users for ages :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add javadoc to crystal clear functions but I think this is just waste of java compiler time.
If we have an agreement to add such then I would add it of course...
flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java
Outdated
Show resolved
Hide resolved
new ResourceManagerException("Error while shutting down resource manager", e); | ||
} | ||
|
||
delegationTokenManager.ifPresent(DelegationTokenManager::stop); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to handle exceptions here as with the other services?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding it but it would be good to understand that do heartbeat services need to do the same or they're somehow different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My gut feeling would be that they should follow the same pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I've added the suggestion and we can discuss whether we want to add the mentioned change in a separate PR for the old code.
flink-runtime/src/main/java/org/apache/flink/runtime/security/token/DelegationTokenManager.java
Outdated
Show resolved
Hide resolved
flink-runtime/src/main/java/org/apache/flink/runtime/security/token/DelegationTokenManager.java
Outdated
Show resolved
Hide resolved
...t/java/org/apache/flink/runtime/security/token/ExceptionThrowingDelegationTokenProvider.java
Outdated
Show resolved
Hide resolved
...k-runtime/src/main/java/org/apache/flink/runtime/security/token/DelegationTokenProvider.java
Outdated
Show resolved
Hide resolved
flink-runtime/src/main/java/org/apache/flink/runtime/security/token/DelegationTokenManager.java
Outdated
Show resolved
Hide resolved
DelegationTokenManager delegationTokenManager = | ||
new DelegationTokenManager(flinkConfiguration); | ||
delegationTokenManager.obtainDelegationTokens(credentials); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This makes me think whether DTM should be Closeable
instead of having stop()
method. I know that start / stop
methods are intended for the renewal, but it feels like the obtainDelegationTokens
will also need to create a resource for pulling the token out of KDC that needs be closed 🤔
DelegationTokenManager delegationTokenManager = | |
new DelegationTokenManager(flinkConfiguration); | |
delegationTokenManager.obtainDelegationTokens(credentials); | |
try (final DelegationTokenManager delegationTokenManager = | |
new DelegationTokenManager(flinkConfiguration) { | |
delegationTokenManager.obtainDelegationTokens(credentials); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
obtainDelegationTokens
obtains tokens but these are mustn't be invalidated if you mean that.
start/stop
just starts and stops automatic re-obtain and propagation. Not yet see why stopping this needs to be called close. In this case close
would be no-op because manual obtain is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant, that it probably needs to open some kind of kerberos client underneath that needs to be closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, UGI is used which doesn't need to be closed.
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
Outdated
Show resolved
Hide resolved
Oh gosh, resolving the conflict... |
@flinkbot run azure |
@dmvk Thanks for the quick review. Hope this round of azure is going to pass. |
@gaborgsomogyi Thanks for updating the PR. Sure will do, but right now the team is really busy with the feature freeze, so I can get back to this after that. In general I think it will be better to start merging DTM related PRs into 1.16, so we don't release something that's not yet complete that might confuse users. |
Sure, I know OSP is not lightning fast at some points, just wanted to say I'm not touching the code until further discussion.
I've planned this feature to be done there, so same side. |
ad4d3eb
to
9c138ee
Compare
@flinkbot run azure |
@dmvk since there was a branch cut may I ask to have another round please? |
Hi @gaborgsomogyi, I'm on vacation until 22nd; maybe @zentol or @XComp could take a look instead? |
Have fun then :) |
@dmvk hope you've had fun, recharged and I would like to ask then to have another round please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, thanks for the contribution Gabor 👍 I've added few cosmetic comments, PTAL. Could you please fix the commit history, so it's ready for merge (I guess this could be done in a single commit)?
|
||
@Override | ||
public void obtainDelegationTokens(Credentials credentials) { | ||
LOG.debug("obtainDelegationTokens"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to keep this debug messages? Personally I'd be in favor of removing them as this could cause a confusion when user enables the debug logging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously I was trying to debug such a framework w/o such messages and it was complete nightmare. Nobody knew what started/happened when. Maybe these can be more descriptive but I'm pretty sure that if the system is not doing what we want then something is needed.
...time/src/test/java/org/apache/flink/runtime/security/token/DelegationTokenConverterTest.java
Show resolved
Hide resolved
*/ | ||
public class ExceptionThrowingDelegationTokenProvider implements DelegationTokenProvider { | ||
|
||
public static volatile boolean enabled = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big fan of this approach as it's fairly fragile and the tests that are using the class can not be parallelized. 🤔
It's probably ok-ish as long as this is used only by a single test class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fully agree that it's ugly but I haven't had better idea which is not an overkill. If this would be used from multiple places then something different is needed for sure.
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
Outdated
Show resolved
Hide resolved
Do you mean squash all commits in the branch into a single one? Seems like there are things to learn here for me. When I've squashed commits on github then the comments are blown up all the time. I'm basically fine with that just wanted to ask before I do something. |
eb0db2a
to
3f85e9c
Compare
I've squashed everything into a single commit. |
Unrelated issue: |
3f85e9c
to
6dea854
Compare
Finally passed. |
@dmvk thanks for the review and taking care! New PRs are on the way... |
@gaborgsomogyi Hello, I am currently using Flink version 1.15.2 and have encountered an issue with the HDFS delegation token expiring after 7 days in a Kerberos scenario. |
The token framework stores everything in-memory and it's not compatible with My suggestion is to either use Flink delegation token framework or use external token management which is for example I encourage you to either use the mailing list or slack. |
@gaborgsomogyi Hello,Thank you for your response. |
What is the purpose of the change
It adds a pluggable delegation token framework to Flink. From high level perspective delegation token framework is loaded in all deployment modes when:
security.kerberos.fetch.delegation-token
istrue
hadoop-common
dependency is on classpathPlease note that this PR is not adding the whole feature, there are several subtasks in FLINK-21232 which needs to be solved as well.
Brief change log
DelegationTokenManager
which is loaded inClusterEntrypoint
DelegationTokenProvider
APIDelegationTokenProvider
implementations are loaded byDelegationTokenManager
which is covered in unit testsYarnClusterDescriptor
Verifying this change
security.kerberos.fetch.delegation-token=true
+hadoop-common
dependency is on classpathsecurity.kerberos.fetch.delegation-token=false
+hadoop-common
dependency is on classpathsecurity.kerberos.fetch.delegation-token=true
+hadoop-common
dependency is NOT on classpathsecurity.kerberos.fetch.delegation-token=false
+hadoop-common
dependency is NOT on classpathDoes this pull request potentially affect one of the following parts:
@Public(Evolving)
: yesDocumentation