-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-31656][runtime][security] Obtain delegation tokens early to support external file system usage in HA services #22298
Conversation
@@ -213,6 +213,20 @@ public void obtainDelegationTokens(DelegationTokenContainer container) throws Ex | |||
LOG.info("Delegation tokens obtained successfully"); | |||
} | |||
|
|||
@Override | |||
public void obtainDelegationTokens() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to the immediately propagate the existing tokens (fetched here) to Task managers once start function is called. Otherwise, I see No credential error in task manager. I guess it is due to the latency of additional token fetching in startTokenUpdate function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the one-time token obtain stage there are no task managers but I see that there is an issue so I'll take a look...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just added further changes to solve the TM issue. We should test it on cluster in-depth because that's modifying Flink's critical path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HuangZhenQiu please confirm back that you see working cluster tests on your side just to double check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gaborgsomogyi Sorry for late reply. The feature is fully tested end to end.
f3d83f1
to
9daabcd
Compare
I've just fixed the unit tests + changed the title and description. The cluster tests are green. |
// Obtaining delegation tokens and propagating them to the local JVM receivers in a | ||
// one-time fashion is required because BlobServer may connect to external file | ||
// systems | ||
delegationTokenManager.obtainDelegationTokens(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't seem to find any test for this new behaviour, would be good to add something to guard against accidental regressions in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added that we obtain tokens in ClusterEntrypointTest
but it would be overkill to check that token obtain happens before HA services.
…pport external file system usage in blob server
9daabcd
to
6466d49
Compare
What is the purpose of the change
At the moment there are no delegation tokens available when HA services is starting. If the HA services uses an external file system where the authentication type is delegation token based (typically S3) then it throws and exception since there are no credentials.
In this PR I've moved the delegation token manager initialization before HA services and trigger a manual token obtain + local JVM receiver propagation. Additionally deferred base directory creation in
FileSystemBlobStore
andFileSystemJobResultStore
.Brief change log
FileSystemBlobStore
andFileSystemJobResultStore
. This is the solution for the task manager side.DelegationTokenManager
API documentation for better clarityVerifying this change
Manually on cluster.
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: yesDocumentation