-
Notifications
You must be signed in to change notification settings - Fork 8.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18610. ABFS OAuth2 Token Provider support for Azure Workload Identity #5953
Conversation
🎊 +1 overall
This message was automatically generated. |
...-azure/src/main/java/org/apache/hadoop/fs/azurebfs/oauth2/WorkloadIdentityTokenProvider.java
Outdated
Show resolved
Hide resolved
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/oauth2/AzureADAuthenticator.java
Outdated
Show resolved
Hide resolved
Add tests around the code piece added. |
@anmolanmol1234 - Thank you for the review. I added string constants. Regarding tests, would you mind providing some guidance on what you are expecting? I added this documentation showing how the code changes can be tested using integration tests. I ran the integration tests and copied the output in the merge request description above. If you're expecting unit tests instead of integration tests, then I would appreciate specific guidance on how that can be accomplished. This pull request includes changes to these existing classes:
This pull request adds a new WorkloadTokenIdentityProvider class that extends
Given the lack of unit tests in existing code, is it reasonable to conclude the new integration test documentation and test runs are sufficient in this case? |
🎊 +1 overall
This message was automatically generated. |
May I ask why this PR still not merged? just curious |
@creste Is there any update on this PR? It is something I ideally need too and having similar pain points |
@steven13cooper - I'm currently waiting on @anmolanmol1234 or another reviewer to approve the PR or at least provide guidance on how to add more tests. |
Hi @steveloughran and @saxenapranav, could you please help review this PR? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add some tests around the change.
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/oauth2/AzureADAuthenticator.java
Outdated
Show resolved
Hide resolved
return expiring; | ||
} | ||
|
||
private static String getClientAssertion(String tokenFile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason for having it static?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method doesn't use any member variables so I made it static. I can remove static
if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets not have it static, reason being, it is being called only from single non-static method. We can remove the tokenFile
argument and let it depend on the object field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, reading will be done for each refresh call, let call it from the constructor itself. This will be better as: if the file read fails for any reason, it will be raised on the FileSystem creation itself, and also would be saving IO calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets not have it static, reason being, it is being called only from single non-static method. We can remove the tokenFile argument and let it depend on the object field.
Will do.
Also, reading will be done for each refresh call, let call it from the constructor itself. This will be better as: if the file read fails for any reason, it will be raised on the FileSystem creation itself, and also would be saving IO calls.
Doing that will introduce a bug where the ABFS driver will successfully authenticate the first time when getClientAssertion
is called from the constructor, but will fail during subsequent invocations of getClientAssertion
because the token in the file is refreshed out-of-band by Azure. Subsequent invocations of refreshToken
will return the token that was cached when the constructor was invoked instead of reading the new token that was written to tokenFile
by Azure.
For your reference, this WorkloadIdentityTokenProvider
class has logic that is similar to the WorkloadIdentityCredential class in Microsoft's Azure Identity SDK for Java. That class stores the path to the token file (federatedTokenFilePathInput) in the constructor and reads from the token file every time a new token is requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am understanding correctly, there is some process on the azure vm which refreshes this file? Would be great if you can please cite the reference and also if we could add as a javadoc for the method getClientAssertion
.
As, it is important that the tokenFile is not cached, lets add a test around this as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is some process on the azure vm which refreshes this file?
Yes.
please cite the reference
I added the citation to the javadoc for getClientAssertion
.
As, it is important that the tokenFile is not cached, lets add a test around this as well.
The only way I see to verify the token file is not cached is to somehow mock the call to AzureADAuthenticator.getTokenUsingJWTAssertion()
in WorkloadIdentityTokenProvider.refreshToken()
, which is problematic. Please see my comments here regarding my thoughts on mocking it. If we can identify a way to mock that code then I can add a unit test to verify the token file is not cached. Otherwise, I don't see how to implement that unit test.
...-azure/src/main/java/org/apache/hadoop/fs/azurebfs/oauth2/WorkloadIdentityTokenProvider.java
Outdated
Show resolved
Hide resolved
@saxenapranav - Please see my thoughts on tests here. It is not clear to me how to add tests to the code. Any guidance you could provide would be greatly appreciated. |
🎊 +1 overall
This message was automatically generated. |
/** | ||
* Provides tokens based on Azure AD Workload Identity. | ||
*/ | ||
public class WorkloadIdentityTokenProvider extends AccessTokenProvider { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add tests around this class. Following is something around which we can write tests which can help prevent regressions in future:
- how refreshing the token work.
- We can have a protected method
getTokenTtl()
which on production code would giveONE_HOUR
, but in test, we can mock it as per the test requirement. - We can mock the external call, the super call.
- We can have a protected method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the guidance. I implemented several unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, we have added tests around hasEnoughTimeElapsedSinceLastRefresh
. Was expecting if token refreshing could also be tested. And, also we could see refreshing on a upper layer (still an unit test):
- We have AccessTokenProvider object (instance of WorkloadIdentityTokenProvider ), and we could call
AccessTokenProvider
's methodgetToken()
, and then test different scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the request, but I don't see a reasonable way to mock the code for a unit test of AccessTokenProvider
. Note that AccessTokenProvider.getToken()
calls WorkloadIdentityProvider.refreshToken()
, which calls AzureADAuthenticator.getTokenUsingJWTAssertion()
, which is a static method that eventually makes HTTP requests. That is a problem because:
- All unit tests currently use Mockito version 2.28.2, which does not support mocking static methods.
- The TestAzureADAuthenticator unit tests do not show how to mock the HTTP requests made by
AzureADAuthenticator
.
Without a way to mock the calls made by WorkloadIdentityProvider.refreshToken()
or AzureADAuthenticator.getTokenUsingJWTAssertion()
, all unit tests will try to make real HTTP requests which will always fail.
Do you have any ideas on how to work around the limitations of the code to implement the desired unit test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we dont need to mock the static method. What can be done is:
we have a method: getTokenUsingJWTAssertion()
which calls AzureADAuthenticator .getTokenUsingJWTAssertion(authEndpoint, clientId, clientAssertion);
. Now, this new method is mockable, and in the test, we can give the required behavior.
Also, the methods which are mockable from the test, we add an annotation VisibleForTesting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the help. I added more unit tests per your instructions.
Thanks for taking the comments. Have added my thought-process for the test. |
@saxenapranav - Thank you for the guidance. I added unit tests and addressed all other feedback. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@creste, thanks for taking the feedback. Have added some more comments. Thanks.
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@creste , shared my thoughts around how we can mock it.
/** | ||
* Provides tokens based on Azure AD Workload Identity. | ||
*/ | ||
public class WorkloadIdentityTokenProvider extends AccessTokenProvider { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we dont need to mock the static method. What can be done is:
we have a method: getTokenUsingJWTAssertion()
which calls AzureADAuthenticator .getTokenUsingJWTAssertion(authEndpoint, clientId, clientAssertion);
. Now, this new method is mockable, and in the test, we can give the required behavior.
Also, the methods which are mockable from the test, we add an annotation VisibleForTesting
This behaviour of the AAD workload identity is not well documented, but the AAD workload identity webhook injects the following env variables into pods. https://azure.github.io/azure-workload-identity/docs/quick-start.html#7-deploy-workload
I think it is better to make the following config properties optional and use values provided by the env variables above by default.
These values provided by Hadoop config are redundant in general and potentially cause inconsistency with the correct properties of federated identities. |
you can use env var resolution within a hadoop core-site file; which lets you at the values with defaults when unset. on locked down config loading (oozie etc) then only the default is valid.
so: no need to add explicit resolution, just document or set as default. example, s3a uses temp dirs in yarn containers automatically. <property>
<name>fs.s3a.buffer.dir</name>
<value>${env.LOCAL_DIRS:-${hadoop.tmp.dir}}/s3a</value>
<description>Comma separated list of directories that will be used to buffer file
uploads to.
Yarn container path will be used as default value on yarn applications,
otherwise fall back to hadoop.tmp.dir
</description>
</property> |
@sugibuchi @steveloughran - I added the Azure environment variables to the hadoop config documented in the readme. Please take a look when you get a chance. |
@steveloughran @creste
About the description of the auth method:
This is not precise. The token files injected by the AAD workload identity webhook are files of "projected service account tokens" issued by Kubernetes clusters. They are not OAuth2 access tokens for accessing Azure resources. https://azure.github.io/azure-workload-identity/docs/introduction.html#how-it-works I propose to update the description of this auth method like:
|
@sugibuchi - Thank you for the additional comments.
The current descriptions of the properties were copied from other parts of the README. For example, see the property descriptions for MSITokenProvider. @steveloughran or @anmolanmol1234 - what descriptions should the README use for those properties?
I have no preference, but since this text was also based on other descriptions in the README I would appreciate input from a maintainer before making the change. @steveloughran or @anmolanmol1234 - any thoughts on this? |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commented; use of env vars is good.
not knowledgeable about the docs to review them
import org.apache.hadoop.fs.azurebfs.AbstractAbfsTestWithTimeout; | ||
import org.junit.Test; | ||
import org.mockito.Mockito; | ||
|
||
import java.io.File; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this block of imports should go up first
new WorkloadIdentityTokenProvider(AUTHORITY, TENANT_ID, CLIENT_ID, tokenFile.getPath())); | ||
Mockito.doReturn(azureAdToken) | ||
.when(tokenProvider).getTokenUsingJWTAssertion(TOKEN); | ||
assertEquals(azureAdToken, tokenProvider.getToken()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use assertJ assertions.
Mockito.doReturn(azureAdToken) | ||
.when(tokenProvider).getTokenUsingJWTAssertion(TOKEN); | ||
assertEquals(azureAdToken, tokenProvider.getToken()); | ||
assertTrue("token fetch time was not set correctly", tokenProvider.getTokenFetchTime() > startTime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use AssertJ especially here.
make test >= so that if the start time and load happens in same millisecond by clock granularity, no test failure
new WorkloadIdentityTokenProvider(AUTHORITY, TENANT_ID, CLIENT_ID, tokenFile.getPath())); | ||
Mockito.doReturn(azureAdToken) | ||
.when(tokenProvider).getTokenUsingJWTAssertion(TOKEN); | ||
boolean exception = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use LambdaTestUtils.intercept() here, ideally looking for the error string as well as exception class.
*/ | ||
@InterfaceAudience.Private | ||
@InterfaceStability.Evolving | ||
package org.apache.hadoop.fs.azurebfs.oauth2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to worry about package files in test modules...is yetus complaining about it?
FS_AZURE_ACCOUNT_OAUTH_MSI_AUTHORITY, | ||
AuthConfigurations.DEFAULT_FS_AZURE_ACCOUNT_OAUTH_MSI_AUTHORITY); | ||
authority = appendSlashIfNeeded(authority); | ||
String tenantId = getPasswordString(FS_AZURE_ACCOUNT_OAUTH_MSI_TENANT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
always good to trim this so if someone splits a value with newlines it is trimmed properly
Any progress on this? |
Hi @creste , there is a need for this change to merge to trunk. Can you please share the plan on completing this PR. Please let me know if I can help you anyway to address the review comments to get this change in. Thanks. |
@snvijaya - My team is no longer using Hadoop and has moved on to another project so I am unable to commit to completing this PR. Feel free to address the feedback and make any changes needed to get this merged. |
Thanks @creste . @anujmodi2021 Will pick this up. |
Hi @steveloughran, @cthtrifork Addressed all the pending comments here. Thanks a lot. |
Description of PR
Add support for Azure Active Directory (Azure AD) workload identities which integrate with the Kubernetes's native capabilities to federate with any external identity provider.
This PR is based on Haifeng Chen's patch attached to HADOOP-18610. I fixed a few typos and linter errors but did not modify the core functionality.
How was this patch tested?
New ABFS OAuth test configuration added for WorkloadIdentityTokenProvider. Complete test suite was run against Azure Blob Storage in Central US region.
:::: AGGREGATED TEST RESULT ::::
HNS-OAuth
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 4
[INFO] Results:
[INFO]
[WARNING] Tests run: 587, Failures: 0, Errors: 0, Skipped: 99
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAbfsTerasort.test_110_teragen:244->executeStage:206 » TestTimedOut test t...
[INFO]
[ERROR] Tests run: 339, Failures: 0, Errors: 1, Skipped: 56
HNS-SharedKey
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[ERROR] Errors:
[ERROR] TestExponentialRetryPolicy.testThrottlingIntercept:106 » KeyProvider Failure t...
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 1, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 587, Failures: 0, Errors: 0, Skipped: 68
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAbfsTerasort.test_110_teragen:244->executeStage:206 » TestTimedOut test t...
[INFO]
[ERROR] Tests run: 339, Failures: 0, Errors: 1, Skipped: 43
NonHNS-SharedKey
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[ERROR] Errors:
[ERROR] TestExponentialRetryPolicy.testThrottlingIntercept:106 » KeyProvider Failure t...
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 1, Skipped: 11
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] ITestAzureBlobFileSystemCheckAccess.testCheckAccessForAccountWithoutNS:181 Expecting org.apache.hadoop.security.AccessControlException with text "This request is not authorized to perform this operation using this permission.", 403 but got : "void"
[INFO]
[ERROR] Tests run: 587, Failures: 1, Errors: 0, Skipped: 277
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAbfsTerasort.test_110_teragen:244->executeStage:206 » TestTimedOut test t...
[INFO]
[ERROR] Tests run: 339, Failures: 0, Errors: 1, Skipped: 46
AppendBlob-HNS-OAuth
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 4
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] ITestAzureBlobFileSystemLease.testTwoWritersCreateAppendNoInfiniteLease:177->twoWriters:165 » AbfsRestOperation
[INFO]
[ERROR] Tests run: 587, Failures: 0, Errors: 1, Skipped: 99
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] ITestAbfsStreamStatistics.testAbfsStreamOps:140->Assert.assertTrue:42->Assert.fail:89 The actual value of 99 was not equal to the expected value
[ERROR] Errors:
[ERROR] ITestAbfsTerasort.test_110_teragen:244->executeStage:206 » TestTimedOut test t...
[INFO]
[ERROR] Tests run: 339, Failures: 1, Errors: 1, Skipped: 80
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?