-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-17377: ABFS: MsiTokenProvider doesn't retry HTTP 429/410 from the Instance Metadata Service #5273
base: trunk
Are you sure you want to change the base?
Conversation
|| statusCode == HttpURLConnection.HTTP_GONE | ||
|| statusCode == HTTP_TOO_MANY_REQUESTS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we keep this in AzureADAuthentication condition.
Reason being, now if any API in AbfsHttpOperation get 429 or 410, it will be retried 30 times.
Right now what would happen in 429 / 410:
executeHttpOperation
would give true and in completeExecute after very first call:
if (result.getStatusCode() >= HttpURLConnection.HTTP_BAD_REQUEST) {
throw new AbfsRestOperationException(result.getStatusCode(), result.getStorageErrorCode(),
result.getStorageErrorMessage(), null, result);
}
would throw exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing from ExponentialRetryPolicy, we can have https://github.com/apache/hadoop/pull/5273/files#diff-dff9c93d1668203c206aa1c092aef9d2921dc6e20af8888d06fae34778991531R320-R321 as
!succeeded && isRecoverableFailure
&& (tokenFetchRetryPolicy.shouldRetry(retryCount, httperror)||httpError==429 ||httpError==410);
reason being, this check is only required in ADAuthenticator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We plan to retry for these status codes, if they come as a response from backend as well. Hence adding these into a centralized exponential retry policy class.
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commented
hadoop-tools/hadoop-azure/pom.xml
Outdated
@@ -321,8 +321,23 @@ | |||
<dependency> | |||
<groupId>org.mockito</groupId> | |||
<artifactId>mockito-core</artifactId> | |||
<version>4.11.0</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, hadoop-project defines the version, and through properties. revert this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, cut this now; the version in hadoop project is the one you now expect
hadoop-tools/hadoop-azure/pom.xml
Outdated
<dependency> | ||
<groupId>org.mockito</groupId> | ||
<artifactId>mockito-inline</artifactId> | ||
<version>4.11.0</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is new to hadoop, declare it in hadoop-project/pom.xml, with versions and exclusions, then declare here without those
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/oauth2/AzureADAuthenticator.java
Show resolved
Hide resolved
@@ -58,6 +58,13 @@ public class ExponentialRetryPolicy { | |||
*/ | |||
private static final double MAX_RANDOM_RATIO = 1.2; | |||
|
|||
/** | |||
* Qualifies for retry based on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs a . in it, maybe split into
"qualifies for retry."
and "see..."
* https://learn.microsoft.com/en-us/azure/active-directory/ | ||
* managed-identities-azure-resources/how-to-use-vm-token#error-handling | ||
*/ | ||
private static final int HTTP_TOO_MANY_REQUESTS = 429; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make public and refer from tests, maybe put in a different file for this
|
||
/** | ||
* Test MsiTokenProvider. | ||
*/ | ||
public final class ITestAbfsMsiTokenProvider | ||
extends AbstractAbfsIntegrationTest { | ||
|
||
private static final int HTTP_TOO_MANY_REQUESTS = 429; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refer to the value in the src/main code
...ools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAbfsMsiTokenProvider.java
Outdated
Show resolved
Hide resolved
...ools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAbfsMsiTokenProvider.java
Show resolved
Hide resolved
HADOOP-17377: Add retry for HTTP 429 and HTTP 410 apache#5273
This comment was marked as outdated.
This comment was marked as outdated.
@anmolanmol1234 still need those (minor) changes -otherwise it is ready to merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh dear, this is a full mockito update now, which is always a PITA.
How about you split out the mockito update into its own JIRA "update mockito to 4.11.0" (and apply the comments i've done on the changes), then make the abfs change depend on it. That way a broader change is visible on its own.
@@ -24,7 +24,6 @@ | |||
import static org.junit.Assert.assertNotNull; | |||
import static org.junit.Assert.assertTrue; | |||
import static org.junit.Assert.fail; | |||
import static org.mockito.Matchers.any; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reinstate so this file doesn't change
@@ -423,7 +423,7 @@ public void testSubclusterDown() throws Exception { | |||
FSNamesystem ns0 = nn0.getNamesystem(); | |||
HAContext nn0haCtx = (HAContext)getInternalState(ns0, "haContext"); | |||
HAContext mockCtx = mock(HAContext.class); | |||
doThrow(new StandbyException("Mock")).when(mockCtx).checkOperation(any()); | |||
doThrow(new StandbyException("Mock")).when(mockCtx).checkOperation(Mockito.any()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return to the existing any() static import
@@ -192,7 +192,7 @@ public void testMountController() throws IOException { | |||
assertTrue("cgroup dir should be cerated", cgroup.mkdirs()); | |||
//Since we enabled (deferred) cgroup controller mounting, no interactions | |||
//should have occurred, with this mock | |||
verifyZeroInteractions(privilegedOperationExecutorMock); | |||
Mockito.verifyNoInteractions(privilegedOperationExecutorMock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use static import for consistency with the others.
@@ -210,7 +210,7 @@ private void assertAllocatedGpus(int gpus, int deniedGpus, | |||
private void assertNoAllocation(GpuAllocation allocation) { | |||
assertEquals(1, allocation.getDeniedGPUs().size()); | |||
assertEquals(0, allocation.getAllowedGPUs().size()); | |||
verifyZeroInteractions(nmStateStore); | |||
Mockito.verifyNoInteractions(nmStateStore); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use static import
hadoop-project/pom.xml
Outdated
@@ -1308,7 +1308,20 @@ | |||
<dependency> | |||
<groupId>org.mockito</groupId> | |||
<artifactId>mockito-core</artifactId> | |||
<version>2.28.2</version> | |||
<version>4.11.0</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a new property mockito.version and reference in both places
This comment was marked as outdated.
This comment was marked as outdated.
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reviewed tests; propose use/adding methods in the classes to cut back on (1) mockito (2) modifying field access
hadoop-tools/hadoop-azure/pom.xml
Outdated
@@ -321,8 +321,23 @@ | |||
<dependency> | |||
<groupId>org.mockito</groupId> | |||
<artifactId>mockito-core</artifactId> | |||
<version>4.11.0</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, cut this now; the version in hadoop project is the one you now expect
// Mock the tokenFetchRetryPolicy to verify retries. | ||
ExponentialRetryPolicy exponentialRetryPolicy = Mockito.spy( | ||
conf.getOauthTokenFetchRetryPolicy()); | ||
Field tokenFetchRetryPolicy = AzureADAuthenticator.class.getDeclaredField( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this kind of stuff is trouble as it makes maintenance a nightmare; you can't see where the field is access, all you have is a mocking test failing.
proposed: add a static setter to AzureADAuthenticator; mark as @VisibleForTesting.
|
||
// If the status code doesn't qualify for retry shouldRetry returns false and the loop ends. | ||
// It being called multiple times verifies that the retry was done for the throttling status code 429. | ||
Mockito.verify(exponentialRetryPolicy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so ExponentialRetryPolicy.getRetryCount() is there to let you pass a non-mocked policy in and then assert on it. how about using that here? it probably needs making the accessors public, rather than package scoped, but that's all. The less use we make of mockito, the less things will break with every mockito upgrade
💔 -1 overall
This message was automatically generated. |
I think this this PR is great, however there's still one related open problem: the default values (2) for |
I'll go with whatever @saxenapranav thinks here...we have seen this ourselves and need a fix. However, that PR to update mockito bounced, so either
|
The mockito upgrade was needed as part of this PR to mock static methods. So would it be fine if we remove that test method or if not I will attempt to upgrade mockito, including the shaded client. |
Will update this change as an iteration of this PR, but will need some time for the mockito upgrade PR. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, tested for our use case, where throttling caused constant 429 errors, this PR fixed the problem by doing proper retry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This depends on mockito-inline to compile? Is this part of the mockito-upgrade? or is there a way to cope without it?
@@ -48,7 +48,7 @@ public final class FileSystemConfigurations { | |||
public static final int DEFAULT_AZURE_OAUTH_TOKEN_FETCH_RETRY_MAX_ATTEMPTS = 5; | |||
public static final int DEFAULT_AZURE_OAUTH_TOKEN_FETCH_RETRY_MIN_BACKOFF_INTERVAL = 0; | |||
public static final int DEFAULT_AZURE_OAUTH_TOKEN_FETCH_RETRY_MAX_BACKOFF_INTERVAL = SIXTY_SECONDS; | |||
public static final int DEFAULT_AZURE_OAUTH_TOKEN_FETCH_RETRY_DELTA_BACKOFF = 2; | |||
public static final int DEFAULT_AZURE_OAUTH_TOKEN_FETCH_RETRY_DELTA_BACKOFF = 2 * 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use 2_000
@@ -323,6 +323,13 @@ | |||
<artifactId>mockito-core</artifactId> | |||
<scope>test</scope> | |||
</dependency> | |||
|
|||
<dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this needed? because its not in the base project pom.
I would rather this PR doesn't need that mockito upgrade as mockito upgrades are always a painful piece of work which never gets backported.
ABFS: MsiTokenProvider doesn't retry HTTP 429 from the Instance Metadata Service
Resolution for the above mentioned issue where we should enable retries for HTTP error code 429 and HTTP error code 410 based on https://learn.microsoft.com/en-in/azure/virtual-machines/linux/instance-metadata-service?tabs=windows#errors-and-debugging and https://learn.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/how-to-use-vm-token#error-handling
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?