Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] ExpiredAuthenticationToken happened when deleting resource groups #33112

Open
3 tasks
waynewang1989 opened this issue Jan 19, 2023 · 19 comments
Open
3 tasks
Assignees
Labels
AAD customer-reported Issues that are reported by GitHub users external to the Azure organization. Mgmt This issue is related to a management-plane library. needs-team-attention This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention This issue is responsible by Azure service team.

Comments

@waynewang1989
Copy link

Describe the bug
"code":"ExpiredAuthenticationToken","message":"The access token expiry UTC time '1/18/2023 5:10:42 PM' is earlier than current UTC time '1/18/2023 5:24:09 PM'."}}": The access token expiry UTC time '1/18/2023 5:10:42 PM' is earlier than current UTC time '1/18/2023 5:24:09 PM'

Exception or Stack Trace
The access token expiry UTC time '1/18/2023 5:10:42 PM' is earlier than current UTC time '1/18/2023 5:24:09 PM'."}}": The access token expiry UTC time '1/18/2023 5:10:42 PM' is earlier than current UTC time '1/18/2023 5:24:09 PM

To Reproduce
Deleting resource group encountering this issue once.

Code Snippet
azureResourceManager.resourceGroups().deleteByName(name)

Expected behavior
No access token issue populated.

Information Checklist
Kindly make sure that you have added all the following information above and checkoff the required fields otherwise we will treat the issuer as an incomplete report

  • Bug Description Added
  • Repro Steps Added
  • Setup information Added
@ghost ghost added needs-triage This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jan 19, 2023
@waynewang1989 waynewang1989 changed the title [BUG] [BUG] ExpiredAuthenticationToken happened when deleting resource groups Jan 19, 2023
@joshfree joshfree added ARM Mgmt This issue is related to a management-plane library. labels Jan 19, 2023
@ghost ghost removed the needs-triage This is a new issue that needs to be triaged to the appropriate team. label Jan 19, 2023
@joshfree
Copy link
Member

@weidongxu-microsoft could you please follow up with @waynewang1989

@weidongxu-microsoft
Copy link
Member

weidongxu-microsoft commented Jan 20, 2023

@XiaofeiCao for investigation. It might relate to or caused by azure-identity. azure-identity should have refreshed the token 5min to 10min before expiration time.

@XiaofeiCao
Copy link
Contributor

XiaofeiCao commented Jan 30, 2023

@waynewang1989 Which version of azure-resourcemanager and azure-identity are you using?

@waynewang1989
Copy link
Author

@XiaofeiCao, Please check:
com.azure.resourcemanager.version: 2.22.0
com.azure.identity.version: 1.7.3

@XiaofeiCao
Copy link
Contributor

Hi @waynewang1989 , I tried expiring the token, but current SDK successfully refreshed it.

To help nagivate where the bug could be, I went through the auth flow in our SDK and summarized a very simple version of it:

flowchart LR
    A[HttpPipeline] --> B{Local token cache exists?}
    B -- Yes --> C{Should refresh cache?}
    C -- Yes --> D[Get token and cache it]
    C -- No ----> E
    D --> E[Execute request using token]
    E --> F[Get response]
    B -- No ----> D

And in ShouldRefreshCache, the logic can be simplified to:

// compares local time to the time of expiry minus REFRESH_OFFSET(5min)
OffsetDateTime.now()
            .isAfter(accessToken.getExpiresAt().minus(REFRESH_OFFSET)

There maybe a chance that your local time deviates from the server, e.g. your local time is 5:00 pm while server is 5:24, and the token expire time is 5:10.

We can first starting by eliminating this possibility. Could you also log the time the exception is thrown? You can use the following code:

AzureResourceManager azureResourceManager = AzureResourceManager
        .configure()
        .withLogOptions(new HttpLogOptions()
            .setLogLevel(HttpLogDetailLevel.BODY_AND_HEADERS)
            .setResponseLogger((logger, loggingOptions) -> {
                final HttpResponse response = loggingOptions.getHttpResponse();

                String contentLengthString = response.getHeaderValue("Content-Length");
                String bodySize = (CoreUtils.isNullOrEmpty(contentLengthString))
                    ? "unknown-length body"
                    : contentLengthString + "-byte body";

                StringBuilder responseLogMessage = new StringBuilder();
                responseLogMessage
                     // log the time of the response
                    .append("[")
                    .append(OffsetDateTime.now().format(DateTimeFormatter.RFC_1123_DATE_TIME))
                    .append("]")
                    .append("<-- ")
                    .append(response.getStatusCode())
                    .append(" ")
                    .append(response.getRequest().getUrl())
                    .append(" (")
                    .append(loggingOptions.getResponseDuration().toMillis())
                    .append(" ms, ")
                    .append(bodySize)
                    .append(")")
                    .append(System.lineSeparator());

                HttpResponse bufferedResponse = response.buffer();
                ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
                WritableByteChannel bodyContentChannel = Channels.newChannel(outputStream);
                return bufferedResponse.getBody()
                    .flatMap(byteBuffer -> {
                        try {
                            bodyContentChannel.write(byteBuffer.duplicate());
                            return Mono.just(byteBuffer);
                        } catch (IOException ex) {
                            return Mono.error(ex);
                        }
                    })
                    .doFinally(ignored -> {
                        responseLogMessage.append(", Response body:")
                            .append(System.lineSeparator())
                            .append(outputStream.toString(StandardCharsets.UTF_8))
                            .append(System.lineSeparator())
                            .append(" <-- END HTTP");

                        logger.info(responseLogMessage.toString());

                    }).then(Mono.just(bufferedResponse));

            })
        )
        .authenticate(credential, profile)

@waynewang1989
Copy link
Author

Thanks, Xiaofei,

This method could generate too many logs which are too heavy for the logging system.

All our services are deployed in Azure AKS clusters and I checked that the time is synced. I don't think there exists any difference between the Azure compute service and the k8s pods.

CMIIAW, I think from the Azure SDK client side,

  1. The access token should be generated and verified to Azure service and the time should be the Azure service time, which is identical.
  2. Once we encountered the issue of the access token expires, Azure SDK should be able to re-authenticate using the credentials to get the new access token at least once before throwing out this exception.

One question:
Do we have a public interface or method that we could use to manually refresh the access token in our service code if we failed the methods above?

@XiaofeiCao
Copy link
Contributor

Thanks @waynewang1989 for the confirmation and proposal, they makes total sense to me.

We may not rush to the retry logic before we get to know the root cause of the issue. There is even a chance that service backend return an expired token.

As to your question, I haven't found a way to refresh the token myself... Will get back to you once I do.

@wangwenbj
Copy link

Thanks, @XiaofeiCao Looking forward to hearing from you!

@XiaofeiCao
Copy link
Contributor

Hi @waynewang1989 ,@wangwenbj ,
I'm afraid there's currently no way to refresh the token manually, since AccessTokenCache resides in azure-identity which acts like a blackbox to us.

You could configure SDK to retry on ExpiredAuthenticationToken situation and log necessary information during each retry(this should only be used for debug purpose):

// configure AzureResourceManager
AzureResourceManager
    .configure()
    .withLogOptions(new HttpLogOptions().setLogLevel(HttpLogDetailLevel.BODY_AND_HEADERS))
    .withHttpClient(httpClient)
    // retry policy
    .withRetryPolicy(new RetryPolicy(new ExponentialBackoff() {
        @Override
        public boolean shouldRetry(HttpResponse httpResponse) {
            boolean isExpiredToken = isExpiredAuthenticationToken(httpResponse);
            if (isExpiredToken) {
                // Do some log here
                LOGGER.error("Token expired. \nMessage: {}.\nCurrent UTC time: {}",
                        httpResponse.getBodyAsString().block(),
                        OffsetDateTime.now().atZoneSameInstant(ZoneOffset.UTC)
                                .format(DateTimeFormatter.ofPattern("d/M/yyyy h:mm:ss a")));
            } else if (httpResponse.getHeaderValue(WWW_AUTHENTICATE) != null) {
                // in case a Conditional Access policy change, log it:
                // https://learn.microsoft.com/en-us/azure/active-directory/conditional-access/concept-continuous-access-evaluation
                LOGGER.warning("Conditional Access state changed, header: {}", httpResponse.getHeaderValue(WWW_AUTHENTICATE));
            }
            return super.shouldRetry(httpResponse)
                    || isExpiredToken;
        }
    }))

// check
private boolean isExpiredAuthenticationToken(HttpResponse httpResponse) {
    return
            // 401
            httpResponse.getStatusCode() == HttpURLConnection.HTTP_UNAUTHORIZED
            // no ARM Challenge present
            && httpResponse.getHeaderValue("WWW-Authenticate") == null
            // contains error code "ExpiredAuthenticationToken"
            && httpResponse.getBodyAsBinaryData() != null
            && httpResponse.getBodyAsBinaryData().toString().contains("ExpiredAuthenticationToken");
}

Mock test with retry: https://github.com/XiaofeiCao/ioexception_repro/blob/main/src/test/java/com/azure/resourcemanager/repro/ioexception/test/expiredtoken/LongRunningOperationTokenExpiredTests.java#L96

Mock test to reproduce with 100 parallelism but failed:
https://github.com/XiaofeiCao/ioexception_repro/blob/main/src/test/java/com/azure/resourcemanager/repro/ioexception/test/expiredtoken/LongRunningOperationTokenExpiredTests.java#L49

@wangwenbj
Copy link

wangwenbj commented Mar 1, 2023 via email

@XiaofeiCao
Copy link
Contributor

Hi wen,

Let's wait for more logs from above retry. Thing is, if service decides to return an expired token to us, manual or auto refresh from SDK is meaningless.

@wangwenbj
Copy link

wangwenbj commented Mar 2, 2023 via email

@ghost ghost added the needs-team-attention This issue needs attention from Azure service team or SDK team label Mar 13, 2023
@ghost
Copy link

ghost commented Mar 13, 2023

Thank you for your feedback. This has been routed to the support team for assistance.

@navba-MSFT navba-MSFT added Service Attention This issue is responsible by Azure service team. and removed CXP Attention labels Mar 17, 2023
@ghost
Copy link

ghost commented Mar 17, 2023

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @armleads-azure.

Issue Details

Describe the bug
"code":"ExpiredAuthenticationToken","message":"The access token expiry UTC time '1/18/2023 5:10:42 PM' is earlier than current UTC time '1/18/2023 5:24:09 PM'."}}": The access token expiry UTC time '1/18/2023 5:10:42 PM' is earlier than current UTC time '1/18/2023 5:24:09 PM'

Exception or Stack Trace
The access token expiry UTC time '1/18/2023 5:10:42 PM' is earlier than current UTC time '1/18/2023 5:24:09 PM'."}}": The access token expiry UTC time '1/18/2023 5:10:42 PM' is earlier than current UTC time '1/18/2023 5:24:09 PM

To Reproduce
Deleting resource group encountering this issue once.

Code Snippet
azureResourceManager.resourceGroups().deleteByName(name)

Expected behavior
No access token issue populated.

Information Checklist
Kindly make sure that you have added all the following information above and checkoff the required fields otherwise we will treat the issuer as an incomplete report

  • Bug Description Added
  • Repro Steps Added
  • Setup information Added
Author: waynewang1989
Assignees: XiaofeiCao
Labels:

question, ARM, Service Attention, Mgmt, customer-reported, needs-team-attention

Milestone: -

@navba-MSFT
Copy link

@armleads-azure Could you please look into this once you get a chance ? Thanks in advance.

cc @jennyhunter-msft @josephkwchan

@pilor
Copy link

pilor commented Mar 17, 2023

@navba-MSFT ARM doesn't generate auth tokens, AAD does. If the assumption here is that an expired JWT token is being provided to a user then AAD/identity is the right service contact. I don't know how they are represented in github.

@navba-MSFT navba-MSFT added AAD and removed ARM Service Attention This issue is responsible by Azure service team. needs-team-attention This issue needs attention from Azure service team or SDK team labels Mar 17, 2023
@ghost
Copy link

ghost commented Mar 17, 2023

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @adamedx.

Issue Details

Describe the bug
"code":"ExpiredAuthenticationToken","message":"The access token expiry UTC time '1/18/2023 5:10:42 PM' is earlier than current UTC time '1/18/2023 5:24:09 PM'."}}": The access token expiry UTC time '1/18/2023 5:10:42 PM' is earlier than current UTC time '1/18/2023 5:24:09 PM'

Exception or Stack Trace
The access token expiry UTC time '1/18/2023 5:10:42 PM' is earlier than current UTC time '1/18/2023 5:24:09 PM'."}}": The access token expiry UTC time '1/18/2023 5:10:42 PM' is earlier than current UTC time '1/18/2023 5:24:09 PM

To Reproduce
Deleting resource group encountering this issue once.

Code Snippet
azureResourceManager.resourceGroups().deleteByName(name)

Expected behavior
No access token issue populated.

Information Checklist
Kindly make sure that you have added all the following information above and checkoff the required fields otherwise we will treat the issuer as an incomplete report

  • Bug Description Added
  • Repro Steps Added
  • Setup information Added
Author: waynewang1989
Assignees: XiaofeiCao
Labels:

question, AAD, Service Attention, Mgmt, customer-reported, needs-team-attention

Milestone: -

@ghost ghost added the needs-team-triage This issue needs the team to triage. label Mar 17, 2023
@navba-MSFT navba-MSFT added Service Attention This issue is responsible by Azure service team. needs-team-attention This issue needs attention from Azure service team or SDK team and removed needs-team-triage This issue needs the team to triage. labels Mar 17, 2023
@ghost
Copy link

ghost commented Mar 17, 2023

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @adamedx.

Issue Details

Describe the bug
"code":"ExpiredAuthenticationToken","message":"The access token expiry UTC time '1/18/2023 5:10:42 PM' is earlier than current UTC time '1/18/2023 5:24:09 PM'."}}": The access token expiry UTC time '1/18/2023 5:10:42 PM' is earlier than current UTC time '1/18/2023 5:24:09 PM'

Exception or Stack Trace
The access token expiry UTC time '1/18/2023 5:10:42 PM' is earlier than current UTC time '1/18/2023 5:24:09 PM'."}}": The access token expiry UTC time '1/18/2023 5:10:42 PM' is earlier than current UTC time '1/18/2023 5:24:09 PM

To Reproduce
Deleting resource group encountering this issue once.

Code Snippet
azureResourceManager.resourceGroups().deleteByName(name)

Expected behavior
No access token issue populated.

Information Checklist
Kindly make sure that you have added all the following information above and checkoff the required fields otherwise we will treat the issuer as an incomplete report

  • Bug Description Added
  • Repro Steps Added
  • Setup information Added
Author: waynewang1989
Assignees: XiaofeiCao
Labels:

question, AAD, Service Attention, Mgmt, customer-reported, needs-team-attention

Milestone: -

@navba-MSFT
Copy link

Adding @adamedx team to look in to this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AAD customer-reported Issues that are reported by GitHub users external to the Azure organization. Mgmt This issue is related to a management-plane library. needs-team-attention This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention This issue is responsible by Azure service team.
Projects
None yet
Development

No branches or pull requests

7 participants