Retrieving key vault secret takes 1m 40s with default credentials #23017
Comments
Interesting! Is it any faster if you use either of the credentials in the chain directly?
I've added logs from one of the slow runs here: https://github.com/rudfoss/keyvault-secret-test/blob/main/logs.txt
Hey @rudfoss, thanks for the logs and detailed repro! What environment are you running this on? Is it a local machine or somewhere in Azure (a VM, Azure Cloud Shell...)? It appears that DefaultAzureCredential is trying to authenticate with Managed Identity first and is failing. I can't deduce the reason from the logs alone, but maybe the Managed Identity service principal doesn't have the correct permissions to retrieve the secret -- and even if you aren't intending to use Managed Identity, DefaultAzureCredential is trying anyway. Your logs aren't timestamped, but it looks like there are a number of retries of the Managed Identity request. I wonder if the delay between retries is what's causing the slowness?
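To sanity-check whether retries alone could plausibly account for the roughly 100 seconds observed, here is a back-of-envelope sketch. All numbers are assumptions for illustration, not the SDK's actual retry policy:

```typescript
// Back-of-envelope sketch: could retries plus per-attempt network timeouts
// add up to the ~100 s observed? The attempt count, per-attempt timeout,
// and backoff base below are assumed values, not the SDK's real settings.
function totalDelayMs(
  attempts: number,
  perAttemptTimeoutMs: number, // time spent waiting for each request to fail
  baseBackoffMs: number        // initial delay between attempts, doubled each retry
): number {
  let total = 0;
  for (let i = 0; i < attempts; i++) {
    total += perAttemptTimeoutMs;
    if (i < attempts - 1) {
      total += baseBackoffMs * 2 ** i; // exponential backoff between attempts
    }
  }
  return total;
}

// 4 attempts, each hanging ~20 s before failing, with an 800 ms base backoff:
console.log(totalDelayMs(4, 20_000, 800)); // 85600 (~86 s)
```

With those assumed numbers, retry delays alone land in the right ballpark for the slowness in the logs.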
Hi @timovv. Thanks for the reply. I'm running the code locally from my machine (Windows 10, Node 16), connecting to key vaults in Norway East and East US, both set up with RBAC auth and my user assigned as Key Vault Administrator. I've installed the Azure CLI and Azure PowerShell and I'm signed in to both. I don't have the versions in front of me atm, but I'll add them tomorrow. The authentication eventually works and the code retrieves the secret value. What is strange is that using the exact same or a new default credential with an App Configuration instance works instantly. There seems to be something specific to the key vault client here.
What you're describing, with the credential working fine for App Configuration, does strike me as odd. Just to rule it out: when you make the fast call to App Configuration, does it run before or after the slow Key Vault call in the same process?
Edit: I did a retest this morning, altering the order and instances.
So it appears there is some caching going on...
I'm stepping through the client core code atm and it seems there are several requests to the IMDS token endpoint, which eventually fail. A bit of googling turned up a possible explanation: is the client assuming that I'm running this in an Azure VM? If so, how is it determining this for the key vault client and not the app config client?

Edit: Still more debugging reveals the delay happens within the managed identity credential's token request.
Ok, so I think I may have found at least one issue, and I also think it may be indirectly responsible for my problem. The availability check here looks suspect:

azure-sdk-for-js/sdk/identity/identity/src/credentials/managedIdentityCredential/imdsMsi.ts, line 112 (commit d63c111)

On my machine this results in a request to the IMDS endpoint, made here:

azure-sdk-for-js/sdk/identity/identity/src/credentials/managedIdentityCredential/imdsMsi.ts, line 159 (commit d63c111)

The request fails, but not with one of the errors the check treats as "IMDS unavailable", so the credential considers the endpoint reachable. I don't understand why the check works this way, but in any case the consequence is that the managed identity flow proceeds as if it could succeed. I've not yet uncovered why App Configuration behaves differently from Key Vault secrets, but I'll keep looking.

Edit: Stepping through the same request again:

azure-sdk-for-js/sdk/identity/identity/src/credentials/managedIdentityCredential/imdsMsi.ts, line 159 (commit d63c111)

However, this time the request within the availability check behaves differently.

Edit2: The timeout for the probe request is set here:

azure-sdk-for-js/sdk/identity/identity/src/credentials/managedIdentityCredential/imdsMsi.ts, line 150 (commit d63c111)
This timeout is in effect for the App Configuration client, so its availability probe fails fast. I haven't really understood the internal implementation here, so I don't know exactly why a timeout value would be reused between requests, but that seems to be what's going on here.

My short-term, extremely dirty fix for the issue is to duck-punch the getToken function of the ManagedIdentityCredential. With two small helper functions I can minimize the impact and make the credential work the same way for the key vault and for app configurations:

```typescript
import {
  ChainedTokenCredential,
  ManagedIdentityCredential,
  TokenCredential
} from "@azure/identity"

const resetTimeoutOnGetToken = (credential: TokenCredential) => {
  const original = credential.getToken
  credential.getToken = ((arg1, arg2, ...restArgs: any[]) => {
    // Drop a carried-over timeout of 0 so the default applies again
    if (arg2?.requestOptions?.timeout === 0) {
      delete arg2.requestOptions.timeout
    }
    return original.call(credential, arg1, arg2, ...restArgs)
  }) as any
  // Return a function that restores the original getToken
  return () => {
    credential.getToken = original
  }
}

const findManagedIdentityCredential = (
  chainedTokenCredential: ChainedTokenCredential
): ManagedIdentityCredential | undefined => {
  return chainedTokenCredential["_sources"].find(
    (credential) => credential instanceof ManagedIdentityCredential
  )
}
```

The above functions allow me to do this before I pass the credential to a client:

```typescript
const cred: any = new DefaultAzureCredential()
const managedCred = findManagedIdentityCredential(cred)
if (managedCred) {
  resetTimeoutOnGetToken(managedCred)
}
```

It is in no way clean or correct, but it works for now. I'd love to know why the credential instances behave this way and why the timeout carries over though... It seems a bit weird to me, but I'm sure there is a reason.
I'll dig more into this on Monday, but since you were so wonderful to have debugged this so extensively, I wanted to point out one special thing about Key Vault: namely that it uses challenge-based authentication:

azure-sdk-for-js/sdk/keyvault/keyvault-common/src/challengeBasedAuthenticationPolicy.ts, line 45 (commit 480e82c)
This causes it to force receiving the challenge by sending a body-less request, which could explain the extra request you are seeing. I'm also deeply suspicious of the error handling in that availability check.
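For context, the challenge handshake works roughly like this: the client sends a request without a body, the service answers 401 with a WWW-Authenticate header, and the policy parses that header to learn where to authenticate. A minimal sketch of the parsing step (a simplified illustration, not the SDK's implementation; the header format follows publicly documented Key Vault examples):

```typescript
// Simplified sketch of parsing a Key Vault WWW-Authenticate challenge.
// Not the SDK's code; for illustration of the handshake only.
function parseChallenge(header: string): Record<string, string> {
  const params: Record<string, string> = {};
  // Strip the "Bearer " scheme prefix, then collect key="value" pairs.
  const body = header.replace(/^Bearer\s+/i, "");
  for (const match of body.matchAll(/(\w+)="([^"]*)"/g)) {
    params[match[1]] = match[2];
  }
  return params;
}

const challenge = parseChallenge(
  'Bearer authorization="https://login.windows.net/tenant-id", resource="https://vault.azure.net"'
);
console.log(challenge.resource); // https://vault.azure.net
```

The body-less first request exists purely to trigger this 401 challenge, which is why Key Vault clients make one more request than clients of services without challenge-based auth.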
Thanks for the reply @xirzec, I didn't mean for people to work weekends on this. I just found it interesting enough to tinker with on the side and wanted to document my findings. I'm perfectly happy to wait for a resolution here, especially now that I've got my hacky workaround. Let me know if I can assist with debugging this. I'm curious why there would need to be a blacklist of errors here, though, instead of simply failing outright. I guess there is an edge case where an erroneous response indicates that the service is still there, though that seems odd. Also, regarding the timeout: I'm curious why there would be a default 300ms delay for this that is overridable. I'd presume the IMDS service would know what service it is calling and what the expected return time would be, so that it should "own" the timeout value itself. It is, after all, calling an endpoint that the code using the library has no influence over at all. I'm probably missing a lot of complexity here though... In any case, thanks a lot for helping me with this!
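The blacklist being questioned can be pictured with a stylized sketch. This is a paraphrase of the kind of availability probe under discussion, not the SDK's exact code; the error names and codes below are assumptions for illustration:

```typescript
// Stylized sketch of an IMDS availability probe's error handling, where
// only a specific set of failures marks IMDS as unavailable. Not the
// SDK's actual implementation.
type ProbeError = { name: string; code?: string; statusCode?: number };

// Assumed network-level error codes meaning "no IMDS endpoint here".
const unreachableCodes = ["ENETUNREACH", "EHOSTUNREACH"];

function imdsSeemsAvailable(error: ProbeError): boolean {
  if (error.name === "AbortError") {
    return false; // the probe's short timeout fired: treat IMDS as unavailable
  }
  if (error.code !== undefined && unreachableCodes.includes(error.code)) {
    return false; // no route to the endpoint: clearly not running in Azure
  }
  // Anything else, including an HTTP error response, counts as evidence the
  // endpoint exists -- which is the surprising part: an unexpected local
  // failure also lands here and makes the credential press on.
  return true;
}

console.log(imdsSeemsAvailable({ name: "RestError", statusCode: 400 })); // true
console.log(imdsSeemsAvailable({ name: "AbortError" }));                 // false
```

The final `return true` is exactly the branch the discussion is probing: any error not on the list, however unexpected, leaves the credential believing IMDS is reachable.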
No worries! I was very curious about this one so I couldn't help but look at it again briefly on Saturday.
Yes, this feels extra odd since the way we are making this request means it shouldn't be throwing if it gets back any kind of valid HTTP response (even if that response is a 4xx, 5xx, etc.) I'm very tempted to remove these checks and treat any throw as permanent failure.
My best guess here is that it's highly unusual for a custom timeout to be passed to getToken, so this path was rarely exercised. However, I think things are going a bit wrong here, since the challenge-based auth policy is blindly copying over the timeout, which on the request will always default to 0 when not set; a value of 0 does not mean that the consumer explicitly passed 0.
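The conflation described above can be shown with a toy example (hypothetical names, not the SDK's types): once a request object defaults its timeout to 0, a policy that copies the field blindly cannot distinguish "never set" from "explicitly 0".

```typescript
// Toy illustration of the conflation being described; names are hypothetical.
interface RequestLike {
  timeout: number; // defaults to 0 when the caller never specified one
}

// Blind copy: 0 propagates, and downstream code that treats 0 as
// "no timeout at all" then waits indefinitely.
function copyTimeoutBlindly(source: RequestLike, target: { timeout?: number }): void {
  target.timeout = source.timeout;
}

// Safer variant: treat 0 as "unset" and keep the target's own default.
function copyTimeoutIfSet(source: RequestLike, target: { timeout?: number }): void {
  if (source.timeout > 0) {
    target.timeout = source.timeout;
  }
}

const request: RequestLike = { timeout: 0 }; // caller never set a timeout
const blind: { timeout?: number } = {};
const safe: { timeout?: number } = {};
copyTimeoutBlindly(request, blind); // blind.timeout === 0: probe never times out
copyTimeoutIfSet(request, safe);    // safe.timeout stays undefined: default applies
```

In the blind case, the IMDS probe inherits timeout 0 and hangs on the unreachable endpoint, which matches the minute-plus delays reported above.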
@xirzec Honestly, I expected this to be a problem in my code so having it addressed this quickly is great! I'll leave my workaround in place and remove it once a new release is available. The PR looks good to me at least :) Thanks for looking at this so quickly! I really appreciate it. |
I have run into this problem locally too. It looks like the cause has been correctly diagnosed in the linked PR. The timeout value when "pinging" the IMDS endpoint ends up set to 0, which effectively disables the timeout. What is odd is that I'm finding this behaviour to be intermittent. Sometimes I'll get a response quite quickly (the "ping" will fail fast) and sometimes it takes a long time. I also noticed a suspicious statement in the logs.
Should one expect that the IMDS checks should be skipped if no managed identity environment variables are configured?
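The kind of short-circuit being asked about could look something like this. This is a sketch of the proposal, not existing SDK behavior; the variable names are the commonly documented managed identity ones:

```typescript
// Sketch of the proposed short-circuit: skip the IMDS network probe when
// nothing in the environment suggests a managed identity is present.
// This illustrates the question above, not what the SDK actually does.
function envSuggestsManagedIdentity(env: Record<string, string | undefined>): boolean {
  return Boolean(
    env.IDENTITY_ENDPOINT ||              // App Service / Service Fabric MSI
    env.MSI_ENDPOINT ||                   // legacy App Service MSI
    env.AZURE_POD_IDENTITY_AUTHORITY_HOST // AKS pod identity host override
  );
}

console.log(envSuggestsManagedIdentity({}));                                      // false
console.log(envSuggestsManagedIdentity({ MSI_ENDPOINT: "http://127.0.0.1:41741/" })); // true
```

One caveat: a plain Azure VM exposes IMDS without setting any of these variables, which is presumably why the SDK probes the endpoint over the network instead of relying on the environment.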
@dhensby that log statement seems very suspicious, given that the IMDS flow doesn't seem to use it at all:

azure-sdk-for-js/sdk/identity/identity/src/credentials/managedIdentityCredential/imdsMsi.ts, line 81 (commit 1a66e5d)

Perhaps this has gotten a bit stale or was copied from another credential type?
@xirzec ah, that makes sense and the update looks good. 👍
We had the same problem as rudfoss. Version 4.6.0 made it faster than 4.5.0, so we are happy and can update our dependency again from 4.4.0. Thanks for the fix! |
I should mention that...
The fix for me was a combination of updating @azure/identity and @azure/keyvault-secrets to the latest versions.
I'm closing this now since a fix has been released. Thanks for the help! |
Describe the bug
To Reproduce
Steps to reproduce the behavior:
Repository where I've reproduced the problem: https://github.com/rudfoss/keyvault-secret-test
Expected behavior
I'm seeing some strange behavior when retrieving key vault secrets using @azure/keyvault-secrets. Currently it takes about 1m40s to request a single secret using DefaultAzureCredential. I'm signed in to the Azure CLI and Azure PowerShell. My resources are deployed to Norway East. Full repo that reproduces the problem here: https://github.com/rudfoss/keyvault-secret-test

Here is essentially what I'm trying to do: create a DefaultAzureCredential, pass it to a secret client, and request a single secret (the full code is in the repo above).

The strange thing is that it's not slow when using the same credential to retrieve an App Configuration setting in the same way.
Would love some pointers on why this is slow and if there is something I can do to fix it.