Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLI is logging on each command when service-principal is used #13276

Closed
bsuchorowski opened this issue Apr 30, 2020 · 11 comments
Closed

CLI is logging on each command when service-principal is used #13276

bsuchorowski opened this issue Apr 30, 2020 · 11 comments
Assignees
Labels
Account az login/account customer-reported Issues that are reported by GitHub users external to the Azure organization. feature-request
Milestone

Comments

@bsuchorowski
Copy link

az feedback auto-generates most of the information requested below, as of CLI version

2.5.0 and older

Describe the bug
We are experiencing same issues with login taking too long as described here #7465. We use self-hosted Azure Devops agent and AzureCLI@2 task. From time to time it takes a few minutes to login. As we are doing a lot of quering from our container registry via az acr command, we notices that our script (that contains ~10 az commands) takes 10+ minutes to finish and it is only querying simple things such as repository tags and so on. I tried to reproduce it with --debug option and I noticed that every command tries to login and hangs there for a while. I am wondering why it does not find any cached accessTokens in accessToken.json (entry for service-principal does not have _clientId property over there).

adal-python : bcd11473-ee6d-48f7-9123-ae680f2dce7c - TokenRequest:No user_id passed for cache query
adal-python : bcd11473-ee6d-48f7-9123-ae680f2dce7c - CacheDriver:finding with query keys: {'_clientId': '...'}
adal-python : bcd11473-ee6d-48f7-9123-ae680f2dce7c - CacheDriver:Looking for potential cache entries: {'_clientId': '...'}
adal-python : bcd11473-ee6d-48f7-9123-ae680f2dce7c - CacheDriver:Found 0 potential entries.
urllib3.connectionpool : Starting new HTTPS connection (1): login.microsoftonline.com:443

It happens on every az command and it takes a lot of time.

To Reproduce

  1. Login with az login --service-principal
  2. Do any az command, for example az acr repository list with --debug to see whether it uses cached token or not.

Expected behavior

  1. Running the command should not require to login every time.
@ghost ghost added needs-triage customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Apr 30, 2020
@jiasli jiasli self-assigned this Apr 30, 2020
@yonzhan yonzhan added this to the Backlog milestone Apr 30, 2020
@ghost ghost removed the needs-triage label Apr 30, 2020
@yonzhan
Copy link
Collaborator

yonzhan commented Apr 30, 2020

@jiasli please take a look

@jiasli
Copy link
Member

jiasli commented Apr 30, 2020

Thanks for the feedback. It is by-design that service principal's access token is not cached and is retrieved every time a call is made.

def acquire_token(self, authentication_context, resource, client_id):
if hasattr(self, 'secret'):
return authentication_context.acquire_token_with_client_credentials(resource, client_id, self.secret)
return authentication_context.acquire_token_with_client_certificate(resource, client_id, self.cert_file_string,
self.thumbprint, self.public_certificate)

We will consider implementing this during the Azure Identity adoption process. The functionality is already in Azure Identity's code:

token = self._client.get_cached_token(scopes)

@bsuchorowski
Copy link
Author

So there is no way to reduce the number of login calls? It would not be a problem if #7465 would not be happening. We have a powershell script with a dozen of az commands such as az acr repository show-tags and az acr helm push and it is taking too long to execute them.

@jiasli
Copy link
Member

jiasli commented May 11, 2020

Thanks for the details. I see where the problem is - it is more about the execution time of each command when retrieving the access token, instead the number of retrieving access token calls. 😉

Still, I am not sure if #7465 is relevant. As in #7465, only the first call should be slow, then the following calls should be normal.

Yes, reducing the number of retrieving access token calls may help, but it will take months before we migrate to MSAL and use the token cache for service principals.

I can't reproduce on a normal Linux or Windows machine and I will do some test on DevOps.

At the same time, could you share

  1. Some sample of your DevOps task YAML and script?
  2. Raw --debug log with timestamps?
    image

@bsuchorowski
Copy link
Author

bsuchorowski commented May 14, 2020

Raw log file:
TagImagesRawLog.txt

Here is the script that causes troubles. We are building an umbrella chart that contains a few helm charts but we are building only these that have changed since last build. The rest is pulled from container registry and retagged. Basically we are searching for previously pushed docker images in the container repository, pulling them, retagging with a new tag and pushing back to the repository. Same operation is repeated for a few repositories.

We could reduce number of az commands by building umbrella chart without aligning child charts versions, however it simplifies the script. We were not expecting so big performance hit.

param (
    [string]$semVer
 )

function Get-ScriptDirectory { Split-Path $MyInvocation.ScriptName }

$tagName = $semVer
$version = (git describe --abbrev=0);
$branchName = $tagName.Substring($tagName.IndexOf("-"));
$branchName = $branchName.Substring(0,$branchName.IndexOf("."))
echo "Found base version: ${version}"
echo "Found found feature branch: ${branchName}"
az acr login --debug -n xxx

# Get all services configured in this monorepo
$repositories = az acr repository list --debug -n xxx --query "[?contains(@, 'xxx0') || contains(@,'xxx1') || contains(@,'xxx2') || contains(@,'xxx3') || contains(@,'xxx4')]"
[System.Collections.ArrayList]$repositoriesArray = $repositories -split '\r?\n'
$repositoriesArray.RemoveAt(0)
$repositoriesArray.RemoveAt($repositoriesArray.count-1)
 foreach ($repositoryElement in $repositoriesArray) {
    #Find last tag in feature branch
    $repositoryElement = $repositoryElement.Trim(',').Trim(' ').Trim('"')
    echo "Trying to find tag for ${repositoryElement}"
    $foundTag = (az acr repository show-tags  --debug -n xxx --repository $repositoryElement --orderby time_desc --query "[?contains(@, '${branchName}')] | [0]")
    if($foundTag.length -eq 0) {
        echo "Not found tag for feature branch. Looking for master branch"
        $foundTag = (az  acr repository show-tags --debug -n xxx --repository  $repositoryElement --orderby time_desc --query "[?contains(@, '$version') && !contains(@, '-')] | [0]")
        }
  $foundTag = $foundTag.Trim('"') 
    echo "Found tag ${foundTag}"
   if ($tagName -eq $foundTag) {continue;}
   echo "Tagging repository ${repositoryElement}:${foundTag} with version ${tagName}"
   docker pull xxx.azurecr.io/${repositoryElement}:${foundTag}
   docker tag xxx.azurecr.io/${repositoryElement}:${foundTag} xxx.azurecr.io/${repositoryElement}:${tagName};
   docker push xxx.azurecr.io/${repositoryElement}:${tagName};
   $path = Join-Path (Get-ScriptDirectory) ${repositoryElement}/charts/${repositoryElement}/
   helm init --client-only
   helm package --version ${tagName} ${path}
   az acr helm push --debug -n xxx ${repositoryElement}-${tagName}.tgz --force
 }

Example of long login execution (over 2 minutes):

2020-05-14T07:23:56.9910741Z urllib3.connectionpool : Starting new HTTPS connection (1): login.microsoftonline.com:443
2020-05-14T07:27:12.9035638Z urllib3.connectionpool : https://login.microsoftonline.com:443 "POST /***/oauth2/token HTTP/1.1" 200 1354

@jiasli
Copy link
Member

jiasli commented May 14, 2020

@bsuchorowski, thanks for the detailed log. Judging from the log, Azure CLI is running as expected. Only this https://login.microsoftonline.com call is slow:

2020-05-14T07:23:56.9867220Z adal-python : 1dd4dfdf-fb13-4f88-8ac8-5248c8340b33 - Authority:Performing instance discovery: ...
2020-05-14T07:23:56.9868447Z adal-python : 1dd4dfdf-fb13-4f88-8ac8-5248c8340b33 - Authority:Performing static instance discovery
2020-05-14T07:23:56.9869642Z adal-python : 1dd4dfdf-fb13-4f88-8ac8-5248c8340b33 - Authority:Authority validated via static instance discovery
2020-05-14T07:23:56.9883628Z adal-python : 1dd4dfdf-fb13-4f88-8ac8-5248c8340b33 - TokenRequest:Getting token with client credentials.
2020-05-14T07:23:56.9887168Z adal-python : 1dd4dfdf-fb13-4f88-8ac8-5248c8340b33 - TokenRequest:No user_id passed for cache query
2020-05-14T07:23:56.9890624Z adal-python : 1dd4dfdf-fb13-4f88-8ac8-5248c8340b33 - CacheDriver:finding with query keys: {'_clientId': '...'}
2020-05-14T07:23:56.9894299Z adal-python : 1dd4dfdf-fb13-4f88-8ac8-5248c8340b33 - CacheDriver:Looking for potential cache entries: {'_clientId': '...'}
2020-05-14T07:23:56.9897688Z adal-python : 1dd4dfdf-fb13-4f88-8ac8-5248c8340b33 - CacheDriver:Found 0 potential entries.
2020-05-14T07:23:56.9910741Z urllib3.connectionpool : Starting new HTTPS connection (1): login.microsoftonline.com:443
2020-05-14T07:27:12.9035638Z urllib3.connectionpool : https://login.microsoftonline.com:443 "POST /***/oauth2/token HTTP/1.1" 200 1354
2020-05-14T07:27:12.9074149Z adal-python : 1dd4dfdf-fb13-4f88-8ac8-5248c8340b33 - OAuth2Client:Get Token Server returned this correlation_id: 1dd4dfdf-fb13-4f88-8ac8-5248c8340b33
2020-05-14T07:27:12.9106017Z adal-python : 1dd4dfdf-fb13-4f88-8ac8-5248c8340b33 - CacheDriver:Adding entry AccessTokenId: b'JRO87A06oKX0EBRffE55++9oJi8J7RtBbg83G4j7YJo='
2020-05-14T07:27:12.9108192Z adal-python : 1dd4dfdf-fb13-4f88-8ac8-5248c8340b33 - CacheDriver:Added entry is MRRT

Other calls seem to be normal.

2020-05-14T07:23:51.8824434Z urllib3.connectionpool : Starting new HTTPS connection (1): login.microsoftonline.com:443
2020-05-14T07:23:52.1035995Z urllib3.connectionpool : https://login.microsoftonline.com:443 "POST /***/oauth2/token HTTP/1.1" 200 1354

While I will try to internally contact AAD team, could you file a support request for AAD as well and provide this issue #13276 and the log?

image

You may cc me in the email thread. My email address is jiasli at microsoft.com. Thanks for understanding.

@bsuchorowski
Copy link
Author

That is exactly our issue - https://login.microsoftonline.com is slow. I am not able to create a support request right now. I need to contact somebody from my organization because right now I am receiving:

Your cloud service provider manages your subscription.
To submit a technical support request for this subscription, please contact your service provider:

@bsuchorowski
Copy link
Author

Just FYI - we have just migrated from Self-Hosted agents to VMSS agents with the same scripts in use. I will let you know it it behaves the same way.

@bsuchorowski
Copy link
Author

Still this simple script takes 30-35% of our overall pipepline execution time.

@jiasli
Copy link
Member

jiasli commented Sep 9, 2020

Hi @bsuchorowski, I understand this issue is troubling you.

As each Azure CLI command runs in a separate process, there is no in-memory cache to save the access token of a service principal. So the https://login.microsoftonline.com:443 call is necessary.

You will need to create a support ticket to AAD or Azure Network team to check why connection to https://login.microsoftonline.com:443 is slow.

Thanks for your understanding.

@yonzhan yonzhan added feature-request Account az login/account and removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Feb 21, 2022
@RakeshMohanMSFT RakeshMohanMSFT self-assigned this Oct 13, 2022
@jiasli
Copy link
Member

jiasli commented Nov 3, 2022

After migrating to MSAL Python (#19853), Azure CLI now utilizes MSAL's token cache to save service principal's access tokens:

result = self.acquire_token_silent(scopes, None, **kwargs)
if not result:
result = self.acquire_token_for_client(scopes, **kwargs)

Azure CLI first calls acquire_token_silent to attempt to get an access token from token cache. If no access token is returned, it calls acquire_token_for_client to get a new access token with client secret/certificate/assertion.

This is the same pattern as MSAL's sample code:

https://github.com/AzureAD/microsoft-authentication-library-for-python/blob/0861aba8eea63fa11142ef9fc17c9c7a41715e08/sample/confidential_client_secret_sample.py#L56-L63

# Firstly, looks up a token from cache
# Since we are looking for token for the current app, NOT for an end user,
# notice we give account parameter as None.
result = app.acquire_token_silent(config["scope"], account=None)

if not result:
    logging.info("No suitable token exists in cache. Let's get a new one from AAD.")
    result = app.acquire_token_for_client(scopes=config["scope"])

@jiasli jiasli changed the title CLI is logging on every command once service-principal is logged CLI is logging on each command when service-principal is used Nov 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Account az login/account customer-reported Issues that are reported by GitHub users external to the Azure organization. feature-request
Projects
None yet
Development

No branches or pull requests

6 participants