Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure CLI - Running commands in parallel fails #8561

Closed
oskarm93 opened this issue Oct 11, 2018 · 9 comments
Closed

Azure CLI - Running commands in parallel fails #8561

oskarm93 opened this issue Oct 11, 2018 · 9 comments
Assignees

Comments

@oskarm93
Copy link

Environment

  • Server - Azure Pipelines
    • rcss-willistowerswatson - PPM - AML - CI
  • Agent - Hosted or Private:
    • Private - WS16 - 2.140.2

Issue Description

We have 4 agents installed on a single VM. I use them with a multi-configuration phase to build multiple Docker images at the same time. We use ACR build tasks inside Azure CLI task to build the images.

Cutdown YAML phase:

  #### Run on multiple build agents ####
  - job: Phase_2
    displayName: Run on multiple build agents
    dependsOn: Phase_1

    strategy:
      maxParallel: 99
      matrix:
        Config1:
          ServiceName: Service1
          ImageOS: windows
        Config2:
          ServiceName: Service2
          ImageOS: windows

    steps:
      - checkout: self
        clean: true

      ### Build Docker image ###
      - task: AzureCLI@1
        displayName: Build Docker image
        inputs:
          azureSubscription: WTW-IXS-TESTLAB-AML-Contributor
          scriptLocation: inlineScript
          failOnStandardError: true
          inlineScript: |
            powershell -File "$(Build.SourcesDirectory)/build/Invoke-ACRBuild.ps1" "$(BaseDir)" "Dockerfile" "$(ImageName)" "$(ImageOS)"

Invoke-ACRBuild.ps1

[CmdletBinding()]
Param
(
    [Parameter(Mandatory)]
    [String]$BaseDir,

    [Parameter(Mandatory)]
    [String]$Dockerfile,

    [Parameter(Mandatory)]
    [String]$ImageName,

    [Parameter(Mandatory)]
    [String]$ImageOS,

    [Parameter(Mandatory=$false)]
    [String]$AzureContainerRegistryName
)

$errorFile = [System.IO.Path]::GetTempFileName()
& az acr login --name $AzureContainerRegistryName 2>$errorFile

$errorOutput = Get-Content $errorFile -Raw
$predefinedErrorMessage = 'This error may also indicate that the docker daemon is not running.'
If(-not $errorOutput.Contains($predefinedErrorMessage)) { Write-Error $errorOutput }

& az acr build --registry $AzureContainerRegistryName --file $Dockerfile --image $ImageName --os $ImageOS $BaseDir 2>$errorFile

$errorOutput = Get-Content $errorFile -Raw
If($errorOutput.Contains('ERROR:')) { Write-Error $errorOutput }

Aside from a really crap error handling (Azure CLI commands like to report progress into stderr...), the az commands to ACR build fail every now and again when running in parallel.

One multi-config job failed with this:

2018-10-11T11:42:19.8727351Z [command]C:\Windows\system32\cmd.exe /D /S /C ""C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin\az.cmd" account set --subscription f8036650-7484-4d51-a6b8-ecf04f234a58"
2018-10-11T11:42:22.2645668Z ERROR: The subscription of 'f8036650-7484-4d51-a6b8-ecf04f234a58' doesn't exist in cloud 'AzureCloud'.
2018-10-11T11:42:22.2701414Z ##[error]Error Code: [1]
2018-10-11T11:42:22.2709820Z ##[error]Error: Error in setting up subscription
2018-10-11T11:42:22.2928307Z [command]C:\Windows\system32\cmd.exe /D /S /C ""C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin\az.cmd" account clear"
2018-10-11T11:42:24.2932659Z ##[error]Script failed with error: ERROR: The subscription of 'f8036650-7484-4d51-a6b8-ecf04f234a58' doesn't exist in cloud 'AzureCloud'.

The other multi-config job failed with this at the same time:

2018-10-11T11:42:17.6243077Z D:\_Builds\<vm_name>.03\1\s>powershell -File "D:\_Builds\<vm_name>.03\1\s/build/Invoke-ACRBuild.ps1" "D:\_Builds\<vm_name>.03\1\a/docker_binaries/WTW.AML.SchemaAnalysis.Host" "Dockerfile" "<image>" "windows" 
2018-10-11T11:42:24.8363983Z D:\_Builds\<vm_name>.03\1\s\build\Invoke-ACRBuild.ps1 : az.cmd : ERROR: Please run 'az login' to setup account.
2018-10-11T11:42:24.8364488Z At D:\_Builds\<vm_name>.03\1\s\build\Invoke-ACRBuild.ps1:27 char:1
2018-10-11T11:42:24.8365122Z + & az acr build --registry $AzureContainerRegistryName --file $Dockerf ...
2018-10-11T11:42:24.8365277Z + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2018-10-11T11:42:24.8366070Z     + CategoryInfo          : NotSpecified: (ERROR: Please r... setup account.:String) [], RemoteException
2018-10-11T11:42:24.8366212Z     + FullyQualifiedErrorId : NativeCommandError
2018-10-11T11:42:24.8366734Z  
2018-10-11T11:42:24.8366874Z     + CategoryInfo          : NotSpecified: (:) [Write-Error], WriteErrorException
2018-10-11T11:42:24.8367387Z     + FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,Invoke-ACRBuild.ps1
2018-10-11T11:42:24.8367635Z  
2018-10-11T11:42:24.9500063Z [command]C:\Windows\system32\cmd.exe /D /S /C ""C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\wbin\az.cmd" account clear"
2018-10-11T11:42:26.9330159Z ##[error]Script failed with error: Error: C:\Users\SVEAZR~1\AppData\Local\Temp\azureclitaskscript1539258131140.bat failed with return code: 0

It looks almost as if they log each other out sometimes and then the other task doesn't know what subscription to talk to.

Task logs

48.txt
52.txt

@oskarm93
Copy link
Author

I have some more logs, this time with diagnostics and debug messages turned on:
62.txt
52.txt

@bishal-pdMSFT
Copy link
Contributor

PR for fixing this is out

@bishal-pdMSFT
Copy link
Contributor

Closing as fixed with #8656

@oskarm93
Copy link
Author

@bishal-pdMSFT When is the change likely to hit west europe accounts?

@bishal-pdMSFT
Copy link
Contributor

@xenalite this has not yet started rolling out. It will take ~2 weeks for it to reach all locations. In the meanwhile you can follow the workaround mentioned in #8314 (comment)

@oskarm93
Copy link
Author

oskarm93 commented Nov 5, 2018

@bishal-pdMSFT I can still experience failures with parallel Azure CLI task.
The error message is something else, but the problem seems to be very similar.
When we use inline Azure CLI, a batch file is created in the temp directory with a timestamp in the name.
If we run tasks in parallel, they can create a file with a same name, then delete each others files.

I have some other diagnostic logs with 2 jobs running in parallel on the same node.
47.txt
46.txt

Both of them create a file in:

'C:\Users\SVEAZR~1\AppData\Local\Temp\azureclitaskscript1541409720018.bat'

Then one of the tasks fails with

2018-11-05T09:24:15.8782306Z The batch file cannot be found.
2018-11-05T09:24:17.6184819Z ##[error]Script failed with error: Error: C:\Users\SVEAZR~1\AppData\Local\Temp\azureclitaskscript1541409720018.bat failed with return code: 1

I thought it should generate the batch file in the agent's work directory.

@bishal-pdMSFT
Copy link
Contributor

You are right, it should use agent's temp folder. I think when the task was authored, agent's temp folder was not available. Having said that, the probability of this error happening is quite low compared to earlier error. This error will happen only when two parallel tasks starts within 1 second of each other. But none the less, it needs to be fixed.
Re-activating the issue.

@bishal-pdMSFT bishal-pdMSFT reopened this Nov 5, 2018
@oskarm93
Copy link
Author

oskarm93 commented Nov 5, 2018

@bishal-pdMSFT

Having said that, the probability of this error happening is quite low compared to earlier error.

We get this failure around 3 times a day.

@vineetmimrot
Copy link
Member

This PR fixes the issue #9054

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants