Different behaviors for Cosmos DB Linux emulator on Ubuntu 18 vs 20 image #5036

Closed
1 of 7 tasks
DOMZE opened this issue Feb 7, 2022 · 17 comments
Labels
Area: Containers bug report external investigate Collect additional information, like space on disk, other tool incompatibilities etc. OS: Ubuntu

Comments

@DOMZE

DOMZE commented Feb 7, 2022

Description

Copied from Azure/azure-cosmos-dotnet-v3#3010

Describe the bug
Using Azure DevOps with an ubuntu-latest hosted agent, the container exits right after it starts and logs the following:

This is an evaluation version.  There are [165] days left in the evaluation period.
Shutting Down
Shut Down

Additional context
I can't use older image tags to check whether the code has changed since, because they fail the evaluation-period check.
Is there an environment variable I can set to get more logs and diagnose the problem?

Virtual environments affected

  • Ubuntu 18.04
  • Ubuntu 20.04
  • macOS 10.15
  • macOS 11
  • Windows Server 2016
  • Windows Server 2019
  • Windows Server 2022

Image version and build link

Agent name: 'Hosted Agent'
Agent machine name: 'fv-az31-196'
Current agent version: '2.198.2'
Operating System
Virtual Environment
Virtual Environment Provisioner
Current image version: '20220131.1'
Agent running as: 'vsts'

Is it regression?

No response

Expected behavior

The container should create the partitions and return "Started" at some point

Actual behavior

The container exits and the logs show:

This is an evaluation version.  There are [165] days left in the evaluation period.
Shutting Down
Shut Down

Repro steps

Create an Azure DevOps pipeline with the following:

pool:
  name: Azure Pipelines
  vmImage: 'ubuntu-latest'
jobs:
- job: TestJob
  displayName: 'My Job'
  steps:
  - task: PowerShell@2
    name: showNetAdapters
    displayName: 'Show NetAdapters'
    inputs:
      pwsh: true
      targetType: inline
      script: |
        ifconfig
        $ipAddress = (hostname -I | awk '{print $1}')
        Write-Output "IpAddress = $ipAddress"
  - task: PowerShell@2
    name: startCosmosDb
    displayName: 'Start Azure Cosmos DB emulator'
    inputs:
      pwsh: true
      targetType: inline
      script: |
        $ipAddress = (hostname -I | awk '{print $1}')
        $containerId = (docker create -p 8081:8081 -p 10251:10251 -p 10252:10252 -p 10253:10253 -p 10254:10254 -m 3g --cpus=2.0 -e AZURE_COSMOS_EMULATOR_PARTITION_COUNT=10 -e AZURE_COSMOS_EMULATOR_ENABLE_DATA_PERSISTENCE=false -e AZURE_COSMOS_EMULATOR_IP_ADDRESS_OVERRIDE=$ipAddress mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator)
        Write-Host "##vso[task.setvariable variable=cosmosDbContainerId]$containerId"
        docker start $containerId
        Start-Sleep -Seconds 5
        $isStarted = $false
        while ($isStarted -eq $false) {
            $logs = (docker logs $containerId)
            if ($logs.Contains('Started')) {
                Write-Output "Container $containerId started."
                $isStarted = $true
                break;
            }
            Write-Output "Waiting for container $containerId to start"
            Write-Output ($logs | Out-String)
            Start-Sleep -Seconds 5
        }
  - script: |
      echo 'ContainerId = $(cosmosDbContainerId)'
      docker logs $(cosmosDbContainerId)
    displayName: Diagnostics
  - script: |
      ipAddress=$(hostname -I | awk '{print $1}')
      curl -k https://$ipAddress:8081/_explorer/emulator.pem > $(Agent.TempDirectory)/emulatorcert.crt
      sudo cp $(Agent.TempDirectory)/emulatorcert.crt /usr/local/share/ca-certificates/
      sudo update-ca-certificates
      echo "##vso[task.setvariable variable=cosmosDbEndpoint]https://$ipAddress:8081"
    displayName: 'Prepare emulator'
  - script: |
      if [ ! -z "$(cosmosDbContainerId)" ];
      then
        docker rm -f $(cosmosDbContainerId)
        sudo rm -f /usr/local/share/ca-certificates/emulatorcert.crt
      fi
    displayName: 'Clean Azure Cosmos DB emulator'
    condition: always()
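
The wait loop in the repro above polls `docker logs` for 'Started' and has no exit condition, so it never finishes when the emulator prints 'Shut Down' instead. Below is a minimal sketch of a log classifier that a wait loop could use to stop early. It assumes each marker appears alone on its own line, as in the log excerpts quoted in this issue; `emulator_state` is a hypothetical helper name, not part of Docker or the emulator.

```shell
# Hypothetical helper: classify emulator log text read from stdin.
# Assumption: 'Started' and 'Shut Down' each appear alone on a line.
# Variables are intentionally not 'local' so the sketch stays POSIX sh.
emulator_state() {
  logs="$(cat)"
  if printf '%s\n' "$logs" | grep -qx 'Shut Down'; then
    echo "shut_down"   # the emulator exited; waiting longer will not help
  elif printf '%s\n' "$logs" | grep -qx 'Started'; then
    echo "started"
  else
    echo "pending"     # neither marker has been logged yet
  fi
}
```

A possible use is `docker logs "$containerId" 2>&1 | emulator_state`, failing the step as soon as the result is `shut_down`.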
@shilovmaksim
Contributor

Hi @DOMZE.
Thank you for your report. We will take a look at this.

@shilovmaksim added the Area: Containers, OS: Ubuntu, and investigate labels and removed the needs triage label on Feb 7, 2022
@nikolai-frolov
Contributor

Hello @DOMZE, your repro YAML fails on all environments (not only on Ubuntu 20) because it creates the Docker container for Cosmos DB incorrectly. Please follow this guideline to create the container correctly. These steps work for me.
I'm going to close this issue since it's not related to image generation.

@DOMZE
Author

DOMZE commented Feb 8, 2022

How does it incorrectly create the Docker container?

docker create followed by docker start is just docker run split into two commands (docker run = docker create + docker start).
Also, the official docs run the container in interactive mode (using -it), which clearly can't be done in an automated environment.
After running the container in interactive mode, the docs also say "After the emulator is running, using a different terminal": a different terminal because the original one is blocked by the TTY.

So, to go back to my last point: on Azure DevOps (not GitHub Actions), running the command as you showed (docker run) results in the following error from Docker:
the input device is not a TTY

which makes sense considering there's no TTY...

So I just want to know: how can the exact SAME pipeline YAML work on my Azure DevOps agent with Ubuntu 18 but not with Ubuntu 20?

@soenneker

@nikolai-frolov
The script below works (and uses the Docker approach that Azure DevOps recommends), yet it fails if you specify ubuntu-latest as the image. You'll see that from the 'Stopping container' messages in the PowerShell task.

trigger: none

pool:
  name: Azure Pipelines
  vmImage: 'ubuntu-18.04'

resources:
  containers:
  - container: azure-cosmosdb-emulator
    image: mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator
    ports:
    - "8081:8081"
    - "10251:10251"
    - "10252:10252"
    - "10253:10253"
    - "10254:10254"
    options: --name azure-cosmosdb-emulator -m 3g --cpus=2.0 -e AZURE_COSMOS_EMULATOR_PARTITION_COUNT=20 -e AZURE_COSMOS_EMULATOR_ENABLE_DATA_PERSISTENCE=false -e AZURE_COSMOS_EMULATOR_IP_ADDRESS_OVERRIDE=127.0.0.1

jobs:
- job: Job
  displayName: 'Job'

  services:
    azure-cosmosdb-emulator: azure-cosmosdb-emulator

  steps:

  - task: PowerShell@2
    name: startCosmosDb
    displayName: 'Wait for Cosmos'
    inputs:
      pwsh: true
      targetType: inline
      script: |
        $isStarted = $false
        while ($isStarted -eq $false) {
            $logs = (docker logs azure-cosmosdb-emulator)
            if ($logs.Contains('Started')) {
                Write-Output "Container azure-cosmosdb-emulator started."
                $isStarted = $true
                break;
            }
            Write-Output "Waiting for container azure-cosmosdb-emulator to start"
            Write-Output ($logs | Out-String)
            Start-Sleep -Seconds 5
        }

@nikolai-frolov
Contributor

nikolai-frolov commented Feb 9, 2022

First of all, they use the -d flag, which runs the container in the background. It works fine without the -it flags as well.

- name: Check execution
  shell: pwsh
  run: |
    $ipaddr = ifconfig | grep "inet " | grep -Fv 127.0.0.1 | awk '{print $2}' | head -n 1
    docker pull mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator
    docker run -d -p 8081:8081 -p 10251:10251 -p 10252:10252 -p 10253:10253 -p 10254:10254 -m 3g --cpus=2.0 --name=test-linux-emulator -e AZURE_COSMOS_EMULATOR_PARTITION_COUNT=10 -e AZURE_COSMOS_EMULATOR_ENABLE_DATA_PERSISTENCE=true -e AZURE_COSMOS_EMULATOR_IP_ADDRESS_OVERRIDE=$ipaddr mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator
    docker ps --all

Please double-check the parameters passed during container creation, and provide the build link if it still doesn't work for you.

@soenneker

soenneker commented Feb 10, 2022

Hi @nikolai-frolov, could you give the YAML I posted a try? It's pretty clear from there that the container is stopping (and the pipeline won't complete).

One more note: running docker ps immediately after docker run isn't enough of a check, because the container does start successfully. It shuts down roughly 5 seconds later, though.
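
To illustrate the point about checking too early, here is a minimal sketch that waits for a grace period and only then asks Docker whether the container is still running. `still_running_after` is a hypothetical helper name, and the 10-second default is an assumption derived from the "about 5 seconds" observation above, not a documented value.

```shell
# Hypothetical helper: succeed only if the container is still running
# after a grace period.
# Usage: still_running_after CONTAINER [GRACE_SECONDS]
# Variables are intentionally not 'local' so the sketch stays POSIX sh.
still_running_after() {
  container="$1"
  grace="${2:-10}"   # assumed default, longer than the observed ~5 s
  sleep "$grace"
  # docker inspect prints "true" or "false" for .State.Running
  [ "$(docker inspect --format '{{.State.Running}}' "$container" 2>/dev/null)" = "true" ]
}
```

For example, `still_running_after test-linux-emulator || docker logs test-linux-emulator` would dump the logs when the container has already exited.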

@nikolai-frolov
Contributor

nikolai-frolov commented Feb 10, 2022

@soenneker it works fine on ADO with 20.04 after adding an exit condition to your infinite loop:

trigger: none

pool:
  name: Azure Pipelines
  vmImage: 'ubuntu-20.04'

resources:
  containers:
  - container: azure-cosmosdb-emulator
    image: mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator
    ports:
    - "8081:8081"
    - "10251:10251"
    - "10252:10252"
    - "10253:10253"
    - "10254:10254"
    options: --name azure-cosmosdb-emulator -m 3g --cpus=2.0 -e AZURE_COSMOS_EMULATOR_PARTITION_COUNT=20 -e AZURE_COSMOS_EMULATOR_ENABLE_DATA_PERSISTENCE=false -e AZURE_COSMOS_EMULATOR_IP_ADDRESS_OVERRIDE=127.0.0.1

jobs:
- job: Job
  displayName: 'Job'

  services:
    azure-cosmosdb-emulator: azure-cosmosdb-emulator

  steps:

  - task: PowerShell@2
    name: startCosmosDb
    displayName: 'Wait for Cosmos'
    inputs:
      pwsh: true
      targetType: inline
      script: |
        $isStarted = $false
        $retriesCount = 10
        while ($isStarted -eq $false -and $retriesCount -ne 0) {
            $logs = (docker logs azure-cosmosdb-emulator)
            if ($logs.Contains('Started')) {
                Write-Output "Container azure-cosmosdb-emulator started."
                $isStarted = $true
                break;
            }
            Write-Output "Waiting for container azure-cosmosdb-emulator to start"
            Write-Output ($logs | Out-String)
            Start-Sleep -Seconds 5
            $retriesCount--
        }
        docker ps --all

@nikolai-frolov
Contributor

Well, it seems the issue does not reproduce in 100% of cases. I will investigate this further.

@nikolai-frolov
Contributor

@milismsft does the mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator image write logs anywhere? I can't find the reason why this container stops right after it starts on Ubuntu 20.04 (the Docker client is identical to the one on 18.04, so for now it looks like a compatibility issue).

@milismsft

@nikolai-frolov did you try to execute that workload locally on an Ubuntu 20.04 Linux VM or local machine? The only issue I'm aware of occurs when using an Azure DevOps pipeline configured with Ubuntu 20.04 (18.04 seems to work just fine), and I would recommend opening a support ticket with the respective Azure team.

There are no extra logs that can be retrieved at this time. The error seems to come from the initial setup, which initializes the Windows sandbox we run the emulator under. There must be some policies or similar getting in its way. Unfortunately, without a local repro it will be very hard to investigate.

@milismsft

One more thing: the script above that checks whether the emulator has started is actually incorrect. A container that is started/running does not mean the emulator is ready to service requests. Please see our public documentation on how to verify that the emulator is truly ready (i.e. check through the emulator's "explorer" link whether the PEM public certificate key is available).
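
A minimal sketch of such a readiness check, polling the certificate URL that the repro pipeline in this issue already downloads (`/_explorer/emulator.pem` on port 8081). `wait_for_emulator` is a hypothetical helper name, and the retry count and delay defaults are assumptions rather than documented values.

```shell
# Hypothetical helper: poll the emulator's certificate endpoint until it
# answers, or give up after a fixed number of attempts.
# Usage: wait_for_emulator HOST [RETRIES] [DELAY_SECONDS]
# Variables are intentionally not 'local' so the sketch stays POSIX sh.
wait_for_emulator() {
  host="$1"
  retries="${2:-30}"   # assumed default
  delay="${3:-5}"      # assumed default, in seconds
  while [ "$retries" -gt 0 ]; do
    # -k: accept the emulator's self-signed certificate
    # -f: treat HTTP error statuses as failures; -s: no progress output
    if curl -k -s -f -o /dev/null "https://$host:8081/_explorer/emulator.pem"; then
      return 0
    fi
    retries=$((retries - 1))
    sleep "$delay"
  done
  return 1
}
```

Unlike a loop that only greps the container logs, this fails after a bounded time when the emulator never becomes reachable.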

@nikolai-frolov
Contributor

nikolai-frolov commented Feb 11, 2022

I tried to reproduce the issue locally on a VM with the ubuntu-20.04 image, but without luck: the container starts successfully every time. I used the following:

$containerId = (docker create -p 8081:8081 -p 10251:10251 -p 10252:10252 -p 10253:10253 -p 10254:10254 -m 3g --cpus=2.0 -e AZURE_COSMOS_EMULATOR_PARTITION_COUNT=10 -e AZURE_COSMOS_EMULATOR_ENABLE_DATA_PERSISTENCE=false -e AZURE_COSMOS_EMULATOR_IP_ADDRESS_OVERRIDE=127.0.0.1 mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator)
docker start $containerId
docker ps --all

When I ran the same commands in an ADO pipeline, they also passed (at least 5 pipeline runs passed in a row):

trigger: none

pool:
  name: Azure Pipelines
  vmImage: 'ubuntu-20.04'

jobs:
- job: Job
  displayName: 'Job'
  steps:
  - task: PowerShell@2
    name: startCosmosDb
    displayName: 'Wait for Cosmos'
    inputs:
      pwsh: true
      targetType: inline
      script: |
        $containerId = (docker create -p 8081:8081 -p 10251:10251 -p 10252:10252 -p 10253:10253 -p 10254:10254 -m 3g --cpus=2.0 -e AZURE_COSMOS_EMULATOR_PARTITION_COUNT=10 -e AZURE_COSMOS_EMULATOR_ENABLE_DATA_PERSISTENCE=false -e AZURE_COSMOS_EMULATOR_IP_ADDRESS_OVERRIDE=127.0.0.1 mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator)
        docker start $containerId
        docker logs --timestamps --details $containerId
        docker ps --all

On the other hand, the issue still occurs when the emulator runs as an Azure Pipelines container resource:

trigger: none

pool:
  name: Azure Pipelines
  vmImage: 'ubuntu-20.04'

resources:
  containers:
  - container: azure-cosmosdb-emulator
    image: mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator
    ports:
    - "8081:8081"
    - "10251:10251"
    - "10252:10252"
    - "10253:10253"
    - "10254:10254"
    options: --name azure-cosmosdb-emulator -m 3g --cpus=2.0 -e AZURE_COSMOS_EMULATOR_PARTITION_COUNT=20 -e AZURE_COSMOS_EMULATOR_ENABLE_DATA_PERSISTENCE=false -e AZURE_COSMOS_EMULATOR_IP_ADDRESS_OVERRIDE=127.0.0.1

jobs:
- job: Job
  displayName: 'Job'

  services:
    azure-cosmosdb-emulator: azure-cosmosdb-emulator

  steps:

  - task: PowerShell@2
    name: startCosmosDb
    displayName: 'Wait for Cosmos'
    inputs:
      pwsh: true
      targetType: inline
      script: |
        $isStarted = $false
        $retriesCount = 10
        while ($isStarted -eq $false -and $retriesCount -ne 0) {
            $logs = (docker logs azure-cosmosdb-emulator)
            if ($logs.Contains('Started')) {
                Write-Output "Container azure-cosmosdb-emulator started."
                $isStarted = $true
                break;
            }
            Write-Output "Waiting for container azure-cosmosdb-emulator to start"
            Write-Output ($logs | Out-String)
            Start-Sleep -Seconds 5
            $retriesCount--
        }
        docker ps --all

Continuing the investigation...

@nikolai-frolov
Contributor

nikolai-frolov commented Feb 11, 2022

The issue is definitely not specific to Azure DevOps, since the same issue occurs in GitHub Actions, which uses its own GitHub runner for execution.
@milismsft I've added extra output for the ports in use and "docker inspect" (please see this run for the output of the successful 18.04 and failed 20.04 images). Comparing them, it looks like the issue occurs during sandbox creation, which terminates the container. Could you please assist with further investigation?

name: "Cosmos DB check"

on:
  workflow_dispatch:

defaults:
  run:
    shell: pwsh

jobs:
  testing:
    name: Testing
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ ubuntu-18.04, ubuntu-20.04 ]
    steps:
      - name: Start Azure Cosmos DB emulator
        run: |
          Write-Output "Ports in use before container creation:"
          netstat -tulpn | grep LISTEN

          Write-Output "Container creation..."
          $containerId = (docker create -p 8081:8081 -p 10251:10251 -p 10252:10252 -p 10253:10253 -p 10254:10254 -m 3g --memory-swap -1 --cpus=2.0 mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator)

          Write-Output "Container starting..."
          docker start $containerId
          Start-Sleep -Seconds 10
          $retriesCount = 5
          while ($retriesCount -ne 0) {
            $logs = (docker logs $containerId)
            Write-Output ($logs | Out-String)
            if ($logs[-1] -eq 'Started') {
              Write-Output "Container $containerId started."
              break
            }
            Write-Output "Waiting for container $containerId to start..."
            Start-Sleep -Seconds 5
            $retriesCount--
          }

          docker ps --all

          docker inspect $containerId

          Write-Output "Ports in use after container creation:"
          netstat -tulpn | grep LISTEN

@nikolai-frolov
Contributor

nikolai-frolov commented Feb 14, 2022

I've additionally compared the output of the "lscpu" command on Ubuntu 20.04 machines where the pipeline passed and failed (see failed and passed), and the machines look similar.
@milismsft is going to reproduce the issue locally.

@nikolai-frolov
Contributor

@milismsft @DOMZE could you please reopen the related issue (Azure/azure-cosmos-dotnet-v3#3010) for tracking? We are going to close this issue since it looks unrelated to image generation and should be investigated on the owner's side. Feel free to contact us for assistance.

@milismsft

@DOMZE I opened Azure/azure-cosmos-db-emulator-docker#45 for tracking purposes.

@nikolai-frolov
Contributor

@milismsft as we discussed, it's not related to the machines. Also, the issue occurs on both Azure agents and GitHub runners, which use different services. Your ticket to the Azure team will have no effect without logs and links...
