DNS resolution error when downloading bicep in bitbucket #9774

Closed
ausy-tk opened this issue Feb 6, 2023 · 41 comments
@ausy-tk

ausy-tk commented Feb 6, 2023

Bicep version
v0.14.6 via az bicep version

Describe the bug
When deploying Docker containers to our Azure Container Apps environment with the az CLI using a Bicep template, our deployments randomly fail while attempting to pull the latest Bicep version from downloads.bicep.azure.com with the following error:

ERROR: az_command_data_logger: Error while attempting to retrieve the latest Bicep version: HTTPSConnectionPool(host='downloads.bicep.azure.com', port=443): Max retries exceeded with url: /releases/latest (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fae4f82d660>: Failed to establish a new connection: [Errno -2] Name does not resolve')).

To Reproduce
Deploy a Docker container to an Azure Container Apps environment with the az CLI using a Bicep template (the failure occurs randomly).

Additional context
We know there is a workaround, described in issue #3689, of using the ARM JSON as a template, but this should be analyzed in more depth if teams like ours want to standardize on Bicep templates.

We don't experience any other internet connection issues within our CI/CD environment.

@ghost ghost added the Needs: Triage 🔍 label Feb 6, 2023
@alex-frankel
Collaborator

@davidcho23 -- can you take a look at this one? Is this error coming from the CDN?

@davidcho23
Contributor

davidcho23 commented Feb 14, 2023

The CDN endpoint is working fine. I am able to see available Bicep versions and install the latest Bicep version using az bicep list-versions and az bicep install.

@majastrz do you have an idea of what the issue might be?

@majastrz
Member

majastrz commented Feb 14, 2023

The "Name does not resolve" part suggests that the Az CLI is unable to resolve the downloads.bicep.azure.com DNS name. We haven't made any DNS changes here in several weeks.

Locally, the name resolves for me successfully as well:

❯ nslookup downloads.bicep.azure.com
Server:  router
Address:  192.168.1.1

Non-authoritative answer:
Name:    part-0041.t-0009.fdv2-t-msedge.net
Addresses:  2620:1ec:4f:1::69
          2620:1ec:4e:1::69
          13.107.237.69
          13.107.238.69
Aliases:  downloads.bicep.azure.com
          bicep-downloads-prod.azureedge.net
          bicep-downloads-prod.afd.azureedge.net
          star-azureedge-prod.trafficmanager.net
          shed.dual-low.part-0041.t-0009.fdv2-t-msedge.net

The random nature of this suggests some DNS resolution issue (rather than a configuration issue on our end) in the CI/CD environment. @ausy-tk can you share any details about the CI/CD environment that is executing the Az CLI commands?

@ausy-tk
Author

ausy-tk commented Feb 20, 2023

Hi @majastrz, unfortunately the CI/CD environment is not directly under our control, so I can't provide any details. But I have also opened a support ticket on their side, as it seems the issue is not on yours.

@danielmackay

danielmackay commented Mar 8, 2023

I am also getting this same error when trying to use az bicep from a Bitbucket pipeline.

This error seems to be intermittent.

@OlivierTD

I'm getting the same problem on my end on Bitbucket pipelines.

If I run the command az deployment group create with the --debug flag, I get this stack trace:

DEBUG: cli.azure.cli.command_modules.resource._bicep: Bicep CLI installation path: /root/.azure/bin/bicep
DEBUG: cli.azure.cli.command_modules.resource._bicep: Bicep CLI installed: False.
DEBUG: urllib3.connectionpool: Starting new HTTPS connection (1): aka.ms:443
DEBUG: urllib3.connectionpool: https://aka.ms:443 "GET /BicepLatestRelease HTTP/1.1" 301 0
DEBUG: urllib3.connectionpool: Starting new HTTPS connection (1): downloads.bicep.azure.com:443
DEBUG: cli.azure.cli.core.azclierror: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/usr/local/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/local/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name does not resolve

@danielmackay Since it is intermittent, caching the Azure dependencies in my pipeline fixed my issue. This way I avoid downloading Bicep on every run.

In my bitbucket-pipelines.yml file:

definitions:
  caches:
    azure: /root/.azure/

After that I just needed one run to succeed. Now it always retrieves my cache and I no longer have this issue!
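For reference, a minimal sketch of how that cache definition might be wired into a step (the step name, image, and deploy command here are illustrative, borrowed from other examples in this thread):

definitions:
  caches:
    azure: /root/.azure/

pipelines:
  default:
    - step:
        name: Deploy Infrastructure
        image: mcr.microsoft.com/azure-cli
        caches:
          - azure   # restores /root/.azure/ (including the downloaded Bicep binary) from a previous successful run
        script:
          - az deployment group create --resource-group $RG_NAME --template-file ./deploy/main.bicep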

@majastrz
Member

We are unable to reproduce the issue ourselves without more information. Would anyone seeing this problem be able to provide any additional environment or networking details that could help us investigate this?

@danielmackay

@majastrz - This is not specific to ACA, but seems to affect any pipeline that tries to use Bicep. You can reproduce this by running a Bitbucket pipeline that has a step like this:

    - step:
        name: Deploy Infrastructure
        image: mcr.microsoft.com/azure-cli
        script:
          - az login --service-principal -u $AZURE_APP_ID -p $AZURE_PASSWORD --tenant $AZURE_TENANT_ID     
          - az deployment group create --resource-group $RG_NAME --template-file ./deploy/main.bicep 

@alex-frankel
Collaborator

And is the issue intermittent, or does this happen every time you try to deploy with Bitbucket?

@majastrz
Member

And do you have any pipelines that exhibit that issue that are not using Bitbucket?

@danielmackay

@alex-frankel @majastrz - from my testing this happens every time. We are only using Bitbucket.

We also get this behavior from running az bicep upgrade, for example, within our Bitbucket pipeline:

az bicep upgrade
ERROR: Error while attempting to retrieve the latest Bicep version: HTTPSConnectionPool(host='downloads.bicep.azure.com', port=443): Max retries exceeded with url: /releases/latest (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f343fe94c70>: Failed to establish a new connection: [Errno -2] Name does not resolve')).

@ausy-tk
Author

ausy-tk commented Mar 17, 2023

And do you have any pipelines that exhibit that issue that are not using Bitbucket?
@alex-frankel @majastrz
We're using GitLab, and the issue occurred intermittently. But since we now use the workaround in all our pipelines, I can't tell whether it would now occur every time. It should be reproducible by running a pipeline that uses the az CLI with Bicep templates.

@alex-frankel
Collaborator

We are going to try to repro on our end, but is there any chance you can file a support case with Bitbucket? It's very odd that it only fails with that tool, so it seems most likely the issue is on their end.

@alex-frankel alex-frankel changed the title ACA deployment fails with Bicep template - Error while attempting to retrieve the latest Bicep version - Failed to establish a new connection: Name does not resolve DNS resolution error when downloading bicep in bitbucket Mar 22, 2023
@brwilkinson
Collaborator

brwilkinson commented Mar 23, 2023

Can you please test either of the following two workarounds?

  • Use the az CLI version 2.34.1 instead of latest.
    • There may be a newer version that works; I haven't tested extensively yet.
image: mcr.microsoft.com/azure-cli:2.34.1   # workaround 1
# image: mcr.microsoft.com/azure-cli:latest

pipelines:
  default:
    - step:
        name: 'Deployment to Staging'
        deployment: staging
        script:
          - az --version
          - az bicep --help
          - az bicep install

[screenshot]

With a newer version of the az CLI, I see either of the following:

az bicep --help
az: 'bicep' is not in the 'az' command group. See 'az --help'. If the command is from an extension, please make sure the corresponding extension is installed. To learn more about extensions, please visit https://docs.microsoft.com/en-us/cli/azure/azure-cli-extensions-overview

It's possible the regression was related to these new config options?

  • See the workaround below about setting one of these.

[screenshot]

I see the following on the install:

[screenshot]

The install does appear to complete, despite the exception:

[screenshot]

Other possible workarounds appear to be as follows:

image: mcr.microsoft.com/azure-cli:latest 

pipelines:
  default:
    - step:
        name: 'Deployment to Staging'
        deployment: staging
        script:
          - az --version
          - az config set bicep.use_binary_from_path=False   #workaround 2
          - az bicep --help
          - az bicep install

[screenshot]

Adding a text log for follow-up on the az CLI side once we confirm the above workaround:

+ az bicep install
ERROR: The command failed with an unexpected error. Here is the traceback:
ERROR: No section: 'bicep'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/knack/cli.py", line 233, in invoke
    cmd_result = self.invocation.execute(args)
  File "/usr/local/lib/python3.10/site-packages/azure/cli/core/commands/__init__.py", line 663, in execute
    raise ex
  File "/usr/local/lib/python3.10/site-packages/azure/cli/core/commands/__init__.py", line 726, in _run_jobs_serially
    results.append(self._run_job(expanded_arg, cmd_copy))
  File "/usr/local/lib/python3.10/site-packages/azure/cli/core/commands/__init__.py", line 697, in _run_job
    result = cmd_copy(params)
  File "/usr/local/lib/python3.10/site-packages/azure/cli/core/commands/__init__.py", line 333, in __call__
    return self.handler(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/azure/cli/core/commands/command_operation.py", line 121, in handler
    return op(**command_args)
  File "/usr/local/lib/python3.10/site-packages/azure/cli/command_modules/resource/custom.py", line 3648, in install_bicep_cli
    ensure_bicep_installation(cmd.cli_ctx, release_tag=version, target_platform=target_platform)
  File "/usr/local/lib/python3.10/site-packages/azure/cli/command_modules/resource/_bicep.py", line 141, in ensure_bicep_installation
    use_binary_from_path = cli_ctx.config.get("bicep", "use_binary_from_path").lower()
  File "/usr/local/lib/python3.10/site-packages/knack/config.py", line 99, in get
    raise last_ex  # pylint:disable=raising-bad-type
  File "/usr/local/lib/python3.10/site-packages/knack/config.py", line 94, in get
    return config.get(section, option)
  File "/usr/local/lib/python3.10/site-packages/knack/config.py", line 208, in get
    return self.config_parser.get(section, option)
  File "/usr/local/lib/python3.10/configparser.py", line 783, in get
    d = self._unify_values(section, vars)
  File "/usr/local/lib/python3.10/configparser.py", line 1154, in _unify_values
    raise NoSectionError(section) from None
configparser.NoSectionError: No section: 'bicep'
To check existing issues, please visit: https://github.com/Azure/azure-cli/issues
To open a new issue, please run `az feedback`
Installing Bicep CLI v0.15.31...
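For context on that traceback: ensure_bicep_installation reads cli_ctx.config.get("bicep", "use_binary_from_path"), which raises NoSectionError when the CLI config file has no [bicep] section yet. A minimal sketch of why workaround 2 avoids this (the config path shown in the cat step is an assumption, based on the install path logged above):

    - step:
        script:
          # Writing any bicep.* value creates the [bicep] section in the CLI
          # config file, so the subsequent read no longer raises NoSectionError.
          - az config set bicep.use_binary_from_path=False
          - cat /root/.azure/config   # should now contain a [bicep] section
          - az bicep install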

@brwilkinson
Collaborator

Tagging @ausy-tk @danielmackay @OlivierTD

Can you please test the above workarounds?

  • The exception is different; however, I assume it was swallowed by the deployment exception.

@brwilkinson
Collaborator

I believe this has been reported and a fix is rolling out.

@brwilkinson
Collaborator

I see @danielmackay commented on that other thread that the workaround was successful.

@alex-frankel
Collaborator

I'm going to close this one since it looks like Ben is right on the root cause and the issue is being tracked on the az CLI side. We can re-open if needed.

@az-core

az-core commented Mar 29, 2023

We ran into the above DNS issue intermittently. Below are my observations so far:

We also encountered the issue mentioned in "Error with Azure CLI 2.46.0 and Bicep if no bicep configuration exists". Applying the fix (and temporary workaround) for #25710 resolves the Bicep configuration problem but not the intermittent DNS problem.

We are using the latest image mcr.microsoft.com/azure-cli (2.46.0), where Bicep was installed using az bicep install. We also tried other commands: az bicep list-versions and installing a specific version of the Bicep tool. All of these resulted in DNS errors intermittently.

For troubleshooting, we switched over to mcr.microsoft.com/azure-functions/dotnet:4-dotnet6-core-tools instead, and the DNS issue seems resolved. I haven't tried any other image with the Azure CLI yet. Switching back to the azure-cli image resurfaces the problem intermittently.

@krizskp

krizskp commented Mar 30, 2023

I also use Bitbucket pipelines, and this DNS issue was happening since before the Azure/azure-cli#25710 issue started. Workaround 2 fixed the Bicep config issue, but the DNS issue still remained. Recently, the DNS issue seems to have become more frequent; there was a day when Bicep finally got installed only after 11 runs of the pipeline.

So in agreement with az-core's comment above, this is still not fixed.

@brwilkinson
Collaborator

@krizskp can you please test using the older azure-cli image?

i.e. workaround 1
image: mcr.microsoft.com/azure-cli:2.34.1 # workaround 1

@krizskp

krizskp commented Mar 30, 2023

@krizskp can you please test using the older azure-cli image?

i.e. workaround 1 image: mcr.microsoft.com/azure-cli:2.34.1 # workaround 1

No, this didn't work for me.

@brwilkinson
Collaborator

brwilkinson commented Mar 30, 2023

@krizskp what did you mean by "didn't work"?

Are you using image: mcr.microsoft.com/azure-cli:latest at the moment?

Can you please provide some more info or logs from the error messages that relate to the DNS error?

Are you using hosted runners, or which workspace runners are you using?

@krizskp

krizskp commented Mar 30, 2023

@krizskp what did you mean by "didn't work"?

Are you using image: mcr.microsoft.com/azure-cli:latest at the moment?

With workaround 1 (azure-cli version set to 2.34.1), I still get:
ERROR: Error while attempting to retrieve the latest Bicep version: HTTPSConnectionPool(host='downloads.bicep.azure.com', port=443): Max retries exceeded with url: /releases/latest (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f8f2f3706d0>: Failed to establish a new connection: [Errno -2] Name does not resolve')).

@brwilkinson
Collaborator

@krizskp Thank you ✅ will re-open.

@brwilkinson brwilkinson reopened this Mar 30, 2023
@brwilkinson
Collaborator

@krizskp are you using hosted runners, or which workspace runners are you using?

@brwilkinson
Collaborator

brwilkinson commented Mar 30, 2023

There appear to be other reports of this issue that are more widespread, i.e. not specific to Bicep.

@krizskp

krizskp commented Mar 30, 2023

@krizskp are you using hosted runners, or which workspace runners are you using?

@brwilkinson I'm running it on Bitbucket pipelines.

Fails at this command: az deployment group create ...

@brwilkinson
Collaborator

brwilkinson commented Mar 30, 2023

@krizskp was this intermittent for you?

Also what region? Can you check which DNS servers you are using?

I am still not able to repro so far in Bitbucket... I can set up a deployment schedule if it is intermittent?

image: mcr.microsoft.com/azure-cli:latest

pipelines:
  default:
    - step:
        name: 'Deployment to Staging'
        deployment: staging
        script:
          - apk update 
          - apk add bind-tools
          - uname -a
          - cat /etc/resolv.conf
          - nslookup downloads.bicep.azure.com
          - dig downloads.bicep.azure.com
          - ping downloads.bicep.azure.com -4 -c 2

All resolve correctly to 13.107.237.69:

[screenshot]

@krizskp

krizskp commented Mar 30, 2023

@brwilkinson Yes, it was intermittent.

Here's the output:

[screenshots of the diagnostic command output]

@brwilkinson
Collaborator

Thank you @krizskp for the output...

It looks like you are using an internal IP address for your DNS server (ec2.local). Are you familiar with this server?

Either way, it appears DNS is actually working correctly; however, the deployment fails.

So are you also applying the workaround?

az config set bicep.use_binary_from_path=False

Full steps for repro and testing:

image: mcr.microsoft.com/azure-cli:latest

pipelines:
  default:
    - step:
        name: 'Deployment to Staging'
        deployment: staging
        script:
          - apk update 
          - apk add bind-tools
          - uname -a
          - cat /etc/resolv.conf
          - nslookup downloads.bicep.azure.com
          - dig downloads.bicep.azure.com
          - ping downloads.bicep.azure.com -4 -c 2

          - az --version
          # - az config set bicep.version_check=True
          - az config set bicep.use_binary_from_path=False
          - az bicep --help
          - az bicep install

Output from the additional steps:

[screenshot]

@krizskp

krizskp commented Mar 30, 2023

@brwilkinson I'm using Bitbucket pipelines and don't configure the DNS myself; it is managed by Bitbucket, I suppose. And ec2.local also comes with their pipeline Docker containers.

And yes, I'm using the az config set bicep.use_binary_from_path=False workaround, but that is for the other issue (Azure/azure-cli#25710).

@brwilkinson
Collaborator

Please test the complete config below and provide the logs.

image: mcr.microsoft.com/azure-cli:latest

pipelines:
  default:
    - step:
        name: 'Deployment to Staging'
        deployment: staging
        script:
          - apk update 
          - apk add bind-tools
          - uname -a
          - cat /etc/resolv.conf
          - nslookup downloads.bicep.azure.com
          - dig downloads.bicep.azure.com
          - ping downloads.bicep.azure.com -4 -c 2

          - az --version
          # - az config set bicep.version_check=True
          - az config set bicep.use_binary_from_path=False
          - az bicep --help
          - az bicep install

@brwilkinson
Collaborator

As mentioned further up in this thread...

  • The exception is different; however, I assume it was swallowed by the deployment exception.

We believe it's directly related and have confirmed with many that the workaround has been successful.

You just need to add the following prior to running your deployment, until the fix is rolled out in the az CLI image:

az config set bicep.use_binary_from_path=False
az deployment group create --template-file xyx etc .....
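For Bitbucket specifically, a sketch of the workaround combined with the deployment step shown earlier in this thread (the image, step name, and variables are illustrative):

image: mcr.microsoft.com/azure-cli:latest

pipelines:
  default:
    - step:
        name: Deploy Infrastructure
        script:
          - az login --service-principal -u $AZURE_APP_ID -p $AZURE_PASSWORD --tenant $AZURE_TENANT_ID
          - az config set bicep.use_binary_from_path=False   # workaround until the az CLI fix ships
          - az deployment group create --resource-group $RG_NAME --template-file ./deploy/main.bicep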

@krizskp

krizskp commented Mar 30, 2023

Please test the complete config below and provide the logs.

image: mcr.microsoft.com/azure-cli:latest

pipelines:
  default:
    - step:
        name: 'Deployment to Staging'
        deployment: staging
        script:
          - apk update 
          - apk add bind-tools
          - uname -a
          - cat /etc/resolv.conf
          - nslookup downloads.bicep.azure.com
          - dig downloads.bicep.azure.com
          - ping downloads.bicep.azure.com -4 -c 2

          - az --version
          # - az config set bicep.version_check=True
          - az config set bicep.use_binary_from_path=False
          - az bicep --help
          - az bicep install

Here is another message which sometimes pops up:

[screenshots]

@brwilkinson
Collaborator

Thank you @krizskp ... interesting.

I don't see the warning that shows 'use_binary_from_path=False':

[screenshot]

Since the DNS looks good, can we try the following?

image: mcr.microsoft.com/azure-cli:latest

pipelines:
  default:
    - step:
        name: 'Deployment to Staging'
        deployment: staging
        script:
          # - apk update 
          # - apk add bind-tools
          # - uname -a
          # - cat /etc/resolv.conf
          # - nslookup downloads.bicep.azure.com
          # - dig downloads.bicep.azure.com
          # - ping downloads.bicep.azure.com -4 -c 2

          # - az --version
          # - az config set bicep.version_check=True
          - az config set bicep.use_binary_from_path=False
          #- az bicep --help
          - az config get
          - az bicep install

e.g.

[screenshot]

@brwilkinson
Collaborator

Looks like the fix for the config has been merged.

I guess we will see a new version in the next week.

@brwilkinson
Collaborator

Azure CLI 2.47.0 is now the latest, so I will close this.

@anthony-c-martin
Member

anthony-c-martin commented Apr 4, 2023

@krizskp if you get a chance, here are some other commands that would be useful to help debug (see the sketch after this list):

  • nslookup aka.ms - nslookup for the first hostname that the az CLI attempts to resolve
  • nslookup downloads.bicep.azure.com - nslookup for the second hostname that the az CLI attempts to resolve
  • curl -v -LO https://aka.ms/BicepLatestRelease - this attempts the same initial GET as the az CLI, with detailed logging enabled
  • curl -v -LO https://downloads.bicep.azure.com/v0.15.31/bicep-linux-x64 - the second GET that the az CLI attempts, with detailed logging
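A sketch assembling those commands into a Bitbucket step, following the pipeline layout used earlier in this thread (bind-tools is added for nslookup, as in the earlier repro steps; the python3 one-liner mirrors the socket.getaddrinfo call from the traceback above):

image: mcr.microsoft.com/azure-cli:latest

pipelines:
  default:
    - step:
        name: 'DNS diagnostics'
        script:
          - apk update
          - apk add bind-tools   # provides nslookup/dig on the Alpine-based image
          - nslookup aka.ms
          - nslookup downloads.bicep.azure.com
          - curl -v -LO https://aka.ms/BicepLatestRelease
          - curl -v -LO https://downloads.bicep.azure.com/v0.15.31/bicep-linux-x64
          # Reproduce the exact resolution call from the Python traceback:
          - python3 -c "import socket; print(socket.getaddrinfo('downloads.bicep.azure.com', 443, 0, socket.SOCK_STREAM))"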

@brwilkinson brwilkinson reopened this Apr 4, 2023
@brwilkinson
Collaborator

Reopening to confirm the DNS issue has been resolved.

@brwilkinson
Collaborator

Hi @krizskp any additional status updates on the DNS failures?

@ghost ghost locked as resolved and limited conversation to collaborators Jun 13, 2023