fatal bug: execution locked to amd64 cpu arch #125

Closed
sean-freeman opened this issue Mar 16, 2024 · 5 comments

sean-freeman commented Mar 16, 2024

Problem

The Python function _install_terraform builds its download URL from the constant TERRAFORM_BASE_URL and correctly substitutes the OS (e.g. windows, linux, darwin), but the CPU architecture is hardcoded as _amd64.zip.

The Python function _install_ibmcloud_tf_provider builds its download URL from the constant IBM_PROVIDER_BASE_URL and correctly substitutes the OS (e.g. windows, linux, darwin), but the CPU architecture is hardcoded as _amd64.zip.

In addition, the Python function __init__ hardcodes the default terraform_version='0.12.20', so only this legacy version of Terraform is ever downloaded.

Problem outcome

For any execution from a host that is not x86_64, the Ansible code will fail.

On most operating systems this causes an immediate crash, as the Python code repeatedly attempts to download the compiled binary for the wrong CPU architecture.

On macOS (darwin), the error is invisible because Rosetta 2 transparently translates the x86_64 binary to arm64; each Terraform plan/apply (usually 10-30 seconds) then takes 5+ minutes. The end user is left confused by the poor performance and may blame their network connection.

sean-freeman commented Mar 17, 2024

Workarounds for macOS with Apple Silicon

module_utils/ibmcloud.py - Append Terraform block to generated provider.tf files (version lock to Terraform Provider for IBM Cloud 1.51.0)

    TF_PROVIDER_TEMPLATE = """\
    terraform {{
        required_version = ">= 1.0"
        required_providers {{ ibm = {{ source  = "IBM-Cloud/ibm" , version = ">= {ibm_provider_version}" }} }}
    }}
    provider "ibm" {{
        ##version          = ">= {ibm_provider_version}"
    {{% if generation is not none %}}
...
...
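
A cut-down illustration of why the braces in the template are doubled. It assumes the template is rendered with str.format, which the {ibm_provider_version} placeholder suggests; the shortened TEMPLATE below is a stand-in for illustration, not the collection's actual TF_PROVIDER_TEMPLATE.

```python
# Shortened stand-in for TF_PROVIDER_TEMPLATE (illustration only).
TEMPLATE = """\
terraform {{
    required_version = ">= 1.0"
    required_providers {{ ibm = {{ source = "IBM-Cloud/ibm", version = ">= {ibm_provider_version}" }} }}
}}
"""

# str.format turns each doubled brace into a single literal brace and
# fills in the named placeholder.
rendered = TEMPLATE.format(ibm_provider_version="1.51.0")
print(rendered)
```

Any literal brace added to the template therefore has to be written doubled, otherwise str.format treats it as a placeholder.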

module_utils/ibmcloud.py - Force default Terraform version to 1.5.5 (an MPL 2.0-licensed release, predating the switch to the BSL)

    def __init__(
            self,
            parameters,
            terraform_dir,
            ibm_provider_version,
            terraform_version='1.5.5',
            env=None):

module_utils/ibmcloud.py - Replace the hardcoded amd64 value with the detected CPU architecture

import platform
...

    def __init__(
        ...
        self.platform = sys.platform
        if self.platform.startswith('linux'):
            self.platform = 'linux'
        # Normalize platform.machine() to the arch suffix used in release file names
        self.cpu_arch = platform.machine().lower()
        if self.cpu_arch in ('x86_64', 'amd64'):
            self.cpu_arch = 'amd64'
        elif self.cpu_arch in ('aarch64', 'arm64'):
            self.cpu_arch = 'arm64'
        elif self.cpu_arch in ('i386', 'i686'):
            self.cpu_arch = '386'

    def _install_terraform(self):
        ...
        self._download_extract_zip(
            "{0}{1}/terraform_{1}_{2}_{3}.zip".format(
                self.TERRAFORM_BASE_URL, self.terraform_version, self.platform, self.cpu_arch))
        ...

    def _install_ibmcloud_tf_provider(self):
        ...
        self._download_extract_zip(
            "{0}v{1}/terraform-provider-ibm_{1}_{2}_{3}.zip".format(
                self.IBM_PROVIDER_BASE_URL, self.ibm_provider_version, self.platform, self.cpu_arch))
        ...
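
A self-contained sketch of the same idea, which can be tried outside the collection. The helper names release_arch and terraform_zip_url are hypothetical, and the base URL value is an assumption (the collection defines its own TERRAFORM_BASE_URL). The URL pattern is the format string from the workaround.

```python
import platform

# Assumed value; the collection defines its own TERRAFORM_BASE_URL constant.
TERRAFORM_BASE_URL = "https://releases.hashicorp.com/terraform/"

# platform.machine() spellings mapped to the arch suffix used in release file names.
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
    "i386": "386",
    "i686": "386",
}


def release_arch(machine=None):
    """Return the release arch suffix for a platform.machine() string (hypothetical helper)."""
    machine = (machine or platform.machine()).lower()
    try:
        return _ARCH_ALIASES[machine]
    except KeyError:
        # Fail once with a clear message instead of retrying a download that cannot succeed.
        raise ValueError("Unsupported CPU architecture: %r" % machine)


def terraform_zip_url(version, os_name, machine=None):
    """Build the download URL with the same format string as the workaround."""
    return "{0}{1}/terraform_{1}_{2}_{3}.zip".format(
        TERRAFORM_BASE_URL, version, os_name, release_arch(machine))
```

For example, terraform_zip_url("1.5.5", "darwin", "arm64") yields a URL ending in 1.5.5/terraform_1.5.5_darwin_arm64.zip, while an unmapped value such as "ppc64le" raises ValueError.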

sean-freeman commented May 3, 2024

Workarounds for increased execution speed on macOS with Apple Silicon

Because execution is now faster, looped Ansible tasks may attempt to create or use the same temporary directory, which causes failures.

module_utils/ibmcloud.py - Append random integer to timestamp

import random
...

        def tf_subdir_path():
            ...
            return os.path.join(self.terraform_dir, (timestamp + str(random.randint(1, 1000000))))
            ...

jaywcarman (Collaborator) commented

@sean-freeman, thanks for taking the time to write up this issue. While I'm not a maintainer here, I can offer some insight from when I originally put this collection together.

> The Python function _install_terraform builds its download URL from the constant TERRAFORM_BASE_URL and correctly substitutes the OS (e.g. windows, linux, darwin), but the CPU architecture is hardcoded as _amd64.zip.
>
> The Python function _install_ibmcloud_tf_provider builds its download URL from the constant IBM_PROVIDER_BASE_URL and correctly substitutes the OS (e.g. windows, linux, darwin), but the CPU architecture is hardcoded as _amd64.zip.

The URL parsing was written to work on an amd64 machine running Linux, macOS or Windows. Within that context, I think the bug fix is to throw an exception when running on a non-amd64 machine. Something like this:

if platform.machine().lower() not in ['x86_64', 'amd64']:
    raise AnsibleError("Only 'x86_64' architecture is supported. Detected '%s'" % platform.machine())

Supporting additional architectures would be a great enhancement. Too bad Terraform never released ppc64le binaries or else this would have been a priority back when I was working on it.

> In addition, the Python function __init__ hardcodes the default terraform_version='0.12.20', so only this legacy version of Terraform is ever downloaded.

This version is set in the (closed-source) generator. The idea was that this collection version would follow the IBM Cloud provider version and whatever Terraform version they supported/tested. Unfortunately it was never updated.

> Problem outcome
>
> For any execution from a host that is not x86_64, the Ansible code will fail.
>
> On most operating systems this causes an immediate crash, as the Python code repeatedly attempts to download the compiled binary for the wrong CPU architecture.
>
> On macOS (darwin), the error is invisible because Rosetta 2 transparently translates the x86_64 binary to arm64; each Terraform plan/apply (usually 10-30 seconds) then takes 5+ minutes. The end user is left confused by the poor performance and may blame their network connection.

Yuck, this is a nasty failure mode. I'm happy to help get some kind of update in here that stops wasting users' time.

jaywcarman (Collaborator) commented

> Workarounds for increased execution speed on macOS with Apple Silicon
>
> Because execution is now faster, looped Ansible tasks may attempt to create or use the same temporary directory, which causes failures.
>
> module_utils/ibmcloud.py - Append random integer to timestamp

Ah, yes - assuming the timestamp to microseconds would always be unique was bad.

I'd prefer to use the uuid module instead of random.
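
A minimal sketch of what the uuid-based variant might look like. The standalone function signature and the timestamp format are assumptions, since the original tf_subdir_path body is elided in the thread; only the uuid suffix is the point here.

```python
import os
import uuid
from datetime import datetime


def tf_subdir_path(terraform_dir):
    """Return a per-invocation subdirectory path under terraform_dir."""
    # Assumed timestamp format; the collection's actual format is not shown in the thread.
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S%f")
    # uuid4().hex is 32 random hex characters, so two calls that share the
    # same microsecond timestamp still get different paths in practice.
    return os.path.join(terraform_dir, "{0}-{1}".format(timestamp, uuid.uuid4().hex))
```

Compared with random.randint(1, 1000000), a uuid4 suffix makes a collision between looped tasks vanishingly unlikely rather than merely rare.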

sean-freeman commented

Resolved by PR #126
