Upgrade failed due to use of < for str and int #1738

Closed
3 tasks done
kurokobo opened this issue Mar 1, 2024 · 6 comments · Fixed by #1745

Comments

@kurokobo
Contributor

kurokobo commented Mar 1, 2024

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that the AWX Operator is open source software provided for free and that I might not receive a timely response.

Bug Summary

The conditionals introduced in #1486 compare a string with an int, which causes the task to fail.

AWX Operator version

upstream devel

AWX version

23.9.0

Kubernetes platform

kubernetes

Kubernetes/Platform version

v1.28.6+k3s2

Modifications

no

Steps to reproduce

  1. Deploy minimal AWX CR with Operator 2.12.2
  2. Upgrade Operator to the latest devel image

Expected results

AWX is upgraded with PSQL15

Actual results

The first reconciliation loop completes successfully, but the second loop fails with the following error (the log is reformatted for readability):

TASK [installer : Set path to PG_VERSION file for given container image] *******
task path: /opt/ansible/roles/installer/tasks/database_configuration.yml:159
fatal: [localhost]: FAILED! => {"msg": "
The conditional check '(_previous_upgraded_pg_version | default(false)) | ternary(_previous_upgraded_pg_version < supported_pg_version, true)' failed. The error was: Unexpected templating type error occurred on ({% if (_previous_upgraded_pg_version | default(false)) | ternary(_previous_upgraded_pg_version < supported_pg_version, true) %} True {% else %} False {% endif %}): '<' not supported between instances of 'AnsibleUnsafeText' and 'int'. '<' not supported between instances of 'AnsibleUnsafeText' and 'int'

The error appears to be in '/opt/ansible/roles/installer/tasks/database_configuration.yml': line 159, column 7, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

  block:
    - name: Set path to PG_VERSION file for given container image
      ^ here
"}

PLAY RECAP *********************************************************************
localhost                  : ok=47   changed=0    unreachable=0    failed=1    skipped=23   rescued=0    ignored=0   

Additional information

@john-westcott-iv @rooftopcellist @TheRealHaoLiu @aknochow
F.Y.I.

Operator Logs

No response

@kurokobo
Contributor Author

kurokobo commented Mar 1, 2024

Not fully tested, but this seems to happen only when upgrading an existing AWX CR. I have updated my OP.

@kurokobo kurokobo changed the title Installation failed due to unsafe conditionals Upgrade failed due to unsafe conditionals Mar 1, 2024
@kurokobo kurokobo changed the title Upgrade failed due to unsafe conditionals Upgrade failed due to use of < for str and int Mar 1, 2024
@kurokobo
Contributor Author

kurokobo commented Mar 1, 2024

I have updated my OP again.

The first reconciliation succeeds and appends the following status to the AWX CR.

  status:
    ...
    upgradedPostgresVersion: "15"

This value is a string, and it is compared against an int in the second loop. That comparison causes the task failure.
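
For illustration, here is a minimal Python sketch of the same type mismatch. The error text in the log is a Python `TypeError`, and `AnsibleUnsafeText` behaves like a string here. The variable names mirror the ones in the failing conditional, and the values are assumptions taken from the status shown above, not from the operator code.

```python
# Assumed values for illustration only: the status field is read back as a
# string, while the supported version is an integer.
previous_upgraded_pg_version = "15"  # as stored in status.upgradedPostgresVersion
supported_pg_version = 15

try:
    previous_upgraded_pg_version < supported_pg_version
    comparison_error = None
except TypeError as exc:
    # Python refuses to order a str against an int.
    comparison_error = str(exc)

# Casting the stored string to int before comparing avoids the error.
needs_upgrade = int(previous_upgraded_pg_version) < supported_pg_version
```

With these values the direct comparison raises, while the cast comparison evaluates to `False` because 15 is not lower than 15.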

Is there a reason that upgradedPostgresVersion was changed to a string in https://github.com/ansible/awx-operator/pull/1486 ?

@aknochow aknochow self-assigned this Mar 1, 2024
@aknochow
Member

aknochow commented Mar 1, 2024

@kurokobo PR is in with the fix: #1741

@kurokobo
Contributor Author

kurokobo commented Mar 3, 2024

@aknochow
I tested #1741 and confirmed that it causes another error 😞

TASK [Update upgradedPostgresVersion status] ******************************** 
fatal: [localhost]: FAILED! => {"changed": false, "error": {"body": "{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"AWX.awx.ansible.com \\\"awx-demo\\\" is invalid: upgradedPostgresVersion: Invalid value: \\\"integer\\\": upgradedPostgresVersion in body must be of type string: \\\"integer\\\"\",\"reason\":\"Invalid\",\"details\":{\"name\":\"awx-demo\",\"group\":\"awx.ansible.com\",\"kind\":\"AWX\",\"causes\":[{\"reason\":\"FieldValueTypeInvalid\",\"message\":\"Invalid value: \\\"integer\\\": upgradedPostgresVersion in body must be of type string: \\\"integer\\\"\",\"field\":\"upgradedPostgresVersion\"}]},\"code\":422}\n", "reason": "Unprocessable Entity", "status": 422}, "msg": "Failed to replace status: 422\nReason: Unprocessable Entity\nHTTP response headers: HTTPHeaderDict({'Audit-Id': 'c8fcf9a0-89e9-4878-bf68-b8d768150468', 'Cache-Control': 'no-cache, private', 'Content-Length': '534', 'Content-Type': 'application/json', 'Date': 'Sun, 03 Mar 2024 05:12:28 GMT', 'X-Kubernetes-Pf-Flowschema-Uid': '4ac800a9-f034-409c-bad0-41ab29e6b253', 'X-Kubernetes-Pf-Prioritylevel-Uid': '927d6283-0d47-46fc-b3cd-5cfd74d30e3c'})\nHTTP response body: b'{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"AWX.awx.ansible.com \\\\\"awx-demo\\\\\" is invalid: upgradedPostgresVersion: Invalid value: \\\\\"integer\\\\\": upgradedPostgresVersion in body must be of type string: \\\\\"integer\\\\\"\",\"reason\":\"Invalid\",\"details\":{\"name\":\"awx-demo\",\"group\":\"awx.ansible.com\",\"kind\":\"AWX\",\"causes\":[{\"reason\":\"FieldValueTypeInvalid\",\"message\":\"Invalid value: \\\\\"integer\\\\\": upgradedPostgresVersion in body must be of type string: \\\\\"integer\\\\\"\",\"field\":\"upgradedPostgresVersion\"}]},\"code\":422}\\n'\nOriginal traceback: \n  File \"/usr/local/lib/python3.9/site-packages/kubernetes/dynamic/client.py\", line 55, in inner\n    resp = func(self, *args, **kwargs)\n\n  
File \"/usr/local/lib/python3.9/site-packages/kubernetes/dynamic/client.py\", line 270, in request\n    api_response = self.client.call_api(\n\n  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 348, in call_api\n    return self.__call_api(resource_path, method,\n\n  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 180, in __call_api\n    response_data = self.request(\n\n  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 407, in request\n    return self.rest_client.PATCH(url,\n\n  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/rest.py\", line 299, in PATCH\n    return self.request(\"PATCH\", url,\n\n  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/rest.py\", line 238, in request\n    raise ApiException(http_resp=r)\n"}
{"reason":"FieldValueTypeInvalid","message":"Invalid value: "integer": upgradedPostgresVersion in body must be of type string: "integer"","field":"upgradedPostgresVersion"}

Since the status.upgradedPostgresVersion is defined as string in CRD:

  upgradedPostgresVersion:
    description: Status to indicate that the database has been upgraded to the version in the status
    type: string

We should store upgradedPostgresVersion as a string with {{ ... | string }} and cast it with {{ ... | int }} in conditionals. Changing status.upgradedPostgresVersion to int might solve the issue too, but changing the type of a status field in the CRD may have side effects, since I think users who have been upgrading since PSQL 12 already have string values stored. I have not tested this on my side at all.
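
A rough Python sketch of the "store as string, compare as int" idea, in case it helps the discussion. The function names `to_status_value` and `upgrade_needed` are hypothetical and are not part of the operator. The logic only approximates the conditional quoted in the OP: with no recorded version, an upgrade is assumed to be needed.

```python
from typing import Optional


def to_status_value(pg_version: int) -> str:
    """Serialize the version for the status field, which the CRD types as string."""
    return str(pg_version)


def upgrade_needed(previous: Optional[str], supported: int) -> bool:
    """Return True when no version is recorded or the recorded one is older.

    The stored value is cast to int so the comparison is numeric.
    """
    if not previous:
        return True
    return int(previous) < supported
```

Comparing the raw strings instead would be lexicographic, so `"9" < "15"` evaluates to `False` in Python. That is another reason to cast before comparing.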

@aknochow
Member

aknochow commented Mar 3, 2024

@kurokobo I'll check this out on Monday. How are you reproducing the error? Wondering why I didn't run into this in my upgrade test locally.

@kurokobo
Contributor Author

kurokobo commented Mar 3, 2024

@aknochow
Eventually, the reconciliation completes without status.upgradedPostgresVersion being added and the state stabilizes, so the error is difficult to notice unless you follow the logs carefully.

The reproduction procedure is as in the OP, but more precisely as follows:

  1. Checkout 2.12.2: git checkout 2.12.2

  2. Clean up existing deployment: make undeploy

  3. Deploy 2.12.2: IMG=quay.io/ansible/awx-operator:2.12.2 make deploy

  4. Deploy minimal AWX:

    ---
    apiVersion: awx.ansible.com/v1beta1
    kind: AWX
    metadata:
      namespace: awx
      name: awx-demo
    spec:
      service_type: nodeport
  5. Wait for the state to stabilize. Logs should contain two PLAY RECAP at this step.

    $ kubectl -n awx logs deployments/awx-operator-controller-manager | grep -e "^PLAY RECAP" -A 1
    PLAY RECAP *********************************************************************
    localhost                  : ok=113  changed=33   unreachable=0    failed=0    skipped=52   rescued=0    ignored=2   
    --
    PLAY RECAP *********************************************************************
    localhost                  : ok=83   changed=1    unreachable=0    failed=0    skipped=80   rescued=0    ignored=1
  6. Ensure PSQL13 is deployed:

    $ kubectl -n awx get pod
    NAME                                               READY   STATUS    RESTARTS   AGE
    awx-operator-controller-manager-589cdd869b-bhq7b   2/2     Running   0          13m
    awx-demo-postgres-13-0                             1/1     Running   0          11m
    awx-demo-task-5ff4c6dbdb-t26b5                     4/4     Running   0          11m
    awx-demo-web-c99d685d8-8sll6                       3/3     Running   0          11m
  7. Upgrade Operator to the latest image (which includes #1741, "Fixing postgres upgrade conditional"): IMG=quay.io/ansible/awx-operator:devel make deploy

  8. Wait for the state to stabilize. Logs should contain multiple PLAY RECAP at this step, and the first one reports failed=1.

    $ kubectl -n awx logs deployments/awx-operator-controller-manager | grep -e "^PLAY RECAP" -A 1
    PLAY RECAP *********************************************************************
    localhost                  : ok=108  changed=14   unreachable=0    failed=1    skipped=74   rescued=0    ignored=1   ✅
    --
    PLAY RECAP *********************************************************************
    localhost                  : ok=84   changed=2    unreachable=0    failed=0    skipped=84   rescued=0    ignored=1   
    --
    PLAY RECAP *********************************************************************
    localhost                  : ok=84   changed=1    unreachable=0    failed=0    skipped=84   rescued=0    ignored=1 
  9. Dig into the logs, and find the task that is marked as failed.

    $ kubectl -n awx logs deployments/awx-operator-controller-manager | grep -e "^fatal:" -B 1 -A 1
     TASK [Check if legacy queue is present] ******************************** 
    fatal: [localhost]: FAILED! => {"changed": false, "rc": 1, "return_code": 1, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
    ...ignoring
    --
     TASK [Update upgradedPostgresVersion status] ********************************    ✅
    fatal: [localhost]: FAILED! => {"changed": false, "error": {"body": "{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"AWX.awx.ansible.com \\\"awx-demo\\\" is invalid: upgradedPostgresVersion: Invalid value: \\\"integer\\\": upgradedPostgresVersion in body must be of type string: \\\"integer\\\"\",\"reason\":\"Invalid\",\"details\":{\"name\":\"awx-demo\",\"group\":\"awx.ansible.com\",\"kind\":\"AWX\",\"causes\":[{\"reason\":\"FieldValueTypeInvalid\",\"message\":\"Invalid value: \\\"integer\\\": upgradedPostgresVersion in body must be of type string: \\\"integer\\\"\",\"field\":\"upgradedPostgresVersion\"}]},\"code\":422}\n", "reason": "Unprocessable Entity", "status": 422}, "msg": "Failed to replace status: 422\nReason: Unprocessable Entity\nHTTP response headers: HTTPHeaderDict({'Audit-Id': '445049f1-8a2f-4d96-974d-ab6330b24e03', 'Cache-Control': 'no-cache, private', 'Content-Length': '534', 'Content-Type': 'application/json', 'Date': 'Sun, 03 Mar 2024 15:43:55 GMT', 'X-Kubernetes-Pf-Flowschema-Uid': '4ac800a9-f034-409c-bad0-41ab29e6b253', 'X-Kubernetes-Pf-Prioritylevel-Uid': '927d6283-0d47-46fc-b3cd-5cfd74d30e3c'})\nHTTP response body: b'{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"AWX.awx.ansible.com \\\\\"awx-demo\\\\\" is invalid: upgradedPostgresVersion: Invalid value: \\\\\"integer\\\\\": upgradedPostgresVersion in body must be of type string: \\\\\"integer\\\\\"\",\"reason\":\"Invalid\",\"details\":{\"name\":\"awx-demo\",\"group\":\"awx.ansible.com\",\"kind\":\"AWX\",\"causes\":[{\"reason\":\"FieldValueTypeInvalid\",\"message\":\"Invalid value: \\\\\"integer\\\\\": upgradedPostgresVersion in body must be of type string: \\\\\"integer\\\\\"\",\"field\":\"upgradedPostgresVersion\"}]},\"code\":422}\\n'\nOriginal traceback: \n  File \"/usr/local/lib/python3.9/site-packages/kubernetes/dynamic/client.py\", line 55, in inner\n    resp = func(self, *args, 
**kwargs)\n\n  File \"/usr/local/lib/python3.9/site-packages/kubernetes/dynamic/client.py\", line 270, in request\n    api_response = self.client.call_api(\n\n  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 348, in call_api\n    return self.__call_api(resource_path, method,\n\n  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 180, in __call_api\n    response_data = self.request(\n\n  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py\", line 407, in request\n    return self.rest_client.PATCH(url,\n\n  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/rest.py\", line 299, in PATCH\n    return self.request(\"PATCH\", url,\n\n  File \"/usr/local/lib/python3.9/site-packages/kubernetes/client/rest.py\", line 238, in request\n    raise ApiException(http_resp=r)\n"}
    
    --
     TASK [Check if legacy queue is present] ******************************** 
    fatal: [localhost]: FAILED! => {"changed": false, "rc": 1, "return_code": 1, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
    ...ignoring
    --
     TASK [Check if legacy queue is present] ******************************** 
    fatal: [localhost]: FAILED! => {"changed": false, "rc": 1, "return_code": 1, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
    ...ignoring
  10. Ensure PSQL is upgraded to 15

    $ kubectl -n awx get pod
    NAME                                               READY   STATUS    RESTARTS   AGE
    awx-operator-controller-manager-77467457c9-m4rwc   2/2     Running   0          29m
    awx-demo-postgres-15-0                             1/1     Running   0          28m
    awx-demo-task-5b765b4c65-kg7jj                     4/4     Running   0          27m
    awx-demo-web-7476466fb7-65sqc                      3/3     Running   0          27m
  11. Ensure status.upgradedPostgresVersion is not set.

    $ kubectl -n awx get awx awx-demo -o yaml
    ...
    status:
      adminPasswordSecret: awx-demo-admin-password
      adminUser: admin
      broadcastWebsocketSecret: awx-demo-broadcast-websocket
      conditions:
      - lastTransitionTime: "2024-03-03T15:44:43Z"
        reason: ""
        status: "False"
        type: Failure
      - lastTransitionTime: "2024-03-03T15:43:55Z"
        reason: Successful
        status: "True"
        type: Running
      - lastTransitionTime: "2024-03-03T15:45:31Z"
        reason: Successful
        status: "True"
        type: Successful
      image: quay.io/ansible/awx:latest
      postgresConfigurationSecret: awx-demo-postgres-configuration
      secretKeySecret: awx-demo-secret-key
      version: 23.9.0
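
Since the failure in steps 8 and 9 is easy to miss by eye, a small Python sketch for pulling the failed counter out of each PLAY RECAP host line may help. The helper name `failed_counts` and the regular expression are assumptions written against the log format shown above. They are not part of the operator or of Ansible.

```python
import re
from typing import List

# Matches recap host lines such as:
#   localhost : ok=108  changed=14  unreachable=0  failed=1  skipped=74 ...
RECAP_LINE = re.compile(r"^\s*(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)")


def failed_counts(log_text: str) -> List[int]:
    """Return the failed counter of every recap host line, in log order."""
    counts = []
    for line in log_text.splitlines():
        match = RECAP_LINE.match(line)
        if match:
            # Turn "ok=108 changed=14 ..." into {"ok": "108", "changed": "14", ...}
            counters = dict(pair.split("=") for pair in match.group("counters").split())
            counts.append(int(counters.get("failed", 0)))
    return counts
```

Feeding it the output of the `kubectl ... | grep -e "^PLAY RECAP" -A 1` command from step 8 would give one number per reconciliation loop, and any non-zero entry points at a loop worth digging into.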


2 participants