AWX DB migration gets in loop and upgrade never ends #14314
#14311 proposed solution
fosterseth changed the title from "AWX DB migration gets in loop and upgrade newer ends" to "AWX DB migration gets in loop and upgrade never ends" on Aug 9, 2023
TheRealHaoLiu added a commit to TheRealHaoLiu/awx that referenced this issue on Oct 11, 2023
TheRealHaoLiu added a commit to TheRealHaoLiu/awx-operator that referenced this issue on Oct 11, 2023
Related to ansible/awx#14314 and ansible/awx#14566. Each step of the migration can run for an undetermined amount of time, and there is no output between migration steps, so the exec connection might be killed for being idle. Adding a background keepalive to prevent the exec connection from being killed while idle.
TheRealHaoLiu added a commit to TheRealHaoLiu/awx that referenced this issue on Oct 12, 2023
TheRealHaoLiu added a commit to TheRealHaoLiu/awx that referenced this issue on Oct 13, 2023
TheRealHaoLiu added a commit to TheRealHaoLiu/awx that referenced this issue on Oct 13, 2023
Fixes ansible#14314. Removing the retry attempt limit when waiting for the migration to complete.
Bug Summary
When upgrading from 22.3 to 22.5, we noticed that the upgrade took much longer than expected, got stuck in the DB migration process, and prevented AWX from starting. This upgrade included a database migration task.
This issue happened because of a bug in AWX DB migration logic. When we upgrade AWX, the AWX operator starts the DB migration. The following script is responsible for the retry mechanism.
The script makes up to 30 attempts to check the status of the AWX migration. Each attempt is allowed to run for at most 60 seconds (TIMEOUT=60). The waiting time between attempts is calculated dynamically with an exponential backoff strategy.
The next_sleep function calculates the waiting time before the next attempt. It starts with MIN_SLEEP (0.5 seconds) and doubles the value with each attempt until it reaches MAX_SLEEP (30 seconds). So, the waiting times between attempts will be as follows:
Attempt 1: 0.5 seconds
Attempt 2: 1 second
Attempt 3: 2 seconds
Attempt 4: 4 seconds
Attempt 5: 8 seconds
Attempt 6: 16 seconds
Attempt 7: 30 seconds (maximum reached)
Attempt 8 and beyond: 30 seconds (maximum reached)
From the 7th attempt on, the waiting time stays constant at 30 seconds because it has reached the configured maximum. If none of the attempts succeeds within the TIMEOUT period (60 seconds), the script fails with the error message "ERROR: Database migrations not applied", returns exit code 1, and stops the migration task.
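To make the numbers above concrete, here is a minimal Python sketch of the backoff arithmetic. It is not the actual script: it only uses the values quoted in this issue (MIN_SLEEP=0.5, MAX_SLEEP=30, 30 attempts, TIMEOUT=60), and the function names `sleep_schedule` and `worst_case_budget` are made up for illustration. It also assumes a wait follows every attempt, as in the list above, so the totals are upper bounds.

```python
# Values quoted in the issue; everything else here is a hypothetical illustration.
MIN_SLEEP = 0.5   # seconds, first wait
MAX_SLEEP = 30.0  # seconds, cap on the wait
ATTEMPTS = 30     # number of status checks
TIMEOUT = 60.0    # seconds allowed per status check


def sleep_schedule(attempts=ATTEMPTS, min_sleep=MIN_SLEEP, max_sleep=MAX_SLEEP):
    """Return the wait after each attempt: doubling from min_sleep, capped at max_sleep."""
    waits = []
    current = min_sleep
    for _ in range(attempts):
        waits.append(min(current, max_sleep))
        current = min(current * 2, max_sleep)
    return waits


def worst_case_budget(attempts=ATTEMPTS, timeout=TIMEOUT):
    """Upper bound on total time: every check uses its full timeout, plus all waits."""
    return attempts * timeout + sum(sleep_schedule(attempts))


schedule = sleep_schedule()
print(schedule[:8])         # waits after attempts 1 through 8
print(sum(schedule))        # total seconds spent waiting between attempts
print(worst_case_budget())  # total seconds if every check also hits TIMEOUT
```

Under these assumptions the waits add up to 751.5 seconds, and the whole loop gives up after at most 2551.5 seconds (roughly 42 minutes), however long the migration itself still needs.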
This means that with a large DB that needs more time to finish a particular migration task, the migration never completes: it is interrupted before it can finish, so the upgrade stays stuck in a loop.
The workaround:
We found a workaround together with the AWX community and applied it. To use the workaround, do the following:
Scale in awx-operator-controller-manager replicas to 0
Scale in awx-task replicas to 0
Scale in awx-web replicas to 0
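The three scale-down steps could be issued with `kubectl scale`. The sketch below only builds and prints the commands so they can be reviewed before running them. It assumes all three are Deployments with exactly these names and that they live in a namespace called `awx`; both are assumptions, so adjust them to your installation (the operator may run in a different namespace).

```python
import shlex

# Assumptions: namespace name and resource kind are NOT given in the issue.
NAMESPACE = "awx"
# Names and order as listed in the issue (operator first).
DEPLOYMENTS = ["awx-operator-controller-manager", "awx-task", "awx-web"]


def scale_down_commands(deployments=DEPLOYMENTS, namespace=NAMESPACE):
    """Build one `kubectl scale ... --replicas=0` command per deployment."""
    return [
        ["kubectl", "scale", f"deployment/{name}", "--replicas=0",
         "--namespace", namespace]
        for name in deployments
    ]


# Print the commands instead of executing them (dry run).
for cmd in scale_down_commands():
    print(shlex.join(cmd))
```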
And replace the args section with
AWX version
22.5
Select the relevant components
Installation method
kubernetes
Modifications
no
Ansible version
No response
Operating system
No response
Web browser
No response
Steps to reproduce
Upgrade an AWX instance with a large database from 22.3 to 22.5
Expected results
The DB migration task runs without interruption
Actual results
The DB migration task is interrupted by a pod restart
Additional information
No response