Increase timeout while waiting for RDS database rename #2961
Comments
@AetherUnbound, I did a little poking around (ha ha, sorry, pun intended) and am making some notes on things to try. If any of this looks totally misguided, please do let me know!
Oh, oops! Just realized that I don't have the access I would need to test the DAG. Fun thought experiment though, and I'll be interested to see what happens!
Sorry it's taken me so long to get back to you @rwidom! Your assessment looks correct, it looks like it should just emit What looks buggy to me is this bit of the hook's call: The
Made an upstream PR to try and address this, since it seems counter to the behavior the docs describe! 😄 apache/airflow#34773
The upstream PR was merged! 🚀 Now we just need to wait for it to be released 😄
This should be fixed, we just need to upgrade Airflow so I'm going to go ahead and close it!
Description
The last several runs of the staging database restore DAG have all had the `await_<rename>` steps time out before the new database has become available. Upon retries, the database rename had completed and all steps resumed as normal.
From the logs, it looks like the request actually fails when the database does not exist yet, rather than returning an empty response. The current DAG is set up to poke for up to an hour, but every request fails, so the task falls back to Airflow's retry mechanisms, which allow much less than one hour:
openverse/catalog/dags/database/staging_database_restore/staging_database_restore.py
Lines 171 to 181 in 6a8b184
Logs for the failure:
We should investigate to see if there is a way to retry the sensor even on `DBInstanceNotFound` errors. Otherwise, we should leverage Airflow's retry mechanisms but extend either the retry count or the delays so that they allow up to one hour for the database rename.
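To make the two options concrete, here is a minimal sketch in plain Python. It is not the DAG's code and it calls neither Airflow nor boto3: `InstanceNotFound`, `poke`, `wait_for`, and `retries_needed` are hypothetical names used only for illustration, and `InstanceNotFound` stands in for the `DBInstanceNotFound` error. The first option treats "not found" as "not ready yet" so that poking continues until the timeout. The second estimates how many retries a given delay would need to cover roughly one hour.

```python
import math
import time
from typing import Callable


class InstanceNotFound(Exception):
    """Hypothetical stand-in for the DBInstanceNotFound error."""


def poke(describe_status: Callable[[], str], target: str = "available") -> bool:
    """Return True once the instance reports the target status.

    A not-found error counts as "not ready yet" instead of a failure.
    """
    try:
        return describe_status() == target
    except InstanceNotFound:
        return False


def wait_for(
    describe_status: Callable[[], str],
    timeout: float,
    interval: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poke repeatedly until the target status is seen or the timeout passes."""
    deadline = clock() + timeout
    while True:
        if poke(describe_status):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def retries_needed(total_wait_s: float, retry_delay_s: float) -> int:
    """Number of fixed-delay retries whose delays add up to at least total_wait_s."""
    return math.ceil(total_wait_s / retry_delay_s)
```

For example, under these assumptions `retries_needed(3600, 300)` gives 12, so a five-minute retry delay would need about twelve retries to allow an hour for the rename. Whether the real sensor can swallow the error this way depends on how the underlying hook surfaces it, which is what needs investigating.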