Investigate Data Refreshes blocking during popularity steps #1473
Labels
💻 aspect: code
Concerns the software code in the repository
🛠 goal: fix
Bug fix
🟨 priority: medium
Not blocking but should be addressed soon
🧱 stack: catalog
Related to the catalog and Airflow DAGs
Description
The data refresh DAGs use a `SingleRunExternalDAGsSensor` to enforce a concurrency constraint: the actual data refresh steps on the ingestion server should only run for one media type at a time. The Sensor accomplishes this by checking whether there is another running data refresh DAG and whether that DAG's own Sensor has passed. If both are true, it blocks until the other DAG is finished.
Recently we ran an audio data refresh in production while the image refresh was running simultaneously. The audio's `wait_for_data_refresh` sensor went up for reschedule, even though the image refresh was still at the `update_materialized_popularity_view` step, meaning that its own sensor had not completed.
My initial read is that the current implementation of the Sensor should work here; we should investigate what happened and try to reproduce it.
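To make the expected behavior concrete for whoever tries to reproduce this, here is a minimal sketch of the blocking rule as described above. It is a simplified model in plain Python, not the actual `SingleRunExternalDAGsSensor` code: the `DagRunSnapshot` class, the `should_block` function, and the `image_data_refresh` DAG id are all hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DagRunSnapshot:
    """Hypothetical, simplified view of another data refresh DAG's state."""

    dag_id: str
    is_running: bool      # does the DAG currently have an active run?
    sensor_passed: bool   # has its own wait_for_data_refresh sensor succeeded?


def should_block(other_dags: Iterable[DagRunSnapshot]) -> bool:
    """Block only if some other refresh is running AND is already past its own sensor."""
    return any(dag.is_running and dag.sensor_passed for dag in other_dags)


# Scenario from this issue: the image refresh is running but is still at the
# popularity step, so its own sensor has not passed yet.
# The DAG id below is an assumed name, not taken from the repository.
image_at_popularity_step = DagRunSnapshot(
    dag_id="image_data_refresh", is_running=True, sensor_passed=False
)

# Contrasting scenario: the image refresh has passed its sensor and is
# running the actual refresh steps on the ingestion server.
image_past_sensor = DagRunSnapshot(
    dag_id="image_data_refresh", is_running=True, sensor_passed=True
)

print(should_block([image_at_popularity_step]))  # expected by the rule: False
print(should_block([image_past_sensor]))         # expected by the rule: True
```

Under this reading of the rule, the audio sensor should not have blocked in the first scenario, which is the mismatch this issue asks us to investigate.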
Additional context
The workaround for this bug is setting the `wait_for_data_refresh` sensor to `success` after manually verifying the state of the refreshes. This is not sustainable, but it allows refreshes to proceed.
Resolution