Prevent iNaturalist from running alongside any other DAGs #1276
Labels
💻 aspect: code
Concerns the software code in the repository
✨ goal: improvement
Improvement to an existing user-facing feature
help wanted
Open to participation from the community
🟨 priority: medium
Not blocking but should be addressed soon
🧱 stack: catalog
Related to the catalog and Airflow DAGs
Description
Similar to #1277 in reasoning, iNaturalist can be intensive and disruptive for the other DAGs running on the instance. If possible, we would like to prevent iNaturalist from running while other DAGs are in progress. One way to do this would be to leverage Airflow pools. Currently most of our tasks sit in the `default_pool`, which has 128 slots. (We could increase this to a higher number if desired.)
Tasks which use pools can request multiple slots. One way to prevent iNaturalist concurrency would be to set its `pool_slots` to the maximum size for `default_pool`, meaning it would require all pool slots to be available. This, paired with a reduced priority weight, would mean that all other tasks would run prior to iNaturalist, and the latter could only run if all slots are available. This would apply per-task, so even if iNaturalist ran a task, as soon as it completed that task it would free up the pool slots for other DAGs.
Alternatives
Additional context
See #1277 for an additional/alternate option.
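To make the pool-based idea from the description concrete, here is a minimal sketch in plain Python. `pool`, `pool_slots`, and `priority_weight` are existing Airflow operator arguments; everything else (the helper `exclusive_task_args`, the `QueuedTask` class, the task names, the priority value of 0, and the `pick_runnable` function) is hypothetical and only illustrates the intended behavior. `pick_runnable` is a simplified model, not how the Airflow scheduler is actually implemented.

```python
from dataclasses import dataclass

DEFAULT_POOL = "default_pool"
DEFAULT_POOL_SLOTS = 128  # pool size quoted in this issue; adjustable in Airflow


def exclusive_task_args(pool_size=DEFAULT_POOL_SLOTS, priority_weight=0):
    """Operator kwargs that make a task claim every slot of the pool.

    The priority value of 0 is an assumption; it only needs to be lower
    than the weight used by the other DAGs' tasks.
    """
    return {
        "pool": DEFAULT_POOL,
        "pool_slots": pool_size,
        "priority_weight": priority_weight,
    }


@dataclass
class QueuedTask:
    """Hypothetical stand-in for a task waiting on pool slots."""
    name: str
    pool_slots: int = 1
    priority_weight: int = 1


def pick_runnable(queued, open_slots):
    """Simplified scheduling model: visit tasks from highest to lowest
    priority and start each one that fits in the remaining open slots."""
    started = []
    for task in sorted(queued, key=lambda t: -t.priority_weight):
        if task.pool_slots <= open_slots:
            started.append(task.name)
            open_slots -= task.pool_slots
    return started


args = exclusive_task_args()
inat = QueuedTask("inaturalist_load", args["pool_slots"], args["priority_weight"])
other = QueuedTask("other_provider_pull")

# With another task queued, the other task takes a slot and iNaturalist waits.
print(pick_runnable([inat, other], DEFAULT_POOL_SLOTS))
# With the pool fully free and nothing else queued, iNaturalist can start.
print(pick_runnable([inat], DEFAULT_POOL_SLOTS))
```

In a real DAG, the dictionary returned by `exclusive_task_args` could presumably be passed as `default_args` for the iNaturalist DAG so that every task in it inherits the settings. If `default_pool` were resized, `pool_slots` would need to be updated to match.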