Description
Schedule tasks based on labels/capabilities with AND/OR matching, instead of only by queue name.
Use case/motivation
Imagine you have tasks that can only run on specific nodes, for example nodes with GPUs, a particular operating system, or other external hardware peripherals attached.
Some tasks have only partial constraints:
Task A: requires gpu
Task B: requires linux
Task C: requires gpu && linux
Task D: requires gpu && windows
Task E: requires gpu || mac-m1-cpu
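The constraints above are flat boolean expressions over capability labels. A minimal sketch of how a scheduler could evaluate them, assuming expressions only combine labels with && and || (no parentheses, no negation) and that a node advertises its capabilities as a plain set of label strings. The functions parse_requirement and satisfies are hypothetical and not part of Airflow or Celery:

```python
def parse_requirement(expr: str) -> list[frozenset[str]]:
    """Parse a flat expression such as "gpu && linux" or "gpu || mac-m1-cpu"
    into disjunctive normal form: a list of alternatives, each a set of
    labels that must all be present. Parentheses and negation are not
    supported (hypothetical helper, for illustration only)."""
    alternatives = []
    for clause in expr.split("||"):
        labels = frozenset(label.strip() for label in clause.split("&&"))
        if "" in labels:
            raise ValueError(f"empty label in expression: {expr!r}")
        alternatives.append(labels)
    return alternatives


def satisfies(node_labels: set[str], expr: str) -> bool:
    """True if the node's labels cover at least one alternative."""
    return any(alt <= node_labels for alt in parse_requirement(expr))


requirements = {
    "A": "gpu",
    "B": "linux",
    "C": "gpu && linux",
    "D": "gpu && windows",
    "E": "gpu || mac-m1-cpu",
}
linux_gpu_node = {"linux", "gpu"}
for task, expr in requirements.items():
    print(task, expr, satisfies(linux_gpu_node, expr))
```

Splitting on || first and then on && gives && the higher precedence, the usual convention, so gpu && linux || mac-m1-cpu reads as (gpu && linux) || mac-m1-cpu.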
To serve these you might have 5 different nodes, for example:
Node 1: Linux (can serve task B only)
Node 2: Linux+Gpu (can serve A+B+C+E)
Node 3: Windows+Gpu (can serve A+D+E)
Optimizing this type of planning is, as far as I understand, not possible with the CeleryExecutor today: Airflow Celery workers can only listen to a list of queues. In the scenario above, task E should not be assigned to a single node's queue, because two workers (Node 2 and Node 3) could execute it. Nodes 2 and 3 can't listen to a common queue, though, because they support different features. Instead you would have to compromise and choose one of the nodes for task E.
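Which nodes can serve which task follows mechanically from the requirements. The sketch below uses hypothetical label sets for the three example nodes and writes each requirement directly in disjunctive normal form (a list of alternative label sets); it computes the eligible nodes per task and flags the tasks that more than one node could run:

```python
# Requirements in disjunctive normal form: a task can run on a node if the
# node's labels contain every label of at least one alternative.
REQUIREMENTS: dict[str, list[set[str]]] = {
    "A": [{"gpu"}],
    "B": [{"linux"}],
    "C": [{"gpu", "linux"}],
    "D": [{"gpu", "windows"}],
    "E": [{"gpu"}, {"mac-m1-cpu"}],
}

# Hypothetical label sets for the example nodes.
NODES: dict[str, set[str]] = {
    "Node 1": {"linux"},
    "Node 2": {"linux", "gpu"},
    "Node 3": {"windows", "gpu"},
}


def eligible_nodes(requirements, nodes):
    """Map each task to the sorted list of nodes whose labels satisfy it."""
    return {
        task: sorted(
            name
            for name, labels in nodes.items()
            if any(alt <= labels for alt in alternatives)
        )
        for task, alternatives in requirements.items()
    }


eligible = eligible_nodes(REQUIREMENTS, NODES)
for task, names in eligible.items():
    # Tasks with several candidates are the ones a per-node queue pins needlessly.
    marker = " (several candidates)" if len(names) > 1 else ""
    print(f"{task}: {', '.join(names) or 'none'}{marker}")
```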
Is this even possible to solve with Celery? I've studied the Celery documentation closely. It has many advanced features for publishing messages into queues using exchanges and routers, but consumers seem to read only from fixed queue names.
Am I missing something, or is this correct? It feels like this should be a very common scenario.
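One possible approximation with plain queues, offered as an untested assumption rather than a recommendation: give every distinct requirement expression its own queue and let each worker subscribe to every queue whose expression its labels satisfy. Celery workers accept a comma-separated queue list (celery worker -Q), and airflow celery worker has a --queues option. The queue naming scheme and the helper names below are hypothetical:

```python
# Requirement expressions from the task list above.
REQUIREMENTS = {
    "A": "gpu",
    "B": "linux",
    "C": "gpu && linux",
    "D": "gpu && windows",
    "E": "gpu || mac-m1-cpu",
}

# Hypothetical label sets for the example nodes.
NODES = {
    "Node 1": {"linux"},
    "Node 2": {"linux", "gpu"},
    "Node 3": {"windows", "gpu"},
}


def satisfies(labels: set[str], expr: str) -> bool:
    """True if labels contain every label of at least one || alternative."""
    return any(
        {label.strip() for label in clause.split("&&")} <= labels
        for clause in expr.split("||")
    )


def queue_name(expr: str) -> str:
    """Hypothetical naming scheme: "gpu && linux" -> "req.gpu_and_linux"."""
    tokens = expr.replace("&&", " and ").replace("||", " or ").split()
    return "req." + "_".join(tokens)


def queues_for_node(labels: set[str], requirements: dict[str, str]) -> list[str]:
    """All requirement queues a worker with these labels should consume from."""
    return sorted(
        {queue_name(expr) for expr in requirements.values() if satisfies(labels, expr)}
    )


for node, labels in NODES.items():
    print(f"{node}: celery worker -Q {','.join(queues_for_node(labels, REQUIREMENTS))}")
```

A task would then be routed to queue_name(expr) through the operator's queue argument. The cost is bookkeeping: every new requirement expression adds a queue that has to be added to each matching worker's subscription list, which is the kind of configuration that label-based matching would make unnecessary.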
For comparison, Kubernetes can do this with nodeSelector, affinity and anti-affinity, and with the Airflow KubernetesExecutor these can be injected via pod_override.
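On the Kubernetes side, the same && / || expressions map directly onto required node affinity: terms in nodeSelectorTerms are ORed, and the matchExpressions inside a single term are ANDed. A sketch that builds the affinity stanza as a plain dict in manifest layout, assuming each capability is exposed as a node label key (e.g. a node labelled gpu=true) and matched with the Exists operator; node_affinity is a hypothetical helper:

```python
import json


def node_affinity(expr: str) -> dict:
    """Translate a flat label expression ("a && b || c") into a Kubernetes
    nodeAffinity stanza. Assumes every capability is a node label key and
    only checks that the key exists, not its value."""
    terms = []
    for clause in expr.split("||"):
        # One nodeSelectorTerm per OR alternative; its expressions are ANDed.
        labels = sorted(label.strip() for label in clause.split("&&"))
        terms.append(
            {"matchExpressions": [{"key": label, "operator": "Exists"} for label in labels]}
        )
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": terms,
            }
        }
    }


# Task E: requires gpu || mac-m1-cpu.
print(json.dumps(node_affinity("gpu || mac-m1-cpu"), indent=2))
```

To use this from Airflow, the dict would still need to be expressed as kubernetes client objects (for example V1Affinity) inside the V1Pod passed as pod_override.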
Jenkins has similar label conditions (label expressions such as gpu && linux).
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct