-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Applying the TI mutation hook to tasks run from UI #20133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I'm wary of running this in the webserver code -- does it need to be there, or could it be run on the worker just before the task actually starts? /cc @potiuk Another consideration for multi-tenancy. |
|
Yep. Agree with @ashb . It should not be possible to run the mutation hook in here. First of all, what would be the purpose of it ? The reason is not cost. If you look closely the task_instance_mutation hook should definitely not be called here: https://airflow.apache.org/docs/apache-airflow/stable/concepts/cluster-policies.html | Takes a TaskInstance parameter called task_instance. Called right before task execution. |
|
hi @potiuk , thanks for the quick review. Sorry, I misspoke. "Cost" should have been "speed" and was in reference to Anyway, what I'm trying to address --
I'm trying to make this statement true
because it appears to not be so. Ex: the demo function from the Cluster policy docs shows a queue mutation def task_instance_mutation_hook(task_instance: TaskInstance):
if task_instance.try_number >= 1:
task_instance.queue = 'retry_queue'An easy way for me to validate that queue mutation works is to use my What happens: If I trigger a new DAGrun, I get print statements from my pod mutation hook (so it did successfully mutate the task and send it to Kube) and the task does not appear in the Celery worker logs. However, if I then clear the task from the UI and let the scheduler automatically add it again, it goes to the Celery worker. Task log 1: Task log 2: And then the task appears in my Celery worker's logs |
|
But that's a totally different issue and changing "www" to execute task mutation has precisely 0 chances of fixing the problem, simply because webserver has nothing to do with task execution. I suggest you open an issue (not PR) where you will describe your observations. |
I discovered that
task_instance_mutation_hookis only invoked indagrun.verify_integrity(), which itself is not invoked often due to its cost. The goal of this change is to ensure tasks are mutated on every execution, regardless of the source of the run.I am unsure how to formally test this change, though I did verify it behaved as expected locally by clicking the UI and checking webserver logs.