Skip to content

Persist Celery external executor ids on send#64997

Open
DaveT1991 wants to merge 5 commits intoapache:mainfrom
DaveT1991:fix/persist-celery-external-executor-id
Open

Persist Celery external executor ids on send#64997
DaveT1991 wants to merge 5 commits intoapache:mainfrom
DaveT1991:fix/persist-celery-external-executor-id

Conversation

@DaveT1991
Copy link
Copy Markdown
Contributor

Summary

Persist the Celery task id to task_instance.external_executor_id as soon as a workload is successfully submitted.

Today the Celery executor stores the returned task_id in event_buffer, and the scheduler later writes it to the database while handling the queued/running event. If the scheduler dies in that window, the replacement scheduler cannot adopt the in-flight task because external_executor_id is still NULL, so the task may be reset and re-queued instead of adopted.

This change persists the Celery task id immediately after a successful send for task-instance workloads, while keeping the existing event-buffer path as the normal reconciliation flow.

Closes #64971

Testing

  • Added regression coverage in providers/celery/tests/unit/celery/executors/test_celery_executor.py
  • git diff --check

from sqlalchemy import update

from airflow.models.taskinstance import TaskInstance as TI
from airflow.models.taskinstancekey import TaskInstanceKey
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DaveT1991 These imports statemtns are defined above. Could we use them here ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, can you check now and let me know?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Prab-27 I did some minor import refactoring to solve the failed testing issue.

DaveT1991 and others added 3 commits April 10, 2026 14:31
Per Prab-27 feedback, update and TaskInstance/TaskInstanceKey were
re-imported inside _persist_task_external_executor_id even though they
belong at the module level. Promote them out of the local scope (and out
of the TYPE_CHECKING block for the model imports) so there is a single
definition at the top of the file. Drop the TI alias in favour of the
full TaskInstance name now that it is always in scope.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a CeleryExecutor resiliency gap by persisting the Celery task_id into task_instance.external_executor_id immediately after a successful send, so that a replacement scheduler can adopt in-flight tasks after a crash (instead of resetting/re-queuing them).

Changes:

  • Persist external_executor_id for task-instance workloads during _send_workloads() right after apply_async() succeeds.
  • Add a helper to update the task_instance row for the relevant TaskInstanceKey.
  • Add a unit regression test verifying the DB persistence and the existing event-buffer behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
providers/celery/src/airflow/providers/celery/executors/celery_executor.py Writes Celery task_id to TaskInstance.external_executor_id on send via a new persistence helper.
providers/celery/tests/unit/celery/executors/test_celery_executor.py Adds regression coverage ensuring external_executor_id is persisted immediately after send.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Persist external_executor_id for CeleryExecutor in _send_workloads() to avoid "lost" events during scheduler termination

3 participants