FIX: do not reattach to finished pod on restart of k8s pod operator #39414
NotYuki wants to merge 1 commit into apache:main
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
|
Taragolis left a comment:
Could you add tests? See: https://github.com/apache/airflow/blob/main/contributing-docs/09_testing.rst
|
Thanks for the comment. Actually, after delving deeper into the Kubernetes tests and KPO code, I've noticed that similar logic is already implemented in the patch_already_checked() method. I want to spend some time investigating why it doesn't work as expected in my k8s environment.
    if not namespace:
        raise ValueError("Pod namespace is empty")
    pod = self.find_pod(namespace, context=context)
    if pod and pod.status and pod.status.phase == "Running":
By the time the next retry attempt runs, the task might have moved from the running to the completed state. Why would you want to rerun the same task in that case?
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
Currently, if KubernetesPodOperator.reattach_on_restart is enabled, the operator tries to find a pod regardless of its status. This can lead to a deadlock if on_finish_action is set to KEEP_POD or DELETE_SUCCEEDED_POD.
For instance, if a task has on_finish_action = OnFinishAction.DELETE_SUCCEEDED_POD and fails on its first run, every subsequent attempt of the same task fails immediately with a Kubernetes API error, because Airflow tries to attach to a finished pod.
This PR introduces changes to KPO, allowing it to reattach only to running pods.
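
A minimal sketch of the check this description implies, for illustration only. It assumes stand-in `Pod` and `PodStatus` dataclasses in place of the Kubernetes client models, and `select_pod_to_reattach` is a hypothetical helper name, not the operator's actual method. It shows the intended behaviour: a pod found on restart is reused only when its phase is "Running"; otherwise the caller is expected to create a fresh pod.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PodStatus:
    # Stand-in for the Kubernetes pod status object (assumption for this sketch).
    # Kubernetes pod phases are: Pending, Running, Succeeded, Failed, Unknown.
    phase: str | None = None


@dataclass
class Pod:
    # Stand-in for the Kubernetes pod object (assumption for this sketch).
    name: str
    status: PodStatus | None = None


def select_pod_to_reattach(pod: Pod | None) -> Pod | None:
    """Return the pod only if it is still running (hypothetical helper).

    Returning None signals that the caller should start a new pod
    instead of attaching to one that no longer runs.
    """
    if pod and pod.status and pod.status.phase == "Running":
        return pod
    return None


running = Pod(name="task-pod", status=PodStatus(phase="Running"))
failed = Pod(name="task-pod", status=PodStatus(phase="Failed"))

reattach_target = select_pod_to_reattach(running)   # the running pod is reused
skipped_target = select_pod_to_reattach(failed)     # None: a finished pod is ignored
```

Note that this check reads the phase at a single point in time, so a pod can still finish between the check and the reattach, which is the race raised in the review comment above.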