Graceful handling of Airflow application pods during brief K8s API interruption #19698
Closed
1 of 2 tasks
Labels
kind:feature
Feature Requests
Description
I have come across an issue affecting a Kubernetes customer who runs an Airflow application. The pod gets terminated when there is a brief API server interruption. API server interruptions can happen in many scenarios, so ideally the application should handle them gracefully. Another key point is that this happens only when "is_delete_operator_pod" is set to True.
Reading the code, I understand that launcher.start_pod or launcher.monitor_pod has some logic that checks the heartbeat to the API server, and once it finds an interruption it goes to the AirflowException path. If we had some kind of nested exception handling or retry logic to cover a brief period of API server unavailability, that should help avoid the pod termination, because the API server takes very little time to re-establish the connection.
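To make the retry idea concrete, here is a minimal sketch of what such a wrapper could look like. It is not taken from the Airflow code base: `TransientApiError` and `call_with_retry` are hypothetical names used only for illustration, and the real exception type raised on a lost API server connection would need to be substituted.

```python
import time


class TransientApiError(Exception):
    """Hypothetical stand-in for a short-lived API server connection failure."""


def call_with_retry(func, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on TransientApiError, back off and retry up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return func()
        except TransientApiError:
            if attempt >= retries:
                # The outage outlasted the retry budget, so let the caller fail the task.
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

With something like this, the operator could pass the pod monitoring call in as `func`, so the AirflowException path (and the pod deletion that follows it) would only be reached once the retries are exhausted rather than on the first failed heartbeat.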
The KubernetesPodOperator is provided as part of open-source Airflow, and more details about the behavior can be found below.
Use case/motivation
No response
Related issues
No response
Are you willing to submit a PR?
Code of Conduct