I observed that some workers stopped randomly while running.
After some investigation, the issue is in the new KubernetesPodOperator and depends on a known issue in the Kubernetes API.
When a log rotation event occurs in Kubernetes, the stream we consume with `fetch_container_logs(follow=True, ...)` is no longer fed.
Therefore, the KubernetesPodOperator hangs indefinitely in the middle of the log. Only a SIGTERM can terminate it, since log consumption blocks `execute()` from finishing.
Ref to the issue in Kubernetes: kubernetes/kubernetes#59902
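For illustration, here is a minimal sketch (not the Airflow implementation) of how following a pod's log stream with the official Kubernetes Python client can block forever; the pod name and namespace below are placeholders:

```python
from kubernetes import client, config

config.load_kube_config()
core_v1 = client.CoreV1Api()

# Request the container log as a raw, streaming HTTP response.
resp = core_v1.read_namespaced_pod_log(
    name="example-pod",       # placeholder
    namespace="default",      # placeholder
    follow=True,
    _preload_content=False,   # keep it as a stream instead of reading it all
)

# This loop only ends when the API server closes the stream. If a log
# rotation leaves the stream open but no longer fed
# (kubernetes/kubernetes#59902), the loop blocks indefinitely and the
# caller (e.g. the operator's execute()) never returns.
for chunk in resp.stream():
    print(chunk.decode(errors="replace"), end="")
```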
However, I think there are several ways to work around this on the Airflow side (e.g. making log consumption non-blocking and instead blocking only until the pod's `status.phase` reports completion, as is currently done when `get_logs` is not true).
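As a rough sketch of that kind of workaround (assuming the official Kubernetes Python client; the function name, pod name, and poll interval are made up for illustration), polling the pod status until a terminal phase instead of blocking on the log stream could look like this:

```python
import time
from kubernetes import client, config

config.load_kube_config()
core_v1 = client.CoreV1Api()

def wait_for_pod_completion(name: str, namespace: str, interval: float = 5.0) -> str:
    """Poll the pod until its status.phase is terminal, then return the phase."""
    while True:
        pod = core_v1.read_namespaced_pod(name=name, namespace=namespace)
        phase = pod.status.phase
        if phase in ("Succeeded", "Failed"):
            return phase
        time.sleep(interval)

# Usage (names are placeholders):
# final_phase = wait_for_pod_completion("example-pod", "default")
```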
Linking #12103 for reference, as the result is more or less the same (although the root cause is different).