Tasks stuck indefinitely with follow=true #23496

Closed
schattian opened this issue May 5, 2022 · 1 comment

Comments

@schattian
Contributor

schattian commented May 5, 2022

Closed in favor of #23497

I observed that some workers stopped at random after running for a while.
After some investigation, I found that the issue is in the new Kubernetes pod operator and depends on an existing issue in the Kubernetes API.

When a log rotation event occurs in Kubernetes, the stream we consume in fetch_container_logs(follow=True,...) is no longer fed.

As a result, the k8s pod operator hangs indefinitely in the middle of the log. Only a SIGTERM can terminate it, because log consumption blocks execute() from finishing.

Ref to the issue in kubernetes: kubernetes/kubernetes#59902

However, I think there are several ways to work around this on the Airflow side (for example, making log consumption non-blocking and waiting until status.phase.completed, as is currently done when get_logs is not true).
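
To illustrate the idea, here is a minimal sketch, not Airflow's actual code. It reads the log stream in a daemon thread and lets the caller block only on the pod's completion status, so a stream that goes silent cannot hang the task. All names here (`consume_logs_until_done`, `pod_is_done`, `stalled_stream`) are hypothetical, and the stalled stream is simulated with a generator rather than a real Kubernetes client.

```python
import threading
import time
from typing import Callable, Iterable, Iterator, List


def consume_logs_until_done(
    log_stream: Iterable[str],
    pod_is_done: Callable[[], bool],
    on_line: Callable[[str], None],
    poll_interval: float = 0.05,
) -> None:
    """Read logs in a daemon thread; return once the pod reports completion.

    Hypothetical helper: `pod_is_done` stands in for a check of the pod phase.
    """

    def _reader() -> None:
        # May block forever if the stream stops being fed (e.g. after log rotation).
        for line in log_stream:
            on_line(line)

    reader = threading.Thread(target=_reader, daemon=True)
    reader.start()

    # The caller is blocked by pod status only, never by the log stream.
    while not pod_is_done():
        time.sleep(poll_interval)

    # Short grace period so the reader can flush lines that are already available.
    reader.join(timeout=poll_interval)


def stalled_stream(stall: threading.Event) -> Iterator[str]:
    """Simulated log stream that goes silent after two lines."""
    yield "line 1"
    yield "line 2"
    stall.wait()  # stands in for the stream that is no longer fed


def demo() -> List[str]:
    stall = threading.Event()
    seen: List[str] = []
    deadline = time.monotonic() + 0.2  # pretend the pod completes after 0.2 s
    consume_logs_until_done(
        stalled_stream(stall),
        pod_is_done=lambda: time.monotonic() >= deadline,
        on_line=seen.append,
    )
    stall.set()  # release the simulated reader so its thread can exit
    return seen
```

In this sketch `demo()` returns even though the stream never ends. The trade-off is that any log lines produced after the stream stalls are lost unless the stream is reopened, which this sketch does not attempt.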

Linking #12103 for reference, since the result is more or less the same (although the root cause is different).

@boring-cyborg

boring-cyborg bot commented May 5, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!
