KubernetesPodOperator pod is still running but task is marked as failed #10325

Closed
art-i-svsg opened this issue Aug 14, 2020 · 2 comments
Labels: kind:bug (This is a clearly a bug), provider:cncf-kubernetes (Kubernetes provider related issues)

Comments

@art-i-svsg

Apache Airflow version: 1.10.9

Kubernetes version (if you are using kubernetes) (use kubectl version): 1.14

Environment:

What happened:

We run Airflow with the Celery executor, but our tasks are implemented using KubernetesPodOperator. We create DAG runs with a set of tasks and run them as pods. Some tasks run for 40 minutes or more. Quite often a pod is still running and actively doing its work, but Airflow marks the task as failed and either retries it or, if no retries are left, leaves it marked as failed. Sometimes the reverse happens: the pod is stuck in Running while the task shows a success status. We currently have a single worker pod, which starts task execution, and we have noticed that this worker is frequently OOMKilled because it runs low on memory. At other times the tasks run just fine.

This might be related to this bug: https://issues.apache.org/jira/browse/AIRFLOW-6580

What you expected to happen:

We expect the pod to run for as long as needed, and the task to reflect the real status of the underlying pod.
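
To make the expected behaviour concrete, here is a minimal sketch of the mismatch cases described above. It is not Airflow code: `describe_mismatch` is a hypothetical helper, and it assumes the task state string comes from the Airflow task instance and the pod phase from the Kubernetes pod status (`pod.status.phase`).

```python
from typing import Optional

# Kubernetes pod phases in which the container may still be doing work.
ACTIVE_POD_PHASES = {"Pending", "Running"}
# Kubernetes pod phases in which the pod has terminated.
TERMINAL_POD_PHASES = {"Succeeded", "Failed"}


def describe_mismatch(task_state: str, pod_phase: str) -> Optional[str]:
    """Hypothetical helper: return a description when the Airflow task state
    disagrees with the pod phase, or None when the two are consistent."""
    if task_state in {"failed", "up_for_retry"} and pod_phase in ACTIVE_POD_PHASES:
        return "task marked failed while pod is still active"
    if task_state == "success" and pod_phase in ACTIVE_POD_PHASES:
        return "task marked success while pod is still active"
    if task_state == "running" and pod_phase in TERMINAL_POD_PHASES:
        return "task still running after pod finished"
    return None


if __name__ == "__main__":
    # The two situations reported in this issue.
    print(describe_mismatch("failed", "Running"))
    print(describe_mismatch("success", "Running"))
```

Both calls in the usage example correspond to the situations we observe; in the expected behaviour every task/pod pair would return `None`.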

Anything else we need to know:

Our tasks run every night. The problem affects 2-3 tasks either every day or every other day; on other nights everything runs fine.

This really impacts our production services and any help is highly appreciated!

art-i-svsg added the kind:bug label Aug 14, 2020
@boring-cyborg

boring-cyborg bot commented Aug 14, 2020

Thanks for opening your first issue here! Be sure to follow the issue template!

eladkal added the provider:cncf-kubernetes label Nov 19, 2020
@eladkal
Contributor

eladkal commented Sep 30, 2021

This may have been solved by #10230.
If the issue still happens on the latest Airflow version and Kubernetes provider, let us know.
Closing for now.

eladkal closed this as completed Sep 30, 2021