-
I believe there were fixes to some of the Kubernetes communication in 2.6 (you can take a look at the changelog there), and I think it would be worth migrating to see whether the problems you have are fixed there. Other than that, if you gather enough evidence (logs, correlated events) to make a useful analysis, and maybe even come up with a simple, reproducible scenario, I'd encourage you to open an issue with all the details. It's hard to say anything from "Received SIGTERM" alone. There should be information in the logs (either Airflow's or those of your K8S deployment) stating who is sending SIGTERM and why, and correlating those might be the key to diagnosing the issue.
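If the logs don't say who sent the signal, a rough way to find out on Linux is to have the receiving process report the sender itself. Below is a minimal sketch using only Python's standard `signal` module (`pthread_sigmask`, `sigwaitinfo`, `sigtimedwait`), which expose the sender's PID and UID. The helper name `wait_for_sigterm_sender` is hypothetical and not part of Airflow. It assumes a Linux host and a process where SIGTERM is blocked in every thread (e.g. a single-threaded wrapper). Note that when the sender is outside the receiver's PID namespace (for example the container runtime acting on the kubelet's behalf), the PID is reported as 0, which by itself hints that the signal came from outside the container.

```python
import signal


def wait_for_sigterm_sender(timeout=None):
    """Wait for SIGTERM and return (sender_pid, sender_uid), or None on timeout.

    Hypothetical diagnostic helper, Linux only. SIGTERM is blocked first so it
    stays pending for us instead of running its default action (termination).
    """
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})
    if timeout is None:
        # Block indefinitely until a SIGTERM is pending.
        info = signal.sigwaitinfo({signal.SIGTERM})
    else:
        info = signal.sigtimedwait({signal.SIGTERM}, timeout)
        if info is None:  # no SIGTERM arrived within the timeout
            return None
    # si_pid is 0 if the sender is not visible in this process's PID namespace.
    return info.si_pid, info.si_uid
```

For example, calling `wait_for_sigterm_sender()` from a small wrapper process and then running `kill -TERM <pid>` from another shell should return that shell's PID and UID. This is only a debugging aid; it changes how the process reacts to SIGTERM, so it should not be left in a production task.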
-
@tatitati, were you able to solve the SIGTERM issue in recent versions?
-
Hi guys, we have an Airflow deployment that uses the following components:
We are running around 10 DAGs.
Each DAG has 10 parallel tasks.
Each of these tasks simply runs a batch job in another AWS account, which does nothing but a simple sleep 200 (our way of emulating a heavy process in another account).
That said, what we observe is that when we start all these DAGs, the first tasks complete successfully. However, after some of them have completed, the rest start to fail.
We have debugged this and found that, for some reason, everything works if we run with only one scheduler, but when we add more schedulers, our tasks start to fail after a few minutes. That is when our tasks receive SIGTERM and fail.
I can confirm that our liveness probes are correct (no errors there).