Scheduler goes down for 1-2 minutes every 10 minutes as completed pods increase in EKS #22612
Open
2 tasks done
Labels
affected_version:2.2
Issues Reported for 2.2
affected_version:2.3
Issues Reported for 2.3
area:core
area:performance
area:Scheduler
Scheduler or dag parsing Issues
kind:bug
This is a clearly a bug
provider:cncf-kubernetes
Kubernetes provider related issues
Apache Airflow version
2.2.4 (latest released)
What happened
Hi Team, I am using Airflow 2.2.4, deployed on an AWS EKS cluster. Every 5-10 minutes, a "scheduler down" message appears in the Airflow UI. When I checked the Airflow scheduler log, I saw many statements like the one below.
[2022-03-21 08:21:21,640] {kubernetes_executor.py:729} INFO - Attempting to adopt pod sampletask.05b6f567b4a64bd5beb16e526ba94d7a
The statement above is printed for every completed pod that exists in EKS, but it repeats multiple times, and each occurrence also invokes the PATCH API.
As I understand it, the code that emits this log line (kubernetes_executor.py:729) pulls the details of all completed pods from the EKS cluster on every pass and invokes the PATCH API on each completed pod. For 1,000 completed pods this activity finishes in 1 minute; for 7,000 completed pods it takes 3-5 minutes, and that is why the scheduler goes down.
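To make the suspected behaviour concrete, here is a minimal, self-contained sketch of the pattern described above. It is not Airflow's code: `FakePod`, `FakeKubeClient`, `adopt_completed_pods`, and the `owner-scheduler` label key are all hypothetical names used only for illustration. It assumes each pass lists every completed pod and issues one PATCH per pod, so the work per pass grows linearly with the number of completed pods that are never deleted.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class FakePod:
    """Hypothetical stand-in for a Kubernetes pod object."""
    name: str
    phase: str
    labels: Dict[str, str] = field(default_factory=dict)


class FakeKubeClient:
    """Hypothetical in-memory client that only counts PATCH calls."""

    def __init__(self, pods: Iterable[FakePod]) -> None:
        self.pods: List[FakePod] = list(pods)
        self.patch_calls = 0

    def list_completed_pods(self) -> List[FakePod]:
        # Completed pods are modelled as pods in the "Succeeded" phase.
        return [p for p in self.pods if p.phase == "Succeeded"]

    def patch_pod_labels(self, pod: FakePod, labels: Dict[str, str]) -> None:
        # One call here models one PATCH request to the API server.
        self.patch_calls += 1
        pod.labels.update(labels)


def adopt_completed_pods(client: FakeKubeClient, scheduler_id: str) -> int:
    """Run one adoption pass and return the number of PATCH calls it made."""
    before = client.patch_calls
    for pod in client.list_completed_pods():
        # The label key is illustrative, not taken from Airflow.
        client.patch_pod_labels(pod, {"owner-scheduler": scheduler_id})
    return client.patch_calls - before


# Completed pods are never deleted here, so every pass patches all of them again.
demo_client = FakeKubeClient(
    FakePod(f"sampletask-{i}", "Succeeded") for i in range(1000)
)
first_pass = adopt_completed_pods(demo_client, "scheduler-1")
second_pass = adopt_completed_pods(demo_client, "scheduler-1")
print(first_pass, second_pass)  # prints: 1000 1000
```

In this model nothing marks a pod as already handled, so the second pass costs as much as the first. If the real executor behaves this way, that would match the repeated "Attempting to adopt pod" lines and the longer stalls as completed pods accumulate.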
What you think should happen instead
The scheduler stays healthy when we set "delete_worker_pods = True". But with "delete_worker_pods = False", once the completed pod count reaches 7,000 to 10,000, the scheduler goes down.
The scheduler should be healthy irrespective of how many completed pods exist in the EKS cluster.
How to reproduce
Deploy Airflow in a k8s cluster and set "delete_worker_pods = False". Once the number of completed pods reaches 7,000 to 10,000, you will be able to see this issue.
Operating System
OS:Debian GNU/Linux, VERSION: 10
Versions of Apache Airflow Providers
No response
Deployment
Other Docker-based deployment
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
Code of Conduct