Description
Apache Airflow version
Other Airflow 2 version (please specify below)
What happened
Airflow version 2.3.4
I was getting a DagBag import timeout error. One of the DAGs has around 350 tasks, and loading and processing it was causing the timeout.
While debugging the issue, I made the following config changes:
parallelism = 1800
max_active_tasks_per_dag = 200
dagbag_import_timeout = 1200.0
dagbag_import_error_traceback_depth = 30
dag_file_processor_timeout = 1000
default_task_retry_delay = 3600
min_serialized_dag_fetch_interval = 20
sql_alchemy_pool_recycle = 600
max_db_retries = 10
Of the settings above, only the DagBag-related changes turned out to be helpful, but I realized that only after figuring out the issue.
The other config changes seem to have created a side effect: I have started getting the error below once in a while. I need to understand which config changes might be causing it.
File "/home/alpha/.local/lib/python3.8/site-packages/airflow/providers/ssh/operators/ssh.py", line 158, in execute
result = self.run_ssh_client_command(ssh_client, self.command)
File "/home/alpha/.local/lib/python3.8/site-packages/airflow/providers/ssh/operators/ssh.py", line 143, in run_ssh_client_command
exit_status, agg_stdout, agg_stderr = self.ssh_hook.exec_ssh_client_command(
File "/home/alpha/.local/lib/python3.8/site-packages/airflow/providers/ssh/hooks/ssh.py", line 494, in exec_ssh_client_command
readq, _, _ = select([channel], [], [], timeout)
ValueError: filedescriptor out of range in select()
What you think should happen instead
Airflow jobs should not fail with the above error. It looks like we are hitting a file descriptor limit in the SSH hook, but I don't clearly understand why that is the case or where all of the active file descriptors are being held.
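For context, Python's `select.select()` raises this `ValueError` when a descriptor's numeric value is at or above `FD_SETSIZE`, independent of the `ulimit -n` setting. The sketch below is one way to check how many descriptors a process holds and how high the numbers go. It is not part of Airflow: `fd_summary` and the hard-coded `FD_SETSIZE = 1024` are assumptions for illustration (1024 is the usual value on Linux), and the `/proc` lookup is Linux-only.

```python
import os

# Assumption: FD_SETSIZE is 1024, the usual value on Linux/glibc.
# Python does not expose this constant, so it is hard-coded here.
FD_SETSIZE = 1024


def fd_summary(fd_dir="/proc/self/fd"):
    """Summarize the descriptors open in this process (Linux-only, reads /proc).

    Returns None when fd_dir does not exist (for example on a platform
    without /proc).
    """
    if not os.path.isdir(fd_dir):
        return None
    # Each entry is named after a descriptor number. The listing briefly
    # includes the descriptor that os.listdir() itself opens.
    fds = sorted(int(name) for name in os.listdir(fd_dir) if name.isdigit())
    return {
        "open_count": len(fds),
        "highest_fd": fds[-1],
        # select() rejects any descriptor numbered FD_SETSIZE or higher.
        "select_would_fail": fds[-1] >= FD_SETSIZE,
    }


if __name__ == "__main__":
    print(fd_summary())
```

Calling `fd_summary()` from inside a failing task, or pointing `fd_dir` at `/proc/<pid>/fd` for a worker process, should show whether descriptor numbers are approaching 1024 at the time of the error.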
How to reproduce
Set the config values mentioned above and run jobs with a large number of stages.
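The same `ValueError` can probably be reproduced outside Airflow, which may help separate the descriptor-number problem from the config changes. The following is a minimal sketch, assuming a POSIX system where the soft `RLIMIT_NOFILE` limit can be raised above the target number. The function name `reproduce` and the target value `1500` are hypothetical choices for illustration.

```python
import fcntl
import os
import resource
import select
import socket


def reproduce(target_fd=1500):
    """Call select() on a descriptor numbered at or above target_fd.

    Returns the ValueError message, "" if select() succeeded, or None if
    the experiment could not be set up on this system.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = target_fd + 64
    unlimited = resource.RLIM_INFINITY
    if hard != unlimited and hard < needed:
        return None  # hard limit too low to create such a descriptor
    raised = soft != unlimited and soft < needed
    if raised:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (needed, hard))
        except (ValueError, OSError):
            return None
    left, right = socket.socketpair()
    high_fd = None
    try:
        # F_DUPFD duplicates onto the lowest free number >= target_fd.
        high_fd = fcntl.fcntl(left.fileno(), fcntl.F_DUPFD, target_fd)
        select.select([high_fd], [], [], 0)
        return ""
    except ValueError as exc:
        return str(exc)
    except OSError:
        return None
    finally:
        if high_fd is not None:
            os.close(high_fd)
        left.close()
        right.close()
        if raised:
            # Restore the original soft limit.
            resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    print(reproduce())
```

If this prints the same message as the traceback, the failure depends only on the descriptor number handed to `select()`, not on anything specific to SSH.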
Operating System
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
Versions of Apache Airflow Providers
[03:37 AM]alpha@devops18:airflow$ pip freeze | grep "apache-airflow-provider"
/usr/lib/python3/dist-packages/secretstorage/dhcrypto.py:15: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead
from cryptography.utils import int_from_bytes
/usr/lib/python3/dist-packages/secretstorage/util.py:19: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead
from cryptography.utils import int_from_bytes
apache-airflow-providers-common-sql==1.2.0
apache-airflow-providers-ftp==3.1.0
apache-airflow-providers-http==4.0.0
apache-airflow-providers-imap==3.0.0
apache-airflow-providers-sqlite==3.2.1
apache-airflow-providers-ssh==3.3.0
Deployment
Virtualenv installation
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct