
Airflow: filedescriptor out of range in select #30062

@shubhampatel94

Description


Apache Airflow version

Other Airflow 2 version (please specify below)

What happened

Airflow version 2.3.4

I was getting DagBag timeout errors. One of the DAGs has around 350 tasks, and loading and processing it was causing the timeouts.
While debugging the issue, I made the following config changes:
parallelism = 1800
max_active_tasks_per_dag = 200
dagbag_import_timeout = 1200.0
dagbag_import_error_traceback_depth = 30
dag_file_processor_timeout = 1000
default_task_retry_delay = 3600
min_serialized_dag_fetch_interval = 20
sql_alchemy_pool_recycle = 600
max_db_retries = 10

Of the changes above, only the dag_bag-related ones turned out to be helpful, but I realized that only after figuring out the issue.

However, the other config changes seem to have had a side effect: I started getting the following error once in a while. I need to understand which config changes might be causing it.

  File "/home/alpha/.local/lib/python3.8/site-packages/airflow/providers/ssh/operators/ssh.py", line 158, in execute
    result = self.run_ssh_client_command(ssh_client, self.command)
  File "/home/alpha/.local/lib/python3.8/site-packages/airflow/providers/ssh/operators/ssh.py", line 143, in run_ssh_client_command
    exit_status, agg_stdout, agg_stderr = self.ssh_hook.exec_ssh_client_command(
  File "/home/alpha/.local/lib/python3.8/site-packages/airflow/providers/ssh/hooks/ssh.py", line 494, in exec_ssh_client_command
    readq, _, _ = select([channel], [], [], timeout)
ValueError: filedescriptor out of range in select()
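For context, this `ValueError` is raised by Python's `select.select()` itself rather than by SSH: on Linux, `select()` can only watch descriptor *numbers* below `FD_SETSIZE` (1024), no matter how high `ulimit -n` is set. The minimal sketch below assumes Linux/glibc, and its helper names (`readable_via_select`, `readable_via_poll`) are hypothetical. It shows that check rejecting descriptor number 1024, and a `poll()`-based wait, which has no such ceiling (`selectors.DefaultSelector`, which uses epoll on Linux, avoids it as well). It illustrates the failure mode only and is not the SSH hook's actual code.

```python
import os
import select

# Assumption: Linux/glibc, where FD_SETSIZE is 1024. It is a compile-time
# limit of select(), independent of the process's `ulimit -n` value.
FD_SETSIZE = 1024


def readable_via_select(fd: int, timeout: float) -> bool:
    """Same call shape as in the traceback: select([fd], [], [], timeout)."""
    readable, _, _ = select.select([fd], [], [], timeout)
    return bool(readable)


def readable_via_poll(fd: int, timeout: float) -> bool:
    """poll() registers descriptor numbers directly, so there is no FD_SETSIZE ceiling."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    events = poller.poll(int(timeout * 1000))  # poll() takes milliseconds
    return any(mask & select.POLLIN for _, mask in events)


if __name__ == "__main__":
    # select() rejects any descriptor number >= FD_SETSIZE before it even
    # checks whether that descriptor is open.
    try:
        readable_via_select(FD_SETSIZE, 0)
    except ValueError as exc:
        print("select:", exc)  # select: filedescriptor out of range in select()

    # poll() reports readiness on an ordinary pipe just as select() would.
    r, w = os.pipe()
    os.write(w, b"x")
    print("poll:", readable_via_poll(r, 1.0))  # poll: True
    os.close(r)
    os.close(w)
```

In other words, the error indicates that the descriptor being waited on had been assigned a number of 1024 or higher, which happens once the worker process already holds roughly a thousand open descriptors.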

What you think should happen instead

Airflow jobs should not fail with the above error. It looks like we are hitting an SSH-related file descriptor limit here, but I don't clearly understand why that is the case or where all of the active file descriptors are being held.
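One way to see where a worker's descriptors are going is to inspect `/proc/<pid>/fd` on Linux, where each entry is a symlink to a socket, pipe, anonymous inode, or file. The sketch below is a minimal, Linux-only helper (`describe_open_fds` is a hypothetical name, not part of Airflow) that counts a process's descriptors by kind and reports the highest descriptor number, which is the value `select()` cares about. `lsof -p <pid>` gives similar information in more detail.

```python
import os
from collections import Counter


def describe_open_fds(pid: str = "self") -> tuple[Counter, int]:
    """Count open descriptors by kind and return the highest descriptor number.

    Linux-only: each entry in /proc/<pid>/fd is a symlink whose target looks
    like 'socket:[12345]', 'pipe:[67890]', 'anon_inode:[eventpoll]', or a path.
    """
    fd_dir = f"/proc/{pid}/fd"
    kinds: Counter = Counter()
    highest = -1
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except FileNotFoundError:
            # Closed between listdir() and readlink(), e.g. the descriptor
            # listdir() itself used to read the directory.
            continue
        highest = max(highest, int(name))
        if target.startswith(("socket:", "pipe:", "anon_inode:")):
            kinds[target.split(":", 1)[0]] += 1
        else:
            kinds["file"] += 1
    return kinds, highest


if __name__ == "__main__":
    counts, top = describe_open_fds()
    print(dict(counts), "highest fd:", top)
```

For example, `describe_open_fds("12345")` would inspect the process with PID 12345; reading another user's process generally requires running as that user or as root. If the highest number approaches 1024 while tasks run, the largest category points at what is accumulating.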

How to reproduce

Set the config values listed above
and run jobs with a large number of stages.

Operating System

NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Versions of Apache Airflow Providers

[03:37 AM]alpha@devops18:airflow$ pip freeze | grep "apache-airflow-provider"
/usr/lib/python3/dist-packages/secretstorage/dhcrypto.py:15: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead
from cryptography.utils import int_from_bytes
/usr/lib/python3/dist-packages/secretstorage/util.py:19: CryptographyDeprecationWarning: int_from_bytes is deprecated, use int.from_bytes instead
from cryptography.utils import int_from_bytes
apache-airflow-providers-common-sql==1.2.0
apache-airflow-providers-ftp==3.1.0
apache-airflow-providers-http==4.0.0
apache-airflow-providers-imap==3.0.0
apache-airflow-providers-sqlite==3.2.1
apache-airflow-providers-ssh==3.3.0

Deployment

Virtualenv installation

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
