Skip to content

Airflow Scheduler making large amounts of DNS Requests #14068

@caljnj

Description

@caljnj

Apache Airflow version:1.10.9

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): CentOS Linux 8 (Core)
  • Kernel (e.g. uname -a): 4.18.0-147.5.1.el8_1.x86_64
  • Install tools:
  • Others:

What happened:

Airflow is producing a crazy amount of DNS queries (a few hundred per min) for requestion name resolution for database server we have specified in the sql_alchemy_conn variable

via auditd the logs look like this:

type=PROCTITLE msg=audit(02/04/21 13:17:39.711:630677) : proctitle=airflow scheduler -- DagFileProcessorManager
type=SOCKADDR msg=audit(02/04/21 13:17:39.711:630677) : saddr={ fam=inet laddr=192.168.8.8 lport=53 }
type=SYSCALL msg=audit(02/04/21 13:17:39.711:630677) : arch=x86_64 syscall=connect success=yes exit=0 a0=0x1e a1=0x7ff852737934 a2=0x10 a3=0x7ffea729eef4 items=0 ppid=5388 pid=14411 auid=someuser uid=someuser gid=someuser euid=someuser suid=someuser fsuid=someuser egid=someuser sgid=someuser fsgid=someuser tty=(none) ses=1 comm=airflow schedul exe=/usr/libexec/platform-python3.6 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=connect

where the process 5388 looked like this:

someuser  5388  5328  0 Jan20 ?        02:45:50 airflow scheduler -- DagFileProcessorManager
someuser 15018  5388 10 13:18 ?        00:00:00 airflow scheduler - DagFileProcessor /home/someuser/airflow/dags/sykdomspulsen_test.py
someuser 15020  5388  4 13:18 ?        00:00:00 [airflow schedul] <defunct>

What you expected to happen:

I would appreciate some help setting up a debug environment with the same config etc that i can use to tracing the code to see where these DNS calls are being made...

I thought i coudl fix it by setting a static DNS record in the hosts file but that completely broke airflow.

see below:

fhi.no airflow[5453]: [2021-02-04 13:46:56 +0100] [4154] [ERROR] Exception in worker process
Feb 04 13:46:56 someserver airflow[5453]: Traceback (most recent call last):
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 2285, in _wrap_poo
l_connect
Feb 04 13:46:56 someserver airflow[5453]:     return fn()
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 363, in connect
Feb 04 13:46:56 someserver airflow[5453]:     return _ConnectionFairy._checkout(self)
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 773, in _checkout
Feb 04 13:46:56 someserver airflow[5453]:     fairy = _ConnectionRecord.checkout(pool)
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 492, in checkout
Feb 04 13:46:56 someserver airflow[5453]:     rec = pool._do_get()
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/impl.py", line 139, in _do_get
Feb 04 13:46:56 someserver airflow[5453]:     self._dec_overflow()
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit
__
Feb 04 13:46:56 someserver airflow[5453]:     exc_value, with_traceback=exc_tb,
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
Feb 04 13:46:56 someserver airflow[5453]:     raise exception
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/impl.py", line 136, in _do_get
Feb 04 13:46:56 someserver airflow[5453]:     return self._create_connection()
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 308, in _create_conn
ection
Feb 04 13:46:56 someserver airflow[5453]:     return _ConnectionRecord(self)
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 437, in __init__
Feb 04 13:46:56 someserver airflow[5453]:     self.__connect(first_connect_check=True)
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 657, in __connect
Feb 04 13:46:56 someserver airflow[5453]:     pool.logger.debug("Error on connect(): %s", e)
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit
__
Feb 04 13:46:56 someserver airflow[5453]:     exc_value, with_traceback=exc_tb,
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
Feb 04 13:46:56 someserver airflow[5453]:     raise exception
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 652, in __connect
Feb 04 13:46:56 someserver airflow[5453]:     connection = pool._invoke_creator(self)
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 114, in conn
ect
Feb 04 13:46:56 someserver airflow[5453]:     return dialect.connect(*cargs, **cparams)
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 488, in connect
Feb 04 13:46:56 someserver airflow[5453]:     return self.dbapi.connect(*cargs, **cparams)
Feb 04 13:46:56 someserver airflow[5453]: pyodbc.Error: ('HY000', '[HY000] [Microsoft][ODBC Driver 17 for SQL Server]SSPI Provider: Server not
 found in Kerberos database (851968) (SQLDriverConnect)')
Feb 04 13:46:56 someserver airflow[5453]: The above exception was the direct cause of the following exception:
Feb 04 13:46:56 someserver airflow[5453]: Traceback (most recent call last):
Feb 04 13:46:56 someserver airflow[5453]:   File "/home/someuser/.local/lib/python3.6/site-packages/gunicorn/arbiter.py", line 586, in spawn_w
orker
Feb 04 13:46:56 someserver airflow[5453]:     worker.init_process()
Feb 04 13:46:56 someserver airflow[5453]:   File "/home/someuser/.local/lib/python3.6/site-packages/gunicorn/workers/base.py", line 135, in in
it_process
Feb 04 13:46:56 someserver airflow[5453]:     self.load_wsgi()
Feb 04 13:46:56 someserver airflow[5453]:   File "/home/someuser/.local/lib/python3.6/site-packages/gunicorn/workers/base.py", line 144, in lo
ad_wsgi
Feb 04 13:46:56 someserver airflow[5453]:     self.wsgi = self.app.wsgi()
Feb 04 13:46:56 someserver airflow[5453]:   File "/home/someuser/.local/lib/python3.6/site-packages/gunicorn/app/base.py", line 67, in wsgi
Feb 04 13:46:56 someserver airflow[5453]:     self.callable = self.load()
Feb 04 13:46:56 someserver airflow[5453]:   File "/home/someuser/.local/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 52, in load
Feb 04 13:46:56 someserver airflow[5453]:     return self.load_wsgiapp()
Feb 04 13:46:56 someserver airflow[5453]:   File "/home/someuser/.local/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 41, in load
_wsgiapp
Feb 04 13:46:56 someserver airflow[5453]:     return util.import_app(self.app_uri)
Feb 04 13:46:56 someserver airflow[5453]:   File "/home/someuser/.local/lib/python3.6/site-packages/gunicorn/util.py", line 361, in import_app
Feb 04 13:46:56 someserver airflow[5453]:     app = eval(obj, vars(mod))
Feb 04 13:46:56 someserver airflow[5453]:   File "<string>", line 1, in <module>
Feb 04 13:46:56 someserver airflow[5453]:   File "/home/someuser/.local/lib/python3.6/site-packages/airflow/www/app.py", line 233, in cached_a
pp
Feb 04 13:46:56 someserver airflow[5453]:     app = create_app(config, testing)
Feb 04 13:46:56 someserver airflow[5453]:   File "/home/someuser/.local/lib/python3.6/site-packages/airflow/www/app.py", line 88, in create_ap
p
Feb 04 13:46:56 someserver airflow[5453]:     from airflow.www import views
Feb 04 13:46:56 someserver airflow[5453]:   File "/home/someuser/.local/lib/python3.6/site-packages/airflow/www/views.py", line 2443, in <modu
le>
Feb 04 13:46:56 someserver airflow[5453]:     class ChartModelView(wwwutils.DataProfilingMixin, AirflowModelView):
Feb 04 13:46:56 someserver airflow[5453]:   File "/home/someuser/.local/lib/python3.6/site-packages/airflow/www/views.py", line 2534, in Chart
ModelView
Feb 04 13:46:56 someserver airflow[5453]:     'conn_id': _connection_ids()
Feb 04 13:46:56 someserver airflow[5453]:   File "/home/someuser/.local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
Feb 04 13:46:56 someserver airflow[5453]:     return func(*args, **kwargs)
Feb 04 13:46:56 someserver airflow[5453]:   File "/home/someuser/.local/lib/python3.6/site-packages/airflow/www/views.py", line 2440, in _conn
ection_ids
Feb 04 13:46:56 someserver airflow[5453]:     .group_by(Connection.conn_id))]
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3403, in __iter__
Feb 04 13:46:56 someserver airflow[5453]:     return self._execute_and_instances(context)
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3425, in _execute_an
d_instances
Feb 04 13:46:56 someserver airflow[5453]:     querycontext, self._connection_from_session, close_with_result=True
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3440, in _get_bind_a
rgs
Feb 04 13:46:56 someserver airflow[5453]:     mapper=self._bind_mapper(), clause=querycontext.statement, **kw
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3418, in _connection
_from_session
Feb 04 13:46:56 someserver airflow[5453]:     conn = self.session.connection(**kw)
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 1133, in connectio
n
Feb 04 13:46:56 someserver airflow[5453]:     execution_options=execution_options,
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 1139, in _connecti
on_for_bind
Feb 04 13:46:56 someserver airflow[5453]:     engine, execution_options
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 432, in _connectio
n_for_bind
Feb 04 13:46:56 someserver airflow[5453]:     conn = bind._contextual_connect()
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 2251, in _contextu
al_connect
Feb 04 13:46:56 someserver airflow[5453]:     self._wrap_pool_connect(self.pool.connect, None),
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 2289, in _wrap_poo
l_connect
Feb 04 13:46:56 someserver airflow[5453]:     e, dialect, self
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1555, in _handle_d
bapi_exception_noconnection
Feb 04 13:46:56 someserver airflow[5453]:     sqlalchemy_exception, with_traceback=exc_info[2], from_=e
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
Feb 04 13:46:56 someserver airflow[5453]:     raise exception
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 2285, in _wrap_poo
l_connect
Feb 04 13:46:56 someserver airflow[5453]:     return fn()
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 363, in connect
Feb 04 13:46:56 someserver airflow[5453]:     return _ConnectionFairy._checkout(self)
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 773, in _checkout
Feb 04 13:46:56 someserver airflow[5453]:     fairy = _ConnectionRecord.checkout(pool)
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 492, in checkout
Feb 04 13:46:56 someserver airflow[5453]:     rec = pool._do_get()
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/impl.py", line 139, in _do_get
Feb 04 13:46:56 someserver airflow[5453]:     self._dec_overflow()
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit
__
Feb 04 13:46:56 someserver airflow[5453]:     exc_value, with_traceback=exc_tb,
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
Feb 04 13:46:56 someserver airflow[5453]:     raise exception
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/impl.py", line 136, in _do_get
Feb 04 13:46:56 someserver airflow[5453]:     return self._create_connection()
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 308, in _create_conn
ection
Feb 04 13:46:56 someserver airflow[5453]:     return _ConnectionRecord(self)
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 437, in __init__
Feb 04 13:46:56 someserver airflow[5453]:     self.__connect(first_connect_check=True)
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 657, in __connect
Feb 04 13:46:56 someserver airflow[5453]:     pool.logger.debug("Error on connect(): %s", e)
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit
__
Feb 04 13:46:56 someserver airflow[5453]:     exc_value, with_traceback=exc_tb,
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
Feb 04 13:46:56 someserver airflow[5453]:     raise exception
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 652, in __connect
Feb 04 13:46:56 someserver airflow[5453]:     connection = pool._invoke_creator(self)
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 114, in conn
ect
Feb 04 13:46:56 someserver airflow[5453]:     return dialect.connect(*cargs, **cparams)
Feb 04 13:46:56 someserver airflow[5453]:   File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 488, in connect
Feb 04 13:46:56 someserver airflow[5453]:     return self.dbapi.connect(*cargs, **cparams)
Feb 04 13:46:56 someserver airflow[5453]: sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('HY000', '[HY000] [Microsoft][ODBC Driver 17 for SQL Serv
er]SSPI Provider: Server not found in Kerberos database (851968) (SQLDriverConnect)')
Feb 04 13:46:56 someserver airflow[5453]: (Background on this error at: http://sqlalche.me/e/dbapi)
Feb 04 13:46:56 someserver airflow[5453]: [2021-02-04 13:46:56 +0100] [4154] [INFO] Worker exiting (pid: 4154)
Feb 04 13:46:56 someserver airflow[5453]: [2021-02-04 13:46:56 +0100] [3569] [INFO] Worker exiting (pid: 3569)[2021-02-04 13:46:56 +0100] [395
0] [INFO] Worker exiting (pid: 3950)
Feb 04 13:46:56 someserver airflow[5453]: [2021-02-04 13:46:56 +0100] [3192] [INFO] Worker exiting (pid: 3192)
Feb 04 13:46:56 someserver airflow[5453]: [2021-02-04 13:46:56 +0100] [2804] [INFO] Worker exiting (pid: 2804)
Feb 04 13:46:56 someserver airflow[5453]: [2021-02-04 13:46:56 +0100] [5481] [INFO] Shutting down: Master
Feb 04 13:46:56 someserver airflow[5453]: [2021-02-04 13:46:56 +0100] [5481] [INFO] Reason: Worker failed to boot.
Feb 04 13:49:26 someserver airflow[5453]: [2021-02-04 13:49:26,737] {cli.py:859} ERROR - No response from gunicorn master within 120 seconds
Feb 04 13:49:26 someserver airflow[5453]: [2021-02-04 13:49:26,811] {cli.py:860} ERROR - Shutting down webserver
Feb 04 13:49:29 someserver systemd[2859]: airflow-webserver.service: Main process exited, code=exited, status=1/FAILURE
Feb 04 13:49:29 someserver systemd[2859]: airflow-webserver.service: Failed with result 'exit-code'.
Feb 04 13:49:34 someserver systemd[2859]: airflow-webserver.service: Service RestartSec=5s expired, scheduling restart.
Feb 04 13:49:34 someserver systemd[2859]: airflow-webserver.service: Scheduled restart job, restart counter is at 1.
Feb 04 13:49:34 someserver systemd[2859]: Stopped Airflow webserver daemon.
Feb 04 13:49:34 someserver systemd[2859]: Started Airflow webserver daemon.
Feb 04 13:49:34 someserver airflow[4812]: [2021-02-04 13:49:34,842] {settings.py:253} INFO - settings.configure_orm(): Using pool settings. po
ol_size=5, max_overflow=10, pool_recycle=1800, pid=4812
Feb 04 13:49:35 someserver airflow[4812]: [2021-02-04 13:49:35,683] {cli_action_loggers.py:107} WARNING - Failed to log action with (pyodbc.Er
ror) ('HY000', '[HY000] [Microsoft][ODBC Driver 17 for SQL Server]SSPI Provider: Server not found in Kerberos database (851968) (SQLDriverConnect)')
Feb 04 13:49:35 someserver airflow[4812]: (Background on this error at: http://sqlalche.me/e/dbapi)

How to reproduce it:
I'm not sure how to repro this to be honest. i'm new to airflow and new to the system im working on.
But more than happy to get some pointers on how to breakpoint the code myself to see where the DNS calls are happening and find out what kind of loop i'm in.

A wild guess is there is something wrong with the timers in the setup_event_handlers function..
https://airflow.apache.org/docs/apache-airflow/1.10.3/_modules/airflow/utils/sqlalchemy.html

Anything else we need to know:

This problem occurs all day.
It seems to happen within the processes that are forked by the scheduler.

Any relevant logs to include? Put them here in side a detail tag:

x.log lots of stuff

-->

Metadata

Metadata

Assignees

No one assigned

    Labels

    area:MetaDBMeta Database related issues.kind:bugThis is a clearly a bug

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions