Apache Airflow version: 2.0.0
Kubernetes version: 1.17.16
OS (e.g. from /etc/os-release): Ubuntu 18.04
What happened:
The webserver gets an exception when reading logs from Elasticsearch if the "host" field in the log document is not a string. Recent Filebeat index template mappings create host as an object with "host.name", "host.os", etc.
[2021-01-18 23:53:27,923] {app.py:1891} ERROR - Exception on /get_logs_with_metadata [GET]
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/usr/local/lib/python3.7/site-packages/airflow/www/auth.py", line 34, in decorated
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/airflow/www/decorators.py", line 60, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/airflow/utils/session.py", line 65, in wrapper
return func(*args, session=session, **kwargs)
File "/usr/local/lib/python3.7/site-packages/airflow/www/views.py", line 1054, in get_logs_with_metadata
logs, metadata = task_log_reader.read_log_chunks(ti, try_number, metadata)
File "/usr/local/lib/python3.7/site-packages/airflow/utils/log/log_reader.py", line 58, in read_log_chunks
logs, metadatas = self.log_handler.read(ti, try_number, metadata=metadata)
File "/usr/local/lib/python3.7/site-packages/airflow/utils/log/file_task_handler.py", line 217, in read
log, metadata = self._read(task_instance, try_number_element, metadata)
File "/usr/local/lib/python3.7/site-packages/airflow/providers/elasticsearch/log/es_task_handler.py", line 161, in _read
logs_by_host = self._group_logs_by_host(logs)
File "/usr/local/lib/python3.7/site-packages/airflow/providers/elasticsearch/log/es_task_handler.py", line 130, in _group_logs_by_host
grouped_logs[key].append(log)
TypeError: unhashable type: 'AttrDict'
What you expected to happen:
The Airflow webserver successfully pulls the logs, replacing the host value with a default if needed.
The issue comes from _group_logs_by_host in es_task_handler.py (see the traceback above). When "host" is a dictionary, it is used as a key in the grouped_logs dictionary, which throws TypeError: unhashable type: 'AttrDict'.
from collections import defaultdict

def _group_logs_by_host(logs):
    grouped_logs = defaultdict(list)
    for log in logs:
        key = getattr(log, 'host', 'default_host')
        grouped_logs[key].append(log)  # ---> fails when key is a dict
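The failure can be reproduced without Elasticsearch at all: any unhashable value used as a dict key raises the same error. Here a plain dict stands in for the AttrDict that elasticsearch-dsl returns for a Filebeat-style host object:

```python
from collections import defaultdict

grouped_logs = defaultdict(list)
key = {"name": "worker-1", "os": "linux"}  # Filebeat-style host object

try:
    # dicts are unhashable, so this fails before anything is inserted
    grouped_logs[key].append("log line")
except TypeError as exc:
    print(exc)  # unhashable type: 'dict'
```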
How to reproduce it:
I don't know how to concisely write this and make it easy to read at the same time.
1 - Configure Airflow to read logs from Elasticsearch.
2 - Load an index template where host is an object [may need to add other fields to this template as well]. Filebeat adds this by default (and many more fields).
3 - Post a sample log and fill in the log_id field for a valid dag run.
4 - Go to the WebUI and try to view logs for the dag_run.
Workaround: Remove the host field completely with Filebeat.
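The workaround can be expressed as a Filebeat processor (a sketch; adapt it to your own filebeat.yml):

```yaml
processors:
  - drop_fields:
      fields: ["host"]
      ignore_missing: true
```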
Solution 1: Type-check whether the extracted host field is a string; if not, use the default value. Solution 2: Make the host field name configurable so that it can be set to host.name instead of the hardcoded 'host'.
If I have time I will submit the fix. I have never submitted a commit before, so I don't know how long it will take me to prepare a proper commit for this.