Elasticsearch log retrieval fails when "host" field is not a string #13755

Closed
semihsezer opened this issue Jan 19, 2021 · 4 comments · Fixed by #14625
Labels: affected_version:2.0 (Issues Reported for 2.0), area:logging, kind:bug (This is clearly a bug)

Comments

semihsezer commented Jan 19, 2021

Apache Airflow version: 2.0.0
Kubernetes version: 1.17.16
OS (e.g. from /etc/os-release): Ubuntu 18.04

What happened:

The webserver raises an exception when reading logs from Elasticsearch if the "host" field in the log is not a string. Recent Filebeat template mappings create host as an object with subfields such as "host.name" and "host.os".

[2021-01-18 23:53:27,923] {app.py:1891} ERROR - Exception on /get_logs_with_metadata [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.7/site-packages/airflow/www/auth.py", line 34, in decorated
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/www/decorators.py", line 60, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/session.py", line 65, in wrapper
    return func(*args, session=session, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/www/views.py", line 1054, in get_logs_with_metadata
    logs, metadata = task_log_reader.read_log_chunks(ti, try_number, metadata)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/log/log_reader.py", line 58, in read_log_chunks
    logs, metadatas = self.log_handler.read(ti, try_number, metadata=metadata)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/log/file_task_handler.py", line 217, in read
    log, metadata = self._read(task_instance, try_number_element, metadata)
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/elasticsearch/log/es_task_handler.py", line 161, in _read
    logs_by_host = self._group_logs_by_host(logs)
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/elasticsearch/log/es_task_handler.py", line 130, in _group_logs_by_host
    grouped_logs[key].append(log)
TypeError: unhashable type: 'AttrDict'

What you expected to happen:
The Airflow webserver successfully pulls the logs, replacing the host value with the default if needed.

The issue comes from _group_logs_by_host in es_task_handler.py (line 130 in the traceback above). When "host" is a dictionary, the function tries to use it as a key in the grouped_logs dictionary, which raises TypeError: unhashable type: 'AttrDict'.

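from collections import defaultdict

def _group_logs_by_host(logs):
    grouped_logs = defaultdict(list)
    for log in logs:
        # 'host' may be a string or, with recent Filebeat mappings, an object
        key = getattr(log, 'host', 'default_host')
        grouped_logs[key].append(log)  # ---> fails when key is a dict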
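
The failure can be shown without Elasticsearch. The following is a minimal sketch, not Airflow code: FakeHit is a hypothetical stand-in for a search hit, a plain dict stands in for the AttrDict from the traceback, and the function returns the raw mapping only so it can be inspected.

```python
from collections import defaultdict


class FakeHit:
    """Hypothetical stand-in for a search hit; not an Airflow or Elasticsearch class."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


def group_logs_by_host(logs):
    # Same grouping logic as the excerpt above; returning the mapping is a simplification.
    grouped_logs = defaultdict(list)
    for log in logs:
        key = getattr(log, "host", "default_host")
        grouped_logs[key].append(log)
    return grouped_logs


string_host = FakeHit(host="worker-1", message="ok")
# A plain dict is used here in place of AttrDict; both are unhashable.
object_host = FakeHit(host={"name": "worker-1", "os": "linux"}, message="boom")

grouped = group_logs_by_host([string_host])  # works: str keys are hashable

try:
    group_logs_by_host([object_host])
    error = None
except TypeError as exc:
    error = str(exc)  # the dict-valued host cannot be used as a dictionary key
```

With a string host the grouping succeeds; with an object-valued host the same call raises TypeError, as in the webserver traceback.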

How to reproduce it:

I don't know how to concisely write this and make it easy to read at the same time.

1 - Configure Airflow to read logs from Elasticsearch.

[elasticsearch]
host = http://localhost:9200
write_stdout = True
json_format = True

2 - Load an index template in which host is an object (other fields may need to be added to this template as well).
Filebeat adds this mapping by default, along with many more fields.

PUT _template/filebeat-airflow
{
    "order": 1,
    "index_patterns": [
      "filebeat-airflow-*"
    ],
    "mappings": {
      "doc": {
        "properties": {
          "host": {
            "properties": {
              "name": {
                "type": "keyword",
                "ignore_above": 1024
              },
              "id": {
                "type": "keyword",
                "ignore_above": 1024
              },
              "architecture": {
                "type": "keyword",
                "ignore_above": 1024
              },
              "ip": {
                "type": "ip"
              },
              "mac": {
                "type": "keyword",
                "ignore_above": 1024
              }
            }
          }
        }
      }
    }
}

3 - Post a sample log and fill in the log_id field for a valid DAG run.

curl -X POST -H 'Content-Type: application/json' -i 'http://localhost:9200/filebeat-airflow/_doc' --data '{"message": "test log message", "log_id": "<fill-in-with-valid-example>", "offset": "1"}'

4 - Go to the web UI and try to view the logs for the DAG run.
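
Note that the sample document in step 3 carries no host field, so on its own it may not trigger the error; in a real pipeline Filebeat would add the host object before indexing. As a rough sketch, a request body with an object-valued host could be built like this. The host values are hypothetical placeholders, and the log_id placeholder is kept from step 3.

```python
import json

# All host values below are hypothetical placeholders.
sample_doc = {
    "message": "test log message",
    "log_id": "<fill-in-with-valid-example>",
    "offset": "1",
    # Object-valued host, matching the shape of the template in step 2.
    "host": {"name": "airflow-worker-0", "architecture": "x86_64"},
}

# JSON string suitable as the request body of the POST in step 3.
payload = json.dumps(sample_doc)
```

The resulting string can be passed as the --data argument of the curl command in step 3.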

Workaround: Remove the host field completely with Filebeat.

Solution 1: Check whether the extracted host field is a string; if it is not, use the default value.
Solution 2: Make the host field name configurable so that it can be set to host.name instead of the hardcoded 'host'.
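
Both proposals can be combined in one helper. This is only a sketch under stated assumptions, not the fix that was eventually merged: resolve_host, the host_field parameter, and the Hit class are hypothetical names introduced for illustration.

```python
from collections import defaultdict

DEFAULT_HOST = "default_host"


class Hit:
    """Hypothetical stand-in for a search hit; not an Airflow or Elasticsearch class."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


def resolve_host(log, host_field="host"):
    """Follow a dotted path such as 'host.name'; use the default unless a str is found."""
    value = log
    for part in host_field.split("."):
        if isinstance(value, dict):
            value = value.get(part)
        else:
            value = getattr(value, part, None)
        if value is None:
            return DEFAULT_HOST
    # Solution 1: only a string is accepted as a grouping key.
    return value if isinstance(value, str) else DEFAULT_HOST


def group_logs_by_host(logs, host_field="host"):
    # Solution 2: the field name is a parameter instead of the hardcoded 'host'.
    grouped_logs = defaultdict(list)
    for log in logs:
        grouped_logs[resolve_host(log, host_field)].append(log)
    return grouped_logs


logs = [
    Hit(host="worker-1", message="a"),
    Hit(host={"name": "worker-2"}, message="b"),
    Hit(message="c"),
]

by_host = group_logs_by_host(logs)                    # object host falls back to the default
by_host_name = group_logs_by_host(logs, "host.name")  # object host resolves to its name
```

With the default field, the object-valued host is grouped under the default key instead of raising. With host.name, it resolves to the nested name, while a plain string host has no name attribute and falls back to the default, so a deployment would pick the field that matches its mapping.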

If I have time, I will submit the fix. I have never submitted a commit before, so I don't know how long it will take me to prepare a proper one.

semihsezer added the kind:bug label Jan 19, 2021

boring-cyborg bot commented Jan 19, 2021

Thanks for opening your first issue here! Be sure to follow the issue template!

armandleopold commented

Had the same issue

bmfisher commented Feb 8, 2021

Having the same issue

rob2244 commented Mar 17, 2021

Having the same issue

paolaperaza added the affected_version:2.0 label Mar 26, 2021