Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support remote logging in elasticsearch with filebeat 7 #14625

Merged
merged 4 commits into from Jun 11, 2021
Merged

Support remote logging in elasticsearch with filebeat 7 #14625

merged 4 commits into from Jun 11, 2021

Conversation

@jedcunningham
Copy link
Member

@jedcunningham jedcunningham commented Mar 5, 2021

closes: #13755

This adds support to choose different fields for host and offset. Filebeat 7 changed the defaults, and while one could probably move them back, making it configurable makes more sense.

https://www.elastic.co/guide/en/beats/libbeat/current/breaking-changes-7.0.html

offset -> log.offset
host.name -> host.hostname (And based on #13755, something also uses a str host, not sure what though)

This still needs test coverage.

@mik-laj
Copy link
Member

@mik-laj mik-laj commented Mar 5, 2021

I think this is a very internal details and the user should not configure it explicitly. If possible, we should check which version of the service we are connecting to and act accordingly. When this is not possible, we recommend introducing a more descriptive configuration option, e.g. use_filebeat_7_headers or something similar.

Also, we should update the documentation and describe these configuration options.
http://apache-airflow-docs.s3-website.eu-central-1.amazonaws.com/docs/apache-airflow-providers-elasticsearch/latest/logging.html

@jedcunningham
Copy link
Member Author

@jedcunningham jedcunningham commented Mar 5, 2021

If anything I think we may want to go the other way and make it more configurable as filebeats is just one way to get logs into elasticsearch. I feel like we should try and be agnostic especially since Airflow isn't directly involved with getting the logs there in the first place.

Thanks for the feedback though @mik-laj, I'll chew on this a bit more.

@kaxil kaxil added this to the Airflow 2.1 milestone Mar 18, 2021
Copy link
Member

@ashb ashb left a comment

Code LGTM -- I think given there are so many ways of getting logs in to ES (versions of filebeat, logstash, graylog, vector.dev etc) that we should improve the docs and say what we required, and leave it flexible.

@TRManderson
Copy link

@TRManderson TRManderson commented Apr 14, 2021

Code LGTM too, seems like a nice minimal way to get exactly what's needed for filebeat 7

@ashb ashb removed this from the Airflow 2.1 milestone May 18, 2021
@ashb ashb added this to the Airflow 2.1.1 milestone May 18, 2021
@jedcunningham jedcunningham marked this pull request as ready for review Jun 9, 2021
@jedcunningham jedcunningham requested a review from kaxil Jun 9, 2021
kaxil
kaxil approved these changes Jun 9, 2021
@github-actions
Copy link

@github-actions github-actions bot commented Jun 9, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

Filebeat 7 renamed some fields (offset->log.offset and host->host.name),
so allow the field names Airflow uses to be configured.

Airflow isn't directly involved with getting the logs _to_
elasticsearch, so we should allow easy configuration to accomodate
whatever tools are used in that process.
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
ashb
ashb approved these changes Jun 11, 2021
@kaxil kaxil merged commit 5cd0bf7 into apache:main Jun 11, 2021
44 of 52 checks passed
jhtimmins added a commit to astronomer/airflow that referenced this issue Jun 15, 2021
Filebeat 7 renamed some fields (offset->log.offset and host->host.name),
so allow the field names Airflow uses to be configured.

Airflow isn't directly involved with getting the logs _to_
elasticsearch, so we should allow easy configuration to accomodate
whatever tools are used in that process.

(cherry picked from commit 5cd0bf7)
ashb added a commit that referenced this issue Jun 22, 2021
Filebeat 7 renamed some fields (offset->log.offset and host->host.name),
so allow the field names Airflow uses to be configured.

Airflow isn't directly involved with getting the logs _to_
elasticsearch, so we should allow easy configuration to accomodate
whatever tools are used in that process.

(cherry picked from commit 5cd0bf7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

5 participants