Add special exception for "host field is not hashable" #23136
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
```python
if not isinstance(key, Hashable):
    raise ValueError("The host field in all log records needs to be hashable. "
                     "If you are using filebeat, read here: "
                     "https://github.com/apache/airflow/issues/15613#issuecomment-1104487752") from e
```
Instead of linking to GitHub, we should have documentation for this.
Actually, a better solution would be to copy the explanation into our Elasticsearch documentation (at airflow.apache.org) and link to it from there. The error message should explain the reason and link to the detailed discussion/explanation of why, but linking to an issue is only fine in a source comment, not in a user-facing message. There we should only link to documentation we control.
I get your point; I just got set up with breeze to write some proper documentation.
I have a question: `airflow.providers.elasticsearch.log.es_task_handler.ElasticsearchTaskHandler` has `offset_field` and `host_field` parameters in its constructor. I have a hard time figuring out where these are being set / come from. Are they configurable?
I have no idea about the details of it, to be honest. I would have to dive deep into the code to understand it, just like you.
Actually, I know that (re-read this): I believe each logging handler can be configured with parameters. You can read about it in the "logging" configuration section of our docs.
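As a generic illustration of this point (standard-library `logging` only, not Airflow's actual logging config), handler constructor parameters can be supplied through a `dictConfig` dictionary; an Elasticsearch task handler's `host_field`/`offset_field` would presumably be passed the same way:

```python
# Generic stdlib sketch: any non-reserved keys in a handler entry are
# passed as keyword arguments to the handler class's constructor.
import logging
import logging.config

LOGGING_CONFIG = {
    "version": 1,
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            # Extra keys here become constructor kwargs; a custom task
            # handler would receive e.g. host_field/offset_field this way.
            "stream": "ext://sys.stdout",
        },
    },
    "root": {"handlers": ["console"], "level": "INFO"},
}

logging.config.dictConfig(LOGGING_CONFIG)
logging.getLogger(__name__).info("configured via dictConfig")
```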
Force-pushed from bdbcd22 to 27aa8e3.
@NiklasBeierl can you fix the static checks?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
I am adding a special exception to the es_task_handler for cases in which the "hosts" field in an elastic record is not hashable (i.e. a dict). It seems to be a common problem, but it's not really a problem with Airflow itself, rather "incompatible defaults" of Airflow and filebeat. See my comment here: #15613 (comment)
I considered making `_group_logs_by_host` "smart" and letting it handle "common" `host` objects (like the ones add_host_metadata or filebeat itself produce). But it struck me as risky in case someone does something else with the `host` field. Making an assumption like taking the `host.hostname` if it is available might have unintended side effects if a worker node changes hostname / IP / MAC. Turning the entire `host` object into a string might also cause issues if it happens to contain volatile values like "uptime" or "load".

closes: #15613
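For concreteness, the rejected "smart" normalization might have looked roughly like the sketch below (hypothetical; `normalize_host` is not in the PR). It makes clear why the author considered it risky: the grouping key would silently depend on assumptions about the dict's contents.

```python
# Hypothetical sketch of the REJECTED approach, not what the PR does:
# normalize a filebeat-style dict host to something hashable.
def normalize_host(host):
    if isinstance(host, dict):
        # Risky assumption: "hostname" stably identifies a worker node.
        # Falling back to str(host) is also fragile if the dict contains
        # volatile values like "uptime" or "load".
        return host.get("hostname", str(host))
    return host


print(normalize_host({"hostname": "worker-1", "uptime": 12345}))  # worker-1
print(normalize_host("worker-2"))  # worker-2
```

The PR instead raises an explicit error and leaves it to the user to configure filebeat (or the handler's `host_field`) so the field arrives already hashable.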