Strip userinfo from ES host URL before using it as task-log label #65349
Merged
potiuk merged 2 commits into apache:main on Apr 16, 2026
The Elasticsearch task-log handler grouped hits by host, falling back to the raw ``[elasticsearch] host`` config value when a hit lacked a ``host`` field. That config commonly embeds credentials (``https://user:password@elk.example.com:9200``), so the full URL, including the ``user:password@`` userinfo, would appear as a dictionary key in the task-log output, where any user with task-log read permission could see it.

Add a ``_strip_userinfo`` helper and use it for the host fallback in ``_group_logs_by_host``. The Elasticsearch client is still connected using the full unredacted URL, so authentication is unaffected.

Generated-by: Claude Opus 4.6 (1M context) following the guidelines at https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions
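For illustration, here is a minimal sketch of what a userinfo-stripping helper can look like, built on the standard library's `urllib.parse`. The name `strip_userinfo` is a hypothetical stand-in for the `_strip_userinfo` helper described above; the merged implementation may differ. This sketch only rewrites inputs that have a `scheme://` network location; anything else (a bare hostname, non-URL text, an empty string) is returned unchanged.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_userinfo(url: str) -> str:
    """Return ``url`` with any ``user[:password]@`` userinfo removed.

    Hypothetical sketch, not the helper merged in the PR. Only inputs with a
    ``scheme://`` network location are rewritten; everything else, including
    empty strings and non-URL text, is returned unchanged.
    """
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit rejects some malformed inputs (e.g. unbalanced IPv6 brackets).
        return url
    if "@" not in parts.netloc:
        # No userinfo present (or no network location at all): nothing to strip.
        return url
    # Userinfo ends at the last "@"; what follows is host[:port].
    host_and_port = parts.netloc.rpartition("@")[2]
    return urlunsplit(parts._replace(netloc=host_and_port))


print(strip_userinfo("https://user:password@elk.example.com:9200"))
# https://elk.example.com:9200
```

Splitting on the last `@` keeps the port, path, and query intact and also copes with passwords that themselves contain `@`. One limitation of this sketch: a scheme-less value such as `user:pass@host:9200` has no `//` network location, so it is returned as-is.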
potiuk commented on Apr 16, 2026
eladkal approved these changes on Apr 16, 2026
Member, Author:
Yeah. Would love to see your comments @Owen-CH-Leung :)
Member, Author:
We can always post-review :)
karenbraganz pushed a commit to karenbraganz/airflow that referenced this pull request on Apr 16, 2026
…ache#65349)

* Strip userinfo from ES host URL before using it as task-log label
* Apply suggestion from @potiuk
Contributor:
Great catch! The OpenSearch provider has the same issue. It's worth a follow-up PR to apply the same fix there.
potiuk added a commit to potiuk/airflow that referenced this pull request on Apr 19, 2026
…abel

Follow-up to apache#65349: OpenSearch's `_group_logs_by_host` had the same credential leak as Elasticsearch. The raw `[opensearch] host` config value (which commonly embeds `user:password@...`) was used as a log-source dictionary key, exposing credentials in task logs. Apply the same `_strip_userinfo` helper; the OpenSearch client still connects with the full URL, so auth is unaffected. Both the `OpensearchTaskHandler` and `OpensearchRemoteLogIO` sites are patched.

Also add `AGENTS.md` to both `providers/opensearch` and `providers/elasticsearch`, noting that the two providers are forks and most task-log-handler fixes should be cross-applied.
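In both providers the fix comes down to redacting the fallback label before it is used as a grouping key. A minimal sketch of that pattern follows, assuming log hits are plain dicts with an optional `"host"` key (the real handlers work with search-response objects) and using the hypothetical names `group_logs_by_host` and `strip_userinfo` in place of the providers' `_group_logs_by_host` and `_strip_userinfo`:

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def strip_userinfo(url: str) -> str:
    """Hypothetical helper: drop ``user[:password]@`` from a ``scheme://`` URL."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return url
    if "@" not in parts.netloc:
        return url
    return urlunsplit(parts._replace(netloc=parts.netloc.rpartition("@")[2]))


def group_logs_by_host(hits: list[dict], configured_host: str) -> dict[str, list[dict]]:
    """Group hits by their "host" field (hypothetical stand-in for _group_logs_by_host).

    Hits without a host fall back to the configured host URL, redacted first so
    credentials never end up as a dictionary key in task-log output. The
    unredacted ``configured_host`` is still what a client would connect with.
    """
    fallback = strip_userinfo(configured_host)
    grouped: dict[str, list[dict]] = defaultdict(list)
    for hit in hits:
        # A missing or empty "host" both fall back to the redacted label.
        grouped[hit.get("host") or fallback].append(hit)
    return dict(grouped)


hits = [{"host": "worker-1", "message": "a"}, {"message": "b"}]
print(sorted(group_logs_by_host(hits, "https://user:password@opensearch.example.com:9200")))
# ['https://opensearch.example.com:9200', 'worker-1']
```

Computing the redacted label once, outside the loop, keeps the change confined to the fallback path: hits that already carry a `host` value are grouped exactly as before.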
potiuk added a commit that referenced this pull request on Apr 19, 2026
…abel (#65509)

Follow-up to #65349.
Summary
The Elasticsearch task-log handler groups log hits by `host` and falls back to the raw `[elasticsearch] host` config value when a hit does not carry a `host` field. That config is commonly set to a URL that embeds credentials, such as `https://user:password@elk.example.com:9200`. As a result, the full URL, including the `user:password@` userinfo, appeared as a dictionary key in the task-log output, visible to any user with task-log read permission.

This PR adds a small `_strip_userinfo` helper that removes the userinfo portion of a URL, and uses it in `ElasticsearchRemoteLogIO._group_logs_by_host` for the host fallback value. The Elasticsearch client itself is still connected using the full unredacted URL, so authentication is unaffected.

Test plan

- `test_strip_userinfo` parametrized across 7 input URL shapes (with userinfo, without userinfo, username-only, non-URL, empty): all pass
- `test_es_task_handler.py` suite continues to pass (71/71)

Changelog

Added a "Bug fixes" section to `providers/elasticsearch/docs/changelog.rst` describing the redaction.

Was generative AI tooling used to co-author this PR?
Generated-by: Claude Opus 4.6 (1M context) following the guidelines at
https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions