Fix OpenSearch log handler to support single-URL opensearch secret from helm chart#66051
Fix OpenSearch log handler to support single-URL opensearch secret from helm chart#66051AlexMTX wants to merge 1 commit into
Conversation
…edded credentials and port When AIRFLOW__OPENSEARCH__HOST contains userinfo (user:password@) and/or a non-default port, the handler now uses them correctly: - Credentials embedded in the host URL are extracted and used for HTTP auth when AIRFLOW__OPENSEARCH__USERNAME / PASSWORD are not set. - OPENSEARCH_PORT now defaults to None instead of 9200, so a port in the host URL is no longer silently overridden by the hardcoded default.
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide
|
SameerMesiah97
left a comment
There was a problem hiding this comment.
PR is justified but I think the validation needs to be strengthened. Also, test coverage could be better.
I have left a few comments.
| template, so a port embedded in the host URL (e.g. ``:443``) was silently overridden. The | ||
| default is now ``None``; when no explicit port is configured, the port is taken from the host | ||
| URL, falling back to ``9200`` only when the URL carries no port either. | ||
|
|
There was a problem hiding this comment.
The changelog entry seems accurate but this is too much detail. These are meant to be user-facing summaries rather than explanations of internal behavior. I would suggest shortening this to focus on the observable fixes (credentials + port handling) rather than implementation details. Perhaps this might be better:
Fix OpenSearch log handler to properly support single-URL configuration
with embedded credentials and port. Previously, credentials could be
ignored and ports overridden by defaults.
| if not parsed_url.netloc: | ||
| raise ValueError(f"'{host}' is not a valid URL.") | ||
|
|
||
| return host |
There was a problem hiding this comment.
_format_url handles scheme + netloc which is fine for most cases, but we rely on parsed_url.hostname downstream. Might be worth validating that explicitly to avoid cases where netloc exists but hostname is None (e.g. malformed userinfo).
| return OpenSearch( | ||
| hosts=[{"host": parsed_url.hostname, "port": resolved_port, "scheme": parsed_url.scheme}], | ||
| http_auth=(username, password), | ||
| http_auth=(effective_username, effective_password), |
There was a problem hiding this comment.
what about cases where effective_username is "" and effective_password is present? Or vice versa? Perhaps you could do this:
http_auth = (
(effective_username, effective_password)
if effective_username
else None
)
My understanding is empty password + non-empty username is valid but the opposite or both being empty are not.
| # No credentials anywhere — empty strings passed through | ||
| ("http://opensearch.example.com:9200", "", "", ("", "")), | ||
| ], | ||
| ) |
There was a problem hiding this comment.
Could you add these 2 cases here;
# Only username is present.
("https://user@opensearch.example.com", "", "", ("user", ""),),
# Only password is present.
( "https://:pass@opensearch.example.com", "", "", ("", "pass"),),
We need to handle partial credentials too.
| _create_opensearch_client("https://user:pass@opensearch.example.com:9200", None, "", "", {}) | ||
| hosts = mock_os.call_args.kwargs["hosts"] | ||
| assert hosts == [{"host": "opensearch.example.com", "port": 9200, "scheme": "https"}] | ||
|
|
There was a problem hiding this comment.
If you decide to introduce hostname validation, you can add this test as well:
def test_invalid_hostname_raises(self):
with pytest.raises(ValueError):
_create_opensearch_client("http://:9200", None, "", "", {})
I would add a match as well to pytest.raises to assert the error message.
|
@AlexMTX A few things need addressing before review — see our Pull Request quality criteria. Issues found:
What to do next:
No rush — take your time. We appreciate your contribution and are happy to wait for updates. If you have questions, feel free to ask on the Airflow Slack. Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you. |
|
@AlexMTX A few things need addressing before review — see our Pull Request quality criteria.
No rush. Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you. Drafted-by: Claude Code (Opus 4.7); reviewed by @potiuk before posting |
with embedded credentials and port
When AIRFLOW__OPENSEARCH__HOST contains userinfo (user:password@) and/or a non-default port, the handler now uses them correctly:
Was generative AI tooling used to co-author this PR?
Claude Code v2.1.121 Sonnet 4.6
{pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.