
Fix OpenSearch log handler to support single-URL opensearch secret from helm chart #66051

Open
AlexMTX wants to merge 1 commit into apache:main from AlexMTX:main



@AlexMTX AlexMTX commented Apr 28, 2026

with embedded credentials and port

When AIRFLOW__OPENSEARCH__HOST contains userinfo (user:password@) and/or a non-default port, the handler now uses them correctly:

  • Credentials embedded in the host URL are extracted and used for HTTP auth when AIRFLOW__OPENSEARCH__USERNAME / PASSWORD are not set.
  • OPENSEARCH_PORT now defaults to None instead of 9200, so a port in the host URL is no longer silently overridden by the hardcoded default.
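The resolution order in the two bullets above can be sketched with the standard library alone. This is a minimal illustration, not the PR's actual code: the function name `resolve_connection`, its parameters, and the per-field fallback for username and password are assumptions made for the example. The fallback to 9200 when neither the configuration nor the URL supplies a port follows the changelog text quoted later in this thread.

```python
from urllib.parse import urlparse

DEFAULT_PORT = 9200  # used only when neither the config nor the URL gives a port


def resolve_connection(host, port=None, username="", password=""):
    """Hypothetical helper: choose hostname, port and credentials for a host URL."""
    parsed = urlparse(host)
    # Explicitly configured credentials win; otherwise use what the URL carries.
    effective_username = username or parsed.username or ""
    effective_password = password or parsed.password or ""
    # Explicit port first, then the port in the URL, then the default.
    if port is not None:
        resolved_port = port
    elif parsed.port is not None:
        resolved_port = parsed.port
    else:
        resolved_port = DEFAULT_PORT
    return parsed.hostname, resolved_port, effective_username, effective_password
```

With `port=None` as the default, a URL such as `https://user:pass@opensearch.example.com:443` keeps its port 443 instead of being overridden by 9200.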

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)
    Claude Code v2.1.121 Sonnet 4.6

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.


boring-cyborg Bot commented Apr 28, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide.
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@AlexMTX AlexMTX changed the title Fix OpenSearch log handler to support single-URL k8s deployments Fix OpenSearch log handler to support single-URL opensearch secret from helm chart Apr 28, 2026
Contributor

@SameerMesiah97 SameerMesiah97 left a comment


PR is justified but I think the validation needs to be strengthened. Also, test coverage could be better.

I have left a few comments.

template, so a port embedded in the host URL (e.g. ``:443``) was silently overridden. The
default is now ``None``; when no explicit port is configured, the port is taken from the host
URL, falling back to ``9200`` only when the URL carries no port either.

Contributor


The changelog entry seems accurate but this is too much detail. These are meant to be user-facing summaries rather than explanations of internal behavior. I would suggest shortening this to focus on the observable fixes (credentials + port handling) rather than implementation details. Perhaps this might be better:

  Fix OpenSearch log handler to properly support single-URL configuration
  with embedded credentials and port. Previously, credentials could be
  ignored and ports overridden by defaults.

    if not parsed_url.netloc:
        raise ValueError(f"'{host}' is not a valid URL.")

    return host
Contributor


_format_url handles scheme + netloc which is fine for most cases, but we rely on parsed_url.hostname downstream. Might be worth validating that explicitly to avoid cases where netloc exists but hostname is None (e.g. malformed userinfo).
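A minimal sketch of the explicit hostname check suggested here, assuming a standalone function. The name `validate_host_url` and the second error message are hypothetical, not taken from the PR. It relies on the documented behaviour of `urllib.parse.urlparse`, where `netloc` can be non-empty while `hostname` is `None`.

```python
from urllib.parse import urlparse


def validate_host_url(host):
    """Hypothetical check: reject URLs whose netloc carries no usable hostname."""
    parsed = urlparse(host)
    if not parsed.netloc:
        raise ValueError(f"'{host}' is not a valid URL.")
    # netloc can be non-empty (for example ":9200") while hostname is None.
    if parsed.hostname is None:
        raise ValueError(f"'{host}' does not contain a hostname.")
    return host
```

For `http://:9200` the netloc is `:9200`, so the first check passes and only the second one catches the problem.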

      return OpenSearch(
          hosts=[{"host": parsed_url.hostname, "port": resolved_port, "scheme": parsed_url.scheme}],
-         http_auth=(username, password),
+         http_auth=(effective_username, effective_password),
Contributor


What about cases where effective_username is "" and effective_password is present? Or vice versa? Perhaps you could do this:

http_auth = (
    (effective_username, effective_password)
    if effective_username
    else None
)

My understanding is that a non-empty username with an empty password is valid, but an empty username (with or without a password) is not.
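The suggested rule can be isolated into a small function to make the four combinations easy to check. The helper name `build_http_auth` is hypothetical, and the sketch assumes that passing `None` as `http_auth` means "no HTTP auth" to the client, as the suggestion above implies.

```python
def build_http_auth(effective_username, effective_password):
    """Hypothetical helper: return an auth tuple, or None when no username is set."""
    if effective_username:
        # A username with an empty password is passed through unchanged.
        return (effective_username, effective_password)
    # No username: skip HTTP auth, even if a password was supplied.
    return None
```

One consequence of this design is that a password-only configuration is dropped silently; raising an error for that case would be an alternative if misconfiguration should be visible.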

# No credentials anywhere — empty strings passed through
("http://opensearch.example.com:9200", "", "", ("", "")),
],
)
Contributor


Could you add these 2 cases here:

# Only username is present.
("https://user@opensearch.example.com", "", "", ("user", "")),
# Only password is present.
("https://:pass@opensearch.example.com", "", "", ("", "pass")),

We need to handle partial credentials too.
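For reference, this is how the standard library reports partial userinfo, which is what determines the expected tuples in the two cases above. The helper `split_userinfo` is a hypothetical name used only for this illustration; how the handler normalises `None` is an assumption.

```python
from urllib.parse import urlparse


def split_userinfo(host):
    """Hypothetical helper: return (username, password) exactly as urlparse reports them."""
    parsed = urlparse(host)
    return parsed.username, parsed.password


# Username only: the password is None, not "".
print(split_userinfo("https://user@opensearch.example.com"))   # ('user', None)
# Password only: the username is the empty string.
print(split_userinfo("https://:pass@opensearch.example.com"))  # ('', 'pass')
# No userinfo at all: both are None.
print(split_userinfo("https://opensearch.example.com"))        # (None, None)
```

Because a missing component comes back as `None`, the expected values `("user", "")` and `("", "pass")` only hold if the code under test maps `None` to an empty string, for example with `parsed.password or ""`.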

_create_opensearch_client("https://user:pass@opensearch.example.com:9200", None, "", "", {})
hosts = mock_os.call_args.kwargs["hosts"]
assert hosts == [{"host": "opensearch.example.com", "port": 9200, "scheme": "https"}]

Contributor


If you decide to introduce hostname validation, you can add this test as well:

def test_invalid_hostname_raises(self):
    with pytest.raises(ValueError):
        _create_opensearch_client("http://:9200", None, "", "", {})

I would add a match as well to pytest.raises to assert the error message.

Contributor

eladkal commented May 4, 2026

cc @Owen-CH-Leung

Member

potiuk commented May 18, 2026

@AlexMTX A few things need addressing before review — see our Pull Request quality criteria.

Issues found:

  • Build documentation (spellcheck): CI image checks / Build documentation (--spellcheck-only) is failing. Run breeze build-docs --spellcheck-only locally to see which words it didn't accept, then either fix the spelling or add the term to docs/spelling_wordlist.txt if it's a legitimate technical word. See the docs-building docs.

What to do next:

  • Push a fix for the spellcheck failure.

No rush — take your time. We appreciate your contribution and are happy to wait for updates. If you have questions, feel free to ask on the Airflow Slack.


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

Member

potiuk commented May 18, 2026

@AlexMTX A few things need addressing before review — see our Pull Request quality criteria.

  • Build docs — Failing: CI image checks / Build documentation (--spellcheck-only). See docs.

Note: Your branch is 528 commits behind main. Please rebase and push again to get up-to-date CI results.

No rush.


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.


Drafted-by: Claude Code (Opus 4.7); reviewed by @potiuk before posting


4 participants