Skip to content

fix(elasticsearch): pin compatible-with at the transport layer to keep ES 8 servers working#66065

Merged
eladkal merged 12 commits into
apache:mainfrom
petercheon:fix/elasticsearch-compat-with-v2
May 19, 2026
Merged

fix(elasticsearch): pin compatible-with at the transport layer to keep ES 8 servers working#66065
eladkal merged 12 commits into
apache:mainfrom
petercheon:fix/elasticsearch-compat-with-v2

Conversation

@petercheon
Copy link
Copy Markdown
Contributor

@petercheon petercheon commented Apr 29, 2026

What does this PR do

Adds a new [elasticsearch] es_compat_with config option that pins the
HTTP Accept / Content-Type compatible-with parameter on every
request the provider makes, by wrapping
client.transport.perform_request.

When set to a major version string ("7", "8", "9"), every outbound
request carries
Accept: application/vnd.elasticsearch+json; compatible-with=<major>
(and the matching +x-ndjson form for bulk so multi-line bodies still
parse on the server). When unset, behavior is unchanged.

The wrap is applied at every Elasticsearch client construction site in
the provider:

  • ElasticsearchTaskHandler (log/es_task_handler.py)
  • ElasticsearchRemoteLogIO (log/es_task_handler.py)
  • ESConnection (hooks/elasticsearch.py)
  • ElasticsearchPythonHook (hooks/elasticsearch.py)

A new shared module airflow.providers.elasticsearch._compat exposes the
helper.

Why

Since #64070 the provider declares elasticsearch>=8.10,<10. A default
install resolves to an elasticsearch>=9 Python client, which always
negotiates Accept: application/vnd.elasticsearch+json; compatible-with=9
on every request. Elasticsearch 8.x servers reject that with HTTP 400
media_type_header_exception (Accept version must be either version 8 or 7, but found 9), breaking the entire remote task log path:

PUT http://<es-host>/_bulk [status:400]
Unable to insert logs into Elasticsearch.
Reason: BadRequestError(400, 'media_type_header_exception',
        'Invalid media-type value on headers [Content-Type, Accept]',
        Accept version must be either version 8 or 7, but found 9.
        Accept=application/vnd.elasticsearch+json; compatible-with=9)

The same regression hits ElasticsearchSQLHook and
ElasticsearchPythonHook against ES 8 clusters.

There is no operator-side workaround today: the existing
elasticsearch_configs section can pass headers= to
Elasticsearch.__init__, but elasticsearch-py re-applies its own
per-API-method content-negotiation headers right before the request goes
out — those override the constructor headers. The compat header has to
be enforced at the transport layer.

Why not auto-detect the server version?

The first thing one would reach for is client.info() to read
version.number. That does not work, because info() itself goes
through the same transport with the same per-method
compatible-with=<client_major> header — so against an ES 8 server it
fails with the same 400. Auto-detect would need a separate raw HTTP
probe (or a fix in elasticsearch-py to make compat negotiation
server-aware), which is out of scope for this regression fix. Operators
already know which major version their cluster is.

Why supersede #66064

#66064 attempted the same fix using
client.options(headers={...}). That approach pins
client._headers (the default headers), but elasticsearch-py merges
per-API-method Accept/Content-Type over the default in
_perform_request, then runs mimetype_header_to_compat, which
hardcodes compatible-with=<client.__versionstr__.split(".")[0]>. So the
wire still carried compatible-with=9 against an ES 8 server. Verified
locally with elasticsearch-py 9.3.0 by capturing
Transport.perform_request calls. The unit tests in #66064 only
asserted against client._headers, so they passed without catching the
issue. This PR therefore (a) patches at the transport layer, where the
final headers actually live, and (b) writes wire-level tests that
intercept Transport.perform_request and assert on the captured headers
— see providers/elasticsearch/tests/unit/elasticsearch/test_compat.py.

How was this tested

Wire-level unit tests that swap elastic_transport.Transport.perform_request
for a recording spy, drive search / info / bulk, and assert on the
captured Accept / Content-Type headers:

  • test_apply_compat_with_unset_does_not_wrap_transport: with the
    option unset (both "" and None) the helper returns the client
    unchanged and transport.perform_request is the original.
  • test_apply_compat_with_pins_compatible_with_8: with
    es_compat_with = "8", every captured call carries
    Accept: application/vnd.elasticsearch+json; compatible-with=8,
    and bulk keeps +x-ndjson for the Content-Type.
  • test_apply_compat_with_pins_compatible_with_7: same flow with
    "7" to assert the helper does not hardcode "8".

Manually verified with elasticsearch-py 9.3.0 + a stand-alone
reproduction of apply_compat_with against Transport.perform_request
spies — all three calls go on the wire with the pinned major.

Reproduction (without the fix)

  1. Run Airflow 3.2.1 (provider 6.5.1+) against any Elasticsearch 8.x
    server, with write_to_es=True.
  2. Trigger any task. Worker logs the 400 above on every _bulk.

After the fix, set [elasticsearch] es_compat_with = "8" and the same
deployment writes logs successfully and the SQL/Python hooks work.

Notes on the wrap target

The wrap is installed on client.transport.perform_request. That is the
lowest stable seam between the high-level Elasticsearch client and the
HTTP layer in the current elastic_transport 8.x line; the per-API-method
Accept / Content-Type headers are constructed by elasticsearch-py and
then handed to Transport.perform_request, which is where they need to
be rewritten before the request is serialized. If elastic_transport
ever changes that signature, a forward-compatible alternative is to
subclass BaseClient and override the per-request header negotiation —
happy to take that direction in a follow-up if the maintainers prefer.

The wrap is also idempotent: a second call to apply_compat_with on the
same client is a no-op (guarded by inspecting
transport.__dict__["perform_request"]).

Out of scope / follow-ups

  • Auto-detection of server major (separate raw HTTP probe).
  • A potential upstream fix in elasticsearch-py making
    compatible-with negotiation server-aware.
  • Splitting the provider into ES 8 / ES 9 release lines.
  • Per-cluster (rather than global) es_compat_with configuration —
    multi-cluster deployments are out of scope here.

Note for the release manager

provider.yaml records version_added: 6.5.4 for the new option. If the
next release window is 6.6.0 instead, please feel free to bump that
during release prep.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

…p ES 8 servers working

Since apache#64070 the provider depends on elasticsearch>=8.10,<10. A default
install resolves to an elasticsearch>=9 Python client, which always
negotiates 'compatible-with=9' on every request. Elasticsearch 8.x
servers reject that with HTTP 400 media_type_header_exception, breaking
remote task log ingestion and both ElasticsearchSQLHook and
ElasticsearchPythonHook against ES 8 clusters.

Add a [elasticsearch] es_compat_with config option that, when set to a
major version string ('7'/'8'/'9'), wraps the client's transport
perform_request so every outbound request carries
'Accept: application/vnd.elasticsearch+json; compatible-with=<major>'
(and the matching '+x-ndjson' form for bulk so streaming bodies still
parse). The wrap is applied at every Elasticsearch client construction
site in the provider:

  - ElasticsearchTaskHandler  (log/es_task_handler.py)
  - ElasticsearchRemoteLogIO  (log/es_task_handler.py)
  - ESConnection              (hooks/elasticsearch.py)
  - ElasticsearchPythonHook   (hooks/elasticsearch.py)

When the option is unset, behavior is unchanged.

Tests assert against what the transport actually sends, not the in-memory
state of the client object. Setting client._headers (which is what
client.options(headers=...) does) is not enough because elasticsearch-py
re-applies its own per-API-method content-negotiation headers right
before the request is sent — only the transport layer sees the final
headers.

Closes: apache#66063
Supersedes: apache#66064
peter-cheon and others added 2 commits April 29, 2026 13:10
…impler ct check, correct test assertion

- idempotent guard: skip second wrap by checking transport.__dict__ for an
  existing instance attribute. Repeated apply_compat_with calls (e.g.
  hook reuse paths) are now true no-ops.
- content-type check simplified: '"ndjson" in ct' already matches both
  'ndjson' and '+x-ndjson' so the redundant 'x-ndjson' branch is dropped.
- unset-case test was using 'transport.perform_request is original' which
  would fail even when nothing was wrapped, because attribute access on a
  bound method produces a fresh wrapper object every time. Switched to
  inspecting transport.__dict__ for the 'perform_request' key, which
  precisely tracks whether the helper installed an instance override.
- new test_apply_compat_with_is_idempotent asserts the guard above.
The previous wrapper unconditionally overwrote the entire `Accept` header
to `application/vnd.elasticsearch+json; compatible-with=<major>` whenever
one was present. That is too aggressive: elasticsearch-py emits
non-JSON `Accept` values for several APIs that still need to flow through
the same transport. Notably:

- `client.cat.help()` sends `Accept: text/plain`.
- All other `client.cat.*` endpoints send `Accept: text/plain,application/json`.
- Search-MVT endpoints send `Accept: application/vnd.mapbox-vector-tile`.

After the previous wrap every one of those calls went on the wire as
plain `application/vnd.elasticsearch+json; compatible-with=<N>`, silently
turning cat responses into JSON for any operator using
`ElasticsearchPythonHook.get_conn()` to call cat APIs.

Mirror upstream's own `mimetype_header_to_compat` instead: only
`application/(json|x-ndjson|vnd.mapbox-vector-tile)` parts of the header
get the `compatible-with=<configured>` suffix, anything else is left
verbatim. The regex also matches the already-rewritten
`application/vnd.elasticsearch+<x>; compatible-with=<N>` form that
elasticsearch-py 9.x ships before the transport sees the request, so the
configured major actually replaces the client default major on the wire
(verified with a Transport spy against elasticsearch-py 9.3.0).

Two adjacent hardenings while we are in here:

- Strip whitespace from the config value and reject anything that is not a
  positive integer string with `AirflowConfigException` at construction
  time, so a typo like `es_compat_with = 'v8'` fails fast in the worker
  startup log instead of returning a 400 storm per request.
- Walk header keys case-insensitively, so a future `elastic_transport`
  that forwards PascalCase `Accept` / `Content-Type` keys cannot silently
  bypass the rewrite.

Tests: add wire-level cases for cat APIs (`text/plain` preserved,
`text/plain,application/...` partial rewrite), PascalCase headers,
whitespace stripping, non-numeric major rejection, and a direct
`conf.get -> None` branch (the existing parametrize folds into the
provider yaml default `""` via `conf_vars`).
@petercheon petercheon marked this pull request as ready for review April 29, 2026 06:01
peter-cheon and others added 7 commits April 29, 2026 15:11
….sdk

The rest of this module already routes airflow imports through the
`common.compat.sdk` shim (`conf` lives there), and the shim explicitly
exports `AirflowConfigException` so the same provider build can target
both Airflow 2 (`airflow.exceptions`) and Airflow 3 (`airflow.sdk.exceptions`).
Switch the new exception import to the same shim so we don't pin to
`airflow.exceptions` and silently break the Airflow 3 import path.
CI on the latest `main` merge surfaced four failures, all mechanical and
fixed in this commit:

1. **Static checks** (ruff / ruff-format / autogen):
   - `_compat.py` — D205 docstring rule wants the summary line on its own
     line, both for the module docstring and for `apply_compat_with`'s
     docstring. Reformatted both.
   - `hooks/elasticsearch.py` — collapsed the multi-line
     `apply_compat_with(Elasticsearch(...))` call into a single line
     (now under the line-length cap thanks to the basic_auth tuple sitting
     inside the existing parens).
   - `tests/.../test__compat.py` — collapsed two over-wrapped expressions
     (`captured.append({...})` in the spy, and the
     `assert wire_capture[-1][...] == "..."` in
     `test_apply_compat_with_strips_whitespace_in_config`).
   - `get_provider_info.py` — the autogenerated mirror of `provider.yaml`
     was missing the new `es_compat_with` config option entry. Added it
     with the same description / version_added / type / example / default
     as the yaml.

2. **MyPy providers** (`Cannot assign to a method [method-assign]`):
   - `transport.perform_request = perform_request` (instance-level
     assignment) is rejected by mypy because elastic_transport's
     `Transport.perform_request` is bound at the class. Switched to
     `setattr(transport, "perform_request", perform_request)`, which
     mypy accepts and which preserves the exact same runtime behaviour
     (the idempotency guard at the top of the function still inspects
     `transport.__dict__["perform_request"]`, so repeat calls remain
     no-ops).

3. **Non-DB tests core** and **Low dep tests core**
   (`test_project_structure.py::test_providers_modules_should_have_tests`):
   - The structure check expects the test file for source `_compat.py`
     (note the leading underscore) to be named `test__compat.py` (two
     underscores: `test_` + `_compat`). Renamed the file from
     `test_compat.py` → `test__compat.py` via `git mv` so the rest of
     git history follows.

Re-validated locally:
- `ruff check` and `ruff format --check` pass on all four files.
- mypy on `_compat.py` no longer reports the `method-assign` error
  (only an unrelated `airflow.__version__` attr-defined error from
  running mypy outside a real Airflow install — Airflow CI runs against
  an installed Airflow so this does not surface there).
- Wire-level regression matrix re-run with elasticsearch-py 9.3.0 and
  the `setattr` variant: cat.help `text/plain` preserved, cat.indices
  partial rewrite preserved, search/bulk Accept and Content-Type
  rewritten to compat=8, idempotency guard still triggers, bad values
  rejected. 7/7 PASS.
@eladkal
Copy link
Copy Markdown
Contributor

eladkal commented May 11, 2026

cc @Subham-KRLX as you added elasticsearch 9 support. Can you review?

@Subham-KRLX
Copy link
Copy Markdown
Contributor

This fixes the ES9 to ES8 regression. Before I can approve please validate that the elasticsearch es_compat_with option only accepts a major version string for example 7 8 or 9 and raise AirflowConfigException on invalid values. Do not construct the Elasticsearch client twice. Create a single instance and pass it to apply_compat_with. Make the transport wrapper accept args and kwargs and decorate it with functools.wraps so it is tolerant of different elastic_transport perform_request signatures. Add a short docs entry for es_compat_with with an airflow.cfg example and note that the transport level rewrite overrides per API header negotiation. Confirm whether a newsfragment is required. Address these and I will re check and approve.

- Refactor apply_compat_with to use functools.wraps + *args, **kwargs so
  the wrapper survives future elastic_transport perform_request signature
  changes (new keyword-only params, reordered positionals) and preserves
  __name__/__doc__/__wrapped__ for introspection.

- Extend the es_compat_with docs entry with explicit valid-value rules
  and a note that the fix is installed at the transport layer and
  therefore overrides elasticsearch-py's per-API-method header
  negotiation (constructor headers= does not work for this purpose).
@petercheon
Copy link
Copy Markdown
Contributor Author

petercheon commented May 12, 2026

This fixes the ES9 to ES8 regression. Before I can approve please validate that the elasticsearch es_compat_with option only accepts a major version string for example 7 8 or 9 and raise AirflowConfigException on invalid values. Do not construct the Elasticsearch client twice. Create a single instance and pass it to apply_compat_with. Make the transport wrapper accept args and kwargs and decorate it with functools.wraps so it is tolerant of different elastic_transport perform_request signatures. Add a short docs entry for es_compat_with with an airflow.cfg example and note that the transport level rewrite overrides per API header negotiation. Confirm whether a newsfragment is required. Address these and I will re check and approve.

Thanks for the review @Subham-KRLX — addressed below.

  1. Major-version-only validation with AirflowConfigException.
    Already enforced. _compat.py validates the option against ^\d+$ after .strip()-ing whitespace and raises AirflowConfigException on any non-numeric value at client construction time, so a misconfiguration like "v8" or "8.0" fails fast in the worker startup log instead of producing a per-request 400 storm. Covered by
    test_apply_compat_with_rejects_non_numeric_major (parametrized over "v8", "8.0", "abc", "8;9") and test_apply_compat_with_strips_whitespace_in_config.

  2. Single Elasticsearch(...) construction.
    I might be misreading your concern here — each call site already constructs the client exactly once and hands that instance to apply_compat_with, which mutates client.transport.perform_request in place and returns the same client. The line you quoted (self.es = apply_compat_with(Elasticsearch(self.url, basic_auth=(user,
    password), **kwargs))) is the current code, not a suggested change. If you spotted a site where two clients are actually being built, could you point me at the file:line? I want to make sure I'm not missing one.

  3. *args, **kwargs + functools.wraps on the transport wrapper. ✅
    Done in the latest push. The inner perform_request is now @functools.wraps(original_perform_request) decorated and accepts *args, **kwargs, with headers pulled via kwargs.get("headers") and put back in kwargs after the rewrite. This way the wrapper is tolerant of future elastic_transport.Transport.perform_request signature
    changes (new keyword-only params, reordered positionals) and inspect / debuggers / wrapped-aware tooling see the original signature. The existing wire-level tests continue to pass because elasticsearch-py always passes headers as a kwarg.

  4. Docs entry with airflow.cfg example and transport-level override note. ✅
    Extended providers/elasticsearch/docs/logging/index.rst (the existing Pinning the compatible-with content-negotiation level subsection) with:

    • explicit valid-value rules (positive integer major; non-numeric → AirflowConfigException)
    • a .. note:: calling out that the fix is installed at the transport layer and therefore overrides elasticsearch-py's per-API-method Accept / Content-Type negotiation — and that constructor headers= on Elasticsearch.init and the elasticsearch_configs section do not work for this purpose, since elasticsearch-py re-applies
      its own compatible-with=<client_major> right before the request goes out.

The airflow.cfg example was already there ([elasticsearch] es_compat_with = 8).

Ready for another look when you have a moment.

Used in providers/elasticsearch/docs/logging/index.rst to describe the
fail-fast behavior when [elasticsearch] es_compat_with is set to an
invalid value. The wordlist already contained 'misconfigured' but the
noun form was missing, causing the --spellcheck-only docs build to fail.
@Subham-KRLX
Copy link
Copy Markdown
Contributor

@eladkal I reviewed the changes. es_compat_with accepts only numeric majors and raises AirflowConfigException. The transport wrapper uses functools.wraps accepts args and kwargs rewrites headers idempotently and the client is constructed once and wrapped in place. Tests check wire headers and docs were updated. Please confirm there are no other sites constructing two clients and I will approve.

@potiuk potiuk added the ready for maintainer review Set after triaging when all criteria pass. label May 18, 2026
@eladkal
Copy link
Copy Markdown
Contributor

eladkal commented May 19, 2026

@Subham-KRLX since you created the original PR I defer to your judgment and confirmation that this PR fix the issue

@Subham-KRLX
Copy link
Copy Markdown
Contributor

@Subham-KRLX since you created the original PR I defer to your judgment and confirmation that this PR fix the issue

I checked the codebase and audited all client construction sites in the provider there are only four sites that instantiate the client and each does it exactly once before wrapping it in place so there is no double construction the fix is clean and works at the transport layer to bypass the client limitation and all tests pass. This definitely fixes the regression so we are good to go.

@eladkal eladkal merged commit fdbb9b0 into apache:main May 19, 2026
104 checks passed
@boring-cyborg
Copy link
Copy Markdown

boring-cyborg Bot commented May 19, 2026

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants