Lazy load Elasticsearch hook client import #67519

Closed
Codingaditya17 wants to merge 2 commits into apache:main from Codingaditya17:lazy-load-elasticsearch-provider

Conversation

@Codingaditya17
Contributor

Why

This is a focused provider-specific PR for the broader provider lazy-loading effort discussed in the issue. The Elasticsearch provider currently imports the Elasticsearch Python client at module import time in hooks/elasticsearch.py. That means importing the provider hook also imports the external elasticsearch client package immediately, even when a DAG only needs to parse the hook module and does not actually create an Elasticsearch client. This PR makes that import lazy so the Elasticsearch client package is imported only when a connection/client is actually created.

What changed

Removed the module-level from elasticsearch import Elasticsearch import from the Elasticsearch hook module, added a type-only import under TYPE_CHECKING, and moved the runtime import into the places where the client is actually created: ESConnection.__init__ and ElasticsearchPythonHook._get_elastic_connection. Also updated the affected unit test mock to patch elasticsearch.Elasticsearch directly now that the provider module no longer exposes Elasticsearch as a module-level import.

Scope

This PR only targets the Elasticsearch provider to keep the change small and reviewable. If this direction looks good, similar provider-specific lazy-loading PRs can be opened separately for other providers.

Tests

Ran uv run pytest providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py -q and got 26 passed, 1 warning.

Ran uv run pytest providers/elasticsearch/tests/unit/elasticsearch -q and got 131 passed, 1 warning.

Ran uv run prek run --files providers/elasticsearch/src/airflow/providers/elasticsearch/hooks/elasticsearch.py providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py and all checks passed.

Contributor

@dwreeves dwreeves left a comment

Seems good to me. Thanks!

@Codingaditya17
Contributor Author

Thanks for the review, @dwreeves.

All checks are passing now. This is ready for maintainer review when someone with write access has time.

@eladkal eladkal requested a review from jroachgolf84 May 26, 2026 12:53
@Codingaditya17
Contributor Author

I addressed the review comments.

The update is pushed in 839b78a.

@jroachgolf84
Collaborator

Ooh, one other thing. Did you run the performance tests that @dwreeves ran?

@Codingaditya17
Contributor Author

Not yet. I only ran the provider unit tests and local static checks so far.

I focused first on keeping this PR small and matching the lazy-load pattern described in the issue: move the runtime SDK import out of module import time, keep it under TYPE_CHECKING for typing, and update the affected mock.

I can run the same profiling script / before-and-after import benchmark that @dwreeves used and add the numbers to the PR description if that is expected for these provider-specific lazy-loading PRs.

@jroachgolf84
Collaborator

@Codingaditya17, that would be great, please include the results here.

@Codingaditya17
Contributor Author

Sure, I ran the provider footprint benchmark locally using the script from the gist linked in #67515.

Command used on both before and after:

uv run --project airflow-core python dev/analyze_provider_footprints.py -p elasticsearch --runs 5 --markdown

Before this PR, with the module-level Elasticsearch client import:

| Provider | Time Delta | Memory Delta | Modules Loaded | Top 3rd-Party Packages |
|---|---|---|---|---|
| elasticsearch | 254ms (+23%) | +58.8MB (+64%) | 685 (1 tested) | requests, elasticsearch, numpy, chardet (+18) |

After this PR, with the Elasticsearch client import lazy-loaded:

| Provider | Time Delta | Memory Delta | Modules Loaded | Top 3rd-Party Packages |
|---|---|---|---|---|
| elasticsearch | 19ms (+2%) | +2.3MB (+2%) | 53 (1 tested) | sqlparse, wrapt, deprecated, more_itertools |

So on my machine, this reduced the provider import delta by about 235ms, reduced memory delta by about 56.5MB, and reduced provider-loaded modules by 632.

I restored the branch after benchmarking, so there are no benchmark/script changes included in the PR.
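
For readers who want a rough feel for this kind of measurement, here is a minimal sketch. It is not the gist script and makes no claim to match its methodology: `measure_import` is a hypothetical helper that records wall-clock time, traced memory, and newly loaded modules around a single `importlib.import_module` call, using only the standard library.

```python
import importlib
import sys
import time
import tracemalloc


def measure_import(module_name: str) -> dict:
    """Measure the cost of importing one module in the current process.

    Hypothetical helper for illustration. Numbers are only meaningful for a
    module that has not been imported yet, so run it in a fresh interpreter.
    """
    modules_before = set(sys.modules)
    tracemalloc.start()
    start = time.perf_counter()

    importlib.import_module(module_name)

    elapsed_ms = (time.perf_counter() - start) * 1000
    # Memory still allocated since tracing started, in bytes.
    current_bytes, _peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    new_modules = set(sys.modules) - modules_before
    return {
        "module": module_name,
        "elapsed_ms": elapsed_ms,
        "memory_mb": current_bytes / (1024 * 1024),
        "modules_loaded": len(new_modules),
    }


if __name__ == "__main__":
    # Stdlib module used as a stand-in for a provider hook module.
    print(measure_import("json"))
```

Comparing the output for a hook module before and after a lazy-import change gives deltas of the same kind as those reported above, though absolute values will differ by machine and environment.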

@kaxil
Member

kaxil commented May 26, 2026

Hi @Codingaditya17, thank you for the focused work here -- the engineering on this PR is clean.

I'm closing this along with the other PRs targeting #67515 after thinking through the actual savings more carefully. Full reasoning here: #67515 (comment)

Short version: an Operator imports a Hook which imports the SDK, so moving the SDK import inside a method mostly shifts when the import happens rather than saving it -- workers running the task pay the same cost. It also loses the parse-time import_error surface that today reports install/version issues to users. PEP 810 (lazy imports, Python 3.15) is the cleaner path and the polyfill works alongside it without source-level refactors.
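
The fail-fast point can be illustrated with a small sketch. The module sources and the name `hypothetical_missing_sdk` are made up for the example (the package is assumed not to be installed), and `exec` on a fresh module object stands in for a real import of a hook file.

```python
import types

MISSING = "hypothetical_missing_sdk"  # assumed not installed anywhere

# Module-level import: the failure surfaces as soon as the module is loaded.
EAGER_SRC = f"""
import {MISSING}

def run():
    return {MISSING}.__name__
"""

# Deferred import: loading succeeds, the failure surfaces only inside run().
LAZY_SRC = f"""
def run():
    import {MISSING}
    return {MISSING}.__name__
"""


def load(name: str, source: str) -> types.ModuleType:
    """Execute source text as a fresh module, mimicking an import."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


def failure_stage(name: str, source: str) -> str:
    """Report whether the missing dependency is noticed at import or at call."""
    try:
        module = load(name, source)
    except ImportError:
        return "import time"
    try:
        module.run()
    except ImportError:
        return "call time"
    return "no failure"


print(failure_stage("eager_hook", EAGER_SRC))  # import time
print(failure_stage("lazy_hook", LAZY_SRC))    # call time
```

With the eager form, a missing or broken dependency is reported when the file is parsed; with the lazy form, it stays hidden until a task actually calls the method.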

Genuinely sorry for the wasted cycles -- on us for not flagging the strategy concern before contributors started shipping.

Two areas that are worth the work if you want something else to pick up:

  1. TYPE_CHECKING guards on type-only imports across providers, where the module/package import hasn't already been done.
  2. Lazy-loading conditionally-used heavy deps (e.g. pandas_gbq inside BigQueryHook.get_pandas_df rather than at module level) -- the fail-fast loss is scoped to methods users may never call.

Thanks again.

@kaxil kaxil closed this May 26, 2026
@Codingaditya17 Codingaditya17 deleted the lazy-load-elasticsearch-provider branch May 26, 2026 20:29