Lazy load Elasticsearch hook client import#67519
Conversation
dwreeves
left a comment
There was a problem hiding this comment.
Seems good to me. Thanks!
|
Thanks for the review, @dwreeves. All checks are passing now. This is ready for maintainer review when someone with write access has time. |
|
I addressed the review comments. The update is pushed in |
|
Ooh, one other thing. Did you run the performance tests that @dwreeves ran? |
|
Not yet. I only ran the provider unit tests and local static checks so far. I focused first on keeping this PR small and matching the lazy-load pattern described in the issue: move the runtime SDK import out of module import time, keep it under I can run the same profiling script / before-and-after import benchmark that @dwreeves used and add the numbers to the PR description if that is expected for these provider-specific lazy-loading PRs. |
|
@Codingaditya17, that would be great, please include the results here. |
|
Sure, I ran the provider footprint benchmark locally using the script from the gist linked in #67515. Command used on both before and after:
Before this PR, with the module-level Elasticsearch client import:
After this PR, with the Elasticsearch client import lazy-loaded:
So on my machine, this reduced the provider import delta by about 235ms, reduced memory delta by about 56.5MB, and reduced provider-loaded modules by 632. I restored the branch after benchmarking, so there are no benchmark/script changes included in the PR. |
|
Hi @Codingaditya17, thank you for the focused work here -- the engineering on this PR is clean. I'm closing this along with the other PRs targeting #67515 after thinking through the actual savings more carefully. Full reasoning here: #67515 (comment) Short version: an Operator imports a Hook which imports the SDK, so moving the SDK import inside a method mostly shifts when the import happens rather than saving it -- workers running the task pay the same cost. It also loses the parse-time Genuinely sorry for the wasted cycles -- on us for not flagging the strategy concern before contributors started shipping. Two areas that are worth the work if you want something else to pick up:
Thanks again. |
Why
This is a focused provider-specific PR for the broader provider lazy-loading effort discussed in the issue. The Elasticsearch provider currently imports the Elasticsearch Python client at module import time in
hooks/elasticsearch.py. That means importing the provider hook also imports the externalelasticsearchclient package immediately, even when a DAG only needs to parse the hook module and does not actually create an Elasticsearch client. This PR makes that import lazy so the Elasticsearch client package is imported only when a connection/client is actually created.What changed
Removed the module-level
from elasticsearch import Elasticsearchimport from the Elasticsearch hook module, added a type-only import underTYPE_CHECKING, and moved the runtime import into the places where the client is actually created:ESConnection.__init__andElasticsearchPythonHook._get_elastic_connection. Also updated the affected unit test mock to patchelasticsearch.Elasticsearchdirectly now that the provider module no longer exposesElasticsearchas a module-level import.Scope
This PR only targets the Elasticsearch provider to keep the change small and reviewable. If this direction looks good, similar provider-specific lazy-loading PRs can be opened separately for other providers.
Tests
Ran
uv run pytest providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.py -qand got26 passed, 1 warning.Ran
uv run pytest providers/elasticsearch/tests/unit/elasticsearch -qand got131 passed, 1 warning.Ran
uv run prek run --files providers/elasticsearch/src/airflow/providers/elasticsearch/hooks/elasticsearch.py providers/elasticsearch/tests/unit/elasticsearch/hooks/test_elasticsearch.pyand all checks passed.