ElasticsearchSQLHook: add Polars DataFrame support via custom SQL reader #66220
SameerMesiah97 wants to merge 1 commit into
justinpakzad left a comment
Should polars be added as an optional dependency in the pyproject.toml? Currently it's only in the dev dependencies, so I think this would fail with an import error if a user passes df_type="polars". Might be worth adding a guarded import with an error message pointing users to the right install extra.
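The guarded import suggested above could look roughly like the minimal sketch below. It is not the provider's actual code. The helper name `import_optional` and any extra name you pass to it (for example `apache-airflow-providers-elasticsearch[polars]`) are hypothetical placeholders.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, pointing users at an install extra on failure.

    Hypothetical helper: `extra` is whatever pip requirement string the
    package actually publishes for the optional feature.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package lands here and gets re-raised with an actionable message.
        raise ImportError(
            f"{module_name!r} is required for this feature. "
            f"Install it with: pip install '{extra}'"
        ) from err
```

Calling this lazily inside the `df_type="polars"` code path keeps Polars optional for users who only need pandas output.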
…Hook. Use cursor-based pagination and add unit tests for pagination, max_rows handling, and cursor cleanup.
452e963 to 07a45b5
polars has been added as an optional dependency in
cc @guan404ming for review as the one who added polars support for most of the providers
To be specific
Would be interested in your thoughts on this issue as well:
Just revisiting this now and I noticed that the base
Were you able to get
Good shout. I didn't actually test the
Actually, that was a false alarm. The root cause was a version mismatch between the Elasticsearch client used by Airflow and the cluster.
Requesting review for this.
Description
This PR is a follow-up to #50454. It adds support for returning query results as a Polars DataFrame in `ElasticsearchSQLHook` by implementing `_get_polars_df`. Instead of relying on `polars.read_database`, which requires DB-API compatibility, this implementation introduces a custom reader that executes Elasticsearch SQL queries using cursor-based pagination and converts the results into a Polars DataFrame.
Rationale
`ElasticsearchSQLCursor` is not compatible with the execution model expected by `polars.read_database`, which prevents native Polars support via the existing DB-API abstraction. This change resolves the outstanding TODO identified in #50454 by implementing a dedicated reader that interacts directly with the Elasticsearch SQL API. This avoids the need to adapt the custom cursor to Polars’ internal executor interface, which would introduce unnecessary complexity and tighter coupling.
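The cursor-based approach described above can be illustrated with the minimal sketch below. It is not the PR's implementation. It assumes a client object with hypothetical `query(...)` and `clear_cursor(...)` methods, modelled loosely on the Elasticsearch SQL API's paging. In that API, the first response carries `columns`, `rows` and, if more data remains, a `cursor`. Later pages are fetched by sending that cursor back. The names `SqlClient` and `read_sql_columns` are hypothetical.

```python
from __future__ import annotations

from typing import Any, Optional, Protocol


class SqlClient(Protocol):
    """Hypothetical minimal client interface (assumption, not a real library type)."""

    def query(self, *, query: Optional[str] = None, cursor: Optional[str] = None,
              fetch_size: Optional[int] = None) -> dict[str, Any]: ...

    def clear_cursor(self, *, cursor: str) -> Any: ...


def read_sql_columns(client: SqlClient, query: str, fetch_size: int = 1000,
                     max_rows: Optional[int] = None) -> dict[str, list[Any]]:
    """Page through an SQL result set and return {column_name: values}."""
    response = client.query(query=query, fetch_size=fetch_size)
    # Only the first page carries column metadata.
    names = [col["name"] for col in response["columns"]]
    rows: list[list[Any]] = []
    cursor = response.get("cursor")
    try:
        while True:
            rows.extend(response.get("rows", []))
            if max_rows is not None and len(rows) >= max_rows:
                rows = rows[:max_rows]  # truncate to the requested row cap
                break
            if not cursor:
                break  # last page reached; no cursor left to close
            response = client.query(cursor=cursor)
            cursor = response.get("cursor")
    finally:
        # If we stopped early (max_rows hit or an error), release the open cursor.
        if cursor:
            client.clear_cursor(cursor=cursor)
    # Transpose row-oriented pages into the column dict shape pl.DataFrame accepts.
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}
```

With Polars installed, something like `pl.DataFrame(read_sql_columns(client, "SELECT ..."))` would build the frame. Dtypes are left to Polars' inference here. A fuller implementation might map the `type` entries in the first page's `columns` metadata to an explicit schema.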
Tests
Added unit tests verifying that:
- … `max_rows` truncation.
- … `df_type="polars"` is requested.
Documentation
Updated the `_get_polars_df` docstring to describe the use of a custom reader and explain why `polars.read_database` is not used.
Backwards Compatibility
This change adds support for `df_type="polars"` in `ElasticsearchSQLHook` and does not modify existing behavior for other DataFrame types. No breaking changes are introduced.