fix: Honor ROW_LIMIT setting in elasticsearch queries #15082
To align the fetch size of elasticsearch-dbapi (https://github.com/preset-io/elasticsearch-dbapi#fetch-size) with the ROW_LIMIT setting, the fetch_size attribute of the cursor needs to be set.
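The idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `StubCursor` and `apply_row_limit` are hypothetical names, the cursor is a stand-in rather than a real elasticsearch-dbapi cursor, and the `ROW_LIMIT` value is assumed for the example.

```python
from contextlib import closing

ROW_LIMIT = 50_000  # assumed value, for illustration only


class StubCursor:
    """Hypothetical stand-in for a DB-API cursor that exposes fetch_size."""

    def __init__(self, fetch_size: int = 10_000) -> None:
        self.fetch_size = fetch_size

    def close(self) -> None:
        # Nothing to release in this stub; present so closing() works.
        pass


def apply_row_limit(cursor, row_limit: int) -> None:
    """Set fetch_size only on cursors that actually expose the attribute."""
    if hasattr(cursor, "fetch_size"):
        cursor.fetch_size = row_limit


with closing(StubCursor()) as cursor:
    apply_row_limit(cursor, ROW_LIMIT)
    print(cursor.fetch_size)
```

The `hasattr` guard leaves cursors from other drivers untouched, since they carry no `fetch_size` attribute.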
Codecov Report
@@ Coverage Diff @@
## 1.2 #15082 +/- ##
==========================================
+ Coverage 77.20% 77.24% +0.04%
==========================================
Files 956 973 +17
Lines 48164 50523 +2359
Branches 6006 6184 +178
==========================================
+ Hits 37183 39025 +1842
- Misses 10776 11292 +516
- Partials 205 206 +1
with closing(engine.raw_connection()) as conn:
    cursor = conn.cursor()
    if hasattr(cursor, 'fetch_size'):
Adding an engine-specific configuration to engine-agnostic code is not the best path IMO. Also, overriding a database connection behind the scenes does not seem like the right way to go.
Can you add more context and a real-world example of why this is needed? Thank you!
I agree that this would be better handled in engine-specific code, but I could not identify any method where this could be done. I would gladly accept advice on how to improve this.
As for the context, I need to use the DeckGL chart to display a large set of earthquakes (up to several hundred thousand) along with other key metrics on a dashboard. When I remove the limit in the chart config (or set the limit to 50000), I still fetch only 10000 events because of the default fetch_size set in elasticsearch-dbapi.
Thank you for looking into this!
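The truncation described above can be reproduced with a toy model. This is only a sketch under stated assumptions: `CappedCursor` is a hypothetical stand-in that returns at most `fetch_size` rows, and the 10000 default is taken from the comment above rather than from the driver's code.

```python
class CappedCursor:
    """Hypothetical cursor that never returns more than fetch_size rows."""

    def __init__(self, total_rows: int, fetch_size: int = 10_000) -> None:
        self.total_rows = total_rows  # rows matching the query
        self.fetch_size = fetch_size  # upper bound on rows returned

    def fetchall(self) -> list:
        # Return row indices as placeholder rows, capped at fetch_size.
        return list(range(min(self.total_rows, self.fetch_size)))


cursor = CappedCursor(total_rows=300_000)
default_count = len(cursor.fetchall())  # capped by the default fetch_size

cursor.fetch_size = 50_000  # raise the cap to the chart's row limit
raised_count = len(cursor.fetchall())

print(default_count, raised_count)
```

In this model, changing the chart limit alone has no effect until the cursor's cap is raised as well, which matches the behaviour reported here.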
I reverted the original change in models/core.py and am now setting fetch_size of the Elastic cursor in the elasticsearch db_engine_spec. Is that the proper implementation?
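One way to picture moving the override into engine-specific code is sketched below. The class names, the `execute` hook, and the `ROW_LIMIT` constant are all hypothetical stand-ins chosen for illustration; they are not confirmed to match the db_engine_spec interface or the actual change in this PR.

```python
ROW_LIMIT = 50_000  # assumed value, for illustration only


class RecordingCursor:
    """Hypothetical cursor that records executed queries."""

    fetch_size = 10_000  # default cap, as reported in this thread

    def __init__(self) -> None:
        self.executed = []

    def execute(self, query: str) -> None:
        self.executed.append(query)


class GenericSpec:
    """Hypothetical engine-agnostic spec: runs the query unchanged."""

    @classmethod
    def execute(cls, cursor, query: str) -> None:
        cursor.execute(query)


class ElasticSpec(GenericSpec):
    """Hypothetical Elasticsearch spec: raises fetch_size before running."""

    @classmethod
    def execute(cls, cursor, query: str) -> None:
        cursor.fetch_size = ROW_LIMIT  # engine-specific override lives here
        super().execute(cursor, query)


generic_cursor = RecordingCursor()
GenericSpec.execute(generic_cursor, "SELECT 1")

elastic_cursor = RecordingCursor()
ElasticSpec.execute(elastic_cursor, "SELECT 1")

print(generic_cursor.fetch_size, elastic_cursor.fetch_size)
```

Keeping the override in the subclass means the shared code path never needs to know that `fetch_size` exists, which is the concern raised earlier in the review.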
Wouldn't it make more sense to just set your database connection to whatever top value you need?
I mean, why not just configure the elasticsearch connection with the fetch_size for ROW_LIMIT?
According to the elasticsearch-dbapi docs (https://github.com/preset-io/elasticsearch-dbapi#fetch-size), the fetch_size is set at the cursor level and not when initiating a connection. As far as I understand the code (see
superset/superset/models/core.py
Line 404 in a330b66
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue