
fix: Honor ROW_LIMIT setting in elasticsearch queries#15082

Closed
sowo wants to merge 7 commits into apache:1.2 from sowo:1.2

@sowo
Contributor

@sowo sowo commented Jun 10, 2021

SUMMARY

In order to adjust the fetch size of the elasticsearch-dbapi (https://github.com/preset-io/elasticsearch-dbapi#fetch-size) to the ROW_LIMIT setting, the fetch_size attribute of the cursor needs to be set.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

  • set ROW_LIMIT=20000 in config.py
  • set the row limit to 50000 in a chart that uses an Elasticsearch dataset
  • run the query on a sufficiently large dataset
  • observe that 20k rows (the ROW_LIMIT value) are displayed in the chart
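The expected result of these steps can be modeled as a chain of caps. This is a sketch under two assumptions taken from this thread: Superset caps the chart's row limit at ROW_LIMIT, and the elasticsearch-dbapi cursor returns at most fetch_size rows (10000 by default, per the discussion below). The function rows_displayed is hypothetical, not Superset code.

```python
# Hypothetical model of how many rows reach the chart.
# Assumptions: ROW_LIMIT caps the chart's own limit, and the cursor
# returns at most fetch_size rows per query.

DEFAULT_FETCH_SIZE = 10_000  # default cited later in this thread


def rows_displayed(chart_limit: int, row_limit: int, fetch_size: int, available: int) -> int:
    """Return the number of rows that end up in the chart."""
    requested = min(chart_limit, row_limit)        # ROW_LIMIT caps the chart setting
    return min(requested, fetch_size, available)   # cursor and data size cap it again


ROW_LIMIT = 20_000    # config.py value from the testing instructions
CHART_LIMIT = 50_000  # row limit set on the chart
AVAILABLE = 300_000   # illustrative "sufficiently large" dataset

# Before the change: the cursor keeps its default fetch_size.
before = rows_displayed(CHART_LIMIT, ROW_LIMIT, DEFAULT_FETCH_SIZE, AVAILABLE)
# After the change: fetch_size follows ROW_LIMIT.
after = rows_displayed(CHART_LIMIT, ROW_LIMIT, ROW_LIMIT, AVAILABLE)
print(before, after)  # 10000 20000
```

Under these assumptions, the default fetch_size, not ROW_LIMIT, is the binding limit before the change.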

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

sowo added 2 commits June 10, 2021 10:44
Honor ROW_LIMIT setting in elasticsearch queries
@dpgaspar dpgaspar self-requested a review June 10, 2021 12:08
@codecov

codecov Bot commented Jun 10, 2021

Codecov Report

Merging #15082 (a330b66) into 1.2 (387d933) will increase coverage by 0.04%.
The diff coverage is 68.86%.

❗ Current head a330b66 differs from the pull request's most recent head 374f942. Consider uploading reports for commit 374f942 to get more accurate results.

@@            Coverage Diff             @@
##              1.2   #15082      +/-   ##
==========================================
+ Coverage   77.20%   77.24%   +0.04%     
==========================================
  Files         956      973      +17     
  Lines       48164    50523    +2359     
  Branches     6006     6184     +178     
==========================================
+ Hits        37183    39025    +1842     
- Misses      10776    11292     +516     
- Partials      205      206       +1     
Flag Coverage Δ
hive 81.43% <ø> (+0.66%) ⬆️
javascript 71.76% <68.86%> (-0.56%) ⬇️
mysql 81.70% <ø> (+0.67%) ⬆️
postgres 81.72% <ø> (+0.66%) ⬆️
presto 81.42% <ø> (?)
python 82.25% <ø> (+0.81%) ⬆️
sqlite 81.36% <ø> (+0.68%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
superset-frontend/src/CRUD/Fieldset.jsx 85.71% <ø> (ø)
superset-frontend/src/SqlLab/App.jsx 0.00% <0.00%> (ø)
superset-frontend/src/SqlLab/actions/sqlLab.js 58.97% <ø> (ø)
...-frontend/src/SqlLab/components/SouthPane/state.ts 100.00% <ø> (ø)
superset-frontend/src/SqlLab/reducers/sqlLab.js 34.95% <ø> (ø)
superset-frontend/src/chart/Chart.jsx 50.00% <ø> (ø)
superset-frontend/src/chart/ChartContainer.jsx 100.00% <ø> (ø)
...ontend/src/common/hooks/apiResources/dashboards.ts 40.00% <0.00%> (-10.00%) ⬇️
superset-frontend/src/components/Button/index.tsx 100.00% <ø> (ø)
...c/components/ErrorMessage/DatabaseErrorMessage.tsx 94.73% <ø> (ø)
... and 402 more


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 387d933...374f942.
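The coverage figures in the Coverage Diff block can be recomputed from its own Hits, Misses and Partials rows. This sketch assumes Codecov's ratio is hits / (hits + misses + partials); the rows do sum to the reported Lines totals (48164 and 50523).

```python
# Recompute the Codecov summary from the Hits/Misses/Partials rows.
# Assumption: coverage = hits / (hits + misses + partials).

def coverage(hits: int, misses: int, partials: int) -> float:
    """Return line coverage as a percentage."""
    lines = hits + misses + partials
    return 100 * hits / lines


base = coverage(37183, 10776, 205)  # 1.2 branch
head = coverage(39025, 11292, 206)  # PR #15082
print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f}%)")
# 77.20% -> 77.24% (+0.04%)
```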

@sowo sowo changed the title Honor ROW_LIMIT setting in elasticsearch queries fix: Honor ROW_LIMIT setting in elasticsearch queries Jun 10, 2021
Comment thread superset/models/core.py Outdated

with closing(engine.raw_connection()) as conn:
    cursor = conn.cursor()
    if hasattr(cursor, 'fetch_size'):
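A runnable sketch of the pattern in this hunk, assuming (as the PR summary describes) that the truncated if branch assigns ROW_LIMIT to the cursor's fetch_size. FakeCursor, PlainCursor, FakeConnection and open_cursor are hypothetical stand-ins, not Superset or elasticsearch-dbapi code.

```python
from contextlib import closing


class FakeCursor:
    """Stand-in for a driver cursor that exposes fetch_size (assumed default)."""

    def __init__(self) -> None:
        self.fetch_size = 10_000


class PlainCursor:
    """Stand-in for a driver cursor with no fetch_size attribute."""


class FakeConnection:
    """Stand-in for engine.raw_connection(); closing() only needs close()."""

    def __init__(self, cursor_cls: type = FakeCursor) -> None:
        self.cursor_cls = cursor_cls

    def cursor(self):
        return self.cursor_cls()

    def close(self) -> None:
        pass


ROW_LIMIT = 20_000  # stand-in for the ROW_LIMIT config value


def open_cursor(conn):
    cursor = conn.cursor()
    # Driver-specific probing inside engine-agnostic code: the coupling
    # questioned in the review below.
    if hasattr(cursor, "fetch_size"):
        cursor.fetch_size = ROW_LIMIT
    return cursor


with closing(FakeConnection()) as conn:
    print(open_cursor(conn).fetch_size)  # 20000

with closing(FakeConnection(PlainCursor)) as conn:
    print(hasattr(open_cursor(conn), "fetch_size"))  # False
```

The hasattr check keeps other drivers unaffected, but every engine now passes through Elasticsearch-specific logic.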
Member

@dpgaspar dpgaspar Jun 16, 2021


Adding an engine-specific configuration to engine-agnostic code is not the best path IMO. Also, overriding a database connection behind the scenes does not seem like the right way to go.

Can you add more context and a real-world example of why this is needed? Thank you!

Contributor Author


I agree that this should be better handled in engine-specific code, but I could not identify any method where this could be done. I would gladly accept advice on how to improve this.

As for the context, I need to use the DeckGL chart to display a large set (up to several hundred thousand) of earthquakes along with other key metrics on a dashboard. When removing the limit in the chart config (or when setting the limit to 50000), I still fetch only 10000 events due to the default fetch_size set in elasticsearch-dbapi.

Thank you for looking into this!

Contributor Author


I reverted the original change in models/core.py and am now setting fetch_size of the Elastic cursor in the elasticsearch db_engine_spec. Is that the proper implementation?
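One way to keep the driver-specific setting out of engine-agnostic code is for the Elasticsearch spec to override a hook that receives the cursor before the query runs. The sketch below assumes such a hook named execute(cursor, query); BaseSpecSketch, ElasticsearchSpecSketch and RecordingCursor are hypothetical stand-ins, not Superset's actual engine spec classes or the code in this PR.

```python
from typing import Any, List


class BaseSpecSketch:
    """Hypothetical engine-agnostic base: just runs the query."""

    @classmethod
    def execute(cls, cursor: Any, query: str) -> None:
        cursor.execute(query)


class ElasticsearchSpecSketch(BaseSpecSketch):
    """Hypothetical Elasticsearch spec that owns the fetch_size tweak."""

    row_limit = 20_000  # assumption: mirrors the ROW_LIMIT setting

    @classmethod
    def execute(cls, cursor: Any, query: str) -> None:
        cursor.fetch_size = cls.row_limit  # driver-specific knob stays here
        super().execute(cursor, query)


class RecordingCursor:
    """Stand-in cursor that records executed queries."""

    def __init__(self) -> None:
        self.fetch_size = 10_000  # assumed driver default
        self.executed: List[str] = []

    def execute(self, query: str) -> None:
        self.executed.append(query)


cur = RecordingCursor()
ElasticsearchSpecSketch.execute(cur, "SELECT * FROM earthquakes")
print(cur.fetch_size, cur.executed)  # 20000 ['SELECT * FROM earthquakes']
```

Other engines keep calling the base hook, so their cursors are never touched.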

Member


Wouldn't it make more sense to just set your database connection to whatever top value you need?
I mean, why not just configure the Elasticsearch connection with a fetch_size matching ROW_LIMIT?

Contributor Author


According to the elasticsearch-dbapi docs (https://github.com/preset-io/elasticsearch-dbapi#fetch-size), fetch_size is set at the cursor level, not when initiating a connection. As far as I understand the code (see the line "cursor = conn.cursor()"), there's no method I could override to create a cursor with engine-specific settings.
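Why a connection-level setting does not reach the cursor can be illustrated with stand-ins. This assumes, per the author's reading of the elasticsearch-dbapi docs, that fetch_size is a per-cursor attribute with a 10000 default, and that the shared code path calls conn.cursor() with no arguments. SketchCursor and SketchConnection are hypothetical.

```python
class SketchCursor:
    """Stand-in cursor: fetch_size lives here, not on the connection."""

    def __init__(self, fetch_size: int = 10_000) -> None:
        self.fetch_size = fetch_size  # assumed driver default


class SketchConnection:
    """Stand-in connection: cursor() takes no engine-specific settings."""

    def cursor(self) -> SketchCursor:
        return SketchCursor()


conn = SketchConnection()
first = conn.cursor()
first.fetch_size = 20_000  # configure one cursor...
second = conn.cursor()     # ...a new cursor starts from the default again
print(first.fetch_size, second.fetch_size)  # 20000 10000
```

Each cursor has to be configured where it is created, which is why the setting cannot simply live on the connection.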

@sowo sowo requested a review from dpgaspar June 22, 2021 04:26
@stale

stale Bot commented Apr 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale Bot added the inactive Inactive for >= 30 days label Apr 16, 2022
@stale stale Bot closed this Apr 27, 2022

Labels

inactive Inactive for >= 30 days size/XS

2 participants