Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: stop query in SQL Lab with impala engine (#20950) #22441

Closed
wants to merge 0 commits into from
Closed

fix: stop query in SQL Lab with impala engine (#20950) #22441

wants to merge 0 commits into from

Conversation

wanghong1314
Copy link
Contributor

fix(sqllab): Stop button for queries doesn't work in SQL Lab when using SQL Lab with impala engine and adding Progress Information
Fix the bug:#20950

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

BEFORE
image

AFTER
image

progress info
image

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@wanghong1314 wanghong1314 marked this pull request as draft December 16, 2022 15:01
@wanghong1314 wanghong1314 marked this pull request as ready for review December 16, 2022 15:04
@wanghong1314
Copy link
Contributor Author

@rusackas @eschutho @yousoph Have some time to help check whether it fits the solution

@codecov
Copy link

codecov bot commented Dec 16, 2022

Codecov Report

Merging #22441 (55fbb6d) into master (b6d39d1) will decrease coverage by 0.05%.
The diff coverage is 25.71%.

@@            Coverage Diff             @@
##           master   #22441      +/-   ##
==========================================
- Coverage   66.91%   66.85%   -0.06%     
==========================================
  Files        1851     1850       -1     
  Lines       70715    70768      +53     
  Branches     7766     7750      -16     
==========================================
- Hits        47320    47315       -5     
- Misses      21373    21437      +64     
+ Partials     2022     2016       -6     
Flag Coverage Δ
hive 52.40% <25.71%> (-0.06%) ⬇️
mysql ?
postgres 77.90% <25.71%> (-0.13%) ⬇️
presto 52.30% <25.71%> (-0.06%) ⬇️
python 81.11% <25.71%> (-0.16%) ⬇️
sqlite 76.39% <25.71%> (-0.11%) ⬇️
unit 51.13% <25.71%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset-frontend/src/SqlLab/actions/sqlLab.js 63.47% <ø> (-0.10%) ⬇️
superset/views/core.py 74.69% <0.00%> (-0.29%) ⬇️
superset/db_engine_specs/impala.py 40.21% <25.00%> (-43.12%) ⬇️
superset/config.py 91.51% <100.00%> (+0.02%) ⬆️
superset/common/utils/dataframe_utils.py 90.47% <0.00%> (-4.77%) ⬇️
superset/db_engine_specs/mysql.py 94.04% <0.00%> (-4.77%) ⬇️
superset/reports/commands/log_prune.py 85.71% <0.00%> (-3.58%) ⬇️
...a/dataset/AddDataset/DatasetPanel/DatasetPanel.tsx 87.23% <0.00%> (-2.97%) ⬇️
...frontend/src/views/CRUD/welcome/DashboardTable.tsx 54.09% <0.00%> (-1.68%) ⬇️
superset/commands/importers/v1/utils.py 92.20% <0.00%> (-1.30%) ⬇️
... and 30 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@@ -229,6 +229,9 @@ export function startQuery(query) {

export function querySuccess(query, results) {
return function (dispatch) {
if (!results.query) {
Copy link
Member

@eschutho eschutho Dec 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this line here to prevent an error on line 239? If so, I think it's still useful to store the query in line 245 and just use optional chaining in case results.query doesn't exist. results?.query?.sqlEditorId

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah ,to prevent query.sqlEditorId is undefined, I have tested your method, it is OK, thank you for your suggestion, I need to submit a PR again?
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code has been changed, please kindly review it again

) -> None: # pylint: disable=arguments-differ
# kwargs = {"async": async_}
try:
cursor.execute_async(query)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the intention here to run execute_async if _async=True is passed in? If so, it looks like we should still keep the option to run this query synchronously.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. since the impala engine does not support async with execute, it is possible that my comments are ambiguous
  2. because the impala engine is asynchronous, the query status can be obtained

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you run execute if synchronous and execute_async only if _async is True? That way we can still run synchronously as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you can't get the status of the query synchronously, you can't really stop the query, and the code is executed asynchronously by default image

# Refresh session so that the `query.status` and `query.extra.get(is_stopped)`
# modified in stop_query in views / core.py is reflected here.
# stop query
if cls.is_cancel_query(cls, query, session, query_id):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too familiar with impala, to be honest, but for the other databases, we usually handle the cancel query functionality when the cancel_query method is called. It looks like it would be more efficient to move this logic into cancel_query in this db engine spec so that it is only run when we know that a cancel query has been requested, instead of checking on each cursor operation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the impala query cancellation needs to be obtained in the cursor, similar to the hive engine query cancellation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is my custom method, not the cancel_query method, because this method is used repeatedly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@john-bodley or @betodealmeida this is the way that we cancel queries on hive, but I'm wondering if it would be more efficient to use the cancel_query method instead, providing that you can get the cursor. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also cc @villebro

Copy link
Member

@eschutho eschutho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @wanghong1314! I left a few comments/questions. Do you mind also adding some tests? There are examples for my_sql, postgres, snowflake, etc..

@eschutho
Copy link
Member

@wanghong1314 there are also some failing CI checks. Let us know if you need any help resolving them. cc @betodealmeida for another set of eyes on this review.

@wanghong1314
Copy link
Contributor Author

@wanghong1314 there are also some failing CI checks. Let us know if you need any help resolving them. cc @betodealmeida for another set of eyes on this review.

@baldoalessandro please check that I have not changed the code of mysql and postgress related engine. Why does ci report an error?

@wanghong1314
Copy link
Contributor Author

@eschutho @bolkedebruin @eschutho I saw that the ci test passed, please help review the code, thank you

@eschutho
Copy link
Member

eschutho commented Jan 3, 2023

@eschutho @bolkedebruin @eschutho I saw that the ci test passed, please help review the code, thank you

Thanks @wanghong1314. I'm going to defer to @betodealmeida or @villebro on whether the hive pattern is still the best option here.

@eschutho eschutho dismissed their stale review January 3, 2023 23:01

waiting for other feedback

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments. I would suggest looking at a very recently merged PR of mine #22498 that may already solve this problem for you. More specifically, you may be able to leverage that QUERY_EARLY_CANCEL_KEY and avoid having to introduce the is_cancel_query method. Please ping me on Slack if you want to discuss this sync (we can hop on a zoom or similar if needed).

superset-frontend/src/SqlLab/actions/sqlLab.js Outdated Show resolved Hide resolved
Comment on lines 1110 to 1111
# Interval between consecutive polls when using Impala Engine
IMPALA_POLL_INTERVAL = int(timedelta(seconds=5).total_seconds())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure we don't clutter config.py with too many disparate config keys, maybe we should remove these *_POLL_INTERVAL keys and refactor this to something like

DB_POLL_INTERVAL_SECONDS: Dict[str, int] = {}

This could be used to specify these per engine name in your superset_config.py (here I'd be overriding polling to 1 seconds for Hive):

DB_POLL_INTERVAL_SECONDS = {
    "hive": int(timedelta(seconds=1).total_seconds()),
}

I know it's a breaking change, so we can probably fall back to HIVE_POLL_INTERVAL in the hive spec for now. So maybe change the following line

time.sleep(current_app.config["HIVE_POLL_INTERVAL"])
to something like this (see how I've moved the default to the engine spec, rather than having it in config.py):

if sleep_interval := current_app.config.get("HIVE_POLL_INTERVAL"):
    logger.warning("HIVE_POLL_INTERVAL is deprecated and will be removed in 3.0. Please use DB_POLL_INTERVAL instead")
else:
    sleep_interval = current_app.config["DB_POLL_INTERVAL_SECONDS"].get(cls.engine, 5)

time.sleep(sleep_interval)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea. I will try it locally and then submit the code

@wanghong1314 wanghong1314 deleted the fix/impala_stop_query branch January 7, 2023 14:39
@wanghong1314 wanghong1314 restored the fix/impala_stop_query branch January 7, 2023 14:40
@wanghong1314 wanghong1314 deleted the fix/impala_stop_query branch January 7, 2023 14:40
@wanghong1314 wanghong1314 restored the fix/impala_stop_query branch January 7, 2023 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants