fix: stop query in SQL Lab with impala engine (#20950) #22441
Codecov Report
@@            Coverage Diff             @@
##           master   #22441      +/-   ##
==========================================
- Coverage   66.91%   66.85%   -0.06%
==========================================
  Files        1851     1850       -1
  Lines       70715    70768      +53
  Branches     7766     7750      -16
==========================================
- Hits        47320    47315       -5
- Misses      21373    21437      +64
+ Partials     2022     2016       -6
@@ -229,6 +229,9 @@ export function startQuery(query) {

export function querySuccess(query, results) {
  return function (dispatch) {
    if (!results.query) {
Is this line here to prevent an error on line 239? If so, I think it's still useful to store the query on line 245 and use optional chaining in case results.query doesn't exist: results?.query?.sqlEditorId
the code has been changed, please kindly review it again
superset/db_engine_specs/impala.py
Outdated
) -> None:  # pylint: disable=arguments-differ
    # kwargs = {"async": async_}
    try:
        cursor.execute_async(query)
Is the intention here to run execute_async if _async=True is passed in? If so, it looks like we should still keep the option to run this query synchronously.
Can you run execute if synchronous and execute_async only if _async is True? That way we can still run synchronously as well.
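A minimal sketch of the branching requested above, assuming the engine spec receives an `async_` flag. `run_query` and `StubCursor` are hypothetical names used only for illustration; only the `execute` and `execute_async` method names come from this thread.

```python
from typing import List, Tuple


class StubCursor:
    """Hypothetical stand-in for a database cursor; records which method ran."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def execute(self, query: str) -> None:
        self.calls.append(("execute", query))

    def execute_async(self, query: str) -> None:
        self.calls.append(("execute_async", query))


def run_query(cursor, query: str, async_: bool = False) -> None:
    """Submit asynchronously only when asked; otherwise keep the blocking call."""
    if async_:
        cursor.execute_async(query)
    else:
        cursor.execute(query)


cursor = StubCursor()
run_query(cursor, "SELECT 1")                # synchronous path
run_query(cursor, "SELECT 2", async_=True)   # asynchronous path
```

Keeping both paths means callers that rely on blocking execution are unaffected by the new stop-query support.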
superset/db_engine_specs/impala.py
Outdated
# Refresh session so that the `query.status` and `query.extra.get(is_stopped)`
# modified in stop_query in views / core.py is reflected here.
# stop query
if cls.is_cancel_query(cls, query, session, query_id):
I'm not too familiar with impala, to be honest, but for the other databases, we usually handle the cancel query functionality when the cancel_query method is called. It looks like it would be more efficient to move this logic into cancel_query in this db engine spec so that it is only run when we know that a cancel query has been requested, instead of checking on each cursor operation.
the impala query cancellation needs to be obtained in the cursor, similar to the hive engine query cancellation
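For readers comparing the two approaches, here is a rough sketch of the in-cursor pattern being described: the polling loop itself checks whether a stop was requested and cancels through the cursor. Everything here is an assumption for illustration. `StubCursor`, its `is_executing` and `cancel_operation` methods, `poll_until_done`, and the `stop_requested` callback are hypothetical stand-ins, not the code in this PR.

```python
import time
from typing import Callable


class StubCursor:
    """Hypothetical cursor that reports 'running' for a fixed number of polls."""

    def __init__(self, polls_until_done: int) -> None:
        self.remaining = polls_until_done
        self.cancelled = False

    def is_executing(self) -> bool:
        if self.cancelled or self.remaining <= 0:
            return False
        self.remaining -= 1
        return True

    def cancel_operation(self) -> None:
        self.cancelled = True


def poll_until_done(
    cursor,
    stop_requested: Callable[[], bool],
    poll_interval: float = 0.0,
) -> str:
    """Poll the cursor and cancel from inside the loop once a stop is requested."""
    while cursor.is_executing():
        if stop_requested():
            cursor.cancel_operation()
            return "stopped"
        time.sleep(poll_interval)
    return "finished"
```

The trade-off raised in the review is visible here: the stop check runs on every poll, whereas a dedicated cancel_query hook would run only when a stop has actually been requested, but it needs some way to reach the running operation.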
this is my custom method, not the cancel_query method, because this method is used repeatedly
@john-bodley or @betodealmeida this is the way that we cancel queries on hive, but I'm wondering if it would be more efficient to use the cancel_query method instead, providing that you can get the cursor. WDYT?
also cc @villebro
Thanks for the contribution @wanghong1314! I left a few comments/questions. Do you mind also adding some tests? There are examples for my_sql, postgres, snowflake, etc.
@wanghong1314 there are also some failing CI checks. Let us know if you need any help resolving them. cc @betodealmeida for another set of eyes on this review.
@baldoalessandro please note that I have not changed the code of the mysql and postgres engines. Why does CI report an error?
@eschutho @bolkedebruin I see that the CI tests have passed. Please help review the code, thank you.
Thanks @wanghong1314. I'm going to defer to @betodealmeida or @villebro on whether the hive pattern is still the best option here.
Left a few comments. I would suggest looking at a very recently merged PR of mine #22498 that may already solve this problem for you. More specifically, you may be able to leverage that QUERY_EARLY_CANCEL_KEY and avoid having to introduce the is_cancel_query method. Please ping me on Slack if you want to discuss this sync (we can hop on a zoom or similar if needed).
superset/config.py
Outdated
# Interval between consecutive polls when using Impala Engine
IMPALA_POLL_INTERVAL = int(timedelta(seconds=5).total_seconds())
To make sure we don't clutter config.py with too many disparate config keys, maybe we should remove these *_POLL_INTERVAL keys and refactor this to something like
DB_POLL_INTERVAL_SECONDS: Dict[str, int] = {}
This could be used to specify these per engine name in your superset_config.py (here I'd be overriding polling to 1 second for Hive):
DB_POLL_INTERVAL_SECONDS = {
    "hive": int(timedelta(seconds=1).total_seconds()),
}
I know it's a breaking change, so we can probably fall back to HIVE_POLL_INTERVAL in the hive spec for now. So maybe change the following line
superset/superset/db_engine_specs/hive.py
Line 378 in 01671b9
time.sleep(current_app.config["HIVE_POLL_INTERVAL"])
to something like this (with the relevant keys set in config.py):
if sleep_interval := current_app.config.get("HIVE_POLL_INTERVAL"):
    logger.warning("HIVE_POLL_INTERVAL is deprecated and will be removed in 3.0. Please use DB_POLL_INTERVAL_SECONDS instead")
else:
    sleep_interval = current_app.config["DB_POLL_INTERVAL_SECONDS"].get(cls.engine, 5)
time.sleep(sleep_interval)
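The fallback can be tried outside of Superset with a plain dict standing in for `current_app.config`. This is only a sketch under that assumption: `resolve_poll_interval` is a hypothetical helper, and restricting the legacy key to the `hive` engine is my own choice so that one function can serve several engines.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)

# Assumed default, mirroring the `.get(cls.engine, 5)` fallback in the suggestion.
DEFAULT_POLL_INTERVAL_SECONDS = 5


def resolve_poll_interval(config: Dict[str, Any], engine: str) -> int:
    """Return the poll interval for `engine`, honoring the legacy Hive key."""
    legacy = config.get("HIVE_POLL_INTERVAL")
    if engine == "hive" and legacy:
        logger.warning(
            "HIVE_POLL_INTERVAL is deprecated. "
            "Please use DB_POLL_INTERVAL_SECONDS instead"
        )
        return legacy
    per_engine = config.get("DB_POLL_INTERVAL_SECONDS", {})
    return per_engine.get(engine, DEFAULT_POLL_INTERVAL_SECONDS)


# Hypothetical config: a legacy Hive value plus a per-engine override for Impala.
config = {"HIVE_POLL_INTERVAL": 2, "DB_POLL_INTERVAL_SECONDS": {"impala": 1}}
hive_interval = resolve_poll_interval(config, "hive")
impala_interval = resolve_poll_interval(config, "impala")
```

With this shape, an Impala spec would not need its own IMPALA_POLL_INTERVAL key; it would read its entry from the shared dict.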
This is a good idea. I will try it locally and then submit the code
fix(sqllab): Stop button for queries doesn't work in SQL Lab with the impala engine; add progress information
Fixes #20950
SUMMARY
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
BEFORE
AFTER
progress info
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION