fix: Stop query in SQL Lab with impala engine #22635

Merged
merged 4 commits into from
Jan 10, 2023

Conversation

@wanghong1314 wanghong1314 commented Jan 7, 2023

fix(sqllab): Make the Stop button work for queries run in SQL Lab with the Impala engine, and add query progress information

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

BEFORE: [screenshot]

AFTER: [screenshot]

Progress info: [screenshot]

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

@wanghong1314
Contributor Author

@eschutho @villebro Please review the code again. Thank you

@villebro changed the title from "fix: Stop query in SQL Lab with impala engine (#20950)" to "fix: Stop query in SQL Lab with impala engine" on Jan 10, 2023
@codecov
codecov bot commented Jan 10, 2023

Codecov Report

Merging #22635 (c2bf101) into master (159dcd7) will decrease coverage by 0.03%.
The diff coverage is 32.78%.

@@            Coverage Diff             @@
##           master   #22635      +/-   ##
==========================================
- Coverage   67.14%   67.11%   -0.04%     
==========================================
  Files        1869     1869              
  Lines       71523    71582      +59     
  Branches     7814     7814              
==========================================
+ Hits        48022    48040      +18     
- Misses      21460    21501      +41     
  Partials     2041     2041              
Flag       Coverage Δ
hive       52.60% <32.78%> (-0.04%) ⬇️
mysql      78.01% <27.86%> (-0.09%) ⬇️
postgres   78.08% <27.86%> (-0.09%) ⬇️
presto     52.49% <27.86%> (-0.05%) ⬇️
python     81.31% <32.78%> (-0.09%) ⬇️
sqlite     76.48% <27.86%> (-0.09%) ⬇️
unit       51.47% <27.86%> (-0.05%) ⬇️


Impacted Files                       Coverage Δ
superset/db_engine_specs/impala.py   44.87% <27.77%> (-38.47%) ⬇️
superset/views/core.py               75.00% <50.00%> (-0.04%) ⬇️
superset/db_engine_specs/hive.py     87.15% <75.00%> (-0.25%) ⬇️
superset/config.py                   91.66% <100.00%> (ø)


Comment on lines 1132 to 1135
# Interval between consecutive polls when using Hive Engine
HIVE_POLL_INTERVAL = int(timedelta(seconds=5).total_seconds())
# customize the polling time of each engine. The default time is 5 seconds
DB_POLL_INTERVAL_SECONDS = {
    "hive": int(timedelta(seconds=5).total_seconds()),
}
Member

We can't really define defaults in config.py. Imagine if I want to override the poll interval for "impala" and add the following to superset_config.py:

DB_POLL_INTERVAL_SECONDS = {"impala": 1}

In this case the default for "hive" in config.py would be lost. This is why I recommended not defining the defaults in config.py, and rather placing them in their respective db engine specs.
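
The problem described above can be sketched with a simplified model. It assumes that a setting in superset_config.py replaces the same-named value from config.py wholesale, with no merging of nested dicts. The names `ENGINE_DEFAULT_POLL_INTERVALS` and `poll_interval_for` are hypothetical and stand in for per-engine defaults kept in the db engine specs; they are not Superset APIs.

```python
from typing import Dict

# Simplified model of config.py shipping a default for Hive.
config_defaults = {"DB_POLL_INTERVAL_SECONDS": {"hive": 5}}

# What a user might add to superset_config.py to tune only Impala.
user_overrides = {"DB_POLL_INTERVAL_SECONDS": {"impala": 1}}

# Assumption: the override replaces the whole value; nested dicts are not merged.
config = {**config_defaults, **user_overrides}
hive_default_lost = "hive" not in config["DB_POLL_INTERVAL_SECONDS"]

# Alternative: keep each default next to its engine (hypothetical mapping
# standing in for attributes on the individual engine specs).
ENGINE_DEFAULT_POLL_INTERVALS: Dict[str, int] = {"hive": 5, "impala": 5}


def poll_interval_for(engine: str, overrides: Dict[str, int]) -> int:
    """Return the user's override for an engine, else that engine's own default."""
    return overrides.get(engine, ENGINE_DEFAULT_POLL_INTERVALS[engine])
```

With the defaults living beside each engine, overriding "impala" alone no longer affects the interval used for "hive".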

Contributor Author

OK, thank you for the suggestion. I can leave it to the user to set, without defining a default value. Should I change it to this?
DB_POLL_INTERVAL_SECONDS = {}

Member

I would add the type as that's the convention in config.py and mypy can't infer the type from an empty dict:

DB_POLL_INTERVAL_SECONDS: Dict[str, int] = {}

Contributor Author

I've changed it as suggested. It's true that mypy failed.

Comment on lines 109 to 112
# customize the polling time of each engine. The default time is 5 seconds
DB_POLL_INTERVAL_SECONDS = {
    "hive": int(timedelta(seconds=5).total_seconds()),
}
Member

I'm not sure if this is needed, as it's not changing the defaults.

Contributor Author

I can restore the previous code so that this file is left unchanged.

time.sleep(current_app.config["HIVE_POLL_INTERVAL"])
if sleep_interval := current_app.config.get("HIVE_POLL_INTERVAL"):
    logger.warning(
        "HIVE_POLL_INTERVAL is deprecated and will be removed in 3.0. Please use DB_POLL_INTERVAL instead"
Member

The deprecation warning is incorrect:

Suggested change
- "HIVE_POLL_INTERVAL is deprecated and will be removed in 3.0. Please use DB_POLL_INTERVAL instead"
+ "HIVE_POLL_INTERVAL is deprecated and will be removed in 3.0. Please use DB_POLL_INTERVAL_SECONDS instead"

Contributor Author

Sorry, I'll fix it, and I'll check more carefully before submitting next time.
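
The deprecation path discussed in this thread could look roughly like the sketch below. It follows the hunk in honoring a legacy HIVE_POLL_INTERVAL when one is set and logging the corrected warning text. The function name `resolve_hive_poll_interval`, the plain-dict `config` argument, and the fallback of 5 seconds are assumptions for illustration, not the merged implementation.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


def resolve_hive_poll_interval(config: Dict[str, Any], default: int = 5) -> int:
    """Pick the Hive poll interval, preferring the deprecated key if it is set."""
    legacy = config.get("HIVE_POLL_INTERVAL")
    if legacy:
        # Point users at the new setting name, as corrected in the review.
        logger.warning(
            "HIVE_POLL_INTERVAL is deprecated and will be removed in 3.0. "
            "Please use DB_POLL_INTERVAL_SECONDS instead"
        )
        return int(legacy)
    # Assumption: without the legacy key, use the per-engine override, else the default.
    return int(config.get("DB_POLL_INTERVAL_SECONDS", {}).get("hive", default))
```

Keeping the legacy key working while warning gives existing deployments time to migrate before the old setting is removed.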


class ImpalaEngineSpec(BaseEngineSpec):
    """Engine spec for Cloudera's Impala"""

    engine = "impala"
    engine_name = "Apache Impala"
    # Query 5543ffdf692b7d02:f78a944000000000: 3% Complete (17 out of 547)
    query_progress_r = re.compile(r".*Query.*: (?P<query_progress>[0-9]+)%.*")
Member

@villebro villebro Jan 10, 2023

nit: do we really need the leading and trailing .* here?

Suggested change
- query_progress_r = re.compile(r".*Query.*: (?P<query_progress>[0-9]+)%.*")
+ query_progress_r = re.compile(r"Query.*: (?P<query_progress>[0-9]+)%")

Also, I know this is in line with what's being done in hive.py, but I would consider moving this outside the class into a constant QUERY_PROGRESS_REGEX in impala.py, as it's not defined in BaseEngineSpec (defining it here makes it appear like we're overriding a class attribute in BaseEngineSpec). While at it, maybe do the same for stage_progress_r in HiveEngineSpec.
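
A minimal sketch of the suggestion above, with the pattern moved to a module-level constant. The helper `parse_query_progress` is a hypothetical name, and the sample line is the one quoted in the code comment under review. Note that dropping the leading `.*` yields the same capture only when the pattern is applied with `search`, or when the line starts with "Query"; which call the PR uses is not shown in this thread.

```python
import re
from typing import Optional

# Module-level constant, as suggested, instead of a class attribute.
QUERY_PROGRESS_REGEX = re.compile(r"Query.*: (?P<query_progress>[0-9]+)%")


def parse_query_progress(log_line: str) -> Optional[int]:
    """Extract the completion percentage from an Impala progress line, if any."""
    match = QUERY_PROGRESS_REGEX.search(log_line)
    return int(match.group("query_progress")) if match else None


sample = "Query 5543ffdf692b7d02:f78a944000000000: 3% Complete (17 out of 547)"
progress = parse_query_progress(sample)
```

Using `search` keeps the trimmed pattern tolerant of any prefix before "Query", which is what the leading `.*` provided under `match`.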

Contributor Author

Thank you for your comments. I have no problem changing the test here

image

Member

@villebro villebro left a comment

LGTM, thanks for all the iterations 👍

@villebro villebro merged commit 8bf6d80 into apache:master Jan 10, 2023
@mistercrunch added the 🏷️ bot and 🚢 2.1.0 labels and removed the 🚢 2.1.3 label on Mar 13, 2024
Labels: 🏷️ bot, size/L, 🚢 2.1.0
Development

Successfully merging this pull request may close these issues.

Stop button for queries doesn't work in SQL Lab when using SQL Lab with impala engine
3 participants