Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sqllab] Cancel database query on stop #7611

Closed
wants to merge 1 commit into from

Conversation

smacker
Copy link
Contributor

@smacker smacker commented May 29, 2019

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

Sorry for submitting PR without creating an issue first. It doesn't look like a big change but I can convert the issue and proposal into SIP if needed.

The issue

SQL Lab has Stop button but it only updates state of superset without actually stopping the database query.
In some cases complex query can run in background for days and affect database performance.
The issue especially affect us as we use Superset with gitbase which speeks MySQL protocol.

This commit solves the problem as following:

  1. Add new key cancel_payload into extra attribute of the Query instance.
  2. Add get_cancel_query_payload method to EngineSpec that returns some
    payload from the connection which allows to cancel it later.
  3. execute_sql_statements before executing queries retrieves and saves
    cancel payload into Query
  4. Stop handler calls new cancel_query method of EngineSpec

Both new methods are implemented only for MySQL for now but it is easy
to add support for PostgreSQL, MSSQL and maybe some other DBs.

For MySQL I use KILL CONNECTION to terminate multiple statements.

Alternative approach:

Maintain list of open connections and close them on cancel.
This solutions looks cleaner but has some disadvantages:

  • would require big changes in the codebase
  • implementation would get very tricky in distributed mode
  • some drivers can keep query running even after close() call.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TEST PLAN

Run complex log-running query with MySQL using SQL lab. Press stop. The query should be killed.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

@codecov-io
Copy link

codecov-io commented May 29, 2019

Codecov Report

Merging #7611 into master will decrease coverage by 7.89%.
The diff coverage is 51.51%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #7611     +/-   ##
=========================================
- Coverage   73.75%   65.85%   -7.9%     
=========================================
  Files         115      459    +344     
  Lines       12059    22019   +9960     
  Branches        0     2415   +2415     
=========================================
+ Hits         8894    14501   +5607     
- Misses       3165     7397   +4232     
- Partials        0      121    +121
Impacted Files Coverage Δ
superset/views/core.py 73.02% <0%> (+1.56%) ⬆️
superset/models/sql_lab.py 94.8% <100%> (+0.06%) ⬆️
superset/sql_lab.py 73.3% <40%> (-4.17%) ⬇️
superset/db_engine_specs/mysql.py 90% <71.42%> (-2.16%) ⬇️
superset/db_engine_specs/base.py 87.76% <75%> (-1.64%) ⬇️
superset/db_engine_specs/impala.py 50% <0%> (-15%) ⬇️
superset/db_engine_specs/athena.py 46.66% <0%> (-11.23%) ⬇️
superset/db_engine_specs/drill.py 45% <0%> (-9.17%) ⬇️
superset/db_engine_specs/kylin.py 45.45% <0%> (-8.4%) ⬇️
superset/db_engine_specs/bigquery.py 44.82% <0%> (-7.42%) ⬇️
... and 447 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f17ba7...085fff0. Read the comment docs.

@smacker
Copy link
Contributor Author

smacker commented Jun 28, 2019

rebased on master, fixed an issue for sync mode

@smacker
Copy link
Contributor Author

smacker commented Jun 28, 2019

@mistercrunch any opinion how to proceed with this? Should I convert it into SIP? Or is it good as a PR? Maybe some changes are required (for example tests for which I would love some advice because I couldn't find tests for similar functionality)

@stale
Copy link

stale bot commented Aug 27, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added the inactive Inactive for >= 30 days label Aug 27, 2019
@mistercrunch
Copy link
Member

Superseeded by #8097 (?)

@stale stale bot removed the inactive Inactive for >= 30 days label Aug 27, 2019
@smacker
Copy link
Contributor Author

smacker commented Aug 28, 2019

@mistercrunch no. #8097 stops executing list of queries if one of them was stopped.

This PR stops (kills) underlying connection to DB (no matter 1 or more queries are there).
In async mode in SQLLab queries are executed in celery and stop button doesn't really stop them, the task is running until it gets some results from DB. (which may take days)

If you are interested in the fix, I can re-work it using extra_json instead of a new field in DB. It will make easier to add support for other backends. (for example for Spark just id isn't enough but special object should be serialized: src-d/sourced-ui#267)

@mistercrunch
Copy link
Member

It'd be great to enable the true killing of queries, sorry I missed this back in May...

@smacker
Copy link
Contributor Author

smacker commented Aug 30, 2019

Cool. Thank you. I'll re-work this PR in the following days to avoid DB migration and make it more flexible.

SQL Lab has Stop button but it only updates state of superset without
actually stopping the database query.

Complex query can run in background for long time and affect database
performance.

This commits solves the problem as following:

1. Add new key `cancel_payload` into extra attribute of the Query instance.
2. Add `get_cancel_query_payload` method to EngineSpec that returns some
payload from the connection which allows to cancel it later.
3. `execute_sql_statements` before executing queries retrieves and saves
cancel payload into Query
4. Stop handler calls new `cancel_query` method of EngineSpec

Both new methods are implemented only for MySQL for now but it is easy
to add support for PostgreSQL, MSSQL and maybe some other DBs.

For MySQL I use `KILL CONNECTION` to terminate multiple statements.
@smacker
Copy link
Contributor Author

smacker commented Aug 30, 2019

@mistercrunch I updated the code. It's simpler and more flexible now.

@stale
Copy link

stale bot commented Oct 29, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added the inactive Inactive for >= 30 days label Oct 29, 2019
@stale stale bot closed this Nov 5, 2019
@koszti koszti mentioned this pull request Jun 27, 2021
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
inactive Inactive for >= 30 days size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants