Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SQL Lab] Add commit to resolve query table lock #8262

Merged
merged 1 commit into from
Sep 23, 2019

Conversation

etr2460
Copy link
Member

@etr2460 etr2460 commented Sep 20, 2019

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

We've been trying to track down database row locking issues when trying to call the stop_query endpoint for several months now. After digging into this file, it seems like the session that we create to run sql queries has autoflush enabled. Therefore, we're locking the db on these rows until committing the changes to the query. We don't commit this until after the query finishes, so the row gets locked and we can't set the query to stopped. By committing before running the query, this seems to have resolved the issue for us.

TEST PLAN

CI + Deployed in our production environment

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

@graceguo-supercat @michellethomas @john-bodley @mistercrunch

Copy link

@graceguo-supercat graceguo-supercat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. But want to get confirm from @mistercrunch or @villebro.

@villebro
Copy link
Member

Hmm. Tracing back it seems there is already a session.commit() going on before this in the calling method:
https://github.com/apache/incubator-superset/blob/731c19b6305ea865bff9ec469e0f77195da5258f/superset/sql_lab.py#L346-L350
It seems to me there might be something deeper going on here; I suppose getting logs that isolate the problem for this can be tricky, but can you provide any more details for when this usually happens? peak times/completely randomly/once per week/once an hour...?

@etr2460
Copy link
Member Author

etr2460 commented Sep 20, 2019

@villebro There is a commit prior to calling execute_sql_statement, but then the query object is potentially modified again prior to running the query (setting query.tmp_table_name, query.select_as_cta_used, query.limit, and query.executed_sql). Since autoflush is enabled, I believe this is being sent to the db and locking the row for updates, and the lock isn't removed until significantly later.

This happened about 100-200 times a day, sometimes more, and was in line with our general usage stats (which makes sense, people would see it more often if they were running queries more often). I was able to get logs showing that the specific row in the query table was locked for updating, and all instances of this error went away after deploying this change.

@villebro
Copy link
Member

Ah, right you are, my bad. Weird that this hasn't caused more problems over the years. To keep the code clean, perhaps we should remove the prior commit, I'm not sure it's needed after this. Also it might make sense to disable autoflush and do a single manual flush before execution to avoid unnenessary traffic. (thinking out loud, not sure this would work/help performance).

@etr2460
Copy link
Member Author

etr2460 commented Sep 20, 2019

I think we'd still want the previous commit in case an exception is raised in the execute function (like not allowing dml statements) but we still want to ensure the query is updated first

@etr2460
Copy link
Member Author

etr2460 commented Sep 20, 2019

As for the autoflush, I'm not sure why we didn't disable it before, perhaps @mistercrunch can provide some insight

@villebro
Copy link
Member

I reread the code carefully and brushed up on docs regarding flushing and committing, and I agree, this makes sense, i.e. committing away the inexpensive mutating (=locking) parts of the transaction prior to the expensive non-mutating select. While disabling autoflush mid-flight appears to be ok (https://github.com/sqlalchemy/sqlalchemy/wiki/DisableAutoflush), I wouldn't expect this to have a big impact on performance, so probably better to leave that out for now.

@etr2460 etr2460 merged commit 11935ce into apache:master Sep 23, 2019
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.35.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/XS 🚢 0.35.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants