Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Missing sql_editor_id index #27392

Merged

Conversation

justinpark
Copy link
Member

@justinpark justinpark commented Mar 4, 2024

SUMMARY

As titled, it adds the index for sql_editor_id in query table which seems to have caused a slowdown in the sqllab_bootstrap query.

TESTING INSTRUCTIONS

locally test superset db upgrade

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@justinpark justinpark requested a review from a team as a code owner March 4, 2024 22:57
@github-actions github-actions bot added the risk:db-migration PRs that require a DB migration label Mar 4, 2024
@john-bodley
Copy link
Member

@justinpark you'll also have to:

  1. Update the model to reflect the parity between how the SQLAlchemy model is defined and the underlying schema.
  2. Add a line item to UPDATING.md*.

Creating a new index to the query could be very expensive (given the size of the table) and may require downtime.


def upgrade():
op.create_index(
op.f("ix_query_sql_editor_id"), "query", ["sql_editor_id"], unique=False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this index be on the (user_id, sql_editor_id) tuple instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@john-bodley We have agreed to use a single key index rather than these composite keys. Could you please confirm the agreement here?

Copy link

codecov bot commented Mar 4, 2024

Codecov Report

Attention: Patch coverage is 37.50000% with 5 lines in your changes are missing coverage. Please review.

Project coverage is 69.71%. Comparing base (721977a) to head (a6ab95f).
Report is 9 commits behind head on master.

Files Patch % Lines
superset/migrations/shared/utils.py 28.57% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #27392      +/-   ##
==========================================
+ Coverage   67.34%   69.71%   +2.37%     
==========================================
  Files        1909     1909              
  Lines       74592    74616      +24     
  Branches     8320     8320              
==========================================
+ Hits        50235    52022    +1787     
+ Misses      22305    20542    -1763     
  Partials     2052     2052              
Flag Coverage Δ
hive 53.74% <37.50%> (?)
mysql 77.98% <37.50%> (+<0.01%) ⬆️
postgres 78.08% <37.50%> (+<0.01%) ⬆️
presto 53.69% <37.50%> (?)
python 83.13% <37.50%> (+4.91%) ⬆️
sqlite 77.60% <37.50%> (+<0.01%) ⬆️
unit 56.53% <37.50%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mistercrunch
Copy link
Member

From experience that table can get massive and depending on your SQL engine, and creating an index can be a locking operation, which can cascade into deadlock territory on this particular table (I think there's still a pretty crazy amount of polling happening in SQL Lab to look at query state). If you're using MySQL I'd treat this migration very very sensitively. I think there are ways to create index in a non-locking way in MySQL, but have to go out of your way to do it (ALGORITHM=INPLACE, LOCK=NONE;). Postgres should do better.

Maybe not a bad time to archive the query table table in your environment ... For most shops I think 90 days worth of query history is enough to go by, not a bad thing to schedule something that deletes 90+ day old queries daily... Note that changed_on is also indexed to help and support and efficient DELETE operation there.

@mistercrunch mistercrunch added the risk:may-require-downtime Upgrading to this feature may require downtime label Mar 5, 2024
@@ -86,7 +86,7 @@ class Query(
user_id = Column(Integer, ForeignKey("ab_user.id"), nullable=True)
status = Column(String(16), default=QueryStatus.PENDING)
tab_name = Column(String(256))
sql_editor_id = Column(String(256))
sql_editor_id = Column(String(256), index=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that it's now a index on two columns you'll have to use the Index function.

@mistercrunch
Copy link
Member

About my earlier comment, the issue may be significant enough that you'd want to write a custom migration for MySQL for your own sake at Airbnb, you can probably go direct DDL CREATE INDEX with the proper flags just for MySQL. I remember fighting hard on some simple migration on that specific table. Connections / locks kept coming, had to silence all k8s deployments, kill MySQL sessions (which kept coming back with polling) and ultimately wait for things to get through. (ALGORITHM=INPLACE, LOCK=NONE;) may allow you to not schedule or incur downtime.

@justinpark justinpark force-pushed the fix--query-sql-editor-id-missing-index branch from e249bed to fcf7a84 Compare March 7, 2024 00:21
UPDATING.md Outdated Show resolved Hide resolved


def upgrade():
op.create_index(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we skip the upgrade if the index already exists?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit to check for index existence.

@justinpark justinpark force-pushed the fix--query-sql-editor-id-missing-index branch from a6ab95f to f91ce6d Compare April 8, 2024 21:16
@justinpark
Copy link
Member Author

@michael-s-molina @john-bodley PTAL

@justinpark justinpark force-pushed the fix--query-sql-editor-id-missing-index branch 2 times, most recently from 5a38b9a to 8a2c84e Compare April 15, 2024 16:39
@justinpark justinpark force-pushed the fix--query-sql-editor-id-missing-index branch from 8a2c84e to 0dfc810 Compare April 22, 2024 17:47
@justinpark justinpark force-pushed the fix--query-sql-editor-id-missing-index branch from 0dfc810 to 1d19118 Compare May 2, 2024 20:42
@codecov-commenter
Copy link

codecov-commenter commented May 2, 2024

Codecov Report

Attention: Patch coverage is 28.57143% with 5 lines in your changes are missing coverage. Please review.

Project coverage is 69.98%. Comparing base (2e5f3ed) to head (3e2c382).
Report is 28 commits behind head on master.

Files Patch % Lines
superset/migrations/shared/utils.py 16.66% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #27392      +/-   ##
==========================================
+ Coverage   60.49%   69.98%   +9.49%     
==========================================
  Files        1931     1933       +2     
  Lines       76241    76498     +257     
  Branches     8566     8566              
==========================================
+ Hits        46122    53539    +7417     
+ Misses      28015    20855    -7160     
  Partials     2104     2104              
Flag Coverage Δ
hive 49.21% <28.57%> (+0.04%) ⬆️
mysql 77.56% <28.57%> (?)
postgres 77.66% <28.57%> (?)
presto ?
python 83.14% <28.57%> (+19.66%) ⬆️
sqlite 77.13% <28.57%> (?)
unit 57.71% <28.57%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@justinpark justinpark merged commit 2a7bfa4 into apache:master May 6, 2024
28 checks passed
imancrsrk pushed a commit to imancrsrk/superset that referenced this pull request May 10, 2024
Co-authored-by: Michael S. Molina <michael.s.molina@gmail.com>
Co-authored-by: Michael S. Molina <70410625+michael-s-molina@users.noreply.github.com>
jzhao62 pushed a commit to jzhao62/superset that referenced this pull request May 16, 2024
Co-authored-by: Michael S. Molina <michael.s.molina@gmail.com>
Co-authored-by: Michael S. Molina <70410625+michael-s-molina@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
risk:db-migration PRs that require a DB migration risk:may-require-downtime Upgrading to this feature may require downtime size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants