Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: return query if it already exists #15207

Merged
merged 2 commits into from Jun 22, 2021

Conversation

eschutho
Copy link
Member

SUMMARY

We were seeing some duplicate client_id errors when running queries. Here's the root cause:

When we have a timeout on a query, we return a 408 , and when most browsers get a 408 response, they automatically retry the request. The issue with the query POST request is that the db request happens after we create the query, so if the db request times out, then the retry fails because the request is not idempotent, due to the fact that the client_id is created on the client and has a uniqueness constraint in the db.

My solution here is to check to see if the query exists and if it is in a pending or timed out state, don't resave it, but rather just return the original query.

I had thought of a few different options for this issue, and here are the reasonings why I didn't go with other solutions:

  1. change client_id to query.id and generate it server side. The reason that we have the id generated client side and stored in the db is so that we can stop the query before the original response has returned.
  2. return something other than a 408 when the db has timed out. If the db timeout is long, it's possible that the server will time out before the db and we'll likely still be left with the same issue.
  3. run the db request before saving the query. Because the same endpoint is used with async query requests, the refactor would be much more prone to errors.

Some concerns/questions:
If a query is new, and happens to match an existing query that is running, (unlikely but possible), we don't want to return someone else's query. There's a check on this lookup for user_id, but that can be null. There's also a lookup on sql_editor_id. My assumption is that with the combination of these three factors, we should with certainty only get the request that we are intending, but I'd welcome another set of eyes on that. The last option is to check the sql string, but the comparison on what could potentially be a long string may be costly.

In my testing, I was never able to reproduce a server timeout response, only a netstat error, and the query continued to have a status of "running" which means that it will continue to be polled for. You have to explicitly delete the query if you navigate away from the page without stopping the query. I didn't change that behavior, but it's something that we may want to investigate.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

No visual changes

TESTING INSTRUCTIONS

If you are testing locally and can put a debug statement somewhere in the sql_json POST request you can replicate a timeout. After about 2 minutes, you should see another sql_json POST request in your browser network tab, and it should return a 200 with the original query, and not raise an error.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

client_id=client_id, user_id=g.user.get_id(), sql_editor_id=tab_state_id
client_id=client_id,
user_id=g.user.get_id(),
sql_editor_id=str(tab_state_id),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was getting an error when trying to delete the query b/c it was passing in an int here.

@eschutho eschutho force-pushed the elizabeth/fix-db-timeout-error branch from 567d6b6 to 65aa038 Compare June 17, 2021 00:46
Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good... such a tricky bug! I wasn't familiar with 408.

@codecov
Copy link

codecov bot commented Jun 17, 2021

Codecov Report

Merging #15207 (1231f73) into master (70afa08) will decrease coverage by 0.15%.
The diff coverage is 71.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15207      +/-   ##
==========================================
- Coverage   77.09%   76.93%   -0.16%     
==========================================
  Files         971      971              
  Lines       50236    50242       +6     
  Branches     6494     6494              
==========================================
- Hits        38729    38656      -73     
- Misses      11302    11381      +79     
  Partials      205      205              
Flag Coverage Δ
hive ?
mysql 81.69% <71.42%> (-0.01%) ⬇️
postgres 81.71% <71.42%> (-0.01%) ⬇️
python 81.80% <71.42%> (-0.30%) ⬇️
sqlite 81.34% <71.42%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/views/sql_lab.py 60.68% <ø> (ø)
superset/views/core.py 75.42% <71.42%> (-0.04%) ⬇️
superset/db_engines/hive.py 0.00% <0.00%> (-82.15%) ⬇️
superset/db_engine_specs/hive.py 69.20% <0.00%> (-17.21%) ⬇️
superset/views/database/mixins.py 81.03% <0.00%> (-1.73%) ⬇️
superset/db_engine_specs/presto.py 83.36% <0.00%> (-1.06%) ⬇️
superset/db_engine_specs/base.py 87.97% <0.00%> (-0.41%) ⬇️
superset/connectors/sqla/models.py 88.15% <0.00%> (-0.24%) ⬇️
superset/utils/core.py 88.93% <0.00%> (-0.13%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 70afa08...1231f73. Read the comment docs.

@eschutho
Copy link
Member Author

/testenv up

@github-actions
Copy link
Contributor

@eschutho Ephemeral environment spinning up at http://50.112.146.179:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

@eschutho
Copy link
Member Author

did a visual test on ephemeral env and it looked good.

@eschutho eschutho merged commit 58cc78d into apache:master Jun 22, 2021
@eschutho eschutho deleted the elizabeth/fix-db-timeout-error branch June 22, 2021 20:57
@github-actions
Copy link
Contributor

Ephemeral environment shutdown and build artifacts deleted.

@henryyeh
Copy link
Contributor

🏷️ 2021.25

henryyeh pushed a commit to preset-io/superset that referenced this pull request Jun 23, 2021
* check if query exists before saving a new one

* fix test

(cherry picked from commit 58cc78d)
henryyeh pushed a commit to preset-io/superset that referenced this pull request Jun 23, 2021
* check if query exists before saving a new one

* fix test

(cherry picked from commit 58cc78d)
cccs-RyanS pushed a commit to CybercentreCanada/superset that referenced this pull request Dec 17, 2021
* check if query exists before saving a new one

* fix test
QAlexBall pushed a commit to QAlexBall/superset that referenced this pull request Dec 29, 2021
* check if query exists before saving a new one

* fix test
cccs-rc pushed a commit to CybercentreCanada/superset that referenced this pull request Mar 6, 2024
* check if query exists before saving a new one

* fix test
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.3.0 labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels preset:2021.23 preset:2021.25 preset-io size/M 🚢 1.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants