Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sqllab] fix exception caused by casting string to int with psycopg2 #9287

Merged
merged 5 commits into from Mar 16, 2020

Conversation

nytai
Copy link
Member

@nytai nytai commented Mar 11, 2020

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

There's an issue with running the feature SQLLAB_BACKEND_PERSISTENCE whole using postgres as the metadata db.
For the expression CAST('JUScSB3M_' as INTEGER) MySQL and SQLite produce 0, postgres raises an exception: sqlalchemy.exc.DataError: (psycopg2.errors.InvalidTextRepresentation) invalid input syntax for integer: "JUScSB3M_".

This causes an unhandled exception at this line: https://github.com/apache/incubator-superset/pull/8769/files#r384774308

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TEST PLAN

error can be reproduced by:

  • run superset with pg as meta db
  • create a few tabs in sqllab and run a few queries in each tab
  • enable feature flag SQLLAB_BACKEND_PERSISTENCE
  • reload/visit sqllab
  • notice a 500 from the server due to postgres raising an exception trying to cast a string to an int

With the changes in this PR the above exception no longer occurs.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

@betodealmeida @robdiciuccio @villebro @craig-rueda @robdiciuccio

@villebro
Copy link
Member

I haven't been following the specifics of the SQL Lab backend persistence work, so I might have missed something crucial. What I don't understand is why are we sometimes casting something that's a string/hash into an integer?

@betodealmeida
Copy link
Member

@villebro the migration of the SQL Lab state from localStorage to the backend is not atomic, so for a while the IDs might be a mix of strings (from localStorage) and ints (from the backend).

We could've used strings for IDs in the backend as well, but since I was expecting the lifetime of the feature flag to be small and that people would adopt it quickly, I opted for using ints and handling the corner cases.

@codecov-io
Copy link

Codecov Report

Merging #9287 into master will not change coverage by %.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #9287   +/-   ##
=======================================
  Coverage   58.90%   58.90%           
=======================================
  Files         373      373           
  Lines       12026    12026           
  Branches     2953     2953           
=======================================
  Hits         7084     7084           
  Misses       4763     4763           
  Partials      179      179           
Impacted Files Coverage Δ
...erset-frontend/src/datasource/DatasourceEditor.jsx 61.25% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d8fea64...40c2938. Read the comment docs.

@nytai
Copy link
Member Author

nytai commented Mar 12, 2020

@villebro part of the tabs migration process, which happens once the user visits sqllab when this feature is enabled, involves using the tab id(int) cast to a string as the sql_editor_id. Once all tabs are migrated over the type will be consistent (numeric strings). The before migration and in between state is where the error occurs (or doesn't in the case of MySQL/SQLite/others probably).

@villebro
Copy link
Member

@betodealmeida @nytai my concern with the current exception handling is that is looks like it simply fails the process if a string can't be cast to an int. How do we ensure that affected queries don't get lost in the process? Sorry if I'm missing something obvious.

@betodealmeida
Copy link
Member

Yeah, @villebro is right.

I think the problem here is that we might have leftover queries in the queries table that are associated with tabs that were deleted. These will never be migrated to the DB, since the tabs were removed, and they will have a sql_editor_id value that is a string.

We should filter out these queries, returning only queries that have a sql_editor_id that is an int and is in tab_state_ids.

user_queries = (
db.session.query(Query)
.filter_by(user_id=user_id)
.filter(Query.sql_editor_id.cast(Integer).in_(tab_state_ids))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nytai what if you do this:

tab_state_ids = [str(tab_state[0]) for tab_state in tabs_state]
...
user_queries = (
    db.session.query(Query)
    .filter_by(user_id=user_id)
    .filter(Query.sql_editor_id.in_(tab_state_ids))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, that definitely works! Earlier I was trying to figure out how to craft some regex to filter out rows where sql_editor_id is not numeric... casting the ids to strings makes more sense 😄

@nytai
Copy link
Member Author

nytai commented Mar 12, 2020

@villebro good catch, we would have always been wiping user's queries with the exception handling approach.

Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome sauce!

@mistercrunch mistercrunch merged commit 8764ae3 into apache:master Mar 16, 2020
@mistercrunch mistercrunch deleted the tai/sqllab-backend-fix branch March 16, 2020 17:52
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.36.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/S 🚢 0.36.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants