Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(mssql): reverts #9644 and displays a better error msg #9752

Merged
merged 12 commits into from May 14, 2020

Conversation

dpgaspar
Copy link
Member

@dpgaspar dpgaspar commented May 6, 2020

CATEGORY

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

On SQLLAb executing SQL statements with CAST and AT fail. This is a partial fix, since this seems to open up a bigger problem sqlparse does dot correctly identify columns with CAST(FOO_COL AS TYPEX) AT TIME ZONE 'Eastern Standard Time' for example, and just adds CAST(FOO_OLD AS TYPEX) on the IdentifierList.

This is a partial fix, since it will discard CAST from the alias setting, so that we don't double alias it. Yet it means that all CAST statements need to have an Alias.

I think we should consider removing our forced TOP from MSSQL engines and maybe others that use TOP, too much complexity for the value it adds.

PS: Following comments I'm reverting all logic behind squeezing aliases on functions for MSSQL. This implements a much simpler approach by catching MSSQL specific error for non aliased functions. This error is easy to catch since we are wrapping a TOP on the user's statement, otherwise we get an error from superset saying no field of name

Examples:
Screenshot 2020-05-12 at 13 08 02

Screenshot 2020-05-12 at 13 26 58

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

@dpgaspar dpgaspar requested a review from villebro May 6, 2020 15:12
@dpgaspar dpgaspar marked this pull request as ready for review May 6, 2020 15:12
@villebro
Copy link
Member

villebro commented May 6, 2020

This is really a can of worms, and the deeper we go the more sqlparse-like logic we'll be introducing into Superset with unpleasant maintenance burden. I would almost propose deprecating advanced limiting for TOP-syntax dialects.

@mistercrunch
Copy link
Member

The problem is not as much TOP as it is the requirement for alias in subquery.

And oh wow! I didn't know we were altering the SQL (as opposed to just wrapping it). That's pretty scary. Whenever parsing or alerting SQL is part of the solution, it's pretty scary. I'd advocate for even rolling back the logic that's in there currently (from before this PR).

Also if this logic should be anywhere, it should be scoped to MSSQL in db_engine_spec module

Few other options:

  1. force a TOP clause (similar to LimitMethod.FORCE_LIMIT) it is altering the SQL, but somewhat less risky than squeezing aliases in. Some optimizers will do better with that than with the LimitMethod.WRAP_SQL approach.
  2. surface a good error message to user "please alias all of your columns" and move forward with LimitMethod.WRAP_SQL
  3. use limit_method = LimitMethod.FETCH_MANY, this has bad perf implications (usually means a larger "server side cursor" than necessary)
  4. long shot: look for a way to let the server know ahead of time, some sort of cursor hint or parameter. That's clearly out of the DBAPI specification but it may exist

Personally I think 1 is best. It has the best perf guarantees (gives the clearest declarative insight to the db optimizer) and requires limited query alteration. When we moved forward with this a while ago I remember we looked at how other tools did this but forgot the details, maybe @bkyryliuk remembers?

@dpgaspar
Copy link
Member Author

dpgaspar commented May 6, 2020

thks for the ideas @mistercrunch, we had/have a couple of problems with MSSQL on SQLLab.
1 - TOP wrapping was not working properly
2 - Even without TOP wrapping, functions on MSSQL like COUNT, SUM etc are set has <anonymous> and these are not supported by superset (pandas dataframes?)

Given that, maybe 2 is a good approach, or a mix of 1 and 2. May be wrong here, but still think that altering SQL statements is dangerous/difficult and kind of out of scope for superset. Another option could be: not forcing TOP on some engines but refusing to execute statements that don't have them?

Note: This is scoped to MSSQL, because it was an obvious risky change

@mistercrunch mistercrunch added !deprecated-label:bug Deprecated label - Use #bug instead .database labels May 6, 2020
@bkyryliuk
Copy link
Member

yeah we've looked into alternatives quite some time ago and sqlparse was the only feasible option at a time. There are DB specific parsers, but they often implemented in different languages e.g. java + antlr for hive and presto as I remember and using them as was mentioned will be opening a can of warms. Explain query can provide some useful information as well, however I am not sure about MS SQL specifics there. I'm curious what is a sqllachemy solution if it supports query construction for MSSQL

@dpgaspar
Copy link
Member Author

dpgaspar commented May 7, 2020

@bkyryliuk we have a simple query construction here: https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs/base.py#L397 and it's being used for MSSQL. Is this something similar with what you're looking for?

@mistercrunch also mentioned that using a WRAP_SQL is inefficient on some engines.

My plan is, if feasible/simple change WRAP_SQL to something like FORCE_TOP and doing the best to identify non aliased functions and warn the user about them. This last one could be tricky for statements like: SELECT COL1, COL1, CAST(COL3 AS DATETIME) AT TIME ZONE "SOME_TZ", COL4 because sqlparse seems to get lost on AT and does not identify COL4 has a column.

@bkyryliuk
Copy link
Member

I like the approach, and eventually it would be nice to have that logic in the db specific engines.
I briefly looked into explain and it doesn't look promising, https://docs.microsoft.com/en-us/sql/t-sql/queries/explain-transact-sql?view=azure-sqldw-latest

Another potential alternative could be to catch an error and make it more user friendly e.g. stating that all columns should be aliased. It would make user experience slightly worse in the beginning, but over time should not be an issue and may be slightly easier to implement.

@mistercrunch
Copy link
Member

For reference, here's the FORCE_LIMIT logic:
https://github.com/apache/incubator-superset/blob/13c5b133a9d054b07dff5bbcff987551e3c4c42e/superset/sql_parse.py#L255-L286

FORCE_TOP wouldn't be that different. It's editing the SQL, but it's pretty safe (not as bad as squeezing aliases into subqueries...)

@codecov-io
Copy link

codecov-io commented May 7, 2020

Codecov Report

Merging #9752 into master will decrease coverage by 0.28%.
The diff coverage is 46.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9752      +/-   ##
==========================================
- Coverage   70.79%   70.51%   -0.29%     
==========================================
  Files         586      587       +1     
  Lines       30445    30446       +1     
  Branches     3121     3121              
==========================================
- Hits        21555    21469      -86     
- Misses       8776     8856      +80     
- Partials      114      121       +7     
Flag Coverage Δ
#cypress 52.84% <ø> (-0.80%) ⬇️
#javascript 59.00% <ø> (-0.06%) ⬇️
#python 70.85% <46.66%> (-0.08%) ⬇️
Impacted Files Coverage Δ
superset/sql_parse.py 99.29% <ø> (-0.09%) ⬇️
superset/db_engine_specs/mssql.py 78.72% <12.50%> (-14.30%) ⬇️
superset/utils/core.py 89.41% <85.71%> (ø)
...et-frontend/src/SqlLab/reducers/getInitialState.js 33.33% <0.00%> (-16.67%) ⬇️
superset-frontend/src/reduxUtils.js 76.47% <0.00%> (-10.30%) ⬇️
...rontend/src/SqlLab/components/TabbedSqlEditors.jsx 75.52% <0.00%> (-6.30%) ⬇️
superset-frontend/src/SqlLab/actions/sqlLab.js 61.13% <0.00%> (-5.68%) ⬇️
...rontend/src/SqlLab/components/SqlEditorLeftBar.jsx 44.00% <0.00%> (-4.00%) ⬇️
superset-frontend/src/SqlLab/reducers/sqlLab.js 37.44% <0.00%> (-3.30%) ⬇️
...end/src/SqlLab/components/TemplateParamsEditor.jsx 88.57% <0.00%> (-2.86%) ⬇️
... and 23 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 80f8349...cc5c0fc. Read the comment docs.

@dpgaspar
Copy link
Member Author

dpgaspar commented May 7, 2020

Yes I know and totally agree @mistercrunch bad initial approach on my side.

@bkyryliuk catching the error may be a good approach, could turn out to be difficult to pin the engine error, specific to non aliased functions, on the other hand, it's also a bit hard to parse the query and search for non aliased functions. So I may give it a try

@pull-request-size pull-request-size bot added size/L and removed size/M labels May 11, 2020
@dpgaspar
Copy link
Member Author

dpgaspar commented May 11, 2020

Updated this PR and description with:
"""
Following comments I'm reverting all logic behind squeezing aliases on functions for MSSQL. This implements a much simpler approach by catching MSSQL specific error for non aliased functions. This error is easy to catch since we are wrapping a TOP on the user's statement, otherwise we get an error from superset saying no field of name
"""

@bkyryliuk @mistercrunch @villebro

@dpgaspar dpgaspar changed the title fix(mssql): CAST with AT produces statements with syntax errors fix(mssql): reverts #9644 and displays a better error msg May 12, 2020
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor suggestions

superset/db_engine_specs/mssql.py Outdated Show resolved Hide resolved
superset/db_engine_specs/mssql.py Outdated Show resolved Hide resolved
tests/db_engine_specs/mssql_tests.py Outdated Show resolved Hide resolved
dpgaspar and others added 3 commits May 14, 2020 15:22
Co-authored-by: Ville Brofeldt <33317356+villebro@users.noreply.github.com>
Co-authored-by: Ville Brofeldt <33317356+villebro@users.noreply.github.com>
Co-authored-by: Ville Brofeldt <33317356+villebro@users.noreply.github.com>
@dpgaspar dpgaspar merged commit 3cc5400 into apache:master May 14, 2020
@dpgaspar dpgaspar deleted the fix/mssql-sqllab-cast branch May 14, 2020 16:00
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.37.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels !deprecated-label:bug Deprecated label - Use #bug instead size/L 🚢 0.37.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants