fix(mssql): reverts #9644 and displays a better error msg #9752
This is really a can of worms, and the deeper we go the more |
The problem is not as much And oh wow! I didn't know we were altering the SQL (as opposed to just wrapping it). That's pretty scary. Whenever parsing or altering SQL is part of the solution, it's pretty scary. I'd advocate for even rolling back the logic that's in there currently (from before this PR). Also if this logic should be anywhere, it should be scoped to MSSQL in Few other options:
Personally I think 1 is best. It has the best perf guarantees (gives the clearest declarative insight to the db optimizer) and requires limited query alteration. When we moved forward with this a while ago I remember we looked at how other tools did this but forgot the details, maybe @bkyryliuk remembers? |
Thanks for the ideas @mistercrunch, we had/have a couple of problems with MSSQL on SQLLab. Given that, maybe 2 is a good approach, or a mix of 1 and 2. I may be wrong here, but I still think that altering SQL statements is dangerous/difficult and kind of out of scope for Superset. Another option could be: not forcing Note: This is scoped to MSSQL, because it was an obviously risky change |
Yeah, we looked into alternatives quite some time ago and sqlparse was the only feasible option at the time. There are DB-specific parsers, but they are often implemented in different languages, e.g. Java + ANTLR for Hive and Presto as I remember, and using them, as was mentioned, would be opening a can of worms. An EXPLAIN query can provide some useful information as well, however I am not sure about MS SQL specifics there. I'm curious what the SQLAlchemy solution is, if it supports query construction for MSSQL |
@bkyryliuk we have a simple query construction here: https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs/base.py#L397 and it's being used for MSSQL. Is this something similar to what you're looking for? @mistercrunch also mentioned that using a My plan is, if feasible/simple change |
I like the approach, and eventually it would be nice to have that logic in the db specific engines. Another potential alternative could be to catch an error and make it more user friendly e.g. stating that all columns should be aliased. It would make user experience slightly worse in the beginning, but over time should not be an issue and may be slightly easier to implement. |
For reference, here's the FORCE_LIMIT logic: FORCE_TOP wouldn't be that different. It's editing the SQL, but it's pretty safe (not as bad as squeezing aliases into subqueries...) |
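To make the FORCE_TOP idea concrete, here is a minimal sketch of what such an edit could look like. It is not Superset's code: `apply_top` and its regex are hypothetical, and it assumes a single plain SELECT statement (no CTEs, no leading comments). Anything it does not recognize is returned unchanged.

```python
import re

# Hypothetical pattern: leading SELECT, optional DISTINCT, optional existing TOP n.
_SELECT_RE = re.compile(
    r"^\s*SELECT\s+(DISTINCT\s+)?(TOP\s+\(?(\d+)\)?\s+)?",
    re.IGNORECASE,
)


def apply_top(sql: str, limit: int) -> str:
    """Insert or tighten a TOP clause on a single plain SELECT statement."""
    match = _SELECT_RE.match(sql)
    if match is None:
        # Not a plain SELECT (e.g. starts with WITH): leave the SQL untouched.
        return sql
    distinct = "DISTINCT " if match.group(1) else ""
    existing = match.group(3)
    # Keep the smaller of the user's TOP and the forced limit.
    top = min(int(existing), limit) if existing else limit
    rest = sql[match.end():]
    return f"SELECT {distinct}TOP {top} {rest}"


print(apply_top("SELECT a FROM t", 100))
print(apply_top("select top 5000 a from t", 100))
```

The edit only touches the head of the statement, which is why it is less risky than injecting aliases into the select list, but it still relies on pattern matching and would need real parsing to handle CTEs or multiple statements.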
Codecov Report
@@ Coverage Diff @@
## master #9752 +/- ##
==========================================
- Coverage 70.79% 70.51% -0.29%
==========================================
Files 586 587 +1
Lines 30445 30446 +1
Branches 3121 3121
==========================================
- Hits 21555 21469 -86
- Misses 8776 8856 +80
- Partials 114 121 +7
|
Yes, I know and totally agree @mistercrunch, bad initial approach on my side. @bkyryliuk catching the error may be a good approach. It could turn out to be difficult to pin down the engine error specific to non-aliased functions; on the other hand, it's also a bit hard to parse the query and search for non-aliased functions. So I may give it a try |
Updated this PR and description with: |
A few minor suggestions
Co-authored-by: Ville Brofeldt <33317356+villebro@users.noreply.github.com>
CATEGORY
SUMMARY
On SQLLab, executing SQL statements with CAST and AT fails. This seems to open up a bigger problem: sqlparse does not correctly identify columns with CAST(FOO_COL AS TYPEX) AT TIME ZONE 'Eastern Standard Time', for example, and just adds CAST(FOO_COL AS TYPEX) on the IdentifierList.
This is a partial fix, since it will discard CAST from the alias setting so that we don't double alias it. Yet it means that all CAST statements need to have an alias.
I think we should consider removing our forced TOP from MSSQL engines, and maybe from others that use TOP: too much complexity for the value it adds.
PS: Following the comments, I'm reverting all logic behind squeezing aliases on functions for MSSQL. This implements a much simpler approach by catching the MSSQL-specific error for non-aliased functions. This error is easy to catch since we are wrapping a TOP on the user's statement; otherwise we get an error from Superset saying no field of name
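As a rough illustration of the error-catching approach described above (a sketch, not the code in this PR): it assumes the driver's exception text contains SQL Server's "No column name was specified for column ..." message, which is what an unaliased expression inside a wrapped subquery is expected to trigger. The function name `friendly_mssql_error` and the wording of the returned message are hypothetical.

```python
# Assumed fragment of the SQL Server message for an unaliased column in a subquery.
UNALIASED_COLUMN_HINT = "no column name was specified"


def friendly_mssql_error(ex: Exception) -> str:
    """Translate the unaliased-column engine error into an actionable message."""
    message = str(ex)
    if UNALIASED_COLUMN_HINT in message.lower():
        return (
            "mssql error: all SQL functions need an alias on MSSQL. "
            "For example: SELECT COUNT(*) AS c1 FROM table1"
        )
    # Any other engine error is passed through with the engine prefix.
    return f"mssql error: {message}"


try:
    raise RuntimeError("No column name was specified for column 1 of 'inner_qry'.")
except RuntimeError as ex:
    print(friendly_mssql_error(ex))
```

Matching on the message text keeps the SQL untouched, at the cost of depending on how the driver formats the engine error.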
Examples:
ADDITIONAL INFORMATION
REVIEWERS