New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sql lab] improve table name detection in free form SQL #6793
[sql lab] improve table name detection in free form SQL #6793
Conversation
23e6bc9
to
0e428c3
Compare
0e428c3
to
57b3d57
Compare
Codecov Report
@@ Coverage Diff @@
## master #6793 +/- ##
==========================================
- Coverage 56.03% 56.02% -0.02%
==========================================
Files 527 527
Lines 23286 23279 -7
Branches 2788 2788
==========================================
- Hits 13049 13042 -7
Misses 9827 9827
Partials 410 410
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this LGTM. I've been working with sqlparse
recently and agree that parsing SQL can be extremely complex. The only other approach I can think of is to pull out all the name tokens and compare them with the tables in the metastore however this could be quite expensive.
superset/sql_parse.py
Outdated
|
||
if ( | ||
item.ttype in Keyword and ( | ||
item.value.upper() in PRECEDES_TABLE_NAME or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use item.normalized
(see here).
superset/sql_parse.py
Outdated
if isinstance(item, Identifier): | ||
self.__process_identifier(item) | ||
elif isinstance(item, IdentifierList): | ||
for token in item.tokens: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would IdentifierList.get_identifiers() help here?
e63d3a8
to
50078df
Compare
* [sql lab] improve table name detection in free form SQL * flake * Addressing comments (cherry picked from commit 5a40f71)
This addresses some special cases where subqueries as expressions would not be covered as well as other cases where a from clause would have a mix of identifiers (tables) and subqueries.
This code is fairly hard to reason about and SQL parsing is a huge bottomless can of worms. I researched solutions that would do this on our behalf reliably in Python but couldn't find anything. It could be good to refactor this logic out as a contribution to
sqlparse
or as its own package.