[sql lab] improve table name detection in free form SQL #6793

Merged

mistercrunch merged 3 commits into apache:master on Feb 5, 2019

mistercrunch (Member)

This addresses some special cases that were not covered before: subqueries used as expressions, and FROM clauses that mix identifiers (tables) with subqueries.
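A deliberately small, stdlib-only sketch of the two cases described above. It is not the code from this PR and does not use sqlparse: the regex tokenizer, the keyword sets and the names extract_tables, _nest and _walk are all hypothetical, and only the PRECEDES_TABLE_NAME idea is borrowed from the diff. Parenthesised groups are always recursed into, so subqueries used as expressions are visited, and inside a FROM list a subquery takes the place of a table reference so the comma-separated identifiers after it are still picked up.

```python
import re

# Hypothetical keyword sets for this sketch. The PR's code uses a
# PRECEDES_TABLE_NAME constant, but its real contents may differ.
PRECEDES_TABLE_NAME = {"FROM", "JOIN", "INTO", "UPDATE"}
OTHER_KEYWORDS = {
    "SELECT", "WHERE", "GROUP", "ORDER", "BY", "HAVING", "LIMIT", "ON",
    "USING", "AS", "AND", "OR", "NOT", "IN", "EXISTS", "UNION", "ALL",
    "LEFT", "RIGHT", "INNER", "OUTER", "FULL", "CROSS", "WITH", "SET",
}
KEYWORDS = PRECEDES_TABLE_NAME | OTHER_KEYWORDS

# Tokens: single-quoted strings, (optionally dotted) names, parens/commas,
# or any other single non-space character.
TOKEN_RE = re.compile(r"'(?:[^']|'')*'|[A-Za-z_][\w.]*|[(),]|\S")


def _nest(tokens):
    """Turn a flat token list into nested lists, one list per (...) group."""
    root = []
    stack = [root]
    for tok in tokens:
        if tok == "(":
            group = []
            stack[-1].append(group)
            stack.append(group)
        elif tok == ")":
            if len(stack) > 1:  # ignore unbalanced closing parens
                stack.pop()
        else:
            stack[-1].append(tok)
    return root


def _walk(items, tables):
    """Collect table names from one nesting level, recursing into groups."""
    # "idle": nothing expected; "table": a table reference comes next;
    # "ref": a table reference was just read (an alias or comma may follow).
    state = "idle"
    for item in items:
        if isinstance(item, list):
            # Subquery or expression: always look inside it. In a FROM list
            # the group takes the place of a table reference.
            _walk(item, tables)
            state = "ref" if state == "table" else "idle"
            continue
        upper = item.upper()
        if upper in PRECEDES_TABLE_NAME:
            state = "table"
        elif upper == "AS" and state == "ref":
            pass  # an alias for the preceding reference follows
        elif upper in KEYWORDS:
            state = "idle"
        elif item == ",":
            if state == "ref":
                state = "table"  # e.g. FROM a, b: another reference follows
        elif item[0].isalpha() or item[0] == "_":
            if state == "table":
                tables.add(item)
                state = "ref"
            # In "ref" state a bare name is an alias; otherwise it is a
            # column, function or other name and is ignored.
        else:
            state = "idle"


def extract_tables(sql):
    """Return the set of table names referenced in `sql` (toy version)."""
    tables = set()
    _walk(_nest(TOKEN_RE.findall(sql)), tables)
    return tables
```

For example, extract_tables("SELECT * FROM a, (SELECT id FROM b) AS sub, c") returns the set {"a", "b", "c"}. As a toy it misses double-quoted or backticked identifiers and would report a CTE name as a table, which are exactly the kinds of corner cases that make a general solution hard.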

This code is fairly hard to reason about, and SQL parsing is a huge bottomless can of worms. I looked for Python libraries that would do this reliably on our behalf but couldn't find any. It might be worth refactoring this logic out, either as a contribution to sqlparse or as its own package.

codecov-io commented Feb 1, 2019

Codecov Report

Merging #6793 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

```diff
@@            Coverage Diff             @@
##           master    #6793      +/-   ##
==========================================
- Coverage   56.03%   56.02%   -0.02%
==========================================
  Files         527      527
  Lines       23286    23279       -7
  Branches     2788     2788
==========================================
- Hits        13049    13042       -7
  Misses       9827     9827
  Partials      410      410
```

| Impacted Files | Coverage Δ |
|---|---|
| superset/sql_parse.py | 99.15% <100%> (-0.05%) ⬇️ |
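A quick check of the report's arithmetic. It assumes Codecov computes coverage as hits divided by tracked lines and truncates to two decimals; that assumption reproduces the 56.03% and 56.02% in the table, and the totals confirm that hits, misses and partials add up to the line counts. The helper names coverage and round_down are illustrative only.

```python
import math


def coverage(hits, lines):
    """Coverage as a percentage: hit lines over tracked lines."""
    return 100 * hits / lines


def round_down(value, precision=2):
    """Truncate toward negative infinity at `precision` decimals."""
    factor = 10 ** precision
    return math.floor(value * factor) / factor


# Figures from the coverage diff (master vs. #6793).
before = {"hits": 13049, "misses": 9827, "partials": 410, "lines": 23286}
after = {"hits": 13042, "misses": 9827, "partials": 410, "lines": 23279}

for report in (before, after):
    # Tracked lines are split into hits, misses and partials.
    assert report["hits"] + report["misses"] + report["partials"] == report["lines"]

cov_before = coverage(before["hits"], before["lines"])
cov_after = coverage(after["hits"], after["lines"])
print(round_down(cov_before), round_down(cov_after), round(cov_after - cov_before, 4))
# prints: 56.03 56.02 -0.0132
```

The unrounded change is about -0.013 percentage points; the header's 0.01% and the table's -0.02% are presumably different roundings of that same figure.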

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b7b51ec...50078df.

john-bodley (Member) left a comment

Overall this LGTM. I've been working with sqlparse recently and agree that parsing SQL can be extremely complex. The only other approach I can think of is to pull out all the name tokens and compare them with the tables in the metastore; however, this could be quite expensive.
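A minimal sketch of the alternative suggested here, assuming the metastore's table names have already been fetched into a Python set (the hypothetical known_tables argument; fetching and refreshing that listing is where the expense would come from). Every name-like token is kept if it matches a known table, case-insensitively.

```python
import re


def tables_by_metastore_lookup(sql, known_tables):
    """Return the name-like tokens in `sql` that match a known table name.

    `known_tables` stands in for a metastore listing (hypothetical input);
    obtaining that listing is the potentially expensive part.
    """
    # Blank out single-quoted string literals so their contents are ignored.
    sql = re.sub(r"'(?:[^']|'')*'", "''", sql)
    # Every identifier-like token, including dotted names such as schema.table.
    names = set(re.findall(r"[A-Za-z_][\w.]*", sql))
    known = {name.lower() for name in known_tables}
    # No syntax is consulted: a token counts if it matches a known table.
    return {name for name in names if name.lower() in known}
```

The trade-off shows up with a column that shares a table's name: for SELECT users FROM sales with known tables {"sales", "users"}, both names are reported, because the SQL structure is never consulted.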


```python
if (
    item.ttype in Keyword and (
        item.value.upper() in PRECEDES_TABLE_NAME or
```

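You can use item.normalized here instead of item.value.upper(); for keyword tokens, sqlparse already stores the upper-cased value in normalized.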

```python
if isinstance(item, Identifier):
    self.__process_identifier(item)
elif isinstance(item, IdentifierList):
    for token in item.tokens:
```

mistercrunch merged commit 5a40f71 into apache:master on Feb 5, 2019
graceguo-supercat pushed a commit to graceguo-supercat/superset that referenced this pull request on Mar 18, 2019
* [sql lab] improve table name detection in free form SQL

* flake

* Addressing comments

(cherry picked from commit 5a40f71)
mistercrunch added the 🏷️ bot and 🚢 0.34.0 labels on Feb 28, 2024
Labels: 🏷️ bot (used by supersetbot to keep track of which PRs were auto-tagged with release labels), 🚢 0.34.0

3 participants