Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve SQL row limiting and metadata operations #35

Merged
merged 3 commits into from
Mar 12, 2024

Conversation

hab6
Copy link
Contributor

@hab6 hab6 commented Mar 4, 2024

Overview

These code changes are an attempt to improve some operations in Apache Superset SQL Lab when using the Ingres dialect for SQLAlchemy.

Related internal ticket II-13536

Details

When using the SQLAlchemy-Ingres dialect in Apache Superset SQL Lab, there are a couple of unrelated problems that occur, both when choosing a table from the SEE TABLE SCHEMA drop list which attempts to execute a SQL SELECT statement to retrieve rows from the target table.

Problems

  1. The code in IngresDialect::get_columns calls IngresDialect::denormalize_name, which converts the given string to all lowercase. If the table name is mixed (or upper) case, the SQL statement to retrieve column names returns no rows, which causes Superset to build a syntactically invalid SELECT statement that specifies no columns. It is important to note that this problem could also manifest outside of Apache Superset given the right conditions.
  2. The IngresDialect::limit_clause is called which returns a row limiting clause using FIRST FETCH n ROWS ONLY after which Superset also appends LIMIT n, causing a syntactically incorrect SQL statement having two row limiting clauses. Superset appears to not have the ability to recognize the FETCH FIRST ... clause so that it could handle it properly and avoid adding the LIMIT clause.

Fixes / Workarounds

  1. The simple fix is to remove the lowercase conversion, leaving the table name as is.
  2. The fix to method IngresDialect::limit_clause gives precedence to the LIMIT n clause and only uses FETCH FIRST n ROWS ONLY if there is also an OFFSET m clause. When the row limiting clause is LIMIT n, Superset will not append another "LIMIT" clause and the SQL statement remains syntactically valid.

SQLAlchemy Test Suite Results

SQLAlchemy 2.0.27
18342 tests
Passed Failed Errors Skipped Warnings
Before Fixes 12399 1904 2220 2020 8
After Fixes 12424 1930 2156 2020 9
Diffs +25 +26 -64 - +1
SQLAlchemy 1.4.51
13909 tests
Passed Failed Errors Skipped
Before Fixes 9277 1883 2080 730
After Fixes 9291 1891 2077 730
Diffs +14 +8 -3 -

Concerns

  1. The Ingres dialect needs a more robust way of handling case of identifiers. Ingres Terminal Monitor code would be a good reference for this.
  2. The Superset config parameter SQL_MAX_ROW affects behavior in Superset SQL Lab. Without the proposed fix to the SQLAlchemy Ingres dialect limit_clause method, adhoc queries having a FETCH FIRST ... row limiting clause always fail with a syntax error since Superset adds the redundant LIMIT n clause. However, with the fix to method limit_clause, adhoc queries using a FETCH FIRST ... row limiting clause will work if SQL_MAX_ROW is set to 0 (zero). If SQL_MAX_ROW is set to a value > 0, Superset appends a LIMIT clause causing a syntax error. Adhoc queries using LIMIT for row limiting work regardless of the value of SQL_MAX_ROW.

@hab6
Copy link
Contributor Author

hab6 commented Mar 4, 2024

Per discussion with @clach04

Tested the current sql_parse.py from the Superset headrev with adhoc SQL and a row limiting clause.

  • Using clause FETCH FIRST n ROWS ONLY fails with a syntax error since the resulting SQL contains two row limiting clauses (LIMIT 4 FETCH FIRST 101 ROWS ONLY)
  • Using clause LIMIT n succeeds

Tested this hack from June 8, 2020 applied to sql_parse.py of the current Superset headrev with adhoc SQL and a row limiting clause.

  • Using clause FETCH FIRST n ROWS ONLY succeeds
  • Using clause LIMIT n fails with a syntax error since resulting SQL contains two row limiting clauses (LIMIT 4 FETCH FIRST 101 ROWS ONLY)

This lends to the theory that the Superset SQL parser supports using the row limiting clause LIMIT n but does not support the clause FETCH FIRST n ROWS ONLY.

Copy link
Member

@clach04 clach04 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main change looks good, minor refactor suggested.

Additional change should be reverted.

@@ -641,7 +643,7 @@ def denormalize_name(self, name):
if name is None:
return None
else:
return name.lower().encode('latin1')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing but I suspect this is accidentally working for the test case (compare with normalize name which does NOT change case). I recommend reverting this line and then making this separate and handling DB_NAME_CASE lookup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, the existing code with encoding change is DEFINTELY incorrect. This will only work for some values/II_CHARSETxx settings/combinations. This needs a separate issue, I just opened #36 (worth a jira ticket too for cross reference?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted change and will deal with it in #36

if not self.is_subquery():
if select._offset:
if select._offset is not None and select._offset > 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seeing a pattern of checks here.

What do you think about a small utility function similar to:

def is_integer_greater_than_zero(check_value):
    return check_value is not None and check_value > 0

and then using that instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts on this suggestion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! Utility function added to base.py for checking if value is greater than zero.

@hab6
Copy link
Contributor Author

hab6 commented Mar 7, 2024

Superset issue 27427 opened for the problem involving row limiting clauses.

Copy link
Member

@clach04 clach04 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@clach04
Copy link
Member

clach04 commented Mar 12, 2024

@hab6 this is marked as Draft so can not be submitted yet. Any concerns with un-drafting and merging? lgtm :-)

@hab6
Copy link
Contributor Author

hab6 commented Mar 12, 2024

@clach04 thanks for your suggestion per our brief conversation. I agree that it will be good to merge these changes and provide some relief for the problem addressed by these code changes.

@hab6 hab6 marked this pull request as ready for review March 12, 2024 17:13
@hab6 hab6 merged commit 6f891c3 into master Mar 12, 2024
@hab6 hab6 deleted the hab6-improve-metadata-ops branch March 12, 2024 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants