Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Common Table Expressions (CTE's) do not work for Microsoft SQL Server database #8074

Closed
abhinavsood opened this issue Aug 20, 2019 · 15 comments · Fixed by #18746
Closed

Common Table Expressions (CTE's) do not work for Microsoft SQL Server database #8074

abhinavsood opened this issue Aug 20, 2019 · 15 comments · Fixed by #18746
Labels
!deprecated-label:bug Deprecated label - Use #bug instead .pinned Draws attention

Comments

@abhinavsood
Copy link

Problem

Common Table Expressions (CTE) queries are not working as expected in SQL Lab against SQL Server databases. These are queries written using the WITH clause.

Steps to Reproduce

Start superset using:

superset run -p 8080 --with-threads --reload --debugger

Add a SQL Server database and tables in Sources.

Make sure to place a check for Allow DML and ensure that it is selected and set to True.

Navigate to SQL Lab. Run the following query:

WITH t AS (SELECT 1 a) SELECT * FROM t

Expected Result

------
| a  |
------
| 1  |
------

Observed Result

The following error message is seen:

2019-08-20 14:07:53,005:ERROR:root:('42000', "[42000] [FreeTDS][SQL Server]Incorrect syntax near the keyword 'WITH'. (156) (SQLExecDirectW)")
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/superset/lib/python3.6/site-packages/superset/sql_lab.py", line 190, in execute_sql_statement
    db_engine_spec.execute(cursor, sql, async_=True)
  File "/home/user/anaconda3/envs/superset/lib/python3.6/site-packages/superset/db_engine_specs.py", line 421, in execute
    cursor.execute(query)
pyodbc.ProgrammingError: ('42000', "[42000] [FreeTDS][SQL Server]Incorrect syntax near the keyword 'WITH'. (156) (SQLExecDirectW)")
2019-08-20 14:07:53,017:INFO:werkzeug:127.0.0.1 - - [20/Aug/2019 14:07:53] "POST /superset/sql_json/ HTTP/1.1" 500 -

System and Environment Details

Database: Microsoft SQL Server 2017

Operating System: Red Hat Enterprise Linux 7 x86_64
Python: Python 3.6.7 (Anaconda Python)

Superset:

Superset 0.33.0rc1
flask 1.1.1
werkzeug 0.15.5

Other packages:

sqlalchemy 1.3.7
sqlparse 0.3.0
unixodbc 2.3.7
pyodbc 4.0.23
FreeTDS 1.1.11

Additional Notes

The query runs as expected against the same database:

  • Directly in Microsoft SQL Server Management Studio 17
  • Through FreeTDS (by running tsql -S <FQDN> and then executing the query)
  • Through pyODBC and SQLAlchemy in a python program.

The query runs as expected against a PostGreSQL database through SQL Lab.

@issue-label-bot
Copy link

Issue-Label Bot is automatically applying the label #bug to this issue, with a confidence of 0.94. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

@issue-label-bot issue-label-bot bot added the !deprecated-label:bug Deprecated label - Use #bug instead label Aug 20, 2019
@villebro
Copy link
Member

Reproduced on master; running the same on sqlite works just fine. Will take a closer look.

@villebro
Copy link
Member

Looking into this, it turns out that the bug stems from the fact that MSSQL uses the non-ANSI TOP syntax for limiting row count (SQL Lab sticks on a limit on each query, which is currently causing this problem). I'm sure we can work around this issue somehow, e.g. by disabling auto-limiting for non-ANSI DBs. However, I don't expect non-ANSI functionality to be as highly prioritized as ANSI SQL compliant syntax, i.e. expect some functionality in Superset to be less comprehensive for MSSQL than e.g. Postgres. We'll be sure to add a warning in the docs for this once we establish to what extent non-standard syntax is supported.

@abhinavsood
Copy link
Author

Looking into this, it turns out that the bug stems from the fact that MSSQL uses the non-ANSI TOP syntax for limiting row count (SQL Lab sticks on a limit on each query, which is currently causing this problem). I'm sure we can work around this issue somehow, e.g. by disabling auto-limiting for non-ANSI DBs.

AFAIK, only SQL Server and Access use the TOP syntax and while Access is not commonly used anymore, SQL Server is still quite prevalent in "Enterprise" settings. This particular use-case should be handled because CTEs are a very mainstream and extremely useful feature to use.

However, I don't expect non-ANSI functionality to be as highly prioritized as ANSI SQL compliant syntax, i.e. expect some functionality in Superset to be less comprehensive for MSSQL than e.g. Postgres.

I totally understand and agree with this. I avoid non-ANSI SQL functionality in any database like plague and lack of support for non-ANSI features in tools hasn't caused me any pain yet!

We'll be sure to add a warning in the docs for this once we establish to what extent non-standard syntax is supported.

That would be great!

@villebro
Copy link
Member

Please take a look at my PR that fixes the immediate problem with CTEs on MSSQL (apparently this should also fix the same problem on DB2, Oracle and Teradata). Adding full support for limits on non-ANSI engines is not something I feel an immediate urge to do, and quite frankly I don't want to encourage DB vendors to procrastinate on moving away from bespoke syntax. But anyway, let me know if this is ok for your use case.

@abhinavsood
Copy link
Author

@villebro I am currently unable to test the fix. The docker environment for development fails to initialize:

Step 13/23 : RUN pip install --upgrade setuptools pip     && pip install -r requirements.txt -r requirements-dev.txt -r requirements-extra.txt     && rm -rf /root/.cache/pip
 ---> Running in 757208a956a6

Collecting setuptools
  Downloading https://files.pythonhosted.org/packages/b2/86/095d2f7829badc207c893dd4ac767e871f6cd547145df797ea26baea4e2e/setuptools-41.2.0-py2.py3-none-any.whl (576kB)
Collecting pip
  Downloading https://files.pythonhosted.org/packages/8d/07/f7d7ced2f97ca3098c16565efbe6b15fafcba53e8d9bdb431e09140514b0/pip-19.2.2-py2.py3-none-any.whl (1.4MB)
Installing collected packages: setuptools, pip
  Found existing installation: setuptools 41.0.1
    Uninstalling setuptools-41.0.1:
ERROR: Could not install packages due to an EnvironmentError: [Errno 39] Directory not empty: 'packaging'

WARNING: You are using pip version 19.1.1, however version 19.2.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
ERROR: Service 'superset' failed to build: The command '/bin/sh -c pip install --upgrade setuptools pip     && pip install -r requirements.txt -r requirements-dev.txt -r requirements-extra.txt     && rm -rf /root/.cache/pip' returned a non-zero code: 1

Do you have any suggestions on what might be causing this? Thanks!

@villebro
Copy link
Member

Unfortunately I an not that familiar with how the docker setup under contrib works (we use a custom Dockerfile in our env). But seems like rebuilding from scratch should help.

@abhinavsood
Copy link
Author

@villebro - I was able to test this and can confirm that everything works as expected. The test CTE as well as CTEs fetching records from the database work.

A couple of things to note:

  1. Users would have to be careful trying to pull more data than the browser can handle
  2. The SQL Lab UI shows the Limit 1000 button below the query box which wouldn't have any effect for queries against non-ANSI compliant databases

@villebro
Copy link
Member

If you check the diff, I added a warning about using CTEs on non-ANSIs.

Btw, I'm working on something that might improve CTE support on all engines and should fix this problem on non-ANSIs, too, but we'll see how that works out. But in the meantime this is probably the best compromise I can offer.

@abhinavsood
Copy link
Author

Btw, I'm working on something that might improve CTE support on all engines and should fix this problem on non-ANSIs, too, but we'll see how that works out. But in the meantime this is probably the best compromise I can offer.

Sounds like a plan!

BTW, another tab where a user can enter custom SQL is Charts > Edit Data Source > Advanced > SQL. Users must understand that CTEs have no place there because any aggregates are done on the results of the provided query - so if someone absolutely needs to use the results from a CTE, the right way to do this would be to create a view and use that view instead, as the data source.

@villebro
Copy link
Member

The query under data source is what is created when you click explore in SQL Lab. What I'm currently tinkering with is adding full CTE support, i.e. being able to write a CTE in SQL Lab and then using that as a data source in charts. Having to create a view is an inconvenient step, and being able to bypass that should make data exploration slightly easier.

@abhinavsood
Copy link
Author

Having to create a view is an inconvenient step, and being able to bypass that should make data exploration slightly easier.

Definitely an inconvenience that I am dealing with right now but would have preferred not to!

What I'm currently tinkering with is adding full CTE support, i.e. being able to write a CTE in SQL Lab and then using that as a data source in charts.

So this sounds fantastic! Thank you!

@villebro
Copy link
Member

Looking more closely at this problem, it seems MSSQL is one of the few engines that doesn't support wrapping CTEs in subqueries (even Oracle supports this). Furthermore, most engines do equal predicate pushdown for CTEs and subqueries, which means that wrapping SQL Lab queries in CTEs in most cases wouldn't have a significant positive performance impact (on Postgres 11 and below CTEs actually perform worse than subqueries!). As this problem seems to be mostly an MSSQL problem, I think I'll leave this on the backlog for now, as making this work correctly would require a rather significant development effort that's difficult to justify given other more pressing issues/features.

@stale
Copy link

stale bot commented Oct 28, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added the inactive Inactive for >= 30 days label Oct 28, 2019
@stale stale bot closed this as completed Nov 4, 2019
@nytai nytai reopened this Jan 31, 2022
@stale stale bot removed the inactive Inactive for >= 30 days label Jan 31, 2022
@nytai nytai added the .pinned Draws attention label Jan 31, 2022
villebro added a commit that referenced this issue Feb 10, 2022
* Fix for handling regular CTE queries with MSSQL,#8074

* Moved the get_cte_query function from mssql.py to base.py for using irrespetcive of dbengine

* Fix for handling regular CTE queries with MSSQL,#8074

* Moved the get_cte_query function from mssql.py to base.py for using irrespetcive of dbengine

* Unit test added for the db engine CTE SQL parsing.

Unit test added for the db engine CTE SQL parsing.  Removed additional spaces from the CTE parsing SQL generation.

* implement in sqla model

* lint + cleanup

Co-authored-by: Ville Brofeldt <ville.v.brofeldt@gmail.com>
@villebro
Copy link
Member

@abhinavsood please check #18746 which should fix this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
!deprecated-label:bug Deprecated label - Use #bug instead .pinned Draws attention
Projects
None yet
3 participants