[sql] Correct SQL parameter formatting #5178

john-bodley · 2018-06-12T01:44:45Z

TL;DR This PR removes the need for double escaping the percent character both in SQL Lab and explorer and ensures both approaches interface with the SQLA engine in a consistent manner.

This PR came about due to an error when exporting a SQL Lab result to CSV. For example in Presto one can successfully run the following query:

SELECT '%'

which executes directly via the SQLAlchemy engine, however an error was thrown when trying to export to CSV which uses the Pandas interface where the parameters argument is {} which causes an exception in PyHive which uses the pyformat SQL parameterstyle. In essence,

SELECT '%' %  {}

is invalid and thus one would need to escape the % character, i.e.,

SELECT '%%' %  {}

The reason for the error is the Pandas SQL uses the engine connection (which doesn't expose the execute arguments) whereas SQL Lab uses a raw connection (which exposes the execute arguments).

We should ensure that both SQL Lab and explore use the same interface, which means that either i) the compiled SQL statements include escaping of % to satisfy pyformat and enforce the parameter args to be either () or {}, or ii) the compiled SQL statements remove all escaping and enforce the parameter args to be None. Note in both cases one can copy-and-paste explorer queries directly in SQL Lab, though this PR implements the second option as i) one doesn't need to escape the % character for the appropriate dialects, and ii) the raw SQL is what one would run in the native DB CLI.

The fix is simply to bypass the Pandas API for querying and mimic the logic that SQL Lab uses which ensures that for presto parameters is None and thus it will not re-format the string.

Note this PR also fixes the Hive and MySQL engine specs inline with their API, i.e., both Hive and MySQL explicitly test whether the parameters and args respectively are None (rather than doing a boolean check).

This means for MySQL in SQL Lab one can run queries of the form,

SELECT * FROM table WHERE column LIKE '%foo%'

as opposed to:

SELECT * FROM table WHERE column LIKE '%%foo%%'

since previously the MySQL engine spec defined args to {} causing the string to be formatted with the empty dictionary args.

Closes #4098, #4857

to: @betodealmeida @michellethomas @mistercrunch
cc: @graceguo-supercat @timifasubaa

mistercrunch · 2018-06-12T15:30:07Z

Very nice. Happy to see this as read_sql has been getting in the way of getting consistency across SQL Lab & charts.

codecov-io · 2018-06-12T21:33:51Z

Codecov Report

Merging #5178 into master will increase coverage by 0.05%.
The diff coverage is 91.66%.

@@            Coverage Diff             @@
##           master    #5178      +/-   ##
==========================================
+ Coverage   60.71%   60.77%   +0.05%     
==========================================
  Files         258      258              
  Lines       19701    19707       +6     
  Branches     1970     1970              
==========================================
+ Hits        11962    11977      +15     
+ Misses       7730     7721       -9     
  Partials        9        9

Impacted Files	Coverage Δ
superset/sql_lab.py	`75.4% <100%> (ø)`	⬆️
superset/connectors/sqla/models.py	`78% <100%> (+0.7%)`	⬆️
superset/db_engine_specs.py	`53.85% <80%> (+0.97%)`	⬆️
superset/models/core.py	`86.71% <94.11%> (+0.14%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d483ed1...eeebcb0. Read the comment docs.

john-bodley · 2018-06-13T02:20:56Z

superset/models/core.py

+        with closing(engine.raw_connection()) as conn:
+            with closing(conn.cursor()) as cursor:
+                for sql in sqls:
+                    cursor.execute(sql, **self.db_engine_spec.cursor_execute_kwargs)


@mistercrunch my one concern is for Hive and whether running async=True (as defined by the engine spec) makes sense here.

Hmm yeah looks like this won't work. The async=True is to get the progress bar. Some options:

Make db_engine_spec.handle_cursor broader as in an execute method that returns the last recordset or dataframe. This means moving a fair amount of logic from SQL Lab and explore into db_engine_spec. It's probably the right thing to do.

hacks around intercepting and disabling async when in explore/dashboard, but not in SQL Lab. That's more hacky but less disruptive

I originally was thinking of wrapping the execute method, though one still needs to decipher when to run in sync (explorer) vs. async (SQL Lab) mode.

john-bodley · 2018-06-17T03:50:39Z

@mistercrunch I address your comment regarding the async logic for SQL Lab vs. explorer. Would you mind taking a look?

mistercrunch · 2018-06-18T16:10:33Z

superset/models/core.py

+                for sql in sqls:
+                    self.db_engine_spec.execute(cursor, sql)
+                df = pd.DataFrame.from_records(
+                    data=list(cursor.fetchall()),


This LGTM. There's a bit of complexity/differences still left around the way explore vs sqllab get their data. I think the main difference is around Hive, the async param and whether handle_cursor is called or not (called only in SQL Lab).

This PR is a step in the right direction. Maybe eventually we want to make bring more into db_engine_spec, and have a get_records that receives update_process=False that internally would deal with async.

@mistercrunch do you want me to makes these changes here, or should these issues be addressed in a future PR?

mistercrunch · 2018-06-18T16:12:35Z

tox.ini

@@ -36,7 +36,7 @@ setenv =
    SUPERSET_CONFIG = tests.superset_test_config
    SUPERSET_HOME = {envtmpdir}
    py27-mysql: SUPERSET__SQLALCHEMY_DATABASE_URI = mysql://mysqluser:mysqluserpassword@localhost/superset?charset=utf8
-    py34-mysql: SUPERSET__SQLALCHEMY_DATABASE_URI = mysql://mysqluser:mysqluserpassword@localhost/superset
+    py{34,36}-mysql: SUPERSET__SQLALCHEMY_DATABASE_URI = mysql://mysqluser:mysqluserpassword@localhost/superset


Side note, we may be able to drop 3.4 support soon. While 2.7 should have a long tail of support, I don't think it's as needed with 3.4

Agreed. This PR just adds the missing Python 3.6 version which I was using for testing locally. Note we've kind of already dropped 3.4 per here. I'll create a new PR to remove reference to it in Tox.

john-bodley · 2018-06-18T18:01:39Z

Note this PR is gated by #5206 as I didn't want to have to mutate multiple columns in the migration script.

john-bodley · 2018-07-19T16:32:44Z

@mistercrunch are you ok if I merge this PR?

mistercrunch · 2018-07-20T22:54:23Z

Yup @john-bodley go for it.

(cherry picked from commit 7fcc2af)

john-bodley changed the title ~~John bodley sql percent sign~~ [get_df] Correct SQL parameter formatting Jun 12, 2018

john-bodley changed the title ~~[get_df] Correct SQL parameter formatting~~ [wip][sql] Correct SQL parameter formatting Jun 12, 2018

john-bodley force-pushed the john-bodley-sql-percent-sign branch 5 times, most recently from 94bc031 to 3ce3994 Compare June 12, 2018 18:55

john-bodley changed the title ~~[wip][sql] Correct SQL parameter formatting~~ [sql] Correct SQL parameter formatting Jun 12, 2018

john-bodley commented Jun 13, 2018

View reviewed changes

john-bodley force-pushed the john-bodley-sql-percent-sign branch 6 times, most recently from 5168c7e to 87c205e Compare June 17, 2018 03:49

mistercrunch reviewed Jun 18, 2018

View reviewed changes

[sql] Correct SQL parameter formatting

eeebcb0

john-bodley force-pushed the john-bodley-sql-percent-sign branch from 87c205e to eeebcb0 Compare June 18, 2018 22:44

john-bodley merged commit 7fcc2af into apache:master Jul 21, 2018

john-bodley deleted the john-bodley-sql-percent-sign branch July 21, 2018 19:01

villebro mentioned this pull request Jul 23, 2018

Fix broken dedup and remove redundant db_spec logic #5467

Merged

graceguo-supercat mentioned this pull request Jul 24, 2018

[cherry-pick] migrate all dashboards airbnb/superset-fork#66

Merged

timifasubaa pushed a commit to airbnb/superset-fork that referenced this pull request Jul 25, 2018

[sql] Correct SQL parameter formatting (apache#5178)

06cd144

graceguo-supercat pushed a commit to graceguo-supercat/superset that referenced this pull request Jul 26, 2018

[sql] Correct SQL parameter formatting (apache#5178)

7554d59

(cherry picked from commit 7fcc2af)

wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018

[sql] Correct SQL parameter formatting (apache#5178)

b00bb69

mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.28.0 labels Feb 27, 2024

betodealmeida mentioned this pull request Apr 29, 2024

fix: druid double percent #28270

Closed

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[sql] Correct SQL parameter formatting #5178

[sql] Correct SQL parameter formatting #5178

john-bodley commented Jun 12, 2018 •

edited

Loading

mistercrunch commented Jun 12, 2018 •

edited

Loading

codecov-io commented Jun 12, 2018 •

edited

Loading

john-bodley Jun 13, 2018

mistercrunch Jun 13, 2018

john-bodley Jun 13, 2018

john-bodley commented Jun 17, 2018

mistercrunch Jun 18, 2018

john-bodley Jun 19, 2018

mistercrunch Jun 18, 2018

john-bodley Jun 18, 2018

john-bodley commented Jun 18, 2018

john-bodley commented Jul 19, 2018

mistercrunch commented Jul 20, 2018 •

edited

Loading

[sql] Correct SQL parameter formatting #5178

[sql] Correct SQL parameter formatting #5178

Conversation

john-bodley commented Jun 12, 2018 • edited Loading

mistercrunch commented Jun 12, 2018 • edited Loading

codecov-io commented Jun 12, 2018 • edited Loading

Codecov Report

john-bodley Jun 13, 2018

Choose a reason for hiding this comment

mistercrunch Jun 13, 2018

Choose a reason for hiding this comment

john-bodley Jun 13, 2018

Choose a reason for hiding this comment

john-bodley commented Jun 17, 2018

mistercrunch Jun 18, 2018

Choose a reason for hiding this comment

john-bodley Jun 19, 2018

Choose a reason for hiding this comment

mistercrunch Jun 18, 2018

Choose a reason for hiding this comment

john-bodley Jun 18, 2018

Choose a reason for hiding this comment

john-bodley commented Jun 18, 2018

john-bodley commented Jul 19, 2018

mistercrunch commented Jul 20, 2018 • edited Loading

john-bodley commented Jun 12, 2018 •

edited

Loading

mistercrunch commented Jun 12, 2018 •

edited

Loading

codecov-io commented Jun 12, 2018 •

edited

Loading

mistercrunch commented Jul 20, 2018 •

edited

Loading