-
Notifications
You must be signed in to change notification settings - Fork 13.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sql] Correct SQL parameter formatting #5178
[sql] Correct SQL parameter formatting #5178
Conversation
Very nice. Happy to see this as |
94bc031
to
3ce3994
Compare
Codecov Report
@@ Coverage Diff @@
## master #5178 +/- ##
==========================================
+ Coverage 60.71% 60.77% +0.05%
==========================================
Files 258 258
Lines 19701 19707 +6
Branches 1970 1970
==========================================
+ Hits 11962 11977 +15
+ Misses 7730 7721 -9
Partials 9 9
Continue to review full report at Codecov.
|
superset/models/core.py
Outdated
with closing(engine.raw_connection()) as conn: | ||
with closing(conn.cursor()) as cursor: | ||
for sql in sqls: | ||
cursor.execute(sql, **self.db_engine_spec.cursor_execute_kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mistercrunch my one concern is for Hive and whether running async=True (as defined by the engine spec) makes sense here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah looks like this won't work. The async=True
is to get the progress bar. Some options:
- Make
db_engine_spec.handle_cursor
broader as in anexecute
method that returns the last recordset or dataframe. This means moving a fair amount of logic from SQL Lab and explore into db_engine_spec. It's probably the right thing to do. - hacks around intercepting and disabling async when in explore/dashboard, but not in SQL Lab. That's more hacky but less disruptive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally was thinking of wrapping the execute method, though one still needs to decipher when to run in sync (explorer) vs. async (SQL Lab) mode.
5168c7e
to
87c205e
Compare
@mistercrunch I address your comment regarding the async logic for SQL Lab vs. explorer. Would you mind taking a look? |
for sql in sqls: | ||
self.db_engine_spec.execute(cursor, sql) | ||
df = pd.DataFrame.from_records( | ||
data=list(cursor.fetchall()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM. There's a bit of complexity/differences still left around the way explore vs sqllab get their data. I think the main difference is around Hive, the async
param and whether handle_cursor
is called or not (called only in SQL Lab).
This PR is a step in the right direction. Maybe eventually we want to make bring more into db_engine_spec, and have a get_records
that receives update_process=False
that internally would deal with async
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mistercrunch do you want me to makes these changes here, or should these issues be addressed in a future PR?
@@ -36,7 +36,7 @@ setenv = | |||
SUPERSET_CONFIG = tests.superset_test_config | |||
SUPERSET_HOME = {envtmpdir} | |||
py27-mysql: SUPERSET__SQLALCHEMY_DATABASE_URI = mysql://mysqluser:mysqluserpassword@localhost/superset?charset=utf8 | |||
py34-mysql: SUPERSET__SQLALCHEMY_DATABASE_URI = mysql://mysqluser:mysqluserpassword@localhost/superset | |||
py{34,36}-mysql: SUPERSET__SQLALCHEMY_DATABASE_URI = mysql://mysqluser:mysqluserpassword@localhost/superset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note, we may be able to drop 3.4
support soon. While 2.7
should have a long tail of support, I don't think it's as needed with 3.4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. This PR just adds the missing Python 3.6 version which I was using for testing locally. Note we've kind of already dropped 3.4 per here. I'll create a new PR to remove reference to it in Tox.
Note this PR is gated by #5206 as I didn't want to have to mutate multiple columns in the migration script. |
87c205e
to
eeebcb0
Compare
@mistercrunch are you ok if I merge this PR? |
Yup @john-bodley go for it. |
(cherry picked from commit 7fcc2af)
TL;DR This PR removes the need for double escaping the percent character both in SQL Lab and explorer and ensures both approaches interface with the SQLA engine in a consistent manner.
This PR came about due to an error when exporting a SQL Lab result to CSV. For example in Presto one can successfully run the following query:
which executes directly via the SQLAlchemy engine, however an error was thrown when trying to export to CSV which uses the Pandas interface where the
parameters
argument is{}
which causes an exception in PyHive which uses thepyformat
SQL parameterstyle. In essence,is invalid and thus one would need to escape the
%
character, i.e.,The reason for the error is the Pandas SQL uses the engine connection (which doesn't expose the execute arguments) whereas SQL Lab uses a raw connection (which exposes the execute arguments).
We should ensure that both SQL Lab and explore use the same interface, which means that either i) the compiled SQL statements include escaping of
%
to satisfypyformat
and enforce the parameter args to be either()
or{}
, or ii) the compiled SQL statements remove all escaping and enforce the parameter args to beNone
. Note in both cases one can copy-and-paste explorer queries directly in SQL Lab, though this PR implements the second option as i) one doesn't need to escape the%
character for the appropriate dialects, and ii) the raw SQL is what one would run in the native DB CLI.The fix is simply to bypass the Pandas API for querying and mimic the logic that SQL Lab uses which ensures that for presto
parameters
isNone
and thus it will not re-format the string.Note this PR also fixes the Hive and MySQL engine specs inline with their API, i.e., both Hive and MySQL explicitly test whether the
parameters
andargs
respectively areNone
(rather than doing a boolean check).This means for MySQL in SQL Lab one can run queries of the form,
as opposed to:
since previously the MySQL engine spec defined
args
to{}
causing the string to be formatted with the empty dictionary args.Closes #4098, #4857
to: @betodealmeida @michellethomas @mistercrunch
cc: @graceguo-supercat @timifasubaa