Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Force lowercase column names for Snowflake and Oracle #4994

Merged
merged 6 commits into from
May 14, 2018
Merged

Force lowercase column names for Snowflake and Oracle #4994

merged 6 commits into from
May 14, 2018

Conversation

villebro
Copy link
Member

@villebro villebro commented May 12, 2018

When calling cursor.description using the Oracle and Snowflake SQL Alchemy connectors, column names are returned in all uppercase. This causes problems when these column names are used to match column names in DataFrames that have been constructed from a ResultProxy instance, which for these connectors return lowercase column names when calling the keys() function. An example of this is Pandas' read_sql_query function, which is used in the Viz component.

This PR should fix #4662, #4770 and #953. A long-term sustainable fix would be replacing this and other cursor.execute()-based select queries with connection.execute()calls to ensure uniform column names. This should be fairly simple and would probably tidy up the code a fair bit, unless there is an explicit reason for using the low level API.

@codecov-io
Copy link

Codecov Report

Merging #4994 into master will decrease coverage by <.01%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #4994      +/-   ##
=========================================
- Coverage    77.1%   77.1%   -0.01%     
=========================================
  Files          44      44              
  Lines        8636    8644       +8     
=========================================
+ Hits         6659    6665       +6     
- Misses       1977    1979       +2
Impacted Files Coverage Δ
superset/sql_lab.py 74.31% <100%> (-0.14%) ⬇️
superset/db_engine_specs.py 52.86% <77.77%> (+0.39%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7d5195a...631704a. Read the comment docs.

@mistercrunch
Copy link
Member

I think there are instances where we need to use the lower level API, we're likely to move towards always using the lower level API to get more engine-specific control.

Is oracle/snowflake always case unsensitive? Can case-sensitivity ever be turned on? If so would the current PR still work?

@villebro
Copy link
Member Author

villebro commented May 13, 2018

Ok, that's fine, too, as long as we're always consistently using the same API. Would the plan be to add querying methods to the db_engine_spec? This way we could have more control over the details later.

With regards to the details on Oracle/Snowflake case logic, I think this hack works fine for now, at least much better than if left untouched. Reading the docs both Oracle and Snowflake return case insensitive column names as uppercase (I'm assuming the docs refer to cursor.description), but if an uppercase column name is sent to SQLAlchemy, it treats that as case sensitive, and encloses the column name in quotes, resulting in the problem that we've observed. I can't come up with a scenario where this fix/hack would break anything, as any query that was previously working was already case insensitive, hence would be unaffected by lowercasing the column names (at least that's my interpretation).

I've actually raised the cursor.description vs resultproxy.keys() inconsistency with Snowflake and hope to get some help from them on that front, either guidance on how to use the connector or a change to it's behaviour/room for customisation of engine_params in future versions. The details surrounding the Oracle connector aren't as familiar to me, but can look into that in the coming days. At any rate, the current problem stems from mixing cursor with connection, so this problem should ultimately go away if all db connectivity is streamlined.

@mistercrunch mistercrunch merged commit b391676 into apache:master May 14, 2018
michellethomas pushed a commit to michellethomas/panoramix that referenced this pull request May 24, 2018
* Force lowercase column names for Snowflake and Oracle

* Force lowercase column names for Snowflake and Oracle

* Remove lowercasing of DB2 columns

* Remove DB2 lowercasing

* Fix test cases
timifasubaa pushed a commit to timifasubaa/incubator-superset that referenced this pull request May 31, 2018
* Force lowercase column names for Snowflake and Oracle

* Force lowercase column names for Snowflake and Oracle

* Remove lowercasing of DB2 columns

* Remove DB2 lowercasing

* Fix test cases
@villebro villebro deleted the db_engine_normalize_col branch July 3, 2018 05:17
@mmuru
Copy link
Contributor

mmuru commented Jul 24, 2018

I tried to verify this fix against Snowflake and Superset 0.26.3 (latest release) and the issue is still exist. The workaround #4662 and #4770 only works and verified.

timifasubaa pushed a commit to airbnb/superset-fork that referenced this pull request Jul 25, 2018
* Force lowercase column names for Snowflake and Oracle

* Force lowercase column names for Snowflake and Oracle

* Remove lowercasing of DB2 columns

* Remove DB2 lowercasing

* Fix test cases
@villebro
Copy link
Member Author

@mmuru That's strange, this effectively does the same thing that the workarounds do. However, this was just a hack, and will no longer work due to unrelated changes in master branch. However, I currently have a local build that is working very well with Snowflake (based on master branch), but still have to to do some additional testing. I should have a fix ready in a day or so if you want to try it out. The patch should actually work well for 0.26.3, too, if you want to backport it.

@mistercrunch
Copy link
Member

Should I revert?

@villebro
Copy link
Member Author

@mmuru did you refer to PR #5467 or this PR #4994 ? #5467 won't work for 0.26.3 as it requires changes that are present in master branch. @mistercrunch I would recommend not reverting in master, as the old workaround is not compqtible with the ned db api 1 logic. I will submit a new PR soon that should fix this problem permanently.

@mistercrunch
Copy link
Member

Ok let's push a fix through then.

@mmuru
Copy link
Contributor

mmuru commented Jul 25, 2018

@villebro: I was referring your PR #4994 change set. The #4662 workaround simply outside Superset and manually setting requires_name_normalize=False. This one was very nice but one has to manually override snowflake-sqlalchemy code. The #4770 workaround is manually edit and lower case columns within Superset. Sure, let me know when your fix is ready and I will test and verify it.

@villebro
Copy link
Member Author

@mmuru Check out #5487 . Feedback much appreciated!

wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018
* Force lowercase column names for Snowflake and Oracle

* Force lowercase column names for Snowflake and Oracle

* Remove lowercasing of DB2 columns

* Remove DB2 lowercasing

* Fix test cases
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.26.0 labels Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.26.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Key Error in Viz using Snowflake Connector
4 participants