Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sql lab] Fix issue around VARBINARY type in Presto #5121

Merged
merged 1 commit into from
Jun 20, 2018

Conversation

mistercrunch
Copy link
Member

When receiving a VARBINARY field out of Presto, it shows up as type
bytes out of the pyhive driver. Then the pre 3.15 version of
simplejson attempts to convert it to utf8 by default and it craps out.

I bumped to simplejson>=3.25.0 and set encoding=None as documented
here
https://simplejson.readthedocs.io/en/latest/#basic-usage so that we can
handle bytes on our own.

When receiving a VARBINARY field out of Presto, it shows up as type
`bytes` out of the pyhive driver. Then the pre 3.15 version of
simplejson attempts to convert it to utf8 by default and it craps out.

I bumped to simplejson>=3.25.0 and set `encoding=None` as documented
here
https://simplejson.readthedocs.io/en/latest/#basic-usage so that we can
handle bytes on our own.
@codecov-io
Copy link

Codecov Report

Merging #5121 into master will decrease coverage by 0.03%.
The diff coverage is 20%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5121      +/-   ##
==========================================
- Coverage   77.51%   77.48%   -0.04%     
==========================================
  Files          44       44              
  Lines        8735     8740       +5     
==========================================
+ Hits         6771     6772       +1     
- Misses       1964     1968       +4
Impacted Files Coverage Δ
superset/views/core.py 74.66% <ø> (ø) ⬆️
superset/utils.py 87.82% <20%> (-0.7%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1d3e96b...1b10f5c. Read the comment docs.

@mistercrunch
Copy link
Member Author

@betodealmeida @hughhhh 👀

@@ -323,6 +322,11 @@ def base_json_conv(obj):
return str(obj)
elif isinstance(obj, timedelta):
return str(obj)
elif isinstance(obj, bytes):
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get a unit test here? Also don't we want to raise the error vs. swallow by sending [bytes] or even put some type of logger.error()?

@hughhhh hughhhh added the lyft Related to Lyft label Jun 2, 2018
@@ -323,6 +322,11 @@ def base_json_conv(obj):
return str(obj)
elif isinstance(obj, timedelta):
return str(obj)
elif isinstance(obj, bytes):
try:
return '{}'.format(obj)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will return different values depending on Python version. For Python 2 (where bytes are strings):

>>> print(repr('{}'.format(b'test')))
'test'

For Python 3:

>>> print(repr('{}'.format(b'test')))
"b'test'"

Why not return a list of integers here instead?

>>> list(b'test')
[116, 101, 115, 116]

What is the use case for VARBINARY?

@mistercrunch
Copy link
Member Author

Reviving this trying to push it through. For context Presto/Hive has the VARBINARY type, and data preview or other select * query fail to json.dumps. Here I'm just trying to prevent the error and show something in the data table. I think ideally we'd show some hex of the binary, but I think there are cases where string might be instances of bytes (I haven't seen it, but saw related issues when stackoverflowin' my issue) and thought the format approach would deal with this nicely.

@mistercrunch
Copy link
Member Author

Merging this as it fixes a bug. We can tweak the way bytes look in the future if needed.

@mistercrunch mistercrunch merged commit 409ac68 into apache:master Jun 20, 2018
timifasubaa pushed a commit to airbnb/superset-fork that referenced this pull request Jul 25, 2018
When receiving a VARBINARY field out of Presto, it shows up as type
`bytes` out of the pyhive driver. Then the pre 3.15 version of
simplejson attempts to convert it to utf8 by default and it craps out.

I bumped to simplejson>=3.25.0 and set `encoding=None` as documented
here
https://simplejson.readthedocs.io/en/latest/#basic-usage so that we can
handle bytes on our own.
timifasubaa pushed a commit to airbnb/superset-fork that referenced this pull request Sep 14, 2018
When receiving a VARBINARY field out of Presto, it shows up as type
`bytes` out of the pyhive driver. Then the pre 3.15 version of
simplejson attempts to convert it to utf8 by default and it craps out.

I bumped to simplejson>=3.25.0 and set `encoding=None` as documented
here
https://simplejson.readthedocs.io/en/latest/#basic-usage so that we can
handle bytes on our own.

(cherry picked from commit 409ac68)
timifasubaa added a commit to airbnb/superset-fork that referenced this pull request Sep 14, 2018
wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018
When receiving a VARBINARY field out of Presto, it shows up as type
`bytes` out of the pyhive driver. Then the pre 3.15 version of
simplejson attempts to convert it to utf8 by default and it craps out.

I bumped to simplejson>=3.25.0 and set `encoding=None` as documented
here
https://simplejson.readthedocs.io/en/latest/#basic-usage so that we can
handle bytes on our own.
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.26.0 labels Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels lyft Related to Lyft 🚢 0.26.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants