Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Use utils.json_iso_dttm_ser to dump jsons when async query execution #13830

Merged
merged 2 commits into from Apr 19, 2021
Merged

Conversation

cabo40
Copy link
Contributor

@cabo40 cabo40 commented Mar 27, 2021

SUMMARY

If encoding is not None, the default encoder utils.json_iso_dttm_ser is not used for binary data and instead it tries to encode to the default 'utf-8'.
For data exploration on tables with blob columns doing SELECT * will throw an error.
This fixes #13829

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

before:
image

after:
image

TEST PLAN

Try executing select decode('DEADBEEF', 'hex'); on the sample DB with async turned on.

ADDITIONAL INFORMATION

Asynchronous query execution must be enabled for the database.

If encoding is not `None`, the default encoder `utils.json_iso_dttm_ser` is not used for binary data and instead tries to encode to the default 'utf-8'. 
This fixes #13829
@codecov
Copy link

codecov bot commented Mar 27, 2021

Codecov Report

Merging #13830 (02c4839) into master (18ff484) will increase coverage by 2.17%.
The diff coverage is 88.52%.

❗ Current head 02c4839 differs from pull request most recent head e1345db. Consider uploading reports for the commit e1345db to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master   #13830      +/-   ##
==========================================
+ Coverage   77.22%   79.40%   +2.17%     
==========================================
  Files         935      939       +4     
  Lines       47266    47541     +275     
  Branches     5893     5938      +45     
==========================================
+ Hits        36502    37750    +1248     
+ Misses      10616     9670     -946     
+ Partials      148      121      -27     
Flag Coverage Δ
cypress 56.04% <66.00%> (-3.97%) ⬇️
hive ?
mysql 80.71% <87.30%> (+0.11%) ⬆️
postgres 80.74% <87.30%> (+0.11%) ⬆️
presto 80.46% <90.00%> (+0.16%) ⬆️
python 81.04% <90.38%> (-0.10%) ⬇️
sqlite 80.34% <87.30%> (+0.14%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...rset-frontend/src/SqlLab/components/QueryTable.jsx 66.66% <ø> (ø)
...tend/src/SqlLab/components/ScheduleQueryButton.tsx 28.12% <ø> (ø)
...end/src/common/components/DropdownButton/index.tsx 24.00% <0.00%> (ø)
...frontend/src/components/DatabaseSelector/index.tsx 92.70% <ø> (+1.04%) ⬆️
...rset-frontend/src/components/ErrorMessage/types.ts 100.00% <ø> (ø)
superset-frontend/src/components/Icon/index.tsx 100.00% <ø> (ø)
superset-frontend/src/components/Loading/index.tsx 100.00% <ø> (ø)
...rset-frontend/src/components/ProgressBar/index.tsx 100.00% <ø> (ø)
...ontend/src/components/URLShortLinkButton/index.jsx 100.00% <ø> (ø)
.../components/CrossFilterScopingModal/utils/index.ts 100.00% <ø> (ø)
... and 244 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 18ff484...e1345db. Read the comment docs.

@cabo40 cabo40 changed the title [SQL Lab] Use utils.json_iso_dttm_ser to dump jsons when async query execution fix [SQL Lab] Use utils.json_iso_dttm_ser to dump jsons when async query execution Mar 27, 2021
@cabo40 cabo40 changed the title fix [SQL Lab] Use utils.json_iso_dttm_ser to dump jsons when async query execution fix: Use utils.json_iso_dttm_ser to dump jsons when async query execution Mar 27, 2021
@zhaoyongjie
Copy link
Member

/testenv up

1 similar comment
@junlincc
Copy link
Member

junlincc commented Apr 1, 2021

/testenv up

@@ -2211,7 +2211,8 @@ def results_exec( # pylint: disable=too-many-return-statements
obj = apply_display_max_row_limit(obj, rows)

return json_success(
json.dumps(obj, default=utils.json_iso_dttm_ser, ignore_nan=True)
json.dumps(obj, default=utils.json_iso_dttm_ser, ignore_nan=True,
encoding=None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't specify the encoding type, I guess json.dumps might use ASCII encoding, which might be a problem for Asian characters.

https://github.com/simplejson/simplejson/blob/8bef979ad8272cbc2903970f4b9992f603d50973/simplejson/encoder.py#L275-L302

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!
Thanks for checking the PR. I looked into the encoding and it seems that if a utf-8 string is in the json as a binary string, it will pass through

return obj.decode("utf-8")
and get encoded into utf-8, and if the string is a str it will go through https://github.com/simplejson/simplejson/blob/8bef979ad8272cbc2903970f4b9992f603d50973/simplejson/encoder.py#L51 and also get encoded into utf-8.
There are two other places where the dump is done with encoding=None:
encoding=None,

encoding=None,

I also think that the other calls to json.dumps with default=utils.* should have encoding set to None.
I did a brief test of the change with non ASCII strings and they were correctly displayed.
Is there some other test you think could help mitigate the risk of querying non ASCII chars?
Cheers :)
image

@github-actions
Copy link
Contributor

github-actions bot commented Apr 1, 2021

@zhaoyongjie Ephemeral environment creation is currently limited to committers.

@github-actions
Copy link
Contributor

github-actions bot commented Apr 2, 2021

@junlincc Ephemeral environment spinning up at http://54.191.32.199:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

@zhaoyongjie
Copy link
Member

zhaoyongjie commented Apr 10, 2021

Hi @cabo40
I turned on Async Query, and enabled SQL Lab result cache, then I did not reproduce this issue. Could you double-check this issue on the latest master?

@zhaoyongjie
Copy link
Member

Hi @cabo40,

Sorry, please abort my earlier reply, I reproduced the problem, but it doesn't seem to be caused by the global async query. Here is my configuration for reproducing this issue

import os
from cachelib.redis import RedisCache


SQLALCHEMY_DATABASE_URI = 'postgresql://postgres:postgres@localhost/superset'

REDIS_HOST = os.environ.get("REDIS_HOST", "localhost")
REDIS_PORT = os.environ.get("REDIS_PORT", "6379")
REDIS_CELERY_DB = os.environ.get("REDIS_CELERY_DB", 2)
REDIS_RESULTS_DB = os.environ.get("REDIS_RESULTS_DB", 3)


class CeleryConfig(object):
    BROKER_URL = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}"
    CELERY_IMPORTS = ("superset.sql_lab",)
    CELERY_RESULT_BACKEND = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_RESULTS_DB}"
    CONCURRENCY = 1


CELERY_CONFIG = CeleryConfig
RESULTS_BACKEND = RedisCache(
    host=REDIS_HOST,
    port=REDIS_PORT,
    db=REDIS_RESULTS_DB,
)

Copy link
Member

@zhaoyongjie zhaoyongjie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for this fix. please run

tox -e pre-commit

to fix PEP8

@zhaoyongjie zhaoyongjie added the need:merge The PR is ready to be merged label Apr 11, 2021
@zhaoyongjie zhaoyongjie merged commit 1448f78 into apache:master Apr 19, 2021
@github-actions
Copy link
Contributor

Ephemeral environment shutdown and build artifacts deleted.

amitmiran137 added a commit to simcha90/incubator-superset that referenced this pull request Apr 19, 2021
* master:
  fix: Use utils.json_iso_dttm_ser to dump jsons when async query execution (apache#13830)
  feat: TrinoEngineSpec.adjust_database_uri (apache#14122)
  chore: bump package.json (apache#14222)
  Add superset helm repository (apache#14223)
  fix(cross-filters): Fix missed metadata (apache#14220)
@cabo40 cabo40 deleted the patch-2 branch April 21, 2021 16:02
QAlexBall pushed a commit to QAlexBall/superset that referenced this pull request Dec 29, 2021
…tion (apache#13830)

* Use utils.json_iso_dttm_ser to dump jsons

If encoding is not `None`, the default encoder `utils.json_iso_dttm_ser` is not used for binary data and instead tries to encode to the default 'utf-8'. 
This fixes apache#13829

* Change to comply with tox -m pre-commit
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.2.0 labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels need:merge The PR is ready to be merged size/XS 🚢 1.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SQL Lab] UnicodeDecodeError when querying blob data and DB has asynchronous query execution enabled
4 participants