Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: JSON serializers #22029

Merged

Conversation

john-bodley
Copy link
Member

@john-bodley john-bodley commented Nov 3, 2022

SUMMARY

I'm not exactly sure why this issue didn't surface prior to #21936 when np.vectorize was being leveraged, but the JSON serialization failed when an np.array was undefined, i.e., np.array(None), i.e., in SQL Lab one would see the following,

Unserializable object None of type <class 'numpy.ndarray'>

if the column was an array but the array was null.

The issue was that the base_json_conv treated a return value of None as representing an object which wasn't able to be converted however some converted types can have a return value of None, i.e., np.array(None).tolist() is None.

The fix is to make base_json_conv raise a TypeError if the type is unknown and to reorganize the logic for the JSON serializers. See the inline comments for more details.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

Added unit tests.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@@ -581,47 +588,60 @@ def base_json_conv( # pylint: disable=inconsistent-return-statements
except Exception: # pylint: disable=broad-except
return "[bytes]"

raise TypeError(f"Unserializable object {obj} of type {type(obj)}")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this would return None however there was no way to distinguish between the conversion of np.array(None) which returns None and an unserializable object.


def json_iso_dttm_ser(obj: Any, pessimistic: bool = False) -> str:
def json_iso_dttm_ser(obj: Any, pessimistic: bool = False) -> Any:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type was wrong. Per the base_json_conv method there's a slew of viable return types.



def pessimistic_json_iso_dttm_ser(obj: Any) -> str:
def pessimistic_json_iso_dttm_ser(obj: Any) -> Any:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type was wrong. Per the base_json_conv method there's a slew of viable return types.

else:
return obj.isoformat()

try:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic has been flipped. Rather than trying the base conversion—which now throws—first and then the more specific temporal types, the logic now tries to convert the more specific and then falls back to the more generic. This ensures there's less error handling.

else:
raise TypeError("Unserializable object {} of type {}".format(obj, type(obj)))
return obj
return datetime_to_epoch(obj)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

assert isinstance(base_json_conv(uuid.uuid4()), str) is True
assert isinstance(base_json_conv(time()), str) is True
assert isinstance(base_json_conv(timedelta(0)), str) is True
assert isinstance(base_json_conv(np.bool_(1)), bool)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is True is unnecessary.

assert isinstance(base_json_conv(np.bool_(1)), bool)
assert isinstance(base_json_conv(np.int64(1)), int)
assert isinstance(base_json_conv(np.array([1, 2, 3])), list)
assert base_json_conv(np.array(None)) is None
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new test.

assert isinstance(base_json_conv(bytes()), str)
assert base_json_conv(bytes("", encoding="utf-16")) == ["bytes"]

with pytest.raises(TypeError):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new test.

@@ -105,19 +111,26 @@ def test_json_iso_dttm_ser(self):
assert json_iso_dttm_ser(dttm) == dttm.isoformat()
assert json_iso_dttm_ser(dt) == dt.isoformat()
assert json_iso_dttm_ser(t) == t.isoformat()
assert json_int_dttm_ser(np.int64(1)) == 1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The non-temporal case was never tested.

@@ -94,9 +94,15 @@ def test_json_int_dttm_ser(self):
assert json_int_dttm_ser(datetime(1970, 1, 1)) == 0
assert json_int_dttm_ser(date(1970, 1, 1)) == 0
assert json_int_dttm_ser(dttm + timedelta(milliseconds=1)) == (ts + 1)
assert json_int_dttm_ser(np.int64(1)) == 1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The non-temporal case was never tested.

@john-bodley john-bodley marked this pull request as ready for review November 3, 2022 19:23
return datetime_to_epoch(obj)

if isinstance(obj, date):
return (obj - EPOCH.date()).total_seconds() * 1000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we can use obj.timestamp() now since we are on python 3.3+.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktmud this is a date object and not a datetime object.

@john-bodley john-bodley force-pushed the john-bodley--fix-base_json_conv branch 2 times, most recently from 6acb9e1 to 3516a32 Compare November 3, 2022 22:37
@john-bodley john-bodley force-pushed the john-bodley--fix-base_json_conv branch from 3516a32 to 28cccf8 Compare November 3, 2022 23:55
@codecov
Copy link

codecov bot commented Nov 4, 2022

Codecov Report

Merging #22029 (be35ab9) into master (d3f930a) will decrease coverage by 11.44%.
The diff coverage is 63.93%.

❗ Current head be35ab9 differs from pull request most recent head 28cccf8. Consider uploading reports for the commit 28cccf8 to get more accurate results

@@             Coverage Diff             @@
##           master   #22029       +/-   ##
===========================================
- Coverage   66.96%   55.51%   -11.45%     
===========================================
  Files        1807     1813        +6     
  Lines       69222    69426      +204     
  Branches     7403     7448       +45     
===========================================
- Hits        46352    38541     -7811     
- Misses      20965    28965     +8000     
- Partials     1905     1920       +15     
Flag Coverage Δ
hive ?
mysql ?
postgres ?
presto 52.74% <39.90%> (-0.08%) ⬇️
python 57.66% <62.01%> (-23.83%) ⬇️
sqlite ?
unit 51.18% <58.17%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ackages/superset-ui-chart-controls/src/fixtures.ts 100.00% <ø> (ø)
...d/packages/superset-ui-chart-controls/src/index.ts 100.00% <ø> (ø)
...perset-ui-chart-controls/src/sections/sections.tsx 71.42% <0.00%> (-16.08%) ⬇️
...chart-controls/src/shared-controls/dndControls.tsx 58.33% <ø> (ø)
...d/packages/superset-ui-chart-controls/src/types.ts 100.00% <ø> (ø)
...kages/superset-ui-core/src/query/types/Operator.ts 100.00% <ø> (ø)
...packages/superset-ui-core/src/query/types/Query.ts 100.00% <ø> (ø)
...ackages/superset-ui-core/src/utils/featureFlags.ts 100.00% <ø> (ø)
.../legacy-plugin-chart-histogram/src/controlPanel.ts 40.00% <0.00%> (ø)
...lugin-chart-handlebars/src/plugin/controlPanel.tsx 50.00% <ø> (ø)
... and 393 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@john-bodley john-bodley merged commit 6bbf4f8 into apache:master Nov 4, 2022
john-bodley added a commit to airbnb/superset-fork that referenced this pull request Nov 4, 2022
(cherry picked from commit 6bbf4f8)
john-bodley added a commit to airbnb/superset-fork that referenced this pull request Nov 4, 2022
(cherry picked from commit 6bbf4f8)
assert isinstance(base_json_conv(uuid.uuid4()), str) is True
assert isinstance(base_json_conv(time()), str) is True
assert isinstance(base_json_conv(timedelta(0)), str) is True
assert isinstance(base_json_conv(np.bool_(1)), bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: when adding stuff to old tests like this I nowadays refactor them by 1) converting them to functional pytest tests 2) to use pytest.mark.parameterize to DRY the repeated logic.

@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.1.0 and removed 🚢 2.1.3 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 2.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants