Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: JSON serializers #22029

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
11 changes: 8 additions & 3 deletions superset/models/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -703,9 +703,14 @@ def get_pk_constraint(
self, table_name: str, schema: Optional[str] = None
) -> Dict[str, Any]:
pk_constraint = self.inspector.get_pk_constraint(table_name, schema) or {}
return {
key: utils.base_json_conv(value) for key, value in pk_constraint.items()
}

def _convert(value: Any) -> Any:
try:
return utils.base_json_conv(value)
except TypeError:
return None

return {key: _convert(value) for key, value in pk_constraint.items()}

def get_foreign_keys(
self, table_name: str, schema: Optional[str] = None
Expand Down
76 changes: 48 additions & 28 deletions superset/utils/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -556,9 +556,16 @@ def format_timedelta(time_delta: timedelta) -> str:
return str(time_delta)


def base_json_conv( # pylint: disable=inconsistent-return-statements
obj: Any,
) -> Any:
def base_json_conv(obj: Any) -> Any:
"""
Tries to convert additional types to JSON compatible forms.

:param obj: The serializable object
:returns: The JSON compatible form
:raises TypeError: If the object cannot be serialized
:see: https://docs.python.org/3/library/json.html#encoders-and-decoders
"""

if isinstance(obj, memoryview):
obj = obj.tobytes()
if isinstance(obj, np.int64):
Expand All @@ -581,47 +588,60 @@ def base_json_conv( # pylint: disable=inconsistent-return-statements
except Exception: # pylint: disable=broad-except
return "[bytes]"

raise TypeError(f"Unserializable object {obj} of type {type(obj)}")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this would return None however there was no way to distinguish between the conversion of np.array(None) which returns None and an unserializable object.



def json_iso_dttm_ser(obj: Any, pessimistic: bool = False) -> str:
def json_iso_dttm_ser(obj: Any, pessimistic: bool = False) -> Any:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type was wrong. Per the base_json_conv method there's a slew of viable return types.

"""
json serializer that deals with dates
A JSON serializer that deals with dates by serializing them to ISO 8601.

>>> dttm = datetime(1970, 1, 1)
>>> json.dumps({'dttm': dttm}, default=json_iso_dttm_ser)
'{"dttm": "1970-01-01T00:00:00"}'
>>> json.dumps({'dttm': datetime(1970, 1, 1)}, default=json_iso_dttm_ser)
'{"dttm": "1970-01-01T00:00:00"}'

:param obj: The serializable object
:param pessimistic: Whether to be pessimistic regarding serialization
:returns: The JSON compatible form
:raises TypeError: If the non-pessimistic object cannot be serialized
"""
val = base_json_conv(obj)
if val is not None:
return val

if isinstance(obj, (datetime, date, pd.Timestamp)):
obj = obj.isoformat()
else:
return obj.isoformat()

try:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic has been flipped. Rather than trying the base conversion—which now throws—first and then the more specific temporal types, the logic now tries to convert the more specific and then falls back to the more generic. This ensures there's less error handling.

return base_json_conv(obj)
except TypeError as ex:
if pessimistic:
return "Unserializable [{}]".format(type(obj))
return f"Unserializable [{type(obj)}]"

raise TypeError("Unserializable object {} of type {}".format(obj, type(obj)))
return obj
raise ex


def pessimistic_json_iso_dttm_ser(obj: Any) -> str:
def pessimistic_json_iso_dttm_ser(obj: Any) -> Any:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type was wrong. Per the base_json_conv method there's a slew of viable return types.

"""Proxy to call json_iso_dttm_ser in a pessimistic way

If one of object is not serializable to json, it will still succeed"""
return json_iso_dttm_ser(obj, pessimistic=True)


def json_int_dttm_ser(obj: Any) -> float:
"""json serializer that deals with dates"""
val = base_json_conv(obj)
if val is not None:
return val
def json_int_dttm_ser(obj: Any) -> Any:
"""
A JSON serializer that deals with dates by serializing them to EPOCH.

>>> json.dumps({'dttm': datetime(1970, 1, 1)}, default=json_int_dttm_ser)
'{"dttm": 0.0}'

:param obj: The serializable object
:returns: The JSON compatible form
:raises TypeError: If the object cannot be serialized
"""

if isinstance(obj, (datetime, pd.Timestamp)):
obj = datetime_to_epoch(obj)
elif isinstance(obj, date):
obj = (obj - EPOCH.date()).total_seconds() * 1000
else:
raise TypeError("Unserializable object {} of type {}".format(obj, type(obj)))
return obj
return datetime_to_epoch(obj)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.


if isinstance(obj, date):
return (obj - EPOCH.date()).total_seconds() * 1000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we can use obj.timestamp() now since we are on python 3.3+.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktmud this is a date object and not a datetime object.


return base_json_conv(obj)


def json_dumps_w_dates(payload: Dict[Any, Any], sort_keys: bool = False) -> str:
Expand Down
33 changes: 23 additions & 10 deletions tests/integration_tests/utils_tests.py
Original file line number Diff line number Diff line change
Expand Up @@ -94,9 +94,10 @@ def test_json_int_dttm_ser(self):
assert json_int_dttm_ser(datetime(1970, 1, 1)) == 0
assert json_int_dttm_ser(date(1970, 1, 1)) == 0
assert json_int_dttm_ser(dttm + timedelta(milliseconds=1)) == (ts + 1)
assert json_int_dttm_ser(np.int64(1)) == 1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The non-temporal case was never tested.


with self.assertRaises(TypeError):
json_int_dttm_ser("this is not a date")
json_int_dttm_ser(np.datetime64())

def test_json_iso_dttm_ser(self):
dttm = datetime(2020, 1, 1)
Expand All @@ -105,19 +106,31 @@ def test_json_iso_dttm_ser(self):
assert json_iso_dttm_ser(dttm) == dttm.isoformat()
assert json_iso_dttm_ser(dt) == dt.isoformat()
assert json_iso_dttm_ser(t) == t.isoformat()
assert json_iso_dttm_ser(np.int64(1)) == 1

assert (
json_iso_dttm_ser(np.datetime64(), pessimistic=True)
== "Unserializable [<class 'numpy.datetime64'>]"
)

with self.assertRaises(TypeError):
json_iso_dttm_ser("this is not a date")
json_iso_dttm_ser(np.datetime64())

def test_base_json_conv(self):
assert isinstance(base_json_conv(np.bool_(1)), bool) is True
assert isinstance(base_json_conv(np.int64(1)), int) is True
assert isinstance(base_json_conv(np.array([1, 2, 3])), list) is True
assert isinstance(base_json_conv(set([1])), list) is True
assert isinstance(base_json_conv(Decimal("1.0")), float) is True
assert isinstance(base_json_conv(uuid.uuid4()), str) is True
assert isinstance(base_json_conv(time()), str) is True
assert isinstance(base_json_conv(timedelta(0)), str) is True
assert isinstance(base_json_conv(np.bool_(1)), bool)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is True is unnecessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: when adding stuff to old tests like this I nowadays refactor them by 1) converting them to functional pytest tests 2) to use pytest.mark.parameterize to DRY the repeated logic.

assert isinstance(base_json_conv(np.int64(1)), int)
assert isinstance(base_json_conv(np.array([1, 2, 3])), list)
assert base_json_conv(np.array(None)) is None
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new test.

assert isinstance(base_json_conv(set([1])), list)
assert isinstance(base_json_conv(Decimal("1.0")), float)
assert isinstance(base_json_conv(uuid.uuid4()), str)
assert isinstance(base_json_conv(time()), str)
assert isinstance(base_json_conv(timedelta(0)), str)
assert isinstance(base_json_conv(bytes()), str)
assert base_json_conv(bytes("", encoding="utf-16")) == "[bytes]"

with pytest.raises(TypeError):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new test.

base_json_conv(np.datetime64())

def test_zlib_compression(self):
json_str = '{"test": 1}'
Expand Down