Bad unicode characters not handled correctly #84

olliemath · 2020-01-03T16:07:25Z

When parsing Unicode strings that contain non-ASCII characters, the ValueError is not built correctly. In CPython this shows up as an empty ValueError, whereas in PyPy3 it actually causes a segfault.

For example:

from ciso8601 import parse_datetime

try:
    parse_datetime("2019🐵01🐵01")
except ValueError as e:
    assert e.args and "Invalid character" in e.args[0]

will fail with either a segfault or an assertion error depending on your interpreter.
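A segfault kills the interpreter before any except clause can run, so the reproduction above is easiest to check from a separate process. This is a minimal sketch, assuming a POSIX system, where a negative return code means the child was killed by a signal. The helper name run_repro is hypothetical. If ciso8601 is not installed, the child exits with an ImportError and is also reported as "failed", so check stderr in that case.

```python
import subprocess
import sys

# The reproduction from this issue. Raw string so the child sees the
# \U escapes and builds "2019<monkey>01<monkey>01" itself.
REPRO = r'''
from ciso8601 import parse_datetime

try:
    parse_datetime("2019\U0001F43501\U0001F43501")
except ValueError as e:
    assert e.args and "Invalid character" in e.args[0]
'''


def run_repro(code: str) -> str:
    """Run `code` in a fresh interpreter and classify how it ended."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    if result.returncode < 0:
        # POSIX only: the child was killed by a signal (e.g. SIGSEGV).
        return "crashed (signal %d)" % -result.returncode
    if result.returncode != 0:
        # Non-zero exit, e.g. the AssertionError raised for an empty ValueError.
        return "failed"
    return "ok"


if __name__ == "__main__":
    print(run_repro(REPRO))
```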

In the real world this was seen with non-ASCII dashes, e.g. "2019—01—01".
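The dash in that real-world example is U+2014 EM DASH, which UTF-8 encodes as three bytes, all at or above 0x80. Assume the C parser walks the UTF-8 encoding one byte at a time; this thread does not show module.c, so that is an assumption. Then none of those bytes is an ASCII character. Where plain char is signed, as on typical x86 builds, each one also becomes a negative value, which falls outside the 0-256 range mentioned in the follow-up comment. A short sketch of that arithmetic:

```python
# "2019", U+2014, "01", U+2014, "01"
text = "2019\u201401\u201401"
raw = text.encode("utf-8")

# Bytes outside the 7-bit ASCII range.
non_ascii = [b for b in raw if b >= 0x80]


def as_signed_char(b: int) -> int:
    """Reinterpret an unsigned byte (0-255) as a signed 8-bit C char."""
    return b - 256 if b >= 128 else b


print(len(text), len(raw))                         # 10 characters, 14 bytes
print([hex(b) for b in non_ascii])                 # 0xe2 0x80 0x94, twice
print([as_signed_char(b) for b in non_ascii])      # -30 -128 -108, twice
```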

The text was updated successfully, but these errors were encountered:

olliemath · 2020-01-03T16:12:26Z

This seems to be caused by a non-ASCII character ending up in the PyErr_Format call here: https://github.com/closeio/ciso8601/blob/master/module.c#L76 (and also further up, at line 58). In particular, replacing %c with %d whenever c is outside the 0-256 range gives us a detailed error again (and stops the segfaults in PyPy3).
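Python's %-formatting uses the same %c and %d codes as printf, so the proposed workaround can be sketched in Python. The offending value is rendered as a character with %c when it lies in the 0-256 range (0 to 255 inclusive) and as a plain integer with %d otherwise. The function name format_invalid_character and the message wording are hypothetical and are not ciso8601's actual code. The argument is assumed to be the C char value as an int, so a negative number stands in for a non-ASCII byte on a signed-char platform.

```python
def format_invalid_character(c: int) -> str:
    """Build an 'Invalid character' message for the char value `c`.

    Hypothetical helper mirroring the suggested fix: use %c for values
    in 0..255 and fall back to %d for anything else (e.g. a negative
    signed char produced by a UTF-8 lead byte).
    """
    if 0 <= c < 256:
        return "Invalid character ('%c')" % c
    return "Invalid character (%d)" % c


print(format_invalid_character(ord("x")))  # Invalid character ('x')
print(format_invalid_character(-30))       # Invalid character (-30)
```

Either branch still produces a message containing "Invalid character", which is what the assertion in the original reproduction checks for.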

movermeyer mentioned this issue Jan 4, 2020

Add handling for non-ASCII characters in error messages #85 (Merged)

movermeyer self-assigned this Jan 4, 2020

movermeyer closed this as completed in #85 Jan 5, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bad unicode characters not handled correctly #84

Bad unicode characters not handled correctly #84

olliemath commented Jan 3, 2020

olliemath commented Jan 3, 2020

Bad unicode characters not handled correctly #84

Bad unicode characters not handled correctly #84

Comments

olliemath commented Jan 3, 2020

olliemath commented Jan 3, 2020