Bad unicode characters not handled correctly #84

Closed
olliemath opened this issue Jan 3, 2020 · 1 comment · Fixed by #85

@olliemath

When parsing unicode strings that contain non-ASCII characters, the ValueError is not built correctly. In CPython this manifests as a ValueError with no message, whereas in PyPy3 it causes a segfault.

For example:

from ciso8601 import parse_datetime

try:
    parse_datetime("2019🐵01🐵01")
except ValueError as e:
    assert e.args and "Invalid character" in e.args[0]

will fail with either a segfault or an assertion error depending on your interpreter.

In the real world this was seen with non-ASCII dashes, e.g. "2019—01—01".
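
To see why such a string can trip a byte-oriented parser, it may help to look at its UTF-8 encoding. The sketch below is illustrative only: it assumes the C code walks the encoded bytes one `char` at a time, and `first_non_ascii` is a hypothetical helper, not part of ciso8601. It reports the first byte above 0x7F, both as an unsigned value and as it would appear in a signed 8-bit `char`.

```python
def first_non_ascii(text):
    """Return (byte_index, unsigned_value, signed_value) for the first
    non-ASCII byte in the UTF-8 encoding of text, or None if all ASCII."""
    data = text.encode("utf-8")
    for index, byte in enumerate(data):
        if byte > 0x7F:
            # Value the same byte would have in a signed 8-bit char.
            signed = byte - 256
            return index, byte, signed
    return None


# "\u2014" is the non-ASCII dash from the real-world case above.
print(first_non_ascii("2019\u201401\u201401"))
# A plain ASCII date has no such byte.
print(first_non_ascii("2019-01-01"))
```

In both failing inputs the offending byte sits at index 4, right where the parser would expect a separator or the first month digit.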

@olliemath

This seems to be caused by a non-ASCII character being passed to the PyErr_Format function here: https://github.com/closeio/ciso8601/blob/master/module.c#L76 (and also further up, at line 58). In particular, replacing %c with %d whenever c is outside of the 0-256 range gives us a detailed error again (and stops the segfaults in PyPy3).
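
The idea of falling back from a character to a number can be sketched in Python. This is not the code from module.c or from the eventual fix: `format_invalid_char` and its message wording are hypothetical, and the printable-ASCII cutoff used here is an assumption that is stricter than the range mentioned above.

```python
def format_invalid_char(value, index):
    """Build an error message for an unexpected character value.

    Hypothetical helper: printable ASCII is shown as a character,
    anything else as its numeric value, so formatting never fails.
    """
    if 0x20 <= value < 0x7F:
        shown = "'%c'" % value
    else:
        shown = "%d" % value
    return "Invalid character %s at index %d" % (shown, index)


# An ordinary unexpected letter is shown as-is.
print(format_invalid_char(ord("x"), 4))
# A byte that became negative in a signed char is shown numerically.
print(format_invalid_char(-30, 4))
```

Either branch yields a non-empty message, which is what the assertion in the original report checks for.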
