Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyString::to_string panics on UnicodeDecodeError #246

Closed
3 tasks
wkschwartz opened this issue Dec 17, 2020 · 1 comment · Fixed by #247
Closed
3 tasks

PyString::to_string panics on UnicodeDecodeError #246

wkschwartz opened this issue Dec 17, 2020 · 1 comment · Fixed by #247

Comments

@wkschwartz
Copy link

wkschwartz commented Dec 17, 2020

PyString::to_string returns a PyResult<Cow<str>>, so Python's raising an exception should cause PyString::to_string to return Err(PyErr) holding a UnicodeDecodeError. However, PyString::to_string panics if a Python str holds an unpaired surrogate, e.g.,

"foo \ud800 bar"

The problem arises because PyString::to_string's implementation relies on PyString::data, whose API is fundamentally incompatible: PyString::data does not return a PyResult. Thus, when the call to PyUnicode_AsUTF8AndSize in PyString::data raises an exception, the exception is not handled:

let data = ffi::PyUnicode_AsUTF8AndSize(self.as_ptr(), &mut size) as *const u8;
if data.is_null() {
PyErr::fetch(py).print(py);
panic!("PyUnicode_AsUTF8AndSize failed");
}

The fix, I believe, is to

wkschwartz added a commit to wkschwartz/PyOxidizer that referenced this issue Dec 19, 2020
wkschwartz added a commit to wkschwartz/PyOxidizer that referenced this issue Dec 19, 2020
This works by adding a `path_hook` instance method to `OxidizedFinder`, which
returns a path-entry finder that implements `importlib.abc.PathEntryFinder`
(except deprecated APIs) and an `iter_modules` method.
`OxidizedFinder.path_hook` is installed in `sys.path_hooks` if
`OxidizedPythonInterpreterConfig`'s `oxidized_importer` and
`filesystem_importer` are true.

This patch also includes documentation and tests for everything.

# KNOWN BUGS

I don't feel like implementing `importlib.abc.PathEntryFinder.find_loader`,
which has been deprecated since Python 3.4. If people really need the path-entry
finder to have a `find_loader` method, we can crib the code from
`importlib.abc.PathEntryFinder`:
https://github.com/python/cpython/blob/550e4673be538d98b6ddf5550b3922539cf5c4b2/Lib/importlib/abc.py#L95-L127

`OxidizedFinder.path_hook` panics when a path contains an unpaired
surrogate after the current executable path because of
dgrunwald/rust-cpython#246 (PyString::to_string panics on UnicodeDecodeError).
@dgrunwald
Copy link
Owner

The PyString::data API is fine, the issue is that it never got properly implemented for Python 3 (hence the TODO in the method body).
For your test string, I'd expect data() to return the PyStringData::Utf16 variant directly without attempting a conversion to UTF-8. Then PyStringData::to_string() will correctly handle the error instead of panicking.

dgrunwald added a commit that referenced this issue Dec 20, 2020
…unicode object

This fixes #246: panic in `PyString::to_string` and `PyString::to_string_lossy` when a Python3 unicode string contains unpaired surrogates.
dgrunwald added a commit that referenced this issue Dec 21, 2020
…unicode object

This fixes #246: panic in `PyString::to_string` and `PyString::to_string_lossy` when a Python3 unicode string contains unpaired surrogates.
dgrunwald added a commit that referenced this issue Dec 21, 2020
…unicode object

This fixes #246: panic in `PyString::to_string` and `PyString::to_string_lossy` when a Python3 unicode string contains unpaired surrogates.
wkschwartz added a commit to wkschwartz/PyOxidizer that referenced this issue Dec 31, 2020
This works by adding a `path_hook` instance method to `OxidizedFinder`, which
returns a path-entry finder that implements `importlib.abc.PathEntryFinder`
(except deprecated APIs) and an `iter_modules` method.
`OxidizedFinder.path_hook` is installed in `sys.path_hooks` if
`OxidizedPythonInterpreterConfig`'s `oxidized_importer` and
`filesystem_importer` are true.

This patch also includes documentation and tests for everything.

I don't feel like implementing `importlib.abc.PathEntryFinder.find_loader`,
which has been deprecated since Python 3.4. If people really need the path-entry
finder to have a `find_loader` method, we can crib the code from
`importlib.abc.PathEntryFinder`:
https://github.com/python/cpython/blob/550e4673be538d98b6ddf5550b3922539cf5c4b2/Lib/importlib/abc.py#L95-L127

`OxidizedFinder.path_hook` panics when a path contains an unpaired
surrogate after the current executable path because of
dgrunwald/rust-cpython#246 (PyString::to_string panics on UnicodeDecodeError).
wkschwartz added a commit to wkschwartz/PyOxidizer that referenced this issue Mar 2, 2021
This works by adding a `path_hook` instance method to `OxidizedFinder`, which
returns a path-entry finder that implements `importlib.abc.PathEntryFinder`
(except deprecated APIs) and an `iter_modules` method.
`OxidizedFinder.path_hook` is installed in `sys.path_hooks` if
`OxidizedPythonInterpreterConfig`'s `oxidized_importer` and
`filesystem_importer` are true.

This patch also includes documentation and tests for everything.

I don't feel like implementing `importlib.abc.PathEntryFinder.find_loader`,
which has been deprecated since Python 3.4. If people really need the path-entry
finder to have a `find_loader` method, we can crib the code from
`importlib.abc.PathEntryFinder`:
https://github.com/python/cpython/blob/550e4673be538d98b6ddf5550b3922539cf5c4b2/Lib/importlib/abc.py#L95-L127

`OxidizedFinder.path_hook` panics when a path contains an unpaired
surrogate after the current executable path because of
dgrunwald/rust-cpython#246 (PyString::to_string panics on UnicodeDecodeError).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants