-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PyString::to_string panics on UnicodeDecodeError #246
Comments
wkschwartz
added a commit
to wkschwartz/PyOxidizer
that referenced
this issue
Dec 19, 2020
wkschwartz
added a commit
to wkschwartz/PyOxidizer
that referenced
this issue
Dec 19, 2020
This works by adding a `path_hook` instance method to `OxidizedFinder`, which returns a path-entry finder that implements `importlib.abc.PathEntryFinder` (except deprecated APIs) and an `iter_modules` method. `OxidizedFinder.path_hook` is installed in `sys.path_hooks` if `OxidizedPythonInterpreterConfig`'s `oxidized_importer` and `filesystem_importer` are true. This patch also includes documentation and tests for everything. # KNOWN BUGS I don't feel like implementing `importlib.abc.PathEntryFinder.find_loader`, which has been deprecated since Python 3.4. If people really need the path-entry finder to have a `find_loader` method, we can crib the code from `importlib.abc.PathEntryFinder`: https://github.com/python/cpython/blob/550e4673be538d98b6ddf5550b3922539cf5c4b2/Lib/importlib/abc.py#L95-L127 `OxidizedFinder.path_hook` panics when a path contains an unpaired surrogate after the current executable path because of dgrunwald/rust-cpython#246 (PyString::to_string panics on UnicodeDecodeError).
The |
dgrunwald
added a commit
that referenced
this issue
Dec 20, 2020
…unicode object This fixes #246: panic in `PyString::to_string` and `PyString::to_string_lossy` when a Python3 unicode string contains unpaired surrogates.
dgrunwald
added a commit
that referenced
this issue
Dec 21, 2020
…unicode object This fixes #246: panic in `PyString::to_string` and `PyString::to_string_lossy` when a Python3 unicode string contains unpaired surrogates.
dgrunwald
added a commit
that referenced
this issue
Dec 21, 2020
…unicode object This fixes #246: panic in `PyString::to_string` and `PyString::to_string_lossy` when a Python3 unicode string contains unpaired surrogates.
wkschwartz
added a commit
to wkschwartz/PyOxidizer
that referenced
this issue
Dec 31, 2020
This works by adding a `path_hook` instance method to `OxidizedFinder`, which returns a path-entry finder that implements `importlib.abc.PathEntryFinder` (except deprecated APIs) and an `iter_modules` method. `OxidizedFinder.path_hook` is installed in `sys.path_hooks` if `OxidizedPythonInterpreterConfig`'s `oxidized_importer` and `filesystem_importer` are true. This patch also includes documentation and tests for everything. I don't feel like implementing `importlib.abc.PathEntryFinder.find_loader`, which has been deprecated since Python 3.4. If people really need the path-entry finder to have a `find_loader` method, we can crib the code from `importlib.abc.PathEntryFinder`: https://github.com/python/cpython/blob/550e4673be538d98b6ddf5550b3922539cf5c4b2/Lib/importlib/abc.py#L95-L127 `OxidizedFinder.path_hook` panics when a path contains an unpaired surrogate after the current executable path because of dgrunwald/rust-cpython#246 (PyString::to_string panics on UnicodeDecodeError).
wkschwartz
added a commit
to wkschwartz/PyOxidizer
that referenced
this issue
Mar 2, 2021
This works by adding a `path_hook` instance method to `OxidizedFinder`, which returns a path-entry finder that implements `importlib.abc.PathEntryFinder` (except deprecated APIs) and an `iter_modules` method. `OxidizedFinder.path_hook` is installed in `sys.path_hooks` if `OxidizedPythonInterpreterConfig`'s `oxidized_importer` and `filesystem_importer` are true. This patch also includes documentation and tests for everything. I don't feel like implementing `importlib.abc.PathEntryFinder.find_loader`, which has been deprecated since Python 3.4. If people really need the path-entry finder to have a `find_loader` method, we can crib the code from `importlib.abc.PathEntryFinder`: https://github.com/python/cpython/blob/550e4673be538d98b6ddf5550b3922539cf5c4b2/Lib/importlib/abc.py#L95-L127 `OxidizedFinder.path_hook` panics when a path contains an unpaired surrogate after the current executable path because of dgrunwald/rust-cpython#246 (PyString::to_string panics on UnicodeDecodeError).
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
PyString::to_string
returns aPyResult<Cow<str>>
, so Python's raising an exception should causePyString::to_string
to returnErr(PyErr)
holding aUnicodeDecodeError
. However,PyString::to_string
panics if a Pythonstr
holds an unpaired surrogate, e.g.,"foo \ud800 bar"
The problem arises because
PyString::to_string
's implementation relies onPyString::data
, whose API is fundamentally incompatible:PyString::data
does not return aPyResult
. Thus, when the call toPyUnicode_AsUTF8AndSize
inPyString::data
raises an exception, the exception is not handled:rust-cpython/src/objects/string.rs
Lines 291 to 295 in 633dafc
The fix, I believe, is to
PyUnicode_AsUTF8AndSize
directly inPyString::to_string
's implementation and handle its exception rather than relying onPyString::data
PyString::to_string
andPyString::data
, with the latterunwrap()
ing and the former returning thePyResult
PyString::to_str(&self, py: Python) -> PyResult<&str>
, which does not have the panic bugPyString::data
may panic if the Pythonstr
is not representable as a Rust stringThe text was updated successfully, but these errors were encountered: