Description
Refs: https://groups.google.com/d/msg/cython-users/oqk3GQ2pJ8M/-oBEvfWXDgAJ
I have a python2 project where the pyx files contain the following directive:
# cython: c_string_type=unicode, c_string_encoding=utf8
In the process of converting to python3, I am finding that even with these directives, a python3 str is not automatically encoded to "utf8" bytes when it is converted to a C++ std::string:
Reproduction:
https://gist.github.com/justinfx/8023d341becc8a1092e5beacd7a249eb
In python3, this results in the following exception:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "test.pyx", line 6, in test.test
  File "stringsource", line 15, in string.from_py.__pyx_convert_string_from_py_std__in_string
TypeError: expected bytes, str found
I have tested this behaviour both in cython 0.28.5 as well as master, using all available language level values.
My expected results would be that given the directives, any implicit assignment/conversion to std::string would automatically encode to 'utf8' bytes.
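To make the expectation concrete, here is a minimal pure-Python sketch contrasting the two behaviours. It is not Cython's actual converter: `strict_from_py` and `encoding_aware_from_py` are hypothetical names, and the sketch only assumes that the current conversion accepts bytes alone, as the traceback suggests.

```python
def strict_from_py(value):
    """Mimic the reported behaviour: only bytes are accepted."""
    if not isinstance(value, bytes):
        raise TypeError("expected bytes, %s found" % type(value).__name__)
    return value


def encoding_aware_from_py(value, encoding="utf8"):
    """Mimic the requested behaviour: encode str before converting."""
    if isinstance(value, str):
        # Assumption: the configured c_string_encoding would be used here.
        value = value.encode(encoding)
    return strict_from_py(value)
```

With the second function, a python3 str such as `"héllo"` would reach the C++ side as UTF-8 bytes, while the first raises the same kind of `TypeError` shown in the traceback.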
My current workaround for the ton of locations where a python string is assigned to a std::string, passed as an argument, or even part of an implicit map or list conversion, is to explicitly wrap each site in a conversion helper:
# cython: c_string_type=unicode, c_string_encoding=utf8
from libcpp.string cimport string
from cpython.version cimport PY_MAJOR_VERSION

cdef unicode _text(s):
    if type(s) is unicode:
        return <unicode>s
    elif PY_MAJOR_VERSION < 3 and isinstance(s, bytes):
        return (<bytes>s).decode('ascii')
    elif isinstance(s, unicode):
        return unicode(s)
    else:
        raise TypeError("Could not convert to unicode.")

cdef string _string(basestring s) except *:
    cdef string c_str = _text(s).encode("utf-8")
    return c_str

# ...
self.field = _string(s)
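For the implicit map and list conversions, the same idea can be sketched in plain Python 3 as a helper that pre-encodes every string inside a container before it is handed to the C++ side. This is only an illustration of the workaround pattern, not part of my project or of Cython: `encode_nested` is a hypothetical name, and it assumes the container conversions expect bytes for their string elements.

```python
def encode_nested(obj, encoding="utf-8"):
    """Recursively encode str values in dicts, lists and tuples to bytes."""
    if isinstance(obj, str):
        return obj.encode(encoding)
    if isinstance(obj, dict):
        # Encode both keys and values, e.g. for a map of strings.
        return {
            encode_nested(key, encoding): encode_nested(value, encoding)
            for key, value in obj.items()
        }
    if isinstance(obj, (list, tuple)):
        # Sequences come back as lists, e.g. for a vector of strings.
        return [encode_nested(item, encoding) for item in obj]
    # bytes and non-string values pass through unchanged.
    return obj
```

Every call site still has to remember to apply the helper, which is exactly the part that keeps getting overlooked.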
This has been error-prone, since I keep overlooking hard-to-spot type conversions. It would be amazing if Cython's behaviour were updated to support automatic conversions based on my directives.