python3 str to std::string conversion is not automatic #2819

@justinfx

Refs: https://groups.google.com/d/msg/cython-users/oqk3GQ2pJ8M/-oBEvfWXDgAJ

I have a Python 2 project whose .pyx files contain the following directive:

    # cython: c_string_type=unicode, c_string_encoding=utf8

While porting to Python 3, I am finding that even with these directives, a Python 3 `str` is not automatically encoded to UTF-8 bytes when it is converted to a C++ `std::string`:

Reproduction:
https://gist.github.com/justinfx/8023d341becc8a1092e5beacd7a249eb

In Python 3, this raises the following exception:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "test.pyx", line 6, in test.test
      File "stringsource", line 15, in string.from_py.__pyx_convert_string_from_py_std__in_string
    TypeError: expected bytes, str found

I have reproduced this behaviour with both Cython 0.28.5 and master, using every available `language_level` value.

Given these directives, I would expect any implicit assignment or conversion of a Python `str` to `std::string` to encode it to UTF-8 bytes automatically.
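
To make the request concrete, here is a small pure-Python model of the two behaviours: what the `std::string` coercion does today (only bytes accepted) versus what the directives would ideally imply (a `str` is encoded with `c_string_encoding`). This is a sketch, not Cython code and not how Cython implements the conversion; the names `current_from_py` and `expected_from_py` are hypothetical, and the UTF-8 default is assumed from `c_string_encoding=utf8`.

```python
# Hypothetical pure-Python model; these names are illustrative and are
# NOT part of Cython's API.


def current_from_py(obj: object) -> bytes:
    """Model of the observed behaviour: only bytes-like objects are accepted."""
    if isinstance(obj, (bytes, bytearray)):
        return bytes(obj)
    # Same message as the one in the traceback above.
    raise TypeError(f"expected bytes, {type(obj).__name__} found")


def expected_from_py(obj: object, encoding: str = "utf-8") -> bytes:
    """Model of the requested behaviour: str is encoded with c_string_encoding."""
    if isinstance(obj, str):
        return obj.encode(encoding)
    # Anything else falls back to the current, bytes-only rule.
    return current_from_py(obj)


if __name__ == "__main__":
    print(expected_from_py("héllo"))  # b'h\xc3\xa9llo'
    try:
        current_from_py("héllo")
    except TypeError as exc:
        print(exc)  # expected bytes, str found
```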

My current workaround, given the large number of places where a Python string is assigned to a `std::string`, passed as a `std::string` argument, or converted as part of an implicit map or list conversion, is to wrap each site explicitly in a conversion helper:

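    # cython: c_string_type=unicode, c_string_encoding=utf8

    from libcpp.string cimport string
    from cpython.version cimport PY_MAJOR_VERSION

    cdef unicode _text(s):
        # Normalize any supported input to an exact unicode object.
        if type(s) is unicode:
            # Exact unicode (str on Python 3): return it unchanged.
            return <unicode>s

        elif PY_MAJOR_VERSION < 3 and isinstance(s, bytes):
            # Python 2 byte strings (str) are decoded as ASCII.
            return (<bytes>s).decode('ascii')

        elif isinstance(s, unicode):
            # unicode subclasses are coerced to a plain unicode object.
            return unicode(s)

        else:
            raise TypeError("Could not convert to unicode.")

    cdef string _string(basestring s) except *:
        # Encode explicitly to UTF-8 bytes, which Cython coerces to std::string.
        # "except *" lets the TypeError from _text() propagate to the caller.
        cdef string c_str = _text(s).encode("utf-8")
        return c_str

    # ...
    self.field = _string(s)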

This has been error-prone, because I keep overlooking hard-to-spot implicit conversions. It would be amazing if Cython's behaviour were updated to perform these conversions automatically based on my directives.
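
One way to cut down the number of wrapped call sites for the map and list cases is to run a whole container through a single recursive encoder before the implicit conversion, rather than wrapping each element. The sketch below is pure Python and an assumption, not a tested Cython recipe; `to_cpp_bytes` is a hypothetical name. In a .pyx file its result should then be assignable to, for example, a `map[string, vector[string]]`, since Cython accepts bytes where a `std::string` is expected.

```python
# Hypothetical helper, sketched in pure Python: encode every str inside a
# value before it reaches a std::string / map / vector conversion.
from typing import Any


def to_cpp_bytes(value: Any, encoding: str = "utf-8") -> Any:
    """Return a copy of value with every str encoded to bytes.

    bytes pass through unchanged; dict keys and values and list/tuple items
    are converted recursively; other objects (e.g. ints) are returned as-is.
    """
    if isinstance(value, str):
        return value.encode(encoding)
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    if isinstance(value, dict):
        return {
            to_cpp_bytes(k, encoding): to_cpp_bytes(v, encoding)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        items = [to_cpp_bytes(v, encoding) for v in value]
        # Keep tuples as tuples; everything else becomes a list.
        return tuple(items) if isinstance(value, tuple) else items
    return value


if __name__ == "__main__":
    print(to_cpp_bytes({"name": ["a", "b"], "id": 7}))
    # {b'name': [b'a', b'b'], b'id': 7}
```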
