Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python3 str to std::string conversion is not automatic #2819

Closed
justinfx opened this issue Jan 31, 2019 · 8 comments
Closed

python3 str to std::string conversion is not automatic #2819

justinfx opened this issue Jan 31, 2019 · 8 comments

Comments

@justinfx
Copy link

Refs: https://groups.google.com/d/msg/cython-users/oqk3GQ2pJ8M/-oBEvfWXDgAJ

I have a python2 project where the pyx files contain the following directive:

# cython: c_string_type=unicode, c_string_encoding=utf8

In the process of converting to python3, I am finding that even with these directive, the conversion from a python3 str is not automatically encoded to "utf8" bytes when converted to a C++ std::string:

Reproduction:
https://gist.github.com/justinfx/8023d341becc8a1092e5beacd7a249eb

In python3, this results in the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "test.pyx", line 6, in test.test
  File "stringsource", line 15, in string.from_py.__pyx_convert_string_from_py_std__in_string
TypeError: expected bytes, str found

I have tested this behaviour both in cython 0.28.5 as well as master, using all available language level values.

My expected results would be that given the directives, any implicit assignment/conversion to std::string would automatically encode to 'utf8' bytes.

My current workaround in dealing with the ton of locations where a python string is assigned to a std::string or passed to an argument, or even part of implicit map or list conversions, is to explicitly wrap each site in a conversion helper:

    # cython: c_string_type=unicode, c_string_encoding=utf8

    from libcpp.string cimport string
    from cpython.version cimport PY_MAJOR_VERSION

    cdef unicode _text(s):
        if type(s) is unicode:
            return <unicode>s

        elif PY_MAJOR_VERSION < 3 and isinstance(s, bytes):
            return (<bytes>s).decode('ascii')
        
        elif isinstance(s, unicode):
            return unicode(s)
        
        else:
            raise TypeError("Could not convert to unicode.")

    cdef string _string(basestring s) except *:
        cdef string c_str = _text(s).encode("utf-8")
        return c_str

    # ...
    self.field = _string(s)

This has been error prone since I keep overlooking hard to spot type conversions. It would be amazing for the behaviour in Cython to be updated to support automatic conversions based on my directives.

@scoder
Copy link
Contributor

scoder commented Feb 1, 2019

Just guessing, but it might be the setting of __PYX_DEFAULT_STRING_ENCODING_IS_DEFAULT in the generated code. For the c_string_encoding names UTF-8/utf8/…, it should be true in Py3, where the default encoding is always utf8.

Could you try changing __PYX_DEFAULT_STRING_ENCODING_IS_DEFAULT to 1 (it should be 0 now) in the generated C file and try that directly under Py3, without running Cython again?

robertwb added a commit that referenced this issue Feb 3, 2019
@justinfx
Copy link
Author

justinfx commented Feb 4, 2019 via email

@justinfx
Copy link
Author

@scoder, thanks for this! Is this going to be released in a 0.29.6 patch or is it really waiting for the 3.0.0 tag? Just wondering since I am blocked on this fix.

@scoder
Copy link
Contributor

scoder commented Feb 16, 2019

I think it's on the edge. It sort-of falls into the "new feature" bucket, and I find it a bit difficult to decide whether it's safe to throw at users in a point release, since it changes the behaviour of existing code in a potentially unexpected way. There might be code out there that deals with this use case in its own way already, and I do not know what such code would do with the new auto-conversion.

@justinfx
Copy link
Author

justinfx commented Feb 16, 2019

I can see from my current approach of manually handling this problem that I am preventing the std::string from receiving a direct assignment of a python string in all cases to avoid a crash. So if this new patch were introduced to my code, I would end up continuing to step around it until I removed the conversion utility. Maybe I am missing something, but without this fix it feels like a bug when you go from working behavior in py2 to crashing in py3, even when the directives imply that the conversion should work. Seems like a pretty important fix for anyone converting to py3 and hitting tons and tons of string conversions. Before this fix, I am still scared that I have missed implicit conversions with args being passed around. I went and removed all the explicit std::string static argument annotations to avoid accidentally passing a py str before its manually converted.

If it's a 3.0.0 thing, then I just have to manually handle my dependency against a commit. My company is behind a proxy and we request local packaging from 3rd party releases that track a semver.

@robertwb
Copy link
Contributor

It is somewhat close to the edge, but I think it falls in the bugfix bucket, as this was the obvious intent, and it changes an exception being raised into the desired behavior. So I think it's safe for a point release.

scoder pushed a commit that referenced this issue Feb 19, 2019
@scoder scoder modified the milestones: 3.0, 0.29.6 Feb 19, 2019
@scoder
Copy link
Contributor

scoder commented Feb 19, 2019

Ok, fingers crossed.

ml-evs added a commit to force-h2020/acadopy that referenced this issue May 3, 2019
* Added default ACADO install locations to include/library dirs

* Added some installation instructions and requirements

* Added Dockerfile for centos ACADO+acadopy build

* Added Travis

* Added language level directives and switched to relative cimports

* Added string conversion directives to circumvent cython/cython#2819

* Define __truediv__ as well as __div__ and test division

* Updated Cython property syntax and other formatting changes

* Added rocket flight example to tests

* Added interface to return value comparisons

* Removed unused eigency dependency
@kylemcdonald
Copy link

I ran into this while wrapping C++ code that. Adding this line to the top of my .pyx wrapper file fixed my problem:

# cython: c_string_type=unicode, c_string_encoding=utf8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants