UTF-8 encoded strings #185

Closed
mre opened this issue Jun 17, 2018 · 4 comments

@mre

mre commented Jun 17, 2018

I'm writing a json-module compatible library for encoding and decoding JSON.
The following input string works with json, but not with my module:

    def test_decodeEscape(self):
        base = '\u00e5'.encode('utf-8')
        quote = "\"".encode()
        input = quote + base + quote
        print(input)
        json.loads(input)
>       mymodule.loads(input)
E       TypeError

💥 Reproducing

This repository contains a demonstration of the issue.
The loads function takes a &str as an input parameter. Maybe I have to change that to a different type in order to avoid the TypeError or maybe cast it to a different type.

🌍 Environment

  • Operating system and version: MacOS X 10.12
  • Python version: 3.6.5
  • Installation: brew, no virtualenv
  • Rust version: rustc 1.28.0-nightly (5bf68db6e 2018-05-28)

Any idea would be very much appreciated.

@althonos
Member

althonos commented Jun 18, 2018

It is not that your function rejects UTF-8 encoded strings; it is that it does not accept bytestrings (it will also fail with mymodule.loads(b'abc')).
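A quick way to confirm this is to check the type of the test input. The sketch below uses only the standard library and makes no assumption about mymodule; `payload` is just a renamed copy of the `input` variable from the test:

```python
import json

# Same construction as in the failing test.
base = '\u00e5'.encode('utf-8')   # str.encode() returns a bytes object
quote = '"'.encode()
payload = quote + base + quote

print(type(payload).__name__)     # the input is bytes, not str

# The standard json module accepts bytes and decodes them itself.
decoded = json.loads(payload)
print(type(decoded).__name__)     # the decoded JSON string is a str
```

So the test passes a bytes object, which a function declared with a &str argument will not take.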

If you want to accept only bytestrings, use PyBytes as your function argument; if you want only strings, use &str or PyString; if you want to enforce unicode strings (u'abc') in Python 2, use PyUnicode. If you want to support several argument types, you need to accept PyObject as your function argument and type-check dynamically yourself (I guess?).

You could also probably do the type check on the Python side, and use functools.singledispatch to map to the correct underlying function (like _load_str and _load_bytes for instance).
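A minimal sketch of that Python-side dispatch could look like the following. Here `_load_str` and `_load_bytes` are hypothetical stand-ins implemented with the standard json module; in a real package they would be the functions exported by the Rust extension, which is an assumption on my part:

```python
import json
from functools import singledispatch


# Hypothetical stand-ins for the native functions. A real module would
# import these from its compiled extension instead.
def _load_str(s):
    return json.loads(s)


def _load_bytes(b):
    return json.loads(b.decode('utf-8'))


@singledispatch
def loads(obj):
    # Fallback for every type that has no registered implementation.
    raise TypeError(
        f"expected str, bytes or bytearray, not {type(obj).__name__}"
    )


@loads.register(str)
def _(obj):
    return _load_str(obj)


@loads.register(bytes)
@loads.register(bytearray)
def _(obj):
    # Normalise bytearray to bytes before handing it to the bytes loader.
    return _load_bytes(bytes(obj))
```

With this wrapper, `loads('"abc"')` and `loads(b'"abc"')` both reach the matching underlying function, while `loads(1)` raises a TypeError from the fallback.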

@althonos
Member

Kudos for the test repository, that's how I figured it out! 😉

@mre
Author

mre commented Jun 18, 2018

Thanks! I'll try that. I just saw that the official json module also seems to accept a PyObject.

@althonos
Member

If I'm not mistaken, all CPython functions do, because there is no automatic downcast from a Python int to a C int, for instance; the downcasting is done in the function body (one of the things pyo3 handles for you in most cases). If you try json.loads(1) you'll still get a TypeError, because only str, bytes and bytearray objects are accepted by that function!
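This can be checked against the standard json module directly. The helper `try_loads` below is a hypothetical name used only for this illustration:

```python
import json


def try_loads(value):
    """Return the decoded value, or the name of the TypeError if one is raised."""
    try:
        return json.loads(value)
    except TypeError as exc:
        return type(exc).__name__


print(try_loads('"abc"'))              # str input is accepted
print(try_loads(b'"abc"'))             # bytes input is accepted
print(try_loads(bytearray(b'"abc"')))  # bytearray input is accepted
print(try_loads(1))                    # int input is rejected with a TypeError
```

The function takes any object and only rejects the int once it inspects the argument in its body.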
