Unicode handling meta-issue [JIRA: CLIENTS-791] #334
Comments
I'm happy to help here if I can. Do we have some uniform policy we use in the other clients? Say Java? |
The plan I'm thinking of is this:
|
I suppose the issue with 2. is that if there is already binary data written, we won't be able to decode it as UTF-8 when we read. Would we want/need a 'bypass' decoder that returns |
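A minimal sketch of what such a "bypass" decoder could look like, assuming Python 3 and assuming the fallback is simply to hand back the undecoded bytes. The name `decode_or_bytes` is hypothetical and is not part of the client.

```python
def decode_or_bytes(data: bytes, encoding: str = "utf-8"):
    """Return text if `data` decodes cleanly, otherwise the raw bytes.

    Hypothetical helper, not part of riak-python-client.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        # Previously written binary data that is not valid UTF-8 ends up here.
        return data


text = decode_or_bytes("héllo".encode("utf-8"))   # decodes to str
blob = decode_or_bytes(b"\xfd\x01")               # stays as bytes
```

One caveat with this approach: some binary values happen to be valid UTF-8, so the return type depends on the data and callers would have to handle both `str` and `bytes`.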
I agree with your reservations about 2. Without roundtrip charset information however, we can't be sure of anything. As of now, we just puke back at the user if they give us a |
Not sure I completely followed this. |
https://github.com/basho/riak-python-client/blob/master/riak/riak_object.py#L113-L117 |
Oh yes, right. Get it now. We're certainly in a place where incremental improvement 'is a thing'. |
Worth mentioning: we've explicitly disallowed the creation of non-ascii YZ indexes until we sort out filesystem compatibility and Riak's internal handling. I imagine the clients will just hand back the (not terribly informative) error message, but fyi. |
Thanks @macintux. Edited description. |
I'm not sure the client should block it, lest we have to coordinate when we fix it on the backend. |
@macintux What does the error coming back look like then? |
"Invalid character in index name " via Webmachine. I don't know off-hand what the PB interface does; the error tuple internally is |
What about just requiring that the user always pass in a |
The discussion here focuses on unicode keys, but issue #32 which was closed in favour of this one is about general binary keys. Currently, we have to wrap all keys passed to the library in a very hacky class, as unfortunately the insistence on doing encoding in the library gets in the way:

    class BinaryString(str):
        def encode(self, ignored):
            return self

Riak's PBC API (message RpbGetReq) makes the key field It would be really great if there was a way to specify |
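The wrapper above works by making `encode` a no-op. A less hacky shape for the same idea, sketched here for Python 3, is to encode only text and pass `bytes` through untouched. The function name `to_key_bytes` and its `encoding` parameter are assumptions for illustration, not an existing client API.

```python
def to_key_bytes(key, encoding="utf-8"):
    """Normalize a key to bytes: pass bytes through, encode text.

    Hypothetical helper, not part of riak-python-client.
    """
    if isinstance(key, bytes):
        # Binary keys are sent as-is, with no encoding step.
        return key
    if isinstance(key, str):
        return key.encode(encoding)
    raise TypeError("key must be bytes or str, got %r" % type(key).__name__)


binary_key = to_key_bytes(b"\xfd\x01U\xc3\x02\x00\x00\x00")
text_key = to_key_bytes("caf\u00e9")
```

With this shape the caller decides whether a key is binary or text by the type it passes in, so no subclassing trick is needed.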
@mcobzarenco Yes, that is the goal. |
Any plans on actually fixing this? |
@cread - can you provide information (ideally code as well) of how this issue affects you? The reason I ask is that I am not very familiar, at this point, with Python's string handling with regard to encoding vs. bytes. Python 2 vs Python 3 complicates things further. Thank you! |
The problem is that we're using a uint64 as our riak object key. Up until now we've been using raw protobuf in C++ to communicate with riak and have not had any problems. We're now trying to write a helper utility in Python but get an error like this:

    >>> r = riak.RiakClient()
    >>> r._protocol
    'pbc'
    >>> bucket = r.bucket('mybucket')
    >>> key = '\xfd\x01U\xc3\x02\x00\x00\x00'
    >>> obj = bucket.get(key)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "build/bdist.linux-x86_64/egg/riak/bucket.py", line 231, in get
      File "build/bdist.linux-x86_64/egg/riak/riak_object.py", line 110, in __init__
    TypeError: Unicode keys are not supported. |
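For context, a sketch in Python 3 of how a uint64 maps to an 8-byte key like the one in the session above, and why such a key cannot be treated as UTF-8 text. The little-endian byte order and the helper names are assumptions for illustration; the thread does not state which byte order the C++ code uses.

```python
import struct

# Key bytes from the session above, written as a Python 3 bytes literal.
raw_key = b"\xfd\x01U\xc3\x02\x00\x00\x00"


def uint64_to_key(value: int) -> bytes:
    # "<Q" packs an unsigned 64-bit integer, little-endian (assumed order).
    return struct.pack("<Q", value)


def key_to_uint64(key: bytes) -> int:
    # Inverse of uint64_to_key; expects exactly 8 bytes.
    return struct.unpack("<Q", key)[0]


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The key round-trips through the integer form unchanged.
assert uint64_to_key(key_to_uint64(raw_key)) == raw_key
# 0xFD is never a valid UTF-8 start byte, so the key is not decodable text.
assert not is_valid_utf8(raw_key)
```

This is why a key like this has to travel as raw bytes end to end: any step that assumes text will either fail or alter the value.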
Using C++, you've been using the bytes representing the I recommend using Python 3 since the check for "Unicode" keys is only for Python 2 (see here). I have forwarded this issue on to product management to consider its priority. |
Correct. Thanks, moving to Python 3 for this is not an option for us at the moment. |
While you're talking to product management, can you also ask them about this issue too: basho/riak#789 Thanks again! |
We need to review and have a policy for all handling of Unicode character data. This issue consolidates and supersedes:
Key issues: