RestXMLSerializer has problem with multi-byte unicode strings py2.7 #868

Closed
purintai opened this issue Apr 6, 2016 · 3 comments · Fixed by #888
Labels: bug, unicode

Comments

@purintai

purintai commented Apr 6, 2016

Environment:

  • Amazon Linux AMI 2016.03.0 (HVM)
  • Python: 2.7.10
  • boto3: 1.3.0
  • botocore: 1.4.9

Reproduce:

>>> import boto3
>>> client = boto3.client('s3')
>>> bucket = '<your-bucket-name>'
>>> key = u'日本語でおk'
>>> client.put_object(Bucket=bucket, Key=key)
>>> client.delete_objects(Bucket=bucket, Delete={'Objects': [{'Key': key}]})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/client.py", line 236, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/client.py", line 476, in _make_api_call
    api_params, operation_model, context=request_context)
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/client.py", line 529, in _convert_to_request_dict
    api_params, operation_model)
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/validate.py", line 271, in serialize_to_request
    operation_model)
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/serialize.py", line 415, in serialize_to_request
    serialized, shape, shape_members)
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/serialize.py", line 457, in _serialize_payload
    shape_members[payload_member])
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/serialize.py", line 532, in _serialize_body_params
    self._serialize(shape, params, pseudo_root, root_name)
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/serialize.py", line 539, in _serialize
    method(xmlnode, params, shape, name)
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/serialize.py", line 565, in _serialize_type_structure
    self._serialize(member_shape, value, structure_node, member_name)
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/serialize.py", line 539, in _serialize
    method(xmlnode, params, shape, name)
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/serialize.py", line 576, in _serialize_type_list
    self._serialize(member_shape, item, list_node, element_name)
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/serialize.py", line 539, in _serialize
    method(xmlnode, params, shape, name)
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/serialize.py", line 565, in _serialize_type_structure
    self._serialize(member_shape, value, structure_node, member_name)
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/serialize.py", line 539, in _serialize
    method(xmlnode, params, shape, name)
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/serialize.py", line 618, in _default_serialize
    node.text = str(params)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
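
The last frame of the traceback is `node.text = str(params)`. On Python 2, calling `str()` on a `unicode` value encodes it implicitly with the ASCII codec, which is where the `UnicodeEncodeError` comes from. The following is a minimal sketch of that behavior without any AWS calls; the explicit `.encode('ascii')` stands in for the implicit encode so the snippet behaves the same on Python 3, and `implicit_ascii_encode` is a hypothetical name, not a botocore function.

```python
# -*- coding: utf-8 -*-
def implicit_ascii_encode(value):
    """Mimic what Python 2's str(unicode_value) does: encode with ASCII."""
    return value.encode('ascii')


key = u'日本語でおk'
try:
    implicit_ascii_encode(key)
    error = None
except UnicodeEncodeError as exc:
    # Same exception class as in the traceback above.
    error = exc

print(type(error).__name__)
```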

Alternatively, passing a multi-byte non-unicode (byte) string causes a different exception.

>>> client.delete_objects(Bucket=bucket, Delete={'Objects': [{'Key': '日本語でおk'}]})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/client.py", line 236, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/client.py", line 476, in _make_api_call
    api_params, operation_model, context=request_context)
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/client.py", line 529, in _convert_to_request_dict
    api_params, operation_model)
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/validate.py", line 271, in serialize_to_request
    operation_model)
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/serialize.py", line 415, in serialize_to_request
    serialized, shape, shape_members)
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/serialize.py", line 457, in _serialize_payload
    shape_members[payload_member])
  File "/home/ec2-user/Workspace/test/local/lib/python2.7/site-packages/botocore/serialize.py", line 534, in _serialize_body_params
    return ElementTree.tostring(real_root, encoding=self.DEFAULT_ENCODING)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1126, in tostring
    ElementTree(element).write(file, encoding, method=method)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 820, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 937, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1073, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

At the very least, I don't think multi-byte strings should be prohibited.
The right fix depends on which input type (unicode or byte string) is meant to be supported.

Opinions would be appreciated.
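
In the second traceback, byte `0xe6` is the first byte of the UTF-8 encoding of `日`. ElementTree on Python 2 calls `.encode()` on a byte string there, which first triggers an implicit ASCII decode. One possible caller-side workaround is to decode byte strings explicitly before passing them in. This is only a sketch: it assumes the bytes are UTF-8, and `to_text` is a hypothetical helper, not part of boto3 or botocore.

```python
# -*- coding: utf-8 -*-
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding byte strings explicitly."""
    if isinstance(value, bytes):
        # On Python 2, bytes is an alias for str, so this branch also
        # covers plain non-unicode string literals.
        return value.decode(encoding)
    return value


raw_key = u'日本語でおk'.encode('utf-8')  # what a Python 2 byte string holds
text_key = to_text(raw_key)
```

The decoded `text_key` can then be passed as `Key` in place of the byte string.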

JordonPhillips added the bug, confirmed, and unicode labels Apr 6, 2016
JordonPhillips (Contributor) commented

I'm not sure why you deleted the body of the issue, so here are the steps to reproduce:

import boto3

bucket = 'bucket-name'
key = u'日本語でおk'
s3 = boto3.client('s3')
s3.put_object(Bucket=bucket, Key=key)

# Results in a UnicodeEncodeError on 2.7
s3.delete_object(Bucket=bucket, Key=key)

Thanks for reporting!

@purintai
Author

purintai commented Apr 6, 2016

@JordonPhillips Thank you for the follow-up.
However, delete_object works fine and does not raise an exception.
It seems delete_object uses BaseRestSerializer rather than RestXMLSerializer.
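
For context, `delete_object` sends the key as part of the request URI, while `delete_objects` sends keys inside an XML body, so only the latter reaches the XML text serialization seen in the traceback. The sketch below shows how a unicode key can be percent-encoded for a URI path using only the standard library. It is an illustration under that assumption, not botocore's actual code path, and `encode_key_for_path` is a hypothetical name.

```python
# -*- coding: utf-8 -*-
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def encode_key_for_path(key):
    """Percent-encode a text key as UTF-8 for use in a URI path."""
    # Encoding to UTF-8 bytes first avoids any implicit ASCII conversion.
    return quote(key.encode('utf-8'), safe='/~')


path = '/' + encode_key_for_path(u'日本語でおk')
print(path)
```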

@jamesls
Member

jamesls commented Apr 7, 2016

FWIW, I think this is only an issue on delete_objects (plural) as suggested in the original post. It works for me when using delete_object (singular):

>>> import botocore.session
>>> botocore.session.get_session().create_client('s3').delete_objects(Bucket='anything', Delete={'Objects': [{'Key': u'日本語でおk'}]})

JordonPhillips added a commit to JordonPhillips/botocore that referenced this issue Apr 22, 2016
In the rest xml parser, we were converting the input we receive into a
`str`, which was causing failures on python 2 when multibyte unicode
strings were passed in. The fix is to simply use `six.text_type`, which
is `unicode` on 2 and `str` on 3. The string will be encoded into the
default encoding later on in the serializer.

Fixes boto#868
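
The idea in the commit message can be sketched roughly as follows. This is not the actual patch: `text_type` is defined by hand here to avoid depending on `six`, `serialize_delete_body` is a hypothetical stand-in for the serializer, and the element names are only illustrative. The point is that the node text stays a text type and is encoded once, when the tree is written out.

```python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ElementTree

try:
    text_type = unicode  # Python 2: the built-in unicode type
except NameError:
    text_type = str  # Python 3: str is already the text type


def serialize_delete_body(key):
    """Build a small XML body whose text node holds a unicode key."""
    root = ElementTree.Element('Delete')
    obj = ElementTree.SubElement(root, 'Object')
    node = ElementTree.SubElement(obj, 'Key')
    # text_type(...) instead of str(...): no implicit ASCII encode on Python 2.
    node.text = text_type(key)
    # Encoding to bytes happens once, at serialization time.
    return ElementTree.tostring(root, encoding='utf-8')


body = serialize_delete_body(u'日本語でおk')
```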