This repository has been archived by the owner on May 10, 2024. It is now read-only.

key.set_remote_data escapes spaces in Cache-Control metadata #2536

Open
kevinw opened this issue Aug 19, 2014 · 3 comments

Comments

@kevinw

kevinw commented Aug 19, 2014

I think setting metadata on keys is escaping values a bit too aggressively.

Using key.set_remote_metadata:

CACHE_CONTROL_MAX = 'max-age=%d, public' % (3600 * 24 * 360 * 2)
k.set_remote_metadata({'Cache-Control': CACHE_CONTROL_MAX}, {}, preserve_acl=True)

and then checking the headers:

→ curl -I $URL
HTTP/1.1 200 OK
Date: Tue, 19 Aug 2014 23:46:01 GMT
Cache-Control: max-age=62208000,%20public
Last-Modified: Tue, 19 Aug 2014 23:28:46 GMT

That extra %20 isn't right--I don't think spaces should be escaped here.

@kevinw changed the title from "key.set_remote_data with spaces in Cache-Control metadata" to "key.set_remote_data escapes spaces in Cache-Control metadata" on Aug 19, 2014
@opottone
Contributor

Agreed, encoding space as %20 does not seem correct.

This seems to affect all system metadata, not only Cache-Control.

When I tried to reproduce this, the bug only occurred when the argument was a unicode string; spaces in a plain str were left alone.

>>> k = bucket.new_key('test')
>>> k.set_remote_metadata({'Cache-Control': 'max-age=3600; public'}, {}, True)
>>> k = bucket.get_key('test')
>>> k.cache_control
'max-age=3600; public'
>>> 
>>> k = bucket.new_key('test')
>>> k.set_remote_metadata({'Cache-Control': u'max-age=3600; public'}, {}, True)
>>> k = bucket.get_key('test')
>>> k.cache_control
'max-age=3600;%20public'

This is of course with Python 2, which has separate str and unicode types. Python 3 does not, so things are different there. @kevinw, it seems that your example is with Python 3, right?

As a separate but related issue, set_metadata() does even more unexpected percent-encoding, but again only for unicode. This is more or less the same as #1469.

>>> k = bucket.new_key('test')
>>> k.set_metadata('Content-Type', 'text/plain; charset=utf-8')
>>> k.set_metadata('Cache-Control', 'max-age=3600; public')
>>> k.set_contents_from_string('')
>>> k = bucket.get_key('test')
>>> k.content_type
'text/plain; charset=utf-8'
>>> k.cache_control
'max-age=3600; public'
>>> 
>>> k = bucket.new_key('test')
>>> k.set_metadata('Content-Type', u'text/plain; charset=utf-8')
>>> k.set_metadata('Cache-Control', u'max-age=3600; public')
>>> k.set_contents_from_string('')
>>> k = bucket.get_key('test')
>>> k.content_type
'text%2Fplain%3B+charset%3Dutf-8'
>>> k.cache_control
'max-age%3D3600%3B+public'
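
For what it's worth, those garbled values match exactly what urllib.quote_plus produces, so my guess (not confirmed against the boto source) is that the user-metadata path encodes unicode values as if they were query-string parameters:

>>> import urllib
>>> urllib.quote_plus(u'text/plain; charset=utf-8'.encode('utf-8'))
'text%2Fplain%3B+charset%3Dutf-8'
>>> urllib.quote_plus(u'max-age=3600; public'.encode('utf-8'))
'max-age%3D3600%3B+public'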

It seems that the original problem can be fixed simply by adding a space to the safe string on line 374 of boto/connection.py, but @danielgtaylor told me that there are some backwards-compatibility issues and tests that would need to be updated; see #2477 (comment).
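
To illustrate what that fix would do (a rough sketch, not the actual boto code): if the unicode header value goes through urllib.quote with a safe string that contains most punctuation but not the space character, you get exactly the result observed above, and adding ' ' to safe lets the value through untouched:

>>> import urllib
>>> safe = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'  # hypothetical safe string, no space
>>> urllib.quote(u'max-age=3600; public'.encode('utf-8'), safe)
'max-age=3600;%20public'
>>> urllib.quote(u'max-age=3600; public'.encode('utf-8'), safe + ' ')
'max-age=3600; public'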

@kevinw
Author

kevinw commented Aug 26, 2014

@opottone I'm using Python 2, but with unicode_literals from __future__.

Using byte strings instead does avoid the bug for me too, thanks.
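
Concretely, what I ended up doing looks roughly like this (just a sketch; with unicode_literals in effect the value has to be encoded back to a str explicitly):

from __future__ import unicode_literals

# .encode() turns the unicode literal back into a Python 2 byte string (str),
# which is left alone instead of being percent-encoded
CACHE_CONTROL_MAX = ('max-age=%d, public' % (3600 * 24 * 360 * 2)).encode('utf-8')
k.set_remote_metadata({'Cache-Control': CACHE_CONTROL_MAX}, {}, preserve_acl=True)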

@wadey

wadey commented Sep 13, 2016

I ran into the same bug when trying to copy a key using initiate_multipart_upload:

mp = bucket.initiate_multipart_upload(key.name, metadata=key.metadata)
…

I had to work around it by doing this:

metadata = {}
for k, v in key.metadata.iteritems():
    metadata[k] = v.encode('utf8')
mp = bucket.initiate_multipart_upload(key.name, metadata=metadata)
…
