UnicodeEncodeError when using traversal and url is quoted #352

Closed
ghost opened this Issue Nov 15, 2011 · 2 comments

ghost commented Nov 15, 2011

A quoted URL like http://localhost:6543/CustomerID/%C4%84%C5%81%C3%93$! causes this error:

File "/home/mikado/mydevenv/env/lib/python2.6/site-packages/pyramid-1.2.1-py2.6.egg/pyramid/traversal.py", line 478, in traversal_path
path = path.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 20-22: ordinal not in range(128)

I've found that the path is unquoted in httpserver.py, line 180, in WSGIHandlerMixin.wsgi_setup(); from that point on, environ['PATH_INFO'] is unquoted. The traversal docs say: "Each segment is URL-unquoted, and subsequently decoded into Unicode. Each segment is assumed to be encoded using the UTF-8 encoding (or a subset, such as ASCII)".
Well, it's only unquoted, not encoded to UTF-8.

In my patch I just force encode('utf-8') instead of encode('ascii') in traversal_path() in traversal.py, but I'm not sure that's the right way to do it.
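For readers following along, here is a minimal sketch of the failure mode in the traceback. It is not Pyramid's code; the variable names are made up for illustration, and it assumes the path had already become a Unicode string holding the characters that %C4%84%C5%81%C3%93 decodes to under UTF-8.

```python
# Illustrative only: reproduces the class of error in the traceback,
# not the actual traversal_path() implementation.

# Assumption: the path is already Unicode text. U+0104, U+0141 and U+00D3
# are what the bytes C4 84, C5 81 and C3 93 decode to under UTF-8.
path = u'/CustomerID/\u0104\u0141\u00d3$!'

try:
    path.encode('ascii')      # ASCII cannot represent code points above 127
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding as UTF-8 succeeds and yields the same bytes the quoted URL carried.
utf8_bytes = path.encode('utf-8')
```

Encoding as UTF-8 avoids the exception, but it does not explain how the path became Unicode in the first place.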

Owner

mcdonc commented Nov 18, 2011

I'm pretty sure I don't understand. Paste's httpserver.py just calls "urllib.unquote(path)". Assuming "path" is a string and not Unicode, it is not converted to Unicode by urllib.unquote:

>>> import urllib
>>> urllib.unquote('/CustomerID/%C4%84%C5%81%C3%93$!')
'/CustomerID/\xc4\x84\xc5\x81\xc3\x93$!'

So I still have no idea how the condition "if isinstance(path, unicode)" is triggered in traversal_path. The change you've made isn't the right fix; the real one is to figure out why Pyramid is receiving Unicode as PATH_INFO. PATH_INFO should never be Unicode.
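The order of operations the traversal docs describe (unquote each segment to bytes, then decode those bytes as UTF-8) can be sketched as follows. This is an assumption-laden illustration rather than Pyramid's implementation: split_and_decode is a hypothetical helper, and the sketch uses Python 3's urllib.parse.unquote_to_bytes, whereas the thread's Python 2.6 equivalent is urllib.unquote on a byte string.

```python
# Python 3 sketch; on the thread's Python 2.6, urllib.unquote on a str
# (byte string) plays the role of unquote_to_bytes.
from urllib.parse import unquote_to_bytes


def split_and_decode(quoted_path):
    """Hypothetical helper: unquote each path segment, then decode it as UTF-8."""
    # Drop the empty segments produced by leading or trailing slashes.
    segments = [s for s in quoted_path.split('/') if s]
    # Unquoting gives raw bytes; only the explicit decode produces Unicode text.
    return [unquote_to_bytes(s).decode('utf-8') for s in segments]


segments = split_and_decode('/CustomerID/%C4%84%C5%81%C3%93$!')
```

Keeping the unquoted path as bytes until that explicit decode step is what makes "PATH_INFO should never be Unicode" hold at the boundary.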

Pyramid 1.2.3 solved the issue.

mcdonc closed this Nov 27, 2011
