UnicodeEncodeError when using traversal and url is quoted #352

Nov 15, 2011

A quoted URL like http://localhost:6543/CustomerID/%C4%84%C5%81%C3%93$! causes this error:

  File "/home/mikado/mydevenv/env/lib/python2.6/site-packages/pyramid-1.2.1-py2.6.egg/pyramid/traversal.py", line 478, in traversal_path
    path = path.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 20-22: ordinal not in range(128)

I've found that the path is unquoted in httpserver.py, line 180, in WSGIHandlerMixin.wsgi_setup(); from that point on, environ['PATH_INFO'] is unquoted. The traversal docs say: "Each segment is URL-unquoted, and subsequently decoded into Unicode. Each segment is assumed to be encoded using the UTF-8 encoding (or a subset, such as ASCII)".
Well, it's only unquoted, and not encoded to UTF-8.
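
For reference, the behaviour the traversal docs describe (unquote each segment, then decode it as UTF-8) can be sketched roughly as follows. This is a Python 3 illustration, not Pyramid's actual code (the thread itself concerns Python 2.6); the helper name decode_segments is hypothetical, and urllib.parse.unquote_to_bytes is the standard-library call assumed here.

```python
from urllib.parse import unquote_to_bytes


def decode_segments(quoted_path):
    """Hypothetical helper: unquote each path segment, then decode it as UTF-8."""
    segments = []
    for segment in quoted_path.split('/'):
        if not segment:
            # Skip the empty strings produced by leading or doubled slashes.
            continue
        raw = unquote_to_bytes(segment)       # percent-escapes -> raw bytes
        segments.append(raw.decode('utf-8'))  # raw bytes -> text
    return segments


segments = decode_segments('/CustomerID/%C4%84%C5%81%C3%93$!')
```

With the URL from this report, the second segment comes out as three non-ASCII characters followed by "$!", which is exactly the kind of text that a later encode('ascii') cannot handle.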

In my patch I just force encode('utf-8') instead of encode('ascii') in traversal_path() in traversal.py, but I'm not sure that's the right way to do it.
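
To make the difference between the two codecs concrete, here is a small Python 3 sketch. The variable path is an assumed stand-in for a PATH_INFO value that has already arrived as text; it only shows why the ASCII encode fails and the UTF-8 encode does not, and says nothing about whether the patch is the right fix.

```python
# Assumed stand-in for a PATH_INFO value that arrived as text, not bytes.
path = '/CustomerID/\u0104\u0141\u00d3$!'

try:
    path.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    # Same exception type as in the traceback above.
    ascii_failed = True

# UTF-8 can represent every character, so this encode succeeds
# and decoding the result gives back the original text.
utf8_bytes = path.encode('utf-8')
round_tripped = utf8_bytes.decode('utf-8')
```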


Nov 18, 2011

I'm pretty sure I don't understand. Paste's httpserver.py just calls "urllib.unquote(path)". Assuming "path" is a string and not Unicode, it is not converted to Unicode by urllib.unquote:

>>> import urllib
>>> urllib.unquote('/CustomerID/%C4%84%C5%81%C3%93$!')
'/CustomerID/\xc4\x84\xc5\x81\xc3\x93$!'

So I still have no idea how the condition "if isinstance(path, unicode)" is triggered in traversal_path. The change you've made isn't the right fix; the real one is to figure out why Pyramid is receiving Unicode as PATH_INFO. PATH_INFO should never be Unicode.

Pyramid 1.2.3 solved the issue.

Nov 27, 2011

