url quote/unquote with python 3 broken #164
Comments
The problem surfaces in I didn't investigate RFC 3986 much, but it seems to indicate that non-ASCII characters, when they are percent-encoded, should be translated to their UTF-8 representation. In turn, that means that when we decode a URL, we should assume that it was encoded in UTF-8. However, just changing "latin-1" to "UTF-8" breaks a few tests. I'm inclined to think the assumptions those tests make about the expected strings are wrong, but I'd like a second opinion on that. |
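To make the latin-1 vs. UTF-8 point concrete, here is a minimal sketch using only the standard library's `urllib.parse` (not WebOb's own helpers, whose behaviour is not reproduced here). The sample path and variable names are assumptions chosen for illustration; Python 3 only.

```python
from urllib.parse import quote, unquote

# U+2227 is used here only as a sample non-ASCII character (an assumption).
path = "/foo\u2227bar/"

# quote() percent-encodes the UTF-8 bytes of non-ASCII characters by default.
encoded = quote(path)
print(encoded)

# Decoding the escapes as UTF-8 round-trips to the original text...
as_utf8 = unquote(encoded, encoding="utf-8")
print(as_utf8 == path)

# ...while decoding the same escapes as latin-1 gives three unrelated characters.
as_latin1 = unquote(encoded, encoding="latin-1")
print(repr(as_latin1))
```

If the tests expect the latin-1 form of a UTF-8 encoded URL, this is the kind of mismatch they would be locking in.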
Here is another report for stuff related to |
Linking this to #161 |
@quantum-omega Note that I tried using the |
The problem is that in Python 3 it is not valid to provide byte values for a str by backslash escaping them. Here's the output if you provide a I will take a look and see whether the two functions still don't work after accounting for the fact that it's a bytestring; I haven't tested that. It just came to me that this may be a reason for the issue. |
If we use bytes the same way in both Python2 and Python3 we don't have the issue:
|
I haven't looked into this in quite a while but that corresponds to what I
would normally write to ensure Python2/3 compatibility, so it makes sense
to me that this would be the way to fix it.
However, shouldn't URIs/URLs normally contain only ASCII, with the
characters outside of that range URL-encoded, with the encoding of the
escaped values left to the interpreting program (usually UTF-8 but not
necessarily)? Here, the unicode byte sequence should not even appear in a
valid URL and instead of it, we should have "%5c" or something like that.
If we get to a point where we have UTF-8 in a URL, that means some decoding
already took place, and probably not in the right spot.
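A rough sketch of that separation, assuming the standard library's `urllib.parse` rather than WebOb's helpers: the URL on the wire stays ASCII, unescaping yields raw bytes, and the choice of text encoding is a separate, later step. The sample path and names are illustrative only; Python 3 only.

```python
from urllib.parse import quote, unquote_to_bytes

# Sample path with one non-ASCII character (an assumption for illustration).
raw_path = "/foo\u2227bar/"

# What should travel in the URL: only ASCII, with percent-escapes.
wire_path = quote(raw_path)
wire_path.encode("ascii")  # would raise UnicodeEncodeError if non-ASCII slipped in

# Step 1: undo the percent-escapes, keeping raw bytes (no encoding assumed yet).
raw_bytes = unquote_to_bytes(wire_path)

# Step 2: the interpreting program picks the encoding; UTF-8 is the usual choice.
decoded = raw_bytes.decode("utf-8")
print(decoded == raw_path)
```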
On 9 May 2017 03:25, "Rémy HUBSCHER" <notifications@github.com> wrote:
… The problem is that in Python 3 it is not valid to provide byte values for
a str by backslash escaping them
If we use bytes the same way in both Python2 and Python3 we don't have the
issue:
Python 2.7.13 (default, Jan 19 2017, 14:48:08)
[GCC 6.3.0 20170118] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> url = 'http://localhost/foo\xe2\x88\xa7bar/'
>>> url.decode('utf-8')
u'http://localhost/foo\u2227bar/'
>>> print(url.decode('utf-8'))
http://localhost/foo∧bar/
Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170118] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> url = b'http://localhost/foo\xe2\x88\xa7bar/'
>>> url.decode('utf-8')
'http://localhost/foo∧bar/'
|
Hi,
I have some trouble with the URL handling of webob in Python 3. After some investigation, it seems
that the issue is located in webob.compat.url_unquote
A simple testcase to reproduce the issue that works in python 2 but not in python 3:
Regards,