better handling of strange url #31

Open
@flyingeek

Hello,

I am using Django with URLObject, and I run into UnicodeDecodeError exceptions when URLObject handles some invalid URLs (coming from search engines), for example a query string whose percent-escapes are not valid UTF-8:

>>> from urlobject.urlobject import QueryString
>>> qs = QueryString(u's=glaci%E8re')
>>> qs.list
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Eric/Python/Env/lpdc/lib/python2.7/site-packages/urlobject/query_string.py", line 35, in list
    value = qs_decode(value)
  File "/Users/Eric/Python/Env/lpdc/lib/python2.7/site-packages/urlobject/query_string.py", line 138, in _qs_decode_py2
    return urllib.unquote_plus(s).decode('utf-8')
  File "/Users/Eric/Python/Env/lpdc/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe8 in position 5: invalid continuation byte

A partial solution would be:

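import urllib

def _qs_decode_py2(s):
    """Unquote unicode or str using query string rules."""
    if isinstance(s, unicode):
        s = s.encode('utf-8')
    # 'replace' substitutes U+FFFD for bytes that are not valid UTF-8
    # instead of raising UnicodeDecodeError.
    return urllib.unquote_plus(s).decode('utf-8', errors='replace')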

But I am not sure what the py3 equivalent should be.

For information, Django also decodes with 'replace' when handling the query string.
