Encoding issue in request.py #464

@rjstanford

I'm not entirely sure what the intent is here, so I hesitate to file a PR. We saw some errors thrown by our webapp (running under gunicorn) and traced them to request.encget():

  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/webob/request.py", line 495, in url
    url = self.path_url
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/webob/request.py", line 467, in path_url
    bpath_info = bytes_(self.path_info, self.url_encoding)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/webob/descriptors.py", line 70, in fget
    return req.encget(key, encattr=encattr)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/webob/request.py", line 165, in encget
    return bytes_(val, 'latin-1').decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 66: invalid start byte

My read of util.bytes_ is that, when passed a string, it calls val.encode() on it with the given encoding. So the following code in encget():

return bytes_(val, "latin-1").decode(encoding)

is the same as doing:

return val.encode("latin-1", "strict").decode(encoding)
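
A minimal sketch of that reading, for anyone who wants to check the equivalence. Note that bytes_sketch is a hypothetical stand-in written from the description above, not a copy of webob's util.bytes_, and the sample value is made up:

```python
def bytes_sketch(value, encoding="latin-1", errors="strict"):
    """Hypothetical stand-in for util.bytes_ as described in this issue:
    encode str input, pass anything else through unchanged."""
    if isinstance(value, str):
        return value.encode(encoding, errors)
    return value


# Made-up sample value; every character here fits in latin-1.
val = "/caf\u00e9"

via_helper = bytes_sketch(val, "latin-1")
via_encode = val.encode("latin-1", "strict")

# Under this reading, both spellings produce the same bytes.
assert via_helper == via_encode
```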

Based on our exception we can see that the value of encoding is "utf-8", which gives us:

return val.encode("latin-1", "strict").decode("utf-8")

or with a specific example that will fail:

x = "À".encode('latin-1').decode('utf-8')
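
The same failure as a standalone script, assuming encget() reduces to the expression above. encget_sketch is a hypothetical name used only for this reproduction:

```python
def encget_sketch(val, encoding="utf-8"):
    # Assumed reduction of encget(): encode as latin-1, decode as `encoding`.
    return val.encode("latin-1", "strict").decode(encoding)


error = None
try:
    # "\u00c0" is "À"; latin-1 encodes it as the single byte 0xC0,
    # which is not a valid UTF-8 start byte.
    encget_sketch("\u00c0")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```

The message matches the one in our traceback apart from the byte position, since here the offending byte is the first one in the string.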

I'm not sure why we'd ever explicitly encode a string as latin-1 and then decode it as UTF-8 in the first place. A simpler return val.encode(encoding) would seem more appropriate here, but there's probably nuance that I'm not understanding, hence the issue report rather than a PR.
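
One case where the latin-1 then UTF-8 round trip does succeed, which may be part of that nuance: when val was itself produced by decoding UTF-8 bytes as latin-1. That origin is an assumption on my part; nothing in the traceback confirms it. The sketch below also shows that val.encode(encoding) returns bytes rather than str, so it would not be a drop-in replacement:

```python
# Assumption: the raw request path arrived as UTF-8 bytes and was
# decoded as latin-1 before reaching encget().
raw = "/caf\u00e9".encode("utf-8")   # b"/caf\xc3\xa9"
val = raw.decode("latin-1")          # mojibake str, one char per byte

# The existing expression undoes the latin-1 decode and recovers the text.
round_trip = val.encode("latin-1").decode("utf-8")

# The "simpler" alternative returns bytes and re-encodes the mojibake.
simpler = val.encode("utf-8")

print(repr(round_trip))
print(repr(simpler))
```

If that assumption holds, the failure in our case would come from a path containing a byte sequence that is not valid UTF-8, rather than from the round trip itself.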
