Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnicodeDecodeError within webob when parsing query strings with invalid URL encoded parameters #195

Closed
jkchin opened this issue Apr 13, 2015 · 8 comments

Comments

@jkchin
Copy link

jkchin commented Apr 13, 2015

UnicodeDecodeError within webob when parsing query strings with invalid URL encoded parameters
ie

test.com?debug=%FA  # %FA is an invalid utf-8 encoded string

Traceback:

  File "/opt/webapp/test/local/lib/python2.7/site-packages/webob/request.py", line 838, in GET
    vars = GetDict(data, env)
  File "/opt/webapp/test/local/lib/python2.7/site-packages/webob/multidict.py", line 287, in __init__
    MultiDict.__init__(self, data)
  File "/opt/webapp/test/local/lib/python2.7/site-packages/webob/multidict.py", line 38, in __init__
    items = list(args[0])
  File "/opt/webapp/test/local/lib/python2.7/site-packages/webob/compat.py", line 125, in parse_qsl_text
    yield (x.decode(encoding), y.decode(encoding))
  File "/opt/webapp/test/lib/python2.7/encodings/utf_8.py", line 16, in decode(input='\xfa', errors='strict')
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xfa in position 0: invalid start byte

    local variables:
    - errors: 'strict'
    - input: '\xfa'

This issue is more apparent if the URL is used as a template, and a user either programmatically fills the template URL or enters it by hand.
For example:

test.com?n=[n_value]&dept=[dept_value]&t=[t_value]

In our use case, we don't actually consume the custom query params, but rather redirect these query params to a final URL of choice for a different web application to consume. Hence we ideally want to keep the original query parameters even if we couldn't utf-8 decode them.

@dairiki
Copy link
Contributor

dairiki commented Jun 3, 2015

This is mostly a duplicate of #161, I think.

@dairiki
Copy link
Contributor

dairiki commented Jun 3, 2015

In our use case, we don't actually consume the custom query params, but rather redirect these query params to a final URL of choice for a different web application to consume. Hence we ideally want to keep the original query parameters even if we couldn't utf-8 decode them.

I think that if you want access to the invalidly-encoded parts, you're probably going to have to extract them from request.query_string yourself. I don't see a clean way for request.GET to provide the decoded values, while also providing access to the (improperly encoded) raw values.

@jkchin
Copy link
Author

jkchin commented Jun 10, 2015

@dairiki Thanks for the suggestion of using request.query_string, that make sense.

However I think one of the bigger issues is that any multidict access in the webapp can cause an error when users can type malform input in the query string. Even if I identified there's an undecodeable request.query_string and special handle it, if any other part of the webapp or middleware (that I may or may not control) does a simple multidict lookup (request.GET.get("prettyjson")), it may foil my attempts

Tying it back to my use case, I'll still be processing the request as per normal, only to forward the malformed query params as-is after I'm done. It's during the processing that any multidict lookup outside my control will cause issues.

@dairiki
Copy link
Contributor

dairiki commented Jun 10, 2015

@jkchin Agreed that having UnicodeDecodeErrors raised when request.GET is accessed is a bad thing. (That’s what #161 is about.) I’m working on a pull request which will have request.GET return a webob.multidict.NoVars instance when QUERY_STRING is mis-encoded.

@mmerickel
Copy link
Member

@dairiki This will likely not be accepted in favor of something like #115 (which you could definitely contribute to).

@dairiki
Copy link
Contributor

dairiki commented Jun 10, 2015

@mmerickel Thanks for the note!

I'm going to continue discussion on this at #161 where there is a bit more context.

@digitalresistor
Copy link
Member

Related: Pylons/pyramid#1374

@mmerickel
Copy link
Member

Closing this to consolidate the discussion in #161.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants