New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http/wsgi.py: do not unquote path before injecting into environ #1211

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
4 participants
@jgehrcke
Contributor

jgehrcke commented Feb 24, 2016

This is a follow-up to #930, where we decided that WSGI environ['PATH_INFO'] should not be an unquoted (percent-decoded) path, but the undecoded (original) path. This was then fixed in aiohttp's WSGI implementation, see aio-libs/aiohttp#177 -- but it looks like we missed taking care of Gunicorn's WSGI implementation.

In the current aiohttp implementation we find (https://github.com/KeepSafe/aiohttp/blob/master/aiohttp/wsgi.py):

path_info = uri_parts.path
[...]

environ['PATH_INFO'] = path_info

However, in gunicorn/http/wsgi.py we currently still find:

environ['PATH_INFO'] = unquote_to_wsgi_str(path_info)

This leads to a class of problems where escaped and non-escaped parts are not easily distinguishable anymore on framework-level, as vividly clarified by the following issue: https://code.djangoproject.com/ticket/15718
There, the behavior is partially controllable in Apache's mod_wsgi by setting AllowEncodedSlashes On.

There, the issue appeared with Django+mod_wsgi. In our case, the issue appeared with Falcon+Gunicorn, but factually it's exactly the same problem.

I think the most important insight is from aio-libs/aiohttp#177

IIRC WSGI standard doesn't specifies path unquoting.

So, I went ahead and simply removed the unquoting operation (exactly as done in aiohttp's wsgi.py). I ran tests against Python 3.4, and nothing broke.

This change affects three worker types (async, gthread, sync):

$ grep -HR "wsgi.create" .
./gunicorn/workers/async.py:            resp, environ = wsgi.create(req, sock, addr,
./gunicorn/workers/gthread.py:            resp, environ = wsgi.create(req, conn.sock, conn.addr,
./gunicorn/workers/sync.py:            resp, environ = wsgi.create(req, client, addr,

I think we can just merge this, mainly motivated by the fact that aiohttp uses the same method.

Still, it would be nice to now add a test that breaks with the old behavior, and passes with the new one. However, I am not warm enough with the gunicorn test structure.

And of course the question is if applications out there in the world rely on that behavior ... :/

In our application we have to be able to distinguish real slashes from escaped slashes on framework level. That is, we have to use a custom branch of Gunicorn right now -- otherwise that information is lost once requests enter framework level.

@berkerpeksag

This comment has been minimized.

Collaborator

berkerpeksag commented Feb 24, 2016

Gunicorn (and other servers like Werkzeug) follows PEP 3333 it implicitly requires PATH_INFO and friends to be unquoted. If you don't unquote PATH_INFO, you are going to get broken PATH_INFO. See https://www.python.org/dev/peps/pep-3333/#url-reconstruction for the URL reconstruction algorithm:

...
url += quote(environ.get('SCRIPT_NAME', ''))
url += quote(environ.get('PATH_INFO', ''))
...
@jgehrcke

This comment has been minimized.

Contributor

jgehrcke commented Feb 24, 2016

@berkerpeksag yes, the problem is the "implicitly". It's not explicit, so it leaves some room for interpretation.

Obviously, certain applications require to be able to distinguish escaped from non-escaped entities in the original request path. That's why mod_wsgi introduced the configuration variable AllowEncodedSlashes.

Now, what would you propose? I try to summarize:

  • Werkzeug implements its own unquoting, and it does not look like it takes special care of slashes.
  • wsgiref in Python 3 uses env['PATH_INFO'] = urllib.parse.unquote_to_bytes(path).decode('iso-8859-1') (https://github.com/python/cpython/blob/3.5/Lib/wsgiref/simple_server.py#L85)
  • aiohttp does not do unquoting.
  • mod_wsgi seems to do unquoting with the option to leave %F2 alone.

Heavily related discussions:
pallets/werkzeug#477
pallets/werkzeug#797
pallets/flask#900
http://www.leakon.com/archives/865

It seems like double-encoding is the most reliable solution to that for applications.

@benoitc

This comment has been minimized.

Owner

benoitc commented Feb 24, 2016

application that need the RAW URI can get it from the environment variable RAW_URI in gunicorn. This how it's done since awhile. I would stick that way so we keep complying strictly to the WSGI spec. Imo we should have the same behaviour in all workers. Thoughts?

@jgehrcke

This comment has been minimized.

Contributor

jgehrcke commented Feb 24, 2016

@benoitc thanks for the pointer towards RAW_URI! After getting a better overview about this topic I concur with you saying

I would stick that way so we keep complying strictly to the WSGI spec

So, that PR can be closed, I guess. I'm not exactly sure what you mean with

we should have the same behaviour in all workers

Is that not the case as of now?

Just FYI, in our application, we are now relying on clients/API consumers to replace slashes that are supposed to be transparent to the URL template engine of our WSGI application with a %252F. Gunicorn's wsgi.py guarantees to do one level of decoding (to %2F), and we can safely perform the second encoding step (to /) after app-level URL parsing.

@tilgovi

This comment has been minimized.

Collaborator

tilgovi commented Feb 24, 2016

I think when @benoitc says that every worker should behave the same he's saying that the aio worker should do the same as the others.

Does that require aio-libs/aiohttp#343 to happen or is gunicorn unquoting before aiohttp see the request?

@tilgovi

This comment has been minimized.

Collaborator

tilgovi commented Mar 13, 2016

Closing per #1211 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment