Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use urlsplit on request path #260

Closed
digitalresistor opened this issue Aug 27, 2019 · 9 comments · Fixed by #261
Closed

Don't use urlsplit on request path #260

digitalresistor opened this issue Aug 27, 2019 · 9 comments · Fixed by #261

Comments

@digitalresistor
Copy link
Member

digitalresistor commented Aug 27, 2019

scheme, netloc, path, query, fragment = urlparse.urlsplit(uri)

This leads to things like:

>>> from urllib import parse as urlparse
>>> urlparse.urlsplit('//testing/whatever')
SplitResult(scheme='', netloc='testing', path='/whatever', query='', fragment='')

Which means we accidentally drop testing before sending it on to the WSGI application. Ask me later how I figured that out.

A request such as:

GET //testing/whatever HTTP/1.1

Is perfectly valid. Non-sensical maybe, but perfectly valid.

@mmerickel
Copy link
Member

That just looks like a scheme-relative url. Not sure why you’re passing the path to urlsplit instead of the whole url.

Does pep3333 say servers should collapse / in urls before sending them to the app? Is that why this issue is in waitress?

@digitalresistor
Copy link
Member Author

digitalresistor commented Aug 27, 2019

@mmerickel there is no whole URL when parsing a GET request. You would have to re-constitute a whole URL.

However, the following is ALSO a legal GET request, and is likely why the urlsplit exists in the first place:

GET http://example.com/testing/whatever HTTP/1.0

This form is only ever used when the server is a proxy server, and it is not likely to exist in the wild. But that means both cases do need to get handled.

As for it being a scheme relative URL, it is not, the URL that was accessed by the client was:

http://127.0.0.1//testing/whatever

Which gets turned into this by curl and friends:

GET //testing/whatever HTTP/1.1
Host: 127.0.0.1
User-Agent: curl/7.54.0
Accept: */*

@digitalresistor
Copy link
Member Author

waitress/waitress/parser.py

Lines 200 to 208 in 94e2311

command, uri, version = crack_first_line(first_line)
version = tostr(version)
command = tostr(command)
self.command = command
self.version = version
(self.proxy_scheme,
self.proxy_netloc,
self.path,
self.query, self.fragment) = split_uri(uri)

Specifically this is the uri get from crack_first_line(), we don't even have any information on any other headers yet.

@mmerickel
Copy link
Member

ok well if you have some context that says it's in the status line of a request then you know you don't support relative urls so you can normalize it right?

@mmerickel
Copy link
Member

I wasn't clear from your original issue that this is specifically about cracking the first line of a request.

@mmerickel
Copy link
Member

You have since edited it to be more clear, but I read it before the edits. :p

@digitalresistor
Copy link
Member Author

It was 02:15 in the morning when I first found the issue and added this issue to the issue tracker. I realize that I had a bunch of additional context in my head that was not in the issue. Sorry for not making it more clear.

The issue here is that we crack the first line, and then pass that uri from that to urlsplit. If we are a proxy server then that makes sense, because we will get a full URL and urlsplit will do its magic correctly. If however we have just a path + query string, then urlsplit fails miserably when the path starts with two /'s because it assumes the first part is the netloc.

Basically this needs to change to:

  1. See if uri starts with "/", then split on ? and #
  2. If not the above, then use urlsplit

@mmerickel
Copy link
Member

Is it not enough to check specifically for the uri.startswith('//') situation and drop the first / before passing it to urlsplit?

@digitalresistor
Copy link
Member Author

@mmerickel and then add it back before setting it as the path? It is valid to send //testing/whatever as the PATH_INFO to the WSGI application...

digitalresistor added a commit that referenced this issue Aug 27, 2019
digitalresistor added a commit that referenced this issue Aug 27, 2019
The HTTP spec states that it is acceptable to send a request like:

GET //whatever/testing HTTP/1.1

This should get properly conveyed to the WSGI application, but due to
the way that urlsplit works in the standard library this did not happen
correctly. With this fix we pass through the original path as requested
by the client, and the WSGI application will be responsible for
collapsing multiple empty path segments as necessary.

Fixes #260
digitalresistor added a commit that referenced this issue Aug 27, 2019
The HTTP spec states that it is acceptable to send a request like:

GET //whatever/testing HTTP/1.1

This should get properly conveyed to the WSGI application, but due to
the way that urlsplit works in the standard library this did not happen
correctly. With this fix we pass through the original path as requested
by the client, and the WSGI application will be responsible for
collapsing multiple empty path segments as necessary.

Fixes #260
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants