Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use urlsplit on request path #260

Closed
bertjwregeer opened this issue Aug 27, 2019 · 9 comments · Fixed by #261

Comments

@bertjwregeer
Copy link
Member

commented Aug 27, 2019

scheme, netloc, path, query, fragment = urlparse.urlsplit(uri)

This leads to things like:

>>> from urllib import parse as urlparse
>>> urlparse.urlsplit('//testing/whatever')
SplitResult(scheme='', netloc='testing', path='/whatever', query='', fragment='')

Which means we accidentally drop testing before sending it on to the WSGI application. Ask me later how I figured that out.

A request such as:

GET //testing/whatever HTTP/1.1

Is perfectly valid. Non-sensical maybe, but perfectly valid.

@mmerickel

This comment has been minimized.

Copy link
Member

commented Aug 27, 2019

That just looks like a scheme-relative url. Not sure why you’re passing the path to urlsplit instead of the whole url.

Does pep3333 say servers should collapse / in urls before sending them to the app? Is that why this issue is in waitress?

@bertjwregeer

This comment has been minimized.

Copy link
Member Author

commented Aug 27, 2019

@mmerickel there is no whole URL when parsing a GET request. You would have to re-constitute a whole URL.

However, the following is ALSO a legal GET request, and is likely why the urlsplit exists in the first place:

GET http://example.com/testing/whatever HTTP/1.0

This form is only ever used when the server is a proxy server, and it is not likely to exist in the wild. But that means both cases do need to get handled.

As for it being a scheme relative URL, it is not, the URL that was accessed by the client was:

http://127.0.0.1//testing/whatever

Which gets turned into this by curl and friends:

GET //testing/whatever HTTP/1.1
Host: 127.0.0.1
User-Agent: curl/7.54.0
Accept: */*
@bertjwregeer

This comment has been minimized.

Copy link
Member Author

commented Aug 27, 2019

waitress/waitress/parser.py

Lines 200 to 208 in 94e2311

command, uri, version = crack_first_line(first_line)
version = tostr(version)
command = tostr(command)
self.command = command
self.version = version
(self.proxy_scheme,
self.proxy_netloc,
self.path,
self.query, self.fragment) = split_uri(uri)

Specifically this is the uri get from crack_first_line(), we don't even have any information on any other headers yet.

@mmerickel

This comment has been minimized.

Copy link
Member

commented Aug 27, 2019

ok well if you have some context that says it's in the status line of a request then you know you don't support relative urls so you can normalize it right?

@mmerickel

This comment has been minimized.

Copy link
Member

commented Aug 27, 2019

I wasn't clear from your original issue that this is specifically about cracking the first line of a request.

@mmerickel

This comment has been minimized.

Copy link
Member

commented Aug 27, 2019

You have since edited it to be more clear, but I read it before the edits. :p

@bertjwregeer

This comment has been minimized.

Copy link
Member Author

commented Aug 27, 2019

It was 02:15 in the morning when I first found the issue and added this issue to the issue tracker. I realize that I had a bunch of additional context in my head that was not in the issue. Sorry for not making it more clear.

The issue here is that we crack the first line, and then pass that uri from that to urlsplit. If we are a proxy server then that makes sense, because we will get a full URL and urlsplit will do its magic correctly. If however we have just a path + query string, then urlsplit fails miserably when the path starts with two /'s because it assumes the first part is the netloc.

Basically this needs to change to:

  1. See if uri starts with "/", then split on ? and #
  2. If not the above, then use urlsplit
@mmerickel

This comment has been minimized.

Copy link
Member

commented Aug 27, 2019

Is it not enough to check specifically for the uri.startswith('//') situation and drop the first / before passing it to urlsplit?

@bertjwregeer

This comment has been minimized.

Copy link
Member Author

commented Aug 27, 2019

@mmerickel and then add it back before setting it as the path? It is valid to send //testing/whatever as the PATH_INFO to the WSGI application...

bertjwregeer added a commit that referenced this issue Aug 27, 2019
bertjwregeer added a commit that referenced this issue Aug 27, 2019
Allow requests with URI that starts with multiple slashes
The HTTP spec states that it is acceptable to send a request like:

GET //whatever/testing HTTP/1.1

This should get properly conveyed to the WSGI application, but due to
the way that urlsplit works in the standard library this did not happen
correctly. With this fix we pass through the original path as requested
by the client, and the WSGI application will be responsible for
collapsing multiple empty path segments as necessary.

Fixes #260
bertjwregeer added a commit that referenced this issue Aug 27, 2019
Allow requests with URI that starts with multiple slashes
The HTTP spec states that it is acceptable to send a request like:

GET //whatever/testing HTTP/1.1

This should get properly conveyed to the WSGI application, but due to
the way that urlsplit works in the standard library this did not happen
correctly. With this fix we pass through the original path as requested
by the client, and the WSGI application will be responsible for
collapsing multiple empty path segments as necessary.

Fixes #260
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.