Waitress errors on curl request with non ASCII in URL. #127

GrahamDumpleton · 2016-04-21T05:49:48Z

If you issue the following request with curl:

curl http://127.0.0.1:8080/a=тест

Waitress server will die with:

ERROR:waitress:uncaptured python exception, closing channel <waitress.channel.HTTPChannel connected 127.0.0.1:59045 at 0x103ad7b38> (<class 'UnicodeDecodeError'>:'ascii' codec can't decode byte 0xd1 in position 3: ordinal not in range(128) [/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/asyncore.py|read|83] [/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/asyncore.py|handle_read_event|423] [/private/tmp/py35/lib/python3.5/site-packages/waitress/channel.py|handle_read|174] [/private/tmp/py35/lib/python3.5/site-packages/waitress/channel.py|received|191] [/private/tmp/py35/lib/python3.5/site-packages/waitress/parser.py|received|102] [/private/tmp/py35/lib/python3.5/site-packages/waitress/parser.py|parse_header|206] [/private/tmp/py35/lib/python3.5/site-packages/waitress/parser.py|split_uri|254] [/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/parse.py|urlsplit|327] [/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/parse.py|_coerce_args|114] [/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/parse.py|_decode_args|98] [/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/parse.py|<genexpr>|98])
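
The last frames of the traceback are in the standard library's urllib.parse rather than in Waitress itself. A minimal sketch of what appears to be going on, assuming (as the split_uri frame suggests) that the raw request-target bytes are handed to urlsplit unchanged; the variable names are illustrative only:

```python
from urllib.parse import urlsplit

# Bytes as curl puts them on the wire for /a=тест: raw UTF-8, not percent-encoded.
raw_target = '/a=тест'.encode('utf-8')

try:
    urlsplit(raw_target)
    error = None
except UnicodeDecodeError as exc:
    # urlsplit() decodes bytes arguments as strict ASCII before parsing them.
    error = exc

print(error)
```

The first byte of "т" is 0xd1 and sits at offset 3, after "/a=", which lines up with the position reported in the log above.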

This came up in discussion:

http://bugs.python.org/issue26808

You may want to check that and related issues to see how Waitress behaves when a client sends non-ASCII.

Right now Waitress fails. Both wsgiref and Gunicorn appear to get it wrong. But mod_wsgi appears to get the desired result.

tseaver · 2016-04-21T14:33:40Z

Thanks for the report. A minor clarification: the waitress process itself doesn't die: it closes the connection without returning anything.

digitalresistor · 2016-04-21T16:36:12Z

It looks like cURL is not percent-encoding the URL and is instead sending raw UTF-8 to the server. That is not valid under the HTTP specification, which requires latin-1 for requests, so the URL has to be percent-encoded before it is sent.

rr- · 2017-02-05T23:40:25Z

Why latin-1? How do we encode 漢字 with it?

The standard seems to advocate UTF-8 rather than latin-1:

Non-ASCII characters must first be encoded according to UTF-8 [STD63], and then each octet of the corresponding UTF-8 sequence must be percent-encoded to be represented as URI characters.

https://tools.ietf.org/html/rfc3986
http://stackoverflow.com/a/913653
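
To illustrate the rule quoted from the RFC, here is a small sketch using Python's standard urllib.parse.quote, which encodes non-ASCII characters as UTF-8 and then percent-encodes each octet. The variable names are only for illustration:

```python
from urllib.parse import quote

# quote() defaults to UTF-8 and leaves '/' unescaped.
heart_path = quote('/tag/Madoka♥Magika')
kanji_path = quote('/漢字')

print(heart_path)
print(kanji_path)
```

Both results contain only ASCII characters, so they can be sent in a request line as-is.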

Percent-encoded URLs do not currently work either:

Input - GET /tag/Madoka%E2%99%A5Magika HTTP/1.0 (generated by modern web browser accessing /tag/Madoka♥Magika)
Output - /tag/Madokaâ¥Magika

digitalresistor · 2017-02-06T01:27:57Z

"percent-encoded to be represented as URI characters."

The percent-encoded form is plain ASCII, which is a subset of latin-1.

"Percent-encoded URLs do not currently work either:"

I am not sure what you mean here...

On a Pyramid application running locally on my machine (Python 3.5, waitress 1.0.1):

 curl -vvvvv http://10.10.10.205:6543/Madoka%E2%99%A5Magika
*   Trying 10.10.10.205...
* TCP_NODELAY set
* Connected to 10.10.10.205 (10.10.10.205) port 6543 (#0)
> GET /Madoka%E2%99%A5Magika HTTP/1.1
> Host: 10.10.10.205:6543
> User-Agent: curl/7.51.0
> Accept: */*
> 
< HTTP/1.1 404 Not Found
< Content-Length: 921
< Content-Type: text/html; charset=UTF-8
< Date: Mon, 06 Feb 2017 01:20:58 GMT
< Server: waitress
< 
<html>
 <head>
  <title>404 Not Found</title>
 </head>
 <body>
  <h1>404 Not Found</h1>
  The resource could not be found.<br/><br/>
debug_notfound of url http://10.10.10.205:6543/Madoka%E2%99%A5Magika; path_info: &#x27;/Madoka&#9829;Magika&#x27;, context: &lt;myapp.traversal.Root object at 0x109773b00&gt;, view_name: &#x27;Madoka&#9829;Magika&#x27;, subpath: (), traversed: (), root: &lt;myapp.traversal.Root object at 0x109773b00&gt;, vroot: &lt;myapp.traversal.Root object at 0x109773b00&gt;, vroot_path: ()


 <link rel="stylesheet" type="text/css" href="http://10.10.10.205:6543/_debug_toolbar/static/toolbar/toolbar_button.css">

<div id="pDebug">
    <div  id="pDebugToolbarHandle">
        <a title="Show Toolbar" id="pShowToolBarButton"
           href="http://10.10.10.205:6543/_debug_toolbar/34343533343036383136" target="pDebugToolbar">&#171; FIXME: Debug Toolbar</a>
    </div>
</div>
</body>
* Curl_http_done: called premature == 0
* Connection #0 to host 10.10.10.205 left intact

Same with:

alexandra:~ xistence$ curl -vvvvv "http://10.10.10.205:6543/%E6%BC%A2%E5%AD%97"
*   Trying 10.10.10.205...
* TCP_NODELAY set
* Connected to 10.10.10.205 (10.10.10.205) port 6543 (#0)
> GET /%E6%BC%A2%E5%AD%97 HTTP/1.1
> Host: 10.10.10.205:6543
> User-Agent: curl/7.51.0
> Accept: */*
> 
< HTTP/1.1 404 Not Found
< Content-Length: 912
< Content-Type: text/html; charset=UTF-8
< Date: Mon, 06 Feb 2017 01:22:05 GMT
< Server: waitress
< 
<html>
 <head>
  <title>404 Not Found</title>
 </head>
 <body>
  <h1>404 Not Found</h1>
  The resource could not be found.<br/><br/>
debug_notfound of url http://10.10.10.205:6543/%E6%BC%A2%E5%AD%97; path_info: &#x27;/&#28450;&#23383;&#x27;, context: &lt;myapp.traversal.Root object at 0x1096fcb00&gt;, view_name: &#x27;&#28450;&#23383;&#x27;, subpath: (), traversed: (), root: &lt;myapp.traversal.Root object at 0x1096fcb00&gt;, vroot: &lt;myapp.traversal.Root object at 0x1096fcb00&gt;, vroot_path: ()


 <link rel="stylesheet" type="text/css" href="http://10.10.10.205:6543/_debug_toolbar/static/toolbar/toolbar_button.css">

<div id="pDebug">
    <div  id="pDebugToolbarHandle">
        <a title="Show Toolbar" id="pShowToolBarButton"
           href="http://10.10.10.205:6543/_debug_toolbar/34343439333837333532" target="pDebugToolbar">&#171; FIXME: Debug Toolbar</a>
    </div>
</div>
</body>
* Curl_http_done: called premature == 0
* Connection #0 to host 10.10.10.205 left intact

Application output:

2017-02-05 18:19:52,118 DEBUG [myapp:106][waitress] route matched for url http://10.10.10.205:6543/Madoka%E2%99%A5Magika; route_name: 'main', path_info: '/Madoka♥Magika', pattern: '/*traverse', matchdict: {'traverse': ('Madoka♥Magika',)}, predicates: ''
2017-02-05 18:21:50,355 DEBUG [myapp:106][waitress] route matched for url http://10.10.10.205:6543/%E6%BC%A2%E5%AD%97; route_name: 'main', path_info: '/漢字', pattern: '/*traverse', matchdict: {'traverse': ('漢字',)}, predicates: ''

The issue is that cURL by default will NOT send the percent encoded request:

alexandra:~ xistence$ curl -vvv "http://10.10.10.205:6543/Madoka♥Magika"
*   Trying 10.10.10.205...
* TCP_NODELAY set
* Connected to 10.10.10.205 (10.10.10.205) port 6543 (#0)
> GET /Madoka♥Magika HTTP/1.1
> Host: 10.10.10.205:6543
> User-Agent: curl/7.51.0
> Accept: */*
> 
* Curl_http_done: called premature == 0
* Empty reply from server
* Connection #0 to host 10.10.10.205 left intact
curl: (52) Empty reply from server

Which causes waitress to close the connection:

2017-02-05 18:24:44,552 ERROR [waitress:181][MainThread] uncaptured python exception, closing channel <waitress.channel.HTTPChannel connected 10.10.10.205:50329 at 0x109591cf8> (<class 'UnicodeDecodeError'>:'ascii' codec can't decode byte 0xe2 in position 7: ordinal not in range(128) [/Users/xistence/.pyenv/versions/3.5.0/lib/python3.5/asyncore.py|read|83] [/Users/xistence/.pyenv/versions/3.5.0/lib/python3.5/asyncore.py|handle_read_event|423] [/Users/xistence/.ve/myapp/lib/python3.5/site-packages/waitress/channel.py|handle_read|174] [/Users/xistence/.ve/myapp/lib/python3.5/site-packages/waitress/channel.py|received|191] [/Users/xistence/.ve/myapp/lib/python3.5/site-packages/waitress/parser.py|received|102] [/Users/xistence/.ve/myapp/lib/python3.5/site-packages/waitress/parser.py|parse_header|208] [/Users/xistence/.ve/myapp/lib/python3.5/site-packages/waitress/parser.py|split_uri|256] [/Users/xistence/.pyenv/versions/3.5.0/lib/python3.5/urllib/parse.py|urlsplit|327] [/Users/xistence/.pyenv/versions/3.5.0/lib/python3.5/urllib/parse.py|_coerce_args|114] [/Users/xistence/.pyenv/versions/3.5.0/lib/python3.5/urllib/parse.py|_decode_args|98] [/Users/xistence/.pyenv/versions/3.5.0/lib/python3.5/urllib/parse.py|<genexpr>|98])

Waitress's behaviour here should be improved, but the request itself is technically contra-spec, because the sending entity should have percent-encoded the URL before sending it to the server.

rr- · 2017-02-06T05:59:47Z

I'm not sure how you got the above results, but the problematic behavior is demonstrated in existing unit tests:

https://github.com/Pylons/waitress/blob/1bcdeaec9fb60ba41053fcf9253d2a340af95310/waitress/tests/test_compat.py

b'/a%C5%9B'
assert '/aÅ\x9b'

whereas it "should" (should it?) be

b'/a\xc5\x9b'.decode('utf-8')
'/aś'

This weird encoding ends up being stored in env['PATH_INFO']
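
A short sketch of where the two strings above come from, using only the standard library: the same percent-decoded bytes are interpreted once as latin-1 and once as UTF-8. The variable names are illustrative.

```python
from urllib.parse import unquote_to_bytes

# Percent-decoding yields raw bytes; no character encoding is implied yet.
raw = unquote_to_bytes(b'/a%C5%9B')

as_latin1 = raw.decode('latin-1')  # one character per byte
as_utf8 = raw.decode('utf-8')      # the two bytes form a single character

print(repr(as_latin1), repr(as_utf8))
```

The latin-1 reading is what the test asserts; the UTF-8 reading is what a person reading the URL would expect.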

rr- · 2017-02-06T06:13:38Z

Example:

testapp.py

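def application(env, start_response):
    # WSGI expects a status string with a reason phrase, e.g. '200 OK'.
    start_response('200 OK', [('content-type', 'text/plain; charset=utf-8')])
    a = env['PATH_INFO']  # percent-decoded bytes exposed as a latin-1 str
    b = a.encode('latin-1').decode('utf-8')  # recover the intended UTF-8 text
    print(a, b)
    # Trailing comma: return a one-element tuple holding the response body.
    return ('%s %s' % (a, b)).encode('utf-8'),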

waitress-serve --port 1234 testapp:application

rr-@tornado:~$ curl 'localhost:1234/%E6%BC%A2%E5%AD%97'
/æ¼¢å% /漢字

The .encode('latin-1').decode('utf-8') gives the expected result but I totally get a "you're doing it wrong" vibe from it.

Edit: looks like pyramid does just that: https://github.com/Pylons/pyramid/blob/4acd85dc98fb2a43eae54d2116cc4bf383157269/pyramid/request.py#L283

In the test I see a reference to PEP 3333 https://www.python.org/dev/peps/pep-3333/#unicode-issues but the reason for latin-1 is bogus at best, even after reading whole "unicode issues" section...

digitalresistor · 2017-02-06T06:49:18Z

Actually Pyramid uses WebOb which does the right thing here: https://github.com/Pylons/webob/blob/master/webob/request.py#L321 and https://github.com/Pylons/webob/blob/master/webob/request.py#L167.

Which is similar to what Werkzeug does: https://github.com/pallets/werkzeug/blob/109dad4ac9e0a1690666b2d4f29d07d98a3701d9/werkzeug/wsgi.py#L233

That being said, the encode/decode spiel is indeed correct.

Based upon the comments in the above bug reports linked by @GrahamDumpleton, it is expected that the PATH_INFO contains the percent decoded URL in latin-1. Changing this would be against the WSGI spec.

The only way that waitress would fix this issue is for it to accept the raw UTF-8 bytes, decode them as latin-1, and put the result in PATH_INFO, and you would still have to do the encode/decode dance in your application.
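
For reference, the dance can be wrapped in a pair of small helpers. This is only a sketch: path_info_to_text and text_to_path_info are hypothetical names, not part of waitress, WebOb, or the WSGI spec, and UTF-8 is assumed as the client's encoding.

```python
def text_to_path_info(text, encoding='utf-8'):
    """Hypothetical: build the latin-1 'native string' form of a path."""
    return text.encode(encoding).decode('latin-1')


def path_info_to_text(environ, encoding='utf-8'):
    """Hypothetical: undo the latin-1 transport decoding of PATH_INFO."""
    return environ.get('PATH_INFO', '').encode('latin-1').decode(encoding)


# Simulate what a server would place in the environ for /%E6%BC%A2%E5%AD%97.
environ = {'PATH_INFO': text_to_path_info('/漢字')}
recovered = path_info_to_text(environ)
print(repr(environ['PATH_INFO']), recovered)
```

The round trip is lossless because latin-1 maps every byte value to exactly one code point.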

rr- · 2017-02-06T06:56:00Z

Thanks for the confirmation, wish I had known sooner about that encoding gotcha (or at least thought about going to look for it in the WSGI ref.)

Regarding the OP's issue, I think curl is at fault for not encoding the URLs as the RFC linked earlier says to, and trying to parse such URLs seems like asking for trouble. For example, what if the user issues the curl command in a console with a non-Unicode locale?

digitalresistor · 2017-02-06T07:00:44Z

I agree with cURL being at fault. Trying UTF-8 and falling back to latin-1 might make sense. The other fix I am thinking about is having it actually return a 400 Bad Request instead of just closing the connection. Slamming the door in someone's face is not my idea of a good web citizen.
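
The two options mentioned here could look roughly like the following. This is a sketch under assumptions, not the change that eventually landed in waitress: decode_request_target and reject_non_ascii are hypothetical names, and the status strings are plain illustrations.

```python
def decode_request_target(raw):
    """Hypothetical: try UTF-8 first, then fall back to latin-1."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # latin-1 accepts every byte value, so this branch cannot fail.
        return raw.decode('latin-1')


def reject_non_ascii(raw):
    """Hypothetical: pick a status instead of dropping the connection."""
    if any(byte >= 0x80 for byte in raw):
        return '400 Bad Request'
    return '200 OK'


utf8_target = '/Madoka♥Magika'.encode('utf-8')
latin1_target = b'/caf\xe9'  # not valid UTF-8

print(decode_request_target(utf8_target))
print(decode_request_target(latin1_target))
print(reject_non_ascii(utf8_target))
```

The fallback is a guess by nature: some latin-1 byte sequences are also valid UTF-8, so the first branch can misread them. The 400 response avoids guessing altogether.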

mmerickel · 2017-03-23T20:34:40Z

This issue is the same as #64.

digitalresistor · 2017-08-16T05:32:39Z

Fixed by #162

rr- mentioned this issue Feb 5, 2017

★ character in tag breaks tag list rr-/szurubooru#121

Closed

digitalresistor closed this as completed Aug 16, 2017
