Bad Request when sending utf-8 encoded http path under python3 #1577
This is the sent data:
This is the response
This happens because the request line is first decoded as latin1 by _compat:bytes_to_str, which turns the UTF-8 encoded "à" into "\xc3\xa0". The request line is then split with line.split(None, 2), which treats \xa0 (non-breaking space) as whitespace and strips it, rendering the request line invalid.
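The behaviour can be reproduced outside gunicorn with a few lines of Python 3. This is only a minimal sketch: the request line and the helper name `split_request_line_as_str` are made up for illustration and are not the data from this report or gunicorn's actual code.

```python
def split_request_line_as_str(raw: bytes):
    """Mimic the problematic path: decode as latin1, then split the str."""
    line = raw.decode("latin1")
    # str.split(None, ...) splits on any Unicode whitespace, including \xa0.
    return line.split(None, 2)


# Hypothetical request line; "à" is encoded in UTF-8 as the bytes C3 A0.
raw = "GET /à/b HTTP/1.1".encode("utf-8")
parts = split_request_line_as_str(raw)

# The path is cut at \xa0 and the rest leaks into the version field.
print(parts)
```

The third field is no longer a valid HTTP version, so a parser that validates it would reject the request.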
A first attempt would be to use line.split(' ', 2), but then the split no longer collapses consecutive whitespace, which may introduce other bugs.
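To illustrate the concern with an explicit separator, here is a small sketch using a made-up request line that contains two spaces after the method:

```python
# Hypothetical request line with two consecutive spaces after the method.
line = "GET  /x HTTP/1.1"

# With an explicit separator, consecutive spaces produce empty fields.
explicit = line.split(" ", 2)

# With sep=None, runs of whitespace are collapsed into a single separator.
collapsed = line.split(None, 2)

print(explicit)
print(collapsed)
```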
I'm not sure what would be the best solution here.
Note that python2 is unaffected because bytes_to_str is a no-op in this case:
Hmm, shouldn't the start line be encoded in US-ASCII, though?
This section describes exactly the problem gunicorn is having in this case:
In any case the parsing of the request line should be done on bytes.
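A rough sketch of what parsing on bytes could look like follows. It is not gunicorn's actual fix: the function name `parse_request_line` and the sample request line are assumptions for illustration. It relies on the fact that bytes.split(None, ...) only splits on ASCII whitespace, so the \xa0 byte stays inside the path.

```python
def parse_request_line(raw: bytes):
    """Split the request line while it is still bytes, then decode each part."""
    # bytes.split(None, ...) splits on ASCII whitespace only, so the byte
    # 0xA0 is not treated as a separator.
    parts = raw.split(None, 2)
    if len(parts) != 3:
        raise ValueError("invalid request line: %r" % (raw,))
    # Decode after splitting; latin1 maps every byte to one character.
    method, target, version = (p.decode("latin1") for p in parts)
    return method, target, version


# Hypothetical request line with a UTF-8 encoded path.
raw = "GET /à/b HTTP/1.1".encode("utf-8")
method, target, version = parse_request_line(raw)

# The application can recover the original path if it expects UTF-8.
path = target.encode("latin1").decode("utf-8")
print(method, path, version)
```

Decoding as latin1 after the split keeps the bytes intact, so the application can decide how to interpret the path.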
It's debatable whether the request line should be ASCII-only. The spec is a bit unclear. However, there are clients in the wild that send such request lines, and I'd rather pass them through to the application, which can do further processing to handle them properly.