WebTest not handling non-ascii HTTP headers correctly in Python 2.7 #186

karenc · 2017-08-03T15:50:42Z

In one of our tests, we are returning attached; filename=useful-inførmation-5.pdf in the HTTP headers. This is the traceback:

======================================================================              
ERROR: test_exports_wo_version (cnxarchive.tests.test_functional.IdentHashMissingVersionTestCase)
----------------------------------------------------------------------              
Traceback (most recent call last):                                                  
  File "/var/cnx/venvs/publishing/src/cnx-archive/cnxarchive/tests/test_functional.py", line 351, in test_exports_wo_version
    resp = resp.follow()                                                            
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webtest/response.py", line 102, in follow
    return self._follow(**kw)                                                       
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webtest/response.py", line 90, in _follow
    return self.test_app.get(abslocation, **kw)                                     
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webtest/app.py", line 331, in get
    expect_errors=expect_errors)                                                    
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webtest/app.py", line 625, in do_request
    res = req.get_response(app, catch_exc_info=True)                                
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webob/request.py", line 1312, in send
    application, catch_exc_info=True)                                               
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webob/request.py", line 1280, in call_application
    app_iter = application(self.environ, start_response)                            
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webtest/lint.py", line 199, in lint_app
    iterator = application(environ, start_response_wrapper)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/pyramid/router.py", line 271, in __call__
    return response(environ, start_response)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webob/response.py", line 1310, in __call__
    start_response(self.status, headerlist)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webtest/lint.py", line 189, in start_response_wrapper
    check_headers(headers)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webtest/lint.py", line 454, in check_headers
    "%r is not a valid latin1 string" % (value,)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webtest/lint.py", line 420, in _assert_latin1_str
    string.encode('latin1')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf8 in position 29: ordinal not in range(128)

----------------------------------------------------------------------

The text was updated successfully, but these errors were encountered:

digitalresistor · 2017-08-03T16:01:10Z

HTTP headers are ASCII and may not include UTF-8. You'll need to percent encode the file name before sending it out.

The HTTP protocol does not allow for non-ASCII characters:

https://stackoverflow.com/a/4410331/13986

karenc · 2017-08-03T16:07:07Z

Ok, I see. Should WebTest remove the support for latin1 HTTP headers then? The reason I am raising this issue is because the test would probably pass if it was Python 3.

gawel · 2017-08-03T16:23:45Z

From PEP3333:

 WSGI therefore defines two kinds of "string":

    "Native" strings (which are always implemented using the type named str ) that are used for request/response headers and metadata
    "Bytestrings" (which are implemented using the bytes type in Python 3, and str elsewhere), that are used for the bodies of requests and responses (e.g. POST/PUT input data and HTML page outputs).

Do not be confused however: even if Python's str type is actually Unicode "under the hood", the content of native strings must still be translatable to bytes via the Latin-1 encoding! (See the section on Unicode Issues later in this document for more details.)

karenc · 2017-08-03T17:51:32Z

In this commit 7d539b6, in particular this change:

-def _assert_latin1_py3(string, message):
-    if PY3 and type(string) is str:
-        try:
-            string.encode('latin1')
-        except UnicodeEncodeError:
-            raise AssertionError(message)
+def _assert_latin1_str(string, message):
+    assert type(string) is str, message
+    try:
+        string.encode('latin1')
+    except UnicodeEncodeError:
+        raise AssertionError(message)

The above does not work for Python2. The reason is because Python2 str and Python3 str have different meanings. Python2 str is Python3 bytes and Python2 unicode is Python3 str. The bug here is if string is a Python2 str, it is already encoded, you cannot encode it again and that is why there is an error.

To demonstrate the difference:

karen@yachiyo:/tmp/webtest$ python2
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> header = u'attached; filename=useful-inførmation-5.pdf'.encode('latin1')
>>> type(header)
<type 'str'>
>>> header.encode('latin1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf8 in position 29: ordinal not in range(128)
>>>

karen@yachiyo:/tmp/webtest$ python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> header = 'attached; filename=useful-inførmation-5.pdf'
>>> type(header)
<class 'str'>
>>> header.encode('latin1')
b'attached; filename=useful-inf\xf8rmation-5.pdf'
>>>

digitalresistor · 2017-08-03T18:06:17Z

The linter should probably catch that the header contains invalid characters. b'attached; filename=useful-inf\xf8rmation-5.pdf' is not useful to send to a client since they don't know how to interpret the \xf notation.

https://tools.ietf.org/html/rfc6266#section-4.3

For encoding the filename if you are using non ISO-8859-1 characters, you should be using the encoding from here (and provide both filename and filename*, the latter is preferred over the former but the former provides a clean "ASCII" filename:

https://tools.ietf.org/html/rfc5987

I will note that even though the RFC's talk about ISO-8859-1 being supported as a character set for HTTP responses:

https://tools.ietf.org/html/rfc7230#section-3.2.4

says:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

The webtest linter could probably do a better job at that, and in the future I may add similar checks when setting headers to WebOb so that invalid data is not allowed to be returned to an unsuspecting client.

karenc · 2017-08-03T18:11:30Z

I understand what you're saying there. You might want more discussion about whether the linter should accept non ascii headers in the future. What I am hoping for in this issue is for Python2 and Python3 to give the same result.

karenc mentioned this issue Aug 3, 2017

Pin webtest to 2.0.27 openstax/cnx-archive#510

Merged

icemac mentioned this issue Aug 4, 2017

Allow newer WebTest versions. zopefoundation/zope.testbrowser#32

Merged

gawel closed this as completed in ad31ea8 Oct 21, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WebTest not handling non-ascii HTTP headers correctly in Python 2.7 #186

WebTest not handling non-ascii HTTP headers correctly in Python 2.7 #186

karenc commented Aug 3, 2017

digitalresistor commented Aug 3, 2017

karenc commented Aug 3, 2017

gawel commented Aug 3, 2017

karenc commented Aug 3, 2017

digitalresistor commented Aug 3, 2017

karenc commented Aug 3, 2017

WebTest not handling non-ascii HTTP headers correctly in Python 2.7 #186

WebTest not handling non-ascii HTTP headers correctly in Python 2.7 #186

Comments

karenc commented Aug 3, 2017

digitalresistor commented Aug 3, 2017

karenc commented Aug 3, 2017

gawel commented Aug 3, 2017

karenc commented Aug 3, 2017

digitalresistor commented Aug 3, 2017

karenc commented Aug 3, 2017