Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebTest not handling non-ascii HTTP headers correctly in Python 2.7 #186

Closed
karenc opened this issue Aug 3, 2017 · 6 comments
Closed

WebTest not handling non-ascii HTTP headers correctly in Python 2.7 #186

karenc opened this issue Aug 3, 2017 · 6 comments

Comments

@karenc
Copy link

karenc commented Aug 3, 2017

In one of our tests, we are returning attached; filename=useful-inførmation-5.pdf in the HTTP headers. This is the traceback:

======================================================================              
ERROR: test_exports_wo_version (cnxarchive.tests.test_functional.IdentHashMissingVersionTestCase)
----------------------------------------------------------------------              
Traceback (most recent call last):                                                  
  File "/var/cnx/venvs/publishing/src/cnx-archive/cnxarchive/tests/test_functional.py", line 351, in test_exports_wo_version
    resp = resp.follow()                                                            
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webtest/response.py", line 102, in follow
    return self._follow(**kw)                                                       
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webtest/response.py", line 90, in _follow
    return self.test_app.get(abslocation, **kw)                                     
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webtest/app.py", line 331, in get
    expect_errors=expect_errors)                                                    
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webtest/app.py", line 625, in do_request
    res = req.get_response(app, catch_exc_info=True)                                
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webob/request.py", line 1312, in send
    application, catch_exc_info=True)                                               
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webob/request.py", line 1280, in call_application
    app_iter = application(self.environ, start_response)                            
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webtest/lint.py", line 199, in lint_app
    iterator = application(environ, start_response_wrapper)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/pyramid/router.py", line 271, in __call__
    return response(environ, start_response)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webob/response.py", line 1310, in __call__
    start_response(self.status, headerlist)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webtest/lint.py", line 189, in start_response_wrapper
    check_headers(headers)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webtest/lint.py", line 454, in check_headers
    "%r is not a valid latin1 string" % (value,)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/webtest/lint.py", line 420, in _assert_latin1_str
    string.encode('latin1')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf8 in position 29: ordinal not in range(128)

----------------------------------------------------------------------
@digitalresistor
Copy link
Member

HTTP headers are ASCII and may not include UTF-8. You'll need to percent encode the file name before sending it out.

The HTTP protocol does not allow for non-ASCII characters:

https://stackoverflow.com/a/4410331/13986

@karenc
Copy link
Author

karenc commented Aug 3, 2017

Ok, I see. Should WebTest remove the support for latin1 HTTP headers then? The reason I am raising this issue is because the test would probably pass if it was Python 3.

@gawel
Copy link
Member

gawel commented Aug 3, 2017

From PEP3333:

 WSGI therefore defines two kinds of "string":

    "Native" strings (which are always implemented using the type named str ) that are used for request/response headers and metadata
    "Bytestrings" (which are implemented using the bytes type in Python 3, and str elsewhere), that are used for the bodies of requests and responses (e.g. POST/PUT input data and HTML page outputs).

Do not be confused however: even if Python's str type is actually Unicode "under the hood", the content of native strings must still be translatable to bytes via the Latin-1 encoding! (See the section on Unicode Issues later in this document for more details.) 

@karenc
Copy link
Author

karenc commented Aug 3, 2017

In this commit 7d539b6, in particular this change:

-def _assert_latin1_py3(string, message):
-    if PY3 and type(string) is str:
-        try:
-            string.encode('latin1')
-        except UnicodeEncodeError:
-            raise AssertionError(message)
+def _assert_latin1_str(string, message):
+    assert type(string) is str, message
+    try:
+        string.encode('latin1')
+    except UnicodeEncodeError:
+        raise AssertionError(message)

The above does not work for Python2. The reason is because Python2 str and Python3 str have different meanings. Python2 str is Python3 bytes and Python2 unicode is Python3 str. The bug here is if string is a Python2 str, it is already encoded, you cannot encode it again and that is why there is an error.

To demonstrate the difference:

karen@yachiyo:/tmp/webtest$ python2
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> header = u'attached; filename=useful-inførmation-5.pdf'.encode('latin1')
>>> type(header)
<type 'str'>
>>> header.encode('latin1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf8 in position 29: ordinal not in range(128)
>>> 
karen@yachiyo:/tmp/webtest$ python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> header = 'attached; filename=useful-inførmation-5.pdf'
>>> type(header)
<class 'str'>
>>> header.encode('latin1')
b'attached; filename=useful-inf\xf8rmation-5.pdf'
>>> 

@digitalresistor
Copy link
Member

The linter should probably catch that the header contains invalid characters. b'attached; filename=useful-inf\xf8rmation-5.pdf' is not useful to send to a client since they don't know how to interpret the \xf notation.

https://tools.ietf.org/html/rfc6266#section-4.3

For encoding the filename if you are using non ISO-8859-1 characters, you should be using the encoding from here (and provide both filename and filename*, the latter is preferred over the former but the former provides a clean "ASCII" filename:

https://tools.ietf.org/html/rfc5987

I will note that even though the RFC's talk about ISO-8859-1 being supported as a character set for HTTP responses:

https://tools.ietf.org/html/rfc7230#section-3.2.4

says:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

The webtest linter could probably do a better job at that, and in the future I may add similar checks when setting headers to WebOb so that invalid data is not allowed to be returned to an unsuspecting client.

@karenc
Copy link
Author

karenc commented Aug 3, 2017

I understand what you're saying there. You might want more discussion about whether the linter should accept non ascii headers in the future. What I am hoping for in this issue is for Python2 and Python3 to give the same result.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants