-
Notifications
You must be signed in to change notification settings - Fork 7
Description
To whom it may concern,
If you post a request to w4py3 with a binary content type of application/octet-stream or, say, application/x-7z-compressed, w4py3 will always try to decode the binary payload, usually resulting in a traceback like this:
Traceback (most recent call last):
File "c:\ProgramData\conda\envs\myenv\lib\site-packages\webware\HTTPRequest.py", line 41, in __init__
self._fields = FieldStorage.FieldStorage(
File "c:\ProgramData\conda\envs\myenv\lib\site-packages\webware\WebUtils\FieldStorage.py", line 212, in __init__
self.read_single()
File "c:\ProgramData\conda\envs\myenv\lib\site-packages\webware\WebUtils\FieldStorage.py", line 409, in read_single
self.read_binary()
File "c:\ProgramData\conda\envs\myenv\lib\site-packages\webware\WebUtils\FieldStorage.py", line 434, in read_binary
self.file.write(data.decode())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 2: invalid start byte
This is because the FieldStorage code in w4py3, lifted from the deprecated cgi.py module, does an internal test on "whether or not the storage should be a binary file, viz:
if self._binary_file:
self.file.write(data)
else: # fix for issue 27777
self.file.write(data.decode()) # <----- the problem line
However, unexpectedly, this self._binary_file flag is not set based on the Content type, but rather if and only if there is a Content-disposition header, and then only if a filename attribute in the options dictionary for that header (even though, as far as I can tell, that filename is otherwise ignored, viz:
cdisp, pdict = '', {}
if 'content-disposition' in self.headers:
cdisp, pdict = parse_header(self.headers['content-disposition'])
fslog.write(f'cdisp and pdict found in headers: {cdisp}, {pdict}\n')
self.disposition = cdisp
self.disposition_options = pdict
self.name = None
if 'name' in pdict:
self.name = pdict['name']
self.filename = None
if 'filename' in pdict:
self.filename = pdict['filename']
self._binary_file = self.filename is not None
(I do not understand the reasoning behind this logic, as this header is normally associated with responses, rather than requests. But I digress...)
And, unfortunately, the HTTPRequest class does not pass headers into the FieldStorage constructor and the FieldStorage constructor does not parse this header out of the passed environment to create its own internal headers dictionary, viz:
if headers is None:
headers = {}
if method == 'POST':
headers['content-type'] = 'application/x-www-form-urlencoded'
if 'CONTENT_TYPE' in environ:
headers['content-type'] = environ['CONTENT_TYPE']
if 'QUERY_STRING' in environ:
self.qs_on_post = environ['QUERY_STRING']
if 'CONTENT_LENGTH' in environ:
headers['content-length'] = environ['CONTENT_LENGTH']
else:
if not (isinstance(headers, (Mapping, Message))):
raise TypeError(
'headers must be mapping or an instance of'
' email.message.Message')
self.headers = headers
Because of the above logic, the _binary_file flag is therefore never set to true, and the upshot of this is that you can't post binary data to a w4py3 servlet like you could in w4py2, because read_binary() now always try to run data.decode() on it, which results in the above traceback for non-unicode POST payloads.
I'm not sure what the right fix here is, but my gut tells me that "binary" test should be based on the content-type, rather than the presence of the filename attribute in the content-disposition header; am I missing something?
One other thing: the exception handler for the FieldStorage constructor still assumes that the cgi package is importable:
# Protect the loading of fields with an exception handler,
# because bad headers sometimes can break the field storage
# (see also https://bugs.python.org/issue27777).
try:
self._fields = FieldStorage.FieldStorage(
self._input, environ=self._environ,
keep_blank_values=True, strict_parsing=False)
except Exception:
self._fields = cgi.FieldStorage(keep_blank_values=True)
traceback.print_exc(file=sys.stderr)
But, given that original cgi module is already deprecated and scheduled to go away in python 3.13, is there a plan to replace all of this the FieldStorage logic with the recommended (alas, non-core) "multipart" package?