Description
Long story short
My client needs to send multipart/form-data to an API that expects field names with [] in the name. The server does not accept the submission with default set_content_disposition parameters due to wrong quoting.
Expected behaviour
Content-Disposition: form-data; name="files[]"; filename="filename"
Actual behaviour
Content-Disposition: form-data; name="files%5B%5D"; filename="filename"; filename*=utf-8''filename
Steps to reproduce
Client code is like
with aiohttp.MultipartWriter('form-data') as mpw:
    f = mpw.append(file)
    f.set_content_disposition("form-data", name="files[]", filename="filename")
    res = await self.session.post(url, data=mpw)
Your environment
aiohttp==3.5.4 async client, Ubuntu 18.04, python 3.6.8.
Analysis
RFC7578, Returning Values from Forms: multipart/form-data, says:
In most multipart types, the MIME header fields in each part are
restricted to US-ASCII; for compatibility with those systems, file
names normally visible to users MAY be encoded using the percent-
encoding method in Section 2, following how a "file:" URI
[URI-SCHEME] might be encoded.

NOTE: The encoding method described in [RFC5987], which would add a
"filename*" parameter to the Content-Disposition header field, MUST
NOT be used.
It would seem the current implementation misinterpreted this to mean that all field values are to be percent-encoded. But RFC7578 is clear that the encoding is only to be used on file names. Furthermore, the filename*= form from MIME Parameter Value and Encoded Word Extensions should be used only for the other fields; since the file name is already reduced to US-ASCII by percent-encoding, filename*= is not to be used on it.
To convert a Unicode string to bytes for the percent-encoding, the user will need to specify the charset in some cases, as the RFC says:
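As a rough illustration of that reading of the RFC, here is a minimal stdlib-only sketch. The helper name `build_form_data_disposition` is hypothetical and is not part of aiohttp; it assumes a field name made of printable ASCII with no `"` or `\`, and a UTF-8 file name.

```python
from urllib.parse import quote


def build_form_data_disposition(name: str, filename: str) -> str:
    # Hypothetical helper, not an aiohttp API.
    # Assumption: the field name is printable ASCII and contains neither
    # a double quote nor a backslash, so it can be emitted verbatim.
    if any(ord(ch) < 32 or ord(ch) > 126 or ch in '"\\' for ch in name):
        raise ValueError("field name needs escaping, not covered by this sketch")
    # Only the file name is percent-encoded (UTF-8 assumed here);
    # no filename* parameter is added.
    encoded_filename = quote(filename, safe="")
    return f'form-data; name="{name}"; filename="{encoded_filename}"'


print("Content-Disposition:", build_form_data_disposition("files[]", "filename"))
```

With these inputs the field name keeps its brackets, which is the header the server in question expects.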
The encoding used for the file names is typically UTF-8, although
HTML forms will use the charset associated with the form.
Thus, in some cases, an additional charset parameter is needed in set_content_disposition. Is it needed in other functions?
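To show why the charset matters, here is a small sketch using only `urllib.parse.quote`. The helper `percent_encode_filename` and its `charset` parameter are hypothetical and only illustrate what such a parameter might do.

```python
from urllib.parse import quote


def percent_encode_filename(filename: str, charset: str = "utf-8") -> str:
    # Hypothetical helper: quote() first encodes the string to bytes with
    # the given charset, then percent-encodes every byte that is not an
    # unreserved character.
    return quote(filename, safe="", encoding=charset)


name = "\u00e4.txt"  # "a with diaeresis" followed by ".txt"
print(percent_encode_filename(name))             # %C3%A4.txt
print(percent_encode_filename(name, "latin-1"))  # %E4.txt
```

The same file name yields different header bytes depending on the charset, so a server expecting the form's charset would decode a UTF-8 encoded name incorrectly.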
The RFCs refer to RFC822 for the quoted-string definition, which is now obsoleted by Internet Message Format (RFC5322):
    qtext           =   %d33 /             ; Printable US-ASCII
                        %d35-91 /          ;  characters not including
                        %d93-126 /         ;  "\" or the quote character
                        obs-qtext

    qcontent        =   qtext / quoted-pair

    quoted-string   =   [CFWS]
                        DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                        [CFWS]

    quoted-pair     =   ("\" (VCHAR / WSP)) / obs-qp
And from Augmented BNF for Syntax Specifications: ABNF
    VCHAR          =  %x21-7E
                           ; visible (printing) characters
    WSP            =  SP / HTAB
                           ; white space
The quoted-pair quoting of quoted-string is missing in the current implementation.
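A minimal sketch of what quoted-pair quoting could look like, based on the grammar above. The function `to_quoted_string` is hypothetical, handles only VCHAR and WSP input, and ignores the obsolete forms (obs-qtext, obs-qp) and folding.

```python
def to_quoted_string(value: str) -> str:
    # Hypothetical helper: wrap the value in DQUOTE and escape the two
    # characters excluded from qtext (double quote and backslash) as
    # quoted-pairs, i.e. a backslash followed by the character.
    out = []
    for ch in value:
        is_vchar = "\x21" <= ch <= "\x7e"
        is_wsp = ch in " \t"
        if not (is_vchar or is_wsp):
            # Non-ASCII and control characters are outside this sketch.
            raise ValueError(f"cannot represent {ch!r} in a quoted-string")
        if ch in '"\\':
            out.append("\\")
        out.append(ch)
    return '"' + "".join(out) + '"'


print(to_quoted_string("files[]"))  # "files[]"
print(to_quoted_string('a"b'))      # "a\"b"
```

Brackets are plain qtext (%d91 and %d93), so `files[]` passes through unchanged.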
There is also a rather far-fetched case of extremely long values exceeding the line length limit of 998 characters (https://tools.ietf.org/html/rfc5322#section-2.1.1), which would require Folding White Space (FWS).
I cannot tell whether simply changing the percent-quoting to the correct quoted-pair quoting would have any compatibility impact. Should the quote_fields parameter control the percent-encoding of the filename or the quoted-pair quoting of all fields?
The current behavior seems to be the result of the discussion in #916 to fix #903.