
MultipartWriter quotes field name wrong #4012

Closed
@kohtala

Description

Long story short

My client needs to send multipart/form-data to an API that expects field names with [] in the name. The server does not accept the submission with default set_content_disposition parameters due to wrong quoting.

Expected behaviour

Content-Disposition: form-data; name="files[]"; filename="filename"

Actual behaviour

Content-Disposition: form-data; name="files%5B%5D"; filename="filename"; filename*=utf-8''filename
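The escaped name matches what generic URL percent-encoding produces for the brackets. A minimal illustration using only the standard library (this is not a trace of aiohttp's actual code path, just a way to reproduce the same string):

```python
from urllib.parse import quote

# "[" and "]" are not in the unreserved set, so percent-encoding escapes them.
encoded = quote("files[]", safe="")
print(encoded)  # files%5B%5D
```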

Steps to reproduce

Client code is like

    with aiohttp.MultipartWriter('form-data') as mpw:
        f = mpw.append(file)
        f.set_content_disposition("form-data", name="files[]", filename="filename")

    res = await self.session.post(url, data=mpw)

Your environment

aiohttp==3.5.4 async client, Ubuntu 18.04, python 3.6.8.

Analysis

RFC 7578, "Returning Values from Forms: multipart/form-data", says

In most multipart types, the MIME header fields in each part are
restricted to US-ASCII; for compatibility with those systems, file
names normally visible to users MAY be encoded using the percent-
encoding method in Section 2, following how a "file:" URI
[URI-SCHEME] might be encoded.

NOTE: The encoding method described in [RFC5987], which would add a
"filename*" parameter to the Content-Disposition header field, MUST
NOT be used.

The current implementation seems to have misread this as meaning that all field values are to be percent-encoded, but RFC 7578 is clear that percent-encoding applies only to file names. Furthermore, the filename*= form from MIME Parameter Value and Encoded Word Extensions (RFC 2231) should be used only for the other fields; since percent-encoding already brings the file name within US-ASCII, filename*= is not to be used for it.

To convert a unicode string to bytes for the percent-encoding, the user will need to specify a charset in some cases, as the RFC notes:

The encoding used for the file names is typically UTF-8, although
HTML forms will use the charset associated with the form.

Thus, in some cases, an additional charset parameter is needed in set_content_disposition. Is it needed in other functions?
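To show why the charset matters, here is a small sketch of percent-encoding a file name under two different charsets. The helper name `encode_filename` and its `charset` parameter are hypothetical, chosen only for illustration; they are not part of aiohttp.

```python
from urllib.parse import quote

def encode_filename(filename: str, charset: str = "utf-8") -> str:
    """Percent-encode a file name after converting it to bytes in `charset`.

    Hypothetical helper: the same text yields different escapes per charset.
    """
    return quote(filename, safe="", encoding=charset)

utf8_name = encode_filename("résumé.txt")
latin1_name = encode_filename("résumé.txt", charset="latin-1")
print(utf8_name)    # r%C3%A9sum%C3%A9.txt
print(latin1_name)  # r%E9sum%E9.txt
```

A server that decodes with the form's charset would only recover the original name if the client encoded with the same one.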

The RFCs refer to RFC 822 for the definition of quoted-string. RFC 822 has since been obsoleted, and the current definition is in Internet Message Format, RFC 5322:

   qtext           =   %d33 /             ; Printable US-ASCII
                       %d35-91 /          ;  characters not including
                       %d93-126 /         ;  "\" or the quote character
                       obs-qtext

   qcontent        =   qtext / quoted-pair

   quoted-string   =   [CFWS]
                       DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                       [CFWS]

   quoted-pair     =   ("\" (VCHAR / WSP)) / obs-qp

And from Augmented BNF for Syntax Specifications: ABNF

   VCHAR           =   %x21-7E            ; visible (printing) characters

   WSP             =   SP / HTAB          ; white space

The quoted-pair quoting of quoted-string is missing in the current implementation.
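As a rough sketch of what quoted-pair quoting could look like, the function below escapes only the backslash and the double quote and wraps the result in DQUOTEs. The name `quote_string` is hypothetical, and the sketch assumes the value is already US-ASCII; it does not handle control characters or folding.

```python
def quote_string(value: str) -> str:
    """Return `value` as a quoted-string, using quoted-pairs for "\\" and '"'.

    Hypothetical helper; assumes US-ASCII input and rejects anything else.
    """
    value.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    # Escape the backslash first so the escapes added for '"' are not doubled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

header = "form-data; name={}; filename={}".format(
    quote_string("files[]"), quote_string("filename")
)
print(header)  # form-data; name="files[]"; filename="filename"
```

With this kind of quoting the brackets pass through untouched, which is the header value the server in this report expects.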

There is also a rather far-fetched case of extremely long values causing the line length limit of 998 characters to be exceeded https://tools.ietf.org/html/rfc5322#section-2.1.1 and requiring using the Folding White Space (FWS).

I cannot tell whether simply changing the percent-quoting to the correct quoted-pair quoting would have any compatibility impact. Should the quote_fields parameter control the percent-encoding of the filename or the quoted-pair quoting of all fields?

The current behavior seems to be the result of the discussion in #916 to fix #903.
