Fix _cpreqbody.Entity to handle Content-Disposition filename* with encoding #1695
Codecov Report
@@ Coverage Diff @@
## master #1695 +/- ##
==========================================
- Coverage 80.73% 80.65% -0.09%
==========================================
Files 105 105
Lines 13570 13591 +21
==========================================
+ Hits 10956 10962 +6
- Misses 2614 2629 +15
 fnames = ['boop.csv', 'foo, bar.csv', 'bar, xxxx.csv', 'file"name.csv',
-          'file;name.csv', 'file; name.csv']
+          'file;name.csv', 'file; name.csv',
+          ('test_łóąä.txt', 'utf-8')
I wasn't sure, but I think it may be better to pass a str rather than a unicode object here, which is why I put "# coding: utf-8" at the top of the file.
webknjaz left a comment
Linters must not fail: https://travis-ci.org/cherrypy/cherrypy/jobs/349840614#L482
…encoding * Change _cpcompat.unquote_qs to work with unicode on py2 - fix for #1694
@webknjaz I've fixed the pylint issues. Could you take another look, please?
        self.filename = self.filename[1:-1]
elif 'filename*' in disp.params:
    # @see https://tools.ietf.org/html/rfc5987
    fn_encoding, _, filename = disp.params['filename*'].split("'")
Could you please keep the language part instead of discarding it into _?
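For reference, RFC 5987 defines the ext-value as `charset'language'value-chars`, where the language tag may be empty. A minimal sketch that keeps all three parts instead of discarding the middle one could look like this. `parse_ext_value` is a hypothetical helper, not CherryPy API, and accepting only UTF-8 and ISO-8859-1 is an assumption based on the two charsets RFC 5987 requires producers to use.

```python
from urllib.parse import unquote


def parse_ext_value(ext_value):
    """Split an RFC 5987 ext-value into (charset, language, decoded value).

    Hypothetical helper for illustration; not part of CherryPy.
    """
    # maxsplit=2 leaves any further quote characters in the value part;
    # fewer than two quotes makes this unpacking raise ValueError.
    charset, language, encoded = ext_value.split("'", 2)
    if charset.lower() not in ('utf-8', 'iso-8859-1'):
        raise ValueError('unsupported charset: %r' % charset)
    # unquote(), not unquote_plus(): '+' is a literal character here.
    value = unquote(encoded, encoding=charset, errors='strict')
    # An empty language tag is reported as None.
    return charset, language or None, value
```

For example, `parse_ext_value("utf-8''upload_test_file_%C5%82%C3%B3%C4%85%C3%A4.txt")` should give `('utf-8', None, 'upload_test_file_łóąä.txt')`, and the language tag is preserved whenever the client sends one.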
elif 'filename*' in disp.params:
    # @see https://tools.ietf.org/html/rfc5987
    fn_encoding, _, filename = disp.params['filename*'].split("'")
    self.filename = unquote_qs(filename, fn_encoding)
I'm not sure we want only one filename field; I think it's appropriate to store all of them.
We could add a @property that returns just one, though.
Hmm. That seems out of scope for this change. This change only seeks to add support for non-ascii filenames. Supporting multiple filenames can be done independently.
Not if it breaks access to the old field
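One possible reading of the suggestion in this thread, sketched under assumptions: keep every filename parameter that was sent and expose the old single `filename` attribute as a property, so existing readers keep working. `DispositionFilenames` and its `raw` attribute are hypothetical names, not CherryPy's `Entity` API; the property prefers `filename*` over `filename`, as RFC 6266 recommends.

```python
from urllib.parse import unquote


class DispositionFilenames:
    """Hypothetical container that keeps every filename parameter it is given."""

    KEYS = ('filename', 'filename*')

    def __init__(self, params):
        # Store the raw values of both parameters instead of collapsing them.
        self.raw = {key: params[key] for key in self.KEYS if key in params}

    @property
    def filename(self):
        """Return one preferred filename, or None if none was sent."""
        if 'filename*' in self.raw:
            charset, _language, value = self.raw['filename*'].split("'", 2)
            return unquote(value, encoding=charset)
        name = self.raw.get('filename')
        if name and len(name) >= 2 and name.startswith('"') and name.endswith('"'):
            # Same quote stripping as the existing plain-filename branch.
            name = name[1:-1]
        return name
```

Note that a read-only property still breaks code that assigns to `filename`; a setter would be needed if callers do that, which is the kind of old-field breakage raised above.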
    return urllib.parse.unquote(atom_spc, encoding=encoding, errors=errors)
else:
    if isinstance(atom_spc, six.text_type):
        atom_spc = str(atom_spc)
filename_encoding = None
if isinstance(filename, (list, tuple)):
    filename, filename_encoding = filename
There's too much logic in here. Tests should be simple and keep testing old cases.
-    'file;name.csv', 'file; name.csv']
+fnames = [
+    'boop.csv', 'foo, bar.csv', 'bar, xxxx.csv', 'file"name.csv',
+    'file;name.csv', 'file; name.csv', ('test_łóąä.txt', 'utf-8')
This difference in the supplied types is redundant.
I don't understand the comment. The only addition is the one new filename with non-ascii characters. I don't see how that's redundant to all the other names with ascii characters.
I don't remember why I wrote it back then 🤷♂️
@jaraco, could you take a look at this, please?
I apologize; I haven't had time to review this yet, but I still plan to.
        self.filename.endswith('"')
    ):
        self.filename = self.filename[1:-1]
elif 'filename*' in disp.params:
RFC 6266 says filename* should be preferred over filename when both are specified, so this should be if, not elif.
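The difference between the two control flows is easy to see in isolation. Both functions below are hypothetical, simplified stand-ins for the branch in `Entity.__init__` (quote stripping and charset validation are omitted); they only show that with `elif` a present `filename` hides `filename*`, while with a second independent `if` the later branch overrides it.

```python
from urllib.parse import unquote


def _decode_ext_value(ext_value):
    # Simplified: no charset validation, language tag ignored.
    charset, _language, value = ext_value.split("'", 2)
    return unquote(value, encoding=charset)


def filename_with_elif(params):
    filename = None
    if 'filename' in params:
        filename = params['filename']
    elif 'filename*' in params:
        # Skipped whenever 'filename' is also present.
        filename = _decode_ext_value(params['filename*'])
    return filename


def filename_with_if(params):
    filename = None
    if 'filename' in params:
        filename = params['filename']
    if 'filename*' in params:
        # Evaluated independently, so filename* overrides filename.
        filename = _decode_ext_value(params['filename*'])
    return filename


both = {'filename': 'fallback.txt', 'filename*': "utf-8''%C3%A4.txt"}
print(filename_with_elif(both))  # fallback.txt
print(filename_with_if(both))    # ä.txt
```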
elif 'filename*' in disp.params:
    # @see https://tools.ietf.org/html/rfc5987
    fn_encoding, _, filename = disp.params['filename*'].split("'")
    self.filename = unquote_qs(filename, fn_encoding)
Is unquote_qs the right operation here? I'm skeptical because it translates + to a space, which I don't see in the spec. That's also a bug in the existing _cpcompat; it's doing too much already. Let's fix that first.
I can't remember exactly, but I think I tested it on an older version by POSTing a file with spaces and some UTF-8 characters in its name using requests, and I saw that requests uses quote.
requests is often not a good example of following RFCs, unfortunately.
As for + vs %20 for a space, I guess it depends on how the client decides to encode it. I think I saw something about this in the query-parameter part of some RFC.
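The `+` question comes down to which unquoting convention applies. Python's standard library shows the difference directly: `urllib.parse.unquote` decodes only `%XX` escapes, while `unquote_plus` also maps `+` to a space, which is the `application/x-www-form-urlencoded` convention. Since `+` is a literal `attr-char` in RFC 5987, plain `unquote` matches `filename*` values; this sketch only illustrates the stdlib functions, not CherryPy's `unquote_qs`.

```python
from urllib.parse import unquote, unquote_plus

raw = 'foo+bar%20baz.txt'

# Plain percent-decoding: '+' stays literal, '%20' becomes a space.
print(unquote(raw))       # foo+bar baz.txt
# Form-urlencoded decoding: '+' also becomes a space.
print(unquote_plus(raw))  # foo bar baz.txt
```

A client that encodes spaces as `%20` round-trips through either function; only one that sends `+` for spaces depends on the form-style behaviour.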
if isinstance(filename, (list, tuple)):
    filename, filename_encoding = filename
if filename_encoding:
    if six.PY3:
If at all possible, tests shouldn't switch on Python versions. It should be possible to write universal code. Switching on Python versions adds complexity and potential pitfalls.
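As an illustration of version-agnostic test code, here is a sketch that builds the expected `filename*` value without any `six.PY3` check: iterating over a `bytearray` yields integers on both Python 2.7 and 3, so the same loop should percent-encode the UTF-8 bytes on either. `encode_rfc5987` is a hypothetical test helper; it leaves only letters, digits and `-._~` unencoded, a conservative subset of RFC 5987 `attr-char` (percent-encoding extra characters is still valid), and it assumes `name` is a text (unicode) string.

```python
def encode_rfc5987(name, charset='utf-8'):
    """Hypothetical helper: build an RFC 5987 ext-value for `name`."""
    # bytearray iteration gives ints on Python 2 and 3 alike.
    unreserved = frozenset(bytearray(
        b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
        b'abcdefghijklmnopqrstuvwxyz'
        b'0123456789-._~'
    ))
    encoded = ''.join(
        chr(byte) if byte in unreserved else '%%%02X' % byte
        for byte in bytearray(name.encode(charset))
    )
    # Empty language tag between the two quotes.
    return "%s''%s" % (charset, encoded)
```

With the filename from the PR description, `encode_rfc5987(u'upload_test_file_łóąä.txt')` should produce `utf-8''upload_test_file_%C5%82%C3%B3%C4%85%C3%A4.txt`.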
 https://github.com/cherrypy/cherrypy/issues/1146/
-https://github.com/cherrypy/cherrypy/issues/1397'''
+https://github.com/cherrypy/cherrypy/issues/1397
+https://github.com/cherrypy/cherrypy/issues/1694'''
It would have been nice if the preceding commit had put the closing quotes on their own line, but it didn't, so now we get a messy diff. Let's not repeat that mistake: terminate the multiline string on a separate line.
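The diff-noise argument can be checked mechanically with the standard library's `difflib`. The sketch below (with a hypothetical helper `changed_lines` and a trimmed-down docstring) counts added and removed lines when one issue URL is appended: with the closing `'''` glued to the last URL the old line has to be rewritten, while with the closing quotes on their own line the change is a single added line.

```python
import difflib

BASE = '    https://github.com/cherrypy/cherrypy/issues/'


def changed_lines(before, after):
    """Return (added, removed) line counts from a unified diff."""
    added = removed = 0
    for line in difflib.unified_diff(before, after, lineterm=''):
        if line.startswith(('+++', '---')):
            continue  # file header lines, not content
        if line.startswith('+'):
            added += 1
        elif line.startswith('-'):
            removed += 1
    return added, removed


# Closing quotes glued to the last URL.
glued_before = [BASE + "1397'''"]
glued_after = [BASE + '1397', BASE + "1694'''"]

# Closing quotes on their own line.
own_before = [BASE + '1397', "    '''"]
own_after = [BASE + '1397', BASE + '1694', "    '''"]

print(changed_lines(glued_before, glued_after))  # (2, 1)
print(changed_lines(own_before, own_after))      # (1, 0)
```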
This is a great PR. I have just a few nitpicks, plus one thing I want to correct (in unquote_qs) before this PR is accepted. I'll continue working on it from the PR branch.
if six.PY3:
    return urllib.parse.unquote(atom_spc, encoding=encoding, errors=errors)
else:
    if isinstance(atom_spc, six.text_type):
I'm hoping this change is unnecessary, because it complicates the behavior.
I've updated _cpcompat.py with the changes I wanted to see (which conflict with this PR). Tomorrow, I hope to revisit this and make the required changes here to resolve the conflict and address the other concerns I raised above.
(key, filename))
fn_key, encoded = encode_filename(filename)
tmpl = \
Please use parentheses instead of escaping the end of line with a backslash.
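For the template line, implicit continuation inside parentheses avoids the trailing backslash, and adjacent string literals are concatenated at compile time. A minimal sketch; `part_header` and the exact header layout are assumptions for illustration, not the test's actual template.

```python
def part_header(key, fn_key, encoded):
    """Format a hypothetical multipart part header line."""
    # The parentheses let the literal span lines without a backslash;
    # the two adjacent string literals are joined into one template.
    tmpl = (
        'Content-Disposition: form-data; name="{key}"; '
        '{fn_key}={encoded}\r\n'
    )
    return tmpl.format(key=key, fn_key=fn_key, encoded=encoded)
```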
Fix _cpreqbody.Entity to handle Content-Disposition filename* with encoding
Change _cpcompat.unquote_qs to work with unicode on py2
What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
Bug fix for #1694.
File uploads with
Content-Disposition: form-data; name="myFile"; filename*=utf-8''upload_test_file_%C5%82%C3%B3%C4%85%C3%A4.txt
should now work.
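As an independent cross-check of the example header, Python's `email` package already decodes RFC 2231 parameters, whose `charset'language'value` syntax RFC 5987 reuses. This is only a convenient way to confirm the expected decoded name, not how CherryPy parses request bodies.

```python
from email.message import Message

header = ('form-data; name="myFile"; '
          "filename*=utf-8''upload_test_file_%C5%82%C3%B3%C4%85%C3%A4.txt")

msg = Message()
msg['Content-Disposition'] = header
# get_filename() applies RFC 2231 decoding to the filename* parameter.
print(msg.get_filename())  # upload_test_file_łóąä.txt
```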