Skip to content

Fix _cpreqbody.Entity to handle Content-Disposition filename* with encoding#1695

Merged
jaraco merged 10 commits into
cherrypy:masterfrom
pawciobiel:1694-fix-for-encoded-filenames
Aug 14, 2018
Merged

Fix _cpreqbody.Entity to handle Content-Disposition filename* with encoding#1695
jaraco merged 10 commits into
cherrypy:masterfrom
pawciobiel:1694-fix-for-encoded-filenames

Conversation

@pawciobiel
Copy link
Copy Markdown
Contributor

  • Fix _cpreqbody.Entity to handle Content-Disposition filename* with encoding

  • Change _cpcompat.unquote_qs to work with unicode on py2

  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

fix for #1694

  • What is the new behavior (if this is a feature change)?

File uploads with
Content-Disposition: form-data; name="myFile"; filename*=utf-8''upload_test_file_%C5%82%C3%B3%C4%85%C3%A4.txt
should work now.

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 6, 2018

Codecov Report

Merging #1695 into master will decrease coverage by 0.08%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1695      +/-   ##
==========================================
- Coverage   80.73%   80.65%   -0.09%     
==========================================
  Files         105      105              
  Lines       13570    13591      +21     
==========================================
+ Hits        10956    10962       +6     
- Misses       2614     2629      +15

@cherrypy cherrypy deleted a comment Mar 7, 2018
@cherrypy cherrypy deleted a comment Mar 7, 2018
Comment thread cherrypy/test/test_http.py Outdated
fnames = ['boop.csv', 'foo, bar.csv', 'bar, xxxx.csv', 'file"name.csv',
'file;name.csv', 'file; name.csv']
'file;name.csv', 'file; name.csv',
('test_łóąä.txt', 'utf-8')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m/b u'test_łóąä.txt'?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I wasn't sure but I think it may be better to pass str not unicode object here
and this is why I put "# coding: utf-8" at the top of the file.

@webknjaz webknjaz requested review from a team, jaraco and webknjaz March 12, 2018 22:39
Copy link
Copy Markdown
Member

@webknjaz webknjaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…encoding

* Change _cpcompat.unquote_qs to work with unicode on py2
 - fix for #1694
@cherrypy cherrypy deleted a comment Mar 13, 2018
@cherrypy cherrypy deleted a comment Mar 13, 2018
@pawciobiel
Copy link
Copy Markdown
Contributor Author

@webknjaz I've fixed pylint stuff. Could you give it another chance please?

Comment thread cherrypy/_cpreqbody.py Outdated
self.filename = self.filename[1:-1]
elif 'filename*' in disp.params:
# @see https://tools.ietf.org/html/rfc5987
fn_encoding, _, filename = disp.params['filename*'].split("'")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please keep a language part?

Comment thread cherrypy/_cpreqbody.py Outdated
elif 'filename*' in disp.params:
# @see https://tools.ietf.org/html/rfc5987
fn_encoding, _, filename = disp.params['filename*'].split("'")
self.filename = unquote_qs(filename, fn_encoding)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether we'd want to have only one filename field. I think it's appropriate to save all of them.

We could have some @property returning just one thing though.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. That seems out of scope for this change. This change only seeks to add support for non-ascii filenames. Supporting multiple filenames can be done independently.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not if it breaks access to the old field

Comment thread cherrypy/_cpcompat.py Outdated
return urllib.parse.unquote(atom_spc, encoding=encoding, errors=errors)
else:
if isinstance(atom_spc, six.text_type):
atom_spc = str(atom_spc)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this covered by tests?

Comment thread cherrypy/test/test_http.py Outdated

filename_encoding = None
if isinstance(filename, (list, tuple)):
filename, filename_encoding = filename
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's too much logic in here. Tests should be simple and keep testing old cases.

Comment thread cherrypy/test/test_http.py Outdated
'file;name.csv', 'file; name.csv']
fnames = [
'boop.csv', 'foo, bar.csv', 'bar, xxxx.csv', 'file"name.csv',
'file;name.csv', 'file; name.csv', ('test_łóąä.txt', 'utf-8')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this difference in supplied types is redundant.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the comment. The only addition is the one new filename with non-ascii characters. I don't see how that's redundant to all the other names with ascii characters.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember why I wrote it back then 🤷‍♂️

@webknjaz webknjaz requested review from a team, jaraco and webknjaz and removed request for jaraco June 18, 2018 11:28
@webknjaz
Copy link
Copy Markdown
Member

@jaraco could you take a look at this, please?

@jaraco
Copy link
Copy Markdown
Member

jaraco commented Aug 7, 2018

I apologize I haven't had time to review this yet. I still plan to do so.

@jaraco jaraco self-assigned this Aug 13, 2018
Comment thread cherrypy/_cpreqbody.py Outdated
self.filename.endswith('"')
):
self.filename = self.filename[1:-1]
elif 'filename*' in disp.params:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC says filename* should be preferred to filename if both are specified, so this should be if and not elif.

Comment thread cherrypy/_cpreqbody.py Outdated
elif 'filename*' in disp.params:
# @see https://tools.ietf.org/html/rfc5987
fn_encoding, _, filename = disp.params['filename*'].split("'")
self.filename = unquote_qs(filename, fn_encoding)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. That seems out of scope for this change. This change only seeks to add support for non-ascii filenames. Supporting multiple filenames can be done independently.

Comment thread cherrypy/_cpreqbody.py Outdated
elif 'filename*' in disp.params:
# @see https://tools.ietf.org/html/rfc5987
fn_encoding, _, filename = disp.params['filename*'].split("'")
self.filename = unquote_qs(filename, fn_encoding)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is unquote_qs the right operation here? I'm skeptical because it translates + to , which I don't see in the spec. That's also a bug with the existing _cp_compat; it's doing too much already. Let's fix that first.

Copy link
Copy Markdown
Contributor Author

@pawciobiel pawciobiel Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I can't remember exactly but I think the way I tested it on some older version was I POSTed a file with spaces and some UTF8 chars in it's name using requests and saw that requests is using quote.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requests is often not a good example of following RFCs, unfortunately.
As for + vs %20 for , I guess it depends on how client decides to encode that. I think I saw something regarding this in query params spec in some rfc.

Comment thread cherrypy/test/test_http.py Outdated
if isinstance(filename, (list, tuple)):
filename, filename_encoding = filename
if filename_encoding:
if six.PY3:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If at all possible, tests shouldn't switch Python versions. It should be possible to write universal code. Switching on Python versions adds complexity and potential pitfalls.

Comment thread cherrypy/test/test_http.py Outdated
https://github.com/cherrypy/cherrypy/issues/1146/
https://github.com/cherrypy/cherrypy/issues/1397'''
https://github.com/cherrypy/cherrypy/issues/1397
https://github.com/cherrypy/cherrypy/issues/1694'''
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would have been nice if the preceding commit had formatted without the trailing end symbol, but they didn't so now we get a messy diff. Let's not repeat that mistake and instead terminate the multiline string on another line.

Comment thread cherrypy/test/test_http.py Outdated
'file;name.csv', 'file; name.csv']
fnames = [
'boop.csv', 'foo, bar.csv', 'bar, xxxx.csv', 'file"name.csv',
'file;name.csv', 'file; name.csv', ('test_łóąä.txt', 'utf-8')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the comment. The only addition is the one new filename with non-ascii characters. I don't see how that's redundant to all the other names with ascii characters.

@jaraco
Copy link
Copy Markdown
Member

jaraco commented Aug 13, 2018

This is a great PR. I have just a few nitpicks... and one thing I wish to correct before this PR is accepted (in unquote_qs). I'll continue to work it from the PR branch.

Comment thread cherrypy/_cpcompat.py Outdated
if six.PY3:
return urllib.parse.unquote(atom_spc, encoding=encoding, errors=errors)
else:
if isinstance(atom_spc, six.text_type):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hoping this change is unnecessary, because it complicates the behavior.

@jaraco
Copy link
Copy Markdown
Member

jaraco commented Aug 13, 2018

I've updated the _cpcompat.py with the changes I wanted to see (conflicting with this PR); tomorrow, I hope to revisit this and implement the required changes herein to address the conflict and other concerns I raised above.

@cherrypy cherrypy deleted a comment Aug 14, 2018
@cherrypy cherrypy deleted a comment Aug 14, 2018
@cherrypy cherrypy deleted a comment Aug 14, 2018
@cherrypy cherrypy deleted a comment Aug 14, 2018
@cherrypy cherrypy deleted a comment Aug 14, 2018
@cherrypy cherrypy deleted a comment Aug 14, 2018
@cherrypy cherrypy deleted a comment from jaraco Aug 14, 2018
@cherrypy cherrypy deleted a comment from jaraco Aug 14, 2018
@jaraco jaraco dismissed webknjaz’s stale review August 14, 2018 12:02

Issues were addressed.

@jaraco jaraco merged commit f82ca27 into cherrypy:master Aug 14, 2018
(key, filename))

fn_key, encoded = encode_filename(filename)
tmpl = \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use braces instead of escaping EOL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants