Fix _cpreqbody.Entity to handle Content-Disposition filename* with encoding #1695

Merged
merged 10 commits into from Aug 14, 2018

Conversation

@pawciobiel
Contributor

@pawciobiel pawciobiel commented Mar 1, 2018

  • Fix _cpreqbody.Entity to handle Content-Disposition filename* with encoding

  • Change _cpcompat.unquote_qs to work with unicode on py2

  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

fix for #1694

  • What is the new behavior (if this is a feature change)?

File uploads with
Content-Disposition: form-data; name="myFile"; filename*=utf-8''upload_test_file_%C5%82%C3%B3%C4%85%C3%A4.txt
should work now.
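
For readers unfamiliar with the `filename*` form: the value is a charset, an optional language tag and a percent-encoded name, separated by single quotes (RFC 5987). A minimal Python 3 sketch of decoding the header value above with the standard library only; the helper name `decode_ext_value` is hypothetical and is not CherryPy's API:

```python
from urllib.parse import unquote


def decode_ext_value(ext_value):
    """Decode an RFC 5987 style value such as utf-8''name%C5%82.txt.

    Hypothetical helper for illustration only; not part of CherryPy.
    """
    # Parts are: charset, optional language, percent-encoded value.
    charset, _language, encoded = ext_value.split("'", 2)
    # Percent-decode the value using the charset named in the header.
    return unquote(encoded, encoding=charset)


raw = "utf-8''upload_test_file_%C5%82%C3%B3%C4%85%C3%A4.txt"
print(decode_ext_value(raw))
```

With the value from this PR description, the decoded name ends in the four non-ASCII letters `łóąä` followed by `.txt`.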

@pawciobiel pawciobiel force-pushed the pawciobiel:1694-fix-for-encoded-filenames branch from 6e6d510 to 12b9183 Mar 6, 2018
@codecov

@codecov codecov bot commented Mar 6, 2018

Codecov Report

Merging #1695 into master will decrease coverage by 0.08%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1695      +/-   ##
==========================================
- Coverage   80.73%   80.65%   -0.09%     
==========================================
  Files         105      105              
  Lines       13570    13591      +21     
==========================================
+ Hits        10956    10962       +6     
- Misses       2614     2629      +15
@cherrypy cherrypy deleted a comment Mar 7, 2018
@cherrypy cherrypy deleted a comment Mar 7, 2018
 # We'll upload a bunch of files with differing names.
 fnames = ['boop.csv', 'foo, bar.csv', 'bar, xxxx.csv', 'file"name.csv',
-'file;name.csv', 'file; name.csv']
+'file;name.csv', 'file; name.csv',
+('test_łóąä.txt', 'utf-8')

@webknjaz

webknjaz Mar 12, 2018
Member

Maybe u'test_łóąä.txt'?

@pawciobiel

pawciobiel Mar 13, 2018
Author Contributor

I wasn't sure, but I think it may be better to pass a str rather than a unicode object here,
which is why I put "# coding: utf-8" at the top of the file.

@webknjaz webknjaz requested review from jaraco, webknjaz and cherrypy/contributors Mar 12, 2018
Member

@webknjaz webknjaz left a comment

…encoding

* Change _cpcompat.unquote_qs to work with unicode on py2
 - fix for #1694
@pawciobiel pawciobiel force-pushed the pawciobiel:1694-fix-for-encoded-filenames branch from 12b9183 to af2ffdf Mar 13, 2018
@cherrypy cherrypy deleted a comment Mar 13, 2018
@cherrypy cherrypy deleted a comment Mar 13, 2018
@pawciobiel
Contributor Author

@pawciobiel pawciobiel commented Apr 6, 2018

@webknjaz I've fixed pylint stuff. Could you give it another chance please?

@jaraco jaraco force-pushed the cherrypy:master branch from b7344b6 to 4ce8547 May 28, 2018
@@ -470,6 +470,10 @@ def __init__(self, fp, headers, params=None, parts=None):
self.filename.endswith('"')
):
self.filename = self.filename[1:-1]
elif 'filename*' in disp.params:
# @see https://tools.ietf.org/html/rfc5987
fn_encoding, _, filename = disp.params['filename*'].split("'")

@webknjaz

webknjaz Jun 18, 2018
Member

Could you please keep the language part?
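
One way to keep the language part instead of discarding it is to return all three fields. This is only a sketch under assumptions: `ExtValue` and `parse_ext_value` are hypothetical names, and the sample header value is made up for illustration.

```python
from collections import namedtuple
from urllib.parse import unquote

# Hypothetical container; not a CherryPy type.
ExtValue = namedtuple('ExtValue', 'charset language value')


def parse_ext_value(raw):
    """Split charset'language'value and keep every part."""
    charset, language, encoded = raw.split("'", 2)
    return ExtValue(
        charset,
        language or None,  # the language tag is optional and often empty
        unquote(encoded, encoding=charset),
    )


parsed = parse_ext_value("UTF-8'en'%E2%82%AC%20rates.txt")
print(parsed.charset, parsed.language, parsed.value)
```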

elif 'filename*' in disp.params:
# @see https://tools.ietf.org/html/rfc5987
fn_encoding, _, filename = disp.params['filename*'].split("'")
self.filename = unquote_qs(filename, fn_encoding)

@webknjaz

webknjaz Jun 18, 2018
Member

I'm not sure whether we'd want to have only one filename field. I think it's appropriate to save all of them.

We could have some @property returning just one thing though.
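
A rough sketch of that idea, storing both values and exposing a single one through a property. The class and attribute names (`Part`, `plain_filename`, `extended_filename`) are hypothetical and do not reflect `_cpreqbody.Entity`:

```python
class Part:
    """Hypothetical stand-in for a request-body part; not CherryPy's Entity."""

    def __init__(self, plain_filename=None, extended_filename=None):
        # Value of the `filename` parameter, if any.
        self.plain_filename = plain_filename
        # Decoded value of the `filename*` parameter, if any.
        self.extended_filename = extended_filename

    @property
    def filename(self):
        """Return one name while both originals stay accessible."""
        if self.extended_filename is not None:
            return self.extended_filename
        return self.plain_filename


part = Part(plain_filename='fallback.txt', extended_filename='zażółć.txt')
print(part.filename, part.plain_filename)
```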

@jaraco

jaraco Aug 13, 2018
Member

Hmm. That seems out of scope for this change. This change only seeks to add support for non-ascii filenames. Supporting multiple filenames can be done independently.

@webknjaz

webknjaz Aug 16, 2018
Member

Not if it breaks access to the old field

return urllib.parse.unquote(atom_spc, encoding=encoding, errors=errors)
else:
if isinstance(atom_spc, six.text_type):
atom_spc = str(atom_spc)

@webknjaz

webknjaz Jun 18, 2018
Member

Is this covered by tests?


filename_encoding = None
if isinstance(filename, (list, tuple)):
filename, filename_encoding = filename

@webknjaz

webknjaz Jun 18, 2018
Member

There's too much logic in here. Tests should be simple and keep testing old cases.

-'file;name.csv', 'file; name.csv']
+fnames = [
+'boop.csv', 'foo, bar.csv', 'bar, xxxx.csv', 'file"name.csv',
+'file;name.csv', 'file; name.csv', ('test_łóąä.txt', 'utf-8')

@webknjaz

webknjaz Jun 18, 2018
Member

this difference in supplied types is redundant.

@jaraco

jaraco Aug 13, 2018
Member

I don't understand the comment. The only addition is the one new filename with non-ascii characters. I don't see how that's redundant to all the other names with ascii characters.

@webknjaz

webknjaz Aug 16, 2018
Member

I don't remember why I wrote it back then 🤷‍♂️

@webknjaz webknjaz requested review from jaraco, webknjaz and cherrypy/contributors and removed request for jaraco Jun 18, 2018
@webknjaz
Member

@webknjaz webknjaz commented Jun 18, 2018

@jaraco could you take a look at this, please?

@jaraco
Member

@jaraco jaraco commented Aug 7, 2018

I apologize; I haven't had time to review this yet. I still plan to do so.

@jaraco jaraco self-assigned this Aug 13, 2018
@@ -470,6 +470,10 @@ def __init__(self, fp, headers, params=None, parts=None):
self.filename.endswith('"')
):
self.filename = self.filename[1:-1]
elif 'filename*' in disp.params:

@jaraco

jaraco Aug 13, 2018
Member

RFC says filename* should be preferred to filename if both are specified, so this should be if and not elif.
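
A sketch of the control flow being asked for, using a plain dict in place of `disp.params`. The function name `pick_filename` is hypothetical, and `urllib.parse.unquote` is used here as an assumption rather than the PR's `unquote_qs`:

```python
from urllib.parse import unquote


def pick_filename(params):
    """Return the filename, letting filename* win when both are present."""
    filename = None
    if 'filename' in params:
        filename = params['filename']
        # Strip surrounding double quotes, as the existing branch does.
        if filename.startswith('"') and filename.endswith('"'):
            filename = filename[1:-1]
    # A separate `if` (not `elif`) so that filename* overrides filename.
    if 'filename*' in params:
        charset, _lang, encoded = params['filename*'].split("'", 2)
        filename = unquote(encoded, encoding=charset)
    return filename


both = {'filename': '"fallback.txt"', 'filename*': "utf-8''%C5%82.txt"}
print(pick_filename(both))
```

With `elif`, the second branch would never run for `both`, and the ASCII fallback would be returned instead.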

elif 'filename*' in disp.params:
# @see https://tools.ietf.org/html/rfc5987
fn_encoding, _, filename = disp.params['filename*'].split("'")
self.filename = unquote_qs(filename, fn_encoding)

@jaraco

jaraco Aug 13, 2018
Member

Is unquote_qs the right operation here? I'm skeptical because it translates + to a space, which I don't see in the spec. That's also a bug with the existing _cpcompat; it's doing too much already. Let's fix that first.
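
The same distinction exists in the Python 3 standard library, which makes the concern easy to see. This sketch uses only `urllib.parse` and makes no claim about how `unquote_qs` is implemented:

```python
from urllib.parse import unquote, unquote_plus

encoded = 'a+b%20c.txt'

# Plain percent-decoding leaves a literal plus sign alone.
print(unquote(encoded))       # a+b c.txt

# Form/query-string decoding additionally turns '+' into a space.
print(unquote_plus(encoded))  # a b c.txt
```

A filename that really contains a plus sign would be altered by the second variant.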

@pawciobiel

pawciobiel Aug 16, 2018
Author Contributor

I can't remember exactly, but I think the way I tested it on some older version was that I POSTed a file with spaces and some UTF-8 chars in its name using requests and saw that requests uses quote.

@webknjaz

webknjaz Aug 16, 2018
Member

requests is often not a good example of following RFCs, unfortunately.
As for + vs %20 for a space, I guess it depends on how the client decides to encode it. I think I saw something about this in the query params spec in some RFC.
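
On the encoding side, the two conventions look like this in the Python 3 standard library. The sample filename is made up, and nothing here is claimed about what requests actually sends:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

name = 'my file+v2.txt'

percent_style = quote(name)    # space becomes %20, '+' becomes %2B
form_style = quote_plus(name)  # space becomes '+', '+' becomes %2B
print(percent_style)
print(form_style)

# Each style round-trips only with its matching decoder.
assert unquote(percent_style) == name
assert unquote_plus(form_style) == name
assert unquote(form_style) != name
```

So a server has to know which convention the client used before it can decode a plus sign safely.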

if isinstance(filename, (list, tuple)):
filename, filename_encoding = filename
if filename_encoding:
if six.PY3:

@jaraco

jaraco Aug 13, 2018
Member

If at all possible, tests shouldn't switch Python versions. It should be possible to write universal code. Switching on Python versions adds complexity and potential pitfalls.
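
As an illustration of version-neutral test code, the sketch below always treats the filename as text, encodes it to UTF-8 bytes, and quotes the bytes, so the function body has no `six.PY3` branch. Only the import differs between interpreters. The name `encode_filename` mirrors a helper that appears in a later hunk of this PR, but the body here is an assumption and may differ from what was merged.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:  # Python 2
    from urllib import quote


def encode_filename(filename):
    """Return (parameter name, encoded value) for a text filename.

    Hypothetical sketch; not necessarily the helper used in the PR.
    """
    raw = filename.encode('utf-8')
    try:
        raw.decode('ascii')
    except UnicodeDecodeError:
        # Non-ASCII names use the extended form with an empty language.
        return 'filename*', "UTF-8''" + quote(raw)
    # ASCII names keep the traditional quoted form.
    return 'filename', '"{}"'.format(filename)


print(encode_filename(u'boop.csv'))
print(encode_filename(u'test_\u0142\u00f3\u0105\u00e4.txt'))
```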

@@ -160,10 +180,13 @@ def test_post_filename_with_special_characters(self):
 '''Testing that we can handle filenames with special characters. This
 was reported as a bug in:
 https://github.com/cherrypy/cherrypy/issues/1146/
-https://github.com/cherrypy/cherrypy/issues/1397'''
+https://github.com/cherrypy/cherrypy/issues/1397
+https://github.com/cherrypy/cherrypy/issues/1694'''

@jaraco

jaraco Aug 13, 2018
Member

It would have been nice if the preceding commit had formatted without the trailing end symbol, but they didn't so now we get a messy diff. Let's not repeat that mistake and instead terminate the multiline string on another line.
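
Concretely, the suggestion amounts to a layout like the following, where the closing quotes sit on their own line so that appending another issue URL later touches only one line of the diff. This is a sketch of the formatting only; the function is shown standalone rather than as the real test method.

```python
def test_post_filename_with_special_characters():
    '''Testing that we can handle filenames with special characters.

    This was reported as a bug in:
    https://github.com/cherrypy/cherrypy/issues/1146/
    https://github.com/cherrypy/cherrypy/issues/1397
    https://github.com/cherrypy/cherrypy/issues/1694
    '''


doc = test_post_filename_with_special_characters.__doc__
print(doc.strip().splitlines()[-1])
```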

@jaraco
Member

@jaraco jaraco commented Aug 13, 2018

This is a great PR. I have just a few nitpicks... and one thing I wish to correct before this PR is accepted (in unquote_qs). I'll continue to work on it from the PR branch.

if six.PY3:
return urllib.parse.unquote(atom_spc, encoding=encoding, errors=errors)
else:
if isinstance(atom_spc, six.text_type):

@jaraco

jaraco Aug 13, 2018
Member

I'm hoping this change is unnecessary, because it complicates the behavior.

@jaraco
Member

@jaraco jaraco commented Aug 13, 2018

I've updated _cpcompat.py with the changes I wanted to see (conflicting with this PR); tomorrow, I hope to revisit this and implement the required changes herein to address the conflict and other concerns I raised above.

@jaraco
jaraco approved these changes Aug 14, 2018
@jaraco jaraco force-pushed the pawciobiel:1694-fix-for-encoded-filenames branch from 4d93c5f to f034458 Aug 14, 2018
@cherrypy cherrypy deleted a comment Aug 14, 2018
@cherrypy cherrypy deleted a comment Aug 14, 2018
@cherrypy cherrypy deleted a comment Aug 14, 2018
@cherrypy cherrypy deleted a comment Aug 14, 2018
@cherrypy cherrypy deleted a comment Aug 14, 2018
@cherrypy cherrypy deleted a comment Aug 14, 2018
@cherrypy cherrypy deleted a comment from jaraco Aug 14, 2018
@cherrypy cherrypy deleted a comment from jaraco Aug 14, 2018
@jaraco jaraco dismissed webknjaz’s stale review Aug 14, 2018

Issues were addressed.

@jaraco jaraco merged commit f82ca27 into cherrypy:master Aug 14, 2018
1 of 4 checks passed
LGTM analysis: Python Analysis Failed (could not build the base commit (a3a3a4d))
Details
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
WIP ready for review
Details
(key, filename))

fn_key, encoded = encode_filename(filename)
tmpl = \

@webknjaz

webknjaz Aug 16, 2018
Member

please use braces instead of escaping EOL
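
For comparison, here are both continuation styles side by side. The template text is hypothetical, since the hunk above is cut off after `tmpl = \`:

```python
# Backslash continuation, as in the hunk under review.
tmpl_backslash = \
    'Content-Disposition: form-data; name="{key}"; {fn_key}={encoded}'

# Parenthesized alternative: no escaped end of line, and adjacent
# string literals are joined, so long templates can be wrapped freely.
tmpl_parens = (
    'Content-Disposition: form-data; '
    'name="{key}"; {fn_key}={encoded}'
)

print(tmpl_parens.format(key='myFile', fn_key='filename', encoded='"a.txt"'))
```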
