Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Text returned as byte object when using 'enctype="multipart/form-data"' in forms #1352

Closed
ghost opened this issue Jan 10, 2015 · 12 comments
Closed

Comments

@ghost
Copy link

@ghost ghost commented Jan 10, 2015

Originally reported by: Juern Brodersen (Bitbucket: juern_uni, GitHub: Unknown)


If you are using 'enctype="multipart/form-data"' in forms and putting more than around 1000 characters in a text field that text field will be returned as a byte object.
Using less characters in the text field will return a string.

#!python

import cherrypy

class StringGenerator(object):
    @cherrypy.expose
    def index(self):
        return """<html>
          <head></head>
          <body>
            <form method="post" action="show" enctype="multipart/form-data">
              <textarea type="text" name="textarea" cols="50" rows="10"></textarea>
              <button type="submit">Give it now!</button>
            </form>
          </body>
        </html>"""

    @cherrypy.expose
    def show(self, **kw):
        a = str(type(kw['textarea']))
        return a[1:-1]

if __name__ == '__main__':
    cherrypy.quickstart(StringGenerator())

Less than 1000 characters will return "class 'str'" more returns "class 'bytes'"

Tested with:

python 3.4

cherrypy 3.6


@ghost
Copy link
Author

@ghost ghost commented Jan 17, 2015

Original comment by Joel Rivera (Bitbucket: cyraxjoe, GitHub: cyraxjoe):


That's the consequence of an optimization leaking the abstraction.

You can fix that with something like this:

#!python

import cherrypy


class StringGenerator(object):
    @cherrypy.expose
    def index(self):
        return """<html>
          <head></head>
          <body>
            <form method="post" action="show" enctype="multipart/form-data">
              <textarea type="text" name="textarea" cols="50" rows="10"></textarea>
              <button type="submit">Give it now!</button>
            </form>
          </body>
        </html>"""

    @cherrypy.expose
    def show(self, **kw):
        a = str(type(kw['textarea']))
        return a[1:-1]

class _BigPart(cherrypy._cpreqbody.Part):
    """
    Custom entity part with a bigger treshould on when to dump
    the content into a file.
    """
    maxrambytes = 2000

if __name__ == '__main__':
    cherrypy.quickstart(StringGenerator(), config={
        '/': {'request.body.part_class': _BigPart}
    })

But I do think that there is a bug on the leaky abstraction and needs to be fixed.

@ghost
Copy link
Author

@ghost ghost commented Jan 17, 2015

Original comment by Juern Brodersen (Bitbucket: juern_uni, GitHub: Unknown):


For now I'm just using:

#!python
for key in kw:
    if isinstance(kw[key], bytes):
        kw[key] = bytes.decode(kw[key])

Would something like that break the optimization?

@ghost
Copy link
Author

@ghost ghost commented Jan 17, 2015

Original comment by Joel Rivera (Bitbucket: cyraxjoe, GitHub: cyraxjoe):


No, that's fine. @juern_uni

@ghost
Copy link
Author

@ghost ghost commented Feb 8, 2015

Original comment by Jason R. Coombs (Bitbucket: jaraco, GitHub: jaraco):


Issue #1357 was marked as a duplicate of this issue.

@ghost
Copy link
Author

@ghost ghost commented Feb 8, 2015

Original comment by Jason R. Coombs (Bitbucket: jaraco, GitHub: jaraco):


Bumping priority as this is a failure of a pretty basic functionality of web servers.

@ghost
Copy link
Author

@ghost ghost commented Nov 14, 2015

Original comment by lakin.wecker (Bitbucket: lakin.wecker, GitHub: Unknown):


Was this on python 3 or 2? I believe I am running into the same issue here on python 2. And I have a pull request that fixes it.

Does this https://gist.github.com/lakinwecker/08b484b3ce12102ac3df reproduce the error for you? and if so, does applying this changeset - https://gist.github.com/lakinwecker/08b484b3ce12102ac3df fix it for you?

@ghost
Copy link
Author

@ghost ghost commented Nov 14, 2015

Original comment by lakin.wecker (Bitbucket: lakin.wecker, GitHub: Unknown):


I can confirm that I am seeing the same issue here in python 3. And that my pull request fixes it.

@ghost
Copy link
Author

@ghost ghost commented Nov 14, 2015

Original comment by Joel Rivera (Bitbucket: cyraxjoe, GitHub: cyraxjoe):


@lakin.wecker you put it twice the same gist on your comment. That was the intention? I guess the changeset link is the wrong one.

@ghost
Copy link
Author

@ghost ghost commented Nov 14, 2015

Original comment by lakin.wecker (Bitbucket: lakin.wecker, GitHub: Unknown):


Whoops. Yeah - I meant to link to this pull request: https://bitbucket.org/cherrypy/cherrypy/pull-requests/112/when-entity-parts-are-too-big-cp-writes/diff

The basic problem is that once you hit 1000 it writes it to a file, and then reads from the file instead of just using memory. This particular optimization is useless, because the "write to the file" then "read from the file" happens within the same request and almost immediately after one another, so the savings you get in terms of unused memory is immediately negated. Also, that code path doesn't properly attempt to decode the incoming data using us-ascii or utf-8 while the path that reads only into memory does do this decoding.

My pull request completely removes the optimization. I am thinking that this is ok, because as far as I can tell the optimization isn't sucessfull anyways.

@ghost
Copy link
Author

@ghost ghost commented Nov 15, 2015

Original comment by Joel Rivera (Bitbucket: cyraxjoe, GitHub: cyraxjoe):


I've been trying to profile the two approaches an so far the current implementation is a lot more efficient on the particular use case on which a input field is dumped into the file.

I'm testing with 10 concurrent requests submitting 5 MB in an input field.

This is the modified memory behavior:

modified_cp.png

This is the original memory behavior:

original_cp.png

After covering that. I'm not implying that is something common to submit inputs of 5MB with 10 at a time without specifying a file name. But I just wanted to see the impact on removing this change. There is also the possibility that we just fix the optimization and make sure that it returns a string instead of bytes when the file was created because of the maxrambytes limit.

@ghost
Copy link
Author

@ghost ghost commented Dec 7, 2015

Original comment by Ben Bass (Bitbucket: codedstructure, GitHub: codedstructure):


I've just encountered this issue in a project which recently moved from Python2 to Python3, exposing some nasty failures as bytes are used in dozens of different contexts which normally expect strings.

I've addressed it in the short term with the following which seems to fix things in my case, but would be nice for this to get cleaned up - it's not at all friendly to Python3 use of CherryPy...

#!python

import cherrypy
cherrypy._cpreqbody.Part.maxrambytes = 20000000
@ghost ghost added critical bug CherryPy code labels Apr 30, 2016
JB26 added a commit to JB26/cherrypy that referenced this issue Jun 3, 2016
JB26 added a commit to JB26/cherrypy that referenced this issue Jun 3, 2016
@JB26 JB26 mentioned this issue Jun 3, 2016
jaraco added a commit that referenced this issue Jun 5, 2016
Patch for #1352
@jaraco jaraco closed this Jun 5, 2016
@jaraco
Copy link
Member

@jaraco jaraco commented Jun 5, 2016

Fix released in 5.5.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant
You can’t perform that action at this time.