New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Text returned as byte object when using 'enctype="multipart/form-data"' in forms #1352

Closed
bb-migration opened this Issue Jan 10, 2015 · 12 comments

Comments

Projects
None yet
2 participants
@bb-migration

bb-migration commented Jan 10, 2015

Originally reported by: Juern Brodersen (Bitbucket: juern_uni, GitHub: Unknown)


If you are using 'enctype="multipart/form-data"' in forms and putting more than around 1000 characters in a text field that text field will be returned as a byte object.
Using less characters in the text field will return a string.

#!python

import cherrypy

class StringGenerator(object):
    @cherrypy.expose
    def index(self):
        return """<html>
          <head></head>
          <body>
            <form method="post" action="show" enctype="multipart/form-data">
              <textarea type="text" name="textarea" cols="50" rows="10"></textarea>
              <button type="submit">Give it now!</button>
            </form>
          </body>
        </html>"""

    @cherrypy.expose
    def show(self, **kw):
        a = str(type(kw['textarea']))
        return a[1:-1]

if __name__ == '__main__':
    cherrypy.quickstart(StringGenerator())

Less than 1000 characters will return "class 'str'" more returns "class 'bytes'"

Tested with:

python 3.4

cherrypy 3.6


@bb-migration

This comment has been minimized.

Show comment
Hide comment
@bb-migration

bb-migration Jan 17, 2015

Original comment by Joel Rivera (Bitbucket: cyraxjoe, GitHub: cyraxjoe):


That's the consequence of an optimization leaking the abstraction.

You can fix that with something like this:

#!python

import cherrypy


class StringGenerator(object):
    @cherrypy.expose
    def index(self):
        return """<html>
          <head></head>
          <body>
            <form method="post" action="show" enctype="multipart/form-data">
              <textarea type="text" name="textarea" cols="50" rows="10"></textarea>
              <button type="submit">Give it now!</button>
            </form>
          </body>
        </html>"""

    @cherrypy.expose
    def show(self, **kw):
        a = str(type(kw['textarea']))
        return a[1:-1]

class _BigPart(cherrypy._cpreqbody.Part):
    """
    Custom entity part with a bigger treshould on when to dump
    the content into a file.
    """
    maxrambytes = 2000

if __name__ == '__main__':
    cherrypy.quickstart(StringGenerator(), config={
        '/': {'request.body.part_class': _BigPart}
    })

But I do think that there is a bug on the leaky abstraction and needs to be fixed.

bb-migration commented Jan 17, 2015

Original comment by Joel Rivera (Bitbucket: cyraxjoe, GitHub: cyraxjoe):


That's the consequence of an optimization leaking the abstraction.

You can fix that with something like this:

#!python

import cherrypy


class StringGenerator(object):
    @cherrypy.expose
    def index(self):
        return """<html>
          <head></head>
          <body>
            <form method="post" action="show" enctype="multipart/form-data">
              <textarea type="text" name="textarea" cols="50" rows="10"></textarea>
              <button type="submit">Give it now!</button>
            </form>
          </body>
        </html>"""

    @cherrypy.expose
    def show(self, **kw):
        a = str(type(kw['textarea']))
        return a[1:-1]

class _BigPart(cherrypy._cpreqbody.Part):
    """
    Custom entity part with a bigger treshould on when to dump
    the content into a file.
    """
    maxrambytes = 2000

if __name__ == '__main__':
    cherrypy.quickstart(StringGenerator(), config={
        '/': {'request.body.part_class': _BigPart}
    })

But I do think that there is a bug on the leaky abstraction and needs to be fixed.

@bb-migration

This comment has been minimized.

Show comment
Hide comment
@bb-migration

bb-migration Jan 17, 2015

Original comment by Juern Brodersen (Bitbucket: juern_uni, GitHub: Unknown):


For now I'm just using:

#!python
for key in kw:
    if isinstance(kw[key], bytes):
        kw[key] = bytes.decode(kw[key])

Would something like that break the optimization?

bb-migration commented Jan 17, 2015

Original comment by Juern Brodersen (Bitbucket: juern_uni, GitHub: Unknown):


For now I'm just using:

#!python
for key in kw:
    if isinstance(kw[key], bytes):
        kw[key] = bytes.decode(kw[key])

Would something like that break the optimization?

@bb-migration

This comment has been minimized.

Show comment
Hide comment
@bb-migration

bb-migration Jan 17, 2015

Original comment by Joel Rivera (Bitbucket: cyraxjoe, GitHub: cyraxjoe):


No, that's fine. @juern_uni

bb-migration commented Jan 17, 2015

Original comment by Joel Rivera (Bitbucket: cyraxjoe, GitHub: cyraxjoe):


No, that's fine. @juern_uni

@bb-migration

This comment has been minimized.

Show comment
Hide comment
@bb-migration

bb-migration Feb 8, 2015

Original comment by Jason R. Coombs (Bitbucket: jaraco, GitHub: jaraco):


Issue #1357 was marked as a duplicate of this issue.

bb-migration commented Feb 8, 2015

Original comment by Jason R. Coombs (Bitbucket: jaraco, GitHub: jaraco):


Issue #1357 was marked as a duplicate of this issue.

@bb-migration

This comment has been minimized.

Show comment
Hide comment
@bb-migration

bb-migration Feb 8, 2015

Original comment by Jason R. Coombs (Bitbucket: jaraco, GitHub: jaraco):


Bumping priority as this is a failure of a pretty basic functionality of web servers.

bb-migration commented Feb 8, 2015

Original comment by Jason R. Coombs (Bitbucket: jaraco, GitHub: jaraco):


Bumping priority as this is a failure of a pretty basic functionality of web servers.

@bb-migration

This comment has been minimized.

Show comment
Hide comment
@bb-migration

bb-migration Nov 14, 2015

Original comment by lakin.wecker (Bitbucket: lakin.wecker, GitHub: Unknown):


Was this on python 3 or 2? I believe I am running into the same issue here on python 2. And I have a pull request that fixes it.

Does this https://gist.github.com/lakinwecker/08b484b3ce12102ac3df reproduce the error for you? and if so, does applying this changeset - https://gist.github.com/lakinwecker/08b484b3ce12102ac3df fix it for you?

bb-migration commented Nov 14, 2015

Original comment by lakin.wecker (Bitbucket: lakin.wecker, GitHub: Unknown):


Was this on python 3 or 2? I believe I am running into the same issue here on python 2. And I have a pull request that fixes it.

Does this https://gist.github.com/lakinwecker/08b484b3ce12102ac3df reproduce the error for you? and if so, does applying this changeset - https://gist.github.com/lakinwecker/08b484b3ce12102ac3df fix it for you?

@bb-migration

This comment has been minimized.

Show comment
Hide comment
@bb-migration

bb-migration Nov 14, 2015

Original comment by lakin.wecker (Bitbucket: lakin.wecker, GitHub: Unknown):


I can confirm that I am seeing the same issue here in python 3. And that my pull request fixes it.

bb-migration commented Nov 14, 2015

Original comment by lakin.wecker (Bitbucket: lakin.wecker, GitHub: Unknown):


I can confirm that I am seeing the same issue here in python 3. And that my pull request fixes it.

@bb-migration

This comment has been minimized.

Show comment
Hide comment
@bb-migration

bb-migration Nov 14, 2015

Original comment by Joel Rivera (Bitbucket: cyraxjoe, GitHub: cyraxjoe):


@lakin.wecker you put it twice the same gist on your comment. That was the intention? I guess the changeset link is the wrong one.

bb-migration commented Nov 14, 2015

Original comment by Joel Rivera (Bitbucket: cyraxjoe, GitHub: cyraxjoe):


@lakin.wecker you put it twice the same gist on your comment. That was the intention? I guess the changeset link is the wrong one.

@bb-migration

This comment has been minimized.

Show comment
Hide comment
@bb-migration

bb-migration Nov 14, 2015

Original comment by lakin.wecker (Bitbucket: lakin.wecker, GitHub: Unknown):


Whoops. Yeah - I meant to link to this pull request: https://bitbucket.org/cherrypy/cherrypy/pull-requests/112/when-entity-parts-are-too-big-cp-writes/diff

The basic problem is that once you hit 1000 it writes it to a file, and then reads from the file instead of just using memory. This particular optimization is useless, because the "write to the file" then "read from the file" happens within the same request and almost immediately after one another, so the savings you get in terms of unused memory is immediately negated. Also, that code path doesn't properly attempt to decode the incoming data using us-ascii or utf-8 while the path that reads only into memory does do this decoding.

My pull request completely removes the optimization. I am thinking that this is ok, because as far as I can tell the optimization isn't sucessfull anyways.

bb-migration commented Nov 14, 2015

Original comment by lakin.wecker (Bitbucket: lakin.wecker, GitHub: Unknown):


Whoops. Yeah - I meant to link to this pull request: https://bitbucket.org/cherrypy/cherrypy/pull-requests/112/when-entity-parts-are-too-big-cp-writes/diff

The basic problem is that once you hit 1000 it writes it to a file, and then reads from the file instead of just using memory. This particular optimization is useless, because the "write to the file" then "read from the file" happens within the same request and almost immediately after one another, so the savings you get in terms of unused memory is immediately negated. Also, that code path doesn't properly attempt to decode the incoming data using us-ascii or utf-8 while the path that reads only into memory does do this decoding.

My pull request completely removes the optimization. I am thinking that this is ok, because as far as I can tell the optimization isn't sucessfull anyways.

@bb-migration

This comment has been minimized.

Show comment
Hide comment
@bb-migration

bb-migration Nov 15, 2015

Original comment by Joel Rivera (Bitbucket: cyraxjoe, GitHub: cyraxjoe):


I've been trying to profile the two approaches an so far the current implementation is a lot more efficient on the particular use case on which a input field is dumped into the file.

I'm testing with 10 concurrent requests submitting 5 MB in an input field.

This is the modified memory behavior:

modified_cp.png

This is the original memory behavior:

original_cp.png

After covering that. I'm not implying that is something common to submit inputs of 5MB with 10 at a time without specifying a file name. But I just wanted to see the impact on removing this change. There is also the possibility that we just fix the optimization and make sure that it returns a string instead of bytes when the file was created because of the maxrambytes limit.

bb-migration commented Nov 15, 2015

Original comment by Joel Rivera (Bitbucket: cyraxjoe, GitHub: cyraxjoe):


I've been trying to profile the two approaches an so far the current implementation is a lot more efficient on the particular use case on which a input field is dumped into the file.

I'm testing with 10 concurrent requests submitting 5 MB in an input field.

This is the modified memory behavior:

modified_cp.png

This is the original memory behavior:

original_cp.png

After covering that. I'm not implying that is something common to submit inputs of 5MB with 10 at a time without specifying a file name. But I just wanted to see the impact on removing this change. There is also the possibility that we just fix the optimization and make sure that it returns a string instead of bytes when the file was created because of the maxrambytes limit.

@bb-migration

This comment has been minimized.

Show comment
Hide comment
@bb-migration

bb-migration Dec 7, 2015

Original comment by Ben Bass (Bitbucket: codedstructure, GitHub: codedstructure):


I've just encountered this issue in a project which recently moved from Python2 to Python3, exposing some nasty failures as bytes are used in dozens of different contexts which normally expect strings.

I've addressed it in the short term with the following which seems to fix things in my case, but would be nice for this to get cleaned up - it's not at all friendly to Python3 use of CherryPy...

#!python

import cherrypy
cherrypy._cpreqbody.Part.maxrambytes = 20000000

bb-migration commented Dec 7, 2015

Original comment by Ben Bass (Bitbucket: codedstructure, GitHub: codedstructure):


I've just encountered this issue in a project which recently moved from Python2 to Python3, exposing some nasty failures as bytes are used in dozens of different contexts which normally expect strings.

I've addressed it in the short term with the following which seems to fix things in my case, but would be nice for this to get cleaned up - it's not at all friendly to Python3 use of CherryPy...

#!python

import cherrypy
cherrypy._cpreqbody.Part.maxrambytes = 20000000

JB26 added a commit to JB26/cherrypy that referenced this issue Jun 3, 2016

JB26 added a commit to JB26/cherrypy that referenced this issue Jun 3, 2016

@JB26 JB26 referenced this issue Jun 3, 2016

Merged

Patch for #1352 #1438

jaraco added a commit that referenced this issue Jun 5, 2016

@jaraco jaraco closed this Jun 5, 2016

@jaraco

This comment has been minimized.

Show comment
Hide comment
@jaraco

jaraco Jun 5, 2016

Member

Fix released in 5.5.0.

Member

jaraco commented Jun 5, 2016

Fix released in 5.5.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment