
gunicorn bad performance with file uploads and webob Request #342

Closed
Contributor

djoume commented May 9, 2012

Hi there,

We recently noticed a performance issue with gunicorn when dealing with file uploads. The problem can be reproduced with this minimal WSGI app:

wsgi_app.py
8<--------------------------------------->8
from webob import Request

def app(environ, start_response):
    if environ['REQUEST_METHOD'] == 'POST':
        start_response('200 OK', [('content-type', 'text/html')])
        req = Request(environ)
        # Accessing req.POST forces webob to read and parse the body
        return ["DONE - ", '%s' % ('data' in req.POST)]
    else:
        start_response('200 OK', [('content-type', 'text/html')])
        return ['<form method="POST" enctype="multipart/form-data">'
                'Test: <input type="file" name="data">'
                '<input type="submit"></form>']

def app_factory(global_config, **local_conf):
    return app
8<--------------------------------------->8

gunicorn.ini
8<--------------------------------------->8
[server:main]
use = egg:gunicorn#main
workers = 1
host = 0.0.0.0
port = 5000
timeout = 300

[app:main]
paste.app_factory = wsgi_app:app_factory
8<--------------------------------------->8

paste.ini
8<--------------------------------------->8
[server:main]
use = egg:Paste#http
host = 0.0.0.0
port = 5000

[app:main]
paste.app_factory = wsgi_app:app_factory
8<--------------------------------------->8

I have timed uploading a 10MB file using:

time curl -X POST 'http://localhost:5000' -F "data=@/tmp/10MB.data;type=application/octet-stream"

Using paste.ini it takes ~1 second; using gunicorn.ini it takes > 60 seconds.

Also, CPU usage is low when using paste and very high when using gunicorn.

The issue is due to gunicorn.http.body.Body.readline() reading one byte at a time via self.reader.read(1) (line 243).

I have fixed the issue by using the body buffer to minimize the number of calls to read().

Also, the current implementation of readline() could return more than the number of bytes requested when the buffer is already loaded (see tests); this is fixed as well.
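For reference, a rough sketch of the buffered approach (not gunicorn's actual code; the class and names here are illustrative): pull whole chunks from the underlying reader into a buffer, and only split on the newline or the size cap once the data is buffered, instead of calling read(1) per byte.

```python
import io

class BufferedReadline(object):
    """Sketch only: buffer reads from an underlying stream so that
    readline() never calls read(1) once per byte."""

    def __init__(self, reader, chunk_size=8192):
        self.reader = reader          # any object with a .read(n) method
        self.buf = io.BytesIO()
        self.chunk_size = chunk_size

    def readline(self, size=None):
        data = self.buf.getvalue()
        self.buf = io.BytesIO()
        # Pull whole chunks until we see a newline or hit the size cap.
        while b"\n" not in data and (size is None or len(data) < size):
            chunk = self.reader.read(self.chunk_size)
            if not chunk:             # end of stream
                break
            data += chunk
        idx = data.find(b"\n")
        if idx >= 0:
            line, rest = data[:idx + 1], data[idx + 1:]
        else:
            line, rest = data, b""
        if size is not None and len(line) > size:
            # Never return more than requested; stash the extra bytes.
            line, rest = line[:size], line[size:] + rest
        self.buf.write(rest)
        return line
```

The size check at the end is the second bug described above: without it, a preloaded buffer containing more than `size` bytes before the first newline would be returned whole.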

taavi and 1 other commented on an outdated diff May 9, 2012

gunicorn/http/body.py
@@ -228,24 +228,19 @@ def readline(self, size=None):
return ""
line = self.buf.getvalue()
+ self.buf.truncate(0)
+ if len(line) < size:
+ line += self.reader.read(size - len(line))
+ buf = line[size:]

taavi May 9, 2012

Shouldn't buf always be 0 length, since we asked for size - len(line) bytes? If it can ever return something, maybe a more descriptive name than buf, like extra_data?

And can self.reader.read() ever return fewer bytes than we asked for without hitting the end of the source stream? (e.g. how a socket read will hang if it has no data, but may return less than requested when it has any) That could cause this function to return an "extra" partial line.

Maybe Python protects fully against that…


djoume May 9, 2012


buf can be non-empty if self.buf.getvalue() returns more than size bytes. I agree the naming can be improved, I will fix it.

As for self.reader.read(), it can return fewer bytes than we asked for, but then we should still return what we have if we don't want to block.
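To illustrate the short-read point with toy code (this is not gunicorn's reader; the class is purely for demonstration):

```python
import io

class ShortReader(object):
    """Toy stream that returns at most 3 bytes per read(), mimicking a
    socket that can hand back less than requested without being at EOF."""

    def __init__(self, data):
        self.stream = io.BytesIO(data)

    def read(self, size):
        # A real socket behaves like this: one recv() may return fewer
        # bytes than asked for even though more data will arrive later.
        return self.stream.read(min(size, 3))

reader = ShortReader(b"abcdefgh")
assert reader.read(8) == b"abc"   # short read: more data exists,
assert reader.read(8) == b"def"   # but each call yields only what's ready
```

Looping until `size` bytes have arrived would block on a real socket with no pending data, which is why returning the partial result from `readline(size)` is the non-blocking trade-off described above.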

@taavi taavi and 1 other commented on an outdated diff May 9, 2012

gunicorn/http/body.py
idx = line.find("\n")
if idx >= 0:
ret = line[:idx+1]
- self.buf.truncate(0)
- self.buf.write(line[idx+1:])
+ self.buf.write(line[idx+1:] + buf)

taavi May 9, 2012

If buf continues to exist (see above comment), would it make more sense to do:

    self.buf.write(line[idx+1:])
    self.buf.write(buf)

instead of doing the string concatenation?
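For comparison, a minimal sketch of the suggested two-write version (the helper name is hypothetical):

```python
import io

def stash(buf, remainder, extra):
    # Two write() calls append the same bytes as one concatenated write,
    # without first building the temporary string line[idx+1:] + buf.
    buf.write(remainder)
    buf.write(extra)

buf = io.BytesIO()
stash(buf, b"tail-of-line", b"leftover")
assert buf.getvalue() == b"tail-of-lineleftover"
```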


djoume May 9, 2012


Yes, will do, thanks.

benoitc closed this in 8df917d May 10, 2012

hongqn referenced this pull request Dec 25, 2012

Closed

optimize readline() #460
