
Can't serve (very) large files #1736

Closed
lamby opened this issue Mar 31, 2018 · 13 comments
@lamby
Contributor

lamby commented Mar 31, 2018

This was originally filed in Debian as: http://bugs.debian.org/894512. Could it be related to #1733?

I'm writing a web application that needs to serve fairly large files, in the terabyte range. I am using python3-bottle for my code, and it works just fine. However, when I run my application with gunicorn3, it doesn't work.

I've distilled this into a small test case. The application code, saved as file foo.py:

import bottle

def blob(*args, **kwargs):
    return bottle.static_file('blob', '.')

app = bottle.Bottle()
app.route(path='/blob', callback=blob)

This is the script that starts it, saved as file start.sh:

#!/bin/sh

set -eu

truncate -s 1G blob
gunicorn3 --bind 0.0.0.0:12765 foo:app

To test, run sh +x start.sh, and then from another host run:

curl http://195.201.99.89:12765/blob > blob

This always fails for me: curl complains:

curl: (18) transfer closed with 611469105 bytes remaining to read

It seems to always work over localhost. It always works when the blob is sufficiently small, such as 1024 bytes, even between hosts.

If I don't use gunicorn and instead use the Bottle built-in HTTP server, it always works. Like this:

import bottle

def blob(*args, **kwargs):
    return bottle.static_file('blob', '.')

app = bottle.Bottle()
app.route(path='/blob', callback=blob)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=12765)

I ran that on one machine, and ran curl on a different host and it worked 10 times in a row.

@benoitc
Owner

benoitc commented Apr 1, 2018

What do the gunicorn logs say? If a request blocks for more than 30s when using the synchronous worker, the worker will be killed. To achieve what you want to do, you will need to use an async worker for now.
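
For instance (a sketch only, assuming the gevent worker and its dependencies are installed; the bind address is taken from the report), the start command could become:

# -k gevent selects an async, gevent-based worker instead of the default sync one
gunicorn3 -k gevent --bind 0.0.0.0:12765 foo:app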

@ghost

ghost commented Apr 2, 2018

You're right, the gunicorn log contains

[2018-04-02 15:40:41 +0200] [1565] [CRITICAL] WORKER TIMEOUT (pid:1568)

Adding a sufficiently long timeout works around the problem, but is not actually satisfactory to me. The worker isn't idle - it's transferring data. Given that I need to transfer quite large files, no timeout value is really reasonable. If I choose a timeout suitable for my user to transfer a terabyte, they'll next want to transfer ten terabytes, or a petabyte.
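
(For reference, the workaround amounts to raising gunicorn's --timeout; the value here is purely illustrative.)

# one-hour worker timeout; the value is illustrative only
gunicorn3 --timeout 3600 --bind 0.0.0.0:12765 foo:app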

A very long timeout also makes the timeout feature fairly useless. If a worker is stuck and isn't responding, having a timeout that's days or weeks isn't very helpful. It seems to me that a timeout that applies only to an idle connection would be more useful.

As is, gunicorn doesn't seem useful to my use case.

@RonRothman

@larswirzenius Did you see @benoitc's point about using async workers?

As an aside, if you're concerned about transferring petabytes of data, maybe a traditional HTTP server is not the best design for this task. I don't know your context, but have you considered a method that's better suited to moving huge amounts of data quickly (e.g. S3)?
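
To illustrate that idea (a sketch only: boto3 is assumed to be available and configured, and the bucket/key names are hypothetical), the Bottle handler could hand the transfer off to S3 via a presigned URL rather than streaming the bytes through gunicorn:

import boto3
import bottle

s3 = boto3.client('s3')

def blob(*args, **kwargs):
    # S3 serves the large object; gunicorn only returns a short redirect.
    url = s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': 'my-bucket', 'Key': 'blob'},  # hypothetical names
        ExpiresIn=3600,
    )
    bottle.redirect(url)

app = bottle.Bottle()
app.route(path='/blob', callback=blob)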

@tilgovi
Collaborator

tilgovi commented Apr 2, 2018

You can use the threaded worker (-k gthread); even with only a single request thread, it should solve your problem. The heartbeat happens on a separate thread.
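
Concretely, applied to the start script from the report, that would look something like this:

# gthread worker: the heartbeat runs on a separate thread, so a long transfer is not killed
gunicorn3 -k gthread --threads 1 --bind 0.0.0.0:12765 foo:app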

@benoitc
Owner

benoitc commented Apr 3, 2018

@larswirzenius either way, using a gevent or eventlet worker will do the trick. Also make sure that your framework allows the use of sendfile. Depending on your needs, you can also bypass the supervision, like the websocket example does; that will, however, require that you take care of correctly closing the worker at the end.

@lamby
Contributor Author

lamby commented May 31, 2018

@larswirzenius Did you manage to resolve this? :)

@ghost

ghost commented May 31, 2018 via email

@benoitc
Owner

benoitc commented Jun 1, 2018

@larswirzenius but did you try the suggestion before "giving up"?

@benoitc
Owner

benoitc commented Jul 6, 2018

closing issue. sounds like we'll never know if the solutions have been tried or not.

@benoitc benoitc closed this as completed Jul 6, 2018
@lamby
Contributor Author

lamby commented Jul 6, 2018

:(

@benoitc benoitc reopened this Jul 6, 2018
@benoitc
Owner

benoitc commented Jul 6, 2018

@lamby it seems i mixed the answers. reopening it to see if something can be done :)

@benoitc
Owner

benoitc commented Nov 22, 2019

closing the issue since no activity has happened in a while. feel free to open a new ticket if needed.

@benoitc benoitc closed this as completed Nov 22, 2019
@johncronan

I had this same problem: with a streaming HTTP response, the gunicorn timeout applied even though the worker was transferring data. (Interestingly, with my network transfer throttled on the client side, it would not happen.)

You can use the threaded worker (-k gthread); even with only a single request thread, it should solve your problem.

It works great! I added worker_class = 'gthread' and threads = 1 to my gunicorn config file, and it's allowing the long downloads now.
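
In other words, a gunicorn config file along these lines (the filename is illustrative; it is loaded with -c):

# gunicorn.conf.py -- loaded with: gunicorn3 -c gunicorn.conf.py foo:app
worker_class = 'gthread'
threads = 1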
