Silent failing task on large arg size #318

simonz05 opened this Issue Feb 8, 2011 · 21 comments



simonz05 commented Feb 8, 2011

My test case is a file containing 7,000 characters of Lorem Ipsum text. The contents of the file are sent as a single argument to an arbitrary task.

$ wc
103 1058 7000

Redis is configured as both result backend and broker. The failure occurs with celery v2.2.0 and kombu v1.0.2. I stepped through the client side of the code, from apply_async until the task is published to the 'ae.undeliver' queue. The task is successfully appended to the 'ae.undeliver' list, but it does not show up as a unique entry in the backend, as it should.

The same task executes successfully when the argument size is reduced to about 1/3 of the text file. This suggests a hidden limit on argument size, which does not appear to be intentional, since no error is raised.


ask commented Feb 11, 2011

Could you please provide a sample project reproducing this issue?

It would save me a lot of time :)

I created a small project which will reproduce the issue. I've added a readme file which explains the steps to install and reproduce it.


ask commented Mar 27, 2011

I haven't had the time to look at this properly yet, mainly because I don't have another host around.

I will be installing Ubuntu on VMware soon to track down the issue.

Thanks for the update.

I could set up a user for you on two hosts running Ubuntu 10.04 if you are interested. I could even set up my test case again and save you the trouble. Obviously you might want to be in control of the environment, but it's an open offer, so let me know.


ask commented Apr 2, 2011

That would be great, maybe you have Skype?

simonz05 commented Apr 2, 2011

I'll see if I have some time to set up the hosts tomorrow and will contact
you by mail when it's done. I'm usually on Skype with my regular
email, so we can probably arrange something when it suits you.

I encountered a very similar, if not the same, problem today. The problem is in Kombu's pyredis module, specifically setting the Redis socket to non-blocking with sock.setblocking(0).

The socket fails while reading the response of BRPOP on line 210, raising a ConnectionError, which ends up raising Empty() on line 212. It might be nice to add a log entry when Empty is raised, to give some indication of failure for future debugging.

The error is: "Error while reading from socket: Resource temporarily unavailable". This typically happens when you read from a non-blocking socket before the data you expect has arrived. As simonz05 suggests, it only happens with larger payloads, presumably when the message is split across multiple TCP packets. It also explains why it's harder (if not impossible) to reproduce when Redis is running locally, because the transfer doesn't have to travel across the wire.
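The failure mode described above is easy to demonstrate in isolation. Below is a minimal sketch (my own illustration, not Kombu's actual code): once `setblocking(0)` has been called, a `recv()` issued before the reply has arrived fails immediately with EAGAIN, the errno behind "Resource temporarily unavailable".

```python
import errno
import socket

# Minimal sketch (not Kombu's code): a non-blocking socket fails with
# EAGAIN when recv() is called before the peer's reply has arrived,
# which is the "Resource temporarily unavailable" error described above.
a, b = socket.socketpair()
a.setblocking(False)  # same effect as sock.setblocking(0)

err = None
try:
    a.recv(4096)      # nothing sent yet, so the read fails immediately
except OSError as e:  # BlockingIOError in Python 3
    err = e

assert err is not None and err.errno == errno.EAGAIN
print("recv raised EAGAIN: resource temporarily unavailable")
```

With a large reply split across several TCP packets, the same EAGAIN can fire mid-read even after some data has arrived, which matches the partial-payload symptom reported here.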

It looks like you're jumping through some hoops to get your event loop to play well with Redis. I'm open to making redis-py play better with eventio if that'd make this easier, though it will probably take a little time to research what's best and to maintain backwards compatibility.

Let's discuss in IRC.


ask commented Apr 6, 2011

@simonz05 and @andymccurdy

Could you try applying this patch to kombu/transport/ ?

The setblocking calls are actually left-overs from a solution that didn't use select/kqueue/epoll.
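A sketch of that direction (my own illustration under assumptions, not the actual kombu patch): leave the socket blocking and use select() only to learn when the connection is readable, so a reply spanning several TCP packets is consumed with ordinary blocking reads and EAGAIN cannot occur.

```python
import select
import socket
import threading

PAYLOAD = b"x" * 70000  # large enough to require several reads

# Stand-in for the broker connection; a writer thread plays the server.
a, b = socket.socketpair()
threading.Thread(target=lambda: (b.sendall(PAYLOAD), b.close())).start()

# The socket stays blocking; select() is used only for readiness.
chunks = []
while True:
    readable, _, _ = select.select([a], [], [], 5.0)
    if not readable:
        break                # timed out waiting for data
    data = a.recv(65536)     # blocking read, so no EAGAIN is possible
    if not data:
        break                # writer closed: reply fully received
    chunks.append(data)

payload = b"".join(chunks)
assert payload == PAYLOAD
```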


ask commented Apr 6, 2011

I just managed to reproduce it myself by calling this task:

    @task
    def foo(text):
        pass

and calling it with a megabyte of data:

    >>> foo.apply_async(("x" * 1024 * 1024,))

Removing the setblocking calls solves the issue, and doesn't appear to have any
other side effects.


ask commented Apr 6, 2011

While testing I managed to crash Redis though :)

[65478] 06 Apr 13:03:49 - 4 clients connected (0 slaves), 969392 bytes in use
[65478] 06 Apr 13:03:49 - Accepted
[65478] 06 Apr 13:03:49 # === ASSERTION FAILED ===
[65478] 06 Apr 13:03:49 # ==> t_list.c:820 'ln != listFirst(clients)' is not true
[65478] 06 Apr 13:03:49 # (forcing SIGSEGV in order to print the stack trace)
[65478] 06 Apr 13:03:49 # ======= Ooops! Redis 2.2.2 got signal: -11- =======
[65478] 06 Apr 13:03:49 # redis_version:2.2.2
[65478] 06 Apr 13:03:49 # 1 redis-server 0x00000001000294ef 0x0 + 4295136495
[65478] 06 Apr 13:03:49 # 2 ??? 0x00007fff5fbfe420 0x0 + 140734799799328
[65478] 06 Apr 13:03:49 # 3 redis-server 0x0000000100019932 0x0 + 4295072050
[65478] 06 Apr 13:03:49 # 4 redis-server 0x000000010001a0c4 0x0 + 4295073988
[65478] 06 Apr 13:03:49 # 5 redis-server 0x0000000100007921 0x0 + 4294998305
[65478] 06 Apr 13:03:49 # 6 redis-server 0x000000010000f1a8 0x0 + 4295029160
[65478] 06 Apr 13:03:49 # 7 redis-server 0x000000010000f280 0x0 + 4295029376
[65478] 06 Apr 13:03:49 # 8 redis-server 0x0000000100001891 0x0 + 4294973585
[65478] 06 Apr 13:03:49 # 9 redis-server 0x0000000100001b9e 0x0 + 4294974366
[65478] 06 Apr 13:03:49 # 10 redis-server 0x00000001000071b8 0x0 + 4294996408
[65478] 06 Apr 13:03:49 # 11 redis-server 0x00000001000008e4 0x0 + 4294969572
Segmentation fault

simonz05 commented Apr 7, 2011

Hi, I briefly had some time to test this patch yesterday, and I
didn't see any different behavior, but I assumed I did something
wrong. Furthermore, I'm happy to see you were able to create a fix
for the issue in kombu commit 348babd. So thanks for taking the time
to look at it.



ask commented Apr 16, 2011

@simonz05 confirms on IRC that the issue has not occurred since applying the patch. Reopen if similar symptoms are experienced.

ask closed this Apr 16, 2011

steeve reopened this Aug 10, 2012


steeve commented Aug 10, 2012

I'm having the same issue with Celery 3.0.5 and AMQP.
Note that this doesn't happen on localhost, only when the broker is on another machine.

I'm running Mountain Lion, and RabbitMQ 2.8.5 runs on Debian.

Code to reproduce:

from celery import Celery

celery = Celery('tasks', broker=BROKER_URL)
celery.conf.CELERYD_POOL = "solo"

@celery.task
def one(data):
    print "Data size is %d" % len(data)
    return None

if __name__ == "__main__":
    one.delay("A" * 1024 * 90)

one never gets called.


steeve commented Aug 10, 2012

After investigation, it seems the socket ends up completely out of sync: when AMQP tries to read the next frame header, it reads the previous message's data instead.

I've also tried to send big amounts of data using Kombu alone, and it works fine. So I believe it's in Celery.


steeve commented Aug 10, 2012

And I'm also seeing the exact same issue (a ConnectionError, "Resource temporarily unavailable", making Kombu believe the socket is empty when it's not).


steeve commented Aug 20, 2012

Okay, this is much worse than I thought. On 3.0.6, if you do:

    for x in xrange(50):
        one.delay("A" * 1024)

it will crash, and in some cases triggers this:

[2012-08-20 18:30:46,406: ERROR/MainProcess] Unrecoverable error: AMQPError('Framing Error, received 0x00 while expecting 0xce', None, None, None, '')
Traceback (most recent call last):
  File "/Users/steeve/projects/celery/celery/worker/", line 353, in start
  File "/Users/steeve/projects/celery/celery/worker/", line 369, in start
  File "/Users/steeve/projects/celery/celery/worker/", line 453, in consume_messages
    readers[fileno](fileno, event)
  File "/usr/local/lib/python2.7/site-packages/kombu/", line 195, in drain_nowait
  File "/usr/local/lib/python2.7/site-packages/kombu/", line 191, in drain_events
    return self.transport.drain_events(self.connection, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/", line 84, in drain_events
    return connection.drain_events(**kwargs)
  File "/Users/steeve/projects/instance/lib/python2.7/site-packages/amqp/", line 264, in drain_events
    chanmap, None, timeout=timeout)
  File "/Users/steeve/projects/instance/lib/python2.7/site-packages/amqp/", line 322, in _wait_multiple
    channel, method_sig, args, content = read_timeout(timeout)
  File "/Users/steeve/projects/instance/lib/python2.7/site-packages/amqp/", line 296, in read_timeout
    return self.method_reader.read_method()
  File "/Users/steeve/projects/instance/lib/python2.7/site-packages/amqp/", line 186, in read_method
    raise m
AMQPError: Framing Error, received 0x00 while expecting 0xce
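For context on the traceback: every AMQP 0-9-1 frame must end with the sentinel octet 0xCE, so reading 0x00 there means the reader has lost sync with the stream and is treating leftover body bytes as a frame boundary. A toy parser, written here for illustration rather than taken from amqplib, shows how that check surfaces the error:

```python
import struct

FRAME_END = 0xCE  # AMQP 0-9-1: every frame ends with this octet

def read_frame(buf, offset=0):
    # Illustrative parser, not amqplib's: type (1 byte), channel (2),
    # payload size (4), payload, then the mandatory frame-end octet.
    ftype, channel, size = struct.unpack_from(">BHI", buf, offset)
    payload = buf[offset + 7:offset + 7 + size]
    end = buf[offset + 7 + size]
    if end != FRAME_END:
        raise ValueError(
            "Framing Error, received 0x%02x while expecting 0xce" % end)
    return ftype, channel, payload, offset + 8 + size

# A well-formed (heartbeat-style) frame parses cleanly:
good = struct.pack(">BHI", 8, 0, 0) + bytes([FRAME_END])
ftype, channel, payload, _ = read_frame(good)

# A desynchronized stream surfaces as the error seen in the traceback:
try:
    read_frame(struct.pack(">BHI", 8, 0, 0) + b"\x00")
except ValueError as exc:
    print(exc)
```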

ask commented Aug 20, 2012

@steeve: I start the worker with:

$ celery worker -l info -b amqplib://

and then:

$ celery shell
>>>  _=[accept_anything.delay('A' * 1024) for i in xrange(50)]

and I have no issues.

I can even do:

>>>  _=[accept_anything.delay('A' * 1024) for i in xrange(10000)]

So there must be something in your environment causing this (a setting, or something else).


steeve commented Aug 21, 2012

@ask this doesn't happen on localhost; you have to run the broker on another machine for it to happen.

steeve referenced this issue Aug 21, 2012


Framing Error #922


steeve commented Aug 24, 2012

Okay, I've done some more investigating:

from celery import Celery

celery = Celery()

@celery.task
def big(data):
    return len(data)

In the shell:

$ celery shell -b amqplib://guest:guest@ -A tasks
In [1]: big.delay("A" * 1024 * 1000)
Out[1]: <AsyncResult: 39c5c9a3-d094-4665-a888-c2e906bf8ccb>

This works:

$ celery worker -b amqplib://guest:guest@ -A tasks -l info -P gevent

This doesn't:

$ celery worker -b amqplib://guest:guest@ -A tasks -l info -P solo

It doesn't work with any pool other than gevent, or if celery.conf.CELERYD_POOL is set after app creation, even with gevent.


ask commented Aug 29, 2012

@steeve Your amqplib related issue has been fixed in kombu 2.4.4 (or amqp 0.9.3 if pyamqp:// is used)

ask closed this Aug 29, 2012


steeve commented Aug 30, 2012

Yep, I can confirm it's fixed! Great work @ask!
