Crashes with broken pipe #16

Closed
spulec opened this Issue · 16 comments

8 participants

@spulec

This makes our workers crash/hang every hour or so.

Traceback (most recent call last):
File "../lib/python2.7/site-packages/billiard/process.py", line 273, in _bootstrap
self.run()
File "../lib/python2.7/site-packages/billiard/process.py", line 122, in run
self._target(*self._args, **self._kwargs)
File "../lib/python2.7/site-packages/billiard/pool.py", line 302, in worker
put((ACK, (job, i, time.time(), pid)))
File "../lib/python2.7/site-packages/billiard/queues.py", line 377, in put
return send(obj)
IOError: [Errno 32] Broken pipe

celery==3.0.7
django-celery==3.0.4
kombu==2.4.3
billiard==2.7.3.12

We're using the SQS backend.

@mitar

We are observing similar problems on:

software -> celery:3.0.10 (Chiastic Slide) kombu:2.4.7 py:2.7.3
            billiard:2.7.3.15 pymongo:2.3
platform -> system:Linux arch:64bit, ELF imp:CPython
loader   -> djcelery.loaders.DjangoLoader
settings -> transport:mongodb results:mongodb

When running:

python manage.py celery worker --loglevel=info --concurrency=4 --maxtasksperchild=10

But it does not happen when run as:

python manage.py celery worker --loglevel=info --concurrency=1 --maxtasksperchild=10

or:

python manage.py celery worker --loglevel=info --concurrency=4 --maxtasksperchild=1
@mitar

OK, it seems that with --concurrency=4 --maxtasksperchild=10 errors are more common, but they also happen with --concurrency=4 --maxtasksperchild=1.

@9thbit

I'm also seeing this error with the latest stable release: celery crashes after a few hours of long-running tasks. It seems to happen after an hour-long job has timed out.

[2012-10-22 03:22:13,673: WARNING/PoolWorker-1] Process PoolWorker-1:
[2012-10-22 03:22:13,674: WARNING/PoolWorker-1] Traceback (most recent call last):
[2012-10-22 03:22:13,675: WARNING/PoolWorker-1] File "/home/bhurley/.virtualenvs/celery/lib/python2.6/site-packages/billiard-2.7.3.17-py2.6-linux-x86_64.egg/billiard/process.py", line 248, in _bootstrap
[2012-10-22 03:22:13,676: WARNING/PoolWorker-1] self.run()
[2012-10-22 03:22:13,677: WARNING/PoolWorker-1] File "/home/bhurley/.virtualenvs/celery/lib/python2.6/site-packages/billiard-2.7.3.17-py2.6-linux-x86_64.egg/billiard/process.py", line 97, in run
[2012-10-22 03:22:13,678: WARNING/PoolWorker-1] self._target(*self._args, **self._kwargs)
[2012-10-22 03:22:13,678: WARNING/PoolWorker-1] File "/home/bhurley/.virtualenvs/celery/lib/python2.6/site-packages/billiard-2.7.3.17-py2.6-linux-x86_64.egg/billiard/pool.py", line 308, in worker
[2012-10-22 03:22:13,679: WARNING/PoolWorker-1] put((READY, (job, i, (False, einfo))))
[2012-10-22 03:22:13,679: WARNING/PoolWorker-1] File "/home/bhurley/.virtualenvs/celery/lib/python2.6/site-packages/billiard-2.7.3.17-py2.6-linux-x86_64.egg/billiard/queues.py", line 352, in put
[2012-10-22 03:22:13,680: WARNING/PoolWorker-1] return send(obj)
[2012-10-22 03:22:13,680: WARNING/PoolWorker-1] IOError: [Errno 32] Broken pipe

software -> celery:3.0.11 (Chiastic Slide) kombu:2.4.7 py:2.6.6
            billiard:2.7.3.17 amqplib:N/A
platform -> system:Linux arch:64bit, ELF imp:CPython
loader   -> celery.loaders.default.Loader
settings -> transport:amqp results:disabled

BROKER_URL: 'amqp://something/myvhost'
CELERYD_CONCURRENCY: 6
CELERY_IMPORTS: ('celery_tasks',)
CELERYD_PREFETCH_MULTIPLIER: 1

@ask
Owner

@9thbit How did it time out? And do you think it's relevant?

@9thbit

@ask I was just monitoring the tasks with top, and going by how long the job had been running, it seemed to crash after one of the tasks exceeded time_limit.

I have added CELERYD_FORCE_EXECV = True to my config and will report back on whether that resolves the issue. Also, if it is related to using Python 2.6 -- which is what the cluster I'm using runs -- I can try 2.7.
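
(For reference, a minimal sketch of where that setting lives, assuming a Django settings module picked up by django-celery; the max-tasks value is illustrative, not taken from this report:)

# settings.py -- Celery 3.0-era setting names
CELERYD_FORCE_EXECV = True           # fork+exec pool workers instead of plain fork
CELERYD_MAX_TASKS_PER_CHILD = 10     # recycle workers periodically; illustrative value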

@mitar

I am using 2.7 and I am getting this.

@spulec

Python 2.7 here with CELERYD_FORCE_EXECV = True.

@davepeck

Seeing this too with --concurrency=4 and kombu. Ugh.

@noirbizarre

Same issue for us.
It's not stable at all in production!

@sylvinus

Same issue here.

@ask
Owner

Could you please include the version you are using?

Also, what broker are you using? (kombu is not a transport, it's the messaging framework we use.)

The latest celery version disables force_execv by default; that could be the culprit.

@ask
Owner

If you're using RabbitMQ/redis, could you please try running with CELERY_DISABLE_RATE_LIMITS=True ?
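
(A minimal sketch of that suggestion, assuming a plain celeryconfig.py module; the broker URL is a placeholder, not taken from any report above:)

# celeryconfig.py -- trying the suggested workaround
BROKER_URL = 'redis://localhost:6379/0'   # placeholder broker URL; use your own
CELERY_DISABLE_RATE_LIMITS = True         # bypass the rate-limit machinery entirely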

@jdp

I'm also seeing this crash. I tried the CELERY_DISABLE_RATE_LIMITS=True suggestion, but it didn't fix the error.

Unrecoverable error: IOError(32, 'Broken pipe')

Stacktrace (most recent call last):

  File "celery/worker/__init__.py", line 351, in start
    component.start()
  File "celery/worker/consumer.py", line 393, in start
    self.consume_messages()
  File "celery/worker/consumer.py", line 483, in consume_messages
    handlermap[fileno](fileno, event)
  File "billiard/pool.py", line 1039, in maintain_pool
    self._maintain_pool()
  File "billiard/pool.py", line 1034, in _maintain_pool
    self._repopulate_pool(self._join_exited_workers())
  File "billiard/pool.py", line 1020, in _repopulate_pool
    self._create_worker_process(self._avail_index())
  File "billiard/pool.py", line 904, in _create_worker_process
    w.start()
  File "billiard/process.py", line 120, in start
    self._popen = Popen(self)
  File "billiard/forking.py", line 192, in __init__
    code = process_obj._bootstrap()
  File "billiard/process.py", line 276, in _bootstrap
    sys.stdout.flush()

Here's our sys.argv:

['manage.py', 'celeryd', '--loglevel=INFO', '-c', '4', '--queues=celery', '--maxtasksperchild=100']
@ask
Owner

@jdp What celery/billiard versions are these? Please include the output of celery report (but make sure to remove sensitive information).

@ask ask closed this
@ask ask reopened this
@ask
Owner

If this happens while flushing stdout, it may be worth trying to simply ignore the error.
The process does not have a stdout at this point anyway.
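
(A sketch of what ignoring the error could look like around the flush call; this is not billiard's actual fix, just an illustration of swallowing EPIPE on a stream whose reader has already gone away:)

import errno
import sys

def flush_ignoring_broken_pipe(stream=sys.stdout):
    """Flush a stream, ignoring a broken pipe from a reader that has exited."""
    try:
        stream.flush()
    except IOError as exc:
        if exc.errno != errno.EPIPE:
            raise  # only swallow broken-pipe errors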

@ask
Owner

Closing this issue now; please open a new issue if this happens again. Chances are it's not the same issue as the previous one.

@ask ask closed this