
dockerized worker with redis gives broken pipe at every startup #5272

Closed
WisdomPill opened this issue Jan 8, 2019 · 11 comments

Comments

@WisdomPill

WisdomPill commented Jan 8, 2019

The health check fails after a few tries, in about 5 minutes, and when I ping all the workers using

celery inspect ping -A project

I don't find the worker, though I do find the other ones.
I have three dockerized workers on Amazon ECS. Two of them have light work to do and don't depend on a VPN, so they run on Fargate; the one that gives me problems runs on EC2 in host networking mode, on a server with a VPN. The other two workers don't have the same startup problem.
This worker has a beat schedule that makes it update some sensor statuses every minute. It uses group calls to do the work in parallel, roughly as in the sketch below.
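
For reference, the group call looks roughly like this (a minimal sketch; the task name is taken from the start log below, but the helper and arguments are hypothetical stand-ins, not the real code):

from celery import group

from sensor.tasks.statuses import update_sensor_status  # task name as it appears in the start log

def update_statuses():
    # fan out one update task per sensor and run them in parallel
    sensor_ids = get_sensor_ids()  # hypothetical helper returning the sensors to refresh
    group([update_sensor_status.s(sensor_id) for sensor_id in sensor_ids]).apply_async()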

Report with part of the Django settings

software -> celery:4.2.1 (windowlicker) kombu:4.2.1 py:3.6.8
            billiard:3.5.0.4 redis:2.10.6
platform -> system:Linux arch:64bit imp:CPython
loader   -> celery.loaders.app.AppLoader
settings -> transport:redis results:disabled

CELERY_ACCEPT_CONTENT: ['application/json']
CELERY_BEAT_SCHEDULE: {
    'curator_indexes_migration': {   'schedule': <crontab: 0 4 * * * (m/h/d/dM/MY)>,
                                     'task': 'sensor.tasks.statuses.curator_indexes_migration'},
    'fetch_all_configurations_backup': {   'schedule': <crontab: 0 9 * * * (m/h/d/dM/MY)>,
                                           'task': 'sensor.tasks.statuses.fetch_all_configurations_backup'},
    'update_images': {   'schedule': <crontab: 0 * * * * (m/h/d/dM/MY)>,
                         'task': 'sensor.tasks.statuses.update_images'},
    'update_statuses': {   'schedule': <crontab: * * * * * (m/h/d/dM/MY)>,
                           'task': 'sensor.tasks.statuses.update_statuses'}}
CELERY_BROKER_URL: 'redis://safety-staging.h1q2ld.0001.euw1.cache.amazonaws.com:6379/1'
CELERY_RESULT_BACKEND: None
CELERY_RESULT_SERIALIZER: 'json'
CELERY_TASK_ROUTES: {
    'notification.tasks.*': {'queue': 'notification'},
    'sensor.tasks.*': {'queue': 'sensor'}}
CELERY_TASK_SERIALIZER: 'json'
CELERY_TIMEZONE: 'Europe/Rome'

Part of requirements.txt concerning Celery and Redis

aioredis==1.2.0
celery==4.2.1
Django==1.11
hiredis==0.2.0
kombu==4.2.1
redis==2.10.6

Start script

#!/usr/bin/env bash
# generate a random 32-character suffix so each container gets a unique node name
CELERY_DOMAIN=$(tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 32 | head -n 1)
# persist it so the health check script can address this exact worker
echo "${CELERY_DOMAIN}" > /home/user/celery_domain.txt
printf "Celery domain is %s\n" "$CELERY_DOMAIN"
celery worker -A centroservizi -l INFO --autoscale=4,2 -Q sensor -n "sensor@${CELERY_DOMAIN}"

Health check script

#!/usr/bin/env bash
# read back the node name written by the start script
CELERY_DOMAIN=$(cat /home/user/celery_domain.txt)
printf "Celery domain is %s\n" "$CELERY_DOMAIN"
# ping only this worker, by its node name
celery inspect ping -A centroservizi -d "sensor@${CELERY_DOMAIN}"
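
The same ping can also be done from Python (a sketch; the import path is a guess based on the -A centroservizi argument above):

# hypothetical import path; adjust to wherever the Celery app object lives
from centroservizi.celery import app

domain = open('/home/user/celery_domain.txt').read().strip()
replies = app.control.ping(destination=['sensor@' + domain], timeout=5.0)
print(replies)  # e.g. [{'sensor@...': {'ok': 'pong'}}] when the worker is reachable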

Start log

Celery domain is G9ehGHbxIlsYFPb1JeW6Pdlar0gMIUue
[2019-01-08 07:14:57,184: INFO/MainProcess] Connected to redis://<elastic cache>.0001.euw1.cache.amazonaws.com:6379/1
[2019-01-08 07:14:57,200: INFO/MainProcess] mingle: searching for neighbors
[2019-01-08 07:14:58,306: INFO/MainProcess] mingle: sync with 2 nodes
[2019-01-08 07:14:58,307: INFO/MainProcess] mingle: sync complete
[2019-01-08 07:14:58,332: INFO/MainProcess] sensor@G9ehGHbxIlsYFPb1JeW6Pdlar0gMIUue ready.
[2019-01-08 07:14:59,026: INFO/MainProcess] Received task: sensor.tasks.statuses.update_sensor_status[2fa6e725-0a24-42c9-9171-572d7db3d31e]
[2019-01-08 07:14:59,033: INFO/MainProcess] Received task: sensor.tasks.statuses.update_sensor_status[296aaeb7-926a-4d1b-b62d-cec1783dbc70]
[2019-01-08 07:14:59,036: INFO/MainProcess] Received task: sensor.tasks.statuses.update_sensor_status[43cd23b1-5304-41df-ac82-1168cf53da60]
[2019-01-08 07:14:59,036: INFO/MainProcess] Scaling up 1 processes.
[2019-01-08 07:14:59,204: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 590, in send_packed_command
    self._sock.sendall(item)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 317, in start
    blueprint.start(self)
  File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 593, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python3.6/site-packages/celery/worker/loops.py", line 91, in asynloop
    next(loop)
  File "/usr/local/lib/python3.6/site-packages/kombu/asynchronous/hub.py", line 276, in create_loop
    tick_callback()
  File "/usr/local/lib/python3.6/site-packages/kombu/transport/redis.py", line 1033, in on_poll_start
    cycle_poll_start()
  File "/usr/local/lib/python3.6/site-packages/kombu/transport/redis.py", line 315, in on_poll_start
    self._register_BRPOP(channel)
  File "/usr/local/lib/python3.6/site-packages/kombu/transport/redis.py", line 301, in _register_BRPOP
    channel._brpop_start()
  File "/usr/local/lib/python3.6/site-packages/kombu/transport/redis.py", line 707, in _brpop_start
    self.client.connection.send_command('BRPOP', *keys)
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 610, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 603, in send_packed_command
    (errno, errmsg))
redis.exceptions.ConnectionError: Error 32 while writing to socket. Broken pipe.

If I can give you any other info on the issue, just ask.

@WisdomPill
Author

I changed the worker start script: I'm not using autoscale anymore, I'm using the concurrency parameter directly.

#!/usr/bin/env bash
CELERY_DOMAIN=$(tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 32 | head -n 1)
echo "${CELERY_DOMAIN}" > /home/user/celery_domain.txt
printf "Celery domain is %s\n" "$CELERY_DOMAIN"
celery worker -A centroservizi -l INFO --concurrency=4 -Q sensor -n "sensor@${CELERY_DOMAIN}"

@WisdomPill
Author

WisdomPill commented Jan 8, 2019

Found a related issue in kombu: https://github.com/celery/kombu/issues/681

@thedrow
Member

thedrow commented Jan 8, 2019

Could you please attach the Redis log from when the connection is severed? It could be that we're sending a large amount of data; see redis/redis-py#986, redis/redis#4888 and https://stackoverflow.com/questions/43204496/broken-pipe-error-redis.
It could be that you are creating a canvas large enough to generate a payload that Redis rejects (a sketch for checking the relevant buffer limits follows below).
Could this be a duplicate of celery/kombu#681?
Have you tried using RabbitMQ to see if this is Redis specific?
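
If you have access to the broker, you could check the configured output buffer limits, which are one common cause of Redis dropping a client mid-write (a sketch; note that ElastiCache does not expose CONFIG, so this only works against a self-managed Redis):

import redis

r = redis.StrictRedis(host='localhost', port=6379, db=1)  # point this at your broker
# clients that exceed these limits are disconnected by the server,
# which shows up on the client side as a broken pipe
print(r.config_get('client-output-buffer-limit'))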

@thedrow
Member

thedrow commented Jan 8, 2019

How is this related to #4363?

@WisdomPill
Author

Sorry, my bad, I confused the issue subjects. I will remove the comment; does that also remove the reference on GitHub?

@WisdomPill
Author

@thedrow no, I haven't tried it with RabbitMQ yet. I use ElastiCache on AWS, so I can't get the Redis logs. I will try to reproduce it locally, maybe with the same data in order to have the same load.

@WisdomPill
Author

I'm quite sure it's not a size issue: all the parameters that I pass are either booleans, strings, numbers or really small dictionaries (one key, sometimes two). A rough size check is below.
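
A rough way to back that up (a sketch: it builds the same shape of canvas as the real code, with stand-in arguments, so it only approximates what kombu actually puts on the wire):

import json

from celery import group
from sensor.tasks.statuses import update_sensor_status  # task name from the start log

# hypothetical: same shape of group as the real code, with stand-in sensor ids
job = group([update_sensor_status.s(sensor_id) for sensor_id in range(100)])
print(len(json.dumps(job)), 'bytes')  # a small number here argues against the size hypothesis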

@thedrow
Member

thedrow commented Jan 8, 2019

Sorry, my bad, I confused the issue subjects. I will remove the comment; does that also remove the reference on GitHub?

Yes.

@WisdomPill
Author

Seems like with the new Celery version, 4.3.0rc2, this does not happen anymore.

I will confirm that in the following days, when I try autoscale in production.

@WisdomPill
Author

I confirm that I don't see that error message anymore; you can close it. @thedrow, thanks for fixing it.

@thedrow thedrow closed this as completed Mar 8, 2019
@thedrow thedrow added this to the v4.3 milestone Mar 8, 2019
@thedrow
Member

thedrow commented Mar 8, 2019

Sure, no problem.
