Celery: worker connection timed out #2991
Comments
Are there no remote control related errors above in the log, before this happens? socket.timeout should be ignored in this code, so is this exception different from |
Are you getting this when you run more tasks than the concurrency level allows?
In the RabbitMQ log on my controller, when the worker hits the error, I get a log like:
After that log entry, the worker gets the "[Errno 110] Connection timed out" error, and when it tries to reconnect to the broker it times out and gets "[Errno 104] Connection reset by peer" errors until I restart the worker. Sometimes my RabbitMQ log also shows:
In Celery, the worker is configured with 100 coroutines, and the controller sends about 100 tasks, but each task has some subtasks that do my jobs.
Another problem is that sometimes the worker doesn't get a timeout error, but its connection to the controller is still abnormal. I can't reach the worker with celery inspect ping. The worker can receive jobs, but it doesn't report success back to the controller. In the controller's RabbitMQ log: closing AMQP connection <0.17632.37> (workerip:42754 -> controllerip:5672):
I was having similar issues using rabbitmq cluster behind an Elastic Load Balancer. What solved the issue for me was setting |
Closing this, as we don't have the resources to complete this task. It may be fixed in master; let's see if it comes back after the 4.0 release.
I am using Celery version: |
Makes me #wannacry:
=INFO REPORT==== 20-Jul-2017::12:40:55 ===
=ERROR REPORT==== 20-Jul-2017::12:41:05 ===
=INFO REPORT==== 20-Jul-2017::12:41:07 ===
=ERROR REPORT==== 20-Jul-2017::12:41:17 ===
So, I know exactly how to fix this. I unplugged my Ethernet cable and it worked -_-
@vivekanand1101 You are a life saver. Any ideas why this is happening? I'm on v4.1.0 |
@aayushgoel92 nope :( |
Could you check the master branch with the latest versions of the dependencies?
Fixed on Celery 4.2.0. See: celery/celery#3649 See: celery/celery#2991 See: Polyconseil/aioamqp#96
@alanjds is this still an issue on 4.2? should we close this? |
I don't remember having seen this before, @auvipy. It has been waiting for feedback for 8 months. What about pinging @aayushgoel92 and then closing it? :/
sure! |
Yes, this is still an issue. I am using 4.2.0 with a RabbitMQ 3.6 Docker image. If you need more information, please let me know. If you can tell me which parts of the codebase I should read to fix this, I'll be glad to help. This issue has also popped up recently in #4980.
@vsag96 Could you please install all the packages from GitHub master and check?
@auvipy I have also observed this recently using master with rabbitmq 3.6: I also increased the handshake_timeout to 60s from 10 but that did not help. |
I am also facing this issue with v4.3. Please find the trace below.
Traceback (most recent call last):
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
Setting |
thanks for verifying! |
@auvipy That's a workaround, not a fix. |
This still persists for me. We run on Kubernetes, and when this occurs our task revokes stop working: the revokes never reach that worker, apparently because its *.pidbox queue is missing (I couldn't find it in our RabbitMQ instance).
Could it be that for some reason it's a different exception than the one at celery/celery/worker/pidbox.py, line 119 in 7dfc1fd?
can you open a PR and investigate? |
I'm using amqp (1.4.6), celery (3.1.14) and kombu (3.0.22). Recently, I set up a worker that connects to rabbitmq-server (the broker) over the internet. Sometimes the worker gets an "[Errno 110] Connection timed out" error. The following is the worker's log:
After this error message, when the worker reconnects to the broker, we get another error message:
consumer:
Cannot connect to amqp://guest:**@***:5672//: [Errno 104] Connection reset by peer
Then, after a long time, the worker can sometimes reconnect to the broker. But when I assign jobs to the worker, it doesn't run normally: it receives the jobs but doesn't return success to the broker.
Here are my Celery settings:
Are there any settings that I need to change?
And if the timeout occurs, is there a way to detect that the worker is in an abnormal state and restart it automatically?
Thanks
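On the last question, here is a minimal sketch of one way to notice a worker that has stopped answering remote control commands. It assumes the Celery `app.control.ping()` broadcast, which returns a list of one-entry dicts such as `{'celery@host': {'ok': 'pong'}}`. The helper names and the worker names are hypothetical, and the restart itself is left to whatever supervises the worker process. It is not a fix for the underlying connection problem, only a way to detect the symptom described in this thread.

```python
from typing import Dict, Iterable, List, Optional, Set


def unresponsive_workers(
    expected: Iterable[str],
    replies: Optional[List[Dict[str, dict]]],
) -> Set[str]:
    """Return the expected worker names that did not answer the ping.

    `replies` has the shape returned by `app.control.ping()`:
    a list of one-entry dicts, e.g. [{'celery@w1': {'ok': 'pong'}}].
    """
    answered = set()
    for reply in replies or []:
        for name, payload in reply.items():
            if isinstance(payload, dict) and payload.get("ok") == "pong":
                answered.add(name)
    return set(expected) - answered


def check_workers(app, expected: Iterable[str], timeout: float = 5.0) -> Set[str]:
    """Ping all workers through `app` and return the ones that stayed silent.

    `app` is assumed to be a configured celery.Celery instance; the ping
    waits up to `timeout` seconds for replies.
    """
    replies = app.control.ping(timeout=timeout)
    return unresponsive_workers(expected, replies)
```

A periodic job (cron, a Kubernetes liveness probe, or similar) could call `check_workers(app, ["celery@worker1"])` and trigger a restart of any worker it returns. A short timeout may produce false positives on a slow link, so the value is something to tune.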