Celery Worker crashing after first task with TypeError: 'NoneType' object is not callable #3620
Comments
The error is repeating in the log because the Celery worker daemon is crashing, so systemd restarts it. |
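For context, a unit along these lines will produce exactly that pattern in the journal: the worker crashes, systemd restarts it, and the same traceback appears again. This is only an illustrative sketch; the paths, project name, and timings are assumptions, not taken from the reporter's setup.

```ini
[Unit]
Description=Celery worker (illustrative unit, not the reporter's actual file)

[Service]
# The worker crashes shortly after accepting a task...
ExecStart=/usr/local/bin/celery -A proj worker --loglevel=INFO
# ...and systemd immediately brings it back up, so the traceback repeats.
Restart=always
RestartSec=10
```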
@ask, |
When I change billiard/pool.py#L1483 to |
More verbose output with logging level DEBUG:
|
This was introduced with Celery 4.x because downgrading to |
Doesn't happen to me here on Linux Python 3.4. What arguments do you use to start the worker? |
_quick_put should never be None btw. Does this happen at startup or always after a connection failure? |
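For anyone puzzled by the message itself: the traceback is what you get when an attribute that is still None is called as a function. A toy illustration of the failure mode around `_quick_put` (this is not Celery's or billiard's actual code):

```python
class FakePool:
    """Toy stand-in for the async pool; not Celery/billiard code."""

    def __init__(self):
        # In the real pool this callable is installed later, once the event
        # loop has registered its write handlers.
        self._quick_put = None

    def apply_async(self, job):
        # A job arriving before the handler is installed calls None:
        self._quick_put(job)


try:
    FakePool().apply_async(object())
except TypeError as exc:
    print(exc)  # 'NoneType' object is not callable
```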
I've been trying to reproduce by stopping the broker while executing tasks, and still no luck at reproducing. |
Always at startup. The worker arguments are:
|
👍 for this. Getting the very same issue, even on 4.0.1 |
@ask I can reproduce it every time there are messages waiting on the broker when the worker comes up. This is often the case when using beat, which is my case: if the beat service comes online before the worker, you won't be able to start the worker due to the issue mentioned above. I'm using Python 2.7, for what it's worth, and am able to reproduce it consistently. This is the same error as the one mentioned in #3539 |
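A reproduction sketch based on that description (the project name and timing are assumptions): let beat come up first and enqueue some scheduled tasks, then start the worker against the already-populated queue.

```bash
# Assumed project name "proj"; any beat schedule that fires quickly will do.
celery -A proj beat --loglevel=INFO &   # beat starts first and queues tasks
sleep 60                                # let a few scheduled messages pile up
celery -A proj worker --loglevel=INFO   # worker then crashes right at startup
```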
@jmesquita that's consistent with my scenario, since my queues always have pending messages on the broker when starting the workers. |
@alanhamlett I'm trying to get this fixed and reading the code, but I'm new to Celery so it might take me some time. What is strange to me is that with so many people using Celery, and messages being queued to workers by default, this has not exploded within the community. Makes me wonder if I'm misusing it somehow. |
I dug into the code a bit. This would explain why the problem does not occur when there are no messages in the queue when the event loop starts up. It seems I could fix this by moving
before |
So I worked around my problem by making celery beat messages transient, which is actually my intended behaviour anyway. I'll revisit this as soon as I have a bit more experience with Celery and its codebase. |
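For reference, a minimal sketch of one way to make task messages transient in Celery 4 (the setting names exist in Celery 4/kombu; the app, broker, and queue names here are placeholders, and this is not necessarily what the commenter did):

```python
from celery import Celery
from kombu import Exchange, Queue

app = Celery('proj', broker='amqp://')  # placeholder app and broker

# Publish task messages as non-persistent, so they are not written to disk
# and do not survive a broker restart.
app.conf.task_default_delivery_mode = 'transient'

# Per-queue variant: declare the queue itself as non-durable.
app.conf.task_queues = [
    Queue('celery', Exchange('celery'), routing_key='celery', durable=False),
]
```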
That prevents the error from occurring. Thank you for the suggestion. |
I can only reproduce this bug if I am attempting to consume from multiple queues. If everything is consumed from a single queue then start up works as expected (messages on the queue are properly consumed). @adewes I tested your proposed solution and at least on the surface it seems to solve the problem. |
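To make the distinction concrete, the two invocations being compared look roughly like this (queue and project names are made up):

```bash
# Crashes at startup when there are pending messages and more than one queue:
celery -A proj worker --loglevel=INFO -Q default,emails

# Starts and consumes pending messages as expected with a single queue:
celery -A proj worker --loglevel=INFO -Q default
```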
@adewes Can you issue a pull request so we can discuss your proposed change? |
Are there any updates on this issue? This is causing us major problems now, including in production. I can't even test locally because I get the |
I was not able to resolve it so far and have also downgraded to version 3 for now; I hope the problem will be fixed soon. @thedrow my "quick fix" did not yield a complete resolution of the problem, so I'm not opening a pull request. I'm not fully versed in the dataflow of the components involved (there are several libraries in play here), so unfortunately I'm not able to debug this further right now. |
I'm actually not even sure we can downgrade because it's possible we might be relying on the new usage of message headers in the v2 task design. @ask--I'm happy to screenshare or whatever with you so you can see my exact environment to help debug, maybe even try opening up a remote debug if we need to. We're in a bit of a bind because we went all in on Celery 4 and now can't start our workers in production. |
For now, you can install my fork to get things running in your prod environment:
I just opened #3752 to fix this, but need to figure out a good test to cover the bug first. |
Thanks so much. Starting to think I'm going crazy... I tried upgrading to 4.0.2 (currently on 4.0.0) and with that upgrade, all of a sudden |
Specifying queues explicitly in the CLI looks like a workaround (on 4.0.0). |
We do specify the queues explicitly in the CLI; we still had the issue. |
@jdotjdot - I am seeing the same thing. Specifying queues on the command line. |
+1 Having this exact issue. The only solution seems to be resetting my broker (which is not good for a queue). |
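For anyone else weighing that option with RabbitMQ (the broker mentioned elsewhere in this thread), "resetting the broker" typically means wiping its state, which loses every queue and all pending messages; that is exactly why it's a poor fix for a queueing system.

```bash
# Destructive: this drops all queues, exchanges, and undelivered messages.
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app
```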
With the ephemeral nature of workers being so fundamental to Celery, I can’t imagine there are too many other issues that deserve a higher priority than this one. This is the only thing preventing us from going to Prod. Does anybody out there have an estimate on when it will be addressed? |
@johnatron I thought the same thing, but ran into multiple other newly introduced bugs in Prod. Had to downgrade, which is difficult because the messaging spec is not compatible between 3.x and 4.x. Also made me look at alternatives to Celery, like tasktiger. Be careful with Celery 4.x.x in Prod. |
The message spec is cross-compatible with the latest version of 3.x. I will admit I'm pretty astonished this hasn't been addressed. I'm using Alan's fork right now in production. |
In light of this unfortunate delay, I decided to merge some fixes for known Celery 4 issues together in
Working fine in production so far. |
The message spec may be cross-compatible, but there are (subtle) incompatible differences between the versions when it comes to Celery Canvas and chaining (a feature we use a lot). A considerable amount of effort went into porting our app from v3 to v4. Having to go back would be a bad time. |
O frabjous day! Thank you. |
I have this same problem using Python 3.6 and Celery 4.0.2. Shut down Celery, create a task, start up Celery, and immediately get the error. Thank you for your time, @ask! |
"Microsoft Windows is no longer supported. The test suite is passing, and Celery seems to be working with Windows, but we make no guarantees as we are unable to diagnose issues on this platform. If you are a company requiring support on this platform, please get in touch." |
@tiptoettt this is not an issue specific to Windows; I'm on Mac. Can you please look closely at everyone's comments? Developers will ditch Celery because of this. I've used Celery for 5 years and this is a major issue. I am going to use the earlier version 3.1.25 if it doesn't have this issue. |
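Pinning to that last 3.x release is straightforward if you go that route (just be aware of the 3.x/4.x protocol and feature differences discussed above):

```bash
pip install 'celery==3.1.25'
```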
Also having this issue
This is the line of code that seems to cause the issue:
Removing the second queue causes the issue to go away. |
From what I understand from @ChillarAnand 's post in celery/kombu/issues/675, this issue should be solved by 4.0.3, right? |
I haven't seen this issue since I started building from the master branch. |
Thanks, it worked! |
I had the same problem on Celery 3.1.23 (Cipater) using RabbitMQ. After long debugging, I finally found this thread and fixed the problem by using version 3.1.25. But it was like banging my head against a wall, really. |
Same issue on |
Same issue for me on v4.0.2 |
@ba1dr build from master. That fixes it for me. |
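If you want to try the same thing, installing straight from the Git master branch looks roughly like this (in production you would normally pin to a specific commit rather than a moving branch):

```bash
pip install -U git+https://github.com/celery/celery.git@master
```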
@LPiner, thanks, this fixed the issue for me. But this is not a production-ready solution, eh? |
@ba1dr not much of a choice TBH. It's either this or downgrade back to 3.x. |
We're gonna release soon. See #4109 |
Affected here as well. Just hit us. Ready for the release! |
We're running into celery/celery#3620.
Checklist
- I have included the output of celery -A proj report in the issue (if you are not able to do this, then at least specify the Celery version affected).
- I have verified that the issue exists against the master branch of Celery. Yes, I've tested and it behaves the same using master.
Steps to reproduce
Not exactly sure, because other machines with the same specs and requirements are working.
Expected behavior
Should consume tasks.
Actual behavior
A task is accepted, then a traceback is logged, then the worker reconnects to the broker for some reason. This repeats forever:
The above lines keep repeating every few seconds, and no tasks are consumed from the queue.