Celery Worker crashing after first task with TypeError: 'NoneType' object is not callable #3620

Closed
2 tasks done
alanhamlett opened this issue Nov 24, 2016 · 54 comments

Comments

@alanhamlett
Contributor

Checklist

  • I have included the output of celery -A proj report in the issue.
    (if you are not able to do this, then at least specify the Celery
    version affected).
software -> celery:4.0.0 (latentcall) kombu:4.0.0 py:3.4.3
            billiard:3.5.0.2 py-amqp:2.1.1
platform -> system:Linux arch:64bit, ELF imp:CPython
loader   -> celery.loaders.default.Loader
settings -> transport:amqp results:disabled
  • I have verified that the issue exists against the master branch of Celery.
    Yes I've tested and it behaves the same using master.

Steps to reproduce

Not sure exactly; other machines with the same specs and requirements work fine.

Expected behavior

Should consume tasks.

Actual behavior

A task is accepted, then a traceback is logged, then the worker reconnects to the broker for some reason. This repeats forever:

[2016-11-23 23:09:00,468: INFO/MainProcess] Connected to amqp://user:**@10.136.131.6:5672//
[2016-11-23 23:09:00,484: INFO/MainProcess] mingle: searching for neighbors
[2016-11-23 23:09:01,921: INFO/MainProcess] mingle: sync with 1 nodes
[2016-11-23 23:09:01,922: INFO/MainProcess] mingle: sync complete
[2016-11-23 23:09:01,970: INFO/MainProcess] Received task: tasks.calculate_user_running_total[ddd103af-d527-4564-83f8-96b747767a0c]
[2016-11-23 23:09:01,972: CRITICAL/MainProcess] Unrecoverable error: TypeError("'NoneType' object is not callable",)
Traceback (most recent call last):
  File "./venv/lib/python3.4/site-packages/celery/worker/worker.py", line 203, in start
    self.blueprint.start(self)
  File "./venv/lib/python3.4/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "./venv/lib/python3.4/site-packages/celery/bootsteps.py", line 370, in start
    return self.obj.start()
  File "./venv/lib/python3.4/site-packages/celery/worker/consumer/consumer.py", line 318, in start
    blueprint.start(self)
  File "./venv/lib/python3.4/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "./venv/lib/python3.4/site-packages/celery/worker/consumer/consumer.py", line 584, in start
    c.loop(*c.loop_args())
  File "./venv/lib/python3.4/site-packages/celery/worker/loops.py", line 47, in asynloop
    consumer.consume()
  File "./venv/lib/python3.4/site-packages/kombu/messaging.py", line 470, in consume
    self._basic_consume(T, no_ack=no_ack, nowait=False)
  File "./venv/lib/python3.4/site-packages/kombu/messaging.py", line 591, in _basic_consume
    no_ack=no_ack, nowait=nowait)
  File "./venv/lib/python3.4/site-packages/kombu/entity.py", line 737, in consume
    arguments=self.consumer_arguments)
  File "./venv/lib/python3.4/site-packages/amqp/channel.py", line 1578, in basic_consume
    wait=None if nowait else spec.Basic.ConsumeOk,
  File "./venv/lib/python3.4/site-packages/amqp/abstract_channel.py", line 73, in send_method
    return self.wait(wait, returns_tuple=returns_tuple)
  File "./venv/lib/python3.4/site-packages/amqp/abstract_channel.py", line 93, in wait
    self.connection.drain_events(timeout=timeout)
  File "./venv/lib/python3.4/site-packages/amqp/connection.py", line 464, in drain_events
    return self.blocking_read(timeout)
  File "./venv/lib/python3.4/site-packages/amqp/connection.py", line 469, in blocking_read
    return self.on_inbound_frame(frame)
  File "./venv/lib/python3.4/site-packages/amqp/method_framing.py", line 88, in on_frame
    callback(channel, msg.frame_method, msg.frame_args, msg)
  File "./venv/lib/python3.4/site-packages/amqp/connection.py", line 473, in on_inbound_method
    method_sig, payload, content,
  File "./venv/lib/python3.4/site-packages/amqp/abstract_channel.py", line 142, in dispatch_method
    listener(*args)
  File "./venv/lib/python3.4/site-packages/amqp/channel.py", line 1613, in _on_basic_deliver
    fun(msg)
  File "./venv/lib/python3.4/site-packages/kombu/messaging.py", line 617, in _receive_callback
    return on_m(message) if on_m else self.receive(decoded, message)
  File "./venv/lib/python3.4/site-packages/celery/worker/consumer/consumer.py", line 558, in on_task_received
    callbacks,
  File "./venv/lib/python3.4/site-packages/celery/worker/strategy.py", line 145, in task_message_handler
    handle(req)
  File "./venv/lib/python3.4/site-packages/celery/worker/worker.py", line 221, in _process_task_sem
    return self._quick_acquire(self._process_task, req)
  File "./venv/lib/python3.4/site-packages/kombu/async/semaphore.py", line 62, in acquire
    callback(*partial_args, **partial_kwargs)
  File "./venv/lib/python3.4/site-packages/celery/worker/worker.py", line 226, in _process_task
    req.execute_using_pool(self.pool)
  File "./venv/lib/python3.4/site-packages/celery/worker/request.py", line 532, in execute_using_pool
    correlation_id=task_id,
  File "./venv/lib/python3.4/site-packages/celery/concurrency/base.py", line 155, in apply_async
    **options)
  File "./venv/lib/python3.4/site-packages/billiard/pool.py", line 1487, in apply_async
    self._quick_put((TASK, (result._job, None, func, args, kwds)))
TypeError: 'NoneType' object is not callable

The above lines keep repeating every few seconds, and no tasks are consumed from the queue.

@alanhamlett alanhamlett changed the title Repeatedly reconnecting to broker, tasks accepted but never processed and still left in queue Celery Worker crashing after first task with TypeError: 'NoneType' object is not callable Nov 26, 2016
@alanhamlett
Contributor Author

The error is repeating in the log because the Celery worker daemon is crashing, so systemd restarts it.

@alanhamlett
Contributor Author

alanhamlett commented Nov 26, 2016

@ask, self._quick_put is somehow not defined. Should billiard check for a None value before calling, catch the exception, or should self._quick_put never be None?

@alanhamlett
Contributor Author

When I change billiard/pool.py#L1483 to "if self.threads or self._quick_put is None:", Celery no longer crashes, but for some reason the workers never process any tasks.
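A toy model of why such a guard could trade the crash for a stall (a sketch, not billiard's code; the ToyPool class, guard flag and written list are hypothetical, only threads, _quick_put and _taskqueue mirror billiard's names). It assumes that in the event-loop pool nothing drains _taskqueue, so a task routed there before _quick_put is set simply waits.

```python
import queue


class ToyPool:
    """Hypothetical stand-in for the branch in billiard's Pool.apply_async.

    Only threads, _quick_put and _taskqueue mirror billiard's names;
    guard models the proposed "or self._quick_put is None" check.
    """

    def __init__(self, threads=False, guard=False):
        self.threads = threads
        self.guard = guard
        self._taskqueue = queue.Queue()  # assumed: only a handler thread drains this
        self._quick_put = None           # set later, on event-loop registration
        self.written = []                # tasks that actually reached a worker

    def register_with_event_loop(self):
        # Stands in for the pool creating its write handlers.
        self._quick_put = self.written.append

    def apply_async(self, task):
        if self.threads or (self.guard and self._quick_put is None):
            # No handler thread runs in this toy, so the task stays queued.
            self._taskqueue.put(task)
        else:
            self._quick_put(task)  # TypeError while _quick_put is still None


# Unguarded: a task arriving before registration crashes the caller.
try:
    ToyPool().apply_async("task-1")
except TypeError as exc:
    print("crash:", exc)  # crash: 'NoneType' object is not callable

# Guarded: no crash, but the early task never reaches a worker.
pool = ToyPool(guard=True)
pool.apply_async("task-1")
pool.register_with_event_loop()
print(pool.written, pool._taskqueue.qsize())  # [] 1
```

The toy only shows that early tasks are stranded; it does not claim to explain every detail of why no later tasks were processed on the real worker.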

@alanhamlett
Contributor Author

More verbose output with logging level DEBUG:

[2016-11-27 14:48:09,875: DEBUG/MainProcess] | Worker: Preparing bootsteps.
[2016-11-27 14:48:09,877: DEBUG/MainProcess] | Worker: Building graph...
[2016-11-27 14:48:09,878: DEBUG/MainProcess] | Worker: New boot order: {Timer, Hub, Pool, Autoscaler, StateDB, Beat, Consumer}
[2016-11-27 14:48:09,889: DEBUG/MainProcess] | Consumer: Preparing bootsteps.
[2016-11-27 14:48:09,889: DEBUG/MainProcess] | Consumer: Building graph...
[2016-11-27 14:48:09,898: DEBUG/MainProcess] | Consumer: New boot order: {Connection, Agent, Events, Mingle, Tasks, Control, Gossip, Heart, event loop}
[2016-11-27 14:48:09,908: DEBUG/MainProcess] | Worker: Starting Hub
[2016-11-27 14:48:09,908: DEBUG/MainProcess] ^-- substep ok
[2016-11-27 14:48:09,908: DEBUG/MainProcess] | Worker: Starting Pool
[2016-11-27 14:48:09,998: DEBUG/MainProcess] ^-- substep ok
[2016-11-27 14:48:09,999: DEBUG/MainProcess] | Worker: Starting Consumer
[2016-11-27 14:48:10,000: DEBUG/MainProcess] | Consumer: Starting Connection
[2016-11-27 14:48:10,016: DEBUG/MainProcess] Start from server, version: 0.9, properties: {'cluster_name': 'rabbit@rabbitmq', 'product': 'RabbitMQ', 'version': '3.5.6', 'information': 'Licensed under the MPL.
  See http://www.rabbitmq.com/', 'capabilities': {'authentication_failure_close': True, 'consumer_priorities': True, 'consumer_cancel_notify': True, 'per_consumer_qos': True, 'basic.nack': True, 'publisher_confirms': True, 'connection.blocked': True, 'exchange_exchange_bindings': True}, 'copyright': 'Copyright (C) 2007-2015 Pivotal Software, Inc.', 'platform': 'Erlang/OTP'}, mechanisms: ['AMQPLAIN', 'PLAIN'], locales: ['en_US']
[2016-11-27 14:48:10,018: INFO/MainProcess] Connected to amqp://user:**@10.136.131.6:5672//
[2016-11-27 14:48:10,018: DEBUG/MainProcess] ^-- substep ok
[2016-11-27 14:48:10,019: DEBUG/MainProcess] | Consumer: Starting Events
[2016-11-27 14:48:10,031: DEBUG/MainProcess] Start from server, version: 0.9, properties: {'cluster_name': 'rabbit@rabbitmq', 'product': 'RabbitMQ', 'version': '3.5.6', 'information': 'Licensed under the MPL.  See http://www.rabbitmq.com/', 'capabilities': {'authentication_failure_close': True, 'consumer_priorities': True, 'consumer_cancel_notify': True, 'per_consumer_qos': True, 'basic.nack': True, 'publisher_confirms': True, 'connection.blocked': True, 'exchange_exchange_bindings': True}, 'copyright': 'Copyright (C) 2007-2015 Pivotal Software, Inc.', 'platform': 'Erlang/OTP'}, mechanisms: ['AMQPLAIN', 'PLAIN'], locales: ['en_US']
[2016-11-27 14:48:10,034: DEBUG/MainProcess] ^-- substep ok
[2016-11-27 14:48:10,034: DEBUG/MainProcess] | Consumer: Starting Mingle
[2016-11-27 14:48:10,035: INFO/MainProcess] mingle: searching for neighbors
[2016-11-27 14:48:10,036: DEBUG/MainProcess] using channel_id: 1
[2016-11-27 14:48:10,041: DEBUG/MainProcess] Channel open
[2016-11-27 14:48:10,061: DEBUG/MainProcess] Start from server, version: 0.9, properties: {'cluster_name': 'rabbit@rabbitmq', 'product': 'RabbitMQ', 'version': '3.5.6', 'information': 'Licensed under the MPL.  See http://www.rabbitmq.com/', 'capabilities': {'authentication_failure_close': True, 'consumer_priorities': True, 'consumer_cancel_notify': True, 'per_consumer_qos': True, 'basic.nack': True, 'publisher_confirms': True, 'connection.blocked': True, 'exchange_exchange_bindings': True}, 'copyright': 'Copyright (C) 2007-2015 Pivotal Software, Inc.', 'platform': 'Erlang/OTP'}, mechanisms: ['AMQPLAIN', 'PLAIN'], locales: ['en_US']
[2016-11-27 14:48:10,063: DEBUG/MainProcess] using channel_id: 1
[2016-11-27 14:48:10,064: DEBUG/MainProcess] Channel open
[2016-11-27 14:48:11,189: INFO/MainProcess] mingle: sync with 3 nodes
[2016-11-27 14:48:11,190: DEBUG/MainProcess] mingle: processing reply from celery@worker03
[2016-11-27 14:48:11,190: DEBUG/MainProcess] mingle: processing reply from celery@worker02
[2016-11-27 14:48:11,190: DEBUG/MainProcess] mingle: processing reply from celery@worker01
[2016-11-27 14:48:11,190: INFO/MainProcess] mingle: sync complete
[2016-11-27 14:48:11,191: DEBUG/MainProcess] ^-- substep ok
[2016-11-27 14:48:11,191: DEBUG/MainProcess] | Consumer: Starting Tasks
[2016-11-27 14:48:11,244: DEBUG/MainProcess] ^-- substep ok
[2016-11-27 14:48:11,244: DEBUG/MainProcess] | Consumer: Starting Control
[2016-11-27 14:48:11,244: DEBUG/MainProcess] using channel_id: 2
[2016-11-27 14:48:11,246: DEBUG/MainProcess] Channel open
[2016-11-27 14:48:11,251: DEBUG/MainProcess] ^-- substep ok
[2016-11-27 14:48:11,251: DEBUG/MainProcess] | Consumer: Starting Gossip
[2016-11-27 14:48:11,252: DEBUG/MainProcess] using channel_id: 3
[2016-11-27 14:48:11,253: DEBUG/MainProcess] Channel open
[2016-11-27 14:48:11,257: DEBUG/MainProcess] ^-- substep ok
[2016-11-27 14:48:11,258: DEBUG/MainProcess] | Consumer: Starting Heart
[2016-11-27 14:48:11,259: DEBUG/MainProcess] using channel_id: 1
[2016-11-27 14:48:11,260: DEBUG/MainProcess] Channel open
[2016-11-27 14:48:11,261: DEBUG/MainProcess] ^-- substep ok
[2016-11-27 14:48:11,261: DEBUG/MainProcess] | Consumer: Starting event loop
[2016-11-27 14:48:11,264: INFO/MainProcess] Received task: wakatime.tasks.cache_coding_activity[0eba267c-72e4-40ea-91dd-a1a7ab17c514]
[2016-11-27 14:48:11,265: DEBUG/MainProcess] TaskPool: Apply <function _fast_trace_task at 0x7ff469300950> (args:('wakatime.tasks.cache_coding_activity', '0eba267c-72e4-40ea-91dd-a1a7ab17c514', {'argsrepr': '()', 'task': 'wakatime.tasks.cache_coding_activity', 'lang': 'py', 'parent_id': '81f0c7ce-1396-496f-bf64-ae243736c845', 'timelimit': [None, None], 'root_id': '128647cc-f558-4b7d-bafc-338d186b5cfa', 'reply_to': 'e3c2b067-a058-3aa0-a3a1-384d4b917bbf', 'retries': 0, 'expires': None, 'delivery_info': {'exchange': '', 'priority': None, 'routing_key': 'cache', 'redelivered': True}, 'id': '0eba267c-72e4-40ea-91dd-a1a7ab17c514', 'correlation_id': '0eba267c-72e4-40ea-91dd-a1a7ab17c514', 'group': None, 'eta': None, 'kwargsrepr': "{'cache_projects': True, 'timeout': 15, 'user_id': UUID('d9c69ce0-f194-45a6-83cf-98f931fca8aa'), 'writes_only': False}", 'origin': 'gen3021@worker02'}, '[[], {"cache_projects": true, "timeout": 15, "user_id": "d9c69ce0-f194-45a6-83cf-98f931fca8aa", "writes_only": false}, {"callbacks": null, "chain": null, "chord": null, "errbacks": null}]', 'application/json', 'utf-8') kwargs:{})
[2016-11-27 14:48:11,266: CRITICAL/MainProcess] Unrecoverable error: TypeError("'NoneType' object is not callable",)
Traceback (most recent call last):
  File "./venv/src/celery/celery/worker/worker.py", line 203, in start
    self.blueprint.start(self)
  File "./venv/src/celery/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "./venv/src/celery/celery/bootsteps.py", line 370, in start
    return self.obj.start()
  File "./venv/src/celery/celery/worker/consumer/consumer.py", line 318, in start
    blueprint.start(self)
  File "./venv/src/celery/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "./venv/src/celery/celery/worker/consumer/consumer.py", line 593, in start
    c.loop(*c.loop_args())
  File "./venv/src/celery/celery/worker/loops.py", line 47, in asynloop
    consumer.consume()
  File "./venv/lib/python3.4/site-packages/kombu/messaging.py", line 470, in consume
    self._basic_consume(T, no_ack=no_ack, nowait=False)
  File "./venv/lib/python3.4/site-packages/kombu/messaging.py", line 591, in _basic_consume
    no_ack=no_ack, nowait=nowait)
  File "./venv/lib/python3.4/site-packages/kombu/entity.py", line 737, in consume
    arguments=self.consumer_arguments)
  File "./venv/lib/python3.4/site-packages/amqp/channel.py", line 1578, in basic_consume
    wait=None if nowait else spec.Basic.ConsumeOk,
  File "./venv/lib/python3.4/site-packages/amqp/abstract_channel.py", line 73, in send_method
    return self.wait(wait, returns_tuple=returns_tuple)
  File "./venv/lib/python3.4/site-packages/amqp/abstract_channel.py", line 93, in wait
    self.connection.drain_events(timeout=timeout)
  File "./venv/lib/python3.4/site-packages/amqp/connection.py", line 464, in drain_events
    return self.blocking_read(timeout)
  File "./venv/lib/python3.4/site-packages/amqp/connection.py", line 469, in blocking_read
    return self.on_inbound_frame(frame)
  File "./venv/lib/python3.4/site-packages/amqp/method_framing.py", line 88, in on_frame
    callback(channel, msg.frame_method, msg.frame_args, msg)
  File "./venv/lib/python3.4/site-packages/amqp/connection.py", line 473, in on_inbound_method
    method_sig, payload, content,
  File "./venv/lib/python3.4/site-packages/amqp/abstract_channel.py", line 142, in dispatch_method
    listener(*args)
  File "./venv/lib/python3.4/site-packages/amqp/channel.py", line 1613, in _on_basic_deliver
    fun(msg)
  File "./venv/lib/python3.4/site-packages/kombu/messaging.py", line 617, in _receive_callback
    return on_m(message) if on_m else self.receive(decoded, message)
  File "./venv/src/celery/celery/worker/consumer/consumer.py", line 567, in on_task_received
    callbacks,
  File "./venv/src/celery/celery/worker/strategy.py", line 145, in task_message_handler
    handle(req)
  File "./venv/src/celery/celery/worker/worker.py", line 221, in _process_task_sem
    return self._quick_acquire(self._process_task, req)
  File "./venv/lib/python3.4/site-packages/kombu/async/semaphore.py", line 62, in acquire
    callback(*partial_args, **partial_kwargs)
  File "./venv/src/celery/celery/worker/worker.py", line 226, in _process_task
    req.execute_using_pool(self.pool)
  File "./venv/src/celery/celery/worker/request.py", line 532, in execute_using_pool
    correlation_id=task_id,
  File "./venv/src/celery/celery/concurrency/base.py", line 155, in apply_async
    **options)
  File "./venv/lib/python3.4/site-packages/billiard/pool.py", line 1487, in apply_async
    self._quick_put((TASK, (result._job, None, func, args, kwds)))
TypeError: 'NoneType' object is not callable
[2016-11-27 14:48:11,273: DEBUG/MainProcess] | Worker: Closing Hub...
[2016-11-27 14:48:11,274: DEBUG/MainProcess] | Worker: Closing Pool...
[2016-11-27 14:48:11,274: DEBUG/MainProcess] | Worker: Closing Consumer...
[2016-11-27 14:48:11,274: DEBUG/MainProcess] | Worker: Stopping Consumer...
[2016-11-27 14:48:11,274: DEBUG/MainProcess] | Consumer: Closing Connection...
[2016-11-27 14:48:11,275: DEBUG/MainProcess] | Consumer: Closing Events...
[2016-11-27 14:48:11,275: DEBUG/MainProcess] | Consumer: Closing Mingle...
[2016-11-27 14:48:11,275: DEBUG/MainProcess] | Consumer: Closing Tasks...
[2016-11-27 14:48:11,275: DEBUG/MainProcess] | Consumer: Closing Control...
[2016-11-27 14:48:11,275: DEBUG/MainProcess] | Consumer: Closing Gossip...
[2016-11-27 14:48:11,276: DEBUG/MainProcess] | Consumer: Closing Heart...
[2016-11-27 14:48:11,276: DEBUG/MainProcess] | Consumer: Closing event loop...
[2016-11-27 14:48:11,276: DEBUG/MainProcess] | Consumer: Stopping event loop...
[2016-11-27 14:48:11,276: DEBUG/MainProcess] | Consumer: Stopping Heart...
[2016-11-27 14:48:11,277: DEBUG/MainProcess] | Consumer: Stopping Gossip...
[2016-11-27 14:48:11,278: INFO/MainProcess] Received task: wakatime.tasks.cache_coding_activity[f786fc75-0518-4893-8988-ff7f063edd12]
[2016-11-27 14:48:11,278: DEBUG/MainProcess] TaskPool: Apply <function _fast_trace_task at 0x7ff469300950> (args:('wakatime.tasks.cache_coding_activity', 'f786fc75-0518-4893-8988-ff7f063edd12', {'argsrepr': '()', 'task': 'wakatime.tasks.cache_coding_activity', 'lang': 'py', 'parent_id': '81f0c7ce-1396-496f-bf64-ae243736c845', 'timelimit': [None, None], 'root_id': '128647cc-f558-4b7d-bafc-338d186b5cfa', 'reply_to': 'e3c2b067-a058-3aa0-a3a1-384d4b917bbf', 'retries': 0, 'expires': None, 'delivery_info': {'exchange': '', 'priority': None, 'routing_key': 'cache', 'redelivered': True}, 'id': 'f786fc75-0518-4893-8988-ff7f063edd12', 'correlation_id': 'f786fc75-0518-4893-8988-ff7f063edd12', 'group': None, 'eta': None, 'kwargsrepr': "{'cache_projects': True, 'timeout': 15, 'user_id': UUID('7056644f-2564-4074-b89e-631973879f44'), 'writes_only': False}", 'origin': 'gen3021@worker02'}, '[[], {"cache_projects": true, "timeout": 15, "user_id": "7056644f-2564-4074-b89e-631973879f44", "writes_only": false}, {"callbacks": null, "chain": null, "chord": null, "errbacks": null}]', 'application/json', 'utf-8') kwargs:{})
[2016-11-27 14:48:11,279: INFO/MainProcess] Received task: wakatime.tasks.cache_coding_activity[d5c8dc57-116c-467d-9924-e2999280c2f8]
[2016-11-27 14:48:11,280: INFO/MainProcess] Received task: wakatime.tasks.cache_coding_activity[460ef864-e482-4b0f-8580-d0095750bae6]
[2016-11-27 14:48:11,281: DEBUG/MainProcess] Closed channel #3
[2016-11-27 14:48:11,281: DEBUG/MainProcess] | Consumer: Stopping Control...
[2016-11-27 14:48:11,283: DEBUG/MainProcess] Closed channel #2
[2016-11-27 14:48:11,283: DEBUG/MainProcess] | Consumer: Stopping Tasks...
[2016-11-27 14:48:11,284: DEBUG/MainProcess] Canceling task consumer...
[2016-11-27 14:48:11,286: DEBUG/MainProcess] | Consumer: Stopping Mingle...
[2016-11-27 14:48:11,286: DEBUG/MainProcess] | Consumer: Stopping Events...
[2016-11-27 14:48:11,286: DEBUG/MainProcess] | Consumer: Stopping Connection...
[2016-11-27 14:48:11,286: DEBUG/MainProcess] | Worker: Stopping Pool...
[2016-11-27 14:48:12,800: DEBUG/MainProcess] result handler: all workers terminated
[2016-11-27 14:48:12,801: DEBUG/MainProcess] | Worker: Stopping Hub...
[2016-11-27 14:48:12,801: DEBUG/MainProcess] | Consumer: Shutdown Heart...
[2016-11-27 14:48:12,802: DEBUG/MainProcess] | Consumer: Shutdown Gossip...
[2016-11-27 14:48:12,802: DEBUG/MainProcess] | Consumer: Shutdown Control...
[2016-11-27 14:48:12,802: DEBUG/MainProcess] | Consumer: Shutdown Tasks...
[2016-11-27 14:48:12,803: DEBUG/MainProcess] Canceling task consumer...
[2016-11-27 14:48:12,803: DEBUG/MainProcess] Closing consumer channel...
[2016-11-27 14:48:12,803: DEBUG/MainProcess] | Consumer: Shutdown Events...
[2016-11-27 14:48:12,804: DEBUG/MainProcess] Closed channel #1
[2016-11-27 14:48:12,805: DEBUG/MainProcess] | Consumer: Shutdown Connection...
[2016-11-27 14:48:12,806: DEBUG/MainProcess] Closed channel #1
[2016-11-27 14:48:12,807: DEBUG/MainProcess] removing tasks from inqueue until task handler finished

@alanhamlett
Contributor Author

alanhamlett commented Nov 27, 2016

This was introduced in Celery 4.x: downgrading to 3.1.24 prevents the traceback.

@ask
Contributor

ask commented Dec 1, 2016

Doesn't happen for me here on Linux with Python 3.4. What arguments do you use to start the worker?

@ask
Contributor

ask commented Dec 2, 2016

_quick_put should never be None btw. Does this happen at startup or always after a connection failure?

@ask
Contributor

ask commented Dec 2, 2016

I've been trying to reproduce by stopping the broker while executing tasks, and still no luck at reproducing.

@alanhamlett
Contributor Author

Always at startup. The worker arguments are:

/opt/app/venv/bin/python /opt/app/venv/bin/celery worker --app=wakatime.celery --workdir=/opt/app --logfile=/var/log/celery/worker.log --loglevel=INFO --concurrency=50 --exclude-queues=medium,low,cache

@phretor

phretor commented Dec 16, 2016

👍 for this. Getting the very same issue, even on 4.0.1

@jmesquita

@ask I can reproduce it every time there are messages on the broker waiting to be processed when the worker comes up. This is often the case when using beat, which is my case: if the beat service comes online before the worker, you won't be able to start the worker due to the issue mentioned above. I'm using Python 2.7, for what it's worth, and can reproduce it consistently.

This is the same error as the one mentioned on #3539

@alanhamlett
Contributor Author

@jmesquita that's consistent with my scenario, since my queues always have pending messages on the broker when starting the workers.

@jmesquita

@alanhamlett I'm trying to get this fixed and am reading the code, but I'm new to Celery so it might take me some time. What is strange to me is that, with so many people using Celery and messages being queued for workers by default, this has not blown up within the community. It makes me wonder if I'm misusing it somehow.

@adewes

adewes commented Dec 18, 2016

I dug into the code a bit, _quick_put gets assigned by AsyncPool._create_write_handlers, which gets called by AsyncPool.register_with_event_loop, which gets called by celery.worker.loops.asynloop. Superficially, the problem seems to be that asynloop first calls consumer.consume() and only then calls obj.register_with_event_loop, which causes _quick_put to be None when it gets called from within consume().

This would explain why the problem does not occur when there are no messages in the queue when the event loop starts up: consume() then does nothing, and by the next time it gets called, register_with_event_loop will already have been called.

I could fix this by moving

obj.controller.register_with_event_loop(hub)
obj.register_with_event_loop(hub)

before consumer.consume(), though this is of course only a very naive (and possibly wrong) fix.
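A minimal, self-contained sketch of the ordering problem described in this comment (ToyPool, ToyConsumer and toy_asynloop are hypothetical, not Celery's classes): _quick_put only exists after the pool registers with the event loop, and consume() immediately hands any already-queued message to the pool. With register_first=False and a non-empty queue it raises the same TypeError; registering first avoids it. This models only the ordering; as noted later in the thread, the reorder alone was not a complete fix in the real code.

```python
class ToyPool:
    """Hypothetical pool: _quick_put only exists after event-loop registration."""

    def __init__(self):
        self._quick_put = None
        self.executed = []

    def register_with_event_loop(self):
        # Stands in for AsyncPool._create_write_handlers assigning _quick_put.
        self._quick_put = self.executed.append

    def apply_async(self, task):
        self._quick_put(task)  # fails with TypeError while _quick_put is None


class ToyConsumer:
    """Hypothetical consumer: consume() hands already-queued messages to the pool."""

    def __init__(self, pool, pending):
        self.pool = pool
        self.pending = list(pending)

    def consume(self):
        while self.pending:
            self.pool.apply_async(self.pending.pop(0))


def toy_asynloop(pending, register_first):
    """Runs one startup with the chosen ordering; returns the executed tasks."""
    pool = ToyPool()
    consumer = ToyConsumer(pool, pending)
    if register_first:
        pool.register_with_event_loop()  # proposed order
        consumer.consume()
    else:
        consumer.consume()               # order described in the comment
        pool.register_with_event_loop()
    return pool.executed


print(toy_asynloop([], register_first=False))     # [] (empty queue, no crash)
print(toy_asynloop(["t1"], register_first=True))  # ['t1']
try:
    toy_asynloop(["t1"], register_first=False)
except TypeError as exc:
    print("crash:", exc)  # crash: 'NoneType' object is not callable
```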

@jmesquita

So I worked around my problem by making Celery beat messages transient, which is actually my intended behaviour anyway. I'll revisit this as soon as I have a bit more experience with Celery and its codebase.

@gkoller

gkoller commented Dec 21, 2016

@jmesquita:

So I worked around my problem by making celery beat messages transient

That prevents the error from occurring. Thank you for the suggestion.

@sterutkb

I can only reproduce this bug if I am attempting to consume from multiple queues. If everything is consumed from a single queue then start up works as expected (messages on the queue are properly consumed).

@adewes I tested your proposed solution and at least on the surface it seems to solve the problem.
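The traceback shows the delivery being dispatched from inside kombu's Consumer.consume(): _basic_consume(T, no_ack=no_ack, nowait=False) waits for the broker's ConsumeOk and, while draining events, runs _on_basic_deliver. Assuming the other queues are subscribed with nowait=True and only the last one (T) is waited on, deliveries for the earlier queues can be dispatched during that wait, which would fit the single-queue vs multiple-queue observation here. The toy simulation below (FakeConnection, consume and delivered_during_consume are hypothetical, not kombu's API) also assumes the first queue's backlog reaches the client before the last queue's ConsumeOk.

```python
from collections import deque


class FakeConnection:
    """Hypothetical broker connection that replays frames in arrival order."""

    def __init__(self, backlog):
        self.backlog = backlog  # queue name -> messages already waiting
        self.frames = deque()   # frames sent by the "broker", not yet read
        self.callbacks = {}     # queue name -> delivery callback

    def basic_consume(self, queue, callback):
        # The broker confirms the subscription, then pushes that queue's backlog.
        self.callbacks[queue] = callback
        self.frames.append(("consume_ok", queue))
        for body in self.backlog.get(queue, []):
            self.frames.append(("deliver", queue, body))

    def wait_for_consume_ok(self, queue):
        # Read frames until this queue's ConsumeOk, dispatching deliveries on the way.
        while self.frames:
            frame = self.frames.popleft()
            if frame[0] == "deliver":
                self.callbacks[frame[1]](frame[2])
            elif frame == ("consume_ok", queue):
                return


def consume(conn, queues, callback):
    """Toy head/tail split: only the last queue's ConsumeOk is waited for."""
    *head, tail = queues
    for q in head:
        conn.basic_consume(q, callback)  # like nowait=True: returns immediately
    conn.basic_consume(tail, callback)
    conn.wait_for_consume_ok(tail)       # like nowait=False: drains events


def delivered_during_consume(queues, backlog):
    seen = []
    consume(FakeConnection(backlog), queues, seen.append)
    return seen


backlog = {"q1": ["a", "b"], "q2": ["c"]}
print(delivered_during_consume(["q1"], backlog))        # []
print(delivered_during_consume(["q1", "q2"], backlog))  # ['a', 'b']
```

In the real worker, each message dispatched this early would go to the pool before _quick_put is set, which matches the crash in the traceback.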

@thedrow
Member

thedrow commented Jan 2, 2017

@adewes Can you issue a pull request so we can discuss your proposed change?

@jdotjdot

jdotjdot commented Jan 9, 2017

Are there any updates on this issue? This is causing us major problems now, including in production. I can't even test locally because I get the TypeError issue. We may have to downgrade back to Celery 3.

@adewes

adewes commented Jan 9, 2017

I have not been able to resolve it so far and have also downgraded to version 3 for now; I hope the problem will be fixed soon. @thedrow my "quick fix" did not fully resolve the problem, so I'm not opening a pull request. I'm not fully versed in the dataflow of the components used (there are several libraries in play here), so unfortunately I'm not able to debug this further right now.

@jdotjdot

jdotjdot commented Jan 9, 2017

I'm actually not even sure we can downgrade because it's possible we might be relying on the new usage of message headers in the v2 task design.

@ask--I'm happy to screenshare or whatever with you so you can see my exact environment to help debug, maybe even try opening up a remote debug if we need to. We're in a bit of a bind because we went all in on Celery 4 and now can't start our workers in production.

@alanhamlett
Contributor Author

alanhamlett commented Jan 9, 2017

For now, you can install my fork to get things running in your prod environment:

pip install -e git://github.com/alanhamlett/celery.git@73147a9da31f2932eb4778e9474fbe72f23d21c2#egg=Celery

I just opened #3752 to fix this, but need to figure out a good test to cover the bug first.

@jdotjdot

jdotjdot commented Jan 9, 2017

Thanks so much.

Starting to think I'm going crazy... I tried upgrading to 4.0.2 (currently on 4.0.0), and with that upgrade, all of a sudden self.retry() stopped working as well.

@mrfuxi

mrfuxi commented Jan 13, 2017

Specifying queues explicitly on the command line looks like a workaround (on 4.0.0).

@jdotjdot

jdotjdot commented Jan 13, 2017 via email

@sww314

sww314 commented Jan 13, 2017

@jdotjdot - I am seeing the same thing. Specifying queues on the command line.
I only see the issue when there is data in the queue. If the queues are empty, I can start workers and they will run.

@ianseyer

+1

Having this exact issue. The only solution seems to be resetting my broker (which is not good for a queue).

@johnatron

With the ephemeral nature of workers being so fundamental to Celery, I can’t imagine there are too many other issues that deserve a higher priority than this one. This is the only thing preventing us from going to Prod. Does anybody out there have an estimate on when it will be addressed?

@alanhamlett
Contributor Author

@johnatron I thought the same thing, but ran into multiple other newly introduced bugs in Prod. Had to downgrade, which is difficult because the messaging spec is not compatible between 3.x and 4.x. Also made me look at alternatives to Celery, like tasktiger. Be careful with Celery 4.x.x in Prod.

@jdotjdot

jdotjdot commented Feb 14, 2017 via email

@Eugeny

Eugeny commented Feb 14, 2017

In the light of this unfortunate delay I decided to merge some fixes for known Celery 4 issues together in git+http://github.com/Eugeny/celery#egg=celery==4.0.2.1
This includes:

  • @alanhamlett's fix for this issue
  • @HealthTechDevelopers's fix for django-celery-beat schedule not getting reloaded

Working fine in production so far

@johnatron

The message spec may be cross-compatible, but there are subtle incompatibilities between the versions when it comes to Celery Canvas and chaining (a feature we use a lot). A considerable amount of effort went into porting our app from v3 to v4. Having to go back would be a bad time.

@johnatron

O frabjous day! Thank you.

avantrino added a commit to avantrino/dockerimages that referenced this issue Mar 12, 2017
@iiilx

iiilx commented Apr 12, 2017

I have this same problem using Python 3.6 and celery 4.0.2.

Shut down celery, create a task, start up celery and immediately get the error TypeError: 'NoneType' object is not callable. @ask could you please consider the proposed fixes and merging? This is preventing previously happy users of Celery from continuing to use Celery :(

Thank you for your time, @ask!

@tiptoettt

tiptoettt commented Apr 12, 2017

"Microsoft Windows is no longer supported.

The test suite is passing, and Celery seems to be working with Windows, but we make no guarantees as we are unable to diagnose issues on this platform. If you are a company requiring support on this platform, please get in touch."
Probably not going to be fixed.

@iiilx

iiilx commented Apr 12, 2017

@tiptoettt this is not an issue specific to Windows. I'm on Mac. Can you please closely look at everyone's comments? Developers will ditch Celery because of this. I've used celery for 5 years and it's a major issue. I am going to use an earlier version 3.1.25 if it doesn't have this issue.

@zero-code-hero

Also having this issue

lpiner@host:~$ uname -a
Linux host 4.4.0-72-generic #93-Ubuntu SMP Fri Mar 31 14:07:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
lpiner@host:~$ celery --version
4.0.2 (latentcall)

This is the configuration that seems to cause the issue:

from kombu import Exchange, Queue

task_queues = (
    Queue('queue1_name', Exchange('default'), routing_key='default'),
    Queue('queue2_name', Exchange('default'), routing_key='general.routing_key'),
)

Removing the second queue causes the issue to go away.

@Vslira

Vslira commented May 8, 2017

From what I understand from @ChillarAnand's post in celery/kombu/issues/675, this issue should be solved by 4.0.3, right?

@zero-code-hero

zero-code-hero commented May 8, 2017 via email

@Vslira

Vslira commented May 8, 2017

Thanks, it worked!

@sireliah

I had the same problem on Celery 3.1.23 (Cipater) using RabbitMQ. After long debugging I finally found this thread and fixed the problem by upgrading to version 3.1.25. But it was a lot of banging my head against the wall, really.

@xarg

xarg commented Jun 1, 2017

Same issue on v4.0.2 (latentcall) with multiple queues + heartbeat.

@MrKiven

MrKiven commented Jun 2, 2017

Same issue for me on v4.0.2; the workaround is downgrading to v3.x.

@ba1dr

ba1dr commented Jun 26, 2017

v4.0.2
For me this issue is a showstopper. I do not consider downgrading to 3.x. I hope the new version will be released soon.

@zero-code-hero

@ba1dr build from master. That fixes it for me.

@ba1dr

ba1dr commented Jun 26, 2017

@LPiner, thanks, this fixed the issue for me. But this is not a production-ready solution, eh?

@zero-code-hero

@ba1dr not much of a choice TBH. It's either this or downgrade back to 3.x.
We use it in production with no issues but our scale is only a few hundred jobs a day, your mileage may vary.

@thedrow
Member

thedrow commented Jul 11, 2017

We're gonna release soon. See #4109

@rholloway

Affected here as well. Just hit us. Ready for the release!

ketanbhatt pushed a commit to ketanbhatt/celery that referenced this issue Sep 13, 2017
gitmate-test-user pushed a commit to GitMateIO/gitmate-2 that referenced this issue Nov 22, 2017