
UnexpectedFrame exception after "Connection to broker lost" #1779

Closed
nicolasgrasset opened this issue Jan 10, 2014 · 14 comments

@nicolasgrasset

Hey, I am not sure if this is a duplicate or not, but I could not find this exact issue. I realize we are not running 3.1.7, so I will upgrade all machines to check.

Running:
amqp==1.3.3
billiard==3.3.0.10
celery==3.1.6
eventlet==0.14.0
kombu==3.0.7

Getting the following issue:

[2014-01-10 12:13:17,420: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
  File "/usr/local/virtualenvs/shapeup/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 270, in start
    blueprint.start(self)
  File "/usr/local/virtualenvs/shapeup/local/lib/python2.7/site-packages/celery/bootsteps.py", line 123, in start
    step.start(parent)
  File "/usr/local/virtualenvs/shapeup/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 790, in start
    c.loop(*c.loop_args())
  File "/usr/local/virtualenvs/shapeup/local/lib/python2.7/site-packages/celery/worker/loops.py", line 99, in synloop
    connection.drain_events(timeout=2.0)
  File "/usr/local/virtualenvs/shapeup/local/lib/python2.7/site-packages/kombu/connection.py", line 279, in drain_events
    return self.transport.drain_events(self.connection, **kwargs)
  File "/usr/local/virtualenvs/shapeup/local/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 90, in drain_events
    return connection.drain_events(**kwargs)
  File "/usr/local/virtualenvs/shapeup/local/lib/python2.7/site-packages/amqp/connection.py", line 282, in drain_events
    chanmap, None, timeout=timeout,
  File "/usr/local/virtualenvs/shapeup/local/lib/python2.7/site-packages/amqp/connection.py", line 345, in _wait_multiple
    channel, method_sig, args, content = read_timeout(timeout)
  File "/usr/local/virtualenvs/shapeup/local/lib/python2.7/site-packages/amqp/connection.py", line 316, in read_timeout
    return self.method_reader.read_method()
  File "/usr/local/virtualenvs/shapeup/local/lib/python2.7/site-packages/amqp/method_framing.py", line 195, in read_method
    raise m
UnexpectedFrame: Received 0x2d while expecting 0xce
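For context on what this message means: the AMQP 0-9-1 wire format terminates every frame with the octet 0xCE, and py-amqp raises UnexpectedFrame when some other byte shows up in that position, i.e. the reader has lost sync with the byte stream. The following is a minimal illustrative sketch of such a check (names and structure are simplified, not py-amqp's actual internals):

```python
import io

FRAME_END = 0xCE  # AMQP 0-9-1 terminates every frame with this octet

def read_frame(stream, payload_size):
    # Read a payload of known size, then verify the frame-end octet.
    # If the byte stream has lost sync (e.g. after a partial or
    # interleaved write), a payload byte appears here instead of 0xCE,
    # producing errors like "Received 0x2d while expecting 0xce".
    payload = stream.read(payload_size)
    terminator = stream.read(1)[0]
    if terminator != FRAME_END:
        raise ValueError(
            'Received 0x%02x while expecting 0xce' % terminator)
    return payload
```

With a well-formed stream this returns the payload; with a desynced stream it raises, mirroring the traceback above.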
@nicolasgrasset
Author

Alright, I can confirm the workers stop working (or crash) in 3.1.7, but I do not see this specific error in the log, so it might be something else.

@ask
Contributor

ask commented Jan 13, 2014

Do you know how to reproduce the error?

@nicolasgrasset
Author

No. But we have celery deployed on 10-16 instances at the moment, and it happens to 1 or 2 of them every 24 hours.

I am trying to identify which task might be related. I am not sure how to debug it, but if you have anything, let me know. Logging is set to WARNING right now and this is all we see.

@shahafabileah

I believe I'm seeing the same issue.

Dummy-1    None WARNING consumer: Connection to broker lost. Trying to re-establish the connection... <celery.worker.consumer,None,MainProcess,None> 
Traceback (most recent call last): 
  File "/app/.heroku/python/lib/python2.7/site-packages/celery/worker/consumer.py", line 270, in start 
    blueprint.start(self) 
  File "/app/.heroku/python/lib/python2.7/site-packages/celery/bootsteps.py", line 123, in start 
    step.start(parent) 
  File "/app/.heroku/python/lib/python2.7/site-packages/celery/worker/consumer.py", line 786, in start 
    c.loop(*c.loop_args()) 
  File "/app/.heroku/python/lib/python2.7/site-packages/celery/worker/loops.py", line 98, in synloop 
    connection.drain_events(timeout=2.0) 
  File "/app/.heroku/python/lib/python2.7/site-packages/kombu/connection.py", line 279, in drain_events 
    return self.transport.drain_events(self.connection, **kwargs) 
  File "/app/.heroku/python/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 90, in drain_events 
    return connection.drain_events(**kwargs) 
  File "/app/.heroku/python/lib/python2.7/site-packages/amqp/connection.py", line 299, in drain_events 
    chanmap, None, timeout=timeout, 
  File "/app/.heroku/python/lib/python2.7/site-packages/amqp/connection.py", line 362, in _wait_multiple 
    channel, method_sig, args, content = read_timeout(timeout) 
  File "/app/.heroku/python/lib/python2.7/site-packages/amqp/connection.py", line 333, in read_timeout 
    return self.method_reader.read_method() 
  File "/app/.heroku/python/lib/python2.7/site-packages/amqp/method_framing.py", line 189, in read_method 
    raise m 
UnexpectedFrame: Received 0x81 while expecting 0xce 

Running:

amqp==1.4.1
billiard==3.3.0.14
celery==3.1.8
eventlet==0.14.0
gevent==1.0
kombu==3.0.10
...

We see this issue sporadically in the production environment (Heroku, using BigWig as the amqp service). I haven't repro'ed the issue when running locally.

I'll report back if I can gather more useful evidence.
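One pattern worth noting: both reports above run eventlet (and/or gevent). If two green threads share one AMQP connection without a lock and one of them yields mid-frame, their writes interleave on the socket and the broker's (or client's) frame reader desyncs, which is exactly the UnexpectedFrame symptom. A minimal sketch of that failure mode, using an in-memory stream rather than a real socket:

```python
import io

FRAME_END = bytes([0xCE])  # AMQP 0-9-1 frame terminator

def interleaved_write(stream, frame_a, frame_b):
    # Simulate a green-thread switch after frame A's payload but
    # before its terminator: frame B lands in the middle, which is
    # how an unlocked shared connection can corrupt the byte stream.
    stream.write(frame_a)               # payload A (terminator pending)
    stream.write(frame_b + FRAME_END)   # greenlet B writes a whole frame
    stream.write(FRAME_END)             # A's terminator, now misplaced

stream = io.BytesIO()
interleaved_write(stream, b'AAA', b'BBB')
data = stream.getvalue()
# A reader expecting a 3-byte payload followed by 0xCE now finds
# b'B' (0x42) in the terminator slot and must raise.
```

This is an illustration of the mechanism, not a claim about where the bug lives in celery/kombu/py-amqp.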

@bernpuc

bernpuc commented Sep 11, 2014

Is there any solution to this problem? I am seeing this typically 1-2 times per day across 6 different servers.

[2014-09-08 10:05:32,491: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/celery/worker/consumer.py", line 270, in start
    blueprint.start(self)
  File "/usr/lib/pymodules/python2.6/celery/bootsteps.py", line 123, in start
    step.start(parent)
  File "/usr/lib/pymodules/python2.6/celery/worker/consumer.py", line 790, in start
    c.loop(*c.loop_args())
  File "/usr/lib/pymodules/python2.6/celery/worker/loops.py", line 72, in asynloop
    next(loop)
  File "/usr/lib/pymodules/python2.6/kombu/async/hub.py", line 320, in create_loop
    cb(*cbargs)
  File "/usr/lib/pymodules/python2.6/kombu/transport/base.py", line 162, in on_readable
    reader(loop)
  File "/usr/lib/pymodules/python2.6/kombu/transport/base.py", line 144, in _read
    drain_events(timeout=0)
  File "/usr/lib/pymodules/python2.6/amqp/connection.py", line 325, in drain_events
    return amqp_method(channel, args, content)
  File "/usr/lib/pymodules/python2.6/amqp/channel.py", line 1908, in _basic_deliver
    fun(msg)
  File "/usr/lib/pymodules/python2.6/kombu/messaging.py", line 592, in _receive_callback
    return on_m(message) if on_m else self.receive(decoded, message)
  File "/usr/lib/pymodules/python2.6/kombu/messaging.py", line 559, in receive
    [callback(body, message) for callback in callbacks]
  File "/usr/lib/pymodules/python2.6/celery/worker/consumer.py", line 447, in on_task_received
    callbacks)
  File "/usr/lib/pymodules/python2.6/celery/worker/strategy.py", line 51, in task_message_handler
    if req.revoked():
  File "/usr/lib/pymodules/python2.6/celery/worker/job.py", line 334, in revoked
    'expired' if expired else 'revoked', False, None, expired,
  File "/usr/lib/pymodules/python2.6/celery/worker/job.py", line 318, in _announce_revoked
    self.task.backend.mark_as_revoked(self.id, reason, request=self)
  File "/usr/lib/pymodules/python2.6/celery/backends/base.py", line 124, in mark_as_revoked
    request=request)
  File "/usr/lib/pymodules/python2.6/celery/backends/amqp.py", line 135, in store_result
    delivery_mode=self.delivery_mode,
  File "/usr/lib/pymodules/python2.6/kombu/messaging.py", line 168, in publish
    routing_key, mandatory, immediate, exchange, declare)
  File "/usr/lib/pymodules/python2.6/kombu/connection.py", line 440, in _ensured
    return fun(*args, **kwargs)
  File "/usr/lib/pymodules/python2.6/kombu/messaging.py", line 180, in _publish
    [maybe_declare(entity) for entity in declare]
  File "/usr/lib/pymodules/python2.6/kombu/messaging.py", line 111, in maybe_declare
    return maybe_declare(entity, self.channel, retry, **retry_policy)
  File "/usr/lib/pymodules/python2.6/kombu/common.py", line 99, in maybe_declare
    return _maybe_declare(entity)
  File "/usr/lib/pymodules/python2.6/kombu/common.py", line 114, in _maybe_declare
    entity.declare()
  File "/usr/lib/pymodules/python2.6/kombu/entity.py", line 504, in declare
    self.exchange.declare(nowait)
  File "/usr/lib/pymodules/python2.6/kombu/entity.py", line 166, in declare
    nowait=nowait, passive=passive,
  File "/usr/lib/pymodules/python2.6/amqp/channel.py", line 620, in exchange_declare
    (40, 11), # Channel.exchange_declare_ok
  File "/usr/lib/pymodules/python2.6/amqp/abstract_channel.py", line 67, in wait
    self.channel_id, allowed_methods)
  File "/usr/lib/pymodules/python2.6/amqp/connection.py", line 240, in _wait_method
    self.method_reader.read_method()
  File "/usr/lib/pymodules/python2.6/amqp/method_framing.py", line 189, in read_method
    raise m
UnexpectedFrame: Received 0x04 while expecting 0xce
[2014-09-08 10:05:35,099: CRITICAL/MainProcess] Couldn't ack 369, reason: RecoverableConnectionError(None, 'connection already closed', None, '')

Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/kombu/message.py", line 93, in ack_log_error
    self.ack()
  File "/usr/lib/pymodules/python2.6/kombu/message.py", line 88, in ack
    self.channel.basic_ack(self.delivery_tag)
  File "/usr/lib/pymodules/python2.6/amqp/channel.py", line 1583, in basic_ack
    self._send_method((60, 80), args)
  File "/usr/lib/pymodules/python2.6/amqp/abstract_channel.py", line 50, in _send_method
    raise RecoverableConnectionError('connection already closed')
RecoverableConnectionError: connection already closed

Running:
librabbitmq1 0.5.1
python-amqp 1.4.6
python-celery 3.1.6
python-django-celery 3.1.9
python-kombu 3.0.7
python-librabbitmq 1.5.2
rabbitmq-server 3.2.4

@thedrow
Member

thedrow commented Jun 19, 2015

This could be related to #2066.

@ask
Contributor

ask commented Jun 23, 2016

Closing this, as we don't have the resources to complete this task.

@HughWarrington

Still an issue in Celery 4.3.0.

@HughWarrington

Looking again, we have a very different stack trace. I think this is only related to Kombu, not Celery.

Sorry for the noise.

Here's the stack trace anyway, in case it's of interest:

UnexpectedFrame: Basic.publish: (505) UNEXPECTED_FRAME - expected content body, got non content body frame instead
  File "xxx.py", line 50, in send
    xxx
  File "xxx.py", line 98, in run
    xxx,
  File "xxx.py", line 106, in publish
    xxx,
  File "kombu/messaging.py", line 181, in publish
    exchange_name, declare,
  File "kombu/connection.py", line 510, in _ensured
    return fun(*args, **kwargs)
  File "kombu/messaging.py", line 203, in _publish
    mandatory=mandatory, immediate=immediate,
  File "amqp/channel.py", line 1789, in basic_publish_confirm
    self.wait([spec.Basic.Ack, spec.Basic.Nack], callback=confirm_handler)
  File "amqp/abstract_channel.py", line 80, in wait
    self.connection.drain_events(timeout=timeout)
  File "amqp/connection.py", line 500, in drain_events
    while not self.blocking_read(timeout):
  File "amqp/connection.py", line 506, in blocking_read
    return self.on_inbound_frame(frame)
  File "amqp/method_framing.py", line 55, in on_frame
    callback(channel, method_sig, buf, None)
  File "amqp/connection.py", line 510, in on_inbound_method
    method_sig, payload, content,
  File "amqp/abstract_channel.py", line 126, in dispatch_method
    listener(*args)
  File "amqp/connection.py", line 639, in _on_close
    (class_id, method_id), ConnectionError)
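Unlike the client-side frame-reader errors earlier in this thread, the 505 UNEXPECTED_FRAME here is raised by the broker: it typically means a content-header frame was not followed by the body bytes it declared, which can happen when two threads publish on the same channel and their header/body frames interleave on the wire. A minimal sketch of one mitigation, serializing access to a shared channel (the lock placement and `safe_publish` helper are illustrative, not kombu API):

```python
import threading

# One lock per shared channel (illustrative); holding it for the whole
# publish keeps the header and body frames contiguous on the socket,
# so concurrent publishers cannot interleave their frames.
publish_lock = threading.Lock()

def safe_publish(producer, body, **kwargs):
    # producer is any object exposing publish(body, **kwargs),
    # e.g. a kombu Producer bound to the shared channel.
    with publish_lock:
        producer.publish(body, **kwargs)
```

Using a dedicated connection (or channel) per thread is the other common way to avoid this class of corruption.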

@auvipy
Member

auvipy commented Sep 12, 2019

Can you also check celery master, with py-amqp master?

@HughWarrington

To my knowledge, we've only seen this error once or twice in about 2 million server hours, and I have no reliable repro steps. Also these are production servers so unfortunately I'm unable to test.

@auvipy
Member

auvipy commented Sep 12, 2019

OK, we'll wait until it can be tested with celery==4.4.0.

@auvipy auvipy added this to the 4.4.x milestone Dec 16, 2019
@auvipy
Member

auvipy commented Dec 16, 2019

Can you test with celery==4.4.0?

@auvipy
Member

auvipy commented Feb 16, 2021

To my knowledge, we've only seen this error once or twice in about 2 million server hours, and I have no reliable repro steps. Also these are production servers so unfortunately I'm unable to test.

Closing for now based on this scenario, but we will reopen it if this becomes more frequent. Also, #6416 might fix this; we will reopen if it doesn't.

@auvipy auvipy closed this as completed Feb 16, 2021