if connectionclosed, we die #36

Merged: 1 commit from ybrs:master into cenkalti:master on Dec 1, 2014
Conversation

@ybrs (Contributor) commented Nov 30, 2014

I am opening this pull request to discuss the following case. I'm not sure this is the best approach to recover, whether it fits with the plans of removing the worker, or whether there is a better way to do this, so please advise.

If the connection is somehow interrupted, the workers are broken, kaput. Here's the traceback:

(env)$ STORM_SETTINGS_MODULE="chroma.settings" PYTHONPATH=`pwd` kuyruk -m kuyruk_config worker
I 92720 kuyruk.consumer.next:106 - Waiting for new message...
E 92720 pika.adapters.base_connection._handle_read:347 - Read empty data, calling disconnect
W 92720 pika.adapters.base_connection._check_state_on_disconnect:160 - Socket closed when connection was open
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Users/aybarsbadur/projects/hipo/chroma/api/env/lib/python2.7/site-packages/kuyruk/consumer.py", line 93, in _process_data_events
    self.queue.channel.connection.process_data_events()
  File "/Users/aybarsbadur/projects/hipo/chroma/api/env/lib/python2.7/site-packages/kuyruk/connection.py", line 32, in process_data_events
    return super(Connection, self).process_data_events()
  File "/Users/aybarsbadur/projects/hipo/chroma/api/env/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 240, in process_data_events
    if self._handle_read():
  File "/Users/aybarsbadur/projects/hipo/chroma/api/env/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 348, in _handle_read
    super(BlockingConnection, self)._handle_read()
  File "/Users/aybarsbadur/projects/hipo/chroma/api/env/lib/python2.7/site-packages/pika/adapters/base_connection.py", line 348, in _handle_read
    return self._handle_disconnect()
  File "/Users/aybarsbadur/projects/hipo/chroma/api/env/lib/python2.7/site-packages/pika/adapters/base_connection.py", line 248, in _handle_disconnect
    self._adapter_disconnect()
  File "/Users/aybarsbadur/projects/hipo/chroma/api/env/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 318, in _adapter_disconnect
    self._check_state_on_disconnect()
  File "/Users/aybarsbadur/projects/hipo/chroma/api/env/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 371, in _check_state_on_disconnect
    raise exceptions.ConnectionClosed()
ConnectionClosed

When the connection gets lost, the worker doesn't exit and doesn't consume, so manual intervention is needed: someone has to restart the workers. Simply restarting rabbitmq triggers this; the worker just stays there frozen.

We run workers under supervisor, so the simplest solution I could find was shooting the process with SIGTERM. With autorestart=true in the supervisor config, supervisor restarts it a few times and everything gets back to normal.
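For illustration, here is a minimal sketch of that idea: catch pika's ConnectionClosed around the call the consumer thread makes to pump data events (the call shown in the traceback above) and terminate the whole process so supervisor can respawn it. The surrounding structure and attribute names are illustrative, not the actual merged patch.

    import os
    import signal

    from pika.exceptions import ConnectionClosed

    def _process_data_events(self):
        # Runs in the consumer thread; self.queue is illustrative here.
        try:
            self.queue.channel.connection.process_data_events()
        except ConnectionClosed:
            # The broker connection is gone; kill the whole process so the
            # process manager (supervisor with autorestart=true) respawns it.
            os.kill(os.getpid(), signal.SIGTERM)

Sending SIGTERM to our own pid delivers the signal to the main thread, so whatever shutdown handling the worker already has installed gets a chance to run, instead of hard-exiting from inside the consumer thread.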

A better approach would be restarting the thread, but I couldn't find a good place to add a Queue or similar message passing to the worker, and if the workers are going to be replaced/removed anyway, this is an easier patch.

What do you guys think?

@coveralls

Coverage Status

Coverage decreased (-0.18%) when pulling 02cf50b on ybrs:master into 18a6646 on cenkalti:master.

@cenkalti (Owner) commented Dec 1, 2014

@ybrs This is on purpose. If the connection is broken, the worker exits. If you run workers under a process manager, it should be fine.

By the way, I am working on version 2. I don't want to spend too much effort on this because the new design does not have a Consumer class or separate threads.

@ybrs (Contributor, Author) commented Dec 1, 2014

@cenkalti Unfortunately it doesn't behave like that: the consumer thread dies, but the main thread lives, so the process doesn't exit. You can reproduce the issue simply by restarting rabbitmq while the worker is running. We are using my fork because of this issue - we don't know why, but on EC2 we see disconnections very often - and this fixes it.
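A self-contained illustration of that behaviour (not kuyruk code): an unhandled exception kills only the thread it is raised in, so the main thread keeps the process alive even though nothing is being consumed anymore.

    import threading
    import time

    def consumer():
        # Stand-in for the consumer thread hitting ConnectionClosed.
        raise RuntimeError("connection lost")

    t = threading.Thread(target=consumer)
    t.start()
    t.join()

    # Only the consumer thread died above; the main thread (and therefore
    # the process) keeps running and never consumes again.
    while True:
        time.sleep(1)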

Do you recommend switching to version 2? Is there any help you need on it?

@cenkalti (Owner) commented Dec 1, 2014

I'm sorry, you are right. Your patch is necessary in this case.

Version 2 is not complete yet. I need a week to finish, maybe two. Thanks for helping.

cenkalti added a commit that referenced this pull request Dec 1, 2014
if connectionclosed, we die
@cenkalti merged commit c8b890a into cenkalti:master on Dec 1, 2014