This repository has been archived by the owner on Mar 24, 2021. It is now read-only.

When the Kafka server goes down, pykafka breaks but the exception cannot be caught #233

Closed
hallelujah-shih opened this issue Aug 21, 2015 · 11 comments

@hallelujah-shih

Exception in thread Thread-11:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 813, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 766, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib/python2.7/site-packages/pykafka/simpleconsumer.py", line 277, in fetcher
self.fetch()
File "/usr/lib/python2.7/site-packages/pykafka/simpleconsumer.py", line 576, in fetch
min_bytes=self._fetch_min_bytes
File "/usr/lib/python2.7/site-packages/pykafka/broker.py", line 221, in fetch_messages
return future.get(FetchResponse)
File "/usr/lib/python2.7/site-packages/pykafka/handlers.py", line 55, in get
raise self.error
SocketDisconnectedError

Exception in thread Thread-12:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 813, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 766, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib/python2.7/site-packages/pykafka/simpleconsumer.py", line 265, in autocommitter
self._auto_commit()
File "/usr/lib/python2.7/site-packages/pykafka/simpleconsumer.py", line 329, in _auto_commit
self.commit_offsets()
File "/usr/lib/python2.7/site-packages/pykafka/simpleconsumer.py", line 349, in commit_offsets
self._consumer_group, 1, 'pykafka', reqs)
File "/usr/lib/python2.7/site-packages/pykafka/broker.py", line 303, in commit_consumer_group_offsets
self.connect_offsets_channel()
File "/usr/lib/python2.7/site-packages/pykafka/broker.py", line 191, in connect_offsets_channel
self._offsets_channel_connection.connect(self._offsets_channel_socket_timeout_ms)
File "/usr/lib/python2.7/site-packages/pykafka/connection.py", line 66, in connect
timeout=timeout / 1000
File "/usr/lib64/python2.7/socket.py", line 575, in create_connection
raise err
error: [Errno 111] Connection refused

@yungchin
Contributor

Thanks for reporting this.

I think there are two issues here (and they interact), and there are some proposed solutions on the issue tracker that touch on both issues to some extent. It's probably high time to address them.

  1. If the worker threads fail, the main thread may never find out. There is a proposed fix for this as part of zookeeper failure #206, specifically commit 2296af6. That fix is somewhat orthogonal to the rest of that pull request, which we haven't merged so far, probably in part because it touches too many things to reason about easily. I feel we should split that solution out of the pull request and apply it, or at least apply something similar.
  2. Some of the errors that might occur would benefit from retry logic as proposed in Consumer hanging after a broker gets down #223. I think it would make sense to shield application logic from SocketDisconnectedError and the like where possible.

Just to tie these all together: #224 is really the same problem that #223 solves, and #227 is, at least in part, also about this. Even #189 could be the same thing (ie if "it hangs" turns out to mean that the worker threads silently died).

@kbourgoin @emmett9001 what are your thoughts on this? Perhaps it would be better even to copy the exception handling model that is now in #177 for detecting dead worker threads, re-raising the exceptions on the main thread?

@emmettbutler
Contributor

Yup @yungchin, I'm in favor of using the same approach from the async producer in the consumers, in which worker exceptions are raised to the main thread when a call is made to the public consumer API. We can even get away without implementing retry logic for disconnected sockets for the time being, since users will at least be able to catch the exception and restart the consumer themselves.
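To make the model concrete, a hedged sketch of that approach follows. The class and method names here are illustrative only, not pykafka's actual internals: the worker thread stashes any exception it hits, and the next call into the public API re-raises it on the caller's thread.

```python
import threading

# Illustrative sketch only -- WorkerBackedConsumer is a hypothetical name,
# not pykafka's actual API. The worker records any exception it hits, and
# the next public API call re-raises it on the calling thread.
class WorkerBackedConsumer(object):
    def __init__(self):
        self._worker_exception = None
        self._worker = threading.Thread(target=self._fetch_loop)
        self._worker.daemon = True
        self._worker.start()

    def _fetch_loop(self):
        try:
            # Simulate the fetcher dying, e.g. on a broker disconnect.
            raise IOError("broker connection lost")
        except Exception as exc:
            # Stash the exception instead of letting the thread die silently.
            self._worker_exception = exc

    def consume(self):
        self._worker.join()
        if self._worker_exception is not None:
            # Surface the worker's failure on the main thread, where the
            # application's try/except can actually catch it.
            raise self._worker_exception
        return "message"
```

With this shape, a try/except around consume() in application code fires even though the failure actually happened on a worker thread.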

@emmettbutler
Contributor

By the way, thanks for putting in the effort to deduplicate so many issues. It's very helpful to see the connections between them.

@pau

pau commented Sep 20, 2015

I encountered this issue while trying to port our code from kafka-python to pykafka in order to use the balancedconsumer.

In my test environment I only have one Kafka broker, but if I had more than one, would the consumer be able to switch over to another available broker on SocketDisconnectedError? If so, it seems the assumption is that at least one broker is always available?

@emmettbutler
Contributor

The broad issue of the SimpleConsumer not being resilient to broker failure is touched on by several tickets, and is the next bigger-scale enhancement I'll be focusing on. Though it's very experimental at the moment and more than a bit out of date, this pull request is the current most promising work on this issue I'm aware of. In the next few weeks, I'll be putting together a cleaner pull request that addresses the issue of broker failure in the consumer. #233, #223, #224 and #227 are all related in some way to this more general issue.

To directly answer your question @pau: in general pykafka does assume that there's at least one broker available - the current Producer in master illustrates this. If you'd like to provide any input on the direction these upcoming fixes should take, please do let us know.

@pau

pau commented Sep 20, 2015

Thanks for the response @emmett9001. Looking forward to the new PR 👍

For my input, I like the ideas from your previous comments:

... worker exceptions are raised to the main thread ...

... since users will at least be able to catch the exception and restart the consumer themselves ...

Our current code that uses kafka-python can catch kafka-python exceptions and handle them gracefully. So it is probably best to just offload that decision to the user of the library rather than building complex retry logic on your end.

@emmettbutler
Contributor

In the producer, we retry a few times before erroring. I imagine we'll do
something similar in the consumer.
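For reference, the bounded-retry pattern being described might look something like the sketch below. The helper name, retry count, and use of IOError are assumptions for illustration, not the producer's actual code.

```python
import time

# Hypothetical helper, not pykafka's actual producer code: retry a
# transient operation a few times before letting the error propagate.
def call_with_retries(operation, max_retries=3, backoff_secs=0.1):
    last_error = None
    for _ in range(max_retries):
        try:
            return operation()
        except IOError as exc:
            last_error = exc
            time.sleep(backoff_secs)  # brief pause before the next attempt
    # Every attempt failed: surface the last error to the caller.
    raise last_error

# Example: an operation that fails twice, then succeeds on the third try.
attempts = []

def flaky_send():
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError("connection refused")
    return "ok"
```

Here call_with_retries(flaky_send) returns "ok" after three attempts, while an operation that never succeeds still raises IOError to the caller once the retries are exhausted.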


@pau

pau commented Sep 21, 2015

Thanks. Is there any guidance on how to handle retries until this issue is fixed?

@emmettbutler
Contributor

I think you can get pretty far catching SocketDisconnectedError and re-instantiating your consumer. At Parsely, we allow the process to crash and be automatically restarted by supervisord, though that's of course not ideal either.
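For the supervisord route, an entry along these lines would restart the consumer process whenever it crashes. The program name and command path are placeholders:

```ini
; Hypothetical supervisord entry; program name and command path are
; placeholders. autorestart=true restarts the process after any crash,
; including one caused by an uncaught SocketDisconnectedError.
[program:pykafka-consumer]
command=python /path/to/consumer.py
autorestart=true
startretries=10
```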

@pau

pau commented Sep 21, 2015

The issue is that the main thread can't catch SocketDisconnectedError.

Here's some sample code:

import logging

from pykafka import KafkaClient
from pykafka.exceptions import SocketDisconnectedError


logging.basicConfig(level='DEBUG')


client = KafkaClient(hosts='kafka:9092')
topic = client.topics['test-topic']


consumer = None
try:
    consumer = topic.get_balanced_consumer(consumer_group='test-group',
                                           zookeeper_connect='zookeeper:2181',
                                           auto_commit_enable=True,
                                           auto_commit_interval_ms=1000)
    while True:
        message = consumer.consume()
        print message.__dict__
except SocketDisconnectedError:
    print 'handle error here'
finally:
    if consumer:
        consumer.stop()
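The except clause in the sample above is never reached because Python does not propagate a worker thread's exception to the thread that calls join(). A minimal demonstration of that behavior:

```python
import threading

# An exception raised inside a worker thread is printed to stderr by the
# threading machinery (like the tracebacks in the original report), but
# it never propagates to the thread that started the worker.
def worker():
    raise RuntimeError("raised in worker thread")

t = threading.Thread(target=worker)
t.start()

caught = False
try:
    t.join()  # returns normally; the worker's exception is lost
except RuntimeError:
    caught = True  # never reached

print('main thread caught exception: %r' % caught)
# prints: main thread caught exception: False
```

This is why the fix has to involve the consumer's worker threads recording their exceptions somewhere the public API can re-raise them, rather than the application wrapping consume() in a try/except.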

@emmettbutler
Contributor

Closing this issue in favor of #223, where work on smart retry logic in the consumer will be tracked.
