
problem with consumer reconnection after socket failure? #843

Closed
dnwe opened this issue Oct 7, 2016 · 9 comments

@dnwe (Contributor) commented Oct 7, 2016

Whilst I was testing how the Kafka consumer + producer cope with network timeouts / drops, I noticed that on reconnection of the network the client seemingly opens a flood of new connections, which in our particular environment triggers iptables rules designed to prevent connection floods. Looking at the logging, it appeared to be the check_version code, which connects to the brokers to guess the right api_version to use.

Explicitly setting the api_version in the consumer config (to disable the api version probing) successfully prevents this from occurring.
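
For reference, a minimal sketch of that workaround, assuming kafka-python's KafkaConsumer; the topic name, bootstrap address, and broker version below are placeholders rather than details from the report:

```python
from kafka import KafkaConsumer

# Pinning api_version skips the broker version probe (check_version)
# that otherwise runs when a connection is first established.
consumer = KafkaConsumer(
    'my-topic',                        # placeholder topic
    bootstrap_servers='broker1:9092',  # placeholder address
    api_version=(0, 10),               # match your broker version
)
```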

@dpkp (Owner) commented Oct 7, 2016

check_version is only called once per consumer / producer, and the result is cached. Are you creating new consumers / producers?
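
To make the question concrete, a rough sketch of the two patterns, assuming kafka-python (topic and address are placeholders): reusing one consumer means the version probe runs once at startup, whereas constructing a new consumer inside a loop would probe, and therefore reconnect, on every iteration.

```python
from kafka import KafkaConsumer

BOOTSTRAP = 'broker1:9092'  # placeholder address

def consume_with_reuse():
    # check_version runs once, when this consumer first connects.
    consumer = KafkaConsumer('my-topic', bootstrap_servers=BOOTSTRAP)
    for msg in consumer:
        print(msg.value)

def consume_with_recreate():
    # Anti-pattern: each new instance re-probes the brokers on connect.
    while True:
        consumer = KafkaConsumer('my-topic', bootstrap_servers=BOOTSTRAP)
        consumer.poll(timeout_ms=1000)
        consumer.close()
```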

@dnwe (Contributor, Author) commented Oct 7, 2016

Hmmm that's odd.

No, I wasn't recreating them. It was the simplest send and recv example possible, with SASL_SSL. I'm not sure why setting the API version manually would have fixed it then 🤔
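
For context, the sort of minimal SASL_SSL send/receive test being described might look roughly like the sketch below; the mechanism, credentials, CA file, topic, and address are all placeholders, not details from the report.

```python
from kafka import KafkaProducer, KafkaConsumer

sasl_config = dict(
    bootstrap_servers='broker1:9093',  # placeholder address
    security_protocol='SASL_SSL',
    sasl_mechanism='PLAIN',            # assumed mechanism
    sasl_plain_username='user',        # placeholder credentials
    sasl_plain_password='secret',
    ssl_cafile='ca.pem',               # placeholder CA bundle
)

# Send a single test message.
producer = KafkaProducer(**sasl_config)
producer.send('test-topic', b'hello')
producer.flush()

# Block and print whatever arrives on the same topic.
consumer = KafkaConsumer('test-topic', **sasl_config)
for msg in consumer:
    print(msg.value)
```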

@dpkp (Owner) commented Oct 7, 2016

It's possible there's some code path I haven't noticed that causes check_version to be called again, but I haven't been able to spot it.

Setting api_version will always disable check_version, and I'd generally recommend configuring it explicitly once you are in production.

@dnwe (Contributor, Author) commented Oct 10, 2016

Dug into the log output for this again this morning to compare the difference between api_version being set vs not. You are definitely correct: check_version is only ever called once.

The only obvious difference I could see in the flow of logic was that in client_async.py we call the bootstrap before check_version, and hence the version of MetadataRequest used in the bootstrap differs:

```python
        if self.config['api_version'] is None or self.config['api_version'] < (0, 10):
            metadata_request = MetadataRequest[0]([])
        else:
            metadata_request = MetadataRequest[1](None)
```

and because check_version establishes a connection to a node, we end up with one of those sooner than in the explicit api_version case.
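
(As an aside on the flooding symptom, and purely as something to experiment with rather than anything suggested in this thread: kafka-python exposes a reconnect_backoff_ms setting that controls how long a client waits before retrying a failed connection, which may help space out reconnect attempts after a network drop. The sketch below uses placeholder values.)

```python
from kafka import KafkaConsumer

# A larger backoff spaces out connection retries so that recovery after
# a network outage looks less like a connection flood to iptables.
consumer = KafkaConsumer(
    'my-topic',                        # placeholder topic
    bootstrap_servers='broker1:9092',  # placeholder address
    api_version=(0, 10),               # match your broker version
    reconnect_backoff_ms=1000,         # illustrative value
)
```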

@dpkp (Owner) commented Oct 12, 2016

What is the problem you are seeing exactly? If your client is partitioned from all brokers and the partition is suddenly removed, I would definitely expect the client to immediately reconnect to all brokers and continue producing or consuming.

@dnwe (Contributor, Author) commented Oct 12, 2016

@dpkp this is the sort of behaviour that I see: recv.log.zip - network was unplugged around [20:41:26] and immediately re-connected.

@dnwe changed the title from "client_async.py #check_version() needs backoff / rate limiting?" to "problem with consumer reconnection after socket failure?" on Oct 12, 2016
@jberstler commented Nov 17, 2016

@dpkp Is there any update on this? I have seen this numerous times and typically killing/restarting the consumer resolves the issue - that is, the Kafka instance really is running, but the consumer gets into a state where it does not consume new messages.

@modeyang commented

+1 @bjustin-ibm @dpkp. More than that, I found the Kafka connections were left in CLOSE_WAIT when I checked with lsof -p <pid>. Is there a workaround? (Kafka broker version 0.9, kafka-python 1.3.0.)

@dpkp (Owner) commented Mar 9, 2017

I believe the issue here was related to SASL and is fixed in master / #1003
