Producer cannot recover from broker failure #267
Comments
@tvoinarovskyi is this a known issue? Am I doing something wrong, or is there a planned fix?
Hey there. Thanks for the bug report. I am a bit stuck with a full refactor of subscription state and have little time on my hands, thus the late response.
I'm reproducing with Kafka 0.11.0.1 and aiokafka 0.3.1. Once Kafka is stopped, it logs this very fast, multiple times per second, which seems like
Then I start Kafka back up, and aiokafka is stuck just logging this every 0.1 seconds:
I am on Kafka 0.10.2.1 and aiokafka 0.3.1.
@ask The issue is about the Producer, but your log shows Consumer errors. Does the consumer fail in your case? How many brokers and topics do you have?
Any updates on this, @tvoinarovskyi?
@vineet-rh I am finishing #286, so no progress on this thus far. I do think it's not complicated, it just needs to be tracked down. Sorry.
@ask About your error, it seems to be separate from this one. I will file a new issue for it. As far as I can tell, you created a topic with a replication factor larger than the number of available nodes. Frankly, I think Java's client does not handle this case either, there are no handlers for INVALID_REPLICATION_FACTOR in the code... I will need to try to reproduce the case later.
Hmm. Thanks for investigating! It worked at startup and created the topics successfully then; it only broke after I restarted the broker. I guess the replication factor must have changed after restarting (or the number of brokers changed, but this was local with a single broker)? Apropos, is there an easy way to add a callback for when the connection to the broker is closed? I have a topic creation cache that I probably have to reset when disconnected.
@ask Hmm, there is certainly no API to tell when a connection is closed. The connections are by design meant to be transparent to the user, and there are also cases when 2 sockets can be opened to a single node (the coordination socket is separate). Anyway, if you don't find a way to do that, please open an issue with a use case and I will add something ASAP.
Hmm, now I can't reproduce this in master... @vineet-rh can you confirm this too?
Ok, so actually it's strange broker behaviour on startup. There is a case where the broker can return 0 nodes in a metadata response. aiokafka takes that at face value and erases all nodes from its cache =)
Should be fixed by #297
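To illustrate the failure mode described above, here is a minimal sketch of a metadata cache that refuses to replace its broker list with an empty one. `BrokerCache` and its methods are hypothetical names made up for this example; this is not aiokafka's actual code and not necessarily how #297 addresses the problem.

```python
class BrokerCache:
    """Hypothetical stand-in for a client's broker metadata cache."""

    def __init__(self, brokers=None):
        # Maps node_id -> (host, port).
        self._brokers = dict(brokers or {})

    def update(self, brokers_from_response):
        """Apply a metadata response; return True if the cache changed."""
        if not brokers_from_response:
            # An empty broker list would leave no node to reconnect to,
            # so keep the previous view and let the caller retry later.
            return False
        self._brokers = dict(brokers_from_response)
        return True

    def brokers(self):
        # Return a copy so callers cannot mutate the cache by accident.
        return dict(self._brokers)


cache = BrokerCache({0: ("localhost", 9092)})
cache.update({})                         # broker still starting up: ignored
cache.update({0: ("localhost", 9092)})   # normal response: applied
```

The point of the guard is that a client which keeps its last known nodes can still retry the metadata request once the broker has finished starting.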
It looks like the AIOKafkaProducer is unable to recover from the broker going down.
I tried with the following code:
If I take down my locally running broker and bring it back up, the program just goes into a loop with the following log message:
It looks like the producer is unable to get the cluster broker ids after a broker failure.
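On versions affected by this behaviour, one possible workaround is to bound each send with a timeout and rebuild the producer when it appears stuck. The sketch below is an assumption-laden illustration, not a fix: `send_with_rebuild` and `make_producer` are hypothetical names, and the producer is only assumed to expose `start()`, `stop()` and `send_and_wait()` coroutines, as `AIOKafkaProducer` does.

```python
import asyncio


async def send_with_rebuild(make_producer, producer, topic, value, timeout=10.0):
    """Send one message; on timeout, replace the producer and retry once.

    Returns the producer that should be used for subsequent sends.
    `make_producer` is a hypothetical zero-argument factory supplied by
    the caller that returns a fresh, not yet started producer.
    """
    try:
        await asyncio.wait_for(producer.send_and_wait(topic, value), timeout)
        return producer
    except asyncio.TimeoutError:
        # The old producer may be stuck, so do not wait for it forever.
        try:
            await asyncio.wait_for(producer.stop(), timeout)
        except asyncio.TimeoutError:
            pass
        fresh = make_producer()
        await fresh.start()
        # A second timeout here propagates to the caller.
        await asyncio.wait_for(fresh.send_and_wait(topic, value), timeout)
        return fresh
```

A caller would keep the returned object, for example `producer = await send_with_rebuild(make_producer, producer, "my_topic", b"payload")`, where the topic name and payload are placeholders. Note that a message whose first attempt timed out may still have been delivered, so the retry can produce a duplicate.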