RD_KAFKA_RESP_ERR_NOT_ENOUGH_REPLICAS should be retriable according to protocol #1421

charkost · 2017-09-15T14:17:53Z

Description

Hello,

RD_KAFKA_RESP_ERR_NOT_ENOUGH_REPLICAS does not appear to be treated as a retriable error in librdkafka, whereas the official protocol documentation lists it as retriable: http://kafka.apache.org/protocol.html#protocol_error_codes
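To make the expectation concrete, here is a minimal sketch of the retriable flag as the protocol's error table describes it. The table is a small hand-copied excerpt, and the names `PROTOCOL_ERRORS` and `is_retriable` are hypothetical, chosen only for illustration; this is not librdkafka's internal error handling.

```python
# Illustrative subset of the Kafka protocol error table: code -> (name, retriable).
# Hand-copied excerpt for illustration only; not librdkafka's internal logic.
PROTOCOL_ERRORS = {
    6: ("NOT_LEADER_FOR_PARTITION", True),
    7: ("REQUEST_TIMED_OUT", True),
    10: ("MESSAGE_TOO_LARGE", False),
    19: ("NOT_ENOUGH_REPLICAS", True),
    20: ("NOT_ENOUGH_REPLICAS_AFTER_APPEND", True),
}


def is_retriable(code: int) -> bool:
    """Return the protocol's retriable flag; unknown codes count as non-retriable."""
    _name, retriable = PROTOCOL_ERRORS.get(code, ("UNKNOWN", False))
    return retriable
```

Under this reading, a ProduceResponse carrying error code 19 would be expected to trigger a retry rather than an immediate delivery failure.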

No RETRY attempts appeared in the log using debug=all:

%7|1505481017.739|SEND|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: Sent ProduceRequest (v3, 273 bytes @ 0, CorrId 5)
%7|1505481017.739|TOPPAR|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: test-scratchdd14 [0] 0+0 msgs
%7|1505481017.739|TOPPAR|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: test-scratchdd14 [2] 0+0 msgs
%7|1505481017.739|TOPPAR|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: test-scratchdd14 [1] 0+0 msgs
%7|1505481017.741|RECV|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: Received ProduceResponse (v3, 52 bytes, CorrId 5, rtt 2.24ms)
%7|1505481017.741|REQERR|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: ProduceRequest failed: Broker: Not enough in-sync replicas: explicit actions 0x0 
%7|1505481017.741|MSGSET|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: test-scratchdd14 [0]: MessageSet with 1 message(s) encountered error: Broker: Not enough in-sync replicas (actions 0x1)
%7|1505481017.741|TOPPAR|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: test-scratchdd14 [0] 0+0 msgs
%7|1505481017.741|TOPPAR|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: test-scratchdd14 [2] 0+0 msgs
%7|1505481017.741|TOPPAR|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: test-scratchdd14 [1] 0+0 msgs
[kafka] 2017/09/15 16:10:17 Delivery failed: `{"cuuid":"ODQzODU4ODE4NTVhMzIx","foo":"bar","ip":"127.0.0.1","r":"321321321","sig":"bad","ts":"1505481016.738828487","ua":"ApacheBench/2.3"}` to test-scratchdd14[0]@end: Broker: Not enough in-sync replicas
%7|1505481018.374|CONNECT|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: kafka-b.vm.skroutz.gr:9092/2: broker in state DOWN connecting
%7|1505481018.374|CONNECT|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: kafka-b.vm.skroutz.gr:9092/2: Connecting to ipv4#10.42.9.54:9092 (plaintext) with socket 12
%7|1505481018.374|STATE|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: kafka-b.vm.skroutz.gr:9092/2: Broker changed state DOWN -> CONNECT
%7|1505481018.374|BROADCAST|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: Broadcasting state change
%7|1505481018.374|BROKERFAIL|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: kafka-b.vm.skroutz.gr:9092/2: failed: err: Local: Broker transport failure: (errno: Connection refused)
%7|1505481018.374|FAIL|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: kafka-b.vm.skroutz.gr:9092/2: Connect to ipv4#10.42.9.54:9092 failed: Connection refused
%7|1505481018.374|STATE|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: kafka-b.vm.skroutz.gr:9092/2: Broker changed state CONNECT -> DOWN
%7|1505481018.374|BROADCAST|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: Broadcasting state change
%7|1505481018.374|BUFQ|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: kafka-b.vm.skroutz.gr:9092/2: Purging bufq with 0 buffers
%7|1505481018.374|BUFQ|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: kafka-b.vm.skroutz.gr:9092/2: Updating 0 buffers on connection reset
%7|1505481018.570|CONNECT|rdkafka#producer-1| [thrd:kafka-c.vm.skroutz.gr:9092/bootstrap]: kafka-c.vm.skroutz.gr:9092/3: broker in state DOWN connecting
%7|1505481018.570|CONNECT|rdkafka#producer-1| [thrd:kafka-c.vm.skroutz.gr:9092/bootstrap]: kafka-c.vm.skroutz.gr:9092/3: Connecting to ipv4#10.42.9.16:9092 (plaintext) with socket 12
%7|1505481018.571|STATE|rdkafka#producer-1| [thrd:kafka-c.vm.skroutz.gr:9092/bootstrap]: kafka-c.vm.skroutz.gr:9092/3: Broker changed state DOWN -> CONNECT
%7|1505481018.571|BROADCAST|rdkafka#producer-1| [thrd:kafka-c.vm.skroutz.gr:9092/bootstrap]: Broadcasting state change

How to reproduce

Checklist

Please provide the following information:

librdkafka version (release number or git tag): 61d786b
Apache Kafka version: 0.11.0
librdkafka client configuration:
Operating system: debian 9
Using the legacy Consumer
Using the high-level KafkaConsumer
Provide logs (with debug=.. as necessary) from librdkafka
Provide broker log excerpts
Critical issue


edenhill · 2017-09-25T09:06:00Z

Thanks, will look into it.

…1432, #1476, #1421) ProduceRequest retries are reworked: instead of retrying the request itself, the messages are put back on the partition queue (maintaining input order) and included again in an upcoming ProduceRequest. Retries are now counted per message rather than per ProduceRequest, and the retry backoff is also enforced on a per-message basis. The input order of messages is retained throughout this process, which should guarantee ordered delivery when max.in.flight=1, even with retries > 0. The new behaviour is formalised in the documentation (INTRODUCTION.md).
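The requeue behaviour described in this commit message can be sketched roughly as follows. This is a simplified simulation under stated assumptions, not librdkafka's implementation: it models a single partition with one request in flight, a simulated clock, and a fake broker that rejects the first `failures` requests with a retriable error. All names (`Message`, `requeue_in_order`, `next_batch`, `simulate`) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    msgid: int             # input sequence number, defines the required order
    payload: str
    retries: int = 0       # retry count is tracked per message
    retry_at: float = 0.0  # earliest simulated time the message may be resent


def requeue_in_order(queue, failed, now, max_retries, backoff):
    """Put failed messages back on the queue in msgid order.

    Returns the messages that have exhausted their retries.
    """
    expired, retry = [], []
    for m in failed:
        if m.retries >= max_retries:
            expired.append(m)
        else:
            m.retries += 1
            m.retry_at = now + backoff  # backoff is enforced per message
            retry.append(m)
    merged = sorted(list(queue) + retry, key=lambda m: m.msgid)
    queue.clear()
    queue.extend(merged)
    return expired


def next_batch(queue, now, batch_size):
    """Take messages from the head; stop at a message still in backoff
    so that later messages never overtake it."""
    batch = []
    while queue and len(batch) < batch_size and queue[0].retry_at <= now:
        batch.append(queue.popleft())
    return batch


def simulate(payloads, failures, max_retries=2, backoff=0.1, batch_size=2):
    """Simulate producing with one request in flight (max.in.flight=1)."""
    queue = deque(Message(i, p) for i, p in enumerate(payloads))
    delivered, failed_permanently = [], []
    now = 0.0
    while queue:
        batch = next_batch(queue, now, batch_size)
        if not batch:
            now = queue[0].retry_at  # head is in backoff: advance the clock
            continue
        if failures > 0:             # fake broker returns a retriable error
            failures -= 1
            failed_permanently += requeue_in_order(
                queue, batch, now, max_retries, backoff)
        else:
            delivered += [m.payload for m in batch]
    return delivered, failed_permanently
```

For example, `simulate(list("abcd"), failures=1)` delivers all four payloads in their input order, because the failed batch is merged back ahead of the messages that were queued behind it.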

edenhill · 2018-01-15T10:25:31Z

Fixed on master

edenhill added the bug and producer labels on Sep 25, 2017

edenhill closed this as completed Jan 15, 2018
