
RD_KAFKA_RESP_ERR_NOT_ENOUGH_REPLICAS should be retriable according to protocol #1421

Closed
4 of 9 tasks
charkost opened this issue Sep 15, 2017 · 2 comments

Comments

@charkost

charkost commented Sep 15, 2017

Description

Hello,

RD_KAFKA_RESP_ERR_NOT_ENOUGH_REPLICAS does not seem to be treated as a retriable error by librdkafka, contrary to the official protocol documentation, which lists this error as retriable: http://kafka.apache.org/protocol.html#protocol_error_codes
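The protocol's error-code table flags each code as retriable or not; NOT_ENOUGH_REPLICAS (code 19) and NOT_ENOUGH_REPLICAS_AFTER_APPEND (code 20) are both flagged retriable. A minimal sketch of that classification for a small hand-picked subset of codes follows. The `kafka_proto_err` enum and `proto_err_is_retriable()` are hypothetical names, not librdkafka identifiers; librdkafka's `RD_KAFKA_RESP_ERR_NOT_ENOUGH_REPLICAS` carries the same numeric value, 19.

```c
#include <stdbool.h>

/* Hypothetical local mirror of a few Kafka protocol error codes.
 * Numeric values follow the protocol's error-code table. */
enum kafka_proto_err {
    KERR_OFFSET_OUT_OF_RANGE              = 1,
    KERR_LEADER_NOT_AVAILABLE             = 5,
    KERR_NOT_LEADER_FOR_PARTITION         = 6,
    KERR_REQUEST_TIMED_OUT                = 7,
    KERR_MESSAGE_TOO_LARGE                = 10,
    KERR_NOT_ENOUGH_REPLICAS              = 19,
    KERR_NOT_ENOUGH_REPLICAS_AFTER_APPEND = 20
};

/* Returns true if the protocol table marks `code` as retriable.
 * Only the subset listed above is known; any other code is
 * conservatively reported as non-retriable by this sketch. */
bool proto_err_is_retriable(int code) {
    switch (code) {
    case KERR_LEADER_NOT_AVAILABLE:
    case KERR_NOT_LEADER_FOR_PARTITION:
    case KERR_REQUEST_TIMED_OUT:
    case KERR_NOT_ENOUGH_REPLICAS:
    case KERR_NOT_ENOUGH_REPLICAS_AFTER_APPEND:
        return true;
    case KERR_OFFSET_OUT_OF_RANGE:
    case KERR_MESSAGE_TOO_LARGE:
    default:
        return false;
    }
}
```

Under this table, a producer receiving code 19 would back off and resend instead of failing the message straight to the delivery report.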

No RETRY attempts appear in the log with debug=all:

%7|1505481017.739|SEND|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: Sent ProduceRequest (v3, 273 bytes @ 0, CorrId 5)
%7|1505481017.739|TOPPAR|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: test-scratchdd14 [0] 0+0 msgs
%7|1505481017.739|TOPPAR|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: test-scratchdd14 [2] 0+0 msgs
%7|1505481017.739|TOPPAR|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: test-scratchdd14 [1] 0+0 msgs
%7|1505481017.741|RECV|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: Received ProduceResponse (v3, 52 bytes, CorrId 5, rtt 2.24ms)
%7|1505481017.741|REQERR|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: ProduceRequest failed: Broker: Not enough in-sync replicas: explicit actions 0x0 
%7|1505481017.741|MSGSET|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: test-scratchdd14 [0]: MessageSet with 1 message(s) encountered error: Broker: Not enough in-sync replicas (actions 0x1)
%7|1505481017.741|TOPPAR|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: test-scratchdd14 [0] 0+0 msgs
%7|1505481017.741|TOPPAR|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: test-scratchdd14 [2] 0+0 msgs
%7|1505481017.741|TOPPAR|rdkafka#producer-1| [thrd:kafka-a.vm.skroutz.gr:9092/bootstrap]: kafka-a.vm.skroutz.gr:9092/1: test-scratchdd14 [1] 0+0 msgs
[kafka] 2017/09/15 16:10:17 Delivery failed: `{"cuuid":"ODQzODU4ODE4NTVhMzIx","foo":"bar","ip":"127.0.0.1","r":"321321321","sig":"bad","ts":"1505481016.738828487","ua":"ApacheBench/2.3"}` to test-scratchdd14[0]@end: Broker: Not enough in-sync replicas
%7|1505481018.374|CONNECT|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: kafka-b.vm.skroutz.gr:9092/2: broker in state DOWN connecting
%7|1505481018.374|CONNECT|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: kafka-b.vm.skroutz.gr:9092/2: Connecting to ipv4#10.42.9.54:9092 (plaintext) with socket 12
%7|1505481018.374|STATE|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: kafka-b.vm.skroutz.gr:9092/2: Broker changed state DOWN -> CONNECT
%7|1505481018.374|BROADCAST|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: Broadcasting state change
%7|1505481018.374|BROKERFAIL|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: kafka-b.vm.skroutz.gr:9092/2: failed: err: Local: Broker transport failure: (errno: Connection refused)
%7|1505481018.374|FAIL|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: kafka-b.vm.skroutz.gr:9092/2: Connect to ipv4#10.42.9.54:9092 failed: Connection refused
%7|1505481018.374|STATE|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: kafka-b.vm.skroutz.gr:9092/2: Broker changed state CONNECT -> DOWN
%7|1505481018.374|BROADCAST|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: Broadcasting state change
%7|1505481018.374|BUFQ|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: kafka-b.vm.skroutz.gr:9092/2: Purging bufq with 0 buffers
%7|1505481018.374|BUFQ|rdkafka#producer-1| [thrd:kafka-b.vm.skroutz.gr:9092/bootstrap]: kafka-b.vm.skroutz.gr:9092/2: Updating 0 buffers on connection reset
%7|1505481018.570|CONNECT|rdkafka#producer-1| [thrd:kafka-c.vm.skroutz.gr:9092/bootstrap]: kafka-c.vm.skroutz.gr:9092/3: broker in state DOWN connecting
%7|1505481018.570|CONNECT|rdkafka#producer-1| [thrd:kafka-c.vm.skroutz.gr:9092/bootstrap]: kafka-c.vm.skroutz.gr:9092/3: Connecting to ipv4#10.42.9.16:9092 (plaintext) with socket 12
%7|1505481018.571|STATE|rdkafka#producer-1| [thrd:kafka-c.vm.skroutz.gr:9092/bootstrap]: kafka-c.vm.skroutz.gr:9092/3: Broker changed state DOWN -> CONNECT
%7|1505481018.571|BROADCAST|rdkafka#producer-1| [thrd:kafka-c.vm.skroutz.gr:9092/bootstrap]: Broadcasting state change
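In the log, the MSGSET line reports `actions 0x1` for the failed message set, and the application then receives "Delivery failed" instead of seeing a retry. To read that bitmask, the small decoder below assumes the flag values of librdkafka's internal `RD_KAFKA_ERR_ACTION_*` definitions at the time (PERMANENT = 0x1, IGNORE = 0x2, REFRESH = 0x4, RETRY = 0x8). These values are an assumption: the flags are internal, not part of the public API, and may differ between versions. The `ACT_*` macros and `describe_actions()` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Assumed values of librdkafka's internal error-action flags
 * (hypothetical local copies, not a public API). */
#define ACT_PERMANENT 0x1
#define ACT_IGNORE    0x2
#define ACT_REFRESH   0x4
#define ACT_RETRY     0x8

/* Write the set flag names of `actions`, separated by '|', into buf.
 * Writes "none" when no known flag is set. */
void describe_actions(unsigned actions, char *buf, size_t len) {
    static const struct { unsigned flag; const char *name; } names[] = {
        { ACT_PERMANENT, "PERMANENT" },
        { ACT_IGNORE,    "IGNORE"    },
        { ACT_REFRESH,   "REFRESH"   },
        { ACT_RETRY,     "RETRY"     },
    };
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (actions & names[i].flag) {
            if (buf[0] != '\0')
                strncat(buf, "|", len - strlen(buf) - 1);
            strncat(buf, names[i].name, len - strlen(buf) - 1);
        }
    }
    if (buf[0] == '\0')
        snprintf(buf, len, "none");
}
```

Under these assumed values, 0x1 decodes to PERMANENT alone, with no RETRY bit, which matches the message being failed immediately.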

How to reproduce

Checklist

Please provide the following information:

  • librdkafka version (release number or git tag): 61d786b
  • Apache Kafka version: 0.11.0
  • librdkafka client configuration:
  • Operating system: Debian 9
  • Using the legacy Consumer
  • Using the high-level KafkaConsumer
  • Provide logs (with debug=.. as necessary) from librdkafka
  • Provide broker log excerpts
  • Critical issue
@edenhill
Contributor

Thanks, will look into it.

edenhill added a commit that referenced this issue Dec 12, 2017
…1432, #1476, #1421)

ProduceRequest retries are reworked to not retry the request itself,
but put the messages back on the partition queue (while maintaining
input order) and then have an upcoming ProduceRequest include the messages again.

Retries are now calculated per message rather than ProduceRequest
and the retry backoff is also enforced on a per-message basis.

The input order of messages is retained during this whole process,
which should guarantee ordered delivery with max.in.flight=1, even with retries > 0.

The new behaviour is formalised through documentation (INTRODUCTION.md)
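The retry scheme described in the commit message (failed messages go back onto the partition queue in input order, and both the retry count and the backoff are tracked per message) can be illustrated with a minimal sketch. It is not librdkafka's implementation: `msg_t`, `msgq_t`, `msgq_insert_sorted()`, `msgq_take_batch()` and `msgq_requeue_failed()` are hypothetical names, times are plain millisecond integers, and the queue is a fixed-size array sorted by a per-message sequence number standing in for input order.

```c
#include <string.h>

#define QCAP 64

/* Hypothetical message record. msgid increases with produce() order. */
typedef struct {
    int  msgid;
    int  retries;        /* retries already used by this message */
    long backoff_until;  /* earliest send time in ms */
} msg_t;

/* Hypothetical partition queue, always sorted by msgid. */
typedef struct {
    msg_t msgs[QCAP];
    int   cnt;
} msgq_t;

/* Insert m at its input-order position. Returns -1 if the queue is full. */
int msgq_insert_sorted(msgq_t *q, msg_t m) {
    if (q->cnt == QCAP)
        return -1;
    int i = q->cnt;
    while (i > 0 && q->msgs[i - 1].msgid > m.msgid) {
        q->msgs[i] = q->msgs[i - 1];
        i--;
    }
    q->msgs[i] = m;
    q->cnt++;
    return 0;
}

/* Move up to max sendable messages from the head into batch. Stops at the
 * first message still backing off so no later message overtakes it. */
int msgq_take_batch(msgq_t *q, msg_t *batch, int max, long now) {
    int n = 0;
    while (n < max && n < q->cnt && q->msgs[n].backoff_until <= now) {
        batch[n] = q->msgs[n];
        n++;
    }
    memmove(q->msgs, q->msgs + n, (size_t)(q->cnt - n) * sizeof(msg_t));
    q->cnt -= n;
    return n;
}

/* Handle a retriable error for a whole batch: each message is charged one
 * retry and requeued in order with its own backoff, or failed once it has
 * used max_retries. Returns the number of messages failed permanently. */
int msgq_requeue_failed(msgq_t *q, msg_t *batch, int n,
                        int max_retries, long now, long backoff_ms) {
    int failed = 0;
    for (int i = 0; i < n; i++) {
        if (batch[i].retries >= max_retries) {
            failed++;          /* would be reported via the delivery report */
            continue;
        }
        batch[i].retries++;
        batch[i].backoff_until = now + backoff_ms;
        if (msgq_insert_sorted(q, batch[i]) != 0)
            failed++;          /* queue full: treated as failed in this sketch */
    }
    return failed;
}
```

For example, if messages 1, 2 and 3 fail with NOT_ENOUGH_REPLICAS at t=0 with a 100 ms backoff while messages 4 and 5 wait behind them, a batch taken at t=50 is empty even though 4 and 5 are ready, and a batch taken at t=100 sends all five in order. Stalling at a backing-off head is the design choice that keeps ordering intact; combined with a single in-flight request, later messages cannot overtake a retried one.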
edenhill added a commit that referenced this issue Dec 20, 2017
barrotsteindev pushed a commit to barrotsteindev/librdkafka that referenced this issue Jan 2, 2018
…, confluentinc#1092, confluentinc#1432, confluentinc#1476, confluentinc#1421)

ProduceRequest retries are reworked to not retry the request itself,
but put the messages back on the partition queue (while maintaining
input order) and then have an upcoming ProduceRequest include the messages again.

Retries are now calculated per message rather than ProduceRequest
and the retry backoff is also enforced on a per-message basis.

The input order of messages is retained during this whole process,
which should guarantee ordered delivery if max.in.flight=1 but with retries > 0.

The new behaviour is formalised through documentation (INTRODUCTION.md)
edenhill added a commit that referenced this issue Jan 2, 2018
edenhill added a commit that referenced this issue Jan 3, 2018
edenhill added a commit that referenced this issue Jan 10, 2018
@edenhill
Contributor

Fixed on master
