Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Producers using librdkafka log errors: Receive failed: Disconnected #132

Closed
solsson opened this issue Jan 22, 2018 · 3 comments
Closed

Producers using librdkafka log errors: Receive failed: Disconnected #132

solsson opened this issue Jan 22, 2018 · 3 comments

Comments

@solsson
Copy link
Contributor

solsson commented Jan 22, 2018

This is part of the investigation for #116. I start with kafkacat because I like the debug logging.

It's something we see in for example fluent-bit log aggregation (#131):

[2018/01/22 08:37:43] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-0.broker.kafka.svc.cluster.local:9092/0]: kafka-0.broker.kafka.svc.cluster.local:9092/0: Receive failed: Disconnected
[2018/01/22 08:37:43] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-0.broker.kafka.svc.cluster.local:9092/0]: kafka-0.broker.kafka.svc.cluster.local:9092/0: Receive failed: Disconnected
[2018/01/22 08:37:43] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected
[2018/01/22 08:37:43] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected
[2018/01/22 08:47:43] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected
[2018/01/22 08:47:43] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected
[2018/01/22 08:57:43] [error] [out_kafka] fluent-bit#producer-1: [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Receive failed: Disconnected
[2018/01/22 08:57:43] [error] [out_kafka] fluent-bit#producer-1: [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Receive failed: Disconnected
[2018/01/22 08:57:44] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected
[2018/01/22 08:57:44] [error] [out_kafka] fluent-bit#producer-1: [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected

I don't think messages get lost.

The test ./kafka/test/kafkacat.yml is a good way to correlate logs from producer and consumer:

Testcase output:

test-kafkacat [0] offset 113772
test-kafkacat [0] offset 113773
Last message is 10.213 old:
Test kafkacat-9552m@2018-01-22T08:45:19,332988261+00:00
Last message is 10.213 old:
Test kafkacat-9552m@2018-01-22T08:45:29,333140279+00:00
test-kafkacat [0] offset 113776
test-kafkacat [0] offset 113777

Producer:

9552m:Test kafkacat-9552m@2018-01-22T08:43:19,346975867+00:00
9552m:Test kafkacat-9552m@2018-01-22T08:43:29,332832192+00:00
%7|1516610729.335|REQERR|rdkafka#producer-1| [thrd:kafka-0.broker.kafka.svc.cluster.local:9092/0]: kafka-0.broker.kafka.svc.cluster.local:9092/0: ProduceRequest failed: Broker: Not leader for partition: explicit actions 0x0
%7|1516610729.336|REQERR|rdkafka#producer-1| [thrd:kafka-0.broker.kafka.svc.cluster.local:9092/0]: kafka-0.broker.kafka.svc.cluster.local:9092/0: ProduceRequest failed: Broker: Not leader for partition: explicit actions 0x0
... lots of the same ...
%7|1516610729.616|REQERR|rdkafka#producer-1| [thrd:kafka-0.broker.kafka.svc.cluster.local:9092/0]: kafka-0.broker.kafka.svc.cluster.local:9092/0: ProduceRequest failed: Broker: Not leader for partition: explicit actions 0x0
%7|1516610729.616|REQERR|rdkafka#producer-1| [thrd:kafka-0.broker.kafka.svc.cluster.local:9092/0]: kafka-0.broker.kafka.svc.cluster.local:9092/0: ProduceRequest failed: Broker: Not leader for partition: explicit actions 0x0
%7|1516610729.626|TOPBRK|rdkafka#producer-1| [thrd:kafka-0.broker.kafka.svc.cluster.local:9092/0]: kafka-0.broker.kafka.svc.cluster.local:9092/0: Topic test-kafkacat [0]: leaving broker (0 messages in xmitq, next leader kafka-1.broker.kafka.svc.cluster.local:9092/1, rktp 0x7f9808002bf0)
%7|1516610729.626|TOPBRK|rdkafka#producer-1| [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Topic test-kafkacat [0]: joining broker (rktp 0x7f9808002bf0)
%7|1516610729.671|REQERR|rdkafka#producer-1| [thrd:kafka-0.broker.kafka.svc.cluster.local:9092/0]: kafka-0.broker.kafka.svc.cluster.local:9092/0: ProduceRequest failed: Broker: Not leader for partition: explicit actions 0x0
%7|1516610739.337|REQERR|rdkafka#producer-1| [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: ProduceRequest failed: Broker: Not leader for partition: explicit actions 0x0
... lots of the same ...
%7|1516610739.586|REQERR|rdkafka#producer-1| [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: ProduceRequest failed: Broker: Not leader for partition: explicit actions 0x0
%7|1516610739.587|REQERR|rdkafka#producer-1| [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: ProduceRequest failed: Broker: Not leader for partition: explicit actions 0x0
%7|1516610739.588|TOPBRK|rdkafka#producer-1| [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Topic test-kafkacat [0]: leaving broker (0 messages in xmitq, next leader kafka-0.broker.kafka.svc.cluster.local:9092/0, rktp 0x7f9808002bf0)
%7|1516610739.588|TOPBRK|rdkafka#producer-1| [thrd:kafka-0.broker.kafka.svc.cluster.local:9092/0]: kafka-0.broker.kafka.svc.cluster.local:9092/0: Topic test-kafkacat [0]: joining broker (rktp 0x7f9808002bf0)
%7|1516610739.588|REQERR|rdkafka#producer-1| [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: ProduceRequest failed: Broker: Not leader for partition: explicit actions 0x0
9552m:Test kafkacat-9552m@2018-01-22T08:43:39,331839701+00:00
9552m:Test kafkacat-9552m@2018-01-22T08:43:49,332344669+00:00

Consumer:

1516610569334;9552m:0;113758;Test kafkacat-9552m@2018-01-22T08:42:49,334087744+00:00
% Reached end of topic test-kafkacat [0] at offset 113759
1516610579336;9552m:0;113759;Test kafkacat-9552m@2018-01-22T08:42:59,335617617+00:00
% Reached end of topic test-kafkacat [0] at offset 113760
%7|1516610588.428|BROKERFAIL|rdkafka#consumer-1| [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: failed: err: Local: Broker transport failure: (errno: Resource temporarily unavailable)
%7|1516610588.428|STATE|rdkafka#consumer-1| [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Broker changed state UP -> DOWN
% ERROR: Local: Broker transport failure: bootstrap.kafka:9092/bootstrap: Receive failed: Disconnected
%7|1516610588.513|BROKERFAIL|rdkafka#consumer-1| [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: failed: err: Local: Broker transport failure: (errno: Resource temporarily unavailable)
%7|1516610588.513|STATE|rdkafka#consumer-1| [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Broker changed state UP -> DOWN
%7|1516610588.530|CONNECT|rdkafka#consumer-1| [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: broker in state DOWN connecting
%7|1516610588.531|CONNECT|rdkafka#consumer-1| [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Connecting to ipv4#10.3.252.81:9092 (plaintext) with socket 15
%7|1516610588.532|STATE|rdkafka#consumer-1| [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Broker changed state DOWN -> CONNECT
%7|1516610588.532|CONNECT|rdkafka#consumer-1| [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Connected to ipv4#10.3.252.81:9092
%7|1516610588.532|CONNECTED|rdkafka#consumer-1| [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Connected (#1021)
%7|1516610588.532|STATE|rdkafka#consumer-1| [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Broker changed state CONNECT -> APIVERSION_QUERY
%7|1516610588.533|STATE|rdkafka#consumer-1| [thrd:bootstrap.kafka:9092/bootstrap]: bootstrap.kafka:9092/bootstrap: Broker changed state APIVERSION_QUERY -> UP
% ERROR: Local: Broker transport failure: kafka-1.broker.kafka.svc.cluster.local:9092/1: Receive failed: Disconnected
%7|1516610588.613|CONNECT|rdkafka#consumer-1| [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: broker in state DOWN connecting
%7|1516610588.614|CONNECT|rdkafka#consumer-1| [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Connecting to ipv4#10.0.6.58:9092 (plaintext) with socket 16
%7|1516610588.614|STATE|rdkafka#consumer-1| [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Broker changed state DOWN -> CONNECT
%7|1516610588.614|CONNECT|rdkafka#consumer-1| [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Connected to ipv4#10.0.6.58:9092
%7|1516610588.614|CONNECTED|rdkafka#consumer-1| [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Connected (#993)
%7|1516610588.614|STATE|rdkafka#consumer-1| [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Broker changed state CONNECT -> APIVERSION_QUERY
%7|1516610588.639|STATE|rdkafka#consumer-1| [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Broker changed state APIVERSION_QUERY -> UP
1516610589334;9552m:0;113760;Test kafkacat-9552m@2018-01-22T08:43:09,334195814+00:00
% Reached end of topic test-kafkacat [0] at offset 113761
1516610599348;9552m:0;113761;Test kafkacat-9552m@2018-01-22T08:43:19,346975867+00:00
% Reached end of topic test-kafkacat [0] at offset 113762
... 10 similar omitted ...
1516610709336;9552m:0;113772;Test kafkacat-9552m@2018-01-22T08:45:09,335680228+00:00
% Reached end of topic test-kafkacat [0] at offset 113773
1516610719333;9552m:0;113773;Test kafkacat-9552m@2018-01-22T08:45:19,332988261+00:00
% Reached end of topic test-kafkacat [0] at offset 113774
%7|1516610729.000|TOPBRK|rdkafka#consumer-1| [thrd:kafka-0.broker.kafka.svc.cluster.local:9092/0]: kafka-0.broker.kafka.svc.cluster.local:9092/0: Topic test-kafkacat [0]: leaving broker (0 messages in xmitq, next leader kafka-1.broker.kafka.svc.cluster.local:9092/1, rktp 0x7f85ac002bf0)
%7|1516610729.000|TOPBRK|rdkafka#consumer-1| [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Topic test-kafkacat [0]: joining broker (rktp 0x7f85ac002bf0)
1516610729333;9552m:0;113774;Test kafkacat-9552m@2018-01-22T08:45:29,333140279+00:00
% Reached end of topic test-kafkacat [0] at offset 113775
%7|1516610734.002|TOPBRK|rdkafka#consumer-1| [thrd:kafka-1.broker.kafka.svc.cluster.local:9092/1]: kafka-1.broker.kafka.svc.cluster.local:9092/1: Topic test-kafkacat [0]: leaving broker (0 messages in xmitq, next leader kafka-0.broker.kafka.svc.cluster.local:9092/0, rktp 0x7f85ac002bf0)
%7|1516610734.003|TOPBRK|rdkafka#consumer-1| [thrd:kafka-0.broker.kafka.svc.cluster.local:9092/0]: kafka-0.broker.kafka.svc.cluster.local:9092/0: Topic test-kafkacat [0]: joining broker (rktp 0x7f85ac002bf0)
1516610739336;9552m:0;113775;Test kafkacat-9552m@2018-01-22T08:45:39,334674612+00:00
% Reached end of topic test-kafkacat [0] at offset 113776
1516610749336;9552m:0;113776;Test kafkacat-9552m@2018-01-22T08:45:49,336160245+00:00
% Reached end of topic test-kafkacat [0] at offset 113777
@solsson
Copy link
Contributor Author

solsson commented Jan 22, 2018

There are interesting transient errors with the java based console producer/consumer too, in a seemingly healthy cluster:

Testcase:

OK (Test produce-consume-bootstrapbrokers-wcjwn@2018-01-22T10:11:27,470735872+00:00 at 2018-01-22T10:11:29)
OK (Test produce-consume-bootstrapbrokers-wcjwn@2018-01-22T10:11:37,326414153+00:00 at 2018-01-22T10:11:39)
Last message (at 2018-01-22T10:11:49) isn't from this test run (produce-consume-bootstrapbrokers-wcjwn@2018-01-22T10:11:47,323276991+00:00):
Test produce-consume-bootstrapbrokers-wcjwn@2018-01-22T10:11:37,326414153+00:00
OK (Test produce-consume-bootstrapbrokers-wcjwn@2018-01-22T10:11:57,477701157+00:00 at 2018-01-22T10:11:59)
OK (Test produce-consume-bootstrapbrokers-wcjwn@2018-01-22T10:12:07,324936198+00:00 at 2018-01-22T10:12:09)
+ tail -f /shared/produce.tmp
[2018-01-22 10:06:55,001] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 2 : {test-produce-consume-bootstrapbrokers=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2018-01-22 10:11:49,829] WARN [Producer clientId=console-producer] Got error produce response with correlation id 37 on topic-partition test-produce-consume-bootstrapbrokers-0, retrying (2 attempts left). Error: NETWORK_EXCEPTION (org.apache.kafka.clients.producer.internals.Sender)
Test produce-consume-bootstrapbrokers-wcjwn@2018-01-22T10:08:57,438508785+00:00
Test produce-consume-bootstrapbrokers-wcjwn@2018-01-22T10:09:07,332220250+00:00
[2018-01-22 10:09:10,896] ERROR [Consumer clientId=consumer-1, groupId=console-consumer-55623] Offset commit failed on partition test-produce-consume-bootstrapbrokers-0 at offset 14: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-22 10:09:10,901] WARN [Consumer clientId=consumer-1, groupId=console-consumer-55623] Asynchronous auto-commit of offsets {test-produce-consume-bootstrapbrokers-0=OffsetAndMetadata{offset=14, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
Test produce-consume-bootstrapbrokers-wcjwn@2018-01-22T10:09:17,380825151+00:00
Test produce-consume-bootstrapbrokers-wcjwn@2018-01-22T10:09:27,461868390+00:00
Test produce-consume-bootstrapbrokers-wcjwn@2018-01-22T10:09:37,326653465+00:00
Test produce-consume-bootstrapbrokers-wcjwn@2018-01-22T10:09:47,351274002+00:00
Test produce-consume-bootstrapbrokers-wcjwn@2018-01-22T10:09:57,450343508+00:00
Test produce-consume-bootstrapbrokers-wcjwn@2018-01-22T10:10:07,383976858+00:00
Test produce-consume-bootstrapbrokers-wcjwn@2018-01-22T10:10:17,321663927+00:00
Test produce-consume-bootstrapbrokers-wcjwn@2018-01-22T10:10:27,468669200+00:00
[2018-01-22 10:10:36,017] ERROR [Consumer clientId=consumer-1, groupId=console-consumer-55623] Offset commit failed on partition test-produce-consume-bootstrapbrokers-0 at offset 22: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-22 10:10:36,018] WARN [Consumer clientId=consumer-1, groupId=console-consumer-55623] Asynchronous auto-commit of offsets {test-produce-consume-bootstrapbrokers-0=OffsetAndMetadata{offset=22, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
Test produce-consume-bootstrapbrokers-wcjwn@2018-01-22T10:10:37,327567654+00:00
Test produce-consume-bootstrapbrokers-wcjwn@2018-01-22T10:10:47,337703014+00:00

I fail to find concurrent errors in producer, consumer and testcase pods.

@solsson
Copy link
Contributor Author

solsson commented Jan 24, 2018

Found a similar issue, and someone noticed that this happens about every 10 minutes, which is true for our fluent-bit pods as well. Blizzard/node-rdkafka#310 (comment)

@solsson
Copy link
Contributor Author

solsson commented Jan 2, 2019

@solsson solsson closed this as completed Jan 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant