
rdkafka_broker.c:2755:rd_kafka_fetch_reply_handle: assert failed #1948

Closed
7 tasks done
Kshitij29 opened this issue Aug 14, 2018 · 5 comments
Kshitij29 commented Aug 14, 2018

Description

I get a lot of partition count change messages as follows:

```
%4|1534226113.072|LEADER|rdkafka#consumer-9| [thrd:main]: abc.com [14] is unknown (partition_cnt 6)
%4|1534226113.072|LEADER|rdkafka#consumer-9| [thrd:main]: abc.com [13] is unknown (partition_cnt 6)
%4|1534226113.072|LEADER|rdkafka#consumer-9| [thrd:main]: abc.com [12] is unknown (partition_cnt 6)
%5|1534226114.589|PARTCNT|rdkafka#consumer-9| [thrd:main]: Topic abc.com partition count changed from 6 to 15
%5|1534226121.090|PARTCNT|rdkafka#consumer-9| [thrd:main]: Topic abc.com partition count changed from 15 to 6
%4|1534226121.090|LEADER|rdkafka#consumer-9| [thrd:main]: abc.com [14] is unknown (partition_cnt 6)
%4|1534226121.090|LEADER|rdkafka#consumer-9| [thrd:main]: abc.com [13] is unknown (partition_cnt 6)
%4|1534226121.090|LEADER|rdkafka#consumer-9| [thrd:main]: abc.com [12] is unknown (partition_cnt 6)
%5|1534226121.177|PARTCNT|rdkafka#consumer-9| [thrd:main]: Topic abc.com partition count changed from 6 to 15
```
However, kafka-topics.sh reports a constant partition count of 15.

After these partition count change messages, my daemon crashes with the following error:
```
*** rdkafka_broker.c:2755:rd_kafka_fetch_reply_handle: assert: tver && rd_kafka_toppar_s2i(tver->s_rktp) == rktp ***
Abort trap: 6
```

How to reproduce

Created 10 consumers in 10 separate threads using github.com/confluentinc/confluent-kafka-go.
All consumer threads are spawned by a main thread and subscribe to the same topic (here "abc.com"). The issue appears as soon as the main thread starts the consumers; a minimal sketch of the setup is below.
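
For reference, a minimal sketch of such a setup. This is illustrative, not the reporter's actual code: `bootstrap.servers` is assumed (the report does not give a broker address), and the rest of the configuration mirrors the checklist below.

```go
package main

import (
	"fmt"
	"sync"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			c, err := kafka.NewConsumer(&kafka.ConfigMap{
				"bootstrap.servers":          "localhost:9092", // assumed; not given in the report
				"group.id":                   "abc.com",
				"auto.offset.reset":          "earliest",
				"auto.commit.interval.ms":    5000,
				"api.version.request":        "false",
				"broker.version.fallback":    "0.10.1.0",
				"broker.address.family":      "v4",
				"queued.max.messages.kbytes": 100,
			})
			if err != nil {
				fmt.Printf("consumer %d: create failed: %v\n", id, err)
				return
			}
			defer c.Close()

			// All ten consumers join the same group and subscribe to the same topic.
			if err := c.SubscribeTopics([]string{"abc.com"}, nil); err != nil {
				fmt.Printf("consumer %d: subscribe failed: %v\n", id, err)
				return
			}
			for {
				// Poll drives the consumer; the assert fired inside
				// librdkafka's fetch handling shortly after startup.
				if ev := c.Poll(100); ev != nil {
					fmt.Printf("consumer %d: %v\n", id, ev)
				}
			}
		}(i)
	}
	wg.Wait()
}
```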

Checklist

IMPORTANT: We will close issues where the checklist has not been completed.

Please provide the following information:

  • librdkafka version (release number or git tag): v0.11.4
  • Apache Kafka version: v0.10.1.0
  • librdkafka client configuration: "group.id": "abc.com", "auto.offset.reset": "earliest", "auto.commit.interval.ms": 5000, "api.version.request": "false", "broker.version.fallback": "0.10.1.0", "broker.address.family": "v4", "queued.max.messages.kbytes": 100
  • Operating system: macOS Sierra 10.12.6
  • Provide logs (with debug=.. as necessary) from librdkafka: In Description
  • Provide broker log excerpts: NA
  • Critical issue: yes
@edenhill
Contributor

It seems like your cluster is desynchronized, with different brokers returning different partition counts for the topic.

librdkafka queries the brokers for topic metadata directly, while kafka-topics.sh queries ZooKeeper. A client should only query the brokers, not ZooKeeper.

As for the crash: there are some known problems when the partition count decreases (which should never happen in a healthy cluster), and this crash is most likely related to that. We'll look into fixing the crash, but please do try to fix your cluster.
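
For instance, a client can ask the brokers directly for the partition count it actually sees. A minimal confluent-kafka-go sketch; the broker address and group id are illustrative assumptions:

```go
package main

import (
	"fmt"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
	c, err := kafka.NewConsumer(&kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092",  // assumed broker address
		"group.id":          "metadata-check",  // illustrative group id
	})
	if err != nil {
		panic(err)
	}
	defer c.Close()

	topic := "abc.com"
	// GetMetadata sends a MetadataRequest to a broker; it never talks to
	// ZooKeeper, so this is the partition count the consumer will act on.
	md, err := c.GetMetadata(&topic, false, 5000)
	if err != nil {
		panic(err)
	}
	if t, ok := md.Topics[topic]; ok {
		fmt.Printf("broker-reported partition count for %q: %d\n", topic, len(t.Partitions))
	}
}
```

Pointing `bootstrap.servers` at one broker at a time and comparing the output would show whether the brokers disagree with each other (and with kafka-topics.sh).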

edenhill added the bug label Aug 14, 2018
@Kshitij29
Author

Thanks for the prompt reply @edenhill. I'll check whether the cluster is properly synchronised. However, if this desynchronisation persists in our cluster for some reason, I'll wait for your fix.

edenhill added a commit that referenced this issue Aug 15, 2018
…1948)

This may happen when the cluster is desynchronized and different
brokers report different partition counts for a topic, resulting in
the at-request-time rktp being removed and a new rktp created before
the fetch response is returned.
@edenhill
Contributor

Fixed on master

@Kshitij29
Author

Thanks for the solution @edenhill. The application no longer crashes, even with out-of-sync brokers.

@edenhill
Contributor

Thank you!
