
Consumer stops consuming after broker transport failure #548

Closed
Giska opened this issue Dec 21, 2018 · 14 comments
Labels: stale

Comments


Giska commented Dec 21, 2018

Hi,

We are encountering a problem with consumers that stop delivering new messages to the 'data' listener.
This seemingly happens after a broker becomes temporarily unavailable (broker transport failure), but only rarely. We have observed it on several different consumers on different topics with similar configurations, seemingly at random (most of the time the consumers resume operation after a broken broker connection).

The consumer is still synchronized with its consumer group (which consists of a single consumer for one topic of 5 partitions), and the high offsets increase as new messages arrive on the partitions, but the consumer lag keeps growing and the messages are seemingly never consumed.

We observed the following sequence of events, after which every partition of the topic stopped being consumed:

  • This 'event.error' seems to indicate the beginning of the problem: Error: broker transport failure

  • After this, no stats are logged again, although they were being logged every second before that.

  • 10 seconds after the error, the consumer stops fetching from every partition of the topic, with these two log events emitted for each partition:

{ severity: 7, fac: 'FETCH' } [thrd:BROKER_IP:9092/0]: BROKER_IP:9092/0: Topic TOPIC_NAME [3] in state active at offset 39611 (10/10 msgs, 0/40960 kb queued, opv 6) is not fetchable: queued.min.messages exceeded

{ severity: 7, fac: 'FETCHADD' } [thrd:BROKER_IP:9092/0]: BROKER_IP:9092/0: Removed TOPIC_NAME [3] from fetch list (0 entries, opv 6)

  • This happens at a time when no new messages are available (these partitions receive infrequent messages that arrive at set times in this test environment), and the 'data' listener does not receive any messages, so it is not clear to us why the queue would be full.

Probably linked to #182.
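For reference, this is roughly how we attach the listeners that produced the output above; the broker list, group id and 'debug' value here are illustrative, not our exact settings:

const Kafka = require('node-rdkafka');

// Illustrative diagnostics setup (placeholder broker list and group id).
const consumer = new Kafka.KafkaConsumer({
  'metadata.broker.list': 'BROKER_IP:9092',
  'group.id': 'example-group',
  'statistics.interval.ms': 1000, // emits 'event.stats' every second
  'debug': 'fetch'                // emits the FETCH / FETCHADD lines via 'event.log'
}, {});

consumer.on('event.error', (err) => console.error('event.error:', err));
consumer.on('event.stats', (stats) => console.log('stats:', stats));
consumer.on('event.log', (log) => console.log({ severity: log.severity, fac: log.fac }, log.message));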

Environment Information

  • OS: Debian Stretch
  • Node Version: 8.11.0
  • node-rdkafka version: 2.4.2

Consumer configuration

// messageMaxBytes is assumed to match 'message.max.bytes' below
const messageMaxBytes = 150 * 1024 * 1024; // 150 MB

const consumerConfig = {
  'api.version.request': true,
  'message.max.bytes': 150 * 1024 * 1024, // 150 MB
  'receive.message.max.bytes': messageMaxBytes * 1.3,
  // Logging
  'log.connection.close': true,
  'statistics.interval.ms': 1000,
  // Consumer-specific rdkafka settings
  'group.id': group_id, // application-specific consumer group name
  'auto.commit.interval.ms': 2000,
  'enable.auto.commit': true,
  'enable.auto.offset.store': true,
  'enable.partition.eof': false,
  'fetch.wait.max.ms': 100,
  'fetch.min.bytes': 1,
  'fetch.message.max.bytes': 20 * 1024 * 1024, // 20 MB
  'fetch.error.backoff.ms': 0,
  'heartbeat.interval.ms': 1000,
  'queued.min.messages': 10,
  'queued.max.messages.kbytes': Math.floor(40 * 1024), // 40 MB
  'session.timeout.ms': 7000,
};
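
The consumer itself runs in flowing mode, roughly like this (the topic name is a placeholder, and the broker list is omitted from the snippet above):

const Kafka = require('node-rdkafka');

// Flowing-mode consumption using the configuration above.
const consumer = new Kafka.KafkaConsumer(consumerConfig, {});

consumer.connect();

consumer.on('ready', () => {
  consumer.subscribe(['TOPIC_NAME']);
  consumer.consume(); // no arguments: messages are pushed to the 'data' listener
});

consumer.on('data', (message) => {
  console.log(`partition ${message.partition}, offset ${message.offset}`);
});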
@bobzsj87

Same behaviour and the same "broker transport failure" error. The consumer stops, and we can see the topic lag it causes. We have to restart the whole thing.

@carlessistare

@webmakersteve just pinging here too, since this problem is tracked across multiple issues and, in my opinion, it's pretty critical: recovering from it in prod environments is not easy.


ivan83 commented Apr 12, 2019

@webmakersteve +1
This issue has been popping up in our prod environment since we started using this connector.
Most of the time the connector recovers, but every once in a while it becomes unresponsive.
So each day we have at least one consumer stopping at a random time of day.


mvtm-dn commented Apr 15, 2019

@carlessistare IMHO there is a bug in librdkafka. My observations suggest that the consuming thread stops inside the library. An indirect sign of this is the "solution" in issue #222.

@smaheshw

Same issue on our side. Has anybody found a working solution for this? It is now extremely critical for our project.

@aakashkharche04

We are also facing the same issue. Is there any fix for it?

@RaajBadra

I'm also facing the same issue. It is critical and has to be fixed. Is there a working solution?

@danielAnguloG

Hello, is there any update on this issue, or a possible workaround?


cravi24 commented Aug 5, 2019

We are also facing the same issue. Should we switch to non-flowing mode for the time being, until a fix is available?
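
For context, non-flowing mode would look roughly like this (broker list, topic name and poll settings are purely illustrative):

const Kafka = require('node-rdkafka');

// Non-flowing mode sketch: poll bounded batches instead of streaming via consume().
const consumer = new Kafka.KafkaConsumer({
  'group.id': 'example-group',
  'metadata.broker.list': 'BROKER_IP:9092'
}, {});

consumer.connect();

consumer.on('ready', () => {
  consumer.subscribe(['TOPIC_NAME']);
  setInterval(() => {
    consumer.consume(10, (err, messages) => { // fetch at most 10 messages per poll
      if (err) return console.error(err);
      messages.forEach((m) => console.log(m.partition, m.offset));
    });
  }, 1000);
});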

@NeoyeElf

Is there any progress on this?

@edenhill

Check the librdkafka release notes; it might be time to upgrade the librdkafka version bundled with node-rdkafka.
https://github.com/edenhill/librdkafka/releases
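
You can check which librdkafka version your node-rdkafka build bundles with something like this (assuming the librdkafkaVersion and features exports):

const Kafka = require('node-rdkafka');

// Compare this against the librdkafka release notes linked above.
console.log('librdkafka version:', Kafka.librdkafkaVersion);
console.log('features:', Kafka.features);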


funduck commented Aug 28, 2019

Had the same issue. First I added a restart to my app every N minutes, then I switched to another lib, which is quite good for consuming messages but slow for producing. I compared them here.
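
The periodic restart was roughly the following; createConsumer is a placeholder for our own factory that returns a connected, subscribed KafkaConsumer:

// Crude watchdog workaround: tear the consumer down and rebuild it every N minutes.
const RESTART_INTERVAL_MS = 15 * 60 * 1000; // N = 15 here, purely illustrative

let consumer = createConsumer(); // hypothetical factory, not part of node-rdkafka

setInterval(() => {
  const old = consumer;
  old.disconnect(() => {
    consumer = createConsumer();
  });
}, RESTART_INTERVAL_MS);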


stale bot commented Nov 26, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale bot added the stale label on Nov 26, 2019
stale bot closed this as completed on Dec 3, 2019

l8on commented Aug 6, 2020

We are noticing a similar issue. It seems like an update to the version of librdkafka that is used by this module might be worth a try. Is there anything the community can do to help move that along?
