
KafkaConsumer and current positions #118

Closed
AlexeyRaga opened this issue Feb 2, 2017 · 11 comments

@AlexeyRaga

AlexeyRaga commented Feb 2, 2017

When the consumer starts and there are no new messages for the group, it is impossible to know the current positions or to commit the current offsets.

Committing offsets even when there are no new messages can be important because offsets in Kafka can expire. To prevent offsets from expiring we need to commit them periodically, even if there are no changes.

A plain commit() doesn't help because it won't do anything and will return KafkaError._NO_OFFSET.

It is possible to call commit(offsets) with specific offsets, but how do we get them?

Currently, even if the consumer has partitions assigned, consumer.assignment() returns an empty list when there are no new messages. I strongly suspect that even if I receive messages it will only give me the partitions of the received messages, not all of them.

OK, I can get the list of assigned partitions from the on_assign callback. But then when I call consumer.position(_assigned) it won't give me the current offsets for the group, only for the partitions I have received messages from.

Adding the ability to query the currently assigned offsets would be very helpful.
Making commit() commit offsets even if nothing has changed would be even more helpful.
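
A minimal sketch of the pattern described above, using confluent-kafka-python; the broker address, topic, and group id are placeholders, and the asynchronous keyword follows current client versions (older releases spelled it async):

```python
from confluent_kafka import Consumer, TopicPartition

assigned = []

def on_assign(consumer, partitions):
    # Remember the full assignment; the client applies it after the callback returns.
    assigned[:] = partitions

c = Consumer({
    'bootstrap.servers': 'localhost:9092',   # placeholder
    'group.id': 'my-group',                  # placeholder
    'enable.auto.commit': False,
})
c.subscribe(['my-topic'], on_assign=on_assign)

while True:
    msg = c.poll(1.0)
    if msg is None:
        # No new messages: position() only has valid offsets for partitions
        # we have actually fetched from, which is the problem described above.
        print(c.position(assigned))
        continue
    if msg.error():
        continue
    # ... process msg, then commit explicitly ...
    c.commit(message=msg, asynchronous=False)
```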

@edenhill
Contributor

edenhill commented Feb 2, 2017

There is a librdkafka issue for this:
confluentinc/librdkafka#584

While it makes sense to commit when reaching EOF even if there were no messages, it is effectively the same as setting auto.offset.reset=earliest.

@ewencp Thoughts?

As you say, position() will return the application's current offsets, but if there are no messages on a partition there is no offset to return. Do you think it should return the partition's current high-watermark offset?
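
A minimal sketch of querying the high-watermark offset from the application side, assuming a confluent-kafka-python version that exposes get_watermark_offsets(); the broker, topic, and group names are placeholders:

```python
from confluent_kafka import Consumer, TopicPartition

c = Consumer({'bootstrap.servers': 'localhost:9092', 'group.id': 'my-group'})

# Ask the broker for the low/high watermarks of a single partition.
low, high = c.get_watermark_offsets(TopicPartition('my-topic', 0), timeout=5.0)
print('low=%d high=%d' % (low, high))
```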

@ewencp
Contributor

ewencp commented Feb 3, 2017

While it makes sense to commit when reaching EOF even if there were no messages, it is effectively the same as setting auto.offset.reset=earliest.

Why is that effectively the same? With auto.offset.reset=earliest, if your committed offset expires it seems like you'd go back and reprocess data?

Actually, the behaviour described for consumer.assignment and consumer.position sounds confusing and surprising. Why won't consumer.position return offsets? It must know where it is currently trying to fetch data from.

@newhoggy

newhoggy commented Feb 3, 2017

And if you have auto.offset.reset=latest, it could mean messages are skipped when jobs are restarted or rebalanced. It's not just unintuitive, it's dangerous.

@edenhill
Contributor

edenhill commented Mar 2, 2017

There is a difference between where it tries to fetch (the pre-fetch position) and the application's actual position.
If we reach the partition EOF without seeing any messages (e.g., auto.offset.reset=largest), does that mean the consumer position is at EOF? Should that offset be committed even if the application has not seen any messages?

@AlexeyRaga
Author

There are edge cases though.

Here is an (edge-case) example of how messages can be lost if a job has an unfortunate downtime, however short. If you are not committing offsets, what may happen is:

  • You reach EOF
  • Your job sits there waiting for more messages for a long time
  • Offsets expire
  • Job is still running fine without being able to commit offsets
  • Job crashes or gets shut down for re-deployment, etc.
  • New messages are pushed to the topic
  • Job wakes up.

Because the offsets expired even though the job was running and had previously attempted to re-commit them, this job will lose messages. The job may go down simply because of a re-deployment for a bug fix, etc.; you just need to be unlucky enough to be listening to a quiet topic and then have downtime after the offsets have expired.

Typically, in a production-ready environment where all the data matters, you wouldn't use latest; you would use earliest, because you usually want to handle all the data you have. If the consumer group is deployed for the first time (or changed), you (re)process all the data.
But then, if the offsets are lost/expired and the job is restarted, a full re-processing is triggered again, which is probably undesirable.

This is why things like Samza re-commit offsets every minute or so, and Kafka Streams (and, I believe, the Java consumer in general) does the same.

I think that allowing jobs to re-assert their offsets would be logical for many reasons.
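
A rough sketch of that periodic re-commit pattern done from the application with explicit commits; the 60-second interval, broker, topic, and group names are illustrative only:

```python
import time
from confluent_kafka import Consumer, TopicPartition

c = Consumer({
    'bootstrap.servers': 'localhost:9092',   # placeholder
    'group.id': 'my-group',                  # placeholder
    'enable.auto.commit': False,
})
c.subscribe(['my-topic'])

RECOMMIT_INTERVAL = 60.0       # illustrative, roughly what Samza does
last_offsets = {}              # (topic, partition) -> next offset to commit
last_commit = time.monotonic()

while True:
    msg = c.poll(1.0)
    if msg is not None and not msg.error():
        # ... process msg ...
        last_offsets[(msg.topic(), msg.partition())] = msg.offset() + 1

    if last_offsets and time.monotonic() - last_commit >= RECOMMIT_INTERVAL:
        offsets = [TopicPartition(t, p, o) for (t, p), o in last_offsets.items()]
        # Committing explicit offsets persists them even when they have not
        # advanced, which keeps them from expiring on the broker.
        c.commit(offsets=offsets, asynchronous=False)
        last_commit = time.monotonic()
```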

@edenhill
Contributor

edenhill commented Mar 8, 2017

Thanks for the thorough explanation, Alexey.

This boils down to two things:

  • commit offsets on EOF
  • commit offsets at regular intervals (auto.commit.interval.ms) even if the locally stored offset didn't change. You are right that the Java client does this, but speaking to its developers, they actually think this should be fixed.

I would be happy to add a config property for the EOF one (so as not to change the default behaviour), but I'm a little bit hesitant about the latter.
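
A rough application-side sketch of the EOF case, roughly what such a config property would automate; it assumes enable.partition.eof is on, that the EOF event's offset() reports the end offset, and uses placeholder broker/topic/group names:

```python
from confluent_kafka import Consumer, KafkaError, TopicPartition

c = Consumer({
    'bootstrap.servers': 'localhost:9092',   # placeholder
    'group.id': 'my-group',                  # placeholder
    'enable.auto.commit': False,
    'enable.partition.eof': True,            # emit _PARTITION_EOF events
})
c.subscribe(['my-topic'])

while True:
    msg = c.poll(1.0)
    if msg is None:
        continue
    if msg.error():
        if msg.error().code() == KafkaError._PARTITION_EOF:
            # The EOF event reports the end offset for the partition; committing
            # it pins the group's position even though no message was consumed.
            c.commit(offsets=[TopicPartition(msg.topic(), msg.partition(), msg.offset())],
                     asynchronous=False)
        continue
    # ... normal processing and commits ...
```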

@AlexeyRaga
Author

I was not talking about auto commits (since we don't use them) but about explicit commit calls: when I call commit explicitly, I would expect my offsets to be persisted (even if they have not changed).

I am surprised that this behaviour is considered to be a bug in kafka clients... Could you please explain to me why this behaviour is undesirable?

@edenhill
Contributor

I am surprised that this behaviour is considered to be a bug in kafka clients... Could you please explain to me why this behaviour is undesirable?

This is one of those corner cases where you just need to make a design decision; both options have pros and cons.

If you have a use-case for re-committing offsets I'd be happy to add that as an option to librdkafka (can't change the default though).

@AlexeyRaga
Author

My point here is simple: message frequency should not affect my ability to reason about the system.
When I have a topic whose messages are infrequent, I still want to be able to reason about the behaviour of my consumers.
While my consumer is still alive and still committing offsets, however stale, those should remain the offsets for that consumer.
Otherwise it is a bit weird: if I have a topic with no messages in it for a long while, the offsets expire even though my consumer is up and running. When that happens and my consumer restarts (whatever the cause), the situation becomes hard to reason about: the consumer will start from the beginning of the topic if it was configured with earliest, or it can potentially lose messages if it was configured with latest.

I could compensate with a much longer offset expiration setting (offsets.retention.minutes on the broker), but AFAIK that impacts broker performance.

@edenhill
Contributor

edenhill commented Aug 9, 2017

This needs to be fixed in librdkafka; I've created an upstream issue here: confluentinc/librdkafka#1372

@edenhill
Contributor

edenhill commented Sep 1, 2017

Closing this issue in favour of the librdkafka one. The librdkafka fix will not require changes to the Python code.
