Message count estimation on compacted topics #83

Open
weeco opened this issue Jul 27, 2020 · 3 comments
Labels
backend feature New feature or request help wanted Good for newcomers

Comments

@weeco
Contributor

weeco commented Jul 27, 2020

We estimate the number of messages by calculating highWatermark - lowWatermark. This returns the correct number of messages for topics whose cleanup.policy is set to delete, but it is unreliable and can be far off for compacted topics, because compaction removes records without closing the gaps in the offset range.
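To illustrate why the watermark difference overshoots on compacted topics, here is a minimal Python sketch. It models a partition as a list of (offset, key) records and applies a toy compaction step. The function names and the sample data are made up for illustration and are not taken from the project's code.

```python
def estimate_from_watermarks(low_watermark: int, high_watermark: int) -> int:
    """Offset-based estimate: assumes every offset in [low, high) still holds a record."""
    return max(high_watermark - low_watermark, 0)


def compact(records):
    """Toy log compaction: keep only the latest record per key, preserving offsets."""
    latest = {}
    for offset, key in records:
        latest[key] = offset  # a later offset overwrites an earlier one for the same key
    return sorted((offset, key) for key, offset in latest.items())


# Ten records written to one partition, but only two distinct keys.
written = [(offset, f"user-{offset % 2}") for offset in range(10)]
surviving = compact(written)

# Assumption of this toy model: compaction leaves both watermarks unchanged.
low, high = 0, 10

print(estimate_from_watermarks(low, high))  # 10
print(len(surviving))                       # 2
```

In this toy case the estimate is five times the number of records a consumer would actually receive. The fewer distinct keys a topic has relative to its write volume, the larger that gap becomes.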

The only way to get the exact number of messages per topic and per partition is to actually consume all of these messages. Since this may take too long, we are exploring other options:

Idea 1: Estimating the number of messages by using the partition size

We know each partition's log dir size, as this can be queried via the Kafka API. The idea is to consume some messages in order to calculate a representative average message size, and then divide the partition size by that average to estimate the number of messages in the compacted topic.
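A rough sketch of Idea 1 in Python, assuming we already have the partition's log dir size in bytes and a sample of message sizes taken from the consumed messages. The function name and its inputs are hypothetical. The sketch also ignores batch overhead, compression and index files, all of which would skew the result in practice.

```python
from statistics import mean


def estimate_message_count(partition_size_bytes: int, sampled_sizes_bytes: list[int]) -> int:
    """Estimate the record count as partition size divided by the average sampled size."""
    if not sampled_sizes_bytes:
        raise ValueError("need at least one sampled message size")
    average_size = mean(sampled_sizes_bytes)
    if average_size <= 0:
        raise ValueError("average message size must be positive")
    return round(partition_size_bytes / average_size)


# Hypothetical sample: five consumed messages averaging 200 bytes each.
sample = [180, 220, 200, 210, 190]
print(estimate_message_count(1_000_000, sample))  # 5000
```

The quality of the estimate depends on how representative the sample is. Sampling only from the head or the tail of the partition may be misleading if message sizes drift over time.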

@weeco weeco added feature New feature or request help wanted backend labels Jul 27, 2020
@weeco weeco added this to the 1.3 release milestone Aug 3, 2020
@weeco weeco removed this from the 1.3 release milestone Feb 24, 2021
@twmb twmb added help wanted Good for newcomers and removed help wanted labels Oct 19, 2023
@JakeSummers

In the short term, it would be a significant improvement if we could NOT provide count estimation for compacted topics. I just spent more than a day debugging my Kafka consumer trying to figure out why it was missing messages.

The estimate in the topic was ~20 million but my consumer was only pulling ~2 million. This was a real head scratcher.

@weeco
Contributor Author

weeco commented Feb 3, 2024

@JakeSummers We have a relatively prominent hint that this is just an estimate in the configurations tab for any topic with a compact or compact,delete cleanup policy. We'll additionally add a tooltip in the statistics bar to make this clearer (see: #1064).

We had many users who asked for this feature, even for compacted topics, as it helps them get a rough idea of how many messages could be in that topic. It's always hard to balance requests that help some users but confuse or disturb others. I hope adding the hint in the statistics bar, right next to the message count estimate, makes it less confusing. There's no way to determine the exact message count without actually consuming all messages. And even while consuming all messages the count may already change (new compactions, deletions, etc).

@JakeSummers

JakeSummers commented Feb 7, 2024

@weeco - In my case, the estimate of total messages was off by a factor of 10. I would argue that an estimate that is this far off does more harm than good.

image

> And even while consuming all messages the count may already change (new compactions, deletions, etc).

With this implementation, you could show a timestamp indicating when the message count was taken.

Aka:

Messages
2,103,103
<i>at Feb 6, 2023, 12:01 UTC</i>

This makes it clear that the data was an estimate.

> We'll additionally add a tooltip in the statistics bar to make this clearer

I don't think a tooltip is a good UI feature to use here. I think a better strategy would be to rename Messages to Estimated Messages. Using Messages conveys that this is a factual piece of information.


3 participants