Discarding expired messages causes periodic latency spikes #11591
Comments
I have changed this to a feature instead of a bug. Message expiry works as expected. We consider improving its performance to be a feature. As it's a behavioural change we wouldn't want to make this part of a patch release. @romansmirnov you mentioned in the support case you'd like to see this as a patch release. Is this blocking the customer? @felix-mueller Could you say something about the priority of this issue?
From my discussion with @romansmirnov I can share that this is a blocking issue for the customer, as they have a large number of messages that have to be processed without impacting performance in order to reach their targets. Hence the request for a patch release. @felix-mueller should decide on the priority in the planning meeting, but based on my conversation with Roman this should have a high priority to resolve open issues with the customer's production use case.
@menski @remcowesterhoud this has very high priority and ideally should be done as soon as we have time. I would also vote for a patch release.
Let's discuss it in the planning on Friday 👍
We discussed in our planning meeting on 2023-02-17 that we will aim to work on this issue in the next iteration. I will inform support about this.
@remcowesterhoud, as mentioned by @felix-mueller, the issue is critical for the customer. They need an 8.1 patch with the upcoming March release to hold their timelines. @korthout, please feel free to approach me if you need any further context. Also, we can discuss the breakdown together to decide what we need to deliver at its core so that the customer can move on.
Discussed this issue with @romansmirnov (and others). We will aim to resolve the following tasks (in order of priority):
This issue also discussed the following, but we will not aim to achieve this in this iteration:
Closing this issue as the main parts are implemented in the patch release and forwarded to
13400: feat: expire messages in TTL checker in batches r=korthout a=abbasadel
## Description
To further improve the performance of message expiration, the Message TTL Checker writes a single `MessageBatch:EXPIRE` command (instead of individual `Message:EXPIRE` commands) to expire a batch of messages simultaneously.
## Related issues
related: #11591
closes: #11953
Co-authored-by: Abbas Ibrahim <abbas.adel.ibrahim@gmail.com>
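
As a rough illustration of the batching idea in the PR description above, the sketch below groups expired message keys into fixed-size batches, so that one command covers many messages instead of one command per message. The names `ExpiryBatcher`, `ExpireBatchCommand`, and `batchLimit` are hypothetical and chosen only for this example; they are not Zeebe's actual types or configuration.

```java
import java.util.ArrayList;
import java.util.List;

final class ExpiryBatcher {

    // Hypothetical stand-in for a batch expire command; not a Zeebe type.
    static final class ExpireBatchCommand {
        final List<Long> messageKeys;

        ExpireBatchCommand(List<Long> messageKeys) {
            this.messageKeys = List.copyOf(messageKeys);
        }
    }

    // Splits the expired keys into commands holding at most batchLimit keys each.
    static List<ExpireBatchCommand> toBatchCommands(List<Long> expiredKeys, int batchLimit) {
        if (batchLimit <= 0) {
            throw new IllegalArgumentException("batchLimit must be positive");
        }
        List<ExpireBatchCommand> commands = new ArrayList<>();
        for (int from = 0; from < expiredKeys.size(); from += batchLimit) {
            int to = Math.min(from + batchLimit, expiredKeys.size());
            commands.add(new ExpireBatchCommand(expiredKeys.subList(from, to)));
        }
        return commands;
    }

    public static void main(String[] args) {
        List<ExpireBatchCommand> commands = toBatchCommands(List.of(1L, 2L, 3L, 4L, 5L), 2);
        System.out.println(commands.size() + " batch commands instead of 5 single commands");
    }
}
```

Capping the batch size is an assumption of this sketch; it keeps any single command bounded even when many messages expire at once.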
Describe the bug
Every 60s, the `MessageTimeToLiveChecker` runs to collect expired (published) messages to eventually discard them from the RocksDB state. While running, it iterates the `deadline` column family as long as there are expired messages, and for each expired message it appends an expire command to a batch. Once all expired messages are collected, it submits this batch of expire commands to the log stream.

Now, assuming that 1500 messages per second with a TTL > 0 are published, and Zeebe is configured with 3 partitions, that means each checker run has to collect roughly 30k expired messages per partition (1500 × 60 / 3), if the messages are spread evenly across partitions and expire at the rate they are published.
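
To put numbers on this scenario, here is a small worked calculation of how many expired messages one checker run faces per partition. It assumes a steady state in which messages expire at the same rate they are published and are spread evenly across partitions; the class and method names are made up for this sketch.

```java
final class ExpiryLoad {

    // Expired messages one checker run has to collect on a single partition,
    // assuming an even spread across partitions and a steady publish/expire rate.
    static long expiredPerPartitionPerRun(
            long publishedPerSecond, long checkIntervalSeconds, int partitions) {
        return publishedPerSecond * checkIntervalSeconds / partitions;
    }

    public static void main(String[] args) {
        // 1500 msg/s * 60 s = 90000 messages per interval, split over 3 partitions.
        System.out.println(expiredPerPartitionPerRun(1500, 60, 3)); // prints 30000
    }
}
```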
This approach may cause latency spikes in two ways:
`MessageTimeToLiveChecker` shares the same actor with the Stream Processor. While the checker runs, the Stream Processor does not do any processing. So if collecting the 30k expired messages takes e.g. ~100ms, the Stream Processor processes nothing during that time. As a result, the Stream Processor's backlog grows, and everything in the backlog is delayed by at least 100ms.

Expected behavior
`MessageTimeToLiveChecker` does not share an actor so that the Stream Processor continues processing while the checker collects expired messages.

Hints
`DbMessageState` or the `TransactionContext`, etc.

Environment:
related to SUPPORT-15892
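
Returning to the expected behavior described in this issue, the sketch below shows the general shape of running the expiry check on its own thread so that the processing thread is not paused while expired messages are collected. It uses plain `java.util.concurrent` rather than Zeebe's actor scheduler, and `SeparateCheckerSketch`, `startChecker`, `collectExpired`, and `submit` are hypothetical names. It deliberately sidesteps how the checker would read state safely from another thread, which is the concern the hints about `DbMessageState` and `TransactionContext` point at.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class SeparateCheckerSketch {

    // Runs collectExpired on a dedicated thread and hands each result to submit.
    static ScheduledExecutorService startChecker(
            Supplier<List<Long>> collectExpired,
            Consumer<List<Long>> submit,
            long intervalMillis) {
        ScheduledExecutorService checker = Executors.newSingleThreadScheduledExecutor();
        checker.scheduleWithFixedDelay(
                () -> submit.accept(collectExpired.get()),
                0, intervalMillis, TimeUnit.MILLISECONDS);
        return checker;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService checker = startChecker(
                () -> List.of(1L, 2L, 3L), // stand-in for scanning expired deadlines
                keys -> System.out.println("expire batch of " + keys.size()),
                1_000);

        // The "processor" keeps working on the main thread while the checker runs.
        for (int i = 0; i < 3; i++) {
            System.out.println("processing record " + i);
            Thread.sleep(10);
        }

        checker.shutdown();
        checker.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```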