Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable running Message TTL Checker async to stream processor by default #12307

Closed
korthout opened this issue Apr 6, 2023 · 4 comments
Closed
Labels
area/performance Marks an issue as performance related component/engine kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc.

Comments

@korthout
Copy link
Member

korthout commented Apr 6, 2023

Description

We fixed

by introducing an experimental feature flag zeebe.broker.experimental.features.enableMessageTtlCheckerAsync which allows running the Message TTL Checker asynchronous to the stream processor when enabled.

By default, this feature is disabled. As 8.2 is now released, we should enable the feature by default so we can build up confidence in it and eventually remove the feature flag altogether.

Left over from

@korthout korthout added kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. area/performance Marks an issue as performance related component/engine labels Apr 6, 2023
@korthout
Copy link
Member Author

korthout commented Apr 12, 2023

ZPA Triage:

  • trial clusters will run on alpha as well, perhaps we should investigate to run this first on benchmarks
  • that's only useful if the benchmarks actually use a non-zero TTL for messages
  • otherwise, let's just flip the switch on the next alpha
  • alternatively, we could attempt to roll it out to small groups by changing the setting for a small group of new trial clusters
  • we should at least have a way to verify that this is not leading to issues once rolled out

@Zelldon Are there any chaos experiments running on testbench that use a non-zero message TTL?

EDIT: I've handed Ole the responsibility for confidence in this feature, so I no longer need to know about any chaos experiments that cover this

@korthout
Copy link
Member Author

korthout commented Apr 13, 2023

Today, I talked with @oleschoenburg about this topic:

Now:

  • There is a need to have this same feature flag available for timers Triggering due timer events causes periodic latency spikes #11594
  • Ole will work on this, and I will review it
  • we have already verified the following:
    • that due date timer checker is threadsafe
    • that there is a yielding behavior that should result in similar non-flooding of the log, like message ttl checker's batch limit

Further iterations:

  • Ole wants to introduce a new feature flag that allows running all scheduled tasks async to the stream processor and remove the message ttl specific feature flag Scheduled tasks should not block processing #12302
  • Ole would then be responsible for switching the default setting of this feature flag and testing it
  • ZPA will remove any left-over state mutations from the scheduled tasks
  • Nico will raise the priority of these changes at the ZPA planning session

This means that we can lower the priority of this issue to later, as it will likely be removed anyways as part of

@megglos
Copy link
Contributor

megglos commented Aug 17, 2023

@korthout this can be closed now, given #12302 is done, right?

@korthout
Copy link
Member Author

@megglos Yes we close this 👍

✅ All scheduled tasks are run async by default:

Thanks again @oleschoenburg 🙇

@korthout korthout closed this as not planned Won't fix, can't repro, duplicate, stale Aug 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/performance Marks an issue as performance related component/engine kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc.
Projects
None yet
Development

No branches or pull requests

2 participants