Firstly, it's so great that this project is available for public use, yay!
I noticed a quirk in how event notification retries work:
If a target for event notifications is down, failed notifications accumulate so they can be retried later.
If the target stays down for a prolonged period, the accumulated notifications consume more and more memory, potentially all available memory.
There are probably multiple ways of solving this, but one option would be a "max retries" setting so that failed notifications do not accumulate indefinitely.
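To make the "max retries" idea concrete, here is a minimal sketch in Python. It is not how this project implements notifications; `BoundedRetryQueue`, `max_retries`, `max_pending`, and the `send` callback are all hypothetical names chosen for illustration. It assumes two limits: a cap on retries per notification and a cap on how many notifications may wait at once.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pending:
    event: dict
    attempts: int = 0  # delivery attempts made so far


class BoundedRetryQueue:
    """Hypothetical retry queue whose memory use is bounded."""

    def __init__(self, send: Callable[[dict], None],
                 max_retries: int = 3, max_pending: int = 1000):
        self.send = send
        self.max_retries = max_retries
        # A deque with maxlen evicts the oldest entry when it is full.
        self.pending = deque(maxlen=max_pending)
        self.dropped = 0  # notifications given up on or evicted

    def enqueue(self, event: dict) -> None:
        if len(self.pending) == self.pending.maxlen:
            self.dropped += 1  # the append below evicts the oldest entry
        self.pending.append(Pending(event))

    def flush(self) -> None:
        """Attempt each pending notification once."""
        for _ in range(len(self.pending)):
            item = self.pending.popleft()
            item.attempts += 1
            try:
                self.send(item.event)
            except Exception:
                # One initial attempt plus at most max_retries retries.
                if item.attempts <= self.max_retries:
                    self.pending.append(item)
                else:
                    self.dropped += 1
```

With both limits in place, memory is bounded by `max_pending` no matter how long the target is down, and the `dropped` counter gives an operator something to alert on. The trade-off is that notifications can be lost, so whether this is acceptable depends on how the events are consumed.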
In production, I'd recommend monitoring the queue sizes via expvar. We are also looking at adding more Prometheus support (#2466) and will add these metrics there. In addition, I'd recommend monitoring the endpoints for failures, which are reported both via expvar and in the log output.
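As a rough illustration of the monitoring suggestion, the sketch below checks a queue-size value in an expvar JSON document against a threshold. Go's `expvar` package publishes variables as JSON (at `/debug/vars` by default), but the key path and the sample payload here are made up for the example, and so are the helper names `lookup` and `queue_too_deep`. Check the actual output of your deployment before relying on any particular layout.

```python
import json
from typing import Any, Sequence


def lookup(doc: Any, path: Sequence[str]) -> Any:
    """Walk nested JSON objects along the given key path."""
    node = doc
    for key in path:
        node = node[key]
    return node


def queue_too_deep(vars_json: str, path: Sequence[str], threshold: int) -> bool:
    """Return True when the value at `path` exceeds `threshold`."""
    return lookup(json.loads(vars_json), path) > threshold


# Hypothetical payload and key path; the real expvar layout may differ.
sample = '{"notifications": {"pending": 1500}}'
if queue_too_deep(sample, ["notifications", "pending"], threshold=1000):
    print("notification queue is backing up; check the endpoint")
```

In practice the JSON would come from an HTTP request to the debug endpoint on a schedule, with the alert wired into whatever paging or logging system is already in use.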