notifications retry infinitely, can lead to memory leak, need "max retries" #2422

jokeyrhyme · 2017-10-09T05:41:25Z

Firstly, it's so great that this project is available for public use, yay!

I noticed a quirk in the way retries work for event notifications:

If a target for event notifications is down, failed notifications accumulate to be retried later.
If the target stays down for a prolonged period, the accumulated notifications consume increasing amounts of memory, potentially all available memory.

There are probably multiple ways of solving this, but one solution may be to provide a "max retries" option so that failed notifications do not accumulate indefinitely.

stevvooe · 2018-02-09T18:50:13Z

Please see https://github.com/docker/docker.github.io/blob/master/registry/notifications.md#considerations. There is a general requirement that the notification endpoint is somewhat reliable. In practice, critical registry workflows usually rely on receiving notifications, so we err on the side of retrying them until the registry falls over or the endpoint recovers.

In production, I'd recommend monitoring the queue sizes via expvar. We are also looking at getting more Prometheus support (#2466) and will add these metrics. In addition, I'd recommend that you monitor the endpoints for failure. Failures are reported both via expvar and the log output.

dmp42 added the question label May 26, 2018

dmp42 closed this as completed May 26, 2018
