Minimum save interval when saving fails #849

Closed
p opened this Issue Dec 26, 2012 · 4 comments


2 participants

@p
p commented Dec 26, 2012

Suppose the rdb file is not writable for whatever reason.

Redis goes into a spin trying to save its data.

Here is a log excerpt for a single second of wallclock time: http://paste.kde.org/631532/

In this one second, Redis attempted the write 63 times.

It also produced 18 KB of log entries which, unlike the data, were written to disk.

There is absolutely no reason to try to save any more frequently than once per second. In the configuration file it does not appear to be possible to specify a shorter save interval, so you can assume that users are ok with at least that much data loss.

In practice, save intervals can be way more than one second. This is what I have in my file which came as a default:

save 900 1
save 300 10
save 60 10000
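As an illustration of how save points like these work (a hypothetical Python model, not Redis's actual C implementation): each `save <seconds> <changes>` line means "trigger a background save once at least `<changes>` writes have accumulated and at least `<seconds>` have elapsed since the last successful save", and a save triggers when any configured pair is satisfied.

```python
# Hypothetical model of "save <seconds> <changes>" evaluation.
# All names are illustrative; this is not Redis source code.
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]

def should_bgsave(dirty, seconds_since_last_save, save_points=SAVE_POINTS):
    """Return True if any configured save point is satisfied.

    dirty: number of writes since the last successful save.
    seconds_since_last_save: wallclock seconds since the last successful save.
    """
    return any(dirty >= changes and seconds_since_last_save >= seconds
               for seconds, changes in save_points)
```

With the default configuration above, a single write triggers a save only after 15 minutes, while a burst of 10000 writes triggers one after a minute.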

You can probably retry saves after errors at the shortest interval, in this case once a minute. This feels weird coming from a world that cares about one's data, but the configuration above says that up to the last minute of data can be discarded at any time. Why are write errors more significant than other issues?

If you for whatever reason want a shorter retry interval on errors, then 5x the frequency of the shortest configured interval, but no more often than once per second, ought to be a reasonable choice.
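This clamping rule can be sketched in a few lines (a hypothetical helper, not anything that exists in Redis):

```python
def retry_interval(save_points):
    """p's suggestion: retry failed saves at 5x the frequency of the
    shortest configured save interval, but never more often than once
    per second. save_points is a list of (seconds, changes) pairs."""
    shortest = min(seconds for seconds, _ in save_points)
    return max(shortest / 5, 1.0)
```

For the default configuration above the shortest interval is 60 seconds, so failed saves would be retried every 12 seconds.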

redis-2.6.7

@antirez
Owner
antirez commented Jan 21, 2013

Hello @p, I think you are very right and very wrong at the same time :-)

You are absolutely right, there is no excuse to fill the logs like that trying to save at every serverCron() call. It's simply stupid and should be avoided entirely. It can create huge problems: load, filling the disk. BAD, I'll fix it as soon as I finish writing this reply to you.

However you are wrong IMHO with the idea that if you can persist once every minute, then, "Hey, it's not serious persistence! I can persist every two minutes and it's the same".

If you guaranteed the user you are going to persist every 60 seconds, then you'd better keep the delay as short as possible when the problems are possibly transient. Once the issue is solved, we want to persist ASAP to honor the guarantee.

IMHO a reasonable behavior is the following:

  • If last save failed, wait at least a small number of seconds, like 5, in order to try again.
  • If the incident was logged, wait at least another minute before logging the error again; in the meantime increment a counter, so that when you can finally log the issue you can say "Save failed 10 times in the latest minute".
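The two rules above can be modeled roughly as follows (an illustrative Python sketch, not the eventual C implementation; the constants and class names are assumptions):

```python
RETRY_DELAY = 5      # seconds to wait after a failed save before retrying
LOG_INTERVAL = 60    # minimum seconds between repeated error log lines

class SaveThrottle:
    """Model of the proposed behavior: retry a failed save only after a
    short delay, and coalesce repeated error logs into one line per
    minute carrying a failure counter."""

    def __init__(self):
        self.last_try = 0.0
        self.last_log = 0.0
        self.failures_since_log = 0

    def may_retry(self, now):
        """A new save attempt is allowed once RETRY_DELAY has passed."""
        return now - self.last_try >= RETRY_DELAY

    def record_failure(self, now):
        """Record a failed attempt; return a log message, or None if
        the message is suppressed (counted for the next log line)."""
        self.last_try = now
        self.failures_since_log += 1
        if now - self.last_log >= LOG_INTERVAL:
            msg = ("Save failed %d times in the latest minute"
                   % self.failures_since_log)
            self.last_log = now
            self.failures_since_log = 0
            return msg
        return None
```

So during a persistent outage the server retries every 5 seconds but logs only once a minute, with each log line summarizing the suppressed failures.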

I'll try to turn this into some reasonable code ASAP. Thank you. Keeping the issue open until I commit the code.

@p
p commented Jan 21, 2013

Sounds like a reasonable solution.

@antirez antirez added a commit that referenced this issue Apr 2, 2013
@antirez Throttle BGSAVE attempt on saving error.
When a BGSAVE fails, Redis used to flood itself trying to BGSAVE at
every next cron call, that is either 10 or 100 times per second
depending on configuration and server version.

This commit does not allow a new automatic BGSAVE attempt to be
performed before a few seconds delay (currently 5).

This avoids both the auto-flood problem and filling the disk with
logs at a serious rate.

The five seconds limit, considering a log entry of 200 bytes, will use
less than 4 MB of disk space per day, which is reasonable; the sysadmin
should notice before catastrophic events, especially since by default
Redis will stop serving write queries after the first failed BGSAVE.

This fixes issue #849
ed2d988
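The disk-space figure in the commit message checks out with simple arithmetic: one ~200-byte entry at most every 5 seconds for a full day comes to about 3.5 MB.

```python
# Verify the commit message's worst-case log volume estimate:
# one ~200-byte log entry at most every RETRY_DELAY seconds, all day.
ENTRY_BYTES = 200
RETRY_DELAY = 5
SECONDS_PER_DAY = 24 * 60 * 60

entries_per_day = SECONDS_PER_DAY // RETRY_DELAY   # 17280 entries
bytes_per_day = entries_per_day * ENTRY_BYTES      # 3,456,000 bytes
assert bytes_per_day < 4 * 1024 * 1024             # under 4 MB, as claimed
```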
@antirez antirez added a commit that referenced this issue Apr 2, 2013
@antirez Throttle BGSAVE attempt on saving error. (same commit message as above)
d6b0c18
@antirez antirez added a commit that referenced this issue Apr 2, 2013
@antirez Throttle BGSAVE attempt on saving error. (same commit message as above)
b237de3
@antirez
Owner
antirez commented Apr 2, 2013

Fixed, sorry for the delay.

@antirez antirez closed this Apr 2, 2013
@p
p commented Apr 2, 2013

Thanks!

@JackieXie168 JackieXie168 pushed a commit to JackieXie168/redis that referenced this issue Aug 29, 2016
@antirez Throttle BGSAVE attempt on saving error. (same commit message as above)
85e3fb8