Big, busy Redis servers configured with AOF persistence can chew through
a lot of disk storage. It's usually not a big deal if capacity
estimates come in a little low: operators can adapt to steady growth
by using filesystems that can be grown while online.
What we can't (instantaneously) adapt to, however, are the big,
unpredictable surges in demand that come with automatic AOF rewrites.

If a BGREWRITEAOF happens to completely fill its filesystem (with the
temporary AOF), there is a very real chance that Redis will die because
it has no space left to append to the real AOF.
You then find yourself in a somewhat ironic situation: the AOF rewrite
process designed to prevent Redis from filling its disk just killed
Redis... because it filled the disk. :)

With the existing implementation, Redis terminates immediately if a
write to its AOF fails.

This commit changes AOF write behaviour when the fsync policy is set to
one of the less durable modes ('everysec' or 'no'). With this commit,
if an AOF write fails with ENOSPC while a BGREWRITEAOF child is running,
Redis assumes the write might succeed if it is retried after the child
exits, since the disk space held by the rewrite should be released at
that point. This keeps Redis up and servicing requests even if an
automatic BGREWRITEAOF momentarily fills the disk.
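A minimal sketch in C of the retry-or-die decision described above, assuming the caller has just issued write(2) to the AOF and captured errno. All names here (`aof_handle_write_result`, `fsync_policy`, `AOF_WRITE_RETRY`, and so on) are hypothetical and are not taken from the Redis source. For simplicity, short writes are treated as fatal, and the actual retry (keeping the data buffered and flushing it again later) is left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical names, for illustration only; not the actual Redis source. */
typedef enum { FSYNC_ALWAYS, FSYNC_EVERYSEC, FSYNC_NO } fsync_policy;

typedef enum {
    AOF_WRITE_OK,    /* every byte made it to the AOF */
    AOF_WRITE_RETRY, /* keep the data buffered, try again after the child exits */
    AOF_WRITE_FATAL  /* the pre-existing behaviour: log and terminate */
} aof_write_action;

/* Decide what to do after a write(2) to the AOF.
 *   nwritten      - return value of write(2)
 *   len           - number of bytes we asked write(2) to write
 *   err           - errno captured immediately after the call
 *   policy        - the configured fsync policy
 *   rewrite_child - pid of the BGREWRITEAOF child, or -1 if none is running */
aof_write_action aof_handle_write_result(ssize_t nwritten, size_t len, int err,
                                         fsync_policy policy, pid_t rewrite_child) {
    /* write(2) never reports more bytes than it was asked to write. */
    assert(nwritten < 0 || (size_t)nwritten <= len);

    if (nwritten >= 0 && (size_t)nwritten == len)
        return AOF_WRITE_OK;

    /* With 'always', the less durable modes' leniency does not apply. */
    if (policy == FSYNC_ALWAYS)
        return AOF_WRITE_FATAL;

    /* ENOSPC while a rewrite child is running: the disk may only be full
     * temporarily, so retry once the child has exited. */
    if (nwritten < 0 && err == ENOSPC && rewrite_child != -1)
        return AOF_WRITE_RETRY;

    /* Any other failure (including short writes, in this sketch) is fatal. */
    return AOF_WRITE_FATAL;
}
```

On `AOF_WRITE_RETRY`, a caller would keep the unwritten bytes in its AOF buffer and attempt the flush again after the rewrite child has been reaped, rather than exiting.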