We were bitten by a nasty bug caused by Redis timing out on our client.
We ultimately found out that the culprit was ntpdate moving the system time forward by several days, so that all timeouts started elapsing much earlier than they should have. I know we should have restarted all time-sensitive services after such a time jump, but luckily there are newer system calls that try to avoid this kind of problem.
Could we use e.g. clock_gettime() instead of time() to keep a monotonic server time for use in all elapsed-time calculations?
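To make the suggestion concrete, here is a minimal sketch of an elapsed-time check based on clock_gettime() with CLOCK_MONOTONIC, which is not affected by ntpdate stepping the wall clock. The helper names monotonic_ms() and timeout_elapsed() are made up for illustration and are not Redis functions; availability of CLOCK_MONOTONIC on every target platform is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper (not part of Redis): milliseconds read from a clock
 * that does not jump when the wall clock is changed. Returns -1 on error. */
static int64_t monotonic_ms(void) {
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) return -1;
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical timeout check: nonzero once timeout_ms has passed between
 * start_ms and now_ms, both taken from monotonic_ms(). */
static int timeout_elapsed(int64_t start_ms, int64_t now_ms, int64_t timeout_ms) {
    return now_ms - start_ms >= timeout_ms;
}
```

A caller would record monotonic_ms() when an operation starts and compare it against a later reading, so a wall-clock jump of several days would not shorten or lengthen the timeout.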
I'd be willing to propose a patch if you think it would be interesting.
I'm not sure I can do this, because this system call is not generally available AFAIK.
What I can easily do, at least, is make Redis aware that a strange time shift happened and log the event.
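One way such a check could look, as a rough sketch only: a periodic task remembers the wall-clock time it saw on the previous run and compares the difference with the interval it expected. The names clock_jump_seconds() and check_clock(), the tolerance parameter, and the log message are all assumptions for illustration, not existing Redis code.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: returns 0 if the wall clock advanced by roughly the
 * expected interval, otherwise the apparent jump in seconds (negative when
 * the clock moved backwards). */
static long clock_jump_seconds(time_t prev, time_t now,
                               long expected_interval, long tolerance) {
    long delta = (long)(now - prev);
    long jump = delta - expected_interval;
    if (jump > tolerance || jump < -tolerance) return jump;
    return 0;
}

/* Hypothetical periodic check: logs a warning when a jump is detected and
 * remembers the current time for the next call. */
static void check_clock(time_t *last_seen, time_t now,
                        long expected_interval, long tolerance) {
    long jump = clock_jump_seconds(*last_seen, now, expected_interval, tolerance);
    if (jump != 0)
        fprintf(stderr, "Warning: system time jumped by about %ld seconds\n", jump);
    *last_seen = now;
}
```

Note that this approach cannot tell a real clock change apart from the process simply being stopped or starved for a long time, so the tolerance would need to be generous.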
Btw there are many reasons why I can't do the monotonic clock stuff. For instance, semantically in Redis an EXPIRE sets the time at which a key will expire, because if you save the DB, the key will still expire at that date.
It is not really "expire after 10 seconds", even if that is what actually happens. It is instead "expire at this absolute time".
And indeed, RDB, AOF, and replication are all handled this way, converting expires into EXPIREAT commands.
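As an illustration of that conversion, the sketch below turns a relative TTL into an absolute Unix timestamp and formats it as an EXPIREAT command in plain-text form. The helper format_expireat() is hypothetical and is not how Redis implements this internally; it only shows the arithmetic.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: rewrite "EXPIRE key ttl" issued at time `now` as an
 * EXPIREAT command with an absolute Unix timestamp in seconds.
 * Returns the snprintf() result (length of the formatted command). */
static int format_expireat(char *buf, size_t len, const char *key,
                           time_t now, long ttl_seconds) {
    long long when = (long long)now + ttl_seconds;
    return snprintf(buf, len, "EXPIREAT %s %lld", key, when);
}
```

Because the stored value is an absolute wall-clock time, it stays meaningful after a save and reload or on a replica, which is something a monotonic clock reading could not offer, since that is only valid within one boot of one machine.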
I just ran into a similar problem with snapshotting: if the system time goes backwards, snapshotting is still relative to the system time when Redis started (or, more likely, the time of the last snapshot).
I admit this is a pretty fringe case, but I've been bitten by this on an embedded device that can lose power or have the system time change at any time.
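One possible mitigation, sketched under the assumption that the snapshot decision compares the current time with the time of the last save: if the clock is found to be earlier than the last save, restart the interval from the current time instead of waiting for the clock to catch up. The struct snapshot_state and the function should_snapshot() are hypothetical names, not Redis internals.

```c
#include <time.h>

/* Hypothetical state: time of the last snapshot and the number of changes
 * since then. */
struct snapshot_state {
    time_t lastsave;
    long dirty;
};

/* Hypothetical check for a "snapshot after `seconds` if at least `changes`
 * writes happened" rule. If the wall clock moved backwards past the last
 * save, the interval is restarted from `now`. */
static int should_snapshot(struct snapshot_state *st, time_t now,
                           long seconds, long changes) {
    if (now < st->lastsave) st->lastsave = now;
    return st->dirty >= changes && (long)(now - st->lastsave) >= seconds;
}
```

With this guard, a backwards jump delays the next snapshot by at most one full interval rather than by the size of the jump.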