New commands: PIPESAVE and DUMPSAVE #1185

Open · wants to merge 3 commits

Conversation

5 participants

mpalmer commented Jul 7, 2013

I can't believe I haven't submitted this before...

This patch implements a pair of commands to make RDB backups easier, without unnecessarily chewing up IOPS on the system. Both commands emit an RDB file over a socket rather than writing it to disk. The differences are:

  • PIPESAVE invokes a system command (as specified by the new config variable pipesavecommand) and writes the RDB to that command's stdin. Why would you want this? Well, you can implement a pipeline to compress, encrypt, and send-to-S3 an RDB file, all without it touching disk. This can, for example, reduce the time to do a backup from 3 hours to 15 minutes.
  • DUMPSAVE writes the RDB file straight back to the client that called the command. The client should grab that data and do something with it (writing it to disk would be a good start). The purpose of this command is to let a backup server request an RDB file without needing to either (a) become a slave (with the associated memory usage), or (b) have the RDB written to disk locally.
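As a sketch of the compress-encrypt-upload use case: the config key pipesavecommand is the one named by the patch, but the pipeline itself (recipient, bucket, tool choices) is entirely hypothetical — substitute whatever suits your environment:

```
# redis.conf (hypothetical pipeline; gzip/gpg/s3cmd and the bucket name are
# illustrative, not part of the patch)
pipesavecommand "gzip -c | gpg --encrypt -r backup@example.com | s3cmd put - s3://my-bucket/redis-dump.rdb.gz.gpg"
```

Invoking PIPESAVE would then fork a child that streams the RDB to this command's stdin, so the dump never lands on local disk.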

mpalmer and others added some commits Aug 23, 2012

New command: pipesave
Allows you to specify a command to run (via the config file *only*, to
prevent people from running arbitrary code on your machine via redis), which
will be invoked and fed the contents of an RDB dump file.

Useful for taking offsite backups (say, to S3 or a centralised backup server)
or a periodic replication mechanism when you don't want (or don't have the
available memory) to run twice as many redis instances everywhere.

Redis treats these children as it would treat a child forked to BGSAVE.
This simplifies the code (rdb_child_pid is already checked where it matters),
but means you can't run a BGSAVE and PIPESAVE concurrently.

New command: dumpsave
Allows a client to connect and request an in-band RDB dump down the
connection.

This is ideal for automated backup systems, as it lets the backup
server "pull" a consistent copy of the Redis data instead of
hoping that a suitable RDB file will be sitting on disk, ready for
pickup.

Note that this behaviour does not conform to the Redis protocol
specification in any way.

Provide a sample client for making use of the DUMPSAVE feature
The DUMPSAVE command's response does not follow the Redis protocol
specification. Instead of a standard Redis bulk reply, Redis will
send raw RDB data down the line without warning.

Never use a regular Redis client library to submit a DUMPSAVE. That
said, the DUMPSAVE exchange is so simple that it requires almost
no effort at all to implement a working client yourself.
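The exchange described above — send the command, then read raw RDB bytes until the server closes the connection — can be sketched in a few lines. This is an assumed minimal client, not the sample shipped with the patch; the command framing (`DUMPSAVE\r\n`) and close-on-completion behaviour are inferred from the description:

```python
import socket

def drain_to_file(read, out, bufsize=65536):
    """Copy bytes from a read(n) callable into `out` until EOF; return byte count."""
    total = 0
    while True:
        chunk = read(bufsize)
        if not chunk:
            return total
        out.write(chunk)
        total += len(chunk)

def dumpsave(host, port, out_path):
    # DUMPSAVE does not follow the Redis protocol: after the command is sent,
    # the server writes raw RDB data and (we assume) closes the connection
    # when the dump is complete, so we simply read until EOF.
    with socket.create_connection((host, port)) as s, open(out_path, "wb") as f:
        s.sendall(b"DUMPSAVE\r\n")
        return drain_to_file(s.recv, f)
```

Because the reply is raw bytes rather than a bulk reply, a regular Redis client library would choke on it, which is why the stream is drained directly off the socket.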

mezzatto commented Jul 8, 2013

PIPESAVE with the S3 backup use case seems pretty useful to me.

charl commented Jan 2, 2014

Would these commands still have the RDB behaviour where, when the dump is kicked off, it may use up to two to three times as much RAM as the current Redis DB is using?

I think PIPESAVE can (probably) be implemented with named pipes, an NFS mount, or SSHFS. DUMPSAVE would introduce issues such as network stability: since you are receiving data in a compressed, binary format, I am guessing it would be very hard to verify the integrity of the received data. The client can already receive data in AOF format, so that would be a saner alternative, although somewhat slower and resulting in larger output (but again, you do not need an extra command for this).

Maybe what you need is simply I/O throttling in the RDB dump process; an extra parameter limiting the IOPS during SAVE/BGSAVE could be added. That could make sense for every use case, not only remote backups.

mpalmer commented Jan 16, 2014

@charl: RDB doesn't have a requirement to use three times the RAM of the master redis process, so no, these commands don't either. They do, however, fork a child, so the same CoW issues will exist. However, since the dump will probably stream faster over the network than to disk, the maximum overhead should be less.

@georgepsarakis: You could potentially use a named pipe, but it'd be messy to implement, and would still require a change to Redis to allow it to dump an RDB to a file other than dbfilename (because if we overrode dbfilename to be the named pipe, every time you shut down Redis all your data would go down the pipe and there'd be nothing to read on startup). I have no idea how you'd use an NFS mount or SSHFS to encrypt an RDB and send it to Amazon S3 without touching local disk. I also can't comprehend how dumping an RDB over a network introduces any more network instability than copying it over that same network after writing it to local disk. I'm not sure how to even address your suggestion of AOF as an alternative. It isn't. As for I/O throttling: I want my backups to complete faster, not even slower.
