Backup data pipelining #241

phene opened this Issue · 6 comments

2 participants


I've come to realize how disconnected each stage of the backup currently is, and I wanted to open up a dialog on refactoring this gem so that we can stream data through each stage rather than perform each read/write separately to the disk.

Take this configuration as an example (the trigger name `:my_backup` is a stand-in, since the original name didn't survive):

  Backup::Model.new(:my_backup, 'full database and asset backup') do

    database MySQL do |database|
      database.host               = DB_CONFIG['host']
      database.name               = DB_CONFIG['database']
      database.username           = DB_CONFIG['username']
      database.password           = DB_CONFIG['password']
      database.additional_options = ['--single-transaction']
    end

    archive :assets do |archive|
      archive.add "/path/to/asset/data"
    end

    compress_with Gzip do |compression|
      compression.best = true
      compression.fast = false
    end

    encrypt_with OpenSSL do |encryption|
      encryption.password = "my_password"
      encryption.salt     = true
      encryption.base64   = false
    end

    store_with Local do |local|
      local.path = "/some/mount/of/remote/device"
      local.keep = 4
    end

  end

When I trigger the backup, the following occurs:

  1. Mysql dump to .sql file
  2. Tar asset data
  3. Tar results of 1 and 2 together
  4. Gzip tar file from 3
  5. Encrypt tar.gz from 4
  6. Copy tar.gz.enc to destination path
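The six steps above can be sketched as the sequence of shell commands Backup effectively runs today. Paths, flags, and filenames here are illustrative guesses, not the gem's actual internals:

```ruby
# Hypothetical sketch of the current multi-pass flow. Each step reads
# the previous step's output from disk and writes a new file.
STEPS = [
  "mysqldump --single-transaction my_db > tmp/databases/MySQL.sql",
  "tar -cf tmp/archives/assets.tar /path/to/asset/data",
  "tar -cf tmp/backup.tar tmp/databases tmp/archives",
  "gzip tmp/backup.tar",  # produces tmp/backup.tar.gz
  "openssl enc -aes-256-cbc -in tmp/backup.tar.gz -out tmp/backup.tar.gz.enc",
  "cp tmp/backup.tar.gz.enc /some/mount/of/remote/device/",
]

# The full package is written to and read back from disk at steps 3-6,
# so the complete data set makes four round trips through the filesystem.
STEPS.each { |cmd| puts cmd }
```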

It seems to me that we could easily combine steps 3, 4, 5, and 6 together, so that instead of performing a full read and write of the data 4 times, we only do it once. This could be accomplished by:

  1. Enabling the compress_with Gzip statement to inform the packager that the -z option should be used
  2. Enabling the encrypt_with OpenSSL statement to inform the packager that it should be piped into the openssl command with the correct options
  3. Enabling the store_with Local statement to inform the packager where it should be saved to instead of writing to a tmp file and copying.
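A minimal sketch of what combining steps 3, 4, 5, and 6 into a single pipe could look like. The flags, cipher, and destination filename are assumptions for illustration, not a concrete proposal:

```ruby
# One pipeline instead of four separate read/write passes: tar streams
# the package, gzip compresses it, openssl encrypts it, and the shell
# redirects the result straight to the storage path.
pipeline = [
  "tar -cf - tmp/databases tmp/archives",                   # step 3: package
  "gzip",                                                   # step 4: compress
  "openssl enc -aes-256-cbc -pass pass:my_password -salt",  # step 5: encrypt
].join(" | ") + " > /some/mount/of/remote/device/backup.tar.gz.enc"  # step 6

puts pipeline
```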

These enhancements may become critical as backups grow larger, since disk and network I/O time cannot be fully controlled.

I don't have any specific code change proposals yet, but I wanted to get the rest of the community's thoughts on this problem.


Yup. This is what I've been wanting to do for a while now, but I wasn't sure what a good approach would be. Thanks for bringing it up. This is definitely something I want incorporated into Backup because, as you said, the larger the backups, the heavier the I/O, CPU, and disk usage will be.

If we could stream/convert File A -> File B without leaving a copy of File A behind that would be great. (And File B -> C -> D to stream through all the stages to get to the final result).
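For what it's worth, Ruby's standard library already has a primitive for exactly this shape: `Open3.pipeline` connects each command's stdout to the next command's stdin, with no intermediate files left behind. A toy sketch (the stand-in commands here are just `printf` and `tr`; in Backup the real stages would be tar, the compressor, the encryptor, and the storage):

```ruby
require 'open3'
require 'tmpdir'

# Stream data A -> B -> C through a process pipeline without writing
# intermediate copies to disk. Only the final output touches a file.
result = nil
Dir.mktmpdir do |dir|
  out = File.join(dir, "result.txt")
  statuses = Open3.pipeline(
    %w[printf hello],   # stand-in producer (e.g. "tar cf - ...")
    %w[tr a-z A-Z],     # stand-in filter   (e.g. "gzip")
    out: out            # final destination (e.g. the storage path)
  )
  raise "pipeline failed" unless statuses.all?(&:success?)
  result = File.read(out)
end

puts result  # => "HELLO"
```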

I am certainly willing to get something like that going; it has been bugging me for quite a while. It won't make it into 3.0.20 (the next release), but I'd like to implement it as soon as possible after that. It should be a seamless upgrade, since it doesn't change the DSL or the end result for the user, so it can be incorporated at any time.

I'm currently waiting with 3.0.20 because I want to get the last few tickets resolved and bugs fixed so we have a clean base to work off of. I hope to get 3.0.20 out in the next few days, maybe this weekend if all goes well!

Cheers and thanks for the suggestion!



This would have to work with every compressor and encryptor, not just gzip and openssl, but I don't think that should be a problem?


As long as the tool supports reading and writing via stdin/stdout, then it should be easy to build a command pipeline with any of the tools involved.

tar cf - /path/to/my/files | gzip/bzip2 | openssl/gpg > destination.tar.{gz,bz2}.enc


Right, we'll have to see if bzip2, pbzip2, and lzma (compressors) and gpg (encryptor) support it; if so, we should be able to get it going. Also, there's another process you missed because it isn't in the latest gem yet (it's in HEAD@develop): the Splitter. The Splitter basically uses the split utility to break the archive into multiple chunks, and I'm pretty sure split supports reading from stdin.
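Since `split` reads from stdin when given `-` as its input file, the Splitter could hang off the end of the same pipeline. A hedged sketch, with the chunk size and filename prefix made up for illustration:

```ruby
# The Splitter as a final pipeline stage: split reads the encrypted
# stream from stdin ("-") and writes fixed-size chunks with the given
# prefix (backup.tar.gz.enc-aa, -ab, ...). Chunk size is illustrative.
split_pipeline = [
  "tar -cf - tmp/databases tmp/archives",
  "gzip",
  "openssl enc -aes-256-cbc -pass pass:my_password -salt",
  "split -b 250m - /some/mount/of/remote/device/backup.tar.gz.enc-",
].join(" | ")

puts split_pipeline
```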

This will be added to Backup in 3.0.20 which is in a few days hopefully.


This is currently being worked on. This will not be in this next release, but will be in the following release - hopefully not too far off :)

As it stands, the new process will be:

  • Each archive configured will pipe its tar output through the configured Compressor.
  • Each MySQL and PostgreSQL database configured will pipe its dump output through the configured Compressor. (I'm still looking at the other databases...)

So, from your example, you would end up with:

The final packaging tar command would be piped through the Encryptor, giving you:
Or, if the new Splitter is also used, the tar output will be piped through the Encryptor and split, resulting in:
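My guess at the command shapes the new process would produce for the example config. The trigger name, cipher, and chunk size are all assumptions, not taken from the actual implementation:

```ruby
# Hypothetical per-stage commands under the new pipelined process:
# compression moves into each archive/dump stage, and encryption (and
# optionally splitting) wraps the final packaging tar.
per_stage = {
  archive:  "tar -cf - /path/to/asset/data | gzip > assets.tar.gz",
  database: "mysqldump --single-transaction my_db | gzip > MySQL.sql.gz",
  package:  "tar -cf - databases archives " \
            "| openssl enc -aes-256-cbc -pass pass:my_password -salt " \
            "> my_backup.tar.enc",
  split:    "... | split -b 250m - my_backup.tar.enc-",
}

per_stage.each { |stage, cmd| puts "#{stage}: #{cmd}" }
```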


Pipeline changes have been merged to the develop branch. 9be16f7

@ghost ghost closed this
This issue was closed.