I've come to realize how disconnected each stage of the backup currently is, and I wanted to open a dialog on refactoring this gem so that we can stream data through each stage rather than performing each stage's read/write to disk separately.
Take this configuration as an example:
```rb
Backup::Model.new(:database_and_assets, 'full database and asset backup') do

  database MySQL do |database|
    database.host               = DB_CONFIG['host']
    database.name               = DB_CONFIG['database']
    database.username           = DB_CONFIG['username']
    database.password           = DB_CONFIG['password']
    database.additional_options = ['--single-transaction']
  end

  archive :assets do |archive|
  end

  compress_with Gzip do |compression|
    compression.best = true
    compression.fast = false
  end

  encrypt_with OpenSSL do |encryption|
    encryption.password = "my_password"
    encryption.salt     = true
    encryption.base64   = false
  end

  store_with Local do |local|
    local.path = "/some/mount/of/remote/device"
    local.keep = 4
  end

end
```
When I trigger the backup, the following occurs:
It seems to me that we could easily combine steps 3, 4, 5, and 6, so that instead of performing a full read and write of the data four times, we only do it once. This could be accomplished by:
These enhancements may become critical as backups grow larger and disk/network I/O time cannot be fully controlled.
I don't have any specific code change proposals yet, but I wanted to get the rest of the community's thoughts on this problem.
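As a rough sketch of the single-pass idea, Ruby's stdlib `Open3.pipeline` can chain the stages through one stream. The commands, flags, and paths below are illustrative only, not a proposal for the gem's actual implementation:

```ruby
require "open3"
require "tmpdir"

# Stream packaging and compression through a single pipeline, so the
# data is read from disk once and the final result written once.
# Commands, flags, and paths here are illustrative only.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "asset.txt"), "some asset data")
  output = File.join(dir, "backup.tar.gz")

  statuses = Open3.pipeline(
    ["tar", "cf", "-", "-C", dir, "asset.txt"], # package to stdout
    ["gzip", "--best"],                         # compress the stream
    out: output                                 # single final write
  )
  statuses.each { |s| raise "stage failed" unless s.success? }
  puts File.size(output) > 0
end
```

An encryption stage would slot in as just one more entry in the same pipeline, provided the encryption tool can read stdin and write stdout.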
Yup. This is what I've been wanting to do for a while now, but I wasn't sure what a good approach would be. Thanks for bringing it up. This is definitely something I want to have incorporated into Backup because, as you said, the larger the backups, the heavier the I/O, CPU, and disk usage will be.
If we could stream/convert File A -> File B without leaving a copy of File A behind that would be great. (And File B -> C -> D to stream through all the stages to get to the final result).
I am certainly willing to get something like that going; it has been bugging me for quite a while. Though it won't make it into 3.0.20 (the next release), I'd like to implement it as soon as possible. It should be a seamless upgrade, seeing as it doesn't change the DSL or the end result for the user, so it can be incorporated at any time.
I'm currently holding off on 3.0.20 because I want to get the last few tickets resolved and bugs fixed so we have a clean base to work from. I hope to get 3.0.20 out in the next few days, maybe this weekend if all goes well!
Cheers and thanks for the suggestion!
This would have to work with every compressor and encryptor, not just OpenSSL and Gzip, but I don't think that should be a problem?
As long as the tool supports reading and writing via stdin/stdout, then it should be easy to build a command pipeline with any of the tools involved.
```sh
tar cf - /path/to/my/files | gzip/bzip | openssl/gpg > destination.tar.gz/bz.enc
```
Right, we'll have to see if bzip2, pbzip2, and lzma (compressors) and gpg (encryptor) support it; if so, we should be able to get this going. There's also another process you missed because it isn't in the latest gem yet (it's in HEAD@develop): the Splitter. The Splitter uses the split utility to break the archives into multiple chunks, and I'm pretty sure it supports reading/writing via stdin/stdout.
The Splitter will be added to Backup in 3.0.20, which is hopefully only a few days away.
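For what it's worth, `split` does read from stdin when given `-` as its input file, so the chunking stage should be able to sit at the end of the same pipeline. A quick illustrative check (not the Splitter's actual code):

```ruby
require "open3"
require "tmpdir"

# Check that `split` accepts a stream on stdin ("-") and chunks it:
# 8 bytes piped in with a 4-byte chunk size should yield 2 files.
Dir.mktmpdir do |dir|
  prefix = File.join(dir, "chunk-")
  statuses = Open3.pipeline(
    ["printf", "abcdefgh"],           # stand-in for the tar stream
    ["split", "-b", "4", "-", prefix] # "-" means read from stdin
  )
  statuses.each { |s| raise "stage failed" unless s.success? }
  puts Dir.glob("#{prefix}*").length  # => 2
end
```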
This is currently being worked on. It will not be in the next release, but it will be in the following release - hopefully not too far off :)
As it stands, the new process will be:
So, from your example, you would end up with:
The final packaging tar command would be piped through the Encryptor, giving you:
Or, if the new Splitter is also used, the tar output will be piped through the Encryptor and split, resulting in:
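To make that shape concrete, here is a runnable sketch of the tar → encryptor → split pipeline. The encryption stage is stubbed with `cat` since the cipher flags vary by encryptor, and all names and sizes here are illustrative rather than the commands Backup actually generates:

```ruby
require "open3"
require "tmpdir"

# tar -> (encryptor) -> split in one pass. `cat` stands in for the
# encryptor (e.g. openssl or gpg); everything here is illustrative.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "asset.txt"), "x" * 1024)
  prefix = File.join(dir, "backup.tar.enc-")

  statuses = Open3.pipeline(
    ["tar", "cf", "-", "-C", dir, "asset.txt"], # package to stdout
    ["cat"],                                    # stand-in encryptor stage
    ["split", "-b", "4096", "-", prefix]        # chunk the final stream
  )
  statuses.each { |s| raise "stage failed" unless s.success? }
  puts Dir.glob("#{prefix}*").length > 1  # tar pads its output, so >1 chunk
end
```

The point of the sketch is that no intermediate file is ever written: the chunks produced by `split` are the only thing that touches the disk.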
Pipeline changes have been merged into the develop branch: 9be16f7