Backups over multiple volumes #4896
Comments
Are you aware that even if you just backed up a new file in 2019, the chunks needed to restore it may be stored in the volume from 2011, because of deduplication? So restoring just a single file, regardless of its creation date, could require access to all your volumes, which would be very impractical from my point of view.
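To illustrate the point about deduplication, here is a minimal sketch. It is not Borg's code: Borg uses content-defined chunking, while this toy uses tiny fixed-size chunks, and the names `backup`, `volumes_needed`, and the volume labels are made up for the example.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, for illustration only


def chunk_ids(data: bytes) -> list[str]:
    """Split data into fixed-size chunks and return their SHA-256 ids."""
    return [
        hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(data), CHUNK_SIZE)
    ]


def backup(data: bytes, volume: str, index: dict[str, str]) -> list[str]:
    """Record unseen chunks as living on `volume`; return the file's chunk ids."""
    ids = chunk_ids(data)
    for cid in ids:
        # A chunk that is already known stays on the volume that first stored it.
        index.setdefault(cid, volume)
    return ids


def volumes_needed(ids: list[str], index: dict[str, str]) -> set[str]:
    """Volumes that must be attached to restore a file with these chunk ids."""
    return {index[cid] for cid in ids}


index: dict[str, str] = {}
backup(b"AAAABBBBCCCC", "volume-2011", index)
new_file = backup(b"BBBBDDDD", "volume-2019", index)
print(sorted(volumes_needed(new_file, index)))  # ['volume-2011', 'volume-2019']
```

The file backed up in 2019 shares the `BBBB` chunk with the 2011 backup, so restoring it needs both volumes.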
If you had many backup volumes, you could get the "Windows 3.0" install feeling. :-) (Back then the software came on many floppy disks, and the installer repeatedly requested seemingly random disks.)
@fantasya-pbem: Sure, I am aware that all volumes are needed. The volumes are only needed during a restore, so it is not really impractical. As the main use case, I see "never-ending backups": you simply never delete old backups and start a new volume when a disk is full. That would be practical for almost everyone. If you reorganize your photo files, deduplication kicks in. Of course, you can manually split your backups into something like "all photos" and "all videos", but then you need to do 2 backups with 2 media every time you do a backup. Let's not discuss that it would be impractical because of xyz. I would propose that we keep the thread open in case someone needs the feature or wants to implement it. If there was an FAQ entry "are multi-volume backups possible?", I would not have started this thread.
I agree with the OP (@hypnotoad). I too have encountered such scenarios and agree it would be a nice-to-have feature. That being said, relying on multiple volumes isn't as safe as relying on just one. Assuming a volume is just an HDD, adding more HDDs increases the chance of a failure somewhere in the backup array, and therefore the chance of your backup being unrecoverable. I use borg on a NAS, however, so this risk is mitigated by my NAS's RAID array.
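A rough way to put numbers on this, assuming disks fail independently and that losing any one volume breaks the backup. The 5% per-disk failure probability is a made-up figure for illustration, not a measured rate.

```python
def p_any_failure(p_single: float, n_disks: int) -> float:
    """Probability that at least one of n independent disks fails."""
    return 1.0 - (1.0 - p_single) ** n_disks


# Hypothetical 5% chance that a given disk fails within some period.
for n in (1, 2, 4):
    print(n, round(p_any_failure(0.05, n), 4))
```

With these assumed numbers, four volumes are roughly 3.7 times as likely to have at least one failure as a single volume.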
Right now, borg just can't do this kind of job. When restoring, if the segments are incomplete, it will pause and prompt for the missing but needed segment, which is fine. During a backup, if the destination repo lacks some of the segments, the create action will just fail (sometimes the borg cache lets the creation keep going even though the real segment file is gone, but we can't count on that). Anyway, multiple volumes are not on borg's feature list, for now.

For those who need to do incremental backups even when the old archives are offline or inaccessible, there is a capable tool called "DAR backup". It's a file-level backup tool instead of a block-level one. When DAR creates an archive, it generates a "catalogue" which records all the files and their size, date, and CRC, so this catalogue can serve as a reference when creating a new archive. The catalogue can be isolated into a standalone file. It is very small, so it is easy to keep locally while the real archive data can be stored anywhere: there is no need for the whole old archive, the catalogue file alone is enough as a reference to do an incremental backup.

DAR cannot do block-level dedup, but most of the time file-level dedup is good enough. Even borg can't dedup all kinds of files. I used to use borg to back up a virtual machine folder. It had a vbox vdi file inside, which is a sparse file, a virtual disk image that is constantly modified. Borg definitely tried to dedup the data in this vdi file, but after several archives had been created, the resulting repo size was still growing linearly, and the status report said there was little data to dedup. You can say that rolling-window block-level dedup is not almighty.

Borg is not suitable for storage media or devices with poor or no random access. Borg assumes that the disk is big and fast enough. You can say it was born for hard disks and against tapes or optical discs.

However, when we do backups, I mean real backups, not some temporary or spur-of-the-moment backups, we store our precious archives on a long-lived medium, like a magnetic tape or an optical disc. In the past, a well-made HDD could live for decades, but those were the good old days. Nowadays, all consumer-grade disks only have a 3-5 year warranty. Most of them die not long after that, and some of them die even earlier. An HDD is already pretty long-term for preservation, compared to flash chips. But it is a complex electromechanical device, and if any part inside it fails, the data may be gone. Restoring data from a broken HDD is rather difficult, or expensive, or just impossible, depending on the situation. So boys, do not ask borg to do something that it does not want to do.
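The catalogue idea described above can be sketched in a few lines. This is not DAR's catalogue format or API. `build_catalogue` and `changed_files` are hypothetical helpers, and the sketch records only size and CRC32 (no dates) to keep it short.

```python
import os
import tempfile
import zlib


def build_catalogue(root: str) -> dict[str, tuple[int, int]]:
    """Map each file's relative path to (size, CRC32) for everything under root."""
    catalogue = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as fh:
                data = fh.read()
            catalogue[os.path.relpath(full, root)] = (len(data), zlib.crc32(data))
    return catalogue


def changed_files(old: dict, new: dict) -> list[str]:
    """Files that are new or differ from the reference catalogue."""
    return sorted(path for path, meta in new.items() if old.get(path) != meta)


with tempfile.TemporaryDirectory() as root:
    for name, text in (("a.txt", "one"), ("b.txt", "two")):
        with open(os.path.join(root, name), "w") as fh:
            fh.write(text)
    reference = build_catalogue(root)  # small, can be kept locally

    # Later: one file is edited and one is added; the old archive is offline.
    with open(os.path.join(root, "b.txt"), "w") as fh:
        fh.write("two, edited")
    with open(os.path.join(root, "c.txt"), "w") as fh:
        fh.write("three")
    demo_changed = changed_files(reference, build_catalogue(root))

print(demo_changed)  # ['b.txt', 'c.txt']
```

Only the small `reference` catalogue is needed to decide what goes into the next incremental archive. The data of the earlier archive never has to be read.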
I think this would be very useful, especially if you could set how many redundant copies of your files you want to have across a set of volumes. This way you could keep a set of HDDs in a remote location and then sometimes take the oldest one home and add new files. It would also be good to have a way to store files where the number of redundant copies is already high enough, just before you move an HDD to the remote location.
Hi, Linux is perfectly capable of having more than one USB HDD plugged in - you simply mount them at two mount points - so there should be no reason why borg cannot extend the backup across both drives, with backup chunks that do not fit on disk 1 simply written to disk 2. I've got a 2 TB backup disk and a number of 500 GB disks - I'd really not want to have to go out and buy a 3 or 4 TB disk when my 2 TB fills up - it's costly and a little frustrating if I've got perfectly good disks sitting beside me... Yes, it's perfectly possible to have one repository on each disk and run two backups, one to each disk, splitting the source data across the two backups - however, that means reworking the backups and losing the historical information, as opposed to a graceful expansion... Anyway - just my 10c worth - please do not think I'm being critical here - I believe borg is one of the best backup mechanisms out there - it's just that I think the ability to extend repos across more than one disk would be a very worthwhile addition. All the best
@carlbeech you know, you can create a RAID over multiple USB drives. The consequences of a disk failure are then almost the same -- for LVM or a borg-custom multidisk solution. On the general topic: of course, if implemented correctly, redundant copies spread over multiple media are great to have.
As of today, spreading a backup over several disks is still not possible?
Borg does not handle disks. It just writes a "repository", which is a set of directories and files, to a folder on a file system. This file system has to provide enough free space. It has to be either mounted locally on the system where borg is executed (the client) or accessed via SSH on a storage server that also has a Borg binary installed.
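Since the repository is just a folder, checking whether its file system still has room is something you can do yourself before a backup. A minimal sketch using Python's standard library follows. `has_room` is a hypothetical helper, not part of Borg, and the path and size in the usage line are placeholders.

```python
import shutil


def has_room(repo_path: str, needed_bytes: int) -> bool:
    """True if the file system holding repo_path has at least needed_bytes free."""
    return shutil.disk_usage(repo_path).free >= needed_bytes


# Placeholder values: check the current directory for 1 GiB of free space.
print(has_room(".", 1024 ** 3))
```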
@sat-hub Hello fellow grumpy cat! 🤣
I have been using borg backup for several years now and recommend it to everyone. There is just one thing missing, and I am wondering whether the borg developer team has a plan for it.
A fundamental limit of borg is that a repository is tied to a single location in your file system and to a single backup medium. So when the backup medium is full, it has to be replaced by a bigger one. Instead, I think it should be possible to just add another volume. E.g., "volume 1" has all the files backed up in 2011-2018 and "volume 2" has 2019-2022 (but none of volume 1).
As far as I can see (having read the documentation several times), it is not possible to achieve this right now. I would assume it could work in the following way: when initializing a new volume, the database is copied from the previous volume, with additional info that the data is on another volume. The data itself is not copied.
Are there any plans like that by the borg team?
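To make the proposal concrete, here is a toy sketch of the idea. None of this exists in borg. `Volume` and `start_new_volume` are hypothetical names, and chunk ids are plain strings instead of hashes.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    name: str
    # chunk id -> name of the volume that physically holds the chunk
    index: dict[str, str] = field(default_factory=dict)
    # chunk id -> bytes actually stored on this volume
    data: dict[str, bytes] = field(default_factory=dict)

    def store(self, chunk_id: str, payload: bytes) -> None:
        """Write a chunk only if no volume in the set already has it."""
        if chunk_id not in self.index:
            self.index[chunk_id] = self.name
            self.data[chunk_id] = payload


def start_new_volume(previous: Volume, name: str) -> Volume:
    """Copy the index (with volume locations) but none of the data."""
    return Volume(name=name, index=dict(previous.index))


vol1 = Volume("volume-1")
vol1.store("c1", b"old photo")

vol2 = start_new_volume(vol1, "volume-2")
vol2.store("c1", b"old photo")  # already on volume-1, so nothing is written
vol2.store("c2", b"new photo")

print(vol2.index)         # {'c1': 'volume-1', 'c2': 'volume-2'}
print(sorted(vol2.data))  # ['c2']
```

Backups to the new volume still deduplicate against old data, and the index tells a restore which volume to ask for.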