Backups over multiple volumes #4896

Open
hypnotoad opened this issue Dec 23, 2019 · 11 comments

Comments

@hypnotoad

I have used borg backup for several years now and recommend it to everyone. There is just one thing missing, and I am wondering whether the borg developer team has any plans for it.

A fundamental limit of borg is that a repository is tied to a single location in your file system and to a single backup medium. So when the backup medium is full, it has to be replaced by a bigger one. Instead, I think it should be possible to just add another volume. E.g., "volume 1" has all the files backed up in 2011-2018 and "volume 2" has those from 2019-2022 (but none of volume 1).

As far as I can see (having read the documentation several times), it is not possible to achieve this right now. I would assume it could work in the following way: when initializing a new volume, the database is copied from the previous volume, with additional info that the data is on another volume. The data itself is not copied.

Are there any plans like that by the borg team?

@fantasya-pbem
Contributor

fantasya-pbem commented Dec 23, 2019

Are you aware that even if you just backed up a new file in 2019, the chunks for restoring it may be stored in the volume from 2011, because of deduplication? So restoring just a single file, regardless of its creation date, could require access to all your volumes, which would be very impractical from my point of view.
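To make the deduplication point concrete, here is a toy sketch. It is not how borg works internally: the `MultiVolumeIndex` class, the volume names and the tiny fixed-size chunks are all assumptions made up for illustration (borg's real chunker is content-defined, and borg has no notion of volumes). It only shows why a file backed up in 2019 can depend on a chunk that was written to the 2011 volume.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, for illustration only


def chunk_ids(data: bytes) -> list[str]:
    """Split data into fixed-size chunks and return one SHA-256 id per chunk."""
    return [
        hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(data), CHUNK_SIZE)
    ]


class MultiVolumeIndex:
    """Hypothetical index: chunk id -> name of the volume storing that chunk."""

    def __init__(self):
        self.chunk_volume = {}  # chunk id -> volume name
        self.files = {}         # file name -> ordered list of chunk ids

    def backup(self, name: str, data: bytes, volume: str) -> None:
        ids = chunk_ids(data)
        for cid in ids:
            # Deduplication: a chunk already stored on any volume is not
            # written again, so it stays on the volume that first saw it.
            self.chunk_volume.setdefault(cid, volume)
        self.files[name] = ids

    def volumes_needed(self, name: str) -> set[str]:
        """Return every volume that must be attached to restore this file."""
        return {self.chunk_volume[cid] for cid in self.files[name]}


index = MultiVolumeIndex()
index.backup("photo_2011.raw", b"AAAABBBBCCCC", volume="volume 1")
index.backup("photo_2019.raw", b"AAAADDDD", volume="volume 2")

# The 2019 file shares its first chunk with the 2011 file.
needed = index.volumes_needed("photo_2019.raw")
print(sorted(needed))
```

In this sketch the 2019 file needs both volumes for a restore, even though it was backed up while only "volume 2" was in use for new data.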

@ThomasWaldmann
Member

If you had many backup volumes, you could get the "windows 3.0" install feeling. :-)

(the stuff was on many floppy disks back then and it requested random disks while installing, repeatedly)

@hypnotoad
Author

@fantasya-pbem : Sure, I am aware that all volumes are needed. But they are only needed during a restore, so it is not really impractical.

As the main use case, I see "never-ending backups": you simply never delete old backups and start a new volume when a disk is full. That would be practical for almost everyone. If you reorganize your photo files, deduplication kicks in. Of course, you can manually split your backups into something like "all photos" and "all videos", but then you need to do 2 backups with 2 media every time you do a backup.

Let's not discuss whether it would be impractical because of xyz. I would propose that we keep the thread open in case someone needs the feature or wants to implement it. If there had been an FAQ entry "are multi-volume backups possible?", I would not have started this thread.

@Stunner

Stunner commented Dec 30, 2019

I agree with the OP (@hypnotoad ). I too have encountered such scenarios and agree it would be a nice-to-have feature. That being said, relying on multiple volumes isn't as safe as relying on just one. Assuming a volume is just an HDD, every HDD you add increases the chance of a failure somewhere in the backup set, and with it the chance of your backup being unrecoverable. I use borg on a NAS, however, so this risk is mitigated by my NAS's RAID array.

@mo-han

mo-han commented Jan 12, 2020

Right now, borg just can't do this kind of job. When restoring, if segments are missing, it will pause and prompt for the missing but needed segment, which is fine. During backup, if the destination repo lacks some of the segments, the create action will just fail (sometimes the borg cache can keep the creation going even though the real segment file is actually gone, but we can't count on that). Anyway, multi-volume support is not on borg's feature list, for now.

For people who need to do incremental backups even when the old archives are offline/inaccessible, there is a capable tool called "DAR backup". It's a file-level backup tool rather than a block-level one. When DAR creates an archive, it generates a "catalogue" that records all the files and their size, date, and CRC, so this catalogue can serve as a reference when creating a new archive. The catalogue can be isolated into a small standalone file, so it is easy to keep locally while the real archive data can be stored anywhere. There is no need for the whole old archive; this catalogue file alone is enough of a reference to do an incremental backup.
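The catalogue idea can be sketched roughly as follows. This is only an illustration of file-level incremental detection against a small reference record, not DAR's actual format or algorithm; the function names `build_catalogue` and `changed_since` are invented here, and a real tool would probably compare cheap metadata before reading whole files.

```python
import os
import tempfile
import zlib


def build_catalogue(root: str) -> dict:
    """Record (size, mtime, CRC32) for every file under root."""
    catalogue = {}
    for dirpath, _dirs, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as fh:
                crc = zlib.crc32(fh.read())
            st = os.stat(path)
            rel = os.path.relpath(path, root)
            catalogue[rel] = (st.st_size, st.st_mtime_ns, crc)
    return catalogue


def changed_since(reference: dict, current: dict) -> list:
    """Return paths that are new or differ from the reference catalogue."""
    return sorted(p for p, meta in current.items() if reference.get(p) != meta)


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "a.txt"), "w") as fh:
        fh.write("unchanged")
    with open(os.path.join(root, "b.txt"), "w") as fh:
        fh.write("old")

    # The reference catalogue is small and can stay on the local machine
    # while the archive it describes is stored offline.
    reference = build_catalogue(root)

    with open(os.path.join(root, "b.txt"), "w") as fh:
        fh.write("new content")
    with open(os.path.join(root, "c.txt"), "w") as fh:
        fh.write("added")

    # Only the modified and the new file need to go into the next archive.
    to_backup = changed_since(reference, build_catalogue(root))
    print(to_backup)
```

Only the catalogue from the previous run is consulted; the previous archive's data is never read.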

Though DAR cannot do block-level dedup, most of the time file-level dedup is good enough. Even borg can't dedup all kinds of files. I used to use borg to back up a virtual machine folder containing a VirtualBox vdi file, a sparse virtual disk image that was constantly modified. Borg definitely tried to dedup the data in this vdi file, but after several archives had been created, the resulting repo size was still growing linearly, and the status report said there was little data to dedup. You could say that sliding-window block-level dedup is not almighty.

Borg is not suitable for storage media/devices with poor or no random access. Borg assumes that the disk is big and fast enough. You could say it's born for hard disks and against tapes or optical discs. However, when we do backups, I mean real backups, not temporary or whim backups, we store our precious archives on a long-lived medium, like magnetic tape or an optical disc. In the past, a well-made HDD could live for decades, but those were the good old days. Nowadays, consumer-grade disks only have a 3-5 year warranty; most of them die not long after that, and some die even before. An HDD is still fairly long-term for preservation compared to flash chips. But it is a complex electromechanical device, and if any part inside it fails, the data may be gone. And restoring data from a broken HDD is difficult, expensive, or just impossible, depending on the situation.

So boys, do not ask borg to do something that it does not want to do.

@fuzzdk

fuzzdk commented Feb 11, 2020

I think this would be very useful, especially if you could set how many redundant copies of your files you want to have across a set of volumes. This way you could keep a set of HDDs in a remote location and then occasionally take the oldest one home and add new files. It would also be good to have a way to store files for which the number of redundant copies is already high enough, just before you move an HDD to the remote location.

@carlbeech

Hi
I wouldn't think that having borg back up to more than one location would be a major issue... however, I was thinking that instead of swapping between backup media, you'd simply plug in more than one backup USB hard drive...

Linux is perfectly capable of having more than one USB HDD plugged in - you just mount them at two mount points - in which case, there should be no reason why borg cannot extend the backup across both drives - so backup chunks that do not fit on disk 1 are simply written to disk 2.

I've got a 2TB backup disk and a number of 500GB disks - I'd really not want to have to go out and buy a 3 or 4TB disk when my 2TB fills up - it's costly and a little frustrating if I've got perfectly good disks sitting beside me...

Yes, it's perfectly possible to have one repository on each disk and run two backups, one to each disk, splitting the source data across the two backups - however, that means re-working the backups and losing the historical information, as opposed to a graceful expansion...

Anyway - just my 10c worth - please do not think I'm being critical here - I believe borg is one of the best backup mechanisms out there - it's just that I think the ability to extend repos across more than one disk would be a very worthwhile addition...

All the best
Carl

@amerlyq

amerlyq commented Jun 27, 2020

@carlbeech you know, you can create a RAID over multiple USB drives.
Or, more flexibly, add all your USB drives to an LVM volume group and get "one big virtual storage" -- and access it as such after inserting all disks.
Of course, if ANY disk fails -- in most cases all your backups are gone :)
But take into account that borg deduplication will spread your data chunks all over multiple disks anyway -- so if archiving/fragmenting is implemented poorly (without redundancy), you will lose/corrupt most of your backups even when using borg only.

Therefore the consequences of a disk failure are almost the same -- for LVM or for a borg-custom multidisk solution.
You may try LVM if the increased risk is worth it.

On the general topic: of course, if implemented correctly, redundant copies spread over multiple media are great to have.
But then borg would become not only a backup solution, but also an archiving-library-management solution.
You can see how complex that may become by trying to set up and fully understand git-annex(1).
It's literally configuration complexity hell, which maybe fits into the brain of only the most prudent and disciplined data hoarders.

@Massimo-B

As of today, is spreading a backup over several disks still not possible?

@sat-hub

sat-hub commented Aug 11, 2023

Borg does not handle disks. It just writes a "repository", which is a set of directories and files, to a folder on a file system. This file system has to provide enough free space. It either has to be mounted locally on the system where borg is executed (the client), or it is accessed via SSH on a storage server that also has a Borg binary installed.
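Since free space is a property of the file system rather than of borg, it can be checked from outside before a backup run. A minimal sketch, assuming a hypothetical helper `has_room` and using the system temp directory as a stand-in for a real repository folder:

```python
import shutil
import tempfile


def has_room(repo_path: str, required_bytes: int) -> bool:
    """Return True if the file system holding repo_path has enough free space."""
    return shutil.disk_usage(repo_path).free >= required_bytes


# Stand-in for a repository directory; replace with the real mount point.
repo_dir = tempfile.gettempdir()

enough_for_nothing = has_room(repo_dir, 0)
# Asking for more than the whole file system can hold must fail.
enough_for_everything = has_room(repo_dir, shutil.disk_usage(repo_dir).total + 1)
print(enough_for_nothing, enough_for_everything)
```

Whatever sits underneath that path (a single disk, RAID, LVM) is invisible at this level, which is the point being made above.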

@ThomasWaldmann
Member

@sat-hub Hello fellow grumpy cat! 🤣


10 participants