Question: safe copy of repo? #5284

Open
Grunthos opened this issue Aug 5, 2020 · 18 comments

@Grunthos

Grunthos commented Aug 5, 2020

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

QUESTION

The FAQ clearly highlights the dangers of copying a repository, and notes the risk of corrupt caches when accessing a copy of a repository. My proposed 'secure' backup strategy is:

**Host A**
borg-create a local backup in an existing local repo

**Host B**
rsync (pull) the repo (using with-lock) [ only needs limited read access to Host A ]
mount the latest backup
borg-create from the mounted backup (taken from the rsync'd 'staging' repo) into the local 'master' repo
    (at least until we have a 'merge' function)
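
The Host B steps above can be sketched as a small script. This is only an illustration: the host name, paths, archive name and the helpers `host_b_plan` and `run_plan` are hypothetical, and it assumes `borg with-lock` is pointed at the repository on Host A (for example through an `ssh://` URL) so that the repo stays locked while rsync copies it.

```python
import subprocess

def host_b_plan(remote_repo, rsync_src, staging, master, archive, mountpoint):
    """Return the Host B steps as (argv, cwd) pairs; nothing is executed here."""
    return [
        # Hold the borg lock on the source repo while rsync pulls a copy.
        (["borg", "with-lock", remote_repo,
          "rsync", "-a", "--delete", rsync_src, staging], None),
        # Expose one archive of the staging copy as a FUSE filesystem.
        (["borg", "mount", f"{staging}::{archive}", mountpoint], None),
        # Re-archive the mounted tree into the master repo under the same name.
        (["borg", "create", f"{master}::{archive}", "."], mountpoint),
        # Release the FUSE mount again.
        (["borg", "umount", mountpoint], None),
    ]

def run_plan(plan):
    """Run each step in order and stop at the first failure."""
    for argv, cwd in plan:
        subprocess.run(argv, cwd=cwd, check=True)
```

Building the plan separately from running it makes it easy to print and review the commands before anything touches a repository. If a step fails, `run_plan` stops immediately, so a mount may be left behind and need a manual `borg umount`.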

My problem is that while this seems to avoid the cache issues, accessing the repo on Host B (from Host B) will presumably create a cache, then the next time rsync is run, the repo may be radically changed, thus invalidating the cache on Host B.

Since it is a local->local copy, what is the best way of avoiding this potential problem? Just delete .cache/borg on Host B after doing the rsync?
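
As an alternative to removing all of `.cache/borg`, borg can drop only the cache that belongs to one repository via `borg delete --cache-only`. A minimal sketch, where the function names and the repo path are placeholders rather than anything from this setup:

```python
import subprocess

def drop_cache_argv(repo):
    """Command that deletes only the local cache for `repo`, not the repo itself."""
    return ["borg", "delete", "--cache-only", repo]

def drop_cache(repo):
    """Run the command; raises CalledProcessError if borg exits non-zero."""
    subprocess.run(drop_cache_argv(repo), check=True)
```

Running `drop_cache("/srv/staging")` right after each rsync would force borg to rebuild that cache on the next access, at the cost of a slower first operation.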

@ThomasWaldmann
Member

When using the FUSE fs (borg mount), you might lose some metadata (e.g. ACLs and "bsdflags" / fs flags).

Also, backing up such a mount might be rather slow when not being careful (guess the inode numbers would not be stable, so one needs to ignore them). For big archives, resource usage (esp. RAM) might be also quite high.
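
On the inode point: `borg create` accepts a `--files-cache` option that selects which file attributes are compared, and a mode without `inode` is the usual way to ignore unstable inode numbers. A sketch under the assumption that `ctime,size` is a suitable mode for a mounted archive (not verified here); the function name and paths are hypothetical:

```python
def create_from_mount_argv(master, archive, files_cache="ctime,size"):
    """Build a `borg create` command meant to run with the mountpoint as cwd."""
    # borg 1.1 defaults to "ctime,size,inode"; leaving out "inode" means a
    # changed inode number alone does not force a file to be re-read.
    return ["borg", "create", f"--files-cache={files_cache}",
            f"{master}::{archive}", "."]
```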

Considering that, would you still like to do that?

It looks a bit like you're looking for pull mode backups. We recently documented a way to do them; did you try that?

@Grunthos
Author

Grunthos commented Aug 5, 2020

Thanks for the quick reply. I am not too sure if I should worry about "bsdflags"; ACLs don't worry me -- I assume the usual srwxgrwxrwx are preserved? It's a linux box.

The backup from the mount point takes a little over an hour; the data size is 100GB or so. I have not looked into RAM usage.

Pull backups are indeed what I am trying to achieve, but with the "puller" having minimal access (which this provides, since the backup is done locally on Host A). The use of SSHFS requires root access to the whole FS and was very slow in the one test I did, so it's not ideal for me.

I guess my revised concerns are: (a) is deleting the local cache on Host B sufficient to protect from corruption, or is there a better method, and (b) what are the exact implications of the loss of bsdflags (it's linux, so I assume I'm ok)?

Edit: as an aside, I am hoping the "merge" feature, if it appears, will remove the need for the mount/backup step on Host B.

@ThomasWaldmann
Member

The general term is (filesystem) flags, but that is hard to search for and it used to be called bsdflags in borg, because the stuff on linux was mapped onto the bsd flags. It's stuff like the immutable flag, for example.

The normal mode is reflected by fuse, that should be easy to see.

I currently don't see a cache (corruption) issue with what you posted.

@Grunthos
Author

My worry with cache corruption is:

  • On Host B (with the copied version) I do something that uses the cache (note that I never plan to update the copy except via rsync)
  • Later, I use rsync to update the repo; this possibly changes or deletes blocks (not sure what happens under the hood, but imagine that the source has, for example, been pruned). This will erase/update any changes that may have occurred in prior 'list' or 'mount' or 'info' operations (note that these seem to all have an effect on the repo)
  • Once again, I do something that uses the cache on Host B...could the cache now be invalid?

@ThomasWaldmann
Member

If you only produce (e.g. by copying, syncing) states of a repo that would occur "naturally" also (e.g. by using/updating it by multiple borg clients), there should never be any cache issue because borg checks whether its cache corresponds to the repo.

@Grunthos
Author

Grunthos commented Oct 31, 2020

An update: this seems to work, except I think there IS an issue with the cache. If the source repo has a partial data file (<512MB) and that is copied to the second machine and then cached some time later, it sometimes fails to recognize when the original file is later updated (and rsynced). I suspect there might be a race condition on the file date, or something that results in the cached copy being updated. I am not entirely sure what is happening, but it produces corrupt block warnings, and deleting and rebuilding the cache fixes it (this is on 1.1.9).

@ThomasWaldmann
Member

With "natural states" I only meant the non-locked states. You should never copy a locked repo that might have partial files.

@Grunthos
Author

Grunthos commented Oct 31, 2020

Misunderstanding: the repo is never locked or in use when it is rsynced. Does that mean the partial file is not normal? It's possible the local backup died and left it in a mess.

Edit: also, I have now updated to 1.1.14.

@ThomasWaldmann
Member

You should use borg with-lock to start the copy process. The repo will be locked for the copy process then.

Even when borg is crashing / the connection breaks down, the repo should not be in a problematic state.

@Grunthos
Author

Grunthos commented Oct 31, 2020

That's what it does (borg-with-lock). Pretty confident the repo was unused. I do think there is a race condition with the remote cache...or something...next time it happens (1 in 100 roughly so far), I'll take more details. Erasing the cache fixed it, which makes me think it's a cache problem.

Does the cache check both file size and date when using cached values?

@ThomasWaldmann
Member

There are multiple caches / indexes, see the code for details.

But, as a general comment: if you need to think about that, something is already wrong.

In your case, it is that you have 2 copies of the same repo (with the same repo id) and you work with both. As long as they are in same state and you only do read accesses, it will work. If not, it won't / it might cause issues.
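
To make the "same repo id" point concrete: the id is stored in the repository's `config` file under `[repository]`, and the client-side cache lives in a directory named after that id, so a byte-for-byte copy maps to the same cache directory for the same user. A rough sketch of that lookup follows. The helper names are made up for illustration, and the default location is assumed to be `~/.cache/borg` unless `BORG_CACHE_DIR` or `XDG_CACHE_HOME` is set.

```python
import configparser
import os

def repo_id_from_text(config_text):
    """Extract the repository id from the contents of a repo's `config` file."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(config_text)
    return cfg["repository"]["id"]

def cache_dir_for(repo_id, env=os.environ):
    """Directory where the local cache for `repo_id` would be expected."""
    base = env.get("BORG_CACHE_DIR")
    if base is None:
        xdg = env.get("XDG_CACHE_HOME",
                      os.path.join(os.path.expanduser("~"), ".cache"))
        base = os.path.join(xdg, "borg")
    return os.path.join(base, repo_id)
```

Feeding the `config` contents of both copies through `repo_id_from_text` should give the same id, and therefore the same cache directory whenever both copies are accessed by the same user on the same machine.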

@Grunthos
Author

Grunthos commented Oct 31, 2020

Don't forget the two identical repos are on different machines. The only contact between those machines is rsync, which is run while the source repo is locked. And the copy is only ever read.

And yes, it looks like something is wrong, but that was on 1.1.9....so maybe not in 1.1.14.

@Grunthos
Author

Grunthos commented Nov 10, 2020

OK...a lot more testing and something is very fishy.

  1. I rsync'ed a repo to another machine (let's call the copy S).
  2. I stopped all other access to that repo, and stopped rsync
  3. I enabled a background task to:
  • scan another repo (M) for archive names
  • scan repo S for archive names
  • pick a name in S but not in M (sorted by date)
  • `borg mount` the S archive
  • borg backup from the S mount point to M
  • rinse and repeat
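
The "pick a name in S but not in M" step can be sketched as follows. The function names are hypothetical, and the listing relies on `borg list --short` printing one archive name per line, sorted by timestamp.

```python
import subprocess

def archive_names(repo):
    """Archive names in `repo`, in timestamp order."""
    out = subprocess.run(
        ["borg", "list", "--short", "--sort-by", "timestamp", repo],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()

def pending(staging_names, master_names):
    """Names present in S but not yet in M, keeping S's order."""
    done = set(master_names)
    return [name for name in staging_names if name not in done]
```

The first element of `pending(archive_names(S), archive_names(M))` is then the next archive to mount and back up.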

After about 50 such tasks, borg mount failed with a bad chunk checksum.

I ran a borg check, and it reported the bad chunk.

I copied (via cp -a) the repository to aid in debugging. Did nothing else with it.

I re-ran borg check on S. It found no errors.

Now I am confused.

Edit: if it helps, I have the borg mount output...

@ThomasWaldmann
Member

Could be random corruption happening when accessing it at the original location, but not at the copied location.

Also, you are working with borg with 2 repo copies that have the same repo id and thus use the same clientside cache...

@Grunthos
Author

Grunthos commented Nov 11, 2020

Sorry, you missed the point: the copy was not touched. I re-checked the ORIGINAL. I did not run any borg commands on the copy whatsoever, which is why I wrote "I copied (via cp -a) the repository to aid in debugging. Did nothing else with it.".

I.e., as far as borg is concerned, I have only one copy with one id.

To try to be abundantly clear: the original location now verifies, and borg knows nothing of the copy...it was made in the expectation I would need to investigate the corruption further, but it went away...

@ThomasWaldmann
Member

OK, sounds like random corruption somewhere.

@Grunthos
Author

Indeed. Since deleting the cache fixed it in the first instance, and in this most recent instance it "just fixed itself", one is left with the strong suspicion that the cache code may be buggy (i.e. the cache is the source of the corruption), since the underlying data verifies.

@ThomasWaldmann
Member

see updated #5830.
