New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hardlinked repositories: a bad idea? #4272

Closed
anarcat opened this Issue Jan 28, 2019 · 20 comments

Comments

Projects
None yet
2 participants
@anarcat
Copy link
Contributor

anarcat commented Jan 28, 2019

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

Bug

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

1.1.8

Operating system (distribution) and version.

Debian buster 10

Hardware / network configuration, and filesystems used.

ext4, external USB 1.5TB HDD

How much data is handled by borg?

Around two 80GB original size sets, 200GB deduplicated, after a dozen archives.

Full borg commandline that lead to the problem (leave away excludes and passwords)

See below.

Describe the problem you're observing.

I am seeing the same error as #3737 after doing some crazy hacks on a repository, but it's probably not due to a memory error and more a user error. Since others might have that trouble (and it might be something we want to avoid in the future), I figured I would report it as a bug, but maybe it could be resolved with a "hell no don't do that" FAQ... (for example here). Here's the story:

I had a repository (/media/calyx/borg, 1.5TB HDD external USB) where I would backup my workstation (curie) and laptop (angela). It was a nice experiment showing that while you can backup multiple machines to the same repository, and it does save space in the repository, but it implies resyncing caches and files so it felt a bit slow.

So I figured I would try to split the repository in two. The plan was simple: copy the repository to borg-curie (and rename the original one to borg-angela) and remove the "foreign" archives from the other. So I started:

cp -al borg borg-curie
mv borg borg-angela
borg delete borg-angela::curie-2017-09-18

That archive was the oldest curie backup, if my memory is exact. Also note that I used hardlinks to avoid using too much space, a technique I took from the originally successful approach taken by borg upgrade -i. This probably is the cause of all my problems here.

I lost the backtrace here, but once the borg delete completed on borg-angela, I couldn't create an archive in borg-curie. It failed with an error similar to #3737 which I can reproduce with check, below.

Also note that I tried to create a new archive while the delete was happening, but it led to a locking issue (which is fine). I also don't have the backtrace on that, unfortunately.

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

Simple reproducer:

anarcat@curie:test(master)$ borg init -e none borg
anarcat@curie:test(master)$ borg create borg::curie-0 /etc/motd
anarcat@curie:test(master)$ borg create borg::angela-0 /etc/motd
anarcat@curie:test(master)$ cp -al borg borg-curie
anarcat@curie:test(master)$ mv borg borg-angela
anarcat@curie:test(master)$ borg list borg-angela
Warning: The repository at location /home/anarcat/test/borg-angela was previously located at /home/anarcat/test/borg
Do you want to continue? [yN] y
curie-0                              Mon, 2019-01-28 09:15:39 [2977f5f2cbe86cc04177038ff95775c0522a0e383c88bf94cd3c93be1a574fa6]
angela-0                             Mon, 2019-01-28 09:15:55 [2e236af1c92ca1a9a371d21df7afcbb9bc16a3181297e7cf643170e6501e55a5]
anarcat@curie:test(master)$ borg delete borg-angela::curie-0
anarcat@curie:test(master)$ borg list borg-angela
angela-0                             Mon, 2019-01-28 09:15:55 [2e236af1c92ca1a9a371d21df7afcbb9bc16a3181297e7cf643170e6501e55a5]
anarcat@curie:test(master)$ borg list borg-curie
Data integrity error: Invalid segment entry header [segment 8, offset 2161]: unpack requires a buffer of 41 bytes
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4436, in main
    exit_code = archiver.run(args)
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4368, in run
    return set_ec(func(args))
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 139, in wrapper
    kwargs['manifest'], kwargs['key'] = Manifest.load(repository, compatibility)
  File "/usr/lib/python3/dist-packages/borg/helpers.py", line 330, in load
    cdata = repository.get(cls.MANIFEST_ID)
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 1061, in get
    return self.io.read(segment, offset, id)
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 1392, in read
    size, tag, key, data = self._read(fd, self.put_header_fmt, header, segment, offset, (TAG_PUT, ), read_data)
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 1404, in _read
    segment, offset, err)) from None
borg.helpers.IntegrityError: Data integrity error: Invalid segment entry header [segment 8, offset 2161]: unpack requires a buffer of 41 bytes

Platform: Linux curie 4.19.0-1-amd64 #1 SMP Debian 4.19.12-1 (2018-12-22) x86_64
Linux: debian buster/sid 
Borg: 1.1.8  Python: CPython 3.7.2
PID: 341  CWD: /home/anarcat/test
sys.argv: ['/usr/bin/borg', 'list', 'borg-curie']
SSH_ORIGINAL_COMMAND: None

That's pretty close to what happened, except that in the above the borg-angela repository is still intact. In my situation, it's also corrupt.

Include any warning/errors/backtraces from the system logs

After the crash, I have moved the repositories around and tried to fix my mistake by copying the repository in multiple copies I could work on:

cp -a borg-curie borg-curie2 && mv borg-curie borg-curie-hardlinks && mv borg-curie2 borg-curie-corrupt && cp -a borg-curie-corrupt borg-curie-corrupt-work

Now I have a copy of a copy of the repository in borg-curie-corrupt-work which I'll try to restore. It fails:

root@curie:/media/calyx# borg check --repair --progress --verbose borg-curie-corrupt-work
'check --repair' is an experimental feature that might result in data loss.
Type 'YES' if you understand this and want to continue: YES
Starting repository check
Data integrity error: Invalid segment magic [segment 13193, offset 0]                                                                                                  
attempting to recover /media/calyx/borg-curie-corrupt-work/data/1/13193
Local Exception
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 911, in check
    objects = list(self.io.iter_objects(segment))
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 1335, in iter_objects
    raise IntegrityError('Invalid segment magic [segment {}, offset {}]'.format(segment, 0))
borg.helpers.IntegrityError: Data integrity error: Invalid segment magic [segment 13193, offset 0]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4436, in main
    exit_code = archiver.run(args)
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4368, in run
    return set_ec(func(args))
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 152, in wrapper
    return method(self, args, repository=repository, **kwargs)
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 313, in do_check
    if not repository.check(repair=args.repair, save_space=args.save_space):
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 916, in check
    self.io.recover_segment(segment, filename)
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 1364, in recover_segment
    with mmap.mmap(backup_fd.fileno(), 0, access=mmap.ACCESS_READ) as mm:
ValueError: cannot mmap an empty file

Platform: Linux curie 4.19.0-1-amd64 #1 SMP Debian 4.19.12-1 (2018-12-22) x86_64
Linux: debian buster/sid 
Borg: 1.1.8  Python: CPython 3.7.2
PID: 16572  CWD: /media/calyx
sys.argv: ['/usr/bin/borg', 'check', '--repair', '--progress', '--verbose', 'borg-curie-corrupt-work']
SSH_ORIGINAL_COMMAND: None

What's "funny" is that even the borg-angela repository seems corrupt. I attached the output of borg check on the angela repository because it's fairly large.

angela-check.txt

I'm currently running a --repair on that one as well, but makes me wonder if those repositories were healthy in the first place. But I was able to make a backup about a month ago without problems, here's the list of archives in borg-angela before the fateful move deleting that archive:

root@curie:/media/calyx# borg list borg-angela
Warning: The repository at location /media/calyx/borg-angela was previously located at /media/calyx/borg
Do you want to continue? [yN] y
angela-2015-10-15                    Thu, 2015-10-15 20:18:07 [3273fdc18f8a6a0622d52fea5fa7088f7d1c040e787a20a15182ef743a53884a]
angela-2016-11-22                    Tue, 2016-11-22 18:36:58 [fbac7796fa580890d972da0b5dbcceb792744b1e6c5bafc159aaa63ed5ce41b6]
curie-2017-09-18                     Mon, 2017-09-18 18:49:05 [a8e51d8a2fe3d42bc0f7cafc576dbfa55ee362bc2e38256dd5afff4ef4bf924d]
curie-2018-08-22                     Wed, 2018-08-22 12:02:15 [4b5e7f0558f74a37b8cbb0036350120266fa8c2c023f92c33caf5e338a976dc8]
angela-2018-10-23                    Tue, 2018-10-23 21:01:35 [8f958138dcb4079ab6dc317eb32235dc95abef9ab83107a6d17286587a2b8dbc]
curie-2018-11-14                     Wed, 2018-11-14 18:00:45 [3439b8d4dd2b6c82849e33d09b11c27bd7d1d78ea40d8e480b6eccb6f5ce75b6]
curie-2018-12-31                     Mon, 2018-12-31 15:11:25 [495fbcdaa375675651caac869ef1bdbb5e8747cfa17e7ab157e611dd31124f5b]

@anarcat anarcat changed the title working on a hardlinked repository is a bad idea hardlinked repositories: a bad idea? Jan 28, 2019

@anarcat

This comment has been minimized.

Copy link
Contributor Author

anarcat commented Jan 28, 2019

For what it's worth, here's the result of the repair on the other repository:
angela-repair.txt

... looks like the repair worked fine, which is interesting. i'm running a backup now to see if things work correctly there as well. presumably those Replacing with all-zero chunk. chunks will be fixed if the original files still exist? (and they do)

@ThomasWaldmann

This comment has been minimized.

Copy link
Member

ThomasWaldmann commented Jan 29, 2019

borg-angela: looks like you have 2 zero-length segment files and lost the data that should be in there.

i can not explain why these are zero-length, but as you are not sure that the repo was ok before you did your misc. operations on it, I'ld suggest you try to reproduce the problem starting from a known-valid repo. If you can reproduce, open a new ticket.

@ThomasWaldmann

This comment has been minimized.

Copy link
Member

ThomasWaldmann commented Jan 29, 2019

The recover_segment code obviously needs to catch the zero-length case though, this is a bug. It does not even need to try to recover if there is nothing left there.

@ThomasWaldmann ThomasWaldmann added the bug label Jan 29, 2019

@ThomasWaldmann ThomasWaldmann added this to the 1.1.9 milestone Jan 29, 2019

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Jan 29, 2019

recover_segment: handle 0-byte segment file correctly, see borgbackup…
…#4272

nothing left to recover there, but at least we must not crash in mmap().

@ThomasWaldmann ThomasWaldmann self-assigned this Jan 29, 2019

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Jan 29, 2019

recover_segment: handle too small segment files correctly, see borgba…
…ckup#4272

nothing left to recover there, but at least we must not crash in mmap().

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Jan 29, 2019

recover_segment: handle too small segment files correctly, see borgba…
…ckup#4272

nothing left to recover there, but at least we must not crash in mmap().
@ThomasWaldmann

This comment has been minimized.

Copy link
Member

ThomasWaldmann commented Jan 29, 2019

borg.helpers.IntegrityError: Data integrity error: Invalid segment entry
   header [segment 8, offset 2161]: unpack requires a buffer of 41 bytes

This is a "short read". borg tried to read 41 bytes from that offset (length of a PUT header), but obviously it got less than that.

@ThomasWaldmann

This comment has been minimized.

Copy link
Member

ThomasWaldmann commented Jan 29, 2019

Rather fundamental problems one runs into when copying repos and working on multiple of the copies independently:

  • same repo id, client uses the same cache storage for 2 repos

Even if one fixes above by tweaking the repoid, one still has:

  • crypto issues, counter reuse in AES-CTR mode
@ThomasWaldmann

This comment has been minimized.

Copy link
Member

ThomasWaldmann commented Jan 29, 2019

I guess the unexpected crash when reading the manifest for the repo for borg list comes exactly from that: it used the cached chunks index of the other repo and tried to read data from a location that wasn't valid for this repo.

So the root cause here was not changing the repo id to something unique.

But even when doing that, there is still the counter reuse security issue.

So maybe one just should not do that.

@anarcat

This comment has been minimized.

Copy link
Contributor Author

anarcat commented Jan 29, 2019

I see. That makes sense! Those repositories are in cleartext, so counter reuse is not a problem. If I understand this right, the repo id is generated with something like:

from binascii import hexlify
import os
hexlify(os.urandom(32)).decode('ascii')

... is that right?

I have just rerun borg check on the borg-curie-corrupt-work repository and it found errors again, although it did not crash, presumably because the borg-angela one wiped out the brokenness...

I'll try to rename the repository and run a check again on both sides.

@ThomasWaldmann

This comment has been minimized.

Copy link
Member

ThomasWaldmann commented Jan 29, 2019

Could be. I'ld just change 1 random digit, that's good enough for 1 copy.

@anarcat

This comment has been minimized.

Copy link
Contributor Author

anarcat commented Jan 29, 2019

i tried renaming the borg-curie-corrupt-work after changing the ID and it still can't recover:

root@curie:/media/calyx# borg check --progress --verbose --repair borg-curie-corrupt-work
'check --repair' is an experimental feature that might result in data loss.
Type 'YES' if you understand this and want to continue: YES
Starting repository check
Data integrity error: Invalid segment magic [segment 15102, offset 0]                                                                                                  
attempting to recover /media/calyx/borg-curie-corrupt-work/data/1/15102
Local Exception
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 911, in check
    objects = list(self.io.iter_objects(segment))
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 1335, in iter_objects
    raise IntegrityError('Invalid segment magic [segment {}, offset {}]'.format(segment, 0))
borg.helpers.IntegrityError: Data integrity error: Invalid segment magic [segment 15102, offset 0]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4436, in main
    exit_code = archiver.run(args)
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4368, in run
    return set_ec(func(args))
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 152, in wrapper
    return method(self, args, repository=repository, **kwargs)
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 313, in do_check
    if not repository.check(repair=args.repair, save_space=args.save_space):
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 916, in check
    self.io.recover_segment(segment, filename)
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 1364, in recover_segment
    with mmap.mmap(backup_fd.fileno(), 0, access=mmap.ACCESS_READ) as mm:
ValueError: cannot mmap an empty file

Platform: Linux curie 4.19.0-1-amd64 #1 SMP Debian 4.19.12-1 (2018-12-22) x86_64
Linux: debian buster/sid 
Borg: 1.1.8  Python: CPython 3.7.2
PID: 21970  CWD: /media/calyx
sys.argv: ['/usr/bin/borg', 'check', '--progress', '--verbose', '--repair', 'borg-curie-corrupt-work']
SSH_ORIGINAL_COMMAND: None

notice the failed mmap on an empty file...

the check on angela is worrisome as well:

root@curie:/media/calyx# borg check --progress --verbose borg-angela
Starting repository check
Starting repository index check                                                                                                                                        
Completed repository check, no problems found.
Starting archive consistency check...
Analyzing archive angela-2015-10-15 (1/7)
home/anarcat/Maildir/.Archives.2012/1325800475.19901_0.marcos:2,S: Previously missing file chunk is still missing (Byte 5630228-5800691). It has a all-zero replacement chunk already.
[... repeated ...]
Analyzing archive angela-2016-11-22 (2/7)
Analyzing archive curie-2018-08-22 (3/7)
Analyzing archive angela-2018-10-23 (4/7)
Analyzing archive curie-2018-11-14 (5/7)
Analyzing archive curie-2018-12-31 (6/7)
Analyzing archive angela-2019-01-28 (7/7)
Archive consistency check complete, no problems found.

does that mean the files are still missing from the archive? and why does it say "no problems found" even if there are zero'd out files?

sorry to be throwing all of that at you like that, I hope that's useful!

@ThomasWaldmann

This comment has been minimized.

Copy link
Member

ThomasWaldmann commented Jan 29, 2019

mmap: already fixed, see PR.
if mmap crashes, you likely have 0 byte segment files.
borg does not create 0 byte files, the minimum is 17b.

the no problems found refers to new problems (in this check run).

@anarcat

This comment has been minimized.

Copy link
Contributor Author

anarcat commented Jan 29, 2019

you mean PR #4273?

couldn't those empty files be related with with the zero'd out files cleaned up earlier?

@ThomasWaldmann

This comment has been minimized.

Copy link
Member

ThomasWaldmann commented Jan 29, 2019

yes.

all-zero replacement chunks do not have zero size, but \x00 bytes as content.
length is same as the lost chunk they replace (thus >0).

@anarcat

This comment has been minimized.

Copy link
Contributor Author

anarcat commented Jan 29, 2019

alright, then i need to test the patch to see if it can repair the repository... it might take a while, my borg-fu is rusty. ;)

ThomasWaldmann added a commit that referenced this issue Jan 29, 2019

Merge pull request #4273 from ThomasWaldmann/fix-empty-segment-crash
recover_segment: handle too small segment files correctly, see #4272

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Jan 29, 2019

recover_segment: handle too small segment files correctly, see borgba…
…ckup#4272

nothing left to recover there, but at least we must not crash in mmap().
@ThomasWaldmann

This comment has been minimized.

Copy link
Member

ThomasWaldmann commented Jan 29, 2019

i've merged it now. use 1.1-maint branch, not master.

@ThomasWaldmann

This comment has been minimized.

Copy link
Member

ThomasWaldmann commented Jan 29, 2019

fixed by #4273, #4275. does not apply to 1.0-maint.

@anarcat

This comment has been minimized.

Copy link
Contributor Author

anarcat commented Jan 30, 2019

thinking back on this issue, shouldn't we add a warning about hardlinked/copied repositories in the FAQ? i'd be happy to provide a PR for that if you wish.

@ThomasWaldmann

This comment has been minimized.

Copy link
Member

ThomasWaldmann commented Jan 30, 2019

like "don't do that, use borg init"?

yes. maybe check if there is something already (like mentioning rsync).

@anarcat

This comment has been minimized.

Copy link
Contributor Author

anarcat commented Jan 30, 2019

well this FAQ explicitely says:

There is no special borg command to do the copying, just use cp or rsync if you want to do that.

so there's definitely something more special than just cp, from what i understand. :)

@ThomasWaldmann

This comment has been minimized.

Copy link
Member

ThomasWaldmann commented Jan 30, 2019

That FAQ section must mention that the only legitimate purpose of such a copy is to have a copy, but that one must never use borg to access multiple copies.

@anarcat

This comment has been minimized.

Copy link
Contributor Author

anarcat commented Jan 30, 2019

PR done, in #4285. I thought of detailing the config id hack, but figured you might not like that and stuck with your last statement instead.

ThomasWaldmann added a commit that referenced this issue Feb 1, 2019

Merge pull request #4275 from ThomasWaldmann/fix-empty-segment-crash-…
…master

recover_segment: handle too small segment files correctly, see #4272

ThomasWaldmann added a commit that referenced this issue Feb 10, 2019

add warnings on repository copies to avoid future problem like #4272 (#…
…4285)

docs: add warnings about repo copies to avoid problems like #4272

ThomasWaldmann added a commit that referenced this issue Feb 10, 2019

Merge pull request #4339 from ThomasWaldmann/anarcat-faq-1.1
add warnings on repository copies to avoid future problem, fixes #4272
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment