Getting rid of internally soft-linked hard-links #855

enkore · 2016-04-07T15:52:23Z

Today in the Borg "Getting rid of…" show: soft-linked hard-links.

This distinction between "regular files" and "regular files with nlink>1" has been a bit of a troublemaker in various places, because it makes it hard to work on subsets of all items. The original solution with the 'source' attribute (only the first link carries the chunk list; the other links just refer to it via 'source') is nice because it avoids storing the chunk id list twice, and because it makes it straightforward to link all links together when extracting (the full archive, not a subset).

When working with subsets, this solution fails, and we kludged stuff together to make it work, but it ain't nice.
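
A minimal Python sketch of why subsets break under the 'source' scheme, assuming a simplified item layout (plain dicts with `path`, `chunks`, `source`). The helper names and chunk ids are hypothetical, not borg's code:

```python
# Hypothetical, simplified 'source' scheme: only the first link of a
# hard-link group (the master) carries 'chunks'; later links carry only
# 'source', the master's path.
items = [
    {"path": "a/file", "chunks": ["id1", "id2"]},  # master
    {"path": "b/link", "source": "a/file"},         # slave, no chunk list
]


def chunks_for(item, masters):
    """Return the item's chunk id list, following 'source' for slaves."""
    if "chunks" in item:
        return item["chunks"]
    # Works only if the master was already processed in this pass.
    return masters[item["source"]]["chunks"]


def extract(selected):
    """Map each selected path to its chunk id list, in archive order."""
    masters = {}
    result = {}
    for item in selected:
        result[item["path"]] = chunks_for(item, masters)
        if "chunks" in item:
            masters[item["path"]] = item
    return result


print(extract(items))  # full archive: both paths resolve to ['id1', 'id2']
try:
    extract(items[1:])  # subset without the master
except KeyError as exc:
    print("subset fails, master missing:", exc)
```

In this sketch, processing the full stream works because the master precedes its links; any filter that drops the master leaves the link without data.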

Ideas

- Let it be.
- Just put the "chunks" in every file and ignore 'source' except when extracting (to link 'em together).
  - The chunk id list will probably need more space, but for really large files the deduplication of the item metadata should kick in nicely.
  - 1.0 still does the right thing, but if we drop the compat code we have now, the troubles with old archives remain. We could shift the blame to "recreate" and reduce the compat code to that single occurrence.
- Could drop 'source' entirely (=> 1.0 would extract each link independently) and keep an index of hard links outside of the 'items' stream.
- ?
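
A minimal sketch of the second idea, assuming every item carries its own 'chunks' and 'source' is used only to decide whether to hard-link. The item layout, `extract` and `read_chunks` are hypothetical stand-ins, not borg's API:

```python
import os


def extract(selected, dest, read_chunks):
    """Extract selected items below dest.

    Every item has its own 'chunks'; 'source' (the master's path) is only
    used to recreate hard links between items that are both in `selected`.
    """
    first_path = {}  # hard-link group key -> first path written in this run
    for item in selected:
        target = os.path.join(dest, item["path"])
        os.makedirs(os.path.dirname(target), exist_ok=True)
        key = item.get("source", item["path"])  # master's path identifies the group
        if key in first_path:
            # Another member of the group was already written: link to it.
            os.link(os.path.join(dest, first_path[key]), target)
        else:
            # No group member extracted yet: the item is self-contained.
            with open(target, "wb") as f:
                f.write(read_chunks(item["chunks"]))
            first_path[key] = item["path"]
```

With a subset that contains two links but not their master, both links still get their data, and the second one becomes a hard link to the first.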

ThomasWaldmann · 2016-04-07T16:00:25Z

Guess we need to keep the compat code until we do another major release that requires running an upgrade procedure anyway. So, tag this "2.0"?

enkore · 2016-04-17T18:33:51Z

Ape had the good idea of doing this via recreate, i.e. removing all the hardlink_master cruft, implementing a clean solution, and keeping the old handling only in recreate (where it's one of the simpler variants, especially compared with diff!). I'd say this could become a feasible option for 1.2 or 1.3 if recreate has proven reliable in 1.1.

ThomasWaldmann · 2016-08-20T18:23:28Z

See also #1473: one reason the problem there occurs relatively early is that the chunk list of a file is contained in the ITEM, making the item big (the other reason is having extremely many items).

If we moved the chunk list into an INODE (and referenced the inode object from the item), #1473 would be relaxed considerably, as the item metadata stream would shrink a lot.
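
A rough estimate of how much the chunk list adds to an ITEM, assuming ~2MiB chunks and the ~42 bytes per chunk reference implied by the "20MiB holds ~500.000 references" figure in this thread. The 32-byte size of a single INODE reference is an assumption based on borg's 256-bit object ids, ignoring serialization overhead:

```python
MiB = 2**20
GiB = 2**30

CHUNK_SIZE = 2 * MiB              # ~2MiB per content chunk (from this thread)
REF_BYTES = 20 * MiB / 500_000    # ~42 bytes per chunk reference (implied)
INODE_REF_BYTES = 32              # assumption: one 256-bit object id


def chunk_list_bytes(file_size):
    """Approximate serialized size of a file's chunk list inside its ITEM."""
    n_chunks = -(-file_size // CHUNK_SIZE)  # ceiling division
    return n_chunks * REF_BYTES


for size in (100 * MiB, 10 * GiB):
    print(f"{size // MiB:>6} MiB file: ~{chunk_list_bytes(size) / 1000:.1f} kB "
          f"of chunk refs in the ITEM vs {INODE_REF_BYTES} B for an INODE ref")
```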

Also, for this ticket, we could reference the same INODE object from multiple ITEMs to model hard links in a natural way.
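
A minimal sketch of hard links modeled through shared INODE objects, assuming items hold only a reference (`inode`) and the INODE holds the chunk list. The dict layout and ids are hypothetical:

```python
from collections import defaultdict

# Hypothetical INODE objects, addressed by id; each holds the chunk list.
inodes = {
    "inode-1": {"chunks": ["id1", "id2"]},
    "inode-2": {"chunks": ["id3"]},
}
# ITEMs carry the name and a reference to an INODE object.
items = [
    {"path": "a/file", "inode": "inode-1"},
    {"path": "b/link", "inode": "inode-1"},  # hard link: same INODE as a/file
    {"path": "c/other", "inode": "inode-2"},
]


def chunks_for(item):
    """Any item resolves its data directly, no matter which others are selected."""
    return inodes[item["inode"]]["chunks"]


def hardlink_groups(selected):
    """Group selected paths by INODE; groups with >1 path are hard links."""
    groups = defaultdict(list)
    for item in selected:
        groups[item["inode"]].append(item["path"])
    return {ino: paths for ino, paths in groups.items() if len(paths) > 1}
```

There is no master/slave distinction here: selecting b/link alone still yields its chunks, and extraction can link exactly the selected members of each group.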

Note that an INODE cannot be just 1 storage object (MAX_OBJECT_SIZE = 20MiB): that only fits ~500,000 object references, and with ~2MiB per file content chunk this would mean a file size limit of ~1TB, which is too low.
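
The arithmetic behind the ~1TB limit, using only the numbers given above (the per-reference size is derived from them, not measured):

```python
MiB = 2**20
MAX_OBJECT_SIZE = 20 * MiB   # max size of one storage object
REFS_PER_OBJECT = 500_000    # "~500,000 object references" per object
CHUNK_SIZE = 2 * MiB         # "~2MiB per file content chunk"

implied_ref_size = MAX_OBJECT_SIZE / REFS_PER_OBJECT  # bytes per reference
max_file_size = REFS_PER_OBJECT * CHUNK_SIZE          # bytes

print(f"{implied_ref_size:.1f} bytes/ref, max file {max_file_size / 1e12:.2f} TB "
      f"({max_file_size / 2**40:.2f} TiB)")
# prints: 41.9 bytes/ref, max file 1.05 TB (0.95 TiB)
```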

So, we could have a primary (small) list of object IDs in the ITEM, where each of these objects contains a secondary list of references to content objects. With n primary entries, the file size limit becomes n * ~1TB.

An optimization to avoid the indirection for small files: just let the primary list point directly to content objects (as it does now). This could also serve as the "compatibility mode".
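
A minimal sketch of the primary/secondary lists plus the small-file shortcut, assuming a dict stands in for the object store. `encode`, `decode`, `direct_limit` and the `idx-N` object ids are hypothetical (a real store would address objects by content hash):

```python
def encode(chunk_ids, store, direct_limit, refs_per_object):
    """Build the primary list that would be stored in the ITEM.

    Small files: the primary list points at content chunks directly
    (the current layout, usable as "compatibility mode").
    Large files: the primary list points at secondary objects, each holding
    up to refs_per_object content chunk ids.
    """
    if len(chunk_ids) <= direct_limit:
        return {"direct": True, "ids": list(chunk_ids)}
    primary = []
    for start in range(0, len(chunk_ids), refs_per_object):
        obj_id = f"idx-{len(store)}"  # hypothetical id, not content-addressed
        store[obj_id] = list(chunk_ids[start:start + refs_per_object])
        primary.append(obj_id)
    return {"direct": False, "ids": primary}


def decode(entry, store):
    """Recover the flat content chunk id list from a primary list."""
    if entry["direct"]:
        return list(entry["ids"])
    return [cid for obj_id in entry["ids"] for cid in store[obj_id]]


def max_file_size(n_primary, refs_per_object, chunk_size):
    """Indirect-mode limit: n primary entries, each worth refs_per_object chunks."""
    return n_primary * refs_per_object * chunk_size
```

Keeping both layouts behind a flag means old-style items (a direct list) decode unchanged; the `direct_limit` threshold is a tuning knob the thread does not specify.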

Note: I talked about an INODE above. In UNIX filesystems, the INODE usually also stores the file's metadata (everything except the name). We could discuss doing that, or we could implement just the block list part of an INODE.
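
The two options, sketched as hypothetical Python dataclasses (field names are illustrative, not borg's item keys). One argument for the full variant: in UNIX, all hard links to a file share its mode, owner and mtime, so keeping them in the shared INODE matches those semantics:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InodeFull:
    """UNIX-like: all metadata except the name lives in the INODE."""
    mode: int
    uid: int
    gid: int
    mtime_ns: int
    chunks: List[str] = field(default_factory=list)


@dataclass
class InodeBlocksOnly:
    """Minimal variant: only the block (chunk) list moves out of the ITEM."""
    chunks: List[str] = field(default_factory=list)


@dataclass
class Item:
    path: str
    inode: str  # id of the referenced INODE object
    # Only needed with InodeBlocksOnly; with InodeFull these live in the INODE.
    mode: Optional[int] = None
    uid: Optional[int] = None
    gid: Optional[int] = None
    mtime_ns: Optional[int] = None
```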

ThomasWaldmann · 2022-05-03T21:51:36Z

Closing in favour of #2325.

enkore added the later and breaking labels Apr 7, 2016

ThomasWaldmann mentioned this issue Mar 22, 2017 in #2325 "hardlink slave items, add chunk list?" (Closed)

ThomasWaldmann added this to the 2.0 - future goals milestone Mar 29, 2017

enkore added this to Doing things differently in breaking Jul 20, 2017

ThomasWaldmann mentioned this issue Apr 15, 2022 in #6602 "borg2: it's coming!" (Open)

ThomasWaldmann moved this from misc to archive / item in breaking Apr 16, 2022

ThomasWaldmann closed this as completed May 3, 2022

ThomasWaldmann removed this from archive / item in breaking May 11, 2022
