per archive total item count limit #1452

Closed
ThomasWaldmann opened this Issue Aug 8, 2016 · 13 comments

Member

ThomasWaldmann commented Aug 8, 2016

the archive item (which holds a list of "item metadata stream" chunks) is currently stored as a single repo object.

we limit the size of repo objects to MAX_OBJ_SIZE (20MB).
note: so far we only enforced the limit on get(), not on put().

each list entry takes ~40 bytes (32 for the sha256 id + 2 * 4 for size / csize), maybe a bit less once msgpacked.

so, that means we can reference ~500,000 item metadata stream chunks per archive.

each item metadata stream chunk is ~16KiB (due to special, more fine grained and hardcoded item metadata chunker settings).

so that means the whole item metadata stream is limited to ~8GB.

the item entry size is determined by:

  • file size (number of chunks * 40B)
  • path lengths
  • amount of ACLs, xattrs
  • a constant small amount of file metadata (timestamps, uid/gid, mode, ...)

if the average size of an item entry is 100B (small file, no ACLs/xattrs), that means we have a limit of ~80M files/directories per archive. if the average size of an item entry is 2kB (~100MB files), that means we have a limit of ~4M files/directories per archive.

we need a short term change to relax this limit and also a long term change to resolve it completely. this issue is about the short term change.
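
The arithmetic above can be reproduced with a short sketch. The constants are the rounded figures quoted in this comment (decimal 20MB is an assumption; the real constant in the code may be defined differently), and `max_items` is a hypothetical helper, not a borg function.

```python
# Rough estimate of the per-archive limits, using the rounded numbers
# from this comment. These names are illustrative, not borg's API.
MAX_OBJ_SIZE = 20 * 1000 * 1000   # 20MB repo object limit (assumed decimal)
ENTRY_SIZE = 40                   # 32-byte id + 2 * 4 bytes for size / csize
META_CHUNK_SIZE = 16 * 1024       # ~16KiB item metadata stream chunks

# How many metadata chunk references fit into one archive object.
max_chunk_refs = MAX_OBJ_SIZE // ENTRY_SIZE           # 500000

# Upper bound for the whole item metadata stream.
max_stream_bytes = max_chunk_refs * META_CHUNK_SIZE   # 8192000000 (~8GB)


def max_items(avg_item_bytes):
    """Items that fit into the stream for a given average item entry size."""
    return max_stream_bytes // avg_item_bytes


print(max_items(100))    # 81920000 (~80M items)
print(max_items(2000))   # 4096000 (~4M items)
```

Msgpack overhead, compression and chunk size variance are ignored here, so these are order-of-magnitude numbers only.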

also, this limit should be documented.

@ThomasWaldmann ThomasWaldmann changed the title from "archive item size limit imposed limit on total item count in archive" to "per archive total item count limit" Aug 8, 2016

Member

ThomasWaldmann commented Aug 8, 2016

Note: attic used the same 64KB chunk size for data and metadata. At some point, borg made the data chunker params user configurable and raised the default data chunk size to 2MB. Later, the item metadata chunker got separate (hardcoded) chunker params and a 16KB chunk size for better / more fine grained metadata dedup.

So, we could raise the chunk size for the metadata stream, e.g. to 128KB - increases limit by 8x and is easy to do.

Member

ThomasWaldmann commented Aug 8, 2016

related: #1453

Contributor

enkore commented Aug 9, 2016

https://mail.python.org/pipermail/borgbackup/2016q3/000357.html

Bump max_object_size and such? Or change the metadata chunker params. The latter is more compatible.

In this case we have only 4.6 million files.

Member

ThomasWaldmann commented Aug 9, 2016

Hmm, 8GB max item metadata stream size for 4.6M files means almost 2kB per msgpacked item.

Why are items that big in this case?

Member

ThomasWaldmann commented Aug 9, 2016

Oh, I guess we forgot that the item metadata of course also contains the item's chunks list, again at ~40B per chunk. So if we have ~50 chunks per item, that is 2kB. 50 chunks with default chunker settings correspond to ~100MB file size.
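
A minimal sketch of that estimate, assuming chunks hit the ~2MB default target exactly (the real chunker is content-defined, so actual chunk sizes vary). `item_entry_size` and `base_metadata` are hypothetical names used for illustration only.

```python
import math

CHUNK_REF_SIZE = 40                  # ~40B per chunk list entry
DATA_CHUNK_TARGET = 2 * 1024 * 1024  # ~2MB default data chunk target size


def item_entry_size(file_size, base_metadata=0):
    """Approximate item entry size: fixed metadata plus the chunks list.

    base_metadata stands in for path, timestamps, uid/gid, mode, ACLs and
    xattrs; it is an assumed parameter, not something borg exposes.
    """
    n_chunks = math.ceil(file_size / DATA_CHUNK_TARGET)
    return base_metadata + n_chunks * CHUNK_REF_SIZE


# A 100 MiB file splits into ~50 chunks, so the chunks list alone is ~2kB.
print(item_entry_size(100 * 1024 * 1024))   # 2000
```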

Contributor

enkore commented Aug 9, 2016

Since lots of these should be hard links, they should have fewer chunks. 320 GB / 4.6 million is ~70 kB per file, and the 4.6 million even includes hard links. Metadata isn't shared among hard links, so maybe it is ACLs or xattrs?
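
To make the mismatch concrete, here is a quick back-of-the-envelope check. It assumes the ~2MB default data chunk target and the ~8GB stream limit mentioned earlier in this thread; all names are illustrative.

```python
import math

TOTAL_BYTES = 320 * 10**9            # 320 GB of file data
N_FILES = 4_600_000                  # 4.6 million files (incl. hard links)
DATA_CHUNK_TARGET = 2 * 1024 * 1024  # assumed ~2MB default chunk target
CHUNK_REF_SIZE = 40                  # ~40B per chunk list entry
MAX_STREAM_BYTES = 8 * 10**9         # ~8GB item metadata stream limit

avg_file_size = TOTAL_BYTES / N_FILES                        # ~69.6 kB
chunks_per_file = math.ceil(avg_file_size / DATA_CHUNK_TARGET)
chunk_list_bytes = chunks_per_file * CHUNK_REF_SIZE          # bytes per item

# Average item size implied by hitting the limit with this many files.
observed_item_bytes = MAX_STREAM_BYTES / N_FILES             # ~1.7 kB

print(round(avg_file_size), chunk_list_bytes, round(observed_item_bytes))
```

With roughly one chunk per file, the chunks list accounts for only ~40B of the ~1.7kB per item, so the bulk would have to come from other metadata.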

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Aug 9, 2016

larger item metadata stream chunks, fixes borgbackup#1452
increasing the mask (target chunk size) from 14 (16kiB) to 17 (128kiB).
this should reduce the amount of item metadata chunks an archive has to reference to 1/8.
this does not completely fix borgbackup#1452, but at least enables a 8x larger item metadata stream.

@ThomasWaldmann ThomasWaldmann self-assigned this Aug 9, 2016

enkore added a commit that referenced this issue Aug 9, 2016

Merge pull request #1459 from ThomasWaldmann/larger-items-stream-chunks
larger item metadata stream chunks, fixes #1452
Member

ThomasWaldmann commented Aug 9, 2016

Fixed (raised limit) by #1459.

With that, the per-archive items metadata stream should now be able to grow up to ~64GB size.

Documented by #1470.
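
As a rough sketch of where the ~64GB figure comes from, assuming the same rounded 20MB object limit and ~40B per chunk reference used earlier in this issue (`stream_limit` is a hypothetical helper, and `mask_bits` stands for the chunker mask exponent named in the commit message):

```python
MAX_OBJ_SIZE = 20 * 1000 * 1000   # 20MB repo object limit (assumed decimal)
ENTRY_SIZE = 40                   # ~40B per metadata chunk reference


def stream_limit(mask_bits):
    """Approximate max item metadata stream size for a given chunker mask.

    The target chunk size is taken as 2 ** mask_bits bytes.
    """
    target_chunk_size = 2 ** mask_bits
    return (MAX_OBJ_SIZE // ENTRY_SIZE) * target_chunk_size


old_limit = stream_limit(14)   # 16KiB chunks
new_limit = stream_limit(17)   # 128KiB chunks

print(old_limit)               # 8192000000  (~8GB)
print(new_limit)               # 65536000000 (~64GB)
print(new_limit // old_limit)  # 8
```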

Contributor

enkore commented Aug 10, 2016

ML: Total of 100.5 million files. Let's see whether the new limit suffices for that :D

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Aug 12, 2016

ThomasWaldmann added a commit that referenced this issue Aug 12, 2016

Member

ThomasWaldmann commented Aug 12, 2016

short term fix done, long term fix todo, see #1473.


gima commented Feb 18, 2017

I understood archive maximum size to be 64GiB (as per "docs -> internals -> archive limitations").

Yet archives much bigger were seen to be created in issue "#216 (backup a big amount of data)".

So which is it? Could this be clarified in the documentation as well:
What is the maximum archive / repository size?

Contributor

enkore commented Feb 18, 2017

The 64 GiB limit refers to the size of the packed + compressed metadata, not file contents.

"What is the maximum archive / repository size?"

There is no single answer. There are worst cases, like many small files (imagine a bunch of files that each contain just an increasing 4-byte counter, so every file holds a different 4 bytes) with lots of metadata (xattrs, ACLs, long paths; paths are practically unlimited in size), and there are best cases, like lots of duplicated contents.

enkore added a commit to enkore/borg that referenced this issue Feb 22, 2017

info: show utilization of maximum archive size
See borgbackup#1452

This is 100 % accurate.

Also increases maximum data size by ~41 bytes. Not 100 % side-effect free;
if you manage to exactly land in that area then older Borg would not read
it. OTOH it gives us a nice round number there.

Contributor

anarcat commented Sep 16, 2017

I wonder if the FAQ needs to be updated to reflect the changes here: http://borgbackup.readthedocs.io/en/stable/faq.html#are-there-other-known-limitations

In particular, is it worth mentioning that chunker params may affect the number of files that can be backed up?

Member

ThomasWaldmann commented Sep 16, 2017

Yes, using a smaller target chunk size leads to more chunk references per file and makes the metadata entries bigger.
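
A small sketch of that effect, assuming the ~64GB metadata stream limit and ~40B per chunk reference from earlier in this issue. `max_files`, `chunk_bits` and `base_metadata` are hypothetical names for illustration, and chunks are assumed to hit the target size exactly.

```python
import math

MAX_STREAM_BYTES = 64 * 10**9   # ~64GB item metadata stream limit
CHUNK_REF_SIZE = 40             # ~40B per chunk list entry


def max_files(avg_file_size, chunk_bits, base_metadata=100):
    """Approximate number of files that fit into one archive.

    chunk_bits is the exponent of the data chunker target size
    (2 ** chunk_bits bytes); base_metadata is an assumed fixed per-item
    cost for path, timestamps, mode and so on.
    """
    n_chunks = math.ceil(avg_file_size / 2 ** chunk_bits)
    item_bytes = base_metadata + n_chunks * CHUNK_REF_SIZE
    return MAX_STREAM_BYTES // item_bytes


avg = 100 * 1024 * 1024          # 100 MiB average file size

print(max_files(avg, 21))        # ~2MiB target: 50 chunk refs per file
print(max_files(avg, 16))        # ~64KiB target: 1600 chunk refs per file
```

For files much smaller than the target chunk size the difference disappears, since each file then needs only one chunk reference either way.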
