per archive total item count limit #1452
Comments
Note: attic used the same 64 kB chunk size for data and metadata. At some point, borg made the data chunker params user-configurable and raised the default data chunk size to 2 MB. Later, the item metadata chunker got separate (hardcoded) chunker params with a 16 kB chunk size for better, more fine-grained metadata dedup. So we could raise the chunk size for the metadata stream, e.g. to 128 kB; that increases the limit by 8x and is easy to do.
related: #1453
Bump max_object_size and such? Or change the metadata chunker params. The latter is more compatible. In this case we have only 4.6 million files.
Hmm, 8 GB max item metadata stream size for 4.6 M files means almost 2 kB per msgpacked item. Why are the items that big in this case?
Oh, I guess we forgot there is of course also an item chunks list in the item metadata, at ~40 B per chunk. So if we have ~50 chunks per item, that is 2 kB. 50 chunks with the default chunker settings correspond to ~100 MB of file size.
Since lots of these should be hard links, they should have fewer chunks. 320 GB / 4.6 million is ~70 kB, and the 4.6 million even includes the hard links. Metadata isn't shared among hard links; maybe ACLs or xattrs?
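The back-of-the-envelope arithmetic in the exchange above can be checked directly; a minimal sketch, using only the rough numbers quoted in this thread (nothing here is read from borg itself):

```python
# Rough arithmetic behind the comments above; all numbers are the estimates
# quoted in this thread, not values read from borg.
CHUNK_REF_SIZE = 40            # ~32 B sha256 id + 2 * 4 B size/csize per chunk reference
ITEM_META_STREAM_LIMIT = 8e9   # ~8 GB item metadata stream limit (derived in the issue body below)

files = 4.6e6                  # files in the failing archive
data = 320e9                   # total data size in the failing archive

print(ITEM_META_STREAM_LIMIT / files)   # ~1.7 kB of metadata budget per msgpacked item
print(50 * CHUNK_REF_SIZE)              # ~2 kB just for a 50-entry chunk list
print(50 * 2e6)                         # ~100 MB file size at the 2 MB default target chunk size
print(data / files)                     # ~70 kB average file size (hard links included)
```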
increase the mask (target chunk size) from 14 (16 kiB) to 17 (128 kiB). this should reduce the number of item metadata chunks an archive has to reference to 1/8. this does not completely fix borgbackup#1452, but at least enables an 8x larger item metadata stream.
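For illustration, the commit amounts to bumping the hash mask bits of the hardcoded item metadata chunker. A hedged sketch of what that looks like, assuming borg's usual (min_exp, max_exp, mask_bits, window_size) chunker-params convention; the constant name and the min/max exponents here are assumptions, only the 14 to 17 mask change comes from the commit message above:

```python
# Chunker params follow borg's (chunk_min_exp, chunk_max_exp, hash_mask_bits, hash_window_size)
# convention; the target chunk size is 2 ** hash_mask_bits.

# before: hash mask bits 14 -> ~16 kiB target chunks for the item metadata stream
ITEMS_CHUNKER_PARAMS_OLD = (12, 16, 14, 4095)   # min/max exponents and window size are guesses

# after: hash mask bits 17 -> ~128 kiB target chunks, i.e. ~1/8 as many metadata chunks
ITEMS_CHUNKER_PARAMS_NEW = (15, 19, 17, 4095)

for name, params in (("old", ITEMS_CHUNKER_PARAMS_OLD), ("new", ITEMS_CHUNKER_PARAMS_NEW)):
    print(name, "target chunk size:", 2 ** params[2], "bytes")
```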
larger item metadata stream chunks, fixes #1452
ML: Total of 100.5 million files. Let's see whether the new limit suffices for that :D
document archive limitation, #1452
short-term fix done, long-term fix todo, see #1473.
I understood the archive maximum size to be 64 GiB (as per "docs -> internals -> archive limitations"). Yet much bigger archives were seen to be created in issue #216 (backup a big amount of data). So which is it? Could this be clarified in the documentation as well?
The 64 GiB limit refers to the size of the packed + compressed metadata, not file contents.
There is no single answer. There are worst cases, like many small files -- imagine a bunch of files that each just contain an increasing 4-byte counter, so every file's 4 bytes are different -- with lots of metadata (xattrs, ACLs, long paths -- paths are practically unlimited in size), and there are best cases like lots of duplicated content.
See borgbackup#1452. This is 100 % accurate. Also increases the maximum data size by ~41 bytes. Not 100 % side-effect free; if you manage to land exactly in that area, then older Borg would not read it. OTOH it gives us a nice round number there.
I wonder if the FAQ needs to be updated to reflect the changes here: http://borgbackup.readthedocs.io/en/stable/faq.html#are-there-other-known-limitations In particular, is it worth mentioning that chunker params may affect the number of files that can be backed up?
Yes, using a smaller target chunk size leads to more chunk references per file and makes the metadata entries bigger.
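To make that concrete, here is a small illustrative calculation (not borg code; the base item size and the 40 B per reference are rough assumptions from this thread) of how the data chunker's target chunk size feeds into the per-item metadata size for a given file:

```python
# Illustrative only: estimate the msgpacked item entry size for one file,
# given the data chunker's target chunk size (2 ** mask_bits).
CHUNK_REF_SIZE = 40     # rough bytes per chunk reference (id + size + csize)
BASE_ITEM_SIZE = 100    # assumed rough bytes for path, mode, timestamps, ...

def item_entry_size(file_size, mask_bits=21):
    target_chunk_size = 2 ** mask_bits          # 2 MB for the default mask of 21
    n_chunks = max(1, file_size // target_chunk_size)
    return BASE_ITEM_SIZE + n_chunks * CHUNK_REF_SIZE

# a 100 MB file with the default 2 MB target vs. a much smaller 64 kB target:
print(item_entry_size(100 * 2**20, mask_bits=21))   # ~50 chunk refs   -> ~2 kB entry
print(item_entry_size(100 * 2**20, mask_bits=16))   # ~1600 chunk refs -> ~64 kB entry
```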
the archive item (which has a list of "item metadata stream" chunks) is currently stored as a single object.
we limit the size of repo objects to MAX_OBJ_SIZE (20 MB).
note: we currently only enforce the limit on get(), not on put().
each list entry takes ~40 bytes (32 for the sha256 id + 2 * 4 for size / csize), maybe a bit less msgpacked.
so that means we can reference ~500,000 item metadata stream chunks per archive.
each item metadata stream chunk is ~16 kiB (due to the special, more fine-grained, hardcoded item metadata chunker settings).
so the whole item metadata stream is limited to ~8 GB.
the item entry size is determined by the number of chunk references (i.e. the file size), the path length, and the amount of extra metadata (xattrs, ACLs, ...).
if the average size of an item entry is 100 B (small files, no ACLs/xattrs), that means a limit of ~80 M files/directories per archive. if the average size of an item entry is 2 kB (files of ~100 MB), that means a limit of ~4 M files/directories per archive.
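Putting the steps above together, a minimal sketch of the whole derivation; the constants are the approximate values quoted in this issue, not imported from borg:

```python
# Derivation of the per-archive item limit, using the approximate numbers above.
MAX_OBJ_SIZE = 20 * 1000**2         # ~20 MB repo object size limit
CHUNK_REF_SIZE = 40                 # ~40 B per referenced item metadata stream chunk
ITEM_META_CHUNK_SIZE = 16 * 1024    # ~16 kiB per item metadata stream chunk

max_refs = MAX_OBJ_SIZE // CHUNK_REF_SIZE         # ~500,000 referenced chunks
max_stream = max_refs * ITEM_META_CHUNK_SIZE      # ~8 GB item metadata stream

for avg_entry in (100, 2000):                     # 100 B vs. 2 kB per msgpacked item
    print(avg_entry, "B/item ->", max_stream // avg_entry, "items per archive")
# -> ~80 M items at 100 B/item, ~4 M items at 2 kB/item
```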
we need a short-term change to relax this limit and also a long-term change to resolve it completely; this issue is about the short-term change.
also, this limit should be documented.