Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deduplicated size of a given set of archives? #5741

Open
rom1v opened this issue Mar 20, 2021 · 1 comment
Open

Deduplicated size of a given set of archives? #5741

rom1v opened this issue Mar 20, 2021 · 1 comment

Comments

@rom1v
Copy link
Contributor

rom1v commented Mar 20, 2021

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes, in praticular https://borgbackup.readthedocs.io/en/stable/usage/info.html

Is this a BUG / ISSUE report or a QUESTION?

A question (or feature request).

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

borg 1.1.15

Operating system (distribution) and version.

Linux (Debian sid).

Describe the problem you're observing.

Let's create two directories to backup, each containing a duplicated file (respectively 10M and 100M):

$ mkdir userA userB
$ dd if=/dev/urandom of=userA/fileA bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.257317 s, 40.8 MB/s
$ dd if=/dev/urandom of=userB/fileB bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 2.54954 s, 41.1 MB/s
$ cp userA/fileA userA/fileA2
$ cp userB/fileB userB/fileB2
$ tree userA userB
userA
├── fileA
└── fileA2
userB
├── fileB
└── fileB2

0 directories, 4 files

Then, create an archive for userA and another for userB:

$ export BORG_REPO=borg
$ borg init -e none
$ borg create -Cnone ::userA-monday userA
$ borg create -Cnone ::userB-monday userB

The deduplicated size is (as expected) half the original size, for each user:

$ borg info ::userA-monday | grep -A2 Dedup
                       Original size      Compressed size    Deduplicated size
This archive:               20.97 MB             20.97 MB             10.49 MB
All archives:              230.69 MB            230.69 MB            115.35 MB
$ borg info ::userB-monday | grep -A2 Dedup
                       Original size      Compressed size    Deduplicated size
This archive:              209.72 MB            209.72 MB            104.86 MB
All archives:              230.69 MB            230.69 MB            115.35 MB

Now, let's create a new archive for each user (without any changes):

$ borg create -Cnone ::userA-tuesday userA
$ borg create -Cnone ::userB-tuesday userB

Now, the deduplicated size is basically 0 for all archives (if we remove exactly one archive, it won't save any space):

$ borg info ::userA-monday | grep -A2 Dedup
                       Original size      Compressed size    Deduplicated size
This archive:               20.97 MB             20.97 MB                433 B
All archives:              461.39 MB            461.39 MB            115.35 MB
$ borg info ::userA-tuesday | grep -A2 Dedup
                       Original size      Compressed size    Deduplicated size
This archive:               20.97 MB             20.97 MB                435 B
All archives:              461.39 MB            461.39 MB            115.35 MB
$ borg info ::userB-monday | grep -A2 Dedup
                       Original size      Compressed size    Deduplicated size
This archive:              209.72 MB            209.72 MB                433 B
All archives:              461.39 MB            461.39 MB            115.35 MB
$ borg info ::userB-tuesday | grep -A2 Dedup
                       Original size      Compressed size    Deduplicated size
This archive:              209.72 MB            209.72 MB                435 B
All archives:              461.39 MB            461.39 MB            115.35 MB

But in practice, we may still be interested in knowning the deduplicated size of all archives of userA. Is it possible to get this information easily?

Currently, it seems we can retrieve the global deduplicated size ("All archives"), or the deduplicated size of a single archive, but not that of a set of archives (typically for a given prefix). As a consequence, as soon as a new archive is created with few changes, the deduplicated size is meaningless in practice.

The filtering options of borg info just list individual archives matching the filter, but not the deduplicated size of the set of resulting archives:

$ borg info --prefix userA- | grep -A2 Dedup
                       Original size      Compressed size    Deduplicated size
This archive:               20.97 MB             20.97 MB                433 B
All archives:              461.39 MB            461.39 MB            115.35 MB
--
                       Original size      Compressed size    Deduplicated size
This archive:               20.97 MB             20.97 MB                435 B
All archives:              461.39 MB            461.39 MB            115.35 MB
@ThomasWaldmann
Copy link
Member

Yeah, for the archive's deduped size, borg always computes what this archive adds (size of all unique chunks [chunks ONLY used by this archive]) compared to the rest of the repo.

@ThomasWaldmann ThomasWaldmann changed the title Deduplicated size of a given prefix? Deduplicated size of a given set of archives? Mar 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants