Transfer size stats #280

Open
mzealey opened this issue May 3, 2019 · 4 comments

@mzealey

mzealey commented May 3, 2019

It would be nice (perhaps when running btrbk -v run) to include details about how many bytes were sent, which is presumably roughly equivalent to how much space the snapshot takes up relative to the previous one. Perhaps this could also be saved somewhere and output in the stats, so you can see roughly what the deltas are between snapshots?

@digint
Owner

digint commented May 16, 2019

I see two ways of implementing this:

  1. Add a command to the pipe (between btrfs send and btrfs receive), measuring the "transferred size of btrfs-send". This might not really reflect the size used on the target, but at least gives some magnitude.
    In order not to add too much to the pipe, I tried using mbuffer -v 2 (already in the pipe when using stream_buffer) which prints a summary. Sadly this does not work as mbuffer prints the status to the controlling terminal instead of file descriptor 2, making it impossible to catch from btrbk.
    Another approach would be to add dd (or any other command capable of printing a summary) to the pipe: this would introduce some more context switches and slow down things, but should work.

  2. A better approach would be to directly scan the target "received" subvolume. I've come up with a little script for this:

received-length.sh:

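# Subvolume to inspect (placeholder path)
SUBVOL=/path/to/subvolume
# Generation at which the subvolume was created
CGEN=$(btrfs subvolume show "$SUBVOL" | sed -n 's/\s*Gen at creation:\s*//p')
# List everything changed after that generation, take field 7 ("len"),
# join the values with "+" and let bc compute the sum
btrfs subvolume find-new "$SUBVOL" $((CGEN+1)) \
  | cut -d' ' -f7 \
  | tr '\n' '+' \
  | sed 's/\+\+$/\n/' \
  | bc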

This simply sums up the "len" field from all modified files since the creation of the subvolume. Works fine, as btrfs receive first makes a snapshot of the parent subvolume, then adds the files according to the send-stream.
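As an aside, the same sum can probably be computed in a single awk pass, which avoids building one long expression for bc. This is only a sketch: the function name sum_len is made up for illustration, and it assumes that find-new prints extent lines of the form "inode N file offset N len N ...", as implied by the cut -f7 in the script above.

```shell
# Hypothetical helper (not part of btrbk): sum the "len" values from
# `btrfs subvolume find-new` output read on stdin.
# Assumes each extent line looks like: inode N file offset N len N ...
# Lines without "len" in field 6 (e.g. the trailing "transid marker" line)
# are ignored.
sum_len() {
    awk '$6 == "len" { total += $7 } END { printf "%.0f\n", total }'
}

# Possible usage, with SUBVOL and CGEN set as in the script above:
#   btrfs subvolume find-new "$SUBVOL" $((CGEN+1)) | sum_len
```

Like the original pipeline, this counts logical extent lengths only, so the caveats listed under "Issues" still apply.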

Issues:

  • does not honor compressed size (?)
  • slow, needs to parse huge text (every file change gets listed)

I'm planning to implement this as a new btrbk command, something like btrbk list backup-size.

This needs some more investigation, maybe there's a nicer way to get the "real size used on disk".

@mzealey
Author

mzealey commented May 16, 2019 via email

@digint
Owner

digint commented May 24, 2019

> I would think option 1 would be a reasonable estimate and not need much in the way of overhead.

Yes, this is also valuable information. Especially when you want to also have an estimate of the ssh traffic generated by btrbk.
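A rough sketch of option 1 with dd in the pipe, as suggested earlier in this thread. The function names and the log-file handling are assumptions for illustration, not btrbk code. The parsing assumes that dd prints a summary line starting with the byte count followed by "bytes" (GNU and BSD dd do, but the format is not standardized).

```shell
# Hypothetical helpers (not part of btrbk).

# Copy stdin to stdout unchanged; dd's summary (stderr) goes to the file "$1".
# LC_ALL=C keeps the summary text untranslated so it can be parsed.
count_bytes() {
    LC_ALL=C dd bs=65536 2>"$1"
}

# Extract the byte count from a dd summary saved by count_bytes.
bytes_transferred() {
    awk '/bytes/ { print $1; exit }' "$1"
}

# Possible usage (variables and paths are placeholders):
#   btrfs send -p "$PARENT" "$SNAPSHOT" \
#     | count_bytes /tmp/send.log \
#     | btrfs receive "$TARGET"
#   bytes_transferred /tmp/send.log
```

This measures the size of the send stream, so it estimates traffic rather than the space actually used on the target.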

Having a command for listing (option 2) has the advantage that it is reproducible, and also works for manually generated backups.

The btrfs subvolume find-new approach above is not very accurate and gives only a rough estimate of what is really added on disk (it ignores deleted files, extents shared with clone sources, etc.).

For more accurate results, we need to do more extensive analysis on the block level, which unfortunately is very time consuming. I did some promising tests with extents-list, and implemented a very experimental btrbk extents-diff command on the extents-diff branch for testing.

@yarikoptic
Contributor

I was about to file a new issue asking for a new diffstat or diff -stat command, but it sounds like the desire is similar to the one discussed here: to see a summary of the differences between two snapshots (not only the total size of new/modified files, which I guess is what diff reports). Even if the reported sizes (deleted, added or modified) do not account for possible operations on CoW'ed files, that would already be useful information.

3 participants