Dry Run Statistics, please implement or document #3298

Closed
Code7R opened this issue Nov 7, 2017 · 9 comments
Code7R commented Nov 7, 2017

Hi,

this is a follow-up to #265.

I don't agree with the way it was closed. I was tricked by the same assumption shown there, and there is STILL neither documentation nor an implementation of this functionality.

Seriously, I do expect that "--stats -C none ..." would give me some useful information. That is what I get from rsync when used for a similar purpose ("rsync -a --stats --checksum ..."), and such a limitation in borg seems weird, frankly speaking.


ThomasWaldmann commented Nov 8, 2017

The major use case for borg create --dry-run was to optimize the exclude options.

Thus, it doesn't do much except recurse over the input directories and do pattern matching on the file names. It does that rather fast, though, by bailing out quickly before the real processing of a file begins (reading, chunking, hashing, updating the repo). Adding --stats thus gives no useful information.

Adding a useful --dry-run --stats functionality is not easily possible, I think:

  • dry-run requires no changes to the repo
  • the deduplication processing (and stats) only works if the repo, chunks index, etc. is updated
  • compression processing (and stats) only works if stuff is actually compressed (taking quite some time)

So, there is only one slightly "dirty" option coming to mind:
It could do a normal borg create run and just not commit the result.
That would give correct stats, but also potentially need a lot of resources (time, space, ...), so I am not sure we want that. If one wants that, one could also run a normal borg create and delete the created archive again, which would be a lot cleaner (and already works with the current code).
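
A minimal sketch of that "create and delete again" workaround (the repo path, archive name and data path are just placeholders):

    # real run, only to obtain the stats (this reads, chunks, compresses and stores everything)
    borg create --stats /path/to/repo::stats-test /home/user/data
    # afterwards, throw the test archive away again
    borg delete /path/to/repo::stats-test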

So, we could just add code that outputs that --stats is not available together with --dry-run.

ThomasWaldmann added this to the 1.1.3 milestone Nov 8, 2017

Code7R commented Nov 8, 2017

The major use case for borg create --dry-run was to optimize the exclude options.

Well, this is exactly what I had in mind. Make a dry run and estimate whether the data would fit on the target filesystem (even in the worst case, without compression).

Actually, since I was starting with an existing borg archive (which contained an old/similar version of the same source data), I would also like to know whether it's worth keeping the old repo or restarting from scratch.

If you say that making a "test commit" and dropping it later is totally harmless, then I agree with your conclusion. But please document it in a more appropriate way, ideally in the manpage and also in the program output.

PS: and I still don't understand what you tried to say with "The major use case was ... to optimize the exclude options". Was this a use case for regular users or for developers? From a user's point of view, this feature basically does NOT work. There is no way for me to make a QUICK check to see whether my exclude file worked or not. I.e. even if I do RTFM and run something like borg create --stats --exclude-from exclude.stuff.txt --verbose $PWD::$(date +%F) /src, it runs for hours (the storage medium is slow) and only then can I see whether the patterns in exclude.stuff.txt were correct or not. It's just a PITA.

@ThomasWaldmann

If you run borg create --dry-run --list --exclude ... you will get a list of all files it would back up.

By just looking through that list, you can usually:

  • spot some files you don't want to back up: downloads, ISOs here and there, unimportant VM images, cache directories, ...
  • check whether any include/exclude patterns you have used do really work (it's easy to have typos or syntax issues there)

That's what it was made for. Not for estimating dedup or compression efficiency.
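
For instance, a quick sketch of that kind of check, reusing the exclude file name mentioned above (repo path, archive name and source directory are placeholders):

    # dry run: only walks the directories and matches the patterns, file contents are not read
    borg create --dry-run --list --exclude-from exclude.stuff.txt /path/to/repo::pattern-test ~/src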


milkey-mouse commented Nov 10, 2017

So has this issue been reduced to:

  • telling argparse that --dry-run and --stats are mutually exclusive, and
  • documenting (in the FAQ or borg create --help) that one should use a temp archive to check dedup?

I do see the use case of running compression/dedup and throwing away the result as we go, though. As documented (and in my personal experience), borg does not handle running out of space well (if we do add a --dry-run & --stats mode, that should probably be mentioned there). Being able to estimate the amount of space an archive would take (or documenting some workaround for doing such a test) would mitigate the space problem.

@ThomasWaldmann

@milkey-mouse yes, those are the 2 TODOs here.

Trying to cope with too little repo space just by tuning dedup or compression is a somewhat futile attempt anyway. One might be able to fit the initial backup in there, but usually one wants to use a repo over a longer time, and if it is that tight, it will run full rather soon.

borg 1.1 handles (near and really) out-of-space situations better than 1.0.

@milkey-mouse

Trying to cope with too little repo space just by tuning dedup or compression is a somewhat futile attempt anyway

So are you saying resistance is futile? ;)

@ThomasWaldmann

Yes. And we need more space. :D


Code7R commented Nov 10, 2017

2 todos... almost there.

I added #3306 too. The main part of the use case for this whole issue was to test exclude patterns, and to do that quickly. And there is apparently no way to do that. Just imagine having a huge directory and a slow backup disk. Now it takes you a couple of hours, and then you see: oh sh*t, the exclude patterns didn't work and it started adding the useless binary folders which you wanted to exclude... and what now? Let it run for another five hours? Anyhow, two hours of time and energy wasted. :-(

@ThomasWaldmann

@Code7R to see whether exclude patterns work, you can proceed as I have already described there: #3298 (comment) - and this is very quick (as far as borg is concerned, since it does not process file contents). So I suggest you try that.
