Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--resume-extract #327

Merged
merged 4 commits into from May 29, 2019
Merged

--resume-extract #327

merged 4 commits into from May 29, 2019

Conversation

gperciva
Copy link
Member

No description provided.

@gperciva
Copy link
Member Author

Example, with a weird example of an archive containing two copies of the same file:

$ ./tarsnap -tf random-100k-random-100k
random-100k
random-100k
$ rm random-100k
$ ./tarsnap --debug-network-stats -xf random-100k-random-100k
Connection ended with in / out:	204029	1185
$ rm random-100k
$ ./tarsnap --debug-network-stats -xf random-100k-random-100k --resume-extract
Connection ended with in / out:	103500	1039

@cperciva
Copy link
Member

Do we want to support resume-extract as a configuration file setting? It seems like it could be dangerous as a "set and forget" option -- unless there's a compelling reason it seems like it might be better to leave it as command-line-only until someone comes forward saying that they need to put it into a configuration file.

And if we're not supporting it as a configuration file option, we don't need --no-resume-extract.

@gperciva
Copy link
Member Author

I wondered about that, but then I asked myself the flipped version: if I think it's a risky thing to have as a "set and forget" option, then clearly I don't have confidence in it doing the right thing.

I'm happy to have it as command-line-only if you prefer that, though.

@cperciva
Copy link
Member

While I agree that --resume-extract should normally be safe, I can imagine situations where it could cause problems; for example, if someone archives (using @- functionality) synthetic tarballs created by tools (e.g., database dumps) which don't fill in timestamps. If someone deliberately uses --resume-extract when extracting such broken archives they deserve what they get; but I could imagine someone setting that in a configuration file and forgetting about it.

So for the sake of minimizing user error I think let's keep it as command-line-only... at least until someone steps forward and says "I really wish I could put this into my configuration file because ".

@gperciva
Copy link
Member Author

Ok, removed resume-extract from a config file option. Rebased history.

tar/bsdtar.c Outdated
@@ -1779,6 +1782,11 @@ dooption(struct bsdtar *bsdtar, const char * conf_opt,

bsdtar->option_quiet = 1;
bsdtar->option_quiet_set = 1;
} else if (strcmp(conf_opt, "resume-extract") == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code here is for processing configuration file options. Why do we need it?

@@ -624,6 +624,11 @@ Ignore any
\fBquiet\fP
option specified in a configuration file.
.TP
\fB\--no-resume-extract\fP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed any more?

@@ -561,6 +561,10 @@ option specified in a configuration file.
Ignore any
.Cm quiet
option specified in a configuration file.
.It Fl -no-resume-extract
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed any more?

@cperciva
Copy link
Member

Looks like there's still some --no-resume-extract scattered around.

@gperciva
Copy link
Member Author

Sorry, was distracted by a viola gig.

  • git grep no-resume now returns nothing
  • I added (x mode only) to the man-page
  • fixed a minor typo in NEWS ("an --dump-config" -> "a --dump-config")

tar/read.c Outdated
/* Don't extract if the file already matches it. */
if (bsdtar->option_resume_extract) {
if ((r = check_skip_file(
archive_entry_pathname(entry), st)) == -1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How carefully did you check that archive_entry_pathname(entry) is the correct path to the file we would create? I think this is correct but with options like -C and -s and the handling of absolute paths and paths containing .. I want to be sure that we have this right and aren't accidentally comparing metadata against the wrong place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woah, -s looks wacky! Didn't occur to me to check that one, I'll do it soon.

@gperciva
Copy link
Member Author

gperciva commented Nov 13, 2018

I think this is working. I made sure that check_skip_file() came after edit_pathname(). See below for the tests; I think I covered all the weird cases.

Basic usage:

$ ./tarsnap -tf random-100k-random-100k
random-100k
random-100k
$ rm -f random-100k
$ ./tarsnap --debug-network-stats -xf random-100k-random-100k
Connection ended with in / out:	204029	1185
$ rm -f random-100k
$ ./tarsnap --debug-network-stats -xf random-100k-random-100k --resume-extract
Connection ended with in / out:	103500	1039
$ ./tarsnap --debug-network-stats -xf random-100k-random-100k --resume-extract
Connection ended with in / out:	2483	747
$

-C out/

$ mkdir out/
$ ./tarsnap --debug-network-stats -xf random-100k-random-100k --resume-extract -C out/
Connection ended with in / out:	103500	1039
$ 

-s

$ rm -f foo-100k
$ ./tarsnap --debug-network-stats -xf random-100k-random-100k --resume-extract -s "/random/foo/g"
Connection ended with in / out:	103500	1039
$ ./tarsnap --debug-network-stats -xf random-100k-random-100k --resume-extract -s "/random/foo/g"
Connection ended with in / out:	2483	747
$ 

-C out/ -s

$ mkdir out/
$ ./tarsnap --debug-network-stats -xf random-100k-random-100k --resume-extract -s "/random/foo/g" -C out
Connection ended with in / out:	103500	1039
$ ls out/
foo-100k
$

absolute paths combined with -C

$ ./tarsnap -tf random-100k-random-100k-absolute
/home/td/src/tarsnap/build/random-100k
/home/td/src/tarsnap/build/random-100k
$ rm -f random-100k
$ ./tarsnap --debug-network-stats -xf random-100k-random-100k-absolute --resume-extract -P -C out
Connection ended with in / out:	103598	1039
$ ./tarsnap --debug-network-stats -xf random-100k-random-100k-absolute --resume-extract -P -C out
Connection ended with in / out:	2581	747
td@mac: ~/src/tarsnap/build (resume-extract)
$ ls out/
home
td@mac: ~/src/tarsnap/build (resume-extract)
$ 

relative dirs

$ cd out/
$ ls
$ ../tarsnap --debug-network-stats -x -f random-100k-random-100k-updir --resume-extract -P
Connection ended with in / out:	103524	1039
$ ../tarsnap --debug-network-stats -x -f random-100k-random-100k-updir --resume-extract -P
Connection ended with in / out:	2507	747
$ ls
$ 

(x mode only)
Don't extract files whose filesize and mtime matches existing files on the
disk. Primarily used to resume an archive extraction which was interrupted.
.TP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth adding something along the lines of "This differs from -k in that --resume-extract will overwrite a file if the size or modification time do not match, as can happen if tarsnap is killed partway through extracting a file."

@@ -7,6 +7,8 @@
non-blank lines read from config files.
- tarsnap now gives an error if there are unused command-line arguments.
(i.e. "tarsnap -d -f a1 a2", where "a2" is unused.)
- tarsnap now accepts a --resume-extract option to skip extracting files whose
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to have a merge conflict fixed.

tar/read.c Outdated
goto err0;
}

/* Compare file size and mtime. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment here should point out that we're comparing the seconds of the mtime, not the complete timespec; some filesystems don't have sub-second timestamp precision, so comparing timespecs would produce a lot of false negatives.

@cperciva
Copy link
Member

Ok, looks good. I've confirmed by inspection of the libarchive source code that archive_entry_pathname is definitely the path to where we're going to write on disk.

@gperciva
Copy link
Member Author

gperciva commented Jan 1, 2019

Rebased to fix merge conflicts, but I left the two recent fixes to your comments are separate commits for easy examination.

Fun fact: I misread the POSIX and linux man-pages about stat(2). My brain just couldn't process that it was actually st_mtim instead of st_mtime. The latter is #defined as st_mtim.tv_sec for backwards compatibility on my desktop, so I didn't realize the discrepancy.

@gperciva
Copy link
Member Author

Updated to current master. I checked that all the tests in #327 (comment) work.

@cperciva cperciva merged commit 8e8e444 into master May 29, 2019
@gperciva gperciva deleted the resume-extract branch May 29, 2019 23:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants