Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add options to extract only some files/components/... #12

Open
dscharrer opened this issue Dec 10, 2012 · 15 comments
Open

Add options to extract only some files/components/... #12

dscharrer opened this issue Dec 10, 2012 · 15 comments

Comments

@dscharrer
Copy link
Owner

Currently we only have --language (which should be extended to accept a list!).

Users should be able to limit th files to be extracted should by

  1. Filenames(+path) (--include/--exclude chains like with e.g. rsync)
  2. Raw, unprocessed filenames as they appear in innoextract --list --dump output
  3. Components defined in the setup file
  4. Languages defined in the setup file
  5. Tasks specified in the setup file
  6. Files can have "checks" - scripts that determine if the file gets extracted. For now we completely ignore them.

Additionally:

  • There should be some form of pattern matching for all of this (simple glob expressions might be enough)
  • All of this should also be possible to do with an include/exclude file
  • The order of the include statements will have to be important - boost::program_options doesn't allow this, but I want to replace that anyway
  • How should users specify to extract only a single component/file/language no matter what others there are --exclude * --include a (or in rsync order: --include a --exclude *) seems a bit akward.
  • How should includes/excludes for e.g. files interact with e.g. components?
    If component c has file f, should --exclude-component c --include f extract the file? Should the order matter?
  • Most files won't specify any language/component/task -- should these be excluded with --exclude-component *? Currently --language doesn't affect them.

This is all probably over-engineering the problem but I don't want whatever is implemented now to at least to allow adding all these points in the future without breaking the command-line interface.

@ssokolow
Copy link

At the very least, it'd be nice to be able to do --include app/* for excluding what's obviously not needed when using a GOG installer to feed an engine re-implementation like GemRB.

@dscharrer
Copy link
Owner Author

Depends on issue #25 for proper --include/--exclude support (keeping order between different arguments without hacks).

@a-detiste
Copy link
Contributor

Instead of silencing all warnings in game-data-packager when one file of the same name exists in several directories with a slew of distinctive_name: false; I'll have a look on how to fix that here.

This would also give a ~5x speedup for bundles like Space Quest 1,2,3 that got now extracted 3 times.

http://anonscm.debian.org/cgit/pkg-games/game-data-packager.git/tree/data/spacequest1.yaml

@smcv

a-detiste added a commit to a-detiste/innoextract that referenced this issue May 8, 2015
@a-detiste
Copy link
Contributor

I have now my own ultra-fast innoextract :-)

a-detiste@8cd04fb

Would you like a P.R. that only adds the bare minimum I need, as long as it
"allow adding all these points in the future without breaking the command-line interface." .

That means accepting only one --include argument for now, no exclude argument, no globbing,
but one can run innoextract several times if needed.

pi@raspberrypi ~/innoextract/build $ time ./innoextract -s -d /tmp/tmp/ /home/pi/setup_zork_anthology.exe

real    0m3.751s
user    0m3.570s
sys     0m0.170s
pi@raspberrypi ~/innoextract/build $ time ./innoextract --include Zork/DATA -d /tmp/tmp/ /home/pi/setup_zork_anthology.exe                                                                                                                            
Extracting "Zork Anthology" - setup data version 5.2.3
 - "app/Zork/DATA/ZORK1.DAT" (90 KiB)
Done.

real    0m0.206s
user    0m0.190s
sys     0m0.010s

@dscharrer
Copy link
Owner Author

Sure, a patch for just --include would be appreciated. However, here are a few minor nitpicks about your implementation:

  • You match the "pattern" against any path substring. I would prefer to only match full path components (or a series of path components) and then later add support for explicit wildcards. Basically "a/b" should match "a/b", "a/b/…", "…/a/b" and "…/a/b/…" but not "a/bc" or "ca/b".
  • I'd also like patterns starting with a slash to (eventually) only match at the beginning of the path (excluding the slash of course since the internal paths. You can send a patch without that, but I want to get that implemented (or disallow patterns starting with a slash) before making a release to prevent working scripts from unexpectedly failing in a future update.
  • Simply skipping the files means the progress bar total still includes them. Only a minor problem, can be fixed later.
  • Chunks are still opened, even if no file in them is read. This is slightly wasteful, but of course less so then extracting everything.

a-detiste added a commit to a-detiste/innoextract that referenced this issue May 9, 2015
@a-detiste
Copy link
Contributor

Ok, here's a much better patch.

a-detiste@136ee52?diff=unified

@a-detiste
Copy link
Contributor

This fix "match full path components" behaviour for "patterns starting with a slash".

work with WIN32 too, move one time checks out of loop

a-detiste@5ae4521
inno1

@dscharrer
Copy link
Owner Author

A coupe of points:

  1. The second commit breaks inc_root: You compare against inc_string with offset 1, but then use the whole inc_string.size()
  2. A pattern with a trailing slash matches paths that don't end in a trailing slash. I'm not sure if that is desirable or not.
  3. Can you change the if/else style to match the rest of the code?
    • Both preceding and following braces should always be on the same line as else
    • Use braces even for single-line ifs and elses

Sorry for not getting back to you sooner.

@a-detiste
Copy link
Contributor

  1. ok , thanks for reviewing this !
  2. I changed it so that supplied path ending with a slash never match anything,
    it may feel less user-friendly, but it's more predictable. (and "users" are mostly scripts anyway)
  3. I can smash the commit together and create a P.R. if you want.
    I'd rather not keep forever my messy history in the main three for no purpose.

From this experience, implenting support for several include & exclude doesn't seem so hard:

The include & exclude lists would be vectors of "typedef std::pair<bool, std::string> filter;"
(bool = inc_root , string = inc_string)
that get populated in main.cpp.

Then extract.cpp read those in two BOOST_FOREACH()
that sets a filtered = true flag when there is a match. Then there is a if (filtered) continue;

They wouldn't work the rsync way like stated up:

  • include = filter, only keep files that match one of these strings
  • exclude = remove remaining files matching any of the strings

For example:

  • -- include "app/manual.pdf"
  • --include "app/Space Quest 6"
  • --exclude "app/Space Quest 6/DRIVERS"
  • --exclude "app/Space Quest 6/VFW"
    would narrow down the extraction for this game

@dscharrer
Copy link
Owner Author

I can smash the commit together and create a P.R. if you want.

Yes, that would be great.

From this experience, implenting support for several include & exclude doesn't seem so hard:

The hard part would be that boost::program_options makes it hard to keep the relative order between two different option types. Multiple --includes should be easy though. Adding the two options while ignoring their relative order would mean that respecting the order in the future could break scripts. And I do want them to respect the order, especially once there are wildcards.

@ssokolow
Copy link

Have you considered using "most specific match wins" semantics like in CSS to control the interaction between include and exclude rules?

(eg. if you include "/", exclude "/app", and include "/app/manual.pdf", then the most intuitive interpretation is unambiguous regardless of order.)

a-detiste added a commit to a-detiste/innoextract that referenced this issue May 13, 2015
@dscharrer
Copy link
Owner Author

That requires a notion of "more specific" though. At lest with wildcards that won't be so intuitive anymore. And even CSS uses the order of definition when the specificity is the same.

More importantly though, "most specific match wins" can be easily implemented on top of "first match wins" by ordering the arguments by how specific they are. The other way around is not so easy, if at all possible.

@ssokolow
Copy link

Good point. I wrote that at the end of a jet-lagged day and I didn't think about what would happen when "most path components, then most characters" specificity calculation runs into two rules with the same number of path components and same length.

(eg. Specifying the same path both as an include and an exclude may be silly, but it shouldn't result in undefined behaviour)

dscharrer added a commit that referenced this issue Sep 22, 2015
dscharrer added a commit that referenced this issue Sep 22, 2015
@dscharrer dscharrer modified the milestones: 1.6, 1.5 Sep 24, 2015
@dscharrer dscharrer modified the milestones: 1.7, 1.6 Feb 23, 2016
@dscharrer dscharrer modified the milestones: 1.7, 1.8 Jun 9, 2018
@dscharrer dscharrer modified the milestones: 1.8, 1.9 Sep 9, 2018
@bam80
Copy link

bam80 commented Apr 28, 2019

This is highly welcomed feature:
https://framagit.org/vv221/play.it/issues/139#note_404696

@vv221
Copy link

vv221 commented Apr 28, 2019

@bam80
There is already --include (-I) available in current stable version of innoextract, but I think it can only be called once.
Being able to specify multiple paths and support for globbing would indeed be very nice improvements to have.

@dscharrer dscharrer removed this from the 1.9 milestone Aug 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants