7z-based (optional) replacement for patool #4041
Grrrr, this was supposed to be in draft mode. I see no way to achieve that post factum.
cd028b7 to d5c1939
Automagically fall back to patool if the 7z command is not available. I would maybe even have wrapped *compress_file so that, if an exception happens, the log announces (before re-raising) that an alternative method can be tried by changing that config setting. Otherwise it is buried, and nobody would discover that there is a config setting they could tune (until they ask).
would also be nice to
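A minimal sketch of the two suggestions above, in Python. All names here (`pick_backend`, `hint_on_failure`, the logger name) are hypothetical and are not DataLad's actual code; only the `datalad.runtime.use-patool` setting name is taken from this PR.

```python
import functools
import logging
import shutil

lgr = logging.getLogger("archive-sketch")  # hypothetical logger name


def pick_backend(use_patool=False, which=shutil.which):
    """Return "patool" if requested or if no 7z executable is found, else "7z"."""
    if use_patool or which("7z") is None:
        return "patool"
    return "7z"


def hint_on_failure(func):
    """Log a pointer to the config switch before re-raising any failure."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            lgr.error(
                "%s failed; consider toggling datalad.runtime.use-patool "
                "to try the alternative implementation",
                func.__name__,
            )
            raise  # bare raise propagates the original exception unchanged
    return wrapper
```

The `which` parameter exists only so the lookup can be replaced in tests; with the default, the decision depends on whether a `7z` executable is on `PATH`.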
Holy cow, this is a rabbit hole... I am so deep in |
Codecov Report
@@ Coverage Diff @@
## master #4041 +/- ##
=========================================
- Coverage 89.79% 89.7% -0.09%
=========================================
Files 273 274 +1
Lines 36577 36620 +43
=========================================
+ Hits 32843 32851 +8
- Misses 3734 3769 +35
I am reluctant to do this. We are pretty much in the same situation with patool. We can pull patool in via dependency, but we cannot pull in its dependencies (tar, zip, ...). Moreover, the reason why I am doing this to begin with is that this code is broken and should be removed or fixed. Hence, I went for a dedicated switch.
patool has fallbacks from 7z to e.g. zip, jar, etc., so it could work on systems without 7z. You might fix some bugs (on Windows) but would introduce datalad crashes on others (those without 7z, which is generally not installed by default, whereas zip etc. are better known and more likely to be installed). So I would strongly appreciate it (could buy a bottle) if we could avoid such crashes with a cost of a single |
I am planning to merge this relatively quickly: on Monday, or whenever the tests pass (whichever happens last). It is the foundation for more work on Windows, which will go in different directions and hence should not be added here.
I would leave this out for now, as a topic for further debate. I think we should not test this code for every PR. Maybe a cron job would suffice.
…taladgh-4019) Keep the old implementation around (by request of @yarikoptic), but require a switch (datalad.runtime.use-patool) to turn it on.
Also enable all tests on Windows.
The reason is the same as what made us use MD5E instead of MD5 as the default backend. If we purposefully remove an extension, mimetype inspection is left as the only option to decide on further processing (e.g. archive type). Possible, but needlessly complicated. An alternative fix would have been to not purposefully use an extension-less archive name in the associated tests.
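To illustrate why the extension matters, here is a small sketch using Python's standard `mimetypes` module, which guesses purely from the file name. The helper name `archive_type_from_name` is hypothetical and is not part of this PR.

```python
import mimetypes


def archive_type_from_name(path):
    """Guess (mimetype, compression) from the file name alone.

    No file content is read, so an extension-less name yields (None, None).
    """
    return mimetypes.guess_type(path)


with_ext = archive_type_from_name("data.tar.gz")
without_ext = archive_type_from_name("data")
```

With the extension present, the compression layer is reported as `gzip`; without it, nothing can be inferred from the name and the content itself would have to be inspected.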
Via @yarikoptic: > would remain a preferred way to simply reraise exception, since otherwise it would use this line as the source of exception while also mentioning original one (in python3 only)
Thanks @yarikoptic for pointing this out.
Now required on import.
Should be turned into a common helper dataladgh-4048
Required for switching between 7z and patool. Also make it reported by `wtf()`
…ves too XZ is now being tested, and 7z has built-in support.
All of the checks of the test generator fail on a system without p7zip. Before dataladgh-4041 (in particular, before beb5d04), the subset of checks that existed at that time was skipped with (see dataladgh-3176):

    SKIP: cmd:7z is missing. (Not) Funny enough but ATM we need p7zip installation to handle .gz files extraction 'correctly'

However, with the change to the archive file name in 05e9353 (TST: Expand test coverage to additional relevant compression formats, 2020-01-17), the MissingExternalDependency exception that led to the skip for gzip formats is no longer triggered. And even if it were, the other file formats tested would fail.

Mark these tests as known failures when p7zip isn't available.
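The idea of marking tests as known failures when an external tool is missing can be sketched with the standard `unittest` module. This is only an illustration: `known_failure_unless` and `TestArchives` are hypothetical names, not the decorator or tests this commit touches.

```python
import shutil
import unittest


def known_failure_unless(condition):
    """Mark a test as an expected failure unless `condition` holds."""
    def decorator(test):
        return test if condition else unittest.expectedFailure(test)
    return decorator


# True if a 7z executable can be found on PATH
HAVE_7Z = shutil.which("7z") is not None


class TestArchives(unittest.TestCase):
    @known_failure_unless(HAVE_7Z)
    def test_needs_7z(self):
        # Stand-in for a real archive round-trip that requires 7z
        self.assertTrue(HAVE_7Z)
```

Unlike a skip, an expected failure still runs the test, so a test that unexpectedly starts passing without 7z is reported rather than silently ignored.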
Sitting on top of #4039
This is a complete draft implementation of a 7z-based file compression/decompression that should be as capable as the patool-based one.
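For orientation, a rough sketch of what driving the `7z` command line from Python can look like. The function names are hypothetical and this is not the code in this PR; it only builds argument lists using the standard 7z commands `x` (extract with full paths) and `a` (add to archive) and the `-o` and `-y` switches.

```python
def sevenzip_extract_cmd(archive, outdir):
    """Build the argv for extracting `archive` into `outdir`."""
    # -o takes the directory with no space in between; -y answers prompts with yes
    return ["7z", "x", "-y", f"-o{outdir}", archive]


def sevenzip_create_cmd(archive, paths):
    """Build the argv for adding `paths` to `archive`."""
    # Without an explicit -t switch, 7z picks the format from the archive extension
    return ["7z", "a", archive, *paths]
```

The resulting lists could be handed to `subprocess.run(cmd, check=True)`. Note that for a compressed tarball such as `.tar.gz`, a single `7z x` call only removes the compression layer and leaves a `.tar` file, so a second extraction pass is needed.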
TODO
- `xz-utils` on travis
- `test_archives.py` on all platforms
- `datalad.runtime.use-patool` config switch
- `7z` is not found/supported (as requested by @yarikoptic), although now we need close inspection of what was actually executed whenever something archive-related fails.
- `wtf` reports `7z` version

Further related TODOs (for subsequent PRs) surfaced by this PR:
- `p7zip` as dependency/recommends to Debian package