New formats support #84

Open
7 of 17 tasks
aonez opened this issue Oct 17, 2017 · 28 comments

aonez (Owner) commented Oct 17, 2017

Some compression formats that could be added. Being on the list does not mean they will be added, just that they are taken into account:

Just extraction:

aonez added this to the Future milestone Oct 17, 2017
aonez self-assigned this Oct 17, 2017

dezzeus commented Mar 12, 2018

It would be nice to also have Zstandard.

MaxPower85 commented Mar 23, 2018

Since you have lrzip on the list, add zpaq too: lrzip can optionally use zpaq for its second stage, but zpaq can also be used independently.

You can also add rzip... lrzip is similar, but it's not the same format.

Maybe add Apple's lzfse too, although I'm not sure whether Apple meant it to be used on its own as an archive format (or just within some other formats), since I can't find any information about what extension should be used for files compressed with lzfse. You can compress a file or a tar archive with lzfse, and it seems pretty good for a format that doesn't use multithreading; people also say that lzfse is supposed to be energy efficient. However, even the file command on Sierra doesn't seem to recognize what type of archive it is when you compress files with lzfse.

https://github.com/lzfse/lzfse

If you look at a file compressed with lzfse in a hex editor, it says "bvx2" at the beginning, and here's a clue about what that means: https://github.com/lzfse/lzfse/blob/497c5c176732769abf36ccc71a31c06bad93a84d/src/lzfse_internal.h#L276-L281

So it shouldn't be difficult to recognize lzfse-compressed files; the question is whether Apple intended it to be used on its own, like bzip2 or gzip.
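As a rough sketch of that recognition step (Python, not anything Keka ships): the block magics below are the ones defined in the lzfse_internal.h lines linked above. They are stored as little-endian 32-bit values, which is why they show up in a hex editor as the ASCII text "bvx1", "bvx2" and so on. The function names are hypothetical, and treating a known block magic at offset 0 as "this is lzfse" is only a heuristic assumption.

```python
# Block magics from lzfse_internal.h (little-endian uint32s, so they read as ASCII).
LZFSE_BLOCK_MAGICS = {
    b"bvx-": "uncompressed block",
    b"bvx1": "compressed block (v1)",
    b"bvx2": "compressed block (v2)",
    b"bvxn": "compressed block (LZVN)",
    b"bvx$": "end of stream",
}


def lzfse_block_kind(data: bytes):
    """Describe the lzfse block at the start of data, or return None if unknown."""
    return LZFSE_BLOCK_MAGICS.get(data[:4])


def looks_like_lzfse(path: str) -> bool:
    """Heuristic check: does the file start with a known lzfse block magic?"""
    with open(path, "rb") as f:
        return lzfse_block_kind(f.read(4)) is not None
```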

It can also be used for compressed .dmg images: when you create a compressed .dmg with hdiutil, use -format ULFO, like hdiutil create -volname vol_name -srcfolder source_folder -ov -format ULFO new_dmg_image.dmg

I've read that the 7-Zip beta for Windows has added support for .dmg images that use lzfse compression, but 7z for macOS or Linux doesn't seem to recognize them yet.

yetisyny commented Mar 29, 2018

The .WIM format (Windows Imaging Format) has been supported for both compression and decompression by 7-Zip for Windows for several years. Since it is one of the relatively few filetypes that 7-Zip for Windows can not only read but also write, even in the GUI, it ought to be included in Keka for feature parity with the Windows version of 7-Zip. There is also already a cross-platform library and utility for the .WIM format at https://wimlib.net/, although this library is under GNU GPL version 3 so you cannot use it legally unless you start using that license too, which I doubt you would want to do.

So using the 7-Zip implementation would probably work better license-wise. In fact, the 7-Zip implementation of the .WIM format is already included in the p7zip ports to UNIX-based operating systems (including macOS, Linux, etc.), so directly using p7zip is probably the easiest way to do this, and you already use p7zip for other things. As for the virtues of the .WIM format, or why anyone would want to use it: it is a file-based imaging format that can archive advanced filesystem features, can be used with several different compression algorithms, and is in widespread use, especially by Microsoft, which uses it for everything. Plus, it is the ONLY compression format supported by the GUI and command-line versions of 7-Zip for Windows that Keka does not also support, so adding it would bring Keka to feature parity with 7-Zip regarding formats it can compress to, and of course it is also in p7zip. The other formats 7-Zip advertises on its website as being able to compress to, besides .WIM, are 7Z, XZ, BZIP2, GZIP, TAR and ZIP, and you already support all of those! (I think 7-Zip may also support a few more, such as ISO, that are not mentioned there; anyway, you already support ISO too.)

aonez (Owner, Author) commented Mar 29, 2018

> this library is under GNU GPL version 3 so you cannot use it legally

If that is true, then lbzip2 also can't be bundled within Keka.

d235j commented May 3, 2018

Regarding bundling, please see https://www.gnu.org/licenses/gpl-faq.en.html#MereAggregation. If the proprietary components are not linking to the GPL components, then you should be OK; however, you need to provide the source code for the GPL components.

aonez (Owner, Author) commented May 4, 2018

Thanks @d235j, you're right. Already started pushing the GPL code here 😊

magitk commented Jun 18, 2018

+1 for zpaq

dh1337 commented Jan 14, 2019

Any news on Brotli?

gingerbeardman (Contributor) commented Jan 14, 2019

@denishamann1337 out of interest, what is your use case for Brotli? What would it achieve that other existing formats or schemes cannot?

p2k commented Jan 14, 2019

+1 for zpaq

It is the best archive format I know of, combining deduplication with strong compression that outperforms every competitor. It allows multiple versions of the same file(s), so it is suitable for incremental backups. Needless to say, it offers industry-standard encryption.

Having a GUI for zpaq would be bliss, but it is considerably harder to do than for the other formats, since zpaq has some unique features (like the aforementioned multi-version capability).

More information on zpaq: http://mattmahoney.net/dc/zpaq.html

dh1337 commented Jan 14, 2019

> @denishamann1337 out of interest what is your use case for Brotli? What would it achieve that other existing formats or schemes can not?

I've read some interesting benchmarks lately (e.g. http://www.instantshift.com/2018/03/02/gzip-vs-brotli-compression/).
Also, Brotli is supported by the 7-Zip-zstd fork (see: https://github.com/mcmilk/7-Zip-zstd), and I would love to have the same "compatibility" in Keka as with 7-Zip on Windows.
I feel that having better performance for some use cases, and being supported alongside gzip in all major browsers, makes it a de facto standard (see: https://caniuse.com/#search=brotli).

gingerbeardman (Contributor) commented Jan 14, 2019

@p2k is this not a problem?

> zpaq is for user-level backups. Do not use it to back up the operating system or any software that requires a password to install. zpaq saves regular files and directories, last-modified dates (to the nearest second), and (optionally) Windows attributes or Linux permissions. It does not follow or save symbolic links or junctions. It unknowingly follows hard links. It does not save owner or group IDs, ACLs, extended attributes, the registry, or special file types like devices, sockets, or named pipes.

p2k commented Jan 14, 2019

@gingerbeardman not for me. I don't use it to backup an operating system or things like an .app bundle on macOS (which often contain symlinks). But if I wanted to, I could always resort to piping a tar archive to zpaq.

It might be an idea to do a pre-check when archiving stuff with zpaq, though, and warn the user. That's a good point.
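A pre-check like that could be sketched roughly as below (Python; zpaq_precheck is a hypothetical name, and the rules only mirror the limitations quoted above from the zpaq documentation: symbolic links not saved, hard links followed, special files skipped). It does not try to detect ACLs, extended attributes or ownership, which zpaq also doesn't save.

```python
import os
import stat


def zpaq_precheck(root: str) -> list:
    """List (path, reason) pairs for entries zpaq's docs say it won't archive faithfully."""
    warnings = []
    # os.walk does not descend into symlinked directories by default,
    # but it still lists them, so every entry below gets inspected once.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: look at the link itself, not its target
            if stat.S_ISLNK(st.st_mode):
                warnings.append((path, "symbolic link: not saved"))
            elif stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                warnings.append((path, "hard link: followed and stored as a separate file"))
            elif not (stat.S_ISREG(st.st_mode) or stat.S_ISDIR(st.st_mode)):
                warnings.append((path, "special file (device, socket or FIFO): not saved"))
    return warnings
```

The returned list could be shown to the user before starting zpaq, with the option of piping a tar archive to zpaq instead, as suggested above.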

aonez (Owner, Author) commented Jan 14, 2019

@denishamann1337 I checked again and Brotli still does not even have a magic number, so it is still focused on streaming data over the network for use in browsers. That is why it is compared with gzip, which is also used in browsers.

That said, as it is fairly easy to add support for Brotli, here is a test build:
https://github.com/aonez/Keka/releases/tag/dev-test-builds
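To illustrate what the missing magic number means in practice, here is a minimal format sniffer in Python (sniff_format and its signature table are hypothetical, not Keka code). The signatures are the standard leading bytes of each format; for example, zstd frames start with 0xFD2FB528 stored little-endian. A raw Brotli stream has no such signature, so it can only be recognized by its extension, and any match on Brotli data would be a coincidence.

```python
# Leading bytes ("magic numbers") for some formats Keka handles.
SIGNATURES = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),  # 0xFD2FB528, little-endian
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"PK\x03\x04", "zip"),
]


def sniff_format(header: bytes):
    """Guess the format from the first bytes of a file; None if nothing matches."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    # Brotli ends up here: without a signature, the extension (.br) is the only hint.
    return None
```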

dh1337 commented Jan 14, 2019

@aonez I see, I assumed it had a magic number by now. Thanks for the effort of checking :)

jamie-arcc commented Apr 1, 2019

+1 for Zstd and zpaq!

systemcrash commented May 21, 2019

+1 for Zstd / Zstandard

It's a C library, dual-licensed under BSD and GPLv2.

aonez (Owner, Author) commented Jun 25, 2019

@jamie-arcc and @systemcrash check out the latest v1.2.0-dev.3494 test build, it has Zstandard support 😊

systemcrash commented Jun 25, 2019

First thoughts on
https://github.com/aonez/Keka/releases/tag/v1.2.0-dev.3417

Which Zstd compression levels correspond to the slider positions (Store, Fastest, Fast...)? Could this info be hinted in the GUI?

-# : # compression level (1-19, default: 3)
Store = 1
Fastest = 4
Fast = 7
Normal = 10
Slow = 14
Slowest = 19
?

aonez (Owner, Author) commented Jun 26, 2019

@systemcrash it goes 1, 2, 3, 6, 8 and 9. The method (level) slider should be enhanced to adapt to each format (#112): most formats use 0-9, but this one and RAR (0-5) are different. A dynamic slider is also much needed for finer selection.

systemcrash commented Jun 26, 2019

Forget everything above 15; the tradeoffs are rarely worth it given the diminishing gains above that level.

Why make things static? Look at the library's range, then draw the slider based on it. With 6 stops on the slider:
(int)floor(15.0/6 * 1)
(int)floor(15.0/6 * 2)
(int)floor(15.0/6 * 3)
(int)floor(15.0/6 * 4)
(int)floor(15.0/6 * 5)
(int)floor(15.0/6 * 6)
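The same idea in Python, as a sketch: slider_levels is a hypothetical helper, and max_level would come from the library (for example 19, or 15 following the advice above). Multiplying before dividing keeps everything in integers, which sidesteps C's integer division, where 15/6 evaluates to 2.

```python
def slider_levels(max_level: int, stops: int) -> list:
    """Spread `stops` slider positions over levels 1..max_level: floor(max_level * k / stops)."""
    # Multiply first, then floor-divide, so no precision is lost to integer division.
    # max(1, ...) keeps the first stop at a valid level when stops > max_level.
    return [max(1, max_level * k // stops) for k in range(1, stops + 1)]
```

With max_level=15 and 6 stops this gives 2, 5, 7, 10, 12 and 15; with 19 it gives 3, 6, 9, 12, 15 and 19. The first stop is not level 1, so a "Store"/fastest position pinned to the minimum would need to be special-cased.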

You closed the source because of all the copy-cats in the App Store, yah?

aonez (Owner, Author) commented Jun 27, 2019

Made a quick test and 15-19 resulted in 13% more savings. So if the next dev build does not have the dynamic slider yet, it will use 1, 2, 3, 4, 15 and 19. So far I'm impressed with Zstd, although 7z still has better speed/ratio.

[Screenshot, 2019-06-27]

> You closed the source because of all the copy-cats in the App Store, yah?

It was the trigger, yep.

systemcrash commented Jun 27, 2019

7z is a format, not an algorithm. Which algorithm was used, LZMA?

gingerbeardman (Contributor) commented Jun 28, 2019

How is the support for Zstd across platforms?

akrabu commented Sep 16, 2019

> Made a quick test and 15-19 resulted in 13% more savings. So if the next dev build does not have the dynamic slider yet, it will use 1, 2, 3, 4, 15 and 19. So far I'm impressed with Zstd, although 7z still has better speed/ratio.

For what it's worth, I ran the latest build (1.2.0.3542) at the highest compression level for Zstd on an old Outlook PST file I was intending to archive, and achieved the following:

Original: 7.34GB
Brotli: 5.84GB (Keka, slowest method)
Zstd: 5.11GB (Keka, slowest method)
7z: 4.86GB (Keka, slowest method)
ZPAQ: 4.84GB (zpaq a mailbox.pst.zpaq mailbox.pst -m5)
XZ: 4.53GB (xz -e --lzma2=preset=9,dict=1610612736,nice=273 --memory=90% mailbox.pst)
Zstd: 4.46GB (zstd -22 --ultra --long=31 --single-thread mailbox.pst)
Lrzip (LZMA): 4.44GB (lrzip --lzma -L 9 -U mailbox.pst)
Lrzip (ZPAQ): 4.34GB (lrzip -z -L 9 -U mailbox.pst)
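Expressed as space saved relative to the 7.34GB original, the sizes above (taken exactly as reported, so the rounding in the GB figures carries over) can be tabulated with a few lines of Python. The labels are shortened versions of the ones in the list.

```python
ORIGINAL_GB = 7.34  # size of the uncompressed PST file

RESULTS_GB = {
    "Brotli (Keka, slowest)": 5.84,
    "Zstd (Keka, slowest)": 5.11,
    "7z (Keka, slowest)": 4.86,
    "ZPAQ (-m5)": 4.84,
    "XZ (-e, 1.5 GiB dict)": 4.53,
    "Zstd (-22 --ultra --long=31)": 4.46,
    "Lrzip (LZMA)": 4.44,
    "Lrzip (ZPAQ)": 4.34,
}


def savings_percent(compressed_gb: float, original_gb: float = ORIGINAL_GB) -> float:
    """Percentage of the original size saved, rounded to one decimal place."""
    return round((1 - compressed_gb / original_gb) * 100, 1)


for label, size in RESULTS_GB.items():
    print(f"{label:30} {size:5.2f} GB  {savings_percent(size):5.1f}% saved")
```

For example, the Zstd long-range run saves about 39.2%, versus about 30.4% for Keka's slowest Zstd setting.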

The "long range mode" in Zstd is rather impressive. The only thing that seems to beat it is Lrzip (aka Long Range ZIP, not LZIP), which takes significantly longer (and the ZPAQ method takes the same amount of time to DEcompress as well: in this case, 10 hours).

With that in mind, could we...

  • Leave the current Zstd compression slider as-is.
  • Create a slider for the window size
  • Create a checkmark for "--ultra" that would go straight to level 22 and grey out the compression level slider (but not the window slider)

Apologies if I'm over-complicating the UI, but I thought I'd throw it out there. I just really love using Zstd's long range option for very large files with redundant data (archiving mailboxes, for instance). It works WAY faster than Lrzip, which tries to do something similar. With --long=31, Zstd uses a window of 2147483648 bytes (2^31, ~2GB) to look for patterns, which on this test file isn't quite as effective as Lrzip's "sliding window", but it sure performs faster.

Note: Zstd will throw an error during testing or extraction if you don't use a large enough window for an archive that was compressed with a larger than normal window. Example:

akrabu-macbook-air:~ akrabu$ zstd --test mailbox.pst.zst
mailbox.pst.zst : Decoding error (36) : Frame requires too much memory for decoding
mailbox.pst.zst : Window size larger than maximum : 2147483648 > 134217728
mailbox.pst.zst : Use --long=31 or --memory=2048MB

This also means that, presently, Keka will fail to extract files made with large windows:

[Screenshot, 2019-09-16: Keka failing to extract the large-window archive]
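One way around this would be to read the window size from the frame header before extracting and pass a matching memory limit to zstd. Below is a minimal Python sketch following the frame header layout in RFC 8878 (the Zstandard format specification). The function names are hypothetical, and the sketch does not handle single-segment frames (where the window equals the content size) or skippable frames.

```python
import struct

ZSTD_MAGIC = 0xFD2FB528
DEFAULT_LIMIT = 1 << 27  # 134217728 bytes, the limit reported in the error above


def zstd_window_size(header: bytes):
    """Window size declared in the first frame header, or None for single-segment frames."""
    if len(header) < 6 or struct.unpack("<I", header[:4])[0] != ZSTD_MAGIC:
        raise ValueError("not a zstd frame")
    descriptor = header[4]                 # Frame_Header_Descriptor
    if descriptor & 0x20:                  # Single_Segment_flag: no Window_Descriptor
        return None
    exponent, mantissa = header[5] >> 3, header[5] & 0x07
    window_base = 1 << (10 + exponent)     # windowLog = 10 + Exponent
    return window_base + (window_base // 8) * mantissa


def memory_flag(window: int):
    """zstd option needed to decode a window above the default limit, else None."""
    if window <= DEFAULT_LIMIT:
        return None
    mib = -(-window // (1 << 20))          # round up to whole MB (MiB)
    return f"--memory={mib}MB"
```

For the mailbox archive above, the header declares 2147483648 bytes, and memory_flag returns "--memory=2048MB", the same hint zstd printed.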

P.S. I also tried Brotli's --large-window option, but it was unremarkable in this case, resulting in a size comparable to what Keka's maximum setting already achieved.

MaxPower85 commented Feb 21, 2021

> zpaq (LRZIP as suggested by @MaxPower85) -> 1.2.0r 3806+ LRZIP in slow method

This needs a correction...

LRZIP can use various compression formats on parts of the archive, but it's a separate format... it can use ZPAQ, but ZPAQ is its own archiving format which can be quite useful to have on its own too, especially if files that share a lot of the same data are added to an existing archive later, since it does not compress them again and just reuses the data that was the same... the archive can also be "rolled back" to retrieve an earlier version of some file.
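To illustrate the "reuse the data that was the same" and "roll back" points, here is a toy Python model (DedupArchive is hypothetical and keeps everything in memory; it is not zpaq's format). It hashes fixed-size chunks with SHA-1 and stores each distinct chunk once, while each version keeps only a list of chunk hashes per file. As far as I know, zpaq picks fragment boundaries from the content with a rolling hash, so an insertion doesn't shift every later chunk; fixed-size chunks are a simplification here.

```python
import hashlib

CHUNK_SIZE = 4  # tiny so the example is easy to follow; real tools use much larger fragments


class DedupArchive:
    """Toy versioned archive: identical chunks are stored once and shared between versions."""

    def __init__(self):
        self.chunks = {}    # SHA-1 hex digest -> chunk bytes
        self.versions = []  # one {filename: [digest, ...]} snapshot per added version

    def add_version(self, files: dict) -> int:
        """Record a full snapshot of `files` (name -> bytes); return its version number."""
        snapshot = {}
        for name, data in files.items():
            digests = []
            for i in range(0, len(data), CHUNK_SIZE):
                piece = data[i:i + CHUNK_SIZE]
                digest = hashlib.sha1(piece).hexdigest()
                self.chunks.setdefault(digest, piece)  # already stored? just reference it
                digests.append(digest)
            snapshot[name] = digests
        self.versions.append(snapshot)
        return len(self.versions) - 1

    def extract(self, name: str, version: int = -1) -> bytes:
        """Rebuild a file as it was in `version` (default: latest), i.e. a "roll back"."""
        return b"".join(self.chunks[d] for d in self.versions[version][name])
```

Adding b"aaaabbbbcccc" and then b"aaaabbbbdddd" as two versions stores only four chunks, yet either version can still be extracted.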

gingerbeardman (Contributor) commented Aug 30, 2021

DAR (Disk ARchive)
https://dar.sourceforge.io

akrabu commented Aug 30, 2021

Oh I'd love to have DAR support. It can do SO much. I just thought it might be too much to support in such a little Keka window, you know? It's SO configurable, though I guess basic support would be fine.

I use it to make 50GB archives with par2 files and burn them all to Blu-rays to back up my pictures.
