lrzip-bz2 performs worse than just bz2 for large json file #51

Closed
phiresky opened this issue May 14, 2016 · 3 comments

@phiresky

See here:

 880M 14. Mai 21:25 files.tar
  11M 14. Mai 21:25 files.tar.bz2
  21M 14. Mai 21:31 files.tar.lrz-bz2
  22M 14. Mai 21:33 files.tar.lrz-lzma
  51M 14. Mai 21:29 files.tar.lrz-nocompress
  21M 14. Mai 21:29 files.tar.lrz-nocompress.bz2
  19M 14. Mai 21:35 files.tar.lrz-zpaq

Using just bzip2, the 880MB file compresses to 11MB. Using lrzip -b, it is 21MB.
How can this happen / should this happen?

Here is the file (.zip so github accepts the upload):
files.tar.zip https://github.com/ckolivas/lrzip/files/264669/files.zip
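
For anyone who wants to check the plain bzip2 and LZMA side of this comparison without lrzip installed, here is a minimal sketch using only Python's standard `bz2` and `lzma` modules. It does not run lrzip's long-range pre-pass, so it can only reproduce the "just bz2" style numbers, not the `.lrz-*` ones. The names `compressed_sizes` and `report` are hypothetical helpers, not part of any tool mentioned here, and the whole file is read into memory, which is an assumption that holds for an 880MB tar only on a machine with enough RAM.

```python
import bz2
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Return the size in bytes of data raw and after each stdlib compressor."""
    return {
        "raw": len(data),
        "bz2": len(bz2.compress(data, compresslevel=9)),
        "lzma": len(lzma.compress(data)),
    }


def report(path: str) -> None:
    """Print one line per compressor for the file at path (hypothetical helper)."""
    with open(path, "rb") as fh:
        data = fh.read()  # assumption: the file fits in memory
    for name, size in compressed_sizes(data).items():
        print(f"{name:>5}: {size:>12,d} bytes")
```

Calling `report("files.tar")` would print the raw, bzip2, and LZMA sizes side by side for comparison with the listing above.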

@pete4abw
Contributor

bzip2 does much better on small files, lrzip on large ones. There are
1000 or so files here and all of them are small. The long-range
compression routine does not work as well on smaller files. Even though
you tar the files together, the bzip2 compression window is much smaller
and operates more efficiently on this type of data.

Interestingly, if you tar up the kernel source tree, you get much better
results with lrzip.


@phiresky
Author

The JSON files are very similar; they can basically be seen as a single file (same file structure, same words, just in a different order). The result stays the same when, instead of tarring them up, I concatenate them using cat *.json.
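
The tar-versus-cat check described above can be sketched on synthetic data. This is only an illustration under stated assumptions: the documents generated by `make_documents` are made up (same keys and words, different order) and are not the files from this issue, all function names are hypothetical, and only stdlib bzip2 is used, not lrzip.

```python
import bz2
import io
import json
import random
import tarfile


def make_documents(count: int, seed: int = 0) -> list:
    """Build similar JSON documents: same keys and words, different order."""
    rng = random.Random(seed)
    words = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"]
    docs = []
    for _ in range(count):
        records = [
            {"name": rng.choice(words), "tags": rng.sample(words, 3)}
            for _ in range(50)
        ]
        docs.append(json.dumps(records, indent=1).encode("utf-8"))
    return docs


def as_tar(docs: list) -> bytes:
    """Pack the documents into an uncompressed in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for i, doc in enumerate(docs):
            info = tarfile.TarInfo(name=f"{i:04d}.json")
            info.size = len(doc)
            tar.addfile(info, io.BytesIO(doc))
    return buf.getvalue()


def as_cat(docs: list) -> bytes:
    """Concatenate the documents, like `cat *.json`."""
    return b"".join(docs)


def bz2_sizes(docs: list) -> dict:
    """Return the bzip2-compressed size of the tar and of the concatenation."""
    return {
        "tar": len(bz2.compress(as_tar(docs))),
        "cat": len(bz2.compress(as_cat(docs))),
    }
```

Running `bz2_sizes(make_documents(200))` gives the two compressed sizes to compare; on real data the same functions could be fed the actual file contents instead of the synthetic documents.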

@ckolivas
Owner

"Should this happen?" - on rare occasions the underlying compression does a better job on the file due to the tightly packed redundancy, and separating out the dictionary from the matches is counterproductive. The only thing lrzip offers for this particular archive is much faster compression and decompression.
