Manually set "incompressible data" threshold #207

Closed
lr4d opened this issue Aug 17, 2021 · 3 comments

lr4d commented Aug 17, 2021

When working with tarballs of media files and PDFs, lrzip sometimes gives me a fast 15% compression by compressing just 1 of 9 blocks, e.g. (truncated output of lrzip -i -vv ...):

Block   Comp    Percent Size
1   none    100.0%  10485760 / 10485760 Offset: 4589643704  Head: 10485799
2   none    100.0%  10485760 / 10485760 Offset: 4600129477  Head: 20971572
3   lzma    77.7%   8152171 / 10485760  Offset: 4610615250  Head: 29123756
4   none    100.0%  10485760 / 10485760 Offset: 4618767434  Head: 39609529
5   none    100.0%  10485760 / 10485760 Offset: 4629253207  Head: 50095302
6   none    100.0%  10485760 / 10485760 Offset: 4639738980  Head: 60581075
7   none    100.0%  10485760 / 10485760 Offset: 4650224753  Head: 71066848
8   none    100.0%  10485760 / 10485760 Offset: 4660710526  Head: 81567392
9   none    100.0%  4860321 / 4860321   Offset: 4671211070  Head: 0

Other times, it takes a lot longer and struggles a lot more to compress:

Block   Comp    Percent Size
1   none    100.0%  49603243 / 49603243 Offset: 5173302056  Head: 49603282
2   lzma    99.2%   49182901 / 49603243 Offset: 5222905312  Head: 98786196
3   lzma    99.3%   49236783 / 49603243 Offset: 5272088226  Head: 148022992
4   lzma    99.5%   49339373 / 49603243 Offset: 5321325022  Head: 197362378
5   lzma    99.0%   49131243 / 49603243 Offset: 5370664408  Head: 246493634
6   lzma    99.3%   49263853 / 49603243 Offset: 5419795664  Head: 295757500
7   lzma    98.5%   48863058 / 49603243 Offset: 5469059530  Head: 344620571
8   lzma    98.7%   48981972 / 49603243 Offset: 5517922601  Head: 393602556
9   lzma    99.4%   49286839 / 49603243 Offset: 5566904586  Head: 442889408
10  lzma    99.4%   49310169 / 49603243 Offset: 5616191438  Head: 492199590
11  lzma    99.4%   49295202 / 49603243 Offset: 5665501620  Head: 541494805
12  lzma    99.2%   49216341 / 49603243 Offset: 5714796835  Head: 590711159
13  lzma    99.4%   49310508 / 49603243 Offset: 5764013189  Head: 640021680
14  lzma    99.4%   49310012 / 49603243 Offset: 5813323710  Head: 689331705
15  lzma    99.3%   49260783 / 49603243 Offset: 5862633735  Head: 738592501
16  lzma    99.4%   49304015 / 49603243 Offset: 5911894531  Head: 787896529
17  lzma    99.3%   49272993 / 49603243 Offset: 5961198559  Head: 837169535
18  lzma    99.4%   49321970 / 49603243 Offset: 6010471565  Head: 886491518
19  lzma    99.4%   49315822 / 49603243 Offset: 6059793548  Head: 935807353
20  lzma    99.4%   49288914 / 49603243 Offset: 6109109383  Head: 985273666
21  lzma    90.2%   18692941 / 20716811 Offset: 6158575696  Head: 0

In the latter case, I'd much rather lrzip not compress a block at all if the lz4 test predicts a compressed size of 95% or more of the original, so as to gain speed.

Using lrzip --level=1 doesn't seem to make any difference in this regard.

Is it feasible to expose the threshold lrzip uses to decide that data is incompressible as a CLI parameter, so that I can set it manually?
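
To make the request concrete, here is a rough sketch of how a 95% cutoff would sort a few of the blocks from the second listing. The sizes are copied from that listing; the `THRESHOLD` value and the `decide` helper are hypothetical. Note that it uses the final lzma sizes as a proxy, whereas lrzip's actual test would rely on an lz4 estimate, so the real decisions could differ.

```python
# (compressed, original) sizes copied from rows 2, 7 and 21 of the second listing.
rows = {
    2: (49182901, 49603243),
    7: (48863058, 49603243),
    21: (18692941, 20716811),
}

THRESHOLD = 95.0  # hypothetical cutoff in percent, not an existing lrzip setting


def percent(compressed: int, original: int) -> float:
    """Compressed size as a percentage of the original size."""
    return 100.0 * compressed / original


def decide(compressed: int, original: int, threshold: float = THRESHOLD) -> str:
    """Store the block uncompressed when compression saves too little."""
    return "store" if percent(compressed, original) >= threshold else "compress"


for block, (c, o) in rows.items():
    print(block, f"{percent(c, o):.1f}%", decide(c, o))
```

Under this assumption, blocks 2 and 7 would be stored as-is and only block 21 would still be compressed.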

@pete4abw
Contributor

The -T | --threshold option was designed to take an optional argument that limits threshold testing to N%. Somehow that feature did not make it into lrzip. There, -T alone (no argument is checked for) disables threshold testing entirely; it does not limit it. The feature is implemented in lrzip-next: -T95, for example, would test the lz4 compression against 95% and, if the result is above that, would not compress the block. It is good practice to use it in general, since the time saved is worth more than the compression benefit. See this wiki article on it.
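
The idea of a percentage threshold can be sketched roughly as follows. This is not lrzip or lrzip-next code: `should_compress` is a hypothetical helper, and zlib at level 1 stands in for the lz4 trial so the example needs only the standard library. It also compresses the whole block, whereas a real test may sample or stop early.

```python
import os
import zlib


def should_compress(block: bytes, threshold_pct: float = 95.0) -> bool:
    """Return True if a fast trial compression brings the block below
    threshold_pct of its original size (hypothetical helper)."""
    if not block:
        return False
    # Fast trial pass; zlib level 1 is an assumed stand-in for lz4.
    trial = zlib.compress(block, 1)
    return 100.0 * len(trial) / len(block) < threshold_pct


print(should_compress(b"abc" * 100_000))     # highly repetitive input
print(should_compress(os.urandom(300_000)))  # random, effectively incompressible
```

A block that fails the test would be stored uncompressed, which skips the slower backend (lzma in the listings above).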

@ckolivas
Owner

I removed the optional percentage a while ago, and you're the first person to request that it be implemented. The way it works now, however, it aborts far too early to have any idea what the percentage will be by the end of the block. Its point is to avoid compressing incompressible blocks entirely, and your request is a fairly unusual use case. It could be extended to do what you ask, but I'm not currently implementing new features.

@ckolivas
Owner

I've decided this isn't worth implementing, apologies.
