Reducing break even point may lower compression ratio #32

unixdj · 2015-06-08T00:41:48Z

Reducing the break even point, as done in f533992, should theoretically improve compression ratio. However, in some cases emitting a literal or two followed by a backref would make the encoding overall shorter than two shorter backrefs that the (naturally greedy) encoder finds. E.g., when compressing kennedy.xls from the Canterbury Corpus with -w 8 -l 7 with different break even points, one can observe many such occurences (- is break even point 3, + is 2):

 ## step_search, scan @ +15 (271/512), input size 256
 ss Match not found
 ## step_search, scan @ +16 (272/512), input size 256
-ss Match not found
-## step_search, scan @ +17 (273/512), input size 256
-ss Found match of 6 bytes at 8
+ss Found match of 2 bytes at 1
+## step_search, scan @ +18 (274/512), input size 256
+ss Found match of 5 bytes at 8
 ## step_search, scan @ +23 (279/512), input size 256
 ss Match not found
 ## step_search, scan @ +24 (280/512), input size 256

It would be nice to try to find a way to optimize this within the contraints of an embedded environment. If possible, without backtracking.

The text was updated successfully, but these errors were encountered:

unixdj · 2015-06-08T22:10:10Z

A quick bechmark shows that the following moification of step_search() seems to improve compression ratio:

match = find_match(here);
if (match.found && find_match(here + 1).len <= match.len) {
    emit_backref();
else
    emit_literal();

(here + 1 seems to work better than here + break_even_point / 9 for some reason.)

Time to code it without calling find_match() twice as much. It's also completely meaningless when backref encoding is not longer than literal encoding.

unixdj · 2015-06-24T22:53:09Z

I haven't had much time to play with it lately, but here's an implementation of the above:
https://github.com/atomicobject/heatshrink/compare/master...unixdj:look-forward?expand=1
It's still really slow. Really really slow. But compression ratio is higher.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reducing break even point may lower compression ratio #32

Reducing break even point may lower compression ratio #32

unixdj commented Jun 8, 2015

unixdj commented Jun 8, 2015

unixdj commented Jun 24, 2015

Reducing break even point may lower compression ratio #32

Reducing break even point may lower compression ratio #32

Comments

unixdj commented Jun 8, 2015

unixdj commented Jun 8, 2015

unixdj commented Jun 24, 2015