Reducing the break even point, as done in f533992, should theoretically improve the compression ratio. However, in some cases emitting a literal or two followed by a backref would make the encoding overall shorter than the two shorter backrefs that the (naturally greedy) encoder finds. E.g., when compressing kennedy.xls from the Canterbury Corpus with -w 8 -l 7 and different break even points, one can observe many such occurrences (- is break even point 3, + is 2):
## step_search, scan @ +15 (271/512), input size 256
ss Match not found
## step_search, scan @ +16 (272/512), input size 256
-ss Match not found
-## step_search, scan @ +17 (273/512), input size 256
-ss Found match of 6 bytes at 8
+ss Found match of 2 bytes at 1
+## step_search, scan @ +18 (274/512), input size 256
+ss Found match of 5 bytes at 8
## step_search, scan @ +23 (279/512), input size 256
ss Match not found
## step_search, scan @ +24 (280/512), input size 256
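For reference, a rough bit-cost comparison of the two paths in the excerpt above, assuming a tag-bit encoding where a literal costs 1 tag bit + 8 data bits and a backref costs 1 tag bit + window bits + length bits (the exact cost model is an assumption here):

#include <stdio.h>

int main(void) {
    const int window_sz2 = 8, lookahead_sz2 = 7;              /* -w 8 -l 7 */
    const int literal_bits = 1 + 8;                           /* tag + byte */
    const int backref_bits = 1 + window_sz2 + lookahead_sz2;  /* tag + index + length */

    /* break even point 3: literal at +16, then a 6-byte backref at +17 */
    printf("break even 3: %d bits for 7 bytes\n", literal_bits + backref_bits);
    /* break even point 2: a 2-byte backref at +16, then a 5-byte backref at +18 */
    printf("break even 2: %d bits for 7 bytes\n", 2 * backref_bits);
    return 0;
}

That comes out to 25 vs. 32 bits for the same 7 input bytes, which is why the greedy choice with the lower break even point ends up larger here.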
It would be nice to find a way to optimize this within the constraints of an embedded environment. If possible, without backtracking.
A quick benchmark shows that the following modification of step_search() seems to improve the compression ratio:
match = find_match(here);
if (match.found && find_match(here + 1).len <= match.len) {
    emit_backref();
} else {
    emit_literal();
}
(here + 1 seems to work better than here + break_even_point / 9 for some reason.)
Time to code it without calling find_match() twice as often. It's also completely pointless when the backref encoding is not longer than the literal encoding.
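A minimal sketch of how the lookahead result could be cached so find_match() is not called twice per position; match_t, find_match(), emit_backref() and emit_literal() are placeholders following the snippet above, not actual heatshrink API, and the !next.found guard is added here in case .len is undefined for a missed search:

#include <stdbool.h>
#include <stddef.h>

typedef struct { bool found; size_t len; size_t pos; } match_t;  /* assumed shape */

extern match_t find_match(size_t here);   /* placeholder search routine */
extern void emit_backref(match_t m);      /* placeholder emitters */
extern void emit_literal(size_t here);

static match_t pending;            /* lookahead cached from the previous step */
static bool have_pending = false;

void step_search_lazy(size_t here) {
    /* Reuse the lookahead computed on the previous step if a literal was
       emitted there; otherwise search at the current position. */
    match_t match = have_pending ? pending : find_match(here);
    match_t next = find_match(here + 1);   /* one-step lookahead */
    have_pending = false;

    if (match.found && (!next.found || next.len <= match.len)) {
        emit_backref(match);       /* the lookahead does not beat the current match */
        /* the scan position jumps by match.len, so 'next' cannot be reused */
    } else {
        emit_literal(here);        /* defer the decision by one byte */
        pending = next;            /* 'next' becomes 'match' on the next step */
        have_pending = true;
    }
}

After a backref the cache is stale (the scan jumps by the match length), so the following step still does a fresh search; the saving only applies to steps that follow a literal.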