In content streams, see what happens if we reserve 0 for "next in prelude". #298
Note that I expect this to improve compression of the literal string content stream and to degrade compression of other streams, so this wouldn't be a one-size-fits-all improvement.
I have an early prototype that applies this to all content streams without discrimination. Let's see how that goes.
Protocol:
$ cargo run --release --bin binjs_generate_prediction_tables -- --in tests/data/facebook/single/*.js --out /tmp/binjs/facebook/dict/
$ cargo run --bin binjs_encode -- --in tests/data/facebook/single --out /tmp/binjs/facebook=0/ advanced entropy --dictionary /tmp/binjs/facebook/dict/dict.entropy --split-streams
$ cargo run --example investigate_streams -- /tmp/binjs/facebook=0/
CSV:
Impact
So it's a small but zero-cost improvement!
Verdict
Proceed
As it turns out, the most common value we fetch from a content stream is the "next in prelude". Experiments (see issue binast#298) indicate that reserving 0 for "next in prelude" improves compression of most streams. This is what this patch does.
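To make the reserved value concrete, here is a minimal decoding sketch. It is not the code from the patch: the function name `decode_content_stream`, the use of `usize` symbols, and the convention that a non-zero value `v` refers to prelude entry `v - 1` are all assumptions made for illustration.

```rust
/// Hypothetical sketch, not the actual binjs decoder.
/// Assumed convention: symbol 0 means "next entry in the prelude";
/// any other symbol `v` refers explicitly to prelude entry `v - 1`.
fn decode_content_stream<'a>(stream: &[usize], prelude: &[&'a str]) -> Option<Vec<&'a str>> {
    // Index of the next prelude entry that has not yet been fetched via symbol 0.
    let mut next = 0;
    let mut out = Vec::with_capacity(stream.len());
    for &symbol in stream {
        let index = if symbol == 0 {
            let i = next;
            next += 1;
            i
        } else {
            symbol - 1
        };
        // Return None rather than panic if the stream refers past the prelude.
        out.push(*prelude.get(index)?);
    }
    Some(out)
}
```

Under these assumptions, decoding the stream `[0, 0, 2, 0]` against the prelude `["a", "b", "c"]` yields `a`, `b`, `b`, `c`: only the repeated string needs an explicit index.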
Consider the following sequence:
In the implementation at bf3d4e3, we fetch |
Resolve #298 - In content streams, reserve 0 for "next in prelude"
Consider the string content stream. Assuming that most strings are in the prelude and are not repeated, the most common pattern will be "next string in prelude". If we always encode this as 0, this might make the stream easier to compress.
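The idea can be illustrated with a small encoding sketch. This is only an illustration under stated assumptions, not the project's implementation: the function name `encode_content_stream` is hypothetical, references are modelled as plain prelude indices, and a reference that is not "next in prelude" is assumed to be written as its index shifted by one to keep 0 free.

```rust
/// Hypothetical sketch, not the actual binjs encoder.
/// `refs[i]` is the prelude index of the i-th string used by the file.
/// Assumed convention: emit 0 when the reference is exactly the next
/// unused prelude entry, otherwise emit the index shifted by one.
fn encode_content_stream(refs: &[usize]) -> Vec<usize> {
    // Index of the next prelude entry that has not been referenced yet.
    let mut next = 0;
    let mut out = Vec::with_capacity(refs.len());
    for &r in refs {
        if r == next {
            out.push(0);
            next += 1;
        } else {
            out.push(r + 1);
        }
    }
    out
}
```

With the references `[0, 1, 2, 1, 3]`, this sketch produces `[0, 0, 0, 2, 0]`. If most strings appear once and in prelude order, the output is dominated by a single symbol, which is the property an entropy coder can exploit.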