
In content streams, see what happens if we reserve 0 for "next in prelude". #298

Closed
Yoric opened this issue Feb 12, 2019 · 3 comments

Yoric (Collaborator) commented Feb 12, 2019

Consider the string content stream. Assuming that most strings are in the prelude and are not repeated, the most common pattern will be "next string in prelude". If we always encode this as 0, this might make the stream easier to compress.
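
For illustration, here is a minimal sketch of the scheme (hypothetical names, not the actual binjs-ref encoder): a reference is written as 0 when it is the "next in prelude", and as 1 + index otherwise.

/// Hypothetical encoder state for one content stream.
struct RefEncoder {
    /// Index of the last prelude entry we referenced, if any.
    latest: Option<usize>,
}

impl RefEncoder {
    fn new() -> Self {
        RefEncoder { latest: None }
    }

    /// Encode a reference to prelude entry `index`:
    /// 0 if it is the "next in prelude", otherwise 1 + index.
    fn encode(&mut self, index: usize) -> usize {
        let is_next = match self.latest {
            Some(latest) => index == latest + 1,
            None => index == 0,
        };
        self.latest = Some(index);
        if is_next { 0 } else { index + 1 }
    }
}

fn main() {
    let mut enc = RefEncoder::new();
    // Strings referenced in prelude order compress to a run of zeros...
    assert_eq!(enc.encode(0), 0);
    assert_eq!(enc.encode(1), 0);
    assert_eq!(enc.encode(2), 0);
    // ...while a repeated or out-of-order reference falls back to 1 + index.
    assert_eq!(enc.encode(1), 2);
}

If most references are indeed "next in prelude", the stream becomes a long run of zeros, which is the kind of pattern a general-purpose compressor such as brotli handles well.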

Yoric (Collaborator, Author) commented Feb 13, 2019

Note that I expect this to improve compression of the literal string content stream but to degrade compression of other streams, so this wouldn't be a one-size-fits-all improvement.

Yoric self-assigned this Feb 13, 2019
Yoric (Collaborator, Author) commented Feb 13, 2019

I have an early prototype that applies this to all content streams without discrimination.

Let's see how that goes.

Protocol:

$ cargo run --release --bin binjs_generate_prediction_tables -- --in tests/data/facebook/single/*.js --out /tmp/binjs/facebook/dict/
$ cargo run --bin binjs_encode -- --in tests/data/facebook/single --out /tmp/binjs/facebook=0/ advanced entropy --dictionary  /tmp/binjs/facebook/dict/dict.entropy --split-streams
$ cargo run --example investigate_streams -- /tmp/binjs/facebook=0/

CSV:

File,                                     raw (b),  brotli (b),
js,                                      43134534,     8016723,
binjs,                                    7283698,     7242796,
floats.content,                            432445,      148570,
identifier_names.content,                 4907931,     1210930,
list_lengths.content,                     1985489,      543775,
property_keys.content,                    2716332,     1324653,
string_literals.content,                  2976279,     1425681,
unsigned_longs.content,                    448663,       90770,
main.entropy,                             1610749,     1600642,
floats.prelude,                             13817,       10849,
identifier_names.prelude,                    2907,        1878,
identifier_names_len.prelude,                1012,         251,
list_lengths.prelude,                         222,         498,
property_keys.prelude,                     930522,      255228,
property_keys_len.prelude,                  47918,       33494,
string_literals.prelude,                  1354161,      456487,
string_literals_len.prelude,                73044,       55172,
unsigned_longs.prelude,                         6,          14,

Impact

Stream                     after/before (compressed)   wins (%)   wins (brotli-compressed b)
floats.content             0.960722692118673            3.93%        6074
identifier_names.content   0.992098799417323            0.79%        9644
list_lengths.content       1.00587682529347            -0.59%       -3177
property_keys.content      0.98236398666898             1.76%       23781
string_literals.content    0.954174644999066            4.58%       68470
unsigned_longs.content     0.997713732990393            0.23%         208

total                      0.995568680503761            0.44%      105000
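
For reference, the "wins" columns follow from the ratio, assuming (as the protocol above suggests) that the CSV reports the sizes of the prototype run, i.e. the "after" sizes. A minimal check for the string_literals.content row:

fn main() {
    // string_literals.content: brotli-compressed "after" size from the CSV
    // above, and the after/before ratio from the impact table.
    let after = 1_425_681.0_f64;
    let ratio = 0.954_174_644_999_066_f64;

    let before = after / ratio;
    let wins_percent = (1.0 - ratio) * 100.0;
    let wins_bytes = (before - after).round();

    // Prints roughly "wins: 4.58% (68470 b)", matching the table.
    println!("wins: {:.2}% ({} b)", wins_percent, wins_bytes);
}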

So it's a small but zero-cost improvement!

Verdict: Proceed

Yoric moved this from "To do" to "In progress" in Achieve size parity with Brotli + minify, Feb 13, 2019
Yoric added a commit to Yoric/binjs-ref that referenced this issue, Feb 13, 2019:

As it turns out, the most common value we fetch from a content stream is the "next in prelude". Experiments (see issue binast#298) indicate that reserving 0 for "next in prelude" improves compression of most streams. This is what this patch does.
Yoric (Collaborator, Author) commented Feb 13, 2019

Consider the following sequence:

  • we are at position I in the prelude;
  • we then fetch position J < I in the prelude;
  • we now fetch "next in prelude".

In the implementation at bf3d4e3, we fetch J+1. A quick test with the alternate strategy (fetching I+1) indicates that it does not improve compression.
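
To make the two strategies concrete, here is a minimal decoder sketch (hypothetical names, not the binjs-ref implementation). Code 0 means "next in prelude", any other code is an explicit reference to index code - 1; the strategies differ only in where the "next" pointer resumes after an explicit fetch.

enum Strategy {
    /// Implemented in bf3d4e3: code 0 yields J + 1, the entry right after the
    /// most recently fetched index, whatever it was.
    AfterLastFetch,
    /// Alternate: code 0 yields I + 1, i.e. an explicit fetch does not disturb
    /// the sequential walk through the prelude.
    AfterSequentialWalk,
}

struct RefDecoder {
    /// Prelude index that the next code 0 ("next in prelude") will yield.
    next_in_prelude: usize,
    strategy: Strategy,
}

impl RefDecoder {
    fn new(strategy: Strategy) -> Self {
        RefDecoder { next_in_prelude: 0, strategy }
    }

    /// Decode one code into a prelude index.
    fn decode(&mut self, code: usize) -> usize {
        if code == 0 {
            let index = self.next_in_prelude;
            self.next_in_prelude = index + 1;
            index
        } else {
            let index = code - 1;
            if let Strategy::AfterLastFetch = self.strategy {
                // The explicit fetch of J moves the "next" pointer to J + 1.
                self.next_in_prelude = index + 1;
            }
            index
        }
    }
}

fn main() {
    // Five sequential reads (the walk reaches I = 4), one explicit backward
    // reference to J = 2 (encoded as 2 + 1 = 3), then "next in prelude".
    let codes = [0, 0, 0, 0, 0, 3, 0];

    let mut implemented = RefDecoder::new(Strategy::AfterLastFetch);
    let decoded: Vec<usize> = codes.iter().map(|&c| implemented.decode(c)).collect();
    assert_eq!(decoded, vec![0, 1, 2, 3, 4, 2, 3]); // final 0 yields J + 1 = 3

    let mut alternate = RefDecoder::new(Strategy::AfterSequentialWalk);
    let decoded: Vec<usize> = codes.iter().map(|&c| alternate.decode(c)).collect();
    assert_eq!(decoded, vec![0, 1, 2, 3, 4, 2, 5]); // final 0 yields I + 1 = 5
}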

Yoric closed this as completed in ad448a5, Feb 18, 2019
Yoric added a commit that referenced this issue Feb 18, 2019
Resolve #298 - In content streams, reserve 0 for "next in prelude"
The Achieve size parity with Brotli + minify project automation moved this from "In progress" to "Done", Feb 18, 2019