HTTP content-encoding: zstd #4065

Open
Malvoz opened this issue Feb 6, 2018 · 23 comments

@Malvoz Malvoz commented Feb 6, 2018

Stumbled upon a relatively new content-encoding called zstd (Zstandard).

Defined in:
https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-03.html
https://tools.ietf.org/html/rfc8478

Repo:
https://github.com/facebook/zstd

Blogpost:
https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/

Malvoz changed the title from "content-encoding: zstd" to "HTTP content-encoding: zstd" on Feb 6, 2018

@felixhandte felixhandte commented Dec 7, 2018

To clarify, it's a body content encoding. The content-coding identifier is zstd. It's been standardized in RFC 8478. It's supported by some non-browser clients and servers (which I guess is outside your purview), and we're working towards browser support.
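
As a rough illustration of what a content-coding identifier means in practice, here is a minimal sketch of server-side negotiation against an `Accept-Encoding` header. The function names `parse_accept_encoding` and `pick_encoding` and the server preference order are hypothetical, and the parser is deliberately simplified: it ignores the `*` wildcard and does not rank codings by q-value beyond excluding `q=0`.

```python
def parse_accept_encoding(header: str) -> dict:
    """Map each listed content-coding to its q-value (default 1.0)."""
    weights = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        weights[name.strip().lower()] = q
    return weights


def pick_encoding(header: str, server_preference=("zstd", "br", "gzip")) -> str:
    """Return the first server-preferred coding the client accepts, else identity."""
    weights = parse_accept_encoding(header)
    for coding in server_preference:
        if weights.get(coding, 0.0) > 0.0:
            return coding
    return "identity"


print(pick_encoding("gzip, deflate, br, zstd"))
print(pick_encoding("gzip, br;q=0.8"))
```

A client that does not list `zstd` simply falls back to another coding, which is why server support can exist before browser support does.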

@andrew-aladev andrew-aladev commented Oct 12, 2019

> It's supported by some non-browser clients and servers (which I guess is outside your purview), and we're working towards browser support.

I looked at those implementations, saw the zstd_dict_file option, and searched the internet for an example without success. There are no public dictionaries. I think Facebook doesn't want to spend money on that research. You have to do the research yourself and create a dictionary for your private application (based on your production database).

So zstd is an option for private client + server applications like... Facebook! So why are people creating tasks about integrating zstd into general-purpose applications like browsers?

I can't see a reason to add zstd even to caniuse today.

@felixhandte felixhandte commented Oct 14, 2019

Hi @andrew-aladev,

In general, using a dictionary with Zstd is entirely optional, you don't need one. And in fact, the HTTP extension specified in the above-linked RFC, the zstd content-coding, does not use a dictionary. The dictionary support in the nginx extension is non-standard (see tokers/zstd-nginx-module#2 for context). So none of that is really relevant to this topic.

I'll comment on your other concerns in a reply on facebook/zstd#1669.

@andrew-aladev andrew-aladev commented Oct 14, 2019

> using a dictionary with Zstd is entirely optional

You can use zstd without a dictionary or with your own dictionary.

Caniuse is about zstd support in general-purpose applications like web browsers. I think nobody will start integrating zstd into web browsers without a clear dictionary registry. The zstd RFC doesn't declare a way of synchronizing and protecting dictionaries. It will take many years to do that.

@tuxayo tuxayo commented Dec 12, 2019

@felixhandte

> It's supported by some non-browser clients and servers (which I guess is outside your purview), and we're working towards browser support.

This confirms that it's too early for inclusion in Can I use, right?

@xorgy xorgy commented May 19, 2020

> So zstd is an option for private client + server applications like... Facebook! So why are people creating tasks about integrating zstd into general-purpose applications like browsers?

This is an extremely narrow way to read the situation. I use zstd at work, it has excellent technical properties that make sense for basically any user of general-purpose lossless compression. I've seen it used in the wild with HTTP APIs; yes it's internal for now, but it not being included in browsers is a matter of browsers picking up support for it... just like everything on caniuse.

@andrew-aladev andrew-aladev commented Jun 27, 2020

> This is an extremely narrow way to read the situation. I use zstd at work, it has excellent technical properties that make sense for basically any user of general-purpose lossless compression. I've seen it used in the wild with HTTP APIs; yes it's internal for now, but it not being included in browsers is a matter of browsers picking up support for it... just like everything on caniuse.

I am using it too. It is very effective in private applications where you can train a dictionary on real-world data. General-purpose lossless compression using zstd without a dictionary is almost meaningless.

Caniuse is about web browsers. zstd without a dictionary is not able to compete with brotli, which uses a single static dictionary optimized for web data. This could be resolved by registering zstd dictionaries optimized for web data in the IANA registries. I am waiting for Facebook to take that step.
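
To make the dictionary argument concrete, here is a minimal sketch of how a shared dictionary helps with small payloads. It does not use zstd or brotli: it swaps in DEFLATE preset dictionaries from Python's standard `zlib` module as a stand-in, and the dictionary contents and sample page are made up for illustration.

```python
import zlib

# Hypothetical shared dictionary: boilerplate expected in many small responses.
DICTIONARY = (
    b'<!DOCTYPE html><html lang="en"><head><meta charset="utf-8">'
    b'<meta name="viewport" content="width=device-width, initial-scale=1">'
    b'<link rel="stylesheet" href="/static/site.css"></head><body>'
)


def compress(data: bytes, zdict: bytes = b"") -> bytes:
    """Compress with DEFLATE, optionally primed with a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    return comp.compress(data) + comp.flush()


def decompress(blob: bytes, zdict: bytes = b"") -> bytes:
    """Decompress; the same dictionary must be supplied if one was used."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


# A small page that shares its boilerplate with the dictionary.
page = DICTIONARY + b"<h1>Hello</h1><p>A small page.</p></body></html>"
plain = compress(page)
primed = compress(page, DICTIONARY)
print(len(page), len(plain), len(primed))
```

The primed output is smaller because the compressor can reference the dictionary instead of encoding the boilerplate from scratch. Both sides must hold the same dictionary, which is the synchronization problem raised above.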

@felixhandte felixhandte commented Jun 29, 2020

@andrew-aladev, I would hesitate to generalize a specific experience you've had into a global statement about one compressor being universally better than another. Compression is notoriously tricky that way--the specifics of the use case matter. "using zstd without dictionary is almost meaningless" is an almost meaningless statement. I can offer a counter-example: we chose zstd over brotli for HTTP traffic at Facebook.

Even without a dictionary, we found that zstd achieved equivalent compression, compared to brotli, at our target compression speed and performed much better on the client side (as a result of its much higher decompression speed). On mobile chipsets, decompression efficiency is actually very important, and that produced meaningful improvements in end-to-end / top-level metrics.

@andrew-aladev andrew-aladev commented Jun 29, 2020

> I would hesitate to generalize a specific experience you've had into a global statement about one compressor being universally better than another. Compression is notoriously tricky that way--the specifics of the use case matter. "using zstd without dictionary is almost meaningless" is an almost meaningless statement. I can offer a counter-example: we chose zstd over brotli for HTTP traffic at Facebook.

I was sure this question would come up. I am doing that research and will provide results later.

Please publish your research results based on HTTP traffic on the official website. The existing results don't make sense for this purpose: you re-compressed books, X-ray pictures, etc. Please provide the clear results a web developer wants (we are talking about caniuse): graphs for xhtml, graphs for css, graphs for js, graphs for fonts, etc.

> Even without a dictionary, we found that zstd achieved equivalent compression, compared to brotli, at our target compression speed and performed much better on the client side (as a result of its much higher decompression speed). On mobile chipsets, decompression efficiency is actually very important, and that produced meaningful improvements in end-to-end / top-level metrics.

I've already reproduced similar results (using small amounts of web data): zstd's compression ratio was significantly lower than brotli's, but its compression and decompression were faster. Now, to be clear, I want to provide reproducible results.

@xorgy xorgy commented Jun 30, 2020

I'll second @felixhandte's point here, and go a little further. Decompression efficiency is extremely good with zstd.

Zstd can often produce better ratios and better decompression efficiency than lz4 at the same time, even in lz4's sweet spot. In my use case (being careful not to prematurely end frames) it is a massive difference. Far as I can tell, zstd decompresses more efficiently than any common compressor at equivalent ratios, for both my plain text and binary use cases. In my experience, zstd decompresses plain text an integer multiple of times faster than lz4 and gzip do, with dramatically better ratios; and for similar ratios between Brotli and zstd (zstd can often do better), Brotli takes several times longer to compress. There are a lot of use cases where on-the-fly Brotli simply doesn't make sense because it will increase transfer times at any ratio that makes it worthwhile; for zstd, that list of use cases is several times smaller.

The data's not mine to give, but my two cents are.

In any event, it seems to me like the thing that should decide whether Caniuse shows something is whether any of the tracked web browsers have a release with support. Debating the merits of zstd without dictionaries is neither here nor there, once it's something available in browsers.

@andrew-aladev andrew-aladev commented Jun 30, 2020

@xorgy, I almost agree with you, but there is one hidden thing: absolute values.

> Brotli takes several times longer to compress and decompress

We are talking about 0.002 vs 0.001 seconds for a 200 KB file. In many cases this difference may not matter to users. I am going to provide those absolute values, and the comparison will become clearer for everyone.
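
The absolute-values point can be put into a back-of-the-envelope calculation. This is a minimal sketch, assuming a simple model in which end-to-end time is transfer time of the compressed payload plus codec time. The 200 KB size and the 0.001 vs 0.002 second figures come from the comment above; the compression ratios and link speeds are hypothetical.

```python
def end_to_end_seconds(size_bytes, ratio, codec_seconds, bandwidth_bytes_per_s):
    """Time to transfer the compressed payload plus time spent in the codec."""
    compressed_bytes = size_bytes / ratio
    return compressed_bytes / bandwidth_bytes_per_s + codec_seconds


SIZE = 200_000  # the 200 KB file from the comment above

# Hypothetical codecs: A is slower but compresses slightly better than B.
CODEC_A = {"ratio": 4.4, "codec_seconds": 0.002}
CODEC_B = {"ratio": 4.0, "codec_seconds": 0.001}

for label, bandwidth in (("1 MB/s link", 1_000_000), ("100 MB/s link", 100_000_000)):
    a = end_to_end_seconds(SIZE, bandwidth_bytes_per_s=bandwidth, **CODEC_A)
    b = end_to_end_seconds(SIZE, bandwidth_bytes_per_s=bandwidth, **CODEC_B)
    print(f"{label}: A={a * 1000:.2f} ms, B={b * 1000:.2f} ms")
```

Under these assumed numbers the better ratio wins on the slow link and the faster codec wins on the fast link, so which side of the argument holds depends on the network as much as on the compressor.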

@andrew-aladev andrew-aladev commented Jul 23, 2020

I have the first results.

[Charts: Google fonts ratio, compress performance, decompress performance; Google fonts ratio all in one file, compress performance all in one file, decompress performance all in one file]

  • Use brotli for web browsers: it provides the best ratio for a single font.
  • Use zstd for mobile devices: it provides the best performance with a competitive ratio for a single font.
  • Use tar.br or tar.zst for a font collection: they provide almost the same ratio.

For more information see brotli-vs-zstd.

@charmander charmander commented Jul 23, 2020

Recommendations on serving compressed TTFs…? Isn’t any browser supporting zstd also going to support WOFF 2.0?

@andrew-aladev andrew-aladev commented Jul 23, 2020

@charmander ttf and otf are just popular uncompressed web data. I am going to provide more stats for data like css, js, html, etc. They will be ready next week.

@andrew-aladev andrew-aladev commented Jul 23, 2020

cdnjs data differs from other collections like Google fonts: it includes all versions of each file (instead of only the latest one). So the results are more interesting.

[Charts: Cdnjs svg ratio, compress performance, decompress performance; Cdnjs svg ratio all in one file, compress performance all in one file, decompress performance all in one file]

  • Use svg + brotli for web browsers.
  • Use svg + zstd for mobile devices.
  • Use tar.zst for an svg collection: it provides a significantly better ratio than tar.br.

So we can see that zstd is much (40-60%) better than brotli for font or svg collections in one archive. Brotli is only competitive for collections when they consist of completely unique files.
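
Why a collection with many versions of the same file behaves differently from single files can be sketched with a toy experiment. This does not reproduce the benchmark above: it uses `zlib` from Python's standard library as a stand-in for both compressors, and the generated "file versions" are synthetic. It only shows that redundancy shared across files can be exploited when they are compressed as one stream, and cannot when each file is compressed on its own.

```python
import random
import zlib


def make_versions(count: int, seed: int = 0) -> list:
    """Build `count` near-identical synthetic files that differ in one line each."""
    rng = random.Random(seed)
    vocabulary = [
        "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(rng.randint(3, 9)))
        for _ in range(200)
    ]
    base = [" ".join(rng.choice(vocabulary) for _ in range(8)) for _ in range(40)]
    versions = []
    for n in range(count):
        lines = list(base)
        lines[n % len(lines)] = f"/* changed in version {n} */"
        versions.append("\n".join(lines).encode())
    return versions


files = make_versions(5)

# Each file compressed on its own, as a server does per response.
separate = sum(len(zlib.compress(f, 9)) for f in files)

# All files compressed as a single stream, as in a tar archive.
together = len(zlib.compress(b"".join(files), 9))

print(f"separate: {separate} bytes, together: {together} bytes")
```

The single-stream result is smaller because later versions are encoded mostly as references to earlier ones. How far back a compressor can reference depends on its window size and settings, which this sketch does not model.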

@andrew-aladev andrew-aladev commented Jul 25, 2020

[Charts: Cdnjs css ratio, compress performance, decompress performance; Cdnjs css ratio all in one file, compress performance all in one file, decompress performance all in one file]

  • Use css + brotli for web browsers.
  • Use css + zstd for mobile devices.
  • Use tar.zst for a css collection: it provides a huge 100-130% ratio gain over tar.br.

Now you can see visually what I was talking about: zstd is significantly better than brotli for a large web data collection (1 GB+), but brotli is better for a single small file. It means that zstd suffers from the lack of a dictionary.

Results for js files will be ready next week.

@xorgy xorgy commented Jul 26, 2020

I appreciate the data. :+ )
I just have one thing to add.

>   • Use svg + brotli for web browsers.
>   • Use svg + zstd for mobile devices.

Some people use web browsers on mobile devices, lately.

@andrew-aladev andrew-aladev commented Jul 31, 2020

Sure, today we have powerful smartphones with an almost desktop-class browser. But I think many developers still optimize applications for old phones, so zstd will be useful.

I underestimated the size of the cdnjs collection: it has about 220 GB of js files. So far I have charts only for minified js files. Charts for non-minified js and for all js files will be ready next week.

@WilsonHorse1 WilsonHorse1 commented Aug 1, 2020

@andrew-aladev andrew-aladev commented Aug 8, 2020

Unfortunately we had an AC power loss today, and my UPS with a 100 Ah battery was not able to handle 5 hours offline. I am going to provide results later, sorry.

@xorgy xorgy commented Aug 12, 2020

> Unfortunately we had an AC power loss today, and my UPS with a 100 Ah battery was not able to handle 5 hours offline. I am going to provide results later, sorry.

If you'd like, I can lend you a shell on a Ryzen 3950X with 64GiB of DDR4 in New York which has a stable connection and lives in a rack with backup power, then you don't need to run those from home.

@andrew-aladev andrew-aladev commented Aug 16, 2020

Thank you, everything will be all right.

[Charts: Cdnjs js ratio, compress performance, decompress performance; Cdnjs js ratio all in one file, compress performance all in one file, decompress performance all in one file]

[Charts: Wikipedia html ratio, compress performance, decompress performance; Wikipedia html ratio all in one file, compress performance all in one file, decompress performance all in one file]

We can see exactly the same result for js and html files.

@felixhandte We can see that zstd provides a better ratio than brotli only when the source is large enough. brotli is better than zstd when the source is small (regular web browser usage). It proves that zstd suffers from the lack of a dictionary.

Please mark a public dictionary optimized for web content as a priority milestone.

@tuxayo tuxayo commented May 12, 2021

> Please mark a public dictionary optimized for web content as a priority milestone.

Is there a ticket for that? I can't find one but I'm not sure.
