Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding zstd decoding support #5453

Closed
wants to merge 1 commit into from
Closed

Adding zstd decoding support #5453

wants to merge 1 commit into from

Conversation

@gvollant
Copy link
Contributor

gvollant commented May 25, 2020

Zstd is a very fast compressor, and give better speed vs compression ratio than gzip/deflate or brotli.

This pull request add zstd support on curl and libcurl

to test
curl --compressed https://de-de.facebook.com/
curl --compressed https://de-de.facebook.com/ --verbose

Somes links:
https://www.iana.org/assignments/http-parameters/http-parameters.xhtml

https://datatracker.ietf.org/doc/draft-kucherawy-rfc8478bis/

https://tools.ietf.org/html/rfc8478

https://facebook.github.io/zstd/

@gvollant gvollant force-pushed the gvollant:zstd branch from c803be0 to 77f8f73 May 25, 2020
@gvollant
Copy link
Contributor Author

gvollant commented May 25, 2020

Under Ubuntu, we can install zstd library with apt install libzstd-dev

Can anyone help me integrate a zstd test in the curl test system ?

@gvollant
Copy link
Contributor Author

gvollant commented May 25, 2020

To install a server with zstd support
https://github.com/tokers/zstd-nginx-module

@bagder
Copy link
Member

bagder commented May 25, 2020

Can anyone help me integrate a zstd test in the curl test system ?

See test 1170 for inspiration. You basically only need a binary dump of what the server should respond, then put that base64-encoded in the <data> section.

@gvollant
Copy link
Contributor Author

gvollant commented May 25, 2020

I can, by example, translare the three brotli test (314/315/316), but add a build configuration with zstd (probably an Ubuntu with libzstd-dev package) and, configure the test only for zstd build is not easy for me

@bagder
Copy link
Member

bagder commented May 25, 2020

I'll write up a configure check for you...

@bagder
Copy link
Member

bagder commented May 25, 2020

I put it in a separate branch for you. Try it out and see if it works out.

docs/libcurl/curl_version_info.3 Show resolved Hide resolved
#ifdef HAVE_ZSTD
/* Writer parameters. */
struct zstd_params {
ZSTD_DStream *zds; /* State structure for zstd. */

This comment has been minimized.

Copy link
@bagder

bagder May 25, 2020

Member

not curl indent-level! (and more of that later but I won't repeat myself)

This comment has been minimized.

Copy link
@gvollant

gvollant May 25, 2020

Author Contributor

yes, I'll have to find a solution with visual studio 2019 and curl rules

lib/content_encoding.c Outdated Show resolved Hide resolved
ZSTD_outBuffer out;
size_t errorCode;

if(zp->decomp == NULL)

This comment has been minimized.

Copy link
@bagder

bagder May 25, 2020

Member

we typically use if(!zp->decomp) instead of checking for NULL

lib/content_encoding.c Show resolved Hide resolved
lib/content_encoding.c Outdated Show resolved Hide resolved
docs/libcurl/symbols-in-versions Outdated Show resolved Hide resolved
@gvollant gvollant force-pushed the gvollant:zstd branch 5 times, most recently from ce5f499 to c57930c May 25, 2020
@gvollant
Copy link
Contributor Author

gvollant commented May 26, 2020

I put it in a separate branch for you. Try it out and see if it works out.

I rebased my work on your branch. Do you prefer I just copy your file and rebase on master?

@bagder
Copy link
Member

bagder commented May 26, 2020

It is probably easiest if you just copy (cherry-pick) my commit into your branch

@gvanem
Copy link
Contributor

gvanem commented May 26, 2020

I tried your patch on Windows/MSVC-2019 and it works fine.
I've not much to compare with, but the above facebook link transferred fast!

@gvollant
Copy link
Contributor Author

gvollant commented May 26, 2020

I added libzstd-dev package on an ubuntu focal build on .travis.yml but I'm unable to find it on all the continuous integration build.

@felixhandte
Copy link

felixhandte commented May 26, 2020

Hey! I'm from the zstd team (and I implemented the support for the zstd content-coding for facebook.com).

I'm really excited to see zstd make its way into curl! Let me know if you have any questions.

@bagder
Copy link
Member

bagder commented May 26, 2020

I added libzstd-dev package on an ubuntu focal build on .travis.yml but I'm unable to find it on all the continuous integration build.

Do you have travis enabled for your repo? Looking at your repo it doesn't look like so...

Ah strike that, it should build here...

@bagder
Copy link
Member

bagder commented May 26, 2020

You can see configure detect and enabled it here - so that much seems to work. But note that you added that to the build that explicitly does not run the test suite.

@gvollant gvollant force-pushed the gvollant:zstd branch 6 times, most recently from 776840d to 51b05d8 May 26, 2020
@vszakats
Copy link
Member

vszakats commented May 26, 2020

I'm attaching a patch to add zstd support to the MinGW aka Makefile.m32 build method. It's an untested, but fairly simple modification, and should work together with a pending patch that adds zstd builds to curl-for-win. The latter has been tested successfully earlier today on a local machine:

zstd-m32.patch.txt

patch

diff --git a/docs/examples/Makefile.m32 b/docs/examples/Makefile.m32
index d0694cfa3..e8c0d376f 100644
--- a/docs/examples/Makefile.m32
+++ b/docs/examples/Makefile.m32
@@ -24,7 +24,7 @@
 #
 ## Makefile for building curl examples with MingW (GCC-3.2 or later)
 ## and optionally OpenSSL (1.0.2a), libssh2 (1.5), zlib (1.2.8), librtmp (2.4),
-## brotli (1.0.1)
+## brotli (1.0.1), zstd (1.4.5)
 ##
 ## Usage:   mingw32-make -f Makefile.m32 CFG=-feature1[-feature2][-feature3][...]
 ## Example: mingw32-make -f Makefile.m32 CFG=-zlib-ssl-sspi-winidn
@@ -39,6 +39,10 @@
 ifndef ZLIB_PATH
 ZLIB_PATH = ../../../zlib-1.2.8
 endif
+# Edit the path below to point to the base of your Zstandard sources.
+ifndef ZSTD_PATH
+ZSTD_PATH = ../../../zstd-1.4.5
+endif
 # Edit the path below to point to the base of your Brotli sources.
 ifndef BROTLI_PATH
 BROTLI_PATH = ../../../brotli-1.0.1
@@ -180,6 +184,9 @@ endif
 ifeq ($(findstring -zlib,$(CFG)),-zlib)
 ZLIB = 1
 endif
+ifeq ($(findstring -zstd,$(CFG)),-zstd)
+ZSTD = 1
+endif
 ifeq ($(findstring -brotli,$(CFG)),-brotli)
 BROTLI = 1
 endif
@@ -284,6 +291,11 @@ ifdef ZLIB
   CFLAGS += -DHAVE_LIBZ -DHAVE_ZLIB_H
   curl_LDADD += -L"$(ZLIB_PATH)" -lz
 endif
+ifdef ZSTD
+  INCLUDES += -I"$(ZSTD_PATH)/include"
+  CFLAGS += -DHAVE_ZSTD
+  curl_LDADD += -L"$(ZSTD_PATH)/lib" -lzstd
+endif
 ifdef BROTLI
   INCLUDES += -I"$(BROTLI_PATH)/include"
   CFLAGS += -DHAVE_BROTLI
diff --git a/lib/Makefile.m32 b/lib/Makefile.m32
index fe8701bdb..02b31106c 100644
--- a/lib/Makefile.m32
+++ b/lib/Makefile.m32
@@ -24,7 +24,7 @@
 #
 ## Makefile for building libcurl.a with MingW (GCC-3.2 or later or LLVM/Clang)
 ## and optionally OpenSSL (1.0.2a), libssh2 (1.5), zlib (1.2.8), librtmp (2.4),
-## brotli (1.0.1)
+## brotli (1.0.1), zstd (1.4.5)
 ##
 ## Usage:   mingw32-make -f Makefile.m32 CFG=-feature1[-feature2][-feature3][...]
 ## Example: mingw32-make -f Makefile.m32 CFG=-zlib-ssl-sspi-winidn
@@ -39,6 +39,10 @@
 ifndef ZLIB_PATH
 ZLIB_PATH = ../../zlib-1.2.8
 endif
+# Edit the path below to point to the base of your Zstandard sources.
+ifndef ZSTD_PATH
+ZSTD_PATH = ../../zstd-1.4.5
+endif
 # Edit the path below to point to the base of your Brotli sources.
 ifndef BROTLI_PATH
 BROTLI_PATH = ../../brotli-1.0.1
@@ -180,6 +184,9 @@ endif
 ifeq ($(findstring -zlib,$(CFG)),-zlib)
 ZLIB = 1
 endif
+ifeq ($(findstring -zstd,$(CFG)),-zstd)
+ZSTD = 1
+endif
 ifeq ($(findstring -brotli,$(CFG)),-brotli)
 BROTLI = 1
 endif
@@ -288,6 +295,11 @@ ifdef ZLIB
   CFLAGS += -DHAVE_LIBZ -DHAVE_ZLIB_H
   DLL_LIBS += -L"$(ZLIB_PATH)" -lz
 endif
+ifdef ZSTD
+  INCLUDES += -I"$(ZSTD_PATH)/include"
+  CFLAGS += -DHAVE_ZSTD
+  DLL_LIBS += -L"$(ZSTD_PATH)/lib" -lzstd
+endif
 ifdef BROTLI
   INCLUDES += -I"$(BROTLI_PATH)/include"
   CFLAGS += -DHAVE_BROTLI
diff --git a/src/Makefile.m32 b/src/Makefile.m32
index 802a1da0f..afb4fd547 100644
--- a/src/Makefile.m32
+++ b/src/Makefile.m32
@@ -24,7 +24,7 @@
 #
 ## Makefile for building curl.exe with MingW (GCC-3.2 or later or LLVM/Clang)
 ## and optionally OpenSSL (1.0.2a), libssh2 (1.5), zlib (1.2.8), librtmp (2.4),
-## brotli (1.0.1)
+## brotli (1.0.1), zstd (1.4.5)
 ##
 ## Usage:   mingw32-make -f Makefile.m32 CFG=-feature1[-feature2][-feature3][...]
 ## Example: mingw32-make -f Makefile.m32 CFG=-zlib-ssl-sspi-winidn
@@ -39,6 +39,10 @@
 ifndef ZLIB_PATH
 ZLIB_PATH = ../../zlib-1.2.8
 endif
+# Edit the path below to point to the base of your Zstandard sources.
+ifndef ZSTD_PATH
+ZSTD_PATH = ../../zstd-1.4.5
+endif
 # Edit the path below to point to the base of your Brotli sources.
 ifndef BROTLI_PATH
 BROTLI_PATH = ../../brotli-1.0.1
@@ -189,6 +193,9 @@ endif
 ifeq ($(findstring -zlib,$(CFG)),-zlib)
 ZLIB = 1
 endif
+ifeq ($(findstring -zstd,$(CFG)),-zstd)
+ZSTD = 1
+endif
 ifeq ($(findstring -brotli,$(CFG)),-brotli)
 BROTLI = 1
 endif
@@ -302,6 +309,11 @@ ifdef ZLIB
   CFLAGS += -DHAVE_LIBZ -DHAVE_ZLIB_H
   curl_LDADD += -L"$(ZLIB_PATH)" -lz
 endif
+ifdef ZSTD
+  INCLUDES += -I"$(ZSTD_PATH)/include"
+  CFLAGS += -DHAVE_ZSTD
+  curl_LDADD += -L"$(ZSTD_PATH)/lib" -lzstd
+endif
 ifdef BROTLI
   INCLUDES += -I"$(BROTLI_PATH)/include"
   CFLAGS += -DHAVE_BROTLI

@gvollant gvollant force-pushed the gvollant:zstd branch 2 times, most recently from 52fa79b to cf2ab8f May 27, 2020
@gvollant
Copy link
Contributor Author

gvollant commented May 27, 2020

zstd tests 396 and 397 have success
https://github.com/curl/curl/pull/5453/checks?check_run_id=711600257

https://dev.azure.com/daniel0244/curl/_build/results?buildId=2039&view=logs&jobId=8a21621c-000d-5b56-b920-3535c5db3dfc&j=8a21621c-000d-5b56-b920-3535c5db3dfc&t=7573036f-f328-5f4c-a599-58f1a97b0344

2020-05-27T02:48:29.8568226Z --p----e--- OK (357 out of 1342, remaining: 04:37, took 0.075s, duration: 01:40)
2020-05-27T02:48:29.8569333Z test 0396...[HTTP GET zstd compressed content]
2020-05-27T02:48:29.9184975Z --pd---e--- OK (358 out of 1342, remaining: 04:36, took 0.061s, duration: 01:40)
2020-05-27T02:48:29.9186066Z test 0397...[HTTP GET zstd compressed content of size more than CURL_MAX_WRITE_SIZE]
2020-05-27T02:48:29.9611994Z --pd---e--- OK (359 out of 1342, remaining: 04:35, took 0.035s, duration: 01:40)

@bagder
Copy link
Member

bagder commented May 27, 2020

@Lekensteyn you up for adding a cmake fix for zstd to add to this?

@gvollant
Copy link
Contributor Author

gvollant commented May 27, 2020

@bagder Do you think last check fails are related to my modification? It's not easy to understand them...

regards!

@bagder
Copy link
Member

bagder commented May 27, 2020

No I don't think you can be blamed for them, they look like the regular set failures due to flaky CI.

@Lekensteyn
Copy link
Member

Lekensteyn commented May 27, 2020

Added CMake support, tested on Arch Linux with zstd 1.4.4-1.

I observed that the support for ZSTD_createDStream was added for the first time in zstd v0.8.1 (facebook/zstd@5a0c8e2), so the symbol check is indeed necessary. Otherwise curl could potentially fail to build on Ubuntu 16.04.

Copy link
Member

Lekensteyn left a comment

A couple of nits, looks good otherwise.

tests/runtests.pl Outdated Show resolved Hide resolved
.travis.yml Outdated Show resolved Hide resolved
.travis.yml Outdated Show resolved Hide resolved
env:
AZURE_ACCESS_TOKEN: "$(System.AccessToken)"
TFLAGS: ""

This comment has been minimized.

Copy link
@Lekensteyn

Lekensteyn May 27, 2020

Member

Any reason for adding a new job as opposed to combining it with another test?

This comment has been minimized.

Copy link
@gvollant

gvollant May 27, 2020

Author Contributor

Just I was afraid broken anything...

This comment has been minimized.

Copy link
@Lekensteyn

Lekensteyn May 27, 2020

Member

I'd probably reuse an existing job to reduce the carbon footprint, but it's up to @bagder :)

This comment has been minimized.

Copy link
@bagder

bagder May 27, 2020

Member

I agree with @Lekensteyn, we should rather combine this into an existing job. Not only the carbon footprint wins, we also get the complete results done faster.

This comment has been minimized.

Copy link
@gvollant

gvollant May 27, 2020

Author Contributor

I agree with you, but I prefer you choose the existing job to add zstd.

I must admit my focus was on zstd code, makefile and test and not the better CI solution... :-)

size_t errorCode;

if(!zp->decomp) {
zp->decomp = malloc(DSIZ);

This comment has been minimized.

Copy link
@bagder

bagder May 28, 2020

Member

I'm curious on this malloc, and maybe @felixhandte can help us out here (I tried to find docs that would explain this but I failed). How is this particular memory block size chosen? Is there a benefit to use any particular size and if so, what's the trade-offs? DSIZ is CURL_MAX_WRITE_SIZE, 16K by default.

This comment has been minimized.

Copy link
@gvollant

gvollant May 28, 2020

Author Contributor

this is size used by other decompressor (both brotli and zlib). I believed this was optimal size for Curl_unencode_write.
Zstd is happy with this size or other size.

This comment has been minimized.

Copy link
@felixhandte

felixhandte May 28, 2020

16KB should be fine here.

Optimizing output buffer sizes for Zstd decompression is mostly a best effort sort of thing, and generally not performance critical. If Zstd gets a frame that it knows will be smaller than the provided output buffer, it can decompress straight into that buffer. But if you receive a streamed response (i.e., one that doesn't announce its total size in the Frame_Content_Size field in the frame header), Zstd has to internally buffer the output anyways to maintain its history window. So the output buffer size doesn't really matter, we're just memcpy-ing into it from that internal buffer that we are actually doing the decompression into.

Which, by the way, raises the topic of window sizes. libzstd has a default limit of 128 MB, which means that the ZSTD_DCtx may internally allocate up to ~128 MB, and anything that requires more will be rejected. This limit can be configured: I'm not sure whether curl wants to expose this to users, or whether curl wants to select a different limit by default. The RFC recommends for interoperability that, in the absence of negotiation, encoders not use a window larger than 8 MB:

For improved interoperability, it's recommended for decoders to
support values of Window_Size up to 8 MB and for encoders not to
generate frames requiring a Window_Size larger than 8 MB. It's
merely a recommendation though, and decoders are free to support
larger or lower limits, depending on local limitations.

This comment has been minimized.

Copy link
@gvollant

gvollant May 31, 2020

Author Contributor

I made minor decompression code to follow @Cyan4973 suggestion about exit loop condition

@gvollant gvollant force-pushed the gvollant:zstd branch 2 times, most recently from 5bda517 to 17a62b4 May 29, 2020
@gvollant gvollant force-pushed the gvollant:zstd branch from 17a62b4 to 221b38e Jun 7, 2020
@gvollant gvollant force-pushed the gvollant:zstd branch 3 times, most recently from 2ac3e54 to c7a258b Jun 20, 2020
include zstd curl patch for Makefile.m32 from vszakats
and include Add CMake support for zstd from Peter Wu
@gvollant gvollant force-pushed the gvollant:zstd branch from c7a258b to a721de9 Jul 1, 2020
@gvollant
Copy link
Contributor Author

gvollant commented Jul 12, 2020

@bagder do you think we need more work/test on this feature?

@bagder
bagder approved these changes Jul 12, 2020
@bagder bagder closed this in e13357b Jul 12, 2020
@bagder
Copy link
Member

bagder commented Jul 12, 2020

Thanks!

@gvollant gvollant deleted the gvollant:zstd branch Jul 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants
You can’t perform that action at this time.