
Added Ctx compress/decompress #83

Merged · 3 commits · Jun 17, 2020

Conversation

merlimat (Contributor)

Motivation

According to https://facebook.github.io/zstd/zstd_manual.html#Chapter4, when doing repeated compression/decompression operations it's recommended to reuse a context object so that internal state is preserved:

  When compressing many times,
  it is recommended to allocate a context just once,
  and re-use it for each successive compression operation.
  This will make workload friendlier for system's memory.
  Note : re-using context is just a speed / resource optimization.
         It doesn't change the compression ratio, which remains identical.
  Note 2 : In multi-threaded environments,
         use one different context per thread for parallel execution.

While this could be done similarly through the stream API, that doesn't come naturally when compressing a []byte, and it doesn't offer a way to pass a dst buffer for the result.

Modifications

To expose ZSTD_compressCCtx() and ZSTD_decompressDCtx(), this adds a Ctx interface:

type Ctx interface {
	// Compress src into dst.  If you have a buffer to use, you can pass it to
	// prevent allocation.  If it is too small, or if nil is passed, a new buffer
	// will be allocated and returned.
	Compress(dst, src []byte) ([]byte, error)

	// CompressLevel is the same as Compress but you can pass a compression level
	CompressLevel(dst, src []byte, level int) ([]byte, error)

	// Decompress src into dst.  If you have a buffer to use, you can pass it to
	// prevent allocation.  If it is too small, or if nil is passed, a new buffer
	// will be allocated and returned.
	Decompress(dst, src []byte) ([]byte, error)

	io.Closer
}

Example:

ctx := zstd.NewCtx()

out1, err := ctx.Compress(nil, input1)
out2, err := ctx.Compress(nil, input2)
// ...

ctx.Close()
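
For parallel workloads, the manual's Note 2 quoted above suggests one context per thread. A minimal sketch of what that could look like with this API (the payloads, worker count, and error handling here are made up for illustration):

package main

import (
	"log"
	"sync"

	"github.com/DataDog/zstd"
)

func main() {
	// Hypothetical payloads; in practice these come from the application.
	inputs := [][]byte{
		[]byte("first payload"),
		[]byte("second payload"),
		[]byte("third payload"),
	}
	results := make([][]byte, len(inputs))

	const workers = 2
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			// One Ctx per goroutine, per the manual's Note 2; a single
			// context is not meant to be shared across threads.
			ctx := zstd.NewCtx()
			for i := w; i < len(inputs); i += workers {
				out, err := ctx.Compress(nil, inputs[i])
				if err != nil {
					log.Println("compress:", err) // sketch only; real code would propagate this
					continue
				}
				results[i] = out
			}
		}(w)
	}
	wg.Wait()
	log.Printf("compressed %d payloads", len(results))
}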

Microbenchmark

BenchmarkCtxCompression
BenchmarkCtxCompression-16         	     207	   5189899 ns/op	 345.87 MB/s
BenchmarkCtxDecompression
    BenchmarkCtxDecompression: zstd_ctx_test.go:166: Reduced from 1795030 to 119090
    BenchmarkCtxDecompression: zstd_ctx_test.go:166: Reduced from 1795030 to 119090
    BenchmarkCtxDecompression: zstd_ctx_test.go:166: Reduced from 1795030 to 119090
BenchmarkCtxDecompression-16       	    1548	    679185 ns/op	2642.92 MB/s
BenchmarkStreamCompression
BenchmarkStreamCompression-16      	    5766	    230298 ns/op	7794.38 MB/s
BenchmarkStreamDecompression
BenchmarkStreamDecompression-16    	    1048	   1578035 ns/op	1137.51 MB/s
BenchmarkCompression
BenchmarkCompression-16            	     150	   7384646 ns/op	 243.08 MB/s
BenchmarkDecompression
    BenchmarkDecompression: zstd_test.go:189: Reduced from 1795030 to 119090
    BenchmarkDecompression: zstd_test.go:189: Reduced from 1795030 to 119090
    BenchmarkDecompression: zstd_test.go:189: Reduced from 1795030 to 119090
BenchmarkDecompression-16          	    1357	    783583 ns/op	2290.80 MB/s

The benchmark shows that reusing the context improves the overall performance:

  • Compression : 243 MB/s --> 345 MB/s
  • Decompression: 2290 MB/s --> 2642 MB/s
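
For reference, a minimal sketch of how such a context-reuse benchmark could be shaped. This is not the code in zstd_ctx_test.go, just the general pattern with a made-up payload and a hypothetical benchmark name; it also reuses the dst buffer so steady-state iterations avoid allocations:

package zstd_test

import (
	"bytes"
	"testing"

	"github.com/DataDog/zstd"
)

func BenchmarkCtxCompressionSketch(b *testing.B) {
	// Made-up, compressible payload standing in for the real test fixture.
	payload := bytes.Repeat([]byte("some fairly compressible input "), 1<<15)

	ctx := zstd.NewCtx()
	dst, err := ctx.Compress(nil, payload) // size dst once; later calls reuse it
	if err != nil {
		b.Fatal(err)
	}

	b.SetBytes(int64(len(payload)))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if dst, err = ctx.Compress(dst, payload); err != nil {
			b.Fatal(err)
		}
	}
}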

@Viq111 (Collaborator) left a comment

Thanks for your contribution @merlimat!
This looks good, I just added some comments around SetFinalizer vs a Close pattern.

Thanks as well for adding the tests and benchmarks; they really highlight the usefulness of this addition.

zstd_ctx.go (outdated)
}

func (c *ctx) Close() error {
	if err := getError(int(C.ZSTD_freeCCtx(c.cctx))); err != nil {
Viq111 (Collaborator)

I think we should probably try to free both and return the first error if any:

err1 := getError(int(C.ZSTD_freeCCtx(c.cctx)))
err2 := getError(int(C.ZSTD_freeDCtx(c.dctx)))
if err1 != nil {
  return err1
}
return err2

That way you don't prevent the second context from being freed if the first free fails.

This should also probably be gated by a boolean so we don't call freeCCtx twice on the same pointer.

As pointed out earlier, though, I think a finalizer might be a better fit in this particular case: all we do is free memory, it's more user-friendly, it's not the end of the world if the finalizer is never called, and you're sure it's only called once.
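
For context, here's the rough shape of the SetFinalizer approach being discussed. This is only a sketch, with hypothetical type and function names; the real cgo fields and calls are left as comments since they can't be included in a self-contained snippet:

package zstdsketch

import "runtime"

// ctx mirrors the shape of the wrapper type above; the real struct holds
// *C.ZSTD_CCtx / *C.ZSTD_DCtx pointers instead of these placeholders.
type ctx struct {
	cctx uintptr // placeholder for *C.ZSTD_CCtx
	dctx uintptr // placeholder for *C.ZSTD_DCtx
}

// newCtx allocates the underlying contexts and registers a finalizer so the
// C memory is released when the Go value becomes unreachable.
func newCtx() *ctx {
	c := &ctx{
		// cctx: C.ZSTD_createCCtx(),
		// dctx: C.ZSTD_createDCtx(),
	}
	runtime.SetFinalizer(c, finalizeCtx)
	return c
}

// The runtime runs a finalizer at most once per object, which sidesteps the
// double-free concern; a boolean guard is only needed if an explicit Close
// is kept alongside the finalizer.
func finalizeCtx(c *ctx) {
	// C.ZSTD_freeCCtx(c.cctx)
	// C.ZSTD_freeDCtx(c.dctx)
}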

merlimat (Contributor, Author)

The only thing with the finalizer function is that I don't know what to do with the error codes, since we're not bubbling them up. Is there any convention on how to log errors?

Viq111 (Collaborator)

Yes, that's a good point.
Usually what I've seen in the Go ecosystem is that libraries use the standard log package (examples: zookeeper, sarama).
But most of the time the main program uses another logging system (logrus, zap), so it's hard to provide standard logs.

For the current case the error is not very actionable (if the free fails), so I think we are OK not logging it.
We'll need to revisit this, though, if we ever need to add additional logging.

merlimat and others added 2 commits June 16, 2020 12:55
Co-authored-by: Vianney Tran <vianney.tran@datadoghq.com>
merlimat (Contributor, Author)

@Viq111 Thanks for the feedback, I've changed it to use a finalizer.

@Viq111 (Collaborator) left a comment

Thanks for your contribution, this looks great!


Viq111 commented Jun 16, 2020

I'll merge it tomorrow morning EST (since it's nearing the end of the business day now).

The CircleCI run gives the following benchmark results:
https://app.circleci.com/pipelines/github/DataDog/zstd/38/workflows/16e654d5-bced-4e74-af20-a20fd83e05c9/jobs/122/steps

BenchmarkCtxCompression-36         	      10	 147802605 ns/op	  67.46 MB/s
BenchmarkCtxDecompression-36       	     100	  15460314 ns/op	 644.91 MB/s
--- BENCH: BenchmarkCtxDecompression-36
    zstd_ctx_test.go:158: Reduced from 9970564 to 3402985
    zstd_ctx_test.go:158: Reduced from 9970564 to 3402985
BenchmarkStreamCompression-36      	      10	 154451392 ns/op	  64.55 MB/s
BenchmarkStreamDecompression-36    	     100	  19827954 ns/op	 502.85 MB/s
BenchmarkCompression-36            	      10	 143375341 ns/op	  69.54 MB/s
BenchmarkDecompression-36          	     100	  16590654 ns/op	 600.97 MB/s
--- BENCH: BenchmarkDecompression-36
    zstd_test.go:189: Reduced from 9970564 to 3402985
    zstd_test.go:189: Reduced from 9970564 to 3402985
PASS

So we still see the improvement, but not as much, since mr is a big payload.

For a small payload (I took zstd.go) on my machine:

ᐅ go test -bench . -run None
goos: darwin
goarch: amd64
pkg: github.com/DataDog/zstd
BenchmarkCtxCompression-16         	   30886	     38165 ns/op	 110.84 MB/s
BenchmarkCtxDecompression-16       	  158637	      7432 ns/op	 569.15 MB/s
BenchmarkCompression-16            	   31852	     40295 ns/op	 104.98 MB/s
BenchmarkDecompression-16          	  160329	      7761 ns/op	 545.05 MB/s
