
Enable compression in netty #8486

Closed
deepthidevaki opened this issue Dec 28, 2021 · 9 comments · Fixed by #8502

@deepthidevaki (Contributor)

Is your feature request related to a problem? Please describe.

More and more use cases are coming up where Zeebe is deployed across multiple data centers. That means the communication latency between two Zeebe brokers can be high.

When there is high latency between nodes, we have previously observed frequent leader changes.

While we expect that higher latency will impact the commit latency, and thus the overall process execution time and throughput, frequent leader changes are not acceptable. One way to prevent leader changes is to increase the election timeout. This means that failover time is also higher, as it takes longer to start an election when the current leader dies.

One of the causes for the frequent leader changes is the bandwidth limitation of TCP when the RTT is high.
https://accedian.com/blog/measuring-network-performance-latency-throughput-packet-loss/

Since we are sending a lot of data over the network, the latency has a big impact on replication throughput. This causes a lot of requests to time out, which in turn triggers leader elections.

Describe the solution you'd like

  • Compress messages that are sent over the network. It is easy to plug a compression algorithm into Netty.
    We can enable compression by adding the following lines when creating a channel (BasicServer/ClientChannelInitializer#initChannel); a fuller sketch follows below:
        channel.pipeline().addLast(ZlibCodecFactory.newZlibEncoder(ZlibWrapper.GZIP));
        channel.pipeline().addLast(ZlibCodecFactory.newZlibDecoder(ZlibWrapper.GZIP));

There are also other compression algorithms available.
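
As a minimal sketch of what that could look like (assuming Netty 4.x; the class name here is hypothetical, and in Zeebe the handlers would be registered in the existing BasicServer/ClientChannelInitializer#initChannel):

    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.handler.codec.compression.ZlibCodecFactory;
    import io.netty.handler.codec.compression.ZlibWrapper;

    // Hypothetical initializer that enables GZIP compression on a channel.
    final class GzipChannelInitializer extends ChannelInitializer<SocketChannel> {

      @Override
      protected void initChannel(final SocketChannel channel) {
        // The encoder compresses outbound bytes, the decoder decompresses inbound bytes.
        // Both ends of the connection must install the same codec pair.
        channel.pipeline().addLast(ZlibCodecFactory.newZlibEncoder(ZlibWrapper.GZIP));
        channel.pipeline().addLast(ZlibCodecFactory.newZlibDecoder(ZlibWrapper.GZIP));
        // ...followed by the existing protocol handlers (framing, message codecs, etc.).
      }
    }

Because the compression codecs are added ahead of the protocol handlers, they sit closest to the socket and transparently (de)compress the whole byte stream.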

Describe alternatives you've considered

  • To prevent frequent leader changes, we can increase the election timeout. But this would mean failover time is higher, as it takes longer to detect a leader failure.
  • Compress records before writing them to disk, so that Raft replicates already-compressed records. This may have the additional benefit of reducing disk I/O.

Additional context
See the comment below for benchmark results from a prototype.

@deepthidevaki deepthidevaki added the kind/feature Categorizes an issue or PR as a feature, i.e. new behavior label Dec 28, 2021
@deepthidevaki (Contributor, Author)

Here are benchmark results from a prototype where we introduced high latency between brokers on top of the default benchmark setup. We introduced the latency by running the following command on all brokers:

delay=30ms
kubectl --namespace $namespace exec $namespace-zeebe-0 -- tc qdisc replace dev eth0 root netem delay $delay

This introduces an RTT of 60ms.

  • First, we ran with a base image, high latency, and no compression enabled.
    [benchmark screenshots]
    Although this configuration did not introduce a leader change, we observed heartbeat misses:
    [heartbeat metrics screenshot]

  • Second, we ran with compression enabled in Netty and the same RTT of 60ms.
    [benchmark screenshots]

    With compression, the throughput is almost 3 times higher.

  • Third, we ran with compression, the default benchmark setup, and no additional latency.
    [benchmark screenshots]

    Compare this with the latest weekly benchmark:
    [benchmark screenshots]

    There is no big impact on performance.

Also see the effect of compression:
[metrics screenshots]

@deepthidevaki deepthidevaki added the scope/broker Marks an issue or PR to appear in the broker section of the changelog label Dec 28, 2021
@npepinpe npepinpe added this to Ready in Zeebe Dec 28, 2021
@npepinpe (Member) commented Dec 28, 2021

Go for it, it seems like easy, low-hanging fruit. We would need to pick the right compression algo. Common wisdom is gzip, as it has a very low memory impact, but it doesn't have the best compression/decompression times and/or ratio. Snappy is the one I usually see mentioned for streaming use cases, since it has low memory and CPU impact (at the cost of worse compression). But I guess GZip is usually pretty balanced.

I'm not sure how these compare to Zlib, Zstd, or Lzx tbh.

I would propose GZip, if only because it's balanced and AFAIK the most used for TCP/HTTP compression, but I'm happy to allocate time for benchmarking it against other algos which have different concerns (e.g. lz4 or snappy, which are optimized for fast compression/decompression at the cost of ratio).

@deepthidevaki (Contributor, Author)

We can also make it configurable and make more than one algorithm available. I think, for us, optimizing for time/CPU/memory is more important than the compression ratio.

@npepinpe (Member)

Then I would propose offering GZip and Snappy. Neither requires much configuration (as opposed to, say, Brotli); one is focused on compression ratio and the other on low resource usage. Of course we can discuss other alternatives (e.g. I'm not familiar with the difference between lz4 and Snappy, both have similar focuses).
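
As a rough sketch of how such a configurable choice could map onto Netty handlers (assuming Netty 4.x; the enum and class names here are hypothetical and not necessarily what the actual change uses):

    import io.netty.channel.socket.SocketChannel;
    import io.netty.handler.codec.compression.SnappyFrameDecoder;
    import io.netty.handler.codec.compression.SnappyFrameEncoder;
    import io.netty.handler.codec.compression.ZlibCodecFactory;
    import io.netty.handler.codec.compression.ZlibWrapper;

    // Hypothetical configuration value for the compression algorithm.
    enum CompressionAlgorithm {
      NONE,
      GZIP,
      SNAPPY
    }

    final class CompressionConfigurer {

      // Adds the codec pair matching the configured algorithm to the channel pipeline.
      static void configure(final SocketChannel channel, final CompressionAlgorithm algorithm) {
        switch (algorithm) {
          case GZIP:
            // Better compression ratio, somewhat higher CPU cost.
            channel.pipeline().addLast(ZlibCodecFactory.newZlibEncoder(ZlibWrapper.GZIP));
            channel.pipeline().addLast(ZlibCodecFactory.newZlibDecoder(ZlibWrapper.GZIP));
            break;
          case SNAPPY:
            // Lower compression ratio, but low CPU and memory overhead.
            channel.pipeline().addLast(new SnappyFrameEncoder());
            channel.pipeline().addLast(new SnappyFrameDecoder());
            break;
          case NONE:
          default:
            // No compression handlers added; the pipeline stays unchanged.
            break;
        }
      }
    }

Both ends of a connection would need to be configured with the same algorithm, since the decoder on one side must match the encoder on the other.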

@deepthidevaki deepthidevaki self-assigned this Jan 3, 2022
@lenaschoenburg (Member)

I think zstd would be a strong contender as well, maybe even instead of gzip. It should be faster than gzip with the same compression ratio.

@deepthidevaki (Contributor, Author)

I have added Gzip and Snappy in #8502. We can add zstd as a follow-up. Right now, I don't see any performance impact when using Gzip, so I don't know if we will see any difference when using zstd. Besides, I had already started testing with Gzip before your comment, @oleschoenburg.

@falko (Member) commented Feb 12, 2022

Did you measure how the CPU usage changes with and without compression? Are we penalizing users with low network latency, i.e. running all brokers in the same data center, if we enable this by default?

@falko (Member) commented Feb 12, 2022

Okay, I saw that it's optional. From your benchmarks, do you have a feeling for the network latency at which it becomes worth the CPU investment?

@npepinpe (Member)

IIRC, Deepthi/Ole didn't notice any performance impact with either gzip or snappy. That said, Snappy is specifically designed for the use case where you want a bit of compression for very little overhead. So users worried about overhead but who may face a bit of latency could use Snappy. I would recommend starting with GZIP in general, and if you see no difference, stick with it, as it gives the best compression. If there is some impact, switch to Snappy, and if you're sure latency is not a concern, just keep compression disabled.

This issue was closed.