Add zstd compression #1278

Merged: 64 commits from add-zstd-compression into dev, Apr 5, 2023
Conversation

danlaine (Collaborator)

Still needs resolution on the following:

  • Do we want to keep compression as a binary enabled/disabled setting, or allow the user to specify a compression type?
  • For the existing unit tests, should we use zstd, or duplicate the tests for both gzip and zstd?

Note this changes metric names from [message type]_compress_time / [message type]_decompress_time to zstd_[message type]_compress_time and gzip_[message type]_compress_time. Grafana dashboards will need to be updated accordingly.

Note this deprecates the --network-compression-enabled flag in favor of the new --network-compression-type flag.
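As a rough illustration of what the new flag's value could map to in code, here is a minimal sketch assuming a hypothetical Type enum and TypeFromString parser; the names and accepted strings are assumptions, not taken from this PR:

```go
package compression

import (
	"errors"
	"fmt"
)

// Type is a hypothetical enum for the value passed to
// --network-compression-type.
type Type byte

const (
	TypeNone Type = iota
	TypeGzip
	TypeZstd
)

var errUnknownCompressionType = errors.New("unknown compression type")

// TypeFromString sketches how the flag's string value could be parsed;
// the accepted strings here are assumptions.
func TypeFromString(s string) (Type, error) {
	switch s {
	case "none":
		return TypeNone, nil
	case "gzip":
		return TypeGzip, nil
	case "zstd":
		return TypeZstd, nil
	default:
		return TypeNone, fmt.Errorf("%w: %q", errUnknownCompressionType, s)
	}
}
```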

Why this should be merged

zstd compression appears significantly faster than gzip, and marginally better at compressing messages.

How this works

Adds a new zstd Compressor. The default behavior is still gzip. zstd is forbidden (via config) until v1.10.0. We need to update this so that it's forbidden until the network upgrade time passes, but there is no variable in the code for that yet.
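For orientation, a minimal sketch of what a zstd-backed Compressor could look like, using github.com/klauspost/compress/zstd. The actual PR may use a different zstd library; the Compressor interface, constructor name, and maxSize handling here are assumptions based on the review excerpts below:

```go
package compression

import (
	"fmt"

	"github.com/klauspost/compress/zstd"
)

// zstdCompressor is a sketch of a Compressor backed by zstd.
type zstdCompressor struct {
	maxSize int64
	encoder *zstd.Encoder
	decoder *zstd.Decoder
}

// NewZstdCompressor returns a compressor that refuses to produce
// decompressed output larger than maxSize.
func NewZstdCompressor(maxSize int64) (*zstdCompressor, error) {
	// A nil writer/reader is fine when only EncodeAll/DecodeAll are used.
	encoder, err := zstd.NewWriter(nil)
	if err != nil {
		return nil, err
	}
	decoder, err := zstd.NewReader(nil)
	if err != nil {
		return nil, err
	}
	return &zstdCompressor{
		maxSize: maxSize,
		encoder: encoder,
		decoder: decoder,
	}, nil
}

func (z *zstdCompressor) Compress(msg []byte) ([]byte, error) {
	if int64(len(msg)) > z.maxSize {
		return nil, fmt.Errorf("msg length (%d) > max size (%d)", len(msg), z.maxSize)
	}
	return z.encoder.EncodeAll(msg, nil), nil
}

func (z *zstdCompressor) Decompress(msg []byte) ([]byte, error) {
	decompressed, err := z.decoder.DecodeAll(msg, nil)
	if err != nil {
		return nil, err
	}
	// Note: this naive sketch checks the size only after full decompression;
	// the review thread below discusses bounding the work during
	// decompression instead.
	if int64(len(decompressed)) > z.maxSize {
		return nil, fmt.Errorf("decompressed msg too large: (%d) > (%d)", len(decompressed), z.maxSize)
	}
	return decompressed, nil
}
```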

How this was tested

Existing and new unit tests.
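On the open question above about duplicating tests, a hedged sketch of a table-driven round-trip test that could run against both compressors; the constructor names, the 2 MiB limit, and the anonymous interface are assumptions, not this PR's test code:

```go
package compression

import (
	"bytes"
	"testing"
)

// TestCompressorRoundTrip sketches a table-driven test that exercises
// compress/decompress round-trips for multiple compressor implementations.
func TestCompressorRoundTrip(t *testing.T) {
	const maxSize = 2 * 1024 * 1024 // 2 MiB; an assumed limit

	zc, err := NewZstdCompressor(maxSize)
	if err != nil {
		t.Fatal(err)
	}

	compressors := map[string]interface {
		Compress([]byte) ([]byte, error)
		Decompress([]byte) ([]byte, error)
	}{
		"zstd": zc,
		// "gzip": NewGzipCompressor(maxSize), // assumed constructor
	}

	msg := bytes.Repeat([]byte("avalanchego p2p message "), 1024)

	for name, c := range compressors {
		t.Run(name, func(t *testing.T) {
			compressed, err := c.Compress(msg)
			if err != nil {
				t.Fatal(err)
			}
			decompressed, err := c.Decompress(compressed)
			if err != nil {
				t.Fatal(err)
			}
			if !bytes.Equal(msg, decompressed) {
				t.Fatalf("round trip mismatch: got %d bytes, want %d", len(decompressed), len(msg))
			}
		})
	}
}
```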

Resolved review threads on message/messages.go and utils/compression/zstd_compressor.go.

@StephenButtolph (Contributor) left a comment:


LGTM - up to you if you want to add the log or not.

@StephenButtolph StephenButtolph added this to the v1.10.0 (Cortina) milestone Apr 4, 2023
Comment on lines +54 to +55
if int64(len(decompressed)) > z.maxSize {
return nil, fmt.Errorf("%w: (%d) > (%d)", ErrDecompressedMsgTooLarge, len(decompressed), z.maxSize)
@joshua-kim (Contributor) commented on Apr 5, 2023:

Why do we return an error here? At this point we've already decompressed the payload, so it seems like a waste to drop this message. I think it makes sense to limit the size of the thing we're compressing or the size of the thing we're decompressing, but it feels strange to limit the size of the result of the decompression.

Contributor:

If we don't do this error handling, then I think we can also skip the weird extra byte allocation in the earlier line.

Contributor:

We did not decompress the message in this case. We stopped decompressing the message because it may have been a zip bomb.
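For context on the zip-bomb concern, a minimal illustrative sketch (not the PR's actual code) of bounded decompression with a streaming zstd reader: decompression is capped at maxSize+1 bytes, so an over-sized payload is rejected without being fully expanded, and the one extra byte is plausibly the "extra byte allocation" mentioned above.

```go
package compression

import (
	"bytes"
	"fmt"
	"io"

	"github.com/klauspost/compress/zstd"
)

// decompressBounded is an illustrative helper: it reads at most maxSize+1
// decompressed bytes, so a zip bomb is never fully expanded in memory
// before the size check fires.
func decompressBounded(compressed []byte, maxSize int64) ([]byte, error) {
	reader, err := zstd.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer reader.Close()

	// Read one byte more than allowed. If we get maxSize+1 bytes back, the
	// payload is too large and we abort without decompressing the remainder.
	limited := io.LimitReader(reader, maxSize+1)
	decompressed, err := io.ReadAll(limited)
	if err != nil {
		return nil, err
	}
	if int64(len(decompressed)) > maxSize {
		return nil, fmt.Errorf("decompressed msg too large: (%d) > (%d)", len(decompressed), maxSize)
	}
	return decompressed, nil
}
```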

@StephenButtolph StephenButtolph merged commit 529d7be into dev Apr 5, 2023
14 checks passed
@StephenButtolph StephenButtolph deleted the add-zstd-compression branch April 5, 2023 23:47
@g1mv commented on Apr 6, 2023:

During hypersdk load testing, I found that avalanchego spends the majority of time performing compression-related tasks (zstd should help a ton here):

Showing nodes accounting for 16600ms, 71.55% of 23200ms total
Dropped 486 nodes (cum <= 116ms)
Showing top 10 nodes out of 233
      flat  flat%   sum%        cum   cum%
    3040ms 13.10% 13.10%     7990ms 34.44%  compress/flate.(*compressor).deflate
    2500ms 10.78% 23.88%     3170ms 13.66%  compress/flate.(*decompressor).huffSym
    2160ms  9.31% 33.19%     2160ms  9.31%  runtime.memmove
    1820ms  7.84% 41.03%     1820ms  7.84%  crypto/sha256.block
    1730ms  7.46% 48.49%     1730ms  7.46%  runtime/internal/syscall.Syscall6
    1550ms  6.68% 55.17%     1670ms  7.20%  github.com/golang/snappy.encodeBlock
    1460ms  6.29% 61.47%     2290ms  9.87%  compress/flate.(*compressor).findMatch
     820ms  3.53% 65.00%      820ms  3.53%  compress/flate.matchLen (inline)
     760ms  3.28% 68.28%      760ms  3.28%  compress/flate.(*dictDecoder).writeByte
     760ms  3.28% 71.55%      760ms  3.28%  runtime.memclrNoHeapPointers

Out of interest, do you have a sample of the data you're compressing/decompressing (anything available, even a bulk network dump)? After a peek at the code, it does seem to be mainly network messaging data, if I am not mistaken?

@danlaine (Collaborator, Author) commented on Apr 6, 2023:

> During hypersdk load testing, I found that avalanchego spends the majority of time performing compression-related tasks (zstd should help a ton here): [profile output and question quoted above]

Can't speak to the composition of messages during the test Patrick mentioned, but yes, this compression is only used for P2P messages.

hexfusion pushed a commit to hexfusion/avalanchego that referenced this pull request Apr 11, 2023
Labels
enhancement (New feature or request), networking (This involves networking)
6 participants