
Test various compression algorithms for the bazel binary #6318

Closed
meisterT opened this issue Oct 5, 2018 · 5 comments
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee)
team-Performance Issues for Performance teams

Comments

meisterT (Member) commented Oct 5, 2018

Different compression algorithms may affect binary size as well as decompression and extraction speed.

Things to try: brotli, zopfli, zlib, gzip.

@meisterT added this to the "shrinking the bazel binary" milestone Oct 5, 2018
@meisterT self-assigned this Oct 5, 2018
@jin added the team-Performance and untriaged labels Oct 6, 2018
@meisterT added the P3 label and removed the untriaged label Nov 29, 2018
baryluk commented Dec 1, 2018

Not a perfect methodology, but I took the pre-compiled bazel 0.20 for 64-bit Linux, ran it, and then did some tests on the ~/.cache/bazel/_bazel_user/install/3aadb0885b1cb52846820562c39bed63/_embedded_binaries directory.

Test machine: AMD Threadripper 2950X, DDR4-3200 ECC, Linux 4.18; all input and output data on memory-backed filesystems and in caches.

| Test name | Size (bytes) | Decompression runtime |
| --- | --- | --- |
| Original binary | 173026232 | 1.68s |
| Original binary (second run) | - | 0.74s |
| .../_embedded_binaries/ | 310526286 | - |
| tar | 311214080 | 0.13s |
| tar.gz (gzip -6) | 170102107 | 1.58s |
| tar.bz2 (bzip2 -9) | 157736665 | 9.74s |
| tar.7z | 125130702 | 5.84s |
| tar.xz | 131641348 | 6.98s |
| tar.zst (zstd, i.e. zstd -3) | 163888338 | 0.42s |
| tar.zst (zstd -6) | 157385878 | 0.43s |
| tar.zst (zstd -9) | 155396115 | 0.44s |
| tar.zst (zstd -19) | 141284906 | 0.49s |
| tar.lz4 (lz4 -9) | 178569048 | 0.40s |
| tar.lzo (lzop -9) | 176166773 | 0.68s |
| tar.br (brotli -Z) | 124703022 | 1.25s |
| tar.zip (zip -9) | 170102283 | 1.72s |
| zip -9 -n jar | 170607497 | 1.62s |
| tar.gz (zopfli --gzip --i15) | 167338727 | - |
| tar.zlib (zopfli --zlib --i15) | 167338715 | - |
| kzip (nonfree) | 167777573 | - |
| tar.dct (dact) | 157811867 | 10.39s |

Runtime is the wall-clock time to decompress and untar into a (memory-backed) filesystem.
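
Roughly, the measurements above can be reproduced with a script along these lines (a sketch covering only a subset of the formats, not the exact script used; paths are illustrative and everything is assumed to live on a tmpfs):

```bash
#!/bin/bash
# Sketch of the benchmark flow for a subset of formats; paths are illustrative.
# Assumes _embedded_binaries/ and out/ both live on a memory-backed filesystem.
set -e
tar -cf bazel.tar _embedded_binaries

gzip -6 -k bazel.tar                  # -> bazel.tar.gz
xz -k bazel.tar                       # -> bazel.tar.xz (default level -6)
zstd -19 bazel.tar -o bazel.tar.zst   # zstd keeps its input by default

for f in bazel.tar.gz bazel.tar.xz bazel.tar.zst; do
  ls -l "$f"                          # compressed size
  rm -rf out && mkdir out
  case "$f" in                        # wall-clock decompress + untar
    *.gz)  time bash -c "gzip -dc $f | tar -xf - -C out" ;;
    *.xz)  time bash -c "xz -dc $f   | tar -xf - -C out" ;;
    *.zst) time bash -c "zstd -dc $f | tar -xf - -C out" ;;
  esac
done
```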

zopfli decompression was not tested: its output is gzip-compatible, so decompression speed will be similar to gzip (~1.6s).

7z and a few other compressors have native support for multiple files, but that was not tested; only tar + single-file compression.

Compression with brotli (note that -Z, aka -q 11, is the default) takes ages: about 11 minutes on my machine!

Compression with zopfli (--i15 is the default) also takes ages: about 12 minutes for each variant on my machine.

All the other compressors finish in a few seconds, or under 1 minute for xz/7z/bz2 and zstd -9.

7z is also multithreaded, and there are multithreaded implementations of bzip2 (e.g. pbzip2, lbzip2).

dact is slow because for most blocks it internally selects bzip2 as giving the best compression.

In my opinion zstd offers excellent trade-offs: a good compression ratio, very fast decompression, and practical compression speed. zstd -19 takes 72 seconds on my machine.
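
To make that concrete, a hedged sketch of what a zstd-based pack/unpack step could look like (the paths and the $INSTALL_BASE variable are illustrative, not how bazel actually wires up its self-extraction):

```bash
# Pack once at release time: slow-ish at -19 (~72s above); -T0 uses all cores.
tar -cf - _embedded_binaries | zstd -19 -T0 -o embedded.tar.zst

# Unpack at first run: zstd decompression is fast (~0.5s in the table above).
zstd -dc embedded.tar.zst | tar -xf - -C "$INSTALL_BASE"
```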

I am using the standard precompiled binaries of all tools, as available in Debian testing.

$ dpkg -l | awk '{print $2, $3}' | egrep 'zip|7z|tar|xz|brotli|zopfli|zstd|lz4|lzop|gzip|rar|bzip|zutils|zpaq|ncompress|dact' | grep -v ^lib
brotli 1.0.7-1
bzip2 1.0.6-9
dact 0.8.42-4+b2
fcrackzip 1.0-9
fonts-cantarell 0.111-2
gzip 1.9-2.1
lbzip2 2.5-2
lrzip 0.631+git180528-1
lz4 1.8.2-1
lzip 1.20-3
lzop 1.03-4+b1
ncompress 4.2.4.4-23
needrestart 3.3-1
p7zip 16.02+dfsg-6
p7zip-full 16.02+dfsg-6
pbzip2 1.1.9-1+b1
rarcrack 0.2-1+b1
rarian-compat 0.8.1-6+b1
ruby-zip 1.2.1-1.1
rzip 2.1-4.1
tar 1.30+dfsg-2
unrar 1:5.6.6-1
unrar-free 1:0.0.1+cvs20140707-4
unzip 6.0-21
xz-utils 5.2.2-1.3
zip 3.0-11+b1
zopfli 1.0.2-1
zpaq 7.15-1
zstd 1.3.5+dfsg-1

Plus: kzipmix-20150319 (from Ken Silverman).

I can also test repacking all the jars into other formats and then reassembling the original content on extraction; a rough sketch of the idea is below.
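
For instance (a rough, hypothetical sketch; some_lib.jar is a placeholder name), each jar could be rebuilt with stored (uncompressed) entries so the outer compressor sees the raw class files:

```bash
# Hypothetical repacking sketch: some_lib.jar is a placeholder name.
mkdir tmp
(cd tmp && unzip -q ../some_lib.jar)               # expand the deflated jar
(cd tmp && zip -q -r -0 ../some_lib.stored.jar .)  # rebuild with no per-entry deflate
tar -cf - some_lib.stored.jar | zstd -19 -o some_lib.tar.zst
# The JVM can read stored jars directly, so no re-deflate is needed on extraction.
```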

meisterT (Member, Author) commented Dec 3, 2018

Thanks for testing! Do you happen to have a script handy to do that? If so, please rerun on a bazel built from HEAD (with minimal JDK, after commit a7f07cb).

meisterT (Member, Author) commented Dec 3, 2018

Scratch that. We broke the build and the commit was rolled back. I'll update the bug once this is in a testable state again.

meisterT (Member, Author) commented Apr 9, 2019

We reduced the size at HEAD significantly (~70MB now), so we might want to rerun those numbers.

meisterT (Member, Author) commented Mar 4, 2020

We have no plans to replace the compression algorithm at this point, so I am closing this.
