Consider zstd for compression of shipped artifacts #1660

Open
dustymabe opened this issue Jan 31, 2024 · 12 comments

@dustymabe
Member

I did some investigation into zstd as our default compression algorithm. I set the zstd compression level to 19 and the xz level to 9 (what we use today). Here are the compression and decompression times with xz for the metal and qemu artifacts:

Targeting build: 39.20240131.dev.0
Compressing: builds/39.20240131.dev.0/x86_64
2024-01-31 04:30:40,161 INFO - Running command: ['xz', '-c9', '-T12', 'builds/39.20240131.dev.0/x86_64/fedora-coreos-39.20240131.dev.0-ostree.x86_64-manifest.json']
Compressed: fedora-coreos-39.20240131.dev.0-ostree.x86_64-manifest.json.xz
2024-01-31 04:30:40,209 INFO - Running command: ['xz', '-c9', '-T12', 'builds/39.20240131.dev.0/x86_64/fedora-coreos-39.20240131.dev.0-qemu.x86_64.qcow2']
Compressed: fedora-coreos-39.20240131.dev.0-qemu.x86_64.qcow2.xz
2024-01-31 04:32:34,082 INFO - Running command: ['xz', '-c9', '-T12', 'builds/39.20240131.dev.0/x86_64/fedora-coreos-39.20240131.dev.0-metal.x86_64.raw']
Compressed: fedora-coreos-39.20240131.dev.0-metal.x86_64.raw.xz
Skipped compressing artifacts: ostree
Updated: builds/39.20240131.dev.0/x86_64/meta.json
+ rc=0
+ set +x

real    3m50.097s
user    0m0.155s
sys     0m0.153s


Targeting build: 39.20240131.dev.0
Uncompressing: builds/39.20240131.dev.0/x86_64
2024-01-31 04:51:32,434 INFO - Running command: ['xz', '-dc', '-T12', 'builds/39.20240131.dev.0/x86_64/fedora-coreos-39.20240131.dev.0-ostree.x86_64-manifest.json.xz']
Uncompressed: fedora-coreos-39.20240131.dev.0-ostree.x86_64-manifest.json
2024-01-31 04:51:32,452 INFO - Running command: ['xz', '-dc', '-T12', 'builds/39.20240131.dev.0/x86_64/fedora-coreos-39.20240131.dev.0-qemu.x86_64.qcow2.xz']
Uncompressed: fedora-coreos-39.20240131.dev.0-qemu.x86_64.qcow2
2024-01-31 04:51:38,337 INFO - Running command: ['xz', '-dc', '-T12', 'builds/39.20240131.dev.0/x86_64/fedora-coreos-39.20240131.dev.0-metal.x86_64.raw.xz']
Uncompressed: fedora-coreos-39.20240131.dev.0-metal.x86_64.raw
Skipped uncompressing artifacts: ostree
Updated: builds/39.20240131.dev.0/x86_64/meta.json
+ rc=0
+ set +x

real    0m13.809s
user    0m0.066s
sys     0m0.070s

and here is what I see for zstd:

Compressing: builds/39.20240131.dev.1/x86_64
2024-01-31 04:42:08,112 INFO - Running command: ['zstd', '-19', '-c', '-T12', 'builds/39.20240131.dev.1/x86_64/fedora-coreos-39.20240131.dev.1-ostree.x86_64-manifest.json']
Compressed: fedora-coreos-39.20240131.dev.1-ostree.x86_64-manifest.json.zst
2024-01-31 04:42:08,138 INFO - Running command: ['zstd', '-19', '-c', '-T12', 'builds/39.20240131.dev.1/x86_64/fedora-coreos-39.20240131.dev.1-qemu.x86_64.qcow2']
Compressed: fedora-coreos-39.20240131.dev.1-qemu.x86_64.qcow2.zst
2024-01-31 04:43:35,600 INFO - Running command: ['zstd', '-19', '-c', '-T12', 'builds/39.20240131.dev.1/x86_64/fedora-coreos-39.20240131.dev.1-metal.x86_64.raw']
Compressed: fedora-coreos-39.20240131.dev.1-metal.x86_64.raw.zst
Skipped compressing artifacts: ostree
Updated: builds/39.20240131.dev.1/x86_64/meta.json
+ rc=0
+ set +x

real    3m2.790s
user    0m0.124s
sys     0m0.150s


Targeting build: 39.20240131.dev.1
Uncompressing: builds/39.20240131.dev.1/x86_64
2024-01-31 04:50:07,629 INFO - Running command: ['zstd', '-dc', 'builds/39.20240131.dev.1/x86_64/fedora-coreos-39.20240131.dev.1-ostree.x86_64-manifest.json.zst']
Uncompressed: fedora-coreos-39.20240131.dev.1-ostree.x86_64-manifest.json
2024-01-31 04:50:07,636 INFO - Running command: ['zstd', '-dc', 'builds/39.20240131.dev.1/x86_64/fedora-coreos-39.20240131.dev.1-qemu.x86_64.qcow2.zst']
Uncompressed: fedora-coreos-39.20240131.dev.1-qemu.x86_64.qcow2
2024-01-31 04:50:10,480 INFO - Running command: ['zstd', '-dc', 'builds/39.20240131.dev.1/x86_64/fedora-coreos-39.20240131.dev.1-metal.x86_64.raw.zst']
Uncompressed: fedora-coreos-39.20240131.dev.1-metal.x86_64.raw
Skipped uncompressing artifacts: ostree
Updated: builds/39.20240131.dev.1/x86_64/meta.json
+ rc=0
+ set +x

real    0m9.579s
user    0m0.051s
sys     0m0.071s

and here is how the sizes compare (xz build `dev.0` first, then zstd build `dev.1`):

        "qemu": {
            "path": "fedora-coreos-39.20240131.dev.0-qemu.x86_64.qcow2.xz",
            "sha256": "5e594eb29feb65e670e8c7e175d9b69eb31643ae9891074856bbd32b8bef2d56",
            "size": "662MiB",
            "uncompressed-sha256": "a117e5c02b04d93e158e246eca7409447d4808fd63e1d4a012fb688c613fc0e6",
            "uncompressed-size": "1609MiB"
        },
        "metal": {
            "path": "fedora-coreos-39.20240131.dev.0-metal.x86_64.raw.xz",
            "sha256": "132bc17c89ba82b9d0e91c3886b92447c0d1893c7c05ddeccc99b11706ec7b3a",
            "size": "661MiB",
            "uncompressed-sha256": "a8c1f04549136b3828bcb1beea7105f3b1ee70b17682cd9da5034a3ccf73b16c",
            "uncompressed-size": "2506MiB"
        }
        "qemu": {
            "path": "fedora-coreos-39.20240131.dev.1-qemu.x86_64.qcow2.zst",
            "sha256": "7697713189ff720a2a082b23948365fcdc6c71244f127ab6a16c99b11c2aec5e",
            "size": "720MiB",
            "uncompressed-sha256": "693edcc03dcb202775424c6fc4d9757a2042b374335100800b802fd0f82048e3",
            "uncompressed-size": "1609MiB"
        },
        "metal": {
            "path": "fedora-coreos-39.20240131.dev.1-metal.x86_64.raw.zst",
            "sha256": "563082baaef35847307f5ebff796992bfa1826589453861abdd905eec0d77dca",
            "size": "714MiB",
            "uncompressed-sha256": "0b802bd0a7b45b3a40760a25282c5bd8cccaa06ec180cfd87bcf033d50dde25d",
            "uncompressed-size": "2506MiB"
        }

To summarize:

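| Algo | Time Compress | Time Decompress | QEMU Uncompressed | QEMU Compressed | Metal Uncompressed | Metal Compressed |
|------|---------------|-----------------|-------------------|-----------------|--------------------|------------------|
| xz   | 3m50.097s     | 0m13.809s       | 1609MiB           | 662MiB          | 2506MiB            | 661MiB           |
| zstd | 3m2.790s      | 0m9.579s        | 1609MiB           | 720MiB          | 2506MiB            | 714MiB           |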

So we get roughly a 20% speedup in compression and a 30% speedup in decompression, at the cost of 8-9% larger compressed files.
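The percentages above can be rechecked directly from the summary table. A minimal sketch; the helper names are illustrative, and the timings are the wall-clock `real` values printed by `time` in the runs above:

```python
def to_seconds(t: str) -> float:
    """Convert a `time`-style duration such as '3m50.097s' to seconds."""
    minutes, rest = t.split("m")
    return int(minutes) * 60 + float(rest.rstrip("s"))


def pct_faster(old: str, new: str) -> float:
    """Percentage reduction in wall-clock time going from old to new."""
    return 100 * (1 - to_seconds(new) / to_seconds(old))


def pct_larger(old_mib: float, new_mib: float) -> float:
    """Percentage growth in compressed size going from old to new."""
    return 100 * (new_mib / old_mib - 1)


print(f"compress:   {pct_faster('3m50.097s', '3m2.790s'):.1f}% faster")
print(f"decompress: {pct_faster('0m13.809s', '0m9.579s'):.1f}% faster")
print(f"qemu:       {pct_larger(662, 720):.1f}% larger")
print(f"metal:      {pct_larger(661, 714):.1f}% larger")
```

This lands at about 20.6% and 30.6% less time, and 8.8% (qemu) and 8.0% (metal) larger output.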

@dustymabe dustymabe added the meeting topics for meetings label Jan 31, 2024
@dustymabe
Member Author

Looking at our pipelines, the compression step takes around 30m:

[2024-01-29T18:14:28.266Z] + set -xeuo pipefail
[2024-01-29T18:14:28.266Z] ++ umask
[2024-01-29T18:14:28.266Z] + '[' 0022 = 0000 ']'
[2024-01-29T18:14:28.266Z] + cosa compress
[2024-01-29T18:14:28.266Z] Targeting build: 39.20240128.1.0
[2024-01-29T18:14:28.519Z] Compressing: builds/39.20240128.1.0/x86_64
[2024-01-29T18:14:28.519Z] 2024-01-29 18:14:28,343 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-ostree.x86_64-manifest.json']
[2024-01-29T18:14:28.519Z] Compressed: fedora-coreos-39.20240128.1.0-ostree.x86_64-manifest.json.xz
[2024-01-29T18:14:28.519Z] 2024-01-29 18:14:28,379 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-qemu.x86_64.qcow2']
[2024-01-29T18:17:34.891Z] Compressed: fedora-coreos-39.20240128.1.0-qemu.x86_64.qcow2.xz
[2024-01-29T18:17:34.891Z] 2024-01-29 18:17:21,459 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-azure.x86_64.vhd']
[2024-01-29T18:20:26.296Z] Compressed: fedora-coreos-39.20240128.1.0-azure.x86_64.vhd.xz
[2024-01-29T18:20:26.296Z] 2024-01-29 18:20:12,008 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-aws.x86_64.vmdk']
[2024-01-29T18:22:02.671Z] Compressed: fedora-coreos-39.20240128.1.0-aws.x86_64.vmdk.xz
[2024-01-29T18:22:02.671Z] 2024-01-29 18:22:02,288 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-openstack.x86_64.qcow2']
[2024-01-29T18:24:54.058Z] Compressed: fedora-coreos-39.20240128.1.0-openstack.x86_64.qcow2.xz
[2024-01-29T18:24:54.058Z] 2024-01-29 18:24:50,011 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-aliyun.x86_64.qcow2']
[2024-01-29T18:27:45.446Z] Compressed: fedora-coreos-39.20240128.1.0-aliyun.x86_64.qcow2.xz
[2024-01-29T18:27:45.446Z] 2024-01-29 18:27:38,062 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-metal.x86_64.raw']
[2024-01-29T18:30:21.800Z] Compressed: fedora-coreos-39.20240128.1.0-metal.x86_64.raw.xz
[2024-01-29T18:30:21.800Z] 2024-01-29 18:30:08,872 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-metal4k.x86_64.raw']
[2024-01-29T18:32:58.169Z] Compressed: fedora-coreos-39.20240128.1.0-metal4k.x86_64.raw.xz
[2024-01-29T18:32:58.169Z] 2024-01-29 18:32:47,924 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-azurestack.x86_64.vhd']
[2024-01-29T18:35:49.569Z] Compressed: fedora-coreos-39.20240128.1.0-azurestack.x86_64.vhd.xz
[2024-01-29T18:35:49.569Z] 2024-01-29 18:35:38,359 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-exoscale.x86_64.qcow2']
[2024-01-29T18:38:41.159Z] Compressed: fedora-coreos-39.20240128.1.0-exoscale.x86_64.qcow2.xz
[2024-01-29T18:38:41.159Z] 2024-01-29 18:38:35,313 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-ibmcloud.x86_64.qcow2']
[2024-01-29T18:41:32.525Z] Compressed: fedora-coreos-39.20240128.1.0-ibmcloud.x86_64.qcow2.xz
[2024-01-29T18:41:32.525Z] 2024-01-29 18:41:26,729 INFO - Running command: ['xz', '-c9', '-T6', 'builds/39.20240128.1.0/x86_64/fedora-coreos-39.20240128.1.0-vultr.x86_64.raw']
[2024-01-29T18:44:23.916Z] Compressed: fedora-coreos-39.20240128.1.0-vultr.x86_64.raw.xz
[2024-01-29T18:44:23.916Z] Skipped compressing artifacts: ostree applehv nutanix kubevirt hyperv gcp digitalocean vmware virtualbox live-iso live-kernel live-initramfs live-rootfs
[2024-01-29T18:44:23.916Z] Updated: builds/39.20240128.1.0/x86_64/meta.json

So we could possibly save 6-8m per run on that step alone. The savings could also compound, because some of our CI runs may run `cosa compress` too.
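The ~30m figure and the 6-8m estimate can be derived from the Jenkins timestamps above. A rough sketch, assuming the ~20% compression speedup measured earlier applies uniformly to every artifact in the pipeline; `step_seconds` is a hypothetical helper, and only the first and last lines of the log are copied in:

```python
import re
from datetime import datetime

# First and last timestamped lines of the compress step in the log above.
LOG = """\
[2024-01-29T18:14:28.266Z] + cosa compress
[2024-01-29T18:44:23.916Z] Updated: builds/39.20240128.1.0/x86_64/meta.json
"""

TS = re.compile(r"^\[(\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d\.\d+Z)\]")


def step_seconds(log: str) -> float:
    """Elapsed seconds between the first and last timestamped lines."""
    stamps = [
        datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%fZ")
        for line in log.splitlines()
        if (m := TS.match(line))
    ]
    return (stamps[-1] - stamps[0]).total_seconds()


total = step_seconds(LOG)
for speedup in (0.20, 0.25):
    saved = total * speedup
    print(f"{speedup:.0%} faster saves ~{saved / 60:.1f} of {total / 60:.1f} minutes")
```

The step spans about 29.9 minutes, so a 20-25% speedup saves roughly 6.0-7.5 minutes.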

@dustymabe
Member Author

Another thing to mention here is that in my tests I used zstd compression level 19, which is the highest level you can specify without `--ultra` (which unlocks levels 20-22 but requires a lot more memory).

We could experiment with different levels to see what the differences are in size versus speed, but I assumed we wanted to increase the size as little as possible, so I used 19.
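The level maps directly onto the zstd command line that `cosa compress` logs above (`['zstd', '-19', '-c', '-T12', path]`). A hypothetical helper, not part of coreos-assembler, that builds the same argv shape and only adds `--ultra` when a level above 19 is requested:

```python
from typing import List

MAX_REGULAR_LEVEL = 19  # highest level zstd accepts without --ultra
MAX_ULTRA_LEVEL = 22    # --ultra unlocks levels 20-22, at a large memory cost


def zstd_compress_cmd(path: str, level: int = 19, threads: int = 12) -> List[str]:
    """Build a zstd argv shaped like the ones logged by `cosa compress`.

    Hypothetical helper for illustration only.
    """
    if not 1 <= level <= MAX_ULTRA_LEVEL:
        raise ValueError(f"unsupported zstd level: {level}")
    cmd = ["zstd"]
    if level > MAX_REGULAR_LEVEL:
        cmd.append("--ultra")
    cmd += [f"-{level}", "-c", f"-T{threads}", path]
    return cmd


# A level sweep (such as the 19/14/10/5 comparison in this thread) would run these:
for lvl in (19, 14, 10, 5):
    print(" ".join(zstd_compress_cmd("fedora-coreos-metal.x86_64.raw", lvl)))
```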

@jlebon
Member

jlebon commented Jan 31, 2024

Huh, I was expecting more drastic differences in compression/decompression times. IMO spending an extra 6-8 minutes for 8% smaller images is worth it.

dustymabe added a commit to dustymabe/coreos-assembler that referenced this issue Jan 31, 2024
I implemented this to investigate it as an option for
coreos/fedora-coreos-tracker#1660
so figured I may as well post the code up for inclusion.
dustymabe added a commit to coreos/coreos-assembler that referenced this issue Jan 31, 2024
I implemented this to investigate it as an option for
coreos/fedora-coreos-tracker#1660
so figured I may as well post the code up for inclusion.
@dustymabe
Member Author

dustymabe commented Feb 1, 2024

Some more data:

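| Level | Time Compress | Time Decompress | Metal Uncompressed | Metal Compressed |
|-------|---------------|-----------------|--------------------|------------------|
| 19    | 3m2.790s      | 0m9.579s        | 2506MiB            | 714MiB           |
| 14    | 0m53.438s     | 0m9.751s        | 2506MiB            | 754.8MiB         |
| 10    | 0m21.477s     | 0m9.487s        | 2506MiB            | 757.8MiB         |
| 5     | 0m7.368s      | 0m9.698s        | 2506MiB            | 793.5MiB         |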

@dustymabe
Member Author

dustymabe commented Feb 1, 2024

If we went with something like level 10, we'd get roughly a 90% speedup in compression, which I think would take the compress stage in our pipeline down to ~5m. The compressed images would be around 10-15% larger than with xz.
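The level-10 estimate can be reproduced from the two tables in this thread. A rough sketch, assuming the level rows time the same set of artifacts as the xz run (the level-19 row reuses the full three-artifact timing), so the numbers are only indicative:

```python
# xz -9 baseline from the first summary table: compress time (s), metal size (MiB).
XZ_SECONDS, XZ_METAL_MIB = 230.097, 661.0

# zstd level -> (compress time in s, metal compressed size in MiB), from the level table.
ZSTD_LEVELS = {
    19: (182.790, 714.0),
    14: (53.438, 754.8),
    10: (21.477, 757.8),
    5: (7.368, 793.5),
}


def vs_xz(seconds: float, mib: float) -> tuple:
    """Return (% less compress time, % larger output) relative to xz -9."""
    return 100 * (1 - seconds / XZ_SECONDS), 100 * (mib / XZ_METAL_MIB - 1)


for level, (secs, mib) in ZSTD_LEVELS.items():
    faster, larger = vs_xz(secs, mib)
    print(f"zstd -{level:<2} {faster:5.1f}% less time, {larger:5.1f}% larger than xz")
```

At level 10 this gives about 90.7% less compression time and 14.6% larger metal images than xz.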

dustymabe added a commit to dustymabe/coreos-assembler that referenced this issue Feb 6, 2024
Based on the discussion in coreos/fedora-coreos-tracker#1660
level 10 seems to give us a good speedup versus size tradeoff.
dustymabe added a commit to dustymabe/fedora-coreos-config that referenced this issue Feb 6, 2024
We'll experiment with this in `rawhide` to see what kind of real
world gains we get from using zstd compression. Also see if there
are any bugs that crop up. This is to further the discussion in
coreos/fedora-coreos-tracker#1660
@jlebon jlebon removed the meeting topics for meetings label Feb 7, 2024
@jlebon
Member

jlebon commented Feb 7, 2024

This was discussed in today's community meeting:

  • INFO: we would like to gather more info on decompression speed and invite people to try out the zstd vs xz paths on their systems and report results. (@jlebon:fedora.im, 16:59:56)
  • INFO: this would require adding zstd image decompression support to coreos-installer (@jlebon:fedora.im, 17:06:04)
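Supporting zstd images in an installer largely comes down to recognizing the new format next to xz. A minimal sketch of magic-byte detection, assuming the decompressor is chosen by sniffing the stream header; the helper names are hypothetical and this is not coreos-installer's code (which is written in Rust):

```python
from typing import Optional

XZ_MAGIC = b"\xfd7zXZ\x00"        # xz stream header magic
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # zstd frame magic 0xFD2FB528, little-endian


def sniff_compression(header: bytes) -> Optional[str]:
    """Guess the compression format from the first bytes of an image."""
    if header.startswith(XZ_MAGIC):
        return "xz"
    if header.startswith(ZSTD_MAGIC):
        return "zstd"
    return None  # uncompressed or unknown format


def sniff_file(path: str) -> Optional[str]:
    """Read just enough of the file to identify its compression format."""
    with open(path, "rb") as f:
        return sniff_compression(f.read(len(XZ_MAGIC)))
```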

@jbtrystram
Contributor

jbtrystram commented Feb 12, 2024

I did some additional testing on the live qemu file:

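| Level | Time Compress | Time Decompress | qemu Uncompressed | qemu Compressed |
|-------|---------------|-----------------|-------------------|-----------------|
| 19    | 6m3.53s       | 0m1.66s         | 1611MiB           | 722MiB          |
| 14    | 1m16.84s      | 0m1.37s         | 1611MiB           | 763MiB          |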

Another note: zstd was not installed in my f39 toolbox by default.

@baude
Contributor

baude commented Feb 14, 2024

As a corollary, I also did some testing yesterday with the qemu image. The compressed and uncompressed sizes (columns 4 and 5) were equivalent. No surprise there. Where the results differed for me was the decompression time: mine was consistently double that. Were you passing any additional command-line switches?

@jbtrystram
Contributor

> Where the results differed for me was the decompression time. Mine was consistently double that. Were you passing any additional command-line switches?

I was simply running `unzstd`, with no extra switches.

@Cyan4973

Given that the image files tested are very large, an interesting zstd option worth trying is `--long`, giving the complete command `zstd -10 -T0 --long`. It may help detect repetitions (like near-identical files in the archive) over long distances.
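One practical detail when trying `--long`: plain `--long` uses a 2^27 (128 MiB) window, which the decompressor accepts by default, but a larger window (for example `--long=30`) must also be passed to the decompressor, or zstd refuses to decode the frame. A hypothetical helper sketching matching compress and decompress command lines for the suggestion above:

```python
from typing import List, Tuple

DEFAULT_LONG_WINDOW_LOG = 27  # bare `--long` means a 2**27 = 128 MiB window


def long_mode_cmds(path: str, level: int = 10,
                   window_log: int = DEFAULT_LONG_WINDOW_LOG) -> Tuple[List[str], List[str]]:
    """Return (compress_argv, decompress_argv) using zstd long-distance matching.

    Hypothetical helper for illustration; -T0 lets zstd pick the thread count.
    """
    if window_log == DEFAULT_LONG_WINDOW_LOG:
        long_flag = "--long"
    else:
        long_flag = f"--long={window_log}"
    compress = ["zstd", f"-{level}", "-T0", long_flag, "-c", path]
    decompress = ["zstd", "-dc"]
    if window_log > DEFAULT_LONG_WINDOW_LOG:
        # Windows above the default limit must be allowed explicitly when decoding.
        decompress.append(long_flag)
    decompress.append(path + ".zst")
    return compress, decompress
```

Keeping the default window means consumers can decompress with an unmodified `zstd -d`, which matters for artifacts shipped to end users.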

@jlebon
Member

jlebon commented Apr 3, 2024

So... with the recent xz news (the xz-utils backdoor, CVE-2024-3094), a lot of trust was lost in that project. Apart from the other benefits listed in this ticket, switching to zstd would now also avoid forcing people to use xz to consume our artifacts if they're not comfortable with that.

@baude
Contributor

baude commented Apr 3, 2024

We have been using zstd with FCOS images in podman machine for a couple of months now, with lots of positive comments about the quicker decompression.
