Improved image compression algorithm #445

Closed
bgilbert opened this issue Mar 28, 2019 · 20 comments · Fixed by #558

@bgilbert
Contributor

gzip is not very good by modern standards. xz seems to be the go-to algorithm; on the other hand it uses a comparatively large amount of memory and is not very robust. Container Linux uses bzip2.

@bgilbert bgilbert added the enhancement New feature or request label Mar 28, 2019
@bgilbert bgilbert added this to Proposed in Fedora CoreOS papercuts via automation Mar 28, 2019
@ajeddeloh
Contributor

The new hotness is zstd but perhaps it's too new. I think the memory concerns aren't too much of an issue these days.

@lucab
Contributor

lucab commented Mar 28, 2019

Should we care about bit-by-bit reproducible compression too?
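
To make the question concrete: gzip output embeds a modification timestamp in its header, so two runs over identical input can differ unless that timestamp is pinned, while the .xz container has no timestamp field. A minimal Python sketch of the difference, using only the standard library; the payload is made up, and none of this is taken from coreos-assembler. Matching xz output across machines would still assume the same library version and settings.

```python
import gzip
import lzma

# Made-up stand-in for a disk image (assumption for illustration only).
payload = b"fake disk image contents " * 1000

# gzip stores an mtime in its header; pinning it makes the bytes repeatable.
gz_a = gzip.compress(payload, mtime=0)
gz_b = gzip.compress(payload, mtime=0)
print("gzip, pinned mtime, identical:", gz_a == gz_b)

# The same input with different timestamps yields different bytes.
gz_c = gzip.compress(payload, mtime=1)
gz_d = gzip.compress(payload, mtime=2)
print("gzip, differing mtime, identical:", gz_c == gz_d)

# .xz has no timestamp field, so repeated runs with the same library
# and preset produce the same bytes.
xz_a = lzma.compress(payload, preset=6)
xz_b = lzma.compress(payload, preset=6)
print("xz, identical:", xz_a == xz_b)
```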

@darkmuggle
Contributor

darkmuggle commented Mar 28, 2019

I would vote for xz, and it's fast at decompression for clients. The downside, though, is that the compressor is slow. The concerns about XZ are academic in our case unless we are planning on long term storage. My rationale for xz is that it is popular in the open-source world.

@dustymabe
Member

dustymabe commented May 6, 2019

Yeah, my understanding was that xz uses a lot of memory and is slow. This could cause problems on some constrained testing machines (e.g. local test VMs; not really a concern in real-world use cases), and on first install a long decompression could be perceived as a bad experience.

That being said, I LOVE the compression ratio xz gives. @darkmuggle are you saying that the memory and speed concerns are only an issue when compressing and not when decompressing?

@bgilbert
Contributor Author

bgilbert commented May 8, 2019

xz(1) has a table giving estimated memory usage. It says xz -9 should take ~65 MiB during decompression.
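
The decompression figure comes from the dictionary size recorded in the stream, not from the size of the input, so a memory-constrained consumer can check up front whether a given file will decode. A rough Python sketch using the standard `lzma` module's `memlimit` parameter; the helper name `decodes_within` is hypothetical. Preset 6 is used so the demo stays cheap to run (the same xz(1) table lists about 9 MiB of decompressor memory for it), and the same check would apply to `-9` output with a limit above roughly 65 MiB.

```python
import lzma

MiB = 1024 * 1024


def decodes_within(blob: bytes, memlimit: int) -> bool:
    """Return True if the .xz stream `blob` decompresses within `memlimit` bytes.

    Hypothetical helper: any LZMAError (including corrupt input) counts as
    a failure here, which is enough for a sketch.
    """
    try:
        lzma.LZMADecompressor(memlimit=memlimit).decompress(blob)
        return True
    except lzma.LZMAError:
        return False


# Tiny input, but the stream still declares the preset's full dictionary size.
blob = lzma.compress(b"x" * 4096, preset=6)

print("fits in 1 MiB:", decodes_within(blob, 1 * MiB))
print("fits in 128 MiB:", decodes_within(blob, 128 * MiB))
```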

@bgilbert
Contributor Author

> The concerns about XZ are academic in our case unless we are planning on long term storage.

Well... we're releasing official binaries, so this is the sort of thing we're supposed to care about. (I'd argue that's still true, notwithstanding that those binaries will go stale pretty quickly.) We'll be signing our artifacts, which sidesteps the robustness issues, but the interoperability problems are still of concern.

I'd tend to agree, though, that we shouldn't fight an uphill battle here. xz is pervasive, and other Fedora release artifacts already use it. (Also, precedent: ZIP files have interoperability problems and everyone uses them anyway.)

@dustymabe
Member

Unless someone comes up with a good reason not to, I'm +1 for moving to xz.

@miabbott
Member

miabbott commented May 20, 2019

> Unless someone comes up with a good reason not to, I'm +1 for moving to xz.

RHCOS is providing gzip-compressed artifacts for OpenStack and bare metal; my only concern is making sure the consumers of said artifacts (like the OpenShift installer or RHHI) are updated accordingly whenever RHCOS starts using a cosa closer to master.

(Yeah, there's a lot of conditionals there, but it's something that will easily get overlooked)

That being said, I'm not opposed to this change.

@darkmuggle
Contributor

Rather than enforcing the change from gz to xz, why not allow it to be a user choice, for cmd-compress?

@bgilbert
Contributor Author

I was thinking it could be an image.yaml setting.

@cgwalters
Member

> RHCOS is providing gzip-compressed artifacts for OpenStack and bare metal; my only concern is making sure the consumers of said artifacts (like the OpenShift installer or RHHI) are updated accordingly whenever RHCOS starts using a cosa closer to master.

Right exactly, switching now would almost certainly break some use cases. We'd need to ensure we've adapted those consumers ahead of time to e.g. detect the filename suffix and use the right decompressor or whatever.

Probably starting here...
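
The suffix-detection idea mentioned above could look roughly like this on the consumer side. This is a sketch in Python using only standard-library modules; the function name `open_image` and the suffix table are assumptions for illustration, not code from the OpenShift installer or any other consumer.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Suffix -> opener. Illustrative mapping, not taken from any real consumer.
OPENERS = {
    ".gz": gzip.open,
    ".xz": lzma.open,
    ".bz2": bz2.open,
}


def open_image(path):
    """Open an image for binary reading, decompressing based on its suffix.

    Files with an unrecognized suffix are opened as-is.
    """
    opener = OPENERS.get(Path(path).suffix, open)
    return opener(path, "rb")
```

A consumer would then read `open_image("image.qcow2.xz")` and `open_image("image.qcow2.gz")` the same way, so switching the producer's compressor would not require a coordinated flag day.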

@dustymabe
Member

> The new hotness is zstd but perhaps it's too new.

Interestingly enough I just saw this proposal for moving to zstd for rpm payloads in Fedora 31. The decompression speed is pretty impressive.

@bgilbert
Contributor Author

bgilbert commented Jun 7, 2019

I don't think this affects our course of action at all, but coreos/bugs#2589 presents a cautionary tale.

@bgilbert bgilbert added this to Proposed in Fedora CoreOS preview via automation Jun 13, 2019
@bgilbert bgilbert removed this from Selected in Fedora CoreOS papercuts Jun 13, 2019
zonggen pushed a commit to zonggen/coreos-assembler that referenced this issue Jun 18, 2019
This change switches FCOS to produce XZ-compressed output artifacts,
but keeps using gzip for RHCOS.

Closes: coreos#445
zonggen pushed a commit to zonggen/coreos-assembler that referenced this issue Jun 18, 2019
This change switches FCOS to produce XZ-compressed output artifacts,
but keeps using gzip for RHCOS.

Changes include generating coreos-assembler-config.tar.xz instead of
coreos-config.tar.gz and compressing using `xz -9` instead of `gzip`.

Closes: coreos#445
zonggen pushed a commit to zonggen/coreos-assembler that referenced this issue Jun 19, 2019
This change switches FCOS to produce XZ-compressed output artifacts,
but keeps using gzip for RHCOS.

Changes include generating coreos-assembler-config.tar.xz instead of
coreos-config.tar.gz and parameterizing the compression algorithm
option and hardcoding a default value `xz`

Closes: coreos#445
zonggen pushed a commit to zonggen/coreos-assembler that referenced this issue Jul 4, 2019
This change switches FCOS to produce XZ-compressed output artifacts,
but keeps using gzip for RHCOS.

Changes include parameterizing the compression algorithm option and
setting the default value to `xz`.

Closes: coreos#445
zonggen pushed a commit to zonggen/coreos-assembler that referenced this issue Jul 8, 2019
Allows FCOS to support XZ-compressed output artifacts. Changes
include parameterizing the compression algorithm option and
setting the default value to `gzip`.

Closes: coreos#445
jlebon pushed a commit that referenced this issue Jul 8, 2019
Allows FCOS to support XZ-compressed output artifacts. Changes
include parameterizing the compression algorithm option and
setting the default value to `gzip`.

Closes: #445
@bgilbert
Contributor Author

bgilbert commented Jul 8, 2019

There are a couple bits remaining:

  • Updating the pipeline to pass --algorithm xz
  • Specifying the compressor in a config file rather than on the command line

@bgilbert bgilbert reopened this Jul 8, 2019
Fedora CoreOS preview automation moved this from Proposed to In Progress Jul 8, 2019
jlebon added a commit to jlebon/fedora-coreos-pipeline that referenced this issue Jul 9, 2019
@jlebon
Member

jlebon commented Jul 9, 2019

> Updating the pipeline to pass --algorithm xz

coreos/fedora-coreos-pipeline#87

> Specifying the compressor in a config file rather than on the command line

Hmm, let's make that part of another ticket instead, e.g. #531?

jlebon added a commit to coreos/fedora-coreos-pipeline that referenced this issue Jul 9, 2019
@jlebon
Member

jlebon commented Jul 9, 2019

We got our first FCOS build (30.309) compressed with xz:

$ for id in 30.308 30.309; do
> size=$(curl -sL https://builds.coreos.fedoraproject.org/prod/streams/testing/builds/$id/meta.json | jq .images.qemu.size)
> echo $id qemu size: $(($size / (1024*1024)))M
> done
30.308 qemu size: 602M
30.309 qemu size: 383M

Very nice!
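
For scale, the two sizes reported above work out as follows. This is plain arithmetic on the figures in this comment; since the shell loop uses integer division, the inputs are already rounded down to whole MiB.

```python
# Sizes in MiB as printed by the loop above (already truncated to integers).
gzip_mib = 602  # build 30.308
xz_mib = 383    # build 30.309

saved_mib = gzip_mib - xz_mib
percent = 100 * saved_mib / gzip_mib

print(f"{saved_mib} MiB smaller ({percent:.1f}% reduction)")
```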

@jlebon
Member

jlebon commented Jul 12, 2019

As expected, a major downside of FCOS switching to xz is that iterating on the pipeline is now much more painful. (And yes, we could skip compression or go back to gz in the developer case, though not matching production doesn't help e2e testing either, e.g. coreos-installer).

@arithx
Contributor

arithx commented Jul 12, 2019

jlebon: Maybe dropping the default developer case to gz but having it be a configurable option in the job itself?

@bgilbert bgilbert moved this from In Progress to Done in Fedora CoreOS preview Jul 16, 2019
@dustymabe
Member

> > Updating the pipeline to pass --algorithm xz
>
> coreos/fedora-coreos-pipeline#87

merged

> > Specifying the compressor in a config file rather than on the command line
>
> Hmm, let's make that part of another ticket instead, e.g. #531?

If we're letting that be handled by #531 then we can close this?

@jlebon
Member

jlebon commented Aug 12, 2019

> If we're letting that be handled by #531 then we can close this?

Agreed, closing!
