
OxiPNG Zopfli compression seems to be overly slow compared to zopflipng's #414

Open
AlexTMjugador opened this issue Jul 31, 2021 · 19 comments
Labels
T-Performance Relates to slowness in areas of the software or ideas to improve performance


@AlexTMjugador
Collaborator

AlexTMjugador commented Jul 31, 2021

While using OxiPNG with the Zopfli compression mode, I noticed that some images took an unusually long time to compress. Of course, this is somewhat expected due to the usage of Zopfli compression, which is slow by design. However, some quick benchmarks against zopflipng, with the most similar compression settings possible, showed that zopflipng is both faster and more effective at optimizing images than OxiPNG with Zopfli compression, which is an interesting result.

In particular, to get as close as possible to an apples-to-apples comparison, I fixed the following parameters:

  • A single thread (because zopflipng is not multithreaded).
  • A single filter.
  • A fixed number of iterations, 15 (as zopflipng by default changes the number of compression iterations depending on the file size, while OxiPNG always uses 15 iterations).

I used the same unprocessed image, input.png, for every run.
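The size-dependent iteration rule mentioned in the list above can be sketched as follows. This is a hedged reconstruction, not zopflipng's code: the defaults of 15 iterations for small inputs and 5 for large ones are believed to match zopflipng's documented behavior, but the 200,000-byte threshold and the exact quantity being compared are assumptions, and `choose_iterations` / `oxipng_iterations` are hypothetical names.

```rust
/// Iteration count for "small" inputs (believed to be zopflipng's default).
const ITERATIONS_SMALL: u32 = 15;
/// Iteration count for "large" inputs (believed to be zopflipng's default).
const ITERATIONS_LARGE: u32 = 5;
/// ASSUMPTION: size threshold in bytes between "small" and "large" inputs.
/// Check the zopflipng source before relying on this value.
const LARGE_INPUT_THRESHOLD: usize = 200_000;

/// Hypothetical helper: an explicit user choice (such as `--iterations=15`)
/// always wins; otherwise the count depends on the input size.
fn choose_iterations(input_len: usize, user_choice: Option<u32>) -> u32 {
    match user_choice {
        Some(n) => n,
        None if input_len < LARGE_INPUT_THRESHOLD => ITERATIONS_SMALL,
        None => ITERATIONS_LARGE,
    }
}

/// OxiPNG at the time of this issue: always 15 iterations, whatever the size.
fn oxipng_iterations(_input_len: usize) -> u32 {
    15
}
```

Under these assumptions, passing --iterations=15 makes both programs run 15 iterations on the 422604-byte input.png, whereas without the flag this sketch would give zopflipng only 5, which is why the flag matters for a fair comparison.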

The results were as follows:

$ time target/release/oxipng -v -t1 -f0 -Z input.png
Processing: input.png
    2048x2048 pixels, PNG format
    4x8 bits/pixel, RGBA
    IDAT size = 419799 bytes
    File size = 422604 bytes
Trying: 1 combinations
    zc = 0  zs = 0  f = 0        399505 bytes
Found better combination:
    zc = 0  zs = 0  f = 0        399505 bytes
    IDAT size = 399505 bytes (20294 bytes decrease)
    file size = 402310 bytes (20294 bytes = 4.80% decrease)
Output: input.png
target/release/oxipng -v -t1 -f0 -Z input.png output.png  513,45s user 0,16s system 99% cpu 8:34,03 total
$ time zopflipng --iterations=15 --filters=0 -y input.png output.png
Optimizing input.png
Input size: 422604 (412K)
Result size: 399568 (390K). Percentage of original: 94.549%
Result is smaller

zopflipng --iterations=15 --filters=0 -y input.png output.png  252,55s user 0,09s system 99% cpu 4:13,16 total

These results show that, for the same input image, zopflipng was roughly twice as fast as OxiPNG (252.55 s versus 513.45 s of user time), while also producing a slightly smaller file (399568 bytes versus 402310 bytes).
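For reference, the comparison above can be re-derived from the logged user times and file sizes. The helper names below are illustrative, not part of either tool.

```rust
/// How many times faster `fast` is than `slow`, given CPU times in seconds.
fn speedup(slow_secs: f64, fast_secs: f64) -> f64 {
    slow_secs / fast_secs
}

/// Size decrease from `original` to `result`, as a percentage of `original`.
fn percent_decrease(original: u64, result: u64) -> f64 {
    (original as f64 - result as f64) / original as f64 * 100.0
}
```

Here speedup(513.45, 252.55) is about 2.03, percent_decrease(422604, 402310) is about 4.80 for OxiPNG, and percent_decrease(422604, 399568) is about 5.45 for zopflipng (matching its reported 94.549% of original), so zopflipng's output is 2742 bytes smaller.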

I find these results interesting because some quick println! debugging showed that OxiPNG spends most of its execution time in this function, more precisely in the zopfli::compress call:

pub fn zopfli_deflate(data: &[u8]) -> PngResult<Vec<u8>> {
    use std::cmp::max;
    // Initial capacity heuristic: 1/20 of the input size, at least 1024 bytes.
    let mut output = Vec::with_capacity(max(1024, data.len() / 20));
    let options = zopfli::Options::default();
    // Most of the runtime with -Z is spent inside this call.
    match zopfli::compress(&options, &zopfli::Format::Zlib, data, &mut output) {
        Ok(_) => (),
        Err(_) => return Err(PngError::new("Failed to compress in zopfli")),
    };
    output.shrink_to_fit();
    Ok(output)
}

Both PNG optimizers use Zopfli for compression, and the zopfli crate is a straightforward Rust translation of the original Zopfli C code, so it should perform similarly (further quick tests show that the zopfli crate binary compresses files about as fast as the upstream Zopfli binary). So neither the compression algorithm itself nor its Rust implementation seems to be to blame, yet for some reason OxiPNG is still much slower than zopflipng, even though both programs should end up compressing similar amounts of pixel data.
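The quick println! debugging mentioned above can be made a bit more systematic with a small wrapper around std::time::Instant. This is a minimal sketch, not OxiPNG code: `timed` is a hypothetical helper, and in zopfli_deflate the closure would wrap the zopfli::compress call. It measures wall-clock time, which approximates CPU time only for single-threaded runs such as -t1.

```rust
use std::time::{Duration, Instant};

/// Runs `f`, logs how long it took to stderr under `label`, and returns both
/// the closure's result and the measured wall-clock duration.
fn timed<T>(label: &str, f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    let elapsed = start.elapsed();
    eprintln!("{label}: {elapsed:?}");
    (value, elapsed)
}
```

Summing the returned durations per call site across a whole run shows which step dominates the total, rather than relying on a single printed timestamp.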

Has anyone else managed to reproduce this performance difference? What might be causing it?

@AlexTMjugador
Collaborator Author

AlexTMjugador commented Aug 9, 2021

I've just realized that the zopfli repo has a related issue, carols10cents/zopfli#37, so this low performance is reproducible and has basically stayed the same over the years. However, these findings suggest that zopflipng must be using the Zopfli compression routine differently, because otherwise I see no reasonable explanation for these differences.

@Yay295

Yay295 commented Oct 3, 2021

Considering that neither carols10cents/zopfli nor dfrankland/zopfli-rs has been updated since 2018, and that zopfli-rs appears to perform slightly better than zopfli, it might be worth switching from zopfli to zopfli-rs. That wouldn't fix this issue, but it should give a slight performance improvement.

shssoichiro added a commit that referenced this issue Sep 5, 2022
* Update and optimize dependencies

These changes update the dependencies to their latest versions, fixing
some known issues that prevented doing so in the first place.

In addition, the direct dependency on byteorder was dropped in favor
of stdlib functions that have been stabilized for some time in Rust, and
the transitive dependency on chrono, pulled by stderrlog, was also
dropped, which had been affected by security issues and improperly
maintained in the past:

- cardoe/stderrlog-rs#31
- https://www.reddit.com/r/rust/comments/ts84n4/chrono_or_time_03/

* Run rustfmt

* Bump MSRV to 1.56.1

Updating to this patch version should not be cumbersome for end-users,
and it is required by a transitive dependency.

* Bump MSRV to 1.57.0

os_str_bytes requires it.

* Add initial support for changing Zopfli iterations

PR #445 did some dependency
updates, which included using the latest zopfli version. The latest
version of this crate exposes new options in its API that allow users to
choose the desired number of Zopfli compression iterations, which
may greatly affect execution time. In fact, other optimizers such as
zopflipng dynamically select this number depending on the input file
size (see: #414).

As a first step towards making OxiPNG deal with Zopfli better, let's add
the necessary options for libraries to be able to choose the number of
iterations. This number is still fixed to 15 as before when using the
CLI.

* Fix Clippy lint

Co-authored-by: Josh Holmer <jholmer.in@gmail.com>
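A minimal sketch of the idea in the commit above, letting library callers choose the Zopfli iteration count while the CLI stays at 15. `DeflaterChoice`, `zopfli` and `cli_zopfli` are hypothetical names, not OxiPNG's actual API; `NonZeroU8` is used so that a zero-iteration request cannot be represented at all.

```rust
use std::num::NonZeroU8;

/// Hypothetical compressor selection; not OxiPNG's actual API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DeflaterChoice {
    /// The default, faster DEFLATE compressor at some level.
    Standard { level: u8 },
    /// Zopfli with a caller-chosen number of iterations (never zero).
    Zopfli { iterations: NonZeroU8 },
}

impl DeflaterChoice {
    /// Library callers pick any non-zero iteration count; zero is rejected.
    fn zopfli(iterations: u8) -> Option<Self> {
        NonZeroU8::new(iterations).map(|n| DeflaterChoice::Zopfli { iterations: n })
    }

    /// What the CLI keeps using after the change: a fixed 15 iterations.
    fn cli_zopfli() -> Self {
        // 15 is non-zero, so this unwrap cannot fail.
        DeflaterChoice::Zopfli { iterations: NonZeroU8::new(15).unwrap() }
    }
}
```

Encoding the constraint in the type moves the "at least one iteration" check to the point where callers build the options, instead of inside the compression routine.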
@Stefan1200-de

Stefan1200-de commented Oct 29, 2022

I recently switched from OptiPNG to OxiPNG (primarily using it within Greenshot). I did some benchmarking with OptiPNG and OxiPNG to compare my old and new CPUs: a few days ago I switched from an Intel i7 7700K (4 cores with SMT) to an AMD Ryzen 5800X3D (8 cores with SMT).

While I get around 9% more speed on the Ryzen with OptiPNG (which is expected, because it only uses one core), I only got 10% more speed on the Ryzen with OxiPNG, which is more or less just the faster single-core performance of the new CPU. So I guess OxiPNG only uses up to 4 cores. For me this is still a big step forward, because OxiPNG is around 5 times faster than OptiPNG here. But I was expecting OxiPNG to be around 120% faster on my new Ryzen CPU, not just 10%.

So my guess: is zopflipng twice as fast as OxiPNG because it uses up to 8 cores? It would be interesting to know which CPU @AlexTMjugador used for the timings in the first post; maybe he is using an 8-core CPU too. Maybe there is some potential to split OxiPNG's workload across more threads?

@AlexTMjugador
Collaborator Author

AlexTMjugador commented Oct 29, 2022

So my guess: zopflipng is twice as fast than OxiPNG, because it use up to 8-cores? It would be interesting to know which CPU @AlexTMjugador used to create the time measures in the first post, maybe he is using an 8-cores CPU too. Maybe there is some potential to split the workload of OxiPNG into more threads?

I took those numbers on an old Intel Core i3-2100 CPU clocked at 3.2 GHz (BCLK overclock), which has two cores with SMT.

About the threading speedup: if I recall correctly, OxiPNG spawns one thread per optimization strategy it tries, and the set of possible strategies is fixed by the image and options. Therefore, more hardware threads only help up to the point where there are no more strategies left to try in parallel; if there are more hardware threads than strategies, the extra threads give no additional speedup.
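A rough sketch of the one-thread-per-strategy model described above, using only the standard library's scoped threads. OxiPNG's actual scheduling may differ; `best_strategy` and `useful_threads` are hypothetical names, and each strategy is modeled as a closure that returns a compressed size in bytes.

```rust
use std::thread;

/// Runs every candidate strategy on its own scoped thread and returns the
/// index and size of the smallest result (the first one wins ties).
fn best_strategy<F>(strategies: &[F]) -> Option<(usize, usize)>
where
    F: Fn() -> usize + Sync,
{
    thread::scope(|s| {
        // Spawn all threads first so the strategies actually run concurrently.
        let handles: Vec<_> = strategies
            .iter()
            .map(|strategy| s.spawn(move || strategy()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("strategy thread panicked"))
            .enumerate()
            .min_by_key(|&(_, size)| size)
    })
}

/// Upper bound on useful parallelism in this model: hardware threads beyond
/// the number of strategies have nothing to do.
fn useful_threads(hardware_threads: usize, strategies: usize) -> usize {
    hardware_threads.min(strategies)
}
```

In this model, with 4 candidate strategies, useful_threads(16, 4) is 4, so moving from an 8-thread to a 16-thread CPU would not speed up such a run; that would be one possible explanation for the small gain reported above.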

Also, as I stated before, the vanilla zopflipng does not use several threads at all (maybe OptiPNG does, however), so I don't think that threading explains the performance difference here.

@Stefan1200-de

Also, as I stated before, the vanilla zopflipng does not use several threads at all (maybe OptiPNG does, however), so I don't think that threading explains the performance difference here.

Oh, okay.

Today I tried the same OxiPNG arguments as you on my system. I'm a little surprised by the -Z argument: it takes around 4 minutes on my system and produces much larger PNG files than the default OxiPNG settings, which need just 2.8 seconds. Why should I use the -Z argument if it takes 85 times longer to produce larger PNG files? What is the advantage of -Z?

@AlexTMjugador
Collaborator Author

AlexTMjugador commented Oct 29, 2022

Why I should use the -Z argument, if it takes 85-times longer to produce bigger PNG files? What is the advantage of -Z?

The -Z option makes OxiPNG compress the contents of the PNG image data (IDAT) chunk with Zopfli, a DEFLATE compressor that prioritizes compression ratio over speed. The default compressor, in contrast, aims for more of a balance between compression and speed.

However, OxiPNG tries fewer optimization strategies when Zopfli compression is enabled, to compensate for its higher cost. It looks like your PNGs are optimized more effectively by trying several strategies with a less extreme compressor than by trying fewer strategies with a stronger one 😉

@Stefan1200-de

In my test case I used a PNG file of 8,682,324 bytes. With the -Z argument, that image was compressed down to 8,325,841 bytes. Without -Z, OxiPNG compressed the same file to 6,348,630 bytes, 85 times faster. So I'll stay without the -Z argument. ;)

@AlexTMjugador
Collaborator Author

Would you mind sharing that image? Maybe it's useful to investigate if something is going wrong with Zopfli here 😄

@Stefan1200-de

Stefan1200-de commented Oct 29, 2022

Well, it is actually a desktop screenshot (3 displays, 6400x1440 pixels) with a background picture and desktop icons, created with Greenshot. I would share it privately with a developer, but not publicly here.

@XhmikosR
Contributor

XhmikosR commented Apr 3, 2023

There's a new zopfli patch version with some performance improvements. Could someone provide a PR, please?

I'm not familiar with Rust, but I started using oxipng and noticed the slowness too.

Thanks!

@AlexTMjugador
Collaborator Author

There's a new zopfli patch version with some performance improvements. Could someone provide a PR, please?

The performance improvements of the new zopfli version are minor. They do not come close to closing the significant runtime gap observed in this issue. Further improvement is needed in this regard, and I would like to release more patch versions that continue improving performance in the not-too-distant future, if my free time permits.

But sure, I could open a PR to update the Zopfli version that OxiPNG locks in its Cargo.lock file, if that would make you happy 😄

@XhmikosR
Copy link
Contributor

XhmikosR commented Apr 3, 2023

I don't mind if there are no gains :)

I thought there was a small improvement already, that's why I asked for it.

I'm glad to hear the issue is on your radar; just ignore my comment above :)

@parasew

parasew commented Apr 8, 2023

There's a new zopfli patch version with some performance improvements. Could someone provide a PR, please?

Patch with updated and tested crates (libdeflater, zopfli, clap) is #495

@andrews05
Collaborator

Big thanks to @AlexTMjugador for the latest zopfli updates! Have you tried running that same input.png you mentioned in the first post?

@AlexTMjugador
Collaborator Author

AlexTMjugador commented May 28, 2023

I didn't try that yet, but I could give it a shot in a few days or so 😄

Thanks for the reminder by the way!

@AlexTMjugador
Collaborator Author

I tried to replicate the experiment described in my original issue comment as closely as possible, but I had to use a different input image and run the commands on a much faster computer, so the exact performance and size reduction figures are not comparable. Nevertheless, I think that the results are interesting, so I'm sharing them.

time ./oxipng_latest_86fccf0 -v -t1 -f0 -Z --out /dev/null input.png

This binary was generated with cargo build --release from the latest commit, 86fccf0.

Processing: input.png
    1920x1080 pixels, PNG format
    8-bit RGB + Alpha, non-interlaced
    IDAT size = 2161881 bytes
    File size = 2165094 bytes
Reducing image to 8-bit RGB, non-interlaced
Trying: 1 filters
    IDAT size = 2161881 bytes (0 bytes decrease)
    file size = 2161938 bytes (3156 bytes = 0.15% decrease)
Output: /dev/null
./oxipng_latest_86fccf0 -v -t1 -f0 -Z --out /dev/null input.png  9.78s user 0.02s system 99% cpu 9.797 total

time ./oxipng_v8.0.0 -v -t1 -f0 -Z --out /dev/null input.png

This binary was generated with cargo build --release from the commit with the v8.0.0 tag, using the same toolchain as in the previous build.

Processing: input.png
    1920x1080 pixels, PNG format
    4x8 bits/pixel, RGB + Alpha (non-interlaced)
    IDAT size = 2161881 bytes
    File size = 2165094 bytes
Reducing image to 3x8 bits/pixel, RGB (non-interlaced)
Trying: Bigrams 
    zc = 0  f = Bigrams   1905541 bytes
Found better combination:
    zc = 0  f = Bigrams   1905541 bytes
    IDAT size = 1905541 bytes (256340 bytes decrease)
    file size = 1905598 bytes (259496 bytes = 11.99% decrease)
Output: /dev/null
./oxipng_v8.0.0 -v -t1 -f0 -Z --out /dev/null input.png  84.09s user 0.02s system 99% cpu 1:24.11 total

time ./oxipng_v5.0.0 -v -t1 -f0 -Z --out /dev/null input.png

This binary was generated with cargo build --release from the commit with the v5.0.0 tag, using the same toolchain as in the previous build. This build is likely to be very similar to the one I ran when I wrote this issue.

Processing: input.png
    1920x1080 pixels, PNG format
    4x8 bits/pixel, RGBA
    IDAT size = 2161881 bytes
    File size = 2165094 bytes
Reducing image to 3x8 bits/pixel, RGB
Trying: 1 combinations
    zc = 0  zs = 0  f = 0        2886205 bytes
    IDAT size = 2161881 bytes (0 bytes decrease)
    file size = 2161938 bytes (3156 bytes = 0.15% decrease)
Output: /dev/null
./oxipng_v5.0.0 -v -t1 -f0 -Z --out /dev/null input.png  10.79s user 0.07s system 100% cpu 10.852 total

time zopflipng --iterations=15 --filters=0 -y input.png /dev/null

The zopflipng binary used here is the one packaged by Debian Bullseye.

Optimizing input.png
Input size: 2165094 (2114K)
Result size: 2886262 (2818K). Percentage of original: 133.309%
Preserving original PNG since it was smaller

zopflipng --iterations=15 --filters=0 -y input.png /dev/null  8.35s user 0.04s system 99% cpu 8.394 total

In light of these results, my conclusions are:

  • The Zopfli algorithm can be extremely sensitive to the PNG filter being used. For this image, most of OxiPNG's performance difference between the latest commit and v8.0.0 is due to -f0 choosing a different filter strategy. If -f0 is replaced with -f7 in the latest build to force the Bigrams filter strategy (the one chosen by v8.0.0), the resulting file size matches (1905598 bytes, a 259496-byte or 11.99% decrease), but the runtime jumps to 78.46 s. That is still ~6.7% faster than v8.0.0's 84.09 s, though.
  • As expected, Zopfli got a bit faster overall when using the same filters.
  • zopflipng is still faster somehow, but the gap has narrowed, and in this case it was not able to reduce the image size at all, unlike OxiPNG.
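The filter sensitivity in the first conclusion is easier to see with a toy example of how the filter changes the bytes Zopfli has to compress. The sketch below implements three of the PNG filter types (None, Sub, Up) and a per-scanline choice of the filter that yields the fewest distinct byte bigrams. It is only assumed to approximate OxiPNG's Bigrams strategy; the function names are hypothetical, and a real implementation also considers the Average and Paeth filters.

```rust
use std::collections::HashSet;

/// A subset of the PNG filter types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Filter {
    None,
    Sub,
    Up,
}

/// Filters one scanline. `prev` is the previous unfiltered scanline (all zeros
/// for the first row) and must be as long as `line`; `bpp` is bytes per pixel.
fn apply(filter: Filter, line: &[u8], prev: &[u8], bpp: usize) -> Vec<u8> {
    line.iter()
        .enumerate()
        .map(|(i, &x)| match filter {
            Filter::None => x,
            Filter::Sub => x.wrapping_sub(if i >= bpp { line[i - bpp] } else { 0 }),
            Filter::Up => x.wrapping_sub(prev[i]),
        })
        .collect()
}

/// Number of distinct adjacent byte pairs in a filtered scanline.
fn distinct_bigrams(data: &[u8]) -> usize {
    data.windows(2)
        .map(|w| (w[0], w[1]))
        .collect::<HashSet<_>>()
        .len()
}

/// Bigram-style heuristic: pick the filter with the fewest distinct bigrams
/// (the first candidate wins ties).
fn pick_filter(line: &[u8], prev: &[u8], bpp: usize) -> Filter {
    [Filter::None, Filter::Sub, Filter::Up]
        .iter()
        .copied()
        .min_by_key(|&f| distinct_bigrams(&apply(f, line, prev, bpp)))
        .expect("candidate list is not empty")
}
```

For a horizontal gradient such as [10, 20, ..., 80] with 1 byte per pixel, Sub turns the row into a run of identical bytes (one distinct bigram) while None keeps seven, so the heuristic picks Sub; the data Zopfli then compresses looks very different depending on that choice.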

@andrews05
Collaborator

That is... surprising 😯
What happens if you run the result of -f7 through zopflipng, using --filters=p to keep the filter?

@AlexTMjugador
Collaborator Author

AlexTMjugador commented Jun 2, 2023

That is... surprising 😯
What happens if you run the result of -f7 through zopflipng, using --filters=p to keep the filter?

When I run the result of ./oxipng_latest_86fccf0 -v -t1 -f7 -Z --out output.png input.png through time zopflipng --iterations=15 --filters=p -y output.png /dev/null in the same environment, I get the following result:

Optimizing output.png
Input size: 1905598 (1860K)
Result size: 1905586 (1860K). Percentage of original: 99.999%
Result is smaller

zopflipng --iterations=15 --filters=p -y output.png /dev/null  33.55s user 0.03s system 99% cpu 33.581 total

So zopflipng is also quite sensitive to the input data distribution, although it is still somewhat better than OxiPNG in some circumstances. The Zopfli algorithm performs lots of slice accesses, so I'm wondering whether bounds checks may be causing additional slowdown here... It's time to profile things again 😄
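To illustrate the bounds-check hypothesis: an LZ77 match finder such as Zopfli's repeatedly compares bytes at two positions until they differ, capped at DEFLATE's maximum match length of 258. Both functions below are hypothetical sketches, not code from the zopfli crate. The indexed version performs a checked slice access per byte unless the optimizer proves it redundant; the zip-based version is bounded by the subslice lengths, which typically lets the compiler drop per-element checks. Whether this matters in practice is exactly what profiling would have to show.

```rust
/// DEFLATE's maximum match length.
const MAX_MATCH: usize = 258;

/// Common-prefix length of `data[a..]` and `data[b..]`, capped at `max_len`.
/// Each `data[..]` access here is bounds-checked unless optimized away.
fn match_len_indexed(data: &[u8], a: usize, b: usize, max_len: usize) -> usize {
    let mut n = 0;
    while n < max_len
        && a + n < data.len()
        && b + n < data.len()
        && data[a + n] == data[b + n]
    {
        n += 1;
    }
    n
}

/// Same result via iterators; requires `a` and `b` to be at most `data.len()`.
/// Zipping the subslices bounds the loop by their lengths.
fn match_len_zip(data: &[u8], a: usize, b: usize, max_len: usize) -> usize {
    data[a..]
        .iter()
        .zip(&data[b..])
        .take(max_len)
        .take_while(|(x, y)| x == y)
        .count()
}
```

Comparing the generated assembly of the two versions, or benchmarking them on real image data, would be a cheap first check before restructuring any hot loops in the crate.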

@andrews05
Collaborator

Hm, twice as fast. Good luck with the profiling, it would be amazing if you could close that gap 😁

@andrews05 andrews05 added the T-Performance Relates to slowness in areas of the software or ideas to improve performance label Aug 9, 2023

6 participants