
Switch default Zip compression level from 6 to 4 #1125

Merged Oct 7, 2021

Conversation

@aras-p (Contributor) commented Aug 5, 2021

Kinda related to #1002, but this one does not add any new libraries or compressors. After #1149 added controllable zip compression levels, this changes the default zip level from 6 to 4.

In my test of 18 various exr files (total raw data size 1057MB):

  • zip 6: 2.452x ratio, 206MB/s compression
  • zip 4: 2.421x ratio, 437MB/s compression

So a tiny drop of compression ratio, but compression is more than 2x faster. This makes writing zip faster than writing uncompressed (386MB/s). Decompression speed unaffected.

A longer writeup of all this, with data graphs: https://aras-p.info/blog/2021/08/05/EXR-Zip-compression-levels/
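The level-6 vs level-4 tradeoff is easy to poke at outside OpenEXR, since EXR ZIP compression uses zlib underneath. A minimal toy benchmark on synthetic data (my own sketch, not the test harness used for the numbers above):

```python
# Toy comparison of zlib levels 6 and 4: the same knob this PR changes
# for EXR ZIP compression. Data and timings here are illustrative only.
import time
import zlib

# Synthetic, repetitive "pixel" data, 1 MiB.
data = bytes(range(256)) * 4096

for level in (6, 4):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    print(f"level {level}: {ratio:.3f}x ratio, {elapsed * 1000:.2f} ms")
```

Real EXR ratios depend heavily on image content (EXR also reorders bytes before deflate), so absolute numbers from a toy like this won't match the benchmarks in this PR; only the level-vs-speed trend carries over.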


@cary-ilm (Member) commented Aug 5, 2021

Looks like ImfStandardAttributes.h is missing #include "ImfIntAttribute.h", which leads to the clang link error.

@lgritz (Contributor) commented Aug 5, 2021

This is a very exciting change! The prospect of doubling write speed with a few lines of code change, no user intervention, and barely any detectable change in compression ratio is crazy awesome.

@peterhillman (Contributor) commented

This does look promising. Could I suggest running these performance comparison tests on the standard OpenEXR test images as well as your larger test set so there's a more self-contained reproducible test?
https://github.com/AcademySoftwareFoundation/openexr-images/tree/master/ScanLines
https://github.com/AcademySoftwareFoundation/openexr-images/tree/master/Tiles

Also, a somewhat pedantic point that might need clarification in comments: the presence of a zipCompressionLevel attribute in a file won't guarantee the file was compressed with that level, or even that it was written with ZIP compression at all. A ZIP file might be read in and written out again as PIZ, preserving all metadata, so the zipCompressionLevel attribute is preserved but ignored. Equally, an older version of the library might read in a ZIP file and write it back out again; the old library won't interpret zipCompressionLevel and will use the old default value, so there will be an unexpected change in file size.
That behavior seems fine to me, but it might cause confusion.

@aras-p (Contributor, Author) commented Aug 6, 2021

Could I suggest running these performance comparison tests on the standard OpenEXR test images as well as your larger test set so there's a more self-contained reproducible test?

Here's writing performance on the 11 files (76MB raw pixel data) from openexr-images ScanLines and Tiles folders:

The files are much smaller, and I guess that's why the write performance is lower overall, compared to the larger image set.

  • Uncompressed: 1.000x ratio, 343MB/s
  • Zip 6: 2.063x ratio, 166MB/s
  • Zip 4: 2.035x ratio, 290MB/s

So here, changing the Zip level from 6 to 4 does not quite make writing 2x faster, but it is still a good speedup for a really small ratio decrease.

Also, somewhat pedantic point that might need clarification in comments: the presence of a zipCompressionLevel attribute in a file won't guarantee the file was compressed with that level of compression, or even that it was written with ZIP compression

Right. Then the same gotcha should arguably be clarified for the already-existing dwaCompressionLevel attribute. To me, using header attributes to pass compression level settings does feel a bit strange, TBH -- this is purely data for the compression process; decompression does not need it at all, and one could argue that it should not be stored in the file header. But the API does not allow passing data to the compressor in any other way, which I guess is why DWA passes it through a header attribute. I did the same for consistency.

@peterhillman (Contributor) commented

Thanks for that graph: that's still a sizeable chunk of performance gain.

Yes, good point about dwaCompressionLevel. Indeed, dwaCompressionLevel doesn't really tell you much at all: it records how lossy the most recent write was, but not whether an earlier step introduced more compression error. To understand how much compression error there is in an image you need to know the entire processing chain.

However, dwaCompressionLevel has been in OpenEXR since DWA compression was introduced, so no versions of the library that wrote DWA ignored the attribute. I do think it's worth clarifying that zipCompressionLevel is new so might have been ignored by the version of the OpenEXR library that compressed the data.

@aras-p (Contributor, Author) commented Aug 6, 2021

However, dwaCompressionLevel has been in OpenEXR since DWA compression was introduced, so no versions of the library that wrote DWA ignored the attribute

Good point, haven't thought of that! Something like f4036be1c then perhaps?

@kdt3rd (Contributor) commented Aug 30, 2021

Do we really want the compression level as a persistent attribute? As part of ImfHeader, sure, but the compression level doesn't seem right to have in the attribute list. I have identical feelings about both dwaCompressionLevel and zipCompressionLevel: both seem like something we should remove from the attribute list, and instead have ImfHeader carry an ephemeral list of options to control the compression (we can re-use the attribute type system for genericity)?
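To make the distinction concrete, here's a toy sketch of the idea (all class and method names are hypothetical illustrations, not the actual OpenEXR API): the header keeps a separate, non-persisted option set, so write-time settings influence the encoder but are never serialized into the file.

```python
# Hypothetical sketch: persistent attributes vs. ephemeral write options.
# Nothing here is real OpenEXR API; it only illustrates the design idea.
class Header:
    def __init__(self):
        self.attributes = {}       # persisted: written into the file header
        self._write_options = {}   # ephemeral: consulted at write time only

    def set_attribute(self, name, value):
        self.attributes[name] = value

    def set_write_option(self, name, value):
        self._write_options[name] = value

    def serialized_attributes(self):
        # Only persistent attributes make it into the file.
        return dict(self.attributes)


hdr = Header()
hdr.set_attribute("compression", "zip")
hdr.set_write_option("zipCompressionLevel", 4)   # guides the writer...
# ...but never lands in the file, so a later rewrite with a different
# compressor can't carry around a stale, misleading level value.
assert "zipCompressionLevel" not in hdr.serialized_attributes()
```

This sidesteps the "attribute survives a PIZ rewrite" confusion discussed above, at the cost of the setting not being discoverable from the file afterward.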

@karlrasche (Contributor) commented Aug 30, 2021 via email

@aras-p (Contributor, Author) commented Aug 31, 2021

Do we really want the compression level as persistent attributes

It does feel a bit wrong to me too, but here I was just following what DWA compression did. Some sort of "non-persistent options" thing would feel better indeed, but that's perhaps a larger scope task/effort/decision.

It still feels to me that changing the zlib level from the current 6 to 4 (even without any attributes to control it) would be worth it. Compression gets 2x faster, at a tiny loss of ratio.

@lgritz (Contributor) commented Aug 31, 2021

We discussed this at length at the last TSC meeting (which you are very welcome to join, they are open zoom calls).

We're very excited about these optimizations and wish to implement them, and 100% agree that doubling speed with just a tiny change in compression ratio is definitely the right thing to do.

We thought that just changing the level internally with no runtime control was sort of the lazy way out, and we preferred to make it externally controllable so that:

(a) we can more easily test across a range of images to verify that we're replicating your results before changing the default for everybody;
(b) others can try all of the compression levels without altering or recompiling the source, since different users and use cases may have different preferences about the right compression-vs-speed trade-off;
(c) there may be users who require bit-exact files, not just bit-exact pixels (for testing/validation reproducibility), and an option to keep the old behavior may ease the transition for them, even if they agree that 4 should be the default going forward.

@kdt3rd (Contributor) commented Aug 31, 2021

Yes, yes, my comments are only about the mechanism, not that we should avoid the change :)

I will try to prototype an API to add to ImfHeader and exr_context that we can comment on, and once we're happy we can merge the efforts. Thanks again for bringing this up!

@lgritz (Contributor) commented Sep 26, 2021

@aras-p Do you recall whether your test results posted here and on your blog were using multithreading or not? I don't see any reference to the thread count. I'm also not 100% sure whether you were doing the full I/O, or some trick like writing to /dev/null to minimize I/O overhead in the timing.

I tried doing an analogous change in OpenImageIO, but for TIFF files, and got good results, though not as good. I tested with 100 images randomly selected from the Animal Logic Lab USD scene (they are almost all 4k^2, a mix of 1- and 3-channel images), converted from half to uint16 TIFF images. Going from zip level 6 to level 4 resulted in 3% bigger files, and around 40% faster write times.

It's not apples-to-apples versus your tests: uint16 vs half, TIFF vs OpenEXR (different overheads), and the texture files I used may have different compression characteristics than your image set. I was also benchmarking with a real image manipulation program, not merely a test harness, so there was probably other overhead mixed in. Oh, and you said your files were all 4-channel RGBA (and the thumbnails you showed looked like maybe A=1.0 everywhere), versus mine which were a mix of 1- and 3-channel images, so I don't know if that may also have been an important difference.

But another thing that occurred to me is threading -- I was using 32 threads when I tested, because I mostly care about performance on our typical production workstations. But writing files consists of both compression (fairly well parallelized by threads), and the actual I/O (serialized). So I was thinking that if your tests were single threaded, or had a low thread count, the improvement in compression time would appear larger to you because it would have represented a larger portion of your total runtime.

Switching the default and a ~3% larger disk space still seems like a more than fair tradeoff for a 40% improvement in write speed, but I was curious to poke a bit more about why I was seeing less than the 2x you reported.
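The threading argument above can be made concrete with a quick Amdahl-style model. This is purely illustrative arithmetic (the fractions are my assumptions, not measured data): compression scales across threads while file I/O stays serial, so the more threads you use, the smaller compression's share of the runtime, and the less a faster compressor moves the total.

```python
# Back-of-the-envelope model: a write = threaded compression + serial I/O.
# Only the compression step gets faster when the zlib level drops.
def write_speedup(comp_fraction_1t, threads, comp_speedup):
    """Overall write speedup when only the compression part gets faster.

    comp_fraction_1t: fraction of SINGLE-threaded runtime spent compressing
                      (illustrative assumption, not a measured value).
    """
    io = 1.0 - comp_fraction_1t
    comp = comp_fraction_1t / threads        # compression scales with threads
    old_total = io + comp
    new_total = io + comp / comp_speedup     # compression alone speeds up
    return old_total / new_total

# Same 2x compressor speedup, very different observed effect:
print(write_speedup(0.9, 1, 2.0))    # 1 thread: close to 2x overall
print(write_speedup(0.9, 32, 2.0))   # 32 threads: I/O dominates, small gain
```

Under these assumed numbers, a single-threaded run sees most of the 2x, while a 32-thread run sees only a modest overall gain, which is consistent with the smaller improvement reported for the 32-thread TIFF test.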

@lgritz (Contributor) commented Sep 26, 2021

@aras-p After looking at your previous blog post and comparing the charts, I think I figured out that you are reporting results with 16 threads?

@dekekincaid commented
It would be nice to have similar controls for dwaa/dwab lossless channels which use zip compression.

@dekekincaid commented
It would help if I read the code before commenting. Ignore me, this is already done in the request.

@aras-p (Contributor, Author) commented Sep 27, 2021

After looking at your previous blog post and comparing the charts, I think I figured out that you are reporting results with 16 threads?

@lgritz yes, on a 2019 MacBookPro with 16 hardware threads. My guess for why my numbers are slightly different: fewer threads in my case, so CPU time is more important; and Macs tend to have really fast SSDs, so IO time is less important.

In my test of 18 various exr files (total raw data size 1057MB):
- zip 6: 2.452x ratio, 206MB/s compression
- zip 4: 2.421x ratio, 437MB/s compression

So a tiny drop of compression ratio, but compression is more than 2x faster. This makes writing zip faster than writing uncompressed (386MB/s). Decompression speed unaffected.

Signed-off-by: Aras Pranckevicius <aras@unity3d.com>
@aras-p changed the title from "RFC: switch Zip compression level from 6 to 4 by default, and add attribute to control it" to "Switch default Zip compression level from 6 to 4" on Sep 27, 2021
@kdt3rd (Contributor) left a comment

Not 100% sure how we should document all this...

@kdt3rd kdt3rd merged commit 9d05d50 into AcademySoftwareFoundation:master Oct 7, 2021
@kdt3rd (Contributor) commented Oct 7, 2021

Thanks!

cary-ilm pushed a commit to cary-ilm/openexr that referenced this pull request Oct 14, 2021
cary-ilm pushed a commit that referenced this pull request Oct 24, 2021
7 participants