
API request : CompressionLevel.Highest #1549

Closed
ericstj opened this issue Mar 25, 2019 · 11 comments · Fixed by #41960
Labels
api-approved (API was approved in API review, it can be implemented) · area-System.IO.Compression · help wanted (Good issue for external contributors)
Comments

@ericstj (Member) commented Mar 25, 2019

Currently we only have CompressionLevel.NoCompression, CompressionLevel.Fastest, CompressionLevel.Optimal.

As discussed in https://github.com/dotnet/corefx/issues/34157 and other issues, CompressionLevel.Optimal is not the "best" compression. It's a trade-off of CPU/resources vs. final compressed size. In some cases the result is much worse than the best a compression algorithm can do.

Let CompressionLevel.Optimal continue to be the Goldilocks option, and add CompressionLevel.Highest for those who want the absolute best compression, regardless of resource consumption.

@saucecontrol (Member):

If Optimal means "balanced", then the documentation is misleading:

The compression operation should be optimally compressed, even if the operation takes a longer time to complete.

And the interpretation of that value is inconsistent between the zlib and Brotli implementations. In https://github.com/dotnet/corefx/issues/29555 it was concluded that Optimal meant best compression, even if it takes 200x longer, which Brotli at q11 frequently does.

I'd suggest either:

  1. Update the docs to reflect the new interpretation of Optimal, and update the Brotli compressor to use a more sensible value for Optimal (probably 6 or 7). Then add Highest as suggested.
  2. Add a new Balanced enum value that would map to sensible values in both compressors, and update the zlib compressor to use level 9 for Optimal.
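A minimal sketch of what option 1's mapping might look like. The Highest member does not exist yet, and the specific quality values are the ones suggested in this thread, not the actual implementation:

```csharp
using System.IO.Compression;

internal static class BrotliLevelMap
{
    // Hypothetical mapping per option 1 above; CompressionLevel.Highest is
    // not a real member yet, so any unrecognized value stands in for it here.
    internal static int ToBrotliQuality(CompressionLevel level) => level switch
    {
        CompressionLevel.NoCompression => 0,
        CompressionLevel.Fastest => 1,
        CompressionLevel.Optimal => 6, // "sensible" mid value suggested above
        _ => 11,                       // proposed Highest: Brotli's maximum quality
    };
}
```

Each wrapper (Brotli, Deflate, GZip) would carry its own table, so the enum stays a portable abstraction while each format picks sensible native levels.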

@ericstj (Member, Author) commented Mar 25, 2019

Yes, we should clarify the docs on Optimal. My inclination to go this direction is that Optimal has meant balanced for Deflate since it was introduced in 2011, so precedent and compatibility win over naming. We can't drastically change the behavior for existing apps, or for apps that recompile.

@carlossanlop (Member):

@ericstj I can help add this clarification to the documentation.

@ahsonkhan (Member):

I can help add this clarification to the documentation.

I believe the implementation and documentation are in sync at the moment. We shouldn't update the docs without updating the implementation first, since it looks like Optimal today means Highest anyway. So this issue is basically to update what Optimal means (to match Deflate etc., i.e. "balanced") and then add Highest, which matches what Brotli's Optimal does today.

This, naturally, would be a breaking change.

Based on this comment though, Optimal == Highest is by design:
https://github.com/dotnet/corefx/issues/29555#issuecomment-388424084

Also

For Brotli, I chose 11 as the Optimal for two main reasons:

  • The docs define Optimal as the absolute best compression without regard to compression time, and for the scenarios we expect Brotli to be primarily used in (server-side compression, browser-based decompression, preference for smallest files) every little bit of compression ratio is important.
  • 11 is the default C encoder compression level, and I like keeping our defaults for our wrapper implementations the same as the defaults for their underlying native implementations.

@osexpert commented Jul 6, 2019

@ericstj "We can't drastically change the behavior for existing apps or apps that recompile."
@ahsonkhan "This, naturally, would be a breaking change."
It is possible to use a switch, e.g. Switch.System.IO.Compression.CompressionLevel.OptimalIsBalanced.
OptimalIsBalanced = false: Highest (= 3) will be the same as Optimal.
OptimalIsBalanced = true: Optimal becomes balanced, Highest becomes the highest.
Sometime in the future, OptimalIsBalanced may default to true (or not).
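A sketch of how such an opt-in switch could be read at runtime. The switch name is osexpert's suggestion from this comment, not a switch the runtime actually defines:

```csharp
using System;

// Hypothetical switch name from the comment above; not defined by the runtime.
bool optimalIsBalanced =
    AppContext.TryGetSwitch(
        "Switch.System.IO.Compression.CompressionLevel.OptimalIsBalanced",
        out bool isEnabled) && isEnabled;

// false (the default when unset): Optimal keeps today's behavior.
// true: Optimal maps to a balanced level, Highest to the maximum.
Console.WriteLine(optimalIsBalanced);
```

AppContext switches are the established mechanism for this kind of opt-in behavior change, since they can be set per-application without recompiling the library.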

@terrajobst (Member) commented Dec 17, 2019

Video

  • We prefer SmallestSize to make it clearer what the API does (Highest or Best seem ambiguous next to Optimal).
namespace System.IO.Compression
{
    public partial enum CompressionLevel
    {
        SmallestSize,

        // Existing:
        // Fastest,
        // NoCompression,
        // Optimal
    }
}
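Assuming the value ships as approved above, usage would look something like this (the file names are illustrative):

```csharp
using System.IO;
using System.IO.Compression;

// Trade CPU time for the smallest possible output.
using FileStream output = File.Create("data.bin.br");
using var brotli = new BrotliStream(output, CompressionLevel.SmallestSize);
brotli.Write(File.ReadAllBytes("data.bin"));
```

The same enum value works with DeflateStream and GZipStream; each wrapper maps it to its own format's maximum-effort setting.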

@osexpert:
Since you already have Fastest, you could add Slowest. Pedantic note: maximum compression is no guarantee of the smallest size; it only guarantees that the compressor will be slower and try harder. The resulting file size may be the same.

@carlossanlop (Member) commented Dec 18, 2019

Triage:
Next step is to define which deflate parameter to map to this new enum value; deflate_slow with level 9 may be the best choice.

@JimBobSquarePants:
Why can’t you use 0-9 like the underlying API? The ambiguity of the current options is obviously a poor choice given the confusion expressed here and elsewhere.

@ericstj (Member, Author) commented Dec 19, 2019

0-9 isn't the only option. There are also different strategies. This enum is also shared with other compression technologies which don't have the same parameters. The purpose of the CompressionLevel enumeration is to let folks use compression algorithms without needing to understand the meaning of their tuning parameters. We've separately discussed exposing more advanced options that allow directly accessing different compression algorithm tuning parameters. We're not closed to that idea, but the CompressionLevel enum isn't the place to represent it.

@JimBobSquarePants:
0-9 isn't the only option. There are also different strategies.

I know this.

This enum is also shared with other compression technologies which don't have the same parameters.

The leaky abstraction over deflate causes more issues than it solves. It's already proven to be inconsistent and adding to it will not help solve that. I'll wager most individuals interested in working with compression algorithms want to use the APIs relevant to those algorithms.

@carlossanlop carlossanlop transferred this issue from dotnet/corefx Jan 9, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.IO.Compression untriaged New issue has not been triaged by the area owner labels Jan 9, 2020
@carlossanlop carlossanlop added api-approved API was approved in API review, it can be implemented and removed untriaged New issue has not been triaged by the area owner labels Jan 9, 2020
@carlossanlop carlossanlop added this to the 5.0 milestone Jan 9, 2020
@carlossanlop carlossanlop modified the milestones: 5.0.0, Future Jun 18, 2020
@carlossanlop carlossanlop added the help wanted [up-for-grabs] Good issue for external contributors label Jun 18, 2020
@ghost ghost closed this as completed in #41960 Dec 7, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jan 6, 2021
This issue was closed.
8 participants