Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.
Sign upAdd Brotli Compression to CoreFX #25785
Comments
This comment has been minimized.
This comment has been minimized.
I don't think the non-streaming samples are illustrating what we are after. They will never go through more than one itteration of the loop. The right sample would be something like the following: public interface IOutput
{
Span<byte> Buffer { get; };
void Commit(int bytes);
void Resize(int minimumSize);
}
// This code is very naive, but it does illustrate a pipe scenario
public static void PipeCompress(ReadOnlyMemory<byte>[] inputs, IOutput output)
{
BrotliEncoder encoder;
for(int i=0; i<inputes.Length; i++) {
var input = inputs[i];
while (!input.IsEmpty)
{
var buffer = output.Buffer;
encoder.Compress(input, buffer, out int bytesConsumed, out int written);
output.Commit(written);
input = input.Slice(bytesConsumed);
}
}
encoder.Flush(output, out int bytesWritten, isFinished: true);
} |
This comment has been minimized.
This comment has been minimized.
The BrotliResult enum is very similar (if not identical) to the OperationStatus enum in the System.Buffers namespace that we use for our Base64 Encoder and Decoder. Can we use OperationStatus as the returned enum in Brotli as well? https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/Buffers/OperationStatus.cs We should make sure we agree on the enum as part of this API review (leftover from previous API review: #22412). |
This comment has been minimized.
This comment has been minimized.
@ahsonkhan where was the OperationStatus approved as API? I asked the question on #22412 but didn't get answer :) |
This comment has been minimized.
This comment has been minimized.
Oh, yes, I didn't see that OperationStatus was in CoreFX already. It is interchangeable with BrotliStatus. I'll update the proposal. |
This comment has been minimized.
This comment has been minimized.
Updated the spec with the more accurate use-case examples. |
This comment has been minimized.
This comment has been minimized.
I'm not convinced of the value of a struct-based encoder / decoder as opposed to a class-based encoder / decoder. At its core due to the interop you have a native handle under the covers. These handles need to have definite lifetimes, which means that somebody needs to make sure that handles are closed properly and that they're no longer used after they're closed. It's very difficult to make these guarantees with structs, and I'm afraid that we'll lead developers into a pit of failure if we go this direction. Indeed, this is why |
This comment has been minimized.
This comment has been minimized.
@GrabYourPitchforks I tend to agree. I'd prefer to see the internal handle represented by a That said, I do see the value in being allocation free and that's why I made it a struct. I tried to lessen the potential for erroneous code by making it an IDisposable struct, though it's still on the developer to actually do the disposal themselves. I'd be interested to hear other people's opinions on this. Historically, using structs in this manner has been a big no-no and is warned against in almost every article I could find, so whether the minor perf gain is worth the cost is questionable. I'll run some tests and see how much of a hit we take if we make it into a class instead. |
This comment has been minimized.
This comment has been minimized.
Even if there is a measurable cost to switching to class (and that would honestly surprise me), callers can work around it by pooling encoder / decoder instances. There's nothing that requires the type be a struct in order to get high performance. As an aside, I'm seeing that with this and other proposals we're trying to invent concepts analogous to C++'s deterministic destruction and non-copyable types. If this is something we think might be valuable to C# developers, we should propose these features directly, and then we can create types that are properly implemented on top of these features. |
This comment has been minimized.
This comment has been minimized.
It doesn't appear to have a significant impact that I could see in my testing which isn't too surprising. I'll update the proposal to be class based, unless someone feels strongly that it should be a struct. @KrzysztofCwalina ? |
This comment has been minimized.
This comment has been minimized.
It was implicitly approved as a placeholder to unblock the Base64 APIs and the final review was deferred, as far as I can recall, until we have more APIs (like the text encoders APIs, or in this case, compression APIs) to get a better understanding of the use cases. That is why I brought it up here since now we have more scenarios to confirm the usefulness/correctness of the enum. |
This comment has been minimized.
This comment has been minimized.
We discussed the struct/class and non-allocating/allocating concern in-person and the ending compromise is to leave it as an IDiposable struct, but use a Safehandle for the state pointer. Does anyone else have any more feedback before I mark this as ready for review? |
This comment has been minimized.
This comment has been minimized.
cc: @terrajobst |
This comment has been minimized.
This comment has been minimized.
Regarding OperationStatus, something to consider is that we already have an enum with a similar name in System.Net.NetworkInformation - OperationalStatus. |
This comment has been minimized.
This comment has been minimized.
@ahsonkhan, good point. Maybe we should call it OperationResult |
This comment has been minimized.
This comment has been minimized.
FYI: First part of the API review discussion on 2017/12/19 was recorded - see https://youtu.be/ZrT0uOsqQlI?t=6541 (5 min duration) |
This comment has been minimized.
This comment has been minimized.
We reviewed the API and made some minor tweaks: public partial class BrotliStream : Stream
{
public BrotliStream(Stream stream, CompressionLevel compressionLevel);
public BrotliStream(Stream stream, CompressionLevel compressionLevel, bool leaveOpen);
public BrotliStream(Stream stream, CompressionMode mode);
public BrotliStream(Stream stream, CompressionMode mode, bool leaveOpen);
public Stream BaseStream { get; }
// We don't think we need those for now. Not having aligns the type with DeflateStream/GzipStream
// public BrotliStream(Stream stream, CompressionLevel compressionLevel, bool leaveOpen, int bufferSize);
// public BrotliStream(Stream stream, CompressionMode mode, bool leaveOpen, int bufferSize);
// Overrides omitted for clarity
}
// We discussed making these guys classes and derive from CFO ourselves instead of wrapping a SafeHandle
// but we decided against it due to complexity and only minor savings (like when these are boxed)
public struct BrotliDecoder : IDisposable
{
public OperationStatus Decompress(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesConsumed, out int bytesWritten);
public void Dispose();
public static bool TryDecompress(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten);
}
public struct BrotliEncoder : IDisposable
{
BrotliEncoder(int quality, int window);
public OperationStatus Compress(ReadOnlySpan<byte> source,
Span<byte> destination,
out int bytesConsumed,
out int bytesWritten,
bool isFinalBlock);
public OperationStatus Flush(Span<byte> destination, out int bytesWritten);
public void Dispose();
public static bool TryCompress(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten);
public static bool TryCompress(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten, int quality, int window);
public static int GetMaxCompressedLength(int length);
} |
This comment has been minimized.
This comment has been minimized.
FYI: Second part of API review discussion on 2017/12/19 was recorded - see https://www.youtube.com/watch?v=IIKqOagRdWA (1h 43min duration) (video is mostly frozen) |
This comment has been minimized.
This comment has been minimized.
which version of System.IO.Compression has BrotliStream? because I could not find it @ianhays |
This comment has been minimized.
This comment has been minimized.
@guylando BrotliStream is in its own assembly (System.IO.Compression.Brotli.dll) that is included in framework versions of 2.1 preview1 forward. For example, |
This comment has been minimized.
This comment has been minimized.
@ianhays what is the minimal nugget package to reference to get it? |
This comment has been minimized.
This comment has been minimized.
Essentially, yeah. A few versions back it was a bundling of packages or a meta-package with a large dependency list, but now it's a primary shipping vehicle that contains most of the framework. We don't ship Brotli as a standalone package because it's included in the framework package and we don't ship it downlevel either. cc: @weshaggard |
This comment has been minimized.
This comment has been minimized.
@ianhays Do I understand correctly that its a best practice to include Microsoft.NETCore.App instead of separate packages? Our projects are (3 years old) from before the existence of Microsoft.NETCore.App so when upgrading in the past I left them as is and never moved to Microsoft.NETCore.App because sounded better to leave as is. |
This comment has been minimized.
This comment has been minimized.
@guylando you will be on supported & tested combination of packages, which makes it actually less likely for you to hit issues. |
This comment has been minimized.
This comment has been minimized.
And does Microsoft.AspNetCore.App include it also? or only Microsoft.NetCore.App? thanks! |
This comment has been minimized.
This comment has been minimized.
If you are targeting .NET Core then yes that is the correct thing to do. You don't actually need to reference the package directly in your application, the reference you get is controlled by the TargetFramework you are using. You will need to target netcoreapp2.1 to get these new APIs.
Microsoft.AspNetCore.App depends on Microsoft.NetCore.App so it will be part of the closure. |
This comment has been minimized.
This comment has been minimized.
And is it possible to get Brotli if I target net461? |
This comment has been minimized.
This comment has been minimized.
No we don't ship it as a library package it is only available for .NET Core 2.1+ |
This comment has been minimized.
This comment has been minimized.
thanks for the answers! |
This comment has been minimized.
This comment has been minimized.
@weshaggard, @ianhays so brotli is not netstandard? and never will be? it is only on .net core? If I would like to include it into library that is available for netstandard, netcore, full framework it won't be possible? Could you explain decision behind that? the corefx-lab version works with netstandard, why final (preview) release not? Because of that decision I'm not able to use Brotli in Asp NET Core targeted full framework. |
This comment has been minimized.
This comment has been minimized.
That is correct. It is only supported on .NET Core today. That isn't to say it will never be part of the .NET Standard or built as a netstandard library but there are various technical issues that come when you ship a OOB (Out-of-Band) library like this when it exists inbox for a platform like .NET Core or .NET Framework, which is why we don't currently support that scenario. If using this Brotli library on .NET Framework is a scenario you would like to see supported I suggest filing a issue for it and the owners of the library will take it under consideration. |
This comment has been minimized.
This comment has been minimized.
@weshaggard Thank you for answer, I'm a bit disappointed because corefx lab version is targeted netstandard1.1. |
This comment has been minimized.
This comment has been minimized.
@msmolka I understand your frustration but there are reasons for our choices. Keep in mind that corefxlab is for prototypes and experimenting so they do what is the quickest to get the job done and then once we decide to actually ship a library it moves to corefx and gets designed with all the different contexts in mind. In this case we decided it is best for Brotli to ship as part of the platform so that we can take advantage of it in our lower networking layer which is also part of the platform. Given that we made it part of the platform we decided not to ship it as an OOB library because of issues that occur with OOBs (see #17522 for a hint of some similar troubles we had with System.Net.Http). Please do file an separate issue if you'd like to see Brotli supported on .NET Framework and we will consider supporting it. |
This comment has been minimized.
This comment has been minimized.
Hello,
The return is false and destination is empty. Thanks |
This comment has been minimized.
This comment has been minimized.
@ricardobeckssa The |
This comment has been minimized.
This comment has been minimized.
@khellang how can I know the space before compress? what's space should I to inform? |
This comment has been minimized.
This comment has been minimized.
|
This comment has been minimized.
This comment has been minimized.
@khellang thanks, I did:
|
This comment has been minimized.
This comment has been minimized.
FYI, you can abbreviate the following: To the following, relying on the implicit cast: ReadOnlySpan<byte> source = data; // or data.AsSpan(); |
System.IO.Compression.Brotli
Introduction
Brotli is a generic-purpose lossless compression algorithm that compresses data
using a combination of a modern variant of the LZ77 algorithm, Huffman coding
and 2nd order context modeling, with a compression ratio comparable to the best
currently available general-purpose compression methods. It is similar in speed
to deflate but offers more dense compression.
The specification of the Brotli Compressed Data Format is defined in RFC 7932.
Brotli encoding is supported by most web browsers, major web servers, and some CDNs (Content Delivery Networks).
BrotliStream
Proposed API
The API surface area for BrotliStream is identical to that of DeflateStream but with added
bufferSize
constructors.Example Usage
The BrotliStream behavior is the same as that of DeflateStream or GZipStream to allow easily converting DeflateStream/GZipStream code to use BrotliStream.
BrotliEncoder & BrotliDecoder
Proposed API
The goal of the streamless implementation is to provide a non-allocating, performant Brotli implementation free from Streams. It contains simple Compress/Decompress operations that return an enum indicating the success of the operation as well as static CompressFully/DecompressFully operations that allow single-pass compression/decompression without the need for a BrotliEncoder/BrotliDecoder instance.
Design Questions
Should we allow setting the Quality/Window via
Set_
functions of make them constructor variables? They must be set before encoding either way.BrotliEncoder SetQuality/SetWindows vs constructor overloads:
Flush vs Finalize
Should there be an option for intermediate flushes or only for finalize? The main use case of an intermediate Flush is if you want to get more of the outputted bytes but aren’t yet done supplying input to the compressor.
Allow input to Flush/Finalize?
I prefer the simpler Flush/Finalize that don’t take input, but the underlying call allows input if we decide that’s more usable. If we go that route then we could potentially just condense the API down to one function.
Naming
Example Usage
Implementation
The implementation will be based around the c code provided by Google that will be inserted into our existing native Compression libraries (clrcompression (Windows) and System.IO.Compression.Native (Unix). In CoreFX we'll have a managed wrapper to pinvoke into the native brotli implementation and provide the above API around it, same as we do for zlib. See dotnet/corefxlab#1673 for a discussion on the pros of cons of a fully managed implementation and my justification for using the native approach (at least for now). Performance testing to come later, with the implementation PR.
This proposal is an evolution of the CoreFXLab implementation of Brotli.
This is a component of #24826
PTAL: @joshfree @KrzysztofCwalina @GrabYourPitchforks @ViktorHofer @stephentoub @terrajobst @ahsonkhan @JeremyKuhne