Skip to content
This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

Pure managed implementation of System.IO.Compression.Brotli #1673

Closed
akoeplinger opened this issue Jul 27, 2017 · 19 comments
Closed

Pure managed implementation of System.IO.Compression.Brotli #1673

akoeplinger opened this issue Jul 27, 2017 · 19 comments
Labels
area-System.IO.Compression enhancement OpenBeforeArchiving These issues were open before the repo was archived. For re-open them, file them in the new repo
Milestone

Comments

@akoeplinger
Copy link
Member

The native dependency for the Brotli encoder/decoder is a bit problematic for e.g. Xamarin. A managed implementation of the algorithm would be more portable.

Relevant Twitter convo: https://twitter.com/terrajobst/status/890690221301350401

@terrajobst
Copy link
Member

We've had a native implementation for Deflate and I totally understand the issues one runs into with making it portable everywhere.

That being said, this might just be an artifact of this being out-of-band (i.e. delivered as a NuGet package). Eventually, we'd put Brotli where the other compression algorithms live (i.e. into the platform). At that point, it shouldn't matter too much as we'd probably add the code to the existing clrcompression native module.

@akoeplinger
Copy link
Member Author

akoeplinger commented Jul 27, 2017

FWIW Google already has an MIT licensed managed C# decoder implementation in https://github.com/google/brotli/tree/master/csharp (though this is transpiled from Java so probably not ideal).

@terrajobst
Copy link
Member

@shiftylogic, any thoughts?

@stephentoub
Copy link
Member

I would also prefer to see a managed implementation used unless the native implementation ends up being significantly more efficient. We're trying to minimize our native dependencies as much as possible, and if we're going to carry something with us, I'd prefer it be in managed unless there's a strong perf reason otherwise.

@terrajobst
Copy link
Member

terrajobst commented Jul 28, 2017

Had a quick chat with @joshfree. One of the reasons we went with native code for our deflate implementation was being able to easier pickup improvements made in the upstream version, which was written in C. The argument that I've heard is that we're not interested in building and maintaining a compression algorithm ourselves; it's mostly about how we expose them. So in that sense, having a managed implementation just because it's easier to maintain isn't the higher order bit, assuming we take the code usually from somewhere else anyway. Of course, I could imagine that going with managed code might also help us with performance, for instance by avoiding P/Invoke overhead or by leveraging newer features on our end (like SIMD). So I guess there is no clear cut answer...

@stephentoub
Copy link
Member

stephentoub commented Jul 28, 2017

we're not interested in building and maintaining a compression algorithm ourselves

If we're relying on the .so/.dylib as part of each distro we target, then we're limited to which distro we can support, and hit the kind of platform differences we've faced with libcurl. If not, then we are building and maintaining, and responsible for servicing and the like. We might benefit from fixes made upstream, but it's still up to us to consume them and adapt then to our environment. Relying on existing native components has been a great way to bootstrap the platform, and I do not regret our doing so. But going forward we need to be very thoughtful about where and how we do so. It's possible this is a desirable place, of course, especially if there is a performance argument, I just don't want us to assume so.

@terrajobst
Copy link
Member

@stephentoub, I think we'd put the source code into clrcompression and compile it ourselves. Would this solve the distro issue?

@stephentoub
Copy link
Member

I think we'd put the source code into clrcompression and compile it ourselves. Would this solve the distro issue?

We only distribute clrcompression.dll on Windows. We'd need to build it into something else on Unix, like System.IO.Compression.Native.so/dylib. But regardless, that's what I referred to above where I said "If not, then we are building and maintaining, and responsible for servicing and the like. We might benefit from fixes made upstream, but it's still up to us to consume them and adapt them to our environment." Doable? Of course. But no free lunch.

@joshfree joshfree assigned ianhays and unassigned shiftylogic Oct 30, 2017
@ianhays
Copy link
Contributor

ianhays commented Oct 30, 2017

Either way we're going to have to maintain and update an implementation manually as we can't rely on Brotli being already being available on the machine.

So our choices are:

  • Use a fully managed implementation. Write our own encoder. Use Google's decoder and spiff it up to make it performant and remove the java-isms.
    • Pros:
      • Runs anywhere.
      • Don't have to worry about distributing our native compression library to Unix platforms.
      • Makes backporting to netfx or OOBing wayy easier.
    • Cons:
      • Requires large amount of effort to write a managed encoder.
      • Updating the encoder in the future is entirely on us. We won't have a "master implementation" to copy from like we do with the native implementations or with the the managed decoder
  • Use the native implementation and write a managed wrapper around it a la zlib.
    • Pros:
      • Low cost to get it running since we only have to write the managed wrapper around the native API.
      • Maintenance of the implementation isn't our responsibility and taking fixes from google would be easy. We would just need to copy them over to our implementation.
    • Cons:
      • Not easily portable.
      • Would require we distribute clrcompression (or similar) to Unix or I guess we could maybe fit it into the shim native binary. Shouldn't be too hard, but will require some effort.
      • Makes backporting to netfx a hassle as we'd have to make some pretty heavy build modifications since netfx doesn't use cmake in its build.

I doubt there will be an especially strong performance argument for choosing one of the above over the other considering any managed encoder written today would have the latest hotness, but we can't really know for sure until we actually write the managed encoder at which point it would be silly not to use it :)

@ianhays
Copy link
Contributor

ianhays commented Oct 30, 2017

Personally, I prefer the managed approach despite the extra work required to write the encoder in C#. Having a native zlib implementation on Windows provided a great perf improvement over our old managed deflater, but man is it a hassle to keep up with. I've spent so much time resolving packaging errors, import issues, platform library availability problems, compiler version snafus, binscope failures, etc. etc. that I wonder where we'd be at perf-wise if that time went into modernizing (and correcting) the managed deflater/inflater instead. Probably still behind zlib-intel, but it's interesting to think about either way.

@stephentoub
Copy link
Member

stephentoub commented Oct 30, 2017

@ianhays, you might be able to use https://github.com/master131/BrotliSharpLib to jump start a corefx implementation.

@ianhays
Copy link
Contributor

ianhays commented Oct 31, 2017

That would certainly lower the gap in time cost between the two approaches. It's licensed under MIT so we could use that as a base for the encoder without needing too many modifications.

@akoeplinger
Copy link
Member Author

Sounds good, a managed implementation would definitely be less hassle to integrate into all our platforms.

@ianhays ianhays removed their assignment Jan 11, 2018
@ianhays ianhays added this to the Future milestone Jan 11, 2018
@msmolka
Copy link

msmolka commented May 10, 2018

Based on https://blogs.msdn.microsoft.com/dotnet/2018/05/07/announcing-net-core-2-1-rc-1/
the version 2.1 of .NET Core should support Brotli compression. However there is no assembly available on nuget which have this implementation.

@ianhays
Copy link
Contributor

ianhays commented May 10, 2018

@msmolka I just checked, and the most recent Preview1 package of runtime.linux-x64.Microsoft.NETCore.App.2.2.0-preview1-26508-01.nupkg includes Brotli.

Same for runtime.linux-x64.Microsoft.NETCore.App.2.1.0-rtm-26509-04 which is probably more relevant :)

and runtime.linux-x64.Microsoft.NETCore.App.2.1.0-rc1-26428-03

@msmolka
Copy link

msmolka commented May 10, 2018

So it will be in version 2.2. not 2.1 as mentioned on Microsoft blog?

@ianhays
Copy link
Contributor

ianhays commented May 10, 2018

It's in 2.1.

See runtime.linux-x64.Microsoft.NETCore.App.2.1.0-rc1-26428-03 for example

@ahsonkhan
Copy link
Member

ahsonkhan commented May 10, 2018

However there is no assembly available on nuget which have this implementation.

It is built into .NET Core 2.1, so there isn't an individual package on NuGet, but instead part of the Microsoft.NetCore.App metapackage.

So, you can use it if your application targets netcoreapp2.1.

image

@msmolka
Copy link

msmolka commented May 10, 2018

But then how to use it from full framework? Also how to include it in library targeted for netstandard and full ?

marek-safar pushed a commit to mono/mono that referenced this issue Feb 5, 2019
Resolves #11431

I put the native impl to Mono.Native, but probably it should be in a separate lib (`libmono-native.0.dylib` before: **81kb**, after: **898kb**) if Linker is not able to trim it if it's not used.
Compiles and works on Windows (however, we don't build Mono.Native for Windows, but we can borrow `clrcompression.dll` from .net core) and macOS but requires changes in mono/corefx,

I disabled xunit tests as those need 50mb of test data.

PS: Brotli could be fully managed 😢 dotnet/corefxlab#1673

C# ports:

- https://github.com/master131/BrotliSharpLib 
A quick benchmark (Compressing a 5mb file with max compression level=11)
Mono JIT: 25sec
Mono LLVM-AOT: 7.8sec
.NET Core 2.2: 6.8sec
.NET Core 3.0: 5.8sec
.NET Core's BrotliStream (with native lib): 3.5sec


- https://github.com/google/brotli/tree/master/csharp - official C# port. Based on Java sources, slow, outdated (2 years ago).
@pgovind pgovind added the OpenBeforeArchiving These issues were open before the repo was archived. For re-open them, file them in the new repo label Mar 11, 2021
@pgovind pgovind closed this as completed Mar 11, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO.Compression enhancement OpenBeforeArchiving These issues were open before the repo was archived. For re-open them, file them in the new repo
Projects
None yet
Development

No branches or pull requests

8 participants