Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.IO.Compression to support zlib thin wrapper over DEFLATE? #16923

Closed
xied75 opened this issue Apr 7, 2016 · 23 comments
Closed

System.IO.Compression to support zlib thin wrapper over DEFLATE? #16923

xied75 opened this issue Apr 7, 2016 · 23 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO.Compression
Milestone

Comments

@xied75
Copy link
Contributor

xied75 commented Apr 7, 2016

Currently we have access to this: https://msdn.microsoft.com/en-us/library/system.io.compression.deflatestream(v=vs.110).aspx

The DeflateStream type can be used to compress/decompress DEFLATE data.

Yet in the wild world people normally wrap that with zlib (i.e. RFC1950), e.g. in the Git pack file, there are multiple zlib streams embedded within.

Given underneath we are calling zlib Native, any chance we could provide types that can handle zlib streams? As in:

https://github.com/icsharpcode/SharpZipLib/blob/master/src/Zip/Compression/Inflater.cs
https://docs.python.org/2/library/zlib.html (zlib.compress(string[, level]))

@ianhays
Copy link
Contributor

ianhays commented Apr 7, 2016

I'm not sure I follow; could you give an example of what you'd like to be able to do? DeflateStream is already a managed wrapper around the zlib API that provides functionality in the .NET stream style.

@xied75
Copy link
Contributor Author

xied75 commented Apr 7, 2016

echo -n "example text to compress and decompress" | python2 -c 'import sys,zlib; sys.stdout.write(zlib.compress(sys.stdin.read()))' | hd

We get this:

00000000  78 9c 4b ad 48 cc 2d c8  49 55 28 49 ad 28 51 28  |x.K.H.-.IU(I.(Q(|
00000010  c9 57 48 ce cf 2d 28 4a  2d 2e 56 48 cc 4b 51 48  |.WH..-(J-.VH.KQH|
00000020  49 85 71 01 2a 93 0f 09                           |I.q.*...|

In which the 78 9c is the magic header for zlib (http://stackoverflow.com/questions/9050260/what-does-a-zlib-header-look-like)

Or

echo -n "example text to compress and decompress" | gzip | hd

We get:

00000000  1f 8b 08 00 bf 59 06 57  00 03 4b ad 48 cc 2d c8  |.....Y.W..K.H.-.|
00000010  49 55 28 49 ad 28 51 28  c9 57 48 ce cf 2d 28 4a  |IU(I.(Q(.WH..-(J|
00000020  2d 2e 56 48 cc 4b 51 48  49 85 71 01 fb 2d 09 c5  |-.VH.KQHI.q..-..|
00000030  27 00 00 00                                       |'...|

The 1f 8b are gzip header.

With System.IO.Compression.DeflateStream we get:

4b ad 48 cc 2d c8 49 55  28 49 ad 28 51 28 c9 57
48 ce cf 2d 28 4a 2d 2e  56 48 cc 4b 51 48 49 85
71 01

This is the DEFLATE stream, not zlib stream. It is in the middle of both zlib and gzip stream, but without headers and tails.

@ianhays
Copy link
Contributor

ianhays commented Apr 7, 2016

Ah I see what the ask is now, thanks for the elaboration.

I agree that this is something we should have at the very least because it won't require very much effort to implement and increases the number of cases that we can handle. It does complicate #16574 though.

Off the top of my head, we could add this in a few different ways:

  • GZipStream is already a thin wrapper around DeflateStream so it would be trivial to add a ZlibStream that behaved similarly.
  • Expose the DeflateStream constructor that takes windowBits so that you could select the correct windowBits number depending on the type of compression you wanted to work with.
  • Add a CompressionType enum with values Raw, ZLib, and GZip. Add a constructor to DeflateStream that accepted that enum.

@xied75
Copy link
Contributor Author

xied75 commented May 3, 2016

@ianhays Now with this "up for grabs", what is the conclusion regarding #16574 then? Managed or not, both should work, but eventually there needs to be a "policy" or "agreement" reached or pushed either from top or bottom.

My feeling is that native code should perform faster, (although happy to be proved wrong on this). Trivia difference for small runs, would add up when you are facing huge number of calls.

Another thought on this: talking about managed implementation, how about merging the effort with https://github.com/icsharpcode/SharpZipLib?

@ianhays
Copy link
Contributor

ianhays commented May 3, 2016

Now with this "up for grabs", what is the conclusion regarding #16574 then? Managed or not, both should work, but eventually there needs to be a "policy" or "agreement" reached or pushed either from top or bottom.

The purpose of #16574 is to provide a fallback implementation where a native compression library (e.g. zlib) doesn't exist. That issue actually shouldn't interfere with this one since any new API could be PlatformNotSupported in the managed implementation if it came to that.

My feeling is that native code should perform faster, (although happy to be proved wrong on this). Trivia difference for small runs, would add up when you are facing huge number of calls.

You are correct. Switching to native zlib increased our compression bechmarks by something like ~30% in cases where more than a few bytes were compressed/decompressed.

Another thought on this: talking about managed implementation, how about merging the effort with https://github.com/icsharpcode/SharpZipLib?

We already have a managed implementation in source control, I just deleted it since we didn't need it at the time. Also, as a general rule, we don't take code from other projects/libraries unless the code is available under a very liberal license like zlib or the owner contributes the code themselves under our license.

As far as this issue is concerned, implementing deflate support in CoreFX wouldn't require adding a managed Deflate algorithm, it would only require we add API around our native library (zlib) that is already built to handle that format. It's just a question of what we want that API to look like.

@ianhays
Copy link
Contributor

ianhays commented Oct 10, 2016

The work remaining on this is to propose an API for a ZLibStream class that compresses/decompresses deflate data that is wrapped in a ZLib header. The API (and implementation for that matter) for this should probably just be a copy/paste of GZipStream.

@ianhays ianhays removed their assignment Oct 10, 2016
@danmoseley
Copy link
Member

@xied75 do you want to complete formal API proposal then we can review formally?

Docs on the API review process and see Jon's API request for a great example of a strong proposal. The more concrete the proposal is (e.g. has examples of usage, real-world scenarios, fleshed out API), the more discussion it will start and the better the chances of us being able to push the addition through the review process.

@xied75
Copy link
Contributor Author

xied75 commented Feb 24, 2017

@danmosemsft Sorry for the delayed response, I reckon a formal API proposal will need large block of time which I can't start right now. This is needed in my project thus I'll come to it sooner or later, if no others want to grab it first.

@xied75
Copy link
Contributor Author

xied75 commented Apr 8, 2017

Just found we've already got these https://github.com/dotnet/corefx/tree/master/src/System.IO.Compression/src/System/IO/Compression/DeflateZLib

I wonder would these files warrant closing this issue?

@ianhays
Copy link
Contributor

ianhays commented Apr 17, 2017

@xied75 the nomenclature is somewhat confusing, but the files in that folder are for the DeflateStream that uses zlib as the underlying compression engine. It does not contain a public interface for handing zlib headers, only raw deflate.

@martinjt
Copy link

Would this mean that dotnet core would be able to decompress a file that is created using zlib's pigz implementation http://zlib.net/pigz/ it seems that the current implementation of the GZipStream only decompresses part of the stream if it's compressed using that.

@ianhays
Copy link
Contributor

ianhays commented Apr 22, 2017

From looking at the pigz docs, it appears to be something entirely different from the deflated data stream prepended by a ZLIB header that is being discussed in this issue. So I don't see this issue affecting pigz.

@adamhathcock
Copy link

I would also like access to the zlib internals to use the native implementation instead of a managed version I have. This is specifically to expose GZip functionality like the filename or comment.

This is for SharpCompress: https://github.com/adamhathcock/sharpcompress

@JamesNK
Copy link
Member

JamesNK commented Aug 29, 2019

RFC1950 support would be nice for gRPC in .NET. A common compression encoding is deflate. However the deflate content from servers and clients has headers from RFC1950.

This isn't the end of the world. gRPC .NET can provide gzip as an option instead.

@carlossanlop
Copy link
Member

Triage:
This is an advanced scenario (only 3 people wanting it so far). We are reluctant to expose our internal zlib abstraction and maintain its compatibility over time. It might prevent us from upgrading zlib easily.

@JamesNK how does this impact gRPC scenarios?

@JamesNK
Copy link
Member

JamesNK commented Dec 16, 2019

It isn't required. If it becomes more important then we can re-evaluate.

@JimBobSquarePants
Copy link

JimBobSquarePants commented Dec 17, 2019

This is an advanced scenario (only 3 people wanting it so far).

3 people plus all the people that would benefit from the change. I'm very disappointed at the short-sightedness here.

See dotnet/corefx#40170 for further reasons why this is important.

@darrenstarr
Copy link

Sadly, I spend most of my time coding and little of it tracking feature requests in .NET.

I'm number 4.... and I'm sure there are more out there like us.

@karelz
Copy link
Member

karelz commented Dec 17, 2019

@JimBobSquarePants we are hesitant to add features that would tie our hands long term and we do not believe they bring large value.

If it is that needed, I think there is always opportunity for standalone nuget package. If it is successful and widely used over time, I'd be willing to reconsider and take it into the platform.

I don't quite understand why the difference between frameworks matters. I bet we will close that issue on Wed with next round of triage. I admit I don't don't read it all yet ...

@adamhathcock
Copy link

As I mentioned in #1550 my primary issue is that the GZip implementation doesn't expose the archive metadata that GZip (the archive format) uses. GZip is basically DEFLATE plus some optional metadata.

@karelz
Copy link
Member

karelz commented Dec 17, 2019

@adamhathcock is this the only way that problem can be solved? Do we have other way to expose format medatada?
What exactly is your scenario / usage pattern? How common is it? (in the context on this issue - the linked one is rather long and detailed about other things)

@adamhathcock
Copy link

When using GZipStream to compress data like in a web server, you don't use the metadata fields. However, GZip (the utility) compresses a single file and adds metadata like the file name and timestamp. The ability to read and write this metadata is what I need.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@Ryder25
Copy link

Ryder25 commented Feb 12, 2020

I missed this when I made a new issue for it. I'm really disappointed by the decision made here. I've been waiting for zlib support for a long time now. I made a new issue requesting an API surface similar or almost exact with the API surface of GZipStream. This does not expose too many specifics of the underlying zlib implementation while still allowing the C# developer to handle zlib data in a way they are already familiar with. The ability to compress data to zlib with the existing CompressionLevel enum, and to decompress any zlib data stream would probably be good enough for the majority of use cases.

If this is something supporters of this issue would want, then show your support in the new issue, and comment about your use cases. I hope we can get them to change their minds on this topic.

Also, let's not forget that many similar languages to C# already have support for zlib, including Java and Python.

#2236

@dotnet dotnet locked as resolved and limited conversation to collaborators Jan 1, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO.Compression
Projects
None yet
Development

No branches or pull requests