
Missing features / contributions welcome? #13

Open
EamonNerbonne opened this issue Feb 2, 2019 · 2 comments

Comments

@EamonNerbonne

I noticed that this library and the related https://github.com/skbkontur/ZstdNet/tree/master/ZstdNet have only partially overlapping sets of functionality.

Are you interested in external contributions to fill the gaps, and if so, how would you like them?

I could think of:

  • adding API surface area corresponding to ZDICT_trainFromBuffer (this would be hugely useful to me, but may require a different build of libzstd.dll, since the prebuilt releases at https://github.com/facebook/zstd/releases aren't compiled with the optional dictBuilder package);
  • adding an API better suited to (de)compressing small payloads, i.e. using Span<T> instead of Stream, at least assuming benchmarks show this yields a meaningful performance win;
  • perhaps exposing other features of the underlying C API that are useful and simple enough to wrap;
  • a little more tenuously, splitting the library into an as-thin-as-possible safe wrapper around the native library plus a second layer that turns it into more conventional .NET APIs (akin to SQLitePCL.raw). The advantages: the "nice" managed wrapper need not evolve at the same rate as the underlying native library; the raw layer can expose all the crazy bits without first deciding on a clean API for them (e.g. this could be a simple way to include the dictBuilder); and conversely, clean managed APIs can be experimented with without polluting a library you want to keep stable and clean.

TL;DR: are you interested in contributions, and if so, what kind and in what form?

@bp74
Owner

bp74 commented Feb 3, 2019

Hi, I thought that training is mostly done with the console application that comes with ZSTD. Do you think this should be done from the .NET library? Regarding the Span feature: yes, this would be nice; the new memory features will show up in pretty much all .NET libraries in the future.

@EamonNerbonne
Author

EamonNerbonne commented Feb 4, 2019

Well "should" - that depends on the use case :-).

But yeah, for me it would be nice. I intend to use this to compress documents in what is essentially a document database, which means the dictionaries are dynamic. They will be based on a sample of actual data; there will likely be a bunch of them (clustered somehow, e.g. by document type and/or client); and they will likely be regenerated occasionally, to adapt to changing data distributions or simply to exploit the fact that time is a reasonable predictor for a compressor (recent documents tend to resemble each other).

But even for a fixed database, it's a little simpler if you can use the same tool to train the dictionary as to compress with it.

I mean, for some people this is purely a disadvantage, because it adds some library bloat. But if you really want to leverage the advantages dictionaries provide for small content, you need to be able to create dictionaries. The size bloat appears to be fairly small, given that the DLLs in https://github.com/skbkontur/ZstdNet/tree/master/ZstdNet are actually much smaller than the current 1.3.8 DLLs; and in any case, if you really care about size, a more significant win is to pick one bitness rather than ship both 32- and 64-bit builds. I haven't yet checked how much bloat the dictBuilder adds to the 1.3.8 codebase, though.
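One consequence of the dynamic, per-cluster dictionaries described above is worth spelling out: data compressed with a dictionary can only be decompressed with that exact dictionary. When dictionaries are regenerated, each stored document therefore has to record which dictionary version it used, and old versions must be kept (or their documents re-compressed). zstd frames can also carry a dictionary ID for this purpose. The following is a minimal sketch of that bookkeeping. It assumes zlib's preset-dictionary support (`zdict`) as a stdlib stand-in for zstd dictionaries. `DictionaryRegistry`, `naive_train`, and the cluster keys are hypothetical names, and `naive_train` is a placeholder, not an equivalent of `ZDICT_trainFromBuffer`.

```python
from __future__ import annotations

import zlib
from dataclasses import dataclass, field

# zlib only uses the last 32 KiB of a preset dictionary (default window size).
MAX_DICT_BYTES = 32 * 1024


def naive_train(samples: list[bytes]) -> bytes:
    """Hypothetical placeholder for real training (e.g. ZDICT_trainFromBuffer):
    it just concatenates the samples and keeps the tail."""
    return b"".join(samples)[-MAX_DICT_BYTES:]


@dataclass
class DictionaryRegistry:
    """Hypothetical per-cluster dictionary store, keyed e.g. by (doc type, client)."""

    # cluster key -> dictionary versions, oldest first. Old versions are
    # kept so documents compressed with them remain decompressible.
    versions: dict[tuple[str, str], list[bytes]] = field(default_factory=dict)

    def retrain(self, cluster: tuple[str, str], samples: list[bytes]) -> int:
        """Add a new dictionary version for the cluster; return its number."""
        history = self.versions.setdefault(cluster, [])
        history.append(naive_train(samples))
        return len(history) - 1

    def compress(self, cluster: tuple[str, str], doc: bytes) -> tuple[int, bytes]:
        """Compress with the newest dictionary; return (version, blob)."""
        version = len(self.versions[cluster]) - 1
        c = zlib.compressobj(level=9, zdict=self.versions[cluster][version])
        return version, c.compress(doc) + c.flush()

    def decompress(self, cluster: tuple[str, str], version: int, blob: bytes) -> bytes:
        """Decompress using the exact dictionary version the blob was made with."""
        d = zlib.decompressobj(zdict=self.versions[cluster][version])
        return d.decompress(blob) + d.flush()
```

The design choice here is that `compress` returns the version alongside the bytes, so the caller stores both. A later `retrain` for the same cluster then changes which dictionary new documents use without breaking older ones.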
