add std.compression.lz77 #1335
To me, a circular buffer seems like a general data structure, not specific to lz77. So my question is: why not move it to std.array or std.container? I don't think this is the only data structure implemented as private in a Phobos module unrelated to arrays or containers.
 * InputRange
 */

auto compress(R)(R src)
... if (isInputRange!R && is(ElementType!R : ubyte))
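The constraint suggested above can be tried in isolation. The following is a minimal sketch, not the PR's code: `countBytes` is a hypothetical stand-in for `compress` that only counts the bytes it would consume, so the effect of the template constraint is visible without any compression logic.

```d
import std.range.primitives : ElementType, isInputRange;

// Hypothetical stand-in for compress(): accepts any input range whose
// elements implicitly convert to ubyte, and counts them.
size_t countBytes(R)(R src)
    if (isInputRange!R && is(ElementType!R : ubyte))
{
    size_t n;
    foreach (b; src)
        ++n;
    return n;
}

unittest
{
    ubyte[] data = [1, 2, 3];
    assert(countBytes(data) == 3);

    // A string yields dchar elements, which do not implicitly convert
    // to ubyte, so the constraint rejects the call at compile time.
    static assert(!__traits(compiles, countBytes("abc")));
}
```

With the constraint in place, a mismatched argument fails at the call site instead of somewhere inside the function body.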
It's much easier to get the documentation right if docs are generated and linked. Pull requests for new modules generally do this.
I think this is good work but some additional design is required. The entire signature/factory aspect must be handled upfront. Fundamentally, we should have a function that, given a range compressed with any supported algorithm, is able to instantiate the proper decompressor and use it.
printf("Decompression done, dilen = %d decompressed = %d\n",
    di.length, si2.length);

if (si != si2)
assert(si == si2, "Buffers don't match");
asserts go away with -release; I wanted this one to stay in.
If you expect this to be executed, then arguably assert(0) shouldn't be in there.
assert(0) is replaced with a HLT instruction for -release.
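The difference between the two forms discussed above can be sketched as follows. `checkRoundTrip` is a hypothetical helper, not a function from the PR; it only mirrors the `if (si != si2)` check in the test code under review.

```d
// Hypothetical helper mirroring the round-trip check under review.
void checkRoundTrip(const(ubyte)[] si, const(ubyte)[] si2)
{
    // A plain assert(si == si2, "...") is compiled out with -release.
    // assert(0) is never removed: with -release it becomes a halt
    // instruction, so the check below stays in release builds.
    if (si != si2)
        assert(0, "Buffers don't match");
}

unittest
{
    ubyte[] a = [1, 2, 3];
    checkRoundTrip(a, a.dup); // equal contents: returns normally
}
```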
Hmm, I wrote a note on why there isn't a signature, but it seems to have vanished.
@WalterBright did we agree that it's worth looking on compress/expand algos that traffic one |
I considered and rejected having a factory & signature for the following reasons:
Hence I believe the factory/signature belongs in an outer layer, not in this component.
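One way to picture such an outer layer is sketched below. Everything in it is an assumption made for illustration: the `Algo` tags, the one-byte signature, and the two stand-in decoders (`stored` and a toy run-length format) are invented here and are not part of the PR or of Phobos. The point is only that the dispatch on a signature can live outside the individual algorithms.

```d
import std.exception : enforce;

// Hypothetical algorithm tags stored in the first byte of a wrapped stream.
enum Algo : ubyte { stored = 0, rle = 1 }

// Toy decoder: input is a sequence of (count, value) pairs.
ubyte[] expandRle(const(ubyte)[] src)
{
    ubyte[] dst;
    for (size_t i = 0; i + 1 < src.length; i += 2)
        foreach (_; 0 .. src[i])
            dst ~= src[i + 1];
    return dst;
}

// Outer layer: prepend the signature byte to an already compressed payload.
ubyte[] wrap(Algo algo, const(ubyte)[] payload)
{
    auto result = new ubyte[payload.length + 1];
    result[0] = algo;
    result[1 .. $] = payload[];
    return result;
}

// Outer layer: read the signature and pick the matching decoder.
ubyte[] expandAny(const(ubyte)[] data)
{
    enforce(data.length > 0, "missing signature byte");
    switch (data[0])
    {
        case Algo.stored: return data[1 .. $].dup;
        case Algo.rle:    return expandRle(data[1 .. $]);
        default:          throw new Exception("unknown signature");
    }
}

unittest
{
    ubyte[] pairs = [3, 65];            // three copies of 65
    ubyte[] expected = [65, 65, 65];
    assert(expandAny(wrap(Algo.rle, pairs)) == expected);
    assert(expandAny(wrap(Algo.stored, expected)) == expected);
}
```

The decoders themselves never see the signature byte, which matches the position that a component should work independently of whatever framing is put around it.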
@andralex yes, we did agree on ubyte[], but it's non-trivial to do, I don't think it's at the top of my endless stack, and this module does work without it.
}
else if (offset < 0x8000) // 11111XXXXXX
{
    putBit(1);
You might as well implement putBits and call it as putBits(1, 1, 1, 1), because there are so many wasteful one-liners here.
I'm not going to do it that way because the putBit()'s are all inlined and then optimized. Doing a loop would be slower.
How can the compiler not optimize out the call putBits(1, 1, 1, 1, 0)? The arguments are known at compile-time.
The compiler currently does not unroll loops.
So much for the talk about having discovered all major optimizations back in the 80s. How is this optimization not implemented? It looks trivial.
(Edit: just got private message from Walter he was about to submit changes to most points. Apologies for being unfairly general in my criticisms.)
We can't add a new module just because whoever wrote it just didn't have the time to get it right. We did that with Before this gets in Also this is great as an experiment in minimalism but I can't accept in
New version posted incorporating most of your suggestions, thanks!
I completely agree with this. I may want to use my own signature for my own compressed data; this is not the compression algorithm's business.
Also check out line 464 of https://github.com/D-Programming-Language/phobos/blob/master/std/zip.d. If we always put out a signature, doing this code would get pretty hackish with ranges.
I completely agree too. All I'm saying is this algorithm must be surrounded by the appropriate design, in addition to being able to work independently. If we make a mistake here (and just how often do primitives just work without a design?), future implementers of compression algorithms will forever curse us for making it hard to get things done.
I don't think we should force inserting a signature. Our design must, however, facilitate it. I'm saying,
Before I forget, this should inline the loop:
void putBits(T)(T bits...)
{
    foreach (b; bits) putBit(b);
}
That would shorten the code considerably and work as well as writing the calls by hand.

//import core.stdc.stdio : printf;

private struct CircularBuffer(U: T[dim], T, size_t dim)
this should migrate to std.range
Please do not put this in std.range or even std.container. Very rarely does a module need a circular buffer. The module situation is bad enough as it is, let's not make it worse. Put it in std.circularbuffer. That way, this module can import just CircularBuffer and not the whole of Phobos. Hopefully at some point we can revert the mistake of having a std.container module, which was destined from the get-go to be a monster containing every conceivable data structure. Phobos really needs to be more organised and we might as well start now.
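For readers unfamiliar with the structure being argued over, here is a minimal sketch of a fixed-size circular buffer. Only the template signature is taken from the diff; the members `put`, `opIndex`, and `length` are assumptions made for this illustration and are not the PR's implementation.

```d
// Sketch only: the signature matches the diff, the body is hypothetical.
struct CircularBuffer(U : T[dim], T, size_t dim)
{
    private U buf;          // fixed-size backing storage
    private size_t head;    // index of the next slot to write
    private size_t count;   // number of valid elements, at most dim

    // Store a value, overwriting the oldest one once the buffer is full.
    void put(T value)
    {
        buf[head] = value;
        head = (head + 1) % dim;
        if (count < dim)
            ++count;
    }

    // Element written `back` steps ago; 0 is the most recent.
    T opIndex(size_t back) const
    {
        assert(back < count);
        return buf[(head + dim - 1 - back) % dim];
    }

    size_t length() const { return count; }
}

unittest
{
    CircularBuffer!(ubyte[4]) cb;
    foreach (v; 1 .. 6)
        cb.put(cast(ubyte) v);   // writes 1, 2, 3, 4, 5; the 1 is overwritten

    assert(cb.length == 4);
    assert(cb[0] == 5);          // most recent
    assert(cb[3] == 2);          // oldest surviving element
}
```

Indexing backwards from the most recent element is the access pattern an LZ77 window needs, which is one reason such a type might stay private to the module, as debated above.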
The putBits(T)(T bits...) does not compile. The only thing that does work is:
void putBits(T...)(T bits) {
    foreach(b; bits) putBit(b);
}
putBits(1,0,1);
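The variadic-template form above can be exercised with a small bit sink. This is a sketch under assumptions: `BitWriter`, its fields, and the most-significant-bit-first packing order are invented for the example and are not taken from the PR. Because `foreach` over a template parameter sequence is expanded at compile time, `putBits(1, 0, 1)` turns into three consecutive `putBit` calls with no runtime loop.

```d
// Hypothetical bit sink used only to demonstrate putBits.
struct BitWriter
{
    ubyte[] bytes;   // completed output bytes
    ubyte current;   // bits collected so far, most significant first
    uint used;       // number of bits held in `current`

    void putBit(int bit)
    {
        current = cast(ubyte)((current << 1) | (bit & 1));
        if (++used == 8)
        {
            bytes ~= current;
            current = 0;
            used = 0;
        }
    }

    // foreach over the parameter sequence is unrolled at compile time,
    // so this is equivalent to writing the putBit calls by hand.
    void putBits(T...)(T bits)
    {
        foreach (b; bits)
            putBit(b);
    }
}

unittest
{
    BitWriter w;
    w.putBits(1, 0, 1, 0, 1, 0, 1, 0);   // one full byte: 0b10101010
    assert(w.bytes.length == 1);
    assert(w.bytes[0] == 0xAA);

    w.putBits(1, 0, 1);                  // three pending bits: 0b101
    assert(w.used == 3);
    assert(w.current == 5);
}
```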
{
    // Reached the end of the source
    static if (arrayLike)
    {   if (si != src.length)
enforceEx!CompressException(…)? ;)
Thanks, yebblies! That's the right form, but the code generated is worse (compiler doesn't do a good job inlining it).
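The suggested form can be sketched as follows, assuming a recent Phobos where the templated `enforce!E` spelling is available in place of the `enforceEx` name used in the comment. `CompressException` is named in the thread, but its definition here and the helper `checkConsumed` are assumptions made for this example.

```d
import std.exception : assertThrown, enforce;

// Assumed definition; the thread names the class but does not show it.
class CompressException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
        pure nothrow @safe
    {
        super(msg, file, line);
    }
}

// Hypothetical end-of-source check, replacing an explicit
// `if (si != src.length) throw ...` with a single enforce call.
void checkConsumed(size_t si, size_t srcLength)
{
    enforce!CompressException(si == srcLength, "unexpected end of source");
}

unittest
{
    checkConsumed(4, 4);                                   // passes silently
    assertThrown!CompressException(checkConsumed(3, 4));   // mismatch throws
}
```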
Ah, I'll bet this is a dmd-only problem. Please, PLEASE reconsider naming the functions something other than 'compress' and 'expand'. We have great tools to resolve conflicting functions, but generic names increase the effort required to understand code, and conflicts ruin UFCS.
@yebblies: Seems like you are rather alone with that opinion.
@klickverbot Only until I convince somebody else!
He's not alone in this; this should be inside some struct. But more importantly, if someone adds another compression engine (say lz78), how are you going to use multiple compression engines at once? You'd have to fully qualify everything. It's better to be type-safe than sorry (you could use
@AndrejMitrovic: Type safety has nothing to do with this, and it's as easy to typo an import statement as it is to misspell a function call.
@klickverbot: It's not about typing code, it's about reading code. With
It's equally easy to spot a module importing lz78 when it should import lz77.
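The "tools to resolve conflicting functions" mentioned in this exchange include renamed imports. The sketch below uses `std.zlib` purely as a stand-in, because it already ships with Phobos and also exports a function named `compress`; the proposed lz77 module is not used here, and `roundTrip` is a hypothetical helper.

```d
// Renamed import: every symbol must be reached through the `zlib` name,
// so the call site shows which engine is in use.
import zlib = std.zlib;

// Hypothetical helper: compress and immediately decompress a buffer.
ubyte[] roundTrip(const(ubyte)[] data)
{
    auto packed = zlib.compress(data);
    return cast(ubyte[]) zlib.uncompress(packed);
}

unittest
{
    ubyte[] data = [1, 2, 2, 2, 2, 2, 3];
    assert(roundTrip(data) == data);
}
```

A second engine could be imported the same way under its own name, which addresses the reading-code concern without wrapping the functions in a struct.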
Okie.
@WalterBright are you interested in proceeding with this proposal in Phobos queue after this release is out? I will mark it as "On hold" otherwise until you can spend time on it or find another champion.
If he's gonna work on it he can reopen it, otherwise this is just adding to the black statistic of open pulls.
How about putting this in std.experimental?
How about putting this on code.dlang.org? (instead or as well)
There is no point in adding to
I'd personally like to see std.experimental replaced by distribution via Dub. Maintaining experimental code in the main repo doesn't seem terribly appealing.
@WalterBright please put it on dub, so your work doesn't get lost!
No description provided.