add std.compression.lz77 #1335

WalterBright · 2013-06-06T20:33:53Z

No description provided.

korobochka · 2013-06-07T10:55:13Z

For me circular buffer seems like a general data structure, not specific to lz77. So my question is -- why not move it to std.array or std.container? I think this is not the only data structure that is implemented as private in non array/container related Phobos module.
Sorry if I am missing something.

kyllingstad · 2013-06-07T12:53:50Z

std/compression/lz77.d

+ *      InputRange
+ */
+
+auto compress(R)(R src)


... if (isInputRange!R && is(ElementType!R : ubyte))

JakobOvrum · 2013-06-07T14:34:48Z

It's much easier to get the documentation right if docs are generated and linked. Pull requests for new modules generally do this.

andralex · 2013-06-07T18:51:09Z

I think this is good work but some additional design is required. The entire signature/factory aspect must be handled upfront. Fundamentally we should have a function that given a range compressed with any supported algorithm is able to instantiate the proper decompressor and use it.

dnadlinger · 2013-06-08T22:35:37Z

std/compression/lz77.d

+        printf("Decompression done, dilen = %d decompressed = %d\n",
+            di.length, si2.length);
+
+        if (si != si2)


assert(si == si2, "Buffers don't match");

assert's go away with -release, I wanted this one to stay in.

If you expect this to be executed, then arguably assert(0) shouldn't be in there.

assert(0) is replaced with a HLT instruction for -release.

WalterBright · 2013-06-08T23:24:29Z

Hmm, I wrote a note on why there isn't a signature, it seems to have vanished.

andralex · 2013-06-08T23:33:47Z

@WalterBright did we agree that it's worth looking on compress/expand algos that traffic one ubyte[] at a time? Operating at ubyte level is inefficient for this algorithm and it will be inefficient for virtually all others.

WalterBright · 2013-06-08T23:34:01Z

I considered and rejected having a factory & signature for the following reasons:

the component is supposed to be independent. Having a factory to detect a signature and select the corresponding component requires that it knows about all the others. This is the job of an outer layer, not this.
there are endless popular compression algorithms. Some have signatures, some don't. There is no central repository of signatures - so having one is unreliable at being unique
having a signature will make front() slightly slower, as it'll have to check every time if it's seen the signature or not
some archive formats, like zip, do not store the signature with the compressed data. It's done as a flag elsewhere.
self-extracting executables, installers, etc., have no need of a factory/signature
I have used compressors to compress short strings - they'd be rendered much less useful if a signature was added

Hence I believe the factory/signature belongs in an outer layer, not in this component.

WalterBright · 2013-06-09T00:18:39Z

@andralex yes we did agree on ubyte[], but it's non-trivial to do it, and I don't think it's at the top of my endless stack, and this module does work without it.

ghost · 2013-06-09T00:27:56Z

std/compression/lz77.d

+                }
+                else if (offset < 0x8000)       // 11111XXXXXX
+                {
+                    putBit(1);


You might as well implement putBits to use via putBits(1, 1, 1, 1) because there's so many wasteful one-liners here.

I'm not going to do it that way because the putBit()'s are all inlined and then optimized. Doing a loop would be slower.

How can the compiler not optimize out the call putBits(1, 1, 1, 1, 0)? The arguments are known at compile-time.

The compiler currently does not unroll loops.

So much for the talk about having discovered all major optimizations back in the 80s.. How is this optimization not implemented? It looks trivial.

andralex · 2013-06-09T02:04:10Z

(Edit: just got private message from Walter he was about to submit changes to most points. Apologies for being unfairly general in my criticisms.)

@andralex yes we did agree on ubyte[], but it's non-trivial to do it, and I don't think it's at the top of my endless stack, and this module does work without it.

We can't add a new module just because whoever wrote it just didn't have the time to get it right. We did that with std.json and std.xml and that didn't turn well. We could, however, add it to etc so people can experiment with it.

Before this gets in std, we must make sure whether a range of ubyte[] is more appropriate than a range of ubyte and how to use multiple compression algorithms in the same application without errors.

Also this is great as an experiment in minimalism but I can't accept in std a module that reinvents parts of std. It must not define its own notion of range, copy, circular buffer, and whatnot. Whatever issues are there with std's modularization, going lone cowboy on this is emphatically not the answer.

WalterBright · 2013-06-09T02:49:44Z

New version posted incorporating most of your suggestions, thanks!

kyllingstad · 2013-06-09T05:35:41Z

@WalterBright:

I considered and rejected having a factory & signature for the following reasons: [...]

I completely agree with this. I may want to use my own signature for my own compressed data; this is not the compression algorithm's business.

WalterBright · 2013-06-09T06:06:13Z

Also check out line 464 of https://github.com/D-Programming-Language/phobos/blob/master/std/zip.d. If we always put out a signature, doing this code would get pretty hackish with ranges.

andralex · 2013-06-09T06:44:13Z

I considered and rejected having a factory & signature for the following reasons: [...]

I completely agree with this. I may want to use my own signature for my own compressed data; this is not the compression algorithm's business.

I completely agree too. All I'm saying is this algorithm must be surrounded by the appropriate design, in addition to being able to work independently.

If we make a mistake here (and just how often primitives just work without a design?), future implementers of compression algorithms will forever curse us for making it hard to get things done.

The most obvious aspect is handling buffers vs. single bytes. If we do byte-level, /everyone/ will have to.
If we make it unduly difficult to integrate with factories and signatures, again we'll only have trouble down the road trying to decompress data compressed with a different algorithm. As the simplest design, every algorithm must have a unique signature (e.g. 2-4 bytes) that its user may or may not choose to insert at the beginning of the stream. This algorithm would prolly have a signature like 1 (whereas plain copy may be 0). Then future algorithms will each reclaim new signatures.

andralex · 2013-06-09T06:50:59Z

Also check out line 464 of https://github.com/D-Programming-Language/phobos/blob/master/std/zip.d. If we always put out a signature, doing this code would get pretty hackish with ranges.

I don't think we should force inserting a signature. Our design must, however, facilitate it.

I'm saying, compress() and expand() are a great start but we need a little framework to encompass them and other algorithms.

andralex · 2013-06-09T06:53:26Z

Before I forget - this should inline the loop:

void putBits(T)(T bits...)
{
    foreach (b; bits) putBit(bit);
}

That would shorten code considerably and work as well as writing the calls by hand.

andralex · 2013-06-09T06:53:51Z

std/compression/lz77.d

+
+//import core.stdc.stdio : printf;
+
+private struct CircularBuffer(U: T[dim], T, size_t dim)


this should migrate to std.range

Please do not put this in std.range or even std.container. Very rarely does a module need a circular buffer. The module situation is bad enough as it is, let's not make it worse.

Put it in std.circularbuffer. That way, this module can import just CircularBuffer and not the whole of Phobos. Hopefully at some point we can revert the mistake of having a std.container module, which was destined from the get-go to be a monster containing every conceivable data structure. Phobos really needs to be more organised and we might as well start now.

WalterBright · 2013-06-10T05:19:39Z

The putBits(T)(T bits...) does not compile. The only thing that does work is:

void putBits(bits...)() {
    foreach(b; bits) putBit(b);
}

putBits!(1,0,1)();

yebblies · 2013-06-10T05:25:40Z

void putBits(T...)(T bits) {
    foreach(b; bits) putBit(b);
}

putBits(1,0,1);

dnadlinger · 2013-06-10T14:15:30Z

std/compression/lz77.d

+                            {
+                                // Reached the end of the source
+                                static if (arrayLike)
+                                {   if (si != src.length)


enforceEx!CompressException(…)? ;)

WalterBright · 2013-06-10T16:26:54Z

Thanks, yebblies! That's the right form, but the code generated is worse (compiler doesn't do a good job inlining it).

yebblies · 2013-06-13T00:24:18Z

Thanks, yebblies! That's the right form, but the code generated is worse (compiler doesn't do a good job inlining it).

Ah, I'll bet this is a dmd-only problem.

Please, PLEASE reconsider naming the functions something other than 'compress' and 'expand'. We have great tools to resolve conflicting functions, but generic names increase the effort required to understand code, and conflicts ruin ufcs.

dnadlinger · 2013-06-13T05:58:01Z

@yebblies: Seems like you are rather alone with that opinion.

yebblies · 2013-06-13T06:04:52Z

@klickverbot Only until I convince somebody else!

ghost · 2013-06-13T12:40:34Z

He's not alone in this, this should be inside of some struct. expand is common, for example it's used in std.typetuple, and in the DRanges library.

But more importantly, if someone adds another compression engine (say lz78), how are you going to use multiple compression engines at once? You'd have to fully qualify everything. expand for example doesn't even verify at compile-time that the input range is actually LZ77 compressed to begin with.

It's better to be type-safe than sorry (you could use std.compression.lz77.compress and then accidentally use std.compression.lz78.expand).

dnadlinger · 2013-06-13T16:12:19Z

@AndrejMitrovic: Type safety has nothing to do with this, and it's as easy to typo an import statement as it is to misspell a function call.

ghost · 2013-06-13T22:58:39Z

@klickverbot: It's not about typing code, it's about reading code.

With array.lz77.compress it's easier to spot the bug.

dnadlinger · 2013-06-13T22:59:35Z

It's equally easy to spot a module importing lz78 when it should import lz77.

ghost · 2013-06-13T23:58:42Z

Okie.

mihails-strasuns · 2014-02-03T19:50:47Z

@WalterBright are you interested in proceeding with this proposal in Phobos queue after this release is out? I will mark it as "On hold" otherwise until you can spend time on it or find another champion.

ghost · 2014-02-15T17:02:20Z

If he's gonna work on it he can reopen it, otherwise this is just adding to the black statistic of open pulls.

WalterBright · 2014-05-27T19:30:39Z

How about putting this in std.experimental?

yebblies · 2014-05-27T19:41:45Z

How about putting this in std.experimental?

How about putting this on code.dlang.org? (instead or as well)

mihails-strasuns · 2014-05-28T14:21:54Z

There is no point in adding to std.experimental module you are not ready to actively support. And in that case there is no reason it can't go through review process either.

complexmath · 2014-05-28T16:10:41Z

I'd personally like to see std.experimental replaced by distribution via Dub. Maintaining experimental code in the main repo doesn't seem terribly appealing.

wilzbach · 2016-03-30T07:36:37Z

@WalterBright please put it on dub, so your work doesn't get lost!

kyllingstad reviewed Jun 7, 2013
View reviewed changes

std/compression/lz77.d

* InputRange

*/

auto compress(R)(R src)

Copy link

Contributor

kyllingstad Jun 7, 2013

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... if (isInputRange!R && is(ElementType!R : ubyte))

dnadlinger reviewed Jun 8, 2013
View reviewed changes

ghost reviewed Jun 9, 2013
View reviewed changes

andralex reviewed Jun 9, 2013
View reviewed changes

add std.compression.lz77

0b090b2

dnadlinger reviewed Jun 10, 2013
View reviewed changes

ghost closed this Feb 15, 2014

This pull request was closed.


		//import core.stdc.stdio : printf;

		private struct CircularBuffer(U: T[dim], T, size_t dim)

Uh oh!

add std.compression.lz77 #1335

add std.compression.lz77 #1335

Uh oh!

Conversation

WalterBright commented Jun 6, 2013

Uh oh!

korobochka commented Jun 7, 2013

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JakobOvrum commented Jun 7, 2013

Uh oh!

andralex commented Jun 7, 2013

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

WalterBright commented Jun 8, 2013

Uh oh!

andralex commented Jun 8, 2013

Uh oh!

WalterBright commented Jun 8, 2013

Uh oh!

WalterBright commented Jun 9, 2013

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

andralex commented Jun 9, 2013

Uh oh!

WalterBright commented Jun 9, 2013

Uh oh!

kyllingstad commented Jun 9, 2013

Uh oh!

WalterBright commented Jun 9, 2013

Uh oh!

andralex commented Jun 9, 2013

Uh oh!

andralex commented Jun 9, 2013

Uh oh!

andralex commented Jun 9, 2013

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

WalterBright commented Jun 10, 2013

Uh oh!

yebblies commented Jun 10, 2013

Uh oh!

Choose a reason for hiding this comment

Uh oh!

WalterBright commented Jun 10, 2013

Uh oh!

yebblies commented Jun 13, 2013

Uh oh!

dnadlinger commented Jun 13, 2013

Uh oh!

yebblies commented Jun 13, 2013

Uh oh!

ghost commented Jun 13, 2013

Uh oh!

dnadlinger commented Jun 13, 2013

Uh oh!

ghost commented Jun 13, 2013

Uh oh!

dnadlinger commented Jun 13, 2013