Make Ranges Chainable through Concatenation Operator ~ #3353

nordlow · 2015-06-01T12:31:51Z

This allows to write more natural and readable code like:

iota(5) ~ filter!q{a}([1,0,3])
map!abs([1,2,-5]) ~ [10, 20]

instead of more noisy:

chain(iota(5), filter!q{a}([1,0,3]))
chain(map!abs([1,2,-5]), [10, 20])

Publishing this at an early stage to get quick feedback.

For details see

http://forum.dlang.org/thread/cxgrlifaovdlwzbwsmsp@forum.dlang.org#post-bcazbwztlspwzpvlrblm:40forum.dlang.org

One question:

Should

a ~ b ~ c

evaluate to a single instance of chain as

chain(a, b, c)

or to

chain(chain(a, b), c)

?

schuetzm · 2015-06-01T14:38:14Z

std/range/package.d

@@ -1106,6 +1106,16 @@ if (Ranges.length > 0 &&
                    }
                    return result;
                }
+
+            auto opCat(Range)(Range r)


opCat() is D1, please us opBinary(string op : "~")() instead.

schveiguy · 2015-06-01T14:47:08Z

I'm against this change. I think regardless of the noise, there are some safety issues with chaining that are left unsolved.

There is also the problem of using arrays as ranges, as arr ~ arr is not going to create a chain, but rather a newly allocated array. This subtle difference is going to cause problems for generic code, and also for expectations. Consider that arr ~ arr is ALWAYS safe, because it allocates a new GC memory block, but range ~ range is going to have to consider whether the underlying referenced memory is stack allocated.

This is akin to me to having .length do what .walkLength does for the sake of reducing noise.

I realize I might be pissing in the wind here, as Walter made it a part of his presentation at dconf2015 to have this done :)

WalterBright · 2015-06-01T17:08:20Z

std/range/package.d

@@ -1106,6 +1106,16 @@ if (Ranges.length > 0 &&
                    }
                    return result;
                }
+


Why is this not doing the mixin?

Because I needed a specialized logic for flattening of the chain-tree as mentioned below.

If this flattening is not needed we can reuse Chainable here, of course.

If not, I guess it's better to move this tree-flattening logic to Chainable...aah but because Result is a Voldemort type I don't think this is possible.

My plan now instead is to do something like

auto opBinary(string op : "~", Range)(Range r) if (isInputRange!(Unqual!Range)) { import std.range : chain; static if (isInstanceOf(Range, Result)) // if rhs r is a chain { return chain(source, r.source); // flatten chain-tree } else { return chain(source, r); // flatten chain-tree } }

Is there some way to check if rhs argument r is a chain?

What do you say?

Is this in vain?

WalterBright · 2015-06-01T17:14:47Z

I think the usage of template mixins is clever.
chain(a,b,c) is not really better than chain(a,chain(b,c)) from an implementation standpoint.

There is also the problem of using arrays as ranges, as arr ~ arr is not going to create a chain, but rather a newly allocated array. This subtle difference is going to cause problems for generic code, and also for expectations. Consider that arr ~ arr is ALWAYS safe, because it allocates a new GC memory block, but range ~ range is going to have to consider whether the underlying referenced memory is stack allocated.

That's a good point I had not considered. You're right that r~r will return a value that is local on the stack.

nordlow · 2015-06-01T20:42:00Z

@WalterBright Isn't chain(a,b,c) cheaper than chain(a,chain(b,c)) to represent memory-wise in DMD and perhaps also in generated code? Or are D's optimizers really that good?

Four ranges instead of five, in this case.

Should I remove this flattening of the chain-tree?

nordlow · 2015-06-05T12:31:56Z

@schveiguy Can you give a concrete code example of the problem with the stack allocation you mentioned above?

schveiguy · 2015-06-05T13:47:24Z

Sure:

import std.range;

void main()
{
   uint[4] arr1, arr2;
   auto chain1 = arr1[] ~ take(arr2[], 2); // allocates
   auto chain2 = arr1[] ~ retro(arr2[0..2]); // doesn't allocate?
}

The fact that chain1 has infinite lifetime and chain2 has (should have) scoped lifetime is not apparent. The subtle difference makes chain2 less safe, which means either we have to disallow more things with it (inexplicably to the user), or we must put warning signs everywhere. These kinds of bugs easily creep in as code evolves, or with genericity that doesn't foresee what types of possible args will be passed in.

However, it's quite obvious that when one explicitly calls chain that one is looking for a non-allocated range, and expects to be able to do certain things with it. I'd like to keep the ~ operator to mean that one wants a concatenation on the heap.

MetaLang · 2015-06-05T13:59:21Z

What about only defining opCat for ranges that are of the same type? Then the problem you describe disappears.

schveiguy · 2015-06-05T14:05:38Z

Imagine a function that at one point chains two ranges together. Each satisfies isInputRange. Who is going to "specialize" on two homogenous range types in order to call ~, vs. just writing one function that uses chain directly?

nordlow · 2015-06-11T12:27:11Z

@schveiguy What do you think about @MetaLang 's latest comment?

schveiguy · 2015-06-11T14:36:00Z

What do you think about @MetaLang 's latest comment?

I may not have been specific, but my latest comment was in reply to that.

nordlow · 2015-06-15T07:55:49Z

@schveiguy, can you please give a code example of your comment containing:

Imagine a function that at one point chains two ranges together. Each satisfies isInputRange. Who is going to "specialize" on two homogenous range types in order to call ~, vs. just writing one function that uses chain directly?

that shows what is possible with chain that is not possible with overloading on ~ operator.

nordlow · 2015-06-15T08:00:03Z

@schveiguy, in what way is

chain2 less safe than chain1, provided that

{
   uint[4] arr1, arr2;
   auto chain1 = arr1[] ~ take(arr2[], 2); // allocates
   auto chain2 = arr1[] ~ retro(arr2[0..2]); // doesn't allocate?
}

?

When returned from a function?

Code example, please.

schveiguy · 2015-06-16T20:56:19Z

@schveiguy, in what way is chain2 less safe than chain1

chain2 has a reference to stack data. chain1 does not. But the compiler doesn't see the underlying storage given the type. So yeah, if you return it, it would be corrupt data.

The best we could do is disallow the line for creating chain2 (not at the point it's returned, because it's too late then). IMO, this leaves a huge wart in the language where ~ just doesn't work sometimes, arbitrarily to the user. I prefer the stance that ~ allocates an array (and it would be cool if this also works with ranges too), and chain(...) makes a non-allocating combination range.

I'm also finding that I look at things these days more with an eye towards "how can this code change over time", and I can easily see someone replacing an array with a non-array range (or vice versa), changing the meaning of ~ in a statement without knowing it.

can you please give a code example of your comment containing

I could, but don't really want to :) It's tedious to invent such an example. But my point simply is, nobody is going to take advantage of ~ working only between same-range-types. If they are writing generic code (and let's face it, if you are using ranges, you are using generic code), they'll just write one version that uses chain directly. With your original proposal, you'd have to at least specialize on arrays (which are valid ranges) to call chain directly instead of using ~.

nordlow · 2015-06-25T10:14:42Z

IMHO, the root of the problem here is that chain2 is allowed it to be returned from a function. AFAICT this is incorrect with or without operator overloading.

Should/could this be disallowed to compile, @WalterBright @andralex ?

If this could be made to be disallowed would operator overloading still be a good idea if we gave DMD user diagnostics some love, @schveiguy ?

Is this somehow related to DIP-25, @WalterBright ?

What about extending DMD with some static analysis to figure out if a data structure recursively contains any references to stack allocated data, @WalterBright ?

Is there perhaps already some DMD logic that detects this, @WalterBright ?

schuetzm · 2015-06-25T11:57:12Z

FWIW, under my proposal the compiler would correctly detect that it refers to stack data and would prevent returning it (or for that matter, would already disallow construction of the range).

nordlow · 2015-06-25T12:18:38Z

What do you mean with your proposal?

schveiguy · 2015-06-25T13:26:48Z

If this could be made to be disallowed would operator overloading still be a good idea if we gave DMD user diagnostics some love

I don't think so. What happens is that you have a seemingly arbitrary decision as to whether ~ allocates or not. I don't like how that would work.

schuetzm · 2015-06-25T15:07:38Z

What do you mean with your proposal?

http://wiki.dlang.org/User:Schuetzm/scope3

nordlow · 2015-06-25T18:40:40Z

@schveiguy, why is this an issue now that we have @nogc?

schveiguy · 2015-06-25T18:57:10Z

It's an issue with what ~ means. It's why we have different names, for instance, for length and walkLength. Both give the same result, but instantly I can tell one's performance is better than the other's. In this case, ~ means "allocate a GC-backed concatenation of the data", but you are wanting it to mean "string these ranges together by reference". The meanings contradict each other, so that those who are used to ~ meaning concatenation must beware of safety issues, and those (future D coders) who are used to it meaning "chain this together" must be worried about performance issues.

But really, the issue is the silent changing of behavior. When I use one range type, it does one thing, but another range type does another thing. If I change one range type (let's say I really wanted the retro of an array), then all of a sudden the behavior changes.

With ~ meaning concatenation, and chain meaning chaining, I can distinguish instantly what is going on, and I don't need to investigate what the compiler thinks about the ranges. Why do we want to eliminate that? Is chain(r1, r2) so much more horrendous than r1 ~ r2?

On top of that, it's up to the implementer of a range to continue this trend. If I write r1 ~ r2, I'm at the mercy of the definer of r1 to implement this to really mean chain(r1, r2). If I write chain(r1, r2), then r1 has no say in what that means. And given that ~ means concatenation (it was hooked via opCat originally), I think an author actually defining it to mean concatenation is not abusing that definition! Then what happens? What happens if the range author simply doesn't define it? calling chain directly is just superior in every way except some perceived syntax drawbacks.

nordlow · 2015-06-27T11:18:01Z

@schveiguy, thanks for you thorough answer. Now I understand your standpoint. I guess we'll have to see what the other people think about this.

For the record: AFAIK, chain is already safe when dealing with stack arrays because

    import std.range: chain;
    int[3] x = [1, 2, 3];
    auto y = x[0 .. 2]; // allocates copy of stack on the heap
    static assert(__traits(compiles, { chain(y, y); }));
    static assert(!__traits(compiles, { chain(x, y); }));

But, of course, it is not as efficient as it could be in the case when chain result is not leaked outside of the current scope (function).

Is there some unsafe special case I've missed? If so, example please, @schveiguy.

quickfur · 2015-06-30T16:05:31Z

I agree with @schveiguy, this is not a good change. It overloads the meaning of ~ in a way that may become ambiguous.This is an example of operator overloading taken too far. I think we should stick to chain(a,b,c).

DmitryOlshansky · 2015-06-30T19:46:05Z

Agreed with @quickfur and @schveiguy . a.chain(b).chain(c) is not really harder then a~b~c and doesn't screw up with peoples expectations. Let's shoot this down.

DmitryOlshansky · 2015-07-09T10:47:39Z

For the record: AFAIK, chain is already safe when dealing with stack arrays because

Of course, not. In the example shown you just stay within the right stack frame.

See this:

import std.algorithm, std.range, std.stdio;

void blowup(T)(T range){
    int[1024] stomp;
    stomp[] = 0xFF;
    foreach(v; range){
        writeln(v);
    }
}

auto theFuse(){
    int[3] data = [1,2,3];
    return chain(data[0..1],data[1..2],data[2..3]);
}

void main(){
    blowup(theFuse());
}

Output:

1637816
1
1637824

If we hide such horrid bugs behind built-in operator we would inflict severe damage to our reputation.

nordlow · 2015-07-09T11:38:14Z

@DmitryOlshansky You're right. My mistake.

First try

38b1d3b

nordlow force-pushed the chainable branch from 7d1f7a0 to 38b1d3b Compare June 1, 2015 12:40

nordlow added 5 commits June 1, 2015 15:35

Support consecutive chains

1dd3bfb

Support map

db59e25

Support FilterBidiResult

19892ff

Support uniq

ea15bd4

Support chain

a359666

schuetzm reviewed Jun 1, 2015
View reviewed changes

WalterBright reviewed Jun 1, 2015
View reviewed changes

nordlow added 14 commits June 1, 2015 22:52

Use opBinary instead of opCat

007ed96

Add first try at documentation

74f3649

Fix bug in cache unittest

592ffcc

Support splitter

07831aa

Use mixin joiner Result but comment out failing unittest

1e3ff0b

Support choose and chooseAmong

a8be293

Fix failing unittest for chooseAmong

895d9d5

Support roundRobin

92f0d6b

Add unittest for radial

3593280

Support take

6612f0b

Support takeExactly

110213a

Support takeOne

6205e83

Support take

756de79

Assert that infinite ranges are only placed at the end of a chain

6753e89

nordlow changed the title ~~Make Ranges Chainable through opCat~~ Make Ranges Chainable through Concatenation Operator ~ Jun 5, 2015

DmitryOlshansky closed this Jul 9, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make Ranges Chainable through Concatenation Operator ~ #3353

Make Ranges Chainable through Concatenation Operator ~ #3353

nordlow commented Jun 1, 2015

schuetzm Jun 1, 2015

nordlow Jun 1, 2015

schveiguy commented Jun 1, 2015

WalterBright Jun 1, 2015

nordlow Jun 1, 2015

WalterBright commented Jun 1, 2015

nordlow commented Jun 1, 2015

nordlow commented Jun 5, 2015

schveiguy commented Jun 5, 2015

MetaLang commented Jun 5, 2015

schveiguy commented Jun 5, 2015

nordlow commented Jun 11, 2015

schveiguy commented Jun 11, 2015

nordlow commented Jun 15, 2015

nordlow commented Jun 15, 2015

schveiguy commented Jun 16, 2015

nordlow commented Jun 25, 2015

schuetzm commented Jun 25, 2015

nordlow commented Jun 25, 2015

schveiguy commented Jun 25, 2015

schuetzm commented Jun 25, 2015

nordlow commented Jun 25, 2015

schveiguy commented Jun 25, 2015

nordlow commented Jun 27, 2015

quickfur commented Jun 30, 2015

DmitryOlshansky commented Jun 30, 2015

DmitryOlshansky commented Jul 9, 2015

nordlow commented Jul 9, 2015

Make Ranges Chainable through Concatenation Operator ~ #3353

Make Ranges Chainable through Concatenation Operator ~ #3353

Conversation

nordlow commented Jun 1, 2015

schuetzm Jun 1, 2015

Choose a reason for hiding this comment

nordlow Jun 1, 2015

Choose a reason for hiding this comment

schveiguy commented Jun 1, 2015

WalterBright Jun 1, 2015

Choose a reason for hiding this comment

nordlow Jun 1, 2015

Choose a reason for hiding this comment

WalterBright commented Jun 1, 2015

nordlow commented Jun 1, 2015

nordlow commented Jun 5, 2015

schveiguy commented Jun 5, 2015

MetaLang commented Jun 5, 2015

schveiguy commented Jun 5, 2015

nordlow commented Jun 11, 2015

schveiguy commented Jun 11, 2015

nordlow commented Jun 15, 2015

nordlow commented Jun 15, 2015

schveiguy commented Jun 16, 2015

nordlow commented Jun 25, 2015

schuetzm commented Jun 25, 2015

nordlow commented Jun 25, 2015

schveiguy commented Jun 25, 2015

schuetzm commented Jun 25, 2015

nordlow commented Jun 25, 2015

schveiguy commented Jun 25, 2015

nordlow commented Jun 27, 2015

quickfur commented Jun 30, 2015

DmitryOlshansky commented Jun 30, 2015

DmitryOlshansky commented Jul 9, 2015

nordlow commented Jul 9, 2015