Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Ranges Chainable through Concatenation Operator ~ #3353

Closed
wants to merge 20 commits into from

Conversation

nordlow
Copy link
Contributor

@nordlow nordlow commented Jun 1, 2015

This allows to write more natural and readable code like:

iota(5) ~ filter!q{a}([1,0,3])
map!abs([1,2,-5]) ~ [10, 20]

instead of more noisy:

chain(iota(5), filter!q{a}([1,0,3]))
chain(map!abs([1,2,-5]), [10, 20])

Publishing this at an early stage to get quick feedback.

For details see

http://forum.dlang.org/thread/cxgrlifaovdlwzbwsmsp@forum.dlang.org#post-bcazbwztlspwzpvlrblm:40forum.dlang.org

One question:

Should

a ~ b ~ c

evaluate to a single instance of chain as

chain(a, b, c)

or to

chain(chain(a, b), c)

?

@@ -1106,6 +1106,16 @@ if (Ranges.length > 0 &&
}
return result;
}

auto opCat(Range)(Range r)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opCat() is D1, please us opBinary(string op : "~")() instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@schveiguy
Copy link
Member

I'm against this change. I think regardless of the noise, there are some safety issues with chaining that are left unsolved.

There is also the problem of using arrays as ranges, as arr ~ arr is not going to create a chain, but rather a newly allocated array. This subtle difference is going to cause problems for generic code, and also for expectations. Consider that arr ~ arr is ALWAYS safe, because it allocates a new GC memory block, but range ~ range is going to have to consider whether the underlying referenced memory is stack allocated.

This is akin to me to having .length do what .walkLength does for the sake of reducing noise.

I realize I might be pissing in the wind here, as Walter made it a part of his presentation at dconf2015 to have this done :)

@@ -1106,6 +1106,16 @@ if (Ranges.length > 0 &&
}
return result;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not doing the mixin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I needed a specialized logic for flattening of the chain-tree as mentioned below.

If this flattening is not needed we can reuse Chainable here, of course.

If not, I guess it's better to move this tree-flattening logic to Chainable...aah but because Result is a Voldemort type I don't think this is possible.

My plan now instead is to do something like

            auto opBinary(string op : "~", Range)(Range r)
            if (isInputRange!(Unqual!Range))
            {
                import std.range : chain;
                static if (isInstanceOf(Range, Result)) // if rhs r is a chain
                {
                    return chain(source, r.source); // flatten chain-tree
                }
                else
                {
                    return chain(source, r); // flatten chain-tree
                }
            }

Is there some way to check if rhs argument r is a chain?

What do you say?

Is this in vain?

@WalterBright
Copy link
Member

I think the usage of template mixins is clever.
chain(a,b,c) is not really better than chain(a,chain(b,c)) from an implementation standpoint.

There is also the problem of using arrays as ranges, as arr ~ arr is not going to create a chain, but rather a newly allocated array. This subtle difference is going to cause problems for generic code, and also for expectations. Consider that arr ~ arr is ALWAYS safe, because it allocates a new GC memory block, but range ~ range is going to have to consider whether the underlying referenced memory is stack allocated.

That's a good point I had not considered. You're right that r~r will return a value that is local on the stack.

@nordlow
Copy link
Contributor Author

nordlow commented Jun 1, 2015

@WalterBright Isn't chain(a,b,c) cheaper than chain(a,chain(b,c)) to represent memory-wise in DMD and perhaps also in generated code? Or are D's optimizers really that good?

Four ranges instead of five, in this case.

Should I remove this flattening of the chain-tree?

@nordlow
Copy link
Contributor Author

nordlow commented Jun 5, 2015

@schveiguy Can you give a concrete code example of the problem with the stack allocation you mentioned above?

@schveiguy
Copy link
Member

Sure:

import std.range;

void main()
{
   uint[4] arr1, arr2;
   auto chain1 = arr1[] ~ take(arr2[], 2); // allocates
   auto chain2 = arr1[] ~ retro(arr2[0..2]); // doesn't allocate?
}

The fact that chain1 has infinite lifetime and chain2 has (should have) scoped lifetime is not apparent. The subtle difference makes chain2 less safe, which means either we have to disallow more things with it (inexplicably to the user), or we must put warning signs everywhere. These kinds of bugs easily creep in as code evolves, or with genericity that doesn't foresee what types of possible args will be passed in.

However, it's quite obvious that when one explicitly calls chain that one is looking for a non-allocated range, and expects to be able to do certain things with it. I'd like to keep the ~ operator to mean that one wants a concatenation on the heap.

@MetaLang
Copy link
Member

MetaLang commented Jun 5, 2015

What about only defining opCat for ranges that are of the same type? Then the problem you describe disappears.

@schveiguy
Copy link
Member

Imagine a function that at one point chains two ranges together. Each satisfies isInputRange. Who is going to "specialize" on two homogenous range types in order to call ~, vs. just writing one function that uses chain directly?

@nordlow nordlow changed the title Make Ranges Chainable through opCat Make Ranges Chainable through Concatenation Operator ~ Jun 5, 2015
@nordlow
Copy link
Contributor Author

nordlow commented Jun 11, 2015

@schveiguy What do you think about @MetaLang 's latest comment?

@schveiguy
Copy link
Member

What do you think about @MetaLang 's latest comment?

I may not have been specific, but my latest comment was in reply to that.

@nordlow
Copy link
Contributor Author

nordlow commented Jun 15, 2015

@schveiguy, can you please give a code example of your comment containing:

Imagine a function that at one point chains two ranges together. Each satisfies isInputRange. Who is going to "specialize" on two homogenous range types in order to call ~, vs. just writing one function that uses chain directly?

that shows what is possible with chain that is not possible with overloading on ~ operator.

@nordlow
Copy link
Contributor Author

nordlow commented Jun 15, 2015

@schveiguy, in what way is

chain2 less safe than chain1, provided that

{
   uint[4] arr1, arr2;
   auto chain1 = arr1[] ~ take(arr2[], 2); // allocates
   auto chain2 = arr1[] ~ retro(arr2[0..2]); // doesn't allocate?
}

?

When returned from a function?

Code example, please.

@schveiguy
Copy link
Member

@schveiguy, in what way is chain2 less safe than chain1

chain2 has a reference to stack data. chain1 does not. But the compiler doesn't see the underlying storage given the type. So yeah, if you return it, it would be corrupt data.

The best we could do is disallow the line for creating chain2 (not at the point it's returned, because it's too late then). IMO, this leaves a huge wart in the language where ~ just doesn't work sometimes, arbitrarily to the user. I prefer the stance that ~ allocates an array (and it would be cool if this also works with ranges too), and chain(...) makes a non-allocating combination range.

I'm also finding that I look at things these days more with an eye towards "how can this code change over time", and I can easily see someone replacing an array with a non-array range (or vice versa), changing the meaning of ~ in a statement without knowing it.

can you please give a code example of your comment containing

I could, but don't really want to :) It's tedious to invent such an example. But my point simply is, nobody is going to take advantage of ~ working only between same-range-types. If they are writing generic code (and let's face it, if you are using ranges, you are using generic code), they'll just write one version that uses chain directly. With your original proposal, you'd have to at least specialize on arrays (which are valid ranges) to call chain directly instead of using ~.

@nordlow
Copy link
Contributor Author

nordlow commented Jun 25, 2015

IMHO, the root of the problem here is that chain2 is allowed it to be returned from a function. AFAICT this is incorrect with or without operator overloading.

Should/could this be disallowed to compile, @WalterBright @andralex ?

If this could be made to be disallowed would operator overloading still be a good idea if we gave DMD user diagnostics some love, @schveiguy ?

Is this somehow related to DIP-25, @WalterBright ?

What about extending DMD with some static analysis to figure out if a data structure recursively contains any references to stack allocated data, @WalterBright ?

Is there perhaps already some DMD logic that detects this, @WalterBright ?

@schuetzm
Copy link
Contributor

FWIW, under my proposal the compiler would correctly detect that it refers to stack data and would prevent returning it (or for that matter, would already disallow construction of the range).

@nordlow
Copy link
Contributor Author

nordlow commented Jun 25, 2015

What do you mean with your proposal?

@schveiguy
Copy link
Member

If this could be made to be disallowed would operator overloading still be a good idea if we gave DMD user diagnostics some love

I don't think so. What happens is that you have a seemingly arbitrary decision as to whether ~ allocates or not. I don't like how that would work.

@schuetzm
Copy link
Contributor

What do you mean with your proposal?

http://wiki.dlang.org/User:Schuetzm/scope3

@nordlow
Copy link
Contributor Author

nordlow commented Jun 25, 2015

@schveiguy, why is this an issue now that we have @nogc?

@schveiguy
Copy link
Member

It's an issue with what ~ means. It's why we have different names, for instance, for length and walkLength. Both give the same result, but instantly I can tell one's performance is better than the other's. In this case, ~ means "allocate a GC-backed concatenation of the data", but you are wanting it to mean "string these ranges together by reference". The meanings contradict each other, so that those who are used to ~ meaning concatenation must beware of safety issues, and those (future D coders) who are used to it meaning "chain this together" must be worried about performance issues.

But really, the issue is the silent changing of behavior. When I use one range type, it does one thing, but another range type does another thing. If I change one range type (let's say I really wanted the retro of an array), then all of a sudden the behavior changes.

With ~ meaning concatenation, and chain meaning chaining, I can distinguish instantly what is going on, and I don't need to investigate what the compiler thinks about the ranges. Why do we want to eliminate that? Is chain(r1, r2) so much more horrendous than r1 ~ r2?

On top of that, it's up to the implementer of a range to continue this trend. If I write r1 ~ r2, I'm at the mercy of the definer of r1 to implement this to really mean chain(r1, r2). If I write chain(r1, r2), then r1 has no say in what that means. And given that ~ means concatenation (it was hooked via opCat originally), I think an author actually defining it to mean concatenation is not abusing that definition! Then what happens? What happens if the range author simply doesn't define it? calling chain directly is just superior in every way except some perceived syntax drawbacks.

@nordlow
Copy link
Contributor Author

nordlow commented Jun 27, 2015

@schveiguy, thanks for you thorough answer. Now I understand your standpoint. I guess we'll have to see what the other people think about this.

For the record: AFAIK, chain is already safe when dealing with stack arrays because

    import std.range: chain;
    int[3] x = [1, 2, 3];
    auto y = x[0 .. 2]; // allocates copy of stack on the heap
    static assert(__traits(compiles, { chain(y, y); }));
    static assert(!__traits(compiles, { chain(x, y); }));

But, of course, it is not as efficient as it could be in the case when chain result is not leaked outside of the current scope (function).

Is there some unsafe special case I've missed? If so, example please, @schveiguy.

@quickfur
Copy link
Member

I agree with @schveiguy, this is not a good change. It overloads the meaning of ~ in a way that may become ambiguous.This is an example of operator overloading taken too far. I think we should stick to chain(a,b,c).

@DmitryOlshansky
Copy link
Member

Agreed with @quickfur and @schveiguy . a.chain(b).chain(c) is not really harder then a~b~c and doesn't screw up with peoples expectations. Let's shoot this down.

@DmitryOlshansky
Copy link
Member

For the record: AFAIK, chain is already safe when dealing with stack arrays because

Of course, not. In the example shown you just stay within the right stack frame.

See this:

import std.algorithm, std.range, std.stdio;

void blowup(T)(T range){
    int[1024] stomp;
    stomp[] = 0xFF;
    foreach(v; range){
        writeln(v);
    }
}

auto theFuse(){
    int[3] data = [1,2,3];
    return chain(data[0..1],data[1..2],data[2..3]);
}

void main(){
    blowup(theFuse());
}

Output:

1637816
1
1637824

If we hide such horrid bugs behind built-in operator we would inflict severe damage to our reputation.

@nordlow
Copy link
Contributor Author

nordlow commented Jul 9, 2015

@DmitryOlshansky You're right. My mistake.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants