Issue 6798 - Integrate overloadings for multidimensional indexing and slicing #443

Merged
merged 10 commits into from Mar 16, 2014
@9rnsr
D Programming Language member

http://d.puremagic.com/issues/show_bug.cgi?id=6798

This patch is additional enhancement of opDollar (issue 3474 and #442).
Enable the mixing operator overloadings of indexing and slicing

a[$-1, 2..$] is translated to a.opIndex(a.opDollar!0 - 1, a.opSlice!1(2, a.opDollar!1))

The slicing lwr..upr inside bracket is converted to a.opSlice!(dimension)(lwr, upr).
This enhancement never break existing codes.

expression newly added overloading exists overloading
a[i0, ...] a.opIndex(i0, ...)
a[] a.opIndex() a.opSlice()
a[l..u] a.opIndex(a.opSlice!0(l, u)) a.opSlice(l, u)
a[l..u, ...] a.opIndex(a.opSlice!0(l, u), ...)
op a[i0, ...] a.opIndexUnary!op(i0, ...)
op a[] a.opIndexUnary!op() a.opSliceUnary!op()
op a[l..u] a.opIndexUnary!op(a.opSlice!0(l, u)) a.opSliceUnary!op(l, u)
op a[l..u, ...] a.opIndexUnary!op(a.opSlice!0(l, u), ...)
a[i0, ...] = v a.opIndexAssign(v, i0, ...)
a[] = v a.opIndexAssign(v) a.opSliceAssign(v)
a[l..u] = v a.opIndexAssign(v, a.opSlice!0(l, u)) a.opSliceAssign(v, l, u)
a[l..u, ...] = v a.opIndexAssign(v, a.opSlice!0(l, u), ...)
a[i0, ...] op= v a.opIndexOpAssign!op(v, i0, ...)
a[] op= v a.opIndexOpAssign!op(v) a.opSliceOpAssign!op(v)
a[l..u] op= v a.opIndexOpAssign!op(v, a.opSlice!0(l, u)) a.opSliceOpAssign!op(v, l, u)
a[l..u, ...] op= v a.opIndexOpAssign!op(v, a.opSlice!0(l, u), ...)
@braddr
D Programming Language member

This patch is failing on all 64 bit platforms:

http://d.puremagic.com/test-results/pull.ghtml?runid=12403
../druntime/import/core/stdc/stdarg.di(250): Error: cannot implicitly convert expression (parmn[0LU..tsize]) of type void[] to ulong
../druntime/import/core/stdc/stdarg.di(250): Error: cannot implicitly convert expression (p[0LU..tsize]) of type void[] to ulong
../druntime/import/core/stdc/stdarg.di(250): Error: expression p[(__error)] is void and has no value
../druntime/import/core/stdc/stdarg.di(283): Error: cannot implicitly convert expression (parmn[0LU..tsize]) of type void[] to ulong
../druntime/import/core/stdc/stdarg.di(283): Error: cannot implicitly convert expression (p[0LU..tsize]) of type void[] to ulong
../druntime/import/core/stdc/stdarg.di(283): Error: expression p[(__error)] is void and has no value
../druntime/import/core/stdc/stdarg.di(292): Error: cannot implicitly convert expression (parmn[0LU..tsize]) of type void[] to ulong
../druntime/import/core/stdc/stdarg.di(292): Error: cannot implicitly convert expression (p[0LU..tsize]) of type void[] to ulong
../druntime/import/core/stdc/stdarg.di(292): Error: expression p[(__error)] is void and has no value
std/format.d(3973): Error: template instance core.stdc.stdarg.va_arg!() error instantiating

@9rnsr
D Programming Language member

Revamped changes.

@andralex
D Programming Language member

(background: I'm doing a review of pull requests < 1000 as they tend to be controversial)

This is nice, self-consistent, and tastefully designed.

The question is scope and intended usage. This is a relatively hefty language addition, with a very narrow clientele. I'm not sure whether those clients actually need this stuff, and if they do, whether the current form is satisfactory to them. (For example one issue is that the slice limits are computed outside the index expression, so presumably they'd make certain usages difficult.) One other issue discussed in the enhancement request is that there might be a need for strides etc.

So I'm torn about this. One one hand it's a work of beauty. On the other it seems of low impact and unclear user base. Thoughts?

@quickfur
D Programming Language member

I vote for merging this.

Or at the very least, some subset of this that supports multidimensional opDollar with slicing. Some background: I'm working on a multidimensional array implementation that uses the variadic form of opIndex to dereference array elements, and it works wonderfully (opDollar works as well). However, slicing syntax currently is not supported, so I've implemented a workaround using opSlice with two array arguments, so one could write A[[0,0,0]..[2,2,2]] for example, but the problem is that opDollar doesn't work correctly ($ always gets mapped to opDollar!0, so A[[0,0,0]..[$,$,$]] does not do the right thing when the array has different lengths along each dimension. I've tried various alternative ways to provide convenient slicing syntax, but none of them can support opDollar correctly.

It would be very nice if slicing can also support variadic arguments, since currently it's a hole in the language -- opIndex supports multidimensional arrays but opSlice doesn't.

@WalterBright
D Programming Language member

I'd like to include with this some code that is commented out for the moment that issues a warning and corrective action when opSlice, opSliceUnary, opSliceAssign, and opSliceOpAssign as those will now be obsolete.

@andralex
D Programming Language member

Ping on this. Now that time has passed we should be a tad wiser so maybe we have new insights in the matter. Thoughts?

@MaksimZh

Slices and strides make code more expressive and can save hours of debugging especially if one deals with matrices.

The design suggested is really good. It doesn't break existing code. It reduces the number of methods one has to implement (e.g. opSliceAssign is not needed anymore). Strides can be added later and will also not break anything.

I just can not understand why it is still not merged.

@9rnsr
D Programming Language member

Sorry I'm failing to rebase this change to git master. Currently this change will break test/runnable/aliasthis.d...

@don-clugston-sociomantic

This needs a rebase (perhaps a rebase -i). There are dozens of unrelated commits in here.

Otherwise, I think this is awesome. This finishes the job started with opDollar, and it's something we've needed to do eventually.

What happens if you do:
void foo( int {] x) { x[2..$, 4] = 7; }
after these changes to the parser? Do you still get a nice error message?

@9rnsr
D Programming Language member

And, now I have a small performance worry. To represent slice, it would need to pack the start and end of the slice.

struct S {
    auto opSlice(int x, int y) { return tuple(x, y); }
    void opIndex(int, Tuple!(int, int) slice) {}
}
S s;
s[1, 2..3];   // 2..3 will always need creating tuple object

Is this acceptable?

@9rnsr
D Programming Language member

OK. I finished rebasing.

@aitzkora

I vote also for merging this! multidimensional slicing is very interesting for people doing numerical computations!

@9rnsr
D Programming Language member

Finished rebasing.

@andralex
D Programming Language member

Is my understanding that this is a large breakage of existing uses of opSlice?

@andralex andralex and 1 other commented on an outdated diff Mar 16, 2014
src/expression.c
for (size_t i = 0; i < ae->arguments->dim; i++)
{
+ if (i == 0)
+ e0 = extractOpDollarSideEffect(sc, ae);
+
+ Expression *e = (*ae->arguments)[i];
+ if (e->op == TOKinterval && !(slice && slice->isTemplateDeclaration()))
+ {
+ Lfallback:
+ if (ae->arguments->dim == 1)
+ {
+ ae->e1 = Expression::combine(e0, ae->e1);
+ return NULL;
+ }
+ else
+ {
+ ae->error("multi-dimentional slicing requires template opSlice");
@andralex
D Programming Language member
andralex added a line comment Mar 16, 2014

s/dimentional/dimensional/

@9rnsr
D Programming Language member
9rnsr added a line comment Mar 16, 2014

Thanks, fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@andralex
D Programming Language member

For the record Walter and I are on board with this language addition as long as there's no breakage of existing code. Thanks!

@CyberShadow
D Programming Language member

The pull description states:

This enhancement never break existing codes.

Looking at the tests, it does seem to break alias-this'd tuples. Kenji confirmed it in a comment:

Currently this change will break test/runnable/aliasthis.d...

@andralex
D Programming Language member

@9rnsr can you work some magic to keep alias this working?

9rnsr added some commits Mar 12, 2014
@9rnsr 9rnsr [Refactoring] ArrayScopeSymbol::search b12585b
@9rnsr 9rnsr [Refactoring] Operator overload handling in AssignExp
Change to recursive call for later fallback mechanism.
74f5e0e
@9rnsr 9rnsr [Refactoring] More better error propagation from resolveOpDollar 05e72b2
@9rnsr 9rnsr Add IntervalExp and fix parser, and disabled test for issue 6789
All of N-dimensional array operations are now translated to ArrayExp.
But currently they are immediately translated to SliceExp, so have no effect.
ca6643e
@9rnsr 9rnsr Translate IntervalExp in ArrayExp::arguments to opSlice!dim(lwr, upr) 6124e8d
@9rnsr 9rnsr Support multi-dimensional opIndex
If it fails, fall back to opSlice for backward compatibility.
1f308ef
@9rnsr 9rnsr Support multi-dimensional opIndexAssign
If it fails, fall back to opSliceAssign for backward compatibility.
1dd0746
@9rnsr 9rnsr Support multi-dimensional opIndexUnary
If it fails, fall back to opSliceUnary for backward compatibility.
d947c75
@9rnsr 9rnsr Support multi-dimensional opIndexOpAssign
If it fails, fall back to opSliceOpAssign for backward compatibility.
b1f0106
@9rnsr 9rnsr Unmask all test for issue 6798 e46e7e5
@9rnsr
D Programming Language member

Is my understanding that this is a large breakage of existing uses of opSlice?

No. New opSlice!dim(lwe, upr) will be preferred, but if it does not exist, old signature opSlice(lwr, upr) is used.

Currently this change will break test/runnable/aliasthis.d

It's an exception. Issue 8735 & 9709 was fixed in 2.065, but in the test case, A[0] should not invoke alias this, because A is the type Tuple9709!(1,int,"foo"), so A[0] should be analyzed as the zero-length static array type. I think that is a bug, so the fix is legitimate.

@andralex
D Programming Language member

OK, great to know! What's with the red though?

@andralex
D Programming Language member

Auto-merge toggled on

@9rnsr
D Programming Language member

What's with the red though?

It seems to be caused by the merge of #2256. This and that are both big, so something would be changed. I'll fix it.

@andralex
D Programming Language member

thanks!

@braddr
D Programming Language member

Pull updated, auto_merge toggled off

@9rnsr
D Programming Language member

Hmm, the red seems to have been fake. Could you re-toggle auto merge?

@andralex
D Programming Language member

Auto-merge toggled on

@andralex andralex merged commit 7148ab2 into dlang:master Mar 16, 2014

1 check passed

Details default Pass: 10
@9rnsr
D Programming Language member

Thanks!

@yebblies yebblies commented on the diff Mar 17, 2014
src/expression.h
@@ -1069,6 +1069,18 @@ class ArrayLengthExp : public UnaExp
void accept(Visitor *v) { v->visit(this); }
};
+struct IntervalExp : Expression
@yebblies
D Programming Language member
yebblies added a line comment Mar 17, 2014

This should have been a class.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@Orvid

Akk...... I see a serious design flaw with the multi-dimensional slicing, namely that it's impossible to determine whether the slice is the first or second argument from within the opSlice call, because it would change what the slice actually represents in things such as multi-dimensional arrays. I'm going to put this here for now just because there is an active topic on the mailing list over this, I'll update this momentarily to illustrate the issue better with examples.

So, it appears I have mis-understood what this PR actually does, and instead have the simple question of why? Why is this useful? Why would you want a slice operation to potentially call an opIndex, which is intended for selecting a single element? All I see in this PR is the actual addition of the ability to do multi-dimensional slicing, but even that is using an opIndex, when it should be introducing it as an opSlice. This also will cause issues due to the fact an operator can't currently be overloaded based on return type. There are situations, such as with matrices, that would mean that this syntax would require a dynamic array allocation for each index in order to both allow a user to both do that single dimensional slice and use it, as well as to still allow a multi-dimensional slice to be done.

This isn't the first addition to the D language that's been merged that really shouldn't have been, not without a more public review process. Perhaps requiring a formal review, such as what is required for the addition of Phobos modules, when adding or removing features or behaviors from the D language itself?

@quickfur
D Programming Language member

Huh? This pull has been referenced multiple times over the course of (at least) the past year in the forum. Is that not public enough? Or are we supposed to copy-n-paste the comments / code from the pull to the forum in order to make it more public? I'm confused.

@quickfur
D Programming Language member

Regarding your question about opIndex, the idea here is that opSlice!n(a,b) will return some kind of type that represents a slice from a to b in the n'th position, and opIndex will be overloaded to take arguments of this type and implement the actual slicing this way. The idea is that you should be able to call opSlice to get a type representing an index range, and then you can use that range object in multiple subsequent calls to opIndex. This is more flexible than limiting opSlice to do the actual slicing, then you have to keep track of the endpoints of the range manually each time.

@Orvid

The thing is that the way it's been being done is subjective to the actual reader of the forum topics, by using the same process that phobos modules go through, it produces a non-subjective result, in the form of votes.

@Orvid

But that still doesn't address the fact that your using opIndex, which is intended to retrieve a single element, to retrieve a slice instead. The same overload that is being called for opIndex could be done as an overload for opSlice instead, and would make much more sense to the reader of the code.

@quickfur
D Programming Language member

I don't know why you think that opIndex is only intended to retrieve a single element. In my own multidimensional array implementation, I've done opIndex(Range a, Range b, ...) because it's easier to consolidate mixed slices that way, e.g., opIndex(int a, Range b, int c, ...). It doesn't make sense to me to differentiate between indexing and slicing, because then how would you implement things like arr[1, 0..$, 2, 3, 2..5]? Rather, think of opIndex as implementing a[...], which can either be a single element (all arguments are indices), a full-dimensional slice (i.e., all arguments are ranges), or a sub-dimensional slice (some arguments are indices, some are ranges).

@Orvid

Your use of the ranges there is born out of the lack of a proper way to do it, opIndex should only ever be used to refer to a single element, a sub-dimensional slice is still a slice, and could even be treated as a full-dimensional slice. Whatever way you look at it, they are still slices and thus should be using opSlice.

@quickfur
D Programming Language member

I find the use of opIndex for both purposes more uniform, because it allows generic implementations that handle all cases in a single interface. Subdimensional slicing, in my implementation, actually returns a different type depending on how many arguments are slices and how many are indices (the types are segmented by dimension; the 0-dimensional case aliases to the element type); the case where all are indices is only one of the many possible combinations. Arbitrarily segmenting opIndex to two names introduces a lot of inelegance in the code, because now everything has to specially treat the case where all arguments happen to be indices, whereas now, it simply handles all cases together.

In any case, you're arguing with the wrong person; you should be asking Kenji the rationale for this design, since he's the one who implemented it! :)

@MasonMcGill

Great work folks! I do have a concern, though: it looks like you're bringing Python 2.0-style slice handling to D, even though it turned out there were limitations to that model that could have been avoided with a different design. It worries me that Julia, a language specifically designed ~15 years later to make up for the shortcomings in existing scientific platforms, chose an alternate approach. More discussion of that here:
http://forum.dlang.org/thread/upzdamhmxrrlsexgcdva@forum.dlang.org

It looks like I timed this pretty poorly, but I actually just wrote up a DIP that addresses the problem of "advanced" indexing in a more general way, and enables more MATLAB/Julia-style numerical features, while reducing language complexity and maintaining backwards compatibility:
http://wiki.dlang.org/DIP58

Again, I know the timing's horrible, after all the work that's gone into this, but I think it's important to get numerical syntax right, and I think a bit of discussion about the pros and cons of each approach would do the language some good.

Discussion receptacle:
http://forum.dlang.org/thread/clswrooryjcudncqbkje@forum.dlang.org

@9rnsr
D Programming Language member

@quickfur In the proposal, the member function opIndex will be used for every bracket access syntax (a[], a[i], a[i..j], a[$, i..j], etc). It's just syntactic rewriting rule, and it's not directly related to the code meaning of indexing, as the same as that opDollar is not directly related to the length or size of container.

@9rnsr
D Programming Language member

@MasonMcGill Sometimes .. operator had been proposed in the newsgroup, but it is not yet exist. So this proposal did not consider it.

@MasonMcGill

@9rnsr Understood. My request was for a discussion about why this approach is better than something like http://wiki.dlang.org/DIP58 , and whether this is meant to be the "last stop" for the evolution of numerical syntax in D, or part of a larger transition.

@Orvid

I believe that it is related to the code meaning of indexing, by introducing this under the name of opIndex, it's using the wrong term for the slicing operation that is being performed.

@9rnsr
D Programming Language member

@Orvid I agree it was related. But, by supporting multi-dimensional slicing, a[i..j] could also be considered as one dimension indexing with an interval.
So uniformly using the name opIndex for both a[i..j] and a[i..j, k] would be practically convenient. I don't think it's so big issue.

@don-clugston-sociomantic

@Orvid

I believe that it is related to the code meaning of indexing, by introducing this under the name of opIndex, it's using the wrong term for the slicing operation that is being performed.

IMHO opSlice would not be the correct term. The only sense in which it is a "slice" is that with an n-dimensional array, it doesn't reduce the dimensionality, whereas an "index" normally does. But we don't even enforce those semantics. For example, you could create something with an infinite number of dimensions, and you could make slicing reduce the dimensionality as well.

Really it is a strided slice, which is a totally different beast, neither an index nor a slice, it doesn't have much in common with normal slices. Pragmatically, the opIndex syntax generalizes, whereas the opSlice syntax does not.

@Orvid

While I do see your point that opSlice is not correct in all cases, there are cases where it would be the correct term. This PR adds the ability to use opIndex in places where opSlice was previously the preferred operator. It also adds an overload to opIndex that opSlice should have as well, for use in the cases where opSlice would be the correct term, unless part of the intention of this PR is to start a deprecation path for opSlice in favor of opIndex?

@JakobOvrum
D Programming Language member

Is there a corresponding documentation PR for this?

@quickfur
D Programming Language member

@Orvid No, in this PR opSlice performs the duty of translating an x..y range into an aggregate index object represent that range. This aggregate index is then translated into an actual slicing operation by opIndex. The idea is to unify slicing and indexing, not to deprecate anything over the other.

@9rnsr 9rnsr deleted the 9rnsr:fix6798 branch Apr 15, 2014
@John-Colvin

documentation?

@quickfur
D Programming Language member

I'll see if I can work up a PR for the docs today.

@quickfur quickfur referenced this pull request in dlang/dlang.org Aug 7, 2014
Merged

Document multidimensional array op overloading #625

@quickfur
D Programming Language member
@Orvid Orvid commented on the diff Jan 29, 2015
test/runnable/aliasthis.d
static assert(a[0] == 1);
//static assert(is(A[1] == int));
//static assert(is(a[1] == int));
- static assert(A[2] == "foo");
+ //static assert(A[2] == "foo");
@Orvid
Orvid added a line comment Jan 29, 2015

What was the reasoning behind this? (only noticed it as I went to see how this handled both opSlice and opIndex being assigned)

@9rnsr
D Programming Language member
9rnsr added a line comment Jan 29, 2015

I don't remember the precise reason, but it had been interfere with the new operator overloading rule. And the test case was added without deep thinking when issue 8735 was fixed (actually it was added by me).

So I was disable it again because it would not introduce serious use code breaking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@CyberShadow
D Programming Language member

This pull request introduced a regression:
https://issues.dlang.org/show_bug.cgi?id=14621

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment