Issue 3825 - AAs entries are default initialized before the new value is evaluated #1465

9rnsr · 2013-01-11T14:45:45Z

http://d.puremagic.com/issues/show_bug.cgi?id=3825

don-clugston-sociomantic · 2013-01-11T15:41:53Z

My initial impression was that the oddness comes from e2ir.c. IndexExp::toIR().
There is a funny bit of code that says

if (modifiable) s = taa->aaGetSymbol("GetX", 1);
else s = taa->aaGetSymbol("GetRvalueX", 1);

and this is the only bit of code that uses modifiable (it's set in IndexExp::semantic()). Looks to me like a nasty hack that doesn't actually work...

9rnsr · 2013-01-11T16:06:29Z

What do you mean? I know IndexExp::modifiable field and it is used to distinguish AA element getting and setting. But this change does not touch it. I can agree that It is one of the refactor-able point, but it is not directly related to the bug.

The root cause of bug 3825 is an incorrect runtime evaluation order of array element setting. To fix the problem, I reordered AST in front-end.

andralex · 2013-01-11T22:13:19Z

@9rnsr can we please work this out so it has the same exact semantics as a user-defined version? We'd want expr1[expr2] = expr3 to behave the same way for the built-in hashtable and for a custom-built structure that defines opIndexAssign.

9rnsr · 2013-01-12T06:19:18Z

@andralex Yes. As I commented here,

aa[key] op= val;  // (op == empty or any binary operators)

is rewritten to:

ref __aatmp = aa;
ref __aakey = key;
ref __aaval = val;
__aatmp[__aakey] op= __aaval;  // assignment (allocate and set slot)

Then the runtime evaluation order is equal to user defined opIndex(Op)?Assign call.

monarchdodra · 2013-01-12T12:55:49Z

We were wondering in the message boards, what behavior does this code produce:

int[int] a;
++a[0];

Does this insert 1 in a[0]? Or does it throw a RangeError?

If it does insert 1 in a[0], then are we able to guarantee that a is not mutated if op++ throws?

Thanks.

9rnsr · 2013-01-12T13:13:33Z

Hmm, I didn't considered about that.
With this change, following test passes.

unittest {
    int[int] naa;
    ++naa[0];   // !!
    // This is implicitly translated to: naa[0] += 1;
    // Allocate the slot naa[0] first, set int.init next,
    // and finally increment it.
    assert(naa[0] == 1);

    import core.exception;
    struct S {
        void opUnary(string op : "++")() { assert(0); }
    }
    S[int] saa;
    bool thrown = false;
    try {
        ++saa[0];   // !!
        assert(0);
    } catch (RangeError) thrown = true;
    // Getting slot saa[0] first, then call opUnary!"++".
    // So RangeViolation occurs.
    assert(thrown);
    assert(saa.length == 0);
}

This seems inconsistent behavior to me, but I'm not sure it is unexpected or not...

monarchdodra · 2013-01-12T13:24:00Z

For what it's worth, I think both ++aa[0] and aa[0] += 1 should throw a range error: While they are mutating operations, they mutate something that should be existing.

If it doesn't throw, then ++aa[0] would be inconsistent with aa[0] == a[0] + 1 (which should be equivalent).

9rnsr · 2013-01-12T14:33:42Z

For what it's worth, I think both ++aa[0] and aa[0] += 1 should throw a range error: While they are mutating operations, they mutate something that should be existing.

OK. That seems reasonable. I updated compiler code and test case to implement it.

yebblies · 2013-01-12T16:03:53Z

For what it's worth, I think both ++aa[0] and aa[0] += 1 should throw a range error: While they are mutating operations, they mutate something that should be existing.

OK. That seems reasonable. I updated compiler code and test case to implement it.

I think this is two much, and will break a lot of code.

andralex · 2013-01-12T20:07:33Z

The idiom ++a[n] is widespread in things like word counters and such. Otherwise the efficient usage would be: if (auto p = n in a) ++*p; else a[n] = 1; which is quite a bit more difficult to write and understand.

Therefore this is somewhat of a special case. If the value type is numeric, ++a[n] sets the slot to 1 if it wasn't previously present (i.e. it initializes it to 0 and then increments it). This is easy to also implement in a user-defined type by defining opIndexUnary appropriately.

andralex · 2013-01-12T20:19:12Z

To drive the point home: I think the current behavior is useful and principled enough to justify for its being a special case.

…alue is evaluated

9rnsr · 2013-01-13T01:49:23Z

OK. I reverted recent change, and explicitly added test cases for ++aa[key], aa[key] += n, and so on.

yebblies · 2013-01-13T02:20:08Z

Could you make the tests run with ctfe as well?

9rnsr · 2013-01-13T02:33:17Z

@yebblies Hmm, I tried to it, but following errors are shown:
Error: cannot index null array aax
Error: key 1 not found in associative array aax

In current, maybe, CTFE interpreter is designed as that an invalid AA access immediately makes an error, instead of throwing compile-time RangeError object.

To @donc: is this an improvement point of CTFE engine?

deadalnix · 2013-01-13T03:55:03Z

@andralex this seems to me like a weak argument for a special case.

Granted, it is probably a frequent usage of AA, but you'll find another frequent use (even more frequent in my experience) in caching.

Caching with D's AA is quite bad. First, you can't cache anything immutable as immutable(T)[U] is roughly equivalent to immutable(T[U]) and because you have no way to compute some cached value without 2 round trip into the cache when 1 should be enough.

Additionally, this very specific use case can be handled quite nicely doing :
a[k] = a.get(k, 0) + 1;

Something that can be done as simply as this shouldn't justify any special casing, especially when other really frequent uses case are handled in a totally backward way.

quickfur · 2013-01-13T04:27:14Z

I'm on the fence about whether to allow ++a[0]. Consider what happens, and what should happen, if the AA is double[int], for example. Making the result 1.0 would be very convenient, but inconsistent with double.init being nan. Making the result nan is equivalent to not having this special case. So that means it's really only useful for int-valued AA's. I don't like the idea of introducing a special case for specific value types; it makes AA behaviour inconsistent which later will lead to problems with generic code.

CyberShadow · 2013-01-13T04:35:29Z

Additionally, this very specific use case can be handled quite nicely doing :
a[k] = a.get(k, 0) + 1;

Yes, but it's not efficient. Andrei posted the efficient way of doing it, which doesn't involve two AA lookups.

I suppose that for the common case of word counters and such there could be a library type wrapping AAs which allows declaring a "default value", used in ++, += and similar operations...

deadalnix · 2013-01-13T04:43:13Z

@CyberShadow I know that it does involve 2 aa lookups (and that this is bad).

You'll note that, with the current AA interface, 2 lookups when only one is needed is required for many use cases, and that this special case will only solve a very specific one. If 2 lookup are considered an issue (I do consider this is) then this very problem have to be addressed.

On a final note, Andrei's solution require 2 AA lookup when the value is created, which is still suboptimal.

quickfur · 2013-01-13T04:46:33Z

The root of the problem is the design of the current interface. There is no direct way to differentiate between "lookup X and create it if it doesn't exist" vs. "lookup X and throw error if it doesn't exist". An ideal interface lets you differentiate between the two, which solves all of these issues without efficiency problems and without need of special cases.

IMO, it's the compiler's job to optimize a[k] = a.get(k,0)+1 into a single lookup. But the underlying AA API needs to provide the necessary hooks for the compiler to do this (by providing a distinct way of saying "lookup X" vs. "lookup or create X").

CyberShadow · 2013-01-13T04:47:28Z

Oh, that's true. There should probably be an AA operation which gives you the value pointer (like in), but creates and initializes the entry if the key isn't in the AA yet.

quickfur · 2013-01-13T04:55:59Z

That would solve the problem, yes. Either that, or add a bool parameter to aa.get to create the entry if it doesn't already exist. So we would have:

Value* get(Key key, lazy Value defValue=Value.init, bool create=false)

If defValue is not specified, create cannot (should not) be true anyway, so this works.

CyberShadow · 2013-01-13T04:58:29Z

get doesn't currently return a pointer/reference, and it can't without also making defValue a Value pointer/reference.

deadalnix · 2013-01-13T04:59:02Z

@quickfur boolean in interface usually ends up confusing everybody. But if get return a ref, the general idea make sense.

@CyberShadow can't auto ref help here ?

CyberShadow · 2013-01-13T05:00:50Z

@CyberShadow can't auto ref help here ?

What happens if I do: return &aa.get("nonexisting key", 2+2); ?

deadalnix · 2013-01-13T05:06:22Z

In such a case, you can't return a reference so you get a compile error. If the argument is a ref, then a ref can be returned.

Anyway, you can always return a reference when you create the entry, so we probably need a function to do that that return a ref instead of a flag as @quickfur suggested.

quickfur · 2013-01-13T05:22:57Z

You're right, better not conflate getOrCreate with get. It should be a separate method.

deadalnix · 2013-01-13T05:36:10Z

I'd would prefer that getOrCreate became get and get became something else, but this breakage is probably not acceptable. This behavior is much more useful.

donc · 2013-01-14T09:18:06Z

What do you mean? I know IndexExp::modifiable field and it is used to distinguish AA element getting and setting. But this change does not touch it. I can agree that It is one of the refactor-able point, but it is not directly related to the bug. The root cause of bug 3825 is an incorrect runtime evaluation order of array element setting. To fix the problem, I reordered AST in front-end.

That's all fine. My point was that both the IndexExp::modifiable hack, and this one, are special cases that wouldn't be necessary at all, if AAs really were a library type (this would be an opIndexOpAssign). But on further thought, I think it would be too disruptive to change that right now. So this is probably the best approach in the short term.

For the CTFE stuff, I found this while working on bug 9023. While generating test cases, I came to the conclusion that the existing runtime behaviour doesn't make any sense. I wanted to clarify what the correct behaviour should be, before I duplicate it in CTFE. After merging this in, it'd be much easier to do in CTFE. I'll add the CTFE cases when I fix 9023.

Summary: I think we should merge this in to get the semantics correct, with the understanding that it'll be removed eventually when a complete AA overhaul happens. LGTM.

9rnsr · 2013-01-14T10:45:23Z

@donc OK, understood. Thanks for your reply.

Issue 3825 - AAs entries are default initialized before the new value is evaluated

fix Issue 3825 - AAs entries are default initialized before the new v…

d5aac8c

…alue is evaluated

WalterBright added a commit that referenced this pull request Mar 4, 2013

Merge pull request #1465 from 9rnsr/fix3825

4af7946

Issue 3825 - AAs entries are default initialized before the new value is evaluated

WalterBright merged commit 4af7946 into dlang:master Mar 4, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue 3825 - AAs entries are default initialized before the new value is evaluated #1465

Issue 3825 - AAs entries are default initialized before the new value is evaluated #1465

9rnsr commented Jan 11, 2013

don-clugston-sociomantic commented Jan 11, 2013

9rnsr commented Jan 11, 2013

andralex commented Jan 11, 2013

9rnsr commented Jan 12, 2013

monarchdodra commented Jan 12, 2013

9rnsr commented Jan 12, 2013

monarchdodra commented Jan 12, 2013

9rnsr commented Jan 12, 2013

yebblies commented Jan 12, 2013

andralex commented Jan 12, 2013

andralex commented Jan 12, 2013

9rnsr commented Jan 13, 2013

yebblies commented Jan 13, 2013

9rnsr commented Jan 13, 2013

deadalnix commented Jan 13, 2013

quickfur commented Jan 13, 2013

CyberShadow commented Jan 13, 2013

deadalnix commented Jan 13, 2013

quickfur commented Jan 13, 2013

CyberShadow commented Jan 13, 2013

quickfur commented Jan 13, 2013

CyberShadow commented Jan 13, 2013

deadalnix commented Jan 13, 2013

CyberShadow commented Jan 13, 2013

deadalnix commented Jan 13, 2013

quickfur commented Jan 13, 2013

deadalnix commented Jan 13, 2013

donc commented Jan 14, 2013

9rnsr commented Jan 14, 2013

Issue 3825 - AAs entries are default initialized before the new value is evaluated #1465

Issue 3825 - AAs entries are default initialized before the new value is evaluated #1465

Conversation

9rnsr commented Jan 11, 2013

don-clugston-sociomantic commented Jan 11, 2013

9rnsr commented Jan 11, 2013

andralex commented Jan 11, 2013

9rnsr commented Jan 12, 2013

monarchdodra commented Jan 12, 2013

9rnsr commented Jan 12, 2013

monarchdodra commented Jan 12, 2013

9rnsr commented Jan 12, 2013

yebblies commented Jan 12, 2013

andralex commented Jan 12, 2013

andralex commented Jan 12, 2013

9rnsr commented Jan 13, 2013

yebblies commented Jan 13, 2013

9rnsr commented Jan 13, 2013

deadalnix commented Jan 13, 2013

quickfur commented Jan 13, 2013

CyberShadow commented Jan 13, 2013

deadalnix commented Jan 13, 2013

quickfur commented Jan 13, 2013

CyberShadow commented Jan 13, 2013

quickfur commented Jan 13, 2013

CyberShadow commented Jan 13, 2013

deadalnix commented Jan 13, 2013

CyberShadow commented Jan 13, 2013

deadalnix commented Jan 13, 2013

quickfur commented Jan 13, 2013

deadalnix commented Jan 13, 2013

donc commented Jan 14, 2013

9rnsr commented Jan 14, 2013