Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 3825 - AAs entries are default initialized before the new value is evaluated #1465

Merged
merged 1 commit into from Mar 4, 2013

Conversation

9rnsr
Copy link
Contributor

@9rnsr 9rnsr commented Jan 11, 2013

@don-clugston-sociomantic
Copy link
Contributor

My initial impression was that the oddness comes from e2ir.c. IndexExp::toIR().
There is a funny bit of code that says

if (modifiable) s = taa->aaGetSymbol("GetX", 1);
else s = taa->aaGetSymbol("GetRvalueX", 1);

and this is the only bit of code that uses modifiable (it's set in IndexExp::semantic()). Looks to me like a nasty hack that doesn't actually work...

@9rnsr
Copy link
Contributor Author

9rnsr commented Jan 11, 2013

What do you mean? I know IndexExp::modifiable field and it is used to distinguish AA element getting and setting. But this change does not touch it. I can agree that It is one of the refactor-able point, but it is not directly related to the bug.

The root cause of bug 3825 is an incorrect runtime evaluation order of array element setting. To fix the problem, I reordered AST in front-end.

@andralex
Copy link
Member

@9rnsr can we please work this out so it has the same exact semantics as a user-defined version? We'd want expr1[expr2] = expr3 to behave the same way for the built-in hashtable and for a custom-built structure that defines opIndexAssign.

@9rnsr
Copy link
Contributor Author

9rnsr commented Jan 12, 2013

@andralex Yes. As I commented here,

aa[key] op= val;  // (op == empty or any binary operators)

is rewritten to:

ref __aatmp = aa;
ref __aakey = key;
ref __aaval = val;
__aatmp[__aakey] op= __aaval;  // assignment (allocate and set slot)

Then the runtime evaluation order is equal to user defined opIndex(Op)?Assign call.

@monarchdodra
Copy link
Contributor

We were wondering in the message boards, what behavior does this code produce:

int[int] a;
++a[0];

Does this insert 1 in a[0]? Or does it throw a RangeError?

If it does insert 1 in a[0], then are we able to guarantee that a is not mutated if op++ throws?

Thanks.

@9rnsr
Copy link
Contributor Author

9rnsr commented Jan 12, 2013

Hmm, I didn't considered about that.
With this change, following test passes.

unittest {
    int[int] naa;
    ++naa[0];   // !!
    // This is implicitly translated to: naa[0] += 1;
    // Allocate the slot naa[0] first, set int.init next,
    // and finally increment it.
    assert(naa[0] == 1);

    import core.exception;
    struct S {
        void opUnary(string op : "++")() { assert(0); }
    }
    S[int] saa;
    bool thrown = false;
    try {
        ++saa[0];   // !!
        assert(0);
    } catch (RangeError) thrown = true;
    // Getting slot saa[0] first, then call opUnary!"++".
    // So RangeViolation occurs.
    assert(thrown);
    assert(saa.length == 0);
}

This seems inconsistent behavior to me, but I'm not sure it is unexpected or not...

@monarchdodra
Copy link
Contributor

For what it's worth, I think both ++aa[0] and aa[0] += 1 should throw a range error: While they are mutating operations, they mutate something that should be existing.

If it doesn't throw, then ++aa[0] would be inconsistent with aa[0] == a[0] + 1 (which should be equivalent).

@9rnsr
Copy link
Contributor Author

9rnsr commented Jan 12, 2013

For what it's worth, I think both ++aa[0] and aa[0] += 1 should throw a range error: While they are mutating operations, they mutate something that should be existing.

OK. That seems reasonable. I updated compiler code and test case to implement it.

@yebblies
Copy link
Member

For what it's worth, I think both ++aa[0] and aa[0] += 1 should throw a range error: While they are mutating operations, they mutate something that should be existing.

OK. That seems reasonable. I updated compiler code and test case to implement it.

I think this is two much, and will break a lot of code.

@andralex
Copy link
Member

The idiom ++a[n] is widespread in things like word counters and such. Otherwise the efficient usage would be: if (auto p = n in a) ++*p; else a[n] = 1; which is quite a bit more difficult to write and understand.

Therefore this is somewhat of a special case. If the value type is numeric, ++a[n] sets the slot to 1 if it wasn't previously present (i.e. it initializes it to 0 and then increments it). This is easy to also implement in a user-defined type by defining opIndexUnary appropriately.

@andralex
Copy link
Member

To drive the point home: I think the current behavior is useful and principled enough to justify for its being a special case.

@9rnsr
Copy link
Contributor Author

9rnsr commented Jan 13, 2013

OK. I reverted recent change, and explicitly added test cases for ++aa[key], aa[key] += n, and so on.

@yebblies
Copy link
Member

Could you make the tests run with ctfe as well?

@9rnsr
Copy link
Contributor Author

9rnsr commented Jan 13, 2013

@yebblies Hmm, I tried to it, but following errors are shown:
Error: cannot index null array aax
Error: key 1 not found in associative array aax

In current, maybe, CTFE interpreter is designed as that an invalid AA access immediately makes an error, instead of throwing compile-time RangeError object.

To @donc: is this an improvement point of CTFE engine?

@deadalnix
Copy link
Contributor

@andralex this seems to me like a weak argument for a special case.

Granted, it is probably a frequent usage of AA, but you'll find another frequent use (even more frequent in my experience) in caching.

Caching with D's AA is quite bad. First, you can't cache anything immutable as immutable(T)[U] is roughly equivalent to immutable(T[U]) and because you have no way to compute some cached value without 2 round trip into the cache when 1 should be enough.

Additionally, this very specific use case can be handled quite nicely doing :
a[k] = a.get(k, 0) + 1;

Something that can be done as simply as this shouldn't justify any special casing, especially when other really frequent uses case are handled in a totally backward way.

@quickfur
Copy link
Member

I'm on the fence about whether to allow ++a[0]. Consider what happens, and what should happen, if the AA is double[int], for example. Making the result 1.0 would be very convenient, but inconsistent with double.init being nan. Making the result nan is equivalent to not having this special case. So that means it's really only useful for int-valued AA's. I don't like the idea of introducing a special case for specific value types; it makes AA behaviour inconsistent which later will lead to problems with generic code.

@CyberShadow
Copy link
Member

Additionally, this very specific use case can be handled quite nicely doing :
a[k] = a.get(k, 0) + 1;

Yes, but it's not efficient. Andrei posted the efficient way of doing it, which doesn't involve two AA lookups.

I suppose that for the common case of word counters and such there could be a library type wrapping AAs which allows declaring a "default value", used in ++, += and similar operations...

@deadalnix
Copy link
Contributor

@CyberShadow I know that it does involve 2 aa lookups (and that this is bad).

You'll note that, with the current AA interface, 2 lookups when only one is needed is required for many use cases, and that this special case will only solve a very specific one. If 2 lookup are considered an issue (I do consider this is) then this very problem have to be addressed.

On a final note, Andrei's solution require 2 AA lookup when the value is created, which is still suboptimal.

@quickfur
Copy link
Member

The root of the problem is the design of the current interface. There is no direct way to differentiate between "lookup X and create it if it doesn't exist" vs. "lookup X and throw error if it doesn't exist". An ideal interface lets you differentiate between the two, which solves all of these issues without efficiency problems and without need of special cases.

IMO, it's the compiler's job to optimize a[k] = a.get(k,0)+1 into a single lookup. But the underlying AA API needs to provide the necessary hooks for the compiler to do this (by providing a distinct way of saying "lookup X" vs. "lookup or create X").

@CyberShadow
Copy link
Member

Oh, that's true. There should probably be an AA operation which gives you the value pointer (like in), but creates and initializes the entry if the key isn't in the AA yet.

@quickfur
Copy link
Member

That would solve the problem, yes. Either that, or add a bool parameter to aa.get to create the entry if it doesn't already exist. So we would have:

Value* get(Key key, lazy Value defValue=Value.init, bool create=false)

If defValue is not specified, create cannot (should not) be true anyway, so this works.

@CyberShadow
Copy link
Member

get doesn't currently return a pointer/reference, and it can't without also making defValue a Value pointer/reference.

@deadalnix
Copy link
Contributor

@quickfur boolean in interface usually ends up confusing everybody. But if get return a ref, the general idea make sense.

@CyberShadow can't auto ref help here ?

@CyberShadow
Copy link
Member

@CyberShadow can't auto ref help here ?

What happens if I do: return &aa.get("nonexisting key", 2+2); ?

@deadalnix
Copy link
Contributor

In such a case, you can't return a reference so you get a compile error. If the argument is a ref, then a ref can be returned.

Anyway, you can always return a reference when you create the entry, so we probably need a function to do that that return a ref instead of a flag as @quickfur suggested.

@quickfur
Copy link
Member

You're right, better not conflate getOrCreate with get. It should be a separate method.

@deadalnix
Copy link
Contributor

I'd would prefer that getOrCreate became get and get became something else, but this breakage is probably not acceptable. This behavior is much more useful.

@donc
Copy link
Collaborator

donc commented Jan 14, 2013

What do you mean? I know IndexExp::modifiable field and it is used to distinguish AA element getting and setting. But this change does not touch it. I can agree that It is one of the refactor-able point, but it is not directly related to the bug. The root cause of bug 3825 is an incorrect runtime evaluation order of array element setting. To fix the problem, I reordered AST in front-end.

That's all fine. My point was that both the IndexExp::modifiable hack, and this one, are special cases that wouldn't be necessary at all, if AAs really were a library type (this would be an opIndexOpAssign). But on further thought, I think it would be too disruptive to change that right now. So this is probably the best approach in the short term.

For the CTFE stuff, I found this while working on bug 9023. While generating test cases, I came to the conclusion that the existing runtime behaviour doesn't make any sense. I wanted to clarify what the correct behaviour should be, before I duplicate it in CTFE. After merging this in, it'd be much easier to do in CTFE. I'll add the CTFE cases when I fix 9023.

Summary: I think we should merge this in to get the semantics correct, with the understanding that it'll be removed eventually when a complete AA overhaul happens. LGTM.

@9rnsr
Copy link
Contributor Author

9rnsr commented Jan 14, 2013

@donc OK, understood. Thanks for your reply.

WalterBright added a commit that referenced this pull request Mar 4, 2013
Issue 3825 - AAs entries are default initialized before the new value is evaluated
@WalterBright WalterBright merged commit 4af7946 into dlang:master Mar 4, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
10 participants