Attaching additional information to the AST #17

lucaswerkmeister · 2014-05-21T20:12:19Z

I wonder if it would be useful to provide a place in the AST nodes where other tools / modules / programs can attach additional information. For example:

shared abstract class Node<ExtraInfo=Anything>() {
    // ...
    shared variable ExtraInfo? extraInfo = null;
    // or
    shared variable {ExtraInfo*} extraInfo = {};
}

For example, a Ceylon compiler modeled after the current one might use

Node<[
    Token, // startToken
    Token, // endToken
    {Message*}, // errors, warnings
    Scope, // scope
    Unit // unit
]>

With ceylon/ceylon-spec#594, we could even have:

shared abstract class Node<ExtraInfo=Anything>(extraInfo = null) {
    // ...
    shared variable ExtraInfo extraInfo;
}

With this, you could make extraInfo non-optional as long as you provide a valid initial value.

The text was updated successfully, but these errors were encountered:

lucaswerkmeister · 2014-05-21T20:25:25Z

The conventional way to do this is the Decorator pattern: You define a class MyNode() extends Node() { shared variable ExtraInfo extraInfo; }. However, that gives you problems:

Your MyExpression can’t extend both Expression and MyNode (this is exactly the problem we had in Inheritance #6)
As the AST attributes are all non-default (we don’t want to allow anyone to make them variable), you couldn’t refine e. g. {Statement*} Block.statements to {MyStatement*} (which would in principle be sound, being immutable – Mutability? #2 – all attributes are out only).

But surely, these problems aren’t unique to ceylon.ast – there must be standard ways and patterns to deal with them?

lucaswerkmeister · 2014-06-19T15:42:09Z

Maybe the people who worked on the current compiler can say if this would be useful? @gavinking @FroMage @tombentley @quintesse @thradec?

FroMage · 2014-06-19T16:05:21Z

This is definitly a great idea. We've wanted to attach custom info for a while on both AST and model nodes.

lucaswerkmeister · 2014-06-20T08:27:56Z

Thanks for the feedback!

I’m adding this right now, and there’s one thing I didn’t think about: For children, the type Node[] is no longer valid. Which one makes more sense – Node<ExtraInfo>[] or Node<Anything>[]? I’m using the first variant right now, but I feel like I might be imposing some unnecessary restriction with this.

lucaswerkmeister · 2014-06-20T08:43:34Z

Ouch. Without ceylon/ceylon-spec#791, this makes the AST very annoying to use when you don’t want additional info, because I can’t provide a default ExtraInfo=Anything, so the user has to write Expression<Anything> every time. I’m not doing that.

Postponed :(

lucaswerkmeister · 2014-06-20T08:47:28Z

In fact, I don’t think I would add this even with Expression<>, because that still feels weird to use. I want Expression to use the default type arg. But this might be impossible in the general case.

I suppose we could have type-unsafe Anything extraInfo instead :(
But since that’s trivially added (doesn’t need an update in every node type), I won’t do that for now.

gavinking · 2014-06-20T08:49:55Z

What is an ExtraInfo? What operations does it have?

lucaswerkmeister · 2014-06-20T08:53:39Z

I don’t care, I simply give you a place where you can put an ExtraInfo if you want to. Maybe your parser attaches lexer tokens to the nodes (ExtraInfo is Token), or your compiler attaches typing info and errors to the nodes (ExtraInfo is [Type, Error[]]). I don’t do anything with it, so as far as I’m concerned, it doesn’t need to have any operations, and with the default – Anything – it wouldn’t have any.

gavinking · 2014-06-20T09:13:41Z

So if you don't care, why impose a constraint? What's wrong with Anything?

lucaswerkmeister · 2014-06-20T09:16:23Z

What do you mean? I’m not imposing a constraint.

The problem is that even if I declare Node<ExtraInfo=Anything>, if people create a Node() it’s a Node<Nothing>, not a Node<Anything>, because of ceylon/ceylon-spec#791.

OTOH… if I make Node<ExtraInfo=Nothing>, then Node n = Node() is well-typed again. Nothing is a less useful default than Anything, but I suppose if people want to add info to the AST, they should declare it instead of using a default Anything. So I could do that.

gavinking · 2014-06-20T09:20:18Z

But what on earth is the point of the type parameter? You are either imposing a constraint: that every Node have the same kind if ExtraInfo, or you're adding a totally useless type parameter that does not make the code any more type safe. Perhaps I'm missing something.

lucaswerkmeister · 2014-06-20T09:24:22Z

So you think I should just have Anything extraInfo instead? But then, people who want to use it have to do assert (is Type type = node.extraInfo) all the time. I’d rather let them have a Node<Type> (they would probably import ceylon.ast { TheNode=Node} and alias Node => TheNode<Type>), and then they can use node.extraInfo directly.

It doesn’t make my code any more type safe, but it makes my user’s code more type safe. At least, that’s the intention.

(But I’m not sure if the constraint that all Nodes in an AST have the same kind of ExtraInfo is justified.)

gavinking · 2014-06-20T10:11:09Z

But then, people who want to use it have to do assert (is Type type = node.extraInfo) all the time. I’d rather let them have a Node<Type> (they would probably import ceylon.ast { TheNode=Node} and alias Node => TheNode<Type>), and then they can use node.extraInfo directly.

This sounds bogus to me. Where would they get a Node<Type> from? You would just be swapping:

 assert (is Type extraInfo = node.extraInfo);

For

 assert (is Node<Type> type = node);
 value extraInfo = type.extraInfo;

The second is more verbose and has worse performance.

It's not like the whole tree is going to have the same kind of ExtraInfo. It's going to depend on the node type.

lucaswerkmeister · 2014-06-20T10:15:18Z

It's not like the whole tree is going to have the same kind of ExtraInfo.

That was the intention. In the examples I’m thinking of, all nodes would have the same ExtraInfo.

However, I just noticed another problem when I tried to add extraInfo with ExtraInfo=Nothing:

shared actual Boolean equals(Object that) {
    if (is CurrentType<What?> that) { // <--
        return attr == that.attr /* && etc. */;
    } else {
        return false;
    }
}

equals is supposed to ignore extraInfo, but because ExtraType has to variance, I can’t test if that is CurrentType without also testing its type parameter.

Looks like this just can’t be done.

gavinking · 2014-06-20T10:24:38Z

That was the intention. In the examples I’m thinking of, all nodes would have the same ExtraInfo.

But I think that's just extremely unlikely. Why would a Class node have the same kind of extra info as an AssignmentExpression node?

lucaswerkmeister · 2014-06-20T10:34:51Z

Hm, maybe you’re right. In that case, extraInfo would just be Anything or perhaps {Anything*}.

(This would also solve the equals problem from above.)

FroMage · 2014-06-20T13:47:55Z

In our case most of the nodes would have the same info type. Like boxing, erasure, etc, so it's not crazy at all.

lucaswerkmeister · 2014-06-20T14:24:31Z

Okay, but I still can’t implement equals correctly if I have a type parameter, right? At least I don’t see how I test for is CurrentType without involving the type argument. (I mean, I can use type() to determine if the type is right, but that doesn’t narrow the type for me like I need it.)

FroMage · 2014-06-20T14:26:35Z

if(exists extraInfo) should be enough, since that will make it Object&CurrentType.

lucaswerkmeister · 2014-06-20T14:33:17Z

No, I don’t care about my extraInfo. The problem is different.

This is the current equals of QualifiedType:

shared actual Boolean equals(Object that) {
    if (is QualifiedType that) {
        return nameAndArgs == that.nameAndArgs && qualifyingType == that.qualifyingType;
    } else {
        return false;
    }
}

If QualifiedType has a type parameter, but I want to ignore it in equals – what should the second line be? Because the type parameter is defaulted, as it is it would test is QualifiedType<Nothing>, and because the type parameter has to be invariant (type of a variable attribute), that’s not true for, e. g., a QualifiedType<Token>. That’s the problem I have right now.

FroMage · 2014-06-20T14:49:50Z

Ah, that's a good question for @gavinking ;)

gavinking · 2014-06-20T15:05:10Z

If the generic type is covariant, the upper bound or Anything
if the generic type is contravariant, Nothing
if invariant, you're screwed

lucaswerkmeister · 2014-06-20T15:15:41Z

Okay, dang. That rules out typesafe extraInfo.

I guess I could be happy about that, since the unsafe version is much easier to implement ;)

The only remaining question: Anything or Anything[]?

FroMage · 2014-06-20T16:28:33Z

Anything.

gavinking · 2014-07-07T21:38:39Z

Well I'm not sure what you're using the Key for. I'm just using the class as its own key.

lucaswerkmeister · 2014-07-07T21:39:41Z

@luolong: You’re completely right, that needs to be Value?.

As to the second point… it’s probably better, because my way allows people to break the typesafety by using two keys with different type arguments but the same key argument.

@gavinking: This way, you could never store two objects of the same type as extra info.

I’m not sure if that’s a disadvantage. It would probably encourage people to use more “wrapper” classes (Message(String text) instead of just String message), which could improve readability (and I’d guess the JVM optimizes these pretty well). It would break the RedHat Compiler-style Token token; Token endToken;, but that seems inadequate to me anyways, since many nodes have just one token, and others have more than two (IIRC e. g. the case types of an enumerated type), which is why I’d use Token[] instead. So this might work out.

gavinking · 2014-07-07T21:45:09Z

@gavinking: This way, you could never store two objects of the same type as extra info.

Sure. But that's a good thing.

lucaswerkmeister · 2014-07-07T21:47:06Z

that's a good thing.

Why?

gavinking · 2014-07-07T21:48:15Z

Because if you store two objects of the same type, how will you distinguish them, other than by using an untypesafe string name?

lucaswerkmeister · 2014-07-07T21:54:34Z

Hm, I’m beginning to see your point. I was thinking of them as identifiers, but of course these keys aren’t checked by the compiler, so it is a bit less safe.

But leaving that aside… can you actually implement the functions in Ceylon as you suggested them? I don’t think you can pass a type parameter as a key to an underlying map in Ceylon… (You could of course do it in Java, having access to the TypeDescriptor $reified$T, which is a regular object.)

lucaswerkmeister · 2014-07-07T21:57:45Z

can you actually implement the functions in Ceylon

Well, I suppose you could iterate over the values in the underlying map, and check for is T, but

that’s vulnerable to subtypes, and
it’s slow.

gavinking · 2014-07-07T22:34:03Z

But leaving that aside… can you actually implement the functions in Ceylon as you suggested them? I don’t think you can pass a type parameter as a key to an underlying map in Ceylon…

T getExtra<T>() given T satisfies Object {
    assert (is T info = map.get(`T`));
    return info;
}
void setExtra<T>(T info) => map.put(`T`, info);

lucaswerkmeister · 2014-07-08T08:31:12Z

Okay, T works… but it’s slow :(

Type param, String: 23846817953ns = 23846ms
Key, String: 1117633725ns = 1117ms
Type param, C: 15541791161ns = 15541ms
Key, C: 1167863307ns = 1167ms

https://gist.github.com/lucaswerkmeister/971f5e01b387b0ddbf56

(In short: “Type param” uses T, “Key” uses Key<T>("key"). “String” tests with String, “C” with a class C(String string) {}.)

gavinking · 2014-07-08T08:58:03Z

Or:

T? getExtra<T>() given T satisfies Object {
    for (info in list) {
        if (is T info) {
            return info;
        }
    }
    else {
        return null;
    }
}
void setExtra(Object info) => list.add(info);

gavinking · 2014-07-08T08:59:15Z

P.S. I'm pretty certain that this is not a highly performance-sensitive feature!

lucaswerkmeister · 2014-07-08T09:01:10Z

Depends… if someone builds a compiler using ceylon.ast, they’ll access things like the type a lot, no?

As to the list approach… as I wrote earlier, that’s

slow, and
vulnerable to subtypes (if someone wants to put both a Sup and a Sub in the list).

quintesse · 2014-07-08T09:17:00Z

Why the for (i in 1..5) { in that test code? That doesn't seem to make the different tests comparable, right? In any case you might take the

`T`

outside of the loop so it's obtained only once.

gavinking · 2014-07-08T09:22:16Z

As to the list approach… as I wrote earlier, that’s

slow, and

I highly doubt that. It would likely be faster than a hashmap most of the time. What, you have nodes with 100s of info objects on them? Not likely. You'll have one, maybe two.

gavinking · 2014-07-08T09:22:57Z

Premature optimization, etc, etc.

The three rules of optimization: you know the rest.

quintesse · 2014-07-08T09:30:18Z

I show the results on my machine for 1) the original code, 2) without the 1..5 for loops and 3) obtaining the Type for T only once in the method and 4) finally taking that completely outside the test loop and getting each type exactly once :

Type param, String: 3083866368ns = 3083ms
Key, String:        76544790ns = 76ms
Type param, C:      1567190447ns = 1567ms
Key, C:             76250620ns = 76ms


Type param, String: 1516916732ns = 1516ms
Key, String:        89309751ns = 89ms
Type param, C:      882192277ns = 882ms
Key, C:             50123139ns = 50ms


Type param, String: 1130417180ns = 1130ms
Key, String:        70965899ns = 70ms
Type param, C:      839865339ns = 839ms
Key, C:             71088539ns = 71ms


Type param, String: 68354966ns = 68ms
Key, String:        76031040ns = 76ms
Type param, C:      36245192ns = 36ms
Key, C:             70160213ns = 70ms

NB: seems like we could gain a lot with some clever caching somewhere, but it does underscore Gavin's point about premature optimization.

lucaswerkmeister · 2014-07-08T09:49:44Z

Tako: The 'T' optimization is invalid, because get() will have to construct the metamodel every time - otherwise, we might as well use a proper key.

IIRC I added the 1..5 to reflect that get() will most likely be used more than set().

Gavin: Well, this is an "optimization" that affects the API of Node - I can't really change it later.

----- Ursprüngliche Nachricht -----
Von: "Tako Schotanus" notifications@github.com
Gesendet: ‎08.‎07.‎2014 11:30
An: "ceylon/ceylon.ast" ceylon.ast@noreply.github.com
Cc: "Lucas Werkmeister" mail@lucaswerkmeister.de
Betreff: Re: [ceylon.ast] Attaching additional information to the AST (#17)

I show the results on my machine for 1) the original code, 2) without the 1..5 for loops and 3) obtaining the Type for T only once in the method and 4) finally taking that completely outside the test loop and getting each type exactly once :
Type param, String: 3083866368ns = 3083ms
Key, String: 76544790ns = 76ms
Type param, C: 1567190447ns = 1567ms
Key, C: 76250620ns = 76ms

Type param, String: 1516916732ns = 1516ms
Key, String: 89309751ns = 89ms
Type param, C: 882192277ns = 882ms
Key, C: 50123139ns = 50ms

Type param, String: 1130417180ns = 1130ms
Key, String: 70965899ns = 70ms
Type param, C: 839865339ns = 839ms
Key, C: 71088539ns = 71ms

Type param, String: 68354966ns = 68ms
Key, String: 76031040ns = 76ms
Type param, C: 36245192ns = 36ms
Key, C: 70160213ns = 70ms
NB: seems like we could gain a lot with some clever caching somewhere, but it does underscore Gavin's point about premature optimization.
—
Reply to this email directly or view it on GitHub.

quintesse · 2014-07-08T11:27:54Z

What I meant is that is that right now that might be really expensive, but with proper caching or other optimizations it might be improved, possibly a lot. @FroMage might be able to give some more info on that.

lucaswerkmeister · 2014-07-08T14:17:45Z

Sure, if the type descriptor could cache its metamodel, performance would improve a lot. It appears that all type meta expressions use the the (undocumented!) toplevel c.l.meta::typeLiteral function (and cast the result to the appropriate subtype of Type), so it might be possible to implement the caching there, without any changes to the compiler. @FroMage WDYT?

This way, if the extraInfo interface changes in the future (see discussion in #17), I’ll only need to update Node and not the dozens of copy() methods as well. Done with the following command: find source/ceylon/ast/core/ -name '[A-Z]*.ceylon' \ -exec \ sed -i \ -e 's|ret\.extraInfo = extraInfo|copyExtraInfoTo(ret)|' \ {} +

lucaswerkmeister · 2014-07-18T18:52:38Z

The type descriptor performance becomes (almost) irrelevant if we use key objects, by the way, because then we can trivially determine shared String id = T.string + "\0" + name in the initializer, and use that for the lookup (which ought to be really fast).

With key objects, we could also speed up the lookup even more by having a constant number of String key1; Object value1; String key2; Object value2; … attributes, to avoid hashmap lookup for the first few keys. Not sure if that’s worth it, though.

All in all, I’m still in favour of this solution.

Depending on whether a Bound is used as a lower or an upper bound, its token (< / <=) appears before or after the endpoint, which affects the correct token order. We now record the information whether it’s a lower or upper bound in the node’s extra information (#17) when transforming the WithinOperation, and then use that when transforming the Bound itself.

get() and remove() now take a covariant key. put() can’t be variant since it also returns the previous value, so a new set() is added that takes a contravariant key. CC #17. This can be used to solve the problem from this comment: #17 (comment) Key<out Anything> or Key<in Nothing> are supertypes of any key type.

Removes additional information for some given keys from an AST. Uses use-site variance to abstract over keys of multiple types (CC #17). RemoveExtraInfoVisitor was first proposed in #47, and later in #51.

Both document additional information (#17), but they differed slightly.

This removes thisInstance, superInstance, outerInstance and packageInstance. They should never have made it into the 1.1.0 release in the first place; the assumption that they are stateless was invalidated by the introduction of extra info on nodes (#17). If compilers and other tools attach information to these nodes, it’s actually very dangerous to reuse them, since that information will unexpectedly be shared between instances where the information should be different (worse, neither put() nor set() warn you about this).

lucaswerkmeister added feature labels May 21, 2014

lucaswerkmeister self-assigned this May 21, 2014

lucaswerkmeister added this to the Postponed milestone Jun 20, 2014

lucaswerkmeister closed this as completed in 9cd90af Jun 20, 2014

lucaswerkmeister closed this as completed in c769e2f Jul 26, 2014

lucaswerkmeister mentioned this issue Jul 29, 2014

Attach information to nodes converted from ceylon.ast.redhat #47

Closed

lucaswerkmeister mentioned this issue Oct 14, 2014

AST transformers ceylon/ceylon-compiler#1865

Open

lucaswerkmeister added a commit that referenced this issue Oct 16, 2014

Sync documentation of Node and ceylon.ast.core

b9396b5

Both document additional information (#17), but they differed slightly.

CeylonMigrationBot mentioned this issue Nov 14, 2015

AST transformers eclipse-archived/ceylon#1865

Open

Attaching additional information to the AST #17

Attaching additional information to the AST #17

Comments

lucaswerkmeister commented May 21, 2014

lucaswerkmeister commented May 21, 2014

lucaswerkmeister commented Jun 19, 2014

FroMage commented Jun 19, 2014

lucaswerkmeister commented Jun 20, 2014

lucaswerkmeister commented Jun 20, 2014

lucaswerkmeister commented Jun 20, 2014

gavinking commented Jun 20, 2014

lucaswerkmeister commented Jun 20, 2014

gavinking commented Jun 20, 2014

lucaswerkmeister commented Jun 20, 2014

gavinking commented Jun 20, 2014

lucaswerkmeister commented Jun 20, 2014

gavinking commented Jun 20, 2014

lucaswerkmeister commented Jun 20, 2014

gavinking commented Jun 20, 2014

lucaswerkmeister commented Jun 20, 2014

FroMage commented Jun 20, 2014

lucaswerkmeister commented Jun 20, 2014

FroMage commented Jun 20, 2014

lucaswerkmeister commented Jun 20, 2014

FroMage commented Jun 20, 2014

gavinking commented Jun 20, 2014

lucaswerkmeister commented Jun 20, 2014

FroMage commented Jun 20, 2014

gavinking commented Jul 7, 2014

lucaswerkmeister commented Jul 7, 2014

gavinking commented Jul 7, 2014

lucaswerkmeister commented Jul 7, 2014

gavinking commented Jul 7, 2014

lucaswerkmeister commented Jul 7, 2014

lucaswerkmeister commented Jul 7, 2014

gavinking commented Jul 7, 2014

lucaswerkmeister commented Jul 8, 2014

gavinking commented Jul 8, 2014

gavinking commented Jul 8, 2014

lucaswerkmeister commented Jul 8, 2014

quintesse commented Jul 8, 2014

gavinking commented Jul 8, 2014

gavinking commented Jul 8, 2014

quintesse commented Jul 8, 2014

lucaswerkmeister commented Jul 8, 2014

quintesse commented Jul 8, 2014

lucaswerkmeister commented Jul 8, 2014

lucaswerkmeister commented Jul 18, 2014