Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attaching additional information to the AST #17

Closed
lucaswerkmeister opened this issue May 21, 2014 · 51 comments
Closed

Attaching additional information to the AST #17

lucaswerkmeister opened this issue May 21, 2014 · 51 comments

Comments

@lucaswerkmeister
Copy link
Member

I wonder if it would be useful to provide a place in the AST nodes where other tools / modules / programs can attach additional information. For example:

shared abstract class Node<ExtraInfo=Anything>() {
    // ...
    shared variable ExtraInfo? extraInfo = null;
    // or
    shared variable {ExtraInfo*} extraInfo = {};
}

For example, a Ceylon compiler modeled after the current one might use

Node<[
    Token, // startToken
    Token, // endToken
    {Message*}, // errors, warnings
    Scope, // scope
    Unit // unit
]>

With ceylon/ceylon-spec#594, we could even have:

shared abstract class Node<ExtraInfo=Anything>(extraInfo = null) {
    // ...
    shared variable ExtraInfo extraInfo;
}

With this, you could make extraInfo non-optional as long as you provide a valid initial value.

@lucaswerkmeister
Copy link
Member Author

The conventional way to do this is the Decorator pattern: You define a class MyNode() extends Node() { shared variable ExtraInfo extraInfo; }. However, that gives you problems:

  • Your MyExpression can’t extend both Expression and MyNode (this is exactly the problem we had in Inheritance #6)
  • As the AST attributes are all non-default (we don’t want to allow anyone to make them variable), you couldn’t refine e. g. {Statement*} Block.statements to {MyStatement*} (which would in principle be sound, being immutable – Mutability? #2 – all attributes are out only).

But surely, these problems aren’t unique to ceylon.ast – there must be standard ways and patterns to deal with them?

@lucaswerkmeister
Copy link
Member Author

Maybe the people who worked on the current compiler can say if this would be useful? @gavinking @FroMage @tombentley @quintesse @thradec?

@FroMage
Copy link
Member

FroMage commented Jun 19, 2014

This is definitly a great idea. We've wanted to attach custom info for a while on both AST and model nodes.

@lucaswerkmeister
Copy link
Member Author

Thanks for the feedback!

I’m adding this right now, and there’s one thing I didn’t think about: For children, the type Node[] is no longer valid. Which one makes more sense – Node<ExtraInfo>[] or Node<Anything>[]? I’m using the first variant right now, but I feel like I might be imposing some unnecessary restriction with this.

@lucaswerkmeister
Copy link
Member Author

Ouch. Without ceylon/ceylon-spec#791, this makes the AST very annoying to use when you don’t want additional info, because I can’t provide a default ExtraInfo=Anything, so the user has to write Expression<Anything> every time. I’m not doing that.

Postponed :(

@lucaswerkmeister lucaswerkmeister added this to the Postponed milestone Jun 20, 2014
@lucaswerkmeister
Copy link
Member Author

In fact, I don’t think I would add this even with Expression<>, because that still feels weird to use. I want Expression to use the default type arg. But this might be impossible in the general case.

I suppose we could have type-unsafe Anything extraInfo instead :(
But since that’s trivially added (doesn’t need an update in every node type), I won’t do that for now.

@gavinking
Copy link
Member

What is an ExtraInfo? What operations does it have?

@lucaswerkmeister
Copy link
Member Author

I don’t care, I simply give you a place where you can put an ExtraInfo if you want to. Maybe your parser attaches lexer tokens to the nodes (ExtraInfo is Token), or your compiler attaches typing info and errors to the nodes (ExtraInfo is [Type, Error[]]). I don’t do anything with it, so as far as I’m concerned, it doesn’t need to have any operations, and with the default – Anything – it wouldn’t have any.

@gavinking
Copy link
Member

So if you don't care, why impose a constraint? What's wrong with Anything?

@lucaswerkmeister
Copy link
Member Author

What do you mean? I’m not imposing a constraint.

The problem is that even if I declare Node<ExtraInfo=Anything>, if people create a Node() it’s a Node<Nothing>, not a Node<Anything>, because of ceylon/ceylon-spec#791.

OTOH… if I make Node<ExtraInfo=Nothing>, then Node n = Node() is well-typed again. Nothing is a less useful default than Anything, but I suppose if people want to add info to the AST, they should declare it instead of using a default Anything. So I could do that.

@gavinking
Copy link
Member

But what on earth is the point of the type parameter? You are either imposing a constraint: that every Node have the same kind if ExtraInfo, or you're adding a totally useless type parameter that does not make the code any more type safe. Perhaps I'm missing something.

@lucaswerkmeister
Copy link
Member Author

So you think I should just have Anything extraInfo instead? But then, people who want to use it have to do assert (is Type type = node.extraInfo) all the time. I’d rather let them have a Node<Type> (they would probably import ceylon.ast { TheNode=Node} and alias Node => TheNode<Type>), and then they can use node.extraInfo directly.

It doesn’t make my code any more type safe, but it makes my user’s code more type safe. At least, that’s the intention.

(But I’m not sure if the constraint that all Nodes in an AST have the same kind of ExtraInfo is justified.)

@gavinking
Copy link
Member

But then, people who want to use it have to do assert (is Type type = node.extraInfo) all the time. I’d rather let them have a Node<Type> (they would probably import ceylon.ast { TheNode=Node} and alias Node => TheNode<Type>), and then they can use node.extraInfo directly.

This sounds bogus to me. Where would they get a Node<Type> from? You would just be swapping:

 assert (is Type extraInfo = node.extraInfo);

For

 assert (is Node<Type> type = node);
 value extraInfo = type.extraInfo;

The second is more verbose and has worse performance.

It's not like the whole tree is going to have the same kind of ExtraInfo. It's going to depend on the node type.

@lucaswerkmeister
Copy link
Member Author

It's not like the whole tree is going to have the same kind of ExtraInfo.

That was the intention. In the examples I’m thinking of, all nodes would have the same ExtraInfo.

However, I just noticed another problem when I tried to add extraInfo with ExtraInfo=Nothing:

shared actual Boolean equals(Object that) {
    if (is CurrentType<What?> that) { // <--
        return attr == that.attr /* && etc. */;
    } else {
        return false;
    }
}

equals is supposed to ignore extraInfo, but because ExtraType has to variance, I can’t test if that is CurrentType without also testing its type parameter.

Looks like this just can’t be done.

@gavinking
Copy link
Member

That was the intention. In the examples I’m thinking of, all nodes would have the same ExtraInfo.

But I think that's just extremely unlikely. Why would a Class node have the same kind of extra info as an AssignmentExpression node?

@lucaswerkmeister
Copy link
Member Author

Hm, maybe you’re right. In that case, extraInfo would just be Anything or perhaps {Anything*}.

(This would also solve the equals problem from above.)

@FroMage
Copy link
Member

FroMage commented Jun 20, 2014

In our case most of the nodes would have the same info type. Like boxing, erasure, etc, so it's not crazy at all.

@lucaswerkmeister
Copy link
Member Author

Okay, but I still can’t implement equals correctly if I have a type parameter, right? At least I don’t see how I test for is CurrentType without involving the type argument. (I mean, I can use type() to determine if the type is right, but that doesn’t narrow the type for me like I need it.)

@FroMage
Copy link
Member

FroMage commented Jun 20, 2014

if(exists extraInfo) should be enough, since that will make it Object&CurrentType.

@lucaswerkmeister
Copy link
Member Author

No, I don’t care about my extraInfo. The problem is different.

This is the current equals of QualifiedType:

shared actual Boolean equals(Object that) {
    if (is QualifiedType that) {
        return nameAndArgs == that.nameAndArgs && qualifyingType == that.qualifyingType;
    } else {
        return false;
    }
}

If QualifiedType has a type parameter, but I want to ignore it in equals – what should the second line be? Because the type parameter is defaulted, as it is it would test is QualifiedType<Nothing>, and because the type parameter has to be invariant (type of a variable attribute), that’s not true for, e. g., a QualifiedType<Token>. That’s the problem I have right now.

@FroMage
Copy link
Member

FroMage commented Jun 20, 2014

Ah, that's a good question for @gavinking ;)

@gavinking
Copy link
Member

  • If the generic type is covariant, the upper bound or Anything
  • if the generic type is contravariant, Nothing
  • if invariant, you're screwed

@lucaswerkmeister
Copy link
Member Author

Okay, dang. That rules out typesafe extraInfo.

I guess I could be happy about that, since the unsafe version is much easier to implement ;)

The only remaining question: Anything or Anything[]?

@FroMage
Copy link
Member

FroMage commented Jun 20, 2014

Anything.

@gavinking
Copy link
Member

Well I'm not sure what you're using the Key for. I'm just using the class as its own key.

@lucaswerkmeister
Copy link
Member Author

@luolong: You’re completely right, that needs to be Value?.

As to the second point… it’s probably better, because my way allows people to break the typesafety by using two keys with different type arguments but the same key argument.

@gavinking: This way, you could never store two objects of the same type as extra info.

I’m not sure if that’s a disadvantage. It would probably encourage people to use more “wrapper” classes (Message(String text) instead of just String message), which could improve readability (and I’d guess the JVM optimizes these pretty well). It would break the RedHat Compiler-style Token token; Token endToken;, but that seems inadequate to me anyways, since many nodes have just one token, and others have more than two (IIRC e. g. the case types of an enumerated type), which is why I’d use Token[] instead. So this might work out.

@gavinking
Copy link
Member

@gavinking: This way, you could never store two objects of the same type as extra info.

Sure. But that's a good thing.

@lucaswerkmeister
Copy link
Member Author

that's a good thing.

Why?

@gavinking
Copy link
Member

Because if you store two objects of the same type, how will you distinguish them, other than by using an untypesafe string name?

@lucaswerkmeister
Copy link
Member Author

Hm, I’m beginning to see your point. I was thinking of them as identifiers, but of course these keys aren’t checked by the compiler, so it is a bit less safe.

But leaving that aside… can you actually implement the functions in Ceylon as you suggested them? I don’t think you can pass a type parameter as a key to an underlying map in Ceylon… (You could of course do it in Java, having access to the TypeDescriptor $reified$T, which is a regular object.)

@lucaswerkmeister
Copy link
Member Author

can you actually implement the functions in Ceylon

Well, I suppose you could iterate over the values in the underlying map, and check for is T, but

  1. that’s vulnerable to subtypes, and
  2. it’s slow.

@gavinking
Copy link
Member

But leaving that aside… can you actually implement the functions in Ceylon as you suggested them? I don’t think you can pass a type parameter as a key to an underlying map in Ceylon…

T getExtra<T>() given T satisfies Object {
    assert (is T info = map.get(`T`));
    return info;
}
void setExtra<T>(T info) => map.put(`T`, info);

@lucaswerkmeister
Copy link
Member Author

Okay, T works… but it’s slow :(

Type param, String: 23846817953ns = 23846ms
Key, String: 1117633725ns = 1117ms
Type param, C: 15541791161ns = 15541ms
Key, C: 1167863307ns = 1167ms

https://gist.github.com/lucaswerkmeister/971f5e01b387b0ddbf56

(In short: “Type param” uses T, “Key” uses Key<T>("key"). “String” tests with String, “C” with a class C(String string) {}.)

@gavinking
Copy link
Member

Or:

T? getExtra<T>() given T satisfies Object {
    for (info in list) {
        if (is T info) {
            return info;
        }
    }
    else {
        return null;
    }
}
void setExtra(Object info) => list.add(info);

@gavinking
Copy link
Member

P.S. I'm pretty certain that this is not a highly performance-sensitive feature!

@lucaswerkmeister
Copy link
Member Author

Depends… if someone builds a compiler using ceylon.ast, they’ll access things like the type a lot, no?

As to the list approach… as I wrote earlier, that’s

  1. slow, and
  2. vulnerable to subtypes (if someone wants to put both a Sup and a Sub in the list).

@quintesse
Copy link
Member

Why the for (i in 1..5) { in that test code? That doesn't seem to make the different tests comparable, right? In any case you might take the

`T`

outside of the loop so it's obtained only once.

@gavinking
Copy link
Member

As to the list approach… as I wrote earlier, that’s

  1. slow, and

I highly doubt that. It would likely be faster than a hashmap most of the time. What, you have nodes with 100s of info objects on them? Not likely. You'll have one, maybe two.

@gavinking
Copy link
Member

Premature optimization, etc, etc.

The three rules of optimization: you know the rest.

@quintesse
Copy link
Member

I show the results on my machine for 1) the original code, 2) without the 1..5 for loops and 3) obtaining the Type for T only once in the method and 4) finally taking that completely outside the test loop and getting each type exactly once :

Type param, String: 3083866368ns = 3083ms
Key, String:        76544790ns = 76ms
Type param, C:      1567190447ns = 1567ms
Key, C:             76250620ns = 76ms


Type param, String: 1516916732ns = 1516ms
Key, String:        89309751ns = 89ms
Type param, C:      882192277ns = 882ms
Key, C:             50123139ns = 50ms


Type param, String: 1130417180ns = 1130ms
Key, String:        70965899ns = 70ms
Type param, C:      839865339ns = 839ms
Key, C:             71088539ns = 71ms


Type param, String: 68354966ns = 68ms
Key, String:        76031040ns = 76ms
Type param, C:      36245192ns = 36ms
Key, C:             70160213ns = 70ms

NB: seems like we could gain a lot with some clever caching somewhere, but it does underscore Gavin's point about premature optimization.

@lucaswerkmeister
Copy link
Member Author

Tako: The 'T' optimization is invalid, because get() will have to construct the metamodel every time - otherwise, we might as well use a proper key.

IIRC I added the 1..5 to reflect that get() will most likely be used more than set().

Gavin: Well, this is an "optimization" that affects the API of Node - I can't really change it later.

----- Ursprüngliche Nachricht -----
Von: "Tako Schotanus" notifications@github.com
Gesendet: ‎08.‎07.‎2014 11:30
An: "ceylon/ceylon.ast" ceylon.ast@noreply.github.com
Cc: "Lucas Werkmeister" mail@lucaswerkmeister.de
Betreff: Re: [ceylon.ast] Attaching additional information to the AST (#17)

I show the results on my machine for 1) the original code, 2) without the 1..5 for loops and 3) obtaining the Type for T only once in the method and 4) finally taking that completely outside the test loop and getting each type exactly once :
Type param, String: 3083866368ns = 3083ms
Key, String: 76544790ns = 76ms
Type param, C: 1567190447ns = 1567ms
Key, C: 76250620ns = 76ms

Type param, String: 1516916732ns = 1516ms
Key, String: 89309751ns = 89ms
Type param, C: 882192277ns = 882ms
Key, C: 50123139ns = 50ms

Type param, String: 1130417180ns = 1130ms
Key, String: 70965899ns = 70ms
Type param, C: 839865339ns = 839ms
Key, C: 71088539ns = 71ms

Type param, String: 68354966ns = 68ms
Key, String: 76031040ns = 76ms
Type param, C: 36245192ns = 36ms
Key, C: 70160213ns = 70ms
NB: seems like we could gain a lot with some clever caching somewhere, but it does underscore Gavin's point about premature optimization.

Reply to this email directly or view it on GitHub.

@quintesse
Copy link
Member

What I meant is that is that right now that might be really expensive, but with proper caching or other optimizations it might be improved, possibly a lot. @FroMage might be able to give some more info on that.

@lucaswerkmeister
Copy link
Member Author

Sure, if the type descriptor could cache its metamodel, performance would improve a lot. It appears that all type meta expressions use the the (undocumented!) toplevel c.l.meta::typeLiteral function (and cast the result to the appropriate subtype of Type), so it might be possible to implement the caching there, without any changes to the compiler. @FroMage WDYT?

lucaswerkmeister added a commit that referenced this issue Jul 10, 2014
This way, if the extraInfo interface changes in the future (see
discussion in #17), I’ll only need to update Node and not the dozens of
copy() methods as well.

Done with the following command:

    find source/ceylon/ast/core/ -name '[A-Z]*.ceylon' \
        -exec \
            sed -i \
            -e 's|ret\.extraInfo = extraInfo|copyExtraInfoTo(ret)|' \
        {} +
@lucaswerkmeister
Copy link
Member Author

The type descriptor performance becomes (almost) irrelevant if we use key objects, by the way, because then we can trivially determine shared String id = T.string + "\0" + name in the initializer, and use that for the lookup (which ought to be really fast).

With key objects, we could also speed up the lookup even more by having a constant number of String key1; Object value1; String key2; Object value2; … attributes, to avoid hashmap lookup for the first few keys. Not sure if that’s worth it, though.

All in all, I’m still in favour of this solution.

lucaswerkmeister added a commit that referenced this issue Jul 26, 2014
Depending on whether a Bound is used as a lower or an upper bound,
its token (< / <=) appears before or after the endpoint, which affects
the correct token order. We now record the information whether it’s a
lower or upper bound in the node’s extra information (#17) when
transforming the WithinOperation, and then use that when transforming
the Bound itself.
lucaswerkmeister added a commit that referenced this issue Oct 15, 2014
get() and remove() now take a covariant key. put() can’t be variant
since it also returns the previous value, so a new set() is added that
takes a contravariant key.

CC #17. This can be used to solve the problem from this comment:
#17 (comment)
Key<out Anything> or Key<in Nothing> are supertypes of any key type.
lucaswerkmeister added a commit that referenced this issue Oct 15, 2014
Removes additional information for some given keys from an AST. Uses
use-site variance to abstract over keys of multiple types (CC #17).

RemoveExtraInfoVisitor was first proposed in #47, and later in #51.
lucaswerkmeister added a commit that referenced this issue Oct 16, 2014
Both document additional information (#17), but they differed slightly.
lucaswerkmeister added a commit that referenced this issue Jul 16, 2015
This removes thisInstance, superInstance, outerInstance and
packageInstance. They should never have made it into the 1.1.0 release
in the first place; the assumption that they are stateless was
invalidated by the introduction of extra info on nodes (#17). If
compilers and other tools attach information to these nodes, it’s
actually very dangerous to reuse them, since that information will
unexpectedly be shared between instances where the information should be
different (worse, neither put() nor set() warn you about this).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants