Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expression and type grouping – () and <> #13

Closed
lucaswerkmeister opened this issue May 7, 2014 · 32 comments
Closed

Expression and type grouping – () and <> #13

lucaswerkmeister opened this issue May 7, 2014 · 32 comments

Comments

@lucaswerkmeister
Copy link
Member

In Ceylon, there are two kinds of syntactical grouping:

value i = (1 + 2) * 3; // expression grouping – 1 + 2 * 3 has a different evaluation order
{<Key->Item>*} dict; // type grouping – {Key->Item*} isn’t syntactically legal

(I made up the names, but that doesn’t matter right now.)

In the Ceylon compiler, these are handled differently:

  • For expression grouping, there’s the rule and class expression, which is a wrapper around a Term with optional parentheses. All the expression classes – InvocationExpression, BinaryOperatorExpression, etc. – extend Term, not Expression. An Expression is a Term. Grammar rules often look like this:
index returns [Expression expression]
    : additiveExpression
      { $expression = new Expression(null);
        $expression.setTerm($additiveExpression.term); }
    ;
  • For type grouping, there is the rule groupedType, but no corresponding class. The < and > tokens are discarded – the AST offers no way to discover if a type was grouped or not. (This caused me some headache in lucaswerkmeister/ceylon.formatter#23.)

How do we handle these in ceylon.ast?

@lucaswerkmeister
Copy link
Member Author

Another possibility would be to have Boolean Type.isGrouped, Boolean Expression.hasParentheses (or Term.hasParentheses).

@lucaswerkmeister
Copy link
Member Author

Boolean Type.isGrouped

No, nonsense, that doesn’t work with <<Type>>. We’ll need at least Integer Type.groupingLevel, Integer Expression.numberParentheses (names may vary).

@lucaswerkmeister
Copy link
Member Author

Excerpt from the official Java 7 grammar:

Primary: 
    Literal
    ParExpression
    […]
[…]
ParExpression: 
    ( Expression )

(Expression is complicated, but boils down to, among other things, containing a Primary or more.)

@lucaswerkmeister
Copy link
Member Author

@gavinking what do you think? Should I adopt the compiler’s Term/Expression relationship (possibly with better names), add an Integer Expression.numberOfParentheses, or something else?

@gavinking
Copy link
Member

IIRC the reason for the split b/w Expression and Term was something to do with not having multiple inheritance in my node subtype hierarchy. But looking at the code now, I don't really fully understand why it's necessary anymore. Perhaps the distinction could be removed.

The reason we don't have a ParenthesizedExpression node is just that neither the typechecker nor backends care. In your case, I think you do care, and it would make sense to have one.

@lucaswerkmeister
Copy link
Member Author

we don't have a ParenthesizedExpression

I thought that Expression was essentially ParenthesizedExpression (Term being what’s usually called Expression). It’s a subclass of Term that wraps a Term in optional parentheses.

@lucaswerkmeister
Copy link
Member Author

You see lots of stuff like this in the grammar:

spreadArgument returns [SpreadArgument positionalArgument]
    : PRODUCT_OP
      { $positionalArgument = new SpreadArgument($PRODUCT_OP); }
      unionExpression
      { Expression e = new Expression(null);    // <--
        e.setTerm($unionExpression.term);       // <--
        $positionalArgument.setExpression(e); } // why not $positionalArgument.setTerm($unionExpression.term); ?
    ;

@lucaswerkmeister
Copy link
Member Author

The grammar contains this hierarchy of expression rules:

expression = assignmentExpression > thenElseExpression > disjunctionExpression > conjunctionExpression > logicalNegationExpression > equalityExpression > comparisonExpression > existenceEmptinessExpression > entryRangeExpression > additiveExpression > scaleExpression > multiplicativeExpression > unionExpression > intersectionExpression > negationComplementExpression > exponentiationExpression > incrementDecrementExpression > postfixIncrementDecrementExpression > primary > base > (parExpression, others)

If we adopted this structure in the AST, we would solve both this issue and #18. I’m thinking something like this:

...
abstract class Expression10 of SumExpression|DifferenceExpression|Expression11
abstract class Expression11 of ScaleExpression|Expression12
...
class SumExpression(Expression10 left, Expression11 right)
class ScaleExpression(Expression12 left, Expression11 right)
...

or

...
abstract class AdditiveExpression of SumExpression|DifferenceExpression|ScalingExpression
abstract class ScalingExpression of ScaleExpression|MultiplicativeExpression
...
class SumExpression(AdditiveExpression left, ScalingExpression right)
class ScaleExpression(MultiplicativeExpression left, ScaleExpression right)
...

Conceptually, I like this approach, but I’m not sure about the naming of either of the variants above: I don’t like the many numbers of the first, but in the second the ScalingExpression extends AdditiveExpression sounds wrong, and the naming gets tough for levels which contain only one type (like ScaleExpression; in the grammar, you don’t have the abstract rule, so this problem doesn’t occur there).

@lucaswerkmeister
Copy link
Member Author

I guess the numbers in Expression10 et al. wouldn’t be too bad because users would rarely actually see them – you would just write

SumExpression {
    ProductExpression {
        NaturalLiteral(1);
        NaturalLiteral(2);
    };
    NaturalLiteral(3);
}

and that would be well-typed. If you swapped SumExpression and ProductExpression, you would get a typing error; it would be awesome if we could actually influence the compiler’s error message to include a hint to ParExpression, but that’s not going to happen. Still, with decent documentation on all Expression classes, I could live with Expression10.

@lucaswerkmeister
Copy link
Member Author

Of course, Expression10 etc. don’t necessarily need to be classes (as there isn’t a lot of shared meaning to them); they could also be aliases.

@lucaswerkmeister
Copy link
Member Author

Yeah, I’ll definitely go with Expression10 etc., as aliases. I’m still not sure about the naming though. The names Base and Primary are good enough that I’d hate to lose them, but they’re of course inconsistent with Expression10, so I might use AdditiveExpression instead, even though the naming becomes awkward for levels with just one expression type.

@lucaswerkmeister
Copy link
Member Author

I’m going to start with the types first, though, because their hierarchy is simpler (and I need them for expressions anyway).

Draft for the names:

  • Type = UnionType|IntersectionType|PrimaryType|EntryType
  • UnionType([<IntersectionType|PrimaryType>+] children)
  • IntersectionType([PrimaryType+] children)
  • PrimaryType = SimpleType|TupleType|IterableType|GroupedType|OptionalType|SequenceType|CallableType
  • OptionalType(PrimaryType definiteType)
  • SequentialType(PrimaryType elementType)
  • CallableType(PrimaryType returnType, TypeList argumentTypes)
  • TupleType(TypeList)
  • IterableType(VariadicType? elementType)
  • SimpleType(TypeNameWithArguments nameAndArgs)
  • QualifiedType(SimpleType|GroupedType qualifyingType, TypeNameWithArguments nameAndArgs)
  • BaseType(TypeNameWithArguments nameAndArgs)
  • TypeNameWithArguments(TypeName name, TypeArguments? arguments) does not extend Type
  • TypeList(<Type|DefaultedType>[] elements, VariadicType? variadic)
  • TypeArguments = [Type+] (does this need to be a class instead of an alias?)
  • EntryType(UnionType|IntersectionType|PrimaryType left, UnionType|IntersectionType|PrimaryType right)
  • GroupedType(Type type)
  • VariadicType(UnionedType type, Boolean isNonempty)
  • DefaultedType(Type type)

@lucaswerkmeister
Copy link
Member Author

Oh, and ceylon/ceylon-spec#852.

@lucaswerkmeister
Copy link
Member Author

Now that both { … } and [ … ] can contain a SomeType*, a SomeType+, or not, we should find some better solution than [UnionedType, Boolean /* isNonempty */]?, and use it in both IterableType and TypeList.

@lucaswerkmeister
Copy link
Member Author

I’m done with the type hierarchy! (Not pushed yet because of compiler bugs.)

So now I get to start with the actual expressions.

One thing to note is that I ended up making the Type equivalents to Expression10 etc. actual types (MainType, UnionableType). This is because real types just have better support in the general API: If MainType a type, every Transformer can use transformMainType; if MainType is an alias for a union type, then every Transformer has to switch on the value itself.

@lucaswerkmeister
Copy link
Member Author

I suppose I could add Transformer methods for aliases as well… but it’s conceptually ugly, not only because they’re not reachable from any node type’s transform, but also because they introduce an asymmetry: since aliases have no supertypes, transformSomeAlias would always narrow the type (the implementation would be in Transformer, not in NarrowingTransformer).

I also don’t know how useful it is that people can override transformSomeAlias; what would you want to do with it?

@lucaswerkmeister
Copy link
Member Author

(And since Transformer.transformSomeAlias would have several lines, it would probably break my source-gen – although this might be a good thing, because if I adapt my source-gen to handle this, then I can also use it for Editor and RedHatTransformer.)

@lucaswerkmeister
Copy link
Member Author

WRT transformSomeAlias: OTOH, these might be mostly useless.

  • if your Transformer is like Editor and RedHatTransformer in that it uses many different Result types, then it has to refine transformSomeAlias again anyways to change the return type (assert (is ReturnType ret = super.transformSomeAlias(that)); return ret;).
  • if your Transformer is like CeylonExpressionTransformer with only one Result type, then you can just use that.transform(this) instead of transformSomeAlias(that).

@lucaswerkmeister
Copy link
Member Author

Wait, no, this would still require every user to either

  • create the transformSomeAlias once in their subclass of Transformer, or
  • switch on the instance at every use in their Transformer.

Also, for the expression hierarchy the aliases would be pretty long (when fully expanded), so the second variant is completely unfeasible.

@lucaswerkmeister
Copy link
Member Author

I might just be stupid, but I can’t think of any cases where you’d call transformSomeAlias on a WideningTransformer… so it might be okay if we added transformSomeAlias only in NarrowingTransformer.

@lucaswerkmeister
Copy link
Member Author

Okay, I think the plan now is:

  • Expression10 etc. are aliases
  • I’ll try my best to find good names for them
  • NarrowingTransformer has methods for them that switch on the case types
  • I added --alias and --abstract generation to source-gen to facilitate adding these things

@lucaswerkmeister
Copy link
Member Author

(I’m working on the hierarchy in https://gist.github.com/lucaswerkmeister/64a59e091b70b36469d1)

@lucaswerkmeister
Copy link
Member Author

So, back to the names of Expression10 et al.… I’ve now come up with Precedence10Operation. To me, this name makes clear that it’s an auxiliary construct to represent operator precedence in the type system, and seeing the level as a number (easily comparable) might occasionally be useful. More importantly, I don’t think I could make SummableExpression or SummingExpression work, because some associativity levels have multiple operators (unary +-, */%, == != ===, etc.).

I’m not completely sure if it should be Precedence10Operation (only for operations, i. e., operator expressions) or Precedence10Expression. Do other expressions also have associativity that needs to be represented like that?
EDIT: I think Precedence10Operation will be enough. All other compound expressions are either subtypes of Primary or can only be combined with other expressions via GroupedExpression.

(Edited again because I confused precedence and associativity.)

@lucaswerkmeister
Copy link
Member Author

I think Precedence10Operation will be enough.

No, I think -Expression is better. A Precedence1Expression can also be a Primary (not an Operation), so Operation would be misleading. (For the same reason, I’ll write “the base expression” and not “the base operation” in the documentation; the type name should be consistent with that.)

lucaswerkmeister added a commit that referenced this issue Jul 12, 2014
and aliases for operator precedence (see #13).

ExponentiationOperation includes a really ugly workaround for a bug that
may or may not be ceylon/ceylon-compiler#1728; I’ll investigate that
some more.
@lucaswerkmeister
Copy link
Member Author

Well, the name pattern PrecedenceXExpression is really shitty when another precedence level needs to be inserted. I just noticed this for “operator-style expressions” (I’ll need to insert something for them between what’s now levels 16 and 17) – if this ever happens after the initial release (for example, for if expressions, or perhaps the function composition operator), we potentially break binary compatibility hard.

I’ll try to see if I can come up with good non-numerical names once more.

@lucaswerkmeister
Copy link
Member Author

  • Precedence1Expression = ExponentiableExpression
  • Precedence2Expression = NegatableExpression
  • Precedence3Expression = IntersectableExpression
  • Precedence4Expression = UnionableExpression
  • Precedence5Expression = MultipliableExpression
  • Precedence6Expression = ScalableExpression
  • Precedence7Expression = SummableExpression
  • Precedence8Expression = SpannableExpression
  • Precedence9Expression = ExistsableExpression?
  • Precedence10Expression = ComparableExpression
  • Precedence11Expression = EquatableExpression
  • Precedence12Expression = NottableExpression? (Well certainly not NotableExpression!)
  • Precedence13Expression = AndableExpression
  • Precedence14Expression = OrableExpression
  • Precedence15Expression = ThenableExpression
  • Precedence16Expression = AssignableExpression
  • Precedence17Expression = AssignedExpression?

As I wrote these down, I realized that

  1. some of these names, like ComparableExpression and SummableExpression, might me misunderstood to refer to the type of the expression, rather than its precedence; and
  2. this is actually only a bit better than the number names – if I want to insert something between level 16 and 17, I still have to rename AssignableExpression (because it’s no longer the direct child type of AssignmentOperation).

So let’s try the -ive route next, where the level is based on its own types rather than the types of the next level.

@lucaswerkmeister
Copy link
Member Author

Hm, -ive doesn’t seem to work that well most of the time. Perhaps combined with -ed?

  • Precedence1Expression = PrePostfixedExpression
  • Precedence2Expression = ExponentiatedExpression
  • Precedence3Expression = NegatedExpression
  • Precedence4Expression = IntersectedExpression
  • Precedence5Expression = UnionedExpression
  • Precedence6Expression = MultiplicativeExpression
  • Precedence7Expression = ScaledExpression
  • Precedence8Expression = AdditiveExpression
  • Precedence9Expression = SpannedExpression
  • Precedence10Expression = ExistsedExpression? PostfixTestedExpression?
  • Precedence11Expression = ComparedExpression
  • Precedence12Expression = EquatedExpression
  • Precedence13Expression = NottedExpression?
  • Precedence14Expression = AndedExpression? ConjunctiveExpression?
  • Precedence15Expression = OredExpression? DisjuntiveExpression?
  • Precedence16Expression = ThenElsedExpression?
  • Precedence17Expression = AssignedExpression

@lucaswerkmeister
Copy link
Member Author

I really just need to stop pretending that it’s possible to shoehorn all these operations into one naming scheme. Let’s keep the “name based on own expression” scheme, but drop all grammar schemes, and see where that gets us:

  • Precedence1Expression = PrePostfixedExpression
  • Precedence2Expression = ExponentiatedExpression
  • Precedence3Expression = NegatedExpression
  • Precedence4Expression = IntersectedExpression
  • Precedence5Expression = UnionedExpression
  • Precedence6Expression = MultiplicativeExpression
  • Precedence7Expression = ScaledExpression
  • Precedence8Expression = AdditiveExpression
  • Precedence9Expression = SpannedExpression
  • Precedence10Expression = ExistsNonemptyExpression
  • Precedence11Expression = ComparedExpression
  • Precedence12Expression = EquatedExpression
  • Precedence13Expression = NottedExpression?
  • Precedence14Expression = ConjunctiveExpression
  • Precedence15Expression = DisjuntiveExpression
  • Precedence16Expression = ThenElseExpression
  • Precedence17Expression = AssignedExpression

(Actually, most names seem fine with -ed – but there are a few where it’s just abominable to enforce it. So I changed only those.)

@lucaswerkmeister
Copy link
Member Author

Or perhaps -ing instead of -ed:

  • Precedence1Expression = PrePostfixingExpression
  • Precedence2Expression = ExponentiatingExpression
  • Precedence3Expression = NegatingExpression? Or perhaps InvertingExpression to avoid the conflict with logical negation, even though the operation is called negation, not inversion?
  • Precedence4Expression = IntersectingExpression
  • Precedence5Expression = UnioningExpression
  • Precedence6Expression = MultiplyingExpression
  • Precedence7Expression = ScalingExpression
  • Precedence8Expression = AddingExpression
  • Precedence9Expression = SpanningExpression
  • Precedence10Expression = ExistsNonemptyExpression
  • Precedence11Expression = ComparingExpression
  • Precedence12Expression = EquatingExpression
  • Precedence13Expression = NottingExpression? (see NegatingExpression)
  • Precedence14Expression = AndingExpression? ConjoiningExpression?
  • Precedence15Expression = OringExpression? DisjoiningExpression?
  • Precedence16Expression = ThenElseExpression
  • Precedence17Expression = AssigningExpression

In general, that seems a bit better to me.

@lucaswerkmeister
Copy link
Member Author

One issue I still have with these names is that they no longer convey that these aliases are just auxiliary constructs to represent precedence. Perhaps a different suffix than the generic -Expression would be better?

@lucaswerkmeister
Copy link
Member Author

One issue I still have with these names is that they no longer convey that these aliases are just auxiliary constructs to represent precedence.

That could be solved with

  • PrecedencePrePostfixExpression
  • PrecedenceExponentiationExpression
  • PrecedenceNegationExpression
  • etc.

but those names feel too long to me, even for auxiliary constructs.

@lucaswerkmeister
Copy link
Member Author

Final draft:

  • Precedence1Expression = PrePostfixingExpression
  • Precedence2Expression = ExponentiatingExpression
  • Precedence3Expression = InvertingExpression
  • Precedence4Expression = IntersectingExpression
  • Precedence5Expression = UnioningExpression
  • Precedence6Expression = MultiplyingExpression
  • Precedence7Expression = ScalingExpression
  • Precedence8Expression = AddingExpression
  • Precedence9Expression = SpanningExpression
  • Precedence10Expression = ExistsNonemptyExpression
  • Precedence11Expression = ComparingExpression
  • Precedence12Expression = EquatingExpression
  • Precedence13Expression = NegatingExpression
  • Precedence14Expression = ConjoiningExpression
  • Precedence15Expression = DisjoiningExpression
  • Precedence16Expression = ThenElseExpression
  • Precedence17Expression = OperatingExpression
  • Precedence18Expression = AssigningExpression

Avoiding the names “anding”, “oring” and “notting” because they’re just too silly. These names are not supposed to appear in user code often, so I think it’s okay to use “conjoining” and “disjoining”. As for “inverting” vs “negating”, the interface is called Invertible, so I think that’s okay as well.

It’s now been exactly four months since the original idea for representing precedence in the type system first came up… I just need to finally get this done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants