Example: An Expression Language

Grammar

Let us consider this Xtext implementation of an Expressions language (similar to the Arithmetic example shipped with Xtext)

grammar it.xsemantics.example.expressions.Expressions 
    with org.eclipse.xtext.common.Terminals

generate expressions "http://xsemantics.sf.net/example/expressions/Expressions"

Model:
    (variables+=Variable)*
;

Variable:
    name=ID '=' expression=Expression
;

Expression:
    BooleanExpression;

BooleanExpression returns Expression:
    Comparison 
    (({AndOrExpression.left=current} op=("||"|"&&")) right=Comparison)*;

Comparison returns Expression:
       Equals
       (({Comparison.left=current} op=("<") ) right=Equals)*;

Equals returns Expression:
       Addition 
       (({Equals.left=current} op=("==") ) right=Addition)*;
 
Addition returns Expression:
    Multiplication
    (({Plus.left=current} '+' | {Minus.left=current} '-')
        right=Multiplication)*; 

Multiplication returns Expression:
    Prefixed (({MultiOrDiv.left=current} op=("*"|"/")) right=Prefixed)*;

Prefixed returns Expression:
    {BooleanNegation} =>"!" expression=Atomic | /* right associativity */
    {ArithmeticSigned} =>"-" expression=Atomic | /* right associativity */
    Atomic;

Atomic returns Expression:
    '(' Expression ')' |
    {NumberLiteral} value=INT |
    {StringLiteral} value=STRING |
    {BooleanLiteral} value=('true'|'false') |
    {VariableReference} ref=[Variable]
;

// the types will be used only internally by the type system
Type:
    {IntType} 'int' |
    {StringType} 'string' |
    {BooleanType} 'boolean'
;

With this language you can write untyped variable declarations assigning an expression. In this language, expressions can refer to variables declared before the current variable; the scoping will take care of this.

Note that, for the sake of simplicity, we do not consider a full expression grammar. For instance, we consider consider only the < comparison operator. This is to keep things simple and focus on implementing a type system and an interpreter with Xsemantics.

We have our grammar. Now, we would like to implement these checks:

the - / * binary operators and the - unary operator should only act on numeric expressions
+ can act both on numeric expressions and on strings; if one of the operand is a string, the whole expression should be a string
the == operator acts only on expressions of the same type
the < operator acts only on numeric and string expressions, and the two subexpressions must be of the same type
the && || binary operators and the ! unary operator act only on boolean expressions

So let's start writing the static semantics for the expressions language, i.e., the type system.

Note that in the grammar we also have a definition for Type, even though that rule will never be used in the grammar. We designed it this way so that we will have also the EClass definitions for the types that will be used by the type system. Of course, you could simply define the hierarchy for types using plain Java.

Expressions Type System

Since we will extend the semantics of this language, we will start with a first implementation.

system it.xsemantics.example.expressions.typing.ExpressionsSemantics

validatorExtends it.xsemantics.example.expressions.validation.AbstractExpressionsJavaValidator

import it.xsemantics.example.expressions.expressions.*

judgments {
    type |- Expression expression : output Type
        error "cannot type " + stringRep(expression)
        source expression
    // more judgments later
}

So we define a judgment that we call type which takes an Expression as input parameter and provides a Type as output.

axiom NumeralLiteral
    G |- NumberLiteral num : 
        ExpressionsFactory::eINSTANCE.createIntType

axiom BooleanLiteral
    G |- BooleanLiteral bool : 
        ExpressionsFactory::eINSTANCE.createBooleanType

axiom StringLiteral
    G |- StringLiteral str : 
        ExpressionsFactory::eINSTANCE.createStringType

For the constants, we can write axioms, which will simply return an instance of the corresponding type.

rule MultiOrDiv
    G |- MultiOrDiv multiOrDiv : IntType intType
from {
    G |- multiOrDiv.left : intType
    G |- multiOrDiv.right : intType
}

rule Minus
    G |- Minus minus : 
        ExpressionsFactory::eINSTANCE.createIntType
from {
    var IntType intType
    G |- minus.left : intType
    G |- minus.right : intType
}

The rules for multiplication, division and minus are similar. They state that the result type is an IntType, provided that the subexpressions have themselves an an IntType. They do this in two different (equivalent) ways.

The first one states that the result is an IntType which is the result of the typing of the left and right subexpressions. Remember that for this judgment, the second parameter is an output parameter. Basically, the second rule does the same, in a more explicit way, and it does not bind the output result (which is created on the fly) to the result of typing of the subexpressions (which, however, are still required to have integer type, since we pass to the rule invocation an IntType as output argument).

rule Plus
    G |- Plus plus : Type type
from {
    G |- plus.left : var Type leftType
    G |- plus.right : var Type rightType
    
    {
        (leftType instanceof StringType || rightType instanceof StringType)
        type = ExpressionsFactory::eINSTANCE.createStringType
    }
    or
    {
        (leftType instanceof IntType && rightType instanceof IntType)
        type = leftType // i.e., IntType
    }
}

Remember that for the + operator we are willing to deal both with strings (string concatenation) and with integers (standard arithmetic sum). If one of the two subexpressions has StringType, then we consider the whole plus expression as a string concatenation and we give it type string. As expected, if one of the subexpressions has boolean type, the expression will fail this rule.

Note that we use the or operator. See this section about premises <> in the reference for its semantics.

rule Comparison
    G |- Comparison comparison : ExpressionsFactory::eINSTANCE.createBooleanType
from {
    empty |- comparison.left : var Type leftType
    empty |- comparison.right : var Type rightType
    
    // cannot compare booleans
    (leftType instanceof IntType && rightType instanceof IntType) ||
    (leftType instanceof StringType && rightType instanceof StringType)
}

rule Equals
    G |- Equals comparison : ExpressionsFactory::eINSTANCE.createBooleanType
from {
    G |- comparison.left : var Type leftType
    G |- comparison.right : var Type rightType
    
    // can compare only if they have the same type
    (leftType.eClass == rightType.eClass)
}

The code for Comparison and Equals are similar, in that they both return a boolean type as a result. But while equality can be checked with any type of expressions (provided the subexpressions have the same type), the comparison does not consider boolean subexpressions.

rule BooleanNegation
    G |- BooleanNegation negation : BooleanType boolType
from {
    G |- negation.expression : boolType
}

rule AndOr
    G |- AndOrExpression andOr : BooleanType boolType
from {
    G |- andOr.left : boolType
    G |- andOr.right : boolType
}

rule ArithmeticSigned
    G |- ArithmeticSigned signed : ExpressionsFactory::eINSTANCE.createIntType
from {
    G |- signed.expression : var IntType intType
}

The rules for boolean negation, logical && and || and the unary arithmetic signed operations should be clear from the previous examples.

Type Inference for Variables

We need one more rule for typing expressions, in particular the one for VariableReference, whose type is the type of the referred variable. But what is the type of a variable? The idea is that we infer the type of a variable from the type of its expression.

Since Variable is not an Expression in our grammar, we need to introduce another judgment in our type system.

Side Note: an alternative would have been to introduce in the grammar a rule introducing a common superclass for both Variable and Expression, e.g., Typable, and write the type judgment with the first parameter of type Typable.

judgments {
    ... // as above...
    vartype ||- Variable variable : output Type
        error "cannot type " + stringRep(variable)
        source variable
        feature ExpressionsPackage::eINSTANCE.variable_Expression
}

Note that the new judgment has a different symbol w.r.t. the type judgment. Moreover, when specifying the error for such judgments we specify that the EObject for the error marker is the variable, but the feature for the error marker is the variable's expression. This will generate a better error marker.

rule VariableReference
    G |- VariableReference varRef : Type type
from {
    G ||- varRef.ref : type
}

rule Variable
    G ||- Variable variable : Type type
from {
    variable.expression != null
    G |- variable.expression : type
}

Now, the type rule for VariableReference relies on the judgment vartype, which states that the type of a variable is the type of its expression.

IMPORTANT: expressions can refer to other variables, and in case of mutual dependencies like:

i = j + 10
j = i *5

the typing would enter an infinite loop! For this reason, for this example language we limited the scope of variables: expressions can refer only to variables defined before the current expression.

Expressions Validation

Now that we have our type system for expressions we can write validation rules (see the section CheckRules<> in the reference):

checkrule CheckVariable for
    Variable variable
from {
    empty ||- variable : var Type type
}

This simply checks that we can give a Variable a type (in an empty environment).

Please refer to GeneratedValidator <>, in particular validatorExtends<>, to see how to use the generated Java validator, which we also followed in the implementation of this Expressions example.

Interpreter for Expressions

Xsemantics can be used also to write interpreter rules, which can be seen also as the operational semantics of the language.

For the Expressions language we thus introduce another judgment kind:

judgments {
    ... // as above...
    interpret |- Expression expression ~> output Object
}

Note that we assume that we will run the interpreter ONLY on well-typed Expressions program.

We then start writing the rules for this new judgment

axiom InterpretNumberLiteral
    G |- NumberLiteral number ~> number.value

axiom InterpretStringLiteral
    G |- StringLiteral string ~> string.value

axiom InterpretBooleanLiteral
    G |- BooleanLiteral bool ~> Boolean::valueOf(bool.value)

The rules for literals are pretty straightforward; note only the case for boolean literals: since the value feature is a string, we need to convert it into an actual boolean.

rule InterpretMinus
    G |- Minus plus ~> Integer result
from {
    G |- plus.left ~> var Integer leftResult
    G |- plus.right ~> var Integer rightResult
    result = leftResult.intValue - rightResult.intValue
}

rule InterpretMultiOrDiv
    G |- MultiOrDiv multiOrDiv ~> Integer result
from {
    G |- multiOrDiv.left ~> var Integer leftResult
    G |- multiOrDiv.right ~> var Integer rightResult
    if (multiOrDiv.op == '*')
        result = leftResult.intValue * rightResult.intValue
    else
        result = leftResult.intValue / rightResult.intValue
}

rule InterpretArithmeticSigned
    G |- ArithmeticSigned signed ~> Integer result
from {
    G |- signed.expression ~> var Integer expResult
    result = -(expResult)
}

The above rules rely on the fact that the result of these subexpressions is an Integer.

rule InterpretAndOr
    G |- AndOrExpression andOr ~> Boolean result
from {
    G |- andOr.left ~> var Boolean leftResult
    G |- andOr.right ~> var Boolean rightResult
    
    if (andOr.op == "&&")
        result = leftResult.booleanValue && rightResult.booleanValue
    else
        result = leftResult.booleanValue || rightResult.booleanValue
}

rule InterpretBooleanNegation
    G |- BooleanNegation neg ~> Boolean result
from {
    G |- neg.expression ~> var Boolean expResult
    result = !expResult
}

Similarly in case of boolean expressions the result of subexpressions is assumed to be a Boolean.

rule InterpretComparison
    G |- Comparison comparison ~> Boolean result
from {
    empty |- comparison.left ~> var Object leftResult
    empty |- comparison.right ~> var Object rightResult
    
    if (leftResult instanceof String && rightResult instanceof String) {
        result = leftResult.toString < rightResult.toString
    } else {
        // both are int if the expression is well-typed
        result = (leftResult as Integer) < (rightResult as Integer)
    }
}

rule InterpretEquals
    G |- Equals comparison ~> Boolean result
from {
    empty |- comparison.left ~> var Object leftResult
    empty |- comparison.right ~> var Object rightResult
    
    result = leftResult.toString == rightResult.toString
}

For comparison, the result will be a Boolean; note that we check whether the subexpressions are strings to convert them to a string representation and compare such representation; the only other possible case is that they are both integer (remember, we will only call the interpreter on well-typed programs). Note that for equality we simply convert the result of subexpressions to string and compare the resulting string representation.

rule InterpretPlus
    G |- Plus plus ~> Object result
from {
    G |- plus.left ~> var Object leftResult
    G |- plus.right ~> var Object rightResult
    
    if (leftResult instanceof String || rightResult instanceof String) {
        var leftString = leftResult.toString
        var rightString = rightResult.toString
        result = leftString + rightString
    } else {
        // both are int if the expression is well-typed
        var leftInt = leftResult as Integer
        var rightInt = rightResult as Integer
        result = leftInt + rightInt
    }
}

rule InterpretVariableRefenrence
    G |- VariableReference varRef ~> Object result
from {
    G |- varRef.ref.expression ~> result
}

At this point, the remaining above interpreter rules should be straightforward.

Customizations for Expressions

For the Expressions language we provide a custom implementation of StringRepresentation (see also section StringRepresentation):

public class ExpressionsStringRepresentation extends StringRepresentation {
    protected String stringRep(String s) {
        return "'" + s + "'";
    }
}

This way when a String has to be represented by Xsemantics generated code, it will add quotes around it; this way we are sure that we are dealing with an actual string.

Of course, we provide the guice binding for this:

public class ExpressionsRuntimeModule extends
        it.xsemantics.example.expressions.AbstractExpressionsRuntimeModule {

    public Class<? extends StringRepresentation> bindStringRepresentation() {
        return ExpressionsStringRepresentation.class;
    }
...
}

A more involved Expressions System

Now, let's complicate things a bit, and make the system for our Expressions language more powerful, with this new requirement: we want to be able to implicitly convert string literals to numbers and booleans when this is possible. This way, if an expression requires the subexpressions to be numbers or booleans we can also accept string literals if they can be convertible. Note that we intentionally avoided to consider the number literal 0 as false and a number literal different from 0 as true, but you can extend the system further if you want. Thus, we want the following expressions to be valid (and interpreted accordingly with an implicit conversion):

i = 20 - '5' // OK! can be given int type and evaluates to 15
b = (i > 20) || 'true' // OK! can be given boolean type
// and evaluates to true

Note that in case no expectation on types is expressed, the system will be behave as before:

i = 20 + '5' // has string type and evaluates to '205'
j = (20 * 1) + '5' // has string type again
k = 20 * (1 + '5') // has int type, since * requires int types
// on subexpressions

Since this more involved system will have most rules in common with the first system we presented, we will write this new system as an extended system (section SystemExtension), starting from the previous one:

system it.xsemantics.example.expressions.typing.ExtendedExpressionsSemantics
    extends it.xsemantics.example.expressions.typing.ExpressionsSemantics

Thus, we introduce a new judgment in our system (the other ones are simply inherited from the super system):

judgments{
    coerce |~ Expression expression |> Type expectedType
        error "cannot convert " + stringRep(expression) +
            " to type " + stringRep(expectedType)
        source expression
}

We then have to define the rules for this new judgment:

rule StringToInt
    G |~ StringLiteral string |> IntType type
from {
    Integer::parseInt(string.value)
}

rule StringToBool
    G |~ StringLiteral string |> BooleanType type
from {
    string.value.equalsIgnoreCase("true") ||
    string.value.equalsIgnoreCase("false")
}

Then, this judgment will be used by the other rules; in particular, we need to modify some rules we have seen so far.

The idea is that, as hinted above, that rules that expect a subexpression to be of a specific type should "communicate" this expectation through the rule environment (refer to section Environment), by mapping the string 'expected' to the expected type.

We will need then to override some rules of the previous system: the rule for StringLiteral, since its type now depends on the above expectation:

override rule StringLiteral
    G |- StringLiteral str : Type resultType
from {
    {
        val expected = env(G, 'expected', Type)
        G |~ str |> expected
        resultType = expected
    }
    or
        resultType = ExpressionsFactory::eINSTANCE.createStringType
}

The first block of the or can fail due to the fact that

No expectation was requested or
the expectation cannot be satisfied

In any other case the type will be string (which might make invoking rules fail).

Now we need to override those rules that have expectations on subexpressions, in order to pass to rule invocation an appropriate environment:

override rule MultiOrDiv
    G |- MultiOrDiv multiOrDiv : IntType intType
from {
    intType = ExpressionsFactory::eINSTANCE.createIntType
    G, 'expected' <- intType |- multiOrDiv.left : intType
    G, 'expected' <- intType |- multiOrDiv.right : intType
}

override rule AndOr
    G |- AndOrExpression andOr : BooleanType boolType
from {
    boolType = ExpressionsFactory::eINSTANCE.createBooleanType
    G, 'expected' <- boolType |- andOr.left : boolType
    G, 'expected' <- boolType |- andOr.right : boolType
}

We only showed two examples (the other rules for expected int and boolean types are updated accordingly).

Note that in cases where no expectation is required, we now need to pass an empty environment when typing subexpressions, otherwise we might reject well-typed expressions due to expectations that were passed to the current rule upon invocation. This however was already implemented in the super system, so we don't need to override those rules.

The rule for Plus now is a little bit more complex, since it has to take care of possible expectations:

override rule Plus
    G |- Plus plus : Type type
from {
    G |- plus.left : var Type leftType
    G |- plus.right : var TyThe -e option is not standard. A POSIX-compliant echo does not accept any options:pe rightType
    
    { 
        val expected = env(G, 'expected', Type)
        G |~ plus.left |> expected
        G |~ plus.right |> expected
        type = expected
    }
    or
    {
        (leftType instanceof StringType || rightType instanceof StringType)
        type = ExpressionsFactory::eINSTANCE.createStringType
    }
    or
    {
        (leftType instanceof IntType && rightType instanceof IntType)
        type = leftType // i.e., IntType
    }
}

Now we need to adapt also some interpretation rules to take into consideration expectations:

override rule InterpretStringLiteral
    G |- StringLiteral string ~> Object result
from {
    var Type expected
    {
        expected = env(G, 'expected', IntType)
        result = Integer::parseInt(string.value)
    }
    or
    {
        expected = env(G, 'expected', BooleanType)
        result = Boolean::parseBoolean(string.value)
    }
    or
    result = string.value
}

override rule InterpretMultiOrDiv
    G |- MultiOrDiv multiOrDiv ~> Integer result
from {
    var IntType intType = ExpressionsFactory::eINSTANCE.createIntType
    G, 'expected' <- intType |- multiOrDiv.left ~> var Integer leftResult
    G, 'expected' <- intType |- multiOrDiv.right ~> var Integer rightResult
    if (multiOrDiv.op == '*')
        result = leftResult.intValue * rightResult.intValue
    else
        result = leftResult.intValue / rightResult.intValue
}

Again, we only showed some modified rules; now it should be clear what other rules need to be adapted and how.

All the other rules which do not require adaption will be simply inherited from the super system.

Generation for Expressions

The Expressions language we used in this example does not implement any real code generation. However, we implemented a generation strategy just for the sake of demonstration on how to use the generated Java code and the trace utilities.

The generator will generate an .output file which shows the type for each variable, the interpretation of its expression, together with the application traces (as strings) both for the type and the interpretation. The relevant part of the generator (written in Xtend2) is shown:

class ExpressionsGenerator implements IGenerator {
    
    @Inject ExpressionsSemantics semantics
    @Inject extension TraceUtils
    @Inject extension StringRepresentation
    
    // skipped some code
    
    def compileExpression(Expression exp) {
        val typeTrace = new RuleApplicationTrace()
        val interpreterTrace = new RuleApplicationTrace()
        val type = semantics.type(null, typeTrace, exp)
        val result = semantics.interpret(null, interpreterTrace, exp)
        '''
        type: Â«type.value.stringÂ»
        type trace: Â«typeTrace.traceAsStringÂ»
        
        interpretation: Â«result.value.stringÂ»
        interpretation trace: Â«interpreterTrace.traceAsStringÂ»
        '''
    }
}

ExpressionsSemantics is the Java class generated by Xsemantics from the system we wrote for the Expressions language.

Actually, we want to use the extended system (section MoreInvolvedExpressionsSystem), thus we will bind ExpressionsSemantics to ExtendedExpressionsSemantics in the runtime module:

public class ExpressionsRuntimeModule extends
        it.xsemantics.example.expressions.AbstractExpressionsRuntimeModule {
...
    public Class<? extends ExpressionsSemantics> bindExpressionsSemantics() {
        return ExtendedExpressionsSemantics.class;
    }
}

Note that string and traceAsString come from StringRepresentation and TraceUtils respectively, and they are used as extension methods.

Thus, starting from this expression:

 myString = 'foo' + 10 + false

we get this output:

 Variable: myString
type: StringType
type trace: 
Plus: [] |- 'foo' + 10 + false : StringType
 Plus: [] |- 'foo' + 10 : StringType
  StringLiteral: [] |- 'foo' : StringType
  NumeralLiteral: [] |- 10 : IntType
 BooleanLiteral: [] |- false : BooleanType

interpretation: 'foo10false'
interpretation trace: 
InterpretPlus: [] |- 'foo' + 10 + false ~> 'foo10false'
 InterpretPlus: [] |- 'foo' + 10 ~> 'foo10'
  InterpretStringLiteral: [] |- 'foo' ~> 'foo'
  InterpretNumberLiteral: [] |- 10 ~> 10
 InterpretBooleanLiteral: [] |- false ~> false

Note that the indentation represents the stack of invoked rules. The quotes around string literals come from our customization of StringRepresentation, see section ExpressionsCustomization. Moreover [] represents an empty environment.

A more involved example including variable references and implicit coercions is:

intVar = 1
intVar2 = 2 * (intVar + '1')

which produces the following output

Variable: intVar2
type: IntType
type trace: 
MultiOrDiv: [] |- 2 * (intVar + '1') : IntType
 NumeralLiteral: ['expected' <- IntType] |- 2 : IntType
 Plus: ['expected' <- IntType] |- intVar + '1' : IntType
  VariableReference: ['expected' <- IntType] |- intVar : IntType
   Variable: ['expected' <- IntType] ||- intVar = 1 : IntType
    NumeralLiteral: ['expected' <- IntType] |- 1 : IntType
  StringLiteral: ['expected' <- IntType] |- '1' : IntType
   StringToInt: ['expected' <- IntType] |~ '1' |> IntType

interpretation: 4
interpretation trace: 
InterpretMultiOrDiv: [] |- 2 * (intVar + '1') ~> 4
 InterpretNumberLiteral: ['expected' <- IntType] |- 2 ~> 2
 InterpretPlus: ['expected' <- IntType] |- intVar + '1' ~> 2
  InterpretVariableRefenrence: ['expected' <- IntType] |- intVar ~> 1
   InterpretNumberLiteral: ['expected' <- IntType] |- 1 ~> 1
  InterpretStringLiteral: ['expected' <- IntType] |- '1' ~> 1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly