Skip to content
alexey-ryazhskikh edited this page Dec 3, 2023 · 24 revisions

expression parsing

defining expressions grammar

Writing an expression grammar is a tedious task. A recursive descent parser is hard to maintain when parsing expressions with multiple precedence levels. So CSLY offers a way to express expression parsing using only operator tokens and precedence level. CSLY will then generates production rules to parse expressions.

⚠️ When using expressions generator you MUST build the grammar with EBNF_LL_RECURSIVE_DESCENTtype as :

Parser = builder.BuildParser(parserInstance, ParserType.EBNF_LL_RECURSIVE_DESCENT, StartingRule); 

operations

An expression operation is noted using the [Operation(token, affix, associativity, precedence)] attribute where

  • token is

    • the int value of a token ( C# does not support generics on attributes. May be coming with C# 8)
    • or the string representation of the token.
  • affix is the affix of the expression :

    • Affix.PreFix : for prefix operators such as : - 3
    • Affix.PostFix : for postfix operators such as : i ++
    • Affix.InFix : for infix operators such as : 2 + 2
  • associativity is an Associativity enum member

    • Associativity.Right for right associativity
    • Associativity.Left for left associativity
  • precedence is an int stating the precedence level : the higher the int is the higher the precedence is. Each attribute is associated to a method that will act as the syntax tree visitor for the matching operation.

Shorter attributes

There exist shorter attributes for infix, prefix and postfix operations :

  • Infix(token, associativity, precedence) : infix operation
  • Prefix(token, associativity, precedence) : prefix operation
  • Postfix(token, associativity, precedence) : postfix operation

example

for example here is a basic example for a numeric expression parser :

public class SimpleExpressionParser
    {
    
        // using token int value
        [Operation((int)ExpressionToken.PLUS, Affix.InFix, Associativity.Right, 10)]
        // using token string representation and Infix attribute
        [Infix("MINUS", Associativity.Left, 10)]
        public int binaryTermExpression(int left, Token<ExpressionToken> operation, int right)
        {
            int result = 0;
            switch (operation.TokenID)
            {
                case ExpressionToken.PLUS:
                    {
                        result = left + right;
                        break;
                    }
                case ExpressionToken.MINUS:
                    {
                        result = left - right;
                        break;
                    }
            }
            return result;
        }

        
        // using token int value
        [Operation((int)ExpressionToken.TIMES, Affix.InFix,, Associativity.Right, 50)]
        // using token string representation
        [Operation("DIVIDE", Affix.InFix, Associativity.Left, 50)]
        public int binaryFactorExpression(int left, Token<ExpressionToken> operation, int right)
        {
            int result = 0;
            switch (operation.TokenID)
            {                
                case ExpressionToken.TIMES:
                    {
                        result = left * right;
                        break;
                    }
                case ExpressionToken.DIVIDE:
                    {
                        result = left / right;
                        break;
                    }
            }
            return result;
        }

        [Prefix((int)ExpressionToken.MINUS, Associativity.Right, 100)]
        public int unaryExpression(Token<ExpressionToken> operation, int value)
        {
            return -value;
        }

operands

The expression parser generator need to identify a production rule for operands (int bool or string scalar values for example). the [Operand] attribute must indicate the Production rule producing basic operands

    [Operand]
    [Production("operand : INT")]
    public int operand(Token<ExpressionToken> value)
    {
        return value.IntValue;
    }

You can define many [Operand] visitor methods. As in :

    [Operand]
    [Production("operand_int : INT")]
    public int operand(Token<ExpressionToken> value)
    {
        return value.IntValue;
    }

    [Operand]
    [Production("operand_double : double")]
    public int operand(Token<ExpressionToken> value)
    {
        return value.IntValue;
    }

using the expression parser

CSLY generates a non terminal named <ParserClassName>_expressions where <ParserClassName> is the C# parser class name (case sensitive). SimpleExpressionParser_expressions for the above example.

under the hood

expression production rules

Under the hood, for each precedence level CSLY generates dedicated production rules. Let say we have 2 precedence level P1 (operator "+") and P2(operator "*") with P2 > P1 then CSLY generates the following rules :

> > ** P1 ***
> > P1 : P2 "+" P1
> > P1 : P2

> > ** P2 **
> > P2 : operand "*" operand
> > P2: operand

This scheme can be reproduced with as many precedence level as needed.

⚠️ LL grammars can only produce right associative operator as LL grammar can not be left recursive. nevertheless CSLY can handle left associativity as explained below

about left associativity

As stated above LL grammar (the ones handled by CSLY) cannot managed left associative operator. Hence for a naive expression parser 1 - 2 - 3 will be understood as 1 - (2 - 3) = 0 and not (1 - 2) - 3 = -4 as expected.

To cope with this CSLY acts like this :

  1. produce a syntaxic tree (right associative)
  2. if needed (i.e. a left associative operation is found in the syntaxic tree) : rearrange the tree to produce a left associative tree.