Introduction

Dávid Németi edited this page Jan 31, 2023 · 26 revisions

Sarcasm (by Dávid Németi) is an SDK for the .NET platform, built on the Irony SDK (by Roman Ivantsov).

Irony

Irony provides an easy way to define grammars in pure C# (or any other .NET language) without a separate grammar meta-language. Rather than generating a dedicated parser for each grammar, Irony has a general parser that can take any grammar defined in Irony and parse text written in the language that grammar specifies. AST (abstract syntax tree) building can likewise be implemented in C# (or any other .NET language).

Sarcasm

Sarcasm (used together with Irony) provides the following extra features:

  • Index-free AST building using domain-grammar bindings (so you don't have to deal with parse tree child node indexing).
  • Typesafe AST building using a typesafe grammar (so you work with the proper types in the syntax tree instead of typeless objects, and get compile-time errors if you make a mistake).
  • Automatic unparsing provided by the general unparser based on the grammar (so you can also unparse your AST to get a text representation of your structure).

These features and concepts are detailed later. First, here is a short "teaser" of what you can do with Sarcasm:

Example

Domain

Let's define a very simple expression domain:

namespace MyExpression
{
    // This class represents the domain, specifies its root, and may define domain specific settings.
    public class Domain : Sarcasm.DomainCore.Domain<DomainDefinitions.Expression> { }

    // the following types are the definitions in the domain
    namespace DomainDefinitions
    {
        public abstract class Expression
        {
        }

        public class BinaryExpression : Expression
        {
            public Expression Term1 { get; set; }
            public BinaryOperator Op { get; set; }
            public Expression Term2 { get; set; }
        }

        public class NumberLiteral : Expression
        {
            public decimal Value { get; set; }
        }

        public enum BinaryOperator
        {
            Add,
            Sub,
            Mul,
            Div,
            Pow
        }
    }
}
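To see how these domain types fit together before any grammar is involved, here is a minimal sketch that builds the tree for (1 + 2) * 3 by hand and evaluates it. The Evaluator class, its Evaluate and BuildSample methods, and the MyExpressionSketch namespace are hypothetical helpers for illustration and are not part of Sarcasm; the domain types are repeated only so that the snippet compiles on its own.

```csharp
using System;

namespace MyExpressionSketch
{
    // Same shape as the domain definitions above, repeated for self-containment.
    public abstract class Expression { }

    public class BinaryExpression : Expression
    {
        public Expression Term1 { get; set; }
        public BinaryOperator Op { get; set; }
        public Expression Term2 { get; set; }
    }

    public class NumberLiteral : Expression
    {
        public decimal Value { get; set; }
    }

    public enum BinaryOperator { Add, Sub, Mul, Div, Pow }

    // Hypothetical helper (an assumption of this sketch, not a Sarcasm API).
    public static class Evaluator
    {
        // Builds the tree for "(1 + 2) * 3" by hand.
        public static Expression BuildSample()
        {
            return new BinaryExpression
            {
                Term1 = new BinaryExpression
                {
                    Term1 = new NumberLiteral { Value = 1 },
                    Op = BinaryOperator.Add,
                    Term2 = new NumberLiteral { Value = 2 }
                },
                Op = BinaryOperator.Mul,
                Term2 = new NumberLiteral { Value = 3 }
            };
        }

        // Recursively evaluates an expression tree.
        public static decimal Evaluate(Expression expression)
        {
            switch (expression)
            {
                case NumberLiteral number:
                    return number.Value;

                case BinaryExpression binary:
                    decimal left = Evaluate(binary.Term1);
                    decimal right = Evaluate(binary.Term2);
                    switch (binary.Op)
                    {
                        case BinaryOperator.Add: return left + right;
                        case BinaryOperator.Sub: return left - right;
                        case BinaryOperator.Mul: return left * right;
                        case BinaryOperator.Div: return left / right;
                        // Math.Pow works on double, so this conversion may lose precision.
                        case BinaryOperator.Pow: return (decimal)Math.Pow((double)left, (double)right);
                        default: throw new ArgumentOutOfRangeException(nameof(expression));
                    }

                default:
                    throw new ArgumentException("Unknown expression type", nameof(expression));
            }
        }
    }
}
```

Calling Evaluator.Evaluate(Evaluator.BuildSample()) yields 9. Because the properties are strongly typed, assigning, say, a string to Term1 is rejected by the compiler rather than failing at run time.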

Grammar

Now let's define a grammar for this domain. Note that this is not just a plain grammar definition: it also contains the domain-grammar bindings (the BindTo methods). Also note that both the grammar (terminals and nonterminals) and the domain-grammar bindings are typesafe.

using D = MyExpression.DomainDefinitions;

namespace MyExpression
{
    public class Grammar : Sarcasm.GrammarAst.Grammar<D.Expression>
    {
        public Grammar()
            : base(new MyExpression.Domain())    // the grammar needs the domain specific settings while building the AST
        {
            var TerminalFactoryS = new TerminalFactoryS(this);

            // definitions of nonterminals
            
            var expression = new BnfiTermChoice<D.Expression>();
            var binaryExpression = new BnfiTermRecord<D.BinaryExpression>();
            var numberLiteral = new BnfiTermRecord<D.NumberLiteral>();
            var binaryOperator = new BnfiTermChoice<D.BinaryOperator>();

            // definitions of terminals

            var ADD_OP = TerminalFactoryS.CreateKeyTerm("+", D.BinaryOperator.Add);
            var SUB_OP = TerminalFactoryS.CreateKeyTerm("-", D.BinaryOperator.Sub);
            var MUL_OP = TerminalFactoryS.CreateKeyTerm("*", D.BinaryOperator.Mul);
            var DIV_OP = TerminalFactoryS.CreateKeyTerm("/", D.BinaryOperator.Div);
            var POW_OP = TerminalFactoryS.CreateKeyTerm("^", D.BinaryOperator.Pow);

            var LEFT_PAREN = TerminalFactoryS.CreateKeyTerm("(");
            var RIGHT_PAREN = TerminalFactoryS.CreateKeyTerm(")");

            // syntax rules
            
            expression.SetRuleOr(
                binaryExpression,
                numberLiteral,
                LEFT_PAREN + expression + RIGHT_PAREN
                );

            binaryExpression.Rule =
                expression.BindTo(binaryExpression, t => t.Term1)
                + binaryOperator.BindTo(binaryExpression, t => t.Op)
                + expression.BindTo(binaryExpression, t => t.Term2)
                ;

            numberLiteral.Rule = TerminalFactoryS.CreateNumberLiteral().BindTo(numberLiteral, t => t.Value);

            binaryOperator.Rule = ADD_OP | SUB_OP | MUL_OP | DIV_OP | POW_OP;

            // operator precedences and associativities
            
            RegisterOperators(10, ADD_OP, SUB_OP);
            RegisterOperators(20, MUL_OP, DIV_OP);
            RegisterOperators(30, Associativity.Right, POW_OP);
        }
    }
}
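The RegisterOperators calls at the end of the constructor decide how an expression with several operators is grouped. Assuming they behave as in Irony, where a higher precedence number binds tighter, 1 + 2 * 3 groups as 1 + (2 * 3), and the right-associative POW_OP makes 2 ^ 3 ^ 2 group as 2 ^ (3 ^ 2). The sketch below uses plain C# arithmetic, with no Sarcasm or Irony calls, to show why the grouping matters; GroupingDemo and its members are hypothetical names.

```csharp
// Hypothetical illustration of operator grouping; not part of Sarcasm or Irony.
public static class GroupingDemo
{
    // Integer power by repeated multiplication (exponent must be >= 0).
    public static long Pow(long baseValue, long exponent)
    {
        long result = 1;
        for (long i = 0; i < exponent; i++)
        {
            result *= baseValue;
        }
        return result;
    }

    // Grouping when "*" (precedence 20) binds tighter than "+" (precedence 10).
    public static long MulBindsTighter() => 1 + (2 * 3);

    // The grouping you would get if "+" bound tighter instead.
    public static long AddBindsTighter() => (1 + 2) * 3;

    // "2 ^ 3 ^ 2" with a right-associative "^": 2 ^ (3 ^ 2).
    public static long PowRightAssociative() => Pow(2, Pow(3, 2));

    // The same text with a left-associative "^": (2 ^ 3) ^ 2.
    public static long PowLeftAssociative() => Pow(Pow(2, 3), 2);
}
```

The two precedence variants give 7 and 9, and the two associativity variants give 512 and 64, so the registered precedences and associativities directly determine which AST is built for the same input text.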

We now have both the domain and the grammar. Because the grammar contains the domain-grammar bindings, it has all the information needed to build the AST (abstract syntax tree), so there is no need to implement a separate AST builder.

We no longer have to work with a typeless parse tree, index its child nodes, or retrieve the AST values they hold, so the code is much less error-prone than a conventional grammar with a separate AST builder. Domain-grammar bindings eliminate indexing errors, and typesafe grammars surface type errors at compile time instead of at run time.
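For contrast, the kind of hand-written, index-based AST builder that this approach avoids might look like the sketch below. UntypedNode is a hypothetical stand-in for a typeless parse tree node, and ConventionalAstBuilder is likewise invented for illustration; neither is an Irony or Sarcasm type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a typeless parse tree node (not an Irony or Sarcasm type).
public class UntypedNode
{
    public object AstValue { get; set; }
    public List<UntypedNode> Children { get; } = new List<UntypedNode>();
}

// Hypothetical hand-written builder that reads child nodes by position.
public static class ConventionalAstBuilder
{
    // Correct only as long as the indexes match the rule "term1 operator term2".
    public static (decimal Term1, string Op, decimal Term2) BuildBinary(UntypedNode node)
    {
        var term1 = (decimal)node.Children[0].AstValue;
        var op = (string)node.Children[1].AstValue;
        var term2 = (decimal)node.Children[2].AstValue;
        return (term1, op, term2);
    }

    // The same kind of code with one wrong index: it compiles,
    // but the cast throws InvalidCastException at run time.
    public static decimal BuildTerm2WithWrongIndex(UntypedNode node)
    {
        return (decimal)node.Children[1].AstValue; // should be Children[2]
    }
}
```

Both methods compile, but the second one fails only when it is executed, because Children[1] holds the operator rather than a number. With a binding such as t => t.Term2, there is no index to get wrong, and a mismatched type is reported by the compiler.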

Usage

Now let's put these to use: parse a simple expression from text, then unparse it back to text:

using MyExpression.DomainDefinitions;
using Sarcasm.Parsing;
using Sarcasm.Unparsing;

public static class TestClass
{
    public static void Test()
    {
        MyExpression.Grammar grammar = new MyExpression.Grammar();
        
        MultiParser<Expression> parser = MultiParser.Create(grammar);   // uses Irony.Parsing.Parser
        string originalText = "(1 + 2) * 3";
        ParseTree<Expression> parseTree = parser.Parse(originalText);

        Expression expression = parseTree.RootAstValue;     // we have our expression
        /*
         * expression's structure is the following:
         * 
         * BinaryExpression(
         *     Op:  BinaryOperator.Mul
         *     Term1:
         *         BinaryExpression(
         *             Op:  BinaryOperator.Add
         *             Term1:
         *                 NumberLiteral(Value: 1)
         *             Term2:
         *                 NumberLiteral(Value: 2)
         *         )
         *     Term2:
         *         NumberLiteral(Value: 3)
         * )
         * */
        
        Unparser unparser = new Unparser(grammar);
        string unparsedText = unparser.Unparse(expression).AsText(unparser);
        // unparsedText will contain the string "(1 + 2) * 3"
    }
}

Note that our grammar, our parser and our parse tree (i.e. the AST value inside the parse tree root node) are all typesafe.

If you would also like to retrieve the comments from the parsed text, you can do that too:

ParseTree<Expression> parseTree = parser.Parse(originalText);
Document document = parseTree.GetDocument();

A Document holds both the AST and the comments. The comments are associated with the AST nodes and stored in a separate dictionary inside the Document.

You can also unparse your AST with comments:

string unparsedText = unparser.Unparse(document).AsText(unparser);

If you would like to read more, continue with Conventional Grammars.