Indented languages

Presentation

Since version 2.7.0.0, CSLY can parse indented languages such as Python or YAML.

With CSLY, indentation is always relative. If at some point in the parsed source you use 3 spaces ("   ") to indent, then you have to use the same 3 spaces for all elements at that level. You can mix spaces and tabs as long as you respect relative indents (I have chosen not to take a position in the tabs vs. spaces war).

CSLY introduces two special tokens, INDENT and UINDENT, which denote indentation and unindentation respectively. These tokens can then be used to surround blocks of code.

Indentation is only supported by the generic lexer.
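To make the idea of relative indentation concrete, here is a minimal, self-contained sketch of how a stack of indentation prefixes can be turned into INDENT and UINDENT markers. It is an illustration only and not CSLY's actual implementation: the `IndentSketch` class, its `Tokenize` method, the handling of blank lines, and the closing of open blocks at end of input are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, NOT part of CSLY: shows the general stack-based technique.
public static class IndentSketch
{
    // Returns the leading whitespace (spaces and/or tabs) of a line.
    private static string LeadingWhitespace(string line)
    {
        int i = 0;
        while (i < line.Length && (line[i] == ' ' || line[i] == '\t')) i++;
        return line.Substring(0, i);
    }

    // Returns the trimmed content of each line, with "INDENT" / "UINDENT"
    // markers inserted wherever the indentation level changes.
    public static List<string> Tokenize(string source)
    {
        var tokens = new List<string>();
        var levels = new Stack<string>(); // indentation prefixes currently open
        levels.Push("");                  // top level: no indentation

        foreach (var raw in source.Split('\n'))
        {
            var line = raw.TrimEnd('\r');
            if (line.Trim().Length == 0) continue; // assumption: blank lines are ignored

            var indent = LeadingWhitespace(line);
            var current = levels.Peek();
            if (indent.Length > current.Length
                && indent.StartsWith(current, StringComparison.Ordinal))
            {
                // Deeper than the current level: open a new block.
                levels.Push(indent);
                tokens.Add("INDENT");
            }
            else if (indent != current)
            {
                // Shallower: close blocks until a known prefix is found.
                while (levels.Count > 1 && levels.Peek() != indent)
                {
                    levels.Pop();
                    tokens.Add("UINDENT");
                }
                if (levels.Peek() != indent)
                    throw new FormatException("indentation does not match any open level");
            }
            tokens.Add(line.Trim());
        }

        // Assumption of this sketch: blocks still open at end of input are closed.
        while (levels.Count > 1)
        {
            levels.Pop();
            tokens.Add("UINDENT");
        }
        return tokens;
    }
}
```

For the input `"if a == 1\n    b = 2\nelse\n    b = 3\nc = 4"`, this sketch yields the line contents in order with an INDENT before `b = 2` and `b = 3` and a UINDENT after each of them. Because prefixes are compared as exact strings, a level indented with 3 spaces must keep using those same 3 spaces, which matches the relative-indentation rule described above.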

Configuration

To make the generic lexer add the INDENT and UINDENT tokens to the token stream, declare that your lexer scans an indented language by setting the IndentationAWare property of the [Lexer] attribute:

[Lexer(IndentationAWare = true)]
public enum IndentedLangLexer {
    // your lexeme definitions go here
}

How to use

Lexer

As explained above, you first have to declare that your lexer is indentation aware. Note that you do not have to declare the INDENT and UINDENT tokens; they are managed automatically.

    [Lexer(IndentationAWare = true)]
    public enum IndentedLangLexer
    {
        [Lexeme(GenericToken.Identifier, IdentifierType.Alpha)]
        ID = 1,

        [Lexeme(GenericToken.KeyWord, "if")] IF = 2,

        [Lexeme(GenericToken.KeyWord, "else")] ELSE = 3,

        [Lexeme(GenericToken.SugarToken, "==")] EQ = 4,

        [Lexeme(GenericToken.SugarToken, "=")] SET = 5,

        [Lexeme(GenericToken.Int)] INT = 6,
        
    }

Parser

When defining rules, you can use the INDENT and UINDENT tokens like any other token to delimit blocks. The following parser defines a very simple language with if/then/else and assignment statements. The then and else blocks are delimited by indentation (as in Python).

    public class IndentedParser
    {
        
        [Production("id : ID")]
        public Ast id(Token<IndentedLangLexer> tok)
        {
            return new Identifier(tok.Value);
        }

        [Production("int : INT")]
        public Ast integer(Token<IndentedLangLexer> tok)
        {
            return new Integer(tok.IntValue);
        }

        [Production("statement: [set|ifthenelse]")]
        public Ast Statement(Ast stat)
        {
            return stat as Statement;
        }
        
        [Production("set : id SET[d] int")]
        public Ast Set(Identifier id, Integer i)
        {
            return new Set(id, i);
        }
        
        [Production("cond : id EQ[d] int")]
        public Ast Condi(Identifier id, Integer i)
        {
            return new Cond(id, i);
        }

        [Production("root: statement*")]
        public Ast Root(List<Ast> statements)
        {
            return new Block(statements);
        }
        
        [Production("ifthenelse: IF[d] cond block (ELSE[d] block)?")]
        public Ast ifthenelse(Cond cond, Block thenblk, ValueOption<Group<IndentedLangLexer,Ast>> elseblk)
        {
            // the else clause is optional: unwrap the group if present, otherwise use null
            var eGrp = elseblk.Match(x => x, () => null);
            // ELSE is discarded with [d], so the group's first value is the else block
            var eBlk = eGrp?.Value(0) as Block;
            return new IfThenElse(cond, thenblk, eBlk);
        }

        // a block is a group of statements at the same indentation level
        // it is surrounded by INDENT and UINDENT tokens.
        
        [Production("block : INDENT[d] statement* UINDENT[d]")]
        public Ast Block(List<Ast> statements)
        {
            return new Block(statements);
        }
    }
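As a usage sketch, the lexer and parser above could be wired together with CSLY's `ParserBuilder` as shown below. This assumes the snippet is compiled alongside `IndentedLangLexer`, `IndentedParser`, and the AST classes (`Ast`, `Block`, and so on), which this page does not list. The `IndentedParserDemo` class and the sample source are hypothetical, and the exact `BuildParser` overload may vary between CSLY versions.

```csharp
using System;
using sly.parser;
using sly.parser.generator;

// Hypothetical driver class, shown for illustration only.
public static class IndentedParserDemo
{
    public static void Main()
    {
        // Sample program: the then and else blocks are delimited by indentation only.
        var source = string.Join("\n",
            "if a == 1",
            "    b = 2",
            "    c = 3",
            "else",
            "    b = 4",
            "d = 5");

        // Build the parser from the annotated class, using "root" as the start rule.
        var builder = new ParserBuilder<IndentedLangLexer, Ast>();
        var build = builder.BuildParser(new IndentedParser(),
            ParserType.EBNF_LL_RECURSIVE_DESCENT, "root");
        if (build.IsError)
        {
            // Problems in the lexer or grammar definition are reported here.
            foreach (var error in build.Errors) Console.WriteLine(error.Message);
            return;
        }

        // Parse the source; INDENT and UINDENT tokens are produced by the lexer.
        var result = build.Result.Parse(source);
        if (result.IsError)
        {
            foreach (var error in result.Errors) Console.WriteLine(error.ErrorMessage);
            return;
        }

        Ast root = result.Result; // the value returned by the "root" production
        Console.WriteLine(root != null ? "parsed" : "no result");
    }
}
```

Note that the sample source contains no explicit block delimiters: the two statements under `if` form one block because they share the same indentation, and the final `d = 5` returns to the top level.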
