
QuickNote: Arpeggio

2022/11/8

In this short QuickNote, we give some information on Arpeggio, a Python package for implementing a PEG parser. Eventually, the parser will be written in Castle, like all Castle-WorkshopTools. To kickstart, we use Python and Python packages, and Arpeggio is one of the options.

As Arpeggio is quite well documented, this note is short. We also describe some differences from the Pegen <QN_PEGEN> package.

QN_PEGEN: another candidate package for the PEG parser in the initial Castle-WorkshopTools.

TextX & Arpeggio

Arpeggio is part of TextX, a Python take on the (Java-based) Xtext, a language workbench for building DSLs. Arpeggio is the PEG parser that TextX uses for those domain-specific languages.

Arpeggio takes a different route than most parsers. It does not need a (text-file) grammar from which a parser is generated: one can configure the grammar with Python statements and use it directly, with no generation step. Alternatively, one can read a text-based grammar (multiple meta-grammars are supported), which is parsed first to configure the requested parser. This can be a bit confusing at first sight, but it is quite convenient. It also shows the power of both Python and Arpeggio itself. After all, Arpeggio is a parser that can read any language, even a meta-language that describes languages.

Parse-Tree

After creating the grammar, either in Python or in a PEG meta-grammar, one can run the parser on a source file.

The result is a parse tree, where each leaf is a terminal node, matched by a StrMatch or a RegExMatch expression. All other nodes (non-terminals) are created by the grammar rules. Each node has a .name attribute: the name of the parsing expression that created it. Nodes have some other convenient attributes too, like references to their location in the source file.

Hint

This feature, named nodes, is missing in the Pegen <QN_PEGEN> parsers.

Visitors

Arpeggio has no inline actions [1] (ref: GrammarActions). Instead, the visitor pattern is used. For each kind of node in the parse tree, a visitor method visit_{node}(self, node, children) can be written. Or rather, should be written, as the default is to return a SemanticActionResults instance, which is often quite useless.

Note

Arpeggio calls this phase semantic analysis, which can be a bit misleading: generally, it is used to build the AST, which in turn is the input for the semantic analysis in a later step.

Meta-Syntax

Arpeggio supports the normal PEG semantics, including the lexer (aka tokenizer) level: there is no separate lexing step, and tokens are simply part of the grammar. Tokens can be described by literal strings or by regular expressions; the latter are very powerful.


Footnotes


  1. As Arpeggio is very flexible in its input grammar, it would be possible to add a PEG grammar that includes inline actions. That, however, is out of scope.