Simple parser builder written in Python


Simple PEG parser builder written in Python.

There are many parser generators for Python, but in my opinion each of them has some disadvantage: it is not suitable for stream processing, or it does not let you trace parsing and see where it failed, or it has no mode that keeps whitespace instead of skipping it, or its rule syntax is ugly, or it does not allow attaching a semantic action in place.

"Parsed" is inspired by "arpeggio". That parser generator is interesting and there was a chance I would start using it, but maybe I was too lazy to study its code and modify it to fit my needs or, most probably, it was simply interesting to write my own parser. So I wrote One More parser generator.

Writing new grammar parsers

To create a parser and use the predefined rules, import the parsed package.

    from parsed import *

A parser generator function accepts no parameters, is decorated with @rule, and should return a rule, e.g.:

    @rule
    def abc(): return char('abc') > value

As expected, the result of calling a generator function is a parser. The parser gets its name from the generator function's name, which is why lambdas and the like are not used. To parse an iterable source correctly, wrap it with the source() function.

    parser = abc()
    #wrap the input with source() before parsing
    src = source('a')
    position, result = parser.parse(src)
    if result != nomatch:
        print("MATCH %s @ %s" % (result, position))
    else:
        print("NO MATCH")


Character matching rule

  • against single char

    #single char; the match result is ignored by default
    def is_A(): return char('A')

  • against char from iterable

    #matches LF or CR; by default returns the matched character
    def vspace(): return char('\r\n')

  • against predicate

    import string
    #boolean predicate testing a char against some condition
    def __is_punct(c): return c in string.punctuation
    #any char satisfying the predicate; returns its value by default
    def is_punctuation(): return char(__is_punct)
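
How one constructor can accept a single char, a set of chars, or a predicate may be easier to see in a small standalone sketch. This is not the code of the parsed package: toy_char and NOMATCH are hypothetical names, and unlike the real char rule the sketch always returns the matched character instead of ignoring some results.

```python
# Toy illustration only; toy_char and NOMATCH are hypothetical names
# and are not part of the parsed package.
NOMATCH = object()  # stand-in for the nomatch constant


def toy_char(spec):
    """Build a matcher from a single char, a set of chars, or a predicate."""
    if callable(spec):
        accepts = spec
    else:
        # works for 'A' and for '\r\n' alike: membership in the string
        accepts = lambda c: c in spec

    def match(text, pos=0):
        # succeed and advance by one char, or fail without moving
        if pos < len(text) and accepts(text[pos]):
            return pos + 1, text[pos]
        return pos, NOMATCH
    return match


is_a = toy_char('A')
vspace = toy_char('\r\n')
is_digit = toy_char(str.isdigit)
```

Each matcher takes the text and a position and returns a (position, result) pair, mirroring the shape of the parse() result shown earlier.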


Matching sequence of rules, using operator '+':

    #hash symbol('#'), followed by any 'abcABC'
    def hashed_abc(): return char('#') + 'abcABC'


Short circuiting 'OR':

    #hashed_abc from the example above OR any char from 'abc' set
    def hashed_abc_or_abc(): return hashed_abc | 'abc'
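
The '+' and '|' operators can work on rule objects through ordinary Python operator overloading. The toy sketch below shows one way such operators could be built. It is an illustration under stated assumptions, not the implementation used by parsed: ToyRule, toy_char, as_rule and NOMATCH are hypothetical names, and results are returned as plain lists.

```python
# Toy sketch only; ToyRule, toy_char, as_rule and NOMATCH are hypothetical.
NOMATCH = object()  # stand-in for the nomatch constant


class ToyRule(object):
    def __init__(self, fn):
        self.fn = fn  # fn(text, pos) -> (new_pos, result)

    def parse(self, text, pos=0):
        return self.fn(text, pos)

    def __add__(self, other):
        # sequence: both parts must match, one after another
        left, right = self, as_rule(other)

        def seq(text, pos):
            p1, r1 = left.parse(text, pos)
            if r1 is NOMATCH:
                return pos, NOMATCH
            p2, r2 = right.parse(text, p1)
            if r2 is NOMATCH:
                return pos, NOMATCH  # backtrack to the start position
            return p2, [r1, r2]
        return ToyRule(seq)

    def __or__(self, other):
        # ordered choice: the right side is tried only if the left fails
        left, right = self, as_rule(other)

        def choice(text, pos):
            p, r = left.parse(text, pos)
            if r is not NOMATCH:
                return p, r
            return right.parse(text, pos)
        return ToyRule(choice)


def toy_char(chars):
    def match(text, pos):
        if pos < len(text) and text[pos] in chars:
            return pos + 1, text[pos]
        return pos, NOMATCH
    return ToyRule(match)


def as_rule(x):
    # a bare string on the right-hand side is promoted to a char rule
    return x if isinstance(x, ToyRule) else toy_char(x)


hashed_abc = toy_char('#') + 'abcABC'
hashed_abc_or_abc = hashed_abc | 'abc'
```

Because the choice is ordered, the second alternative is never evaluated when the first one matches, which is what "short circuiting" means here.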


Negative match, using prefix operator '~':

    #any character except 'a', does not consume input
    def not_a(): return ~char('a')


Repetition, using slice notation:

    #character 'a' repeated 1 or more times
    def one_or_more_a(): return char('a')[1:]

    #character 'a' repeated 0 or more times
    def zero_or_more_a(): return char('a')[0:]

    #character 'a' or its absence
    def maybe_a(): return char('a')[0:1]
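
The slice syntax maps naturally onto repetition bounds, because Python passes rule[1:] to __getitem__ as a slice object. The sketch below shows that mechanism in isolation. It is a toy under my own assumptions (ToyRule, toy_char and NOMATCH are hypothetical names, and results are collected into a list), not the code of parsed.

```python
# Toy sketch only; ToyRule, toy_char and NOMATCH are hypothetical names.
NOMATCH = object()  # stand-in for the nomatch constant


class ToyRule(object):
    def __init__(self, fn):
        self.fn = fn  # fn(text, pos) -> (new_pos, result)

    def parse(self, text, pos=0):
        return self.fn(text, pos)

    def __getitem__(self, bounds):
        # rule[1:] arrives here as slice(1, None, None)
        low = bounds.start or 0
        high = bounds.stop  # None means "no upper limit"
        inner = self

        def repeat(text, pos):
            results, cur = [], pos
            while high is None or len(results) < high:
                nxt, r = inner.parse(text, cur)
                # stop on failure, or if nothing was consumed (avoids looping forever)
                if r is NOMATCH or nxt == cur:
                    break
                results.append(r)
                cur = nxt
            if len(results) < low:
                return pos, NOMATCH
            return cur, results
        return ToyRule(repeat)


def toy_char(chars):
    def match(text, pos):
        if pos < len(text) and text[pos] in chars:
            return pos + 1, text[pos]
        return pos, NOMATCH
    return ToyRule(match)


one_or_more_a = toy_char('a')[1:]
zero_or_more_a = toy_char('a')[0:]
maybe_a = toy_char('a')[0:1]
```

In this sketch a repetition that is allowed to match zero times succeeds with an empty list; what parsed itself returns in that case is not covered here.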


Lookahead can be expressed either by applying the prefix '-' operator (unary minus) to a rule or by appending a rule to a sequence with the bitwise AND ('&') operator.

    #character 'a', matches only if followed by any character from the
    #'abc' set; the lookahead itself does not consume input
    def a_before_abc(): return char('a') + -char('abc')

    #the same as above, but here the lookahead match is always
    #excluded from the parsing results
    def also_a_before_abc(): return char('a') & char('abc')
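
The key property of a lookahead is that it tests the input without advancing the position. The standalone sketch below shows this for a prefix '-' and a binary '&' operator. All names (ToyRule, toy_char, NOMATCH) are hypothetical and the behaviour is my assumption of the semantics described above, not the actual parsed implementation.

```python
# Toy sketch only; ToyRule, toy_char and NOMATCH are hypothetical names.
NOMATCH = object()  # stand-in for the nomatch constant


class ToyRule(object):
    def __init__(self, fn):
        self.fn = fn  # fn(text, pos) -> (new_pos, result)

    def parse(self, text, pos=0):
        return self.fn(text, pos)

    def __neg__(self):
        # lookahead: report the inner result but never advance
        inner = self

        def ahead(text, pos):
            _, r = inner.parse(text, pos)
            return pos, r
        return ToyRule(ahead)

    def __and__(self, other):
        # left must match AND be followed by other; other's result is dropped
        left, peek = self, -other

        def followed_by(text, pos):
            p, r = left.parse(text, pos)
            if r is NOMATCH or peek.parse(text, p)[1] is NOMATCH:
                return pos, NOMATCH
            return p, r
        return ToyRule(followed_by)


def toy_char(chars):
    def match(text, pos):
        if pos < len(text) and text[pos] in chars:
            return pos + 1, text[pos]
        return pos, NOMATCH
    return ToyRule(match)


peek_a = -toy_char('a')
a_then_abc = toy_char('a') & toy_char('abc')
```

Note that a_then_abc stops right after the 'a': the character that satisfied the lookahead is still available to whatever rule comes next.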

Parsing (semantic) action

    #extract a list of characters from a double-quoted string
    #consisting of 'abc' characters
    def dquoted_abc(): return '"' + char('abc')[1:] + '"' > first

    #combine 2 characters into a string like "{CHAR1}&{CHAR2}" if the first
    #is from the 'abc' set and the second is from 'def'. The lambda must be
    #enclosed in parentheses because it has the lowest precedence
    def abc_def(): return char('abc') + 'def' > (lambda x: x[0] + '&' + x[1])

    #the lookahead result can also be included in the parsing results if
    #the lookahead is declared with prefix '-' rather than binary '&'
    def a_before_abc(): return char('a') + (-char('abc') > value)
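
Attaching an action with '>' relies on the fact that '+' binds tighter than '>' in Python, so the whole sequence on the left is built first and the action is applied to its result. The following toy sketch shows how a '>' operator could wrap a rule. It is only an illustration: ToyRule, toy_pair, toy_first and NOMATCH are hypothetical names and not part of parsed.

```python
# Toy sketch only; ToyRule, toy_pair, toy_first and NOMATCH are hypothetical.
NOMATCH = object()  # stand-in for the nomatch constant


class ToyRule(object):
    def __init__(self, fn):
        self.fn = fn  # fn(text, pos) -> (new_pos, result)

    def parse(self, text, pos=0):
        return self.fn(text, pos)

    def __gt__(self, action):
        # rule > action: run the action on the result of a successful match
        inner = self

        def with_action(text, pos):
            p, r = inner.parse(text, pos)
            if r is NOMATCH:
                return p, NOMATCH
            return p, action(r)
        return ToyRule(with_action)


def toy_pair(first_set, second_set):
    # matches two chars in a row and returns them as a list [c1, c2]
    def match(text, pos):
        if (pos + 1 < len(text) and text[pos] in first_set
                and text[pos + 1] in second_set):
            return pos + 2, [text[pos], text[pos + 1]]
        return pos, NOMATCH
    return ToyRule(match)


def toy_first(items):
    return items[0]


abc_def = toy_pair('abc', 'def') > (lambda x: x[0] + '&' + x[1])
first_of_pair = toy_pair('abc', 'def') > toy_first
```

The parentheses around the lambda are needed for the same reason as in the real examples: without them the lambda body would swallow everything to its right.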


All examples above can be run and tested by running examples/

Predefined rules


Predefined constants

    nomatch #the rule did not match

    empty #end of input (eof) or ignored result


Predefined actions

    list2str #string from list

    first, second #first/second element of list

    nth(N) #returns action extracting Nth list element

    ignore #returns the empty constant
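
An action is just a callable that receives the match result and returns a new value. To make the list above more concrete, here is a guess at what such actions could look like as plain functions. These are hypothetical re-creations with a toy_ prefix, not the definitions shipped with parsed; in particular, the zero-based index in toy_nth is an assumption.

```python
# Hypothetical re-creations for illustration; not the parsed definitions.
EMPTY = object()  # stand-in for the empty constant


def toy_list2str(items):
    # join a list of characters (or strings) into one string
    return ''.join(items)


def toy_nth(n):
    # returns an action that extracts the n-th element (assumed zero-based)
    def action(items):
        return items[n]
    return action


toy_first = toy_nth(0)
toy_second = toy_nth(1)


def toy_ignore(_result):
    # discard whatever was matched
    return EMPTY
```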


Parsing options

  • is_trace: if set to True, the parsing process is traced to stderr

  • is_remember: if True, use memoization

  • use_unicode

  • is_stat: gather matching statistics (currently only hit and miss counts for each rule)
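
Memoization in a PEG parser usually means caching the result of a rule at a given input position so that backtracking does not repeat the same work. The sketch below shows that idea together with simple hit and miss counters. It is a generic illustration under stated assumptions: remember, digit and NOMATCH are hypothetical names, and nothing here describes how is_remember or is_stat are actually implemented in parsed.

```python
# Generic memoization sketch; remember, digit and NOMATCH are hypothetical.
NOMATCH = object()  # stand-in for the nomatch constant


def remember(match):
    """Cache results of match(text, pos) and count cache hits and misses."""
    cache = {}
    stats = {'hits': 0, 'misses': 0}

    def wrapper(text, pos=0):
        key = (text, pos)
        if key in cache:
            stats['hits'] += 1
        else:
            stats['misses'] += 1
            cache[key] = match(text, pos)
        return cache[key]

    wrapper.stats = stats
    return wrapper


@remember
def digit(text, pos=0):
    # matches one decimal digit at the given position
    if pos < len(text) and text[pos].isdigit():
        return pos + 1, text[pos]
    return pos, NOMATCH
```

Calling digit twice with the same text and position runs the matcher once and serves the second call from the cache; digit.stats then shows one hit and one miss.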

What's next?

  • add an option to ignore whitespace

  • add Abstract Syntax Tree visualization

  • parser optimization

  • more unit tests to avoid regressions

  • ?..