Skip to content

Latest commit

 

History

History
219 lines (158 loc) · 5.03 KB

Brackets.md

File metadata and controls

219 lines (158 loc) · 5.03 KB

Nested Brackets Mini-HOWTO

Author:
Andrey Vlasovskikh
License:
Creative Commons Attribution-Noncommercial-Share Alike 3.0
Library Homepage:
http://code.google.com/p/funcparserlib/
Library Version:
0.3.6

Intro

Let's try out funcparserlib using a tiny example: parsing strings of nested curly brackets. It is well known that it can't be done with regular expressions so we need a parser. For more complex examples see The funcparserlib Tutorial or other examples at the funcparserlib homepage.

Here is the EBNF grammar of our curly brackets language:

nested = "{" , { nested } , "}" ;

i. e. nested is a sequence of the symbol {, followed by zero or more occurences of the nested production itself, followed by the symbol }. Let's develop a parser for this grammar.

We will parse plain strings, but in real life you may wish to use funcparserlib.lexer or any other lexer to tokenize the input and parse tokens, not just symbols.

We will use the following funcparserlib functions: a, forward_decl, maybe, finished, many, skip, pretty_tree. The library actually exports only 11 functions, so the API is quite compact.

Nested Brackets Checker

Basic usage:

>>> from funcparserlib.parser import a

>>> brackets = a('{') + a('}')

>>> brackets.parse('{}')
('{', '}')

Let's write a nested brackets checker:

>>> from funcparserlib.parser import forward_decl, maybe

>>> nested = forward_decl()
>>> nested.define(a('{') + maybe(nested) + a('}'))

Test it:

>>> nested.parse('{}')
('{', None, '}')

>>> nested.parse('{{}}')
('{', ('{', None, '}'), '}')

>>> nested.parse('{{}')
Traceback (most recent call last):
    ...
NoParseError: no tokens left in the stream: <EOF>

>>> nested.parse('{foo}')
Traceback (most recent call last):
    ...
NoParseError: got unexpected token: f

>>> nested.parse('{}foo')
('{', None, '}')

In the last test we have parsed only a valid subsequence. Let's ensure that all the input symbols have been parsed:

>>> from funcparserlib.parser import finished

>>> input = nested + finished

Test it:

>>> input.parse('{}foo')
Traceback (most recent call last):
    ...
NoParseError: should have reached <EOF>: f

Allow zero or more nested brackets:

>>> from funcparserlib.parser import many

>>> nested = forward_decl()
>>> nested.define(a('{') + many(nested) + a('}'))
>>> input = nested + finished

Test it:

>>> input.parse('{{}{}}')
('{', [('{', [], '}'), ('{', [], '}')], '}', None)

Skip None, the result of finished:

>>> from funcparserlib.parser import skip

>>> end_ = skip(finished)
>>> input = nested + end_

Test it:

>>> input.parse('{{}{}}')
('{', [('{', [], '}'), ('{', [], '}')], '}')

Textual Parse Tree

Objectify a parse tree:

>>> class Bracket(object):
...     def __init__(self, kids):
...         self.kids = kids
...     def __repr__(self):
...         return 'Bracket(%r)' % self.kids

>>> a_ = lambda x: skip(a(x))
>>> nested = forward_decl()
>>> nested.define(a_('{') + many(nested) + a_('}') >> Bracket)
>>> input = nested + end_

Test it:

>>> nested.parse('{{{}{}}{}}')
Bracket([Bracket([Bracket([]), Bracket([])]), Bracket([])])

Draw a textual parse tree:

>>> from funcparserlib.util import pretty_tree

>>> def ptree(t):
...     def kids(x):
...         if isinstance(x, Bracket):
...             return x.kids
...         else:
...             return []
...     def show(x):
...         if isinstance(x, Bracket):
...             return '{}'
...         else:
...             return repr(x)
...     return pretty_tree(t, kids, show)

Test it:

>>> print ptree(nested.parse('{{{}{}}{}}'))
{}
|-- {}
|   |-- {}
|   `-- {}
`-- {}

>>> print ptree(nested.parse('{{{{}}{}}{{}}{}{{}{}}}'))
{}
|-- {}
|   |-- {}
|   |   `-- {}
|   `-- {}
|-- {}
|   `-- {}
|-- {}
`-- {}
    |-- {}
    `-- {}

Nesting Level

Let's count the nesting level:

>>> def count(x):
...     return 1 if len(x) == 0 else max(x) + 1
>>> nested = forward_decl()
>>> nested.define(a_('{') + many(nested) + a_('}') >> count)
>>> input = nested + end_

Test it:

>>> input.parse('{}')
1
>>> input.parse('{{{}}}')
3
>>> input.parse('{{}{{{}}}}')
4
>>> input.parse('{{{}}{}}')
3