Using a custom parser

Adam Brunnmeier edited this page Jan 18, 2016 · 4 revisions

Parser Generator

See, for example, the waxeye parser generator, which lets you define a Parsing Expression Grammar (PEG). A grammar implementing a subset of Markdown could look like the following:

Document <- *(:'\n' | Block )

### Block Elements

Block <= Header | Blockquote | Codeblock | Linknote | List | Table | Paragraph | Invalid_Block

Header <= Header_L2 | Header_L1
Header_L2 <- :'##' :?' ' *(!'\n' .)
Header_L1 <- :'#' :?' ' *(!'\n' .)

Blockquote <- +( :'>' *(!'\n' Unparsed) (&'\n' Unparsed) )

Codeblock <- :'```' Codelanguage :'\n' *(!'```' .) :'```'
Codelanguage <- *(!'\n' .)

Linknote <- :'[' Link_Text :']: ' Link_Url *:[ \t]

List <- +( :'- ' List_Item :'\n' )
List_Item <- +(!'\n' Span)

Paragraph <- +( +(!'\n' (Newline|Span)) ?'\n' )
Newline <- :'  \n'

Table <- Table_Header Table_Body
Table_Header <- +(:'|' Table_Header_Item) :'\n'
Table_Header_Item <- *(!('|'|'\n') Span)
Table_Body <- *Table_Body_Row
Table_Body_Row <- !'\n' :?'|' Table_Body_Row_Item *(:'|' Table_Body_Row_Item) :'\n'
Table_Body_Row_Item <- *(!('|'|'\n') Span)

Invalid_Block <- +( +(!'\n' .) '\n')

### Span Elements

Span <= Link | Emphasis | Code | Image | .

Link <= Link_Inline | Link_Reference | Link_Auto
Link_Inline <- :'[' Link_Text :'](' Link_Url :')'
Link_Reference <- :'[' Link_Text :'][]'
Link_Auto <- :'<' +(!'>' .) :'>'
Link_Text <- *(!']' Span)
Link_Url <- *(![) \n] .)

Emphasis <= Emphasis_Bold  | Emphasis_Italic
Emphasis_Bold <- :'**' +(!'**' Span) :'**'
Emphasis_Italic <- :'*' +(!'*' Span) :'*'

Code <- :'`' +(!'`' .) :'`'

Image <- :'![' Image_Alt :'](' Image_Url :')'
Image_Alt <- +(!']' Span)
Image_Url <- +(!')' .)

### Unparsed for multi-pass

# I don't know how to parse nested block quotes in a single pass; they can be resolved over several iterations
Unparsed <- .
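
One way to read the multi-pass idea is that the first pass only collects the quoted characters, and a later pass removes one level of `>` markers and parses the remaining text again. The following Python sketch illustrates that on plain lines of text. It is a hand-written stand-in, not code generated by waxeye: `Quote` and `resolve_quotes` are hypothetical names, and stripping one space after the `>` is an assumption that the grammar above does not make.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Quote:
    """A block quote whose children are plain lines or nested quotes."""
    children: List[Union[str, "Quote"]] = field(default_factory=list)


def resolve_quotes(lines: List[str]) -> List[Union[str, Quote]]:
    """Group consecutive '>' lines into a Quote, then re-run on the inner lines.

    Each recursive call plays the role of one further parsing pass.
    """
    result: List[Union[str, Quote]] = []
    i = 0
    while i < len(lines):
        if lines[i].startswith(">"):
            inner = []
            while i < len(lines) and lines[i].startswith(">"):
                # Drop one '>' marker and at most one following space (assumption).
                stripped = lines[i][1:]
                if stripped.startswith(" "):
                    stripped = stripped[1:]
                inner.append(stripped)
                i += 1
            # The inner text is still "unparsed": resolve it in another pass.
            result.append(Quote(resolve_quotes(inner)))
        else:
            result.append(lines[i])
            i += 1
    return result
```

Each level of nesting costs one additional pass, so the number of iterations equals the deepest quote level in the input.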

The parsed tree can then be rendered to HTML with, for example, a function that traverses the tree.
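
A minimal sketch of such a traversal in Python is shown below. The `Node` class is a hypothetical stand-in for whatever tree type the generated parser returns (a rule name plus a list of children that are either nodes or single characters), and the `TAGS` mapping from rule names to HTML tags is an assumption chosen to match the grammar above.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List, Union


@dataclass
class Node:
    """Hypothetical parse-tree node: a rule name plus child nodes or characters."""
    type: str
    children: List[Union["Node", str]] = field(default_factory=list)


# Assumed mapping from grammar rule names to HTML tags.
TAGS = {
    "Header_L1": "h1",
    "Header_L2": "h2",
    "Paragraph": "p",
    "Blockquote": "blockquote",
    "List": "ul",
    "List_Item": "li",
    "Emphasis_Bold": "strong",
    "Emphasis_Italic": "em",
    "Code": "code",
}


def to_html(node: Union[Node, str]) -> str:
    """Depth-first traversal: render the children, then wrap them in the node's tag."""
    if isinstance(node, str):
        return escape(node)  # leaf character: escape <, > and &
    inner = "".join(to_html(child) for child in node.children)
    tag = TAGS.get(node.type)
    # Rules without a mapping (e.g. Document) pass their children through unchanged.
    return f"<{tag}>{inner}</{tag}>" if tag else inner


tree = Node("Document", [
    Node("Header_L1", list("Title")),
    Node("Paragraph", ["a", " ", Node("Emphasis_Bold", list("b"))]),
])
print(to_html(tree))  # <h1>Title</h1><p>a <strong>b</strong></p>
```

Rules that carry attributes, such as links, images and tables, would need their own cases (for example, reading `Link_Url` into an `href`), which this sketch leaves out.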

#Stuff, things etc...