Parsing Rules

Parsing is the first step of converting markdown to HTML. The parsing rules are responsible for separating markdown syntax from plain text. The parsers scan the markdown content and use various rules to produce a list of tokens.

Each token represents either a piece of markdown syntax (for example, the beginning of a fenced code block, a list item, etc.) or plain text that will be included as-is (escaped) in the final HTML. The tokens are later used by the Renderer to produce the actual HTML.

There are three kinds of parsing rules in Remarkable:

  1. Core rules
  2. Block rules
  3. Inline rules

Each kind uses different data structures and signatures. Unless you wish to modify the internal workflow of Remarkable, you will most likely only deal with Block and Inline rules.

Parsing tokens

Tokens come in three kinds:

  1. Tag token
  2. Content (text) token
  3. Block content (inline) tokens

All tokens have the following properties:

  • type: The type of the token
  • level: The nesting level of the associated markdown structure in the source.

Tokens generated by block parsing rules also include a lines property, a two-element array marking the first and last line of the src used to generate the token.

A parsing rule usually generates at least three tokens:

  1. The start or open token marking the beginning of the markdown structure
  2. The content token (usually with type inline for a block rule, or text for an inline rule)
  3. The end or close token marking the end of the markdown structure
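
The open/content/close pattern above can be sketched with plain objects. This is only an illustration for a one-line paragraph: the type names paragraph_open, inline and paragraph_close and the level values are assumptions chosen for the example, the lines property is left out for brevity, and describe is a hypothetical helper that is not part of Remarkable.

```javascript
// Hypothetical token list for the source "Hello world".
// Only `type` and `level` are shown on every token, as described above;
// the concrete type names and level values are assumptions.
const tokens = [
  { type: 'paragraph_open', level: 0 },
  { type: 'inline', level: 1, content: 'Hello world', children: [] },
  { type: 'paragraph_close', level: 0 },
];

// Hypothetical helper: one line per token, indented by its nesting level.
function describe(tokenList) {
  return tokenList.map(token => '  '.repeat(token.level) + token.type);
}

console.log(describe(tokens).join('\n'));
```

Indenting by level makes the nesting visible: the content token sits one level deeper than the open and close tokens that surround it.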

Tag token

Tag tokens represent markdown syntax. Each tag token corresponds to a piece of special markdown syntax in the original source, and they are usually used for the open and close tokens. Examples include the "```" at the beginning of a fenced code block, the start of a list item, or the end of an emphasized part of a line.

Tag tokens come in a variety of types, and each type is associated with a rendering rule.

Content (text) token

Text tokens represent plain text. They are usually used for the content of inline structures. Most are generated automatically by the inline parser, but inline parsing rules sometimes generate them explicitly.

A text token has a content property containing the text it represents.
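
To illustrate how tag tokens and text tokens differ once they reach a renderer, here is a minimal sketch: tag tokens are looked up by type, while text tokens contribute their escaped content. The names renderRules, renderTokens and escapeHtml, the single-argument rule signature, and the em_open / em_close type names are assumptions made for this example, not Remarkable's actual renderer API.

```javascript
// Hypothetical HTML escaping for text content.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical lookup from token type to a render function.
// Tag tokens map to fixed markup; the text token emits its escaped content.
const renderRules = {
  em_open: () => '<em>',
  em_close: () => '</em>',
  text: token => escapeHtml(token.content),
};

// Render a flat token list by applying the rule registered for each type.
function renderTokens(tokenList) {
  return tokenList.map(token => renderRules[token.type](token)).join('');
}

const html = renderTokens([
  { type: 'em_open', level: 0 },
  { type: 'text', content: 'a < b', level: 1 },
  { type: 'em_close', level: 0 },
]);

console.log(html); // <em>a &lt; b</em>
```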

Block content (inline) tokens

Inline tokens represent the content of a block structure. These tokens have two additional properties:

  • content: The content of the block. This may include inline markdown syntax that needs further processing by the inline rules.
  • children: Initialized as an empty array ([]) and filled with the inline parser's tokens as the inline parsing rules are applied.
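
As a rough picture of these two properties, the sketch below shows what an inline token might look like after the inline rules have run on the content "Hello *world*". The child token types (text, em_open, em_close) and their level values are assumptions for illustration, and plainText is a hypothetical helper, not a Remarkable function.

```javascript
// Hypothetical inline token after inline parsing has filled `children`.
const inlineToken = {
  type: 'inline',
  level: 1,
  content: 'Hello *world*', // raw block content, still containing inline syntax
  children: [
    { type: 'text', content: 'Hello ', level: 0 },
    { type: 'em_open', level: 0 },
    { type: 'text', content: 'world', level: 1 },
    { type: 'em_close', level: 0 },
  ],
};

// Hypothetical helper: join the content of the text tokens among the children.
function plainText(token) {
  return token.children
    .filter(child => child.type === 'text')
    .map(child => child.content)
    .join('');
}

console.log(plainText(inlineToken)); // Hello world
```

Note that content keeps the original markdown, while children holds the parsed result, so the emphasis markers appear only in the former.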