This repository has been archived by the owner on Aug 7, 2020. It is now read-only.

Add Parse Process doc section
chrisjsewell committed Mar 11, 2020
1 parent a483a91 commit 6373e1e
Showing 2 changed files with 74 additions and 41 deletions.
112 changes: 71 additions & 41 deletions docs/using/intro.md
@@ -45,7 +45,6 @@ import mistletoe

with open('foo.md', 'r') as fin:
    rendered = mistletoe.markdown(fin)

```

{py:func}`mistletoe.markdown` defaults to
@@ -120,10 +119,11 @@ and some \textit{italics}

To exert even greater control over the parsing process,
renderers can be initialised with an existing {py:class}`~mistletoe.parse_context.ParseContext` instance.
This class stores global variables that are utilised during the parsing process, such as the block/span tokens to search for,
This class stores global variables that are utilised during the parsing process,
such as the block/span tokens to search for,
and link/footnote definitions that have been collected.
At any one time, one of these objects is set per thread;
set by {py:func}`~mistletoe.parse_context.set_parse_context` and
which can be changed by {py:func}`~mistletoe.parse_context.set_parse_context` and
retrieved by {py:func}`~mistletoe.parse_context.get_parse_context`.
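
The "one object per thread" behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use mistletoe: `ToyContext`, `get_toy_context` and `set_toy_context` are hypothetical stand-ins, and the use of `threading.local` is an assumption about how such a per-thread global might be held, not a description of the library's internals.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class ToyContext:
    """Hypothetical stand-in for a parse context (not mistletoe's class)."""
    link_definitions: dict = field(default_factory=dict)


# Each thread sees its own attributes on this object.
_local = threading.local()


def get_toy_context(reset=False):
    """Return this thread's context, creating (or recreating) it on demand."""
    if reset or not hasattr(_local, "context"):
        _local.context = ToyContext()
    return _local.context


def set_toy_context(context):
    """Replace this thread's context."""
    _local.context = context


# The main thread stores a definition in its own context.
get_toy_context().link_definitions["key"] = ("link", "target")

seen_in_worker = []


def worker():
    # A new thread starts with a fresh, empty context.
    seen_in_worker.append(dict(get_toy_context().link_definitions))


thread = threading.Thread(target=worker)
thread.start()
thread.join()
```

In this sketch the worker thread records an empty dictionary, while the main thread's context still holds the `key` definition.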

In the following example, we use the {py:class}`~mistletoe.renderers.html.HTMLRenderer` to parse a file:
@@ -169,29 +169,22 @@ To parse the text only to the mistletoe AST, the general entry point is the {py:
(although actually all block tokens have a ``read`` method that can be used directly).

```python
from mistletoe import Document

text = """
Here's some *text*
1. a list
> a *quote*"""
doc = Document.read(text)
doc
```

```python
>> from mistletoe import Document
>> text = """
.. Here's some *text*
..
.. 1. a list
..
.. > a *quote*"""
>> doc = Document.read(text)
>> doc
Document(children=3, link_definitions=0, footnotes=0, footref_order=0, front_matter=None)
```

All tokens have a `children` attribute:

```python
doc.children
```

```python
>> doc.children
[Paragraph(children=2, position=(2, 2)),
List(children=1, loose=False, start_at=1, position=(3, 4)),
Quote(children=1, position=(6, 6))]
@@ -201,11 +194,8 @@ or you can walk through the entire syntax tree, using the
{py:meth}`~mistletoe.base_elements.Token.walk` method:

```python
for item in doc.walk():
    print(item)
```

```python
>> for item in doc.walk():
..     print(item)
WalkItem(node=Paragraph(children=2, position=(2, 2)), parent=Document(children=3, link_definitions=0, footnotes=0, footref_order=0, front_matter=None), index=0, depth=1)
WalkItem(node=List(children=1, loose=False, start_at=1, position=(3, 4)), parent=Document(children=3, link_definitions=0, footnotes=0, footref_order=0, front_matter=None), index=1, depth=1)
WalkItem(node=Quote(children=1, position=(6, 6)), parent=Document(children=3, link_definitions=0, footnotes=0, footref_order=0, front_matter=None), index=2, depth=1)
@@ -221,27 +211,67 @@ WalkItem(node=RawText(), parent=Paragraph(children=1, position=(4, 4)), index=0,
WalkItem(node=RawText(), parent=Emphasis(children=1), index=0, depth=4)
```

Finally you could even build your own AST programmatically!
You could even build your own AST programmatically!

```python
from mistletoe import block_tokens, span_tokens, HTMLRenderer

doc = block_tokens.Document(children=[
    block_tokens.Paragraph(
        position=(0, 1),
        children=[
            span_tokens.Emphasis(
                position=(0, 1),
                children=[span_tokens.RawText("hallo")]
            )
        ])
])
HTMLRenderer().render(doc)
>> from mistletoe import block_tokens, span_tokens, HTMLRenderer
>> doc = block_tokens.Document(children=[
..     block_tokens.Paragraph(
..         position=(0, 1),
..         children=[
..             span_tokens.Emphasis(
..                 position=(0, 1),
..                 children=[span_tokens.RawText("hallo")]
..             )
..         ])
.. ])
>> HTMLRenderer().render(doc)
"<p><em>hallo</em></p>"
```

```html
<p><em>hallo</em></p>
### The Parse Process

At a lower level, the actual parsing process is split into two stages:

1. The full source text is read into an AST with all the span/inline level text stored
as raw text in {py:class}`~mistletoe.base_elements.SpanContainer`.
This allows all link definitions and (if included) footnote definitions to be read,
before references are processed.
2. We walk through this intermediary AST and 'expand' each `SpanContainer`
to produce all the span tokens, inspecting the global context for available definitions.
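
To see why two stages help, here is a toy, self-contained sketch that does not use mistletoe at all. The regular expressions and the helper names `read_blocks` and `expand_spans` are hypothetical and far simpler than real Markdown rules; the only point is that a reference can be resolved even though its definition appears later in the source.

```python
import re

# Hypothetical, simplified patterns; real Markdown rules are far richer.
DEFINITION = re.compile(r'^\[(\w+)\]:\s*(\S+)(?:\s+"([^"]*)")?\s*$')
REFERENCE = re.compile(r'\[([^\]]+)\]\[(\w+)\]')


def read_blocks(lines):
    """Stage 1: collect link definitions, keep paragraph text unparsed."""
    definitions, paragraphs = {}, []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines only separate blocks in this toy
        match = DEFINITION.match(stripped)
        if match:
            key, target, title = match.groups()
            definitions[key] = (target, title or "")
        else:
            paragraphs.append(stripped)
    return paragraphs, definitions


def expand_spans(raw, definitions):
    """Stage 2: resolve references now that every definition is known."""
    def replace(match):
        text, key = match.groups()
        if key not in definitions:
            return match.group(0)  # leave unresolved references untouched
        target, title = definitions[key]
        return '<a href="{0}" title="{1}">{2}</a>'.format(target, title, text)
    return REFERENCE.sub(replace, raw)


lines = ["a [text][key]\n", "\n", '[key]: link "target"\n']
paragraphs, definitions = read_blocks(lines)
expanded = [expand_spans(p, definitions) for p in paragraphs]
```

After stage 1, `paragraphs` still holds the raw string `a [text][key]`; only stage 2 turns the reference into a link, because by then `definitions` contains `key`.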

This process is illustrated in the following example, using the lower level parse method,
{py:func}`~mistletoe.block_tokenizer.tokenize_main`:

```python
>> from mistletoe.block_tokenizer import tokenize_main
>> paragraph = tokenize_main(["a [text][key]\n", "\n", '[key]: link "target"\n'], expand_spans=False)[0]
>> paragraph.children
SpanContainer('a [text][key]')
```

```python
>> from mistletoe.parse_context import get_parse_context
>> get_parse_context()
ParseContext(blocks=11,spans=9,link_defs=1,footnotes=0)
>> get_parse_context().link_definitions
{'key': ('link', 'target')}
```

```python
>> paragraph.children.expand()
[RawText(), Link(target='link', title='target')]
```

````{important}
If directly using {py:func}`~mistletoe.block_tokenizer.tokenize_main`,
you should (a) ensure all lines are terminated with `\n`, and
(b) ensure that the global context is reset (if you don't want to use previously read definitions):
```python
>> get_parse_context(reset=True)
```
````
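
For point (a), a small helper along the following lines could prepare a string before passing it on as a list of lines. This is only a sketch: `to_terminated_lines` is a hypothetical name, not part of mistletoe, and it assumes the input uses `\n` as its only line separator.

```python
def to_terminated_lines(text):
    """Split text into lines, each guaranteed to end with a newline."""
    parts = text.split("\n")
    # A trailing "\n" leaves an empty final part; drop it so no extra
    # blank line is appended.
    if parts[-1] == "":
        parts.pop()
    return [part + "\n" for part in parts]


lines = to_terminated_lines('a [text][key]\n\n[key]: link "target"')
```

Here `lines` has three entries, and the last one gains the `\n` it was missing in the input string.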

(intro/performance)=

3 changes: 3 additions & 0 deletions mistletoe/base_elements.py
@@ -142,6 +142,9 @@ def __iter__(self):
    def __len__(self):
        return 0

    def __repr__(self):
        return "{0}({1})".format(self.__class__.__name__, repr(self.text))


class SourceLines:
    """A class for storing source lines and tracking current line index.
