
Markdown2Python (md2py)

md2py converts markdown into a Python parse tree. This allows you to navigate a markdown file as a document structure.

See tex2py for a LaTeX parse tree.


Markdown2Python offers a single function, md2py, which generates a Python object from markdown text. This object is a navigable "Tree of Contents" abstraction of the markdown file.

Take, for example, the following markdown file.

# Chikin Tales

## Chapter 1 : Chikin Fly

Chickens don't fly. They do only the following:

- waddle
- plop

### Waddling

## Chapter 2 : Chikin Scream

### Plopping

Plopping involves three steps:

1. squawk
2. plop
3. repeat, unless ordered to squat

### I Scream

Akin to a navigation bar, the TreeOfContents object allows you to expand a markdown file one level at a time. Running md2py on the markdown file above generates a tree that abstracts the following structure.

          Chikin Tales
          /           \
    Chapter 1       Chapter 2
      /               /     \
  Waddling      Plopping    I Scream
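
To make the nesting concrete, here is a minimal, standard-library-only sketch of how headings could be folded into such a tree. It is not md2py's implementation (md2py relies on BeautifulSoup); the `Node` class and `build_tree` function are hypothetical names used only for illustration, and the sketch ignores body text and lists.

```python
import re


class Node:
    """Hypothetical tree node: a heading title, its level, and its child headings."""

    def __init__(self, title, level):
        self.title = title
        self.level = level
        self.branches = []

    def __repr__(self):
        return self.title


def build_tree(markdown):
    """Nest each '#'-style heading under the nearest preceding shallower heading."""
    root = Node("[document]", 0)
    stack = [root]  # path from the root to the most recent heading
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*\S)\s*$", line)
        if not match:
            continue  # body text and list items are skipped in this sketch
        node = Node(match.group(2), len(match.group(1)))
        # Pop until the top of the stack is a shallower heading: the parent.
        while stack[-1].level >= node.level:
            stack.pop()
        stack[-1].branches.append(node)
        stack.append(node)
    return root


markdown = """\
# Chikin Tales
## Chapter 1 : Chikin Fly
### Waddling
## Chapter 2 : Chikin Scream
### Plopping
### I Scream
"""

root = build_tree(markdown)
title = root.branches[0]
print(title, title.branches)
print([chapter.branches for chapter in title.branches])
```

The stack always holds the chain of open headings, so each new heading attaches to the closest ancestor with a smaller level, which reproduces the shape drawn above.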

At the global level, we can access the title.

>>> from md2py import md2py
>>> toc = md2py(markdown)  # markdown holds the text of the file above
>>> toc.h1
Chikin Tales
>>> str(toc.h1)
'Chikin Tales'

Notice that at this level, there are no h2s.

>>> list(toc.h2s)
[]

The main h1 has two h2s beneath it. We can access both.

>>> list(toc.h1.h2s)
[Chapter 1 : Chikin Fly, Chapter 2 : Chikin Scream]
>>> toc.h1.h2
Chapter 1 : Chikin Fly

In total, there are three h3s in this document. However, only one h3 is actually nested within 'Chapter 1 : Chikin Fly' (accessible via toc.h1.h2). As a result, toc.h1.h2.h3s yields only one h3.

>>> list(toc.h1.h2.h3s)
[Waddling]
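
The scoping described here, where an attribute such as h3s only sees headings directly beneath the current node, can be sketched with `__getattr__`. This is an illustrative assumption about the behavior, not md2py's actual code: `TocNode` is a hypothetical class, the tree is built by hand, and the plural attributes return plain lists rather than whatever iterable md2py produces.

```python
import re


class TocNode:
    """Hypothetical node: `tag` is a heading level such as 'h2'; `branches` are direct children."""

    def __init__(self, tag, title, branches=()):
        self.tag = tag
        self.title = title
        self.branches = list(branches)

    def __getattr__(self, name):
        # Only reached for names not found normally, e.g. `h2` or `h3s`.
        match = re.fullmatch(r"(h[1-6])(s?)", name)
        if match is None:
            raise AttributeError(name)
        tag, plural = match.groups()
        found = [child for child in self.branches if child.tag == tag]
        if plural:
            return found  # every direct child with that tag
        return found[0] if found else None  # first direct child, if any

    def __repr__(self):
        return self.title


doc = TocNode("[document]", "[document]", [
    TocNode("h1", "Chikin Tales", [
        TocNode("h2", "Chapter 1 : Chikin Fly", [TocNode("h3", "Waddling")]),
        TocNode("h2", "Chapter 2 : Chikin Scream", [
            TocNode("h3", "Plopping"),
            TocNode("h3", "I Scream"),
        ]),
    ]),
])

print(doc.h2s)        # no h2 sits directly under the document
print(doc.h1.h2s)     # both chapters
print(doc.h1.h2.h3s)  # only the h3 nested under Chapter 1
```

Because the lookup never descends past direct children, the two h3s under Chapter 2 are invisible from Chapter 1.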

The TreeOfContents class also has a few more conveniences defined. Among them is support for indexing. To access the ith child of an <element> - instead of <element>.branches[i] - use <element>[i].

See below for example usage.

>>> toc.h1.branches[0] == toc.h1[0] == toc.h1.h2
True
>>> list(toc.h1.h2s)[1] == toc.h1[1]
True
>>> toc.h1[1]
Chapter 2 : Chikin Scream
>>> list(toc.h1[1].h3s)
[Plopping, I Scream]
>>> list(map(str, toc.h1[1].h3s))
['Plopping', 'I Scream']
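
For readers curious how such indexing can be supported, the sketch below delegates `__getitem__` to the list of children. It is a hypothetical stand-in written for illustration (the `TocNode` class here is not part of md2py), and it assumes that branches is an ordinary list of direct children.

```python
class TocNode:
    """Hypothetical node whose direct children are reachable by index."""

    def __init__(self, title, branches=()):
        self.title = title
        self.branches = list(branches)

    def __getitem__(self, i):
        # Delegate indexing (and slicing) to the list of direct children.
        return self.branches[i]

    def __repr__(self):
        return self.title


h1 = TocNode("Chikin Tales", [
    TocNode("Chapter 1 : Chikin Fly", [TocNode("Waddling")]),
    TocNode("Chapter 2 : Chikin Scream", [TocNode("Plopping"), TocNode("I Scream")]),
])

print(h1[1])
print(h1[0] is h1.branches[0])
print([str(h3) for h3 in h1[1].branches])
```

Delegating to the underlying list keeps the shorthand and the explicit branches form in agreement by construction.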


Install via pip.

pip install md2py

Additional Notes

  • Behind the scenes, md2py uses BeautifulSoup. All md2py objects have a source attribute containing a BeautifulSoup object.