In this notebook we convert Markdown to Python code using `mistletoe`. Our requirements are:

* All of the Markdown is compiled to Python.
    * We'll use strings, not comments.
* Each line of the Python should correspond to the same line of the Markdown, so that tracebacks stay meaningful.
* We use the Markdown AST as the source.
* We achieve our goal by wrapping Markdown in quotes and indenting appropriately.
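
As a rough illustration of the last requirement, the sketch below quotes the prose lines of a small document and leaves the indented code alone. It is a simplified stand-in, not the converter defined later in this notebook: the helper name `quote_prose` is hypothetical, it assumes every line indented by four or more spaces is code, and it assumes the prose contains no `"""`.

```python
import textwrap

def quote_prose(source):
    """Hypothetical helper: wrap runs of prose in triple quotes, keep code lines."""
    out, prose = [], []

    def flush():
        # Emit buffered prose as one triple-quoted string on the same lines.
        if prose:
            block = '"""' + '\n'.join(prose) + '"""'
            out.extend(textwrap.indent(block, '    ').splitlines())
            prose.clear()

    for line in source.splitlines():
        if line.startswith('    '):      # assumption: 4+ spaces means code
            flush()
            out.append(line)
        elif line.strip():               # non-blank, unindented: prose
            prose.append(line)
        else:                            # blank lines pass through untouched
            flush()
            out.append(line)
    flush()
    # Everything now shares a 4-space margin, which dedent removes.
    return textwrap.dedent('\n'.join(out))


md = "A title\n\n    x = 1\n\nSome prose\nover two lines\n\n    y = x + 1\n"
py = quote_prose(md)
namespace = {}
exec(py, namespace)                      # the result is ordinary Python source
```

Because each prose line is replaced in place, the output has the same number of lines as the input, which is what keeps traceback line numbers aligned.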

> Outstanding issue: Code in Lists.

In [704]:
import functools, textwrap  # textwrap is needed by markdown_to_python below
import mistletoe
__all__ = 'markdown_to_python',

* `mistletoe.block_tokenizer.FileWrapper` counts lines for us in its `_index` attribute.
* `mistletoe.block_tokenizer.ParseBuffer` collects the tokens read from those lines.
* All we care about is splitting the source into `mistletoe.block_token.BlockCode` and `mistletoe.block_token.Paragraph` tokens, with special conditions for `mistletoe.block_token.List`.
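
The peek-and-backstep pattern that the tokenizer relies on can be sketched without mistletoe. The `LineCursor` class below is a hypothetical stand-in written for illustration; it mirrors the idea of a wrapper that tracks the current line index, but it is not mistletoe's `FileWrapper` implementation.

```python
class LineCursor:
    """Hypothetical stand-in for a line wrapper that tracks its position."""

    def __init__(self, lines):
        self.lines = list(lines)
        self._index = -1                 # index of the last line consumed

    def __iter__(self):
        return self

    def __next__(self):
        if self._index + 1 >= len(self.lines):
            raise StopIteration
        self._index += 1
        return self.lines[self._index]

    def peek(self):
        # Look at the next line without consuming it; None at the end.
        nxt = self._index + 1
        return self.lines[nxt] if nxt < len(self.lines) else None

    def backstep(self):
        # Un-consume the most recent line so the next reader can take it.
        self._index -= 1


cursor = LineCursor(['prose', '    code', '    more code', 'prose again'])
next(cursor)                             # consume 'prose'
block = []
for line in cursor:
    if not line.startswith('    '):
        cursor.backstep()                # this line belongs to the next token
        break
    block.append(line)
```

After the loop, `block` holds the two indented lines and `cursor.peek()` still returns `'prose again'`, so a following reader starts at the right place.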


In [635]:
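def tokenize_block(source_lines):
    # `source_lines` is a list of lines; the buffer collects (token type, lines, start, end) tuples.
    lines, parse_buffer = mistletoe.block_tokenizer.FileWrapper(source_lines), mistletoe.block_tokenizer.ParseBuffer()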
    line, start = lines.peek(), lines._index + 1  # _index sits one before the next unread line, so the first token starts at 0
    
    block_code_cls  = BlockCode.new()
    
    while line is not None:
        
        for token_type in (mistletoe.block_token.BlockCode,  mistletoe.block_token.Paragraph):
            if issubclass(token_type, mistletoe.block_token.BlockCode): token_type = block_code_cls
            if token_type.start(line):
                result = token_type.read(lines)
                if result is not None:
                    end = lines._index+1
                    while result and not result[-1].strip(): end -= 1; result.pop()
                    while result and not result[0].strip(): start += 1; result.pop(0)
                    parse_buffer.append((token_type, result, start, end))
                    start = end
                    break
        else: 
            next(lines)
            parse_buffer.loose = True
        line = lines.peek()
    return parse_buffer

In [669]:
class BlockCode(mistletoe.block_token.BlockCode):
    # Indent of the first non-blank code line; later code must be at least this deep.
    first_code_indent = None

    @classmethod
    def start(cls, line):
        if line.replace('\t', '    ', 1).startswith(' '*(cls.first_code_indent or 4)):
            if line.strip() and cls.first_code_indent is None:  cls.first_code_indent = len(line)-len(line.lstrip())
            return True
        return False

    @classmethod
    def read(cls, lines):
        line_buffer = []
        for line in lines:
            if line.strip() and cls.first_code_indent is None:
                cls.first_code_indent = len(line)-len(line.lstrip())
            if line.strip() == '': line_buffer.append(''); continue
            # A less-indented line ends the code block; hand it back to the tokenizer.
            if not line.replace('\t', '    ', 1).startswith(' '*(cls.first_code_indent or 4)): lines.backstep(); break
            line_buffer.append(line)
        return line_buffer

    @classmethod
    def new(cls):
        # Fresh subclass per document so the remembered indent is not shared.
        return type(cls.__name__, (cls,), {'first_code_indent': None})

`markdown_to_python` converts a string to a valid Python source string. It satisfies the requirements laid out at the top of the notebook.

* `mistletoe.block_token.BlockCode` triggers most of the logic, because at that point we know both the prior source and the new source.
* `mistletoe.block_token.Paragraph` content is wrapped in quotes.

> Lists don't work yet.

In [671]:
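def quote(text, punc=''):
    """Wrap `text` in triple quotes, keeping surrounding blank lines outside the quotes."""
    text, leading_ws = ''.join(text), []
    lines = text.splitlines(True)
    _ = '"""'
    if _ in text: _ = "'''"              # switch delimiter if the text already contains """
    if not text.strip(): _ = punc = ''   # nothing to quote
    # Move leading blank lines in front of the opening quote.
    while lines and not lines[0].strip(): leading_ws.append(lines.pop(0))
    text = ''.join(lines)
    end = len(text.rstrip())
    text, ending_ws = text[:end], text[end:]
    if text and text.endswith(_[0]): text += ' '   # keep a trailing quote character off the closing delimiter
    return F"{''.join(leading_ws)}{_}{text}{_}{punc}{ending_ws}"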
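
The delimiter choice in `quote` can be shown in isolation. The sketch below uses a hypothetical helper, `wrap`, that reproduces only two details: switching to `'''` when the text already contains `"""`, and padding with a space when the text ends in the delimiter's quote character. Like the function above, it does not handle text that contains both triple-quote styles, and it assumes the text has no backslashes.

```python
import ast

def wrap(text):
    """Hypothetical, reduced version of the quoting step."""
    delim = '"""'
    if delim in text:
        delim = "'''"                    # fall back when the text contains """
    if text.endswith(delim[0]):
        text += ' '                      # avoid four quote characters in a row
    return delim + text + delim


plain = wrap('Just some prose')
nested = wrap('He wrote """docs""" here')
trailing = wrap('ends with a "quote"')

# Each result is a valid Python string literal.
values = [ast.literal_eval(s) for s in (plain, nested, trailing)]
```

The padded space changes the string's value slightly, which is acceptable here because the string is only a carrier for the prose.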

In [677]:
def markdown_to_python(object):
    final, buffer = [], []
    object = object.splitlines()
    indent = 0
    for t, lines, start, end in tokenize_block(object):
        if issubclass(t, mistletoe.block_token.BlockCode):
            if final:
                last_line = get_last_line(reversed(final))
                prior_indent = get_line_indent(last_line)
                definition, returns =  last_line.rstrip().endswith(':'), last_line.lstrip().startswith('return')
                indent = (returns and min or max)(prior_indent, get_line_indent(get_last_line(object[start:end])))
                if definition and prior_indent == indent: indent += 4
            else:  indent = get_line_indent(get_last_line(object[start:end])) 

            while (len(final)+len(buffer)) < start: buffer += ['']
                
            start_quote = len(final)
            while start_quote<start:
                if object[start_quote].strip(): break
                start_quote +=1
                
            final.extend(['']*(start_quote-len(final))
                         + textwrap.indent(quote('\n'.join(object[start_quote:start])), ' '*indent).splitlines()
                         + object[start:end]) 
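    else:  # for/else: the loop never breaks, so this always runs and quotes any Markdown left after the last code block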
        for last_line in reversed(object[len(final):]):
            if last_line.strip(): break
        
        final.extend(
            (
                textwrap.indent(quote('\n'.join(object[len(final):end])).rstrip() + ['',';'][last_line.rstrip().endswith(';')], ' '*indent)
            ).splitlines()) 
    return textwrap.dedent('\n'.join(final))
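
The indentation rule applied before each code block can be isolated as follows. The function name `choose_indent` is hypothetical; it restates the three lines that compute `indent` above, taking the previous non-empty output line and the first non-empty line of the upcoming code block as arguments.

```python
def get_line_indent(line):
    return len(line) - len(line.lstrip())

def choose_indent(last_line, next_code_line):
    """Hypothetical restatement of how the quoted prose is indented."""
    prior = get_line_indent(last_line)
    upcoming = get_line_indent(next_code_line)
    opens_block = last_line.rstrip().endswith(':')
    returns = last_line.lstrip().startswith('return')
    # After a return the block may be closing, so take the smaller indent;
    # otherwise follow the deeper of the two lines.
    indent = (min if returns else max)(prior, upcoming)
    if opens_block and prior == indent:
        indent += 4                      # prose right after `def f():` must sit inside the body
    return indent


inside_body = choose_indent('def f():', '    print(100)')
after_return = choose_indent('    return x', 'assert True')
flat_after_def = choose_indent('def f():', 'g()')
```

Here `inside_body` is 4, `after_return` is 0, and `flat_after_def` is 4, so a quoted paragraph placed between a definition and its body is indented as part of that body.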


### Helpers

`get_last_line` returns the first non-empty line it encounters, so passing a reversed list yields the last non-empty line.

In [678]:
def get_last_line(lines, line=''):
    for line in lines or ['']: 
        if line.strip(): break
    return line

`get_line_indent` returns the indent of a line.

In [679]:
def get_line_indent(line):  return len(line) - len(line.lstrip())

In [703]:
def cmp(a, b):
    for A, B in zip(*map(str.splitlines, (a, b))):
        if A.strip().strip('"').strip("'").strip() == B.strip().strip('"').strip("'").strip(): continue
        return False
    return True

s="""    
        ...
                    
        def f():
Test is a test

            print(100)
            return Stuff
* Continuing along

        assert True, \\
* This is stuff


"""
assert cmp(s, markdown_to_python(s)), """The converter is wrong"""