The `pidgin.tangle` module allows __Markdown__ as source code.  __Tangling__ is the first phase of a literate 
computing REPL.  The __tangle__ step converts documentation-forward __Markdown__ to __Python__ code 
based on a set of rules defined on __Markdown__ block code objects.  `pidgin.tangle` relies only on
the `mistune.BlockLexer` step to transform code.

This module exports a magic to experiment with the output syntax.  _The output could probably be cooler._

A few configurable variables are placed on the `IPython` shell.

### The `IPython` tangle step

The `IPython` tangle phase is broken down below.  An input string is passed through `string` and `ast.NodeTransformer` 
methods to `compile` valid __Python__ byte code.

1. `ip.input_transformer_manager.transform_cell` should return valid python code.  There are two steps:

    1. `ip.input_transformers_cleanup` modifies the source input; `pidgin.tangle.pidgin_transformer`, which calls `pidgin.tangle.markdown_to_python`, is the first transformer visited.
    0. `ip.input_transformers_post` should return a valid __Python__ source string.
0. `ip.compile.ast_parse` converts the string to `ast`
0. `ip.ast_transformers` are `ast.NodeTransformer`s applied to each node in the `ast`.

    `pidgin.tangle` adds the ability to reference the `return` statement outside of a function à la the [`%autoawait` magic][autoawait].
0. `ip.compile` converts the `ast` nodes to bytecode with `compile`.

[_More information about the input transforms can be found in the `IPython` documentation._][input transforms]

[autoawait]: https://ipython.readthedocs.io/en/stable/interactive/autoawait.html
[input transforms]: ip.input_transformers_cleanup
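
The phases listed above can be sketched with the standard library alone.  This is a minimal illustration, not the `IPython` implementation: `cleanup`, `DoubleInts`, and `run_cell` are hypothetical names standing in for a cleanup transform, an `ast.NodeTransformer`, and the shell's execution loop.

```python
import ast

def cleanup(lines):
    # Hypothetical cleanup transform: takes a list of lines, returns a list of lines.
    return [line.rstrip() + "\n" for line in lines]

class DoubleInts(ast.NodeTransformer):
    # Hypothetical ast transformer: doubles every integer literal.
    def visit_Constant(self, node):
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            return ast.copy_location(ast.Constant(value=node.value * 2), node)
        return node

def run_cell(source, namespace):
    lines = cleanup(source.splitlines(True))             # 1. string transforms
    tree = ast.parse("".join(lines))                     # 2. source string to ast
    tree = DoubleInts().visit(tree)                      # 3. ast transformers
    ast.fix_missing_locations(tree)
    code = compile(tree, "<cell>", "exec")               # 4. ast to bytecode
    exec(code, namespace)
    return namespace

ns = run_cell("x = 21   \n", {})
```

After the call, `ns["x"]` holds the value produced by the transformed `ast`, not the literal written in the source string.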

    from IPython import get_ipython; ip = get_ipython()
    %reload_ext pidgin.tangle

`import pidgin.tangle` modifies the `get_ipython().input_transformer_manager` to accept __Markdown__ source.  The `pidgin.tangle` module
exports:
    
* `pidgin.tangle.markdown_to_python` - a semi-lossless __Markdown__ to __Python__ converter.


In [1]:
    import doctest, re, ast, mistune, textwrap, traitlets, functools, itertools, IPython, importnb, fnmatch, string
    __all__ = 'markdown_to_python',

In [2]:
    class PidginTangle(traitlets.config.SingletonConfigurable): 
        """`PidginTangle` modifies the __tangle__ phase of the cell execution."""

        markdown = traitlets.Bool(True, 
                                  help="""Convert __Markdown__ source to Python & execute the block code.""")
    
        return_display = traitlets.Bool(True, help=
    """`PidginTangle.return_display` enables <code>return</code> outside of a function scope à la async or Python 2.7 print statements.  
    _This approach makes it easier to mature cells to functions._""")
        
        display = traitlets.Bool(True, help="""When set to `False` the cell output is captured.""")
        
    config = PidginTangle()

## Exports

In [3]:
    class PidginBlockGrammar(mistune.BlockGrammar):
        doctest = doctest.DocTestParser._EXAMPLE_RE
        block_html = re.compile(
            r'^ *(?:%s|%s|%s) *(?:\n{2,}|\s*$)' % (
                r'<!--[\s\S]*?-->|<!DOCTYPE [\s\S]*?>|<\?[\s\S]*?\?>',
                r'<(%s)((?:%s)*?)>([\s\S]*?)<\/\1>' % (mistune._block_tag, mistune._valid_attr),
                r'<%s(?:%s)*?\s*\/?>' % (mistune._block_tag, mistune._valid_attr)))       


In [4]:
    class PidginBlockLexer(mistune.BlockLexer): 
        grammar_class = PidginBlockGrammar        
        @staticmethod
        def to_string(m): return m.string[slice(*m.span())].splitlines(True)

        def parse_doctest(self, m):
            self.tokens.append({'type': 'doctest', 'text': m.string[slice(*m.span())]})
            
        def parse(self, text, rules=None):
            # Whitespace-only lines (for example stray leading tabs) can bork the markdown parser,
            # so replace them with empty lines before parsing.
            return super().parse('\n'.join(line if line.strip() else '' for line in ''.join(text).splitlines()))
        
    PidginBlockLexer.default_rules = list(PidginBlockLexer.default_rules)
    PidginBlockLexer.default_rules.insert(
        PidginBlockLexer.default_rules.index('block_quote'), 'doctest'
    )
    PidginBlockLexer.footnote_rules = list(PidginBlockLexer.footnote_rules)
    PidginBlockLexer.footnote_rules.insert(
        PidginBlockLexer.footnote_rules.index('block_quote'), 'doctest'
    )
    PidginBlockLexer.list_rules = list(PidginBlockLexer.list_rules)
    PidginBlockLexer.list_rules.insert(
        PidginBlockLexer.list_rules.index('block_quote'), 'doctest'
    )
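
The `doctest` rule registered above reuses the regular expression that the standard library applies to interactive examples.  The sketch below shows what that pattern captures when it is matched at the start of a block, which is how a block rule is tried.  `_EXAMPLE_RE` is a private attribute of `doctest.DocTestParser`, so its behavior is an implementation detail that may change between Python versions.

```python
import doctest

# Private attribute of the standard library; the grammar above reuses it as a block rule.
EXAMPLE_RE = doctest.DocTestParser._EXAMPLE_RE

text = ">>> 1 + 1\n2\n\nA paragraph that follows the example.\n"
match = EXAMPLE_RE.match(text)

# The prompt line(s) and the expected output are captured as named groups.
source, want = match.group("source"), match.group("want")

# Whatever follows the example is left for the remaining block rules.
remainder = text[match.end():]
```

Because the example is consumed as a single token, the `>>>` lines are not mistaken for a block quote or a paragraph.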

In [5]:
    class TanglePidginBlockLexer(PidginBlockLexer):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.original = None
            self.raw, self.unindented, self.indented = [], [], []
            self.min_indent = 0
            
        def parse(self, text, rules=None): 
            
            if any((self.original, self.raw, self.unindented, self.indented)): 
                return super().parse(text, rules)
            
            self.original = ''.join(text).splitlines(True)
            tokens = super().parse(text, rules)
            
            while self.original:  self.raw.append(self.original.pop(0))
            self.min_indent = self.min_indent or 4
            self.format(punc=';')
            self.indent()
            final = ''.join(self.indented)
            return final
            
        def clear_original(self):
            while self.original and not self.original[0].strip(): 
                self.raw += [self.original.pop(0)]
                
        def pop_text(self, m)->str:
            text = []
            lines = self.to_string(m)
            # Remove empty lines
            while lines and not lines[0].strip(): lines.pop(0)
            
            # Drop the lines from the original body
            if self.original:
                while lines:
                    line = lines.pop(0)
                    if line.strip():
                        while line.strip() not in self.original[0]: 
                            text += [self.original.pop(0)]
                        text += [self.original.pop(0)]
            return text
        
        def parse_generic(self, m): 
            self.raw.extend(self.pop_text(m))
            
        parse_doctest = parse_block_html = parse_block_quote = parse_fences =\
        parse_heading = parse_hrule = parse_lheading = parse_newline =\
        parse_nptable = parse_paragraph = parse_table = parse_text =  parse_generic
        
        def parse_block_code(self, m=None):
            # The body goes above the code; the buffer is non-code.
            self.format()
            self.raw.extend(m and self.pop_text(m) or [])
            m and super().parse_block_code(m)
            self.indent()
            
        def format(self, punc=''):
            self.clear_original()            
            if self.raw: 
                last_line = get_first_line(reversed(self.indented)).rstrip()
                lines = [line for line in self.raw]
                if not last_line.endswith(('"""', "'''")):
                    lines = [quote(lines, punc)]
                self.unindented.extend(lines)
                self.raw = []
                        
            
        def indent(self):
            # Extract the first line of the current code block.
            first_line = get_first_line(self.raw)
            # Join the current code block and the pending non-code body into strings.
            code = ''.join(self.raw)                            
            body = ''.join(self.unindented)            
            
            # The last non-blank line appended so far.
            last_line = get_first_line(reversed(self.indented))
            
            # The current indent level so far.
            prior_indent = get_line_indent(last_line)

            # Does the last line enter a block statement
            definition = last_line.rstrip().endswith(':')
            returns = _has_return(self.indented)

            this_indent = get_line_indent(get_first_line(self.raw))
            
            # Assign the minimum indent 
            if not self.min_indent: 
                self.min_indent = this_indent
            
            if this_indent < self.min_indent:
                code = textwrap.indent(code, ' '*(self.min_indent-this_indent))
                this_indent = get_line_indent(get_first_line(code.splitlines()))

                
            # Normalize the indent we'll assign the body+code
            indent = max(self.min_indent, (returns and min or max)(prior_indent, this_indent))        
            
            if definition:
                if prior_indent >= indent:
                    indent = (prior_indent + 4)
                
                body = hanging_indent(textwrap.indent(body, ' '*self.min_indent), ' '*(indent-self.min_indent))
            else:
                body = textwrap.indent(body, ' '*indent)
                
            # Cell Magics
            if code.lstrip().startswith('%%'):
                # Cell magics can be split across __Markdown__ blocks.  With this 
                # approach conditional blocks can be used with magics.
                code = (' '*this_indent) + importnb.loader.dedent(code)
                # might have to add lines if line sized changed.
                
            if self.min_indent:
                self.indented.extend(body.splitlines(True) + code.splitlines(True))
                self.unindented = []
            self.raw = []
            return ''       

        def parse_def_footnotes(self, m):
            self.format()
            text = ''.join(self.pop_text(m))
            key, sep, body = text.lstrip('[').lstrip('^').partition(']:')
            key = quote(key)
            quoted = quote(body.lstrip())
            self.unindented.extend(F"""globals()[{key}] = {quoted if quoted.strip() else "None"}""".splitlines(True))
            
        def parse_def_links(self, m):
            self.parse_block_code()
            text = ''.join(self.pop_text(m))
            key, sep, body = text.lstrip('[').lstrip('^').partition(']:')
            key = quote(key)

            self.unindented.extend(F"""globals()[{key}] = globals().get({key}, {quote(body.lstrip(), ')')}""".splitlines(True))
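
The effect of `parse_def_links` can be illustrated without `mistune`.  The sketch below is an assumption-laden simplification: `LINK_DEF` and `link_to_assignment` are hypothetical names, the regular expression is not the grammar `mistune` uses, and `repr` is used for quoting in place of `quote`.

```python
import re

# Simplified pattern for `[key]: target` and `[^key]: target` lines
# (an assumption, not the grammar `mistune` uses).
LINK_DEF = re.compile(r"^\[\^?([^\]]+)\]:\s*(.*)$")

def link_to_assignment(line):
    match = LINK_DEF.match(line.strip())
    if match is None:
        return None
    key, target = match.groups()
    # `.get` keeps a value that is already defined, as in `parse_def_links`.
    return f"globals()[{key!r}] = globals().get({key!r}, {target!r})"

statement = link_to_assignment(
    "[autoawait]: https://ipython.readthedocs.io/en/stable/interactive/autoawait.html")

fresh, existing = {}, {"autoawait": "already defined"}
exec(statement, fresh)      # defines the name
exec(statement, existing)   # leaves the earlier definition in place
```

The footnote variant in `parse_def_footnotes` assigns directly, so a later footnote with the same key replaces the earlier value.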

`markdown_to_python` converts __Markdown__ source to python code with the following opinions on markdown block elements defined by `mistune`.

* `mistune.BlockGrammar.block_code` is executed as __Python__ and guides the indentation of non-code blocks.
* `mistune.BlockGrammar.def_footnotes` and `mistune.BlockGrammar.def_links` define their names in the `globals` so that links and references can be reused across documents.
* All other blocks are wrapped in block strings and indented relative to the `mistune.BlockGrammar.block_code`.
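
The rules above can be approximated in a few lines.  The sketch below is a deliberately naive stand-in, not the `TanglePidginBlockLexer` algorithm: the hypothetical `simple_tangle` treats every line indented by four spaces as code, wraps everything else in a block string, and ignores indentation tracking, link definitions, and text that itself contains triple quotes.

```python
def simple_tangle(source):
    """Hypothetical, simplified tangle: indented lines are code, the rest are strings."""
    out, prose = [], []

    def flush():
        # Emit the buffered prose as a block string statement.
        text = "".join(prose).strip()
        if text:
            out.append('"""' + text + '""";\n')
        prose.clear()

    for line in source.splitlines(True):
        if line.startswith("    ") and line.strip():
            flush()
            out.append(line[4:].rstrip("\n") + "\n")  # dedent the code line
        else:
            prose.append(line)
    flush()
    return "".join(out)

markdown = "# Title\n\nSome prose.\n\n    x = 1 + 1\n\nMore prose.\n"
python_source = simple_tangle(markdown)

namespace = {}
exec(python_source, namespace)
```

The tangled source is valid __Python__: the prose survives as string expressions and only the indented block has an effect when executed.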

The <code>%%markdown_to_python</code> cell magic may be used to display the __Python__ source generated from a cell.

In [6]:
    def markdown_to_python(str)->"Valid Python Source":         
        return TanglePidginBlockLexer().parse(str)

    @functools.wraps(markdown_to_python)
    def markdown_to_python_magic_wrapper(line, cell) -> IPython.display.DisplayObject: return IPython.display.Pretty(markdown_to_python(cell))

`pidgin_transformer` is an `ip.input_transformers_cleanup` item that is placed at the beginning of the `list`.

In [7]:
    def pidgin_transformer(lines: "that end with a newline."): 
        lines = lines + ['\n']
        ip = IPython.get_ipython()
        if not ip.tangle.display:
            """# Not functional yet.  Please help"""
        if lines[0].startswith('%%'):
            """Don't do anything if the cell starts with an immediate cell magic.
            Indented cell magic are transformed on a block level."""
        elif ip.tangle.markdown: 
            lines = markdown_to_python(lines).splitlines(True)
        return lines

## The <code>return</code> statement

Use `return` outside of a function or class as a display expression.

This opinion brings interactive code and portable code closer together.

The `return` statement has a similar feel to the Python 2.7 `print` statement.  It is used to show the rich display of its value.

In [8]:
    import ast, IPython
    class ReturnDisplay(ast.NodeTransformer):
        # Leave `return` statements inside (async) function bodies untouched.
        def visit_FunctionDef(self, node): return node
        visit_AsyncFunctionDef = visit_FunctionDef
        
        def visit_Return(self, node): 
            # A bare `return` has no value to display.
            if node.value is not None and IPython.get_ipython().tangle.return_display:
                return ast.Expr(
                    ast.Call(
                        func=ast.parse('__import__("IPython").display.display', mode='eval').body, 
                        args=node.value.elts if isinstance(node.value, ast.Tuple) else [node.value], 
                        keywords=[]))
            return node
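
The same rewrite can be tried outside of `IPython`.  In this minimal sketch the hypothetical `ReturnToCall` transformer replaces a top-level `return` with a call to a named function, and a plain callback named `show` stands in for `IPython.display.display`.  It relies on `ast.parse` accepting `return` outside of a function, since that restriction is enforced when the code is compiled.

```python
import ast

class ReturnToCall(ast.NodeTransformer):
    """Hypothetical variant of `ReturnDisplay` that calls an arbitrary named function."""
    def __init__(self, func_name):
        self.func_name = func_name

    # Do not descend into functions, so their `return` statements keep their meaning.
    def visit_FunctionDef(self, node): return node
    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Return(self, node):
        if node.value is None:
            return node
        # A returned tuple is unpacked into separate positional arguments.
        args = node.value.elts if isinstance(node.value, ast.Tuple) else [node.value]
        call = ast.Call(func=ast.Name(id=self.func_name, ctx=ast.Load()), args=args, keywords=[])
        return ast.copy_location(ast.Expr(call), node)

shown = []
source = "def f():\n    return 1\nreturn f(), 'two'\n"
tree = ReturnToCall("show").visit(ast.parse(source))
ast.fix_missing_locations(tree)
exec(compile(tree, "<cell>", "exec"), {"show": lambda *args: shown.append(args)})
```

The `return` inside `f` still returns from the function, while the top-level `return` passes both values to `show`.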

## Utility functions

`quote` wraps non-code objects in triple quotes.

In [9]:
    def quote(str, punc=''):
        str, leading_ws = ''.join(str), []
        lines = str.splitlines(True)
        _ = '"""'
        if _ in str: _ = "'''"
        if not str.strip(): _ = punc = ''
        while lines and (not lines[0].strip()): leading_ws.append(lines.pop(0))    
        str = ''.join(lines)
        end = len(str.rstrip())
        str, ending_ws = str[:end], str[end:]
        if str and str.endswith(_[0]): str += ' '                    
        return F"{''.join(leading_ws)}{_}{str}{_}{punc}{ending_ws}"
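
The edge cases that `quote` handles are easier to see with descriptive names.  The sketch below is a restatement under assumptions: `quote_block` is a hypothetical name, it accepts a single string where `quote` also accepts a list of lines, and it is intended to mirror the behavior of `quote` without replacing it.

```python
def quote_block(text, punc=""):
    """Hypothetical restatement of `quote` for a single string."""
    if not text.strip():
        return text  # whitespace-only input is passed through unquoted
    # Switch delimiters when the text already contains triple double quotes.
    delimiter = "'''" if '"""' in text else '"""'
    lines = text.splitlines(True)
    leading = []
    while lines and not lines[0].strip():
        leading.append(lines.pop(0))  # keep leading blank lines outside the string
    body = "".join(lines)
    end = len(body.rstrip())
    body, trailing = body[:end], body[end:]  # keep trailing whitespace outside too
    if body.endswith(delimiter[0]):
        body += " "  # avoid four quote characters in a row
    return f"{''.join(leading)}{delimiter}{body}{delimiter}{punc}{trailing}"

plain = quote_block("Hello\n", ";")
ends_with_quote = quote_block('say "hi"\n')
padded = quote_block("\n\nText\n")
blank = quote_block("   \n")
```

Keeping the surrounding whitespace outside of the string preserves the line count of the source, which helps tracebacks point at the right line.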

`get_first_line` returns the first non-blank string in `lines`.

In [10]:
    def get_first_line(lines, line=''):
        for line in lines or ['']: 
            if line.strip(): break
        return line

`get_line_indent` computes the indent of a string.

In [11]:
    def get_line_indent(line):  return len(line) - len(line.lstrip())

The __Lexer__ classes only consider coarse features of the __Markdown__ spec.  `_has_return` reports whether the last statement of the tangled code is a `return`.

In [12]:
    def _has_return(code):
        code = '\n'.join(code)
        if 'return ' not in code: return False
        code = importnb.loader.dedent(code)
        try:
            node = ast.parse(code)
            # Descend into the last statement of each nested body.
            while hasattr(node, 'body'): node = node.body[-1]
            return isinstance(node, ast.Return)
        except Exception: return False

In [13]:
    def hanging_indent(str, indent):
        out = """"""
        for line in str.splitlines(True):
            if not line.strip(): 
                out += line
            else:
                if out.strip(): out += line
                else: out += indent+line
        return out
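
`hanging_indent` only indents the first non-blank line, which is enough when the body is a block string that follows a definition: the remaining lines sit inside the string and do not affect the block structure.  The sketch below restates the function so that it runs on its own and applies it the way `indent` does after a line that ends with `:`.  The indent widths are chosen for illustration.

```python
import textwrap

def hanging_indent(text, indent):
    # Same logic as the function above, restated so the example is self-contained.
    out = ""
    for line in text.splitlines(True):
        if line.strip() and not out.strip():
            out += indent + line  # only the first non-blank line gets the extra indent
        else:
            out += line
    return out

docstring = '"""Add one.\nSecond line."""\n'
body = hanging_indent(textwrap.indent(docstring, " " * 4), " " * 4)
source = "def f(x):\n" + body + "        return x + 1\n"

namespace = {}
exec(source, namespace)
```

The opening quotes land at the indentation of the code that follows, so the block string becomes the docstring of `f`.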

In [14]:
    def load_ipython_extension(ip=None):
        ip = ip or IPython.get_ipython()
        ip.tangle = config.instance()
        IPython.core.magic.register_cell_magic(markdown_to_python_magic_wrapper)
        ip.input_transformer_manager.cleanup_transforms = [pidgin_transformer] + [
            object for object in ip.input_transformer_manager.cleanup_transforms
            if object not in {pidgin_transformer, IPython.core.inputtransformer2.classic_prompt}
        ]
        ip.ast_transformers.append(ReturnDisplay())

    def unload_ipython_extension(ip=None):
        ip = ip or IPython.get_ipython()
        try: delattr(ip, 'tangle')
        except AttributeError: ...  # the extension was never loaded
        ip.input_transformer_manager.cleanup_transforms = [
            object for object in ip.input_transformer_manager.cleanup_transforms
            if object is not pidgin_transformer]
        ip.ast_transformers = [x for x in ip.ast_transformers if not isinstance(x, ReturnDisplay)]

    if __name__ == '__main__': 
        load_ipython_extension()
        ip = get_ipython()