`pidgin.tangle` is an `IPython` extension that permits __Markdown__ as valid source code; the source is transpiled into valid `IPython`-flavored __Python__.

>>> %reload_ext pidgin
>>> shell = IPython.get_ipython()
>>> result = shell.run_cell("""
...     foo = 42""")
>>> assert not any((result.error_before_exec, result.error_in_exec))

In [1]:
    if __name__ == '__main__':
        %reload_ext pidgin
        import pidgin, IPython, deathbeds
        shell = get_ipython()
        get_ipython().Completer.use_jedi=False
        shell.display_formatter.html = True

The first `pidgin` idiom is that code input is written in the __Markdown__ markup language, which requires code statements and expressions to be explicit.  `pidgin.tangle`, the first execution step of the literate source code, transpiles __Markdown__ into valid __Python__ code.  The transformation is line-for-line so that tracebacks remain usable to the author.

In [2]:
    def markdown_to_python(str: "Markdown Source") -> "Valid Python Source": 
        return TangleBlockLexer().parse(''.join(str))
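
To make the line-for-line idea concrete, here is a minimal sketch. It is not `TangleBlockLexer`: it assumes a much simpler rule in which lines indented by four spaces are code and every other non-blank line is prose. The names `is_prose` and `toy_tangle` are hypothetical and exist only for this illustration.

```python
def is_prose(line: str) -> bool:
    """Assumption for this sketch: prose is any non-blank line not indented by four spaces."""
    return bool(line.strip()) and not line.startswith("    ")


def toy_tangle(source: str) -> str:
    """Wrap runs of prose in triple-quoted strings and dedent code, one output line per input line."""
    lines = source.splitlines(True)
    out = []
    for index, line in enumerate(lines):
        if not is_prose(line):
            # Dedent code by four spaces; pass blank lines through unchanged.
            out.append(line[4:] if line.strip() else line)
            continue
        body = line.rstrip("\r\n")
        newline = line[len(body):]
        if index == 0 or not is_prose(lines[index - 1]):
            body = '"""' + body  # open the string on the first line of a prose run
        if index + 1 == len(lines) or not is_prose(lines[index + 1]):
            if body.endswith('"'):
                body += " "  # keep the closing delimiter unambiguous
            body += '""";'  # close the string on the last line of the run
        out.append(body + newline)
    return "".join(out)


markdown = "# Title\n\nSome prose.\n\n    foo = 42\n"
python = toy_tangle(markdown)
namespace = {}
exec(python, namespace)  # the prose becomes string expressions, the code runs
```

Because every input line maps to exactly one output line, a traceback raised by the tangled code points at the matching line of the Markdown source. Prose that itself contains `"""` is not handled by this sketch.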

In [3]:
    import traitlets, typing as t
    
    class PidginTangle(traitlets.config.SingletonConfigurable): 
        """`PidginTangle` modifies the __tangle__ phase of the cell execution."""
        markdown = traitlets.Bool(True, help="""Convert __Markdown__ source to Python & execute the block code.""")
        def __call__(self, lines: t.List[str]) -> t.List[str]:
            if self.markdown: 
                lines = markdown_to_python(lines + ['\n']).splitlines(True)
            return lines
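
IPython's cleanup transforms are callables that receive a cell as a list of lines, each keeping its line ending, and return a list of lines. The sketch below mimics that contract without IPython, so the join, convert, and split round trip in `__call__` is easier to see. `ToggleTransform` and `shout` are hypothetical stand-ins for `PidginTangle` and `markdown_to_python`.

```python
from typing import Callable, List


def shout(source: str) -> str:
    """Hypothetical string-to-string conversion standing in for markdown_to_python."""
    return source.upper()


class ToggleTransform:
    """Hypothetical stand-in for PidginTangle: a lines-to-lines callable with an on/off flag."""

    def __init__(self, convert: Callable[[str], str], enabled: bool = True):
        self.convert = convert
        self.enabled = enabled

    def __call__(self, lines: List[str]) -> List[str]:
        if self.enabled:
            # Join the lines, convert the whole cell at once, then split
            # again while keeping the line endings (keepends=True).
            lines = self.convert("".join(lines)).splitlines(True)
        return lines


transform = ToggleTransform(shout)
cell = ["foo = 1\n", "bar = 2\n"]
converted = transform(cell)   # the conversion is applied to the whole cell
transform.enabled = False
untouched = transform(cell)   # the flag turns the transform into a no-op
```

Keeping the flag on the callable, as `PidginTangle.markdown` does with a `traitlets.Bool`, lets the transform stay registered while the conversion is switched off.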

`pidgin.tangle` uses `shell.input_transformer_manager` to transform text into input that can be `compile`d as __Python__ source, so that the user's statements are `exec`ed and expressions are `eval`ed.  `IPython` provides two transformation steps.

`pidgin` must be injected first; after the `pidgin.tangle.markdown_to_python` conversion, the source is valid Python.


[transformers]: https://ipython.readthedocs.io/en/stable/config/inputtransforms.html

In [4]:
    def load_ipython_extension(shell):
        shell.input_transformers_cleanup = [PidginTangle().instance()] + [
            object for object in shell.input_transformers_cleanup
            if not((object == IPython.core.inputtransformer2.classic_prompt) or isinstance(object, PidginTangle))]
        
        try: IPython.core.magic.register_cell_magic(markdown_to_python_magic_wrapper)
        except Exception: shell.log.error("Unable to load the pidgin.tangle cell magic.")

In [5]:
    def load_ipython_extension(shell):
        shell.input_transformer_manager.cleanup_transforms = [PidginTangle().instance()] + [
            object for object in shell.input_transformer_manager.cleanup_transforms
            if not(
                (object == IPython.core.inputtransformer2.classic_prompt) or isinstance(object, PidginTangle)
            )
        ]
        try: IPython.core.magic.register_cell_magic(markdown_to_python_magic_wrapper)
        except Exception: shell.log.error("Unable to load the pidgin.tangle cell magic.")

        shell.ast_transformers.append(ReturnYieldDisplay())
            

    load_ipython_extension(get_ipython())
    unload_ipython_extension(get_ipython())

Weaving always follows tangling, but tangling can exist without weaving.  This is explicit code.

Tangle `exec`s statements and `eval`s expressions.  The normal IPython kernel includes a weaving step.

`ast.Interactive` and `ast.Module` nodes are executed in `ip.run_ast_nodes`.

## `pidgin.tangle` controls the first execution of the literate source

`pidgin.tangle` modifies the configurable `interactive_shell` object returned by `IPython.get_ipython()`.

### The `interactive_shell = IPython.get_ipython()`


1. `interactive_shell.kernel.do_execute` is triggered when a cell in the notebook is executed
    1. `interactive_shell.run_cell`
        1. `interactive_shell.transform_cell` `pidgin.tangle`
        2. `interactive_shell.compile.ast_parse`
        3. `interactive_shell.transform_ast` `pidgin.tangle`
        4. `interactive_shell.run_ast_nodes`
            1. `interactive_shell.display_pub` `interactive_shell.ast_node_interactivity`
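
The nested steps above can be approximated with the standard library alone. This sketch is not IPython's implementation: `run_cell_sketch`, `source_transform`, and `ast_transform` are hypothetical names, and the display step is reduced to compiling a trailing expression in `'single'` mode so that `sys.displayhook` shows its value.

```python
import ast
from typing import Callable, Optional


def run_cell_sketch(
    source: str,
    namespace: dict,
    source_transform: Callable[[str], str] = lambda text: text,
    ast_transform: Optional[ast.NodeTransformer] = None,
) -> None:
    """Hypothetical, stdlib-only outline of the steps inside run_cell."""
    # 1. transform_cell: rewrite the text of the cell.
    source = source_transform(source)
    # 2. ast_parse: turn the text into an ast.Module.
    module = ast.parse(source)
    # 3. transform_ast: rewrite the tree.
    if ast_transform is not None:
        module = ast.fix_missing_locations(ast_transform.visit(module))
    if not module.body:
        return
    # 4. run_ast_nodes: exec the statements as an ast.Module, then compile a
    #    trailing expression as ast.Interactive so that its value is displayed.
    statements, last = module.body[:-1], module.body[-1]
    if not isinstance(last, ast.Expr):
        statements, last = module.body, None
    exec(compile(ast.Module(body=statements, type_ignores=[]), "<cell>", "exec"), namespace)
    if last is not None:
        exec(compile(ast.Interactive(body=[last]), "<cell>", "single"), namespace)


namespace = {}
run_cell_sketch("x = 20\nx + 1\n", namespace, source_transform=lambda text: text.replace("20", "41"))
```

In this outline `pidgin.tangle` would supply both hooks: the text transform that turns Markdown into Python, and the AST transform that rewrites the tree before it runs.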

`markdown_to_python` converts __Markdown__ source to Python code with the following opinions on the Markdown block elements defined by `mistune`.

* `mistune.BlockGrammar.block_code` is executed as Python and guides the indenting of non-code blocks.
* `mistune.BlockGrammar.def_footnotes` and `mistune.BlockGrammar.def_links` define their names in `globals` so that links and references can be reused across documents.
* All other blocks are wrapped in block strings and indented relative to the `mistune.BlockGrammar.block_code`.
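
As a rough illustration of the second opinion, a link definition can be turned into an assignment in `globals()`. The sketch mirrors the shape of the statement that `parse_def_links` emits, but the regular expression and the name `link_definition_to_python` are assumptions made for this example; they are not part of `pidgin` or `mistune`.

```python
import re

# Simplified pattern for "[key]: target"; mistune's own grammar is more permissive.
LINK_DEFINITION = re.compile(r"^\[(?P<key>[^\]]+)\]:\s*(?P<target>.*)$")


def link_definition_to_python(line: str) -> str:
    """Return a Python statement that stores the link target under its key."""
    match = LINK_DEFINITION.match(line.strip())
    if match is None:
        raise ValueError(f"not a link definition: {line!r}")
    key, target = match.group("key"), match.group("target")
    # repr() produces safely quoted literals; .get() keeps a value that already exists.
    return f"globals()[{key!r}] = globals().get({key!r}, {target!r})"


statement = link_definition_to_python(
    "[transformers]: https://ipython.readthedocs.io/en/stable/config/inputtransforms.html"
)
namespace = {}
exec(statement, namespace)  # namespace["transformers"] now holds the URL
```

Using `globals().get(key, default)` means a name that is already defined keeps its value, so a document can reuse a reference that an earlier document or cell has set.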

The <code>%%markdown_to_python</code> cell magic may be used to display the __Python__ source that a cell tangles to.

In [3]:
    import doctest, re, ast, mistune, textwrap, traitlets, functools, itertools, IPython, importnb, fnmatch, string, importlib, typing as t
    __all__ = 'markdown_to_python',

## Exports

Documentation and testing.

In [6]:
    class DocTestGrammar:
        doctest = doctest.DocTestParser._EXAMPLE_RE

In [7]:
    class HtmlGrammar:        
        block_html = re.compile(
            r'^ *(?:%s|%s|%s) *(?:\n{2,}|\s*$)' % (
                r'<!--[\s\S]*?-->|<!DOCTYPE [\s\S]*?>|<\?[\s\S]*?\?>',
                r'<(%s)((?:%s)*?)>([\s\S]*?)<\/\1>' % (mistune._block_tag, mistune._valid_attr),
                r'<%s(?:%s)*?\s*\/?>' % (mistune._block_tag, mistune._valid_attr)))       

In [8]:
    class BlockGrammar(DocTestGrammar, HtmlGrammar, mistune.BlockGrammar): ...

In [9]:
    class DoctestLexer:
        def parse_doctest(self, m): self.tokens.append({'type': 'doctest', 'text': m.string[slice(*m.span())]})

In [10]:
    class BlockLexer(DoctestLexer, mistune.BlockLexer): 
        grammar_class = BlockGrammar        
        def parse(self, text: str, rules=None) -> t.List[t.Dict]:
            # Whitespace-only lines (for example, stray leading tabs) can break the Markdown parser,
            # so replace them with empty lines before parsing.
            return super().parse('\n'.join(str if str.strip() else '' for str in ''.join(text).splitlines()))
        
    for key in "default_rules footnote_rules list_rules".split():
        setattr(BlockLexer, key, list(getattr(BlockLexer, key)))
        getattr(BlockLexer, key).insert(getattr(BlockLexer, key).index('block_quote'), 'doctest')

In [2]:
    class TangleBlockLexer(BlockLexer):
        raw = unindented = indented = None
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.original = None
            self.raw, self.unindented, self.indented = [], [], []
            self.min_indent = 0
            
        def parse(self, text: str, rules=None) -> t.List[t.Dict]: 
            if text.startswith('%%'): return text
            
            if any((self.original, self.raw, self.unindented, self.indented)): 
                return super().parse(text, rules)
            
            self.original = ''.join(text).splitlines(True)
            
            super().parse(text, rules) # returns tokens but we dont need them.
            
            while self.original:  self.raw.append(self.original.pop(0))
            self.min_indent = self.min_indent or 4
            self.format(punc=';')
            self.indent()
            final = ''.join(self.indented)
            return final
            
        def clear_original(self) -> None:
            while self.original and not self.original[0].strip(): 
                self.raw += [self.original.pop(0)]
                
        def pop_text(self, m: t.Match) -> t.List[str]:
            text = []
            lines = m.string[slice(*m.span())].splitlines(True)
            # Remove empty lines
            while lines and not lines[0].strip(): lines.pop(0)
            
            # Drop the lines from the original body
            if self.original:
                while lines:
                    line = lines.pop(0)
                    if line.strip():
                        while line.strip() not in self.original[0]: 
                            text += [self.original.pop(0)]
                        text += [self.original.pop(0)]
            return text
        
        def parse_generic(self, m: t.Match) -> None: 
            self.raw.extend(self.pop_text(m))
            
        parse_doctest = parse_block_html = parse_block_quote = parse_fences =\
        parse_heading = parse_hrule = parse_lheading = parse_newline =\
        parse_nptable = parse_paragraph = parse_table = parse_text =  parse_generic
        
        def parse_block_code(self, m=None):
            # The body goes above the code; the buffer is non-code.
            self.format()
            self.raw.extend(m and self.pop_text(m) or [])
            m and super().parse_block_code(m)
            self.indent()
            
        def format(self, punc=''):
            self.clear_original()            
            if self.raw: 
                last_line = get_first_line(reversed(self.indented)).rstrip()
                lines = [line for line in self.raw]
                if not last_line.endswith(('"""', "'''")):
                    lines = [quote(lines, punc)]
                self.unindented.extend(lines)
                self.raw = []
                        
            
        def indent(self):
            # Extract the first line of the current code block.
            first_line = get_first_line(self.raw)
            # Join the current code block and the buffered body.
            code = ''.join(self.raw)
            body = ''.join(self.unindented)
            
            # The last non-blank line appended so far.
            last_line = get_first_line(reversed(self.indented))
            
            # The current indent level so far.
            prior_indent = get_line_indent(last_line)

            # Does the last line enter a block statement
            definition = last_line.rstrip().endswith(':')
            returns = _has_return(self.indented)

            this_indent = get_line_indent(get_first_line(self.raw))
            
            # Assign the minimum indent 
            if not self.min_indent: 
                self.min_indent = this_indent
            
            if this_indent < self.min_indent:
                code = textwrap.indent(code, ' '*(self.min_indent-this_indent))
                this_indent = get_line_indent(get_first_line(code.splitlines()))

                
            # Normalize the indent we'll assign the body+code
            indent = max(self.min_indent, (returns and min or max)(prior_indent, this_indent))        
            
            if definition:
                if prior_indent >= indent:
                    indent = (prior_indent + 4)
                
                body = hanging_indent(textwrap.indent(body, ' '*self.min_indent), ' '*(indent-self.min_indent))
            else:
                body = textwrap.indent(body, ' '*indent)
                
            # Cell Magics
            if code.lstrip().startswith('%%'):
                # Cell magics can be split across __Markdown__ blocks.  With this 
                # approach conditional blocks can be used with magics.
                code = (' '*this_indent) + importnb.loader.dedent(code)
                # might have to add lines if line sized changed.
                
            if self.min_indent:
                self.indented.extend(body.splitlines(True) + code.splitlines(True))
                self.unindented = []
            self.raw = []
            return ''       

        def parse_def_footnotes(self, m):
            self.format()
            text = ''.join(self.pop_text(m))
            key, sep, body = text.lstrip('[').lstrip('^').partition(']:')
            key = quote(key)
            quoted = quote(body.lstrip())
            self.unindented.extend(F"""globals()[{key}] = {quoted if quoted.strip() else "None"}""".splitlines(True))
            
        def parse_def_links(self, m):
            self.parse_block_code()
            text = ''.join(self.pop_text(m))
            key, sep, body = text.lstrip('[').lstrip('^').partition(']:')
            key = quote(key)

            self.unindented.extend(F"""globals()[{key}] = globals().get({key}, {quote(body.lstrip(), ')')}""".splitlines(True))

## Explicit loaders

The tangle step identifies explicit statements executed to create the base state.

In [12]:
    class PidginMixin:
        extensions = '.md.ipynb',
        def visit(self, node): return ReturnYieldDisplay().visit(node)
        def code(self, str): return importnb.loader.dedent(markdown_to_python(str))

    class Pidgin(PidginMixin, importnb.Notebook): ...

`MarkdownImporter` imports __Markdown__ files as source.  By default they receive a Markdown repr.

In [13]:
    class MarkdownMixin:
        extensions = '.md',
        def get_data(self, path): return self.code(self.decode())
        def exec_module(self, module):
            super().exec_module(module)
            module._ipython_display_ = lambda: IPython.display.display(
                IPython.display.Markdown(filename=module.__file__))

In [14]:
    class MarkdownImporter(MarkdownMixin, PidginMixin, importnb.Notebook):  ...

In [15]:
    class MarkdownParameterize(MarkdownMixin, PidginMixin, importnb.Parameterize):  ...

In [16]:
    class PidginParameterize(PidginMixin, importnb.Parameterize): ...

In [17]:
    if importnb.ipython_extension.IPYTHON_MAIN():
        Pidgin, Markdown = PidginParameterize, MarkdownParameterize
    else: Markdown =  MarkdownImporter

## The <code>return</code> statement

Use `return` outside of a function or class as a display expression.

This opinion brings interactive code and portable code closer together.

The `return` statement has a similar feel to the Python 2.7 `print` statement.  It is used to show the rich display of its value.

In [18]:
    import ast, IPython
    class ReturnYieldDisplay(ast.NodeTransformer):
        def visit_FunctionDef(self, node): return node
        
        def visit_Return(self, node): 
            ip = IPython.get_ipython()
            if (not hasattr(ip, 'tangle')) or (hasattr(ip, 'tangle') and ip.tangle.return_display):
                return ast.Expr(
                    ast.Call(
                        func=ast.parse('__import__("IPython").display.display', mode='eval').body, 
                        args=node.value.elts if isinstance(node.value, ast.Tuple) else [node.value], 
                        keywords=[]))
            return node
        
        def visit_Expr(self, node):
            if isinstance(node.value, ast.Yield): 
                node = self.visit_Return(node.value)
            return node
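
The following self-contained sketch shows the same kind of rewrite with `print` swapped in for `IPython.display.display`, so it runs without IPython. `ReturnToPrint` is a hypothetical name. The sketch relies on the fact that `ast.parse` accepts a module-level `return`; only `compile` rejects it, which is why the tree has to be rewritten before it is compiled.

```python
import ast


class ReturnToPrint(ast.NodeTransformer):
    """Rewrite `return value` outside of functions into `print(value)`."""

    def visit_FunctionDef(self, node):
        # Leave function bodies alone so real returns keep their meaning.
        return node

    def visit_Return(self, node):
        # A returned tuple is spread over several arguments.
        if node.value is None:
            args = []
        elif isinstance(node.value, ast.Tuple):
            args = node.value.elts
        else:
            args = [node.value]
        call = ast.Call(func=ast.Name(id="print", ctx=ast.Load()), args=args, keywords=[])
        return ast.copy_location(ast.Expr(value=call), node)


source = "def double(x):\n    return 2 * x\n\nreturn double(21)\n"
tree = ast.fix_missing_locations(ReturnToPrint().visit(ast.parse(source)))
exec(compile(tree, "<cell>", "exec"))  # prints 42
```

The `return` inside `double` is untouched because `visit_FunctionDef` does not descend into the function, while the module-level `return` becomes a call.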

In [29]:
    def unload_ipython_extension(shell):
        shell.input_transformer_manager.cleanup_transforms = [
            object for object in shell.input_transformer_manager.cleanup_transforms
            if not isinstance(object, PidginTangle)]
        shell.ast_transformers = [x for x in shell.ast_transformers if not isinstance(x, ReturnYieldDisplay)]

## Utility functions

`quote` wraps non-code objects in triple quotes.

In [19]:
    def quote(str, punc=''):
        str, leading_ws = ''.join(str), []
        lines = str.splitlines(True)
        _ = '"""'
        if _ in str: _ = "'''"
        if not str.strip(): _ = punc = ''
        while lines and (not lines[0].strip()): leading_ws.append(lines.pop(0))    
        str = ''.join(lines)
        end = len(str.rstrip())
        str, ending_ws = str[:end], str[end:]
        if str and str.endswith(_[0]): str += ' '                    
        return F"{''.join(leading_ws)}{_}{str}{_}{punc}{ending_ws}"
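
A simplified variant makes the delimiter choice easier to follow. `triple_quote` is a hypothetical name; unlike `quote` it accepts only a string and does not keep leading blank lines outside the literal. Like `quote`, it does not escape backslashes.

```python
import ast


def triple_quote(text: str, punc: str = "") -> str:
    """Simplified variant of `quote`: wrap text in a triple-quoted literal."""
    if not text.strip():
        return text  # nothing to quote
    # Prefer double quotes, but switch when the text already contains them.
    delimiter = "'''" if '"""' in text else '"""'
    end = len(text.rstrip())
    body, trailing = text[:end], text[end:]
    if body.endswith(delimiter[0]):
        body += " "  # keep the closing delimiter unambiguous
    # Trailing whitespace stays outside the literal so line endings survive.
    return f"{delimiter}{body}{delimiter}{punc}{trailing}"


plain = triple_quote("A paragraph.\n", ";")
tricky = triple_quote('She said """hi"""\n')
roundtrip = ast.literal_eval(tricky.strip())  # the original text, minus the newline
```

The `punc` argument is how the tangle step appends `;` to a quoted block, which turns the string into a statement whose value is not displayed.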

`get_first_line` returns the first non-blank string in `lines`.

In [20]:
    def get_first_line(lines, line=''):
        for line in lines or ['']: 
            if line.strip(): break
        return line

`get_line_indent` computes the indent of a string.

In [21]:
    def get_line_indent(line):  return len(line) - len(line.lstrip())

The __Lexer__ classes only consider coarse features of the Markdown spec.

In [22]:
    def _has_return(code):
        code = '\n'.join(code)
        if 'return ' not in code: return False
        code = importnb.loader.dedent(code)
        try:
            node = ast.parse(code)
            while hasattr(node, 'body'): node = node.body[-1]
            return isinstance(node, ast.Return)
        except Exception: return False

In [23]:
    def hanging_indent(str, indent, *, out=""):
        for line in str.splitlines(True):
            if not line.strip(): out += line
            else:
                if out.strip(): out += line
                else: out += indent+line
        return out

In [24]:
    @functools.wraps(markdown_to_python)
    def markdown_to_python_magic_wrapper(line: str, cell: str) -> IPython.display.DisplayObject: return IPython.display.Pretty(markdown_to_python(cell))

In [25]:
    if __name__ == '__main__': 
        load_ipython_extension(get_ipython())
        ip = get_ipython()
        ip.Completer.use_jedi=False