enhancements parsetreetransforms

Enhancement proposal: Support for parse tree transforms

Author: DagSverreSeljebotn

Status: Visitor pattern was implemented in Visitor.py with most transforms in ParseTreeTransforms.py

Description

In order to more easily implement more advanced features, it is suggested to have a simple API for implementing transforms that work on the parse tree level, at seperate stages. From this perspective the operation of Cython can be specified through a series of phases where the parse tree is successively transformed in small increments from the original Cython code and to a tree that will generate the wanted C code. This can be done almost without touching the original code base, only using the API when it is best for the job. If the API is not used, nothing is different to the rest of Cython.

The proposal for parse tree transforms is:

A very simply "document object model" is put on top of the parse tree, so that code can be manipulated symbolically.

A simple Transform interface is specified.

A series of phases/cut-points are specified (by ID names), by which it is possible to install optional (and perhaps in time not-so-optional) transforms.

In each phase, different information is available in the tree: Ie it is possible both to do processing before variables are typed (what PEX does, but better) and after (NumPy, operator overloading).

Cython compilation then proceeds exactly like before, but when hitting a phase/cut-point, the tree is handed off to the appropriate transforms for extra processing.

Simplified example of an idealized compile using this:

Internal: Cython code is parsed
Through API: The transforms that currently happen in PEX can happen, but with a DOM (simple for-loop optimizations etc.)
Internal: Typing happens
Through API: Transforms demanding typing (operator overloading, NumPy support etc.) happens
Internal or through API: Coercion happens
Internal: Generate C code.

However, more sophisticated phases can be done once everything is stable. Type inference and so on should at least be possible without agonizing pain using this framework (even if it is not wanted, there might be other optimizations waiting to happen).

A more concrete example further down.

Discussion

The background for this proposal comes from trying to find a good way to implement CodeGeneratorPluginArchitecture, and once this is in place it should be rather easy to implement that. However I believe this approach to give much power beyond that goal and it should make it very easy to implement custom optimizations.

Impact on existing code

The parse tree was already well designed and transformations that happens to it happen in seperate, well-defined steps. Making a simple prototype that allows plugging in new transforms that has hooks within function/"def" scope only, and only for two stages (before and after typing of symbols), took about four hours including experimentation and examples once the ideas was in place. It also had very small consequences for existing code (generate_function_definition had an added parameter everywhere and that was about it). So all in all things look very good.

However, there's one obstacle left that might call for a bigger refactoring: The "type analysis" (figure out the type of each symbol) and "type coercion" (inserting new wrapper nodes in the tree to do coercions) currently happen as one phase. This obviously makes it much more difficult to write transforms that depends on type information. It is possible (simply always rewrite entire statements instead of individual parts), however a refactor, making analysis one phase and coercion another phase with a call to external transforms in-between, is probably a much better solution.

This doesn't look like an impossible task at all, because the two are already quite seperate, they just happen to be done in the same chain of function calls.

Parse tree DOM

User side

Each node has a function get_child_accessors which returns an iterator over a "child accessor". A child accessor has three methods: get(), set(value), and name():

>>> childacc = funcnode.get_child_accessors().next()
>>> print childacc.name() # What is the relation between funcnode and this child?
'body'
>>> print childacc.get() # Get this child
<StatListNode at ....>
>>> childacc.set(other) # Replace the entire function body

Currently, this is all. Adding more functions would just make it more complex, while one can be pretty sure that this functionality can be emulated on top of any future DOM for backwards compatability.

What can be accessed through the DOM?

The DOM only exists to interact with the tree structure, so most information (type, name, everything interesting) is not available through the DOM, rather the nodes can be checked for type and interacted with directly.

Notes about lists and node sets

Lists of children are treated as their own level in the tree and exists in their own right, ie get will return the list and set will replace the entire list. This way, one doesn't need to add list manipulation functions to the API.

Node implementation side

For ExprNodes nothing is needed, since they already carry information in "subexprs" that is made use of by ExprNode on their behalf to provide children (though they can override get_child_accessors of course).

Other nodes should either reimplement get_child_accessors, or use the default implementation which expect that they list the names of the children attributes is listed in child_attrs. A child should either be a Node descandant, or a Python list. Such a list should in turn contain a valid child (list or Node) and so on. Example:

class SingleAssignmentNode(AssignmentNode):
    child_attrs = ["lhs", "rhs"]
...

Note again that not all attributes are listed, only the ones that consists of more nodes in the parse tree.

If a Node doesn't implement this an exception will be raised if DOM access is tried. If a Node is a leaf node, it can implement this by declaring

child_attrs = []

Possible extension: Visitor pattern

This is not implemented yet, but might become useful.

If doing and/or refactoring a very wide transform (like type coercion), then the way to do it is this:

Have one class for the transform (CoercionTransform), don't spread it all over the types (like it is now)
Define an abstract base class that has methods "visit_NameNode" and so on
Override all these to provide the behaviour for different classes
Define a method in Node and descendants, visit, like this:

class NameNode(Node):
   ...
   def visit(self, visitor, *args, **kwargs): return visitor.accept_NameNode(self, *args, **kwargs)

Back to our transform, the process_node simply becomes a call to node.visit()...

Nice and reusable for lots of situations...

Demonstration

Make sure you have the latest cython-devel. Make test.pyx:

def test():
    for i in range(0, 4):
        print 'a'
    for j from 0 <= j < 4:
        print 'b'

Then, run Cython like usual but add some command-line options (better use a script for invoking!):

cython -T before_analyse_function:Cython.Compiler.Transform.PrintTree \
       -T after_analyse_function:Cython.Compiler.Transform.PrintTree \
       myfile.pyx

(Note: This command-line parameter is not intended to even be the beginning of a plugin system, rather it can be considered a developer tool.) This will simply dump descriptions of (incomplete, ie terminating too early some places because the DOM isn't defined on all nodes yet) parse trees for the functions, one before type information has been added to symbols and one after.

Then, to make it interesting:

mkdir Cython/Optional
touch Cython/Optional/__init__.py
# and save the example transform below to ForRange.py

And use the following call:

cython -T before_analyse_function:Cython.Compiler.Transform.PrintTree \
       -T before_analyse_function:Cython.Optional.ForRange.ForRangeOptimization \
       -T before_analyse_function:Cython.Compiler.Transform.PrintTree \
       myfile.pyx

(The two print "statements" are only so that you can inspect the results of the transform manually). Notice that the for... in range is transformed!

And here comes the transform (for now I'll leave it as an excersize for the reader to find and fix the bug that causes nested cases not to be transformed and update the wiki accordingly):

from Cython.Compiler import Transform, Nodes, ExprNodes

#
# This is only a quick example! It generates wrong code for decreasing step!
#
# A simple demo transformation turning a "for i in range(...)" into "for i from 0 <= 0 < dest".
# No checking for whether it is the "right" range function (that is, has the builtin been overriden?) and so
# on - this is only in order to have a simple example for a transform.
#
# How it works: It filters the tree unchanged, except when hitting the excapt combination wanted
# and then replaces a ForInStatNode with a range function attribute with a corresponding
# ForFromStatNode.

class ForRangeOptimization(Transform.Transform):
    def initialize(self, phase, env, genv, lenv):
        assert phase == "before_analyse_function"

    def process_node(self, node, name):
        def default():
            self.process_children(node)
            return node

        # Go through all nodes and filter for the interesting ones.
        if isinstance(node, Nodes.ForInStatNode) and \
                isinstance(node.iterator, ExprNodes.IteratorNode) and \
                isinstance(node.iterator.sequence, ExprNodes.SimpleCallNode) and \
                isinstance(node.iterator.sequence.function, ExprNodes.NameNode) and \
                node.iterator.sequence.function.name == "range":
            return self.process_for_in_range(node, node.iterator.sequence)
        else:
            return default()

    def process_for_in_range(self, node, range_func):
        result = Nodes.ForFromStatNode(
            pos=node.pos,
            target=node.target,
            body=node.body,
            else_clause=node.else_clause,
            step=None,
            relation1 = "<=",
            relation2 = "<"
        )
        if len(range_func.args) >= 2:
            result.bound1 = range_func.args[0]
            result.bound2 = range_func.args[1]
            if len(range_func.args) == 3:
                result.step = range_func.args[2]
        else:
            result.bound1 = ExprNodes.IntNode(pos=node.pos, value=0)
            result.bound2 = range_func.args[0]
        return result

Running it:

Parse tree dump at phase 'before_analyse_function'
- (root): DefNode
  - star_arg: (none)
  - starstar_arg: (none)
  - body: StatListNode
    - stats[0]: ForInStatNode
      - target: NameNode(type=None)
      - iterator: IteratorNode(type=None)
        - sequence: SimpleCallNode(type=None)
          - self: (none)
          - coerced_self: (none)
          - function: NameNode(type=None)
          - args[0]: IntNode(type=<CNumericType long>)
          - args[1]: IntNode(type=<CNumericType long>)
          - arg_tuple: (none)
      - body: PrintStatNode
        - args[0]: StringNode(type=<CPtrType <CNumericType char>>)
      - else_clause: (none)
      - item: (none)
    - stats[1]: ForFromStatNode
      - target: NameNode(type=None)
      - bound1: IntNode(type=<CNumericType long>)
      - bound2: IntNode(type=<CNumericType long>)
      - step: (none)
      - body: PrintStatNode
        - args[0]: StringNode(type=<CPtrType <CNumericType char>>)
      - else_clause: (none)
      - py_loopvar_node: (none)
Parse tree dump at phase 'before_analyse_function'
- (root): DefNode
  - star_arg: (none)
  - starstar_arg: (none)
  - body: StatListNode
    - stats[0]: ForFromStatNode
      - target: NameNode(type=None)
      - bound1: IntNode(type=<CNumericType long>)
      - bound2: IntNode(type=<CNumericType long>)
      - step: (none)
      - body: PrintStatNode
        - args[0]: StringNode(type=<CPtrType <CNumericType char>>)
      - else_clause: (none)
      - py_loopvar_node: (none)
    - stats[1]: ForFromStatNode
      - target: NameNode(type=None)
      - bound1: IntNode(type=<CNumericType long>)
      - bound2: IntNode(type=<CNumericType long>)
      - step: (none)
      - body: PrintStatNode
        - args[0]: StringNode(type=<CPtrType <CNumericType char>>)
      - else_clause: (none)
      - py_loopvar_node: (none)

/* "/home/dagss/code/cython/sample/numpy.pyx":9
*
* def test():
*     for i in range(0, 4):             # <<<<<<<<<<<<<<
*         print 'a'
*     for j from 0 <= j < 4:
*/
 for (__pyx_1 = 0; __pyx_1 < 4; __pyx_1++) {
   __pyx_2 = PyInt_FromLong(__pyx_1); if (unlikely(!__pyx_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 9; goto __pyx_L1;}
   Py_DECREF(__pyx_v_i);
   __pyx_v_i = __pyx_2;
   __pyx_2 = 0;

Conclusion

In order to (easily) provide NumPy support, C++ template support, and other goodies, there are still a bit left, but if this proposal is accepted it makes these tasks much more self-contained, seperated, manageable and they can be taken successively in smaller steps. Only the seperate coercion phase refactoring looks any daunting at all.

Comments

robertwb: This looks very good, I have included it in the development branch to play with (there appears to be no regression). It may be nice to be able to parse the entire file at once (rather than function-by-function). One issue is that in the current model the phases are not necessarily all in sync (e.g. analyzing function bodies happens during the generate execution code phase of the module).

dagss: True. However I think what is most important is that one can be sure that transforms written now will be easily used with future approaches, and if one writes a transform for a function-analyze-within-module-generation phase I'm sure it can also be used with very few changes for a global-analyse-everything phase later -- that way small, incremental steps can be taken rather than changing all Cython code at once.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly