enhancements Ast

FabrizioM edited this page Apr 6, 2008 · 11 revisions
Clone this wiki locally

CEP 903 - Abstract Syntax Tree

  • Community status: Wanted
  • Implementation status: Prototyped
  • Comments: This is part of [:FabrizioM]'s GSoC 2008 project proposal
  • [:enhancements/Ast/Status: Completion Status]


This Page describes the idea of introducing an Abstract Syntax Tree in Cython.

I suggest to follow the same transition made by Python from the 2.4 version to 2.5 [http://www.python.org/dev/peps/pep-0339 PEP0339]


In the current compilation phase, from Cython source to C code, two steps are involved:

1. Parse the source code into a parse tree (Compiler/Parser.py) 1. Emit C code based on the parse tree (Compiler/Nodes.py Compiler/ExprNodes.py)

This is not how a standard compiler works. The usual steps for compilation are:

1. Parse source code into a parse tree (new Parser) 1. Transform parse tree into an Abstract Syntax Tree (new Abstract Syntax Tree) 1. Transform AST into a Control Flow Graph ( new Flow Graph) 1. Emit C code based on the Control Flow Graph (Compiler/Nodes.py Compiler/ExprNodes.py)

Compare the current code for the 'if' stmt

def p_simple_expr(s):
    pos = s.position()
    expr = p_or_test(s)
    if s.sy == 'if':
        test = p_or_test(s)
        if s.sy == 'else':
            other = p_test(s)
            return ExprNodes.CondExprNode(pos, test=test, true_val=expr, false_val=other)
            s.error("Expected 'else'")
        return expr

to the new one

ast = parser.parse (source)

class AstVisitor:
    def visitIfStmt (self, node):
       node.test.accept (self)
       node.then.accept (self)
       node.else_.accept (self)

To generate the AST, you provide the required action. For example the for_stmt to build the ast.For Node:

def build_for_stmt(builder, nb):
    """for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]"""
    atoms = get_atoms(builder, nb)
    else_ = None
    # skip 'for'
    assign = to_lvalue(atoms[1], consts.OP_ASSIGN)
    # skip 'in'
    iterable = atoms[3]
    # skip ':'
    body = atoms[5]
    # if there is a "else" statement
    if len(atoms) > 6:
        # skip 'else' and ':'
        else_ = atoms[8]
    builder.push(ast.For(assign, iterable, body, else_, atoms[0].lineno))

Where the ast.For in the last line is:

class For(Node):
    def __init__(self, assign, list, body, else_, lineno=None):
        self.assign = assign
        self.list = list
        self.body = body
        self.else_ = else_
        self.lineno = lineno

#   ...
    def accept(self, visitor):
        return visitor.visitFor(self)

Notice here the visit Method. Ast's Nodes support the [:enhancements/Ast/Visitor: VisitorPattern]

Creating the current Expression Tree from an AST

The steps to generate the current Cython Expression Tree are:
1. Create the AST by parsing the source file 1. Create a CythonVisitor 1. Visit the Ast with the CythonVisitor
class CythonVisitor(ASTVisitor):
    # ...
    def visitCFunction (self, node):
        pos = (__file__, 0, node.lineno)
        basenode = CSimpleBaseTypeNode(pos,name = 'int',module_path = [],is_basic_c_type = True,signed = 1,longness = 0,is_self_arg = False)
        base = CNameDeclaratorNode(pos, name = node.name, cname = node.name)
        declarator = CFuncDeclaratorNode (pos,base = base,args = [],has_varargs = False,exception_value = None, exception_check = 0,nogil = True, with_gil = False)
        suite = node.code.accept(self)
        return CFuncDefNode(pos,visibility = 'private',base_type = basenode, declarator = declarator, body = suite,doc = node.doc,modifiers = [],api = False, overridable = False)
    # ...

The above code is just a prototype of course. But it already translates with the new parser small Cython programs. The CFuncDefNode and other classes are used unmodified.

To use the ASTVisitor you simply do:

v = CythonVisitor()
tree = ast.accept (v)

The main benefit of this approach is this: you can feed different AstVisitor for a same ast. This is really useful if you want to support different Python API versions:

if opt.output == PYTHON_23:
   v = CythonVisitor23()
elif opt.output == PYTHON_24:
   v = CythonVisitor24()
elif opt.output == PYTHON_25:
   v = CythonVisitor25()

tree = ast.accept (v)
output_code (tree)
# untested code

Why this is needed? Every release of Python API brings new C API features. You can add support for new C API language feature in a clear way. Look at the new [http://docs.python.org/whatsnew/ports.html Python 2.5 API] features especially the big change for the [http://www.python.org/dev/peps/pep-0353/ Py_ssize_t] at a high level should look like:

class CythonVisitor24(ASTVisitor):
    # ...
    def visitIndex (self, node):
        return ssize_tProducer(node)
    # ...

class CythonVisitor25(ASTVisitor):
    # ...
    def visitIndex (self, node):
        return Py_ssize_tProducer(node)
    # ...
#untested invented code

Right now, handling the logic for different versions will involve to plug obscure if in the parser.

More AST features

On fly code injection For the example supporting the with statement ([:enhancements/with: an other proposal] will be easy: You simply detect the WithNode in the generate AST and replace it with an new generate AstNodes:

class ASTVisitor:
      def expandWith (self, node):
        with_transform = """
        mgr = (%(EXPR)s)
        exit = mgr.__exit__  # Not calling it yet
        value = mgr.__enter__()
        exc = True
                # The exceptional case is handled here
                exc = False
                if not exit(*sys.exc_info()):
                # The exception is swallowed if exit() returns true
            # The normal and non-local-goto cases are handled here
            if exc:
                exit(None, None, None)
        """ % {
                EXPR : flatten(node.expr), # re transform ast to python code,
                VAR : if node.value "%(VAR)s = value  # Only if "as VAR" is present" % node.var or " ",
                BLOCK : flatten(node.block)
        new_node = build_ast (with_transform_code ) # parse the code and generate an ast subtree
        return new_node

      def visitWithNode (self, node):
          return self.transformWith (node)

The above code is only a vision of how it would operate the With Transform, more advance transformations should be handled with the Mutator interface

For a complete FUNCTIONAL test case look at: /CythonVisitorTest

Design changes


Cython's current steps, for code transformation, are the following:

1. analyse_declarations 1. analyse_expressions 1. generate_code


Make symbol table entries for all declarations at the current level, both explicit (def, cdef, etc.) and implicit (assignment to an otherwise undeclared name).
this will be done in the CythonVisitor, after constructing the AST, it will be optimized with all the transformation that does not require type or flow information.


Determine the result types of expressions and fill in the 'type' attribute of each ExprNode. Insert coercion nodes into the tree where needed to convert to and from Python objects. Allocate temporary locals for intermediate results. Fill in the 'result_code' attribute of each ExprNode with a C code fragment.
for each function will be constructed a [:enhancements/FlowGraph: FlowGraph] by visiting it with a FlowGraphBuilder(Visitor). The graph will be analyzed with well know standard techniques, with the objective of infer every variable's type, and apply transformation where needed. (i.e. dead code removal) After all the possible transformations are applied to the graph, the graph will be converted back to an aST subtree and attached to the main tree.



Emit C code for all declarations, statements and expressions. Recursively applies the 3 processing phases to the bodies of functions.

  1. generate_function_definitions Emit C code for the definitions of any structs, unions, enums and functions defined in the current scope-block.
  1. generate_execution_code Emit C code for executable statements.
this phase should be just a simple visitor that flattens the AST structure, or creates the DOM structure.
With the current implementation, the code generation and the type inference are mixed in the same place, look for example at

#!python # class FuncDefNode(Node): # ... def generate_function_definitions(self, env, code, transforms):


# XXX A code generator should know anything about a Program Scope! genv = env.global_scope() lenv = LocalScope(name = self.entry.name, outer_scope = genv) lenv.return_type = self.return_type code.init_labels() self.declare_arguments(lenv) # XXX

# XXX This Phase should not be here! transforms.run('before_analyse_function', self, env=env, lenv=lenv, genv=genv) self.body.analyse_declarations(lenv) self.body.analyse_expressions(lenv) transforms.run('after_analyse_function', self, env=env, lenv=lenv, genv=genv) # XXX

# Just this self.generate_interned_num_decls(lenv, code) self.generate_interned_name_decls(lenv, code) self.generate_py_string_decls(lenv, code) self.generate_cached_builtins_decls(lenv, code) self.generate_const_definitions(lenv, code)

All this enhancements will be applied GRADUALLY.



  • Design improvement: Separation of how to parse the logic (Parser) from how to give a Meaning to it (Nodes, ExprNodes).
  • Peephole optimization like a = 10 * 5 directly in the AST
  • Code flow analysis could be implemented easily by


  • should be clear that any future Pyrex feature parser-dependent will be not supported, but I am sure will be emulated without too much effort

The previous code is functional but not yet published

Thanks for reading! (and Sorry for the BAD English)

robertwb: Could you please write an example mutator for the visitWithNode? Re-creating and re-parsing a huge string is a bad idea on several levels (efficiency, loosing line number information, indentation woes, etc.)

robertwb: you need to have a way of actually modifying the grammar file, not just adding new nodes (e.g. the positional_arguments note must accept type information.

robertwb: your Py_ssize_t index visitor example doesn't make any sense, could you come up with a more concrete example?

robertwb: I would really like to see control flow, if possible, as a central part of the project fabrizioM: here it is [:enhancements/FlowGraph: Control Flow ]


  • [:enhancements/Ast/Visitor: Visitor Interface]
  • [:enhancements/Ast/Mutator: Mutator Interface]