Skip to content
FabrizioM edited this page Mar 31, 2008 · 34 revisions

Towards Full Support of Python Syntax in Cython

  • Author Fabrizio Milo

TableOfContents(3)

Abstract

The Cython language makes writing C extensions for the Python language as easy as writing in Python itself, but it currently has some shortcomings.

I propose to improve Cython's compiler design by developing and integrating:

  • a new Parser,
  • an Abstract Syntax Tree ,and
  • a Flow Graph
These enhancements will allow:
  • full parsing of all valid Python 2.5 code
  • easier transition to Python 3000
  • simpler support of new features
  • better reasoning about the code for optimizations

This is not a from-scratch implementation. I worked on similar projects before and many of the proposed ideas have been already tested in the CPython compiler and in pypy, another open source python compiler. (I participated in a pypy sprint in 2006 and worked on an AST peephole optimizer)

By modifying and adapting existing open source code and leveraging from my experience, I shall accomplish this project in the available 12 weeks.

The ability to maintain Cython in the long term will be significantly enhanced by these modifications.

Detailed description

This document is meant to be a detailed technical version of the submitted Soc proposal. Some content may be repeated.

Intro

The Python language has been demonstrated to be successful for its extendibility and his clean C API, besides at its intuitive and simple syntax. Many of the existing extensions are developed using the C language and this can be error prone and tedious which means longer development cycles or the rejection of developing the extension. Cython, as Pyrex before, focuses on solving this problem.

What is the Objective ?

  • Improve Cython's compiler design, enabling the generation of code for Python 2.5 syntax.
  • Pave the way for future improvements and optimizations.

What is Cython ? How it is related to Python?

The Cython language makes writing C extensions for the Python language as easy as writing in Python.

The Cython Compiler generates C code that makes full use of the CPython API, making it easy to mix compiled and non-compiled code. It does not convert Python code to a pure C program.

Cython is based on the well-known Pyrex, but supports more cutting edge functionality and optimizations.

Other solutions exist, although generated code is ugly and poorly optimized.

What about the proposal ?

The current Cython implementation has some shortcomings. After studying Cython's design, I propose to improve Cython's compiler design by developing and integrating:

  1. a new Parser
  2. an Abstract Syntax Tree
  3. a Flow Graph

These enhancements will allow full parsing of all valid Python 2.5 code.

The central idea in my design is the construction of an Abstract Syntax Tree from the input source. The benefit of this approach are further underlined by noticing that Python itself adopted it during the 2.4 -> 2.5 transition [http://www.python.org/dev/peps/pep-0339/ PEP 0339]

(More in the "How it will be done?" section)

Why the need of this project ? Why should be the project approved?

  • Cython's long term maintainability will be significantly enhanced by the project
  • The project would benefit the python, Cython and Open Source community in several ways

What will the accomplished project deliver? What is in it for the Cython project ?

These enhancements will allow:

  1. Simpler support of new features
  2. Full parsing of Python 2.5 syntax
  3. Better reasoning about the code for optimizations
  4. Pave the way to support Python 3000 features
  5. Improve performance: many well know optimizing techniques could be implemented, boosting extension speed execution

What is in it for the Python community ?

  1. Better usability: Faster prototyping, development and code re-use
  2. Better module support for the different Python API versions
  3. Easier porting for existing C code, giving the python community more extensions to use and leverage
  4. Improve performance: existing python modules could be translated to Cython

What is in it for the Open Source community ?

  1. Easier way of integrating python with existing C/C++ applications
  2. C Programs could leverage from the power of python in a faster way

What is in it for me?

  1. This work will be the base of my master's thesis on "automatic code generation from dynamic language for high performance computers"
  2. Gain a better understanding of Cython and Python internals
  3. Improve my skill in writing optimizing python applications
  4. Learn more on Python and C / C++ integration
  5. Have fun with the community of Cython
  6. Be sponsored by my favorite company (Google)

How it will be done ?

Introduction

In the current compilation phase, from Cython source to C code, two steps are involved:
  1. Parse the source code into a parse tree (Compiler/Parser.py)
  2. Emit C code based on the parse tree (Compiler/Nodes.py Compiler/ExprNodes.py)

This is not how a standard compiler works. The usual steps for compilation are:

  1. Parse source code into a parse tree (new Parser)
  2. Transform parse tree into an Abstract Syntax Tree (new Abstract Syntax Tree)
  3. Transform AST into a Control Flow Graph ( new Flow Graph)
  4. Emit C code based on the Control Flow Graph (Compiler/Nodes.py Compiler/ExprNodes.py)

compare the current code for the 'if' stmt

#!python
def p_simple_expr(s):
    pos = s.position()
    expr = p_or_test(s)
    if s.sy == 'if':
        s.next()
        test = p_or_test(s)
        if s.sy == 'else':
            s.next()
            other = p_test(s)
            return ExprNodes.CondExprNode(pos, test=test, true_val=expr, false_val=other)
        else:
            s.error("Expected 'else'")
    else:
        return expr

to the new one
#!python
ast = parser.parse (source)
ast.visit(AstVisitor())

class AstVisitor:
    ...
    def visitIfStmt (self, node):
       node.test.accept (self)
       node.then.accept (self)
       node.else_.accept (self)


This is NOT a "from-scratch" implementation. I have done this before and many of
the proposed ideas are have been already tested in CPython compiler and in pypy (another open source python compiler)(I know this, because I participated to a pypy sprint in 2006 and worked on an Ast peephole optimizer).

By modifying and adapting existing open source code and leveraging from my
experience, I shall accomplish this project in the available 12 weeks.

Does the Project have sub-tasks ?

The project is divided in three sub tasks:
  • Parser
  • Abstract Syntax Tree
  • Flow Graph

Parser


Task:: Adapt existing implementation of python parsers to Cython: Goal:: The Parser is able to parse the full Cython Test Suite (~200 tests, ~4176 line of code) Estimated Time:: 1 - 2 weeks

Summary Description:: * creating the grammar * parsing the grammar Task Page: I can adapt this from a previous experience in a pypy sprint (see further section: Why me?) The parser will be generated from the standard Python grammar file, with additions for Cython keywords.

Abstract Syntax Tree


Task:: Use the Parser to generate an Abstract Syntax Visitor Goal:: The Visitor runs and passes the entire Python and Cython test Suites (~200 tests, ~4176 line of code) Estimated Time:: 3 - 6 weeks Description:: From the parsed structure

Further informations:: http://wiki.Cython.org/enhancements/Ast/Mutator http://wiki.Cython.org/enhancements/Ast http://wiki.Cython.org/enhancements/Ast/Status

Flow Graph


Task:: Adapt the Flow Model to implement a Builder from the AST Goal:: The FlowGraph Logic is able to resolve the Locals test Estimated Time:: 6+ weeks Description:: At this phase we have functional AST. To build a function's flow graph you create a Visitor, that hooks when he enter in a visitFuncDef Node:

class FlowGraphGen(AstVisitor):

   def visitFuncDef(self,node):
       graph = FlowGraph(node)
       #...

The Flow Graph's model is described in [:enhancements/FlowGraph: ]

The proposed model is largely inspired by Pypy's one. The main difference is how it will be assembled: Their implementation assembles the Flow Graph visiting the Source-Compiled Byte Code I will assemble it visiting the FunctionNode's AST.

How do you know that your project is feasible?

During this week I spent time on the IRC Cython channel discussing with the Cython community the pros, cons, caveats and pitfalls of my proposal.

With the objective to understand my idea, to explore the entity of the project, and obtain an inside understanding of the written code, I started modifying the existing code, and wrote some first tests.

At the moment I have a working test in my local repository for a small subset of the above ideas see: http://wiki.Cython.org/enhancements/AutoGenParser/CythonVisitorTest

To have a concrete idea of the amount of work to be done I wrote some documents describing the expected behavior http://wiki.Cython.org/enhancements/Parser http://wiki.Cython.org/enhancements/Ast http://wiki.Cython.org/enhancements/Ast/Mutator http://wiki.Cython.org/enhancements/FlowGraph

Methodology

I am Test Driven ! (to not be confused with UnitTesting) I shall report Weekly current task's status. I would like to introduce a test coverage system.

What is the Project Estimate ?

The schedule constraint is 12 Full Weeks. Parser: 1 to 3 weeks * Grammar * add Cython extended syntax rules and actions to python parser

Ast: 2 to 5 weeks
  • Create additional Nodes
Flow Graph: 4 to 6+ weeks
  • Implement the model
  • Space Operations

Schedule

http://wiki.Cython.org/FabrizioM/soc2008/Schedule

Why Me ?

  1. I have done this before
  2. I am an experienced Python and C/C++ programmer
  3. I am competent about project's topic
  4. I am passionate
  1. I have done this before:

    Many of the ideas of the Flow Graph and the base code comes from the pypy project.

    In Geneve 2006, I participated in the EuroPython Conference's pypy sprint and worked on the stackless back-end support. During the sprint I proposed to implement a peephole optimizer for the pypy AST. For evidence see http://codespeak.net/pipermail/pypy-svn/2006-July.txt and search for "misto" (my nick)

  2. I am an experienced Python / C/C++ programmer

    Cython is written in python and I have 3+ years of experiance as a python programmer. The task also involves knowledge of C/C++ that I have. Finally, I have experience in writing compilers.

  3. I am competent about project's topic

    I'm graduating with a master's degree in "Computer Science" from the University of Rome "La Sapienza". I am writing a thesis on "automatic code generation from dynamic language for high performance computers", where I dealt with and studied many of the project pitfalls.

    I presented a Talk at the first Python Conference in Italy. Title: Python & C/C++/Java/.Net : How To http://www.pycon.it/pycon1/schedule/speaker/1/ I described the preferred ways of integrating python with existing applications.

    I study the process of agile development with focus on the creative stage. I was speaker at the Italian Agile Day 2007 (http://www.agileday.it/index.php?page=speakers)

    and at the European Summer School of Agile Programming 2007 with the title: Software Creativity (http://essap.dicom.uninsubria.it/pmwiki.php?n=Main.FabrizioMiloSoftwareCreativity)

    and many other public conferences (Linux Day, BarCamp, Java Meetings, etc)

  4. I am passionate

    I am passionate and highly motivated to accomplish this project, have strong communication skills and an innate drive to continue improve myself and the world.

References

www.cython.org www.python.org Pyrex: http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/ For more on PyPy see : http://codespeak.net/pypy/dist/pypy/doc/home.html

Clone this wiki locally