view: single
permalink: demystifying-unladen-swallow
pubdate: 2009-10-12T00:00:00Z
title: Demystifying Unladen Swallow
author: Ramkumar Ramachandra
snip: Python + LLVM = Unladen Swallow
Unladen Swallow is "an optimization branch of CPython, intended to be
fully compatible and significantly faster". Interesting. In order to
understand Unladen Swallow, we first needed to understand the CPython
compiler. It's written in C and fairly easy to understand, very unlike
the self-hosting complex beasts Steel Bank Common Lisp and Glasgow
Haskell Compiler. The design of the compiler is explained fully in PEP
339, and I'm just including a small warm-up discussion.
Start off at the top level: the evaluator (Python/ceval.c). It
executes Python bytecode. From the Python program you write to the
Python bytecode that is produced, there are several stages, with
optimizations at each one. It goes like this:
* Using the pgen parser generator, build the parse tree from the
Python source (Parser/pgen.c)
* Convert the Parse Tree into an Abstract Syntax Tree (Python/ast.c)
* Transform the AST into a Control Flow Graph (CFG). Several compiler
optimizations, like removing unreachable code, are implemented here
* Post-order DFS the CFG and flatten it to emit the final Python
bytecode. The bytecode is then subjected to various peephole
optimizations (Python/compile.c)
* Instead of having to go through this long procedure every time
we want to run the same program, the Python bytecode produced is
stored in a .pyc file.
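The later stages of this pipeline are visible from within Python
itself. Here's a minimal sketch using the standard ast and dis
modules (the parse-tree step happens internally inside ast.parse):

```python
import ast
import dis

source = "x = 1 + 2"

# Source -> AST (the parse tree is built and converted internally)
tree = ast.parse(source)
print(ast.dump(tree))

# AST -> code object: CFG construction, flattening to bytecode, and
# the peephole pass all happen inside compile()
code = compile(tree, "<example>", "exec")

# The final bytecode that the eval loop (Python/ceval.c) executes
dis.dis(code)
```

Running this prints the AST dump followed by a disassembly of the
bytecode, which is exactly what gets cached in the .pyc file.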
Enough warming up. The Unladen Swallow developers want to optimize
CPython very smoothly, preferably without breaking any existing
code. Their Project Plan is laid out on the Wiki page; I'll mainly
discuss 2009Q2.
If they work at any stage higher than the Python bytecode, they'll
have to re-implement the fantastic peephole optimizations written by
the community over the years. The plan is simple: instead of executing
the high-level bytecode in the eval loop (which can be painfully
slow), compile at least some of it into a lower representation
that can be executed faster. Why not compile all of it to x86
assembly? Firstly, we'd lose the interactive shell, a major part of
Python. Secondly, we'd lose Python's high portability, which comes
from not being tied to any single architecture's assembly language (I
even run Python on my phone using the PyS60
package). Thirdly, it's terribly difficult to do, makes Python uglier
and harder to maintain, and it's utterly pointless: we'd be trading
away too much for execution speed.
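Those community-written peephole optimizations are easy to see in
action. For instance, CPython folds constant expressions at compile
time, so the addition below never survives into the bytecode:

```python
import dis

def f():
    return 1 + 2  # the peephole optimizer folds this to the constant 3

# The folded constant sits directly in the code object's constants
print(f.__code__.co_consts)
dis.dis(f)
```

Reimplementing passes like this above the bytecode level would be
wasted effort, which is why Unladen Swallow works below it.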
Just-in-time (JIT) compilation is the solution. Compile the code only
when needed. Then again, don't compile everything; compile only when
the saving in execution time is large enough to justify the compile
time. Otherwise, the startup delay will become unbearable and the
interactive interpreter won't really be
interactive anymore. The perfect low-level, platform-independent
representation to JIT to is LLVM (Low Level Virtual Machine) IR. So, the
Unladen Swallow developers first wrote a compiler from Python bytecode
to LLVM IR from scratch (Python/…). The LLVM IR
(Intermediate Representation) looks a lot like assembly, except that
it's architecture-independent, and can be subject to a nice set of
optimizations. Then they modified the eval loop to execute LLVM IR,
while keeping the existing Python bytecode evaluator (file renamed: …).
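The compile-only-when-hot trade-off can be sketched with a toy
hotness counter. Nothing here is Unladen Swallow's actual machinery;
the threshold, class, and names are all invented for illustration,
and the "compiled" path is just a stand-in for native code:

```python
HOT_THRESHOLD = 1000  # invented value; a real JIT tunes this carefully

class HotFunction:
    """Toy dispatcher: interpret until hot, then take the compiled path."""

    def __init__(self, interpreted, compiled):
        self.interpreted = interpreted  # slow path (the eval loop)
        self.compiled = compiled        # fast path (native code via LLVM)
        self.calls = 0
        self.hot = False

    def __call__(self, *args):
        self.calls += 1
        if not self.hot and self.calls >= HOT_THRESHOLD:
            self.hot = True  # pay the one-time compile cost only now
        return (self.compiled if self.hot else self.interpreted)(*args)

# Both paths compute the same thing; only the dispatch differs
square = HotFunction(lambda x: x * x, lambda x: x * x)
for _ in range(HOT_THRESHOLD):
    square(3)
print(square.hot)  # the function has crossed the threshold
```

Rarely-called code never pays the compile cost, so the interactive
shell stays snappy while hot loops get fast native code.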
Q3 and Q4 aren't updated on the Wiki page, but it should be easy
enough to find out the work in progress from the IRC channel, mailing
list, and repository commits.