The current state of language implementation and code generation in the Python ecosystem is a rapidly growing, untended garden. The vast number of projects at SciPy 2013 that have implemented some form of code generation technology is amazing, but many projects are stepping on each other's toes, and the path for a user of the ecosystem is not clear. At SciPy 2013 many of us got together to discuss this issue and come up with a path forward. While no clear statement emerged, there was consensus that we need more discussion and study of the issues.
Andy: I called this meeting at SciPy 2013 because I had been independently contacted by four individuals about the state of affairs of speeding up Python through code generation.
The development activity in the field and the support from the broader community are encouraging. Unfortunately, while there have been many successful projects in this area, development is highly fragmented. This raises two issues among smaller projects:
- Uncertainty about the future makes potential developers hesitant
- Independent researchers repeatedly create complete compiler infrastructures to test novel components
As a result, a large amount of exciting work is either unimplemented or implemented within siloed, uncomposable projects.
Currently there is very little sharing among projects, which makes collaboration and reuse difficult. This is not a new phenomenon in the compiler community: Lars Bergstrom's master's thesis lists over 95 papers that never pushed their work to a common compiler infrastructure. I often refer to this as the composability problem.
The composability problem is that there is no effective way to compose (very) high-level languages together. One can hand execution from one to another, but this incurs a high-cost context switch. It often requires code duplication to pass all the necessary details from one system to the other, and it makes any static analysis quite complex, since the different languages may employ different semantics. As a result it is difficult to link high- and low-level code. The scientific Python community is feeling this problem in spades, as it wants both to work together naturally.
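A minimal sketch of the boundary cost described above. The two "languages" here (`LangAExpr`, `LangBExpr`) are invented for illustration, not real projects: each defers elementwise work internally, but crossing from one to the other forces a full materialization, losing any fusion opportunity.

```python
# Hypothetical sketch: two toy "high-level languages", each with its own
# deferred expression type. Names here are illustrative, not real APIs.

class LangAExpr:
    """Deferred elementwise expression in toy language A."""
    def __init__(self, fn, data):
        self.fn, self.data = fn, data

    def map(self, g):
        # Within A, operations fuse: compose functions, materialize nothing.
        return LangAExpr(lambda x, f=self.fn: g(f(x)), self.data)

    def evaluate(self):
        return [self.fn(x) for x in self.data]

class LangBExpr:
    """Deferred elementwise expression in toy language B."""
    def __init__(self, fn, data):
        self.fn, self.data = fn, data

    def evaluate(self):
        return [self.fn(x) for x in self.data]

# Within one language, work stays deferred and fusible:
a = LangAExpr(lambda x: x + 1, [1, 2, 3]).map(lambda x: x * 2)

# Crossing languages forces full materialization at the boundary -- the
# "high-cost context switch": B cannot see inside A's expression graph.
intermediate = a.evaluate()          # concrete list; fusion opportunity lost
b = LangBExpr(lambda x: x - 1, intermediate)
print(b.evaluate())                  # [3, 5, 7]
```

A shared IR would let B inspect and extend A's graph instead of forcing the intermediate result into memory at every boundary.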
Our discussion came to two basic goals:
- A shared interface for the representation of high-level algorithmic and array-based computations (how can we share optimizations?)
- A shared intermediate-level path from abstract (usually numeric) algorithms to better code generators (how can we generate efficient low-level code for common algorithms on accelerators?)
While coming up with two Intermediate Representations (IRs) to meet these needs might be the most effective path for sharing, the vast number of use cases (see below) means that no single design will make everyone happy. In addition, it will be difficult to convince development communities to change existing codebases. The community has come together in the past to define a single array protocol in NumPy. Does an analogous representation exist here? How do we work together towards interoperation?
One way forward is to build pairwise translations from one project's graph to another's. This solution seems lower in quality but has a clear path to completion. Our current plan is to engage the community and then find funding for a small summit, so we can get some face time to work through the issue.
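A sketch of what one such pairwise translation might look like. The two graph formats below are invented for illustration; real projects (Theano, SymPy, Numba, etc.) each have richer representations.

```python
# Hypothetical pairwise graph translation between two invented formats.
# Project P uses nested tuples:  ("add", ("mul", "x", 2), "y")
# Project Q uses nested dicts:   {"op": "add", "args": [...]}

def p_to_q(node):
    """Translate a P-style tuple expression into a Q-style dict expression."""
    if isinstance(node, tuple):
        op, *args = node
        return {"op": op, "args": [p_to_q(a) for a in args]}
    return node  # leaves (names, constants) pass through unchanged

expr_p = ("add", ("mul", "x", 2), "y")
expr_q = p_to_q(expr_p)
print(expr_q)
# {'op': 'add', 'args': [{'op': 'mul', 'args': ['x', 2]}, 'y']}
```

The drawback is scale: n projects need on the order of n² pairwise translators, whereas a shared IR needs only one to/from converter per project.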
Below is a characterization of that discussion. The original Google Doc is at https://t.co/WlCUfRzrfX
potential project name: scikit-air (Array Intermediate Representation)
- What do we gain by sharing?
- What transformations can we share?
- What interface is necessary?
- DAG vs. tree
- Type systems: robust (C++?), user-defined types, concrete types
- language vs. library
- data format vs. interface
- Share common functionality
- speed: fast vectorization of user code
- data flow
- Ufuncs from many libs
- Define Intermediate Representations
- Routines to transform between Intermediate Representations
- Invariant of the IR
- Bindings (connecting low level code to Python)
- Optimizations of graphs
- Algorithmic Differentiation
- Operator fusion
- Loop fusion
- graph transformations
- Mathematical / numerical optimizations
- Bayesian probabilistic programming
- Numerical optimization
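The operator and loop fusion items above can be sketched concretely. The `fuse` helper below is invented for illustration: it collapses a chain of elementwise operations into a single composed kernel, so one pass over the data replaces one pass per operation and no temporaries are allocated.

```python
# Hypothetical sketch of operator/loop fusion over elementwise operations.
# The fuse() helper is invented for illustration.

def fuse(ops):
    """Compose a chain of elementwise ops into a single kernel, so one
    loop over the data replaces one loop per op (loop fusion)."""
    def fused(x):
        for op in ops:
            x = op(x)
        return x
    return fused

square = lambda x: x * x
shift = lambda x: x + 1.0
halve = lambda x: x / 2.0

data = [1.0, 2.0, 3.0]

# Unfused: three passes over the data, two temporary arrays.
# Fused: a single pass, no temporaries.
kernel = fuse([square, shift, halve])
result = [kernel(x) for x in data]
print(result)  # [1.0, 2.5, 5.0]
```

A shared graph IR would let this kind of rewrite (recognizing a chain of elementwise nodes and collapsing it) be implemented once and reused across projects.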
- Write OpenCL kernels and ufuncify them over numpy arrays
- treat control flow and data flow together
- Want to be architecture independent
- Want vector instructions
- Don’t duplicate what LLVM does well
- track data flow
- Source and Target
- Lower than SymPy
- Should Target GPUs
- Shouldn’t go lower than LLVM
- don’t depend on LLVM
- No ISA, concrete vector intrinsics
- GPU generation is hard enough so that we only want to do it once
- Do we want to include execution model in this representation?
- higher level accelerator target: heterogeneous IR == track execution
- Missing Features of LLVM
- Too Low Level
- Weird Arrays / Vectors
- Generic symbols
- Disassemble and Assemble
- Visitor pattern (and generalizations)
- abstract / dynamic properties
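The visitor pattern mentioned above is the usual way to walk an IR without hardcoding traversal into the node classes. A minimal sketch, with node classes invented for illustration (the dispatch style mirrors Python's own `ast.NodeVisitor`):

```python
# Hypothetical sketch: visitor-pattern traversal over a tiny expression IR.
# The node classes are invented for illustration.

class Node:
    pass

class Const(Node):
    def __init__(self, value):
        self.value = value

class Add(Node):
    def __init__(self, left, right):
        self.left, self.right = left, right

class Visitor:
    """Generic dispatch: call visit_<ClassName>, with a fallback."""
    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__,
                         self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        raise NotImplementedError(type(node).__name__)

class Evaluator(Visitor):
    """One concrete pass; other passes (printing, lowering, optimization)
    subclass Visitor the same way without touching the node classes."""
    def visit_Const(self, node):
        return node.value

    def visit_Add(self, node):
        return self.visit(node.left) + self.visit(node.right)

expr = Add(Const(2), Add(Const(3), Const(4)))
print(Evaluator().visit(expr))  # 9
```

Separating traversal from node definitions is what lets multiple projects write independent passes over a shared IR.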
- Document language implementation (Alex)
- Higher-level abstraction above CUDA: Load/store vector machine (Alex)
- Define data/control-flow concept (Siu)
- Thread on NumFOCUS mailing list (Matt)
- NumFOCUS SEP (Anthony)
- Send details about PyKit (Siu)
- Experiment to test IRs (Andy)
- Secure funding (Andy)
- Turning explicit loops into the polyhedral model (Serge)
Large Conceptual Pieces (this list should not expand beyond 4-5)
- Efficient low-level code generation
- not concrete (enough)
- problems not isolated -> piecewise unification of projects
- backing / blessing / funding?
- James Bergstra
- Ondřej Čertík
- Serge Guelton
- Siu Kwan Lam
- Jason Moore
- Aaron Meurer
- Florian Rathgeber
- Matt Rocklin
- Alex Rubinsteyn
- Kurt Smith
- Bill Spotz
- Andy Terrel
- Anthony Scopatz
- Mark Wiebe
- John Wiggins
- Numba Pro
- Seamless / ODIN
- Theano, HyperOpt
Other Projects generating code in Python
- Copperhead (?)
- Cython (?)
- PyCUDA / PyOpenCL
- PyKit (Mark Florisson)
- CorePy (defunct?)