Lazy Matrix Algebra Ecosystem
Following on from the theano-dev discussion "SymPy and BLAS/LAPACK" (around October 2012), this page collects projects that overlap with Theano or that Theano could use.
- Op, Type, Apply, Variable data structures for describing large expression graphs.
- numpy-based implementations for almost all expressions
- almost-standalone C implementations for many expressions
- CUDA-based implementations for many expressions
- FunctionGraph and graph transformations for fast execution and, in some cases, improved numerical stability
- Inference: dtype and ndim, shape in many cases, and constant-value propagation. A "hint" mechanism propagates a positive-semi-definite flag in some cases, to identify where a Cholesky decomposition can be used for the inverse and determinant.
- Uses a custom CUDA stack to drive a single GPU, configured at import time (yuck)
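To make the graph vocabulary above concrete, here is a toy sketch in plain Python of how Op, Apply, and Variable objects can fit together. It does dtype inference and constant propagation as each node is built. The classes, the `apply` and `evaluate` helpers, and the two-dtype promotion rule are hypothetical simplifications, not Theano's actual classes or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical Op: a name plus a Python implementation."""
    name: str
    impl: Callable[..., float]


@dataclass(eq=False)
class Variable:
    """Hypothetical symbolic value; `owner` is the Apply node that made it."""
    name: str
    dtype: str                         # only "int64" or "float64" in this toy
    owner: Optional[Apply] = None
    constant: Optional[float] = None   # known value, if any


@dataclass(eq=False)
class Apply:
    """One application of an Op to a tuple of input Variables."""
    op: Op
    inputs: Tuple[Variable, ...]
    output: Optional[Variable] = None


def apply(op: Op, *inputs: Variable) -> Variable:
    """Build an Apply node, inferring the output dtype and folding constants."""
    # dtype inference (toy rule): any float input makes the result float.
    dtype = "float64" if any(v.dtype == "float64" for v in inputs) else "int64"
    node = Apply(op, inputs)
    out = Variable(f"{op.name}_out", dtype, owner=node)
    # Constant propagation: if every input is known, compute the value now.
    if all(v.constant is not None for v in inputs):
        out.constant = op.impl(*(v.constant for v in inputs))
    node.output = out
    return out


def evaluate(var: Variable, env: dict) -> float:
    """Evaluate a graph by recursively walking owners back to the inputs."""
    if var.constant is not None:
        return var.constant
    if var.owner is None:
        return env[var.name]          # free input: look it up by name
    return var.owner.op.impl(*(evaluate(v, env) for v in var.owner.inputs))


add = Op("add", lambda a, b: a + b)
mul = Op("mul", lambda a, b: a * b)

x = Variable("x", "float64")
two = Variable("two", "int64", constant=2)
three = Variable("three", "int64", constant=3)

c = apply(mul, two, three)       # folded to 6 while the graph is built
y = apply(add, x, c)             # depends on x, so it stays symbolic
print(c.constant, c.dtype)       # 6 int64
print(y.constant, y.dtype)       # None float64
print(evaluate(y, {"x": 1.5}))   # 7.5
```

Folding constants during construction keeps the sketch short. A real system would more typically run constant folding as a separate optimization pass over the whole graph, so it can interact with other rewrites.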
- ndarray, a data structure for exchanging data (IO) with Python code.
- Provides implementations for most user-facing (default) Theano Ops.
- Provides python access to a subset of BLAS and LAPACK.
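`numpy.linalg` is one route to LAPACK from Python. The sketch below shows why the positive-semi-definite hint mentioned for Theano pays off. Once a matrix is known to be positive definite, one Cholesky factorization A = L L^T gives both pieces:

- the determinant, as the squared product of L's diagonal;
- the inverse, as (L^-1)^T (L^-1).

The random test matrix is made up for illustration. `numpy.linalg.cholesky` needs a strictly positive-definite input and raises `LinAlgError` otherwise.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)        # symmetric positive definite by construction

L = np.linalg.cholesky(A)          # lower-triangular L with A = L @ L.T (LAPACK-backed)

# det(A) = det(L) * det(L.T) = prod(diag(L))**2
det_chol = np.prod(np.diag(L)) ** 2
# The log-determinant is numerically safer for large matrices.
logdet_chol = 2.0 * np.sum(np.log(np.diag(L)))

# A^-1 = (L L^T)^-1 = (L^-1)^T (L^-1)
L_inv = np.linalg.solve(L, np.eye(4))   # generic solve; ignores triangularity
A_inv = L_inv.T @ L_inv

print(np.isclose(det_chol, np.linalg.det(A)))
print(np.allclose(A_inv, np.linalg.inv(A)))
```

`np.linalg.solve` does not exploit L's triangular structure. `scipy.linalg.solve_triangular` or `scipy.linalg.cho_solve` would, which is the kind of specialization a hint-driven rewrite could select.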
- High-level interface to matrix expressions
- Logical inferences on matrix properties (symmetry, definiteness, ...)
- Symbolic representation of BLAS and LAPACK computations and code generation (in progress)
- Pretty and LaTeX printing
- Scalar simplification
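Here is a minimal sketch of the logical-inference side. It assumes a SymPy version whose assumptions system includes matrix predicates such as `Q.positive_definite`. Under that assumption, SymPy can conclude that an identity matrix, or a sum of matrices assumed positive definite, is itself positive definite. Facts like these are what a Theano-style "hint" would need in order to choose a Cholesky-based inverse. Connecting the two systems is not shown here.

```python
from sympy import Identity, MatrixSymbol, Q, ask

n = 3
X = MatrixSymbol("X", n, n)   # symbolic 3x3 matrices with no explicit entries
Z = MatrixSymbol("Z", n, n)

# The identity matrix is positive definite without further assumptions.
identity_pd = ask(Q.positive_definite(Identity(n)))

# Assuming X and Z are positive definite, so is their sum.
sum_pd = ask(
    Q.positive_definite(X + Z),
    Q.positive_definite(X) & Q.positive_definite(Z),
)

print(identity_pd, sum_pd)
```

`ask` returns `True`, `False`, or `None` (unknown), so a consumer of these facts should fall back to a general algorithm whenever the answer is not `True`.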
- Provides templated C++ linear algebra.
- How? Example?
- Compiler for marked-up Python to C.
- Excels at C-like Python working on numpy arrays.
- I don't think it can output standalone C.
- Wraps the CUDA API (see below) and provides Python data structures for CUDA. I believe Andreas (the maintainer) would prefer to deprecate it in favor of PyOpenCL.
- Wraps the OpenCL API (see below) and provides Python data structures for OpenCL.
- Compiles loop-heavy scalar Python code into fast machine code via LLVM.
- Converts simple numpy-based Python into Theano.
- Fast math primitives, FFTs and BLAS operations for NVIDIA GPUs.
- Fast math primitives, FFTs and BLAS operations for ATI GPUs.
- Uses a CUDA-like programming style to program new multicore Intel CPUs.
- Fast templating engine. Theano should use this (or something similar) for generating code, but currently does not. Jinja2 is another alternative.
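To show what template-driven code generation looks like, here is a minimal sketch. It uses the standard library's `string.Template` as a stand-in for a full templating engine such as Jinja2, so it runs without dependencies. The C kernel signature and the `gen_elemwise` helper are hypothetical, not Theano's code generator.

```python
from string import Template

# Hypothetical C template for an elementwise binary op over contiguous arrays.
ELEMWISE_TEMPLATE = Template("""\
void ${name}(const ${ctype} *a, const ${ctype} *b, ${ctype} *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = ${expr};
    }
}
""")


def gen_elemwise(name: str, ctype: str, expr: str) -> str:
    """Fill in the template; `expr` may refer to a[i] and b[i]."""
    # substitute() raises KeyError if any placeholder is left unfilled.
    return ELEMWISE_TEMPLATE.substitute(name=name, ctype=ctype, expr=expr)


src = gen_elemwise("add_f64", "double", "a[i] + b[i]")
print(src)
```

`string.Template` only substitutes names. A real templating engine adds loops and conditionals inside the template, which matters once generated kernels must vary with ndim, dtype, or memory layout.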
- Standard API for programming heterogeneous systems by applying kernels to data buffers. ATI uses it as its main GPGPU strategy, and NVIDIA also supports it. PyOpenCL can wrap any implementation of this API.
- Generates CUDA kernels from fairly normal-looking Python code.
- Unreleased lazy retargetable expression evaluation engine from Continuum Analytics.