Lazy Matrix Algebra Ecosystem

mforbes edited this page Nov 17, 2012 · 11 revisions

Following a discussion on theano-dev ("SymPy and BLAS/LAPACK") around October 2012, this page collects projects that overlap with Theano or that Theano should use.



Theano

  • Op, Type, Apply, Variable data structures for describing large expression graphs.
  • numpy-based implementations for almost all expressions
  • almost-standalone C implementations for many expressions
  • CUDA-based implementations for many expressions
  • FunctionGraph and graph transformations for fast execution and, in some cases, improved numerical stability
  • inference of dtype and ndim, of shape in many cases, and constant value propagation. A "hint" mechanism propagates a positive semi-definite flag in some cases, to identify where Cholesky can be used for the inverse and determinant.
  • uses custom CUDA stack for driving single GPU, configured at import-time (yuck)
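The graph data structures listed above can be illustrated with a toy sketch. The classes below are hypothetical simplifications that only borrow the names Variable, Apply, and Op; they are not the real API of any library. The point is that building an expression records a graph, and nothing is computed until the graph is evaluated.

```python
# Toy lazy expression graph. All classes here are hypothetical stand-ins,
# written for illustration only.

class Variable:
    """A graph node: either a named input or the output of an Apply."""
    def __init__(self, name=None, owner=None):
        self.name = name
        self.owner = owner  # the Apply that produces this variable, or None


class Apply:
    """One application of an Op to a list of input Variables."""
    def __init__(self, op, inputs):
        self.op = op
        self.inputs = inputs
        self.output = Variable(owner=self)


class Op:
    """An operation that can extend the graph and compute a value."""
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def __call__(self, *inputs):
        # Calling an Op builds a graph node; it does not compute anything.
        return Apply(self, list(inputs)).output


add = Op("add", lambda a, b: a + b)
mul = Op("mul", lambda a, b: a * b)


def evaluate(var, env, cache=None):
    """Recursively evaluate `var`, given values for the named inputs."""
    if cache is None:
        cache = {}
    if var in cache:
        return cache[var]
    if var.owner is None:
        value = env[var.name]
    else:
        args = [evaluate(v, env, cache) for v in var.owner.inputs]
        value = var.owner.op.fn(*args)
    cache[var] = value  # shared subexpressions are computed once
    return value


x = Variable("x")
y = Variable("y")
z = add(mul(x, y), x)                    # graph only, no arithmetic yet
result = evaluate(z, {"x": 3, "y": 4})   # 3 * 4 + 3
```

Because the whole expression exists as data before it runs, a graph transformation could rewrite it (for example, merging duplicate subgraphs) before evaluation.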


NumPy

  • Provides ndarray, a data structure for IO with Python code.
  • Provides implementations for most user-facing / default Theano Ops.


SciPy

  • Provides Python access to a subset of BLAS and LAPACK.



SymPy

  • High-level interface to matrix expressions
  • Logical inferences on matrix properties (symmetry, definiteness, ...)
  • Symbolic representation of BLAS and LAPACK computations and code generation (in progress)
  • Pretty/latex printing
  • Scalar simplification
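The idea of logical inference on matrix properties can be sketched in a few lines. Everything below (`MatSym`, `Add`, `MatMul`, `Transpose`, `is_symmetric`) is a hypothetical toy written for this page, not the API of any existing library. It only shows how a property such as symmetry could be derived structurally from an expression tree.

```python
# Toy structural inference of symmetry on matrix expressions.
# All names are hypothetical and for illustration only.

class MatSym:
    """A named matrix symbol with an optional known-symmetric flag."""
    def __init__(self, name, symmetric=False):
        self.name = name
        self.symmetric = symmetric


class Transpose:
    def __init__(self, arg):
        self.arg = arg


class Add:
    def __init__(self, *args):
        self.args = args


class MatMul:
    def __init__(self, left, right):
        self.left = left
        self.right = right


def is_symmetric(expr):
    """Return True only when symmetry can be proven by the rules below."""
    if isinstance(expr, MatSym):
        return expr.symmetric
    if isinstance(expr, Transpose):
        # The transpose of a symmetric matrix is symmetric.
        return is_symmetric(expr.arg)
    if isinstance(expr, Add):
        # A sum of symmetric matrices is symmetric.
        return all(is_symmetric(a) for a in expr.args)
    if isinstance(expr, MatMul):
        # X * X^T and X^T * X are symmetric for any X.
        a, b = expr.left, expr.right
        if isinstance(b, Transpose) and b.arg is a:
            return True
        if isinstance(a, Transpose) and a.arg is b:
            return True
        return False  # not provable with these rules
    return False


A = MatSym("A", symmetric=True)
X = MatSym("X")

gram_plus_a = Add(A, MatMul(X, Transpose(X)))
provable = is_symmetric(gram_plus_a)      # both terms are symmetric
unknown = is_symmetric(MatMul(A, X))      # no rule applies
```

A code generator could use such a flag to pick a specialized routine, in the same spirit as using a positive semi-definite flag to choose Cholesky.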


Eigen

  • Provides templated C++ linear algebra.
  • How? Example?


Cython

  • Compiler for marked-up Python to C.
  • Excels at C-like Python working on numpy arrays.
  • I don't think it can output standalone C.


PyCUDA

  • Wraps the CUDA API (see below) and provides Python data structures for CUDA. Andreas (maintainer) would prefer to deprecate this in favor of PyOpenCL, I believe.


PyOpenCL

  • Wraps the OpenCL API (see below) and provides Python data structures for OpenCL.


Numba

  • Converts looping scalar Python code into a fast LLVM implementation.


  • Converts simple numpy-based Python into Theano.


  • Fast math primitives, FFTs and BLAS operations for NVIDIA GPUs.

ATI/AMD libs

  • Fast math primitives, FFTs and BLAS operations for ATI GPUs.

Intel SPMD Compiler

  • Uses a CUDA-like programming style to program new multicore Intel CPUs.


  • Fast templating engine. Theano should use this (or similar) for generating code, but does not. Another alternative is Jinja2.
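To make the code-generation use case concrete, here is a minimal sketch of rendering C source from a template. It uses Python's standard-library `string.Template` purely as a stand-in so the example has no dependencies; it is not the engine recommended above, nor Jinja2. The template text and the `render_elemwise` helper are hypothetical.

```python
from string import Template

# Hypothetical C template for an elementwise binary operation.
# ${name}, ${dtype} and ${expr} are filled in at render time.
ELEMWISE_C = Template("""\
void ${name}(const ${dtype} *a, const ${dtype} *b, ${dtype} *out, int n) {
    for (int i = 0; i < n; ++i) {
        out[i] = ${expr};
    }
}
""")


def render_elemwise(name, dtype, expr):
    """Render C source for one elementwise kernel (illustrative helper)."""
    return ELEMWISE_C.substitute(name=name, dtype=dtype, expr=expr)


c_source = render_elemwise("add_f64", "double", "a[i] + b[i]")
print(c_source)
```

A full templating engine would add loops and conditionals inside the template, which matters once the generated code depends on the number of inputs or dimensions.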


OpenCL

  • Standard API for programming heterogeneous systems by applying kernels to data buffers. ATI uses this as its main GPGPU strategy. NVIDIA also supports it. PyOpenCL wraps any implementation of this API.


  • CUDA kernel generator from pretty normal-looking Python code.


Blaze

  • Unreleased lazy, retargetable expression evaluation engine from Continuum Analytics.