Lazy Matrix Algebra Ecosystem

mforbes edited this page Nov 17, 2012 · 11 revisions

Following a discussion on theano-dev ("SymPy and BLAS/LAPACK", around October 2012), this page collects projects that overlap with Theano or that Theano should use.

Theano

  • Op, Type, Apply, Variable data structures for describing large expression graphs.
  • numpy-based implementations for almost all expressions
  • almost-standalone C implementations for many expressions
  • CUDA-based implementations for many expressions
  • FunctionGraph and graph transformations for fast execution and, in some cases, improved numerical stability
  • inference: dtype and ndim, shape in many cases, constant value propagation. A "hint" mechanism propagates a positive semi-definite flag in some cases, to identify where a Cholesky factorization can be used for the inverse and determinant.
  • uses a custom CUDA stack for driving a single GPU, configured at import time (yuck)
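
The Cholesky hint pays off because of how a symmetric positive-definite matrix factors. Such a matrix A can be written as A = L Lᵀ with L lower triangular. The determinant is then the squared product of L's diagonal, and the inverse can be built from the triangular factor. Below is a minimal NumPy sketch of that identity. It is not Theano's implementation, and the function name `spd_det_and_inverse` is hypothetical:

```python
import numpy as np

def spd_det_and_inverse(A):
    """Determinant and inverse of a symmetric positive-definite matrix,
    computed from its Cholesky factor A = L @ L.T.
    np.linalg.cholesky raises LinAlgError if A is not positive definite."""
    L = np.linalg.cholesky(A)               # lower-triangular factor
    # det(A) = det(L) * det(L.T) = prod(diag(L)) ** 2
    det = np.prod(np.diag(L)) ** 2
    # inv(A) = inv(L).T @ inv(L); a general inverse of L keeps the sketch short
    L_inv = np.linalg.inv(L)
    return det, L_inv.T @ L_inv

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)                 # SPD by construction
det, A_inv = spd_det_and_inverse(A)
print(np.isclose(det, np.linalg.det(A)))    # True
print(np.allclose(A_inv @ A, np.eye(4)))    # True
```

Production code would make two changes. It would use triangular solves (for example `scipy.linalg.cho_solve`) rather than inverting L. It would also usually work with the log-determinant `2 * sum(log(diag(L)))` to avoid overflow.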

NumPy

  • Provides ndarray, a data structure for exchanging data with Python code.
  • Provides implementations for most user-facing (default) Theano Ops.
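
Compiled code sees an ndarray as a buffer pointer plus shape and strides, so memory layout matters when data crosses between Python and C. The short illustration below shows standard NumPy behavior and is not specific to Theano:

```python
import numpy as np

# A C-ordered 3x4 float64 array: rows are 4 * 8 = 32 bytes apart,
# adjacent elements in a row are 8 bytes apart.
a = np.arange(12, dtype=np.float64).reshape(3, 4)
print(a.shape, a.strides, a.flags["C_CONTIGUOUS"])   # (3, 4) (32, 8) True

# Transposing makes a view over the same buffer with swapped strides;
# no data is copied, but the view is no longer C-contiguous.
t = a.T
print(t.shape, t.strides, t.flags["C_CONTIGUOUS"])   # (4, 3) (8, 32) False
print(np.shares_memory(a, t))                        # True

# Code that assumes a contiguous buffer needs an explicit copy first.
c = np.ascontiguousarray(t)
print(c.strides, np.shares_memory(a, c))             # (24, 8) False
```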

SciPy

  • Provides Python access to a subset of BLAS and LAPACK.
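
For example, the wrappers in `scipy.linalg.blas` and `scipy.linalg.lapack` can be called directly. The sketch below uses the double-precision routines `dgemm` (general matrix multiply) and `dpotrf` (Cholesky factorization). The matrix sizes are arbitrary:

```python
import numpy as np
from scipy.linalg import blas, lapack

rng = np.random.default_rng(0)
a = rng.standard_normal((3, 4))
b = rng.standard_normal((4, 2))

# BLAS level 3: c = alpha * a @ b
c = blas.dgemm(alpha=1.0, a=a, b=b)
print(np.allclose(c, a @ b))          # True

# LAPACK Cholesky factorization of a symmetric positive-definite matrix;
# info == 0 signals success, and the unused triangle is zeroed by default.
s = a @ a.T + 3 * np.eye(3)
factor, info = lapack.dpotrf(s, lower=1)
print(info)                           # 0
```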

SymPy

  • High level interface to matrix expressions
  • Logical inferences on matrix properties (symmetry, definiteness, ...)
  • Symbolic representation of BLAS and LAPACK computations and code generation (in progress)
  • Pretty/LaTeX printing
  • Scalar simplification
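
The sketch below shows the logical-inference side using SymPy's assumptions system (`ask`, `Q`) on matrix symbols. The matrix names and sizes are arbitrary. These are the same kinds of facts (symmetry, positive definiteness) that Theano's hint mechanism tracks:

```python
from sympy import Identity, MatrixSymbol, Q, ask, latex

X = MatrixSymbol("X", 2, 2)
Z = MatrixSymbol("Z", 2, 2)
Y = MatrixSymbol("Y", 2, 3)

# Properties of an expression are inferred from assumptions on its operands.
sym_sum = ask(Q.symmetric(X + Z), Q.symmetric(X) & Q.symmetric(Z))
pd_sum = ask(Q.positive_definite(X + Z),
             Q.positive_definite(X) & Q.positive_definite(Z))
sym_rect = ask(Q.symmetric(Y))        # a non-square matrix cannot be symmetric
pd_eye = ask(Q.positive_definite(Identity(3)))
print(sym_sum, pd_sum, sym_rect, pd_eye)

# LaTeX rendering of a matrix expression
print(latex(X.T * Z))
```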




Eigen

  • Provides templated C++ linear algebra.
  • How? Example?

Cython

  • Compiler for marked-up Python to C.
  • Excels at C-like Python working on numpy arrays.
  • I don't think it can output standalone C.

PyCUDA

  • Wraps CUDA API (see below) and provides Python data structures for CUDA. Andreas (maintainer) would prefer to deprecate this in favor of PyOpenCL, I believe.

PyOpenCL

  • Wraps OpenCL API (see below) and provides Python data structures for OpenCL.

Numba

  • Converts looping scalar Python code into a fast LLVM implementation.
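
The sketch below shows the kind of loop numba targets. It uses `njit` from current numba releases, since the decorator names have changed since this page was written. If numba is not installed, it falls back to running the plain Python function, so the example runs either way:

```python
import numpy as np

try:
    from numba import njit          # compile with numba when available
except ImportError:
    def njit(func):                 # fallback: leave the function as plain Python
        return func

@njit
def sum_of_squares(x):
    # An explicit scalar loop: slow in CPython, but the kind of code
    # numba compiles to machine code through LLVM.
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i] * x[i]
    return total

x = np.arange(5, dtype=np.float64)
print(sum_of_squares(x))            # 30.0
```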


  • Converts simple numpy-based Python into Theano.


  • Fast math primitives, FFTs and BLAS operations for NVIDIA GPUs.

ATI/AMD libs

  • Fast math primitives, FFTs and BLAS operations for ATI GPUs.

Intel SPMD Compiler

  • Uses a CUDA-like programming style to program new multicore Intel CPUs.


  • Fast templating engine. Theano should use this (or similar) for generating code, but does not. Another alternative is Jinja2.
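
The sketch below illustrates template-based code generation by rendering a C elementwise kernel from a template. It uses Python's standard-library `string.Template` as a stand-in for a full engine such as Jinja2, which adds loops, conditionals and filters on top of plain substitution. The template and the kernel name `square_f64` are hypothetical:

```python
from string import Template

# Hypothetical template for an elementwise C kernel; ${...} fields are
# filled in at generation time, C braces are left untouched.
KERNEL = Template("""\
void ${name}(const ${ctype} *x, ${ctype} *out, long n)
{
    for (long i = 0; i < n; ++i)
        out[i] = ${expr};
}
""")

src = KERNEL.substitute(name="square_f64", ctype="double", expr="x[i] * x[i]")
print(src)
```

Keeping the C skeleton in a template, rather than assembling it from string fragments, keeps the generated code readable. It also makes the type-specific variants (float, double, ...) differ only in their substitution values.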

OpenCL

  • Standard API for programming heterogeneous systems by applying kernels to data buffers. ATI uses this as its main GPGPU strategy. NVIDIA also supports it. PyOpenCL wraps any implementation of this API.


  • CUDA kernel generator from pretty normal-looking Python code.


  • Unreleased lazy retargetable expression evaluation engine from Continuum Analytics.