
Numba support #47

Open
FedericoV opened this issue Sep 17, 2015 · 30 comments

Comments

@FedericoV

Hi everyone,

I started using autograd, and it's pretty fantastic. The one drawback is that I haven't really found a way to JIT the functions (in nopython mode) using numba when calculating the derivatives.

Is this something very difficult, or is it on the roadmap to add eventually? Being able to get really fast forward and backward evaluation would be pretty awesome.

@datnamer

datnamer commented Oct 9, 2015

+1 that would be cool

@richardotis
Contributor

This would essentially eliminate the performance difference between optimized Fortran and Python in my project, so I would obviously be very pleased to see this.

Using numba.vectorize in nopython mode I've been able to achieve 3-30x speedups on my big, scalar-valued objective function, but unfortunately I've not yet been able to make it work with autograd, and profiling shows gradient/Hessian evaluations to be nearly 80% of the execution time in my use case.
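
For reference, here is a minimal sketch of the numba.vectorize pattern described above (the objective here is a made-up stand-in for the real, much larger function):

import numpy as np
from numba import vectorize, float64

# Made-up scalar objective standing in for the real one.
@vectorize([float64(float64, float64)], nopython=True)
def objective(x, y):
    return x * np.exp(-x * y) + np.log1p(y * y)

# Broadcasting over arrays of inputs still works:
objective(np.linspace(0.0, 1.0, 5), np.linspace(1.0, 2.0, 5))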

Relevant documentation: http://numba.pydata.org/numba-doc/0.21.0/developer/index.html

I don't have a great understanding of how autograd or even backpropagation works in general, but if the relevant differentiation operations for a node are essentially written to a tape and then played back, numba should in principle already support most of the necessary code generation on the backend, since these are just operations on numpy arrays. Type inference on the frontend would basically be one-to-one with numba's existing numpy support. I think the biggest hurdle would be figuring out how to make autograd.core.primitive work with numba's JIT compiler.

@richardotis
Contributor

For the curious, I recorded some experiments I did trying to compute gradients/Hessians using Numba in nopython mode, where the function is dynamically generated from a SymPy graph: numba/numba#1532. The flow is like SymPy -> NumPy functions -> numba.
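
A rough sketch of that flow, with a toy expression standing in for the real SymPy graph (the code in the linked issue is more involved):

import sympy as sp
import numpy as np
import numba

x, y = sp.symbols('x y')
expr = x**2 * sp.sin(y)  # toy stand-in for the real model

# Differentiate symbolically, lambdify each component to a NumPy function,
# then JIT-compile the result with numba.
grad_fns = [numba.njit(sp.lambdify((x, y), sp.diff(expr, v), modules='numpy'))
            for v in (x, y)]

xs = np.linspace(0.0, 1.0, 100)
ys = np.linspace(0.0, 1.0, 100)
grad_vals = [g(xs, ys) for g in grad_fns]  # broadcasting is retained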

I see a ~10x speedup computing gradients and Hessians from these Numba nopython functions versus pure autograd, while still retaining broadcasting support. The downside is that the differentiation is done by SymPy rather than by reverse accumulation with autograd, so the differentiation and compilation step is very inefficient and slow (~1 minute for a 6-variable gradient function, ~14 minutes for a 6x6 Hessian function). autograd builds the same function in less than a second, so the target computation would have to be quite large to see a net benefit at the moment.

So I'm hopeful that if we could combine the efficiency of autograd's approach with numba's JIT compiler, we'd see some very nice results.

@mattjj
Contributor

mattjj commented Nov 16, 2015

That's really great!

We've been discussing different code generation strategies but we haven't had the spare cycles to take a good stab at one. I did some experiments with numba a while ago and it looked promising because generating Python function objects is more convenient than writing a code printer, but then I noticed that numba doesn't generate blas/lapack calls and instead goes back into Python when you hit a dot. That kills my use case, which involves a lot of numerical linear algebra routines, so I haven't looked into it more (though I think numba knows how to handle cffi calls, so a cffi-wrapped OpenBLAS-Lapack might be all I need...).

One of the challenges is there are a lot of different directions to go. Maybe we should generate a Theano graph and let that compiler take over, or generate a TensorFlow graph and pass that off to the TF runtime. (A bit tangentially, we also want to wrap cupy.) We're just juggling too much to follow up on these things.

So we really appreciate your investigations here!

Here's some code to illustrate what I mean about numba:

from numba import jit, float64
import numpy as np

square = jit(lambda x: x**2)
hypot = jit(lambda x, y: np.sqrt(square(x) + square(y)))
print(hypot(3., 4.))
print(hypot.inspect_asm()[(float64, float64)])

matrix_product = jit(lambda x, y: np.dot(x, y))
matrix_product(np.random.randn(3, 3), np.random.randn(3))
print(list(matrix_product.inspect_asm().values())[0])

The good bits from the first print statement after some unboxing:

    movsd   (%rsp), %xmm1
    mulsd   %xmm1, %xmm1
    movsd   8(%rsp), %xmm0
    mulsd   %xmm0, %xmm0
    addsd   %xmm1, %xmm0
    movabsq $_numba.npymath.sqrt, %rax
    callq   *%rax

Yet the second one just generates a generic Python call (note the _PyObject_CallFunctionObjArgs).

@richardotis
Contributor

@mattjj You're correct about cffi and numba: http://numba.pydata.org/numba-doc/dev/reference/pysupported.html#third-party-modules
@seibert may have something to add at this point

@richardotis
Contributor

I spent several weeks trying to make Theano work for my use case, including some extremely helpful correspondence on their mailing list, but unfortunately I did not have a good experience with its support for elementwise gradient operations.

Your requirements are a bit different from just trying to solve a bunch of numerical optimization problems at once, but I'd say that if you can keep autograd from being coupled too tightly to a heavyweight dependency like Theano, it will be more beneficial to the community.

@seibert

seibert commented Nov 16, 2015

Yes, using a CFFI wrapper to BLAS/LAPACK should be a reasonable workaround. Getting np.dot to work in nopython mode is a high priority for us, but numpy does not seem to re-export the C functions for the underlying BLAS library it links to. We think there might be an alternative route to these symbols through scipy, or we'll have to set up something where we optionally link a BLAS library directly to Numba (with the downside of making the Numba build more complex).
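
For illustration, the cffi route might look roughly like this; the library name and the choice of the CBLAS interface are assumptions, not a tested recipe:

import cffi
import numpy as np
from numba import njit

ffi = cffi.FFI()
# Declare only the CBLAS routine we need (value arguments, so no Fortran-style pointers).
ffi.cdef("double cblas_ddot(int n, const double *x, int incx, const double *y, int incy);")
blas = ffi.dlopen("libopenblas.so")  # assumed library name/path
cblas_ddot = blas.cblas_ddot

@njit
def jit_dot(x, y):
    # ffi.from_buffer hands the C routine a pointer to the arrays' data.
    return cblas_ddot(x.shape[0], ffi.from_buffer(x), 1, ffi.from_buffer(y), 1)

jit_dot(np.arange(3.0), np.ones(3))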

@mattjj
Contributor

mattjj commented Nov 16, 2015

@richardotis good points, that's very helpful feedback. We have some of the same reservations.

@seibert Wow, thanks for the insight! There's the scipy.linalg.cython_blas and cython_lapack function pointer grabbing which works well for cython code, but that seems like it would have its own drawbacks (like preventing link-time optimizations which might be useful with small matrices). Any progress on that front would be really exciting!

@datnamer

np.dot works in nopython mode now, I think (thanks, numba team!). PyMC3 is looking at allowing a numpy/autograd backend, but it would need numba support. @twiecki

@twiecki

twiecki commented Feb 13, 2016

This would be quite cool indeed. One of the problems with PyMC3 is that Theano can be difficult to work with. It wouldn't be hard to provide probability distributions for numpy which could then be autograd'ed and numbafied. This would allow for model specification in numpy, which could then easily be used with sampyl. The np.dot blocker is gone; what other blockers are there?
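
For example, a distribution's log-density written in plain numpy is already differentiable with autograd today; the numba step is the open question (this is just a sketch, not PyMC3 code):

import autograd.numpy as np
from autograd import grad

def normal_logpdf(x, mu, sigma):
    # Log-density of Normal(mu, sigma) at x, using only plain numpy operations.
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - 0.5 * ((x - mu) / sigma) ** 2

# Gradient of the log-density with respect to mu (argument index 1).
dlogp_dmu = grad(normal_logpdf, 1)
dlogp_dmu(1.3, 0.0, 1.0)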

@twiecki

twiecki commented Feb 13, 2016

This might be an alternative to reimplementing the probability distributions: http://nbviewer.jupyter.org/github/synapticarbors/rmath-cffi-example/blob/master/rmath-cffi-example.ipynb

@astrojuanlu

I've tried that combination recently to use some scipy.special functions from numba and it works like a charm (in Spanish, sorry ☺️)

@mattjj
Contributor

mattjj commented Feb 26, 2016

Thanks for the heads-up that np.dot works with nopython now (the implementation for grabbing the gemm and gemv functions looks pretty readable too). That is promising, especially since I think a lot of elementwise math was already supported. It'd be great if LAPACK calls could also be generated in nopython mode (at least for the kind of code I tend to write), but writing a cffi wrapper for that is a viable option (thanks for the pointers on those).

As for blockers, maybe there are none, other than finding time to give it a shot. We've been pretty busy working on other things, so I'm not sure when that will happen.

@dhirschfeld
Contributor

...it looks like linalg support will be part of numba core:
numba/numba#1839
numba/numba#1862

@richardotis
Contributor

richardotis commented May 3, 2016

With numpy.dot, numpy.linalg.inv (in Numba 0.25), numpy.linalg.svd and numpy.linalg.qr (in Numba master and a PR, as linked above), I think all the linear algebra primitives are available to express the most common matrix operations. For example, even though there is no nopython implementation of numpy.linalg.lstsq yet, you can use the SVD and dot to perform least-squares fitting.
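
For instance, a least-squares solve written only in terms of those primitives might look like this (a sketch; it assumes A has full column rank):

import numpy as np
from numba import njit

@njit
def lstsq_via_svd(A, b):
    # Least-squares solution x minimizing ||Ax - b|| via the thin SVD,
    # using only np.linalg.svd and np.dot, which work in nopython mode.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Vector-matrix products avoid explicit transposes: b @ U == U.T @ b, w @ Vt == V @ w.
    return np.dot(np.dot(b, U) / s, Vt)

A = np.random.randn(50, 6)
b = np.random.randn(50)
x = lstsq_via_svd(A, b)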

@stuartarchibald

Numba 0.27.0 is now out https://groups.google.com/a/continuum.io/d/msg/numba-users/ZYuOge08sTg/tzBgn219AAAJ
and has np.linalg.lstsq, np.linalg.solve, and np.linalg.pinv supported in nopython mode.

@j-towns
Collaborator

j-towns commented Sep 7, 2016

I think we're gonna need support for arithmetic operator overloading in numba.jitclass. This currently does not work:

import numba

FloatNodeSpec = [('value', numba.types.float64)]

@numba.jitclass(FloatNodeSpec)
class FloatNode(object):
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return self.value + other

a = FloatNode(5.3)
a + 2.2

will throw

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-ef28302c7942> in <module>()
      1 a = FloatNode(5.3)
----> 2 a + 2.2

TypeError: unsupported operand type(s) for +: 'FloatNode' and 'float'

This is mentioned in this thread, though it's unclear whether they plan to implement it soon.

@pitrou

pitrou commented Sep 7, 2016

@j-towns, jitclasses are really suboptimal for this kind of tiny wrapper, since each jitclass instance is reference-counted and can have a significant cost (including in CPU time). Instead, I would suggest simply storing your floats in a Numpy array.

@pitrou

pitrou commented Sep 7, 2016

In general, Numba makes compatible Python code much faster, but it does not mean all abstractions become zero-cost. It is best to avoid overengineering and write simple, streamlined code.

@FedericoV
Author

To be able to accumulate gradients, subclassing floats is not optional.


@pitrou

pitrou commented Sep 7, 2016

I may be missing some context, but I'm not sure why subclassing floats would ever be needed to accumulate gradients.

@j-towns
Collaborator

j-towns commented Sep 8, 2016

The way Autograd is currently implemented, it executes the function being differentiated on a special Node object which behaves like a normal float (or array of floats) but records each step of the computation as it goes along. The gradient is then calculated by working back through the recorded computation (this is known as reverse-mode automatic differentiation).

I was going to say yesterday that some kind of dict-like functionality (which is planned) might also be necessary for storing the computation, but @mattjj and the other authors of Autograd will know more.

@j-towns
Collaborator

j-towns commented Sep 8, 2016

To be able to accumulate gradients, subclassing floats is not optional

To be clear, you don't actually need to subclass float or numpy.ndarray (Autograd doesn't); you just need a class that behaves like float/numpy.ndarray. See e.g. https://github.com/HIPS/autograd/blob/master/autograd/numpy/numpy_extra.py#L24.
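
A toy illustration of the idea (this is not autograd's actual implementation, just the general shape):

class Node(object):
    """Wraps a value and records how it was computed, so gradients can later
    be propagated backwards through the recorded operations."""
    def __init__(self, value, parents=(), vjp=None):
        self.value = value      # the underlying float / ndarray
        self.parents = parents  # the Nodes this one was computed from
        self.vjp = vjp          # maps an output gradient to parent gradients

    def __mul__(self, other):
        return Node(self.value * other.value, parents=(self, other),
                    vjp=lambda g: (g * other.value, g * self.value))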

@kitizz

kitizz commented Mar 8, 2017

If someone can point me in the right direction, I'm happy to dive into the autograd code base in the upcoming summer to look into this. I may also be looking for something a bit different from the original poster.

I don't want to accelerate the calculation of derivatives in autograd with numba.

I'm more interested in, for example, generating a derivative function with autograd for a cost function that I've defined, and then passing that to numba for JIT compilation. My main use case is generating Jacobians quickly, without having to work out the highly non-linear matrix calculus every time I want to play with a new cost function.

Somehow, these days, I still find it easier and faster to work this maths out with pencil and paper, when really I feel like I shouldn't even have to think about it.
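
For reference, the autograd half of that workflow already looks like this; the missing piece is handing the result to numba (the cost function here is invented):

import autograd.numpy as np
from autograd import jacobian

def residuals(params):
    # Invented nonlinear residual vector; stands in for a real cost function.
    return np.sin(params) * params ** 2 - 1.0

jac = jacobian(residuals)            # callable that computes the Jacobian
J = jac(np.array([0.1, 0.2, 0.3]))   # 3x3 Jacobian; JIT-compiling jac is the hard part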

@adityakpathak

adityakpathak commented May 10, 2018

How would one implement the Word2Vec algorithm using Autograd?

@louisabraham

Hi, just to say this feature would really be great!

Maybe it could be achieved by exporting formulas for the gradients. This way it would be even simpler to use numexpr. For numba, one could just make a lambda function with eval and pass it to jit.
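
Something along these lines, assuming the exported gradient formula arrives as a string of plain numpy operations (the formula here is invented):

import numpy as np
import numba

# Suppose the exported gradient formula comes out as a source string.
grad_src = "lambda x, y: 2.0 * x * np.sin(y)"

# Build a Python function with eval, then hand it to numba's JIT.
grad_fn = numba.njit(eval(grad_src, {"np": np}))
grad_fn(1.0, 0.5)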

@absudabsu

https://github.com/google/jax

^ Offers some good jit examples. Haven't tried myself though.

(As a sidenote: is HIPS working with Google, or are these completely separate packages?)

@mattjj
Contributor

mattjj commented Dec 10, 2018

@dougalm and I are researchers at Google now, and we're the co-creators of JAX along with @froystig and @learyg (and a growing list of contributors, including former HIPS members @alexbw, @JasperSnoek, and others). And @j-towns did an internship with us to work on it! The only person we're missing is @duvenaud...

HIPS doesn't actually exist anymore, in that Ryan moved to Princeton and started LIPS. (EDIT: fixed the link to point to the right page.)

@absudabsu

Good to know! I've been tracking Autograd for a while now, and briefly switched to Julia for some flexibility, but I'm glad to see there's renewed interest in this.

@ericmjl
Contributor

ericmjl commented Dec 14, 2018

@j-towns I'm curious: what have you tried so far toward getting autograd + numba working together?
