
Numba support #47

Open
FedericoV opened this issue Sep 17, 2015 · 30 comments

Comments

@FedericoV

Hi everyone,

I started using autograd, and it's pretty fantastic. The one drawback is that I haven't really found a way to JIT the functions (in nopython mode) using numba when calculating the derivatives.

Is this something very difficult, or is it on the roadmap to add eventually? Being able to get really fast forward and backward evaluation would be pretty awesome.

@datnamer

datnamer commented Oct 9, 2015

+1 that would be cool

@richardotis
Contributor

This would essentially eliminate the performance difference between optimized Fortran and Python in my project, so I would obviously be very pleased to see this.

Using numba.vectorize in nopython mode I've been able to achieve 3-30x speedups on my big, scalar-valued objective function, but unfortunately I've not yet been able to make it work with autograd, and profiling shows gradient/Hessian evaluations to be nearly 80% of the execution time in my use case.
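
For reference, here is a minimal sketch of the numba.vectorize pattern described above (the objective here is a made-up stand-in for the real, much larger function):

import numpy as np
from numba import vectorize, float64

# Made-up scalar objective standing in for the real one.
@vectorize([float64(float64, float64)], nopython=True)
def objective(x, y):
    return x * np.exp(-x * y) + np.log1p(y * y)

# Broadcasting over arrays of inputs still works:
objective(np.linspace(0.0, 1.0, 5), np.linspace(1.0, 2.0, 5))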

Relevant documentation: http://numba.pydata.org/numba-doc/0.21.0/developer/index.html

I don't have a great understanding of how autograd or even backpropagation works in general, but if the relevant differentiation operations for a node are essentially written to a tape and then played back, numba should in principle already support most of the necessary code generation on the backend, since these are just operations on numpy arrays. Type inference on the frontend would basically be one-to-one with numba's existing numpy support. I think the biggest hurdle would be figuring out how to make autograd.core.primitive work with numba's JIT compiler.

@richardotis
Contributor

For the curious, I recorded some experiments I did trying to compute gradients/Hessians using Numba in nopython mode, where the function is dynamically generated from a SymPy graph: numba/numba#1532. The flow is like SymPy -> NumPy functions -> numba.
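
A rough sketch of that flow, with a toy expression standing in for the real SymPy graph (the code in the linked issue is more involved):

import sympy as sp
import numpy as np
import numba

x, y = sp.symbols('x y')
expr = x**2 * sp.sin(y)  # toy stand-in for the real model

# Differentiate symbolically, lambdify each component to a NumPy function,
# then JIT-compile the result with numba.
grad_fns = [numba.njit(sp.lambdify((x, y), sp.diff(expr, v), modules='numpy'))
            for v in (x, y)]

xs = np.linspace(0.0, 1.0, 100)
ys = np.linspace(0.0, 1.0, 100)
grad_vals = [g(xs, ys) for g in grad_fns]  # broadcasting is retained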

I see a ~10x speedup computing gradients and Hessians from these Numba nopython functions versus pure autograd, while still retaining broadcasting support. The downside is that the differentiation is done by SymPy rather than by reverse accumulation with autograd, so the differentiation and compilation step is very inefficient and slow (~1 minute for a 6-variable gradient function, ~14 minutes for a 6x6 Hessian function). autograd builds the same function in less than a second, so the target computation would have to be quite large to see a net benefit at the moment.

So I'm hopeful that if we could combine the efficiency of autograd's approach with numba's JIT compiler, we'd see some very nice results.

@mattjj
Contributor

mattjj commented Nov 16, 2015

That's really great!

We've been discussing different code generation strategies but we haven't had the spare cycles to take a good stab at one. I did some experiments with numba a while ago and it looked promising because generating Python function objects is more convenient than writing a code printer, but then I noticed that numba doesn't generate blas/lapack calls and instead goes back into Python when you hit a dot. That kills my use case, which involves a lot of numerical linear algebra routines, so I haven't looked into it more (though I think numba knows how to handle cffi calls, so a cffi-wrapped OpenBLAS-Lapack might be all I need...).

One of the challenges is there are a lot of different directions to go. Maybe we should generate a Theano graph and let that compiler take over, or generate a TensorFlow graph and pass that off to the TF runtime. (A bit tangentially, we also want to wrap cupy.) We're just juggling too much to follow up on these things.

So we really appreciate your investigations here!

Here's some code to illustrate what I mean about numba:

from numba import jit, float64
import numpy as np

square = jit(lambda x: x**2)
hypot = jit(lambda x, y: np.sqrt(square(x) + square(y)))
print(hypot(3., 4.))
print(hypot.inspect_asm()[(float64, float64)])

matrix_product = jit(lambda x, y: np.dot(x, y))
matrix_product(np.random.randn(3, 3), np.random.randn(3))
print(list(matrix_product.inspect_asm().values())[0])

The good bits from the first print statement after some unboxing:

    movsd   (%rsp), %xmm1
    mulsd   %xmm1, %xmm1
    movsd   8(%rsp), %xmm0
    mulsd   %xmm0, %xmm0
    addsd   %xmm1, %xmm0
    movabsq $_numba.npymath.sqrt, %rax
    callq   *%rax

Yet the second one just generates a generic Python call (note the _PyObject_CallFunctionObjArgs).

@richardotis
Contributor

@mattjj You're correct about cffi and numba: http://numba.pydata.org/numba-doc/dev/reference/pysupported.html#third-party-modules
@seibert may have something to add at this point

@richardotis
Contributor

I spent several weeks trying to make Theano work for my use case, including some extremely helpful correspondence on their mailing list, but unfortunately I did not have a good experience with its support for elementwise gradient operations.

Your requirements are a bit different from just trying to solve a bunch of numerical optimization problems at once, but I'd say that if you can keep autograd from being coupled too tightly to a heavyweight dependency like Theano, it will be more beneficial to the community.

@seibert

seibert commented Nov 16, 2015

Yes, using a CFFI wrapper to BLAS/LAPACK should be a reasonable workaround. Getting np.dot to work in nopython mode is a high priority for us, but numpy does not seem to re-export the C functions for the underlying BLAS library it links to. We think there might be an alternative route to these symbols through scipy, or we'll have to set up something where we optionally link a BLAS library directly to Numba (with the downside of making the Numba build more complex).
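
For illustration, the cffi route might look roughly like this; the library name and the choice of the CBLAS interface are assumptions, not a tested recipe:

import cffi
import numpy as np
from numba import njit

ffi = cffi.FFI()
# Declare only the CBLAS routine we need (value arguments, so no Fortran-style pointers).
ffi.cdef("double cblas_ddot(int n, const double *x, int incx, const double *y, int incy);")
blas = ffi.dlopen("libopenblas.so")  # assumed library name/path
cblas_ddot = blas.cblas_ddot

@njit
def jit_dot(x, y):
    # ffi.from_buffer hands the C routine a pointer to the arrays' data.
    return cblas_ddot(x.shape[0], ffi.from_buffer(x), 1, ffi.from_buffer(y), 1)

jit_dot(np.arange(3.0), np.ones(3))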

@mattjj
Contributor

mattjj commented Nov 16, 2015

@richardotis good points, that's very helpful feedback. We have some of the same reservations.

@seibert Wow, thanks for the insight! There's the scipy.linalg.cython_blas and cython_lapack function pointer grabbing which works well for cython code, but that seems like it would have its own drawbacks (like preventing link-time optimizations which might be useful with small matrices). Any progress on that front would be really exciting!

@datnamer

np.dot works in nopython mode now, I think (thanks, numba team!). PyMC3 is looking at allowing a numpy/autograd backend, but it would need numba support. @twiecki

@twiecki

twiecki commented Feb 13, 2016

This would be quite cool indeed. One of the problems with PyMC3 is that Theano can be difficult to work with. It wouldn't be hard to provide probability distributions for numpy which could then be autograd'ed and numbafied. This would allow for model specification in numpy, which could then easily be used with sampyl. The np.dot blocker is gone; what other blockers are there?
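
For example, a distribution's log-density written in plain numpy is already differentiable with autograd today; the numba step is the open question (this is just a sketch, not PyMC3 code):

import autograd.numpy as np
from autograd import grad

def normal_logpdf(x, mu, sigma):
    # Log-density of Normal(mu, sigma) at x, using only plain numpy operations.
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - 0.5 * ((x - mu) / sigma) ** 2

# Gradient of the log-density with respect to mu (argument index 1).
dlogp_dmu = grad(normal_logpdf, 1)
dlogp_dmu(1.3, 0.0, 1.0)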

@twiecki

twiecki commented Feb 13, 2016

This might be an alternative to reimplementing the probability distributions: http://nbviewer.jupyter.org/github/synapticarbors/rmath-cffi-example/blob/master/rmath-cffi-example.ipynb

@astrojuanlu

I've tried that combination recently to use some scipy.special functions from numba and it works like a charm (in Spanish, sorry ☺️)

@mattjj
Contributor

mattjj commented Feb 26, 2016

Thanks for the heads-up that np.dot works with nopython now (the implementation for grabbing the gemm and gemv functions looks pretty readable too). That is promising, especially since I think a lot of elementwise math was already supported. It'd be great if LAPACK calls could also be generated in nopython mode (at least for the kind of code I tend to write), but writing a cffi wrapper for that is a viable option (thanks for the pointers on those).

As for blockers, maybe there are none, other than finding time to give it a shot. We've been pretty busy working on other things, so I'm not sure when that will happen.

@dhirschfeld
Contributor

...it looks like linalg support will be part of numba core:
numba/numba#1839
numba/numba#1862

@richardotis
Contributor

richardotis commented May 3, 2016

With numpy.dot, numpy.linalg.inv (in Numba 0.25), numpy.linalg.svd and numpy.linalg.qr (in Numba master and a PR, as linked above), I think all the linear algebra primitives are available to express the most common matrix operations. For example, even though there is no nopython implementation of numpy.linalg.lstsq yet, you can use the SVD and dot to perform least-squares fitting.
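
For instance, a least-squares solve written only in terms of those primitives might look like this (a sketch; it assumes A has full column rank):

import numpy as np
from numba import njit

@njit
def lstsq_via_svd(A, b):
    # Least-squares solution x minimizing ||Ax - b|| via the thin SVD,
    # using only np.linalg.svd and np.dot, which work in nopython mode.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Vector-matrix products avoid explicit transposes: b @ U == U.T @ b, w @ Vt == V @ w.
    return np.dot(np.dot(b, U) / s, Vt)

A = np.random.randn(50, 6)
b = np.random.randn(50)
x = lstsq_via_svd(A, b)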

@stuartarchibald

Numba 0.27.0 is now out https://groups.google.com/a/continuum.io/d/msg/numba-users/ZYuOge08sTg/tzBgn219AAAJ
and has np.linalg.lstsq, np.linalg.solve, and np.linalg.pinv supported in nopython mode.

@j-towns
Collaborator

j-towns commented Sep 7, 2016

I think we're gonna need support for arithmetic operator overloading in numba.jitclass. This currently does not work:

import numba

FloatNodeSpec = [('value', numba.types.float64)]

@numba.jitclass(FloatNodeSpec)
class FloatNode(object):
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return self.value + other

a = FloatNode(5.3)
a + 2.2

will throw

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-ef28302c7942> in <module>()
      1 a = FloatNode(5.3)
----> 2 a + 2.2

TypeError: unsupported operand type(s) for +: 'FloatNode' and 'float'

This is mentioned in this thread, though it's unclear whether they plan to implement it soon.

@pitrou

pitrou commented Sep 7, 2016

@j-towns, jitclasses are really suboptimal for this kind of tiny wrapper, since each jitclass instance is reference-counted and can have a significant cost (including in CPU time). Instead, I would suggest simply storing your floats in a Numpy array.

@pitrou

pitrou commented Sep 7, 2016

In general, Numba makes compatible Python code much faster, but it does not mean all abstractions become zero-cost. It is best to avoid overengineering and write simple, streamlined code.

@FedericoV
Author

To be able to accumulate gradients, subclassing floats is not optional.


@pitrou

pitrou commented Sep 7, 2016

I may be missing some context, but I'm not sure why subclassing floats would ever be needed to accumulate gradients.

@j-towns
Collaborator

j-towns commented Sep 8, 2016

The way Autograd is currently implemented, it executes the function being differentiated on a special Node object which behaves like a normal float (or array of floats) but records each step of the computation as it goes along. The gradient is then calculated by working back through the recorded computation (this is known as reverse-mode automatic differentiation).

I was going to say yesterday that some kind of dict-like functionality (which is planned) might also be necessary for storing the computation, but @mattjj and the other authors of Autograd will know more.

@j-towns
Collaborator

j-towns commented Sep 8, 2016

To be able to accumulate gradients, subclassing floats is not optional

To be clear, you don't actually need to subclass float or numpy.ndarray (Autograd doesn't); you just need a class that behaves like float/numpy.ndarray. See e.g. https://github.com/HIPS/autograd/blob/master/autograd/numpy/numpy_extra.py#L24.
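
A toy illustration of the idea (this is not autograd's actual implementation, just the general shape):

class Node(object):
    """Wraps a value and records how it was computed, so gradients can later
    be propagated backwards through the recorded operations."""
    def __init__(self, value, parents=(), vjp=None):
        self.value = value      # the underlying float / ndarray
        self.parents = parents  # the Nodes this one was computed from
        self.vjp = vjp          # maps an output gradient to parent gradients

    def __mul__(self, other):
        return Node(self.value * other.value, parents=(self, other),
                    vjp=lambda g: (g * other.value, g * self.value))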

@kitizz

kitizz commented Mar 8, 2017

If someone can point me in the right direction, I'm happy to dive into the autograd code base in the upcoming summer to look into this. I may also be looking for something a bit different from the original poster.

I don't want to accelerate the calculation of derivatives in autograd with numba.

I'm more interested in, for example, generating a derivative function with autograd for a cost function that I've defined, and then passing that to numba for JIT compilation. My main use case is generating Jacobians quickly, without having to work out the highly non-linear matrix calculus every time I want to play with a new cost function.

Somehow, these days, I still find it easier and faster to work this maths out with pencil and paper, when really I feel like I shouldn't even have to think about it.
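
For reference, the autograd half of that workflow already looks like this; the missing piece is handing the result to numba (the cost function here is invented):

import autograd.numpy as np
from autograd import jacobian

def residuals(params):
    # Invented nonlinear residual vector; stands in for a real cost function.
    return np.sin(params) * params ** 2 - 1.0

jac = jacobian(residuals)            # callable that computes the Jacobian
J = jac(np.array([0.1, 0.2, 0.3]))   # 3x3 Jacobian; JIT-compiling jac is the hard part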

@adityakpathak

adityakpathak commented May 10, 2018

How would one implement the Word2Vec algorithm using Autograd?

@louisabraham

Hi, just to say this feature would really be great!

Maybe it could be achieved by exporting formulas for the gradients. This way it would be even simpler to use numexpr. For numba, one could just make a lambda function with eval and pass it to jit.
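
Something along these lines, assuming the exported gradient formula arrives as a string of plain numpy operations (the formula here is invented):

import numpy as np
import numba

# Suppose the exported gradient formula comes out as a source string.
grad_src = "lambda x, y: 2.0 * x * np.sin(y)"

# Build a Python function with eval, then hand it to numba's JIT.
grad_fn = numba.njit(eval(grad_src, {"np": np}))
grad_fn(1.0, 0.5)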

@absudabsu

https://github.com/google/jax

^ Offers some good jit examples. Haven't tried myself though.

(As a sidenote: is HIPS working with Google, or are these completely separate packages?)

@mattjj
Contributor

mattjj commented Dec 10, 2018

@dougalm and I are researchers at Google now, and we're the co-creators of JAX along with @froystig and @learyg (and a growing list of contributors, including former HIPS members @alexbw, @JasperSnoek, and others). And @j-towns did an internship with us to work on it! The only person we're missing is @duvenaud...

HIPS doesn't actually exist anymore, in that Ryan moved to Princeton and started LIPS. (EDIT: fixed the link to point to the right page.)

@absudabsu

Good to know! I've been tracking Autograd for a while now, and briefly switched to Julia for some flexibility, but I'm glad to see there's renewed interest in this.

@ericmjl
Contributor

ericmjl commented Dec 14, 2018

@j-towns I'm curious: what have you tried so far toward getting autograd + numba working together?
