
better jitted interface to multilinear interpolation #22

Closed
albop opened this issue Jul 12, 2018 · 11 comments
@albop
Member

albop commented Jul 12, 2018

Currently there are two wrapper functions for multilinear interpolation on regular grids. They are implemented in splines/multilinear_numba.py and could be seriously improved, while reusing the same low-level functions multilinear_interpolation_xd and vec_multilinear_interpolation_xd.

We could have one jitted, in-place function mlininterp(a, b, n, y, u, out) (the name is a placeholder) where:

  • a, b, n are 1d vectors or tuples of length d (the number of dimensions) denoting a cartesian grid (a and b are the lower and upper bounds, n the number of nodes in each direction)
  • y is a d-dimensional array with the scalar values to be interpolated at each node of the grid. Or, if y has d+1 dimensions, one interpolates vectors instead (y[..., :] is the vector value at one grid point)
  • u is a 2d array, where each row holds the coordinates of a point at which to interpolate
  • out is a 1d array if y is d-dimensional, or a 2d array if y is (d+1)-dimensional, and contains the output.
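A pure-Python sketch of this in-place interface, restricted to the 2d scalar case for concreteness (mlininterp is the placeholder name from above; the real version would be jitted and dimension-generic):

```python
import numpy as np

def mlininterp(a, b, n, y, u, out):
    """In-place multilinear interpolation on a regular cartesian grid.
    2d scalar sketch: a, b, n describe the grid, y holds the node
    values, u the query points, out receives the results."""
    for k in range(u.shape[0]):
        idx, lam = [], []
        for i in range(2):
            h = (b[i] - a[i]) / (n[i] - 1)     # grid step in dimension i
            s = (u[k, i] - a[i]) / h
            j = min(max(int(s), 0), n[i] - 2)  # clamp to an interior cell
            idx.append(j)
            lam.append(s - j)                  # barycentric coordinate
        (i0, i1), (l0, l1) = idx, lam
        out[k] = ((1 - l0) * ((1 - l1) * y[i0, i1] + l1 * y[i0, i1 + 1])
                  + l0 * ((1 - l1) * y[i0 + 1, i1] + l1 * y[i0 + 1, i1 + 1]))
    return out
```

For a linear function the interpolation is exact, which makes the sketch easy to check.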

With this in-place function, it is possible to deduce, from the dimensions of y and out alone, the value of d and whether y represents scalar or vector values.

I wonder what is the best way to write the allocating version mlininterp(a, b, n, y, u) so as to detect whether values are scalars or vectors. One could require a, b, n to be tuples, in which case their type would contain the dimension of the problem. One could have two function names (which is cheating). One could have a different signature like mlinterp(grid, y, u), where grid would be a tuple of tuples like ((0.0, 1.0, 10), (-1.0, 1.0, 50)). Not all options are mutually exclusive.

@jstac: what do you think?

@albop
Member Author

albop commented Jul 12, 2018

I think in the long term I like the idea of using mlininterp(grid, y, u). grid could also potentially denote a non-regular cartesian grid, with all the rest of the logic carrying through.

@jstac

jstac commented Jul 13, 2018

That sounds fine, although there might be some confusion as to what grid should contain. With interpolation I'm used to explicitly providing the grid, rather than the information used to build the grid. In your example above it seems you have in mind the latter (information used to build the grid).

Also, if you allow for the same interface for irregular grids, then grid will indeed contain the grid(s) themselves.

I wonder if it's not better to just coordinate on users always supplying the actual grids. If the routine in question uses logic that expects a regular grid, then this can either be tested at run time or identified in the function name (e.g., lininterp_regular).

@albop
Member Author

albop commented Jul 13, 2018

My idea was that the grid could be either regular or irregular, but that the two would correspond to different numba types, so that the appropriate method could be compiled. For instance, the two following calls would correspond to different implementations:

  • mlininterp(((0.0, 0.1, 30), (0.0, 1.0, 10)), y, u) (the first argument is of type Tuple((float64, float64, int64) x 2))
  • mlininterp((np.array([0.1, 0.2, 0.5]), np.array([0.1, 0.2, 0.5, 0.6])), y, u) (the first argument is of type Tuple(array(float64, 1d, C) x 2))
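The per-dimension dispatch these two calls imply can be sketched in plain Python (lookup is a hypothetical helper name; in the jitted version the branch below would instead be resolved at compile time from the numba type of the grid specification):

```python
import numpy as np

def lookup(grid_dim, x):
    """Locate x along one dimension: return the cell index and the
    barycentric coordinate inside that cell. A (lo, hi, n) tuple means
    a regular dimension (no memory access needed); an array means an
    irregular one (binary search over the nodes)."""
    if isinstance(grid_dim, tuple):
        lo, hi, n = grid_dim
        h = (hi - lo) / (n - 1)
        s = (x - lo) / h
        i = min(max(int(s), 0), n - 2)
        return i, s - i
    else:
        i = int(np.searchsorted(grid_dim, x)) - 1
        i = min(max(i, 0), len(grid_dim) - 2)
        return i, (x - grid_dim[i]) / (grid_dim[i + 1] - grid_dim[i])
```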

Actually, one could define RGrid and IGrid so that the two calls above could be written with less ambiguity:

  • mlininterp(RGrid((0.0, 0.1, 30), (0.0, 1.0, 10)), y, u)
  • mlininterp(IGrid(np.array([0.1, 0.2, 0.5]), np.array([0.1, 0.2, 0.5, 0.6])), y, u)

In this case RGrid and IGrid could be functions returning the appropriate tuple structure, or jitclasses used mostly for dispatch.

I am not a big fan of using irregular grids by default: the advantage of a regular cartesian grid is that one doesn't need any memory access to locate a point, so the grid never needs to be constructed. In the same spirit, detecting at run time whether a grid is regular doesn't appeal to me.

Two different names would indeed solve the problem. Here I think there is an obvious proposition that could work for you: let's just keep interp as the name for basic interpolation on irregular grids. Since the nd behaviour is unspecified for now, we can choose to do interp(x1, x2, x3, y, u), where x1, x2, x3 are array-likes holding the nodes in each dimension. That is still compatible with scalar- or vector-valued interpolation, but we need a keyword argument if we want in-place operation, as in interp(x1, x2, x3, y, u, out=...). Shall we do that?
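In 1d, that proposed interp call could look like the following pure-Python sketch (the actual version would be jitted and handle several dimensions and vector values):

```python
import numpy as np

def interp(x1, y, u, out=None):
    """Linear interpolation on a possibly irregular 1d grid x1 with
    values y, evaluated at the points u. 1d sketch of the proposed
    interp(x1, ..., y, u, out=...) interface."""
    if out is None:
        out = np.empty(len(u))            # allocating version
    for k, p in enumerate(u):
        i = int(np.searchsorted(x1, p)) - 1
        i = min(max(i, 0), len(x1) - 2)   # extrapolate linearly outside
        lam = (p - x1[i]) / (x1[i + 1] - x1[i])
        out[k] = (1 - lam) * y[i] + lam * y[i + 1]
    return out
```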

@jstac

jstac commented Jul 14, 2018

That all sounds great.

I hear what you are saying about defaulting to regular grids. It makes sense. But I would be glad to have the option to pass explicit grids in the lectures, even when they are regular, for simple low-dimensional problems.

Perhaps RGrid and IGrid are not necessary --- a few examples will clarify the syntax of the function calls.

@spvdchachan, @chrishyland Let us know if you have thoughts.

@albop
Member Author

albop commented Jul 22, 2018

@jstac: with this proposal we'll have both types of grid with one single function call. It would be safe as long as users don't mix grid types unwittingly. Otherwise, I'm happy to use interp for irregular grids and interpolate for regular/irregular grids depending on input type.
I've been pondering during the week and think we could have a very general, very consistent API. For instance, if itp = Interpolator(Grid((0, 0.1, 4), np.linspace(0, 1, 100)**2), coeffs), then interpolate(itp, u) would interpolate linearly on a grid whose first dimension is regular and whose second is irregular. But the same call could be written interpolate((0.0, 0.1, 4), np.linspace(0, 1, 100)**2, coeffs, u) as long as there is no ambiguity about what it means. Since coeffs is not a 1d array, save for 1d interpolation, it seems to me there shouldn't be any.
There is a decision to make about the syntax for the points where we evaluate. Given a scalar 3d interpolation object itp, we know what interpolate(itp, u) means when u is 1d or 2d, but what do you expect interpolate(itp, x, y, z) to do when x, y, z are 1d arrays? Do you expect it to interpolate on the product grid with these coordinates, i.e. return the values f(x_i, y_j, z_k) as a 3d array, or would you rather interpolate pointwise at f(x_i, y_i, z_i), in which case all vectors are required to have the same size? The latter seems to be favoured by scipy.interpolate.interp2d.
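The two conventions in this question can be illustrated with plain numpy broadcasting (a cheap function stands in for the interpolant):

```python
import numpy as np

f = lambda x, y: x + 10 * y          # stand-in for an interpolant
x = np.array([0.0, 1.0])
y = np.array([0.0, 1.0, 2.0])

# pointwise convention (scipy-style): the arrays must have the same
# length, and the result is 1d with entries f(x_i, y_i)
pointwise = f(x, y[:2])

# product-grid convention: the result is a len(x) x len(y) array
# with entries f(x_i, y_j)
tensor = f(x[:, None], y[None, :])
```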

@albop
Member Author

albop commented Jul 22, 2018

BTW, I started implementing the multilinear interpolation, mostly to try a new approach. So far it doesn't take many types of input, but it can accommodate even or uneven grids, or a mix of them.
The approach is a bit Julianesque in the sense that it uses generated functions for the elementary computational bricks and prefers tuples to arrays when possible, but what little high-level code there is is written in pure Python, and it relies on the compiler to optimize away all function calls.
You can check the code here: https://github.com/EconForge/interpolation.py/blob/albop/interp/misc/interp_experiment.py

On 2d interpolation for a 10x10 matrix, interpolating K=1000 times on N=10000 points, I get the following timings:

Even: 4.386086940765381
Uneven: 10.724631071090698
Scipy: 16.82255983352661

This is of course not very scientific, but it seems to indicate that numba does optimize the code reasonably well (and apparently inlines many functions).

@chrishyland

Hi Pablo,
Thanks for posting up your latest code. I've had a look at it and just wanted to ask a few questions on it:

  1. Lines 111-123: do tup1 and tup2 need to be the same length?
  2. For lines 83-139, have you considered refactoring with lists rather than tuples? Numba is much more forgiving when it comes to lists, and you'd be able to have a single function returning a list of varying length rather than five if statements in each function. I'm not sure about this, it's just a possible idea. Another option would be to use Python's map function and just apply the function to the elements of whatever is passed in and return the result.
  3. Sorry for the naive question, I must admit I'm no expert on interpolation, but what is the purpose of the tensor reduction and how does it fit in?

Amazing work on getting it to run though!

@albop
Member Author

albop commented Jul 23, 2018

@chrishyland: thanks for looking at the code.
Here are some responses to your comments:
0. The reason the code might look convoluted in places is that it tries to give all the information to the compiler, including the sizes of the arguments (not only their dimensions). This is done for performance reasons. The previous approach, used in the same library, was to generate simple flat source (like https://github.com/EconForge/interpolation.py/blob/master/interpolation/splines/multilinear_numba.py). Given the compilation architecture, simple flat code is still what is optimized best. Now, mostly for fun, I tried to see whether the same result could be achieved by relying on LLVM optimization instead.
For a second, imagine that all functions are inlined: is the resulting source tree then similar to the generated text?
Let me give you a simple example. Suppose you want to compute x@A@y where A is 2x2 and x, y are 2-element vectors. You can of course write def mul(A, x, y): return np.dot(x, np.dot(A, y)), and this function can be njitted, so that is a good first-order approximation. But it does two function calls and creates an intermediary array which is probably not optimized away (in the current implementation of numba). Or you could write by hand:

def mul_hand(A, x, y):
    a_00 = A[0, 0]
    a_01 = A[0, 1]
    a_10 = A[1, 0]
    a_11 = A[1, 1]
    x_0 = x[0]
    x_1 = x[1]
    y_0 = y[0]
    y_1 = y[1]
    # first compute the intermediary vector v = A @ y
    v_0 = a_00 * y_0 + a_01 * y_1
    v_1 = a_10 * y_0 + a_11 * y_1
    # then the scalar product x . v
    res = x_0 * v_0 + x_1 * v_1
    return res

This is now a very simple function for the compiler. In particular, the exact amount of memory used by these calculations is known in advance, and there is no need to allocate an array dynamically. I have not run this code, but I bet it should be faster than the naive implementation (or is this one the naive one?) if evaluated many times.
It is of course not perfect. If A and x are of dimensions n x n and n, then how does the compiler know the value of n when you call the function? This information is not carried by the array type. The idea is thus to replace the arguments by tuples (for vectors) and tuples of tuples (for matrices). It relies on the fact that (0.1, 0.2) and (0.1, 0.3, 0.4) are not of the same type: they are Tuple[Float, Float] and Tuple[Float, Float, Float] respectively (check the result of numba.typeof). This is not true of [0.1, 0.2] and [0.1, 0.3, 0.4], which are both of the same List[Float] type, which explains why lists (or arrays) can't be used here.
So one could generate a function mul_aut which does what mul_hand does, based on the types of the arguments. For mul_aut(((0.0, 3.0, 1.0), (0.0, 3.3, 1.2), (0.3, 20.0, 10.0)), (3.2, 1.0, 4.0)) it would generate the 3d version, for instance.
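In plain Python, generating such an unrolled function from a known dimension can be sketched by emitting flat source and exec-ing it (make_mul is a hypothetical name; the numba version would instead dispatch on the tuple type of the arguments):

```python
def make_mul(n):
    """Emit an unrolled x @ A @ y for fixed dimension n, with A given
    as a tuple of tuples and x, y as tuples; no loop and no
    intermediary array appears in the generated source."""
    terms = " + ".join(f"A[{i}][{j}] * x[{i}] * y[{j}]"
                       for i in range(n) for j in range(n))
    ns = {}
    exec(f"def mul_aut(A, x, y):\n    return {terms}\n", ns)
    return ns["mul_aut"]

mul3 = make_mul(3)
```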

Now there is another nice thing about tuples and numba: defining a tuple has essentially zero cost compared with defining many variables. For instance, doing

a_00 = A[0,0]
a_01 = A[0,1]
a_10 = A[1,0]
a_11 = A[1,1]
# and using a_00, ... a_11

or doing

t = ((A[0,0], A[0,1]), (A[1,0], A[1,1]))
# and using t[0][0], ... t[1][1]

are essentially equivalent for the compiler, hence performance-wise.
Lastly, the numba documentation advertises aggressive inlining of small functions. So if we define the multiplication on tuples, tmul(A: Tuple[Tuple[Float64, Float64], Tuple[Float64, Float64]], y: Tuple[Float64, Float64]) -> Tuple[Float64, Float64], and the scalar product smul(x, y), we can rewrite the original function as

def mul_tup(A: Tuple[Tuple[Float64, Float64], Tuple[Float64, Float64]], x: Tuple[Float64, Float64], y: Tuple[Float64, Float64]) -> Float64:
    t1 = tmul(A, y)
    res = smul(x, t1)
    return res

When this one is jitted, tmul and smul are replaced literally (inlined), and in a perfect world the emitted code should have performance similar to the fastest mul_hand version.

  1. Yes, tup1 and tup2 are of the same length. The generator should raise an error (or maybe return None) if that is not the case.

  2. In line with my point 0, lists wouldn't have the same benefits as tuples here. They are mutable, so their length cannot be known in advance, which defeats the whole approach. Also, there are issues with passing lists between Python and numba (the boxing/unboxing problem), and I believe it should in general be avoided (creating and using lists on one side only is fine).
    As for fmap and the other companion functions: one thing to note is that I wrote them quickly by hand, but it would be nicer to generate them automatically, either by writing strings or by some AST manipulation. They are a kind of workaround, but I haven't found a better one.

  • a list comprehension returns, well, a list, whose size is not known in advance. So [i + 1 for i in (1, 2, 3)] is not as simple, from the compiler's perspective, as (2, 3, 4)
  • map is a Python function which, as far as I know, doesn't exist in nopython mode
  • if you define def add1(i): return i + 1, then you can use fmap(add1, (1, 2, 3)) in nopython mode. When compilation occurs, the type of the result is perfectly known in advance: it is a tuple with three elements, equal to (add1(1), add1(2), add1(3)). Now, if fmap is inlined properly, there is no function call and the result should be no slower than writing (1 + 1, 2 + 1, 3 + 1) by hand.
    The same logic applies to the other helper functions. By the way, their names were picked somewhat randomly, so one needs to find better replacements for fmap, fmap1, fmapc.
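A minimal pure-Python sketch of generating such a helper per tuple length (the generated numba helpers differ in details, but the idea is the same: the output length is fixed in the emitted source):

```python
def make_fmap(n):
    """Emit an fmap specialized to length-n tuples, so the result is
    again a length-n tuple rather than a variable-length list."""
    body = ", ".join(f"f(t[{i}])" for i in range(n))
    ns = {}
    exec(f"def fmap(f, t):\n    return ({body},)\n", ns)
    return ns["fmap"]
```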
  3. Suppose you interpolate in a hypercube of dimension d, knowing the values V at the vertices and the barycentric coordinates l. V is naturally a 2x2x...x2 array with d dimensions, while l is a vector with d coordinates between 0 and 1. Multilinear interpolation interpolates linearly in each dimension, and there is a formula which does it efficiently (here in 2d):
(1 - l[0]) * (
      (1 - l[1]) * V[0, 0]
    + l[1] * V[0, 1]
) + l[0] * (
      (1 - l[1]) * V[1, 0]
    + l[1] * V[1, 1]
)

If you define a 2d array λ such that λ[0, i] == 1 - l[i] and λ[1, i] == l[i], then the formula above is exactly np.einsum('ij,i,j->', V, λ[:, 0], λ[:, 1]), or λ[:, 0] @ V @ λ[:, 1] (but the latter is less efficient and works only in 2d). It is also the computation done by tensor_reduction when V and λ are tuples of tuples representing arrays.
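The equivalence of these formulations can be checked numerically in 2d with arbitrary values:

```python
import numpy as np

V = np.array([[1.0, 2.0], [3.0, 4.0]])   # values at the 4 vertices
l = np.array([0.25, 0.5])                # barycentric coordinates
lam = np.vstack([1 - l, l])              # lam[0, i] = 1 - l[i], lam[1, i] = l[i]

# the explicit 2d multilinear formula
direct = ((1 - l[0]) * ((1 - l[1]) * V[0, 0] + l[1] * V[0, 1])
          + l[0] * ((1 - l[1]) * V[1, 0] + l[1] * V[1, 1]))

# the same reduction written as an einsum and as matrix products
via_einsum = np.einsum('ij,i,j->', V, lam[:, 0], lam[:, 1])
via_matmul = lam[:, 0] @ V @ lam[:, 1]
```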

I hope my explanations are clear enough; don't hesitate to ask if they are not. The code is still a proof of concept. It is inspired by Julia's StaticArrays library, which does essentially the same trick, but with the ability to define actual arrays instead of tuples of tuples (and with some help from Julia macros).

@jstac

jstac commented Jul 24, 2018

@albop This is beautiful. Thank you. You should be freelancing for hedge funds, getting paid millions. (If you decide to go down this route, please buy a yacht and invite me to come sailing.)

The interface is really neat and the code is fast. I liked your explanations.

Regarding your question "There is some decision to make about the syntax for the point where we evaluate.... The latter seem to be favoured by scipy.interpolate.interp2d", I would also expect the latter and following scipy seems like a good idea.

I don't have any suggestions for improvements. Certainly it will accommodate all the needs of the lectures.

@chrishyland and @spvdchachan might be able to help with tests, although their classes will be starting back soon...

@albop
Member Author

albop commented Jul 24, 2018

Thanks @jstac ! I don't know about the hedge fund, but a small boat is a good idea, and when I get one, you'll be welcome to come sailing and fishing...

I'll reorganize the code a little and put a working version with all the interfaces in place very soon. Then it would indeed be very useful to check that it works in all cases by writing some tests, and to write a tiny bit of documentation.

There is just one little issue left... it could be faster. In particular, the regular-grid method is still at least 4 times slower than the one already in interpolation.py using code generation. I guess I'll go for the fastest and easiest path to get the interp function we want for now, but the right thing to do is certainly to activate the fastest version in a way that is transparent to the user. This might actually be a good example to motivate some numba improvements.

@jstac

jstac commented Jul 24, 2018

Sounds like a good plan. QuantEcon's lectures will provide partial documentation and I can help put together a notebook of examples.

@albop albop mentioned this issue Aug 6, 2018
@albop albop closed this as completed Jan 30, 2019