# `uarray` NumPy Compatibility

In [1]:
from uarray import *
import numpy as np
from numba import njit

## Original Expression

Let's look at a simple NumPy expression that computes the outer product of two arrays and then indexes the result:

In [2]:
def some_fn(a, b):
    return np.multiply.outer(a, b)[5]

This does a lot of extra work, since indexing keeps a single row of the outer product and discards the rest. Let's time it:

In [3]:
args = [np.arange(1000), np.arange(10)]

In [4]:
%time some_fn(*args)

CPU times: user 230 µs, sys: 107 µs, total: 337 µs
Wall time: 285 µs


array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45])
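
The wasted work can be made concrete with plain NumPy. The sketch below is not part of uarray. It assumes the same 1-D inputs as above and checks that row 5 of the outer product equals the single element `a[5]` multiplied by `b`, even though the full outer product computes and stores 1000 × 10 values.

```python
import numpy as np

a = np.arange(1000)
b = np.arange(10)

# Full outer product: every a[i] * b[j] is computed and stored.
full = np.multiply.outer(a, b)

# Only row 5 is kept, and that row depends on a[5] alone.
row_via_outer = full[5]
row_direct = a[5] * b

print(full.shape)                                  # (1000, 10)
print(np.array_equal(row_via_outer, row_direct))   # True
```

This equivalence is what an optimizer can exploit once it knows both inputs are one-dimensional.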

## Uarray reduced

Now let's use uarray's `optimize` decorator to create an updated function that specifies the dimensionality of the arrays, so that it can produce an optimized form:

In [5]:
@optimize
def optimized_some_fn(a, b):
    return some_fn(a.has_dim(1), b.has_dim(1))

Now let's try our function out to see if it's faster:

In [6]:
%time optimized_some_fn(*args)

CPU times: user 27 µs, sys: 7 µs, total: 34 µs
Wall time: 39.1 µs


array([ 0.,  5., 10., 15., 20., 25., 30., 35., 40., 45.])

Yep, about 7x as fast by wall time in this single run (285 µs vs. 39.1 µs). Let's look at how this is done! First, we create an abstract representation of the array operations:

In [7]:
optimized_some_fn.__optimize_steps__['resulting_expr']

Index(Sequence(Value('1'), VectorCallable(Content(Scalar(Value('5'))))),
      OuterProduct(Function(Content(Call(Ufunc(<ufunc 'multiply'>),
                                         Scalar(Unbound('', variable_name=i0)),
                                         Scalar(Unbound('', variable_name=i1)))),
                            Unbound('', variable_name=i0),
                            Unbound('', variable_name=i1)),
                   ToSequenceWithDim(NPArray(Expression(Name(id='a', ctx=Load()))),
                                     Value('1')),
                   ToSequenceWithDim(NPArray(Expression(Name(id='b', ctx=Load()))),
                                     Value('1'))))

Then, we compile that to Python AST:

In [8]:
print(optimized_some_fn.__optimize_steps__['ast_as_source'])



def fn(a, b):
    i_5 = ()
    i_6 = b.shape[0]
    i_1 = ((i_6,) + i_5)
    i_0 = np.empty(i_1)
    i_2 = b.shape[0]
    for i_3 in range(i_2):
        i_9 = 5
        i_10 = a
        i_7 = i_10[i_9]
        i_11 = i_3
        i_12 = b
        i_8 = i_12[i_11]
        i_4 = (i_7 * i_8)
        i_0[i_3] = i_4
    return i_0
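
The generated function is easier to follow with the temporaries folded away. The version below is a hand-simplified sketch written for illustration (the name `fn_simplified` is hypothetical and is not produced by uarray). It also shows why the optimized result is printed as floats: `np.empty` allocates `float64` by default, so the integer products are stored as floats.

```python
import numpy as np

def fn_simplified(a, b):
    # Hand-simplified equivalent of the generated loop (illustrative only).
    n = b.shape[0]
    out = np.empty((n,))   # np.empty defaults to float64
    a5 = a[5]              # the only element of `a` that is ever read
    for j in range(n):
        out[j] = a5 * b[j]
    return out

a = np.arange(1000)
b = np.arange(10)
result = fn_simplified(a, b)

print(result.dtype)                                         # float64
print(np.array_equal(result, np.multiply.outer(a, b)[5]))   # True
```

The loop runs over `b` only, so the work is proportional to `len(b)` rather than `len(a) * len(b)`.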



## Numba optimized

To give this an extra speed boost, we can compile the returned expression with Numba:

In [9]:
numba_optimized = njit(optimized_some_fn)

In [10]:
# Run once first so that JIT compilation time is excluded from the timing
numba_optimized(*args)

%time numba_optimized(*args)

CPU times: user 8 µs, sys: 1 µs, total: 9 µs
Wall time: 11.7 µs


array([ 0.,  5., 10., 15., 20., 25., 30., 35., 40., 45.])

Great, roughly another 3x speedup by wall time (39.1 µs vs. 11.7 µs)!
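
A single `%time` run at the microsecond scale is noisy, so the ratios above are only indicative. A more stable comparison can be sketched with the standard-library `timeit` module. Here `direct_fn` is a hypothetical hand-written stand-in for the optimized function, not uarray's output, and `best_time` is a helper introduced only for this example.

```python
import timeit
import numpy as np

def some_fn(a, b):
    return np.multiply.outer(a, b)[5]

def direct_fn(a, b):
    # Hypothetical hand-written stand-in for the optimized function.
    return a[5] * b

args = [np.arange(1000), np.arange(10)]

def best_time(fn, number=1000, repeat=5):
    # Best of `repeat` runs, reported as average seconds per call.
    runs = timeit.repeat(lambda: fn(*args), number=number, repeat=repeat)
    return min(runs) / number

t_outer = best_time(some_fn)
t_direct = best_time(direct_fn)
print(f"outer then index: {t_outer * 1e6:.1f} µs per call")
print(f"direct:           {t_direct * 1e6:.1f} µs per call")
```

Taking the minimum over several repeats reduces the influence of one-off delays such as cache misses or scheduler interruptions.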

In [11]:
# ast.dump(ast.parse("(1,) + ()"))

## Unknown dimensionality?

What if we want a version of the function that works on inputs of any dimensionality? Or what if we simply want to defer to NumPy's implementation and not replace `outer`? We omit the `has_dim` calls and get back an abstract representation that is compiled without any knowledge of the dimensionality:

In [12]:
some_fn

<function __main__.some_fn(a, b)>

In [13]:
dims_not_known = optimize(some_fn)

In [15]:
dims_not_known.__optimize_steps__['resulting_expr']

Index(Sequence(Value('1'), VectorCallable(Content(Scalar(Value('5'))))),
      OuterProduct(Function(Content(Call(Ufunc(<ufunc 'multiply'>),
                                         Scalar(Unbound('', variable_name=i6)),
                                         Scalar(Unbound('', variable_name=i7)))),
                            Unbound('', variable_name=i6),
                            Unbound('', variable_name=i7)),
                   NPArray(Expression(Name(id='a', ctx=Load()))),
                   NPArray(Expression(Name(id='b', ctx=Load())))))

In [17]:
print(dims_not_known.__optimize_steps__['ast_as_source'])



def fn(a, b):
    i_14 = 5
    i_16 = a
    i_17 = b
    i_15 = np.multiply.outer(i_16, i_17)
    i_13 = i_15[i_14]
    return i_13
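
Deferring to NumPy keeps the function valid for inputs of any dimensionality, because `np.multiply.outer` returns an array whose shape is `a.shape + b.shape`. The short check below uses made-up shapes chosen only so that index 5 is in bounds.

```python
import numpy as np

a = np.arange(18).reshape(6, 3)   # 2-D input
b = np.arange(4)                  # 1-D input

out = np.multiply.outer(a, b)
print(out.shape)   # (6, 3, 4)

# Indexing the first axis selects one slice of `a`, multiplied against all of `b`.
print(np.array_equal(out[5], np.multiply.outer(a[5], b)))   # True
```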

